Conservation and innovation: Plastome evolution during rapid radiation of Rhodiola on the Qinghai-Tibetan Plateau

Conservation and innovation: Plastome evolution during rapid radiation of Rhodiola on the Qinghai-Tibetan Plateau

Journal Pre-proofs Conservation and innovation: plastome evolution during rapid radiation of Rhodiola on the Qinghai-Tibetan Plateau Dan-Ni Zhao, Yi R...

23MB Sizes 0 Downloads 33 Views

Journal Pre-proofs Conservation and innovation: plastome evolution during rapid radiation of Rhodiola on the Qinghai-Tibetan Plateau Dan-Ni Zhao, Yi Ren, Jian-Qiang Zhang PII: DOI: Reference:

S1055-7903(19)30412-9 https://doi.org/10.1016/j.ympev.2019.106713 YMPEV 106713

To appear in:

Molecular Phylogenetics and Evolution

Received Date: Revised Date: Accepted Date:

8 July 2019 16 December 2019 16 December 2019

Please cite this article as: Zhao, D-N., Ren, Y., Zhang, J-Q., Conservation and innovation: plastome evolution during rapid radiation of Rhodiola on the Qinghai-Tibetan Plateau, Molecular Phylogenetics and Evolution (2019), doi: https://doi.org/10.1016/j.ympev.2019.106713

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier Inc.

Conservation and innovation: plastome evolution during rapid radiation of Rhodiola on the Qinghai-Tibetan Plateau

Dan-Ni Zhao, Yi Ren, Jian-Qiang Zhang* College of Life Sciences, Shaanxi Normal University, Xi’an 710119, China *Corresponding author: [email protected] (J. Q. Zhang) Running title: Plastome evolution of Rhodiola

1

ABSTRACT The amount of plastome sequence data available has soared in the last decade, but the nature of plastome evolution during rapid radiations is largely unknown. Moreover, although there is increasing evidence showing that plastomes may have undergone adaptive evolution in order to allow adaptation to various environments, few studies have systematically investigated the role of the plastome in alpine adaptation. To address these questions, we sequenced and analyzed 12 representative species of Rhodiola, a genus which includes ca. 70 perennial herbs growing in alpine habitats in the Qinghai-Tibet Plateau and the Hengduan Mountains. Rapid radiation in this genus was triggered by the uplift of the Qinghai-Tibet Plateau. We also included nine species of Crassulaceae as the outgroups. All plastomes were conserved with respect to size, structure, and gene content and order, with few variations: each contained 134 genes, including 85 proteincoding genes, 37 tRNAs, 8 rRNAs, and 4 potential pseudogenes. Three types of repeat sequence were detected. Slight contraction and expansion of the inverted repeats were also revealed. Both the genome-wide alignment and sequence polymorphism analyses showed that the inverted repeats and coding regions were more conserved than the single-copy regions and the noncoding regions. Positive selection analyses identified three genes containing sites of positive selection (rpl16, ndhA, ndhH), and one gene with a faster than average rate of evolution (psaA). The products of these genes may be involved in the adaptation of Rhodiola to alpine environments such as low CO2 concentration and high-intensity light. Keywords: adaptive evolution, plastome, Qinghai-Tibetan Plateau, rapid radiation, Rhodiola, structural variation 1. Introduction Plastids were once free-living prokaryotes, and over time their genomes (plastomes) have shrunk substantially from their ancestral size as a result of transferring most of their genes to the nuclear genome (Timmis et al., 2004; Kleine et al., 2009). In photosynthetic angiosperms, plastomes are relatively conserved in structure, gene number and arrangement (Palmer and Stein, 1986; Daniell et al., 2016). Typically, a plastome is comprised of a large single-copy region (LSC) and a small single-copy region (SSC) separated by two identical inverted repeats (IRa and IRb), including approximately 80 protein coding genes (PCGs), 30 transfer RNA (tRNA) genes and four ribosomal RNA (rRNA) genes (Daniell et al., 2016). PCGs encode photosynthetic genes,

2

transcription and translation related genes, as well as some proteins related to other metabolism and synthesis processes (Kleine et al., 2009). Despite their conserved structure and sequence, plastomes have been found to be variable in size, structure, and evolutionary rates of genes even among closely related species (Sloan et al., 2014; Williams et al., 2019). In addition, gene gain and loss events in plastomes have also been reported (Daniell et al., 2016; Ye et al., 2018). However, it is still largely unknown how plastomes evolved during evolutionary radiations, which are ubiquitous across the Tree of Life.

Plastomes have played a critical role in phylogenetic and phylogeographic studies for more than three decades due to their conserved structure, lack of recombination, uniparental inheritance, and ease of amplification (Shaw et al., 2005; Shaw et al., 2014). All these studies have been based on one untested assumption: that sequence variation in plastomes is maintained by genetic drift, i.e., polymorphisms are selectively neutral. Nevertheless, evidence has been accumulating that positive selection may have played a role in the evolution of plastomes (reviewed by Bock et al., 2014). In particular, comparative genomic studies in diverse groups of plants, facilitated by the Next Generation Sequencing (NGS), have detected signatures of positive selection from the patterns of sequence diversity in plastomes (Kapralov and Filatov, 2007; Hu et al., 2015; Jiang et al., 2018; Ye et al., 2018). However, few studies have systematically investigated whether adaptive evolution at the plastomic level has occurred in extreme environments (e.g., alpine areas), where adaptive evolution has been detected in a number of nuclear genes (Zhang et al., 2016; Zhang et al., 2019).

Rhodiola L. (Crassulaceae) is a good model to study plastome evolution in rapid radiations and the adaptive evolution of plastid genes when organisms are adapting to the alpine environment. The genus comprises ca. 70 species, of which ~90% are endemic to the Qinghai-Tibetan Plateau and the adjacent areas (QTP hereafter) (Fu and Fu 1984; Ohba, 1978; Fu and Ohba, 2001; Ohba, 2005). All species of Rhodiola grow in high-altitude habitats, and the species diversity of the genus was generated by a rapid radiation triggered by the uplift of the QTP in the middle Miocene (Zhang et al., 2014a, b). Importantly, several species (e.g., R. crenulata) are distributed in nival or subnival areas of the QTP, which makes them among the earth’s highest-living vascular plants. Although plant genomic adaptation to high altitudes has been investigated at the

3

nuclear genome level in several plants (Zhang et al., 2016; Zhang et al., 2019), it is still less clear what role the plastome has played in the process. Given the many important functions of plastids, including photosynthesis, the synthesis of amino acids, nucleotides, fatty acids, phytohormones, vitamins, and the assimilation of sulfur and nitrogen, it is to be expected that the plastome, which encodes many proteins that are key to these functions, would be subject to strong natural selection in high altitudes, an environment that is characterized by low temperature, strong winds, and hypoxic air (Körner, 2003).

In the present study, using Rhodiola as the system, we aim to infer the structure and variation in gene content of plastomes among species which experienced a rapid radiation, and to detect possible signatures of positive selection in plastid genes in response to adaptation to alpine habitats. Our specific hypotheses are: 1) the structure and gene content of plastomes in Rhodiola are conserved, as the rapid radiation has occurred very recently, leaving little time to accumulate these variations; 2) some plastomic genes related to photosynthesis and other physiological functions would be under positive selection. To test these hypotheses, we sequenced 19 plastomes including 12 Rhodiola species and seven outgroups in the same family, and conducted systematic analyses of these genomes. Overall, this study will increase our understanding of plastome evolution during rapid radiations and in extreme environments.

2. Materials and Methods

2.1 Taxon sampling and DNA extraction We included 12 Rhodiola species that represented different clades according to the previous phylogeny (Zhang et al., 2014a), as well as seven species from other genera of the Crassulaceae as outgroups. Collection information, voucher and habitat information are provided in Table S1. Leaves were dried in silica gel in the field or frozen fresh at -20 °C.

The CTAB method (Doyle and Doyle, 1987) was used to extract DNA from 20-30 mg of dried tissue or 50-60 mg of frozen tissue. A Qubit 2.0 fluorometer (Invitrogen, Life Technologies) was used to check DNA amount and quality. The final concentration for library preparation was normalized to 20 ng/μl.

4

2.2 Sequencing, genome assembly and gene annotation Library preparation and sequencing were conducted at BioMarker Technology Co., Ltd. (Beijing, China) according to the standard protocol provided by Illumina. Briefly, genomic DNA was mechanically fragmented by ultrasonic treatment, then purified and the terminals repaired. After addition of sequencing adaptors, fragments of 270 bp in length were selected by agarose gel electrophoresis. Finally, PCR was performed to amplify the library. After quality inspection, the library was sequenced with Illumina X Ten (Illumina, San Diego, CA, USA). Paired-end reads with a length of 150 bp were produced. These raw reads were trimmed of adaptors and filtered for quality, keeping reads with over 90% of the base pairs having a quality score of greater than Q30. The main data filtering steps were as follows: (1) removing adapters; (2) filtration of reads with N content > 10%; (3) removing reads with a mass value of less than 10 and over 50%.

Due to the large amount of data obtained, we firstly used Geneious R10 (Biomatters Ltd., Auckland, New Zealand) to estimate the coverage of the total data by mapping the original sequencing data to a single PCG, psbA, extracted from the reference sequence, and then used CLC Genomic Workbench v8 (CLC Bio, Aarhus, Denmark) to set the final coverage to 100× 200× as the target. We then randomly extracted part of the original data for further assembly. With Sedum sarmentosum (GenBank accession: NC_023085) as the reference, we used MITObim v1.9 (Hahn et al., 2013) to assemble and repair gaps in the plastome of R. rosea. Gene annotation was performed with Geneious R10. The plastome of R. rosea was subsequently used as the reference for assembly and gene annotation of other genomes. Draft annotations were manually checked and adjusted according to the reference genome in order to accurately delimit start and stop codons and the boundaries between gene exons and introns. tRNAscan-SE v1.21 (Schattner et al., 2005) was used to verify annotated tRNA genes. All annotated genomes were deposited in GenBank, and the accession numbers are shown in Table 1. To visualize the structure and genomic content of Rhodiola plastomes, a physical map of R. smithii was drawn in the Organellar Genome Draw program (OGDRAW; http://ogdraw.mpimp-golm.mpg.de/; Lohse et al., 2013). For further analyses, we constructed three data sets: Rhodiola only (R), Rhodiola +

5

two closest relatives (S. sarmentosum and P. kamtschaticus; R2), and Rhodiola + all outgroups (Ra).

2.3 Comparative analysis Multiple sequence alignment was conducted in MAFFT v7 (Katoh and Standley, 2013) with default parameters. The alignment obtained was then checked manually in Geneious R10. In order to detect structural variation among Rhodiola plastomes, mVISTA (Frazer et al., 2004) was used to visualize the alignment of data set R, with R. rosea as the reference and using ShuffleLAGAN mode. In addition, we used Mauve v2.3.1 (Darling et al., 2010) to determine whether gene rearrangement events had happened within the genus. As the structures of all Rhodiola plastomes were highly conserved (See 3.2), we chose R. rosea as the representative of Rhodiola and compared it with other outgroups using Mauve v2.3.1. We also compared the locations of genes at region boundaries manually in Geneious R10 in order to detect contraction and expansion of these boundaries in data set R2. To characterize differences in variation among different regions of the genome, nucleotide diversity (Pi) in data set R was estimated by DnaSP v5.0 (Librado and Rozas, 2009) for all coding and non-coding regions (intergenic regions and introns) separately. The trnC (GCA)-petN region of R. smithii had a deletion of approximately 400 bp, so this sequence was deleted to avoid dragging down the overall diversity of the region.

2.4 Repetitive sequence analyses Four types of repetitive sequence were searched for in data set R: tandem, dispersed, palindromic repeats, and microsatellite sequences (SSRs). The online software package Tandem Repeats Finder v4.09 (http://tandem.bu.edu/trf/trf.html) was used to search for tandem repeats with a length of at least 10 bp. Alignment parameters (match, mismatch and indel) were set at 2, 7 and 7 respectively. REPuter software (Kurtz et al., 2001) was used to search for dispersed repeat and palindromic repeat sequences, with the minimum repeat size set at 30 bp, and the minimum interval between repeat sequences at 3 bp. The minimum similarity between sequences was set as default as 90%. The MICROSATELLITE (MISA) perl script (Thiel et al., 2003) was used to search for SSRs. Mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide thresholds were set to 10, 5, 4, 3, 3, and 3 respectively.

6

2.5 Phylogenetic analyses In order to reconstruct a phylogeny for further selection analyses, we applied maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) methods to data set Ra. The data set included 12 Rhodiola species plus nine species belonging to eight genera (seven newly sequenced and two published: Phedimus kamtschaticus and Sedum sarmentosum; Table 1) used as outgroups according to previous phylogenetic studies (Mort et al., 2001; Mayuzumi and Ohba, 2004). As the phylogenies based on coding or noncoding regions coincided with that based on the complete genomes, we used complete genomes for the analyses. The optimal nucleotide substitution model was determined by jModeltest v2.1 (Darriba et al., 2012) using the Akaike Information Criterion (AIC). The MP tree was constructed by PAUP* 4.0b10 (Swofford, 2003) with the same configuration in Zhang et al., (2014). RaxML v7.2.8 (Stamatakis, 2006) was used for the ML tree with the GTRIG model and 1,000 bootstrap replicates. MrBayes 3.2.2 (Ronquist and Huelsenbeck, 2003) was used to perform BI analysis. A random tree was chosen as the starting tree, and four Metropolis-coupled Markov chain Monte Carlo (MCMC) chains (20,000,000 generations for each chain) were used. The sampling frequency was set to every 2000 generations. The first 20% trees were discarded as burn-in. Convergence of the MCMC chains was inspected by Tracer v1.7.1 (Rambaut et al., 2018) with an effective sample size (ESS) for each parameter of over 200. Finally, TreeAnnotator v1.7.1 (Drummond et al., 2012) was used to summarize the most probable trees into a maximum clade credibility (MCC) tree with median node heights. The alignments used in this study and the resulting phylogenetic trees are available from the TreeBase: http://purl.org/phylo/treebase/phylows/study/TB2:S25517.

2.6 Positive selection analysis Phylogenies based on different methods yield congruent topology, so we performed subsequent analyses based on the ML tree. We firstly used a branch model in the PAML (Yang, 2007) codeml program to identify rapidly evolving genes. The null model assumes that all branches evolve at the same rate, while the alternative model allows the foreground branch to evolve at a different rate. Likelihood ratio test (LRT) values at df = 1 were used to distinguish between the null and alternative models. Genes with p < 0.05 and ω > 1 are regarded as genes that evolve significantly faster than others.

7

We then used the codeml program's branch-site model to identify positively selected genes in the foreground branch. Based on a Bayes empirical Bayes (BEB) result with a posterior probability greater than 0.95, an LRT value was established to compare a model allowing positive selection (ω > 1) acting on a site in the foreground branch with a null model where the site may have undergone neutral evolution (ω = 1) or purifying selection (ω < 1). Genes with p < 0.05 in a chisquare test were selected as candidate positives.

For both models, we used two settings for the foreground: one with all Rhodiola species and the other with only the dioecious clade of Rhodiola (see 3.4). The first setting was used to detect positive evolution at the generic level, and the second at the infrageneric level.

2.7 Environmental analyses To link environmental factors to adaptability of genes positively selected and with faster rate of evolution in plastomes of Rhodiola (see 3.5), we tested whether there were significant habitat differences between Rhodiola and the outgroups, as well as between the two clades within Rhodiola (see 3.4). We downloaded occurrence sites of Rhodiola species included in the present study and outgroups from GBIF (https://www.gbif.org/) using R package ‘dismo’ (Hijmans, 2014). To avoid over-representation of R. rosea and R. integrifolia sites, we used a grid sampling method (Li et al., 2018) to select 44 points for each species, respectively. All coordinates were manually checked to exclude duplicated and irrational points. We also excluded points with elevation beyond the normal range documented in Fu and Ohba (2001). We finally collected 503 points for Rhodiola and 753 points for the outgroups (500 Phedimus and 253 other outgroups). The information of all sampled points is shown in Table S7. We compared altitudes between Rhodiola and all outgroups, as well as between the two clades within Rhodiola using Welch's ttest in R v3.6.1 (R Core Team, 2019). Furthermore, we downloaded 19 bioclimatic variables of current version 1.4 from WorldClim (Hijmans et al., 2005; http://worldclim.org/), and extracted variables for Rhodiola species and their closest relative Phedimus using DIVA-GIS v5.2 (Hijmans et al., 2012). We then conducted a Principal Component Analysis (PCA) using R v3.6.1 (R Core Team, 2019). Before conducting the PCA, all data was subjected to ShapiroWilk’s test in R to check normality. For bioclimatic variables that show greater contribution to PCs, we tested if they were significantly different in Rhodiola and outgroups using Welch's t-test

8

in R (R Core Team, 2019). All data visualization was performed in R package ‘ggplot2’ (Wickham, 2016).

3. Results

3.1 Characteristics of Rhodiola plastomes and comparison with other genera A total of 19 complete plastomes including 12 from Rhodiola and seven from outgroups were sequenced and annotated. The final reads (2,883,056-26,124,956) were mapped to reference sequences with mean coverage ranging from 127.60 × (R. ovatisepala) to 1,006.00 × (R. smithii) (Table S1). Plastomes of the newly obtained species all had the quadripartite structure typical of angiosperms (Fig. 1), including the LSC (81,651-83,253), the SSC (16,784-17,093) and two IRs (25,430-25,880) (Table 1). Among Rhodiola species, R. smithii had the smallest plastome (150,286 bp), while R. fastigiata had the largest (151,924 bp). The overall GC content of the genome was 37.7%-37.8%, while that of the IR regions was higher (42.9%-43.0%) than those of the LSC (35.7%-35.8%) and SSC (31.6%-31.9%) regions (Table 1). Gene number was also uniform among Rhodiola species: each plastome contained 134 predicted genes, of which 20 were duplicated in the IR regions. The 114 unique genes included 80 PCGs, 30 tRNA genes and four rRNA genes (Tables 1, S2; Fig. 1). Among the 80 PCGs, three were potentially pseudogenes (ycf15, both copies; ycf1 and rps19, one of the two copies in each case). A total of 18 genes (excluding five repeated copies) had introns, with 15 containing one and three containing two introns (Table S2). Compared to outgroups from other genera of Crassulaceae, Rhodiola species did not show significant differences in genome size, GC content or gene number (Table 1).

Comparison of Rhodiola with two outgroups revealed relatively stable IRs, with no significant expansion or contraction. In these 14 plastomes, the LSC-IRa and LSC-IRb borders were located in the rps19 gene, and no shifts were detected (Fig. 2). The boundary of SSC-IRa lay in the intergenic spacer region between ycf1 and ndhF, while the SSC-IRb border was within the other copy of ycf1. The exact position of the SSC-IRb border had shifted 2-5 bp among Rhodiola species, while those in the two outgroups had shifted 14 bp in P. kamtschaticus and 3 bp in S.

9

sarmentosum (Fig. 2). The length of the portion of the ycf1 gene that located in the IRa region was from 1078 bp (R. ovatisepala) to 1243 bp (P. kamtschaticus).

3.2 Structure and sequence variation Using R. rosea as the reference, mVISTA results showed that all Rhodiola plastomes had high sequence similarity, especially in the coding regions (Fig. 3). However, there were several significant differences in non-coding regions. The region with the lowest similarity was petApsbJ, in which R. smithii had a 694 bp gap compared to R. rosea. Multiple alignment analysis of the 12 Rhodiola plastomes by Mauve revealed no genomic rearrangement events (Fig. S1). Compared to outgroups, again, no gene rearrangement inversion events were detected (Fig. S2).

The range of nucleotide diversity in the non-coding regions was 0-0.0518, with an average of 0.0091, which was significantly higher than that in the coding regions (0-0.0313, 0.0034; twelch = 6.784, p = 9.07 × 10 -11) (Table S3; Fig. S3). Also, consistent with the results of genome-wide comparisons (Fig. 3), the nucleotide diversity values for IR regions (0.0003 and 0.0019 for coding and non-coding regions, respectively) were lower than those for the LSC (0.0036, twelch = 2.894, p = 0.0047; 0.0105, twelch = 5.303, p = 5.73 × 10 -7 ), and SSC (0.0059, twelch = 7.602, p = 2.27 × 10 -8 and 0.0128, twelch = 6.171, p = 4.6 × 10 -7) (Table S3). The five genes with the highest sequence diversity were rps16, rps8, trnL (UAA), rpl22 and ycf1, and the corresponding noncoding regions were: rps3-rpl22, trnH-psbA, trnW-trnP, rpoC1-rpoB and rpl33-rps18 (Fig. S3). The five most informative non-coding regions with a length equal to or greater than that of psbA-trnH (271bp) were psbA-trnH, trnG-trnR, rpl32-trnL, trnD-trnY and ccsA-ndhD.

3.3 Repetitive sequences The distributions of tandem repeats, dispersed and palindromic repeats, and SSRs in Rhodiola species were analyzed. A total of 624 SSRs, including mono-, di-, tri-, tetra-, penta-, and hexanucleotide, were found; the numbers of the first five classes were 478, 86, 3, 51, and 5, respectively. Rhodiola rosea contained one hexanucleotide repeat. Rhodiola rhodantha contained the most SSRs (58), while R. integrifolia and R. dumulosa (48) had the least (Fig. 4A). Trinucleotides were present only in R. yunnanensis (1) and R. crenulata (2) (Fig. 4A). Mononucleotides (77%) could be divided into two types: A/T (477) and G/C (one in R.smithii)

10

(Fig. 4A; Fig. S4A, B). The numbers and compositions of the other three types of repeats for each species are shown in Fig. 4B and Tables S4-S6. Sixty-two percent of the repeats were 25-40 bp long, and 35 percent were 41-65 bp long (Fig. S4C, D). Repetitive sequences were distributed mostly in the LSC, with only a few in the SSC and IRs (Fig. 4C). In addition, repetitive sequences were distributed mainly in intergenic spacer regions (IGS) (Fig. 4D), though a few were also found in intron regions and coding regions (such as ycf2, ndhF, psaB) (Tables S4-S6).

3.4 Phylogenetic analysis Based on the complete plastomes of the 21 species included, the MP, ML and BI trees were congruent in topology, thus only the ML tree is shown and clade support values for the MP and BI methods are indicated on the corresponding clades. Except for the clade comprising R. humilis and R. ovatisepala, all clades were 100% supported (Fig. 5). All Rhodiola species formed a monophyletic clade, sister to Phedimus. The Rhodiola clade was further divided into two subclades; one of these was comprised of four dioecious species (R. rosea, R. yunnanensis, R. crenulate and R. fastigiata), and the other clade included all the hermaphrodite species (Fig. 5).

3.5 Positively selected genes At the generic level, i.e., using all Rhodiola species as the foreground, the branch model in codeml detected only one gene (psaA) with a significantly accelerated rate of evolution (p = 0.04). The branch-site model detected three genes containing sites that had been subject to positive selection: ndhA, ndhH and rpl16. The p-values and selected sites are shown in Table 2. At the infra-generic level, i.e., taking the dioecious clade as the foreground clade, no signals of positive selection were detected in either model.

3.6 Environmental analyses In the PCA for Rhodiola and Phedimus, PC1 and PC2 explained 68.17% of the total variation (Table S8). PC1 (45.33% of variation) was mainly contributed by bio1, bio6, bio9, bio10, bio11, bio12, bio13, bio16, and bio17, while bio3, bio4 and bio15 had higher contributions to PC2 (22.84% of variation) (Table S8). All these bioclimatic variables but bio13 were significantly different between Rhodiola and Phedimus (Table S8). Most Rhodiola locations were separated

11

from those of Phedimus in the biplot (Fig. 6A). Rhodiola rosea and R. integrifolia had overlapped niche with Phedimus, which is not unexpected as they have overlapped distributions with Phedimus in Siberia. In addition, the niche of R. rosea and R. integrifolia was different from other Rhodiola species on the QTP (Fig. 6A). In the PCA for the two clades within Rhodiola, the first two principal components accounted for 45.9% and 25.5% of the total variance (cumulative proportion: 71.4%; Table S9). The locations of the two clades were largely overlapped in the biplot (Fig. 6B). The altitudes of Rhodiola species were significantly higher than the outgroups (twelch = 36.49, p = 3.1 × 10 -199) (Fig. 7A), while the altitudes of the two clades within Rhodiola were not

significantly different (twelch = 0.35, p = 0.73) (Fig. 7B).

4. Discussion 4.1 Phylogenetic relationships The topologies of the MP, ML and BI trees were consistent, rendering all Rhodiola species as a monophyletic clade, which was sister to Phedimus. The monophyly of Rhodiola is therefore confirmed by our plastomic data, a finding consistent with previous studies (Mayuzumi and Ohba, 2004; Zhang et al., 2014a, b). At the intrageneric level, Rhodiola was divided into two clades, of which one was comprised of four dioecious species (R. rosea, R. yunnanensis, R. crenulata and R. fastigiate, all belonging to the subgenus Rhodiola of Ohba (1978)), and the other clade included all hermaphrodite species, except for R. integrifolia, which has been tentatively classed as a hybrid between R. rosea and R. rhodantha (Hermsmeier et al., 2012). This dichotomy was not revealed by previous studies based on a limited number of plastid markers and the ITS data (Mayuzumi and Ohba, 2004; Zhang et al., 2014a, b), demonstrating the power of using plastomes to investigate rapid radiations. Plastomes have been also used in other studies to unravel phylogenetic relationships among closely-related species (Fu et al., 2017; Huang et al., 2014; Welch et al., 2016). However, our data included only 12 Rhodiola species, and more sampling is needed in order to fully utilize the power of plastome research in resolving the enigma of Rhodiola phylogeny.

4.2 Conservation of Rhodiola plastomes

12

The genome size, gene order and number were highly conserved both within Rhodiola and when this genus was compared to outgroups (Table 1). Each plastome encoded 134 genes, including 85 PCGs, 37 tRNAs, 8 rRNAs and 4 potential pseudogenes (Table 1). The overall GC contents of these plastomes were also similar to those of other angiosperms (Palmer, 1991). IR regions had a higher GC content than LSC and SSC regions, as a result of the high GC content of rRNA genes located in these regions (Kim and Lee, 2004; Ramen et al., 2017). We did not detect any inversions either within Rhodiola or in comparison to the outgroups (Fig. S1, S2). These findings are consistent with a previous survey of four species of Saxifragales (including Sedum sarmentosum), which showed that the organization of the plastomes of plants in this order is well conserved (Dong et al., 2013). Despite this conservation, there were some peculiarities that merit discussion.

Previous studies have revealed frequent IR/SSC and IR/LSC boundary shifts (e.g. Ye et al., 2018; Zeng et al., 2017; Zhang et al., 2017), which are common in angiosperms (Kim and Lee, 2004; Zhu et al., 2016). The IR region is very important in stabilizing the structure of the chloroplast genome (Maréchal and Brisson, 2010), and it is mainly responsible for variations in the length of the plastome. For Rhodiola, we detected SSC-IRb border shifts of 2-5 bp, while the location of the LSC/IR boundary was identical across fourteen species (Fig. 2). As expected, these shifts are relatively trivial, demonstrating the conserved nature of Rhodiola plastomes.

Four potential pseudogenes (ycf1, rps19, two copies of ycf15) were annotated in the plastomes of Rhodiola and outgroup species; all four were located in the IR region (Table 1; Fig. 1). For both ycf1 and rps19, there were two copies (of different length), each in one of the IR regions (Fig. 1). The pseudogenization of both shorter copies was clearly caused by incomplete duplication, which has also been demonstrated in other studies (e.g. Dong et al., 2013; Ye et al., 2018).

Compared to N. tabacum, the longer copy of the ycf1 gene has two deletions (186 bp and 108 bp) (Fig. S5A). As the observed indels would not affect the translation of the gene, whether the longer copy of ycf1 is pseudogenes needs further investigation. No indel or premature stop codon were detected in the larger copy of the rps19 gene (Fig. S5C). The two ycf15 gene copies were revealed to be pseudogenes in other studies (e.g. Dong et al., 2013; Du et al., 2015). Compared to

13

N. tabacum, R. rhodantha has two deletions in this gene (51 bp and 131 bp), and the remaining Rhodiola species have one deletion of 51 bp. Moreover, the gene contains multiple premature stop codons (Fig. S5B).

Slip chain mismatch and intramolecular recombination are the mechanisms that probably lead to most SSRs (Ochoterena, 2009). In plant molecular studies, SSRs are a major tool for elucidating genomic polymorphism across species and for conducting population genetics on the basis of repeat length polymorphism (Zhou et al.,2014; Qi et al., 2016; Yu et al., 2017). Dispersed, palindromic and tandem repeats were identified in the plastomes of Rhodiola, most of them being distributed in intergenic and intron regions (Fig. 4D), which were highly variable regions in the plastome (Table S3; Fig. 3, S3). We also identified 624 SSRs, which were also mostly distributed in noncoding regions (Fig. 4A, D). These SSRs may be useful for subsequent population genetic and evolutionary studies.

Like those in most angiosperms, IRs and PCGs in Rhodiola plastomes were more conserved than SCs and CNS, as shown by both the whole-genome alignment and the polymorphism analysis (Table S3; Fig. 3, S3). Lower sequence diversity in the IRs than in other regions of the plastomes has been reported in other studies (Alwadani et al., 2019; Fu et al., 2018; Su et al., 2018). Copy correction between IRs and the purging of deleterious mutations by gene conversion may be the reason for the slower evolutionary rate of IRs (Khakhlova and Bock, 2006). Higher divergence in noncoding regions of plastomes has been clearly demonstrated by both large-scale studies (Shaw et al., 2014) and specific case studies (e.g. Alwadani et al., 2019; Ye et al., 2018). The five most divergent non-coding sequences we detected, trnH-psbA, rpoC1-rpoB, trnW-trnP, rpl33-rps18 and rps3-rpl22, had no overlap with those found by Shaw et al. (2014). When we focused on regions with an appropriate length for further systematics studies (here we define this length as equal to or greater than that of psbA-trnH), psbA-trnH, trnG-trnR, rpl32-trnL, trnD-trnY and ccsA-ndhD were found to be the best performers. Interestingly, rpl32-trnL was also suggested by Shaw et al. (2014). These newly identified regions could provide more phylogenetic information than markers previously used for phylogeographic and population genetic studies at the intraspecific level.

14

4.3 Adaptive evolution of plastomes during rapid radiation Rhodiola species are distributed mainly in alpine habitats, which are characterized by low temperature and low cumulative temperature, large diurnal temperature variation, strong ultraviolet radiation, and low air density (Körner, 2003). We confirmed that the distributional altitudes of Rhodiola were significantly higher than the outgroups (Fig. 7A), and the PCA also demonstrated significant niche divergence between Rhodiola and Phedimus (Table S8; Fig. 6A) We anticipated that some genes in the plastomes of Rhodiola might have undergone adaptive evolution to adapt to alpine environment, although overall genome size, structure and gene number have changed little. We examined evolution at two levels, one for Rhodiola as a whole and the other focusing on the dioecious clade (Fig. 5). Using Rhodiola as the foreground, the branch model detected the rate of evolution of psaA as being significantly faster than in the background species. In addition, the branch-site model detected three possible positively selected genes (PSGs): ndhA, ndhH and rpl16. To our knowledge, these genes are the first to be detected that may be involved in plant adaptation to high altitude. However, using the dioecious clade as the foreground, both the branch and the branch-site model failed to detect any PSG. This is congruent with the fact that the dioecious clade’s niche is not significantly diverged from that of the hermaphrodite clade (Fig. 6B). Also, the two clades’ altitudes are not significantly different from each other (Fig. 7B). These findings indicate that plastomic adaptation of Rhodiola occurred when the common ancestor of the genus diverged from Phedimus.

The ndh genes in plastomes encode a protein complex catalyzing electron transfer from NADH to photosystem I (Martín and Sabater, 2010), and they have been reported as having been lost in some parasite species (Depamphilis et al., 1990; Wicke and Naumann, 2018). The ndhA gene encodes the NDH-A protein, which has been shown to bind to the nuclear-encoded proteins NDH48 and NDH45 (Sirpio et al., 2009), and to be involved in the protection of barley leaves against mild drought stress and in protecting chloroplasts against photooxidative stress (Martin et al., 1996). Moreover, NDH-A plays a role in the response of cyanobacterial to high-intensity light (Hihara et al., 2001). The intermediate region of the second exon of the ndhA gene was found to be the region containing antigen determinants in NDH-A (Martin et al., 1996). The branch-site model detected five positively selected sites (37S, 275 V, 324H, 340L, 357C) of ndhA in Rhodiola, which were probably involved in the protection of Rhodiola plants from

15

strong light, drought and other stressful environments in higher altitudes. It is noteworthy that, the positively selected site 275V is located in the middle of the second exon of the gene, which is probably in the NDH-A epitope, vital to the function of the protein encoded.

The ndhH gene encodes the protein NDH-H which participates in the adaptation to low CO2 concentration in cyanobacteria (Deng et al., 2003). We found a positively selected site (48F) in ndhH using the branch-site model. Since the CO2 concentration in high-altitude environments is lower than that at the sea level, the product of ndhH may participate in adaptation of Rhodiola to low-CO2 environments. The psaA gene encodes photosystem I P700 chlorophyll A apolipoprotein A1, whose synthesis and accumulation in rice are light-dependent (Kapoor et al., 1994). The complex light conditions prevalent in alpine areas may have resulted in the gene evolving faster in Rhodiola than in the low-altitude species.

Another potential positively selected gene is rpl16, which encodes the ribosomal protein L16 which binds directly to 23S rRNA (Brosius and Chen, 1976). As the chloroplast ribosome and E. coli share many structural and functional similarities (Hoober, 1984), it is likely that the protein encoded by the chloroplast gene rpl16 performs the same function, but it is unclear why this highly conserved gene is positively selected in Rhodiola.

Acknowledgements We are grateful to Dr. Zeng-Qiang Qian for his assistance in genome assembly. This study was supported by the National Natural Science Foundation of China (no. 31500177 and 31870194), Shaanxi Science and Technology Program (no. 2019JM-188), and the Fundamental Research Funds for the Central Universities (no. GK201903058 to J.Q. Zhang).

References

Alwadani, K.G., Janes, J.K., Andrew, R.L., 2019. Chloroplast genome analysis of box-ironbark Eucalyptus. Mol. Phylogenet. Evol. 136, 76–86. Bock, D.G., Andrew, R.L., Rieseberg, L.H., 2014. On the adaptive value of cytoplasmic genomes in plants. Mol. Ecol. 23, 4899–4911.

16

Brosius, J., Chen, R., 1976. The primary structure of protein L16 located at the peptidyltransferase center of E. coliribosomes. FEBS Lett. 68, 105–109. Daniell, H., Lin, C.S., Yu, M., Chang, W.J., 2016. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 17, 134. Darling, A.E., Mau, B., Perna, N.T., 2010. Progressive Mauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5, e11147. Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772. Deng, Y., Ye, I., Mi, H., 2003. Effects of low CO2 on NAD(P)H dehydrogenase, a mediator of cyclic electron transport around photosystem I in the cyanobacterium Synechocystis PCC6803. Plant Cell Physiol. 44, 534–540. Depamphilis, C.W., Palmer, J.D., 1990. Loss of photosynthetic and chlororespiratory genes from the plastid genome of a parasitic flowering plant. Nature 348, 337. Dong, W.P., Xu, C., Cheng, T., Zhou, S.L., 2013. Complete chloroplast genome of Sedum sarmentosum and chloroplast genome evolution in Saxifragales. PLoS One 8, e77965. Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15. Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A., 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973. Du, F.K., Lang, T.G., Lu, S.H., Wang, Y.Y., Li, J.Q., Yin, K.Q., 2015. An improved method for chloroplast genome sequencing in non-model forest tree species. Tree Genet. Genomes 11, 114. Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M., Dubchak, I., 2004. Vista: computational tools for comparative genomics. Nucl. Acids Res. 32, 273–279. Fu, C.N., Li, H.T., Milne, R.I., Zhang, T., Ma, P.F., Yang, J., Li, D.Z., Gao, L.M., 2017. Comparative analyses of plastid genomes from fourteen Cornales species: inferences for phylogenetic relationships and genome evolution. BMC Genom. 18, 956. Fu, K.T., Ohba, H., 2001. Crassulaceae. In: Wu, C.Y., Raven, P.H. (Eds.), Flora of China, vol. 8. Science Press, Beijing, pp. 202–268.

17

Fu, P.C., Sun, S.S., Zhou, X.J., Cheng, Y.W., Zhang, F.Q., Chen, S.L., Gao, Q.B., 2018. The complete plastome sequences of seven species in Gentiana sect. Kudoa (Gentianaceae): Insights into plastid gene loss and molecular evolution. Front. Plant Sci. 9, 493. Fu, S.H., Fu, K.T., 1984. Crassulaceae. In: Chen, W.Q., Ruan, Y.Z. (Eds.), Flora Reipublicae Popularis Sinicae, vol. 34. Science Press, Beijing, pp. 33–216. Hahn, C., Bachmann, L., Chevreux, B., 2013. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach. Nucl. Acids Res. 41, e129. Hermsmeier, U., Grann, J., Plescher, A., 2012. Rhodiola integrifolia: hybrid origin and Asian relatives. Botany 90, 1186–1190. Hihara, Y., Kamei, A., Kanehisa, M., Kaplan, A., Ikeuchi, M., 2001. DNA microarray analysis of cyanobacterial gene expression during acclimation to high light. Plant Cell 13, 793–806. Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. Jarvis, A., 2005. Very high-resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25, 1965–1978. Hijmans, R.J., Guarino, L., Mathur, P., 2012. DIVA-GIS, version 7.5. A geographic information system for the analysis of species distribution data. Hijmans, R.J., Phillips, S., Leathwick, J., Elith, J., 2014. dismo: Species Distribution Modeling. R package version 1.1-4. https://CRAN.R-project.org/package=dismo. Hoober, J.K., 1984. Chloroplasts. Plenum Press, New York. Huang, H., Shi, C., Liu, Y., Mao, S.Y., Gao, L.Z., 2014. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol. Biol. 14, 151. Hu, S.L., Sablok, G., Wang, B., Qu, D., Barbaro, E., Viola, R., Li, M.H., Varotto, C., 2015. Plastome organization and evolution of chloroplast genes in Cardamine species adapted to contrasting habitats. BMC Genomics 16, 306. Jiang, P., Shi, F.X., Li, M.R., Liu, B., Wen, J., Xiao, H.X., Li, L.F., 2018. Positive selection driving cytoplasmic genome evolution of the medicinally important ginseng plant genus Panax. Front. Plant Sci. 9, 359. Kapoor, S., Maheshwari, S.C., Tyagi, A.K., 1994. Developmental and light-dependent cues interact to establish steady-state levels of transcripts for photosynthesis-related genes (psbA, pbsD, psaA and rbcL) in rice (Oryza sativa L.). Curr. Genet. 25, 362–366.

18

Kapralov, M.V., Filatov, D.A., 2007. Widespread positive selection in the photosynthetic Rubisco enzyme. BMC Evol. Biol. 7, 73. Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. Khakhlova, O., Bock, R., 2006. Elimination of deleterious mutations in plastid genomes by gene conversion. Plant J. 46, 85–94. Kim, K.J., Lee, H.L., 2004. Complete chloroplast genome sequences from Korean Ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res. 11, 247–261. Kleine, T., Maier, U.G., Leister, D., 2009. DNA transfer from organelles to the nucleus: the idiosyncratic genetics of endosymbiosis. Annu. Rev. Plant Biol. 60, 115–138. Körner, C., 2003. Alpine Plant Life: Functional Plant Ecology of High Mountain Ecosystems. Springer, Berlin. Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J., Giegerich, R., 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucl. Acids Res. 29, 4633–4642. Librado, P., Rozas, J., 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452. Li, H.Q., Liu, X.L., Wang, J.H., Fu, Y.Y., Ding, S.M., Xie, W.Y., Zhang, J., 2018. Influence of climate change on the suitable ranges for planting pickled mustard tuber in Chongqing. J. Appl. Ecol. 29, 2651-2657. Lohse, M., Drechsel, O., Kahlau, S., Bock, R., 2013. OrganellarGenomeDRAW-a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucl. Acids Res. 41, 575–581. Maréchal, A., Brisson, N., 2010. Recombination and the maintenance of plant organelle genome stability. New Phytol. 186, 299–317. Martin, M., Casano, L.M., Sabater, B., 1996. Identification of the product of ndhA gene as a thylakoid protein synthesized in response to photooxidative treatment. Plant Cell Physiol. 37, 293–298. Martín, M., Sabater, B., 2010. Plastid ndh genes in plant evolution. Plant Physiol. Biochem. 48, 636–645.

19

Mayuzumi, S., Ohba, H., 2004. The phylogenetic position of eastern Asia Sedoideae (Crassulaceae) inferred from chloroplast and nuclear DNA sequences. Syst. Bot. 29, 587– 598. Mort, M.E., Soltis, D.E., Soltis, P.S., Francisco-Ortega, J., Santos-Guerra, A., 2001. Phylogenetic relationships and evolution of Crassulaceae inferred from matK sequence data. Am. J. Bot. 88, 76–91. Ochoterena, H., 2009. Homology in coding and non-coding DNA sequences: a parsimony perspective. Plant Syst. Evol. 282, 151–168. Ohba, H., 1978. Generic and infrageneric classification of the Old World Sedoideae (Crassulaceae). J. Fac. Sci. Univ. Tokyo 3, 138–139. Ohba, H., 2005. Rhodiola. In: Eggli, U. (Ed.), Illustrated Handbook of Succulent Plants: Crassulaceae, vol. 14. Springer, Berlin-Heidelberg-New York, pp. 210–227. Palmer, J.D., 1991. Plastid chromosomes: structure and evolution. Mol. Biol. Plastids 7, 5–53. Palmer, J.D., Stein, D.B., 1986. Conservation of chloroplast genome structure among vascular plants. Curr. Genet. 10, 823–833. Qi, W.C., Lin, F., Liu, Y.H., Huang, B.Q., Cheng, J.H., Zhang, W., Zhao, H., 2016. Highthroughput development of simple sequence repeat markers for genetic diversity research in Crambe abyssinica. BMC Plant Biol. 16, 139. Raman, G., Park, V., Kwak, M., Lee, B., Park, S., 2017. Characterization of the complete chloroplast genome of Arabis stellari and comparisons with related species. PLoS One 12, e0183197. Rambaut, A., Drummond, A.J., Xie, D., Baele, G., Suchard, M.A., 2018. Posterior summarisation in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904. R Core Team., 2019. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574. Schattner, P., Brooks, A.N., Lowe, T.M., 2005. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucl. Acids. Res. 33, 686–689.

20

Shaw, J., Lickey, E.B., Beck, J.T., Farmer, S.B., Liu, W., Miller, J., Siripun, K.C., Winder, C.T., Schilling, E.E., Small, R.L., 2005. The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. Am. J. Bot. 92, 142–166. Shaw, J., Shafer, H.L., Leonard, O.R., Kovach, M.J., Schorr, M., Morris, A.B., 2014. Chloroplast DNA sequence utility for the lowest phylogenetic and phylogeographic inferences in angiosperms: the tortoise and the hare IV. Am. J. Bot. 101, 1987–2004. Sirpio, S., Allahverdiyeva, Y., Holmstrom, M., Khrouchtchova, A., Haldrup, A., Battchikova, N., Aro, E.M., 2009. Novel nuclear-encoded subunits of the chloroplast NAD(P)H dehydrogenase complex. J. Biol. Chem. 284, 905–912. Sloan, D.B., Triant, D.A., Forrester, N.J., Bergner, L.M., Wu, M., Taylor, D.R., 2014. A recurring syndrome of accelerated plastid genome evolution in the angiosperm tribe Sileneae (Caryophyllaceae). Mol. Phylogenet. Evol. 72, 82–89. Stamatakis, A., 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690. Su, Y., Huang, L., Wang, Z., Wang, T., 2018. Comparative chloroplast genomics between the invasive weed Mikania micrantha and its indigenous congener Mikania cordata: structure variation, identification of highly divergent regions, divergence time estimation, and phylogenetic analysis. Mol. Phylogenet. Evol. 126, 181–195. Swofford, D., 2003. PAUP: phylogenetic analysis using parsimony (and other methods). Sunderland: Sinauer Associates. Thiel, T., Michalek, W., Varshney, R., Graner, A., 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422. Timmis, J.N., Ayliffe, M.A., Huang, C.Y., Martin, W., 2004. Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nat. Rev. Genet. 5, 123. Welch, A.J., Collins, K., Ratan, A., Drautz-Moses, D.I., Schuster, S.C., Lindqvist, C., 2016. The quest to resolve recent radiations: plastid phylogenomics of extinct and endangered Hawaiian endemic mints (Lamiaceae). Mol. Phylogenet. Evol. 99, 16–33. Wicke, S., Naumann, J., 2018. Molecular evolution of plastid genomes in parasitic flowering plants. Adv. Bot. Res. 85, 315–347. Wickham, H., 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.

21

Williams, A.M., Friso, G., Van Wijk, K.J., Sloan, D.B., 2019. Extreme variation in rates of evolution in the plastid Clp protease complex. Plant J. 98, 243–259. Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591. Ye, W.Q., Yap, Z.Y., Li, P., Comes, H.P., Qiu, Y.X., 2018. Plastome organization, genomebased phylogeny and evolution of plastid genes in Podophylloideae (Berberidaceae). Mol. Phylogenet. Evol. 127, 978–987. Yu, J.Y., Dossa, K., Wang, L.H., Zhang, Y.X., Wei, X., Liao, B., Zhang, X.R., 2017. PMDBase: a database for studying microsatellite DNA and marker development in plants. Nucl. Acids Res. 45, D1046–D1053. Zeng S.Y., Zhou T., Han K., Yang, Y.C., Zhao, J.H., Liu, Z.L., 2017. The complete chloroplast genome sequences of six Rehmannia species. Genes 8, 103. Zhang, J.Q., Meng, S.Y., Allen, G.A., Wen, J., Rao, G.Y., 2014a. Rapid radiation and dispersal out of the Qinghai-Tibetan Plateau of an alpine plant lineage Rhodiola (Crassulaceae). Mol. Phylogenet. Evol. 77, 147–158. Zhang, J.Q., Meng, S.Y., Wen, J., Rao, G.Y., 2014b. Phylogenetic relationships and character evolution of Rhodiola (Crassulaceae) based on nuclear ribosomal ITS and plastid trnL-F and psbA-trnH sequences. Syst. Bot. 39, 441–451. Zhang, J., Tian, Y., Yan, L., Zhang, G.H., Wang, X., Zeng, Y., Zhang, J.J., Ma, X., Tan, Y.T., Long, N., Wang, Y.Z., Ma, Y.J., He, Y.Q., Xue, Y., Hao, S.M., Yang, S.C., Wang, W., Zhang, L.S., Dong, Y., Chen, W., Sheng, J., 2016. Genome of plant Maca (Lepidium meyenii) illuminates genomic basis for high-altitude adaptation in the central Andes. Mol. Plant 9, 1066–1077. Zhang, T.C, Qiao, Q., Novikova, P.Y., Wang, Q., Yue, J.P., Guan, Y.L., Ming, S.P., Liu, T.M., De, J., Liu, Y.X., Al-Shehbaz, I.A., Sun, H., Montagu, M.V., Huang, J.L., Peer, Y.V.D., Qiong, L., 2019. Genome of Crucihimalaya himalaica, a close relative of Arabidopsis, shows ecological adaptation to high altitude. PNAS 116, 7137–7146. Zhang, X., Zhou, T., Kanwal, N., Zhao, Y.M., Bai, G.Q., Zhao, G.F., 2017. Completion of eight Gynostemma BL. (Cucurbitaceae) chloroplast genomes: characterization, comparative analysis, and phylogenetic relationships. Front. Plant Sci. 8, 1583.

22

Zhou, S.L., Dong, W.P., Chen, X.Q., Zhang, X.C., Wen, J., Schneider, H., 2014. How many species of bracken (Pteridium) are there? Assessing the Chinese brackens using molecular evidence. Taxon 63, 509–521. Zhu, A., Guo, W., Gupta, S., Fan, W., Mower, J.P., 2016. Evolutionary dynamics of the plastid inverted repeat: the effects of expansion, contraction, and loss on substitution rates. New Phytol. 209, 1747–1756.

23

Table 1. Characteristics of plastomes of the 12 Rhodiola species and nine outgroups Species

GenBank Accession No.

Size (bp)

GC content (%)

Overall

LSC

SSC

IRs

Overall

LSC

SSC

IRs

Gene no.

PCG

tRNA

rRNA

Pseudogene

genes with introns

R. crenulata

MN794322

151,732

83,011

17,057

25,832

37.7

35.7

31.8

42.9

134

85

37(7)

8(4)

4

18

R. dumulosa

MN794323

151,582

82,873

17,009

25,850

37.8

35.8

31.8

42.9

134

85

37(7)

8(4)

4

18

R. fastigiata

MN794324

151,924

83,110

17,054

25,880

37.7

35.7

31.8

42.9

134

85

37(7)

8(4)

4

18

R. hobsonii

MN794325

150,641

81,998

17,077

25,783

37.8

35.8

31.7

43

134

85

37(7)

8(4)

4

18

R. humilis

MN794326

150,779

82,415

16,784

25,790

37.8

35.8

31.8

42.9

134

85

37(7)

8(4)

4

18

R. integrifolia

MN794327

151,452

82,915

17,055

25,741

37.8

35.7

31.8

43

134

85

37(7)

8(4)

4

18

R. ovatisepala

MN794328

151,073

82,348

17,093

25,816

37.7

35.7

31.6

42.9

134

85

37(7)

8(4)

4

18

R. prainii

MN794329

151,594

82,953

17,053

25,794

37.7

35.7

31.7

42.9

134

85

37(7)

8(4)

4

18

R. rhodantha

MN794330

151,061

82,998

17,039

25,512

37.7

35.7

31.8

43

134

85

37(7)

8(4)

4

18

R. rosea

MH410216

151,348

82,716

17,052

25,790

37.7

35.7

31.7

42.9

134

85

37(7)

8(4)

4

18

R. smithii

MN794331

150,286

81,651

17,019

25,808

37.7

35.7

31.6

42.9

134

85

37(7)

8(4)

4

18

R. yunnanensis

MN794332

151,257

82,561

17,008

25,844

37.8

35.8

31.9

42.9

134

85

37(7)

8(4)

4

18

Hylotelephium ewersii

MN794014

151,699

83,253

16,838

25,804

37.7

35.7

31.6

43

134

85

37(7)

8(4)

4

18

Kalanchoe tomentosa

MN794319

150,757

82,846

17,051

25,430

37.6

35.6

31.3

43

134

85

37(7)

8(4)

4

18

Orostachys japonica

MN794320

151,419

83,016

16,849

25,777

37.8

35.8

31.6

43

134

85

37(7)

8(4)

4

18

Phedimus aizoon

MN794321

151,393

82,868

17,043

25,741

37.7

35.7

31.8

43

134

85

37(7)

8(4)

4

18

Phedimus kamtschaticus

MG680403

151,634

83,000

16,688

25,973

37.8

35.8

31.9

42.8

134

85

37(7)

8(4)

4

18

Rosularia alpestris

MN794333

151,288

82,931

16,785

25,786

37.8

35.7

31.8

43

134

85

37(7)

8(4)

4

18

Sedum sarmentosum

NC_023085

150,126

81,890

16,670

25,783

37.7

35.7

32.1

42.9

134

85

37(7)

8(4)

4

18

Sinocrassula indica

MN7943234

151,755

83,159

16,888

25,854

37.7

35.6

31.5

42.9

134

85

37(7)

8(4)

4

18

Umbilicus rupestris

MN794335

150,995

82,681

16,926

25,694

37.6

35.6

31.1

42.9

134

85

37(7)

8(4)

4

18

24

LSC: large-single-copy; SSC: small-single-copy; IR: inverted repeat; PCG: protein-coding gene. The number of duplicated genes is shown in brackets. The number of genes with introns presented in the table does not take into account replicated genes.

25

Table 2. Positively selected genes and sites detected in the plastomes of Rhodiola species Gene name rpl16

ndhA ndhH

LRT p-value

Positive sites 22 G 0.988*, 26 C 0.970*,49 F 0.987*,87 R 0.989*,104 V 0.974*,118 L 0.989* 37 S 0.984*,275 V 0.980*,324 H 0.977*,340 L 0.985*,357 C 0.985* 48 F 0.972*

0.03

0.03 0.0014

26

Figure legends Fig. 1. Gene map of the Rhodiola smithii plastome. Genes outside the circle are transcribed in a counter-clockwise direction, whereas those inside the circle are transcribed in a clockwise direction. In the inner circle, the dark gray area indicates GC content and the thick line shows the extent of different regions. Different colors for genes indicate different functional groups. LSC: large-single-copy; SSC: small-single-copy; IR: inverted repeat. Fig. 2. Gene locations at region boundaries in plastomes of the 12 Rhodiola species and two outgroups. Fig. 3. Visualization of the alignment of Rhodiola plastomes by mVISTA. Rhodiola rosea is the reference. The gray arrows above show genes. Different colors represent different regions (coding and non-coding). Position in the genome is shown on the horizontal axis at the bottom of each block. Alignment similarity percentages are shown on the right side of the graph (the vertical axis). Two black frames indicate the two IR regions. Fig. 4. Type, distribution and numbers of three types of repeat sequences and SSRs in Rhodiola. (A) Numbers of different types of SSRs in each Rhodiola species; (B) Numbers of three different types of repeats in each Rhodiola species; (C, D) Distributions of three types of repeats and SSRs in different regions of the plastome in each Rhodiola species. LSC: largesingle-copy; SSC: small-single-copy; IR: inverted repeat. CDS, coding sequences; IGS, intergenic spacers. Fig. 5. ML tree of 12 Rhodiola and nine outgroup species based on the complete plastome. * means nodes with full support according to all three methods. Supporting values (ML/MP/PP) for the node without full support are indicated. ML, maximum likelihood; MP, maximum parsimony; PP, Bayesian inference posterior probability. Fig. 6. PCA of 19 environmental variables. (A) Rhodiola and Phedimus species; (B) The dioecious and the hermaphrodite clades of Rhodiola. Fig. 7. Violin plots of distributional altitudes among different groups. (A) Rhodiola and the outgroups; (B) The dioecious and the hermaphrodite clades of Rhodiola.

27

Supporting Information Table S1. Sampling, sequencing and assembly information for 12 Rhodiola species and 7 outgroups. Table S2. Genes in the plastomes of 12 Rhodiola species and 7 outgroups. Table S3. Nucleotide diversity (Pi) across 12 Rhodiola plastomes. Table S4. Dispersed repeats and palindromic repeats in the plastomes of 12 Rhodiola species. Table S5. SSRs in the plastomes of 12 Rhodiola species. Table S6. Tandem repeats in the plastomes of 12 Rhodiola species. Table S7. Information of the 1256 localities used in the environmental analyses of Rhodiola and outgroups. Table S8. Principal component analysis for the 19 bioclimatic variables of Rhodiola and Phedimus. Table S9. Principal component analysis for the 19 bioclimatic variables of the dioecious and the hermaphradite clades within Rhodiola. Fig. S1. Structural variation among plastomes of the 12 Rhodiola species revealed by Mauve. Fig. S2. Structural variation between plastomes of Rhodiola and outgroups revealed by Mauve. The plastome of Rhodiola rosea was selected as the representative of the genus Rhodiola. Fig. S3. Comparison of nucleotide polymorphisms among 12 Rhodiola plastomes. (A) Coding regions; (B) Intergenic regions and introns. Fig. S4. Number of six types of SSRs and length distribution of three types of repeat sequences. (A, B) Numbers and percentages of six classes of SSRs; (C, D) Lengths and distributions of three types of repeat sequences. In B and D, the number before comma indicates the number, and that after the comma the percentage. Fig. S5. Nucleotide alignment of ycf1, ycf15 and rps19 genes from Rhodiola and Nicotiana tabacum. (A, C) 12 Rhodiola species. (B) 11 Rhodiola species and R. rhodantha are shown separately.

28

* * * * *

* * *

* * * *

68/68/0.9

* * *

* * *

R. rosea R. yunnanensis Dioecious R. crenulata clade R. fastigiata R. smithii R. dumulosa R. rhodantha R. integrifolia Hermaphrodite R. hobsonii clade R. prainii R. humilis R. ovatisepala u Phedimus kamtschaticus Phedimus aizoon Orostachys japonica Hylotelephium ewersii Sinocrassula indica Umbilicus rupestris Rosularia alpestris Sedum sarmentosum Kalanchoe tomentosa

Other Rhodiola sp.

PC2 (22.84%)

2

B

6

PC2 (25.53%)

A

4

R. rosea and R. integrifolia Outgroups

1 0 −1

Dioecious clade Hermaphradite clade

2

0 −2

−3

−2

−1

0

PC1 (45.33%)

1

2

−3

−2

−1

0

1

PC1 (45.95%)

2

Altitude

A

7500

5000

2500

0 Outgroups

Rhodiola sp.

Groups

Altitude

B

7500

5000

2500

0

Dioecious clade

Hermaphradite clade

Groups

Highlights 1. Plastomes of 12 representative species of Rhodiola are conserved with respect to size, structure, gene content and order, number of repeat sequences, with few variations. 2. Rhodiola is divided into two clades: dioecious and hermaphrodite. 3. Positive selection analyses suggest that four chloroplast genes of Rhodiola may be involved in the adaptation to alpine environments.

29

Graphical abstract

30

Author statement Dan-Ni Zhao: Formal analysis, Investigation, Writing-Original Draft, Visualization. Yi Ren: Supervision, Writing-Reviewing and Editing. Jian-Qiang Zhang: Conceptualization, Methodology, Writing-Original draft preparation, Writing-Reviewing and Editing, Visualization, Funding acquisition.

31