Contrasting evolutionary patterns of multiple loci uncover new aspects in the genome origin and evolutionary history of Leymus (Triticeae; Poaceae)


Accepted Manuscript

Li-Na Sha, Xing Fan, Jun Li, Jin-Qiu Liao, Jian Zeng, Yi Wang, Hou-Yang Kang, Hai-Qin Zhang, You-Liang Zheng, Yong-Hong Zhou

PII: S1055-7903(17)30368-8
DOI: http://dx.doi.org/10.1016/j.ympev.2017.05.015
Reference: YMPEV 5828

To appear in: Molecular Phylogenetics and Evolution

Received Date: 21 October 2016
Revised Date: 14 May 2017
Accepted Date: 16 May 2017

Please cite this article as: Sha, L-N., Fan, X., Li, J., Liao, J-Q., Zeng, J., Wang, Y., Kang, H-Y., Zhang, H-Q., Zheng, Y-L., Zhou, Y-H., Contrasting evolutionary patterns of multiple loci uncover new aspects in the genome origin and evolutionary history of Leymus (Triticeae; Poaceae), Molecular Phylogenetics and Evolution (2017), doi: http://dx.doi.org/10.1016/j.ympev.2017.05.015

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contrasting evolutionary patterns of multiple loci uncover new aspects in the genome origin and evolutionary history of Leymus (Triticeae; Poaceae)

Li-Na Sha a, b, 1, Xing Fan a, 1, Jun Li c, Jin-Qiu Liao d, Jian Zeng e, Yi Wang a, Hou-Yang Kang a, Hai-Qin Zhang a, You-Liang Zheng a, b, Yong-Hong Zhou a, b *

a Triticeae Research Institute, Sichuan Agricultural University, Wenjiang 611130, Sichuan, China
b Key Laboratory of Crop Genetic Resources and Improvement, Ministry of Education, Sichuan Agricultural University, Yaan 625014, Sichuan, China
c Crop Research Institute, Sichuan Academy of Agricultural Science, Chengdu 610066, Sichuan, China
d College of Life Science, Sichuan Agricultural University, Yaan 625014, Sichuan, China
e College of Resources, Sichuan Agricultural University, Wenjiang 611130, Sichuan, China

* Author for correspondence. E-mail: [email protected]. Tel.: 86-28-86291005. Fax: 86-28-82650350.
1 These authors contributed equally.


Abstract

Leymus Hochst. (Triticeae: Poaceae), a group of allopolyploid species with the NsXm genomes, is a perennial genus that is diverse in morphology, cytology, ecology, and distribution within the Triticeae. To investigate the genome origin and evolutionary history of Leymus, three unlinked low-copy nuclear genes (Acc1, Pgk1, and GBSSI) and three chloroplast regions (trnL-F, matK, and rbcL) of 32 Leymus species were analyzed together with those of 36 diploid species representing 18 basic genomes in the Triticeae. The phylogenetic relationships were reconstructed using Bayesian inference, Maximum parsimony, and NeighborNet methods. A time-calibrated phylogeny was generated to estimate the evolutionary history of Leymus. The results suggest that reticulate evolution has occurred in Leymus species, with several distinct progenitors contributing to the genus. The molecular data yielded two apparently contradictory resolutions of the Xm-genome lineage, one placing it as closely related to the P/F genome and the other placing it as sister to the Ns-genome donor. Our results suggested that (1) the Ns genome of Leymus was donated by Psathyrostachys, and additional Ns-containing alleles may have been introgressed into some Leymus polyploids by recurrent hybridization; (2) the phylogenetic incongruence regarding the resolution of the Xm-genome lineage suggested that the Xm genome of Leymus was closely related to the P genome of Agropyron; (3) both Ns- and Xm-genome lineages served as the maternal donor during the speciation of Leymus species; (4) the Pseudoroegneria, Lophopyrum and Australopyrum genomes contributed to some Leymus species.

Keywords: Genome origin, evolutionary history, Leymus, polyploid, interspecific hybridization, phylogenetic incongruence, low-copy nuclear genes (Acc1, Pgk1, GBSSI).

Introduction

Leymus Hochst. is in the economically important grass tribe Triticeae and contains approximately 30 species that are distributed in the temperate regions of Eurasia, North America, and South America, extending to the subtropics and tropical alpine regions (Dewey, 1984; Löve, 1984; Barkworth and Atkins, 1984; Fan et al., 2009, 2014; Sha et al., 2014). Associated with extensive geographic dispersal, the natural habitats of Leymus include saline or alkaline lands and dry or semidry areas in addition to shady and moist forests. Correspondingly, species of Leymus show large morphological variation, from absent to strong rhizomes, single to multiple spikelets per node, erectly involute to loosely flat leaves, and subulate to lanceolate to absent glumes (Fan et al., 2009; Yen et al., 2009; Sha et al., 2014, 2016). Cytologically, the species are all allopolyploids with chromosome numbers ranging from 2n = 4X = 28 to 2n = 12X = 84 (Löve, 1984). With its diversity in morphology, cytology, ecology and distribution in the Triticeae, the genus Leymus is a good model to study polyploid speciation and to estimate the forces acting on species richness.

In attempts to sort out the phylogenetic details of the genus, phylogenetic reconstructions have been based on RAPD (Random Amplified Polymorphic DNA) markers (Yang et al., 2008), morphological characteristics (Sha et al., 2009), chloroplast genes (Liu et al., 2008; Zhou et al., 2010; Culumber et al., 2011; Sha et al., 2014), a mitochondrial gene (Sha et al., 2014), internal transcribed spacer (ITS) sequences (Liu et al., 2008; Sha et al., 2008; Fan et al., 2014), and single-copy nuclear genes (Fan et al., 2009; Zhou et al., 2010; Sha et al., 2016). However, a consensus definition of clades has not emerged because phylogenetic trees were inferred from either a limited number of informative characters (Sha et al., 2008; Yang et al., 2008), a limited number of genes (Sha et al., 2008, 2009, 2014, 2016; Fan et al., 2009, 2014), or a limited number of samples (Liu et al., 2008; Zhou et al., 2010; Culumber et al., 2011).

Attention has also focused on the historical evolutionary processes that drove species diversification and dispersal events in Leymus. Recent time-calibrated phylogenies suggested that most extant Leymus species are derived from the rapid diversification of an ancestral lineage during the Miocene (Fan et al., 2009, 2014; Sha et al., 2016), although the orthologous comparisons were based on a limited number of genome-homologous copies of the genes used. However, a comprehensive data set including all the species and using multiple loci is still lacking, although such a data set is necessary to develop good hypotheses of phylogenetic relationships and evolutionary history within Leymus. Additionally, little is known about the origin of Leymus and its subsequent diversification and range expansion to its current ecological distribution.

Cytogenetic evidence confirms that Leymus originated through allopolyploidization and that the genus has two basic genomes, Ns and Xm (Wang et al., 1994; Sun et al., 1995; Zhang et al., 2006; Yen et al., 2009). The Ns genome originated from Psathyrostachys, which has been repeatedly substantiated by meiotic pairing in interspecific hybrids (Wang and Jensen, 1994; Zhang et al., 2006), DNA hybridization patterns (Zhang and Dvorak, 1991), and DNA sequence information (Sha et al., 2008, 2010, 2014, 2016; Liu et al., 2008; Fan et al., 2009; Zhou et al., 2010; Culumber et al., 2011). However, despite 50 years of intense effort, the origin of the Xm genome of Leymus remains undetermined. To date, the Xm genome has been proposed to be donated by Pseudoroegneria (St) (Shiotani, 1968), Thinopyrum (Eb) (Dewey, 1984), Psathyrostachys (Ns) (Zhang and Dvorak, 1991), Lophopyrum (Ee) (Sun et al., 1995), Agropyron (P) (Fan et al., 2009; Sha et al., 2010), and/or Eremopyrum (F) (Fan et al., 2009; Sha et al., 2014). Phylogenetic analysis of single-copy nuclear DMC1 data suggests that the origin of the Xm genome of Leymus could differ among species (Sha et al., 2016). Topological conflict regarding the resolution of the Xm-genome lineage is common among previous analyses of molecular data (Liu et al., 2008; Fan et al., 2009; Sha et al., 2010; Zhou et al., 2010), which highlights the need to determine which factors cause the topological conflict concerning the origin of the Xm genome. Consequently, the genomes of Leymus species are designated as NsXm until the source of Xm is identified (Yen et al., 2009; Wang and Lu, 2014).

Phylogenetic analysis is routinely applied to address evolutionary questions. Chloroplast DNA is usually maternally inherited in angiosperms (Zimmer and Wen, 2013); therefore, it cannot detect hybridization events (Zimmer and Wen, 2013) but can be used to identify the maternal parents in hybrid speciation of polyploids (Zimmer and Wen, 2013; Sha et al., 2014; Brassac and Blattner, 2015). In comparison with plastid DNA, nuclear genes represent the overwhelming majority of the cellular genome, thereby providing markers to track organismal evolution through both male and female Mendelian inheritance. Single- and low-copy nuclear genes have long been considered ideal tools for reconstructing polyploid phylogenies, as demonstrated by tracing the origin of a polyploid (Kim et al., 2008; Brassac and Blattner, 2015), identifying genome donors (Mason-Gamer, 2004; Fan et al., 2009), substantiating hybridization events or introgression (Mason-Gamer, 2004; Fan et al., 2013a), examining duplicate gene evolution in polyploids (Sun et al., 2007; Fan et al., 2012), and clarifying patterns of plant diversification (Sha et al., 2016; Brassac and Blattner, 2015). Single- and low-copy nuclear markers also have some disadvantages, such as difficulty in designing universal PCR and/or homoeolog-specific primers for cloning the amplicons (Triplett et al., 2012; Brassac and Blattner, 2015), intensive lab work in isolating the orthologs (Petersen and Seberg, 2004), and incongruences among gene trees (Brassac and Blattner, 2015). Currently, researchers prefer strategies that effectively deduce the incongruences (Chen et al., 2015). Because multigene analyses are expected to be more sensitive to phylogenetic signal quality, comparative phylogenies between rapidly and slowly evolving genes, together with improved taxon sampling, are suggested to reduce incongruences and resolve relationships at different phylogenetic levels (Jian et al., 2008; Nozaki et al., 2009; Chen et al., 2015).

In this study, to identify the origin and the relationships of the polyploid Leymus species, we present a phylogeny for 32 species of Leymus and 36 diploid species representing 18 basic genomes in Triticeae based on three unlinked single-copy nuclear genes (Acc1, plastid Acetyl-CoA carboxylase; Pgk1, Phosphoglycerate kinase; GBSSI, Granule-Bound Starch Synthase I) and three chloroplast regions (trnL-F, trnL (UAA)–trnF (GAA); matK, maturase coding gene; rbcL, ribulose-1,5-bisphosphate carboxylase/oxygenase). In Triticeae plants, the single-copy nuclear Acc1 and Pgk1 genes are located on homoeologous chromosome groups 2 and 3, respectively (Huang et al., 2002), and the low-copy genes encoding GBSSI are located on chromosomes 7 and 4 (Yamamori et al., 1994). The evolutionary rate for each locus was also estimated in comparative phylogenies. Our objectives were (1) to elucidate the phylogenetic relationships of Leymus species; (2) to demonstrate the polyploid speciation in Leymus; (3) to deduce the unknown origin of the Xm genome in Leymus; and (4) to reveal the genomic relationships among Leymus species.

Materials and Methods

Plant Materials

Thirty-two species, one variety and one subspecies of Leymus were included in this study, representing nearly all the species accepted in the classification of the genus Leymus, although the number of species recognized within Leymus varies (Yen et al., 2009). These members of Leymus were analyzed together with 36 diploid species representing 18 basic genomes in the tribe Triticeae. The species of Bromus L. was used as the outgroup based on previous phylogenetic studies of Triticeae (Mason-Gamer, 2004; Fan et al., 2009) and Poaceae (Minaya et al., 2015). The Pgk1, GBSSI, trnL-F, matK, and rbcL regions of all Leymus taxa plus the Acc1 region of L. alaicus, L. erianthus and L. mollis were sequenced. The remaining gene sequences from polyploid species and diploid Triticeae plants were downloaded from GenBank (Fan et al., 2009, 2012; Dong et al., 2013; Mason-Gamer, 2004). Plant materials with genomic constitutions, ploidy and GenBank accession numbers are presented in Table S1. The geographic origin and ploidy level of Leymus species were determined by Yen and Yang (2011). The origin, ploidy, and voucher specimen numbers for the sampled Leymus species are listed in Table S2. The seed materials of Leymus with PI and W6 numbers were kindly provided by the American National Plant Germplasm System (Pullman, Washington, USA), and L. duthiei var. longearistata was kindly provided by Dr. S. Sakamoto (Kyoto University, Japan). The seed materials of Leymus with ZY and Y numbers were collected from the Qinghai–Tibetan Plateau and adjacent regions by the authors. The plants and voucher specimens of the Leymus species are deposited at the Herbarium of Triticeae Research Institute, Sichuan Agricultural University, China (SAUTI).

DNA Amplification, cloning, and sequencing

Total DNA was extracted from fresh leaves of plants using the CTAB (cetyl trimethyl ammonium bromide) method (Doyle and Doyle, 1987). The Acc1, Pgk1, GBSSI, trnL-F, matK, and rbcL sequences were amplified with the primers listed in Table S3. All PCRs (Polymerase Chain Reactions) were conducted in a 50 μL reaction volume containing 1.5 mmol/L MgCl2, 200 μmol/L dNTPs, 1.0 μmol/L of each primer, 20-40 ng of template DNA, 1.5 U Ex Taq Polymerase (TaKaRa Biotechnology (Dalian) Co., Ltd, Liaoning, China) and sterile water to the final volume. PCR was conducted under the cycling conditions reported previously (Hilu et al., 1999; Huang et al., 2002; Mason-Gamer et al., 2002; Mason-Gamer, 2004; McMillan and Sun, 2004). All PCR products were purified using ExoSAP-IT (Affymetrix, USA). PCR products were cloned into the pMD18-T vector (TaKaRa, Dalian, China) following the manufacturer’s instructions, and transformed into E. coli JM109 competent cells (Promega) following the manufacturer’s protocol. Between 20 and 30 cloned fragments were amplified directly from white colonies using the same primers and recipe as those used for the original PCR. At least fifteen clones per allopolyploid were sequenced to obtain all the possible Acc1, Pgk1, and GBSSI sequences from the putative donor species. For the chloroplast regions, at least five independent clones were selected and sequenced.

Sunbiotech Company (Beijing, China) commercially sequenced the cloned DNA in both directions. DNA Baser v.3 (http://www.DnaBaser.com) was used to evaluate the chromatograms for base confirmation.

DNA analysis

Multiple sequence alignment was conducted using ClustalX (Thompson et al., 1999), with default parameters and additional manual edits to minimize gaps. Amino acid translations were used to guide the nucleotide alignments. The sequence statistics, including nucleotide substitutions, transition/transversion ratio, and variability of the sequences, were calculated by MEGA 6 (Tamura et al., 2013). Cloning of PCR amplicons, particularly those from single-copy nuclear genes of allopolyploid species, isolates homoeologous sequences from each nuclear genome, with two or more homoeologous sequence types included in allopolyploid species (Fan et al., 2013a). Following an initial phylogenetic analysis, the number of sequences was reduced (from 15-20 independent clones for nuclear gene sequences and from five independent clones for chloroplast sequences), keeping only one sequence when several sequences of the same individual formed a monophyletic group. Additionally, according to the p-distance (<0.01) criterion (Díaz-Pérez et al., 2014), highly similar sequences were grouped into consensus types. Following this criterion, a single sequence was used to represent the same orthologous copy in a monophyletic group. Distinct sequences within an individual that had a p-distance of more than 0.01 and separated into different clades were all included in the data analyses.

Detection of potential recombinants, pseudogenes, and non-purifying sequences

The Recombination Detection Program (RDP4) version 4.16 (Martin et al., 2010) was used to estimate potential recombination events within single- or low-copy nuclear gene sequences (Acc1, Pgk1, and GBSSI) of Leymus. Recombination events, including recombinant and parental sequences and scores for potential recombination events, were examined by the MaxChi, Chimaera, SiScan, and 3SEQ methods implemented in RDP4. Since masking of similar sequences was allowed to increase the power of the recombination detection methods, only the exon regions from Leymus were used to perform recombination detection.

Pseudogenic copies were detected by two methods. First, because some potential pseudogenic copies could accumulate mutations and eventually produce premature termination codons (PTCs) during evolutionary history (Minaya et al., 2015), PTCs were checked among the coding positions of the sampled nuclear sequences from polyploids. Second, theory assumes that most of the coding positions in a functional protein are constrained by purifying selection, while pseudogenes should show neutral variation (Pond and Frost, 2005; Minaya et al., 2015). Purifying and diversifying selection were investigated with the Branch-Site random effects likelihood (BS-REL) model, implemented in the web interface DataMonkey (Pond and Frost, 2005). The BS-REL method is able to estimate the proportion of sites within different ranges of selection pressure (ω = dN/dS, where dN and dS are the numbers of nonsynonymous and synonymous substitutions per nonsynonymous and synonymous site, respectively) for all lineages of the tree (Pond et al., 2011). Sequences with high proportions of sites under neutral or nearly neutral selection (ω = dN/dS ≈ 1) were considered potential pseudogenic copies. In the BS-REL analyses, the codon sequences of the low-copy nuclear genes (Acc1, Pgk1, and GBSSI) from Leymus were used.
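The sequence-level checks described above (the p-distance < 0.01 criterion, the transition/transversion counts, and the screen for premature stop codons) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relied on MEGA 6, RDP4 and DataMonkey; the function names, the pairwise-deletion treatment of gaps, and the assumption that a coding sequence starts in frame are all choices made here for illustration only.

```python
# Illustrative helpers only; names and gap handling are assumptions,
# not the procedure implemented in MEGA 6 or any other cited program.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
VALID_BASES = set("ACGT")


def compare(seq_a, seq_b):
    """Return (sites compared, transitions, transversions) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue  # skip gaps and ambiguity codes (pairwise deletion)
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return sites, transitions, transversions


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which the two sequences differ."""
    sites, transitions, transversions = compare(seq_a, seq_b)
    if sites == 0:
        raise ValueError("no comparable sites")
    return (transitions + transversions) / sites


def same_copy_type(seq_a, seq_b, threshold=0.01):
    """Apply the p-distance < 0.01 criterion used to group near-identical clones."""
    return p_distance(seq_a, seq_b) < threshold


def has_premature_stop(cds):
    """True if an in-frame stop codon occurs before the last complete codon.

    Assumes the sequence starts at codon position 1 and that alignment gaps
    come in multiples of three, so removing them keeps the reading frame.
    """
    cds = cds.upper().replace("-", "")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])
```

For example, two aligned 10-bp clones that differ by a single C/T change have a p-distance of 0.1 and would be kept as distinct copies, whereas clones below the 0.01 threshold would be collapsed into one representative sequence.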

We also carried out BS-REL analysis to estimate, for each low-copy nuclear gene lineage, the proportion of sites (Pr1, Pr2 and Pr3) assigned to three discrete rate classes (ω1, ω2, and ω3), respectively. Classes ω1 and ω2 are less than 1, specifying purifying selection, whereas ω3 (unconstrained class) is greater than 1 and is related to adaptive selection. One advantage of BS-REL for detecting selection is that it requires no prior knowledge about which lineages are of interest (i.e., are more likely to have experienced episodic diversifying selection). This method of detecting selection was chosen since our primary interest was to determine which gene copies, rather than sites, are under selection.

Phylogenetic analyses

Phylogenetic analyses were conducted using Bayesian inference (BI) and Maximum parsimony (MP). Bayesian and parsimony methods of phylogenetic inference can provide a phylogenetic framework for analyzing highly heterogeneous data sets including anomalies that could have originated from incomplete lineage sorting, horizontal gene transfer, recombination, or hybridization (Minaya et al., 2015; Blair and Murphy, 2011). In the phylogenetic analyses, the recombinants and the pseudogenes with PTCs were excluded, whereas the sequences under neutral or nearly neutral selection detected by the BS-REL model were included to identify their placement in the phylogenetic tree.

Prior to the phylogenetic analyses, the partition homogeneity test implemented in the program PAUP*4.0b10 (Swofford D L, Sinauer Associates, http://www.sinauer.com) was used to determine whether different loci (Acc1, Pgk1, GBSSI, trnL-F, matK, rbcL) contained significantly different signals. The partition homogeneity test can generate significant results if one gene has experienced many multiple substitutions or contains random information. The result of the partition homogeneity test revealed strong incongruence among the Acc1, Pgk1, and GBSSI loci and between the nuclear (Acc1 + Pgk1 + GBSSI) and plastid (trnL-F + matK + rbcL) data sets (P < 0.01), but not among the plastid markers (P > 0.05). Four data matrices, including Acc1 data, Pgk1 data, GBSSI data, and combined chloroplast (trnL-F + matK + rbcL) data, were thus used to conduct separate phylogenetic analyses.

The evolutionary model used for the phylogenetic analyses was determined using jModelTest 2 (Darriba et al., 2012) with Hierarchical Likelihood Ratio Tests (hLRTs), AIC (Akaike information criterion) and BIC (Bayesian information criterion) tests. The optimal model identified for each data set is listed in Table S4.

Bayesian inference analysis was performed using MrBayes v3.2 (Ronquist et al., 2012), and BI analyses of the Acc1 data, Pgk1 data, and GBSSI data were conducted under the evolutionary models identified by the jModelTest 2 analysis. Indels of GBSSI introns were treated as missing data, as indicated by Díaz-Pérez et al. (2014). For the combined chloroplast data, because the TVM + G + I model was not implemented in MrBayes v3.2, we chose instead the closely related GTR + G + I model. Four MCMC (Markov Chain Monte Carlo) chains (one cold and three heated), applying MrBayes default heating values (t = 0.2), were run for 10,000,000 generations for each data set, with trees sampled every 100 generations. The first 25,000 trees were discarded as "burn-in". The program Tracer v1.4 (Rambaut and Drummond, 2007) was used to examine the log likelihoods, ensuring that they were in the stationary "furry caterpillar" phase. The remaining trees were used to construct the 50%-majority rule consensus trees. Two independent runs were performed to check whether convergence on the same posterior distribution was reached. The statistical confidence in nodes was evaluated by posterior probabilities (PP). PP values below 90% are not shown in the figures.

Maximum parsimony (MP) analysis was performed using New Technology Search in TNT v1.1 (Goloboff et al., 2008). Search parameters included 100 random addition replicates, including 20 iterations of the parsimony ratchet, 10 cycles of tree drifting, and 15 rounds of tree fusing. All characters were equally weighted and nonadditive. Gaps were treated as missing data. Equally parsimonious trees were collapsed into a strict consensus. Bootstrap values for nodal support were obtained in TNT with 1000 pseudoreplicates, using random taxon addition and TBR branch swapping.

To visualize phylogenetic structure and possible reticulate relationships between Leymus and its relatives, the NeighborNet method and the EqualAngle display option, implemented in the program SplitsTree version 4.14.4 (Huson and Bryant, 2006), were used to generate phylogenetic networks for the nuclear data sets. NeighborNet network analysis was performed after removal of the recombinants and potential pseudogenes (including the sequences under neutral or nearly neutral selection and those with PTCs). To assess the support for the observed structure, a bootstrap analysis was conducted with 1000 replicates.

Considering the possibility that the phylogenetic analysis algorithm might face problems when data, particularly from chloroplast sequences, did not represent a tree-like structure (Jakob and Blattner, 2006), the Median-Joining (MJ) network method implemented in the Network 4.1.1.2 program (Fluxus Technology, Clare, UK) was used to reveal relationships between ancestral and derived chloroplast sequence haplotypes. The MJ network has been used successfully to reveal chloroplast progenitor–descendant relationships of polyploid species within the Triticeae (Jakob and Blattner, 2006; Sha et al., 2010, 2014; Dong et al., 2013).

Divergence dating

Divergence times with 95% confidence intervals (CIs) were estimated in BEAST v.1.8.0 (Drummond et al., 2012), incorporating an uncorrelated exponential clock model and a Yule speciation process. The divergence dating was conducted on the same data sets as those used in the BI and MP analyses. The lack of fossils for the Triticeae precluded a direct calibration of tree topologies. Instead, molecular dating was based on the divergence time for the basal-most split in the Triticeae (Marcussen et al., 2014). Priors on the Triticeae crown age (15.32 Ma ± 0.34) were set as inferred by Marcussen et al. (2014), where several macrofossils from grass (Festuca, Berriochloa, and Nassella) were used to calibrate the age of the Triticeae. The analysis was run using the Yule species tree prior, as well as the piecewise linear and constant root population model. Three independent analyses were computed for 20,000,000 generations each under the GTR + G + I model (with the associated parameters specified by jModelTest 2 as the priors), sampling the states every 1000 generations. Tracer v1.4 (Rambaut and Drummond, 2007) was used to check the convergence and mixing of the runs in terms of the effective sample size (ESS) values and the coefficient of rate variation. Appropriate burn-ins were estimated from each trace file and discarded, and all analyses were combined with LogCombiner (http://beast.bio.ed.ac.uk/LogCombiner). Tree files were compiled into a maximum clade credibility tree using TreeAnnotator (http://beast.bio.ed.ac.uk/TreeAnnotator) to display mean node ages and highest posterior density (HPD) intervals at 95% for each node. Trees were then viewed in FigTree v. 1.3.1 (http://tree.bio.ed.ac.uk/).
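As a quick check on the chain settings reported for the MrBayes and BEAST analyses, the following Python sketch works out how many samples each run produces and what fraction the stated burn-in represents. The helper name is hypothetical, and the count ignores the initial state that some programs also record, so the figures are approximate.

```python
def mcmc_sample_counts(generations, sample_every, burnin_samples=0):
    """Return (total samples, retained samples, burn-in fraction) for one MCMC run.

    Illustrative helper: the count excludes the starting state at generation 0.
    """
    total = generations // sample_every
    if not 0 <= burnin_samples <= total:
        raise ValueError("burn-in must lie between 0 and the number of samples")
    return total, total - burnin_samples, burnin_samples / total


# MrBayes settings from the text: 10,000,000 generations, sampled every 100,
# first 25,000 trees discarded.
mrbayes = mcmc_sample_counts(10_000_000, 100, 25_000)
print(mrbayes)  # (100000, 75000, 0.25)

# BEAST settings from the text: 20,000,000 generations per run, sampled every
# 1000. The burn-in was chosen per trace file and is not stated, so none is
# applied here.
beast_total_per_run, _, _ = mcmc_sample_counts(20_000_000, 1000)
print(beast_total_per_run * 3)  # 60000 states across the three runs before burn-in
```

Under these assumptions, the 25,000 discarded trees correspond to 25% of each MrBayes run, leaving roughly 75,000 trees per run for the 50%-majority rule consensus.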

Results

3.1 Acc1 data

At least fifteen clones with putative Acc1 inserts were sequenced for polyploid L. alaicus, L. erianthus and L. mollis. When multiple identical sequences resulted from the cloned PCR products of an accession, only one sequence was included in the dataset. These sequences, together with published Ns- and Xm-type Acc1 sequences from 29 species and one subspecies of Leymus (Fan et al., 2009), were analyzed with those from 34 diploid species representing 18 basic genomes in Triticeae. Consequently, sixty Acc1 sequences, representing three distinct types (Ns-, St-, and Xm-types, see Discussion), were detected from a total of 33 Leymus taxa (Table S2).

No PTCs were observed among the Acc1 coding sequences. No recombination event in the Acc1 exon region was detected by two or more algorithms (p < 0.05) implemented in the program RDP4. The Branch-Site REL (BS-REL) analysis detected different sequence positions under negative, neutral, and positive selection within Acc1 exons. Seven sequences showed a high proportion of neutral sites (> 50%), a low proportion of sites under positive selection, and a complete absence of positions under purifying selection (Figure S1). They were: LAKMO (59% neutral sites/41% positively selected sites), LANGU2 (76%/26%), LFLEX1 (88%/12%), LKARE2 (58%/42%), LMULT (77%/23%), LOVAT1 (88%/12%), LPEND1 (80%/20%), and LPEND2 (74%/26%). This suggests relaxed selection in these seven sequences, which is an indicator of potential pseudogenes.

The aligned Acc1 sequences yielded a total of 1555 characters, of which 589 were variable and 290 were informative. Most of the Acc1 sequence substitutions were silent, with a predominance of transitions (si) (33) compared with transversions (sv) (22) and a mean si/sv ratio of 1.5.

Bayesian phylogenetic reconstruction of the Acc1 data resulted in a tree with high posterior probability support across most clades. Parsimony analysis of the Acc1 data set recovered 56 equally most parsimonious trees, and the topology of the MP strict consensus was highly similar to that of the BI consensus tree. The tree illustrated in Figure 1A is the phylogenetic tree with collapsed clades, and the full tree is shown in Supplementary Data Figure S1. Allopolyploid Leymus Acc1 sequences were split into three major clades (Clades I-III) with consistent statistical support (> 70% BS; > 90% PP) (Figure 1A and Figure S1). Clade I included Agropyron (P), Eremopyrum (F), and Leymus species with NsXm genomes (89% BS and 100% PP). Clade II consisted of Pseudoroegneria (St) species and L. mollis (99% BS and 100% PP). Clade III contained the Psathyrostachys (Ns) and polyploid Leymus species with NsXm genomes (72% BS and 100% PP).

After removal of the seven potential Acc1 pseudogenes detected by the BS-REL analysis, the phylogenetic network analysis recovered three groups with moderate bootstrap support (Figure 1B), which is consistent with the groupings generated from the BI and MP analyses. These groups were: (i) Agropyron + Eremopyrum + Leymus species with Xm-copy types (81% BS); (ii) Pseudoroegneria + L. mollis (100% BS); (iii) Psathyrostachys + Leymus with Ns-copy types (73% BS).

The BEAST analyses of the Acc1 sequences within Leymus and its putative diploid relatives generated a time-calibrated tree (Figure S2). Under an uncorrelated exponential clock model, the coefficient of rate variation was estimated to be 0.90 (95% C.I., 0.63-1.05) (Table 1), indicating that a relaxed clock was appropriate. In the time-calibrated tree, the Ns-copy sequences of Leymus species were grouped with the sequences of Psathyrostachys with strong statistical support (100% PP), the Xm-copy sequences of Leymus were clustered with the sequences of Agropyron/Eremopyrum (100% PP), and Pseudoroegneria species and L. mollis formed one clade (100% PP). This is consistent with the groupings generated from the BI, MP, and NeighborNet analyses. The mean ages with 95% CIs are indicated in the chronogram (Figure S2). The Triticeae crown clade age (16.04 MYA, 95% CI 14.04, 23.90) fitted our prior. Based on the time calibration analysis, the estimated age of the most recent common ancestor (MRCA) of Leymus and Psathyrostachys (Node Ns) was 8.61 (7.49, 15.62) MYA, and the divergence of Leymus, Agropyron, and Eremopyrum (Node Xm) was dated to 9.12 (7.55, 14.63) MYA. The average substitution rate of Acc1 genes in Triticeae was estimated as 0.00189 (0.00171, 0.00207) substitutions per site per MYA.

3.2 Pgk1 data

At least fifteen clones with putative Pgk1 inserts were sequenced for the sampled Leymus species. Clones with a p-distance less than 0.01 from the same individual formed a monophyletic group, suggesting that they represent the same orthologous copy. Only one sequence from each monophyletic group was thus included in the Pgk1 analyses. Consequently, fifty-three Pgk1 sequences, representing three distinct types (Ns-, St/Ee/W-, and Xm-types, see Discussion), were detected from a total of 32 Leymus taxa (Table S2).

There were no PTCs among the Pgk1 coding sequences. The RDP analysis showed one recombination event that occurred in the Ns-copy sequence of L. chinensis (LCHIN1) (Table 2). Nine Pgk1 sequences showed a high proportion of neutral sites (> 50%) (LAMBI1, 60% neutral sites/40% positively selected sites; LAREN, 52%/48%; LCINE2A, 67%/33%; LERIA1, 72%/28%; LFLEX1, 86%/14%; LQING1, 92%/8%; LSALI, 76%/24%; LTRIT, 63%/37%; LYIWU1, 57%/43%), indicating that they might be potential pseudogenes.

Of 1560 total characters of the Pgk1 data, 532 characters were variable and 250 characters were informative. The transition and transversion mutations were 35 and 22, respectively. The mean si/sv ratio was 1.6.

Bayesian inference of the Pgk1 data generated a tree with high posterior probability support across most clades. Parsimony analysis of the Pgk1 data set recovered 86 equally most parsimonious trees, and the strict consensus topology was largely congruent with the trees produced in the BI analyses. The tree illustrated in Figure 2A is the phylogenetic tree with collapsed clades, and the full tree is shown in Supplementary Data Figure S3. Allopolyploid Leymus Pgk1 sequences were split into three major clades (Clades A-C) with consistent statistical support (> 70% BS; > 90% PP) (Figure 2A and Figure S3). Clade A was composed of the Psathyrostachys (Ns) and Leymus (NsXm) species (89% BS and 100% PP), Clade B contained Leymus (NsXm) species with 100% BS and 100% PP, and Clade C included Pseudoroegneria (St), Lophopyrum (Ee), Australopyrum (W), and Leymus (NsXm) species (99% PP).

After removal of seven potential Pgk1 pseudogenes detected by the BS-REL analysis, the NeighborNet graph partitioned Leymus into three groups with moderate bootstrap support (Figure 2B), which is congruent with the groupings generated from the BI and MP analyses. These groups were: (i) Agropyron + Eremopyrum + Leymus species with Xm-copy types (81% BS); (ii) Pseudoroegneria + L. mollis (100% BS); (iii) Pseudoroegneria + Lophopyrum (Ee) + Australopyrum (W) + Leymus (70% BS). Consistent with the placement of Ag. mongolicum (AGMON) in the BI tree, AGMON was clustered with Eremopyrum species in the NeighborNet graph.

The BEAST analyses of the Pgk1 sequences generated a time-calibrated tree (Figure S4). Under an uncorrelated exponential clock model, the coefficient of rate variation was estimated to be 0.92 (95% C.I., 0.68-1.12) (Table 1), indicating that a relaxed clock was appropriate. In the time-calibrated tree inferred from the Pgk1 data, the Ns-copy sequences of Leymus species were grouped with the sequences of Psathyrostachys (100% PP), and the Xm-copy sequences of Leymus formed one clade (100% PP). Four Leymus species (Leymus mollis, Leymus salinus, Leymus paboanus, and Leymus racemosus) were clustered with Pseudoroegneria, Lophopyrum (Ee), and Australopyrum (W) species (100% PP). These results agree with the groupings generated from the BI, MP, and NeighborNet analyses. The estimated age of the MRCA of Leymus and Psathyrostachys (Node Ns) was 6.21 (95% C.I., 6.19-12.00) MYA, and the divergence of Leymus (Node Xm) was dated to 9.01 (95% C.I., 5.28-11.49) MYA. The Triticeae crown clade age (16.00 MYA, 95% CI 13.20-22.31) fitted our prior. The average substitution rate of Pgk1 genes in Triticeae was estimated as 0.0021 (95% C.I., 0.00182-0.00226) substitutions per site per MYA.

3.3 GBSSI data

At least fifteen clones with putative GBSSI inserts were sequenced for the sampled Leymus species. The sequences with a p-distance of more than 0.01 were analyzed with those from 35 diploid taxa representing 18 basic genomes in Triticeae. Seventy-four GBSSI sequences, representing three distinct types (Ns-, St-, and Xm-types, see Discussion), were detected from a total of 32 Leymus taxa (Table S2).

Four GBSSI sequences with PTCs (LFLEX1B, LKARE2D, LRAMO2B, and LRACE2C) were found. On the basis of the exon region of GBSSI, four recombination events involving L. coreanus (LCORE2A and LCORE2B), L. leptostachys (LLEPT1C), and L. arenarius (LAREN3B) were detected by RDP4 (Table 2). The BS-REL analysis showed that eleven GBSSI sequences (LALAI2A, 84%/16%; LAMBI2A, 81%/19%; LANGU1B, 55%/45%; LAREN2A, 56%/44%; LCORE1A, 67%/33%; LQING2A, 76%/24%; LOVAT2B, 99%/1%; LRACE1B, 84%/16%; LRACE2A, 82%/18%; LSALT1A, 78%/22%; LTIAN1A, 60%/40%) have a high proportion of neutral sites and a relatively low proportion of sites under positive selection. This suggests

relaxed selection rates in these eleven sequences. In the GBSSI sequence data matrix, of the 1342 total characters, 727 were variable and 488 were parsimony informative. Most GBSSI sequence substitutions were silent, with a predominance of transition mutations (49) compared with transversions (38) and a mean si/sv ratio of 1.3. The MP analysis resulted in 20 equally parsimonious trees, and the strict consensus topology was highly congruent with the trees produced in the BI analyses. The tree illustrated in Figure 3A is the phylogenetic tree with collapsed clades, and the full tree is shown in Supplementary Data Figure S5. The Leymus GBSSI sequences formed four clades (Clades i-iv). Clade i included the Psathyrostachys (Ns) and Leymus (NsXm) species (86% BS and 92% PP), Clade ii was composed of Leymus (NsXm) species with 70% BS and 92% PP, Clade iii contained Pseudoroegneria (St), Australopyrum (W), and Leymus (NsXm) species (90% PP), and Clade iv contained Pseudoroegneria (St) and Leymus (NsXm) species (95% BS and 100% PP). After removal of both four pseudogenes with PTCs and eleven potential pseudogenes detected by BS-REL method, the NeighborNet graph partitioned Leymus into six groups with moderate bootstrap support (Group i-vi) (Figure 3B), which is congruent with the groupings generated from BI and MP analysis. Group i included Leymus and Psathyrostachys lanuginosa (90% BS). Group ii contianed Leymus and three Psathyrostachys species (Psa. juncea, Psa. fragilis, and Psa. huashanica) (73% BS). All the Xm-copy sequences of Leymus were divided into three groups (Group iii-v). Group vi comprised Pseudoroegneria, Australopyrum, and Leymus species (71% BS). In the time-calibrated tree inferred from the GBSSI, all the Ns-copy sequences of Leymus species were grouped with the sequences of Psathyrostachys (100% 21

PP), and all the Xm-copy sequences of Leymus formed one clade (100% PP) (Figure S6). Two Leymus species (Leymus akmolinensis and Leymus karelinii) were grouped with Pseudoroegneria strigosa (100% PP). Four Leymus species (L. paboanus, L. alaicus, L. arenarius, and L. karelinii) and Pseudoroegneria, and Australopyrum were in one clade (100% PP) (Figure S6). These results are congruent with the groupings generated from BI, MP, and NeighboNet analysis. Under an uncorrelated exponential clock model, the coefficient of rate variation was estimated to be 0.85 (95% C.I., 0.66-0.98) (Table 1). The estimated age of the MRCA of Leymus and Psathyrostachys (Node Ns) was 8.36 (95% C.I., 6.05-10.49) MYA, and the age for the divergence of Leymus (Node Xm) was dated to 7.10 (95% C.I., 6.03-11.95) MYA. The Triticeae crown clade age (15.60 MYA, 95 % CI 12.6420.07) fitted to our prior. The average substitution rate of GBSSI genes in Triticeae was dated as 0.0050 (95% C.I., 0.00456-0.00544) substitutions per site per MYA (Table 1). 3.4 Combined chloroplast data The combined matK, rbcL, and trnL-F data yielded a total of 2435 characters of which 357 were variable characters and 96 were informative. The transition mutations and transversions are not equal, with one (si) being 17 and another (sv) being 15. A mean si/sv ratio was 1.1. Parsimony analysis of the combined chloroplast data set recovered 4 equally most parsimonious trees, and the topology of MP strict consensus was identical to that of the BI. The tree illustrated in Figure 4A is the BI tree in which BS values are shown above the branches and PP below the branches. Combined chloroplast sequences of Leymus were represented in two clades, corresponding to the two genomic types (Ns and Xm). The Ns clade included the Psathyrostachys (Ns) and Leymus (NsXm) species from 22

Eurasia (except L. coreanus and L. komarovii from East Asia) (99% BS and 100% PP). The Xm clade contained the Leymus species from America and L. coreanus and L. komarovii from East Asia (71% BS and 94% PP). The genealogical relationships among 47 combined chloroplast haplotypes from 47 taxa were described by the MJ network (Figure 4B). In MJ analysis, each circular network node represents a single sequence haplotype, with node size proportional to the number of isolates with that haplotype. Mv (median vectors representing missing intermediates) show unsampled nodes inferred by MJ network analysis, and the number along the branches shows the number of mutations separating nodes. The network showed that Leymus species were nested into two basic clusters of haplotypes. The first cluster, Ns, contained two diploid Psathyrostachys (PSAJU and PSAFR) haplotypes and 20 Leymus haplotypes. The second cluster, Xm, included six Leymus haplotypes from America and two Leymus haplotypes from East Asia. The closest Xm haplotype was 28, 31, 35, 37, 41, 45, and 50 mutational steps removed from the P, Ns, E e, F, Eb, St, and H haplotypes, respectively (Figure 4B). The topology of the time-calibrated tree inferred from combined chloroplast data was identical to that of BI tree (Figure S7). Molecular dating for the Ns and Xm nodes was performed under an uncorrelated exponential clock. The estimated of coefficient of rate variation was 0.93 (95% C.I., 0.65-1.16) (Table 1). The average substitution rate of the combined matK, rbcL, and trnL-F loci in Triticeae was dated as 0.00099 (95% C.I., 0.000862-0.00111) substitutions per site per MYA. Based on the combined chloroplast data, the age of the MRCA of Leymus and Psathyrostachys (Node Ns) was dated to 7.99 (95% C.I., 3.73-10.87) MYA, and the estimated age for the divergence of Leymus in Node Xm was 7.70 (95% C.I., 2.7323

10.24) MYA (Table 1). The Triticeae crown clade age (14.14 MYA, 95 % CI 10.0919.46) fitted to our prior.
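The clone-filtering rule used for the Pgk1 and GBSSI data (clones from one individual that differ by a p-distance below 0.01 are treated as the same copy, and one representative is kept) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the greedy grouping, and the treatment of gaps and ambiguous bases are assumptions made for illustration, and the study additionally relied on monophyly in the gene tree, which is not modelled here.

```python
# Illustrative sketch only; names and the greedy strategy are hypothetical.

def p_distance(seq_a, seq_b, skip="-N?"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code (assumed to be
    '-', 'N', or '?') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def pick_representatives(clones, threshold=0.01):
    """Keep one clone per group of near-identical clones.

    `clones` is a list of (name, aligned_sequence) pairs from one individual.
    A clone is kept only if its p-distance to every clone already kept is at
    least `threshold` (0.01 is the cut-off stated in the text).
    """
    kept = []
    for name, seq in clones:
        if all(p_distance(seq, kept_seq) >= threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept
```

With three hypothetical clones, one differing from the first at 1 of 200 sites and another at 8 of 200 sites, `pick_representatives` would keep the first and third clones, since only the second falls below the 0.01 cut-off.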

Discussion

Evolutionary dynamics of the Acc1, Pgk1, and GBSSI sequences in Leymus polyploids

Cloning of PCR amplicons from low-copy nuclear genes from allopolyploid species will isolate homoeologous sequences from each nuclear genome. In many cases, one or two distinct Acc1, Pgk1, and GBSSI sequences derived from their ancestral genome donors (Ns and Xm genomes) were detected from each sampled accession (Table S2). The missing homoeologous types might result from a sampling artifact in the PCR reaction, or they might reflect the occasional loss of one copy of the gene, through either homoeologous recombination or deletion. Sampling artifact is the most straightforward explanation, as it would result in a random and balanced sampling for each genome type.

The present study showed the complex evolutionary dynamics of the Acc1, Pgk1, and GBSSI genes in Leymus, which were associated with pseudogenization, recombination, and relaxed selection. We failed to detect PTCs as evidence of pseudogenic copies within the Acc1 and Pgk1 exons, while GBSSI exons with PTCs were observed in four sequences (LFLEX1B, LKARE2D, LRACE2C, and LRAMO2B). Since the complete Acc1 and Pgk1 exons were not sequenced, this criterion alone does not conclusively rule out the presence of pseudogenes. In theory, genes with an increased copy number might be more prone to a loss of functional constraint (pseudogenization). In Triticeae, the low-copy GBSSI gene (located on chromosomes 7 and 4) might thus be more prone to pseudogenization than the single-copy Acc1 (located on chromosome 2) and Pgk1 (located on chromosome 3) genes (Yamamori et al., 1994; Huang et al., 2002). Based on the BS-REL method, we interpreted sequences with a large proportion of sites under neutral selection as an indicator of relaxation of selection pressure. Eight Acc1 sequences, nine Pgk1 sequences, and eleven GBSSI sequences were found to have a high proportion of neutral sites and a complete absence of sites under purifying selection, which is suggestive of potential pseudogenes and/or relaxation of selection pressure.

Paralogous gene copies can be deduced by the tree-based method, from confirmed diploid relatives showing copies in two divergent positions of a phylogenetic tree (Minaya et al., 2015). In this study, several paralogous gene copies were observed using the tree-based method within the Acc1 (LERIA2A and LERIA2B), Pgk1 (LOVAT2A and LOVAT2B; LCINE2A and LCINE2B; LTRIT2A and LTRIT2B), and GBSSI (LCORE1A and LCORE1C; LPABO1A, LPABO1B, and LPABO1C; LKARE3A and LKARE3B) genes. For example, in Clade I of the Acc1 tree, one paralogous copy of L. erianthus (LERIA2A) from America and the Xm-copy sequences of four American Leymus species (L. triticoides, L. ambiguus, L. salinus, and L. innovatus) and one Eurasian Leymus species (L. karelinii) formed a paraphyletic grade within the subclade. Another paralogous copy of L. erianthus (LERIA2B) was basal to the rest of Clade I (Figure S1). These results suggest that L. erianthus might have derived its Xm genome by recurrent hybridization with different Leymus species. Similarly, the paralogous copies within the Pgk1 and GBSSI gene trees did not form one sister group but were scattered in different subclades (Figures S2-S6). Given the presence of paralogous copies within the sampled gene data, it can be suggested that a multiple origin of some Leymus species resulting from recurrent hybridization is the rule rather than the exception.

Multiple origins resulting from recurrent hybridization within polyploid Leymus species are further supported by the present recombination analyses. Although the major and minor parental sequences of the recombinants cannot be unambiguously identified, one Pgk1 recombinant (LCHIN1) and four GBSSI recombinants (LAREN2B, LCORE2A, LCORE2B, and LLEPT1C) suggest that cross-species recombination occurred in Leymus. Multiple origins help to explain the rich genetic diversity among Leymus species and would also have promoted rapid adaptation of these species to different ecological habitats.

Ns genome donor of Leymus

The phylogenetic analyses based on the three single- or low-copy nuclear genes (Acc1, Pgk1, and GBSSI) showed that Leymus was grouped with Psathyrostachys species with good statistical support (Table 3), indicating that the Ns genome of Leymus (NsXm) originated from Psathyrostachys. This conclusion is consistent with previous results from cytogenetic evidence (Wang and Jensen, 1994; Wang et al., 1994; Zhang et al., 2006), DNA hybridization patterns (Zhang and Dvorak, 1991), AFLP profiles (Culumber et al., 2011), and DNA sequence information (Sha et al., 2008, 2010, 2014, 2016; Fan et al., 2014). The combined chloroplast data further suggested that all the sampled Leymus species from Eurasia (nineteen species and one variety; the exceptions were L. coreanus and L. komarovii from East Asia) were clustered with Psathyrostachys, which indicated that the Psathyrostachys species (Ns genome) served as the maternal donor during the polyploid speciation of these Eurasian Leymus taxa.

The genus Psathyrostachys, the Ns haplome donor, contains approximately eight diploid (NsNs) or tetraploid (NsNsNsNs) species distributed in the Middle East, central Asia, and northern China (Yen and Yang, 2011). Based on DNA hybridization and FISH patterns, Wang et al. (2006) proposed that the Ns genome of Leymus might have originated from Psa. juncea and Psa. lanuginosa, whereas Psa. fragilis and Psa. huashanica were unlikely to be donors to Leymus. A previous analysis of Acc1 exon sequence diversity and genealogical patterns also suggests that Psa. juncea may be the Ns-genome donor of Leymus species (Fan et al., 2009). In this study, the data for the three nuclear genes all showed that the Psathyrostachys species (Psa. juncea, Psa. fragilis, Psa. huashanica, and Psa. lanuginosa) did not group together but were scattered in one clade (Ns Clade) that included the Leymus species. In particular, in both the Pgk1 and GBSSI gene trees, Psa. juncea was basal to the other Psathyrostachys and Leymus species in the Ns Clade. These results were consistent with the suggestion that Psa. juncea is likely the ancestral Ns-genome donor to Leymus species. The geographic range of Psathyrostachys provides further support for this hypothesis: Psa. juncea is widely distributed in the former USSR, Mongolia, and the northwestern part of China, whereas Psa. fragilis, Psa. huashanica, and Psa. lanuginosa are restricted in distribution to some regions of central Asia (Yen and Yang, 2011). The non-monophyly of Psathyrostachys and its groupings with different Leymus species within the Ns clade suggest that additional Ns-containing alleles were introgressed into some Leymus polyploids by recurrent hybridization with different diploid Psathyrostachys species.

Gene-tree incongruence associated with the origin of the Xm genome in Leymus

The Xm genome is represented in all Leymus species. Despite decades of intensive efforts, there are still uncertainties regarding the origin of the Xm genome of Leymus. In this study, the Acc1 data showed that the Xm-type Acc1 sequences of Leymus were grouped with sequences from Agropyron and Eremopyrum. In the phylogenetic tree inferred from the combined chloroplast sequences (matK + rbcL + trnL-F), the six Leymus species from America and the two species from East Asia (L. coreanus and L. komarovii) all formed one monophyletic clade (Xm Clade) that was distinct from the Ns clade (Ns Clade), which indicated that the Xm genome donor might have served as the maternal lineage during the polyploid speciation of these eight Leymus species. The MJ network further revealed that Agropyron was closely related to the Xm genome donor of these eight Leymus species. Both the Pgk1 and GBSSI data suggested that the Xm-type sequences of Leymus species did not group with any diploid Triticeae species but were sister to the Ns clade. Thus, the molecular phylogenies based on the Acc1, Pgk1, GBSSI, and combined chloroplast data led to two apparently contradictory results for the Xm-genome lineage, with one (the Acc1 and combined chloroplast data) placing the Xm-genome lineage as closely related to the P/F genome and the other (the Pgk1 and GBSSI data) placing the Xm-genome lineage as sister to the Ns-genome donor (Table 3).

Discordances among gene trees result from methodological artifacts (e.g., sampling error and/or a failure of gene evolutionary models) and from the complex dynamics of the evolutionary processes in organisms (e.g., paralogy, hybridization, and/or incomplete lineage sorting) (Sota and Vogler, 2001; Cranston et al., 2009; Mason-Gamer, 2013; Betancur-R et al., 2014). Given sufficient taxa, resolving methodological artifacts places the focus on the model of gene evolution (Jian et al., 2008; Betancur-R et al., 2014). Theoretical predictions emphasize that slow-evolving genes are individually unlikely to yield well-resolved and highly supported trees; however, collectively, they can be informative because slow-evolving genes are expected to meet the assumptions of evolutionary models more closely than fast-evolving genes, whose complex dynamics make them subject to phylogenetic error and model misspecification (Zhang et al., 2012; Betancur-R et al., 2014). Empirical evidence shows that slowly evolving genes, despite having fewer informative sites, reduce incongruence between data sets and provide higher resolution and support for ancient divergences and deep-level relationships than rapidly evolving genes (e.g., Jian et al., 2008; Zhang et al., 2012; Chen et al., 2015).

In this study, nearly all of the Leymus species and the monogenomic genera accepted in genome-based classifications of the Triticeae were sampled for the phylogenetic analysis of each data set, despite the absence of several genome donors (G genome, Festucopsis; M genome, Aegilops comosa; U genome, Aegilops umbellulata; T genome, Aegilops mutica; N genome, Aegilops uniaristata; Xa genome, Hordeum marinum; Xu genome, Hordeum murinum). Thus, the current incongruences among gene trees were unlikely to be the result of sampling error. The methodological effect of gene-tree discordances on the Xm-genome lineage depended on the models of gene evolution. Estimates of the average substitution rate in the Acc1 and chloroplast sequences in Triticeae were 0.00189 and 0.00099 substitutions per site per million years, respectively. Both the Acc1 and chloroplast data suggested a close relationship between the P/F genome donors and the Xm-genome lineage. The Pgk1 and GBSSI genes had estimated rates of 0.0021 and 0.0050 substitutions per site per million years, respectively, indicating a faster evolutionary rate than that of the Acc1 and chloroplast data in Triticeae. Both the Pgk1 and GBSSI data suggested that the Xm-genome lineage was sister to the Ns-genome lineage. Given the current data, the resolution and support for ancestral divergence and the Xm-genome lineage in Leymus were higher with slowly evolving genes than with fast-evolving genes.

Several categories of biological factors, including paralogy, hybridization, and incomplete lineage sorting, are major evolutionary processes that can result in true incongruence among gene genealogies (Sota and Vogler, 2001; Cranston et al., 2009; Mason-Gamer, 2013; Betancur-R et al., 2014). Based on the tree-based method for detecting potential paralogous copies, ortholog/paralog confusion between the Ns- and Xm-copies of Leymus was unlikely to have caused the topological incongruence in the resolution of the Xm-genome lineage. In the Acc1 tree, the Ns-type sequences of Leymus were grouped with the sequences of Psathyrostachys and were distinct from the Xm-copy sequences of Leymus. Despite several potentially paralogous copies within the Xm-type sequences of Leymus (LERIA2A and LERIA2B, as discussed above), our Acc1 phylogeny indicates that the Ns-type sequences of Leymus represent the orthologous alleles from the Ns genome of Psathyrostachys, and the Xm-type sequences are the orthologous Acc1 copies from the Xm genome donor. In both the Pgk1 and GBSSI gene trees, the sequences of Psathyrostachys did not group with the Xm-type sequences of Leymus but were clustered with the Ns-type sequences of Leymus, and Psa. juncea was sister to the rest of the Ns Clade. Although the Ns-type sequences from Psathyrostachys and Leymus were sister to the Xm-type sequences of Leymus, the sister node should be considered to represent a speciation event.

Both hybridization and incomplete lineage sorting, acting alone or in concert, can generate discordance and are therefore the principal processes used to explain phylogenetic incongruence in Triticeae species (Mason-Gamer, 2013; Fan et al., 2013a). Because a history of incomplete lineage sorting is not unexpected for diploid lineages (Mason-Gamer and Kellogg, 1996; Escobar et al., 2011; Fan et al., 2013b) or for polyploid species (Mason-Gamer, 2004; Fan et al., 2013a) within the Triticeae, the hypothesis of incomplete lineage sorting is a likely explanation for the incongruence. The hypothesis of hybridization is also a likely candidate to explain the conflict because sympatric distributions among Leymus, Psathyrostachys, and Agropyron species placed these genera physically close to one another and provided opportunities for hybridization events.

If the Xm genome were an Ns genome, as indicated by the Pgk1 and GBSSI gene data, all the P-like sequences in the Acc1 and chloroplast gene data would be the outcome of incomplete lineage sorting of ancestral polymorphisms involving Psathyrostachys and Agropyron, recurrent hybridization between Leymus and Agropyron species, or inter-locus gene conversion following polyploidization. However, it is difficult to understand how the incomplete lineage sorting of ancestral polymorphisms involving Psathyrostachys and Agropyron could have been restricted to several Leymus species (i.e., the six species from America and L. komarovii and L. coreanus from East Asia), as shown in the chloroplast gene data. It is also unlikely that the imprinting of the maternal Agropyron lineage in the putative Leymus entity resulted from recurrent hybridization between high-ploidy Leymus species with “NsNs” genomes (serving as the paternal donor) and diploid P genome Agropyron species. Additionally, it is difficult to understand how inter-locus gene conversion could have occurred simultaneously in all the sampled Leymus species.

By contrast, if the genomic constitution of Leymus was NsXm (P), as indicated by the Acc1 and chloroplast gene data, the Ns2-type sequences in the Pgk1 and GBSSI gene data would all be the result of incomplete lineage sorting during the radiation of Psathyrostachys or of recurrent hybridization between Leymus species and distinct Psathyrostachys species. The following evidence provides support for the hypothesis of the NsXm (P) genome. First, extensive cytogenetic studies conclude that all Leymus species have an allopolyploid origin (Löve, 1984; Dewey, 1984; Wang et al., 1994; Wang and Jensen, 1994; Sun et al., 1995; Zhang et al., 2006; Yen et al., 2009). Second, although Psathyrostachys was polyphyletic in origin, additional Ns-containing alleles from distinct Psathyrostachys species could be introgressed into Leymus polyploids by homologous pairing following recurrent hybridization. Finally, a recent phylogenomic analysis of chloroplast genomes in Leymus species concludes that the Xm-genome donor served as the maternal donor during the speciation of six Leymus species from America and two Leymus species from East Asia (Fan et al., unpublished data), confirming that the Xm genome has a different origin from that of the Ns genome.

Because rapid change in genome structure due to nonreciprocal recombination following allopolyploidization can occur and has been documented in polyploid plants (Salmon et al., 2010; Lucas et al., 2014; Gill, 2015), it cannot be ruled out that the Xm-type (sister to the Ns-type) of both Pgk1 and GBSSI sequences might be an outcome of nonreciprocal recombination following allopolyploidization between Psathyrostachys and the ancestral Xm genome donor (Agropyron) or of recurrent hybridization between Psathyrostachys and ancestral Leymus species. It has been presumed that the Ns-specific repetitive DNA sequences would homogenize the Leymus genomes (Anamthawat-Jónsson, 2014), and in this case, GISH has failed to detect the “lost” parental genome. A recent GISH analysis of Elymus tangutorum also found H-genome translocations in the StY genomes (Yang et al., 2015). Thus, the Ns genome fragments likely translocated predominantly and rapidly across genomes and homogenized the nuclear genomes of Leymus, converting the Xm-type sequences donated by ancestral Agropyron into Ns-like sequences. Because numerous wild species of the wheat tribe have been widely collected and recorded, but no putative Xm-genome diploids have been identified (Wang et al., 1994), it also cannot be ruled out that the Xm genome donor lineage may be extinct. Taken together, it is possible that multiple factors (incomplete lineage sorting, recurrent hybridization, nonreciprocal recombination, and extinct donors) all act to complicate the origin of the Xm genome. Determining the origin of the Xm genome requires further studies using genomic-level approaches.

Phylogenetic relationships among Leymus, Pseudoroegneria, Lophopyrum, and Australopyrum

Attention has focused on the phylogenetic relationships among Leymus, Pseudoroegneria, and Lophopyrum since the St (Shiotani, 1968) and Ee (Sun et al., 1995) genomes were presumed to be the Xm genome donor based on morphological characteristics and chromosomal pairing behavior. However, according to molecular cytogenetic evidence, the St and Ee genomes are rejected for the Leymus species (Zhang et al., 2006). Data from nuclear ITS (Fan et al., 2014) and DMC1 (Sha et al., 2016) sequences indicate that the Pseudoroegneria genome may contribute to some Leymus species. The analysis of chloroplast and mitochondrial data by Sha et al. (2014) suggests that, apart from Pseudoroegneria, Lophopyrum may be a contributor to the Leymus polyploids.

In our Acc1 tree, the Acc1 sequence of Leymus mollis (Leymus mollis-1) was grouped with the sequences from all sampled Pseudoroegneria with high statistical support (100% PP and 100% BS), and this group, Australopyrum, and Lophopyrum formed a paraphyletic grade. In the Pgk1 tree, four Leymus species (L. mollis, L. salinus, L. paboanus, and L. racemosus), Australopyrum, and Lophopyrum were in one clade (Clade C) but with weak statistical support (93% PP and 54% BS). Similar to the Acc1 and Pgk1 topologies, the GBSSI data also showed that four Leymus species (L. paboanus, L. arenarius, L. alaicus, and L. karelinii) were clustered with Pseudoroegneria spicata and Australopyrum (96% PP) and two Leymus species (L. akmolinensis and L. karelinii) were grouped with Pse. strigosa (100% PP and 95% BS). These results indicated that the St, Ee, and W genomic lineages in Leymus represented the contributions of the Pseudoroegneria, Lophopyrum, and Australopyrum genomes, respectively.

Recurrent hybridization and incomplete lineage sorting may explain such genealogical patterns involving Leymus, Pseudoroegneria, Lophopyrum, and Australopyrum. Recurrent hybridization can explain the contribution of the Pseudoroegneria and Lophopyrum genomes to Leymus because the sympatric distributions among Pseudoroegneria, Lophopyrum, and Leymus species in some regions of Eurasia placed these genera physically close to one another and provided the opportunity for hybridization events. However, this hypothesis cannot explain the contribution of the Australopyrum genome to Leymus because no sympatric distribution is recorded between Australopyrum, native to Australia, and Leymus, native to Eurasia and America. We prefer the explanation that the contribution of Pseudoroegneria, Lophopyrum, and Australopyrum to the nuclear lineage of Leymus was the result of incomplete lineage sorting of ancestral polymorphisms. Several studies demonstrate that a history of incomplete lineage sorting is the rule rather than the exception in Triticeae plants (Kellogg et al., 1996; Mason-Gamer, 2004; Fan et al., 2013a, 2013b; Sha et al., 2014, 2016).
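The rate-based argument made in the discussion of the Xm genome (the two slower data sets place the Xm lineage near the P/F genomes, whereas the two faster loci place it as sister to the Ns lineage) can be laid out as a small Python sketch. The rates and placements are those reported in the text; the data structure, the function names, and the choice of Acc1 as the reference locus are assumptions made purely for illustration, not part of the authors' analysis.

```python
# Illustrative sketch only; rates are the mean estimates quoted in the text
# (substitutions per site per million years), and the labels are shorthand.

LOCI = [
    # (data set, mean rate, reported placement of the Xm-genome lineage)
    ("Acc1", 0.00189, "close to P/F"),
    ("Pgk1", 0.0021, "sister to Ns"),
    ("GBSSI", 0.0050, "sister to Ns"),
    ("chloroplast", 0.00099, "close to P/F"),
]


def order_by_rate(loci):
    """Return the data sets sorted from slowest to fastest mean rate."""
    return sorted(loci, key=lambda entry: entry[1])


def relative_rate(loci, name, reference="Acc1"):
    """Ratio of one data set's mean rate to that of a reference data set."""
    rates = {entry[0]: entry[1] for entry in loci}
    return rates[name] / rates[reference]


if __name__ == "__main__":
    for name, rate, placement in order_by_rate(LOCI):
        print(f"{name:12s} {rate:.5f}  Xm lineage: {placement}")
    print(f"GBSSI/Acc1 rate ratio: {relative_rate(LOCI, 'GBSSI'):.2f}")
```

Sorting by rate groups the chloroplast and Acc1 data sets together at the slow end and Pgk1 and GBSSI at the fast end, which is the pattern the discussion relies on. The comparison uses point estimates only and ignores the overlapping credibility intervals, notably between Acc1 and Pgk1.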

Acknowledgements

This work was funded by the National Natural Science Foundation of China (Nos. 30900087, 31200252, 31270243, 31470305), the Special Fund for Protection and Utilization of Crop Germplasm Resources in China (No. 2016NWB030-02), the Science and Technology Bureau (No. 2060503), and the Education Bureau of Sichuan Province (No. 15ZA0006). We are very grateful to the American National Plant Germplasm System (Pullman, Washington, USA) for providing part of the seed materials for this study.

References
Anamthawat-Jónsson, K., 2014. Molecular cytogenetics of Leymus: mapping the Ns genome-specific repetitive sequences. J. Syst. Evol. 52, 716-721.
Barkworth, M.E., Atkins, R.J., 1984. Leymus Hochst. (Gramineae: Triticeae) in North America: taxonomy and distribution. Am. J. Bot. 71, 609-625.
Betancur-R, R., Naylor, G.J.P., Orti, G., 2014. Conserved genes, sampling error, and phylogenomic inference. Syst. Biol. 63, 257-262.
Blair, C., Murphy, R.W., 2011. Recent Trends in Molecular Phylogenetic Analysis: Where to Next? J. Hered. 102, 130-138.
Brassac, J., Blattner, F.R., 2015. Species-level phylogeny and polyploid relationships in Hordeum (Poaceae) inferred by next-generation sequencing and in silico cloning of multiple nuclear loci. Syst. Biol. 64, 792-808.
Chen, M.Y., Liang, D., Zhang, P., 2015. Selecting question-specific genes to reduce incongruence in phylogenomics: A case study of jawed vertebrate backbone phylogeny. Syst. Biol. 64, 1104-1120.
Cranston, K.A., Hurwitz, B., Ware, D., Stein, L., Wing, R.A., 2009. Species trees from highly incongruent gene trees in rice. Syst. Biol. 58, 489-500.
Culumber, C.M., Larson, S.R., Jensen, K.B., Jones, T.A., 2011. Genetic structure of Eurasian and North American Leymus (Triticeae) wildryes assessed by chloroplast DNA sequences and AFLP profiles. Plant Syst. Evol. 294, 207-225.
Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772.
Dewey, D.R., 1984. The genomic system of classification as a guide to intergeneric hybridization with the perennial Triticeae. In: Gustafson, J.P. (Ed.), Gene manipulation in plant improvement. Columbia University Press, New York, pp. 209-279.
Díaz-Pérez, A.J., Sharifi-Tehrani, M., Inda, L.A., Catalán, P., 2014. Polyphyly, gene-duplication and extensive allopolyploidy framed the evolution of the ephemeral Vulpia grasses and other fine-leaved Loliinae (Poaceae). Mol. Phylogenet. Evol. 79, 92-105.
Dong, Z.Z., Fan, X., Sha, L.N., Zeng, J., Wang, Y., Chen, Q., Kang, H.Y., Zhang, H.Q., Zhou, Y.H., 2013. Phylogeny and molecular evolution of the rbcL gene of St genome in Elymus sensu lato (Poaceae: Triticeae). Biochem. Syst. Ecol. 50, 322-330.
Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11-15.
Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214.
Escobar, J.S., Scornavacca, C., Cenci, A., Guilhaumon, C., Santoni, S., Douzery, E.J.P., Ranwez, V., Glémin, S., David, J., 2011. Multigenic phylogeny and analysis of tree incongruences in Triticeae (Poaceae). BMC Evol. Biol. 11, 181.
Fan, X., Liu, J., Sha, L.N., Sun, G.L., Hu, Z.Q., Zeng, J., Kang, H.Y., Zhang, H.Q., Wang, X.L., Zhang, L., Ding, C.B., Yang, R.W., Zheng, Y.L., Zhou, Y.H., 2014. Evolutionary pattern of rDNA following polyploidy in Leymus (Triticeae: Poaceae). Mol. Phylogenet. Evol. 77, 296-306.
Fan, X., Sha, L.N., Dong, Z.Z., Zhang, H.Q., Kang, H.Y., Wang, Y., Wang, X.L., Zhang, L., Ding, C.B., Yang, R.W., Zheng, Y.L., Zhou, Y.H., 2013a. Phylogenetic relationships and Y genome origin in Elymus L. sensu lato (Triticeae; Poaceae) based on single-copy nuclear Acc1 and Pgk1 gene sequences. Mol. Phylogenet. Evol. 69, 919-928.
Fan, X., Sha, L.N., Yang, R.W., Zhang, H.Q., Kang, H.Y., Zhang, L., Ding, C.B., Zheng, Y.L., Zhou, Y.H., 2009. Phylogeny and evolutionary history of Leymus (Triticeae; Poaceae) based on a single-copy nuclear gene encoding plastid acetyl-CoA carboxylase. BMC Evol. Biol. 9, 247.
Fan, X., Sha, L.N., Yu, S.B., Wu, D.D., Chen, X.H., Zhuo, X.F., Zhang, H.Q., Kang, H.Y., Wang, Y., Zheng, Y.L., Zhou, Y.H., 2013b. Phylogenetic reconstruction and diversification of the Triticeae (Poaceae) based on single-copy nuclear Acc1 and Pgk1 gene data. Biochem. Syst. Ecol. 50, 346-360.
Fan, X., Sha, L.N., Zeng, J., Kang, H.Y., Zhang, H.Q., Wang, X.L., Zhang, L., Yang, R.W., Ding, C.B., Zheng, Y.L., Zhou, Y.H., 2012. Evolutionary dynamics of the Pgk1 gene in the polyploid genus Kengyilia (Triticeae: Poaceae) and its diploid relatives. PLoS ONE 7, e31122.
Gill, B.S., 2015. Wheat chromosome analysis. In: Ogihara, Y., Takumi, S., Handa, H. (Eds.), Advances in wheat genetics: from genome to field. Springer Japan KK Press, Tokyo, pp. 65-72.

Goloboff, P.A., Farris, J.S., Nixon, K.C., 2008. TNT, a free program for phylogenetic analysis. Cladistics 24, 774-786.
Hilu, K.W., Alice, L.A., Liang, H.P., 1999. Phylogeny of Poaceae inferred from matK sequences. Ann. Mo. Bot. Gard. 86, 835-851.
Huang, S.X., Sirikhachornkit, A., Faris, J.D., Su, X.J., Gill, B.S., Haselkorn, R., Gornicki, P., 2002. Phylogenetic analysis of the acetyl-CoA carboxylase and 3-phosphoglycerate kinase loci in wheat and other grasses. Plant Mol. Biol. 48, 805-820.
Huson, D.H., Bryant, D., 2006. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254-267.
Jakob, S.S., Blattner, F.R., 2006. A Chloroplast Genealogy of Hordeum (Poaceae): Long-Term Persisting Haplotypes, Incomplete Lineage Sorting, Regional Extinction, and the Consequences for Phylogenetic Inference. Mol. Biol. Evol. 23, 1602-1612.
Jian, S., Soltis, P.S., Gitzendanner, M.A., Moore, M.J., Li, R., Hendry, T.A., Qiu, Y.L., Dhingra, A., Bell, C.D., Soltis, D.E., 2008. Resolving an ancient, rapid radiation in Saxifragales. Syst. Biol. 57, 38-57.
Kellogg, E.A., Appels, R., Mason-Gamer, R.J., 1996. When genes tell different stories: the diploid genera of Triticeae. Syst. Bot. 21, 321-347.
Kim, S.T., Sultan, S.E., Donoghue, M.J., 2008. Allopolyploid speciation in Persicaria (Polygonaceae): Insights from a low-copy nuclear region. P. Natl. Acad. Sci. USA 105, 12370-12375.
Liu, Z.P., Chen, Z.Y., Pan, J., Li, X.F., Su, M., Wang, L.J., Li, H.J., Liu, G.S., 2008. Phylogenetic relationships in Leymus (Poaceae: Triticeae) revealed by the nuclear ribosomal internal transcribed spacer and chloroplast trnL-F sequences. Mol. Phylogenet. Evol. 46, 278-289.
Löve, Á., 1984. Conspectus of the Triticeae. Feddes Repertorium 95, 425-521.
Lucas, S.J., Akpınar, B.A., Šimková, H., Kubaláková, M., Doležel, J., Budak, H., 2014. Next-generation sequencing of flow-sorted wheat chromosome 5D reveals lineage-specific translocations and widespread gene duplications. BMC Genomics 15, 1080.
Marcussen, T., Sandve, S.R., Heier, L., Spannagl, M., Jakobsen, K.S., Pfeifer, M., The International Wheat Genome Sequencing Consortium, Jakobsen, K.S., Wulff, B.B.H., Steuernagel, B., Mayer, K.F.X., Olsen, O.-A., 2014. Ancient hybridizations among the ancestral genomes of bread wheat. Science 345, 1250092.
Martin, D.P., Lemey, P., Lott, M., Moulton, V., Posada, D., Lefeuvre, P., 2010. RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26, 2462-2463.
Mason-Gamer, R.J., 2004. Reticulate evolution, introgression, and intertribal gene capture in an allohexaploid grass. Syst. Biol. 53, 25-37.
Mason-Gamer, R.J., 2013. Phylogeny of a genomically diverse group of Elymus (Poaceae) allopolyploids reveals multiple levels of reticulation. PLoS ONE 8, e78449.
Mason-Gamer, R.J., Kellogg, E.A., 1996. Testing for phylogenetic conflict among molecular data sets in the tribe Triticeae (Gramineae). Syst. Biol. 45, 524-545.
Mason-Gamer, R.J., Orme, N.L., Anderson, C.M., 2002. Phylogenetic analysis of North American Elymus and monogenomic Triticeae (Poaceae) using three chloroplast DNA data sets. Genome 45, 991-1002.
McMillan, E., Sun, G., 2004. Genetic relationships of tetraploid Elymus species and their genomic donor species inferred from polymerase chain reaction-restriction length polymorphism analysis of chloroplast gene regions. Theor. Appl. Genet. 108, 535-542.
Minaya, M., Díaz-Pérez, A., Mason-Gamer, R., Pimentel, M., Catalán, P., 2015. Evolution of the beta-amylase gene in the temperate grasses: Non-purifying selection, recombination, semiparalogy, homeology and phylogenetic signal. Mol. Phylogenet. Evol. 91, 68-85.
Nozaki, H., Maruyama, S., Matsuzaki, M., Nakada, T., Kato, S., Misawa, K., 2009. Phylogenetic positions of Glaucophyta, green plants (Archaeplastida) and Haptophyta (Chromalveolata) as deduced from slowly evolving nuclear genes. Mol. Phylogenet. Evol. 53, 872-880.
Petersen, G., Seberg, O., 2004. On the origin of the tetraploid species Hordeum capense and H. secalinum (Poaceae). Syst. Bot. 29, 862-873.
Pond, S.L.K., Frost, S.D., 2005. Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21, 2531-2533.
Pond, S.L.K., Murrell, B., Fourment, M., Frost, S.D., Delport, W., Scheffler, K., 2011. A random effects branch-site model for detecting episodic diversifying selection. Mol. Biol. Evol. 28, 3033-3043.
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817-818.
Rambaut, A., Drummond, A.J., 2007. Tracer V1.4: MCMC trace analyses tool. URL http://beast.bio.ed.ac.uk/Tracer.
Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M.A., Huelsenbeck, J.P., 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539-542.
Salmon, A., Flagel, L., Ying, B., Udall, J.A., Wendel, J.F., 2010. Homoeologous nonreciprocal recombination in polyploid cotton. New Phytol. 186, 123-134.
Sha, L.N., Fan, X., Yang, R.W., Wang, X.L., Zhou, Y.H., 2009. Cladistic analysis of the genus Leymus (Triticeae: Poaceae) based on morphological data. J. Sichuan Agri. University 27, 6-13.
Sha, L.N., Fan, X., Yang, R.W., Kang, H.Y., Ding, C.B., Zhang, L., Zheng, Y.L., Zhou, Y.H., 2010. Phylogenetic relationships between Hystrix and its closely related genera (Triticeae; Poaceae) based on nuclear Acc1, DMC1 and chloroplast trnL-F sequences. Mol. Phylogenet. Evol. 54, 327-335.
Sha, L.N., Fan, X., Zhang, H.Q., Kang, H.Y., Wang, Y., Wang, X.L., Zhang, L., Ding, C.B., Yang, R.W., Zhou, Y.H., 2014. Phylogenetic relationships in Leymus (Triticeae; Poaceae): Evidence from chloroplast trnH-psbA and mitochondrial coxII intron sequences. J. Syst. Evol. 52, 722-734.
Sha, L.N., Fan, X., Zhang, H.Q., Kang, H.Y., Wang, Y., Wang, X.L., Yu, X.F., Zhou, Y.H., 2016. Phylogeny and molecular evolution of the DMC1 gene in the polyploid genus Leymus (Triticeae: Poaceae) and its diploid relatives. J. Syst. Evol. 54, 250-263.
Sha, L.N., Yang, R.W., Fan, X., Wang, X.L., Zhou, Y.H., 2008. Phylogenetic analysis of Leymus (Poaceae: Triticeae) inferred from nuclear rDNA ITS sequences. Biochem. Genet. 46, 605-619.
Shiotani, I., 1968. Species differentiation in Agropyron, Elymus, Hystrix, and Sitanion. In: Oshima, C. (Ed.), Proceedings of the 12th International Congress of Genetics. The Science Council of Japan, Tokyo, p. 184 (Abstract).
Sota, T., Vogler, A.P., 2001. Incongruence of mitochondrial and nuclear gene trees in the carabid beetles Ohomopterus. Syst. Biol. 50, 39-59.
Sun, G.L., Daley, T., Ni, Y., 2007. Molecular evolution and genome divergence at RPB2 gene of the St and H genome in Elymus species. Plant Mol. Biol. 64, 645-655.
Sun, G.L., Yen, C., Yang, J.L., 1995. Morphology and cytology of interspecific hybrids involving Leymus multicaulis (Poaceae). Plant Syst. Evol. 194, 83-91.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S., 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725-2729.
Thompson, J.D., Plewniak, F., Poch, O., 1999. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682-2690.
Triplett, J.K., Wang, Y.J., Zhong, J.S., Kellogg, E.A., 2012. Five nuclear loci resolved the polyploidy history of switchgrass (Panicum virgatum L.) and relatives. PLoS ONE 7, e38702.
Wang, R.R.-C., von Bothmer, R., Dvorak, J., Fedak, G., Linde-Laursen, I., Muramatsu, M., 1994. Genome symbols in the Triticeae. In: Wang, R.R.-C., Jensen, K.B., Jaussi, C. (Eds.), Proceedings of 2nd International Triticeae Symposium. USDA-Forage and Range Laboratory, Logan, pp. 29-34.
Wang, R.R.-C., Jensen, K.B., 1994. Absence of the J genome in Leymus species (Poaceae: Triticeae): evidence from DNA hybridization and meiotic pairing. Genome 37, 231-235.
Wang, R.R.-C., Lu, B.R., 2014. Biosystematics and evolutionary relationships of perennial Triticeae species revealed by genomic analyses. J. Syst. Evol. 52, 697-705.

Wang, R.R.-C., Zhang, J.Y., Lee, B.S., Jensen, K.B., Kishii, M., Tsujimoto, H., 2006. Variations in abundance of 2 repetitive sequences in Leymus and Psathyrostachys species. Genome 49, 511-519.
Yamamori, M., Nakamura, T., Endo, T.R., Nagamine, T., 1994. Waxy protein deficiency and chromosomal location of coding genes in common wheat. Theor. Appl. Genet. 89, 179-184.
Yang, C.R., Zhang, H.Q., Zhao, F.Q., Liu, X.Y., Fan, X., Sha, L.N., Kang, H.Y., Wang, Y., Zhou, Y.H., 2015. Genome constitution of Elymus tangutorum (Poaceae: Triticeae) inferred from meiotic pairing behavior and genomic in situ hybridization. J. Syst. Evol. 53, 529-534.
Yang, R.W., Zhou, Y.H., Ding, C.B., Zheng, Y.L., Zhang, L., 2008. Relationships among Leymus species assessed by RAPD markers. Biol. Plantarum 52, 237-241.
Yen, C., Yang, J.L., Baum, B.R., 2009. Synopsis of Leymus Hochst. (Triticeae: Poaceae). J. Syst. Evol. 47, 67-86.
Yen, C., Yang, J.L., 2011. Triticeae biosystematics, vol 4. Chinese Agricultural Press, Beijing, pp. 175-180.
Zhang, H.B., Dvorak, J., 1991. The genome origin of tetraploid species of Leymus (Poaceae: Triticeae) inferred from variation in repeated nucleotide sequences. Am. J. Bot. 78, 871-884.
Zhang, H.Q., Yang, R.W., Dou, Q.W., Tsujimoto, H., Zhou, Y.H., 2006. Genome constitutions of Hystrix patula, H. duthiei ssp. duthiei and H. duthiei ssp. longearistata (Poaceae: Triticeae) revealed by meiotic pairing behavior and genomic in situ hybridization. Chromosome Res. 14, 595-604.
Zhang, N., Zeng, L., Shan, H., Ma, H., 2012. Highly conserved low-copy nuclear genes as effective markers for phylogenetic analyses in angiosperms. New Phytol. 195, 923-937.
Zhou, X.C., Yang, X.M., Li, X.Q., Li, L.H., 2010. Genome origins in Leymus (Poaceae: Triticeae): Evidence of maternal and paternal progenitors and implications for reticulate evolution. Plant Syst. Evol. 289, 165-179.
Zimmer, E.A., Wen, J., 2013. Reprint of: Using nuclear gene data for plant phylogenetics: Progress and prospects. Mol. Phylogenet. Evol. 66, 539-550.

Figure legends

Figure 1. Phylogenetic topology of the Acc1 sequences from Leymus and its affinitive species. (A) Fifty-percent majority-rule Bayesian tree inferred from the Acc1 sequences of Leymus and its affinitive species. Recombinants detected by RDP and pseudogenes with PTCs were excluded, and sequences under neutral or nearly neutral selection, as detected by the BS-REL model, were included in the BI analyses. Numbers in bold above nodes are MP bootstrap values ≥ 50%, and numbers below nodes are Bayesian posterior probability values ≥ 90%. The numbers after species names refer to the distinct homeologs of the Acc1 gene. The capital letters in brackets indicate the genome type of the species. The clade including Leymus and its putative donor is highlighted by a collapsed clade. (B) Phylogenetic network inferred from the Acc1 sequences after removal of recombinants and potential pseudogenes detected by both the PTC and BS-REL methods. The sequences from diploid species are highlighted with colored boxes. Abbreviations of species names are listed in Table S1. The numbers after species names refer to the distinct homeologs of the Acc1 gene. Bootstrap values with moderate statistical support are indicated for the three main splits in different colors.

Figure 2. Phylogenetic topology of the Pgk1 sequences from Leymus and its affinitive species. (A) Fifty-percent majority-rule Bayesian tree inferred from the Pgk1 sequences of Leymus and its affinitive species. Recombinants detected by RDP and pseudogenes with PTCs were excluded, and sequences under neutral or nearly neutral selection, as detected by the BS-REL model, were included in the BI analyses. Numbers in bold above nodes are MP bootstrap values ≥ 50%, and numbers below nodes are Bayesian posterior probability values ≥ 90%. The numbers after species names refer to the distinct homeologs of the Pgk1 gene. The capital letters in brackets indicate the genome type of the species. The clade including Leymus and its putative donor is highlighted by a collapsed clade. (B) Phylogenetic network inferred from the Pgk1 sequences after removal of recombinants and potential pseudogenes detected by both the PTC and BS-REL methods. The sequences from diploid species are highlighted with colored boxes. Abbreviations of species names are listed in Table S1. The numbers after species names refer to the distinct homeologs of the Pgk1 gene. Bootstrap values with moderate statistical support are indicated for the three main splits in different colors.

Figure 3. Phylogenetic topology of the GBSSI sequences from Leymus and its affinitive species. (A) Fifty-percent majority-rule Bayesian tree inferred from the GBSSI sequences of Leymus and its affinitive species. Recombinants detected by RDP and pseudogenes with PTCs were excluded, and sequences under neutral or nearly neutral selection, as detected by the BS-REL model, were included in the BI analyses. Numbers in bold above nodes are MP bootstrap values ≥ 50%, and numbers below nodes are Bayesian posterior probability values ≥ 90%. The numbers after species names refer to the distinct homeologs of the GBSSI gene. The capital letters in brackets indicate the genome type of the species. The clade including Leymus and its putative donor is highlighted by a collapsed clade. (B) Phylogenetic network inferred from the GBSSI sequences after removal of recombinants and potential pseudogenes detected by both the PTC and BS-REL methods. The sequences from diploid species are highlighted with colored boxes. Abbreviations of species names are listed in Table S1. The numbers after species names refer to the distinct homeologs of the GBSSI gene. Bootstrap values with moderate statistical support are indicated for the three main splits in different colors.

Figure 4. Phylogenetic topology of the combined chloroplast haplotypes (matK + rbcL + trnL-F) from Leymus and its affinitive species. (A) Fifty-percent majority-rule Bayesian tree inferred from the combined chloroplast sequences of Leymus and its affinitive species. Numbers in bold above nodes are MP bootstrap values ≥ 50%, and numbers below nodes are Bayesian posterior probability values ≥ 90%. Haplotypes in the network are represented by circles. Numbers along network branches indicate the number of mutational changes between nodes. Abbreviations of species names are listed in Table S1.

Table 1. The estimated divergence dates for nodes labeled Ns and Xm and evolutionary rate for the sampled gene. Values are given as mean (95% C.I.).

Locus | Node Ns | Node Xm | Triticeae | Mean rate | Coefficient of variation
Acc1 | 8.61 MYA (7.49, 15.62) | 9.12 MYA (7.55, 14.63) | 16.04 MYA (14.04, 23.90) | 1.89E-9 (1.71E-9, 2.07E-9) | 0.90 (0.63-1.05)
Pgk1 | 6.21 MYA (6.19, 12.00) | 9.01 MYA (5.28, 11.49) | 16.00 MYA (13.20, 22.31) | 2.05E-9 (1.82E-9, 2.26E-9) | 0.92 (0.68-1.12)
GBSSI | 8.36 MYA (6.05, 10.49) | 7.10 MYA (6.03, 11.95) | 15.60 MYA (12.64, 20.07) | 4.99E-9 (4.56E-9, 5.44E-9) | 0.85 (0.66-0.98)
Combined cpDNA | 7.99 MYA (3.73, 10.87) | 7.70 MYA (2.73, 10.24) | 14.14 MYA (10.09, 19.46) | 9.94E-10 (8.62E-10, 1.11E-9) | 0.93 (0.65-1.16)
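As a rough illustration of how the estimates in Table 1 can be compared across loci, the following Python sketch transcribes the reported means and computes (i) the range of mean ages for nodes Ns and Xm and (ii) the mean rate of each locus relative to the combined chloroplast rate. The dictionary layout and the function names (age_span, rate_relative_to) are illustrative assumptions and are not part of the original BEAST analysis; the sketch also ignores the 95% intervals.

```python
# Mean estimates transcribed from Table 1 (node ages in MYA; rates as reported).
# The data layout and helper names below are hypothetical, for illustration only.
TABLE1 = {
    "Acc1": {"Ns": 8.61, "Xm": 9.12, "rate": 1.89e-9},
    "Pgk1": {"Ns": 6.21, "Xm": 9.01, "rate": 2.05e-9},
    "GBSSI": {"Ns": 8.36, "Xm": 7.10, "rate": 4.99e-9},
    "cpDNA": {"Ns": 7.99, "Xm": 7.70, "rate": 9.94e-10},
}


def age_span(node):
    """Return (youngest, oldest) mean age across loci for a node label."""
    ages = [row[node] for row in TABLE1.values()]
    return min(ages), max(ages)


def rate_relative_to(baseline="cpDNA"):
    """Return each locus's mean rate divided by the baseline locus's mean rate."""
    base = TABLE1[baseline]["rate"]
    return {locus: row["rate"] / base for locus, row in TABLE1.items()}


if __name__ == "__main__":
    for node in ("Ns", "Xm"):
        low, high = age_span(node)
        print(f"Node {node}: mean ages span {low:.2f} to {high:.2f} MYA")
    for locus, ratio in rate_relative_to().items():
        print(f"{locus}: {ratio:.2f} x combined cpDNA rate")
```

Taken at face value, the mean rates of the three nuclear loci are roughly 1.9 to 5.0 times the combined chloroplast rate, and the mean ages of nodes Ns and Xm differ by about 2 MYA across loci.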

Table 2. Summary of recombination events identified in the exon region of the Pgk1 and GBSSI loci by two or more methods implemented in the program RDP4. The last four columns give the probability of significant tests for the different detection methods.

Locus | Sequence under recombination | Parental sequence (major) | Parental sequence (minor) | MaxChi | Chimaera | SiScan | 3Seq
Pgk1 | LCHIN1 | LLEPT2 | LSHAN1 | -- | -- | 3.51E-5 | 5.32E-4
GBSSI | LAREN2B | LPABO2B | LALAI3A | -- | -- | 6.48E-6 | 9.90E-3
GBSSI | LCORE2A | LPSEU2A | Unknown | -- | -- | -- | 4.44E-2
GBSSI | LCORE2B | Unknown | LPSEU2A | -- | -- | -- | 3.80E-3
GBSSI | LLEPT1C | LSHAN1A | Unknown | -- | -- | -- | 4.66E-3

Table 3. The closest diploid relatives of Leymus based on the Acc1, Pgk1, GBSSI, and combined chloroplast data. Clades with moderate to strong support are shown. BS = bootstrap support; PP = posterior probabilities.

Leymus gene copy | Acc1 | BS/PP | Pgk1 | BS/PP | GBSSI | BS/PP | Combined chloroplast DNA | The least mutational steps
Leymus (Ns copy) | Psathyrostachys | 76/100 | Psathyrostachys | 95/100 | Psathyrostachys | 73/99 | Psathyrostachys | 7
Leymus (Xm copy) | Agropyron, Eremopyrum | 92/100 | -- | | -- | | Agropyron | 28
Leymus (St/Ee/W-like copy) | Pseudoroegneria | 100/100 | Pseudoroegneria, Lophopyrum, Australopyrum | 52/100 | Pseudoroegneria, Australopyrum | /90 | |

Supplementary data

Table S1: Leymus taxa and other related genera in Triticeae used in this study.

Table S2: Information on the Leymus accessions used in this study, including accession, voucher, ploidy, origin, and genome type of the genes.

Table S3: Names, sequences, and references of primers used in this study.

Table S4: The optimal models and estimated substitution parameters for each dataset in phylogenetic analysis.

Figure S1. Full 50% majority-rule Bayesian tree inferred from the Acc1 sequences of Leymus and its affinitive species. Recombinants detected by RDP and pseudogenes with PTCs were excluded, and sequences under neutral or nearly neutral selection, as detected by the BS-REL model, were included in the BI analyses. Numbers in bold above nodes are MP bootstrap values ≥ 50%, and numbers below nodes are Bayesian posterior probability values ≥ 90%. The numbers after species names refer to the distinct homeologs of the Acc1 gene. The capital letters in brackets indicate the genome type of the species. Different colors indicate the geographic origin of Leymus species. Colors on the horizontal bars indicate negative selection (dark blue), neutral selection (gray), and strong positive selection (red).

Figure S2. A time-calibrated tree inferred from the Acc1 sequences of Leymus and its affinitive species using a Bayesian relaxed clock method in BEAST. Numbers in blue above nodes are Bayesian posterior probability values ≥ 90%. Different colors indicate the geographic origin of Leymus species.

Figure S3. Full 50% majority-rule Bayesian tree inferred from the Pgk1 sequences of Leymus and its affinitive species. Recombinants detected by RDP and pseudogenes with PTCs were excluded, and sequences under neutral or nearly neutral selection, as detected by the BS-REL model, were included in the BI analyses. Numbers in bold above nodes are MP bootstrap values ≥ 50%, and numbers below nodes are Bayesian posterior probability values ≥ 90%. The numbers after species names refer to the distinct homeologs of the Pgk1 gene. The capital letters in brackets indicate the genome type of the species. Different colors indicate the geographic origin of Leymus species. Colors on the horizontal bars indicate negative selection (dark blue), neutral selection (gray), and strong positive selection (red).

Figure S4. A time-calibrated tree inferred from the Pgk1 sequences of Leymus and its affinitive species using a Bayesian relaxed clock method in BEAST. Numbers in blue above nodes are Bayesian posterior probability values ≥ 90%. Different colors indicate the geographic origin of Leymus species.

Figure S5. Full 50% majority-rule Bayesian tree inferred from the GBSSI sequences of Leymus and its affinitive species. Recombinants detected by RDP and pseudogenes with PTCs were excluded, and sequences under neutral or nearly neutral selection, as detected by the BS-REL model, were included in the BI analyses. Numbers in bold above nodes are MP bootstrap values ≥ 50%, and numbers below nodes are Bayesian posterior probability values ≥ 90%. The numbers after species names refer to the distinct homeologs of the GBSSI gene. The capital letters in brackets indicate the genome type of the species. Different colors indicate the geographic origin of Leymus species. Colors on the horizontal bars indicate negative selection (dark blue), neutral selection (gray), and strong positive selection (red).

Figure S6. A time-calibrated tree inferred from the GBSSI sequences of Leymus and its affinitive species using a Bayesian relaxed clock method in BEAST. Numbers in blue above nodes are Bayesian posterior probability values ≥ 90%. Different colors indicate the geographic origin of Leymus species.

Figure S7. A time-calibrated tree inferred from the combined chloroplast sequences of Leymus and its affinitive species using a Bayesian relaxed clock method in BEAST. Numbers in blue above nodes are Bayesian posterior probability values ≥ 90%. Different colors indicate the geographic origin of Leymus species.

Highlights
(1) Additional Ns-containing alleles may have been introgressed into some Leymus polyploids through recurrent hybridization;
(2) The phylogenetic incongruence suggests that the Xm genome of Leymus is closely related to the P genome of Agropyron;
(3) Both Ns- and Xm-genome lineages served as the maternal donor during the speciation of Leymus species;
(4) The Pseudoroegneria, Lophopyrum and Australopyrum genomes may have contributed to some Leymus species.
