Association of candidate genes with heading date in a diverse Dactylis glomerata population

Association of candidate genes with heading date in a diverse Dactylis glomerata population

Accepted Manuscript Title: Association of candidate genes with heading date in a diverse Dactylis glomerata population Authors: Xinxin Zhao, B. Shaun ...

1MB Sizes 2 Downloads 94 Views

Accepted Manuscript Title: Association of candidate genes with heading date in a diverse Dactylis glomerata population Authors: Xinxin Zhao, B. Shaun Bushman, Xinquan Zhang, Matthew D. Robbins, Steven R. Larson, Joseph G. Robins, Aaron Thomas PII: DOI: Reference:

S0168-9452(17)30697-0 https://doi.org/10.1016/j.plantsci.2017.10.002 PSL 9680

To appear in:

Plant Science

Received date: Revised date: Accepted date:

28-7-2017 29-9-2017 3-10-2017

Please cite this article as: Xinxin Zhao, B.Shaun Bushman, Xinquan Zhang, Matthew D.Robbins, Steven R.Larson, Joseph G.Robins, Aaron Thomas, Association of candidate genes with heading date in a diverse Dactylis glomerata population, Plant Science https://doi.org/10.1016/j.plantsci.2017.10.002 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1 Association of candidate genes with heading date in a diverse Dactylis glomerata population

Xinxin Zhaoab, B. Shaun Bushmana*, Xinquan Zhangb, Matthew D. Robbinsa, Steven R. Larsona, Joseph G. Robinsa, Aaron Thomasc

a

USDA-ARS Forage and Range Research Laboratory, 695 North 1100 East, Logan, UT 84322-

6300, USA b

Department of Grassland Science, Sichuan Agricultural University, Chengdu, China

c Utah

State University, Center for Integrated Biosystems, Logan, UT, USA

*Corresponding

author email: [email protected], tel. 1-435-797-2901, fax 1-435-797-

3075

Highlights    

The CO1, FT1, VRN-like MADS, and PPD1-like genes were amplified and sequenced in a diverse orchardgrass population that included cultivars and collections from around the world. The entries exhibited high allelic variation and little LD among loci indicative of perennial selfincompatible grasses. Equal-dose association modelling identified several significant polymorhpisms within the DgCO1 and DgFT1 genes. By modelling allele dosage for a tetraploid, additional loci were detected in the DgCO1, DgMADS, and DgPPD1-like genes.

Abstract Flowering occurs in response to cues from both temperature and photoperiod elicitors in coolseason, long-day forage grasses, and genes involved in sensing the elicitors and inducing downstream flowering responses have been associated with heading date and flowering time in

2 perennial forage grasses as well as cereal grasses. In this study we test for association between orchardgrass (Dactylis glomerata L.) heading date and polymorphisms in the CONSTANS (DgCO1), FLOWERING TIME (DgFT1), a VRN1 like MADS-box (DgMADS), and PHOTOPERIOD (DgPPD1-like) containing genes. A diverse population of 150 genotypes was measured for heading date across three years, genotyped, and candidate genes sequenced. Although pairwise population kinship values were generally low, the genotypes fit into a twogroup structure model. Linkage disequilibrium decayed rapidly, reaching r2 levels below 0.2 within the 500bp of each gene. SNPs significantly associated with heading date were detected in equal-dose and tetraploid dosage models.

The DgCO1 gene had the most significant

polymorphisms and those with the largest effects, while DgMADS had several significant polymorphisms in its first intron with smaller effects. These polymorphisms can be used for further validation, selection, and development of breeding lines of orchardgrass.

Keywords: Dactylis glomerata, CONSTANS, heading date, flowering time, vernalization, autotetraploid

3

1. Introduction Orchardgrass, or cocksfoot (Dactylis glomerata L.), is a major cool-season forage grass that supplies grazing, hay, and silage across the temperate regions of the world [1, 2]. Orchardgrass is highly palatable to livestock, exhibits early season growth, and is one of the most compatible perennial forage grasses when sown with perennial legumes, e.g. alfalfa (Medicago sativa L.) [24].

As an early flowering grass, most orchardgrass cultivars and germplasm switch from

vegetative to reproductive tillers before alfalfa and other legumes. Orchardgrass users therefore prefer varieties with later heading dates to better match legume maturity and maximize forage quality when cut for hay in mixed pastures. Additionally, the prolonged vegetative growth from later heading varieties can lead to increased forage quality [5], and adaptability to diverse climates [6]. Conversely, North American orchardgrass seed growers prefer and inadvertently select for early heading to maximize seed production and avoid additional irrigation or other inputs in the major seed growing areas [7]. With such major roles in plant growth, development, seed production, and forage quality, optimizing heading date is a continual need in orchardgrass selection and breeding. Flowering occurs in response to cues from both temperature and photoperiod elicitors in long-day plants such as forage grasses [6]. For temperature response, vernalization plays a key role to ensure that perennial forages do not flower prior to a prolonged cold period [8]. A key gene in this process, VERNALIZATION 1 (VRN1), a MADS box transcription factor similar to AP1 of Arabidopsis thaliana [9], expresses at a stable low level until cold temperatures begin in fall. Thereupon the expression increases in proportion to the duration of cold temperatures [10]. During springtime, as temperatures increase, VRN1 enters a positive feedback loop with the FLOWERING TIME (FT) gene to induce downstream flowering processes [8]. VRN1 is part of a MADS-box

4 gene family, of which at least three members show homology and expression profiles consistent with VRN1 activity in the diploid forage grass, perennial ryegrass (Lolium perenne L.)[10]. The flowering response is further modulated by photoperiod cues, ensuring that plants do not flower too early in spring in cold temperate climates. Central to photoperiod sensing is the CONSTANS gene (CO1), which is involved in plant circadian oscillations within Arabidopsis as well as grass species [11]. CO genes are expressed during the late afternoon, and the CO protein is stable under light. Under long days, when CO protein function coincides with light, expression of the downstream FT1 gene is induced. The FT gene product then travels through the phloem to the shoot apical meristem to induce expression of other floral transition genes [6]. Another gene involved in circadian diurnal expression patterns in Triticeae grasses is PHOTOPERIOD-1 (PPD1), a pseudo-response regulator that acts as a repressor of early morning circadian gene expression [12]. PPD1 appears to be necessary for long day induction of FT1 [12-14], and is a major gene affecting flowering time in cereals [13]. The majority of flowering time research for temperate grasses has occurred in annual or biennial cereals (e.g., rice, wheat, barley). Nevertheless, temperate perennial forage grasses such as orchardgrass show similar responses to environmental cues. The perennial forage grasses also require the inductive vernalization phase of short days and cold temperatures, followed by a transition where increasing temperature and longer day length initiate the inflorescences[6, 15]. Additionally, the results from cereals grasses have identified functional gene orthologs that affect flowering time, thus providing a bridge of candidate genes for perennial forage grasses. QTL analyses for heading date have occurred in the diploid perennial ryegrass, and detected significant loci on linkage groups 2, 4, 5, 6, and 7[16, 17]. The major QTL on LpLG4 was identified as VRN1 [16, 17] while QTLs with large effects on LpLG7 were identified as a CO1 orthologue [18-20] or

5 a FT1 orthologue [19, 21]. Later, the VRN1 and FT1 genes were examined in perennial ryegrass through association analyses, and further found to associate with heading date or flowering time variation [17, 21-23] Recently, genetic maps were also constructed in auto-tetraploid orchardgrass populations [7, 24]. Although unable to confidently infer many linkage groups from synteny to perennial ryegrass or cereal grass genetic maps, the orchardgrass maps allowed for identification of QTLs affecting heading date that were located in regions proposed to be near CO1, FT1, VRN1, and PPD1. Based on those findings and the previous temperate forage grass QTL studies, and using heading date as a measure of flowering time, we conduct an association analysis in a diverse orchardgrass population for the four candidate genes: CO1, FT1, a VRN1-like MADS-box, and PPD1-like. We use independent SSR markers to estimate population structure and kinship, and amplicon re-sequencing and variant detection for candidate gene associations. Additive and dominant models that include auto-tetraploid dosage differences among genotypes are considered and tested. With a goal to hone in on mutations selectable for marker assisted selection, we identify polymorphic markers within these four genes that are significantly associated with heading date.

6

2. Materials and Methods 2.1 Plant materials A diverse panel of orchardgrass germplasm and cultivars were sampled from North American, European, and Japanese breeding programs; and continental tetraploid accessions from central Asia and Eastern Europe (Supplemental Table 1). Eighty accessions or cultivars were selected that spanned the gamut of heading dates, with one to two plants sampled from each source, for a total of 150 genotypes. The population was transplanted at one meter spacing in a completely random design at Evan’s Research Farm near Logan, UT, USA (41°41´52² N 111°49´53² W; 1378 m above sea level; 432 mm annual precipitation; Nibley silty clay loam soil [fine, mixed, active, mesic Aquic Argixerolls]) and measured for three years after the establishment year. Border plants of the cultivar ‘Potomac’ were planted to remove border effects. Heading date phenotypes were recorded as the day, from January 1, when three inflorescences emerged from leaf sheaths in a plant. Spearman rank correlations were generated across years using R [25].

2.2 Population structure and kinship Tissue was harvested from each genotype and DNA extracted using the QIAGEN DNeasy (Germantown, MD, USA) kit following the manufacturer’s protocol. For population structure and kinship estimations, plant DNA samples were genotyped with 26 previously described orchardgrass EST-derived SSR markers [26] that mapped to all seven orchardgrass tentative linkage groups [7] (Supplemental Table 2), and resolved on an ABI3730 instrument (Life Technologies, Foster City, CA). Population structure was determined by Bayesian clustering implemented in STRUCTURE v2.3.4 [27], using the admixture model without population flags. The most appropriate structure was determined using the second order rate of change method [28]

7 implemented in Structure Harvester [29]. Kinship estimates were computed in the program SPAGeDi, using Ritland’s kinship matrix and an auto-tetraploid model [30]. For tetraploid general association modeling, the kinship matrix was converted into a positive semi-definite matrix by adding 10E-6 to the kinship values and the diagonal in a custom R script. Two-sample T-tests of heading date means between Structure groups were also conducted in R.

2.3 Amplicon sequencing For candidate gene primer development, orthologous gene sequences were obtained from perennial ryegrass (Lolium perenne L.) and barley (Hordeum vulgare L.). The perennial ryegrass gene sequences for LpVrn1 [31], LpFT3 [32], and LpCO1 [33] corresponded to the barley HvVrn1 [34], HvFT1 (KT199239.1), and HvCO1 [11], respectively (Table 1). The fourth candidate gene, PPD1 [12], had no sequence available in the Poeae tribe such that a barley sequence was used as a reference for that gene. Orchardgrass cDNA and genomic sequence libraries [26] were searched against these references using the BLASTn algorithm, to detect homologous orchardgrass sequences with preference for conserved transcription initiation and untranslated regions. For VRN1 and PPD1 candidate genes, for which no orchardgrass sequences hit, heterologous primers were designed based on conserved regions between both perennial ryegrass and/or barley reference sequences (Table 1). LongAmp Taq Polymerase (NEB, Ipswich, MA) was used to amplify the full-length or nearly full-length genes, following the manufacturer’s recommendations. The FX DNA library preparation kit (QIAGEN, Valencia, CA) was used to fragment, ligate barcodes, preamplify, and pool amplicons for sequencing. Pooled, fragmented amplicons were size selected for 400-600bp using a Blue Pippin (Sage Science, Beverly, MA), and sequenced on an Illumina MiSeq

8 (Illumina, San Diego, CA) with a v2 250.250 paired-end sequencing kit, at the Utah State University Center for Integrated Biosystems.

2.4 Sequence analysis and SNP calling Because of poor intron conservation between the candidate genes in orchardgrass and the reference sequences from perennial ryegrass or barley, especially in DgMADS, new composite orchardgrass reference sequences were generated for the association mapping. Sequence reads from 10 of the 150 orchardgrass samples were assembled individually for each candidate gene, using the CLC Workbench de novo assembly tool (QIAGEN, Aarhus, Denmark), with a word size of 26, a bubble size of 400, a length fraction of 0.5, and a similarity fraction of 80%. Consensus sequences from each of the 10 assemblies were aligned using Geneious version 10.2.1 (Biomatters Ltd, Auckland, New Zealand), to form a new orchardgrass reference sequence for each of the four genes.

These new orchardgrass reference sequences were then compared against the non-

redundant database of NCBI using the BLASTn algorithm to verify that their highest homology was to the intended reference gene, and were also compared to the original perennial ryegrass and barley reference sequences for exon location and homology. The four reference sequences are shown in Supplemental Table 3. Raw sequence reads of each genotype were then end-trimmed, quality-trimmed at a Phred error probably rate (Quality limit) of 0.01, and filtered for length greater than 100bp using the CLC Workbench. Sequence reads were then de novo assembled by sample using the same parameters as listed above, mapped to the new respective orchardgrass reference sequences, and locally realigned to optimize Indel positioning using the CLC Workbench Resequencing Tool. Consensus sequences of each gene for each sample, using degenerate nucleotide codes at heterozygous sites,

9 were deposited in the National Center for Bioinformatics Institute (NCBI) under accession numbers KY881048- KY881656.

2.5 Equal-dose association analysis In auto-tetraploid plants, there are three possible classes of heterozygotes for bi-allelic polymorphisms, varying in dosage of alleles (e.g., 1000, 1100, or 1110). For a gene association model that assumes an equal-dose hybrid for heterozygotes (1100), association mapping methods follow a diploid organism model and can take advantage of common association software programs. For this equal-dose model, the mapped and locally re-aligned sequence reads were first aligned into consensus sequences for each genotype; requiring a coverage of 10 reads and an assignment of heterozygous loci with degenerate nucleotide codes. These consensus sequences were aligned and exported from CLC Workbench for linkage disequilibrium (LD) assessments and association mapping using Tassel v5.0 [35]. The Tassel parameters required sequence information for at least 95% of genotypes, and a minor allele frequency of at least 0.03 to consider SNPs.

Linkage disequilibrium was estimated on polymorphisms within each gene by the

standardized disequilibrium coefficient. The r2 between SNPs was plotted against the pairwise physical distance between polymorphic sites to estimate the LD decay for sites, with heterozygous nucleotide positions changed to missing data. Association analysis between SNPs and heading date was carried out using a mixed linear model (MLM) implemented in TASSEL, and included the population structure(Q) and relative kinship matrix (K) on marker-trait associations for the heading dates across three years. Correction for multiple testing used a False Discovery Rate at P < 0.05, and the model (additive or dominant) with the lowest adjusted P-value is reported.

2.6 Tetraploid association analysis

10 For the tetraploid model association mapping, variants were detected within each genotype using a fixed tetraploid error model implemented in CLC Workbench. Briefly, nucleotide sites within each gene were first tested for the likelihood that a site differed from the reference nucleotide. For significant sites, the genotype (of five possible bi-allelic genotype doses in an auto-tetraploid) with the highest probability was selected. The 150 individual sample variant tables were formatted into a combined variant table using a custom R script, and combined with phenotype and kinship files into the GWASpoly R-package [36]. Alternations for GWASpoly input were required: (1) chromosome numbers were replaced with candidate gene names and variant positions within genes, (2) Indels were re-coded as ‘A’ for absence and ‘T’ for presence for association analyses, (3) multi-allelic SNPs were removed or re-coded as missing data if the additional alleles were sparse, and (4) the kinship matrix was converted to a positive semi-definite matrix as mentioned above. The variant table is shown as Supplemental Table 4. Both Q and K matrices were included in the association model, and additive and dominance models were tested. Correction for multiple testing used a False Discovery Rate at P < 0.05, and the model with the highest –log10(p) score is reported. Two-way epistasis was tested among significant loci from the equal-dose and tetraploid models. Detection was implemented in the web-based SHEsisPlus program [37] using a tetraploid model, and marker-interactions were deemed significant with an adjusted False Discovery Rate at P < 0.05.

3. Results 3.1 Population heading date and structure

11 To test four candidate genes for impact on heading date in orchardgrass, heading dates were measured in a diverse association mapping population representing breeding lines and cultivars from North America, Japan, and Europe, as well as public accessions from across the world (Supplemental Table 1). All genotypes were confirmed as auto-tetraploids with flow cytometry (data not shown). The heading dates ranged from 134 to 168 days from January 1 in 2013, from 132 to 174 days from January 1 in 2014, and 124 to 167 days from January 1 in 2015 (Fig. 1). The rank correlation of heading dates among the years ranged from 0.65-0.80 (P < 0.001). There was an abundance of early and mid-heading genotypes, with a trailing tail of later heading genotypes. Using SSR markers from Bushman et al. [26], the number of subpopulation groups (K) tested were from one to ten. The second order rate of change supported a two-group structure (Supplemental Fig. 1), which split the population into one group of 39 genotypes and another group of 111 genotypes (Supplemental Table 1). As a self-incompatible and highly heterozygous species [38], different plants from the same germplasm source often did not reside in the same Structure group (Supplemental Table 1). The average heading date across years of the smaller group was 144.5 days from January 1 while the larger group of genotypes encompassed circumboreal origins and a wider range of heading dates, with an average heading date across years of 139.7 days from January 1 (Fig. 1). Because of this significant structural difference in heading date (P < 0.01), the two-group structure was included in the association models. For kinship analysis, pairwise kinship values ranged from 0 – 0.43, with 75% of the pairwise kinship estimates were less than 0.05 and an average of 0.026 ± 0.005.

3.2 Candidate gene structure and identity

12 Amplicons of the four candidate genes encompassed nearly 14 kb of gene space (Table 1), and were obtained for all samples and for each gene. The 2,461 bp DgCO1 amplicon had the highest homology to the barley HvCO1 gene [11] and perennial ryegrass LpCO1 gene [33]; and included both exons, the intron, 155bp of 5’UTR and promoter regions, and 76bp of 3’UTR or downstream regions (Fig. 2). The 1,890bp DgFT1 amplicon showed the highest barley homology to HvFT1 [39] and perennial ryegrass homology to LpFT3 [21, 40]; and included the three exons, both intron sequences, 264bp of 5’UTR and promoter sequence, and 730bp of 3’ UTR and downstream regions. The 6,577bp MADS-like amplicon showed high homology to six exons and part of the first intron of the VRN1 MADS-box gene from perennial ryegrass and barley (Fig. 2); however, when compared against the perennial ryegrass MADS-like cDNAs reported in [10], the top BLASTn hit was to MADS2 rather than MADS1 (VRN1). The last two of the eight MADS-like exons were not amplified. The 3,055bp DgPRR1-like amplicon showed the highest homology to portions of a barley HvPPD1 gene [41] and other Triticeae PPD1 proteins. The orchardgrass DgPRR1-like amplicon used a forward primer homologous to sequence near the end of the second predicted exon of HvPPD1, and a reverse primer near the 3’ end of the eighth predicted exon. Thus it included portions of exons two and eight, and complete exons 3-7.

3.3 Linkage disequilibrium Linkage disequilibrium exhibited rapid decay in the four genes in this population. Using a logarithmic decay line, the LD decayed to an r2 < 0.2 within 100bp for all four genes (Supplemental Fig. 2). Both DgCO1 and DgFT1 exhibited small localized LD blocks within the gene (Fig. 3), resulting from 8bp and 36bp Indels, respectively. DgMADS and DgPPD1-like had scattered patterns of significant LD across the gene, and included distant sites within each gene (Fig. 3).

13

3.4 Equal-dose association analysis Using a diploid model, four polymorphisms were significantly associated with heading date at an adjusted (FDR P < 0.05) false discovery rate; from two of the four candidate genes (Table 2). Effects of these polymorphisms ranged from 1.2 to 6.3 days differences in heading date. Three of the significant SNPs were within DgCO1 (Fig. 2), two supported by an additive gene action model and the Indel at position 62 associated with dominance action. The SNP at position 189 caused a Glu/Lys substitution in exon 1 and was association with a 4.3 day difference in heading date (Table 2). The fourth significant polymorphism was a two base-pair Indel within the DgFT1 gene intron 1 at position 490-491 (Table 2, Fig. 2). This polymorphism was associated with a 1.2 day difference in heading date, with the presence of TT at the site dominant over the deletion. At a false discovery rate of P < 0.05, no significant SNPs or Indels were detected in the DgMADS or DgPPD1-like genes using the diploid models.

3.5 Tetraploid association analysis As sufficient sequence coverage was obtained for allele dosage modeling, association analysis was conducted using the GWASpoly R package and tested tetraploid additive and dominant models. Five polymorphisms within the DgCO1 gene were significantly associated (false discovery rate P < 0.05) with heading date (Table 2, Fig. 2). Three resided in the first exon while two were in the sole intron. The synonymous SNP at position 401 was associated with 7.5 to 10.9 day heading date differences, was detected over two years, and exhibited dominant effects of the T over the C nucleotide. The SNP at position 717 caused a Ser/Cys amino acid change, with the Cys exhibiting dominance, and was associated with a 3.3 day difference in heading date. A 33 bp Indel at position 879 also showed dominance effects, a 2.6 day difference in heading date, and

14 corresponded to the insertion of 11 amino acids near the end of exon 1. Both polymorphisms in the intron, at positions 1826 and 1855 exhibited additive action, where increasing doses of alleles corresponded to significant changes in heading date (Table 2). The DgMADS gene had six polymorphisms significantly associated with heading date using tetraploid models (Table 2; Fig. 2). Five of those polymorphisms resided in the first intron, and were associated with changes in heading date between 1.5 and 3.3 days. The sixth DgMADS polymorphism was in the second intron, a dominant acting Indel of three nucleotides. For the DgPPD1-like gene, a single Indel was detected as significant with effects of 2.76 days in heading date (Table 2, Fig. 2). This Indel was located in the fourth exon and caused an insertion of an Arg amino acid with the additive tetraploid model showing the strongest support. With five possible genotypes for each plant at each locus in an autotetraploid organism, epistatic interactions can have a low power of detection. Regardless, two-way interactions were tested using their tetraploid genotypes, among loci that showed significance. The SNP at DgCO1 position 189 showed a significant interaction with the DgCO1 Indel at position 62, in that heterozygous genotypes at both positions only occurred when the other position was homozygous (Fig. 4). Additionally, homozygous genotypes at both loci corresponded to the earliest heading dates.

4. Discussion Orchardgrass is a highly adaptable species, and has become circumboreal in its temperate climate distribution [42]. Although it is a relatively early flowering forage grass, in our study there was an approximately 40 day range of heading dates as well as differences in the distribution of heading dates among the three years (Fig. 1). However, the differences across years were primarily caused

15 by magnitude changes rather than rank changes due to the high and significant rank correlations. Those magnitude differences are consistent with other forage grass flowering time research, which indicates that the rank of heading dates and flowering times are relatively consistent across years and that the trait is highly heritable [43]. Included in this study were genotypes with a heading at or after the predominant forage legume grown in the western USA, alfalfa (Medicago sativa L.) [38], with which orchardgrass is often sown. A previous report of relationships among orchardgrass germplasm used some of the same genotypes included in this study, and found that a two-group structure model was the most supported grouping of the included entries [38]. Similarly, our study found a two-group model to fit the data, despite the addition of many wildland collections of orchardgrass from across the world (Supplemental Table 1). The smaller group of 39 genotypes comprised entries mainly from central or eastern Asian origins and with later average heading dates [38], and this bias necessitated that structure results be included in the mixed linear model for association analysis. Although this smaller group of genotypes could be selected as breeding material per se for later heading orchardgrass cultivars, we sought polymorphisms that could be used to select for, or homogenize, the later heading phenotype in elite cultivars and genotypes regardless of origin. Within the four candidate genes, DgCO1 had the lowest variation, with one SNP per 42bp on average. DgPPD1-like had one SNP per 32bp, DgMADS had one SNP per 22bp, and DgFT1 had one SNP per 15bp. Even with heterozygous loci changed to missing data (because of unknown phasing), all four genes still showed a decay in LD below r2 = 0.2 within several hundred nucleotides in each respective gene. These LD decay levels are similar to the range of decay values in other self-incompatible perennial forage grasses, such as perennial ryegrass [44-47] and meadow fescue (Festuca pratensis Huds.)[48]. This paucity of LD contributes to confidence that

16 the significant polymorphisms were less likely to be merely linked to other unknown significant polymorphisms. Interestingly, some portions of genes showing LD were related to variation in heading date. Within DgCO1 there was a 33bp significant Indel in the intron of DgCO1 that was present in varying doses in 144 of the 150 genotypes. Additionally SNPs at positions 168, 431, and 700 in DgMADS were all significantly associated with heading date and linked to each other with significant LD (Fig. 3). The presence of heterozygosity within autotetraploid plants provides opportunities and challenges in the movement toward marker assisted selection. As mentioned above, heterozygous loci within plants prevents LD assessments from fragmented amplicon sequences due to an inability to ascribe linkage phases. However, heterozygosity increases the power of detection in tetraploid modelling for association analysis (e.g., DgCO1 SNP at position 401) while concomitantly hampering strictly equal-dose modelling. For downstream molecular marker genotyping of associated polymorphisms, heterozygosity at a locus along with tetraploid modelling provides an ability to ascribe the parental dosages that would produce F1 progeny with 1:1 or 5:1 segregation ratios. The heterozygous and significant SNPs detected in the current study will allow for confirmation genetic mapping as well as development of a gene dosage series to better quantify the effects of these candidate gene polymorphisms on heading date. Of the four candidate genes tested, the DgCO1 gene had the greatest number of significant polymorphisms and the highest magnitude of effects on orchardgrass heading date in both equaldose and tetraploid models (Table 2). The SNPs at positions 189, 401, 717, and 1855 in particular were associated with three to ten day differences in heading date. The SNP at position 189 was positioned in the first exon and responsible for a Glu/Lys amino acid change. The SNP with the largest effects, at position 401, caused a synonymous mutation in the first exon that may be acting

17 through transcriptional regulation or linkage to other unidentified polymorphisms. The SNP at position 717 was downstream of the zinc-finger B-box2 domain of the first exon [11], and plants heterozygous for a Cys/Ser at that position were later in heading than those homozygous for Ser. The SNP at position 1855 had additive gene action, was detected in both equal-dose and tetraploid models, and was located in the sole intron near exon 2. DgMADS also had several significant SNPs and Indels for heading date, but with smaller effects and often in LD with each other. Our DgMADS was most homologous to a perennial ryegrass MADS2, which exhibited a similar expression profile as VRN1 and resided in a sister clade from a phylogenetic comparison [10]. It is possible that DgMADS plays a role in flowering time, possibly similar to or redundant with VRN1. Interestingly, all but one polymorphism was located in the first intron, a region reported to play a role in the regulation of the related VRN1 gene in other grasses [22]. Both DgFT1 and DgPPD1-like had a single significant polymorphism associated with heading date. The Indel in DgFT1 occurred in an intron, carried no currently ascribable reason to affect heading date, and was associated with relatively small effects. This paucity of significant polymorphisms and smaller effects suggests that florigen production or regulation through the FT1 gene play a smaller role in orchardgrass heading date than upstream photoperiod and vernalization reception or signaling. The DgPPD1-like Indel, however, was associated with the addition of an Arg amino acid and had greater effects on heading date than the DgFT1 Indel. Interestingly, the Triticeae PPD1 gene maps to LG2 [49], which is syntenic with perennial ryegrass LG2 [50]. QTLs for flowering time have been detected in that region of perennial ryegrass [16, 17, 19], and a homologous orchardgrass LG2 contained a QTL for heading date [7]. Although the DgPPD1-like gene structure reported herein was highly similar to homologous genes in Triticeae species, no

18 similar gene has been characterized for effects toward flowering time in temperate perennial grasses. It is possible that the effects of PPD1 in annual or biennial Triticeae are specific for that clade or for annual grasses, that the PRR gene family of which PPD1 is a member [12] includes an uncharacterized but functionally related member that is more active in cool-season perennial grasses like orchardgrass, or it is possible that characterization in temperate perennial grasses has merely lagged due to low relative homology and insubstantial genomic information available for the latter. This study examined two subpopulations within 150 diverse orchardgrass genotypes, and found minor evidence of kinship among the genotypes from SSR markers. Based on these data, linkage disequilibrium within the population was relatively low, which was characteristic of obligate out-crossing perennial grasses.

Accounting for both covariates, SNPs and Indels

significantly associated with heading date were found in all four genes, but predominantly in DgCO1 and DgMADS. The role of these genes in heading date and flowering time was suggested by previous genetic mapping in orchardgrass, as well as supported by previous forward genetic searches in other cool-season perennial grasses. The significant polymorphisms within these genes can be validated for further marker assisted selection or population development, with emphasis on those present across multiple years, across different models, or those with the largest effects.

Funding Sources This work was partially supported by the China Scholarship Council and the National Natural Science Foundation of China [NSFC 31372363].

Acknowledgements We would like to acknowledge the technical support of Lisa Michaels and Ramsey Buffam.

19

20

References [1] A. Stewart, N. Ellison, The genus Dactylis. Wealth of wild species: Role in plant genome elucidation and improvement, in, Springer: New York, 2011. [2] E. Van Santen, D.A. Sleper, Orchardgrass, Cool-season forage grasses, (1996) 503-534. [3] E.C. Brummer, K.J. Moore, Persistence of perennial cool-season grass and legume cultivars under continuous grazing by beef cattle, Agronomy Journal, 92 (2000) 466-471. [4] D. Chamblee, R. Lovvorn, The effect of rate and method of seeding on the yield and botanical composition of alfalfa-orchardgrass and alfalfa-tall fescue, Agronomy Journal, 45 (1953) 192-196. [5] P. Wilkins, M. Humphreys, Progress in breeding perennial forage grasses for temperate agriculture, The Journal of Agricultural Science, 140 (2003) 129-150. [6] S. Fjellheim, S. Boden, B. Trevaskis, The role of seasonal flowering responses in adaptation of grasses to temperate climates, Frontiers in plant science, 5 (2014) 431. [7] W. Xie, J.G. Robins, B.S. Bushman, A genetic linkage map of tetraploid orchardgrass (Dactylis glomerata L.) and quantitative trait loci for heading date, Genome, 55 (2012) 360-369. [8] F. Bouché, D. Woods, R.M. Amasino, Winter memory throughout the plant kingdom: different paths to flowering, Plant Physiology, (2016) pp. 01322.02016. [9] F.Y. Peng, Z. Hu, R.-C. Yang, Genome-wide comparative analysis of flowering-related genes in Arabidopsis, wheat, and barley, International journal of plant genomics, 2015 (2015). [10] K. Petersen, T. Didion, C.H. Andersen, K.K. Nielsen, MADS-box genes from perennial ryegrass differentially expressed during transition from vegetative to reproductive growth, Journal of plant physiology, 161 (2004) 439-447. [11] S. Griffiths, R.P. Dunford, G. Coupland, D.A. Laurie, The evolution of CONSTANS-like gene families in barley, rice, and Arabidopsis, Plant physiology, 131 (2003) 1855-1867. [12] A. Turner, J. Beales, S. Faure, R.P. Dunford, D.A. Laurie, The pseudo-response regulator PpdH1 provides adaptation to photoperiod in barley, Science (New York, N.Y.), 310 (2005) 10311034. [13] C. Campoli, M. Shtaya, S.J. Davis, M. von Korff, Expression conservation within the circadian clock of a monocot: natural variation at barley Ppd-H1 affects circadian expression of flowering time genes, but not clock orthologs, BMC Plant Biology, 12 (2012) 97. [14] S. Kitagawa, S. Shimada, K. Murai, Effect of Ppd-1 on the expression of flowering-time genes in vegetative and reproductive growth stages of wheat, Genes & genetic systems, 87 (2012) 161168. [15] D.M. Calder, Stage Development and Flowering in Dactylis glomerata L., Annals of Botany, 28 (1964) 187-206. [16] S. Byrne, E. Guiney, S. Barth, I. Donnison, L.A. Mur, D. Milbourne, Identification of coincident QTL for days to heading, spike length and spikelets per spike in Lolium perenne L, Euphytica, 166 (2009) 61-70. [17] L.B. Jensen, J.R. Andersen, U. Frei, Y. Xing, C. Taylor, P.B. Holm, T. Lübberstedt, QTL mapping of vernalization response in perennial ryegrass (Lolium perenne L.) reveals co-location with an orthologue of wheat VRN1, Theoretical and Applied Genetics, 110 (2005) 527-536. [18] I. Armstead, L. Skøt, L. Turner, K. Skøt, I. Donnison, M. Humphreys, I. King, Identification of perennial ryegrass (Lolium perenne (L.)) and meadow fescue (Festuca pratensis (Huds.))

21 candidate orthologous sequences to the rice Hd1 (Se1) and barley HvCO1 CONSTANS‐like genes through comparative mapping and microsynteny, New Phytologist, 167 (2005) 239-247. [19] I.P. Armstead, L.B. Turner, M. Farrell, L. Skøt, P. Gomez, T. Montoya, I.S. Donnison, I.P. King, M.O. Humphreys, Synteny between a major heading-date QTL in perennial ryegrass (Lolium perenne L.) and the Hd3 heading-date locus in rice, Theoretical and Applied Genetics, 108 (2004) 822-828. [20] L. Skøt, M.O. Humphreys, I. Armstead, S. Heywood, K.P. Skøt, R. Sanderson, I.D. Thomas, K.H. Chorlton, N.R.S. Hamilton, An association mapping approach to identify flowering time genes in natural populations of Lolium perenne (L.), Molecular Breeding, 15 (2005) 233-245. [21] L. Skøt, R. Sanderson, A. Thomas, K. Skøt, D. Thorogood, G. Latypova, T. Asp, I. Armstead, Allelic variation in the perennial ryegrass FLOWERING LOCUS T gene is associated with changes in flowering time across a range of populations, Plant Physiology, 155 (2011) 1013-1022. [22] J.R. Andersen, L.B. Jensen, T. Asp, T. Lübberstedt, Vernalization response in perennial ryegrass (Lolium perenne L.) involves orthologues of diploid wheat (Triticum monococcum) VRN1 and rice (Oryza sativa) Hd1, Plant molecular biology, 60 (2006) 481-494. [23] Å. Ergon, C. Fang, Ø. Jørgensen, T. Aamlid, O. Rognli, Quantitative trait loci controlling vernalisation requirement, heading time and number of panicles in meadow fescue (Festuca pratensis Huds.), Theoretical and applied genetics, 112 (2006) 232-242. [24] X. Zhao, L. Huang, X. Zhang, J. Wang, D. Yan, J. Li, L. Tang, X. Li, T. Shi, Construction of highdensity genetic linkage map and identification of flowering-time QTLs in orchardgrass using SSRs and SLAF-seq, Scientific reports, 6 (2016). [25] R Development Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2013, in, 2014. [26] B.S. Bushman, S.R. Larson, M. Tuna, M.S. West, A.G. Hernandez, D. Vullaganti, G. Gong, J.G. Robins, K.B. Jensen, J. Thimmapuram, Orchardgrass (Dactylis glomerata L.) EST and SSR marker development, annotation, and transferability, Theoretical and applied genetics, 123 (2011) 119129. [27] D. Falush, M. Stephens, J.K. Pritchard, Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies, Genetics, 164 (2003) 1567-1587. [28] G. Evanno, S. Regnaut, J. Goudet, Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study, Molecular ecology, 14 (2005) 2611-2620. [29] D.A. Earl, STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method, Conservation genetics resources, 4 (2012) 359-361. [30] O.J. Hardy, X. Vekemans, SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels, Molecular ecology notes, 2 (2002) 618-620. [31] T. Asp, S. Byrne, H. Gundlach, R. Bruggmann, K.F. Mayer, J.R. Andersen, M. Xu, M. Greve, I. Lenk, T. Lübberstedt, Comparative sequence analysis of VRN1 alleles of Lolium perenne with the co-linear regions in barley, wheat, and rice, Molecular genetics and genomics, 286 (2011) 433447. [32] R.W. King, T. Moritz, L.T. Evans, J. Martin, C.H. Andersen, C. Blundell, I. Kardailsky, P.M. Chandler, Regulation of flowering in the long-day grass Lolium temulentum by gibberellins and the FLOWERING LOCUS T gene, Plant Physiology, 141 (2006) 498-507.

22 [33] J. Martin, M. Storgaard, C.H. Andersen, K.K. Nielsen, Photoperiodic regulation of flowering in perennial ryegrass involving a CONSTANS-like homolog, Plant molecular biology, 56 (2004) 159-169. [34] J. Dubcovsky, A. Loukoianov, D. Fu, M. Valarik, A. Sanchez, L. Yan, Effect of photoperiod on the regulation of wheat vernalization genes VRN1 and VRN2, Plant molecular biology, 60 (2006) 469-480. [35] P.J. Bradbury, Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, E.S. Buckler, TASSEL: software for association mapping of complex traits in diverse samples, Bioinformatics, 23 (2007) 2633-2635. [36] U.R. Rosyara, W.S. De Jong, D.S. Douches, J.B. Endelman, Software for Genome-Wide Association Studies in Autopolyploids and Its Application to Potato, The plant genome, 9 (2016). [37] J. Shen, Z. Li, J. Chen, Z. Song, Z. Zhou, Y. Shi, SHEsisPlus, a toolset for genetic studies on polyploid species, Scientific reports, 6 (2016) 24095. [38] W. Xie, B.S. Bushman, Y. Ma, M.S. West, J.G. Robins, L. Michaels, K.B. Jensen, X. Zhang, M.D. Casler, S.D. Stratton, Genetic diversity and variation in North American orchardgrass (Dactylis glomerata L.) cultivars and breeding lines, Grassland science, 60 (2014) 185-193. [39] S. Faure, J. Higgins, A. Turner, D.A. Laurie, The FLOWERING LOCUS T-like gene family in barley (Hordeum vulgare), Genetics, 176 (2007) 599-609. [40] L. Skøt, J. Humphreys, M.O. Humphreys, D. Thorogood, J. Gallagher, R. Sanderson, I.P. Armstead, I.D. Thomas, Association of candidate genes with flowering time and water-soluble carbohydrate content in Lolium perenne (L.), Genetics, 177 (2007) 535-547. [41] H. Jones, F.J. Leigh, I. Mackay, M.A. Bower, L.M. Smith, M.P. Charles, G. Jones, M.K. Jones, T.A. Brown, W. Powell, Population-based resequencing reveals that the flowering time adaptation of cultivated barley originated east of the Fertile Crescent, Molecular biology and evolution, 25 (2008) 2211-2219. [42] A. Stewart, N. Ellison, A. Joachimiak, The genus Phleum; wealth of wild species: role in plant genome elucidation and improvement, in, New York, Springer, 2010. [43] A. Jafari, H. Naseri, Genetic variation and correlation among yield and quality traits in cocksfoot (Dactylis glomerata L.), The Journal of Agricultural Science, 145 (2007) 599-610. [44] R.C. Ponting, M.C. Drayton, N.O. Cogan, M.P. Dobrowolski, G.C. Spangenberg, K.F. Smith, J.W. Forster, SNP discovery, validation, haplotype structure and linkage disequilibrium in fulllength herbage nutritive quality genes of perennial ryegrass (Lolium perenne L.), Molecular Genetics and Genomics, 278 (2007) 585-597. [45] Y. Xing, U. Frei, B. Schejbel, T. Asp, T. Lübberstedt, Nucleotide diversity and linkage disequilibrium in 11 expressed resistance candidate genes in Lolium perenne, BMC Plant Biology, 7 (2007) 43. [46] X. Yu, G. Bai, S. Liu, N. Luo, Y. Wang, D.S. Richmond, P.M. Pijut, S.A. Jackson, J. Yu, Y. Jiang, Association of candidate genes with drought tolerance traits in diverse perennial ryegrass accessions, Journal of experimental botany, (2013) ert018. [47] A. Fiil, I. Lenk, K. Petersen, C.S. Jensen, K.K. Nielsen, B. Schejbel, J.R. Andersen, T. Lübberstedt, Nucleotide diversity and linkage disequilibrium of nine genes with putative effects on flowering time in perennial ryegrass (Lolium perenne L.), Plant Science, 180 (2011) 228-237. [48] H. Shinozuka, M.L. Hand, N.O. Cogan, G.C. Spangenberg, J.W. Forster, Nucleotide diversity of vernalization and flowering‐time‐related genes in a germplasm collection of meadow fescue

23 (Festuca pratensis Huds. syn. Lolium pratense (Huds.) Darbysh.), Ecology and evolution, 3 (2013) 4415-4426. [49] A. Díaz, M. Zikhali, A.S. Turner, P. Isaac, D.A. Laurie, Copy number variation affecting the Photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum), PLoS One, 7 (2012) e33234. [50] E.S. Jones, N.L. Mahoney, M.D. Hayward, I.P. Armstead, J.G. Jones, M.O. Humphreys, I.P. King, T. Kishida, T. Yamada, F. Balfourier, G. Charmet, J.W. Forster, An enhanced molecular marker based genetic map of perennial ryegrass (Lolium perenne) reveals comparative relationships with other Poaceae genomes, Genome, 45 (2002) 282-295.

24 Fig. 1. Histogram of heading dates for three years in a diverse orchardgrass association mapping population.

Fig. 2. The length, exon structure, and location of polymorphisms associated with heading date for gene amplicons re-sequenced in a diverse orchardgrass association mapping population.

Fig. 3. Heatmaps of significant linkage disequilibrium among polymorphisms for four candidate genes in the orchardgrass association mapping population. P-values are on upper diagonal while r2 values are on the lower diagonal. Asterisks represent SNP positions significantly associated with heading date.

25

Fig. 4. Mean and standard deviations of heading dates for the two-way interaction of SNP189 and Indel62 in the DgCO1 gene. Numbers in parenthesis correspond to the numbers of individuals in each genotype class.

26 Table 1. Homology based annotation of composite D. glomerata reference sequences, amplicon lengths, and primers used for sequencing in a diverse population of 150 orchardgrass genotypes. Candidate Gene

Top H. vulgare hit

Top L. perenne hit

Amplicon Length

Forward Primer

Reverse Primer

DgCO1

AF490468, CO1

AM489608, Hd1

2461

TCACATAGGTCGTGCATGAAA

AACTGCTTGCACTGGACCTT

DgFT1

DQ100327, FT1

FN993916, FT3

1890

CCTGGTCATCGTCAACCTCT

GGTTCATCCATCGCCCTTA

DgMADS

AK249833

JN969602, MADS2

6577

ACGAGATCTCCGTGCTCTG

TTCTCCTCCTGCAGTGACCT

DgPPD1-like

FJ515477, PRR

n/a

3055

CCTGCTCTGGTACCTCACCT

CCCTCATCAACTCCCCTAGA

27 Table 2. Polymorphisms significantly associated with heading date in a diverse D. glomerata population under diploid and tetraploid dosage models.

a

Year

Amplicon

Position

Marker Type

Modela

Reference Allele

Alternate Allele

Effect (days)

Y2013

DgCO1

62

Indel

EqD dominant

-

C

6.38

Y2015

DgCO1

189

SNP

EqD additive

G

A

4.27

Y2015

DgCO1

1855

SNP

EqD additive

G

A

1.22

Y2014

DgFT1

490-91

Indel

EqD (Alt)

-

T

-1.23

2014

DgCO1

401

SNP

4x dominant (Alt)

C

T

-7.54

2015

DgCO1

401

SNP

4x dominant (Alt)

C

T

-10.91

2014

DgCO1

717

SNP

4x dominant (Alt)

A

T

-3.26

2014

DgCO1

879-911

Indel

4x dominant (Alt)

TCCATGACTGCTGGGGT CAGTGCTTACACAGGT

-

-2.59

2015

DgCO1

1826

Indel

4x additive

T

-

-1.48

2015

DgCO1

1855

SNP

4x additive

A

G

-3.32

2015

DgMADS

96

SNP

4x additive

C

T

1.74

2015

DgMADS

108

Indel

4x additive

-

C

1.69

2015

DgMADS

168

SNP

4x additive

C

T

-1.70

2015

DgMADS

431

SNP

4x additive

A

T

-1.78

2015

DgMADS

700

SNP

4x additive

C

T

-1.67

2014

DgMADS

2773-2775

Indel

4x dominant (Alt)

CTC

-

3.98

2015

DgPPD1-like

425-426

Indel

4x additive

-

GCT

-2.76

dominant

The EqD corresponds to equal-dose hybrid, or diploid models, while 4x corresponds to tetraploid models with allele dosage considered.