Genome-wide association mapping of phenolic acids in tetraploid wheats


Journal of Cereal Science 75 (2017) 25-34

Contents lists available at ScienceDirect

Journal of Cereal Science journal homepage: www.elsevier.com/locate/jcs

Domenica Nigro a, Barbara Laddomada b, Giovanni Mita b, Emanuela Blanco c, Pasqualina Colasuonno d, Rosanna Simeone a, Agata Gadaleta d, Antonella Pasqualone e, Antonio Blanco a, *

a Department of Soil, Plant & Food Sciences, Plant Breeding Section, University of Bari, Via Amendola 165/A, 70126, Bari, Italy
b Institute of Sciences of Food Production, ISPA, CNR, Via Prov.le Monteroni, 73100, Lecce, Italy
c Institute of Biosciences and Bioresources, IBBR, CNR, Via Amendola 165/A, 70126, Bari, Italy
d Department of Agricultural & Environmental Science, Research Unit of “Genetics and Plant Biotechnology”, University of Bari, Via Amendola 165/A, 70126, Bari, Italy
e Department of Soil, Plant & Food Sciences, Food Science Section, University of Bari, Via Amendola 165/A, 70126, Bari, Italy

Article info

Article history: Received 28 November 2016; Received in revised form 23 January 2017; Accepted 24 January 2017; Available online 20 March 2017

Keywords: GWAS; Phenylpropanoid pathway; SNPs; Phenolic acids genes; Wheat

Abstract

Phenolic acids are major components of cell walls in wheat and have important implications for human health as antioxidants with anti-tumor activity. Our objectives were to identify phenolic acid genes in wheat by single nucleotide polymorphisms (SNPs) detected within the coding sequences of candidate genes, and to identify chromosomal regions associated with single phenolic acids and total soluble phenolic compounds. A set of candidate genes involved in the biosynthesis of hydroxycinnamic acid derivatives was identified by comparative genomics. SNPs found in the coding sequences of six genes (PAL1, PAL2, C4H, C3H, COMT1 and COMT2) were used to determine their chromosomal location and accurate map position on two reference consensus linkage maps. The genome-wide association study (GWAS), based on genotyping a tetraploid wheat collection with 81,587 gene-associated SNPs, detected 22 quantitative trait loci (QTL) distributed on almost all durum wheat chromosomes. Two QTL for p-coumaric acid were coincident with the phenylalanine ammonia-lyase (PAL2) and p-coumarate 3-hydroxylase (C3H) genes on chromosome arms 2AL and 1AL, respectively. The availability of candidate gene-based markers can help to elucidate the mechanism of phenolic acid accumulation in wheat kernels and to exploit the genetic variability of phenolic acid content for the nutritional improvement of wheat end-products. © 2017 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: [email protected] (A. Blanco).
http://dx.doi.org/10.1016/j.jcs.2017.01.022
0733-5210/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Common wheat (Triticum aestivum L. subsp. aestivum, genome AABBDD, 2n = 6x = 42) and durum wheat (Triticum turgidum L. subsp. durum (Desf.) Husnot, genome AABB, 2n = 4x = 28) are polyploid species grown globally for the production of bread, pasta, couscous and other local food products. The high content of carbohydrate and protein makes wheat end-products important in the human diet. Wheat species contain a variety of bioactive components, such as tocols, sterols, alkylresorcinols, folates, phenolic acids and fiber components (Shewry et al., 2010). In particular, phenolic acids have received great attention due to their health-promoting and disease-preventing value (review by Verpoorte et al., 2002). These compounds are associated in plant species with numerous biological functions, including photosynthesis, nutrient uptake, protein synthesis, structural components, seed dormancy, and biotic and abiotic stress responses (Bravo, 1998). In cereals, phenolic acids are particularly abundant and include hydroxycinnamic acid derivatives (e.g. p-coumaric, caffeic, ferulic, and sinapic acids) and hydroxybenzoic acid derivatives (e.g. p-hydroxybenzoic, vanillic, syringic, and gallic acids) (Li et al., 2008). Phenolic acids in the wheat grain are typically located in the bran and germ fraction, either as soluble forms that are free or esterified to saccharides and other low-molecular-mass components (e.g. organic acids), or, primarily, as insoluble bound forms linked to cell wall polymers. The general phenylpropanoid pathway is involved in the biosynthesis of


hydroxycinnamic-acid derivatives (reviewed by Vogt, 2010). Briefly, the deamination of phenylalanine by phenylalanine ammonia-lyase (PAL), followed by hydroxylation steps catalyzed by cinnamic acid 4-hydroxylase (C4H) and p-coumaric acid 3-hydroxylase (C3H), leads to the formation of hydroxycinnamates and their activated forms. Methylation of caffeic acid, catalyzed by caffeic acid 3-O-methyltransferase (COMT), yields ferulic and sinapic acids (Supplementary Fig. 1). The hydroxycinnamic-acid derivative group of phenolics includes ferulic acid (4-hydroxy-3-methoxycinnamic acid), which is the most abundant phenolic acid of wheat at all stages of development, representing 90% of the total. The concentration of ferulic acid increases steadily during grain development, prior to a 50% decrease during grain ripening. This phenolic acid arises from the metabolism of phenylalanine and tyrosine and is ubiquitously present in plant cell walls (Bravo, 1998).

The antioxidant properties of phenolic compounds are well known and have been extensively studied in a variety of plants (Rice-Evans et al., 1997). The CH=CH-COOH group in the structure of hydroxycinnamic acids is considered the key to their significantly higher antioxidant efficiency compared with hydroxybenzoic acids. Significant correlations have also been reported between the level of phenolic compounds and the antioxidant activity of whole-meal semolina in large sets of durum wheat samples (Pasqualone et al., 2014). Phenolic compounds display antioxidant activity as terminators of free radicals by donating a hydrogen atom (Bravo, 1998). Moreover, phenoxy radical intermediates are resonance stabilized; therefore, a new chain reaction is not easily initiated. Phenolic compounds are subject to the activity of polyphenol oxidases (PPO) (E.C. 1.14.18.1), a class of enzymes that catalyse the oxidation of phenolics to quinones in the presence of oxygen. PPO activity causes undesired discoloration of oriental noodles made from bread wheat and dough browning in durum wheat (Taranto et al., 2012).

Phenotypic variation in phenolic acid content has been extensively studied in wheat germplasm and cultivars (Gawlik-Dziki et al., 2012; Laddomada et al., 2016; Li et al., 2008; Narwal et al., 2014; Pasqualone et al., 2014; Ragaee et al., 2012; Shewry et al., 2010; Yilmaz et al., 2015; Verma et al., 2008), and the results indicate that total and individual phenolic acid contents are complex traits influenced by both genotype and environmental factors. The heritability estimates for total phenolic acids in winter and spring bread wheat genotypes were shown to be low due to the strong influence of environmental factors on the trait (Shewry et al., 2010). However, a recent study on tetraploid genotypes showed a higher ratio of genotypic variance to total variance for both individual and total phenolic acids, suggesting that it might be realistic to improve the trait in durum cultivars through appropriate breeding programs (Laddomada et al., 2016).

The dissection of quantitative traits, such as phenolic acids, typically uses DNA-based molecular markers and biparental mapping populations. This approach requires developing specific segregating populations, and QTL detection is limited to loci that segregate in the chosen crosses. Moreover, the detected QTL cover many cM, and additional steps are required to narrow the QTL region or clone the genes. Linkage disequilibrium-based association mapping (AM) is a recent, alternative approach that uses a set of genotypes (germplasm accessions, breeding lines, cultivars) representing the products of hundreds of recombination cycles, thus providing higher-resolution QTL mapping (Rafalski, 2010). A limitation of AM studies (both genome-wide association studies and candidate-gene approaches) is the high frequency of false-positive and false-negative associations, which depends on population structure, relative kinship among individuals, and multiple testing of thousands of markers.
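To give a feel for the multiple-testing burden mentioned above, the following minimal sketch converts a family-wise error rate into a per-marker P-value cutoff and expresses it on the -log10(P) scale. It uses a plain Bonferroni correction with an assumed family-wise error rate of 0.05, which is an illustrative choice and not the modified Bonferroni procedure applied in this study; the function names are hypothetical. The marker count of 13,639 is the number of SNPs retained for the GWAS reported in the Results.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff under a plain Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def neg_log10(p: float) -> float:
    """Convert a P-value to the -log10(P) scale used for marker-trait associations."""
    return -math.log10(p)


# 13,639 is the number of SNPs retained for GWAS in this study;
# alpha = 0.05 is an assumed family-wise error rate (not taken from the paper).
n_markers = 13_639
cutoff = bonferroni_threshold(0.05, n_markers)
print(f"per-marker P cutoff: {cutoff:.2e}")
print(f"on the -log10 scale: {neg_log10(cutoff):.2f}")

# For comparison, a threshold of -log10(P) = 3.0 corresponds to P = 0.001.
print(f"P at -log10(P) = 3.0: {10 ** -3.0}")
```

The plain correction treats all markers as independent tests, which is conservative when markers are in linkage disequilibrium; this is one reason modified corrections are often preferred in practice.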
The genetic control of phenolic acids in cereals has been investigated in rice, barley and sorghum, with studies limited to determining the total phenolic acid content and not considering individual phenolic compounds (Cai et al., 2015; Jin et al., 2009; Mohammadi et al., 2014; Rhodes et al., 2014). As far as we know, no study on QTL and genes coding for individual phenolic acids has been carried out in wheat. We recently explored the genetic variability of phenolic compounds in a core collection composed of 112 tetraploid wheat (T. turgidum L.) genotypes, about half of which were represented by durum cultivars and the remainder by landraces and wild types (Laddomada et al., 2016; Pasqualone et al., 2014). This core collection was derived from a larger set of 237 genotypes that were screened previously for genetic variability by Simple Sequence Repeat (SSR) and Diversity Arrays Technology (DArT) markers. The molecular data were submitted to cluster analysis to group the genotypes (Laidò et al., 2013), and a core collection was generated by picking genotypes from each cluster in order to maintain the genetic variability characterizing the full set.

The present study was designed to identify candidate genes for the hydroxycinnamic acid derivatives in wheat and to investigate associations between regions of the durum wheat genome and the accumulation of individual phenolic compounds as well as total soluble phenolic components. With this aim, we analyzed the core collection of 112 tetraploid wheats using a molecular marker array including 81,587 gene-associated SNPs (Wang et al., 2014). The importance of identifying genes and QTL for phenolic acid composition and content in wheat grain stems from the lack of information on the genetic basis of phenolic acid metabolism in wheat. The characterization of key genes involved in the biosynthetic pathway of phenolic acids could enable the improvement of wheat cultivars by traditional and molecular breeding, and by further advanced biotechnology, such as metabolic engineering. Durum wheat cultivars with higher phenolic acid content would lead to end-products with enhanced health-promoting properties.

2. Material and methods

2.1. Plant materials

The set of 112 tetraploid wheat genotypes included 65 old and modern cultivars of durum wheat and various T. turgidum subspecies, namely subsp. turgidum (12 accessions), subsp. turanicum (8 accessions), subsp. polonicum (8 accessions), subsp. carthlicum (3 accessions), subsp. dicoccum (9 accessions) and subsp. dicoccoides (7 accessions). Plant material was grown in the experimental field of the University of Bari at Valenzano (Bari, Italy) in the 2011-12 and 2012-13 growing seasons in a randomized complete block design with three replicates and plots consisting of 1-m rows, 30 cm apart, with 50 germinating seeds per plot. During the growing season, 100 kg/ha of N was applied and standard cultivation practices were adopted. Plots were hand-harvested at maturity and grain samples from each plot were separately ground on a laboratory mill equipped with a 1-mm sieve (Cyclotec Sample Mill, Tecator Foss, Hillerød, Denmark) to obtain wholemeal semolina.

2.2. Quantitative analysis of total phenolic compounds (soluble fraction)

The total soluble phenolic compounds (TSPC, composed of free phenolic acids and phenolics bound to low-molecular-mass molecules) were extracted and determined as in Pasqualone et al. (2014). Briefly, 1 mL of methanol was added to 0.1 g wholemeal semolina, purged with a stream of nitrogen, kept on an orbital shaker at 200 rev/min for 2 h in the dark, and centrifuged at 7000 × g for 5 min. The recovered supernatant was subjected to the Folin-Ciocalteu reaction and subsequently measured at 765 nm by a Cary 60 UV-Vis spectrophotometer (Agilent Technologies Inc., Santa Clara, CA, USA). A calibration curve was built using methanol solutions of ferulic


acid (Sigma-Aldrich Chemical Co., St. Louis, MO, USA) at concentrations between 0.1 and 2 g/L (y = 0.0007x + 0.0089; r² = 0.9985). The results were expressed as mg ferulic acid equivalents (FAE) per g. All tests were in triplicate.

2.3. Quali-quantitative analysis of single phenolic acids (sum of soluble and insoluble fractions)

Phenolic acids were extracted from 250 mg wholemeal semolina and analyzed by HPLC as described in Laddomada et al. (2016). After delipidation, 10 ml of an internal standard (3,5-dichloro-4-hydroxybenzoic acid) was added to the residue prior to NaOH hydrolysis. After centrifugation, the supernatant was acidified to pH 2 with 12 M HCl, and phenolic acids were extracted into ethyl acetate. After evaporation, phenolic acids were dissolved in 80% methanol and analyzed using an Agilent 1100 HPLC equipped with a photodiode array detector.

2.4. DNA extraction and SNP genotyping

Genomic DNA of each accession, isolated from freeze-dried leaf tissue, was diluted to 50 ng/mL and used for SNP genotyping with a wheat 90 K Infinium iSelect array containing 81,587 gene-associated SNP markers (Wang et al., 2014). The genotyping procedure was performed at TraitGenetics Laboratory, Gatersleben, Germany (http://www.traitgenetics.de) following the manufacturer's recommendations. The genotyping assays used an Illumina iScan reader and were analyzed using Genome Studio software version 2011.1.
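The marker filtering applied to the genotyping data before GWAS (markers with more than 10% missing data points or a minor allele frequency below 10% were removed) can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes biallelic markers scored as one allele call per inbred accession, with `None` for a missing call, and the function name, marker names and genotype vectors are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def passes_qc(calls: Sequence[Optional[str]],
              max_missing: float = 0.10,
              min_maf: float = 0.10) -> bool:
    """Return True if a marker passes the missing-data and MAF filters.

    `calls` holds one allele call per accession (inbred lines and biallelic
    markers are assumed); None marks a missing call.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    missing_rate = 1 - len(observed) / len(calls)
    if missing_rate > max_missing:
        return False
    counts = Counter(observed)
    if len(counts) < 2:
        return False  # monomorphic marker
    # With two alleles, the smaller count belongs to the minor allele.
    maf = min(counts.values()) / len(observed)
    return maf >= min_maf


# Hypothetical genotype vectors for a panel of 112 accessions.
markers = {
    "snp_ok": ["A"] * 80 + ["B"] * 30 + [None] * 2,
    "snp_rare_allele": ["A"] * 105 + ["B"] * 7,
    "snp_many_missing": ["A"] * 50 + ["B"] * 45 + [None] * 17,
    "snp_monomorphic": ["A"] * 112,
}
kept = [name for name, calls in markers.items() if passes_qc(calls)]
print(kept)  # ['snp_ok']
```

Here the minor allele frequency is computed over non-missing calls only; whether missing calls enter the denominator is a design choice that differs between software packages.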

2.5. Identification of putative phenolic acid gene sequences

The Arabidopsis thaliana phenylpropanoid pathway, reported in the Kyoto Encyclopedia of Genes and Genomes (KEGG) website (http://www.kegg.jp/), was used to identify the key enzymes involved in the biosynthesis of the major phenolic acids and to retrieve the corresponding gene sequences. Orthologous genes for Brachypodium distachyon L. Beauv, Oryza sativa L., Hordeum vulgare L., Zea mays L. and three wheat species, in particular T. aestivum L., Aegilops tauschii Coss. and T. urartu Tumanian ex Gandilyan, were retrieved from the EnsemblPlants database (http://plants.ensembl.org/) by blasting each gene of Arabidopsis against each species genome. The accuracy of the retrieved sequences for Brachypodium, O. sativa and Z. mays was confirmed by keyword searches in the UniGene Cluster database at NCBI (https://www.ncbi.nlm.nih.gov/). For ease of reading, abbreviations of common names and genus names will be used as follows. Each plant species is indicated with a two-letter prefix (followed by each gene symbol): At for A. thaliana, Bd for B. distachyon, Os for O. sativa, Zm for Z. mays, Hv for H. vulgare, Ta for T. aestivum, Ae for A. tauschii and Tu for Triticum urartu. All retrieved gene cDNAs were aligned by using the ClustalW method via Mega7 software. Phylogenetic analysis was carried out using the Neighbor-Joining method and a 1000-replication bootstrap test for significance. The tree was generated with Mega7 (http://www.ebi.ac.uk/Tools/phylogeny/) and modified with the FigTree program (http://tree.bio.ed.ac.uk/software/figtree/). Wheat phenolic acid gene sequences were blasted against the available dataset of SNP marker sequences reported by Wang et al. (2014), and markers aligned with 80% identity were considered as markers within the coding sequences of phenolic genes.

2.6. QTL detection

Analysis of variance of individual phenolic acids and total soluble phenolic compounds (TSPC) was previously reported by Laddomada et al. (2016) and Pasqualone et al. (2014), respectively, whereas SNP marker diversity and structure analysis of the tetraploid wheat collection are reported by Marcotuli et al. (2015). Mean values across replicates and two years of data for p-coumaric, ferulic and sinapic acids, and mean values across replicates for TSPC, were used in the genome-wide association study using TASSEL software version 5 (http://www.maizegenetics.net). Association between SNP markers and individual phenolic acids and TSPC was tested by using a) the general linear model (GLM) and the GLM including the Q-matrix derived from the principal component analysis (PCs) as implemented in TASSEL (GLM + PCs) and b) the mixed linear model based on the kinship matrix (MLM + K) and the MLM based on both the K-matrix and the Q-matrix (MLM + K + PCs). SNP markers with >10% missing data points and markers with a minor allele frequency of less than 10% were removed from the data matrix prior to GWAS. Significance of marker-trait associations (MTA) in the GWAS analysis was considered at the threshold -log10(P) ≥ 3.0, determined by the modified Bonferroni correction as implemented in the software Genstat. The consensus, high-density linkage maps described by Maccaferri et al. (2015) for durum wheat and by Wang et al. (2014) for common wheat were used as reference maps for the chromosome localization and map position of phenolic candidate genes and SNP markers associated with QTL for phenolic acids. MapChart 2.2 software was used for the graphical representation of linkage groups and QTL.

3. Results

3.1. Identification and mapping of phenolic acid gene sequences in wheat

The A. thaliana phenylpropanoid pathway was used to identify the key enzymes involved in the biosynthesis of the major phenolic acids (p-coumaric, ferulic and sinapic acids) in wheat and to retrieve the corresponding gene sequences. A. thaliana gene sequences were used as queries to retrieve orthologous genes for B. distachyon, O. sativa, H. vulgare, Z. mays, T. urartu, A. tauschii and T. aestivum from the EnsemblPlants database (http://plants.ensembl.org/). We found a single gene for C4H, C3H and F5H, whereas two copies were identified for the gene families COMT and PAL (Table 1). We designated the putative genes as COMT1 and COMT2 and PAL1 and PAL2, based on literature data (Ma et al., 2016) and known enzymatic pathways, which identify those as the enzymes primarily involved in hydroxycinnamic acid derivative biosynthesis (Vogt, 2010). All the orthologues clustered in the same clade of the phylogenetic tree and shared common conserved motifs in the cDNA sequences. The phylogenetic analysis revealed very high similarity among the C4H, C3H and F5H orthologue cDNAs, which were closer to each other than to those of the COMT and PAL gene families (Fig. 1). In fact, the analysis revealed a common branch that subsequently underwent functional diversification, first separating F5H and, more recently, differentiating C4H from C3H. The BLASTn analysis of the seven hydroxycinnamic acid genes of wheat against the full wheat SNP dataset (Wang et al., 2014) identified a total of 50 SNP markers within six phenolic gene sequences (PAL1, PAL2, C4H, C3H, COMT1 and COMT2); no SNP was detected within F5H (Table 2).
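The step of assigning SNP markers to candidate genes by sequence identity can be illustrated with a short filter over BLAST tabular output. The sketch below assumes the standard 12-column tabular format produced by BLAST+ with `-outfmt 6`, treats SNP sequences as queries and gene sequences as subjects, and interprets the 80% identity criterion as "at least 80%"; these are assumptions for illustration, and the function name, gene names and hit lines are hypothetical rather than data from the study.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def snps_within_genes(blast_tsv: str, min_identity: float = 80.0) -> dict:
    """Map each gene (subject) to the SNP sequences (queries) that align to it
    with percent identity >= min_identity."""
    hits = defaultdict(set)
    reader = csv.DictReader(io.StringIO(blast_tsv),
                            fieldnames=BLAST6_FIELDS, delimiter="\t")
    for row in reader:
        if float(row["pident"]) >= min_identity:
            hits[row["sseqid"]].add(row["qseqid"])
    return {gene: sorted(snps) for gene, snps in hits.items()}


# Hypothetical hit table: three SNP sequences aligned against two genes.
example = "\n".join([
    "snp_1\tgene_A\t98.5\t101\t1\t0\t1\t101\t500\t600\t1e-45\t180",
    "snp_2\tgene_A\t76.0\t100\t24\t0\t1\t100\t900\t999\t1e-05\t50.1",
    "snp_3\tgene_B\t100.0\t101\t0\t0\t1\t101\t10\t110\t2e-48\t187",
])
print(snps_within_genes(example))
# {'gene_A': ['snp_1'], 'gene_B': ['snp_3']}
```

In a real analysis, alignment length and E-value would usually be checked alongside percent identity, since a short high-identity match is weaker evidence than a full-length one.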
The physical location of the SNP sequences on wheat contigs (Wang et al., 2014) and their genetic position on the consensus durum (Maccaferri et al., 2015) and bread wheat maps (Wang et al., 2014), used as reference maps, indicated that the two paralogous genes for the phenylalanine ammonia-lyase enzyme were located on the homoeologous chromosomes of group 1 (PAL2) and group 2 (PAL1), and that the two paralogous genes for the caffeic acid 3-O-methyltransferase enzyme were located on the

Table 1. Ensembl entries of the phenolic acid metabolic genes PAL1, PAL2, C4H, C3H, COMT1, COMT2 and F5H retrieved in Arabidopsis thaliana, Brachypodium distachyon, Oryza sativa, Hordeum vulgare, Zea mays, Triticum aestivum, Aegilops tauschii and Triticum urartu (EnsemblPlants website: http://plants.ensembl.org/).

| Gene | Enzyme | A. thaliana | B. distachyon | O. sativa | H. vulgare | Z. mays | T. aestivum | A. tauschii | T. urartu |
| PAL1 | Phenylalanine ammonia-lyase | AT2G37040 | BRADI5G15830 | Os02g0626400 | MLOC_64900 | GRMZM2G118345 | Traes_2BL_C051606EA,1 | F775_06188 | TRIUR3_22522 |
| PAL2 | Phenylalanine ammonia-lyase | AT3G53260 | BRADI3G49260 | OS04G0518400 | MLOC_67067 | GRMZM2G029048_T01 | Traes_1BS_BD86C90A7.1 | F775_06189 | TRIUR3_02596 |
| C4H | Trans-cinnamate 4-monooxygenase | AT2G30490 | BRADI2G53470 | OS05G0320700 | MLOC_4708 | GRMZM2G139874 | TRAES3BF006600010CFD_g | F775_29972 | TRIUR3_18982-T1 |
| C3H | p-Coumarate 3-hydroxylase | AT2G40890 | BRADI2G21300 | OS05G0494000 | MLOC_16097 | GRMZM2G140817 | Traes_1AL_A0B81FF76.1 | F775_27986 | TRIUR3_23576 |
| COMT1 | Caffeic acid 3-O-methyltransferase | AT5G54160 | BRADI2G02390 | OS08T0157500-01 | MLOC_65378 | AC196475.3_FG004 | Traes_6BS_881DA479E | F775_31276 | TRIUR3_02449 |
| COMT2 | Caffeic acid 3-O-methyltransferase | AT1G33030 | BRADI3G16530 | OS08G0157500 | MLOC_73233 | GRMZM5G814904 | TRAES3BF065400030CFD_g | F775_32449 | TRIUR3_32612-T1 |
| F5H | Ferulate-5-hydroxylase | AT4G36220 | BRADI3G30590 | OS10G0512400 | MLOC_77998 | GRMZM2G100158 | TRAES3BF057900080CFD | F775_13391 | TRIUR3_24298 |

Fig. 1. Phylogenetic tree of the hydroxycinnamic acid derivative genes from Arabidopsis thaliana (At), Brachypodium distachyon (Bd), Zea mays (Zm), Hordeum vulgare (Hv), Oryza sativa (Os), Aegilops tauschii (Ae), Triticum urartu (Tu) and Triticum aestivum (Ta). Gene abbreviations: PAL1, phenylalanine ammonia-lyase 1; PAL2, phenylalanine ammonia-lyase 2; C4H, trans-cinnamate 4-monooxygenase; C3H, p-coumarate 3-hydroxylase; COMT1, caffeic acid 3-O-methyltransferase 1; COMT2, caffeic acid 3-O-methyltransferase 2; F5H, ferulate-5-hydroxylase.

homoeologous chromosomes of group 3 (COMT2) and group 6 (COMT1). C4H was located on group-3 chromosomes and C3H on the long arm of the group-1 chromosomes. The genetic position of 1-2 homoeologous loci for each gene on the reference consensus durum and bread wheat maps is reported in Table 2 and illustrated in Fig. 2.

3.2. Detection of QTL for phenolic acids by GWAS

The genetic variability of p-coumaric, ferulic and sinapic acids and total soluble phenolic compounds in the durum wheat core collection considered in the present study was previously investigated by Laddomada et al. (2016) and Pasqualone et al. (2014). The variation for phenolic compounds was explained by significant effects of genotype, year, and the year × genotype interaction. The normal distribution pattern and the large variation observed for the traits suggested a polygenic control. The descriptive statistics for p-coumaric, ferulic and sinapic acids are shown in Table 3. The durum wheat core collection was genotyped by using a 90 K iSelect array containing 81,587 SNPs (Wang et al., 2014). After a first screening, we removed failed and monomorphic markers, and SNPs with more than 10% missing data or with a minor allele frequency less than 0.10. Unmapped markers were also excluded from further analyses. As a result, a total of 13,639 SNPs mapped in the durum consensus map (Maccaferri et al., 2015) were retained for the present GWAS analysis. Marker-trait associations were determined by four statistical models: the general linear model (GLM), the GLM + Q model taking into account the population structure as determined by the principal component analysis (PCs) and the

Table 2. Chromosome location and map position of the identified hydroxycinnamic acid derivative genes on the durum (Maccaferri et al., 2015) and bread wheat (Wang et al., 2014) consensus maps.

Columns: Gene; Enzyme; SNP name; SNP ID; FSWC contig; Wheat map position (Chrom, Bread map, Durum map).

Gene (Enzyme): PAL1, PAL2 (Phenylalanine ammonia-lyase); C4H (Trans-cinnamate 4-monooxygenase); C3H (p-Coumarate 3-hydroxylase); COMT1, COMT2 (Caffeic acid 3-O-methyltransferase).

SNP name: Excalibur_rep_c105249_330, Kukri_rep_c106373_383, RAC875_rep_c112249_262, Excalibur_rep_c107431_108, Ra_c34214_1320, Excalibur_c34913_743, Excalibur_c11476_159, Excalibur_c34913_88, Ex_c34913_558, Excalibur_c108058_104, Excalibur_c34913_286, BobWhite_c15765_325, Excalibur_c108058_281, Excalibur_c108058_71, RAC875_rep_c113285_345, wsnp_Ex_rep_c105551_89940311, Excalibur_c114791_328, Excalibur_rep_c101770_75, tplb0048a09_2378, tplb0048a09_2106, RAC875_c3476_736, RAC875_c29297_122, RAC875_c3476_106, RAC875_c68501_380, Excalibur_rep_c103134_980, Excalibur_c34712_295, Excalibur_c55964_227, Excalibur_s110087_118, Excalibur_s110087_291, RAC875_c47463_231, RAC875_c47137_112, Kukri_c25539_790, Ku_c25539_389, RAC875_c47463_213, RAC875_c66785_71, TA004038-0552, Tdurum_contig30790_290, wsnp_BE445113A_Ta_2_2, RFL_Contig3919_1348, GENE-0484_294, GENE-0484_382, BobWhite_c43135_397, BobWhite_c43135_397, BobWhite_c43135_430, tplb0021f23_738, tplb0021f23_305, RFL_Contig3495_534, RAC875_c6578_645, RFL_Contig3495_261.

SNP ID: IWB30107, IWB49172, IWB62135, IWB30264, IWB51885a, IWB25530, IWB21894, IWB25531, IWB20546, IWB21736, IWB25529, IWB834, IWB21737, IWB21738, IWB62201, IWA5141, IWB21895a, IWB29755, IWB74899, IWB74898, IWB56994, IWB56315, IWB56993, IWB60186, IWB29905, IWB25490, IWB27616, IWB31523, IWB31524, IWB58364, IWB58317a, IWB43141, IWB38956, IWB58363, IWB60044, IWB65774, IWB70108, IWA163, IWB64537, IWB31911, IWB31912, IWB3282, IWB3282, IWB3283, IWB74101, IWB74099, IWB64375.1, IWB64375.2, IWB59978, IWB64374.

FSWC contig: 2AS_5244490, 2AL_6384500, 2AL_6384500, 2AL_6384500, 2AL_6426374, 2AL_6385501, 2AL_6385501, 2AL_6385501, 2AL_6385501, 2AL_6385501, 2AL_6385501, 2AL_6385501, 2AL_6385501, 2AL_6385501, 2BS_5242716, 2BL_8085347, 2BL_8085347, 2BL_8085347, 2BL_8062131, 2BL_8062131, 2BL_8029675, 2BL_8029675, 2BL_8029675, 2BL_8029675, 2BL_8029675, 2DL_9841349, 1BS_3481671, 1BS_3481671, 1BS_3481671, 1DS_1884488, 1BS_3468038, 1DS_1884487, 1DS_1884487, 1DS_1884488, 1DS_1884488, 3B_10512033, 3B_10512033, 1AL_3977384, 1AL_3977384, 1DL_933550, 1DL_933550, 6BS_2955192, 6BS_2955192, 6BS_2955192, 3B_10655791, 3B_10655791, 3B_10655791, 3B_10655791, 3B_10655791.

Wheat map position (Chrom; Bread map; Durum map), values in extracted order: 2A 2A; 102.0; 2B 2B; 104.8 104.8; 116.0; 1B 1B; 60.6 60.6; 33.7; 3B; 67.5; 66.7; 1A 1A; 82.9 84.0; 79.9; 6A 6B; 0.4; 3B 3B 3B 3D; 144.7 144.7 144.7 161.2; 131.1; 0.1 7.2 209.1 209.1.

a Comigrating marker.

mixed linear model (MLM) incorporating the K matrix (MLM + K) and the K and Q matrices (MLM + K + PCs) to take into account the confounding effects of both population structure and relative kinship and so minimize type-I errors (false-positive associations). The evaluation of the four MTA models for all the analyzed phenolic traits by inspection of Q-Q plots (Supplementary Fig. 2) and Manhattan plots (not reported) indicated significant deviations of observed -log10(P) values from the expected -log10(P) distributions for the GLM and GLM + PCs models, whereas the closer observed and expected distributions of the -log10(P) values in the MLM + K and MLM + K + PCs models suggested a reduction of potential spurious MTAs. The last two models produced similar results, and the MLM + K + PCs model was used for the final GWAS analysis. The GWAS, based on mean values of individual phenolic acids across two environments, detected ten significant QTL for p-coumaric acid on chromosomes 1A, 2A (two), 2B (two), 3A, 3B (two), 5B and 7A (Table 4 and Fig. 2). Interestingly, the QTL for p-coumaric acid on chromosome arm 1AL (closest marker IWB52651 at 73.2 cM) was found to be in the same QTL interval as marker IWA163 (significant at sub-threshold -log10(P) > 2.9), located within the coding sequence of the C3H gene. Moreover, the closest marker (IWB9073) to the QTL on 2AL at 131.7 cM was found to be 0.6 cM from marker IWB25530, which is located within the PAL1 coding sequence. The phenotypic variation explained (R²) by each of these markers ranged from 10.5% to 13.9%. Five significant associations were detected for ferulic acid and localized on chromosomes 2B, 3A, 4B, 5B and 6A, each explaining 11.0-15.9% of the phenotypic variation. For sinapic acid, two QTL were found on chromosomes 2A and 4B that explained 12.5% and 11.8%, respectively, of the phenotypic variation. Five consistent QTL for TSPC were detected on chromosomes 1B, 2B, 5A, 5B and 7A. The


Table 3. Mean, standard deviation (SD), ranges and coefficient of variation (CV%) of individual phenolic acids (mg/g dry matter) and total soluble phenolic compounds (TSPC, mg/g as ferulic acid equivalents) in a tetraploid wheat collection grown at Valenzano (Bari, Italy).

| Trait | Mean | SD | Min | Max | CV (%) |
| p-Coumaric acid | 25.9 | 10.1 | 11.6 | 81.9 | 38.9 |
| Ferulic acid | 597.9 | 110.8 | 363.7 | 938.2 | 18.5 |
| Sinapic acid | 122.3 | 35.8 | 52.5 | 234.9 | 29.2 |
| TSPC | 2.1 | 0.3 | 1.3 | 3.2 | 15.3 |

phenotypic variation explained (R²) by each of these markers ranged from 10.6% to 12.6%.

4. Discussion

The phenylpropanoid biosynthetic pathway has been widely studied in model plants and crop species because of the important role of its derived compounds in photosynthesis and nutrient uptake as well as in biotic and abiotic stress responses (Vogt, 2010).

Numerous studies reported the chemical, biological, agricultural and pharmacological characterization of individual components and total phenolic acid content (see review by Verpoorte et al., 2002). Phenolic acids also have received increasing attention because of their antioxidant activity and free radical scavenging ability, both with important roles in degenerative disease prevention. A number of breeding and biotechnological programs were recently undertaken for the nutritional improvement of crops that are the basis of human and animal feed, such as rice, maize and wheat (Shewry et al., 2010). Wheat phenolic acids also are important because their oxidation by different enzymes could confer an undesirable brown color to some wheat end-products, such as pasta, couscous and noodles (Taranto et al., 2012). Despite their importance for animal and human health, few studies have been carried out on the inheritance and accumulation of these bioactive compounds in cereal grain. Knowledge on the genetic control of phenolic acids is essential for conventional and molecular breeding programs aimed at improving the nutritional properties of raw materials and derived food products. Several studies on the phenotypic variability of phenolic acid content in wheat (Gawlik-Dziki et al., 2012; Laddomada et al., 2016;

Fig. 2. Schematic representation of the A- and B-genome chromosomes of the durum consensus linkage map (Maccaferri et al., 2015) with map positions of candidate phenolic acid genes and QTL for hydroxycinnamic acid derivatives. Each chromosome map is represented by the first and last SNP marker, and by a SNP marker approximately every 20 cM. SSR markers are inserted every 20 cM for comparing the consensus SNP map with published SSR-based maps. Markers are indicated on the right side and cM distances on the left side of the bar. QTL are represented by bars on the right of each chromosome bar. QTL names indicate the phenolic compound; the closest SNP marker is indicated in red. Phenolic acid genes are indicated after the corresponding SNP located within the gene sequence (underlined in blue). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

32

D. Nigro et al. / Journal of Cereal Science 75 (2017) 25e34

Table 4. SNP markers significantly associated (−log10(P) ≥ 3) with individual phenolic acids and total soluble phenolic acids by GWAS (model MLM + K + PCs) in a tetraploid wheat collection. Chromosome and map position from Maccaferri et al. (2015) and −log10(P) values are reported for each marker.

SNP marker    Chrom. arm    Position (cM)
IWB52651      1AL           73.2
IWB30243      1BL           98.8
IWB74467      2AS           9.4
IWB9073 (a)   2AL           131.7
IWB13648      2AL           146.5
IWB40211      2BL           97.2
IWB25841      2BL           131.2
IWB24230      2BL           147.4
IWB24230      2BL           148.0
IWB66111      3AS           51.5
IWB48393      3AS           68.0
IWB59103      3BL           82.5
IWB49810      3BL           181.2
IWB69587      4BS           28.5
IWB9610       4BS           37.7
IWB12115      5AL           91.4
IWB43085      5BS           15.8
IWB35584      5BL           44.0
IWB49598      5BL           107.6
IWB36794      6AL           72.6
IWB73688      7AS           61.6
IWB34460      7AL           123.1

Trait columns: p-coumaric acid, ferulic acid, sinapic acid and TSPC, each with −log10(P) and R2. The assignment of the trait values to individual markers is not recoverable here; the values are listed in reading order.

First block of values (−log10(P) / R2): 3.4 / 13.3; 3.3 / 12.3; 2.9 / 10.5; 3.6 / 13.9; 3.0 / 11.2; 3.1 / 11.5; 3.3 / 12.6; 3.0 / 11.2.
Remaining −log10(P) values: 3.3, 3.0, 4.0, 3.2, 3.0, 3.1, 3.2, 3.3, 3.0, 3.5, 3.0, 3.1, 3.3, 3.3.
Remaining R2 values: 11.0, 15.9, 12.3, 11.3, 11.8, 10.6, 12.6, 10.8, 12.5, 11.8, 11.6, 12.1, 12.5, 12.2.

R2 = phenotypic variation (%).
(a) SNP within the coding sequence of the gene PAL2, significantly associated at the sub-threshold −log10(P) = 2.9.
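The significance rule in the Table 4 caption can be illustrated with a short Python sketch: a −log10(P) of 3 corresponds to P = 0.001, so the PAL2 marker reported at 2.9 falls just below the cut-off. The marker names and P-values below are hypothetical and are not taken from the study; only the threshold of 3 comes from the caption.

```python
import math

# Cut-off taken from the Table 4 caption: -log10(P) >= 3, i.e. P <= 0.001.
THRESHOLD = 3.0


def neg_log10(p_value):
    """Convert a raw P-value into the -log10(P) score used in the table."""
    return -math.log10(p_value)


def significant_associations(associations, threshold=THRESHOLD):
    """Keep (marker, trait, score) tuples whose score reaches the threshold."""
    kept = []
    for marker, trait, p_value in associations:
        score = neg_log10(p_value)
        if score >= threshold:
            kept.append((marker, trait, round(score, 1)))
    return kept


# Hypothetical marker names and P-values, for illustration only.
associations = [
    ("SNP_A", "p-coumaric acid", 1e-4),  # score 4.0, kept
    ("SNP_B", "ferulic acid", 5e-3),     # score about 2.3, dropped
    ("SNP_C", "TSPC", 5e-4),             # score about 3.3, kept
]

hits = significant_associations(associations)
for marker, trait, score in hits:
    print(marker, trait, score)
```

A fixed cut-off of this kind is simple to apply and to report, but it does not correct for the number of markers tested.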

Li et al., 2008; Narwal et al., 2014; Ragaee et al., 2012; Shewry et al., 2010; Yilmaz et al., 2015; Verma et al., 2008) indicated that the most abundant phenolic compounds are ferulic, p-coumaric and sinapic acids. These hydroxycinnamic acid derivatives are produced directly in the first steps of the phenylpropanoid biosynthetic pathway by the catalytic activities of the enzymes PAL, C4H, C3H, F5H and COMT (Bravo, 1998; Vogt, 2010). In higher plants, the genes encoding these enzymes occur as families of paralogous genes. For instance, the key gene PAL is present in several copies in plant species, including four genes in Arabidopsis, five in pine and poplar, six in tomato and nine in rice (Hamberger et al., 2007).

As recently reported by Cai et al. (2015), the genetic control of polyphenols has been intensively investigated in pear, cider apple and tomato, and some studies have also focused on sorghum, rice and barley. The lack of information in wheat is mainly due to its large genome size and complexity, even though provisional sequencing data and gene annotations are readily available in public databases. Synteny-based studies are a very useful and powerful tool for obtaining functional information across different taxa: comparative genomic analysis allows information to be transferred from well-studied model organisms to less-studied ones. Using similar approaches, previous studies identified key genes involved in nitrogen metabolism in wheat, such as GS genes (Gadaleta et al., 2011) and GOGAT genes (Nigro et al., 2014). We followed this approach, focusing on the model species A. thaliana, to obtain information about the phenylpropanoid biosynthetic pathway in wheat.

Several SNPs in the coding sequences of six phenolic candidate genes (PAL1, PAL2, C4H, C3H, COMT1 and COMT2) were identified by a BLASTn analysis of the whole SNP dataset against the wheat phenolic gene sequences.
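The BLASTn step described above, which matches SNP-carrying sequences against candidate gene sequences, could be post-processed along the lines of the Python sketch below. It assumes the standard 12-column BLAST tabular output (`-outfmt 6`). The identity and e-value cut-offs, the sequence identifiers and the example hits are all hypothetical, since the text does not state the filtering criteria that were actually used.

```python
import csv
import io

# Column order of the standard BLAST tabular format (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, min_identity=95.0, max_evalue=1e-10):
    """Return {query: subject} for the top-scoring hit that passes the filters.

    The default cut-offs are illustrative assumptions, not values from the paper.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        identity = float(row["pident"])
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if identity < min_identity or evalue > max_evalue:
            continue  # hit is too weak to assign the SNP to this gene
        query = row["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (row["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits: one SNP matching two PAL paralogues, one weak C3H match.
example = "\n".join([
    "snp_001\tgene_PAL2\t100.0\t101\t0\t0\t1\t101\t500\t600\t1e-50\t187",
    "snp_001\tgene_PAL1\t96.0\t101\t4\t0\t1\t101\t480\t580\t1e-40\t160",
    "snp_002\tgene_C3H\t88.0\t101\t12\t0\t1\t101\t20\t120\t1e-20\t95",
])

hits = best_hits(example)
print(hits)
```

Keeping only the top-scoring hit per SNP is one simple way to separate paralogues such as PAL1 and PAL2. Separating homoeologous copies would need stricter criteria.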
The phenolic acid gene sequences were used to search for the corresponding contigs in the wheat genome database reported in Wang et al. (2014) and to assign a chromosome arm location to each gene. The precise map position of the six phenolic genes was determined on the high-resolution consensus maps of durum (Maccaferri et al., 2015) and common wheat (Wang et al., 2014) (Table 2 and Fig. 2). No information was previously available on the number of copies of each gene family or on the chromosome location of phenolic acid genes in the wheat genome. We found that the hydroxycinnamic acid derivative genes are distributed on the homoeologous chromosomes of groups 1, 2, 3 and 6 of wheat. For the PAL1 and PAL2 genes, at least one SNP was identified for each of the three homoeologous loci present in the wheat genomes. Using nulli-tetrasomic lines of Chinese Spring and probes representing defense response genes, Li et al. (1999) were able to assign two paralogous PAL genes to chromosome 3B and to the homoeologous chromosomes of group 6. We mapped two paralogous PAL genes on group-1 and group-2 chromosomes. This apparent discrepancy could arise because several PAL copies, mapping to different chromosomes, may be present in the wheat genome. Recent investigations in developing grains of white, purple, and red wheat (Ma et al., 2016) indicated that the expression of phenolic acid biosynthesis genes varies throughout grain development and is closely related to the accumulation of phenolic acids in the grain.

The detection of QTL for individual phenolic acids and total soluble phenolic compounds in the tetraploid wheat collection was carried out using both the GLM and the MLM models to account for the effects of population structure and kinship among genotypes. In agreement with several GWAS on agronomically important traits in several crops (see review by Gupta et al., 2014), the model MLM (K + PCs) was found to be the most suitable for the GWAS of the phenolic traits. Several QTL for individual phenolic acids and total soluble phenolic compounds (Table 4 and Fig. 2) were detected on 11 durum wheat chromosomes. Interestingly, two QTL for p-coumaric acid were found to be coincident with the candidate genes phenylalanine ammonia-lyase (PAL2) and p-coumarate 3-hydroxylase (C3H) on chromosome arms 2AL and 1AL, respectively.
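For readers unfamiliar with the two models compared above, they can be written in the form commonly used in association mapping. This is a generic sketch and not necessarily the exact parameterization used by the authors' software. The symbols are introduced here for illustration, and the GLM is assumed to carry the same principal-component covariates as the MLM.

```latex
\begin{align}
\text{GLM (PCs):}\quad
  & \mathbf{y} = \mathbf{S}\boldsymbol{\alpha} + \mathbf{P}\boldsymbol{\nu} + \mathbf{e}, \\
\text{MLM (K + PCs):}\quad
  & \mathbf{y} = \mathbf{S}\boldsymbol{\alpha} + \mathbf{P}\boldsymbol{\nu} + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad \mathbf{u} \sim N(\mathbf{0}, \sigma_g^{2}\mathbf{K}),
  \quad \mathbf{e} \sim N(\mathbf{0}, \sigma_e^{2}\mathbf{I}).
\end{align}
```

Here y is the vector of phenotypes (for example, p-coumaric acid content), S holds the genotypes at the tested SNP with effect α, P holds the principal components that capture population structure, and u is a random polygenic effect whose covariance is proportional to the kinship matrix K. The GLM omits the Zu term, so relatedness among accessions is not modeled and spurious associations are more likely.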
The QTL found on chromosome 2B (marker interval 94–120 cM) co-localized with the Ppo1 and Ppo2 genes (Taranto et al., 2015), whose products are known to affect phenolic acid content through their oxidative enzymatic activity (review by Ficco et al., 2014). In cereals, detection of QTL for phenolic acids has been carried out in tritordeum, rice, barley and sorghum, mainly for total phenolic content and without considering individual phenolic compounds. Using chromosome substitution lines in the tritordeum amphiploid (× Tritordeum Ascherson and Graebner), Navas-Lopez et al. (2014) reported that genes involved in hydroxycinnamic acid biosynthesis and/or accumulation may be located on wheat chromosome groups 1 and 2. In a doubled-haploid population of rice, one QTL for phenolic content, with a large additive effect explaining 16.9% of the total phenotypic variation, was identified on chromosome 2 using a composite interval mapping approach (Jin et al., 2009). Three significant regions on chromosomes 3H, 4H and 5H were found to be associated with total phenolic acids in a wide panel of elite spring barley breeding lines (Mohammadi et al., 2014). In a collection of 68 barley genotypes, Cai et al. (2015) identified eight QTL for p-coumaric acid on chromosomes 1H (two), 2H, 4H, and 7H (two) and three for ferulic acid on chromosomes 3H and 7H (two). One marker was found to match a contig containing the cytochrome P450 gene encoding the enzyme cinnamate 4-hydroxylase, which catalyzes the conversion of cinnamic acid to p-coumaric acid in the phenylpropanoid pathway, leading to the biosynthesis of lignin and numerous other phenolic compounds in plants.

5. Conclusion

Six candidate genes involved in the biosynthesis of hydroxycinnamic acid derivatives and in total soluble phenolic compound content were identified in wheat by exploiting molecular and genetic resources and by genotyping a tetraploid wheat collection with 81,587 gene-associated SNPs. The wheat coding sequences were highly conserved and more closely related to those of monocot species than to those of Arabidopsis. The SNPs found in the coding sequences of the candidate genes were used to determine their chromosomal location and accurate map position on two reference genetic consensus maps. Marker-trait association analysis in wheat collections can help to validate QTL detected in biparental populations and to uncover new QTL for phenolic acid content.
The availability of SNPs in candidate genes involved in the metabolism of phenolic acid compounds can help to elucidate the mechanism of phenolic accumulation in wheat kernels and exploit the genetic variability of wild and cultivated germplasm. The identification of functional markers and precise map position can be particularly useful for breeders in marker-assisted selection programs.

Acknowledgements

This research was supported by grants from MIUR, Italy, project “PON-01_01145 – ISCOCEM” and by Puglia Region, Italy, project PSR “SAVEGRAIN”. The authors gratefully acknowledge W. John Raupp (Wheat Genetics Resource Center, Kansas State University, Manhattan, KS, USA) for linguistic revision and critical reading of the article.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jcs.2017.01.022.

References

Bravo, L., 1998. Polyphenols: chemistry, dietary sources, metabolism, and nutritional significance. Nutr. Rev. 56, 317–333.
Cai, S., Han, Z., Huang, Y., Chen, Z.H., Zhang, G., Dai, F., 2015. Genetic diversity of individual phenolic acids in barley and their correlation with barley malt quality. J. Agric. Food Chem. 63, 7051–7057.
Ficco, D.B.M., Mastrangelo, A.M., Trono, D., 2014. The colours of durum wheat: a review. Crop Pasture Sci. 65, 1–15.
Gadaleta, A., Nigro, D., Giancaspro, A., Blanco, A., 2011. The glutamine synthetase (GS2) genes in relation to grain protein content of durum wheat. Funct. Integr. Genomics 11 (4), 665–670.
Gawlik-Dziki, U., Świeca, M., Dziki, D., 2012. Comparison of phenolic acids profile and antioxidant potential of six varieties of Spelt (Triticum spelta L.). J. Agric. Food Chem. 60, 4603–4612.
Gupta, P.K., Kulwal, P.L., Jaiswal, V., 2014. Association mapping in crop plants: opportunities and challenges. In: Friedmann, T., Dunlap, J., Goodwin, S. (Eds.), Adv. Genet. 85, 109–148.
Hamberger, B., Ellis, M., Friedmann, M., de Azevedo Sousa, C., Barbazuk, B., Douglas, C., 2007. Genome-wide analyses of phenylpropanoid-related genes in Populus trichocarpa, Arabidopsis thaliana and Oryza sativa: the Populus lignin toolbox and conservation and diversification of angiosperm gene families. Can. J. Bot. 85, 1182–1201.
Jin, L., Xiao, P., Lu, Y., Shao, Y., Shen, Y., Bao, J., 2009. Quantitative trait loci for brown rice color, phenolics, flavonoid contents, and antioxidant capacity in rice grain. Cereal Chem. 86, 609–615.
Laddomada, B., Durante, M., Mangini, G., D'Amico, L., Lenucci, M.S., Simeone, R., Piarulli, L., Mita, G., Blanco, A., 2016. Genetic variation for phenolic acids concentration and composition in a tetraploid wheat (Triticum turgidum L.) collection. Gen. Res. Crop. Evol. http://dx.doi.org/10.1007/s10722-016-0386-z.
Laidò, G., Mangini, G., Taranto, F., Gadaleta, A., Blanco, A., Cattivelli, L., Marone, D., Mastrangelo, A.M., Papa, R., De Vita, P., 2013. Genetic diversity and population structure of tetraploid wheats (Triticum turgidum L.) estimated by SSR, DArT and pedigree data. PLoS One 8, e67280.
Li, W.L., Faris, J.D., Chittoor, J.M., Leach, J.E., Hulbert, S.H., Liu, D.J., Chen, P.D., Gill, B.S., 1999. Genomic mapping of defense response genes in wheat. Theor. Appl. Genet. 98, 226–233.
Li, L., Shewry, P.R., Ward, J.L., 2008. Phenolic acids in wheat varieties in the HEALTHGRAIN diversity screen. J. Agric. Food Chem. 56, 9732–9739.
Ma, D., Li, Y., Zhang, J., Wang, C., Qin, H., Ding, H., Xie, Y., Guo, T., 2016. Accumulation of phenolic compounds and expression profiles of phenolic acid biosynthesis-related genes in developing grains of white, purple, and red wheat. Front. Plant Sci. 7, 528.
Maccaferri, M., Ricci, A., Salvi, S., Milner, S.G., Noli, E., Martelli, P.L., Casadio, R., Akhunov, E., Scalabrin, S., Vendramin, V., Ammar, K., Blanco, A., Desiderio, F., Distelfeld, A., Dubcovsky, J., Fahima, T., Faris, J., Korol, A., Massi, A., Mastrangelo, A.M., Morgante, M., Pozniak, C.J., N'Diaye, A., Xu, S., Tuberosa, R., 2015. A high-density, SNP-based consensus map of tetraploid wheat as a bridge to integrate durum and bread wheat genomics and breeding. Plant Biotechnol. J. 13, 648–663.
Marcotuli, I., Houston, K., Waugh, R., Fincher, G.B., Burton, R.A., Blanco, A., Gadaleta, A., 2015. Genome wide association mapping for arabinoxylan content in a collection of tetraploid wheats. PLoS One 10 (7), e0132787.
Mohammadi, M., Endelman, J.B., Nair, S., Chao, S., Jones, S.S., Muehlbauer, G.J., Ullrich, S.E., Baik, B., Wise, M.L., Smith, K.P., 2014. Association mapping of grain hardness, polyphenol oxidase, total phenolics, amylose content, and β-glucan in US barley breeding germplasm. Mol. Breed. 34, 1229–1243.
Narwal, S., Thakur, V., Sheoran, S., Dahiya, S., Jaswal, S., Gupta, R.K., 2014. Antioxidant activity and phenolic content of the Indian wheat varieties. J. Plant Biochem. Biotechnol. 23, 11–17.
Navas-Lopez, J.F., Ostos-Garrido, F.J., Castillo, A., Martín, A., Gimenez, M.J., Pistón, F., 2014. Phenolic content variability and its chromosome location in tritordeum. Front. Plant Sci. 5, 10.
Nigro, D., Blanco, A., Anderson, O.D., Gadaleta, A., 2014. Characterization of ferredoxin-dependent glutamine-oxoglutarate amidotransferase (Fd-GOGAT) genes and their relationship with grain protein content QTL in wheat. PLoS One 9 (8), e103869.
Pasqualone, A., Delvecchio, L.N., Mangini, G., Taranto, F., Blanco, A., 2014. Variability of total soluble phenolic compounds and antioxidant activity in a collection of tetraploid wheat. Agric. Food Sci. 23, 307–316.
Rafalski, J.A., 2010. Association genetics in crop improvement. Curr. Opin. Plant Biol. 13, 174–180.
Ragaee, S., Guzar, I., Abdel-Aal, E.S.M., Seetharaman, K., 2012. Bioactive components and antioxidant capacity of Ontario hard and soft wheat varieties. Can. J. Plant Sci. 92, 9–30.
Rice-Evans, C., Miller, N., Paganga, G., 1997. Antioxidant properties of phenolic compounds. Trends Plant Sci. 2, 152–159.
Rhodes, D.H., Hoffmann Jr., L., Rooney, W.L., Ramu, P., Morris, G.P., Kresovich, S., 2014. Genome-wide association study of grain polyphenol concentrations in global sorghum [Sorghum bicolour (L.) Moench] germplasm. J. Agric. Food Chem. 62, 10916–10927.
Shewry, P.R., Piironen, V., Lampi, A.M., Edelmann, M., Kariluoto, S., Nurmi, T., Fernandez-Orozco, R., Ravel, C., Charmet, G., Andersson, A.A.M., Åman, P., Boros, D., Gebruers, K., Dornez, E., Courtin, C.M., Delcour, J.A., Rakszegi, M., Bedő, Z., Ward, J.L., 2010. The HEALTHGRAIN wheat diversity screen: effects of genotype and environment on phytochemicals and dietary fiber components. J. Agric. Food Chem. 58, 921–928.
Taranto, F., Delvecchio, L.N., Mangini, G., Del Faro, L., Blanco, A., Pasqualone, A., 2012. Molecular and physic-chemical evaluation of enzymatic browning of whole meal and dough in a collection of tetraploid wheats. J. Cereal Sci. 55, 405–414.
Taranto, F., Mangini, M., Pasqualone, A., Gadaleta, A., Blanco, A., 2015. Mapping and allelic variations of Ppo-B1 and Ppo-B2 gene-related polyphenol oxidase activity in durum wheat. Mol. Breed. 35, 80.
Verma, B., Hucl, P., Chibbar, R.N., 2008. Phenolic content and antioxidant properties of grain in 51 wheat cultivars. Cereal Chem. 85, 544–549.
Verpoorte, R., Contin, A., Memelink, J., 2002. Biotechnology for the production of plant secondary metabolites. Phytochem. Rev. 1, 13–25.
Vogt, T., 2010. Phenylpropanoid biosynthesis. Mol. Plant 3, 2–20.
Wang, S.C., Wong, D., Forrest, K., Allen, A., Chao, S., Huang, B.E., Maccaferri, M., Salvi, S., Milner, S.G., Cattivelli, L., Mastrangelo, A.M., Whan, A., Stephen, S., Barker, G., Wieseke, R., Plieske, J., International Wheat Genome Sequencing Consortium, Lillemo, M., Mather, D., Appels, R., Dolferus, R., Brown-Guedira, G., Korol, A., Akhunova, A.R., Feuillet, C., Salse, J., Morgante, M., Pozniak, C., Luo, M.C., Dvorak, J., Morell, M., Dubcovsky, J., Ganal, M., Tuberosa, R., Lawley, C., Mikoulitch, I., Cavanagh, C., Edwards, K.J., Hayden, M., Akhunov, E., 2014. Characterization of polyploid wheat genomic diversity using a high-density 90000 single nucleotide polymorphism array. Plant Biotechnol. J. 12, 787–796.
Yilmaz, V.A., Brandolini, A., Hidalgo, A., 2015. Phenolic acids and antioxidant activity of wild, feral and domesticated diploid wheats. J. Cereal Sci. 64, 168–175.