Legume isoflavone synthase genes have evolved by whole-genome and local duplications yielding transcriptionally active paralogs

Plant Science 264 (2017) 149–167

Dorota Narożna a,1, Michał Książkiewicz b,⁎,1, Łucja Przysiecka b,c, Joanna Króliczak a, Bogdan Wolko b, Barbara Naganowska b, Cezary J. Mądrzak a

a Department of Biochemistry and Biotechnology, Faculty of Agronomy and Bioengineering, Poznań University of Life Sciences, Dojazd 11, 60-632, Poznań, Poland
b Department of Genomics, Institute of Plant Genetics, Polish Academy of Sciences, Strzeszyńska 34, 60-479, Poznań, Poland
c NanoBioMedical Centre, Adam Mickiewicz University, Umultowska 85, 61-614, Poznań, Poland

ARTICLE INFO

Keywords: Lupinus angustifolius; Isoflavone synthase; Genome duplication; Multigene family; Gene expression; Chromosomal localization

ABSTRACT

Isoflavone synthase (IFS) is the key enzyme of isoflavonoid biosynthesis. IFS genes were identified in numerous species, although their evolutionary patterns have not yet been reconstructed. To address this issue, we performed structural and functional genomic analysis. Narrow leafed lupin, Lupinus angustifolius L., was used as a reference species for the genus, because it has the most developed molecular tools available. Nuclear genome BAC library clones carrying IFS homologs were localized by linkage mapping and fluorescence in situ hybridization in three chromosome pairs. Annotation of BAC, scaffold and transcriptome sequences confirmed the presence of three full-length IFS genes in the genome. Microsynteny analysis and Bayesian inference provided clear evidence that IFS genes in legumes have evolved by lineage-specific whole-genome and tandem duplications. Gene expression profiling and RNA-seq data mining showed that the vast majority of legume IFS copies have maintained their transcriptional activity. L. angustifolius IFS homologs exhibited organ-specific expression patterns similar to those observed in other Papilionoideae. Duplicated lupin IFS homologs retained non-negligible levels of substitutions in conserved motifs, putatively due to positive selection acting during early evolution of the genus, before the whole-genome duplication. Strong purifying selection preserved newly arisen IFS duplicates from further nonsynonymous changes.

1. Introduction

Isoflavonoids constitute a large subfamily of flavonoids, plant secondary metabolites common to most vascular plants that play a critical role in their growth and development [1]. These compounds, contrary to other groups of flavonoids, were for a long time presumed to be specific to the Fabaceae. Leguminous plants still remain the most abundant source of isoflavonoids, which perform diverse biological functions [2], particularly those linked with plant-microbe interactions, both symbiotic [3,4] and pathogenic [5]. The occurrence of isoflavonoids in other plant families is sporadic [6–8]. It should be pointed out that some isoflavonoids (e.g. daidzein, genistein or coumestrol) are structurally similar to estradiol and can act as estrogens in humans and animals. They are therefore referred to as phytoestrogens [5,9].

Isoflavonoids present in the genus Lupinus constitute a structurally diverse group of natural products with various potential biological or pharmacological activities [10,11]. Besides the differentially hydroxylated isoflavones genistein and 2′-hydroxygenistein, isoflavones modified with isoprenyl groups (wighteone and luteone) were also found in roots and leaves of white and narrow-leafed lupins [12]. All isoflavones exist in plant tissue as a variety of O- or C-glycosylated derivatives, often acylated with a malonyl group on sugar moieties. Free aglycones are released in the cells under abiotic or biotic stress [13,14]. Isoflavonoids are becoming interesting bioproducts, and metabolic engineering methods are therefore being applied to increase their content in plant products [15–17].

Abbreviations: BAC, bacterial artificial chromosome; BLAST, basic local alignment search tool; EMBL, European Molecular Biology Laboratory; ER, endoplasmic reticulum; GA, gene expression atlas; IFS, isoflavone synthase; Ka, nonsynonymous substitution rate; Ks, synonymous substitution rate; NLL, narrow leafed lupin linkage group; SDS, sodium dodecyl sulfate; SRA, sequence read archive; SSC, saline & sodium citrate; SSPE, saline – sodium phosphate – EDTA; TA, transcriptome assembly; TE, transposable element; WGD, whole-genome duplication

⁎ Corresponding author.
1 Dorota Narożna and Michał Książkiewicz contributed equally to this article.

http://dx.doi.org/10.1016/j.plantsci.2017.09.007
Received 27 January 2017; Received in revised form 5 September 2017; Accepted 11 September 2017; Available online 18 September 2017
0168-9452/ © 2017 Elsevier B.V. All rights reserved.

Station Wiatrowo (Poznań Plant Breeders Ltd., Tulce, Poland). Three-day-old seedlings were used for DNA isolation. Plant tissues for RNA isolation were sampled from plants grown in a growth chamber under controlled conditions (temperature 18 °C, photoperiod 12/12 h). Shoot apical meristems, stems, and leaves were collected from 5-day-old plants. Root nodules were isolated from greenhouse-grown plants 28 days after infection with Bradyrhizobium japonicum (lupinus) UPP331 [47]. A nuclear genome BAC library of L. angustifolius cv. Sonet, constructed in the pIndigoBAC5 HindIII-Cloning Ready vector [30], was used as a source of BAC clones for molecular and cytogenetic analyses. Seeds of the L. angustifolius mapping population, developed as 89 F8 recombinant inbred lines from the cross combination 83A:476 × P27255 [48], were kindly provided by Dr. Hua’an Yang from the Department of Agriculture and Food, Western Australia.

Isoflavone synthase (IFS) is the key enzyme responsible for isoflavonoid biosynthesis in plants. It is the first enzyme in the branch of the phenylpropanoid pathway, catalyzing the conversion of the common flavonoid precursors liquiritigenin and naringenin into 2,7,4′-trihydroxyisoflavanone and 2,5,7,4′-tetrahydroxyisoflavanone, respectively, the precursors of all known isoflavonoids [18]. Isoflavone synthase (2-hydroxyisoflavanone synthase) is a cytochrome P450-dependent monooxygenase [19], belonging to the CYP93C subfamily [20]. The conversion of flavanone substrates into isoflavonoid precursors, catalyzed by IFS, involves the 2-hydroxylation and aryl ring migration of substrates to yield a 2-hydroxyisoflavanone. The reaction is followed by a dehydration step catalyzed by 2-hydroxyisoflavanone dehydratase [21] to yield the isoflavone product (daidzein or genistein). The genes encoding isoflavone synthase have been identified in a variety of legumes such as soybean, pea, alfalfa, red clover, peanut and others. In all those plant species, except red clover [22], they constitute small multigene families [20,21,23,24]. The homology between IFS genes from different genera is high, often exceeding 95% at the level of protein similarity. In angiosperms, IFS genes contain a single intron and two well-conserved exons.

Legumes (Fabaceae) are the third largest family of higher plants, comprising about 20 thousand species grouped into several clades, of which the largest is the subfamily Papilionoideae. Lupins belong to the genistoids, one of the earliest diverging lineages of the Papilionoideae, which evolved ∼50–60 million years ago [25]. Lupin crops are relevant sources of protein for animal and human consumption, as well as natural fertilizer plants improving soils and increasing yields of subsequent crops. Within the genus, Lupinus angustifolius L. (narrow-leafed lupin) is the species with the most developed molecular resources, including linkage maps with sequence-defined molecular markers [26–28], bacterial artificial chromosome (BAC) libraries representing 6–12× genome coverage [29,30], transcriptome assemblies for wild and domesticated lines [31] as well as a draft genome sequence [32]. As such, it has been considered a reference species for the whole genus and subjected to comprehensive studies involving targeted gene-rich region sequencing [33], molecular cytogenetic survey [34,35], cross-genera comparative mapping and gene-based phylogenetic inference [36,37].

The possibilities of family-wide studies were greatly enhanced by the release of high-quality genome assemblies of several legume species representing the main clades of Papilionoideae: dalbergioids (Arachis) [38], genistoids (Lupinus) [32], robinioids (Lotus) [39], millettioids (Cajanus, Glycine, Phaseolus, Vigna) [40–43], and the inverted repeat-lacking clade (Medicago and Cicer) [44,45]. Recent studies provided novel evidence for the evolutionary uniqueness of the Lupinus genus represented by L. angustifolius, manifested by its early divergence in the Papilionoideae lineage and retention of triplicated gene homologs or even whole chromosome segments as remnants of ancient whole-genome duplication events [25,26,35,46]. Therefore, we aimed to reconstruct the evolution of IFS genes in the legume family utilizing genomic data from L. angustifolius as a case study.

Here, we harnessed available genomic resources to perform structural and functional analysis of IFS genes. The L. angustifolius nuclear genome BAC library was screened with probes designed based on the sequences of L. luteus (yellow lupin) IFS cDNAs. BAC clone localization in L. angustifolius chromosomes was visualized using fluorescence in situ hybridization. IFS sequences were anchored to a linkage map of the species, using gene-based markers. IFS sequences were also retrieved from sequenced legume genomes by nucleotide and protein multiple alignment followed by protein-based hidden Markov model gene prediction. To track the evolution of particular homologs, Bayesian inference of phylogeny was applied. Transcriptional activity of duplicated gene loci was revealed by gene expression analysis as well as RNA-seq and microarray data mining.

2.2. Hybridization probe and BAC library screening

BAC clones containing sequences of IFS genes were selected by screening the nuclear BAC library of L. angustifolius. Primers for the amplification of IFS fragments were designed based on the nucleotide sequences of L. luteus IFS cDNAs determined earlier in our laboratory (LlIFS1, FJ539089 and LlIFS2, FJ539090). Primers Ll-IFSR and Ll-IFSF (Supplementary Table 1) were designed to amplify fragments of exon II of IFS genes. Polymerase chain reactions were performed in an Applied Biosystems GeneAmp PCR System 2400 in a reaction mixture (total volume 25 μl) containing: 1 × PCR buffer, 100 μM dNTP (each) and 1 U Taq DNA polymerase. The amplification consisted of the following steps: 94 °C, 2 min; followed by 30 cycles of: 94 °C, 30 s; 54 °C, 30 s; 72 °C, 1 min; and a final extension: 72 °C, 4 min. Hybridization probes were labeled with 32P-dATP (MP Biomedicals) using PCR with an annealing temperature of 54 °C. The probe was purified using Montage PCR Centrifugal Filter Devices (Merck Millipore, Darmstadt, Germany), denatured at 94 °C for 5 min and incubated on ice. Hybridization of the labeled probe with three macroarrays containing the whole nuclear genome BAC library of L. angustifolius cv. Sonet [30] was conducted in 5 × SSPE buffer with 0.5% SDS at 50 °C overnight. Filters were washed for 15 min at 50 °C in solutions of increasing stringency (5 × SSC + 0.5% SDS; 2.5 × SSC + 0.25% SDS; 1 × SSC + 0.1% SDS; 0.5 × SSC + 0.05% SDS). Blots were exposed to BAS-MS 2340 imaging plates (Fujifilm, Tokyo, Japan) for 48 h and analyzed using the FLA-5100 phosphorimager (Fujifilm).

2.3. Verification and sequencing of selected BAC clones

DNA from clones yielding positive hybridization signals was isolated using the PhasePrep BAC DNA Kit (Sigma-Aldrich, St. Louis, USA). The presence of IFS-encoding fragments was verified by PCR using Ll-IFSeieF/R primers (Supplementary Table 1) amplifying the fragment consisting of part of exon I, the intron, and part of exon II of IFS genes. Positively verified BAC clones were delivered to Genomed (Warsaw, Poland) for sequencing. GS FLX Titanium 454 DNA sequencing (Roche 454 Life Sciences, Branford, USA) was performed with tagged BAC DNA samples. Standard shotgun sequencing of the set of 3 BACs was performed on 1/8 of a PicoTiterPlate (PTP), with read lengths of up to 400 nucleotides. Assuming even distribution of 454 reads between the different tagged samples, the planned sequencing scheme was equivalent to approximately 20 × coverage of 454 reads for the sequence of a single BAC clone. Sequences were assembled by Genomed using CLC Genomics Workbench (v7.0.4) software under default parameters [35]. Ambiguous sites were resolved by PCR amplification and sequencing of products amplified from a particular BAC clone.
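The relationship between read number, read length and the approximately 20 × coverage quoted above can be illustrated with a short calculation. The sketch below is not the sequencing provider's planning procedure; it simply applies the standard mean-coverage relation C = N·L/G, using the BAC insert lengths reported in the Results and assuming, optimistically, that every read reaches the 400-nt maximum. The function name is hypothetical.

```python
import math


def reads_for_coverage(insert_length_nt, read_length_nt, target_coverage):
    """Smallest read count N such that N * L / G >= target coverage."""
    return math.ceil(target_coverage * insert_length_nt / read_length_nt)


# Assembled BAC insert lengths (nt) as reported in Section 3.1.
bac_inserts = {"062L01": 62440, "082N10": 84355, "136E13": 71689}

for clone, length_nt in bac_inserts.items():
    # Assumption: all reads are 400 nt long; shorter reads would require more reads.
    n_reads = reads_for_coverage(length_nt, read_length_nt=400, target_coverage=20)
    print(f"{clone}: at least {n_reads} reads for 20x coverage")
```

Because real 454 reads are usually shorter than the maximum, these counts should be read as lower bounds.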

2. Materials and methods

2.1. Biological material

Seeds of L. angustifolius cv. Sonet were obtained from the Breeding

(v0.9, DOE-JGI and USDA-NIFA, http://www.phytozome.net), and Vigna radiata [40] (project PRJNA243847, accession JJMO00000000). Additionally, Beta vulgaris [54], Trifolium pratense [55] and V. angularis [56] genome assemblies were screened. Moreover, transcriptome assemblies were mined: lupins (L. luteus [50], L. albus [51], L. angustifolius [31] and L. polyphyllus) as well as representatives of other clades, namely: Cercideae (Bauhinia tomentosa, Cercis canadensis), Detarieae (Copaifera officinalis), Mimosoid/MCC (Acacia argyrophylla, Desmanthus illinoensis, Gleditsia sinensis, Gleditsia triacanthos, Gymnocladus dioicus, Senna hebecarpa), Mirbelioids (Gompholobium polymorphum), early diverging papilionoids (Cladrastis lutea, Xanthocercis zambesiaca), phaseolids (Apios americana, Bituminaria bituminosa, Codariocalyx motorius) and IRLC (Lathyrus sativus, Astragalus propinquus, Astragalus membranaceus) [57,58] (http://dx.doi.org/10.5061/dryad.ff1tq). The following tblastn parameters were set in Geneious v8.1 (http://www.geneious.com; [59]): matrix, BLOSUM62; maximum e-value, 1e-6; word size, 3; gap cost open/extend, 11/1. For every alignment, the matching genome sequence and 10000 bp flanking regions were extracted and submitted to FGENESH+ gene prediction using Glyma07g32330.1 as a reference. Predicted mRNA and protein sequences were blasted against RefSeq to verify that they belonged to the IFS subfamily. Then, predicted IFS mRNA and/or protein sequences were aligned to mRNA/protein annotations of A. duranensis and A. ipaensis (PeanutBase, http://mediadb.peanutbase.com/), G. max (http://www.soybase.org), L. japonicus (http://www.kazusa.or.jp/lotus/), C. cajan, C. arietinum, M. truncatula, P. vulgaris (Legume IP, http://plantgrn.noble.org/LegumeIPv2/index.jsp, [60]) and V. radiata (http://legumeinfo.org/). Cross-reference accession numbers from these databases were assigned whenever applicable.
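The extraction of each tblastn match together with its 10000 bp flanking regions can be sketched as follows. This is an illustrative helper, not the Geneious procedure used in the study; the function name, the 1-based inclusive coordinate convention and the example coordinates are assumptions made for the sketch.

```python
def flanked_window(hit_start, hit_end, seq_length, flank=10_000):
    """Return 1-based inclusive (start, end) of a hit extended by `flank` bp
    on both sides and clipped to the boundaries of the sequence."""
    # Hits on the minus strand are often reported with start > end.
    low, high = sorted((hit_start, hit_end))
    return max(1, low - flank), min(seq_length, high + flank)


# Hypothetical hit in the middle of a 1 Mb pseudomolecule.
print(flanked_window(25_000, 26_600, 1_000_000))   # (15000, 36600)

# Hypothetical minus-strand hit near the start of a 30 kb scaffold.
print(flanked_window(4_200, 2_600, 30_000))        # (1, 14200)
```

Clipping matters mainly for short scaffolds, where a hit may lie closer than 10 kb to a sequence end.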
Additionally, two published IFS sequences from each of Beta vulgaris (AF195816.1 and AF195817.1, [24]), L. luteus (FJ539089.1 and FJ539090.1) and Pisum sativum (AAQ10282.2 and AAF34533.1) were included in the phylogenetic survey [61]. The set of 42 IFS mRNA sequences with trimmed stop codons was aligned in Geneious using MAFFT 7.017 [62] with specified parameters: alignment, translation; genetic code, standard; translation frame, 1; algorithm, E-INS-i; scoring matrix, BLOSUM80; gap open penalty, 1.25; offset value, 0. The same parameters were applied for protein sequence alignment. The selection of the best-fit model for nucleotide substitution was done in jModelTest 2.1.10 [63]. mRNA-based phylogenetic inference was conducted in MrBayes 3.2.2 [64] with the following settings: rate variation, gamma; gamma categories, 5; chain length (number of iterations), 2500000; subsampling frequency, 400; burn-in length (number of initial iterations excluded), 400000; nchains (number of chains used for Metropolis-coupled Markov chain Monte Carlo analysis), 6; heated chain temperature, 0.30; unconstrained branch lengths, exponential 10; shape parameter, exponential 10. Two substitution models were tested, codon M1 (site-specific neutral) and GTR + I + G (indicated by jModelTest as the best-fit). For protein-based phylogeny the JTT + G + I substitution model was selected [65] based on a ProtTest 2.4 survey [66]; all other parameters were identical. Posterior output trees were drawn in Geneious. Based on the topology of the tree, pairs of IFS sequences were selected. Pairwise translation sequence alignments were done in MAFFT v7.017 [62], using E-INS-i, BLOSUM80 and gap open penalty 1.25. Selection pressure parameters, Ka (the number of nonsynonymous substitutions per nonsynonymous site), Ks (the number of synonymous substitutions per synonymous site), and Ka/Ks ratios were calculated in DnaSP 5 [67].
Additionally, a branch-site test of positive selection was performed in PAML4 [68] for monophyletic Lupinus, Arachis and Lotus clades. Two models were considered: a null model, in which the foreground branch might have different proportions of sites under neutral selection to the background (i.e. relaxed purifying selection), and an alternative model, in which the foreground branch might have a proportion of sites under positive selection. Hypothesis testing was based on the likelihood ratio test (alternative vs null model) and the p-value under a chi-square distribution with 1 degree of freedom.
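The likelihood ratio test described above reduces to a small calculation once the log-likelihoods of the two models are available. The sketch below assumes they have already been read from the PAML output; the function name and the example log-likelihood values are hypothetical and are not results of this study. For 1 degree of freedom the chi-square survival function has a closed form, so only the standard library is needed.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """Likelihood ratio test of an alternative vs a nested null model.

    Returns the statistic 2 * (lnL_alt - lnL_null) and its p-value under
    a chi-square distribution with 1 degree of freedom.
    """
    # Numerical noise can make the statistic slightly negative; floor at 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_p_value(lnl_null=-5000.0, lnl_alt=-4996.5)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

A statistic of about 3.84 corresponds to p = 0.05 under this distribution.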

2.4. Anchoring BAC clone sequences to scaffolds of the draft narrow-leafed lupin genome assembly

The sequenced BAC clones were anchored to the L. angustifolius draft genome assembly by screening the collection of whole-genome shotgun contigs and scaffolds [32], deposited in the NCBI sequence database (Project No. PRJNA179231, assembly version GCA_000338175.1, accessions KB405099–KB441797 and AOCW01000001–AOCW01191454). A sequence identity cut-off value of 99% was applied. The BLASTn algorithm was optimized for highly similar sequences (word size, 28; match/mismatch scores, 1/-2; and gap costs, linear).

2.5. Functional annotation of BAC sequences

In order to characterize genetic elements encoded in L. angustifolius regions carrying IFS genes, the obtained BAC sequences were subjected to in silico annotation. This process included de novo detection of specific signals located on the genomic sequences as well as a comparative analysis. Transposable element-related repeats (TEs) were identified using RepeatMasker Web Server version 4.0.3 with implemented repeat libraries RMLib 20140131 and Dfam 1.3 (A.F.A. Smit, R. Hubley & P. Green, unpublished data, http://www.repeatmasker.org), with the following options selected: search engine, cross_match; speed/sensitivity, slow; DNA source, Arabidopsis thaliana. DNA sequences were then compared to the RepeatMasker Web Server database of TE-encoded proteins. Next, sequences were submitted to Censor [49] with the following settings: sequence source, Viridiplantae; force translated search, report simple repeats. Based on the results of RepeatMasker and Censor, repetitive content was masked; then the Basic Local Alignment Search Tool (BLAST, http://blast.ncbi.nlm.nih.gov) was used to examine similarities with integrated, non-redundant, and annotated sequences of genomic DNA, transcripts, and proteins deposited in the RefSeq database (http://www.ncbi.nlm.nih.gov/refseq).
Hypothetical genes were subjected to sequence homology searches against the transcriptome sequences of L. luteus young leaves, buds, flowers, and seeds [50], L. albus roots and leaves [51], as well as a reference L. angustifolius transcriptome of five different tissue types (leaf, stem, root, flower and seed) [31]. The following sequence repositories were used: L. luteus, http://www.cgna.cl/lupinus (project PRJNA170318, archive SRX159101); L. albus, http://comparative-legumes.org (gene index LAGI 1.0); L. angustifolius, http://www.lupinexpress.org. The analysis was performed using the CoGe BLAST algorithm [52] with the following parameters: e-value cut-off, 1e-10; word size, 8; gap existence cost, 5; gap elongation cost, 2; nucleotide match/mismatch scores, 1/-2. For every putative gene the reference RefSeq protein sequence was selected (based on alignment statistics), and BAC sequences were subjected to protein-based hidden Markov model gene prediction in FGENESH+ [53], using these proteins as references. Annotated BAC sequences were deposited in EMBL (accession numbers: LT577944-6). IFS gene sequences annotated in BAC clones were verified by PCR on the template of DNA isolated from L. angustifolius cv. Sonet and Sanger sequencing. Primers are provided in Supplementary Table 1.

2.6. Phylogenetic survey of L. angustifolius IFS proteins

Phylogenetic analysis of legume IFS genes included four steps: identification of IFS genes using available sequence data, prediction of mRNA/protein sequences, multiple alignment and Bayesian inference. According to recently published data [23], two reference IFS proteins, Glyma07g32330.1 and Glyma13g24200.1, were selected to screen the following genome assemblies: Arachis duranensis (accession V14167) and A. ipaensis (accession K30076) [38], Cajanus cajan [43] (project PRJNA72815, v1.0), Cicer arietinum [44] (v1.0 unmasked, http://comparative-legumes.org), Glycine max [41] (v. 9.0 unmasked, http://www.phytozome.net), Lotus japonicus [39] (v2.5 pseudomolecules unmasked, http://www.kazusa.or.jp), L. angustifolius [32] (project PRJNA179231), Medicago truncatula [45] (strain A17, JCVI v4.0 unmasked, http://www.jcvi.org/medicago/), Phaseolus vulgaris [42]

mapping population parental lines 83A:476 and P27255 [48] as a template: La-IFS-062L01-F/R, La-IFS-136E13-F/R and La-IFS-082N10-F/R (Supplementary Table 1). The PCR annealing temperature was 60 °C for the pair of La-IFS-062L01 primers, 56 °C for La-IFS-136E13 primers and 58 °C for La-IFS-082N10 primers. The PCR products were purified and sequenced. The polymorphisms were visualized by electrophoresis in 10% polyacrylamide gels stained with silver nitrate. IFS genes were genetically mapped according to the procedure described previously [37]. DNA was isolated from three-day-old seedlings of an L. angustifolius mapping population including parental lines 83A:476 and P27255, using the GenElute Plant Genomic DNA Miniprep Kit (Sigma-Aldrich). The segregation of markers, tested among all RILs of the mapping population, was analyzed using MapManager v. QTXb20 [71]. New markers were distributed in linkage groups (map function Kosambi, linkage criterion 1e-4) of the reference genetic map of L. angustifolius [26]. MapChart software [72] was used to draw the linkage groups containing new IFS markers.
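The Kosambi map function named above converts a recombination fraction r between two markers into a map distance d = 25·ln((1 + 2r)/(1 − 2r)) centimorgans. The sketch below only illustrates this conversion; it does not reproduce MapManager's internal handling of recombinant inbred line data, and the function name and example values are hypothetical.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical recombination fractions, for illustration only.
for r in (0.01, 0.10, 0.25):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance is close to 100·r cM, and it grows faster than linearly as r approaches 0.5, reflecting the correction for undetected double crossovers.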

2.7. Microsynteny analysis

To identify links of shared collinearity, L. angustifolius BAC clone sequences carrying IFS genes were comparatively mapped to the available legume genomic sequences. BAC sequences (assembled with overlapping scaffold sequences) were masked for repetitive content and low-complexity regions, and then subjected to comparative mapping to the following genome sequences: A. duranensis, A. ipaensis, C. cajan, C. arietinum, G. max, L. japonicus, M. truncatula, P. vulgaris, and V. radiata. The CoGe BLAST algorithm [52] was used to perform sequence similarity analyses with the following parameters: e-value cut-off, 1e-10; word size, 8; gap existence cost, 5; gap elongation cost, 2; nucleotide match/mismatch scores, 1/-2. Sequences producing excessive numbers of alignments to loci dispersed over numerous chromosomes were considered repetitive, and links related to these sequences were not drawn. Microsyntenic blocks were visualized using the Web-based Genome Synteny Viewer [69] and Circos [70].

2.8. Chromosomal localization of IFS genes

2.10. Expression analysis of Lupinus angustifolius IFS genes

To directly visualize the chromosomal location of IFS gene copies, BAC clones selected from the L. angustifolius genome library [30] were applied as molecular probes for fluorescence in situ hybridization (FISH). Probe DNA was labeled with digoxigenin-11-dUTP and/or tetramethylrhodamine-5-dUTP by incubation at 15 °C for 110 min followed by inactivation at 65 °C for 15 min using a SensoQuest Labcycler (Göttingen, Germany). Mitotic chromosome preparations were obtained according to [34], with minor modifications. Briefly, roots (1.5–2.0 cm) were excised and treated with ice-cold water (2–3 °C, 24 h), then fixed in an ethanol:glacial acetic acid mixture (3:1) and stored at −20 °C. Tissue maceration was performed in an enzyme cocktail of 40% (v/v) pectinase, 3% (w/v) cellulase (Sigma Aldrich, St. Louis, USA), and 1.5% (w/v) cellulase 'Onozuka R-10' (Serva Electrophoresis, Heidelberg, Germany), at 37 °C for about 1.5 h. Dissected meristems were squashed in 60% acetic acid. The quality of preparations was checked under a BX41 phase contrast microscope (Olympus, Tokyo, Japan). The FISH procedure was performed according to the protocol optimized for L. angustifolius [34]. Briefly, the slides were pre-treated with RNase (100 μg/ml) in a 2 × SSC buffer (humid chamber, 37 °C, 1 h) and washed in 2 × SSC (RT). The cytoplasm was removed by washing in 0.01 M HCl (2 min) followed by treatment with pepsin in HCl solution (5 μg/ml, 12 min, humid chamber). After several washes (H2O, 2 × SSC) the slides were postfixed in 10% formalin, washed in 2 × SSC, dehydrated in an ethanol series and air dried. The hybridization mixture (50% deionized formamide, 10% dextran sulfate, 2 × SSC, 0.5% SDS, blocking DNA 0.05 μg/μl and 300 ng probe per slide) was pre-denatured (90 °C, 10 min), applied to the slides and denatured together with the chromosome material (78 °C, 10 min), then allowed to hybridize in a humid chamber at 37 °C for 22 h. Post-hybridization washes depended on probe type. Immunodetection of digoxigenin-labeled DNA probes was conducted with fluorescein isothiocyanate (FITC)-conjugated anti-digoxigenin primary antibodies (Roche Diagnostics, Basel, Switzerland). Whole chromosomes were counterstained with 2 μg/ml DAPI (Sigma Aldrich, St. Louis, USA) in Vectashield antifade mounting medium (Vector Laboratories, Burlingame, USA). Slides were analyzed with a BX60 epifluorescence microscope (Olympus, Tokyo, Japan). Images were captured using an F-View monochromatic camera and superimposed using Micrografx (Corel) Picture Publisher 10 software.

To confirm the transcriptional activity of L. angustifolius IFS gene copies, expression profiling was performed using a semi-quantitative RT-PCR method. RNA was isolated from 28-day-old roots, leaves, stalks, root nodules and pods (90 mg of fresh plant tissue), using the SV Total RNA Isolation System (Promega). The cDNA samples for semi-quantitative RT-PCR experiments were synthesized from 1 μg of total RNA and oligo(dT)18 primers, using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems). One μl of reaction product (cDNA) was used as a template for PCR reactions carried out with three pairs of specific PCR primers. The primers were designed based on the analysis of IFS-encoding genes located within selected BAC clones (062L01, 136E13 and 082N10). Specific forward primers (La-IFS-062L01exF, La-IFS-136E13exF, La-IFS-082N10exF) were paired with a common reverse primer, La-IFScom-exR (Supplementary Table 1). A fragment of the yellow lupin actin (KP257588) coding sequence was amplified as a reference gene using actinF/R primers (Supplementary Table 1). Three technical replicates were assayed. The standard deviation was calculated in Excel from the range of three replicates.

2.11. In silico gene expression survey

To provide insight into IFS gene expression patterns across selected representatives of the legume family, data mining of sequence repositories was performed. Depending on availability, three types of resources were used: gene expression atlases (GA), transcriptome assemblies (TA) and transcriptome raw sequence read archives (SRA). The harnessed resources were as follows: A. duranensis and A. ipaensis, 25-kmer Trinity TA [73]; C. cajan, CcTA v2 TA [74]; C. arietinum, CaTA v2 TA [75]; G. max, http://soybase.org/soyseq GA [76]; L. japonicus, http://ljgea.noble.org/v2 GA [77]; M. truncatula, http://mtgea.noble.org/v3 GA [78,79]; P. sativum, http://bios.dijon.inra.fr/FATAL/cgi/pscam.cgi TA [80]; P. vulgaris, http://plantgrn.noble.org/PvGEA GA [81]; V. radiata, SRX598035 SRA [82]. GA repositories were analyzed using gene identification codes and gene names, whereas TA and SRA datasets were searched by BLAST alignments. GA-derived expression levels were retrieved from servers as normalized values of reads per kilobase per million mapped reads (RPKM) [76]. G. max and P. vulgaris IFS genes were annotated directly in the GA database, whereas L. japonicus and M. truncatula genes were assigned to specific probe sets by sequence alignments. Two-factor analysis of variance without replication was used. For testing contrasts between means, the F-test in the analysis of variance was used [83].
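For readers unfamiliar with the RPKM unit mentioned above, the normalization can be written in a few lines. In this study the values were retrieved pre-computed from the atlas servers, so the sketch below only illustrates the definition; the function name and the example read counts, transcript length and library size are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # Equivalent to read_count / (length in kb) / (library size in millions).
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 1000 reads on a 2000-bp transcript in a library
# of 10 million mapped reads.
print(rpkm(1000, 2000, 10_000_000))  # 50.0
```

Dividing by transcript length and by library size makes values comparable between genes of different lengths and between libraries of different sequencing depth.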

2.9. Development of PCR markers and linkage mapping of IFS genes

To localize IFS genes on the L. angustifolius linkage map, novel PCR markers were generated. Based on the nucleotide sequences of selected BAC clones 062L01, 136E13 and 082N10, three pairs of primers were designed, amplifying fragments of different length with DNA of

3. Results

sequences of IFS genes located in three L. angustifolius loci is within the range of 77–85%. Amino acid sequence identity is 76–92%, and the similarity is 81–96% (Table 3). Isoflavone synthase genes located within the three regions represented by the analyzed BAC clones were named LangIFS1 (062L01), LangIFS2 (136E13) and LangIFS3 (082N10).
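Percent identity values such as those quoted above depend on how gapped alignment columns are counted. The sketch below shows one common convention, in which columns containing a gap in either sequence are excluded from the denominator; it is not necessarily the convention applied by the software used in this study, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 7 ungapped columns, 6 identical.
print(round(percent_identity("ATGGC-TA", "ATGACCTA"), 1))  # 85.7
```

Counting gapped columns as mismatches instead would lower the value to 75.0% for the same toy alignment, which is why the convention should be stated alongside reported identities.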

3.1. Three BAC clones representing IFS loci were selected, sequenced and annotated Eight signals were detected after hybridization of the labeled probe obtained using Ll-IFSR and Ll-IFSF PCR primers with macroarrays containing the whole nuclear genome BAC library of L. angustifolius [30]. The presence of IFS encoding fragments was confirmed by PCR using Ll-IFSeieF and Ll-IFSeieR primers amplifying the fragment consisting of part of exon I, the intron, and part of exon II of IFS genes in the case of 6 clones. Three groups of clones with different IFS gene intron size were identified: 372 bp (clone 136E13), 1495 bp (062L01 and 137G21), and 161 bp (082N10, 070J20, and 120F10). Based on the similarities of restriction patterns within these groups (data not shown) they were considered to represent three loci. Three BAC clones representing these IFS loci were selected for whole-insert sequencing. The 454 sequencing, followed by contig construction and PCR-based verification of constructed contigs, resulted in the complete assembly of BAC inserts of all three sequenced clones. The length of constructed assemblies was as follows: 062L01–62 440 nt; 082N10–84 355 nt; 136E13–71 689 nt. Mapping to the L. angustifolius cv. Tanjil genome [32] resulted in unambiguous identification of scaffold sequences for all BACs: no multiple scaffold matches in the query range were found. Based on assigned Tanjil sequences, consensus superscaffolds were assembled as follows: AOCW01121037.1 + 062L01 (63 384 nt), KB434177.1 + 082N10 + KB425963.1 (98 938 nt), AOCW01155075.1 + 136E13 (82 829 nt). RepeatMasker and Censor survey revealed that obtained superscaffolds represent narrow-leafed lupin genome regions with very low repetitive content (Table 1). Total content of repeats varied from 2.77% in BAC 062L01 to 6.29% in 136E13, thus it was several times lower than the value of 50%, estimated for the species after pilot-phase genome sequencing [32]. 
The most abundant classes of repetitive elements were LTR/Gypsy, non-LTR/LINE/L1, DNA/Helitron, and simple repeats. Protein-based hidden Markov model gene prediction resulted in the identification of 7 genes in 062L01, 7 in 082N10, and 5 in 136E13 (Table 2). One copy of IFS was annotated in each clone. Mining of the L. angustifolius genome assembly [32] and reference transcriptome [31] did not reveal the presence of IFS copies other than those three. Alignment to L. angustifolius transcriptome assemblies of Tanjil, Unicrop and P27255 provided cDNA evidence for 3 genes in BAC 062L01, 2 in 082N10 and 4 in 136E13, including all IFS copies annotated (Fig. 1). IFS open reading frames were amplified from DNA isolated from L. angustifolius cv. Sonet and directly sequenced. The obtained sequences were identical to those derived from BAC clone assemblies. The list of L. angustifolius, L. luteus and L. albus transcriptome sequences assigned to gene sequences annotated in BAC-derived superscaffolds is given in Supplementary Table 2. Alignment revealed that the mutual similarity between the coding

3.2. Three L. angustifolius IFS genes are located in different chromosomes corresponding to three linkage groups

The physical position of IFS genes within the L. angustifolius genome was visualized by application of selected BAC clones containing IFS encoding sequences as molecular probes for FISH reaction. The clones were differently labeled, which made it possible to identify all three signals in the same experiment. The images observed showed that all three clones yielded unique ‘single-locus’ signals and hybridized to different chromosome pairs (Fig. 2). This decisively illustrated that each of the IFS genes is located in a different linkage group of L. angustifolius. Three pairs of PCR primers amplifying DNA fragments located within the BAC clones containing three IFS encoding genes were used as polymorphic markers linked with these genes, differentiating parental lines. Markers specific for LangIFS1 (La-IFS-062L01), LangIFS2 (La-IFS-136E13) and LangIFS3 (La-IFS-082N10) were mapped in different linkage groups of the L. angustifolius genetic map [26]: NLL-19, NLL-16 and NLL-06, respectively (Fig. 2). Segregation data for L. angustifolius IFS markers are provided in Supplementary Table 3. Genetic mapping results were consistent with cytogenetic mapping. The small multigene family of IFS genes is dispersed into three distinct loci located in different chromosomes of L. angustifolius. BAC-derived markers developed in this study significantly improved the existing set of L. angustifolius cytogenetic/linkage anchors [35]. New markers were found to be localized at distances of ∼20 to ∼65 cM from the existing chromosome-specific landmarks, and as such constituted, together with those recently published, a valuable tool for tracking large-scale rearrangements of chromosomes NLL-06, NLL-16 and NLL-19.

3.3. All three L. angustifolius IFS genes are transcriptionally active

An exploratory analysis of expression of the genes in question was performed.
The expression of three genes encoding IFS in different tissues of narrow-leafed lupin was confirmed by semi-quantitative RT-PCR using gene-specific pairs of primers and actin-specific primers as a reference. The results presented as the ratio of IFS genes expression versus actin gene expression are shown in Fig. 3. Transcripts of all three IFS genes were present in all analyzed tissues, confirming the transcriptional activity of all three paralogs. Their expression levels were different in particular organs. However, comprehensive evaluation of tissue-specificity of IFS gene expression and influence of environmental factors requires a separate experimental approach. Expression values and standard deviations are provided in Supplementary Table 4. To supplement the gene expression results obtained for L. angustifolius we referred to RNA-derived resources available for other legumes. Transcriptome-derived datasets were screened to survey IFS gene expression across species representing all main lineages of Papilionoideae. Transcriptional activity was confirmed for the vast majority of the studied IFS copies (Table 4, Fig. 4). For G. max, L. japonicus, M. truncatula and P. vulgaris gene expression atlases are available, and these resources were used to quantify expression levels in different tissues. The highest abundance of particular transcripts was observed in roots, the second highest in nodules. Such a pattern converges with that observed for L. angustifolius. There were significant differences between transcription levels of gene homologs, especially in roots and nodules, but the expression profiles were similar to each other. However, in L. japonicus, relatively high transcription was also detected in stems and petioles (Fig. 4). Data on organ-specific profiles of IFS gene expression derived from microarray and RNA-seq databases of G. max, L. japonicus, M. truncatula and P. vulgaris are provided in Supplementary Table 4.
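The normalization described above (IFS expression given as a ratio to actin expression, with technical replicates and a standard deviation) can be sketched in Python as follows. This is only an illustration: the function name, the band-intensity values and the choice of the sample standard deviation are assumptions, and the measured values are those reported in Supplementary Table 4, not the ones shown here.

```python
from statistics import mean, stdev


def relative_expression(target, reference):
    """Return (mean, SD) of per-replicate target/reference signal ratios."""
    if len(target) != len(reference):
        raise ValueError("replicate counts differ")
    ratios = [t / r for t, r in zip(target, reference)]
    return mean(ratios), stdev(ratios)


# Hypothetical band intensities (arbitrary units) for three technical replicates;
# these numbers are invented for illustration and are not data from the study.
ifs_root = [820.0, 790.0, 805.0]
actin_root = [1000.0, 1000.0, 1000.0]

avg, sd = relative_expression(ifs_root, actin_root)
print(f"IFS/actin ratio: {avg:.3f} +/- {sd:.3f}")
```

Computing the ratio per replicate before averaging keeps each IFS measurement paired with its own actin reference, which is one common way to express semi-quantitative RT-PCR results.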

Table 1
Content of repetitive sequences in L. angustifolius IFS BAC-derived superscaffolds [%].

Class                 062L01   082N10   136E13
DNA/EnSpm/CACTA       –        –        0.43
DNA/Harbinger         0.34     –        –
DNA/Helitron          –        –        1.27
LTR/Copia             –        0.31     –
LTR/Gypsy             0.33     3.23     0.25
Non-LTR/LINE/L1       0.96     –        2.49
Non-LTR/RTE           –        0.17     –
Non-LTR/SINE/SINE2    –        0.13     –
Simple repeat         1.14     1.74     1.85
Total repeats         2.77     5.58     6.29
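The totals in Table 1 and the comparison with the genome-wide repeat estimate can be checked with a short Python sketch. The per-class percentages are transcribed from Table 1 and the 50% figure is the estimate cited in the text [32]; the variable and function names are illustrative assumptions.

```python
# Repeat content per superscaffold [%], transcribed from Table 1
# (classes reported as "–" in the table are simply omitted).
repeats = {
    "062L01": {"DNA/Harbinger": 0.34, "LTR/Gypsy": 0.33,
               "Non-LTR/LINE/L1": 0.96, "Simple repeat": 1.14},
    "082N10": {"LTR/Copia": 0.31, "LTR/Gypsy": 3.23, "Non-LTR/RTE": 0.17,
               "Non-LTR/SINE/SINE2": 0.13, "Simple repeat": 1.74},
    "136E13": {"DNA/EnSpm/CACTA": 0.43, "DNA/Helitron": 1.27, "LTR/Gypsy": 0.25,
               "Non-LTR/LINE/L1": 2.49, "Simple repeat": 1.85},
}
GENOME_WIDE_REPEATS = 50.0  # [%], estimate for the species cited in the text [32]


def total_repeats(classes):
    """Sum the per-class repeat percentages of one superscaffold."""
    return round(sum(classes.values()), 2)


totals = {bac: total_repeats(classes) for bac, classes in repeats.items()}
# How many times lower the local repeat content is than the genome-wide estimate.
fold_lower = {bac: round(GENOME_WIDE_REPEATS / t, 1) for bac, t in totals.items()}

for bac in totals:
    print(bac, totals[bac], fold_lower[bac])
```

The per-class values sum to the reported totals (2.77%, 5.58% and 6.29%), which are roughly 8 to 18 times lower than the genome-wide estimate.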


Table 2
Genes predicted in L. angustifolius IFS BAC-derived superscaffolds.

BAC     No.  Positions    Protein length  Fgenesh score  Reference accession  Reference gene sequence name
062L01  1    752–4279     517             1475.74        AF195798.1           Isoflavone synthase 1 [G. max]
062L01  2    8104–10583   357             993.67         XP_007148101.1       Hypothetical protein PHAVU_006G180900g [P. vulgaris]
062L01  3    17276–19103  376             1591.64        XP_003593675.1       Anthocyanidin 3-O-glucosyltransferase [M. truncatula]
062L01  4    24134–32274  851             3229.0         XP_006588460.1       PREDICTED: probable leucine-rich repeat receptor-like protein kinase At5g49770-like isoform X1 [G. max]
062L01  5    37343–38410  277             841.47         XP_003593691.1       Ethylene responsive transcription factor 1b [M. truncatula]
062L01  6    44810–49175  892             3414.37        XP_003593704.1       Glutamate receptor 2.7 [M. truncatula]
062L01  7    50359–58028  618             836.47         XP_006594562.1       PREDICTED: protein IQ-DOMAIN 1 isoform X3 [G. max]
082N10  1    17803–23509  140             618.01         XP_003528467.1       PREDICTED: receptor-like protein kinase HSL1-like [G. max]
082N10  2    32283–33587  315             892.73         XP_003542715.1       PREDICTED: isoflavone 4′-O-methyltransferase [G. max]
082N10  3    41385–43303  520             2507.34        AF195798.1           Isoflavone synthase 1 [G. max]
082N10  4    45310–45787  136             549.44         XP_003628691.1       Histone H3 [M. truncatula]
082N10  5    48318–52970  217             797.43         XP_003542712.1       PREDICTED: ras-related protein RABA1f-like [G. max]
082N10  6    55123–57046  267             870.43         XP_003541718.1       PREDICTED: ethylene-responsive transcription factor ERF091-like [G. max]
082N10  7    85309–87655  678             2261.92        XP_003593704.1       Glutamate receptor 2.7 [M. truncatula]
136E13  1    2284–5199    380             1311.44        XP_006594207.1       PREDICTED: WUSCHEL-related homeobox 9-like [G. max]
136E13  2    7044–14245   852             3177.32        XP_003541454.1       PREDICTED: probable leucine-rich repeat receptor-like protein kinase At5g49770-like [G. max]
136E13  3    20916–25913  575             1575.38        XP_003593674.1       hypothetical protein MTR_2g014890 [M. truncatula]
136E13  4    37783–38487  150             461.44         XP_003593638.1       hypothetical protein MTR_2g014430 [M. truncatula]
136E13  5    50391–53181  516             1336.02        AF195798.1           Isoflavone synthase 1 [G. max]

sequence (Table 5). Despite similar coordinates of targeted sequences, IFS BAC-derived superscaffolds differed in the total score of the constructed sequence alignments, reflecting distinct levels of sequence conservation. Namely, 062L01 and 136E13 exhibited higher similarity than 082N10 to loci at A. duranensis chromosome 10 (Ad_10), Ai_10, Ca_01, Cc_scaffold000401, Lj_06, Mt_02, Pv_06 and Vr_10, none of which contains IFS sequences. On the other hand, 082N10 showed a higher level of synteny than 062L01 and 136E13 to loci at Ad_03, Ai_03, Ca_06, Gm_07, Gm_13, Lj_04, Mt_04, and Pv_03, all of

3.4. The region carrying LangIFS3 revealed higher sequence conservation than those of LangIFS1 and LangIFS2

Comparative mapping of repeat-masked IFS BAC-derived superscaffolds to genome assemblies of nine legume species revealed clear patterns of shared cross-genera synteny for all three L. angustifolius regions studied. In general, all three superscaffolds showed blocks of conserved collinearity to the same regions in legume genomes, including those carrying one or more IFS copies and those lacking any IFS

Fig. 1. Visualization of sequence annotation of IFS BAC-derived superscaffolds. Particular elements are marked by color: navy blue, superscaffold sequences 062L01, 082N10 and 136E13; black, repetitive elements; green, genes; red, coding sequences; blue, L. angustifolius transcriptome reads. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


In the V. radiata genome assembly, one IFS sequence was localized in a scaffold whereas the other was on a chromosome (Table 6). Transcriptome mining revealed the presence of IFS homologs in Apios americana (1), Astragalus membranaceus (1), Astragalus propinquus (1), Bituminaria bituminosa (1), Cladrastis lutea (1), Gompholobium polymorphum (1), Lathyrus sativus (1), L. luteus (2), L. albus (2), and Xanthocercis zambesiaca (Table 6). We did not identify any IFS sequences in the transcriptome assemblies of Cercideae (Bauhinia tomentosa, Cercis canadensis), Detarieae (Copaifera officinalis) and Mimosoid/MCC (Acacia argyrophylla, Desmanthus illinoensis, Gleditsia sinensis, Gleditsia triacanthos, Gymnocladus dioicus, Senna hebecarpa) clades. IFS fragments (too truncated to be included in the phylogenetic assay) were identified for L. polyphyllus and Codariocalyx motorius. Coding sequence (CDS) lengths varied from 1545 to 1590 nt, except for 9 sequences (BbIFS1, CcIFS1, CcIFS2, LalbIFS2, LjIFS2, PvIFS3, PvIFS4, TpIFS1 and TpIFS2), which were shorter. B. vulgaris IFS sequences from GenBank, BvIFS1 (AF195816.1) and BvIFS2 (AF195817.1) [24], were aligned to the genome assemblies of B. vulgaris and G. max. No IFS homolog was identified in the B. vulgaris genome (Table 7). However, these sequences matched IFS homolog Glyma13g24200.1 at chromosome Gm13 with high confidence (100% coverage, more than 99% identity). V. radiata IFS sequences from GenBank (AF195806.1-AF195809.1) were also aligned to the genomes of the source species and G. max. These sequences matched two IFS homologs in the V. radiata genome assembly (VrIFS1, VrIFS2); however, percentage nucleotide identity values were relatively low (∼85–86%). Alignments constructed for sequences from the G. max genome had considerably higher sequence identity (above 99.5%). Therefore, sequences AF195816.1, AF195817.1 and AF195806.1-AF195809.1 were considered as allelic variants of the GmIFS2 gene, Glyma13g24200.1.
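The reasoning used above to reassign database accessions can be expressed as a simple decision rule: an accession is suspicious when it matches a foreign genome almost perfectly while matching the genome of its nominal source species poorly or not at all. The Python sketch below is a hypothetical illustration of that rule; the function name and both thresholds are assumptions, not the criteria applied by the authors, and the example identities are approximate values quoted in the text.

```python
def flag_cross_contamination(identity_source, identity_other,
                             high_identity=99.0, min_gain=5.0):
    """Flag an accession whose best match lies in another species' genome.

    identity_source: % identity of the best hit in the genome of the species
                     the accession is attributed to (None if no homolog found).
    identity_other:  % identity of the best hit in the comparison genome.
    high_identity:   illustrative cut-off for a near-perfect match (assumed).
    min_gain:        illustrative margin in percentage points (assumed).
    """
    if identity_other < high_identity:
        return False
    if identity_source is None:
        return True
    return identity_other - identity_source >= min_gain


# B. vulgaris accession: no homolog in the B. vulgaris genome, >99% to G. max.
print(flag_cross_contamination(None, 99.1))   # True
# V. radiata accessions: about 86% to V. radiata, above 99.5% to G. max.
print(flag_cross_contamination(86.0, 99.5))   # True
# A sequence that matches its own genome best is not flagged.
print(flag_cross_contamination(99.8, 85.0))   # False
```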
Based on the sequence alignments, Pisum sativum PsIFS2 sequence (AAF34533.1) was also considered as G. max cross-contamination and removed from further analyses. The set of sequences used for phylogenetic inference is provided in the Supplementary Table 7. The topology of majority rule consensus of 12502 trees derived from mRNA sequence alignment highlighted the recently resolved phylogeny of Papilionoideae, with the early diverging ADA and Cladrastis clades localized at ancestral nodes [25]. Arachis, Astragalus, Cicer, Cajanus, Glycine, Lotus, Lupinus, Medicago, and Trifolium spp. IFS sequences constituted monophyletic clades. Branches leading to Lupinus and Arachis clades were the longest. Moreover, the branch lengths in the Lupinus clade were much longer than those found in the rest of the tree, suggesting a faster rate of evolution. P. vulgaris, V. radiata and V. angularis IFS copies formed two clades – one carrying three P. vulgaris and single V. radiata/V. angularis sequences, and the second carrying single IFS copies from each species (Fig. 6). This indicates that one duplication putatively occurred before the divergence of Phaseolus and Vigna lineages.

Table 3
The mutual similarity between the coding sequences of IFS genes.

                   LangIFS2 136E13    LangIFS3 082N10
LangIFS1 062L01    85a 92b (96)c      79 79 (81)
LangIFS2 136E13                       77 76 (88)

a Nucleotide sequence identity.
b Amino acid sequence identity.
c Amino acid sequence similarity.
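For readers unfamiliar with how identity values such as those in Table 3 are obtained, the Python sketch below computes percent identity from two already aligned sequences. It is a generic illustration under the assumption that gapped columns are excluded from the comparison; the function name and the toy sequences are hypothetical, and the software settings actually used for Table 3 are not restated here.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (not real IFS sequence): 7 ungapped columns, 6 of them identical.
identity = percent_identity("ATGGCA-T", "ATGACATT")
print(round(identity, 1))
```

Other conventions exist (for example, counting gaps as mismatches), so identity values are comparable only when the same convention is applied to every pair.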

which encode at least one copy of an IFS gene. To summarize, 082N10 matched more closely to regions carrying IFS sequences than 062L01 and 136E13, whereas the matches with greater similarity to 062L01 and 136E13 did not contain IFS genes. Moreover, comparison of total score values of sequence alignments constructed for all these syntenic regions demonstrated that 082N10 has a higher level of sequence conservation than 062L01 and 136E13. Besides these syntenic blocks, some additional collinearity links were observed, located at Gm_10 and Gm_15 for 062L01 and 136E13, as well as at Cc_02, Pv_02, Vr_07 for 082N10. None of these regions has an annotated IFS sequence. Surprisingly, two regions from V. radiata carrying IFS sequences (from chromosome 6 and scaffold 178) were not syntenic to any of the L. angustifolius BACs analyzed. The same phenomenon was observed for C. cajan scaffolds encoding IFS genes. Legume regions syntenic to L. angustifolius IFS superscaffolds but not possessing any IFS homologs were inspected for the possibility that the absence of an IFS gene was due to assembly gaps. No such gap was revealed, but two short nucleotide tracts were ambiguous in Lj06 (2.9 kb) and in Vr10 (1.5 kb). A deeper insight into microsynteny revealed that the patterns of collinearity of Cc_02, Gm_10, Pv_02, and Vr_07 chromosome regions were observed only at one side of the L. angustifolius IFS loci and diminished before reaching the IFS gene sequence. In contrast, syntenic regions of Ad_10, Ai_10, Ca_01, Gm_15, Lj_06, Mt_02, Pv_06 and Vr_10 chromosomes revealed a considerable level of DNA collinearity at both sides of the L. angustifolius IFS loci. However, no significant sequence similarity to the IFS gene was revealed in these regions (e-value threshold 1e-10), indicating the lack of any truncated, pseudogenized IFS copy. Taking into consideration the average total lengths of shared syntenic regions, L. angustifolius sequences were the shortest, whereas those of L.
japonicus the longest. The average ratio of syntenic region lengths between analyzed legume species and L. angustifolius varied from 2.82 for G. max to 6.04 for L. japonicus, indicating that, during evolution, either L. angustifolius sequences experienced numerous minor deletions or the syntenic legume regions experienced insertions. Visualization of syntenic links between L. angustifolius IFS BAC-derived superscaffolds and genome regions of other legumes is presented in Fig. 5. A list of inter-genus syntenic loci with alignment data is provided in Supplementary Table 5.

3.6. L. angustifolius IFS proteins revealed the presence of typical motifs determining their biological functionality

Isoflavone synthase protein sequence alignment was screened for the presence of conserved motifs (Supplementary Table 8). An endoplasmic reticulum (ER) targeting sequence (LLELAIGLVVLALFLHLR) [84–86] was found in all IFS sequences except those truncated at the N-terminus, namely CcIFS2, BvIFS1, BvIFS2, PvIFS4. Numerous amino acid substitutions were identified in this region, especially in Arachis and Lupinus spp. The truncated region in PvIFS4 also encompassed two segments located after the ER leader. The ER-targeting motif was followed by a region rich in basic amino acid residues (KSKALRH) [85–87], which were revealed to be highly conserved among all analyzed species excluding lupins. Indeed, considerable alterations in this sequence were found in LlutIFS1 and LlutIFS2 (two basic residues preserved), LangIFS1 and LalbIFS1 (three), and LangIFS2 (four basic amino acids present but localized at different positions). Immediately adjacent

3.5. All Lupinus spp. IFS genes form a monophyletic clade

Functional annotation of eleven legume genome assemblies resulted in identification of IFS gene sequences in all species analyzed: one copy in A. ipaensis, two copies in A. duranensis, C. arietinum, C. cajan, G. max, V. angularis and V. radiata, three in L. japonicus, M. truncatula, T. pratense and four in P. vulgaris. A list of loci identified in legume genome sequences by BLAST analysis of IFS gene sequences, with alignment data, is provided in Supplementary Table 6. In seven species, namely A. duranensis, C. arietinum, L. japonicus, M. truncatula, P. vulgaris, T. pratense and V. angularis, all IFS copies were tandem duplicates. IFS sequences in G. max were located on different chromosomes. Both C. cajan IFS copies were found in scaffolds unassigned to any chromosome.


Fig. 2. Cytogenetic and linkage mapping of IFS genes in the L. angustifolius genome. A: representation of three linkage groups of the L. angustifolius genetic map [26] containing IFS loci. B: localization of selected BAC clones in L. angustifolius mitotic chromosomes using FISH. NLL – narrow-leafed lupin linkage groups. Chromosomes were counterstained with DAPI (blue). Arrows indicate signals of the BAC clones representing: LangIFS1 (clone 062L01 labeled with digoxigenin-11-dUTP) – green signals, LangIFS2 (clone 136E13, tetramethylrhodamine-5-dUTP) – red signals, and LangIFS3 (clone 082N10, both dyes) – yellow signals. Bar: 5 μm.

[90]. This position was revealed to be preserved in all analyzed IFS sequences except TpIFS2 (K to Q substitution). About 110 amino acids before the C-terminus the highly conserved motif among P450 sequences (PERF) with unknown function exists [89], and such a motif was localized in all analyzed IFS sequences but PvIFS3. A heme-iron ligand signature (FGSGRRMCPG) [91] was found to be present in all IFSs except PvIFS3 and PvIFS4. Moreover, M to I substitution was revealed in TpIFS2. Multiple sequence alignment of IFS proteins with indicated conserved loci is shown in Supplementary Table 8.

to this region, a small proline-rich sequence (PNPPSPKP) [87] has been identified. Although this region was shown to be highly conserved, substitution of the fifth proline by leucine was observed in LangIFS1, LalbIFS1, LangIFS2, LlutIFS1 and LlutIFS2. The fourth proline was lacking in BbIFS1. Close to the middle of the IFS protein sequence, another highly conserved block has been annotated, known as “I helix” (FSAGTDST) [88,89]. This sequence was identified in all IFS proteins except LalbIFS2. Exactly 66 amino acids after the I helix, a lysine should be present, which is a key residue essential for IFS enzymatic activity


Fig. 3. Expression of IFS encoding genes in 28-day-old roots (R), leaves (L), stalks (S), root nodules (N) and young developing pods (P). Three technical replicates were assayed. The standard deviation is presented as error bars. Expression level and standard deviation values are provided in Supplementary Table 4.

quantification. For example, it was previously reported that three IFS homologs exist in the L. japonicus genome: two transcriptionally active copies and one truncated pseudogene with transcription pattern not analyzed [93]. Mining of the released whole genome sequence [39] and gene atlas [77], conducted in this study, evidenced that all three homologs are, in fact, full-length, transcriptionally active genes. It should be emphasized that estimation of gene copy number using transcriptome data is not as effective as using genome sequences, because some copies may not be expressed. Moreover, similar homologs may be incorrectly combined into one copy during the process of sequence assembly as highly similar reads from closely related gene pairs are ambiguous [94]. This is much more unlikely to happen in genome sequences where introns help to distinguish homologs. Taking into consideration these possibilities, we decided to anchor the copy number analysis in genome-based datasets and use RNA-derived data only as references during the annotation procedure. The exploitation of genome assembly enabled us to identify putative misannotations in public databases for IFS sequences from B. vulgaris and V. radiata. Two B. vulgaris IFS sequences AF195816.1 and AF195817.1 [24] turned out to be very similar to G. max (99.1% and 99.5%) whereas no sequence statistically similar to those two was found in the recently published B. vulgaris genome [54]. The same phenomenon was observed for four V. radiata IFS accessions (AF195806.1-AF195809.1). The analysis of syntenic blocks and gene-based phylogenetic inference allowed us to present the hypothesis that there was a single ancestral IFS gene which later, during the evolution of legumes, underwent lineage-specific duplications. Topology of the tree indicates that one ancient duplication had putatively occurred in the Lupinus lineage before divergence to L. angustifolius and L. 
albus, as two pairs of IFS sequences are located at sister branches. Additional IFS sequences in L. angustifolius and L. luteus might be considered as remnants of a second duplication, which might have occurred in parallel after divergence of these species. Similarly, one IFS duplication putatively has occurred before the divergence of P. vulgaris and V. radiata, and the second afterward, in the P. vulgaris lineage. Localization of L.

3.7. L. angustifolius IFS sequences underwent strong purifying selection

According to the topology of the majority consensus tree, 20 pairs of duplicated sequences were selected, including those located at sister branches and those originating from different clades. The nonsynonymous to synonymous substitution rate (Ka/Ks) ratio analysis revealed that all pairs except PvIFS3/PvIFS4 were under strong purifying selection, with Ka/Ks values from 0.04 to 0.37 (Table 8). It should be noted that the PvIFS4 sequence was found to be considerably truncated at the N-terminus, lacking several conserved motifs (Supplementary Table 8). The faster rate of IFS gene evolution in lupins was highlighted by the higher Ks between LangIFS2/LangIFS3 than LangIFS1/LangIFS2. To address the selection pressure in a wider phylogenetic context, a branch-site test of positive selection was performed for three monophyletic clades, namely Arachis, Lotus and Lupinus. The analysis detected a statistically significant likelihood of positive selection in the Lupinus branch (P = 0.019) while the p-value for the Arachis branch was 0.064. Empirical Bayes analysis [92] revealed high probability (98%) of positive selection in Lupinus at position 298 (E to P substitution) and moderate but not statistically significant (79%) probability of such pressure at position 494 (V to C). Two amino acid positions (159 and 429) had a moderate probability (80–82%) of being under positive selection in Arachis (R to K). Results of branch-site analysis of positive selection are provided in Supplementary Table 9.

4. Discussion

4.1. IFS genes in legumes have evolved by lineage-specific whole-genome and tandem duplications

Isoflavone synthase genes constitute a small family, ranging from one gene in red clover to 4 genes in mung bean [22,24]. On average, three copies per genome exist [23], as was the case in the present survey of the L. angustifolius genome.
The availability of high-quality legume genome sequences considerably facilitated IFS copy number

Table 4
IFS gene expression verification by transcriptome sequence alignments.

Query   Subject                               Score     E-value  Identities  Coverage  Verification
AdIFS1  comp9907_c0_seq1                      2666.67   0        97.30%      100.00%   −
AdIFS2  comp9907_c0_seq1                      2682.9    0        97.60%      100.00%   +
AiIFS1  comp18282_c1_seq1, comp18282_c0_seq1  2786.974  0        100.00%     98.22%    +
CaIFS1  Cicar_201201_TA040564                 2836.18   0        100.00%     100.00%   +
CaIFS2  Cicar_201201_TA006129                 2578.3    0        96.70%      100.00%   +
CcIFS1  lista_cajca-201012|000142             2649.917  0        99.80%      100.00%   +
CcIFS2  lista_cajca-201012|000142             2260.91   0        96.90%      100.00%   −
PsIFS1  PsCam025494                           597       1E-122   81.54%      75.56%    −
PsIFS2  PsCam025494                           2831      0        98.01%      99.24%    +
VrIFS1  SRX598035                             −         −        100.00%     90.79%    +
VrIFS2  SRX598035                             −         −        94.39%      80.78%    +


Fig. 4. Organ-specific profiles of IFS gene expression (roots, leaves, stems, nodules, petioles, flowers, pods and seeds) derived from microarray and RNA-seq databases of G. max (A), L. japonicus (B), M. truncatula (C) and P. vulgaris (D) [76–79,81]. Assignment to particular groups based on two factor analysis of variance and testing contrasts between means was indicated by small letters above the bars. Calculated values are provided in Supplementary Table 4.

G. max, L. japonicus and M. truncatula revealed the presence of macrosynteny blocks, well-conserved but highly dispersed among these plant species [40]. As the IFS copies in M. truncatula, L. japonicus, P. vulgaris and C. cajan are tandemly duplicated with close localization of particular copies, synteny-based inference may not provide reliable indications in this case. The evolutionary history of IFS homologs in these species remains unclear. It is well-known that WGDs substantially contributed to the evolutionary divergence of legumes. Ancient WGD occurred in the ancestral line of Papilionoideae and the traces of this event can be found in existing clades including the dalbergioids (Arachis spp.), genistoids (e.g. L.

angustifolius and G. max duplicated IFS loci in different chromosomes within segments of well-preserved collinearity provided evidence for the contribution of whole-genome duplications (WGDs) in evolution of IFS genes in these species. A similar conclusion has been reached for the Cupin gene family, where all the segmental duplications of the G. max Cupin genes occurred around 13 mya [95]. A genome-wide soybean survey revealed that duplicated genes were often found in syntenic blocks, and some blocks of duplicated genes were co-regulated [96]. Synteny-based approaches can support the reconstruction of the evolutionary changes of particular genes located in conserved blocks. The genome comparison of V. radiata with A. thaliana, C. arietinum, C. cajan,


Table 5
Regions of shared synteny identified for L. angustifolius IFS BAC-derived superscaffolds in representatives of main Papilionoideae lineages. Columns: legume chromosome; IFS copies in syntenic region; BAC-derived superscaffold; total score of sequence alignments; total length of BAC syntenic region; total length of legume syntenic region; ratio of syntenic region lengths; average ratio of syntenic region lengths for the species (given in the first row of each species).

Chromosome  IFS copies  Superscaffold  Total score  BAC length  Legume length  Ratio  Species average ratio
Ad_03       2           062L01         4049.1       60735       181696         2.99   A. duranensis 3.62
Ad_03       2           082N10         10998.1      89890       391862         4.36
Ad_03       2           136E13         2649.3       79617       498419         6.26
Ad_10       0           062L01         7921.4       44849       118775         2.65
Ad_10       0           082N10         4103.1       73678       193789         2.63
Ad_10       0           136E13         7109.8       69217       127009         1.83
Ai_03       1           062L01         3644.3       58187       167506         2.88   A. ipaensis 4.47
Ai_03       1           082N10         10469.6      88491       456895         5.16
Ai_03       1           136E13         2173.4       78198       579629         7.41
Ai_10       0           062L01         7894.9       44835       151923         3.39
Ai_10       0           082N10         4039.9       73678       226021         3.07
Ai_10       0           136E13         7747.0       78537       305216         3.89
Ca_01       0           062L01         8416.4       46568       162650         3.49   C. arietinum 2.98
Ca_01       0           082N10         4550.6       93892       235944         2.51
Ca_01       0           136E13         8838.8       78185       298223         3.81
Ca_06       2           062L01         3024.1       45966       140452         3.06
Ca_06       2           082N10         13331.9      92353       311079         3.37
Ca_06       2           136E13         2916.8       49509       64593          1.30
Cc_02       0           082N10         5565.1       39373       80881          2.05   C. cajan 3.41
Cc_s000401  0           062L01         8717.8       38716       161051         4.16
Cc_s000401  0           082N10         3610.6       92439       336797         3.64
Cc_s000401  0           136E13         10247.8      67207       232110         3.45
Gm_07       1           062L01         3061.8       58210       160133         2.75   G. max 2.82
Gm_07       1           082N10         12273.2      93972       333044         3.54
Gm_07       1           136E13         1406.9       49510       47523          0.96
Gm_10       0           062L01         3415.3       20269       45162          2.23
Gm_10       0           136E13         6074.0       25694       67111          2.61
Gm_13       1           062L01         12299.0      58857       197491         3.36
Gm_13       1           082N10         16582.3      187979      605422         3.22
Gm_13       1           136E13         10717.3      126784      332950         2.63
Gm_15       0           062L01         3021.6       22163       58122          2.62
Gm_15       0           136E13         1304.2       42112       89707          2.13
Lj_04       3           062L01         4646.8       48087       455766         9.48   L. japonicus 6.04
Lj_04       3           082N10         15314.9      88590       832277         9.39
Lj_04       3           136E13         2605.2       34686       352638         10.17
Lj_06       0           062L01         9904.8       51248       171093         3.34
Lj_06       0           082N10         3187.6       94019       203685         2.17
Lj_06       0           136E13         10128.3      79331       377781         4.76
Mt_02       0           062L01         10923.6      41896       205126         4.90   M. truncatula 3.95
Mt_02       0           082N10         5622.9       93892       321019         3.42
Mt_02       0           136E13         10202.7      78185       408626         5.23
Mt_04       3           062L01         4846.5       48089       177404         3.69
Mt_04       3           082N10         14921.6      88419       372152         4.21
Mt_04       3           136E13         3050.1       49508       95365          1.93
Pv_02       0           082N10         2607.0       24720       58423          2.36   P. vulgaris 3.68
Pv_03       3           062L01         4592.0       27871       148982         5.35
Pv_03       3           082N10         11514.2      48167       269975         5.60
Pv_03       3           136E13         3087.4       51236       180967         3.53
Pv_06       0           062L01         8744.2       50987       155899         3.06
Pv_06       0           082N10         4489.4       73704       195489         2.65
Pv_06       0           136E13         7628.4       78535       296066         3.77
Vr_07       0           082N10         9515.2       74413       429803         5.78   V. radiata 3.48
Vr_10       0           062L01         8230.3       50954       143912         2.82
Vr_10       0           082N10         3832.0       73697       230985         3.13
Vr_10       0           136E13         7661.2       73071       141625         1.94

Abbreviated species names are as follows: Ad, A. duranensis; Ai, A. ipaensis; Ca, C. arietinum; Cc, C. cajan; Gm, G. max; Lj, L. japonicus; Mt, M. truncatula; Pv, P. vulgaris; Vr, V. radiata.
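The ratio column of Table 5 is the total length of the legume syntenic region divided by the total length of the corresponding BAC syntenic region. The Python sketch below recomputes these ratios for the A. duranensis rows, using lengths transcribed from Table 5. The variable names are illustrative, and treating the species-level value as a pooled ratio (summed legume length over summed BAC length) is an assumption; it happens to reproduce the reported 3.62, which suggests but does not confirm how the average was calculated.

```python
# (superscaffold, total BAC syntenic length, total legume syntenic length),
# transcribed from the A. duranensis rows of Table 5.
a_duranensis = [
    ("062L01", 60735, 181696),   # Ad_03
    ("082N10", 89890, 391862),   # Ad_03
    ("136E13", 79617, 498419),   # Ad_03
    ("062L01", 44849, 118775),   # Ad_10
    ("082N10", 73678, 193789),   # Ad_10
    ("136E13", 69217, 127009),   # Ad_10
]


def length_ratio(bac_len, legume_len):
    """Legume syntenic length relative to the L. angustifolius (BAC) length."""
    return legume_len / bac_len


per_region = [round(length_ratio(bac, leg), 2) for _, bac, leg in a_duranensis]

# Pooled ratio: summed legume length over summed BAC length (assumed method).
total_bac = sum(bac for _, bac, _ in a_duranensis)
total_legume = sum(leg for _, _, leg in a_duranensis)
pooled = round(total_legume / total_bac, 2)

print(per_region)
print(pooled)
```

A ratio above 1 means that the legume region is longer than its L. angustifolius counterpart, which is the basis for the interpretation in terms of deletions in lupin or insertions in the other legumes.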

evidence was provided for several independent WGDs which occurred about 30–55 mya during early evolution of such legume lineages as Mimosoideae-Cassiinae-Caesalpinieae, Detarieae, Cercideae and Lupinus [25]. It is thought that the Lupinus lineage experienced a WGD event at the beginning of its evolution, putatively before the divergence of New World and Old World clades [25,37,103]. The distribution of duplicated L. angustifolius, L. luteus and L. albus IFS copies among several genus-specific branches supports the hypothesis of early WGD of lupin ancestor. In lupins, WGD contributed to the evolution of the

angustifolius), galegoids (M. truncatula, L. japonicus, C. arietinum), millettioids (P. vulgaris, G. max, C. cajan, V. radiata), Xanthocercis and Cladrastis [25,97–100]. This WGD, dated to about 44–65 mya, directly preceded the divergence of ancient lineages of Papilionoideae [25,44,97,101,102]. Galegoid and millettioid clades separated ∼54 mya [102]. The analysis of V. radiata synteny blocks revealed that there was a single major peak of Ks frequency with a modal value at 0.61, corresponding to the modal age of 59 mya [40]. This implies that V. radiata has experienced only one ancient WGD. Recently, molecular


Fig. 5. Intergenus microsyntenic patterns revealed for narrow-leafed lupin IFS clones and corresponding regions of legume chromosomes. L. angustifolius genomic sequences and homology links are marked by colors: blue, BAC clone 062L01; orange, 082N10; red, 136E13. Legume chromosomes are drawn in green. Abbreviated species names are provided as follows: Ad, A. duranensis; Ai, A. ipaensis (A); Ca, C. arietinum; Cc, C. cajan (B); Gm, G. max (C); Lj, L. japonicus; Mt, M. truncatula (D); Pv, P. vulgaris; Vr, V. radiata (E). IFS loci are indicated by black bars. Ribbons visualize homologous links identified by DNA sequence alignments.


Table 6
IFS genes identified in sequenced legume genomes or in sequence databases.

IFS gene   Species          Chromosome         CDS start  CDS end    CDS length  Accession number
AdIFS1     A. duranensis    Ad_03              118668435  118670052  1545        Aradu.9F1DZ.1
AdIFS2     A. duranensis    Ad_03              119987650  119989267  1551        Aradu.TCQ2L.1
AiIFS1     A. ipaensis      Ai_03              120993580  120995225  1572        Araip.0P3RJ.1
CaIFS1     C. arietinum     Ca_06              17441820   17443498   1572        Ca_06357
CaIFS2     C. arietinum     Ca_06              17446579   17448263   1557        Ca_06358
CcIFS1     C. cajan         Cc_Scaffold116605  8          868        1473        C.cajan_47755
CcIFS1     C. cajan         Cc_Scaffold114123  8          491                    –
CcIFS1     C. cajan         Cc_Scaffold112026  30         352                    –
CcIFS2     C. cajan         Cc_Scaffold133626  11582      29813      1302        C.cajan_42529
GmIFS1     G. max           Gm_07              37261238   37263018   1563        Glyma07g32330.1
GmIFS2     G. max           Gm_13              27567360   27569058   1563        Glyma13g24200.1
LjIFS1     L. japonicus     Lj_04              8253179    8255046    1554        chr4.CM0432.2900.r2.m
LjIFS2     L. japonicus     Lj_04              8513507    8515158    1506        chr4.CM0432.3150.r2.m
LjIFS3     L. japonicus     Lj_04              8545556    8547234    1566        chr4.CM0432.3190.r2.m
MtIFS1     M. truncatula    Mt_04              34754334   34756142   1566        Medtr4g088160.1
MtIFS2     M. truncatula    Mt_04              34762316   34764118   1569        Medtr4g088170.1
MtIFS3     M. truncatula    Mt_04              34779994   34781875   1566        Medtr4g088195.1
PvIFS1     P. vulgaris      Pv_03              11401344   11403003   1566        Phvul.003G074000.1
PvIFS2     P. vulgaris      Pv_03              6352552    6354399    1563        Phvul.003G051700.1
PvIFS3     P. vulgaris      Pv_03              6363755    6365415    1443        Phvul.003G051800.1
PvIFS4     P. vulgaris      Pv_03              6376583    6377796    969         –
TpIFS1     T. pratense      TGAC_v2_LG3        14172303   14174057   1500        gene14926
TpIFS2     T. pratense      TGAC_v2_LG3        14181669   14185999   1476        gene14945
TpIFS3     T. pratense      TGAC_v2_LG3        14195402   14197086   1575        gene14936
VaIFS1     V. angularis     CM003371.1         15581861   15583706   1563        Vang0349s00160
VaIFS2     V. angularis     CM003371.1         15604257   15605912   1566        Vang0126s00350
VrIFS1     V. radiata       Vr_scaffold_178    398067     399856     1563        Vradi0178s00030
VrIFS2     V. radiata       Vr_06              13679748   13681411   1566        Vradi06g07980
AaIFS1     A. americana     –                  –          –          1548        NXOH-2004052
AmIFS1     A. membranaceus  –                  –          –          1575        HJMP-2075983
ApIFS1     A. propinquus    –                  –          –          1575        MYMP-2062414
BbIFS1     B. bituminosa    –                  –          –          1431        TVSH-2001926
BvIFS1a    B. vulgaris      –                  –          –          1470        AF195816.1
BvIFS2a    B. vulgaris      –                  –          –          1473        AF195817.1
ClIFS1     C. lutea         –                  –          –          1590        SLYR-2023995
GpIFS1     G. polymorphum   –                  –          –          1587        VLNB-2022286
LalbIFS1   L. albus         –                  –          –          1551        LAGI01_26153
LalbIFS2   L. albus         –                  –          –          1491        LAGI01_27229
LlutIFS1   L. luteus        –                  –          –          1551        FJ539089.1
LlutIFS2   L. luteus        –                  –          –          1560        FJ539090.1
LsIFS1     L. sativus       –                  –          –          1575        KNMB-2061016
PsIFS1     P. sativum       –                  –          –          1572        AAQ10282.2
PsIFS2a    P. sativum       –                  –          –          1563        AAF34533.1

a Based on the sequence alignments and phylogenetic inference, these sequences were considered as G. max cross-contamination and removed from further analyses.

copies could be acquired after gene duplication, in the processes of neofunctionalization, sub-functionalization or pseudogenization [104–106]. Such a mechanism might result in evolutionary divergence, as functional diversification of the surviving duplicated genes is a phenomenon typically observed in the long-term evolution of polyploids [104]. Soybean is a good model for functional and evolutionary tracking of duplicated genes, as the genome of this species contains 46.4 thousand protein-coding sequences, of which more than 31 thousand are paralogs [41]. A comprehensive transcriptome survey revealed that approximately 50% of G. max paralogs had undergone expression sub-functionalization, whereas only a small proportion of the duplicated genes have been neo-functionalized or non-functionalized [96]. In the present study, gene expression profiling supplemented by in silico RNA-seq and microarray data mining delivered clear lines of evidence that the vast majority of analyzed legume IFS copies are actively transcribed. To provide insight into the hypothetical functional activity of the expressed proteins, the presence of conserved motifs and amino acid residues was assayed. Most of these regions are essential for the biological functionality of the protein, involving correct cellular compartment localization, structural folding or enzymatic activity. Theoretically, all analyzed IFS proteins may be catalytically active, as they have preserved a crucial residue of the IFS amino acid chain: a lysine obligatory for aryl migration of the flavanone molecule to produce an isoflavone skeleton [90]. However, PvIFS4 lacks several important motifs and is probably a pseudogene. This hypothesis is

phosphatidylethanolamine binding protein family [46], chalcone isomerase, chalcone isomerase like, and fatty acid binding protein genes [37], as well as early nodulin 40, nodulin 26-like intrinsic protein, phosphoenolpyruvate carboxylase, and glutamine synthetase genes [35]. The analysis of transversion rates at fourfold degenerate sites in galegoid genome sequences tentatively estimated the divergence of C. arietinum from L. japonicus ∼20–30 mya and from M. truncatula ∼10–20 mya [44]. It should be noted that some legume WGDs happened relatively recently, e.g. ∼13 mya in G. max and just a few mya in Arachis [25,41]. Precise dating of duplicated copies could help to decipher the evolutionary mechanism underlying tandem duplications of IFS genes in several legume lineages. However, the molecular clock substitution rates used to calculate divergence times are not consistent across the whole family. It has been shown that Arachis has accumulated synonymous substitutions at a rate ∼1.4 times faster than G. max [38]. Such studies therefore require precise molecular clock calibration.
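The sensitivity of duplication dating to the assumed substitution rate can be illustrated with a short calculation. This is a minimal sketch, assuming the commonly used relation T = Ks / (2r) between synonymous divergence and time since duplication. The function names are hypothetical, and the rate is simply back-calculated from the V. radiata values quoted above (modal Ks of 0.61 at 59 mya); it is not taken from the cited studies.

```python
def implied_rate(ks: float, age_mya: float) -> float:
    """Synonymous substitutions per site per year implied by a Ks value and an age in mya."""
    return ks / (2.0 * age_mya * 1e6)


def divergence_time_mya(ks: float, rate: float) -> float:
    """Time since duplication in million years, assuming T = Ks / (2 * rate)."""
    return ks / (2.0 * rate) / 1e6


# Assumption: rate back-calculated from the V. radiata Ks peak (0.61) and modal age (59 mya).
base_rate = implied_rate(0.61, 59.0)

# Hypothetical lineage accumulating synonymous substitutions 1.4 times faster,
# mirroring the Arachis versus G. max contrast mentioned in the text.
fast_rate = 1.4 * base_rate

for label, rate in (("base rate", base_rate), ("1.4x rate", fast_rate)):
    print(f"{label}: Ks = 0.61 -> {divergence_time_mya(0.61, rate):.1f} mya")
```

Under these assumptions, the same Ks value yields an age shorter by a factor of 1.4 in the faster-evolving lineage, which is why a single rate cannot be applied across the whole family.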

4.2. Despite numerous duplications legume IFS proteins preserved crucial amino acid residues conferring enzymatic activity

Several evolutionary scenarios are possible when multiple copies of the same gene appear in the genome. Different functions of homologous gene


(Ka/Ks) ratio performed for 20 gene pairs revealed that legume IFS genes were under strong purifying selection. Such a finding concurs with the general observation that upstream enzymes evolve more slowly than downstream ones and are subjected to purifying selection, whereas downstream genes exhibit relaxed constraints [109]. Similar observations were made for carotenoid and terpenoid biosynthetic pathways [110,111]. The correlation between position in a pathway and evolutionary rate has been found in numerous studies and several hypotheses accounting for the explanation of this relationship have been put forward. One of the possible explanations addresses the issue of uneven distribution of flux control along the enzymatic pathway. Natural selection may act discriminatively on enzymes with high flux control, which are usually located at the beginning of the pathway, as was observed for the glucosinolate pathway in Arabidopsis [112]. The other explanation is that amino acid substitutions in upstream enzymes are not likely to be fixed due to large deleterious consequences since they affect more end products than those in downstream enzymes [113]. Moreover, it was observed that soybean gene copies involved in isoflavonoid synthesis, i.e. chalcone reductase, chalcone synthase, chalcone isomerase, as well as IFS, have experienced purifying selection [23]. Thus, the conclusion was drawn that the enzymes in the isoflavonoid pathway have undergone convergent evolution and were under similar selection pressures [23]. Our branch-site study revealed the presence of positive selection marks in the Lupinus lineage. Indeed, such positive selection pressure might have occurred during early evolution of the Lupinus lineage, i.e. before the WGD event. After duplication, newly arisen IFS paralogs were putatively preserved from further amino acid substitutions by strong purifying selection.

Table 7 Data on nucleotide alignments generated for B. vulgaris (AF195816.1 and AF195817.1) and V. radiata (AF195806.1-AF195809.1) IFS gene bank entries by mapping to source and G. max genome assemblies.

| Query | Matched sequence | Total coverage | Total identity | Total score | Min. evalue |
|---|---|---|---|---|---|
| AF195806.1 | GmIFS2 | 100.00% | 99.59% | 2790.58 | 0 |
| | GmIFS1 | 100.00% | 92.84% | 2318.10 | 0 |
| | VrIFS1 | 99.80% | 85.74% | 1829.38 | 0 |
| | VrIFS2 | 99.68% | 82.35% | 1594.95 | 0 |
| AF195807.1 | GmIFS2 | 100.00% | 99.66% | 2795.99 | 0 |
| | GmIFS1 | 100.00% | 92.91% | 2321.71 | 0 |
| | VrIFS1 | 100.00% | 86.04% | 1843.81 | 0 |
| | VrIFS2 | 99.68% | 82.56% | 1609.37 | 0 |
| AF195808.1 | GmIFS2 | 100.00% | 99.72% | 2799.59 | 0 |
| | GmIFS1 | 100.06% | 92.97% | 2327.119 | 0 |
| | VrIFS1 | 100.00% | 85.91% | 1834.792 | 0 |
| | VrIFS2 | 99.68% | 82.43% | 1600.357 | 0 |
| AF195809.1 | GmIFS2 | 100.00% | 99.77% | 2805 | 0 |
| | GmIFS1 | 100.00% | 93.03% | 2330.719 | 0 |
| | VrIFS1 | 100.00% | 86.03% | 1838.402 | 0 |
| | VrIFS2 | 99.68% | 82.55% | 1603.964 | 0 |
| AF195816.1 | GmIFS2 | 100.00% | 99.19% | 2595.82 | 0 |
| | GmIFS1 | 100.00% | 92.41% | 2148.577 | 0 |
| | Bvchr2.sca017 | 3.81% | 87.00% | 127.005 | 3.33E-08 |
| AF195817.1 | GmIFS2 | 100.00% | 99.53% | 2624.67 | 0 |
| | GmIFS1 | 100.00% | 92.76% | 2175.637 | 0 |
| | Bvchr2.sca017 | 3.80% | 87.00% | 127.005 | 3.34E-08 |

supported by the observation that the PvIFS3/PvIFS4 pair had a quasi-neutral Ka/Ks ratio, which may indicate that PvIFS4 does not encode a functional protein and as such has been released from selective pressure. The lack of the sequence directing protein deposition to the ER [84–86] observed in CcIFS2 may have resulted from low-quality genome assembly, as this sequence was found in a short unlinked scaffold. Substitutions observed in the ER-targeting sequence in Lupinus and Arachis spp. IFS proteins may reflect considerable evolutionary distance between these species and the other downstream lineages of Papilionoideae [25]. Moreover, ER leader sequences are quite variable, with little restriction placed on either the length or the primary sequence [107]. Some Lupinus spp. IFS copies also displayed variation in a region rich in basic amino acid residues as well as in a proline-rich motif. The first region is required for correct orientation of the protein in the ER membrane, whereas the second is crucial for correct protein folding and heme incorporation [85–87]. In a yeast protein model, substitution of only one of the proline residues in the region resulted in the complete loss of heme incorporation [87]. The hypothetical influence of the observed substitution of the fifth proline by leucine on the formation of the correct conformation of Lupinus spp. P-450 molecules requires further investigation. The highly conserved I helix was found to be lacking in LalbIFS2. This motif was suggested to be involved in oxygen binding by P450s [88,89]. To summarize, several examples of sequence substitutions in vital regions of duplicated IFS genes were identified, raising questions as to the impact of these polymorphisms on subcellular localization, conformation and enzymatic activity.
Some of these questions could be addressed by genetic engineering of lupins targeting particular IFS homologs, but such studies have been hampered by the lack of an efficient transformation system. Transgene insertion is possible in L. angustifolius; however, the plant regeneration and homozygous line selection procedure is technically demanding and excessively time consuming [108].

4.4. The issue of IFS duplicates sub-functionalization remains unsolved

To test the potential for transcriptional sub-functionalization of IFS genes, gene expression of particular homologs was surveyed in different organs, directly by semi-quantitative PCR in L. angustifolius and in silico by RNA data mining in several other legumes. Generally, it was observed that all lupin IFS copies and the vast majority of IFS homologs in other legume species are expressed, and the level of transcripts is the highest in roots. This fact is probably associated with either constitutive or induced exudation of isoflavonoids into the soil. It has been suggested, however, that the concentration of these compounds may strongly depend on the growth conditions and stress factors applied, so the issue of specificity of root-exuded flavonoids and their concentrations should be methodologically approached with great care. Nevertheless, a number of isoflavones (genistein, hydroxygenistein, luteone, wighteone, isowighteone, lupiwighteone, lupalbigenin) and their various glycosides are exuded by roots of white and yellow lupins as it has been reported in papers reviewed by Cesco et al. [114]. It is reasonable to state that root-borne isoflavonoids in leguminous plants are essential regulators influencing the rhizosphere microbial populations and thereby influencing such important aspects as rhizobial symbioses, mycorrhizas, and microbe dependent availability of nutrients including nitrogen, phosphorus, and iron, as well as micronutrients such as manganese and copper. The role of isoflavonoids in the plant-rhizobia molecular communication may explain the high level of expression of all three IFS genes in root nodules. The low level of IFS transcripts in leaves and stalks is probably related to the fact that in these organs isoflavonoid synthesis is most probably associated with lupin plant response to biotic stresses. 
The stress-induced expression of isoflavone synthase genes in leaves of narrow-leafed lupin has been investigated previously [115]; their transcription was enhanced when plants were treated with phytotoxin or spores of Colletotrichum lupini. The results of the in silico survey presented in Fig. 4 are in concurrence with earlier reports. It has been suggested for Medicago truncatula that only roots and root nodules express IFS genes under standard conditions [116]. It should be stressed, however, that the expression of

4.3. Selection constraints of IFS sequences correspond to their position in the metabolic pathway

The analysis of the nonsynonymous to synonymous substitution rate


Fig. 6. Majority rule consensus of 12502 trees found in a Bayesian analysis of 3 L. angustifolius and 39 other legume family IFS sequences. Numbers are posterior probabilities. 1629 nucleotide positions were included in the MAFFT alignment used for phylogenetic inference.

GmIFS2 increased during late stages of seed development while that of GmIFS1 remained constant [121]. It was demonstrated that soybean IFS genes respond differently to water deficit conditions: GmIFS1 transcript abundance was maintained at a constant level, whereas GmIFS2 was downregulated [122]. Indeed, a high correlation between expression level and isoflavone accumulation in both stress and control assays was shown for GmIFS2, but not for GmIFS1 [123]. Differential expression patterns of soybean IFS genes were also observed in response to nodulation as well as pathogen attack in soybean shoot and root tissues [119,120]. Besides the variability of gene expression levels between the two homologs, the diversity of IFS allelic variants may also have functional implications influencing the production of isoflavonoids. It has been reported for soybean that polymorphic forms of both GmIFS1 and GmIFS2 genes can be found in different genotypes [124] and are associated with the efficiency of isoflavone biosynthesis [125]. Emerging data from genomic and genetic studies on IFS – the key enzyme of this pathway – can positively influence both metabolic engineering

IFS genes, similarly to many other genes encoding enzymes of phenylpropanoid biosynthesis, is highly variable. It depends strongly upon environmental stimuli and stresses [116–118]. It was reported for Psoralea corylifolia that the PcIFS gene was expressed constitutively, at similar levels in various tissues [61]; however, the combined level of genistein and daidzein was significantly higher in roots than in other tissues. Moreover, the expression of this gene was enhanced, and the isoflavone levels increased, when the plant was treated with methyl jasmonate or salicylic acid, or was mechanically wounded. Identification of two soybean IFS genes [24] launched a wide range of follow-up research approaches which collectively provided significant support for the hypothesis of advanced functional divergence of duplicated IFS genes. The two genes encoding IFS in soybean are expressed primarily in the roots and seeds [119]; however, there are differences between homologs and tissue types. The highest GmIFS1 levels were observed in roots and the seed coat, while for GmIFS2 the highest levels were found in embryos and pods [120]. Additionally, the expression of


Table 8 Substitutions in IFS family paralogous comparisons.

| IFS sequence pair | Synonymous differences | Synonymous positions | Ks | Nonsynonymous differences | Nonsynonymous positions | Ka | Ka/Ks |
|---|---|---|---|---|---|---|---|
| AdIFS1, AdIFS2 | 16.0 | 362.3 | 0.046 | 17.0 | 1182.7 | 0.015 | 0.319 |
| CaIFS1, CaIFS2 | 93.5 | 362.8 | 0.316 | 54.5 | 1194.3 | 0.047 | 0.149 |
| CcIFS1, CcIFS2 | 0.0 | 274.7 | 0.000 | 0.0 | 928.3 | 0.000 | – |
| GmIFS1, GmIFS2 | 93.5 | 365.5 | 0.313 | 15.5 | 1197.5 | 0.013 | 0.042 |
| LalbIFS1, LalbIFS2 | 169.0 | 335.5 | 0.835 | 164.0 | 1140.5 | 0.160 | 0.191 |
| LangIFS1, LangIFS2 | 109.0 | 352.5 | 0.399 | 62.0 | 1195.5 | 0.054 | 0.135 |
| LangIFS2, LangIFS3 | 185.0 | 352.5 | 0.902 | 174.0 | 1189.5 | 0.163 | 0.180 |
| LjIFS1, LjIFS2 | 67.3 | 346.0 | 0.225 | 35.7 | 1160.0 | 0.031 | 0.139 |
| LjIFS1, LjIFS3 | 104.5 | 359.5 | 0.368 | 55.5 | 1194.5 | 0.048 | 0.131 |
| LlutIFS1, LlutIFS2 | 85.4 | 352.0 | 0.293 | 120.6 | 1196.0 | 0.108 | 0.369 |
| MtIFS1, MtIFS2 | 89.5 | 362.2 | 0.300 | 47.5 | 1200.8 | 0.041 | 0.135 |
| MtIFS2, MtIFS3 | 68.0 | 361.3 | 0.217 | 35.0 | 1201.7 | 0.030 | 0.137 |
| PsIFS1, PsIFS2 | 215.8 | 362.8 | 1.182 | 127.2 | 1200.2 | 0.114 | 0.097 |
| PvIFS2, PvIFS1 | 82.5 | 364.8 | 0.269 | 73.5 | 1198.3 | 0.064 | 0.238 |
| PvIFS1, PvIFS4 | 18.0 | 218.8 | 0.087 | 14.0 | 750.3 | 0.019 | 0.217 |
| PvIFS3, PvIFS4 | 2.5 | 211.9 | 0.012 | 10.5 | 727.1 | 0.015 | 1.227 |
| VrIFS1, VrIFS2 | 113.0 | 365.1 | 0.399 | 66.0 | 1197.9 | 0.057 | 0.143 |
| Va_IFS1, Va_IFS2 | 104.5 | 366.4 | 0.359 | 65.5 | 1196.6 | 0.057 | 0.158 |
| Tp_IFS2, Tp_IFS1 | 91.8 | 323.3 | 0.357 | 82.2 | 1086.7 | 0.080 | 0.223 |
| Tp_IFS2, Tp_IFS3 | 99.9 | 337.8 | 0.376 | 84.1 | 1138.2 | 0.078 | 0.207 |
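The Ks, Ka and Ka/Ks values in Table 8 can be approximately re-derived from the listed counts. The sketch below is an assumption about the procedure rather than the authors' stated method: it applies the Jukes-Cantor correction to the proportion of differences at synonymous and nonsynonymous positions, which reproduces the tabulated values for the pairs shown. The function names are hypothetical.

```python
import math


def jc_distance(differences: float, positions: float) -> float:
    """Jukes-Cantor corrected distance from a count of differences and a count of sites."""
    if differences == 0:
        return 0.0
    p = differences / positions
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(syn_diff, syn_pos, nonsyn_diff, nonsyn_pos):
    """Return (Ks, Ka, Ka/Ks); the ratio is None when Ks is zero."""
    ks = jc_distance(syn_diff, syn_pos)
    ka = jc_distance(nonsyn_diff, nonsyn_pos)
    return ks, ka, (ka / ks if ks > 0 else None)


# Counts copied from Table 8:
# (synonymous differences, synonymous positions,
#  nonsynonymous differences, nonsynonymous positions).
pairs = {
    "AdIFS1, AdIFS2": (16.0, 362.3, 17.0, 1182.7),
    "GmIFS1, GmIFS2": (93.5, 365.5, 15.5, 1197.5),
    "CcIFS1, CcIFS2": (0.0, 274.7, 0.0, 928.3),
}

for name, counts in pairs.items():
    ks, ka, ratio = ka_ks(*counts)
    shown = "-" if ratio is None else f"{ratio:.3f}"
    print(f"{name}: Ks={ks:.3f} Ka={ka:.3f} Ka/Ks={shown}")
```

A ratio well below 1, as for most pairs in the table, is the pattern expected under purifying selection, whereas a ratio close to 1 corresponds to the quasi-neutral case discussed for PvIFS3/PvIFS4.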

approaches and breeding strategies targeting enhanced secondary metabolite yield, since the isoflavonoid level, particularly in seeds, is an important agronomic trait. Therefore, besides its obvious importance for science, the issue of isoflavonoid biosynthesis has a significant practical value.

5. Conclusions

1. Isoflavone synthase genes in legumes have evolved by lineage-specific whole-genome and tandem duplications, as demonstrated by shared sequence collinearity patterns and location on particular genus branches of the consensus phylogenetic tree.
2. Surviving legume IFS duplicates have been subjected to strong purifying selection, corresponding to their position in the metabolic pathway, and have maintained their transcriptional activity.
3. All three L. angustifolius IFS genes revealed organ-specific expression patterns similar to those observed in other representatives of the Papilionoideae.
4. Duplicated IFS homologs of Lupinus spp. retained non-negligible levels of substitutions in conserved amino acid motifs, putatively due to positive selection acting during early evolution of the genus, before the whole-genome duplication.

Author contributions

DN carried out BAC library screening and verification of hybridization results by PCR, restriction fingerprinting, conducted IFS gene expression profiling, has developed molecular markers for genetic mapping, has contributed in general to the concept of the research scheme and participated in manuscript drafting and figures preparation. MK performed the functional annotation of BAC sequences, phylogenetic survey, microsynteny analysis, RNA-seq & microarray data survey, was involved in data analysis and participated in manuscript drafting and preparation of figures and supplementary material. ŁP performed BAC-FISH analysis and genetic mapping. JK contributed to BAC library screening and verification of hybridization results by PCR, restriction fingerprinting, and to IFS gene expression profiling. CJM and BN contributed to the work conceptualization and data interpretation, to the draft preparation and its approval. BW participated in the general conceptualization of the study and experiment design. DN and MK contributed equally to this article. All authors have read and approved the final manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors would like to thank Dr Hua’an Yang (Department of Agriculture and Food Western Australia, Perth, Australia) and Dr Matthew Nelson (The University of Western Australia, Crawley, Australia) for the seeds of the L. angustifolius mapping population and for information on marker segregation data of the L. angustifolius linkage map. We also thank Dr William Truman (Institute of Plant Genetics, Polish Academy of Sciences) for manuscript proofreading. The study was realized with the financial support of the Polish Ministry of Science and Higher Education research grant N N301 391939.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.plantsci.2017.09.007.

References

[1] J. Mierziak, K. Kostyn, A. Kulma, Flavonoids as important molecules of plant interactions with the environment, Molecules 19 (2014) 16240–16241.
[2] F.D. Dakora, D.A. Phillips, Diverse functions of isoflavonoids in legumes transcend anti-microbial definitions of phytoalexins, Physiol. Mol. Plant Pathol. 49 (1996) 1–20.
[3] R.M. Kosslak, R. Bookland, J. Barkei, H.E. Paaren, E.R. Applebaum, Induction of Bradyrhizobium japonicum common nod genes by isoflavones isolated from Glycine max, Proc. Natl. Acad. Sci. U. S. A. 84 (1987) 7428–7432.
[4] S. Subramanian, G. Stacey, O. Yu, Endogenous isoflavones are essential for the establishment of symbiosis between soybean and Bradyrhizobium japonicum, Plant J. 48 (2006) 261–273.
[5] R.A. Dixon, D. Ferreira, Genistein, Phytochemistry 60 (2002) 205–211.
[6] O. Lapčík, Isoflavonoids in non-leguminous taxa: a rarity or a rule? Phytochemistry 68 (2007) 2909–2916.
[7] J.C. Martínez Valderrama, Distribution of flavonoids in the Myristicaceae, Phytochemistry 55 (2000) 505–511.
[8] J. Reynaud, D. Guilet, R. Terreux, M. Lussignol, N. Walchshofer, Isoflavonoids in non-leguminous families: an update, Nat. Prod. Rep. 22 (2005) 504–515.
[9] R.A. Dixon, Phytoestrogens, Annu. Rev. Plant Biol. 55 (2004) 225–261.
[10] T. Cornwell, W. Cohick, I. Raskin, Dietary phytoestrogens and health, Phytochemistry 65 (2004) 995–1016.
[11] R.A. Dixon, G.M. Pasinetti, Flavonoids and isoflavonoids: from plant biology to agriculture and neuroscience, Plant Physiol. 154 (2010) 453–457.
[12] D. Muth, P. Kachlicki, P. Krajewski, M. Przystalski, M. Stobiecki, Differential


metabolic response of narrow leafed lupine (Lupinus angustifolius) leaves to infection with Colletotrichum lupini, Metabolomics 5 (2009) 354–362.
[13] A. Wojakowska, K. Kułak, M. Jasiński, P. Kachlicki, S. Stawiński, M. Stobiecki, Metabolic response of narrow leaf lupine (Lupinus angustifolius) plants to elicitation and infection with Colletotrichum lupini under field conditions, Acta Physiol. Plant. 37 (2015) 152.
[14] P. Bednarek, L. Kerhoas, J. Einhorn, R. Frański, P. Wojtaszek, M. Rybus-Zając, M. Stobiecki, Profiling of flavonoid conjugates in Lupinus albus and Lupinus angustifolius responding to biotic and abiotic stimuli, J. Chem. Ecol. 29 (2003) 1127–1142.
[15] B.K. Franzmayr, S. Rasmussen, K.M. Fraser, P.E. Jameson, Expression and functional characterization of a white clover isoflavone synthase in tobacco, Ann. Bot. 110 (2012) 1291–1301.
[16] L. Gou, R. Zhang, L. Ma, F. Zhu, J. Dong, T. Wang, Multigene synergism increases the isoflavone and proanthocyanidin contents of Medicago truncatula, Plant Biotechnol. J. (2015) (n/a-n/a).
[17] C.H. Shih, Y. Chen, M. Wang, I.K. Chu, C. Lo, Accumulation of isoflavone genistin in transgenic tomato plants overexpressing a soybean isoflavone synthase gene, J. Agric. Food Chem. 56 (2008) 5655–5661.
[18] M. Pičmanová, D. Reňák, J. Feciková, P. Růžička, P. Mikšátková, O. Lapčík, D. Honys, Functional expression and subcellular localization of pea polymorphic isoflavone synthase CYP93C18, Biol. Plant 57 (2013) 635–645.
[19] M.F. Hashim, T. Hakamatsuka, Y. Ebizuka, U. Sankawa, Reaction mechanism of oxidative rearrangement of flavanone in isoflavone biosynthesis, FEBS Lett. 271 (1990) 219–222.
[20] T. Akashi, T. Aoki, S.i. Ayabe, Cloning and functional expression of a cytochrome P450 cDNA encoding 2-hydroxyisoflavanone synthase involved in biosynthesis of the isoflavonoid skeleton in licorice, Plant Physiol. 121 (1999) 821–828.
[21] N. Shimada, T. Akashi, T. Aoki, S. Ayabe, Induction of isoflavonoid pathway in the model legume Lotus japonicus: molecular characterization of enzymes involved in phytoalexin biosynthesis, Plant Sci. 160 (2000) 37–47.
[22] B.G. Kim, S.-Y. Kim, H.S. Song, C. Lee, H.-G. Hur, S.-I. Kim, J.-H. Ahn, Cloning and expression of the isoflavone synthase gene (IFS-Tp) from Trifolium pratense, Mol. Cells 15 (2003) 301–306.
[23] S. Chu, J. Wang, H. Cheng, Q. Yang, D. Yu, Evolutionary study of the isoflavonoid pathway based on multiple copies analysis in soybean, BMC Genet. 15 (2014) 76.
[24] W. Jung, O. Yu, S.M.C. Lau, D.P. O'Keefe, J. Odell, G. Fader, B. McGonigle, Identification and expression of isoflavone synthase, the key enzyme for biosynthesis of isoflavones in legumes, Nat. Biotechnol. 18 (2000) 208.
[25] S.B. Cannon, M.R. McKain, A. Harkess, M.N. Nelson, S. Dash, M.K. Deyholos, Y. Peng, B. Joyce, C.N. Stewart, M. Rolf, T. Kutchan, X. Tan, C. Chen, Y. Zhang, E. Carpenter, G.K.-S. Wong, J.J. Doyle, J. Leebens-Mack, Multiple polyploidy events in the early radiation of nodulating and nonnodulating legumes, Mol. Biol. Evol. 32 (2015) 193–210.
[26] M. Kroc, G. Koczyk, W. Święcicki, A. Kilian, M.N. Nelson, New evidence of ancestral polyploidy in the Genistoid legume Lupinus angustifolius L. (narrow-leafed lupin), Theor. Appl. Genet. 127 (2014) 1237–1249.
[27] M.N. Nelson, H.T.T. Phan, S.R. Ellwood, P.M. Moolhuijzen, J. Hane, A. Williams, C.E. O'Lone, J. Fosu-Nyarko, M. Scobie, M. Cakir, M.G.K. Jones, M. Bellgard, M. Książkiewicz, B. Wolko, S.J. Barker, R.P. Oliver, W.A. Cowling, The first gene-based map of Lupinus angustifolius L.-location of domestication genes and conserved synteny with Medicago truncatula, Theor. Appl. Genet. 113 (2006) 225–238.
[28] M.N. Nelson, P.M. Moolhuijzen, J.G. Boersma, M. Chudy, K. Lesniewska, M. Bellgard, R.P. Oliver, W. Święcicki, B. Wolko, W.A. Cowling, S.R. Ellwood, Aligning a new reference genetic map of Lupinus angustifolius with the genome sequence of the model legume, Lotus japonicus, DNA Res. 17 (2010) 73–83.
[29] L.L. Gao, J.K. Hane, L.G. Kamphuis, R. Foley, B.J. Shi, C.A. Atkins, K.B. Singh, Development of genomic resources for the narrow-leafed lupin (Lupinus angustifolius): construction of a bacterial artificial chromosome (BAC) library and BAC-end sequencing, BMC Genomics 12 (2011) 521–521.
[30] A. Kasprzak, J. Šafář, J. Janda, J. Doležel, B. Wolko, B. Naganowska, The bacterial artificial chromosome (Bac) library of the narrow-leafed lupin (Lupinus angustifolius L.), Cell. Mol. Biol. Lett. 11 (2006) 396–407.
[31] L.G. Kamphuis, J.K. Hane, M.N. Nelson, L. Gao, C.A. Atkins, K.B. Singh, Transcriptome sequencing of different narrow-leafed lupin tissue types provides a comprehensive uni-gene assembly and extensive gene-based molecular markers, Plant Biotechnol. J. 13 (2015) 14–25.
[32] H. Yang, Y. Tao, Z. Zheng, Q. Zhang, G. Zhou, M.W. Sweetingham, J.G. Howieson, C. Li, Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L, PLoS One 8 (2013) e64799.
[33] M. Książkiewicz, K. Wyrwa, A. Szczepaniak, S. Rychel, K. Majcherkiewicz, Ł. Przysiecka, W. Karlowski, B. Wolko, B. Naganowska, Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics, BMC Genomics 14 (2013) 79–79.
[34] K. Lesniewska, M. Książkiewicz, M.N. Nelson, F. Mahé, A. Aînouche, B. Wolko, B. Naganowska, Assignment of 3 genetic linkage groups to 3 chromosomes of narrow-leafed lupin, J. Heredity 102 (2011) 228–236.
[35] K. Wyrwa, M. Książkiewicz, A. Szczepaniak, K. Susek, J. Podkowiński, B. Naganowska, Integration of Lupinus angustifolius L. (narrow-leafed lupin) genome maps and comparative mapping within legumes, Chromosome Res. 24 (2016) 355–378.
[36] M. Książkiewicz, A. Zielezinski, K. Wyrwa, A. Szczepaniak, S. Rychel, W. Karlowski, B. Wolko, B. Naganowska, Remnants of the legume ancestral genome preserved in gene-rich regions: insights from Lupinus angustifolius physical, genetic, and comparative mapping, Plant Mol. Biol. Rep. 33 (2015) 84–101.
[37] Ł. Przysiecka, M. Książkiewicz, B. Wolko, B. Naganowska, Structure, expression

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45]

[46]

[47]

[48]

165

profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome, Front. Plant Sci. 6 (2015). D.J. Bertioli, S.B. Cannon, L. Froenicke, G. Huang, A.D. Farmer, E.K.S. Cannon, X. Liu, D. Gao, J. Clevenger, S. Dash, L. Ren, M.C. Moretzsohn, K. Shirasawa, W. Huang, B. Vidigal, B. Abernathy, Y. Chu, C.E. Niederhuth, P. Umale, A.C. Araujo, A. Kozik, K. Do Kim, M.D. Burow, R.K. Varshney, X. Wang, X. Zhang, N. Barkley, P.M. Guimaraes, S. Isobe, B. Guo, B. Liao, H.T. Stalker, R.J. Schmitz, B.E. Scheffler, S.C.M. Leal-Bertioli, X. Xun, S.A. Jackson, R. Michelmore, P. OziasAkins, The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut, Nat. Genet. 48 (2016) 438–446. S. Sato, Y. Nakamura, T. Kaneko, E. Asamizu, T. Kato, M. Nakao, S. Sasamoto, A. Watanabe, A. Ono, K. Kawashima, T. Fujishiro, M. Katoh, M. Kohara, Y. Kishida, C. Minami, S. Nakayama, N. Nakazaki, Y. Shimizu, S. Shinpo, C. Takahashi, T. Wada, M. Yamada, N. Ohmido, M. Hayashi, K. Fukui, T. Baba, T. Nakamichi, H. Mori, S. Tabata, Genome structure of the legume, Lotus japonicus, DNA Res. 15 (2008) 227–239. Y.J. Kang, S.K. Kim, M.Y. Kim, P. Lestari, K.H. Kim, B.K. Ha, T.H. Jun, W.J. Hwang, T. Lee, J. Lee, S. Shim, M.Y. Yoon, Y.E. Jang, K.S. Han, P. Taeprayoon, N. Yoon, P. Somta, P. Tanya, K.S. Kim, J.G. Gwag, J.K. Moon, Y.H. Lee, B.S. Park, A. Bombarely, J.J. Doyle, S.A. Jackson, R. Schafleitner, P. Srinives, R.K. Varshney, S.H. Lee, Genome sequence of mungbean and insights into evolution within Vigna species, Nat. Commun. 5 (2014). J. Schmutz, S.B. Cannon, J. Schlueter, J. Ma, T. Mitros, W. Nelson, D.L. Hyten, Q. Song, J.J. Thelen, J. Cheng, D. Xu, U. Hellsten, G.D. May, Y. Yu, T. Sakurai, T. Umezawa, M.K. Bhattacharyya, D. Sandhu, B. Valliyodan, E. Lindquist, M. Peto, D. Grant, S. Shu, D. Goodstein, K. Barry, M. Futrell-Griggs, B. Abernathy, J. Du, Z. Tian, L. Zhu, N. Gill, T. Joshi, M. Libault, A. 
Sethuraman, X.C. Zhang, K. Shinozaki, H.T. Nguyen, R.A. Wing, P. Cregan, J. Specht, J. Grimwood, D. Rokhsar, G. Stacey, R.C. Shoemaker, S.A. Jackson, Genome sequence of the palaeopolyploid soybean, Nature 463 (2010) 178–183. J. Schmutz, P.E. McClean, S. Mamidi, G.A. Wu, S.B. Cannon, J. Grimwood, J. Jenkins, S. Shu, Q. Song, C. Chavarro, M. Torres-Torres, V. Geffroy, S.M. Moghaddam, D. Gao, B. Abernathy, K. Barry, M. Blair, M.A. Brick, M. Chovatia, P. Gepts, D.M. Goodstein, M. Gonzales, U. Hellsten, D.L. Hyten, G. Jia, J.D. Kelly, D. Kudrna, R. Lee, M.M.S. Richard, P.N. Miklas, J.M. Osorno, J. Rodrigues, V. Thareau, C.A. Urrea, M. Wang, Y. Yu, M. Zhang, R.A. Wing, P.B. Cregan, D.S. Rokhsar, S.A. Jackson, A reference genome for common bean and genome-wide analysis of dual domestications, Nat. Genet. 46 (2014) 707–713. R.K. Varshney, W. Chen, Y. Li, A.K. Bharti, R.K. Saxena, J.A. Schlueter, M.T.A. Donoghue, S. Azam, G. Fan, A.M. Whaley, A.D. Farmer, J. Sheridan, A. Iwata, R. Tuteja, R.V. Penmetsa, W. Wu, H.D. Upadhyaya, S.P. Yang, T. Shah, K.B. Saxena, T. Michael, W.R. McCombie, B. Yang, G. Zhang, H. Yang, J. Wang, C. Spillane, D.R. Cook, G.D. May, X. Xu, S.A. Jackson, Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers, Nat. Biotech. 30 (2012) 83–89. R.K. Varshney, C. Song, R.K. Saxena, S. Azam, S. Yu, A.G. Sharpe, S. Cannon, J. Baek, B.D. Rosen, B. Tar'an, T. Millan, X. Zhang, L.D. Ramsay, A. Iwata, Y. Wang, W. Nelson, A.D. Farmer, P.M. Gaur, C. Soderlund, R.V. Penmetsa, C. Xu, A.K. Bharti, W. He, P. Winter, S. Zhao, J.K. Hane, N. Carrasquilla-Garcia, J.A. Condie, H.D. Upadhyaya, M.C. Luo, M. Thudi, C.L.L. Gowda, N.P. Singh, J. Lichtenzveig, K.K. Gali, J. Rubio, N. Nadarajan, J. Dolezel, K.C. Bansal, X. Xu, D. Edwards, G. Zhang, G. Kahl, J. Gil, K.B. Singh, S.K. Datta, S.A. Jackson, J. Wang, D.R. Cook, Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement, Nat. Biotech. 
31 (2013) 240–246. N.D. Young, F. Debellé, G.E.D. Oldroyd, R. Geurts, S.B. Cannon, M.K. Udvardi, V.A. Benedito, K.F.X. Mayer, J. Gouzy, H. Schoof, Y. Van de Peer, S. Proost, D.R. Cook, B.C. Meyers, M. Spannagl, F. Cheung, S. De Mita, V. Krishnakumar, H. Gundlach, S. Zhou, J. Mudge, A.K. Bharti, J.D. Murray, M.A. Naoumkina, B. Rosen, K.A.T. Silverstein, H. Tang, S. Rombauts, P.X. Zhao, P. Zhou, V. Barbe, P. Bardou, M. Bechner, A. Bellec, A. Berger, H. Berges, S. Bidwell, T. Bisseling, N. Choisne, A. Couloux, R. Denny, S. Deshpande, X. Dai, J.J. Doyle, A.M. Dudez, A.D. Farmer, S. Fouteau, C. Franken, C. Gibelin, J. Gish, S. Goldstein, A.J. Gonzalez, P.J. Green, A. Hallab, M. Hartog, A. Hua, S.J. Humphray, D.H. Jeong, Y. Jing, A. Jocker, S.M. Kenton, D.J. Kim, K. Klee, H. Lai, C. Lang, S. Lin, S.L. Macmil, G. Magdelenat, L. Matthews, J. McCorrison, E.L. Monaghan, J.H. Mun, F.Z. Najar, C. Nicholson, C. Noirot, M. O'Bleness, C.R. Paule, J. Poulain, F. Prion, B. Qin, C. Qu, E.F. Retzel, C. Riddle, E. Sallet, S. Samain, N. Samson, I. Sanders, O. Saurat, C. Scarpelli, T. Schiex, B. Segurens, A.J. Severin, D.J. Sherrier, R. Shi, S. Sims, S.R. Singer, S. Sinharoy, L. Sterck, A. Viollet, B.B. Wang, K. Wang, M. Wang, X. Wang, J. Warfsmann, J. Weissenbach, D.D. White, J.D. White, G.B. Wiley, P. Wincker, Y. Xing, L. Yang, Z. Yao, F. Ying, J. Zhai, L. Zhou, A. Zuber, J. Denarie, R.A. Dixon, G.D. May, D.C. Schwartz, J. Rogers, F. Quetier, C.D. Town, B.A. Roe, The Medicago genome provides insight into the evolution of rhizobial symbioses, Nature 480 (2011) 520–524. M. Książkiewicz, S. Rychel, M.N. Nelson, K. Wyrwa, B. Naganowska, B. Wolko, Expansion of the phosphatidylethanolamine binding protein family in legumes: a case study of Lupinus angustifolius L. FLOWERING LOCUS T homologs, LanFTc1 and LanFTc2, BMC Genomics 17 (2016) 820. T. Stępkowski, M. Żak, L. Moulin, J. Króliczak, B. Golińska, D. Narożna, V.I. Safronova, C.J. 
Mądrzak, Bradyrhizobium canariense and Bradyrhizobium japonicum are the two dominant rhizobium species in root nodules of lupin and serradella plants growing in Europe, Syst. Appl. Microbiol. 34 (2011) 368–375. J.G. Boersma, M. Pallotta, C. Li, B.J. Buirchell, K. Sivasithamparam, H. Yang, Construction of a genetic linkage map using MFLP and identification of molecular markers linked to domestication genes in narrow-leafed lupin (Lupinus angustifolius L.), Cell. Mol. Biol. Lett. 10 (2005) 331–334.


using peanut (Arachis spp.) RNA-seq data, PLoS One 9 (2014) e115055. [74] H. Kudapa, A.K. Bharti, S.B. Cannon, A.D. Farmer, B. Mulaosmanovic, R. Kramer, A. Bohra, N.T. Weeks, J.A. Crow, R. Tuteja, T. Shah, S. Dutta, D.K. Gupta, A. Singh, K. Gaikwad, T.R. Sharma, G.D. May, N.K. Singh, R.K. Varshney, A comprehensive transcriptome assembly of pigeonpea (Cajanus cajan L.) using Sanger and second-generation sequencing platforms, Mol. Plant 5 (2012) 1020–1028. [75] H. Kudapa, S. Azam, A.G. Sharpe, B. Taran, R. Li, B. Deonovic, C. Cameron, A.D. Farmer, S.B. Cannon, R.K. Varshney, Comprehensive transcriptome assembly of chickpea (Cicer arietinum L.) using Sanger and next generation sequencing platforms: development and applications, PLoS One 9 (2014) e86039. [76] A.J. Severin, J.L. Woody, Y.T. Bolon, B. Joseph, B.W. Diers, A.D. Farmer, G.J. Muehlbauer, R.T. Nelson, D. Grant, J.E. Specht, M.A. Graham, S.B. Cannon, G.D. May, C.P. Vance, R.C. Shoemaker, RNA-seq atlas of Glycine max: a guide to the soybean transcriptome, BMC Plant Biol. 10 (2010) 160–160. [77] J. Verdier, I. Torres-Jerez, M. Wang, A. Andriankaja, S.N. Allen, J. He, Y. Tang, J.D. Murray, M.K. Udvardi, Establishment of the Lotus japonicus Gene Expression Atlas (LjGEA) and its use to explore legume seed maturation, Plant J. 74 (2013) 351–362. [78] V.A. Benedito, I. Torres-Jerez, J.D. Murray, A. Andriankaja, S. Allen, K. Kakar, M. Wandrey, J. Verdier, H. Zuber, T. Ott, S. Moreau, A. Niebel, T. Frickey, G. Weiller, J. He, X. Dai, P.X. Zhao, Y. Tang, M.K. Udvardi, A gene expression atlas of the model legume Medicago truncatula, Plant J. 55 (2008) 504–513. [79] J. He, V.A. Benedito, M. Wang, J.D. Murray, P.X. Zhao, Y. Tang, M.K. Udvardi, The Medicago truncatula gene expression atlas web server, BMC Bioinf. 10 (2009) 441–441. [80] S. Alves-Carvalho, G. Aubert, S. Carrére, C. Cruaud, A.L. Brochot, F. Jacquin, A. Klein, C. Martin, K. Boucherot, J. Kreplak, C. Da Silva, S. Moreau, P. Gamas, P. Wincker, J. Gouzy, J. 
Burstin, Full-length de novo assembly of RNA-seq data in pea (Pisum sativum L.) provides a gene expression atlas and gives insights into root nodulation in this species, Plant J. 84 (2015) 1–19. [81] J.A. O'Rourke, L.P. Iniguez, F. Fu, B. Bucciarelli, S.S. Miller, S.A. Jackson, P.E. McClean, J. Li, X. Dai, P.X. Zhao, G. Hernandez, C.P. Vance, An RNA-Seq based gene expression atlas of the common bean, BMC Genomics 15 (2014) 866. [82] H. Chen, L. Wang, S. Wang, C. Liu, M.W. Blair, X. Cheng, Transcriptome sequencing of mung bean (Vigna radiata L.) genes and the identification of EST-SSR markers, PLoS One 10 (2015) e0120273. [83] J.H. Zar, Biostatistical Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1984. [84] M. Sakaguchi, K. Mihara, R. Sato, Signal recognition particle is required for cotranslational insertion of cytochrome P-450 into microsomal membranes, Proc. Natl. Acad. Sci. U. S. A. 81 (1984) 3361–3364. [85] M. Sakaguchi, K. Mihara, R. Sato, A short amino-terminal segment of microsomal cytochrome P-450 functions both as an insertion signal and as a stop-transfer sequence, EMBO J. 6 (1987) 2425–2431. [86] P. Wiriyaampaiwong, S. Thanonkeo, P. Thanonkeo, Molecular characterization of isoflavone synthase gene from Pueraria candollei var. mirifica, Afr. J. Agric. Res. 7 (2012) 4489–4498. [87] S. Yamazaki, K. Sato, K. Suhara, M. Sakaguchi, K. Mihara, T. Omura, Importance of the proline-rich region following signal-anchor sequence in the formation of correct conformation of microsomal cytochrome P-450s, J. Biochem. 114 (1993) 652–657. [88] S.E. Graham-Lorence, J.A. Peterson, Structural alignments of P450s and extrapolations to the unknown, in: E.F. Johnson, M.R. Waterman (Eds.), Methods in Enzymology, Cytochrome P450, Part B, Academic Press, 1996, pp. 315–326. [89] C.L. Steele, M. Gijzen, D. Qutob, R.A. Dixon, Molecular characterization of the enzyme catalyzing the aryl migration reaction of isoflavonoid biosynthesis in soybean, Arch. Biochem. Biophys. 367 (1999) 146–150. 
[90] Y. Sawada, K. Kinoshita, T. Akashi, T. Aoki, S.-i. Ayabe, Key amino acid residues required for aryl migration catalysed by the cytochrome P450 2-hydroxyisoflavanone synthase, Plant J. 31 (2002) 555–564. [91] M.A. Schuler, Plant cytochrome P450 monooxygenases, Crit. Rev. Plant Sci. 15 (1996) 235–284. [92] Z. Yang, W.S.W. Wong, R. Nielsen, Bayes empirical Bayes inference of amino acid sites under positive selection, Mol. Biol. Evol. 22 (2005) 1107–1118. [93] N. Shimada, S. Sato, T. Akashi, Y. Nakamura, S. Tabata, S.-i. Ayabe, T. Aoki, Genome-wide analyses of the structural gene families involved in the legume-specific 5-deoxyisoflavonoid biosynthesis of Lotus japonicus, DNA Res. 14 (2007) 25–36. [94] L.A. Honaas, E.K. Wafula, N.J. Wickett, J.P. Der, Y. Zhang, P.P. Edger, N.S. Altman, J.C. Pires, J.H. Leebens-Mack, C.W. dePamphilis, Selecting superior de novo transcriptome assemblies: lessons learned by leveraging the best plant genome, PLoS One 11 (2016) e0146062. [95] X. Wang, H.Z. Zhang, Y. Gao, G. Sun, W. Zhang, L. Qiu, A comprehensive analysis of the cupin gene family in soybean (Glycine max), PLoS One 9 (2014) e110092. [96] A. Roulin, P.L. Auer, M. Libault, J. Schlueter, A. Farmer, G. May, G. Stacey, R.W. Doerge, S.A. Jackson, The fate of duplicated genes in a polyploid plant genome, Plant J. 73 (2013) 143–153. [97] J.A. Schlueter, P. Dixon, C. Granger, D. Grant, L. Clark, J.J. Doyle, R.C. Shoemaker, Mining EST databases to resolve evolutionary events in major crop species, Genome 47 (2004) 868–876. [98] B.E. Pfeil, J.A. Schlueter, R.C. Shoemaker, J.J. Doyle, Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families, Syst. Biol. 54 (2005) 441–454. [99] D.J. Bertioli, M.C. Moretzsohn, L.H. Madsen, N. Sandal, S.C. Leal-Bertioli, P.M. Guimarães, B.K. Hougaard, J. Fredslund, L. Schauser, A.M. Nielsen, S. Sato, S. Tabata, S.B. Cannon, J. 
Stougaard, An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume

[49] O. Kohany, A.J. Gentles, L. Hankus, J. Jurka, Annotation, submission and screening of repetitive elements in Repbase: repbaseSubmitter and Censor, BMC Bioinf. 7 (2006) 1–7. [50] L.B. Parra-González, G.A. Aravena-Abarzúa, C.S. Navarro-Navarro, J. Udall, J. Maughan, L.M. Peterson, H.E. Salvo-Garrido, I.J. Maureira-Butler, Yellow lupin (Lupinus luteus L.) transcriptome sequencing: molecular marker development and comparative studies, BMC Genomics 13 (2012) 1–15. [51] J.A. O'Rourke, S.S. Yang, S.S. Miller, B. Bucciarelli, J. Liu, A. Rydeen, Z. Bozsoki, C. Uhde-Stone, Z.J. Tu, D. Allan, J.W. Gronwald, C.P. Vance, An RNA-seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants, Plant Physiol. 161 (2013) 705–724. [52] E. Lyons, B. Pedersen, J. Kane, M. Alam, R. Ming, H. Tang, X. Wang, J. Bowers, A. Paterson, D. Lisch, M. Freeling, Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids, Plant Physiol. 148 (2008) 1772–1781. [53] V. Solovyev, Statistical approaches in eukaryotic gene prediction, Handbook of Statistical Genetics, John Wiley & Sons, Ltd, 2004, 2017. [54] J.C. Dohm, A.E. Minoche, D. Holtgrawe, S. Capella-Gutierrez, F. Zakrzewski, H. Tafer, O. Rupp, T.R. Sorensen, R. Stracke, R. Reinhardt, A. Goesmann, T. Kraft, B. Schulz, P.F. Stadler, T. Schmidt, T. Gabaldon, H. Lehrach, B. Weisshaar, H. Himmelbauer, The genome of the recently domesticated crop plant sugar beet (Beta vulgaris), Nature 505 (2014) 546–549. [55] J.J. De Vega, S. Ayling, M. Hegarty, D. Kudrna, J.L. Goicoechea, A. Ergon, O.A. Rognli, C. Jones, M. Swain, R. Geurts, C. Lang, K.F.X. Mayer, S. Rössner, S. Yates, K.J. Webb, I.S. Donnison, G.E.D. Oldroyd, R.A. Wing, M. Caccamo, W. Powell, M.T. Abberton, L. Skot, Red clover (Trifolium pratense L.) draft genome provides a platform for trait improvement, Sci. Rep. 5 (2015) 17394. [56] Y.J. Kang, D. Satyawan, S. Shim, T. 
Lee, J. Lee, W.J. Hwang, S.K. Kim, P. Lestari, K. Laosatit, K.H. Kim, T.J. Ha, A. Chitikineni, M.Y. Kim, J.M. Ko, J.G. Gwag, J.K. Moon, Y.H. Lee, B.S. Park, R.K. Varshney, S.H. Lee, Draft genome sequence of adzuki bean, Vigna angularis, Sci. Rep. 5 (2015) 8069. [57] N. Matasci, L.H. Hung, Z. Yan, E.J. Carpenter, N.J. Wickett, S. Mirarab, N. Nguyen, T. Warnow, S. Ayyampalayam, M. Barker, J.G. Burleigh, M.A. Gitzendanner, E. Wafula, J.P. Der, C.W. de Pamphilis, B. Roure, H. Philippe, B.R. Ruhfel, N.W. Miles, S.W. Graham, S. Mathews, B. Surek, M. Melkonian, D.E. Soltis, P.S. Soltis, C. Rothfels, L. Pokorny, J.A. Shaw, L. De Gironimo, D.W. Stevenson, J.C. Villarreal, T. Chen, T.M. Kutchan, M. Rolf, R.S. Baucom, M.K. Deyholos, R. Samudrala, Z. Tian, X. Wu, X. Sun, Y. Zhang, J. Wang, J. Leebens-Mack, G.K.S. Wong, Data access for the 1,000 plants (1KP) project, Gigascience 3 (2014) 17. [58] N.J. Wickett, S. Mirarab, N. Nguyen, T. Warnow, E. Carpenter, N. Matasci, S. Ayyampalayam, M.S. Barker, J.G. Burleigh, M.A. Gitzendanner, B.R. Ruhfel, E. Wafula, J.P. Der, S.W. Graham, S. Mathews, M. Melkonian, D.E. Soltis, P.S. Soltis, N.W. Miles, C.J. Rothfels, L. Pokorny, A.J. Shaw, L. De Gironimo, D.W. Stevenson, B. Surek, J.C. Villarreal, B. Roure, H. Philippe, C.W. de Pamphilis, T. Chen, M.K. Deyholos, R.S. Baucom, T.M. Kutchan, M.M. Augustin, J. Wang, Y. Zhang, Z. Tian, Z. Yan, X. Wu, X. Sun, G.K.-S. Wong, J. Leebens-Mack, Phylotranscriptomic analysis of the origin and early diversification of land plants, Proc. Natl. Acad. Sci. U. S. A. 111 (2014) E4859–E4868. [59] M. Kearse, R. Moir, A. Wilson, S. Stones-Havas, M. Cheung, S. Sturrock, S. Buxton, A. Cooper, S. Markowitz, C. Duran, T. Thierer, B. Ashton, P. Meintjes, A. Drummond, Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data, Bioinformatics 28 (2012) 1647–1649. [60] J. Li, X. Dai, T. Liu, P.X. 
Zhao, LegumeIP: an integrative database for comparative genomics and transcriptomics of model legumes, Nucl. Acids Res. 40 (2012) D1221–D1229. [61] P. Misra, A. Pandey, S.K. Tewari, P. Nath, P.K. Trivedi, Characterization of isoflavone synthase gene from Psoralea corylifolia: a medicinal plant, Plant Cell Rep. 29 (2010) 747–755. [62] K. Katoh, K. Misawa, K. Kuma, T. Miyata, MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform, Nucl. Acids Res. 30 (2002) 3059–3066. [63] D. Darriba, G.L. Taboada, R. Doallo, D. Posada, jModelTest 2: more models, new heuristics and high-performance computing, Nat. Methods 9 (2012) 772–772. [64] J.P. Huelsenbeck, F. Ronquist, MRBAYES: Bayesian inference of phylogenetic trees, Bioinformatics 17 (2001) 754–755. [65] D.T. Jones, W.R. Taylor, J.M. Thornton, The rapid generation of mutation data matrices from protein sequences, Comput. Appl. Biosci.: CABIOS 8 (1992) 275–282. [66] F. Abascal, R. Zardoya, D. Posada, ProtTest: selection of best-fit models of protein evolution, Bioinformatics 21 (2005) 2104–2105. [67] P. Librado, J. Rozas, DnaSP v5: a software for comprehensive analysis of DNA polymorphism data, Bioinformatics 25 (2009) 1451–1452. [68] Z. Yang, PAML 4: phylogenetic analysis by maximum likelihood, Mol. Biol. Evol. 24 (2007) 1586–1591. [69] K.V. Revanna, C.C. Chiu, E. Bierschank, Q. Dong, GSV: a web-based genome synteny viewer for customized data, BMC Bioinf. 12 (2011) 1–4. [70] M. Krzywinski, J. Schein, N. Birol, J. Connors, R. Gascoyne, D. Horsman, S.J. Jones, M.A. Marra, Circos: an information aesthetic for comparative genomics, Genome Res. 19 (2009) 1639–1645. [71] K.F. Manly, R.H. Cudmore, J.M. Meer, Map Manager QTX cross-platform software for genetic mapping, Mamm. Genome 12 (2001) 930–932. [72] R.E. Voorrips, MapChart: software for the graphical presentation of linkage maps and QTLs, J. Hered. 93 (2002) 77–78. [73] R. Chopra, G. Burow, A. Farmer, J. Mudge, C.E. Simpson, M.D. 
Burow, Comparisons of de novo transcriptome assemblers in diploid and polyploid species


(2013) 14–23. [113] M.D. Rausher, R.E. Miller, P. Tiffin, Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway, Mol. Biol. Evol. 16 (1999) 266–274. [114] S. Cesco, G. Neumann, N. Tomasi, R. Pinton, L. Weisskopf, Release of plant-borne flavonoids into the rhizosphere and their role in plant nutrition, Plant Soil 329 (2010) 1–25. [115] A. Wojakowska, D. Muth, D. Narożna, C. Mądrzak, M. Stobiecki, P. Kachlicki, Changes of phenolic secondary metabolite profiles in the reaction of narrow leaf lupin (Lupinus angustifolius) plants to infections with Colletotrichum lupini fungus or treatment with its toxin, Metabolomics 9 (2013) 575–589. [116] R.A. Dixon, L. Achnine, P. Kota, C.J. Liu, M.S.S. Reddy, L. Wang, The phenylpropanoid pathway and plant defence – a genomics perspective, Mol. Plant Pathol. 3 (2002) 371–390. [117] R.A. Dixon, M.J. Harrison, N.L. Paiva, The isoflavonoid phytoalexin pathway: from enzymes to genes to transcription factors, Physiol. Plant 93 (1995) 385–392. [118] R.A. Dixon, N.L. Paiva, Stress-induced phenylpropanoid metabolism, Plant Cell 7 (1995) 1085–1097. [119] S. Subramanian, X. Hu, G. Lu, J. Odelland, O. Yu, The promoters of two isoflavone synthase genes respond differentially to nodulation and defense signals in transgenic soybean roots, Plant Mol. Biol. 54 (2004) 623–639. [120] S. Dhaubhadel, B.D. McGarvey, R. Williams, M. Gijzen, Isoflavonoid biosynthesis and accumulation in developing soybean seeds, Plant Mol. Biol. 53 (2003) 733–743. [121] S. Dhaubhadel, M. Gijzen, P. Moy, M. Farhangkhoee, Transcriptome analysis reveals a critical role of CHS7 and CHS8 genes for isoflavonoid synthesis in soybean seeds, Plant Physiol. 143 (2007) 326–338. [122] J.J. Gutierrez-Gonzalez, S. Guttikonda, D.L. Aldrich, L.S.P. Tran, R. Zhong, O. Yu, H.T. Nguyen, D.A. Sleper, Differential expression of isoflavone biosynthetic genes in soybean during water deficits, Plant Cell Physiol. (2010). [123] J.J. Gutierrez-Gonzalez, X. 
Wu, J.D. Gillman, J.D. Lee, R. Zhong, O. Yu, G. Shannon, M. Ellersieck, H.T. Nguyen, D.A. Sleper, Intricate environment-modulated genetic networks control isoflavone accumulation in soybean seeds, BMC Plant Biol. 10 (2010) 105. [124] H.-K. Kim, Y.-H. Jang, I.-S. Baek, J.-H. Lee, M.J. Park, Y.-S. Chung, J.-I. Chung, J.K. Kim, Polymorphism and expression of isoflavone synthase genes from soybean cultivars, Mol. Cells 19 (2005) 67–73. [125] H. Cheng, O. Yu, D. Yu, Polymorphisms of IFS1 and IFS2 gene are associated with isoflavone concentrations in soybean seeds, Plant Sci. 175 (2008) 505–512.

genomes, BMC Genomics 10 (2009) 45–45. [100] S.B. Cannon, L. Sterck, S. Rombauts, S. Sato, F. Cheung, J. Gouzy, X. Wang, J. Mudge, J. Vasdewani, T. Schiex, M. Spannagl, E. Monaghan, C. Nicholson, S.J. Humphray, H. Schoof, K.F.X. Mayer, J. Rogers, F. Quétier, G.E. Oldroyd, F. Debellé, D.R. Cook, E.F. Retzel, B.A. Roe, C.D. Town, S. Tabata, Y. Van de Peer, N.D. Young, Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes, Proc. Natl. Acad. Sci. U. S. A. 103 (2006) 14959–14964. [101] S.B. Cannon, G.D. May, S.A. Jackson, Three sequenced legume genomes and many crop species: rich opportunities for translational genomics, Plant Physiol. 151 (2009) 970–977. [102] M. Lavin, P.S. Herendeen, M.F. Wojciechowski, Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the Tertiary, Syst. Biol. 54 (2005) 575–594. [103] J.K. Hane, Y. Ming, L.G. Kamphuis, M.N. Nelson, G. Garg, C.A. Atkins, P.E. Bayer, A. Bravo, S. Bringans, S. Cannon, D. Edwards, R. Foley, L.L. Gao, M.J. Harrison, W. Huang, B. Hurgobin, S. Li, C.W. Liu, A. McGrath, G. Morahan, J. Murray, J. Weller, K.B. Singh, A comprehensive draft genome sequence for lupin (Lupinus angustifolius), an emerging health food: insights into plant-microbe interactions and legume evolution, Plant Biotechnol. J. 15 (2017) 318–330. [104] G. Blanc, K.H. Wolfe, Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution, Plant Cell 16 (2004) 1679–1691. [105] B.P. Cusack, K.H. Wolfe, When gene marriages don't work out: divorce by subfunctionalization, Trends Genet. 23 (2007) 270–272. [106] R.C. Moore, M.D. Purugganan, The evolutionary dynamics of plant duplicate genes, Curr. Opin. Plant Biol. 8 (2005) 122–128. [107] D.H. Kim, I. Hwang, Direct targeting of proteins from the cytosol to organelles: the ER versus endosymbiotic organelles, Traffic 14 (2013) 613–621. [108] C. Atkins, P. Smith, C. 
Rodriguez-Medina, Macromolecules in phloem exudates – a review, Protoplasma 248 (2011) 165–172. [109] J.M. Cork, M.D. Purugganan, The evolution of molecular genetic pathways and networks, Bioessays 26 (2004) 479–484. [110] J. Clotault, D. Peltier, V. Soufflet-Freslon, M. Briard, E. Geoffriau, Differential selection on carotenoid biosynthesis genes as a function of gene position in the metabolic pathway: a study on the carrot and dicots, PLoS One 7 (2012) e38724. [111] H. Ramsay, L.H. Rieseberg, K. Ritland, The correlation of evolutionary rate with pathway position in plant terpenoid biosynthesis, Mol. Biol. Evol. 26 (2009) 1045–1053. [112] C.F. Olson-Manning, C.R. Lee, M.D. Rausher, T. Mitchell-Olds, Evolution of flux control in the glucosinolate pathway in Arabidopsis thaliana, Mol. Biol. Evol. 30
