Journal Pre-proof Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms Shengdan Wu, Baocai Han, Yuannian Jiao
PII: DOI: Reference:
S1674-2052(19)30359-4 https://doi.org/10.1016/j.molp.2019.10.012 MOLP 843
To appear in: MOLECULAR PLANT Accepted Date: 23 October 2019
Please cite this article as: Wu S., Han B., and Jiao Y. (2019). Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms. Mol. Plant. doi: https://doi.org/10.1016/j.molp.2019.10.012. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. All studies published in MOLECULAR PLANT are embargoed until 3PM ET of the day they are published as corrected proofs on-line. Studies cannot be publicized as accepted manuscripts or uncorrected proofs. © 2019 The Author
Page 1
Original research
1 2 3
Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms
4 5
Shengdan Wu1,2, Baocai Han1, Yuannian Jiao1,2*
6 7
1
8
Chinese Academy of Sciences, Beijing 100093, China.
9
2
State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany,
University of Chinese Academy of Sciences, Beijing 100049, China.
10 11
*Corresponding author:
12
Email:
[email protected]
13 14
Running title: Paleopolyplodization and plant adaptive evolution
15 16
Short summary: Polyploidy has been recognized as a driving force for the success of
17
angiosperms. Through comprehensive investigation of gene retention pattern after
18
independent WGDs, this study provides novel evidence in genetic level supporting
19
ancient WGDs have played an essential role in plant adaptation to dramatic
20
environmental changes.
21 22 23 24 25 26 27
Page 2
28
Abstract
29
Ancient whole-genome duplications (WGD or polyploidy) are prevalent in plants, and
30
some WGDs occurred during the timing of severe global environmental changes. It
31
has been suggested that WGDs may have contributed to plant adaptation. However, it
32
still lacks of empirical evidence from genetic level to support the hypothesis. Here,
33
we investigated the survivors of gene duplicates from multiple ancient WGD events
34
on the major branches of angiosperm phylogeny, and aimed to explore genetic
35
evidence supporting the significance of polyploidy. Duplicated genes co-retained from
36
three waves of independent WGDs (~120 million years ago (Ma), ~66 Ma and <20
37
Ma) were investigated in 25 selected species. Gene families functioning in low
38
temperature and darkness were commonly retained gene duplicates after the eight
39
independently occurred WGDs in many lineages around the Cretaceous–Paleocene
40
(K-Pg) boundary, when the global cooling and darkness were the two main stresses.
41
Moreover, the commonly retained duplicates could be key factors which may have
42
contributed to the robustness of the critical stress related pathways. In addition,
43
genome-wide transcription factors (TFs) functioning in stresses tend to retain
44
duplicates after waves of WGDs, and the co-selected gene duplicates in many lineages
45
may play critical roles during severe environmental stresses. Finally, our results shed
46
new light on the significant contribution of paleopolyploidy to plant adaptation during
47
global environmental changes in the evolutionary history of angiosperms.
48 49
Key words: whole-genome duplication, paleopolyploidy, adaptive evolution,
50
phylogenomic, Cretaceous-Paleocene boundary, gene regulatory network
51 52 53 54 55
Page 3
56
Introduction
57
Angiosperms (or flowering plants) are the most diverse and abundant in the plant
58
kingdom, with about 350,000 known species on Earth. Charles Darwin described the
59
rapid rise and early diversification of angiosperms from the middle to late Cretaceous
60
period as “an abominable mystery” (Friedman, 2009). Currently, angiosperms
61
constitute the dominant vegetation of the Earth’s surface covering regions from
62
tropical to polar terrestrial zones, as well as aquatic habitats. The success is speculated
63
due to, to some extent, prevalent whole-genome duplication (WGD) events in the
64
evolutionary history of angiosperms (Levin, 1983; Soltis et al., 2009; Van de Peer et
65
al., 2009, 2017). WGD has long been recognized as an important evolutionary force
66
for speciation, adaptation, and diversification (Wood et al., 2009; Soltis and Soltis,
67
2016).
68 69
Within recent two decades, tremendous efforts have shown that WGDs are far more
70
prevalent than previously thought in the evolutionary history of flowering plants
71
(Bowers et al., 2003; Blanc and Wolfe, 2004; Cui et al., 2006; Soltis et al., 2009; Jiao
72
et al., 2011; Renny-Byfield and Wendel, 2014; Van de Peer et al., 2017). Two
73
ancestral WGDs were identified before the diversification of extant angiosperms and
74
seed plants, respectively (Jiao et al., 2011). Two major clades in angiosperms,
75
eudicots and monocots, both experienced paleopolyploidization events early in their
76
evolutionary history, named gamma (γ) and tau (τ) (Jaillon et al., 2007; Tang et al.,
77
2010; Jiao et al., 2012; Vekemans et al., 2012; Jiao et al., 2014; Ming et al., 2015). In
78
addition, WGDs also occurred in the common ancestors of many species-rich groups,
79
such as Asteraceae, Brassicaceae, Cucurbitaceae, Fabaceae, and Poaceae (Cannon et
80
al., 2015; Edger et al., 2015; Huang et al., 2016; McKain et al., 2016; Ren et al., 2018;
81
Wang et al., 2018). Especially, WGDs were recurrently occurred in many lineages.
82
For example, it had undergone three rounds of WGDs (gamma-beta-alpha) in
83
evolutionary history of Arabidopsis thaliana after split from monocots (Bowers et al.,
84
2003); and the lineage of Musa (bananas) independently experienced three rounds of
Page 4
85
WGDs after split with Poaceae (D’Hont et al., 2012).
86 87
Furthermore, previous studies found that the timing of WGDs are not randomly
88
distributed across the phylogeny of angiosperms, indicating possible roles of WGDs
89
under environmental selection. A wave of ancient WGDs occurred around the K-Pg
90
boundary independently in many plant lineages, suggesting WGDs potentially helped
91
species survived the extinction event (Fawcett et al., 2009; Vanneste et al., 2014). It
92
also has been proposed that polyploidization was associated with C4 grassland
93
expansion during the late Miocene, as well as with adaptation to recent glaciation
94
maxima (Estep et al., 2014; Novikova et al., 2018). Therefore, WGD has been
95
speculated with association to extinction events and other extreme environmental
96
changes. However, the evidence of genetic level supporting the contribution and
97
significance of WGDs to adaptation remains largely unexplored.
98 99
It is well acknowledged that polyploidy simultaneously duplicates tens of thousands
100
of genes by adding one extra set of genomes, which provides large amount of raw
101
genetic materials for evolution (Adams and Wendel, 2005; Doyle et al., 2008; Hegarty
102
and Hiscock, 2008; Soltis et al., 2015; Van de Peer et al., 2017). During the
103
subsequent fractionation and diploidization processes, a large proportion of genes will
104
quickly return to single copy state (Lynch and Conery, 2000), while the retained ones
105
are considered of particular importance to genetic innovation through the ways of
106
neo-functionalization and sub-functionalization (Ohno, 1970; Force et al., 1999). In
107
addition, the duplicated genes might also result in changes of gene regulatory
108
networks (GRNs) (Conant, 2010; De Smet and Van de Peer, 2012), which could
109
potentially contribute to plant adaptation.
110 111
To explore the significance of WGDs, here we comprehensively tracked the
112
evolutionary history of global gene families in 25 plant genomes, and investigated the
113
genetic modifications after independent WGDs. Firstly, twenty-five sequenced plant
114
genomes representing major lineages of angiosperms were used to reconstruct global
Page 5
115
gene families, and phylogenomic analyses were performed to identify the gene
116
families retained duplicated genes after ancient WGDs. Then, we identified gene
117
families with retained duplicates after independent WGDs from certain period with
118
extreme environmental changes, looking for potential selection signatures from
119
genetic level. Finally, by reconstructing GRNs from RNA-seq data and integrating
120
previously known pathways, we provided evidence of how retained duplications have
121
contributed to reshape the GRNs in response to environmental stresses.
122 123
Results
124
Identifying paralogous genes retained after WGDs
125
To identify genetic contribution of WGDs, we investigated twenty-one well
126
acknowledged polyploidization events in the evolutionary history of angiosperms
127
(Figure 1). We selected 25 sequenced plant genomes (Supplemental Table 1) and
128
constructed putative gene families from their protein coding sequences. In total,
129
66,509 orthogroups were constructed by OrthoMCL (Li et al., 2003). Among them,
130
12,077 orthogroups with four or more genes and including at least one gene from
131
outgroups (Physcomitrella patens, Selaginella moellendorffii, Amborella trichopoda)
132
were used to reconstruct maximum likelihood phylogenetic trees (see Methods). Then,
133
phylogenomic analyses were performed to look for duplication events from each gene
134
family phylogeny as previous described (Jiao et al., 2011).
135 136
To largely ensure the duplications generated from WGDs, tandem duplications were
137
removed firstly based on the chromosomal positions of the duplicated genes, which
138
could screen out some ancient small scale duplications (see Methods). Furthermore,
139
synteny analyses of each species were able to provide synteny support for a large
140
proportion the duplications identified in this study (Figure 1 and Supplemental Figure
141
1). In addition, some duplications were hard to classify due to a lack of branching
142
species between two consecutive WGD events, such as ρ or σ in Poales, α/βM or γM in
143
Musa acuminata. We employed synonymous substitution per synonymous site (KS)
Page 6
144
approach to distinguish the gene survivors from certain duplication events (see
145
Methods). Together, we were able to gather the gene families with surviving
146
duplicates from each WGD event (Figure 1).
147 148
Gene retention pattern after WGDs occurred in three periods
149
WGD could generate a large number of duplicated genes in one event, which provides
150
a tremendous amount of raw genetic material for evolution. If the independent WGDs
151
helped species survive environmental changes, similar functional gene duplicates
152
would be retained in these different species as they might have been selected by
153
common environmental pressures.
154 155
The previously identified and dated twenty-one WGDs were denoted on the species
156
tree of selected land pants, fourteen of which could be classified into three waves
157
based on periods of the occurrence timing (Figure 1 and Supplemental Table 2). The
158
most ancient wave was around ~120 Ma when γ (Jaillon et al., 2007; Jiao et al., 2012;
159
Vekemans et al., 2012) and τ (Jiao et al., 2014; Ming et al., 2015) events occurred in
160
the early evolutionary history of eudicots and monocots respectively. The second one
161
is the well-known wave around the K-Pg boundary, when a large number of WGD
162
events occurred (Paterson et al., 2004; Tuskan et al., 2006; Rensing et al, 2008;
163
Fawcett et al., 2009; Schmutz et al., 2010; D’Hont et al., 2012; The Tomato Genome
164
Consortium, 2012; Singh et al., 2013; Vanneste et al., 2014). The relatively recent
165
wave was within 20 Ma during which four independent WGDs occurred in the
166
evolutionary history of Glycine max, Panicum virgatum, Tarenaya hassleriana, and
167
Zea mays (Blanc and Wolfe, 2004; Schmutz et al., 2010; Cheng et al., 2013; Lu et al.,
168
2013).
169 170
We found certain gene families indeed survived gene duplicates from several
171
independent WGDs in many species (Figure 2A), which were likely the signal of
172
selection from the specific stress environments. 66 gene families commonly retained
173
gene duplicates in three periods (Supplemental Figure 2), which were mainly protein
Page 7
174
kinases, transporters, and protein binding gene families (Supplemental Table 3). 320
175
gene families retained duplicates from the most ancient wave of WGDs (γ and τ),
176
which were enriched for genes functioning in response to water deprivation and salt
177
stress (Figure 2B). These survivors may be, at least in part, selected by the arid
178
climate around 120 Ma of Cretaceous (Heimhofer et al., 2005). The second wave of
179
WGDs were during K-Pg boundary with severe environmental changes, including
180
global cooling, darkness, acid rain and wildfires (Nichols and Johnson, 2008; Schulte
181
et al., 2010). 493 gene families retained gene duplicates from at least 6 independent
182
WGDs during the K-Pg boundary (Figure 2A), which were enriched for many stress
183
related Gene Ontology (GO) terms including cold, heat, osmotic, salt stress, water
184
deprivation, and wounding (Figure 2B), as well as several other biological process
185
associated with stress response (e.g. abscisic acid signaling pathway, cellular response
186
to phosphate starvation, defense response, and response to karrikin) (Supplemental
187
Figure 3). We also investigated the other five lineages without poleopolyploidization
188
events during the K-Pg boundary and found that they retained small-scale
189
duplications in 12 gene families (Supplemental Figure 4). However, these gene
190
families mainly encode enzymes or transporters in plant metabolic processes, which
191
are not directly related to environmental adaptation (Supplemental Table 4). The most
192
recent wave of WGDs were within 20 Ma and retained duplicates from 844 gene
193
families (Figure 2A), of which the functional category enrichments are response to
194
salt stress, cold stress, water deprivation, and wounding (Figure 2B). The recorded
195
environmental changes were low CO2 concentrations and relatively cool temperatures
196
during that period (Zachos et al., 2008). Our enriched GO terms might partially
197
explain the environmental changes, but also suggesting the presence of other
198
environmental selection for different lineages.
199 200
Biased retention of transcription factor gene families after WGDs
201
TFs play a critical role in the transcriptional regulation of genes involved in many
202
biological processes (e.g. growth, development, and stress responses) (de Mendoza et
203
al., 2013). Previous studies demonstrated TFs are the vastly over-retained genes after
Page 8
204
WGD (Maere et al., 2005; Freeling et al., 2009).
205 206
We examined the retention pattern of gene duplicates of TFs after three waves of
207
WGDs based on the retention value (R-value, see Methods). In general, the majority
208
of the TF genes tend to be retained after WGDs (Figure 3), which is consistent with
209
previous analyses (Maere et al., 2005; Freeling et al., 2009). However, we found that
210
not all TF gene families were over-retained, and different families of TF showed
211
certain retention preferences (Figure 3). For example, high-retention gene families,
212
including ARF, C2H2, C3H, CO-like, ERF, G2-like, GRAS, HD-ZIP, HSF, LBD,
213
MYB, NAC, Trihelix, WRKY, bHLH and bZIP gene families, tend to repeatedly
214
retain duplicates after WGDs independent of the evolutionary periods and diverse
215
lineages (Figure 3). Many of the high-retention TF gene families are involved in
216
diverse development processes and response to abiotic and biotic stresses (Khan et al.,
217
2018). However, some TFs were lowly retained after many WGDs, such as FAR1,
218
HB-PHD, HRT-like, LFY, LSD, NF-X1, S1Fa-like, STAT, SAP, and Whirly,
219
suggesting the conservative function and dosage of these TFs (Figure 3). Most of
220
low-retention TF gene families were functioning in conserved biological processes.
221
For instance, LFY controls the switch from vegetative to reproductive development
222
(William et al., 2004) and LSD1 negatively regulates plant cell death pathway
223
(Dietrich et al., 1997).
224 225
Duplicated TF genes, which were co-retained after specific wave of WGDs, are
226
considered as critical genetic contributions for species surviving environmental
227
changes. Co-retained TF genes at ~120 Ma mainly involved in plant growth,
228
development, morphogenesis, and stress response (Supplemental Table 5). For
229
example, the retained duplicate genes and their functional divergence in four
230
orthogroups of MADS-box gene family likely contributed to morphological novelty
231
of floral organs in both core eudicots and monocots (Zhao et al., 2017). Two
232
orthogroups of heat stress transcription factor (HSF) play roles in responding to heat
233
stress. However, co-retained TFs at K-Pg boundary mainly involved in responding to
Page 9
234
various abiotic stresses (Supplemental Table 6). Orthogroups of the C2H2,ERF, and
235
RAV families involved in the response to low temperature. Orthogroups of the
236
HD-ZIP family involved in the shade avoidance syndrome and dehydration stress
237
responses, respectively. Orthogroup of the WRKY family involved in the response to
238
low phosphate stress (Supplemental Table 6).
239 240
Contribution of WGDs on the complexity of gene regulatory networks
241
As WGDs could potentially rewire the gene regulatory networks (Conant, 2010; De
242
Smet and Van de Peer, 2012), we aimed to explore the contribution of WGDs on
243
reshaping networks during adaptation to environmental changes following the K-Pg
244
boundary (Alvarez et al., 1980; Nichols and Johnson, 2008; Schulte et al., 2010).
245 246
Global cooling (or low temperatures) was a major environmental stress during the
247
mass extinction period (Schulte et al., 2010), and the C-repeat/DREB binding factor
248
(CBF)-dependent signaling pathway is the well-known major cold signaling pathway
249
(Chinnusamy et al., 2007; Shi et al., 2015, 2018). Currently, the core components of
250
the CBF-dependent signaling pathway have been deciphered in A. thaliana (Shi et al.,
251
2015). CBF genes, as key components in the pathway, are regulated by upstream ICE
252
and CAMTA TFs (Shi et al., 2015; Zhao et al., 2015), and are able to trigger the
253
expression of many cold-responsive (COR) genes under cold stress (Chinnusamy et
254
al., 2007).
255 256
By tracking the evolutionary history of key gene families in CBF pathway, we found
257
that the CBF, ICE, CAMTA and other related families (SIZ, EIN etc.) presented as
258
duplicated states in many different lineages (Figure 4A). The ICE1-ICE2 were
259
duplicated from β WGD in Arabidopsis (Figures 4B and C). The ice1 loss-of-function
260
mutant is sensitive to cold stress, which lead to significant reduction of survival rate
261
than wild type (Chinnusamy et al., 2003). Overexpression of ICE2 greatly enhanced
262
the cold tolerance in transgenic plants (Fursova et al., 2009). In Oryza sativa, CBF
263
genes were also retained as duplicated copies after the ρ WGD, which also play
Page 10
264
important role in cold stress (Supplemental Figure 5). Therefore, duplicated genes that
265
were retained from WGDs occurred during K-Pg boundary in difference lineages have
266
largely contributed to the copy number (probably dosage at first) and the complexity
267
of the current CBF-dependent signaling network functioning in cold stress tolerance
268
in eudicots and monocots (Figure 4B).
269 270
We also performed comparison of the network of CBF pathway members following
271
the polyploidization events of certain lineages. Co-expression networks have been
272
widely used for identifying functional related genes (Obayashi and Kinoshita, 2010;
273
You et al., 2016; Obayashi et al., 2018). For investigation of network evolution in
274
Arabidopsis lineage, Vitis is an ideal outgroup that experienced non-additional WGD
275
after the γ event. We constructed the cold-specific co-expression networks for A.
276
thaliana and Vitis vinifera using 162 and 60 RNA-seq data, respectively (see
277
Methods). For the duplicated ICE genes from β WGD, we examined the
278
co-expression networks of AthICE1 and AthICE2 in Arabidopsis and their
279
orthologous VviICE in Vitis (Figure 5). Most of the co-expressed genes in VviICE
280
module have orthologous genes clustered in AthICE1 and AthICE2 module, and the
281
corresponding orthologous in Arabidopsis could be divided into three sets: one set
282
specifically co-expressed with AthICE1, one set specifically co-expressed with
283
AthICE2, and one set co-expressed with both AthICE1 and AthICE2 (Figure 5),
284
indicating sub-functionalization of the duplicated ICE genes after WGD. Moreover,
285
the module of AthICE1 and AthICE2 in Arabidopsis is twice larger than the module of
286
VviICE by recruiting additional genes into the network after β and α WGDs, which
287
might potentially increase the cold stress tolerance.
288 289
Darkness (or low light) was another major environmental stress encountered by
290
species at mass extinction period, due to the atmospheric dust reflecting sunlight for a
291
long time (Schulte et al., 2010). We investigated the key components in the shade
292
avoidance pathway in plants (Jiao et al., 2007; Ruberti et al., 2012), and also found
293
several key genes were duplicated from the WGDs in multiple lineages (Figure 6A).
Page 11
294
In A. thaliana, the ATHB2 and HAT1 in the HD-ZIP II gene family, which function in
295
shade avoidance response, were derived from the β WGD (Figures 6B and C).
296
Molecular genetic analyses revealed that ATHB2 is rapidly induced by low R:FR light
297
in Arabidopsis, and the athb2 loss-of-function mutant display significant reduction of
298
hypocotyl elongation and shade avoidance ability comparing to wild type (Carabelli et
299
al., 2013). By using Arabidopsis as an example, a putative model for network
300
evolution from pre-WGD to post-WGD was illustrated in Figure 6B. Despite that the
301
predicted ancestral network is somewhat uncertain, our results showed clear evidence
302
for the expansion of shade avoidance pathway after WGDs, which may enhance the
303
perception of light signals and better adapt to low light environment.
304 305
To test the possible link between WGDs and plant adaptation, we compared the
306
specific retention pattern of regulatory genes in response to the cold and dark stresses
307
after the three waves of WGDs. The regulatory genes of cold stress pathway have a
308
higher chance to be retained after the recent two waves of WGDs (~66 Ma and <20
309
Ma), when the global cooling has been recorded during these two periods (Nichols
310
and Johnson, 2008; Zachos et al., 2008; Schulte et al., 2010) (Supplemental Figure 6).
311
The global darkness was only reported during the K-Pg boundary (Nichols and
312
Johnson, 2008; Schulte et al., 2010). Genes in shade avoidance pathway have a
313
particularly higher retention after the WGDs around the K-Pg boundary than the
314
retention after the other two waves of WGDs (Supplemental Figure 6). In addition, we
315
further investigated another stress pathway (Na+ tolerance), while high Na+ was not
316
the main global stress during the K-Pg boundary. The Salt-Overly-Sensitive (SOS)
317
signaling pathway has functions in maintaining ion homeostasis under high Na+
318
tolerance (Ji et al., 2013; Supplemental Figure 7A). Duplicates of the core members of
319
SOS pathway, such as SOS3, ScaBP8, SOS2, and SOS1, were only biasedly retained
320
after the examined WGDs (Supplemental Figures 6 and 7B). Therefore, the
321
preferential retention of the key members in stress related networks after multiple
322
independent WGDs may serve as critical evidence supporting the contribution of
323
WGDs to the adaptation of species during the global environmental changes.
Page 12
324 325
Discussion
326
The nature of periodically occurrence of ancient WGDs in angiosperms
327
To bridge the gap of the genetic contribution of paleopolyploidizations with adaptive
328
evolution in general, we need to explore empirical adaptive genetic signatures of
329
many WGDs during the evolutionary history of angiosperms. Polyploids are very
330
common in nature. However, the nascent polyploid individuals tend to encounter
331
internal and external obstacles, including increased rates of chromosome segregation
332
errors, small effective population size, competition with progenitor diploid species
333
and so on (Comai, 2005; Arrigo and Barker, 2012). Several studies suggested that
334
polyploid is usually an evolutionary dead end (Stebbins, 1950; Mayrose et al., 2011).
335
Recently formed polyploid plants have to find certain ecological niches that is
336
different from corresponding diploid species to survive (Stebbins, 1950; Levin, 1983;
337
Ramsey, 2011; te Beest et al., 2012; Visger et al., 2016). Polyploid plants could move
338
to a new, but stressful environment with no competition with their ancestral diploids,
339
or survive after a strong environmental selection that swapped the diploid ancestors
340
for polyploids (Otto and Whitton, 2000; Brochmann et al., 2004; Ramsey, 2011; te
341
Beest et al., 2012; Chao et al., 2013; Parisod and Broennimann, 2016). Therefore, the
342
paleopolyploidization events in angiosperms clustered and co-occurred with past
343
global environmental changes, which might have played significant roles in the
344
establishment of polyploids (Van de Peer et al., 2009, 2017).
345 346
Challenges to infer the evolutionary significance of WGDs
347
Due to the recurrent occurrences and subsequent large-scale gene losses, the
348
remaining signals of genetic contribution from ancient WGDs became complicated
349
and ambiguous (Doyle et al., 2008; Schnable et al., 2011; Wendel et al., 2016). In
350
addition, the environmental selection pressure usually not last as long as tens of
351
million years. The increased novel genetic contribution of ancient WGDs might have
352
lost after the environmental conditions changed. Moreover, hybridization and
Page 13
353
recombination could also remove the critical genetic information that helped species
354
surviving severe environmental changes during a particular period. Therefore, it poses
355
challenges to infer the significance of WGDs in the evolutionary history of
356
angiosperms. Here, we have to include many high-quality completely sequenced
357
genomes sharing one ancient WGD in our analysis to avoid missing critical genes due
358
to incomplete and/or improper genome assembly and annotation. More importantly,
359
although these critical genes might be lost in some species, we still be able to piece
360
together a broad picture from simultaneous consideration of many species. Finally, we
361
investigated several independent WGDs occurred at same period to look for shared
362
duplicated genes. Therefore, we were able to identify critical genetic signal for
363
species surviving dramatic environmental changes, and propose the likely
364
evolutionary consequences of WGDs for plant adaptation.
365 366
Genetic evidence shed new light on the contribution of WGD to adaptation
367
In addition to illustrate genetic impact of individual WGDs, we also need to consider
368
certain waves of ancient WGDs independently occurred in different lineages. The
369
severe environmental changes should have pose the similar selection for all species on
370
Earth. Previous studies have demonstrated the biased retention for genes related to
371
regulatory and development (Maere et al., 2005; Freeling, 2009). After comprehensive
372
investigation of gene families, we found certain functional genes were duplicated after
373
independent WGDs occurred during the same period, which provides likely evidence
374
supporting the global environmental selection on the paleopolyploids from different
375
lineages. For example, in response to low temperature and low light environmental
376
changes during K-Pg boundary, the second wave of ancient WGDs have contributed
377
to reshape the CBF-dependent signaling (Figures 4 and 5) and shade avoidance
378
pathways (Figure 6). The duplicates in ICE and CBF gene families are recruited in the
379
pathway which indeed enhanced the cold tolerance in plants (Shi et al., 2015).
380 381
Methods
Page 14
382
Genome data
383
We selected twenty-five sequenced plant genomes representing major lineages of
384
angiosperms and having clear WGD records during their evolutionary history. The
385
studied species included ten eudicots (A. thaliana, Boechera stricta, Eucalyptus
386
grandis, G. max, Medicago truncatula, Populus trichocarpa, Solanum lycopersicum,
387
Solanum tuberosum, Tarenaya hassleriana, and V. vinifera), twelve monocots
388
(Brachypodium distachyon, O. sativa, P. virgatum, Sorghum bicolor, Setaria italic,
389
Spirodela polyrhiza, Setaria viridis, Z. mays, Aegilops tauschii, Elaeis guineensis,
390
Hordeum vulgare, and M. acuminata), one living representative of a lineage that
391
represents the extant earliest diverging lineage of flowering plants also named basal
392
angiosperm (A. trichopoda), one lycophyte (S. moellendorffii), and one moss (P.
393
patens). Genome data of A. tauschii, E. guineensis, and M. acuminata were
394
downloaded from their project websites (Supplemental Table 1), and the other
395
genome data were mainly downloaded from Phytozome (version 11) (Goodstein et al.,
396
2012).
397 398
Gene family classification and phylogenetic analysis
399
We classified protein-coding genes into putative gene families or subfamilies by using
400
the OrthoMCL method (version 2.0.9) (Li et al., 2003) with an inflation parameter of
401
1.5, and obtained 66,509 orthogroups in total. The orthogroups with less than four
402
genes and/or without at least one gene from outgroups were filtered out, and the
403
remaining 12,077 orthogroups were processed to phylogenetic analysis. The
404
taxonomic distribution of the 12,077 orthogroups by considering the last common
405
ancestor of genes in each orthogroup was shown in Supplemental Figure 8. For the
406
construction of global gene family trees, the amino acid sequences of each orthogroup
407
were aligned with MAFFT (Katoh et al., 2005). Then the corresponding nucleic acid
408
sequences were forced onto amino acid alignments using PAL2NAL (Suyama et al.,
409
2006). To remove poorly aligned regions, the nucleic acid alignments were refined
410
using trimAl 1.4 (Capella-Gutirrez et al., 2009) with the option “automated1”.
411
Phylogenetic trees were conducted using maximum likelihood method in RAxML
Page 15
412
8.2.11 (Stamatakis, 2014) with the fast bootstrap option, 100 replicates under
413
GTRGAMMA model.
414 415
Identifying gene duplication events
416
In order to accurately identify gene duplications, we followed the same standard for
417
gene trees and species tree reconciliation as proposed by Jiao et al. (2011). That is,
418
two child branches need to have genes from at least one common species, and the
419
parental node and one of the child nodes should both have bootstrap values equal or
420
greater than 50%. Since the phylogenetic relationships of 25 species sampled in this
421
study were clear (Angiosperm Phylogeny Website), we directly adopted the currently
422
accepted topology as species tree. Genes from outgroups (P. patens, S. moellendorffii,
423
and A. trichopoda) were used to root trees.
424 425
Firstly, we used Notung 2.9 (Stolzer et al., 2012), a gene tree-species tree
426
reconciliation program, to batch reconciling all the nodes of gene trees with
427
corresponding nodes in species tree. Parsimony-based optimization criterion was
428
employed in Notung to minimize the duplication/loss cost. We ran the analysis based
429
on the duplication-loss (DL) events model. In addition to “--reconcile” mode, the
430
“--rearrange” mode was also performed with parameters setting as “--threshold 50%”.
431
This option could rearrange weakly supported edges (such as BS < 50%) and reduce
432
the duplication uncertainty of inference (Notung 2.9 manual). Secondly, after
433
carefully checking thousands of reconciled trees resulted from Notung, we further
434
applied the standard that “two child branches need have genes from at least one
435
common species”, and removed some low confidence duplications from all the
436
reconciled trees.
437 438
Eliminating tandem duplications
439
We defined two genes located within five genes as tandem duplicates. If a duplication
440
node contains two genes (gene1, gene2), or two child branches ((gene1, gene2),
441
(gene3, gene4)), either of the two genes located proximal to each other were treated as
Page 16
442
from tandem duplications. Based on the above criteria, we removed tandem
443
duplication events for all the reconciled gene trees.
444 445
KS calculation of duplication node and circumscription of individual WGD
446
We used KS value to date two WGDs occurred on the same branch which could not be
447
circumscribed by using phylogenetic method. KS estimates for pairwise comparisons
448
(one gene in m branch and the other gene in n branch) were obtained using the
449
Nei-Gojobori method (Nei and Gojobori, 1986) implemented in yn00 program of the
450
PAML package (Yang, 1997). The sum of KS values for all pairwise comparisons were
451
then divided by the number of KS estimates (m*n). Thus, we got a weighted KS value
452
for a duplication event. According to the KS ranges of ρ and σ events indicated by a
453
sequential KS curve of syntenic gene pairs of O. sativa as obtained by using Plant
454
Genome Duplication Database (PGDD) online tools, we roughly defined that
455
duplication node with KS ≥ 1.0 belong to σ event, and KS < 1.0 belong to ρ event
456
(Supplemental Figure 9). KS ≥ 0.7 belonging to γM event and KS < 0.7 belonging to
457
α/βM event were defined based on the circumscription of previous study (D’Hont et al.,
458
2012).
459 460
Syntenic conservation analysis of retained paralogous genes
461
The above procedures allow us to obtain the paralogous genes potential related to
462
each WGDs. To further validate if the duplicate genes are still located on syntenic
463
blocks, we performed collinearity analysis of WGD-derived species. The
464
intra-genome syntenic blocks were detected by using MCScanX based on the default
465
parameters (Wang et al., 2012). We then scored the percentage of paralogous genes
466
with syntenic evidence out of the total genes through phylogenomic timing.
467 468
GO annotation of gene orthogroups and functional enrichment analysis
469
To annotate orthogroups, we used full GO term of A. thaliana genes as the annotation
470
of orthogroups if they have Arabidopsis genes. Otherwise, we searched InterPro
471
domain of protein sequences using InterProScan (Zdobnov and Apweiler, 2001) and
Page 17
472
got the full GO term annotation. Statistical enrichment of GO terms were evaluated by
473
comparing the sample (common retained orthogroups) with background (all annotated
474
orthogroups) based on Fisher’s exact test and adjusted P values according to
475
Benjamini and Hochbery (false-discovery rate) method (Ashburner et al., 2000).
476 477
Retention analysis of transcription factors
478
We used the gene families of transcription factors of A. thaliana downloaded from
479
PlantTFDB 4.0 (Jin et al., 2017) to annotate orthogroups. For each WGD event, the
480
retained orthogroups of each TF family were identified. Owning to the sequence
481
divergence after duplications, some TF families are usually classified into multiple
482
orthogroups. To eliminate the influence of the size of orthogroups in one TF gene
483
family, we calculated a normalized value (retention value, R-value) to reflect the
484
retention pattern of each TF after corresponding WGD. R-value is calculated from the
485
formula as following:
486
=
487
where
488
TF family after corresponding WGD and
489
the probability of retention of all TF families after corresponding WGD.
!" ! #ℎ!% !&' ()#ℎ #*#)!* )* '&+)")+ ,,!#./ * !" ! #ℎ!% !&' )* '&+)")+ ,-
,
is the probability of retention of specific
is
490 491
Co-expression network construction and comparison
492
A total of 222 cold-related RNA-seq samples (162 for A. thaliana in Supplemental
493
Table 7 and 60 for V. vinifera in Supplemental Table 8) were downloaded from the
494
NCBI’s Sequence Read Archive database (https://www.ncbi.nlm.nih.gov/sra/). These
495
data sets were then proceeded with quality control, clean reads mapping, FPKM
496
calculation same as in You et al. (2016). After removing none expression genes of
497
each sample (FPKM 0.14 and FPKM 0.36 were selected as cutoffs for A. thaliana and
498
V. vinifera, respectively; detailed method followed You et al. (2016)), the remaining
499
expressed genes were used to calculate co-expression relationships by using Pearson’s
500
correlation coefficient (PCC). Subsequently, Mutual rank (MR; calculated as the
Page 18
501
geometric mean of the PCC rank from gene A to gene B and the rank of gene B to
502
gene A), was used to construct co-expression network (Obayashi and Kinoshita, 2010;
503
You et al., 2016). MR was demonstrated to be more effective to get credible
504
co-expression gene pairs than PCC (Obayashi and Kinoshita, 2010). Therefore, we
505
constructed MR-based co-expression networks for each species. Then, we selected the
506
top 300 co-expressed genes (a threshold used by You et al. (2016) and Obayashi et al.
507
(2018)) in Arabidopsis and in Vitis for network comparison. In such case, the
508
co-expression networks of AthICE1 and AthICE2 (formed by β WGD) in Arabidopsis
509
and their orthologous VviICE in Vitis were used to assess the evolutionary pattern of
510
these key genes in the CBF signaling pathway.
511 512
Supplemental information
513
Supplemental information includes Supplemental Figures 1-9 and Tables 1-8.
514 515
Author contributions
516
Y.J. and S.W. designed the study; S.W., B.H., and Y.J. performed the data analyses
517
and wrote the paper.
518 519
Acknowledgements
520
This research was supported by the Strategic Priority Research Program of the
521
Chinese Academy of Sciences (XDB31000000). We also thank the start-up funding
522
from State Key Laboratory of Systematic and Evolutionary Botany, Institute of
523
Botany, the Chinese Academy of Sciences. No conflict of interest declared.
524 525
Competing interests
526
The authors declare no competing interests.
527
Page 19
528
References
529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569
Adams, K.L., and Wendel, J.F. (2005). Polyploidy and genome evolution in plants. Curr. Opin. Plant Biol. 8:135–141. Alvarez, L.W., Alvarez, W., Asaro, F., and Michel, H.V. (1980). Extraterrestrial cause for Cretaceous-Tertiary extinct. Science 208:1095–1108. Arrigo, N., and Barker, M.S. (2012). Rarely successful polyploids and their legacy in plant genomes. Curr. Opin. Plant Biol. 15:140–146. Ashburner, M., Ball, C., Ja, Botstein, D., Butler, H., Cherry, J., Davis, A., Dolinski, K., Dwight, S., Eppig, J., and Harris, M. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 25:25–29. Blanc, G., and Wolfe, K.H. (2004). Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:1667– 1678. Bowers, J.E., Chapman, B.A., Rong, J.K., and Paterson, A.H. (2003). Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422:433–438. Brochmann, C., Brysting, A.K., Alsos, I.G., Borgen, L., Grundt, H.H., Scheen, A.C., and Elven, R. (2004). Polyploidy in arctic plants. Biol. J. Linn. Soc. Lond 82:521–536. Cannon, S.B., McKain, M.R., Harkess, A., Nelson, M.N., Dash, S., Deyholos, M.K., Peng, Y., Joyce, B., Stewart, C.N., Stewart Jr., Rolf, M., et al. (2015). Multiple polyploidy events in the early radiation of nodulating and nonnodulating legumes. Mol. Biol. Evol. 32:193–210. Capella-Gutirrez, S., Silla-Martínez, J.M., and Gabaldn, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. Carabelli, M., Turchi, L., Ruzza, V., Morelli, G., and Ruberti, I. (2013). Homeodomain-Leucine zipper II family of transcription factors to the limelight. Plant Signal Behav. 8:e25447. Chao, D.Y., Dilkes, B., Luo, H., Douglas, A., Yakubova, E., Lahner, B., and Salt, D.E. (2013). Polyploids exhibit higher potassium uptake and salinity tolerance in Arabidopsis. Science 341:658–659. Cheng, S., van den Bergh, E., Zeng, P., Zhong, X., Xu, J., Liu, X., Hofberger, J., de Bruijn, S., Bhide, A.S., Kuelahoglu, C., et al. (2013). The Tarenaya hassleriana genome provides insight into reproductive trait and genome evolution of Crucifers. Plant Cell 25:2813–2830. Chinnusamy, V., Ohta, M., Kanrar, S., Lee, B.H., Hong, X., Agarwal, M., and Zhu, J.K. (2003). ICE1: a regulator of cold-induced transcriptome and freezing tolerance in Arabidopsis. Genes Dev. 17:1043–1054. Chinnusamy, V., Zhu, J., and Zhu, J.K. (2007). Cold stress regulation of gene expression in plants. Trends Plant Sci. 12:444–451.
Page 20
570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613
Comai, L. (2005). The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6:836–846. Conant, G.C. (2010). Rapid reorganization of the transcriptional regulatory network after genome duplication in yeast. Proc. R. Soc. B. 277:869–876. Cui, L., Wall, P.K., Leebens-Mack, J.H., Lindsay, B.G., Soltis, D.E., Doyle, J.J., Soltis, P.S., Carlson, J.E., Arumuganathan, K., Barakat, A., et al. (2006). Widespread genome duplications throughout the history of flowering plants. Genome Res. 16:738–749. D'Hont, A., Denoeud, F., Aury, J.M., Baurens, F.C., Carreel, F., Garsmeur, O., Noel, B., Bocs, S., Droc, G., Rouard, M., et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488:213–217. de Mendoza, A., Sebé-Pedrós, A., Šestak, M.S., Matejčić, M., Torruella, G., Domazet-Lošo, T., and Ruiz-Trillo, I. (2013). Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc. Natl. Acad. Sci. USA 110:E4858–E4866. De Smet, R., and Van de Peer, Y. (2012). Redundancy and rewiring of genetic networks following genome-wide duplication events. Curr. Opin. Plant Biol. 15:168–176. Dietrich, R.A., Richberg, M.H., Schmidt, R., Dean, C., and Dangl, J.L. (1997). A novel zinc finger protein is encoded by the Arabidopsis LSD1 gene and functions as a negative regulator of plant cell death. Cell 88:685–694. Doyle, J.J., Flagel, L.E., Paterson, A.H., Rapp, R.A., Soltis, D.E., Soltis, P.S., and Wendel, J.F. (2008). Evolutionary genetics of genome merger and doubling in plants. Annu. Rev. Genet. 42:443–461. Edger, P.P., Heidel-Fischer, H.M., Bekaert, M., Rota, J., Glöckner, G., Platts, A.E., Heckel, D.G., Der, J.P., and Wafula, E.K. (2015). The butterfly plant arms-race escalated by gene and genome duplications. Proc. Natl. Acad. Sci. USA 112:8362–8366. Estep, M.C., McKain, M.R., Vela Diaz, D., Zhong, J., Hodge, J.G., Hodkinson, T.R., Layton, D.J., Malcomber, S.T., Pasquet, R., and Kellogg, E.A. (2014). Allopolyploidy, diversification, and the Miocene grassland expansion. Proc. Natl. Acad. Sci. USA 111:15149–15154. Fawcett, J.A., Maere, S., and Van de Peer, Y. (2009). Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106:5737–5742. Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., and Postlethwait, J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–1545. Freeling, M. (2009). Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60:433–453. Friedman, W.E. (2009). The meaning of Darwin's 'abominable mystery'. Am. J. Bot. 96:5–21.
Page 21
614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657
Fursova, O.V., Pogorelko, G.V., and Tarasov, V.A. (2009). Identification of ICE2, a gene involved in cold acclimation which determines freezing tolerance in Arabidopsis thaliana. Gene 429:98–103. Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., et al. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40:D1178–D1186. Hegarty, M.J., and Hiscock, S.J. (2008). Genomic clues to the evolutionary success of polyploid plants. Curr. Biol. 18:R435–R444. Heimhofer, U., Hochuli, P.A., Burla, S., Dinis, J.M.L., and Weissert, H. (2005). Timing of Early Cretaceous angiosperm diversification and possible links to major paleoenvironmental change. Geology 33:141. Huang, C.H., Zhang, C., Liu, M., Hu, Y., Gao, T., Qi, J., and Ma, H. (2016). Multiple polyploidization events across Asteraceae with two nested events in the early history revealed by nuclear phylogenomics. Mol. Biol. Evol. 33:2820–2835. Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al. (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449:463–467. Ji, H., Pardo, J.M., Batelli, G., Van Oosten, M.J., Bressan, R.A., and Li, X. (2013). The Salt Overly Sensitive (SOS) pathway: established and emerging roles. Mol. Plant 6:275–286. Jiao, Y., Lau, O.S., and Deng, X.W. (2007). Light-regulated transcriptional networks in higher plants. Nat. Rev. Genet. 8:217–230. Jiao, Y., Leebens-Mack, J., Ayyampalayam, S., Bowers, J.E., McKain, M.R., McNeal, J., Rolf, M., Ruzicka, D.R., Wafula, E., Wickett, N.J., et al. (2012). A genome triplication associated with early diversification of the core eudicots. Genome Biol. 13:R3. Jiao, Y., Li, J., Tang, H., and Paterson, A.H. (2014). Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell 26:2792–2802. Jiao, Y., Wickett, N.J., Ayyampalayam, S., Chanderbali, A.S., Landherr, L., Ralph, P.E., Tomsho, L.P., Hu, Y., Liang, H., Soltis, P.S., et al. (2011). Ancestral polyploidy in seed plants and angiosperms. Nature 473:97–100. Jin, J., Tian, F., Yang, D.C., Meng, Y.Q., Kong, L., Luo, J., and Gao, G. (2017). PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 45:D1040–D1045. Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005). MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518. Khan, S.A., Li, M.Z., Wang, S.M., and Yin, H.J. (2018). Revisiting the role of plant transcription factors in the battle against abiotic stress. Int. J. Mol. Sci. 19:1634.
Page 22
658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700
Levin, D.A. (1983). Polyploidy and novelty in flowering plants. Am. Nat. 122: 1–25. Li, L., Stoeckert, C.J., and Roos, D.S. (2003). OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178–2189. Lu, F., Lipka, A.E., Glaubitz, J., Elshire, R., Cherney, J.H., Casler, M.D., Buckler, E.S., and Costich, D.E. (2013). Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genet. 9:e1003215. Lynch, M., and Conery, J.S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155. Maere, S., De Bodt, S., Raes, J., Casneuf, T., Van Montagu, M., Kuiper, M., and Van de Peer, Y. (2005). Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. USA 102:5454–5459. Mayrose, I., Zhan, S.H., Rothfels, C.J., Magnuson-Ford, K., Barker, M.S., Rieseberg, L.H., and Otto, S.P. (2011). Recently formed polyploid plants diversify at lower rates. Science 333:1257. McKain, M.R., Tang, H., McNeal, J.R., Ayyampalayam, S., Davis, J.I., depamphilis, C.W., Givnish, T.J., Pires, J.C., Stevenson, D.W., and Leebens-Mack, J.H. (2016). A phylogenomic assessment of ancient polyploidy and genome evolution across the Poales. Genome Biol. Evol. 8:1150–1164. Ming, R., VanBuren, R., Wai, C.M., Tang, H., Schatz, M.C., Bowers, J.E., Lyons, E., Wang, M.L., Chen, J., Biggers, E., et al. (2015). The pineapple genome and the evolution of CAM photosynthesis. Nat. Genet. 47:1435–1442. Nei, M., and Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418. Nichols, D.J., and Johnson, K.R. (2008). Plants and the K-T boundary (England: Cambridge University Press). Novikova, P.Y., Hohmann, N., and Van de Peer, Y. (2018). Polyploid Arabidopsis species originated around recent glaciation maxima. Curr. Opin. Plant Biol. 42:8–15. Obayashi, T., Aoki, Y., Tadaka, S., Kagaya, Y., and Kinoshita, K. (2018). ATTED-II in 2018: A plant coexpression database based on investigation of the statistical property of the mutual rank index. Plant Cell Physiol. 59:e3(1– 7). Obayashi, T., and Kinoshita, K. (2010). Coexpression landscape in ATTED-II: usage of gene list and gene network for various types of pathways. J. Plant Res. 123:311–319. Ohno, S. (1970). Evolution by gene duplication (Berlin: Springer-Verlag). Otto, S.P., and Whitton, J. (2000). Polyploid Incidence and Evolution. Annu. Rev. Genet. 34:401–437. Parisod, C., and Broennimann, O. (2016). Towards unified hypotheses of the impact of polyploidy on ecological niches. New Phytol. 212:540–542.
Page 23
701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744
Paterson, A.H., Bowers, J.E., Chapman, B.A. (2004). Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. USA 101:9903–9908. Ramsey, J. (2011). Polyploidy and ecological adaptation in wild yarrow. Proc. Natl Acad. Sci. USA 108:7096–7101. Ren, R., Wang, H., Guo, C., Zhang, N., Zeng, L., Chen, Y., Ma, H., and Qi, J. (2018). Widespread whole genome duplications contribute to genome complexity and species diversity in angiosperms. Mol. Plant 11:414–428. Renny-Byfield, S., and Wendel, J.F. (2014). Doubling down on genomes: polyploidy and crop plants. Am. J. Bot. 101:1711–1725. Rensing, S.A., Lang, D., Zimmer, A.D., Terry, A., Salamov, A., Shapiro, H., Nishiyama, T., Perroud, P., Lindquist, E.A., Kamisugi, Y., et al. (2008). The Physcomitrella Genome Reveals Evolutionary Insights into the Conquest of Land by Plants. Science 319:64–69. Ruberti, I., Sessa, G., Ciolfi, A., Possenti, M., Carabelli, M., and Morelli, G. (2012). Plant adaptation to dynamically changing environment: the shade avoidance response. Biotechnol. Adv. 30:1047–1058. Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., et al. (2010). Genome sequence of the palaeopolyploid soybean. Nature 463:178–183. Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA 108:4069–4074. Schulte, P., Alegret, L., Arenillas, I., Arz, J.A., Barton, P.J., Bown, P.R., Bralower, T.J., Christeson, G.L., Claeys, P., Cockell, C.S., et al. (2010). The chicxulub asteroid impact and mass extinction at the Cretaceous-Paleogene boundary. Science 327:1214–1218. Shi, Y., Ding, Y., and Yang, S. (2015). Cold signal transduction and its interplay with phytohormones during cold acclimation. Plant Cell Physiol. 56:7–15. Shi, Y., Ding, Y., and Yang, S. (2018). Molecular regulation of CBF signaling in cold acclimation. Trends Plant Sci. 23:623–637. Singh, R., Ong-Abdullah, M., Low, E.L., Manaf, M.A.A., Rosli, R., Nookiah, R., Ooi, L.C., Ooi, S., Chan, K., Halim, M.A., et al. (2013). Oil palm genome sequence reveals divergence of interfertile species in Old and New Worlds. Nature 500:335–339. Soltis, D.E., Albert, V.A., Leebens-Mack, J., Bell, C.D., Paterson, A.H., Zheng, C., Sankoff, D., Depamphilis, C.W., Wall, P.K., and Soltis, P.S. (2009). Polyploidy and angiosperm diversification. Am. J. Bot. 96:336–348. Soltis, P.S., Marchant, D.B., Van de Peer, Y., and Soltis, D.E. (2015). Polyploidy and genome evolution in plants. Curr. Opin. Genet. Dev. 35:119–125. Soltis, P.S., and Soltis, D.E. (2016). Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 30:159–165. Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
Page 24
745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787
Stebbins, G.L. (1950). Variation and evolution in plants (New York: Columbia University Press). Stolzer, M., Lai, H., Xu, M., Sathaye, D., Vernot, B., and Durand, D. (2012). Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinformatics 28:i409–i415. Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34:W609. Tang, H., Bowers, J.E., Wang, X., and Paterson, A.H. (2010). Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl. Acad. Sci. USA 107:472–477. te Beest, M., Le Roux, J.J., Richardson, D.M., Brysting, A.K., Suda, J., Kubesova, M., and Pysek, P. (2012). The more the better? The role of polyploidy in facilitating plant invasions. Ann. Bot. 109:19–45. The Tomato Genome Consortium (2012). The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485:635–641. Tuskan, G.A., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., et al. (2006). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–1604. Van de Peer, Y., Maere, S., and Meyer, A. (2009). The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10:725–732. Van de Peer, Y., Mizrachi, E., and Marchal, K. (2017). The evolutionary significance of polyploidy. Nat. Rev. Genet. 18:411–424. Vanneste, K., Baele, G., Maere, S., and Van de Peer, Y. (2014). Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary. Genome Res. 24:1334–1347. Vekemans, D., Proost, S., Vanneste, K., Coenen, H., Viaene, T., Ruelens, P., Maere, S., Van de Peer, Y., and Geuten, K. (2012). Gamma paleohexaploidy in the stem lineage of core eudicots: significance for MADS-box gene and species diversification. Mol. Biol. Evol. 29:3793–3806. Visger, C.J., Germain-Aubrey, C.C., Patel, M., Sessa, E.B., Soltis, P.S., and Soltis, D.E. (2016). Niche divergence between diploid and autotetraploid Tolmiea. Am. J. Bot. 103:1396–1406. Wang, J., Sun, P., Li, Y., Liu, Y., Yang, N., Yu, J., Ma, X., Sun, S., Xia, R., Liu, X., et al. (2018). An Overlooked Paleotetraploidization in Cucurbitaceae. Mol. Biol. Evol. 35:16–26. Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H., Marler, B., Guo, H., et al. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40:e49. Wendel, J.F., Jackson, S.A., Meyers, B.C., and Wing, R.A. (2016). Evolution of plant genome architecture. Genome Biol. 17:37.
Page 25
788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822
William, D.A., Su, Y., Smith, M.R., Lu, M., Baldwin, D.A., and Wagner, D. (2004). Genomic identification of direct target genes of LEAFY. Proc. Natl. Acad. Sci. USA 101:1775–1780. Wood, T.E., Takebayashi, N., Barker, M.S., Mayrose, I., Greenspoon, P.B., and Rieseberg, L.H. (2009). The frequency of polyploid speciation in vascular plants. Proc. Natl. Acad. Sci. USA 106:13875–13879. Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556. You, Q., Zhang, L., Yi, X., Zhang, K., Yao, D., Zhang, X., Wang, Q., Zhao, X., Ling, Y., Xu, W., et al. (2016). Co-expression network analyses identify functional modules associated with development and stress response in Gossypium arboreum. Sci. Rep. 6:38436. Zachos, J.C., Dickens, G.R., and Zeebe, R.E. (2008). An early Cenozoic perspective on greenhouse warming and carbon-cycle dynamics. Nature 451:279–283. Zdobnov, E.M., and Apweiler, R. (2000). InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17:847–848. Zhao, C., Lang, Z., and Zhu, J.K. (2015). Cold responsive gene transcription becomes more complex. Trends Plant Sci. 20:466–468. Zhao, T., Holmer, R., de Bruijn, S., Angenent, G.C., van den Burg, H.A., and Schranz, M.E. (2017). Phylogenomic synteny network analysis of MADS-Box transcription factor genes reveals lineage-specific transpositions, ancient tandem duplications, and deep positional conservation. Plant Cell 29:1278–1292.
823
Figure Legends
824
Figure 1. Inferring number of gene families surviving duplicates after each WGD
825
in the evolutionary history of land plants. Phylogenetic tree showed the topology
826
and divergence times of 25 plants in this study. The evolutionary relationships of the
827
25 species were based on current accepted topology (Angiosperm Phylogeny
828
Website). Divergence time of each node of the species tree was obtained from the
Page 26
829
TIMETREE website (http://timetree.org/). Well acknowledged whole-genome
830
duplication (circle) and triplication (square) events were positioned onto the branches
831
of the phylogeny. Three periods (~120 Ma, ~66 Ma, <20 Ma) with prolific WGDs
832
were recognized, and are denoted in green, orange, and blue, respectively. The
833
number of gene families with duplicates retention following each WGD are shown
834
around the corresponding circle or square. The proportion of the duplications verified
835
by synteny evidence were generated from WGDs, which were indicated in the dashed
836
circles. A sketch map in the upper left shows the major environmental stresses during
837
the Cretaceous-Paleogene extinction period.
838 839
Figure 2. Retention patterns of stress-related gene families in three periods. (A)
840
Venn diagram showing the shared and specific gene families surviving duplicates
841
after multiple WGDs in certain periods. Numbers represent the number of gene
842
families with gene duplications. Numbers in square bracket indicate number of WGDs
843
with sharing gene families surviving duplicates. (B) The significantly enriched GO
844
terms of stress-related biological processes for the shared gene families retained gene
845
duplicates in three periods. The three columns with different colors are corresponding
846
to the WGDs in three periods as in panel A.
847 848
Figure 3. Biased retention patterns of transcription factor gene families after
849
WGDs. TFs (rows) were clustered based on their retention values, and WGDs
850
(columns) were grouped according to their occurring timing. Gene families to the top
851
of the heatmap were the high retention ones after WGDs, while TFs to the bottom of
852
the heatmap were the low retention ones. Color key on the upper left denotes the
853
retention values of the TFs. The number in each cell of the heatmap represents the
854
retention value of each TF after corresponding WGD. The numbers in parentheses
855
after the TF names represent the total number of orthogroups belonging to the TF
856
gene families.
857
Page 27
858
Figure 4. The duplication pattern of key genes in cold responsive pathway after
859
WGDs around K-Pg boundary. (A) Summary of duplicates retention status of
860
known important gene families in CBF-dependent signaling pathway in angiosperms
861
after 8 WGDs at K-Pg boundary. The ICE, CAMTA and CBF are the key transcription
862
factor gene families, and the SIZ, OST, EIN, and FRY are other related gene families
863
involved in CBF-dependent signaling pathway. “x” denotes no retention and solid
864
dots indicate gene retentions. (B) An illustration of expansion and remodeling of
865
CBF-dependent signaling pathway following WGD in A. thaliana. ICE1, ICE2 were
866
duplicated from β WGD. CBF1, CBF2 and CBF3 were generated by tandem
867
duplications. (C) Phylogeny of the ICE gene family showed the duplications in its
868
evolutionary history. Solid circles indicate duplications occurred in different periods.
869
Numbers on branches show the bootstrap supporting values. Syntenic blocks with ICE
870
genes were placed on the right of the phylogenetic tree.
871 872
Figure 5. Comparison of co-expression networks between duplicates of ICE1 and
873
ICE2 in A. thaliana and the orthologous ICE in V. vinifera. ICE1, ICE2 were
874
generated by β WGD. The red sinewave lines link corresponding orthologous pairs as
875
they clustered in the same orthogroup. Green dashed lines between two nodes indicate
876
positive co-expression relationships. Four genes in the Arabidopsis co-expression
877
network, which have been previously demonstrated function in cold treatment
878
response, were highlighted with annotation information.
879 880
Figure 6. The retention pattern of key genes in shade avoidance pathway after
881
WGDs around K-Pg boundary. (A) Summary of retention status of the PHY and
882
HB gene families after 8 independent WGDs around K-Pg boundary. PHY and HB are
883
the two major gene families in shade avoidance pathway. “x” denotes no retention and
884
solid dots indicate gene retentions. (B) An illustration of expansion and remodeling of
885
shade avoidance pathway following WGD by comparing a predicted ancestral
886
network with the current network in A. thaliana. ATHB2 and HAT1 were generated by
887
β WGD. (C) Phylogeny of the HD-ZIP II gene family showed the duplications in its
Page 28
888
evolutionary history. Solid circles indicate duplications occurred in different periods.
889
Numbers on branches show the bootstrap supporting values. Syntenic blocks with
890
HD-ZIP II genes were placed on the right of the phylogenetic tree.
Cretaceous
Jurassic Asteroid impact
Paleogene-Quaternary 60%
dust reflecting sunlight
Cooling
Darkness
Boechera stricta
75%
β
78%
91%
5,806 αPt
84%
2,727 βG
Arabidopsis thaliana
4,223 αT
Tarenaya hassleriana Populus trichocarpa
93%
1,592 αE
Medicago truncatula
8,769 αG
Glycine max Elaeocarpus grandis
1,575 γ
Duplication
Eudicots
37%
2,185 α
868
Vitis vinifera 55%
68%
2,595 αS
Solanum tuberosum Solanum lycopersicum
Triplication
Setaria viridis 56%
8,337 αPv
54%
66%
1,684 ρ
58%
Panicum virgatum Sorghum bicolor Zea mays Hordeum vulgare Aegilops tauschii
818 τ
1,063 ε ζ
Brachypodium distachyon 1,263 γ
34%
81%
51%
Oryza sativa
4,171
Musa acuminata
M
α/β 65%
1,008
p
2,254
Elaeis guineensis
S
Spirodela polyrhiza
α/β 41%
Amborella trichopoda
36%
200
180
160
140
120
100
80
Selaginella moellendorffii
1,185 αPp
60
Physcomitrella patens 40
20
0 Ma
Monocots
943 σ
1,831 αZ
Setaria italica
A
B
αPt
1,220
β
βG
25
Pp
α
189 49 [8] 195 [7] 493 [6] 1,106 [5]
406
Stress biological process 269
response to cold
αS
response to heat −Log10 (Adjusted P value)
119
198 p
response to osmotic stress
ρ
408 M
α/β
γ
1,255
τ
320
498
αG
αPv
1,335
115
612
110 117
407 213
844
108
αZ
10 37
531
6 8 >10
65
1,386 3,248
response to salt stress
αT
4
response to water deprivation response to wounding
0
1 2 Value
3
β
αPt
βG
αS
ρ
α/β
p
αPp
αT
αG
αPv
αZ
1.89 0.81 1.15 0.81 1.32 0.52 0.99 0.18 1.18 0.63 1.57 1.57 1.1 0.98 0.93 1.19 1.05 0.79 1.48 1.57 0.9 0.63 1.26 0.79 3.15 1.57 0.94 1.57 1.18 0.79 0.52 1.57 2.1 0.79 1.45 0.63 1.26 3.15 1.57 1.57 0.35 3.15 0 0 0 0 0 0 0 0 3.15 0 0 0 0 0 0
3.31 0.53 1.1 0.95 2.18 1.38 0.65 0.97 1.55 0.83 2.07 1.66 0.39 0.78 1.46 0.86 2.76 1.03 1.95 2.76 2.37 0.83 3.31 2.07 8.28 0.83 0 0 0 0 0 0 0 2.07 2.55 0 0 0 0 4.14 0.92 4.14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3.57 1.3 0.6 1.53 0.47 1.49 1.18 0.53 2.23 2.68 1.12 2.68 1.25 0.56 0.53 1.23 0 0 0 0 0 0 3.57 0 0 0 0.89 2.23 0 4.47 0.74 0 0 0 0.69 0 0 0 0 4.47 0.99 0 0 0 4.47 0 4.47 0 0 0 0 0 0 0 0 0 0
0.95 0.96 1.03 0.92 1 1.19 1.1 1.05 1.12 1.07 0.74 1.07 1 1.15 1.05 0.86 1.19 1.19 1.05 0.79 1.02 0.95 0.95 1.19 1.19 0.95 0.95 1.19 0.89 1.19 0.99 1.19 1.19 1.19 0.92 1.19 0.71 1.19 0.89 0.6 0.93 1.19 0.6 0.71 1.19 1.19 1.19 1.19 1.19 0 0 1.19 0.6 0 1.19 1.19 0
1.89 0.76 1.03 0.61 0.87 0.39 0.87 0.97 1.18 1.66 1.77 1.66 1.05 1.4 0.7 0.82 2.37 1.48 0.97 0.79 1.01 1.89 1.89 2.37 2.37 0.24 1.18 1.18 0.89 1.18 1.18 1.18 0.79 1.77 1.09 0.95 0.47 2.37 1.18 1.18 0.53 1.18 0 0 0 2.37 1.18 0 0 0 0 0 0 0 2.37 1.18 0
1.96 0.84 1.42 1.31 1.03 1.64 1.21 0.58 0.61 1.64 0.82 0.65 0.99 0.92 0.77 0.56 2.18 0.82 0.96 1.64 0.94 0.65 1.31 2.45 3.27 0.98 2.29 1.64 1.23 0.82 0.82 3.27 1.09 0.82 1.01 0.65 1.31 1.64 0 0 0 0 1.64 0.65 0 0 1.64 0 0 0 0 0 0 3.27 0 0 0
2.84 0.53 0.79 0.68 1 1.58 1.74 0.28 2.07 2.37 1.18 1.89 1.43 0.3 0.56 0.82 1.58 1.77 0.56 0.79 1.35 0.95 0 0 4.73 0.47 2.37 1.18 0.59 2.37 0.39 2.37 3.15 2.37 1.46 0 0 0 1.18 2.37 0.53 0 0 0 2.37 0 2.37 2.37 0 0 0 0 0 0 0 0 0
1.15 0.88 1.15 0.86 0.9 0.95 1.06 1.01 1.16 1.43 1.07 1.29 1.1 0.94 1.1 0.79 1.43 0.9 1.1 0.95 1.23 1.15 1.15 1.07 0 1 1.15 1.43 0.72 1.43 1.19 1.43 0.95 1.43 0.77 0.86 0.57 1.43 1.07 1.43 0.8 0.72 1.43 0.86 1.43 0 1.43 0 1.43 0 0 0 0.72 1.43 0 0 0
1.38 0.93 0.77 1.12 0.73 1.92 1.15 1.35 1.44 1.84 1.72 1.38 0.32 1.08 0.68 1.19 1.53 0.86 0.95 1.53 0.98 1.61 0.92 0.57 2.3 1.38 1.84 1.15 0.57 1.72 1.72 0 0 0 0 1.38 0.46 0 0 1.15 1.28 2.3 1.15 0.92 2.3 2.3 0 0 0 2.3 0 0 0 0 0 0 0
1.32 0.9 1.1 1.03 1.21 1.65 1.04 1.16 1.65 0.66 0.82 1.98 1.15 0.41 0.58 0.57 1.1 0.41 1.55 1.65 0.47 0.33 0.66 0.82 3.29 0.66 1.32 0.82 1.65 0.82 0.55 1.65 3.29 1.65 0.51 0 1.32 3.29 0 1.65 0.37 3.29 0 1.98 0 0 1.65 1.65 1.65 3.29 0 3.29 0 0 0 1.65 3.29
1.69 0.84 0.96 1.01 1.06 0.84 0.84 1.09 0.74 1.52 1.47 0.84 1.02 1.26 0.79 0.99 1.12 1.05 1.09 0.56 1.2 1.35 0.67 1.26 1.69 1.18 1.35 0.84 1.05 0.84 1.26 1.69 1.69 1.26 1.04 1.01 0.67 1.69 1.69 0 0.37 0 0.84 0.34 1.69 1.69 0.84 1.69 0.84 1.69 1.69 0 0 0 0 0 0
1.07 0.98 0.96 0.95 1.07 1.07 1.07 1.01 0.87 1.07 1.07 0.96 1.07 1.07 0.88 0.85 1.07 1.07 1.07 1.07 1.07 1.07 1.07 1.07 1.07 0.96 0.85 1.07 0.93 1.07 0.98 1.07 1.07 1.07 1.07 1.28 0.64 1.07 1.07 1.07 0.95 1.07 1.07 1.07 1.07 2.14 0.53 0.53 1.07 1.07 1.07 1.07 1.07 1.07 0 1.07 0
1.07 0.84 1.2 0.99 1.33 0.44 0.81 0.94 1.25 1.2 0.33 1.07 1.05 1.21 0.78 0.83 1.33 1.17 1.33 1.11 1.14 0.67 1.33 0.67 1.33 0.93 1.47 1.33 0.83 1 0.89 1.33 1.33 0.33 0.72 1.07 1.07 1.33 1 0.67 0.89 1.33 0.67 1.33 1.33 1.33 0.67 0.67 1.33 1.33 1.33 0 1.33 0 1.33 1.33 1.33
2.54 0.75 1.13 0.85 1.34 2.12 0.78 1.25 1.32 0.85 1.59 0.85 0.59 1.19 0.5 1.02 1.41 0.53 1.25 2.12 0.6 0.85 1.69 1.06 4.23 0 2.12 1.06 1.06 0 0 2.12 1.41 4.23 1.63 1.69 0.85 0 1.06 0 0.47 0 2.12 0.85 2.12 4.23 0 0 0 0 0 0 0 0 4.23 2.12 0
M
ARF (5) bHLH (62) bZIP (30) C2H2 (35) C3H (19) CO−like (6) ERF (38) G2−like (17) GRAS (16) HD−ZIP (10) HSF (8) LBD (10) MYB (43) NAC (32) Trihelix (17) WRKY (29) CAMTA (3) AP2 (8) MYB_related (17) NF−YB (6) SBP (7) TALE (10) Nin−like (5) ARR−B (4) VOZ (1) TCP (10) GATA (10) NF−YA (4) B3 (8) ZF−HD (4) Dof (12) EIL (2) E2F/DP (3) DBB (4) MIKC_MADS (13) HB−other (5) M−type_MADS (5) CPP (2) YABBY (4) GeBP (2) WOX (9) BES1 (2) GRF (2) NF−YC (5) BBR−BPC (2) NF−X1 (1) RAV (2) LSD (2) SRS (2) LFY (1) FAR1 (2) S1Fa−like (1) Whirly (2) SAP (1) STAT (1) HB−PHD (2) HRT−like (1)
low retention
τ
high retention
γ
A
B
Cold pathway genes
Ortho ID
α Pt
β
●
3343
ICE
610
CAMTA
x
568
CBF
x
962
SIZ
x
110
OST
x
1242
EIN
x
926
FRY
x
● ● ● ● ● ● ●
Predicted ancestral network
G
S
β
α
x
x
● ● ● ● ●
x
● ● ● ● ●
x
WGD
ICE1
CBF*
α
●
x
● ● ●
●
x
x
x x
x
x
●
x
CAMTAs
ICE2
CBF3
CBF2
CBF1 CBF1 β WGD
COR genes
COR genes
Cold/Freezing tolerance
C
● ● ● ● ● ● ●
Pp
p
Cold stress Cold stress Ca 2+, kinases
Ca 2+, kinases
CAMTA*
● ● ● ● ● ● ●
M
α/β
Current network
Cold stress
ICE*
ρ
tandem duplication
Cold/Freezing tolerance Mtr Medtr3g498695.1
100 92
Gma Glyma.04G200500.1.p
Gma Chr4 47.21-47.45Mb Gma Chr6 13.78-13.53Mb
Gma Glyma.06G165000.1.p 100
Stu PGSC0003DMP400050145 Sly Solyc06g068870.2.1
100 100 70
Ptr Chr12 12.93-13.10Mb Ptr Chr15 12.26-12.41Mb
Ptr Potri.012G106000.1 Ptr Potri.015G105200.1 Tha XP 010536585.1
93
Vvi Chr17 0.02-0.38Mb
Vvi GSVIVT01008637001
Vvi Chr14 24.65-25.19Mb
Vvi GSVIVT01032998001 Egr Eucgr.G01938.1.p
80 100
82
Ath Chr1 4.37-4.43Mb
Ath AT1G12860.1 (ICE2) Bst Bostr.13671s0164.1.p
100
98
Tha XP 010528979.1 Tha XP 010554141.1
66 100
Tha Sca11 1.11-1.17Mb Tha Sca4 4.55-4.51Mb
Bst Bostr.0556s0344.1.p Ath AT3G26744.1 (ICE1)
Ath Chr3 9.87-9.71Mb
AT4G37190 AT2G25100 AT4G04670
AthICE1
AT3G48470
AT3G26744
AT5G13000
GSVIVG01012726001 AT1G20910
CAAX protease self-immunity protein
GSVIVG01007540001
Cold shock protein 2 (CSP2)
GSVIVG01007560001
AT3G49650 AT3G19080
AT2G27470
AT5G02120 AT1G18090
AT4G38680
AT5G17270
AT1G18100 AT2G20290
AT1G75690AT1G23880 AT1G63650 AT1G69523 AT3G17970
AT4G24790
AT4G10450
GSVIVG01013439001
AT2G32415
AT4G35290 GSVIVG01013400001
AT4G25630
AT4G11175 AT1G66760
AT5G52950 AT3G56070AT4G29020
AT1G26761
AT4G34730 AT2G40430
AT2G35260
AT5G15430 AT5G37130
GSVIVG01019319001
AT5G25590
AT3G09890
GSVIVG01031124001
AT1G01970
GSVIVG01000118001
AT2G19740 GSVIVG01000109001
AT1G15510 AT5G56710 GSVIVG01038569001
Response regulator 15 (ARR15)
AT3G49890 AT5G53080
AT1G12860
AthICE2
AT5G57030 AT1G33170
AT5G55580
GSVIVG01032998001
VviICE
AT5G27680
AT5G58370 AT1G20870
AT5G03250 AT4G13650
GSVIVG01016664001
AT3G10840
AT2G20830 AT4G27510
AT1G42540
AT3G57200
AT5G42070
AT1G74890
AT4G40000
AT3G44750
AT4G38970
AT5G09320
AT1G12920 AT5G48760
GSVIVG01010722001
AT1G72320
Fructose-bisphosphate aldolase 2 (FBA2)
AT3G61770
GSVIVG01003549001
GSVIVG01015169001
GSVIVG01008020001
GSVIVG01037780001
A
Ortho ID
B
Shade avoidance genes
480
PHY
623
HB
β
x
●
α
G
Pt
● ●
Predicted ancestral network
α
● ●
● ●
WGD
100
Gma Glyma.01G207300.1.p Gma Glyma.11G035900.1.p
87
Mtr Medtr5g013010.1
99 94
Gma Glyma.05G062700.1.p Gma Glyma.17G144700.1.p
97
Mtr Medtr4g100550.1 100
● ●
● ●
● ●
●
phyE
Pfr PIF TFs
HFR1
ATHB4 genes ATHB2 HAT1 other
Shade avoidance
C
α
Pr
Pfr
ATHB4 genes ATHB* other
p
x
phyB/D
PIF TFs HFR1
Shade avoidance Gma Chr1 53.83-54.04Mb
Gma Chr1 53.86-54.01Mb Gma Chr11 2.64-2.47Mb
Gma Chr5 6.26-5.61Mb
Gma Chr5 5.77-6.18Mb Gma Chr17 11.65-11.94Mb
Stu PGSC0003DMP400045581
Sly Chr8 59.00-59.53Mb
Sly Solyc08g078300.2.1
91
Stu PGSC0003DMP400041776 100
Sly Solyc06g060830.2.1
100
Ath AT5G47370.1 Bst Bostr.14419s0034.1.p
100
Ath AT4G17460.1 (HAT1)
100
100
Sly Chr6 35.34-35.16Mb Ath Chr5 19.17-19.29Mb Ath Chr4 9.76-9.68Mb
Bst Bostr.30275s0201.1 Tha XP_010529979.1
64 100
Ath AT4G16780.1 (ATHB2) Bst Bostr.30275s0139.1.p
100 63
Tha XP_010531658.1 Tha XP_010538408.1
100
Ptr Potri.001G155100.1 Ptr Potri.003G079800.1
Pp
α/β
Low R/FR light
phyE
Pr
M
ρ
Current network
Low R/FR light phy*
S
β
Ptr Chr1 12.66-12.98Mb Ptr Chr3 10.65-10.93Mb