Genetic Contribution of Paleopolyploidy to Adaptive Evolution in Angiosperms

Genetic Contribution of Paleopolyploidy to Adaptive Evolution in Angiosperms

Journal Pre-proof Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms Shengdan Wu, Baocai Han, Yuannian Jiao PII: DOI: Refer...

3MB Sizes 0 Downloads 42 Views

Journal Pre-proof Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms Shengdan Wu, Baocai Han, Yuannian Jiao

PII: DOI: Reference:

S1674-2052(19)30359-4 https://doi.org/10.1016/j.molp.2019.10.012 MOLP 843

To appear in: MOLECULAR PLANT Accepted Date: 23 October 2019

Please cite this article as: Wu S., Han B., and Jiao Y. (2019). Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms. Mol. Plant. doi: https://doi.org/10.1016/j.molp.2019.10.012. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. All studies published in MOLECULAR PLANT are embargoed until 3PM ET of the day they are published as corrected proofs on-line. Studies cannot be publicized as accepted manuscripts or uncorrected proofs. © 2019 The Author

Page 1

Original research

1 2 3

Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms

4 5

Shengdan Wu1,2, Baocai Han1, Yuannian Jiao1,2*

6 7

1

8

Chinese Academy of Sciences, Beijing 100093, China.

9

2

State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany,

University of Chinese Academy of Sciences, Beijing 100049, China.

10 11

*Corresponding author:

12

Email: [email protected]

13 14

Running title: Paleopolyplodization and plant adaptive evolution

15 16

Short summary: Polyploidy has been recognized as a driving force for the success of

17

angiosperms. Through comprehensive investigation of gene retention pattern after

18

independent WGDs, this study provides novel evidence in genetic level supporting

19

ancient WGDs have played an essential role in plant adaptation to dramatic

20

environmental changes.

21 22 23 24 25 26 27

Page 2

28

Abstract

29

Ancient whole-genome duplications (WGD or polyploidy) are prevalent in plants, and

30

some WGDs occurred during the timing of severe global environmental changes. It

31

has been suggested that WGDs may have contributed to plant adaptation. However, it

32

still lacks of empirical evidence from genetic level to support the hypothesis. Here,

33

we investigated the survivors of gene duplicates from multiple ancient WGD events

34

on the major branches of angiosperm phylogeny, and aimed to explore genetic

35

evidence supporting the significance of polyploidy. Duplicated genes co-retained from

36

three waves of independent WGDs (~120 million years ago (Ma), ~66 Ma and <20

37

Ma) were investigated in 25 selected species. Gene families functioning in low

38

temperature and darkness were commonly retained gene duplicates after the eight

39

independently occurred WGDs in many lineages around the Cretaceous–Paleocene

40

(K-Pg) boundary, when the global cooling and darkness were the two main stresses.

41

Moreover, the commonly retained duplicates could be key factors which may have

42

contributed to the robustness of the critical stress related pathways. In addition,

43

genome-wide transcription factors (TFs) functioning in stresses tend to retain

44

duplicates after waves of WGDs, and the co-selected gene duplicates in many lineages

45

may play critical roles during severe environmental stresses. Finally, our results shed

46

new light on the significant contribution of paleopolyploidy to plant adaptation during

47

global environmental changes in the evolutionary history of angiosperms.

48 49

Key words: whole-genome duplication, paleopolyploidy, adaptive evolution,

50

phylogenomic, Cretaceous-Paleocene boundary, gene regulatory network

51 52 53 54 55

Page 3

56

Introduction

57

Angiosperms (or flowering plants) are the most diverse and abundant in the plant

58

kingdom, with about 350,000 known species on Earth. Charles Darwin described the

59

rapid rise and early diversification of angiosperms from the middle to late Cretaceous

60

period as “an abominable mystery” (Friedman, 2009). Currently, angiosperms

61

constitute the dominant vegetation of the Earth’s surface covering regions from

62

tropical to polar terrestrial zones, as well as aquatic habitats. The success is speculated

63

due to, to some extent, prevalent whole-genome duplication (WGD) events in the

64

evolutionary history of angiosperms (Levin, 1983; Soltis et al., 2009; Van de Peer et

65

al., 2009, 2017). WGD has long been recognized as an important evolutionary force

66

for speciation, adaptation, and diversification (Wood et al., 2009; Soltis and Soltis,

67

2016).

68 69

Within recent two decades, tremendous efforts have shown that WGDs are far more

70

prevalent than previously thought in the evolutionary history of flowering plants

71

(Bowers et al., 2003; Blanc and Wolfe, 2004; Cui et al., 2006; Soltis et al., 2009; Jiao

72

et al., 2011; Renny-Byfield and Wendel, 2014; Van de Peer et al., 2017). Two

73

ancestral WGDs were identified before the diversification of extant angiosperms and

74

seed plants, respectively (Jiao et al., 2011). Two major clades in angiosperms,

75

eudicots and monocots, both experienced paleopolyploidization events early in their

76

evolutionary history, named gamma (γ) and tau (τ) (Jaillon et al., 2007; Tang et al.,

77

2010; Jiao et al., 2012; Vekemans et al., 2012; Jiao et al., 2014; Ming et al., 2015). In

78

addition, WGDs also occurred in the common ancestors of many species-rich groups,

79

such as Asteraceae, Brassicaceae, Cucurbitaceae, Fabaceae, and Poaceae (Cannon et

80

al., 2015; Edger et al., 2015; Huang et al., 2016; McKain et al., 2016; Ren et al., 2018;

81

Wang et al., 2018). Especially, WGDs were recurrently occurred in many lineages.

82

For example, it had undergone three rounds of WGDs (gamma-beta-alpha) in

83

evolutionary history of Arabidopsis thaliana after split from monocots (Bowers et al.,

84

2003); and the lineage of Musa (bananas) independently experienced three rounds of

Page 4

85

WGDs after split with Poaceae (D’Hont et al., 2012).

86 87

Furthermore, previous studies found that the timing of WGDs are not randomly

88

distributed across the phylogeny of angiosperms, indicating possible roles of WGDs

89

under environmental selection. A wave of ancient WGDs occurred around the K-Pg

90

boundary independently in many plant lineages, suggesting WGDs potentially helped

91

species survived the extinction event (Fawcett et al., 2009; Vanneste et al., 2014). It

92

also has been proposed that polyploidization was associated with C4 grassland

93

expansion during the late Miocene, as well as with adaptation to recent glaciation

94

maxima (Estep et al., 2014; Novikova et al., 2018). Therefore, WGD has been

95

speculated with association to extinction events and other extreme environmental

96

changes. However, the evidence of genetic level supporting the contribution and

97

significance of WGDs to adaptation remains largely unexplored.

98 99

It is well acknowledged that polyploidy simultaneously duplicates tens of thousands

100

of genes by adding one extra set of genomes, which provides large amount of raw

101

genetic materials for evolution (Adams and Wendel, 2005; Doyle et al., 2008; Hegarty

102

and Hiscock, 2008; Soltis et al., 2015; Van de Peer et al., 2017). During the

103

subsequent fractionation and diploidization processes, a large proportion of genes will

104

quickly return to single copy state (Lynch and Conery, 2000), while the retained ones

105

are considered of particular importance to genetic innovation through the ways of

106

neo-functionalization and sub-functionalization (Ohno, 1970; Force et al., 1999). In

107

addition, the duplicated genes might also result in changes of gene regulatory

108

networks (GRNs) (Conant, 2010; De Smet and Van de Peer, 2012), which could

109

potentially contribute to plant adaptation.

110 111

To explore the significance of WGDs, here we comprehensively tracked the

112

evolutionary history of global gene families in 25 plant genomes, and investigated the

113

genetic modifications after independent WGDs. Firstly, twenty-five sequenced plant

114

genomes representing major lineages of angiosperms were used to reconstruct global

Page 5

115

gene families, and phylogenomic analyses were performed to identify the gene

116

families retained duplicated genes after ancient WGDs. Then, we identified gene

117

families with retained duplicates after independent WGDs from certain period with

118

extreme environmental changes, looking for potential selection signatures from

119

genetic level. Finally, by reconstructing GRNs from RNA-seq data and integrating

120

previously known pathways, we provided evidence of how retained duplications have

121

contributed to reshape the GRNs in response to environmental stresses.

122 123

Results

124

Identifying paralogous genes retained after WGDs

125

To identify genetic contribution of WGDs, we investigated twenty-one well

126

acknowledged polyploidization events in the evolutionary history of angiosperms

127

(Figure 1). We selected 25 sequenced plant genomes (Supplemental Table 1) and

128

constructed putative gene families from their protein coding sequences. In total,

129

66,509 orthogroups were constructed by OrthoMCL (Li et al., 2003). Among them,

130

12,077 orthogroups with four or more genes and including at least one gene from

131

outgroups (Physcomitrella patens, Selaginella moellendorffii, Amborella trichopoda)

132

were used to reconstruct maximum likelihood phylogenetic trees (see Methods). Then,

133

phylogenomic analyses were performed to look for duplication events from each gene

134

family phylogeny as previous described (Jiao et al., 2011).

135 136

To largely ensure the duplications generated from WGDs, tandem duplications were

137

removed firstly based on the chromosomal positions of the duplicated genes, which

138

could screen out some ancient small scale duplications (see Methods). Furthermore,

139

synteny analyses of each species were able to provide synteny support for a large

140

proportion the duplications identified in this study (Figure 1 and Supplemental Figure

141

1). In addition, some duplications were hard to classify due to a lack of branching

142

species between two consecutive WGD events, such as ρ or σ in Poales, α/βM or γM in

143

Musa acuminata. We employed synonymous substitution per synonymous site (KS)

Page 6

144

approach to distinguish the gene survivors from certain duplication events (see

145

Methods). Together, we were able to gather the gene families with surviving

146

duplicates from each WGD event (Figure 1).

147 148

Gene retention pattern after WGDs occurred in three periods

149

WGD could generate a large number of duplicated genes in one event, which provides

150

a tremendous amount of raw genetic material for evolution. If the independent WGDs

151

helped species survive environmental changes, similar functional gene duplicates

152

would be retained in these different species as they might have been selected by

153

common environmental pressures.

154 155

The previously identified and dated twenty-one WGDs were denoted on the species

156

tree of selected land pants, fourteen of which could be classified into three waves

157

based on periods of the occurrence timing (Figure 1 and Supplemental Table 2). The

158

most ancient wave was around ~120 Ma when γ (Jaillon et al., 2007; Jiao et al., 2012;

159

Vekemans et al., 2012) and τ (Jiao et al., 2014; Ming et al., 2015) events occurred in

160

the early evolutionary history of eudicots and monocots respectively. The second one

161

is the well-known wave around the K-Pg boundary, when a large number of WGD

162

events occurred (Paterson et al., 2004; Tuskan et al., 2006; Rensing et al, 2008;

163

Fawcett et al., 2009; Schmutz et al., 2010; D’Hont et al., 2012; The Tomato Genome

164

Consortium, 2012; Singh et al., 2013; Vanneste et al., 2014). The relatively recent

165

wave was within 20 Ma during which four independent WGDs occurred in the

166

evolutionary history of Glycine max, Panicum virgatum, Tarenaya hassleriana, and

167

Zea mays (Blanc and Wolfe, 2004; Schmutz et al., 2010; Cheng et al., 2013; Lu et al.,

168

2013).

169 170

We found certain gene families indeed survived gene duplicates from several

171

independent WGDs in many species (Figure 2A), which were likely the signal of

172

selection from the specific stress environments. 66 gene families commonly retained

173

gene duplicates in three periods (Supplemental Figure 2), which were mainly protein

Page 7

174

kinases, transporters, and protein binding gene families (Supplemental Table 3). 320

175

gene families retained duplicates from the most ancient wave of WGDs (γ and τ),

176

which were enriched for genes functioning in response to water deprivation and salt

177

stress (Figure 2B). These survivors may be, at least in part, selected by the arid

178

climate around 120 Ma of Cretaceous (Heimhofer et al., 2005). The second wave of

179

WGDs were during K-Pg boundary with severe environmental changes, including

180

global cooling, darkness, acid rain and wildfires (Nichols and Johnson, 2008; Schulte

181

et al., 2010). 493 gene families retained gene duplicates from at least 6 independent

182

WGDs during the K-Pg boundary (Figure 2A), which were enriched for many stress

183

related Gene Ontology (GO) terms including cold, heat, osmotic, salt stress, water

184

deprivation, and wounding (Figure 2B), as well as several other biological process

185

associated with stress response (e.g. abscisic acid signaling pathway, cellular response

186

to phosphate starvation, defense response, and response to karrikin) (Supplemental

187

Figure 3). We also investigated the other five lineages without poleopolyploidization

188

events during the K-Pg boundary and found that they retained small-scale

189

duplications in 12 gene families (Supplemental Figure 4). However, these gene

190

families mainly encode enzymes or transporters in plant metabolic processes, which

191

are not directly related to environmental adaptation (Supplemental Table 4). The most

192

recent wave of WGDs were within 20 Ma and retained duplicates from 844 gene

193

families (Figure 2A), of which the functional category enrichments are response to

194

salt stress, cold stress, water deprivation, and wounding (Figure 2B). The recorded

195

environmental changes were low CO2 concentrations and relatively cool temperatures

196

during that period (Zachos et al., 2008). Our enriched GO terms might partially

197

explain the environmental changes, but also suggesting the presence of other

198

environmental selection for different lineages.

199 200

Biased retention of transcription factor gene families after WGDs

201

TFs play a critical role in the transcriptional regulation of genes involved in many

202

biological processes (e.g. growth, development, and stress responses) (de Mendoza et

203

al., 2013). Previous studies demonstrated TFs are the vastly over-retained genes after

Page 8

204

WGD (Maere et al., 2005; Freeling et al., 2009).

205 206

We examined the retention pattern of gene duplicates of TFs after three waves of

207

WGDs based on the retention value (R-value, see Methods). In general, the majority

208

of the TF genes tend to be retained after WGDs (Figure 3), which is consistent with

209

previous analyses (Maere et al., 2005; Freeling et al., 2009). However, we found that

210

not all TF gene families were over-retained, and different families of TF showed

211

certain retention preferences (Figure 3). For example, high-retention gene families,

212

including ARF, C2H2, C3H, CO-like, ERF, G2-like, GRAS, HD-ZIP, HSF, LBD,

213

MYB, NAC, Trihelix, WRKY, bHLH and bZIP gene families, tend to repeatedly

214

retain duplicates after WGDs independent of the evolutionary periods and diverse

215

lineages (Figure 3). Many of the high-retention TF gene families are involved in

216

diverse development processes and response to abiotic and biotic stresses (Khan et al.,

217

2018). However, some TFs were lowly retained after many WGDs, such as FAR1,

218

HB-PHD, HRT-like, LFY, LSD, NF-X1, S1Fa-like, STAT, SAP, and Whirly,

219

suggesting the conservative function and dosage of these TFs (Figure 3). Most of

220

low-retention TF gene families were functioning in conserved biological processes.

221

For instance, LFY controls the switch from vegetative to reproductive development

222

(William et al., 2004) and LSD1 negatively regulates plant cell death pathway

223

(Dietrich et al., 1997).

224 225

Duplicated TF genes, which were co-retained after specific wave of WGDs, are

226

considered as critical genetic contributions for species surviving environmental

227

changes. Co-retained TF genes at ~120 Ma mainly involved in plant growth,

228

development, morphogenesis, and stress response (Supplemental Table 5). For

229

example, the retained duplicate genes and their functional divergence in four

230

orthogroups of MADS-box gene family likely contributed to morphological novelty

231

of floral organs in both core eudicots and monocots (Zhao et al., 2017). Two

232

orthogroups of heat stress transcription factor (HSF) play roles in responding to heat

233

stress. However, co-retained TFs at K-Pg boundary mainly involved in responding to

Page 9

234

various abiotic stresses (Supplemental Table 6). Orthogroups of the C2H2,ERF, and

235

RAV families involved in the response to low temperature. Orthogroups of the

236

HD-ZIP family involved in the shade avoidance syndrome and dehydration stress

237

responses, respectively. Orthogroup of the WRKY family involved in the response to

238

low phosphate stress (Supplemental Table 6).

239 240

Contribution of WGDs on the complexity of gene regulatory networks

241

As WGDs could potentially rewire the gene regulatory networks (Conant, 2010; De

242

Smet and Van de Peer, 2012), we aimed to explore the contribution of WGDs on

243

reshaping networks during adaptation to environmental changes following the K-Pg

244

boundary (Alvarez et al., 1980; Nichols and Johnson, 2008; Schulte et al., 2010).

245 246

Global cooling (or low temperatures) was a major environmental stress during the

247

mass extinction period (Schulte et al., 2010), and the C-repeat/DREB binding factor

248

(CBF)-dependent signaling pathway is the well-known major cold signaling pathway

249

(Chinnusamy et al., 2007; Shi et al., 2015, 2018). Currently, the core components of

250

the CBF-dependent signaling pathway have been deciphered in A. thaliana (Shi et al.,

251

2015). CBF genes, as key components in the pathway, are regulated by upstream ICE

252

and CAMTA TFs (Shi et al., 2015; Zhao et al., 2015), and are able to trigger the

253

expression of many cold-responsive (COR) genes under cold stress (Chinnusamy et

254

al., 2007).

255 256

By tracking the evolutionary history of key gene families in CBF pathway, we found

257

that the CBF, ICE, CAMTA and other related families (SIZ, EIN etc.) presented as

258

duplicated states in many different lineages (Figure 4A). The ICE1-ICE2 were

259

duplicated from β WGD in Arabidopsis (Figures 4B and C). The ice1 loss-of-function

260

mutant is sensitive to cold stress, which lead to significant reduction of survival rate

261

than wild type (Chinnusamy et al., 2003). Overexpression of ICE2 greatly enhanced

262

the cold tolerance in transgenic plants (Fursova et al., 2009). In Oryza sativa, CBF

263

genes were also retained as duplicated copies after the ρ WGD, which also play

Page 10

264

important role in cold stress (Supplemental Figure 5). Therefore, duplicated genes that

265

were retained from WGDs occurred during K-Pg boundary in difference lineages have

266

largely contributed to the copy number (probably dosage at first) and the complexity

267

of the current CBF-dependent signaling network functioning in cold stress tolerance

268

in eudicots and monocots (Figure 4B).

269 270

We also performed comparison of the network of CBF pathway members following

271

the polyploidization events of certain lineages. Co-expression networks have been

272

widely used for identifying functional related genes (Obayashi and Kinoshita, 2010;

273

You et al., 2016; Obayashi et al., 2018). For investigation of network evolution in

274

Arabidopsis lineage, Vitis is an ideal outgroup that experienced non-additional WGD

275

after the γ event. We constructed the cold-specific co-expression networks for A.

276

thaliana and Vitis vinifera using 162 and 60 RNA-seq data, respectively (see

277

Methods). For the duplicated ICE genes from β WGD, we examined the

278

co-expression networks of AthICE1 and AthICE2 in Arabidopsis and their

279

orthologous VviICE in Vitis (Figure 5). Most of the co-expressed genes in VviICE

280

module have orthologous genes clustered in AthICE1 and AthICE2 module, and the

281

corresponding orthologous in Arabidopsis could be divided into three sets: one set

282

specifically co-expressed with AthICE1, one set specifically co-expressed with

283

AthICE2, and one set co-expressed with both AthICE1 and AthICE2 (Figure 5),

284

indicating sub-functionalization of the duplicated ICE genes after WGD. Moreover,

285

the module of AthICE1 and AthICE2 in Arabidopsis is twice larger than the module of

286

VviICE by recruiting additional genes into the network after β and α WGDs, which

287

might potentially increase the cold stress tolerance.

288 289

Darkness (or low light) was another major environmental stress encountered by

290

species at mass extinction period, due to the atmospheric dust reflecting sunlight for a

291

long time (Schulte et al., 2010). We investigated the key components in the shade

292

avoidance pathway in plants (Jiao et al., 2007; Ruberti et al., 2012), and also found

293

several key genes were duplicated from the WGDs in multiple lineages (Figure 6A).

Page 11

294

In A. thaliana, the ATHB2 and HAT1 in the HD-ZIP II gene family, which function in

295

shade avoidance response, were derived from the β WGD (Figures 6B and C).

296

Molecular genetic analyses revealed that ATHB2 is rapidly induced by low R:FR light

297

in Arabidopsis, and the athb2 loss-of-function mutant display significant reduction of

298

hypocotyl elongation and shade avoidance ability comparing to wild type (Carabelli et

299

al., 2013). By using Arabidopsis as an example, a putative model for network

300

evolution from pre-WGD to post-WGD was illustrated in Figure 6B. Despite that the

301

predicted ancestral network is somewhat uncertain, our results showed clear evidence

302

for the expansion of shade avoidance pathway after WGDs, which may enhance the

303

perception of light signals and better adapt to low light environment.

304 305

To test the possible link between WGDs and plant adaptation, we compared the

306

specific retention pattern of regulatory genes in response to the cold and dark stresses

307

after the three waves of WGDs. The regulatory genes of cold stress pathway have a

308

higher chance to be retained after the recent two waves of WGDs (~66 Ma and <20

309

Ma), when the global cooling has been recorded during these two periods (Nichols

310

and Johnson, 2008; Zachos et al., 2008; Schulte et al., 2010) (Supplemental Figure 6).

311

The global darkness was only reported during the K-Pg boundary (Nichols and

312

Johnson, 2008; Schulte et al., 2010). Genes in shade avoidance pathway have a

313

particularly higher retention after the WGDs around the K-Pg boundary than the

314

retention after the other two waves of WGDs (Supplemental Figure 6). In addition, we

315

further investigated another stress pathway (Na+ tolerance), while high Na+ was not

316

the main global stress during the K-Pg boundary. The Salt-Overly-Sensitive (SOS)

317

signaling pathway has functions in maintaining ion homeostasis under high Na+

318

tolerance (Ji et al., 2013; Supplemental Figure 7A). Duplicates of the core members of

319

SOS pathway, such as SOS3, ScaBP8, SOS2, and SOS1, were only biasedly retained

320

after the examined WGDs (Supplemental Figures 6 and 7B). Therefore, the

321

preferential retention of the key members in stress related networks after multiple

322

independent WGDs may serve as critical evidence supporting the contribution of

323

WGDs to the adaptation of species during the global environmental changes.

Page 12

324 325

Discussion

326

The nature of periodically occurrence of ancient WGDs in angiosperms

327

To bridge the gap of the genetic contribution of paleopolyploidizations with adaptive

328

evolution in general, we need to explore empirical adaptive genetic signatures of

329

many WGDs during the evolutionary history of angiosperms. Polyploids are very

330

common in nature. However, the nascent polyploid individuals tend to encounter

331

internal and external obstacles, including increased rates of chromosome segregation

332

errors, small effective population size, competition with progenitor diploid species

333

and so on (Comai, 2005; Arrigo and Barker, 2012). Several studies suggested that

334

polyploid is usually an evolutionary dead end (Stebbins, 1950; Mayrose et al., 2011).

335

Recently formed polyploid plants have to find certain ecological niches that is

336

different from corresponding diploid species to survive (Stebbins, 1950; Levin, 1983;

337

Ramsey, 2011; te Beest et al., 2012; Visger et al., 2016). Polyploid plants could move

338

to a new, but stressful environment with no competition with their ancestral diploids,

339

or survive after a strong environmental selection that swapped the diploid ancestors

340

for polyploids (Otto and Whitton, 2000; Brochmann et al., 2004; Ramsey, 2011; te

341

Beest et al., 2012; Chao et al., 2013; Parisod and Broennimann, 2016). Therefore, the

342

paleopolyploidization events in angiosperms clustered and co-occurred with past

343

global environmental changes, which might have played significant roles in the

344

establishment of polyploids (Van de Peer et al., 2009, 2017).

345 346

Challenges to infer the evolutionary significance of WGDs

347

Due to the recurrent occurrences and subsequent large-scale gene losses, the

348

remaining signals of genetic contribution from ancient WGDs became complicated

349

and ambiguous (Doyle et al., 2008; Schnable et al., 2011; Wendel et al., 2016). In

350

addition, the environmental selection pressure usually not last as long as tens of

351

million years. The increased novel genetic contribution of ancient WGDs might have

352

lost after the environmental conditions changed. Moreover, hybridization and

Page 13

353

recombination could also remove the critical genetic information that helped species

354

surviving severe environmental changes during a particular period. Therefore, it poses

355

challenges to infer the significance of WGDs in the evolutionary history of

356

angiosperms. Here, we have to include many high-quality completely sequenced

357

genomes sharing one ancient WGD in our analysis to avoid missing critical genes due

358

to incomplete and/or improper genome assembly and annotation. More importantly,

359

although these critical genes might be lost in some species, we still be able to piece

360

together a broad picture from simultaneous consideration of many species. Finally, we

361

investigated several independent WGDs occurred at same period to look for shared

362

duplicated genes. Therefore, we were able to identify critical genetic signal for

363

species surviving dramatic environmental changes, and propose the likely

364

evolutionary consequences of WGDs for plant adaptation.

365 366

Genetic evidence shed new light on the contribution of WGD to adaptation

367

In addition to illustrate genetic impact of individual WGDs, we also need to consider

368

certain waves of ancient WGDs independently occurred in different lineages. The

369

severe environmental changes should have pose the similar selection for all species on

370

Earth. Previous studies have demonstrated the biased retention for genes related to

371

regulatory and development (Maere et al., 2005; Freeling, 2009). After comprehensive

372

investigation of gene families, we found certain functional genes were duplicated after

373

independent WGDs occurred during the same period, which provides likely evidence

374

supporting the global environmental selection on the paleopolyploids from different

375

lineages. For example, in response to low temperature and low light environmental

376

changes during K-Pg boundary, the second wave of ancient WGDs have contributed

377

to reshape the CBF-dependent signaling (Figures 4 and 5) and shade avoidance

378

pathways (Figure 6). The duplicates in ICE and CBF gene families are recruited in the

379

pathway which indeed enhanced the cold tolerance in plants (Shi et al., 2015).

380 381

Methods

Page 14

382

Genome data

383

We selected twenty-five sequenced plant genomes representing major lineages of

384

angiosperms and having clear WGD records during their evolutionary history. The

385

studied species included ten eudicots (A. thaliana, Boechera stricta, Eucalyptus

386

grandis, G. max, Medicago truncatula, Populus trichocarpa, Solanum lycopersicum,

387

Solanum tuberosum, Tarenaya hassleriana, and V. vinifera), twelve monocots

388

(Brachypodium distachyon, O. sativa, P. virgatum, Sorghum bicolor, Setaria italic,

389

Spirodela polyrhiza, Setaria viridis, Z. mays, Aegilops tauschii, Elaeis guineensis,

390

Hordeum vulgare, and M. acuminata), one living representative of a lineage that

391

represents the extant earliest diverging lineage of flowering plants also named basal

392

angiosperm (A. trichopoda), one lycophyte (S. moellendorffii), and one moss (P.

393

patens). Genome data of A. tauschii, E. guineensis, and M. acuminata were

394

downloaded from their project websites (Supplemental Table 1), and the other

395

genome data were mainly downloaded from Phytozome (version 11) (Goodstein et al.,

396

2012).

397 398

Gene family classification and phylogenetic analysis

399

We classified protein-coding genes into putative gene families or subfamilies by using

400

the OrthoMCL method (version 2.0.9) (Li et al., 2003) with an inflation parameter of

401

1.5, and obtained 66,509 orthogroups in total. The orthogroups with less than four

402

genes and/or without at least one gene from outgroups were filtered out, and the

403

remaining 12,077 orthogroups were processed to phylogenetic analysis. The

404

taxonomic distribution of the 12,077 orthogroups by considering the last common

405

ancestor of genes in each orthogroup was shown in Supplemental Figure 8. For the

406

construction of global gene family trees, the amino acid sequences of each orthogroup

407

were aligned with MAFFT (Katoh et al., 2005). Then the corresponding nucleic acid

408

sequences were forced onto amino acid alignments using PAL2NAL (Suyama et al.,

409

2006). To remove poorly aligned regions, the nucleic acid alignments were refined

410

using trimAl 1.4 (Capella-Gutirrez et al., 2009) with the option “automated1”.

411

Phylogenetic trees were conducted using maximum likelihood method in RAxML

Page 15

412

8.2.11 (Stamatakis, 2014) with the fast bootstrap option, 100 replicates under

413

GTRGAMMA model.

414 415

Identifying gene duplication events

416

In order to accurately identify gene duplications, we followed the same standard for

417

gene trees and species tree reconciliation as proposed by Jiao et al. (2011). That is,

418

two child branches need to have genes from at least one common species, and the

419

parental node and one of the child nodes should both have bootstrap values equal or

420

greater than 50%. Since the phylogenetic relationships of 25 species sampled in this

421

study were clear (Angiosperm Phylogeny Website), we directly adopted the currently

422

accepted topology as species tree. Genes from outgroups (P. patens, S. moellendorffii,

423

and A. trichopoda) were used to root trees.

424 425

Firstly, we used Notung 2.9 (Stolzer et al., 2012), a gene tree-species tree

426

reconciliation program, to batch reconciling all the nodes of gene trees with

427

corresponding nodes in species tree. Parsimony-based optimization criterion was

428

employed in Notung to minimize the duplication/loss cost. We ran the analysis based

429

on the duplication-loss (DL) events model. In addition to “--reconcile” mode, the

430

“--rearrange” mode was also performed with parameters setting as “--threshold 50%”.

431

This option could rearrange weakly supported edges (such as BS < 50%) and reduce

432

the duplication uncertainty of inference (Notung 2.9 manual). Secondly, after

433

carefully checking thousands of reconciled trees resulted from Notung, we further

434

applied the standard that “two child branches need have genes from at least one

435

common species”, and removed some low confidence duplications from all the

436

reconciled trees.

437 438

Eliminating tandem duplications

439

We defined two genes located within five genes as tandem duplicates. If a duplication

440

node contains two genes (gene1, gene2), or two child branches ((gene1, gene2),

441

(gene3, gene4)), either of the two genes located proximal to each other were treated as

Page 16

442

from tandem duplications. Based on the above criteria, we removed tandem

443

duplication events for all the reconciled gene trees.

444 445

KS calculation of duplication node and circumscription of individual WGD

446

We used KS value to date two WGDs occurred on the same branch which could not be

447

circumscribed by using phylogenetic method. KS estimates for pairwise comparisons

448

(one gene in m branch and the other gene in n branch) were obtained using the

449

Nei-Gojobori method (Nei and Gojobori, 1986) implemented in yn00 program of the

450

PAML package (Yang, 1997). The sum of KS values for all pairwise comparisons were

451

then divided by the number of KS estimates (m*n). Thus, we got a weighted KS value

452

for a duplication event. According to the KS ranges of ρ and σ events indicated by a

453

sequential KS curve of syntenic gene pairs of O. sativa as obtained by using Plant

454

Genome Duplication Database (PGDD) online tools, we roughly defined that

455

duplication node with KS ≥ 1.0 belong to σ event, and KS < 1.0 belong to ρ event

456

(Supplemental Figure 9). KS ≥ 0.7 belonging to γM event and KS < 0.7 belonging to

457

α/βM event were defined based on the circumscription of previous study (D’Hont et al.,

458

2012).

459 460

Syntenic conservation analysis of retained paralogous genes

461

The above procedures allow us to obtain the paralogous genes potential related to

462

each WGDs. To further validate if the duplicate genes are still located on syntenic

463

blocks, we performed collinearity analysis of WGD-derived species. The

464

intra-genome syntenic blocks were detected by using MCScanX based on the default

465

parameters (Wang et al., 2012). We then scored the percentage of paralogous genes

466

with syntenic evidence out of the total genes through phylogenomic timing.

467 468

GO annotation of gene orthogroups and functional enrichment analysis

469

To annotate orthogroups, we used full GO term of A. thaliana genes as the annotation

470

of orthogroups if they have Arabidopsis genes. Otherwise, we searched InterPro

471

domain of protein sequences using InterProScan (Zdobnov and Apweiler, 2001) and

Page 17

472

got the full GO term annotation. Statistical enrichment of GO terms were evaluated by

473

comparing the sample (common retained orthogroups) with background (all annotated

474

orthogroups) based on Fisher’s exact test and adjusted P values according to

475

Benjamini and Hochbery (false-discovery rate) method (Ashburner et al., 2000).

476 477

Retention analysis of transcription factors

478

We used the gene families of transcription factors of A. thaliana downloaded from

479

PlantTFDB 4.0 (Jin et al., 2017) to annotate orthogroups. For each WGD event, the

480

retained orthogroups of each TF family were identified. Owning to the sequence

481

divergence after duplications, some TF families are usually classified into multiple

482

orthogroups. To eliminate the influence of the size of orthogroups in one TF gene

483

family, we calculated a normalized value (retention value, R-value) to reflect the

484

retention pattern of each TF after corresponding WGD. R-value is calculated from the

485

formula as following:

486

 =

487

where

488

TF family after corresponding WGD and

489

the probability of retention of all TF families after corresponding WGD.

                                 



 !" ! #ℎ!% !&' ()#ℎ #*#)!* )* '&+)")+ ,,!#./ * !" ! #ℎ!% !&' )* '&+)")+ ,-

        

,

is the probability of retention of specific

                   

is

490 491

Co-expression network construction and comparison

492

A total of 222 cold-related RNA-seq samples (162 for A. thaliana in Supplemental

493

Table 7 and 60 for V. vinifera in Supplemental Table 8) were downloaded from the

494

NCBI’s Sequence Read Archive database (https://www.ncbi.nlm.nih.gov/sra/). These

495

data sets were then proceeded with quality control, clean reads mapping, FPKM

496

calculation same as in You et al. (2016). After removing none expression genes of

497

each sample (FPKM 0.14 and FPKM 0.36 were selected as cutoffs for A. thaliana and

498

V. vinifera, respectively; detailed method followed You et al. (2016)), the remaining

499

expressed genes were used to calculate co-expression relationships by using Pearson’s

500

correlation coefficient (PCC). Subsequently, Mutual rank (MR; calculated as the

Page 18

501

geometric mean of the PCC rank from gene A to gene B and the rank of gene B to

502

gene A), was used to construct co-expression network (Obayashi and Kinoshita, 2010;

503

You et al., 2016). MR was demonstrated to be more effective to get credible

504

co-expression gene pairs than PCC (Obayashi and Kinoshita, 2010). Therefore, we

505

constructed MR-based co-expression networks for each species. Then, we selected the

506

top 300 co-expressed genes (a threshold used by You et al. (2016) and Obayashi et al.

507

(2018)) in Arabidopsis and in Vitis for network comparison. In such case, the

508

co-expression networks of AthICE1 and AthICE2 (formed by β WGD) in Arabidopsis

509

and their orthologous VviICE in Vitis were used to assess the evolutionary pattern of

510

these key genes in the CBF signaling pathway.

511 512

Supplemental information

513

Supplemental information includes Supplemental Figures 1-9 and Tables 1-8.

514 515

Author contributions

516

Y.J. and S.W. designed the study; S.W., B.H., and Y.J. performed the data analyses

517

and wrote the paper.

518 519

Acknowledgements

520

This research was supported by the Strategic Priority Research Program of the

521

Chinese Academy of Sciences (XDB31000000). We also thank the start-up funding

522

from State Key Laboratory of Systematic and Evolutionary Botany, Institute of

523

Botany, the Chinese Academy of Sciences. No conflict of interest declared.

524 525

Competing interests

526

The authors declare no competing interests.

527

Page 19

528

References

529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569

Adams, K.L., and Wendel, J.F. (2005). Polyploidy and genome evolution in plants. Curr. Opin. Plant Biol. 8:135–141. Alvarez, L.W., Alvarez, W., Asaro, F., and Michel, H.V. (1980). Extraterrestrial cause for Cretaceous-Tertiary extinct. Science 208:1095–1108. Arrigo, N., and Barker, M.S. (2012). Rarely successful polyploids and their legacy in plant genomes. Curr. Opin. Plant Biol. 15:140–146. Ashburner, M., Ball, C., Ja, Botstein, D., Butler, H., Cherry, J., Davis, A., Dolinski, K., Dwight, S., Eppig, J., and Harris, M. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 25:25–29. Blanc, G., and Wolfe, K.H. (2004). Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:1667– 1678. Bowers, J.E., Chapman, B.A., Rong, J.K., and Paterson, A.H. (2003). Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422:433–438. Brochmann, C., Brysting, A.K., Alsos, I.G., Borgen, L., Grundt, H.H., Scheen, A.C., and Elven, R. (2004). Polyploidy in arctic plants. Biol. J. Linn. Soc. Lond 82:521–536. Cannon, S.B., McKain, M.R., Harkess, A., Nelson, M.N., Dash, S., Deyholos, M.K., Peng, Y., Joyce, B., Stewart, C.N., Stewart Jr., Rolf, M., et al. (2015). Multiple polyploidy events in the early radiation of nodulating and nonnodulating legumes. Mol. Biol. Evol. 32:193–210. Capella-Gutirrez, S., Silla-Martínez, J.M., and Gabaldn, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. Carabelli, M., Turchi, L., Ruzza, V., Morelli, G., and Ruberti, I. (2013). Homeodomain-Leucine zipper II family of transcription factors to the limelight. Plant Signal Behav. 8:e25447. Chao, D.Y., Dilkes, B., Luo, H., Douglas, A., Yakubova, E., Lahner, B., and Salt, D.E. (2013). Polyploids exhibit higher potassium uptake and salinity tolerance in Arabidopsis. Science 341:658–659. Cheng, S., van den Bergh, E., Zeng, P., Zhong, X., Xu, J., Liu, X., Hofberger, J., de Bruijn, S., Bhide, A.S., Kuelahoglu, C., et al. (2013). The Tarenaya hassleriana genome provides insight into reproductive trait and genome evolution of Crucifers. Plant Cell 25:2813–2830. Chinnusamy, V., Ohta, M., Kanrar, S., Lee, B.H., Hong, X., Agarwal, M., and Zhu, J.K. (2003). ICE1: a regulator of cold-induced transcriptome and freezing tolerance in Arabidopsis. Genes Dev. 17:1043–1054. Chinnusamy, V., Zhu, J., and Zhu, J.K. (2007). Cold stress regulation of gene expression in plants. Trends Plant Sci. 12:444–451.

Page 20

570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613

Comai, L. (2005). The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6:836–846. Conant, G.C. (2010). Rapid reorganization of the transcriptional regulatory network after genome duplication in yeast. Proc. R. Soc. B. 277:869–876. Cui, L., Wall, P.K., Leebens-Mack, J.H., Lindsay, B.G., Soltis, D.E., Doyle, J.J., Soltis, P.S., Carlson, J.E., Arumuganathan, K., Barakat, A., et al. (2006). Widespread genome duplications throughout the history of flowering plants. Genome Res. 16:738–749. D'Hont, A., Denoeud, F., Aury, J.M., Baurens, F.C., Carreel, F., Garsmeur, O., Noel, B., Bocs, S., Droc, G., Rouard, M., et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488:213–217. de Mendoza, A., Sebé-Pedrós, A., Šestak, M.S., Matejčić, M., Torruella, G., Domazet-Lošo, T., and Ruiz-Trillo, I. (2013). Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc. Natl. Acad. Sci. USA 110:E4858–E4866. De Smet, R., and Van de Peer, Y. (2012). Redundancy and rewiring of genetic networks following genome-wide duplication events. Curr. Opin. Plant Biol. 15:168–176. Dietrich, R.A., Richberg, M.H., Schmidt, R., Dean, C., and Dangl, J.L. (1997). A novel zinc finger protein is encoded by the Arabidopsis LSD1 gene and functions as a negative regulator of plant cell death. Cell 88:685–694. Doyle, J.J., Flagel, L.E., Paterson, A.H., Rapp, R.A., Soltis, D.E., Soltis, P.S., and Wendel, J.F. (2008). Evolutionary genetics of genome merger and doubling in plants. Annu. Rev. Genet. 42:443–461. Edger, P.P., Heidel-Fischer, H.M., Bekaert, M., Rota, J., Glöckner, G., Platts, A.E., Heckel, D.G., Der, J.P., and Wafula, E.K. (2015). The butterfly plant arms-race escalated by gene and genome duplications. Proc. Natl. Acad. Sci. USA 112:8362–8366. Estep, M.C., McKain, M.R., Vela Diaz, D., Zhong, J., Hodge, J.G., Hodkinson, T.R., Layton, D.J., Malcomber, S.T., Pasquet, R., and Kellogg, E.A. (2014). Allopolyploidy, diversification, and the Miocene grassland expansion. Proc. Natl. Acad. Sci. USA 111:15149–15154. Fawcett, J.A., Maere, S., and Van de Peer, Y. (2009). Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106:5737–5742. Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., and Postlethwait, J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–1545. Freeling, M. (2009). Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60:433–453. Friedman, W.E. (2009). The meaning of Darwin's 'abominable mystery'. Am. J. Bot. 96:5–21.

Page 21

614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657

Fursova, O.V., Pogorelko, G.V., and Tarasov, V.A. (2009). Identification of ICE2, a gene involved in cold acclimation which determines freezing tolerance in Arabidopsis thaliana. Gene 429:98–103. Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., et al. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40:D1178–D1186. Hegarty, M.J., and Hiscock, S.J. (2008). Genomic clues to the evolutionary success of polyploid plants. Curr. Biol. 18:R435–R444. Heimhofer, U., Hochuli, P.A., Burla, S., Dinis, J.M.L., and Weissert, H. (2005). Timing of Early Cretaceous angiosperm diversification and possible links to major paleoenvironmental change. Geology 33:141. Huang, C.H., Zhang, C., Liu, M., Hu, Y., Gao, T., Qi, J., and Ma, H. (2016). Multiple polyploidization events across Asteraceae with two nested events in the early history revealed by nuclear phylogenomics. Mol. Biol. Evol. 33:2820–2835. Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al. (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449:463–467. Ji, H., Pardo, J.M., Batelli, G., Van Oosten, M.J., Bressan, R.A., and Li, X. (2013). The Salt Overly Sensitive (SOS) pathway: established and emerging roles. Mol. Plant 6:275–286. Jiao, Y., Lau, O.S., and Deng, X.W. (2007). Light-regulated transcriptional networks in higher plants. Nat. Rev. Genet. 8:217–230. Jiao, Y., Leebens-Mack, J., Ayyampalayam, S., Bowers, J.E., McKain, M.R., McNeal, J., Rolf, M., Ruzicka, D.R., Wafula, E., Wickett, N.J., et al. (2012). A genome triplication associated with early diversification of the core eudicots. Genome Biol. 13:R3. Jiao, Y., Li, J., Tang, H., and Paterson, A.H. (2014). Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell 26:2792–2802. Jiao, Y., Wickett, N.J., Ayyampalayam, S., Chanderbali, A.S., Landherr, L., Ralph, P.E., Tomsho, L.P., Hu, Y., Liang, H., Soltis, P.S., et al. (2011). Ancestral polyploidy in seed plants and angiosperms. Nature 473:97–100. Jin, J., Tian, F., Yang, D.C., Meng, Y.Q., Kong, L., Luo, J., and Gao, G. (2017). PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 45:D1040–D1045. Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005). MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518. Khan, S.A., Li, M.Z., Wang, S.M., and Yin, H.J. (2018). Revisiting the role of plant transcription factors in the battle against abiotic stress. Int. J. Mol. Sci. 19:1634.

Page 22

658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700

Levin, D.A. (1983). Polyploidy and novelty in flowering plants. Am. Nat. 122: 1–25. Li, L., Stoeckert, C.J., and Roos, D.S. (2003). OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178–2189. Lu, F., Lipka, A.E., Glaubitz, J., Elshire, R., Cherney, J.H., Casler, M.D., Buckler, E.S., and Costich, D.E. (2013). Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genet. 9:e1003215. Lynch, M., and Conery, J.S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155. Maere, S., De Bodt, S., Raes, J., Casneuf, T., Van Montagu, M., Kuiper, M., and Van de Peer, Y. (2005). Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. USA 102:5454–5459. Mayrose, I., Zhan, S.H., Rothfels, C.J., Magnuson-Ford, K., Barker, M.S., Rieseberg, L.H., and Otto, S.P. (2011). Recently formed polyploid plants diversify at lower rates. Science 333:1257. McKain, M.R., Tang, H., McNeal, J.R., Ayyampalayam, S., Davis, J.I., depamphilis, C.W., Givnish, T.J., Pires, J.C., Stevenson, D.W., and Leebens-Mack, J.H. (2016). A phylogenomic assessment of ancient polyploidy and genome evolution across the Poales. Genome Biol. Evol. 8:1150–1164. Ming, R., VanBuren, R., Wai, C.M., Tang, H., Schatz, M.C., Bowers, J.E., Lyons, E., Wang, M.L., Chen, J., Biggers, E., et al. (2015). The pineapple genome and the evolution of CAM photosynthesis. Nat. Genet. 47:1435–1442. Nei, M., and Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418. Nichols, D.J., and Johnson, K.R. (2008). Plants and the K-T boundary (England: Cambridge University Press). Novikova, P.Y., Hohmann, N., and Van de Peer, Y. (2018). Polyploid Arabidopsis species originated around recent glaciation maxima. Curr. Opin. Plant Biol. 42:8–15. Obayashi, T., Aoki, Y., Tadaka, S., Kagaya, Y., and Kinoshita, K. (2018). ATTED-II in 2018: A plant coexpression database based on investigation of the statistical property of the mutual rank index. Plant Cell Physiol. 59:e3(1– 7). Obayashi, T., and Kinoshita, K. (2010). Coexpression landscape in ATTED-II: usage of gene list and gene network for various types of pathways. J. Plant Res. 123:311–319. Ohno, S. (1970). Evolution by gene duplication (Berlin: Springer-Verlag). Otto, S.P., and Whitton, J. (2000). Polyploid Incidence and Evolution. Annu. Rev. Genet. 34:401–437. Parisod, C., and Broennimann, O. (2016). Towards unified hypotheses of the impact of polyploidy on ecological niches. New Phytol. 212:540–542.

Page 23

701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744

Paterson, A.H., Bowers, J.E., Chapman, B.A. (2004). Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. USA 101:9903–9908. Ramsey, J. (2011). Polyploidy and ecological adaptation in wild yarrow. Proc. Natl Acad. Sci. USA 108:7096–7101. Ren, R., Wang, H., Guo, C., Zhang, N., Zeng, L., Chen, Y., Ma, H., and Qi, J. (2018). Widespread whole genome duplications contribute to genome complexity and species diversity in angiosperms. Mol. Plant 11:414–428. Renny-Byfield, S., and Wendel, J.F. (2014). Doubling down on genomes: polyploidy and crop plants. Am. J. Bot. 101:1711–1725. Rensing, S.A., Lang, D., Zimmer, A.D., Terry, A., Salamov, A., Shapiro, H., Nishiyama, T., Perroud, P., Lindquist, E.A., Kamisugi, Y., et al. (2008). The Physcomitrella Genome Reveals Evolutionary Insights into the Conquest of Land by Plants. Science 319:64–69. Ruberti, I., Sessa, G., Ciolfi, A., Possenti, M., Carabelli, M., and Morelli, G. (2012). Plant adaptation to dynamically changing environment: the shade avoidance response. Biotechnol. Adv. 30:1047–1058. Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., et al. (2010). Genome sequence of the palaeopolyploid soybean. Nature 463:178–183. Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA 108:4069–4074. Schulte, P., Alegret, L., Arenillas, I., Arz, J.A., Barton, P.J., Bown, P.R., Bralower, T.J., Christeson, G.L., Claeys, P., Cockell, C.S., et al. (2010). The chicxulub asteroid impact and mass extinction at the Cretaceous-Paleogene boundary. Science 327:1214–1218. Shi, Y., Ding, Y., and Yang, S. (2015). Cold signal transduction and its interplay with phytohormones during cold acclimation. Plant Cell Physiol. 56:7–15. Shi, Y., Ding, Y., and Yang, S. (2018). Molecular regulation of CBF signaling in cold acclimation. Trends Plant Sci. 23:623–637. Singh, R., Ong-Abdullah, M., Low, E.L., Manaf, M.A.A., Rosli, R., Nookiah, R., Ooi, L.C., Ooi, S., Chan, K., Halim, M.A., et al. (2013). Oil palm genome sequence reveals divergence of interfertile species in Old and New Worlds. Nature 500:335–339. Soltis, D.E., Albert, V.A., Leebens-Mack, J., Bell, C.D., Paterson, A.H., Zheng, C., Sankoff, D., Depamphilis, C.W., Wall, P.K., and Soltis, P.S. (2009). Polyploidy and angiosperm diversification. Am. J. Bot. 96:336–348. Soltis, P.S., Marchant, D.B., Van de Peer, Y., and Soltis, D.E. (2015). Polyploidy and genome evolution in plants. Curr. Opin. Genet. Dev. 35:119–125. Soltis, P.S., and Soltis, D.E. (2016). Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 30:159–165. Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.

Page 24

745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787

Stebbins, G.L. (1950). Variation and evolution in plants (New York: Columbia University Press). Stolzer, M., Lai, H., Xu, M., Sathaye, D., Vernot, B., and Durand, D. (2012). Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinformatics 28:i409–i415. Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34:W609. Tang, H., Bowers, J.E., Wang, X., and Paterson, A.H. (2010). Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl. Acad. Sci. USA 107:472–477. te Beest, M., Le Roux, J.J., Richardson, D.M., Brysting, A.K., Suda, J., Kubesova, M., and Pysek, P. (2012). The more the better? The role of polyploidy in facilitating plant invasions. Ann. Bot. 109:19–45. The Tomato Genome Consortium (2012). The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485:635–641. Tuskan, G.A., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., et al. (2006). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–1604. Van de Peer, Y., Maere, S., and Meyer, A. (2009). The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10:725–732. Van de Peer, Y., Mizrachi, E., and Marchal, K. (2017). The evolutionary significance of polyploidy. Nat. Rev. Genet. 18:411–424. Vanneste, K., Baele, G., Maere, S., and Van de Peer, Y. (2014). Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary. Genome Res. 24:1334–1347. Vekemans, D., Proost, S., Vanneste, K., Coenen, H., Viaene, T., Ruelens, P., Maere, S., Van de Peer, Y., and Geuten, K. (2012). Gamma paleohexaploidy in the stem lineage of core eudicots: significance for MADS-box gene and species diversification. Mol. Biol. Evol. 29:3793–3806. Visger, C.J., Germain-Aubrey, C.C., Patel, M., Sessa, E.B., Soltis, P.S., and Soltis, D.E. (2016). Niche divergence between diploid and autotetraploid Tolmiea. Am. J. Bot. 103:1396–1406. Wang, J., Sun, P., Li, Y., Liu, Y., Yang, N., Yu, J., Ma, X., Sun, S., Xia, R., Liu, X., et al. (2018). An Overlooked Paleotetraploidization in Cucurbitaceae. Mol. Biol. Evol. 35:16–26. Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H., Marler, B., Guo, H., et al. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40:e49. Wendel, J.F., Jackson, S.A., Meyers, B.C., and Wing, R.A. (2016). Evolution of plant genome architecture. Genome Biol. 17:37.

Page 25

788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822

William, D.A., Su, Y., Smith, M.R., Lu, M., Baldwin, D.A., and Wagner, D. (2004). Genomic identification of direct target genes of LEAFY. Proc. Natl. Acad. Sci. USA 101:1775–1780. Wood, T.E., Takebayashi, N., Barker, M.S., Mayrose, I., Greenspoon, P.B., and Rieseberg, L.H. (2009). The frequency of polyploid speciation in vascular plants. Proc. Natl. Acad. Sci. USA 106:13875–13879. Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556. You, Q., Zhang, L., Yi, X., Zhang, K., Yao, D., Zhang, X., Wang, Q., Zhao, X., Ling, Y., Xu, W., et al. (2016). Co-expression network analyses identify functional modules associated with development and stress response in Gossypium arboreum. Sci. Rep. 6:38436. Zachos, J.C., Dickens, G.R., and Zeebe, R.E. (2008). An early Cenozoic perspective on greenhouse warming and carbon-cycle dynamics. Nature 451:279–283. Zdobnov, E.M., and Apweiler, R. (2000). InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17:847–848. Zhao, C., Lang, Z., and Zhu, J.K. (2015). Cold responsive gene transcription becomes more complex. Trends Plant Sci. 20:466–468. Zhao, T., Holmer, R., de Bruijn, S., Angenent, G.C., van den Burg, H.A., and Schranz, M.E. (2017). Phylogenomic synteny network analysis of MADS-Box transcription factor genes reveals lineage-specific transpositions, ancient tandem duplications, and deep positional conservation. Plant Cell 29:1278–1292.

823

Figure Legends

824

Figure 1. Inferring number of gene families surviving duplicates after each WGD

825

in the evolutionary history of land plants. Phylogenetic tree showed the topology

826

and divergence times of 25 plants in this study. The evolutionary relationships of the

827

25 species were based on current accepted topology (Angiosperm Phylogeny

828

Website). Divergence time of each node of the species tree was obtained from the

Page 26

829

TIMETREE website (http://timetree.org/). Well acknowledged whole-genome

830

duplication (circle) and triplication (square) events were positioned onto the branches

831

of the phylogeny. Three periods (~120 Ma, ~66 Ma, <20 Ma) with prolific WGDs

832

were recognized, and are denoted in green, orange, and blue, respectively. The

833

number of gene families with duplicates retention following each WGD are shown

834

around the corresponding circle or square. The proportion of the duplications verified

835

by synteny evidence were generated from WGDs, which were indicated in the dashed

836

circles. A sketch map in the upper left shows the major environmental stresses during

837

the Cretaceous-Paleogene extinction period.

838 839

Figure 2. Retention patterns of stress-related gene families in three periods. (A)

840

Venn diagram showing the shared and specific gene families surviving duplicates

841

after multiple WGDs in certain periods. Numbers represent the number of gene

842

families with gene duplications. Numbers in square bracket indicate number of WGDs

843

with sharing gene families surviving duplicates. (B) The significantly enriched GO

844

terms of stress-related biological processes for the shared gene families retained gene

845

duplicates in three periods. The three columns with different colors are corresponding

846

to the WGDs in three periods as in panel A.

847 848

Figure 3. Biased retention patterns of transcription factor gene families after

849

WGDs. TFs (rows) were clustered based on their retention values, and WGDs

850

(columns) were grouped according to their occurring timing. Gene families to the top

851

of the heatmap were the high retention ones after WGDs, while TFs to the bottom of

852

the heatmap were the low retention ones. Color key on the upper left denotes the

853

retention values of the TFs. The number in each cell of the heatmap represents the

854

retention value of each TF after corresponding WGD. The numbers in parentheses

855

after the TF names represent the total number of orthogroups belonging to the TF

856

gene families.

857

Page 27

858

Figure 4. The duplication pattern of key genes in cold responsive pathway after

859

WGDs around K-Pg boundary. (A) Summary of duplicates retention status of

860

known important gene families in CBF-dependent signaling pathway in angiosperms

861

after 8 WGDs at K-Pg boundary. The ICE, CAMTA and CBF are the key transcription

862

factor gene families, and the SIZ, OST, EIN, and FRY are other related gene families

863

involved in CBF-dependent signaling pathway. “x” denotes no retention and solid

864

dots indicate gene retentions. (B) An illustration of expansion and remodeling of

865

CBF-dependent signaling pathway following WGD in A. thaliana. ICE1, ICE2 were

866

duplicated from β WGD. CBF1, CBF2 and CBF3 were generated by tandem

867

duplications. (C) Phylogeny of the ICE gene family showed the duplications in its

868

evolutionary history. Solid circles indicate duplications occurred in different periods.

869

Numbers on branches show the bootstrap supporting values. Syntenic blocks with ICE

870

genes were placed on the right of the phylogenetic tree.

871 872

Figure 5. Comparison of co-expression networks between duplicates of ICE1 and

873

ICE2 in A. thaliana and the orthologous ICE in V. vinifera. ICE1, ICE2 were

874

generated by β WGD. The red sinewave lines link corresponding orthologous pairs as

875

they clustered in the same orthogroup. Green dashed lines between two nodes indicate

876

positive co-expression relationships. Four genes in the Arabidopsis co-expression

877

network, which have been previously demonstrated function in cold treatment

878

response, were highlighted with annotation information.

879 880

Figure 6. The retention pattern of key genes in shade avoidance pathway after

881

WGDs around K-Pg boundary. (A) Summary of retention status of the PHY and

882

HB gene families after 8 independent WGDs around K-Pg boundary. PHY and HB are

883

the two major gene families in shade avoidance pathway. “x” denotes no retention and

884

solid dots indicate gene retentions. (B) An illustration of expansion and remodeling of

885

shade avoidance pathway following WGD by comparing a predicted ancestral

886

network with the current network in A. thaliana. ATHB2 and HAT1 were generated by

887

β WGD. (C) Phylogeny of the HD-ZIP II gene family showed the duplications in its

Page 28

888

evolutionary history. Solid circles indicate duplications occurred in different periods.

889

Numbers on branches show the bootstrap supporting values. Syntenic blocks with

890

HD-ZIP II genes were placed on the right of the phylogenetic tree.

Cretaceous

Jurassic Asteroid impact

Paleogene-Quaternary 60%

dust reflecting sunlight

Cooling

Darkness

Boechera stricta

75%

β

78%

91%

5,806 αPt

84%

2,727 βG

Arabidopsis thaliana

4,223 αT

Tarenaya hassleriana Populus trichocarpa

93%

1,592 αE

Medicago truncatula

8,769 αG

Glycine max Elaeocarpus grandis

1,575 γ

Duplication

Eudicots

37%

2,185 α

868

Vitis vinifera 55%

68%

2,595 αS

Solanum tuberosum Solanum lycopersicum

Triplication

Setaria viridis 56%

8,337 αPv

54%

66%

1,684 ρ

58%

Panicum virgatum Sorghum bicolor Zea mays Hordeum vulgare Aegilops tauschii

818 τ

1,063 ε ζ

Brachypodium distachyon 1,263 γ

34%

81%

51%

Oryza sativa

4,171

Musa acuminata

M

α/β 65%

1,008

p

2,254

Elaeis guineensis

S

Spirodela polyrhiza

α/β 41%

Amborella trichopoda

36%

200

180

160

140

120

100

80

Selaginella moellendorffii

1,185 αPp

60

Physcomitrella patens 40

20

0 Ma

Monocots

943 σ

1,831 αZ

Setaria italica

A

B

αPt

1,220

β

βG

25

Pp

α

189 49 [8] 195 [7] 493 [6] 1,106 [5]

406

Stress biological process 269

response to cold

αS

response to heat −Log10 (Adjusted P value)

119

198 p

response to osmotic stress

ρ

408 M

α/β

γ

1,255

τ

320

498

αG

αPv

1,335

115

612

110 117

407 213

844

108

αZ

10 37

531

6 8 >10

65

1,386 3,248

response to salt stress

αT

4

response to water deprivation response to wounding

0

1 2 Value

3

β

αPt

βG

αS

ρ

α/β

p

αPp

αT

αG

αPv

αZ

1.89 0.81 1.15 0.81 1.32 0.52 0.99 0.18 1.18 0.63 1.57 1.57 1.1 0.98 0.93 1.19 1.05 0.79 1.48 1.57 0.9 0.63 1.26 0.79 3.15 1.57 0.94 1.57 1.18 0.79 0.52 1.57 2.1 0.79 1.45 0.63 1.26 3.15 1.57 1.57 0.35 3.15 0 0 0 0 0 0 0 0 3.15 0 0 0 0 0 0

3.31 0.53 1.1 0.95 2.18 1.38 0.65 0.97 1.55 0.83 2.07 1.66 0.39 0.78 1.46 0.86 2.76 1.03 1.95 2.76 2.37 0.83 3.31 2.07 8.28 0.83 0 0 0 0 0 0 0 2.07 2.55 0 0 0 0 4.14 0.92 4.14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

3.57 1.3 0.6 1.53 0.47 1.49 1.18 0.53 2.23 2.68 1.12 2.68 1.25 0.56 0.53 1.23 0 0 0 0 0 0 3.57 0 0 0 0.89 2.23 0 4.47 0.74 0 0 0 0.69 0 0 0 0 4.47 0.99 0 0 0 4.47 0 4.47 0 0 0 0 0 0 0 0 0 0

0.95 0.96 1.03 0.92 1 1.19 1.1 1.05 1.12 1.07 0.74 1.07 1 1.15 1.05 0.86 1.19 1.19 1.05 0.79 1.02 0.95 0.95 1.19 1.19 0.95 0.95 1.19 0.89 1.19 0.99 1.19 1.19 1.19 0.92 1.19 0.71 1.19 0.89 0.6 0.93 1.19 0.6 0.71 1.19 1.19 1.19 1.19 1.19 0 0 1.19 0.6 0 1.19 1.19 0

1.89 0.76 1.03 0.61 0.87 0.39 0.87 0.97 1.18 1.66 1.77 1.66 1.05 1.4 0.7 0.82 2.37 1.48 0.97 0.79 1.01 1.89 1.89 2.37 2.37 0.24 1.18 1.18 0.89 1.18 1.18 1.18 0.79 1.77 1.09 0.95 0.47 2.37 1.18 1.18 0.53 1.18 0 0 0 2.37 1.18 0 0 0 0 0 0 0 2.37 1.18 0

1.96 0.84 1.42 1.31 1.03 1.64 1.21 0.58 0.61 1.64 0.82 0.65 0.99 0.92 0.77 0.56 2.18 0.82 0.96 1.64 0.94 0.65 1.31 2.45 3.27 0.98 2.29 1.64 1.23 0.82 0.82 3.27 1.09 0.82 1.01 0.65 1.31 1.64 0 0 0 0 1.64 0.65 0 0 1.64 0 0 0 0 0 0 3.27 0 0 0

2.84 0.53 0.79 0.68 1 1.58 1.74 0.28 2.07 2.37 1.18 1.89 1.43 0.3 0.56 0.82 1.58 1.77 0.56 0.79 1.35 0.95 0 0 4.73 0.47 2.37 1.18 0.59 2.37 0.39 2.37 3.15 2.37 1.46 0 0 0 1.18 2.37 0.53 0 0 0 2.37 0 2.37 2.37 0 0 0 0 0 0 0 0 0

1.15 0.88 1.15 0.86 0.9 0.95 1.06 1.01 1.16 1.43 1.07 1.29 1.1 0.94 1.1 0.79 1.43 0.9 1.1 0.95 1.23 1.15 1.15 1.07 0 1 1.15 1.43 0.72 1.43 1.19 1.43 0.95 1.43 0.77 0.86 0.57 1.43 1.07 1.43 0.8 0.72 1.43 0.86 1.43 0 1.43 0 1.43 0 0 0 0.72 1.43 0 0 0

1.38 0.93 0.77 1.12 0.73 1.92 1.15 1.35 1.44 1.84 1.72 1.38 0.32 1.08 0.68 1.19 1.53 0.86 0.95 1.53 0.98 1.61 0.92 0.57 2.3 1.38 1.84 1.15 0.57 1.72 1.72 0 0 0 0 1.38 0.46 0 0 1.15 1.28 2.3 1.15 0.92 2.3 2.3 0 0 0 2.3 0 0 0 0 0 0 0

1.32 0.9 1.1 1.03 1.21 1.65 1.04 1.16 1.65 0.66 0.82 1.98 1.15 0.41 0.58 0.57 1.1 0.41 1.55 1.65 0.47 0.33 0.66 0.82 3.29 0.66 1.32 0.82 1.65 0.82 0.55 1.65 3.29 1.65 0.51 0 1.32 3.29 0 1.65 0.37 3.29 0 1.98 0 0 1.65 1.65 1.65 3.29 0 3.29 0 0 0 1.65 3.29

1.69 0.84 0.96 1.01 1.06 0.84 0.84 1.09 0.74 1.52 1.47 0.84 1.02 1.26 0.79 0.99 1.12 1.05 1.09 0.56 1.2 1.35 0.67 1.26 1.69 1.18 1.35 0.84 1.05 0.84 1.26 1.69 1.69 1.26 1.04 1.01 0.67 1.69 1.69 0 0.37 0 0.84 0.34 1.69 1.69 0.84 1.69 0.84 1.69 1.69 0 0 0 0 0 0

1.07 0.98 0.96 0.95 1.07 1.07 1.07 1.01 0.87 1.07 1.07 0.96 1.07 1.07 0.88 0.85 1.07 1.07 1.07 1.07 1.07 1.07 1.07 1.07 1.07 0.96 0.85 1.07 0.93 1.07 0.98 1.07 1.07 1.07 1.07 1.28 0.64 1.07 1.07 1.07 0.95 1.07 1.07 1.07 1.07 2.14 0.53 0.53 1.07 1.07 1.07 1.07 1.07 1.07 0 1.07 0

1.07 0.84 1.2 0.99 1.33 0.44 0.81 0.94 1.25 1.2 0.33 1.07 1.05 1.21 0.78 0.83 1.33 1.17 1.33 1.11 1.14 0.67 1.33 0.67 1.33 0.93 1.47 1.33 0.83 1 0.89 1.33 1.33 0.33 0.72 1.07 1.07 1.33 1 0.67 0.89 1.33 0.67 1.33 1.33 1.33 0.67 0.67 1.33 1.33 1.33 0 1.33 0 1.33 1.33 1.33

2.54 0.75 1.13 0.85 1.34 2.12 0.78 1.25 1.32 0.85 1.59 0.85 0.59 1.19 0.5 1.02 1.41 0.53 1.25 2.12 0.6 0.85 1.69 1.06 4.23 0 2.12 1.06 1.06 0 0 2.12 1.41 4.23 1.63 1.69 0.85 0 1.06 0 0.47 0 2.12 0.85 2.12 4.23 0 0 0 0 0 0 0 0 4.23 2.12 0

M

ARF (5) bHLH (62) bZIP (30) C2H2 (35) C3H (19) CO−like (6) ERF (38) G2−like (17) GRAS (16) HD−ZIP (10) HSF (8) LBD (10) MYB (43) NAC (32) Trihelix (17) WRKY (29) CAMTA (3) AP2 (8) MYB_related (17) NF−YB (6) SBP (7) TALE (10) Nin−like (5) ARR−B (4) VOZ (1) TCP (10) GATA (10) NF−YA (4) B3 (8) ZF−HD (4) Dof (12) EIL (2) E2F/DP (3) DBB (4) MIKC_MADS (13) HB−other (5) M−type_MADS (5) CPP (2) YABBY (4) GeBP (2) WOX (9) BES1 (2) GRF (2) NF−YC (5) BBR−BPC (2) NF−X1 (1) RAV (2) LSD (2) SRS (2) LFY (1) FAR1 (2) S1Fa−like (1) Whirly (2) SAP (1) STAT (1) HB−PHD (2) HRT−like (1)

low retention

τ

high retention

γ

A

B

Cold pathway genes

Ortho ID

α Pt

β



3343

ICE

610

CAMTA

x

568

CBF

x

962

SIZ

x

110

OST

x

1242

EIN

x

926

FRY

x

● ● ● ● ● ● ●

Predicted ancestral network

G

S

β

α

x

x

● ● ● ● ●

x

● ● ● ● ●

x

WGD

ICE1

CBF*

α



x

● ● ●



x

x

x x

x

x



x

CAMTAs

ICE2

CBF3

CBF2

CBF1 CBF1 β WGD

COR genes

COR genes

Cold/Freezing tolerance

C

● ● ● ● ● ● ●

Pp

p

Cold stress Cold stress Ca 2+, kinases

Ca 2+, kinases

CAMTA*

● ● ● ● ● ● ●

M

α/β

Current network

Cold stress

ICE*

ρ

tandem duplication

Cold/Freezing tolerance Mtr Medtr3g498695.1

100 92

Gma Glyma.04G200500.1.p

Gma Chr4 47.21-47.45Mb Gma Chr6 13.78-13.53Mb

Gma Glyma.06G165000.1.p 100

Stu PGSC0003DMP400050145 Sly Solyc06g068870.2.1

100 100 70

Ptr Chr12 12.93-13.10Mb Ptr Chr15 12.26-12.41Mb

Ptr Potri.012G106000.1 Ptr Potri.015G105200.1 Tha XP 010536585.1

93

Vvi Chr17 0.02-0.38Mb

Vvi GSVIVT01008637001

Vvi Chr14 24.65-25.19Mb

Vvi GSVIVT01032998001 Egr Eucgr.G01938.1.p

80 100

82

Ath Chr1 4.37-4.43Mb

Ath AT1G12860.1 (ICE2) Bst Bostr.13671s0164.1.p

100

98

Tha XP 010528979.1 Tha XP 010554141.1

66 100

Tha Sca11 1.11-1.17Mb Tha Sca4 4.55-4.51Mb

Bst Bostr.0556s0344.1.p Ath AT3G26744.1 (ICE1)

Ath Chr3 9.87-9.71Mb

AT4G37190 AT2G25100 AT4G04670

AthICE1

AT3G48470

AT3G26744

AT5G13000

GSVIVG01012726001 AT1G20910

CAAX protease self-immunity protein

GSVIVG01007540001

Cold shock protein 2 (CSP2)

GSVIVG01007560001

AT3G49650 AT3G19080

AT2G27470

AT5G02120 AT1G18090

AT4G38680

AT5G17270

AT1G18100 AT2G20290

AT1G75690AT1G23880 AT1G63650 AT1G69523 AT3G17970

AT4G24790

AT4G10450

GSVIVG01013439001

AT2G32415

AT4G35290 GSVIVG01013400001

AT4G25630

AT4G11175 AT1G66760

AT5G52950 AT3G56070AT4G29020

AT1G26761

AT4G34730 AT2G40430

AT2G35260

AT5G15430 AT5G37130

GSVIVG01019319001

AT5G25590

AT3G09890

GSVIVG01031124001

AT1G01970

GSVIVG01000118001

AT2G19740 GSVIVG01000109001

AT1G15510 AT5G56710 GSVIVG01038569001

Response regulator 15 (ARR15)

AT3G49890 AT5G53080

AT1G12860

AthICE2

AT5G57030 AT1G33170

AT5G55580

GSVIVG01032998001

VviICE

AT5G27680

AT5G58370 AT1G20870

AT5G03250 AT4G13650

GSVIVG01016664001

AT3G10840

AT2G20830 AT4G27510

AT1G42540

AT3G57200

AT5G42070

AT1G74890

AT4G40000

AT3G44750

AT4G38970

AT5G09320

AT1G12920 AT5G48760

GSVIVG01010722001

AT1G72320

Fructose-bisphosphate aldolase 2 (FBA2)

AT3G61770

GSVIVG01003549001

GSVIVG01015169001

GSVIVG01008020001

GSVIVG01037780001

A

Ortho ID

B

Shade avoidance genes

480

PHY

623

HB

β

x



α

G

Pt

● ●

Predicted ancestral network

α

● ●

● ●

WGD

100

Gma Glyma.01G207300.1.p Gma Glyma.11G035900.1.p

87

Mtr Medtr5g013010.1

99 94

Gma Glyma.05G062700.1.p Gma Glyma.17G144700.1.p

97

Mtr Medtr4g100550.1 100

● ●

● ●

● ●



phyE

Pfr PIF TFs

HFR1

ATHB4 genes ATHB2 HAT1 other

Shade avoidance

C

α

Pr

Pfr

ATHB4 genes ATHB* other

p

x

phyB/D

PIF TFs HFR1

Shade avoidance Gma Chr1 53.83-54.04Mb

Gma Chr1 53.86-54.01Mb Gma Chr11 2.64-2.47Mb

Gma Chr5 6.26-5.61Mb

Gma Chr5 5.77-6.18Mb Gma Chr17 11.65-11.94Mb

Stu PGSC0003DMP400045581

Sly Chr8 59.00-59.53Mb

Sly Solyc08g078300.2.1

91

Stu PGSC0003DMP400041776 100

Sly Solyc06g060830.2.1

100

Ath AT5G47370.1 Bst Bostr.14419s0034.1.p

100

Ath AT4G17460.1 (HAT1)

100

100

Sly Chr6 35.34-35.16Mb Ath Chr5 19.17-19.29Mb Ath Chr4 9.76-9.68Mb

Bst Bostr.30275s0201.1 Tha XP_010529979.1

64 100

Ath AT4G16780.1 (ATHB2) Bst Bostr.30275s0139.1.p

100 63

Tha XP_010531658.1 Tha XP_010538408.1

100

Ptr Potri.001G155100.1 Ptr Potri.003G079800.1

Pp

α/β

Low R/FR light

phyE

Pr

M

ρ

Current network

Low R/FR light phy*

S

β

Ptr Chr1 12.66-12.98Mb Ptr Chr3 10.65-10.93Mb