Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos

Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Mol...

2MB Sizes 0 Downloads 32 Views

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant Research Article

Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos Zhen-Hua Guo1,11, Peng-Fei Ma1,11, Guo-Qian Yang1,11, Jin-Yong Hu2,11, Yun-Long Liu1,11, En-Hua Xia3, Mi-Cai Zhong2,4, Lei Zhao1,4, Gui-Ling Sun5, Yu-Xing Xu1,4, You-Jie Zhao6, Yi-Chi Zhang1, Yu-Xiao Zhang7, Xue-Mei Zhang1, Meng-Yuan Zhou1, Ying Guo1, Cen Guo1,4, Jing-Xia Liu1,4, Xia-Ying Ye1, Yun-Mei Chen1, Yang Yang1, Bin Han8, Choun-Sea Lin9,*, Ying Lu10,* and De-Zhu Li1,* 1

Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China

2

Key Laboratory for Plant Diversity and Biogeography in East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China

3

State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, Anhui 230036, China

4

Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan 650201, China

5

Institute of Plant Stress Biology, State Key Laboratory of Cotton Biology, Department of Biology, Henan University, Kaifeng, Henan 475001, China

6

College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, Yunnan 650224, China

7

Yunnan Academy of Biodiversity, Southwest Forestry University, Kunming, Yunnan 650224, China

8

National Center for Gene Research, Shanghai Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China

9

Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan, ROC

10

College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China

11These

authors contributed equally to this article.

*Correspondence: Choun-Sea Lin ([email protected]), Ying Lu ([email protected]), De-Zhu Li ([email protected]) https://doi.org/10.1016/j.molp.2019.05.009

ABSTRACT Polyploidization is a major driver of speciation and its importance to plant evolution has been well recognized. Bamboos comprise one diploid herbaceous and three polyploid woody lineages, and are members of the only major subfamily in grasses that diversified in forests, with the woody members having a tree-like lignified culm. In this study, we generated four draft genome assemblies of major bamboo lineages with three different ploidy levels (diploid, tetraploid, and hexaploid). We also constructed a high-density genetic linkage map for a hexaploid species of bamboo, and used a linkage-map-based strategy for genome assembly and identification of subgenomes in polyploids. Further phylogenomic analyses using a large dataset of syntenic genes with expected copies based on ploidy levels revealed that woody bamboos originated subsequent to the divergence of the herbaceous bamboo lineage, and experienced complex reticulate evolution through three independent allopolyploid events involving four extinct diploid ancestors. A shared but distinct subgenome was identified in all polyploid forms, and the progenitor of this subgenome could have been critical in ancient polyploidizations and the origin of woody bamboos. Important genetic clues to the unique flowering behavior and woody trait in bamboos were also found. Taken together, our study provides significant insights into ancient reticulate evolution at the subgenome level in the absence of extant donor species, and offers a potential model scenario for broad-scale study of angiosperm origination by allopolyploidization. Key words: Bambusoideae, polyploidization, comparative genomics, subgenome evolution, flowering Guo Z.-H., Ma P.-F., Yang G.-Q., Hu J.-Y., Liu Y.-L., Xia E.-H., Zhong M.-C., Zhao L., Sun G.-L., Xu Y.-X., Zhao Y.-J., Zhang Y.-C., Zhang Y.-X., Zhang X.-M., Zhou M.-Y., Guo Y., Guo C., Liu J.-X., Ye X.-Y., Chen Y.-M., Yang Y., Han B., Lin C.-S., Lu Y., and Li D.-Z. (2019). Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos. Mol. Plant. --, 1–13. Published by the Molecular Plant Shanghai Editorial Office in association with Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and IPPE, SIBS, CAS.

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

1

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

INTRODUCTION Bamboos (Bambusoideae) are among the fastest-growing plants on earth, and these tree-like woody grasses rarely flower with flowering cycles of up to 120 years (Janzen, 1976). The Bambusoideae, along with the Pooideae (e.g., Brachypodium and wheat) and Oryzoideae (e.g., rice), constitute the BOP clade of the grass family (Poaceae), and is the only major grass lineage that diversified in mesic-to-wet forests (Soreng et al., 2017). The nearly 1500 described bamboo species are native to all continents except Antarctica and Europe. They fall into four monophyletic lineages with different ploidy levels according to the number of basic sets of bamboo chromosomes, 12 (Soderstrom, 1981; Hilu, 2004), which was recently reviewed by Zhou et al. (2017a): herbaceous bamboos (2n = 20–24, but see Judziewicz et al., 1999) are diploids, temperate woody bamboos (TWB, 2n = 46–48) and neotropical woody bamboos (2n = 40–48) are tetraploids, while palaeotropical woody bamboos (2n = 70–72) are hexaploids (Chen et al., 2003; Bamboo Phylogeny Group, 2012; Clark et al., 2015) (Figure 1). Woody bamboos have long been thought as being polyploids, while Triplett et al. (2014) proposed six ancestral genome donors for extant woody bamboos based on analysis of three nuclear genes. However, whole-genome sequencing of bamboo is limited to moso bamboo (Phyllostachys edulis, Ped) (2n = 4x = 48) (Peng et al., 2013; Zhao et al., 2018), which represents one of the four lineages, TWB. Although a previous study revealed the tetraploid origin of moso bamboo by an ancient whole-genome duplication (WGD) event, a comprehensive genomic study of the backbone species of Bambusoideae is still lacking, thereby hindering the extensive and intensive understanding of the evolution and functional traits of woody bamboos. Bamboos are economically important as a pillar to forest industries in many Asian and some African and Latin American countries, with an annual economic value in China alone of US $34 billion (http://www.forestry.gov.cn/main/62/index.html). They are also of great ecological significance as the main food of the giant panda and other wildlife, and are culturally important in eastern and southeastern Asia (Clark et al., 2015). Nevertheless, both the origins of the woody bamboos and their relationships to herbaceous species remain uncertain. The woody lineages were shown to be monophyletic based on transcriptome data (Wysocki et al., 2016) but paraphyletic based on chloroplast genomes (Sungkaew et al., 2009; Wysocki et al., 2015). Understanding the evolution of bamboos and their unique biological traits is of great scientific, social, and economic importance. Polyploidy is a major driver of plant evolution (Ferna´ndezMazuecos and Glover, 2017; Van de Peer et al., 2017) and is commonly present in bamboos, which have diverse polyploid levels; however, the exact role polyploidization played in the origin and evolution of bamboos has not been clearly elucidated. In this study, we sequenced four species of bamboos representing major lineages and three ploidy levels (Figure 1). We reconstructed the reticulate evolutionary history of polyploidy in the major bamboo clades and offer a demonstrated case from wild plants that the tree of life can be a ‘‘web of life.’’ Our results also provide important insights 2

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

Reticulate Origin of Woody Bamboos into evolution of the woody trait and flowering behavior in bamboos.

RESULTS AND DISCUSSION Genome Sequencing, Assembly, and Annotation We sequenced the genomes of four species with three different ploidy levels (diploid, tetraploid, and hexaploid) as described above: the two herbaceous species Olyra latifolia (Ola) and Raddia guianensis (Rgu) (2n = 2x = 22), and the two woody species neotropical Guadua angustifolia (Gan, 2n = 4x = 46) and palaeotropical Bonia amplexicaulis (Bam, 2n = 6x = 72). Adopting a whole-genome shotgun sequencing strategy, we generated a total of 520 Gb of clean Illumina reads from libraries with different insert sizes, with estimated genome coverages of around 178-, 229-, 50.8-, and 210-fold for B. amplexicaulis, O. latifolia, G. angustifolia, and R. guianensis, respectively (Supplemental Table 1). Two high-quality assemblies were constructed for the B. amplexicaulis (848 Mb of scaffolds with an N50 length of 1.76 Mb) and O. latifolia (646 Mb of scaffolds with an N50 length of 1.26 Mb) assemblies, and two draft assemblies were constructed for G. angustifolia (1614 Mb of scaffolds with an N50 length around 4.1 kb) and R. guianensis (626 Mb of scaffolds with an N50 length around 12.1 kb) (Supplemental Tables 2 and 3). Assembly sizes are all close to the estimations based on both flow cytometry and 17-mer analyses, which indicates that the four genomes have over 90% coverage (Supplemental Table 4). The distribution of K-mer frequency revealed the highest heterozygous rate (a bimodal curve) in G. angustifolia, which could have caused its lower assembly quality, while the O. latifolia assembly had the highest quality, probably because of its low heterozygous rate (normal curve; Supplemental Figure 1). The completeness of the assemblies, assessed by searches for the conserved core eukaryotic genes derived from the Benchmarking Universal Single-Copy Orthologs (BUSCO) plant database (Sima˜o et al., 2015) (Supplemental Table 5), indicated similar genome coverages ranging from 84.3% to 94.5%. Interestingly, the genome size of B. amplexicaulis is just 1.3-fold that of the diploid O. latifolia and unexpectedly much smaller than the tetraploid bamboo genome. A de novo annotation of repetitive sequences was performed on the four genomes. In most cases, the content of transposable elements (TEs) increased along with genome size (Supplemental Table 6), which reinforces the hypothesis that insertion of TEs is the chief mechanistic driver of genome expansion during evolution (Wicker et al., 2007). Long terminal repeat (LTR) retrotransposons, representing over 50% of all TEs, are the predominant elements in bamboo genomes, especially in G. angustifolia and P. edulis (Peng et al., 2013), which have larger genome sizes (Supplemental Table 6). To create consensus protein-coding gene sets, we conducted a combined strategy of ab initio gene prediction with alignments of homologs and RNAsequencing data on the repeat-masked genomes. Based on the annotations there are 47 056 predicted genes in B. amplexicaulis, 36 578 in O. latifolia, 38 575 in G. angustifolia, and 24 275 in R. guianensis (Supplemental Table 7). These predicted genes were used for genome analyses, along with the published gene models of P. edulis (Peng et al., 2013). Over 90% of proteincoding genes were supported by the presence of conserved

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

Reticulate Origin of Woody Bamboos

Figure 1. Evolutionary History, Geographic Distribution, and Images of the Sequenced Bamboos. Filled circle, triangle, square, and star represent four major clades of the bamboo subfamily Bambusoideae and their corresponding species, Bonia amplexicaulis (Bam, 2n = 6x = 72), Guadua angustifolia (Gan, 2n = 4x = 46), Phyllostachys edulis (Ped, 2n = 4x = 48), Olyra latifolia (Ola, 2n = 2x = 22), and Raddia guianensis (Rgu, 2n = 2x = 22). The blue areas indicate the distribution of bamboos around the world. Height of the bamboo plants is indicated by the scale bars.

functional domains (Supplemental Table 8). In addition, microRNA (miRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), small nucleolar RNA (snoRNA), and small nuclear RNA (snRNA) genes were predicted using de novo and/or homology search methods (Supplemental Table 9). We identified 21 033 gene families among the genomes of the five bamboos, rice, and Brachypodium, using OrthoMCL (Li et al., 2003). Quantitative comparison of single-copy genes and gene families carrying two, three, four, and five members indicated that the genomes of the diploids O. latifolia and R. guianensis had the most single-copy genes, while the genomes of the tetraploids G. angustifolia and P. edulis had more two-member gene families. The hexaploid B. amplexicaulis exhibited larger numbers of both two- and three-member gene families (Supplemental Figure 2 and Supplemental Table 10). The larger number of duplicate genes in some bamboo genomes reflects the occurrence of one or more polyploidization events. 6182 gene families are shared by all five bamboo genomes (Supplemental Figure 3), and woody bamboos have more unique families than do herbaceous bamboos. Estimated variance in family sizes suggests that these woody bamboos carry more expanded gene families, probably because of their polyploid histories (Supplemental Figure 4). Functional annotation indicated that families with domains such as cellulose synthase, xyloglucan fucosyltransferase, and wallassociated receptor kinase are remarkably expanded in woody bamboos (Supplemental Data 1), consistent with greater active cell-wall biosynthesis (Perrin et al., 1999; Cosgrove, 2005; ic  et al., 2018). Verbanc

A Genetic Map for Dendrocalamus latiflorus D. latiflorus is a palaeotropical woody bamboo with the same chromosome number (2n = 6x = 72) as B. amplexicaulis (Chen et al., 2003), and both species belong to the core Bambusinae (Zhou et al., 2017b). To optimize genome assembly, we

genotyped an inbred population (S1) of 190 progenies from D. latiflorus with 3627 ddRAD markers to construct the first high-density genetic map of bamboo (Supplemental Data 2). The markers yielded a genetic map with 36 linkage groups (LGs), corresponding to the 36 chromosome pairs in this species. The map covers a total of 3113 centimorgans (cM) (93.29% of the genome) on chromosomes ranging from 4.91 to 131.69 cM in size (Figure 2 and Supplemental Table 11). Extensive synteny was observed between D. latiflorus and rice; 720 (19.9%) markers on the bamboo map have homologs in the rice genome (Supplemental Data 3), and 681 of them were in highly collinear regions, which were distributed over 95% of the bamboo genome regions covered by the current map. A comparative analysis revealed an apparent 1:3 ratio between rice chromosomes and D. latiflorus LGs (Figure 2), consistent with the number of chromosome pairs in these species (12 and 36, respectively), and supporting a hexaploid origin of D. latiflorus. Two sets of LGs maintained a high level of collinearity whereas the third LG was frequently interrupted by chromosomal inversions, translocations, fusions, and fissions, implicating a possible distinct origin of this subgenome (Figure 2 and Supplemental Data 3). We aligned and oriented the scaffolds of the B. amplexicaulis assembly to the linkage map of D. latiflorus. A total of 698 scaffolds (555–16 773 990 bp) were assigned to 1996 distinct markers, accounting for 641.90 Mb (75.6%) of the assembled B. amplexicaulis sequence (Supplemental Table 12). The orientations of 244 scaffolds covering 481.40 Mb (72.0% of anchored sequence) matched the order of markers on the linkage map (Supplemental Figure 5 and Supplemental Data 4), indicating high congruence between the assembled sequence and genetic data. The discordance in alignment may reflect sequence and structural variations between the two species. Finally we assembled the anchored scaffolds into 36 pseudochromosomes and ordered them on the basis of genetic markers and distance (Supplemental Data 4). Using this Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

3

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

Reticulate Origin of Woody Bamboos

Rice chromosomes ( n=12)

Chr1

Chr3

Chr2

Aa

10 Mbp

Chr4

Ca Ab

Chr5

Chr6

Ea

Fa

Ba

A

Cb

B

Ac

D

C

Bb

D

E

F Eb

Cc

Fb

Ad

D1

D2

D3

D4

D5

D6

D7

D8

D9

D10

D11

D12

D13 D14

Fb

E

E

D A

B

A

C

B

D16

D17

D

F

D

F

C

Lb

Ea

Cb

Ad

D18

Cc

Eb

Bb

Ac

D15

Ba

Fa

Aa

50 cM

Dendrocalamus latiflorus LGs ( n=36 ) D19

D20 D21

D22

D23

D24

Gb

D25

D26

D27

D28

D29

D30

G

G

D32

D33

I

J

I

Hb

H

D35

K

L

Ca

Kc

K

Kb

L

Ka

G

Ga

H

Ha

I

I

Chr7

J

J

La

Kb

K

L

Lb

Kc

Hb

Chr9

Gb

D36

La

J

Ha

D34 Ka

J

I Ab Ga

D31

Ba

Chr10

Chr8

Chr11

Chr12

Figure 2. Genetic Linkage Map of Hexaploid Bamboo and Syntenic Relationship with Rice. The genetic linkage map was constructed using restriction-site-associated DNA markers in Dendrocalamus latiflorus, a species closely related to sequenced Bonia amplexicaulis. A total of 36 generated linkage groups (LGs) corresponded to the expected number of 36 haploid chromosome pairs. The black dashes on the left represent the locations of the markers on each linkage group while the red dashes on the right represent markers with synteny to the rice chromosome. Twelve rice chromosomes are numbered Chr1 to Chr12 and 36 bamboo LGs are numbered (D1–D36) according to synteny with rice. Each rice chromosome was labeled as a genomic block (A–L), and segments from a genomic block are labeled with lowercase letters (a, b, c, d) (e.g., ‘‘Aa’’ means the ‘‘a’’ segment from genomic block ‘‘A’’). Rearrangement events for bamboo LGs are indicated by labeled rice genomic segments, and the black arrows indicate inversions. The color coding of LGs represents different subgenomes, A (green), B (blue), and C (red) (for detailed information on subgenomes, see Figure 4).

linkage-map-based strategy, we optimized and confirmed the assembly quality of the hexaploid B. amplexicaulis genome, which enabled further analyses of polyploid bamboos at the subgenome level.

Polyploidization and Reticulate Evolution of Bamboos Genome-wide gene synteny analyses between rice and bamboo showed a ratio of 1:1 in O. latifolia and R. guianensis, 1:2 4

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

in G. angustifolia and P. edulis (Peng et al., 2013), and 1:3 in B. amplexicaulis (Figure 3 and Supplemental Figure 6) as well as a 1:3 ratio between O. latifolia and B. amplexicaulis (Supplemental Figure 7), which confirmed the diploidy of herbaceous bamboos and the polyploidy of woody bamboos, i.e., tetraploidy in G. angustifolia and P. edulis, and hexaploidy in B. amplexicaulis. Further analysis of distributions of sequence divergence (4DTv) values for syntenic duplicate genes revealed one peak each for G. angustifolia and P. edulis (Supplemental

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

Reticulate Origin of Woody Bamboos Chr1

Chr2

Chr3 Chr4

Chr5 Chr6 Chr7

Chr8

Chr9 Chr10 Chr11 Chr12

0

500

1000

1500

2000

2500

3000

3500

4000

4500

rice

Bam

Ola

5000

Figure 3. Whole-Genome Gene Collinearity between the Bamboo and Rice Genomes. The referenced rice gene sets are arranged according to their gene order in the chromosome. The ordinal number of the rice genes is indicated by the scale bar on the left. The syntenic genes in the bamboos are shown as red (Bonia amplexicaulis, Bam) and blue (Olyra latifolia, Ola) blocks.

Figure 8), consistent with their ploidy levels. In contrast, there was only one peak for B. amplexicaulis instead of the two peaks expected because it is a hexaploid, which indicates that 4DTv plots in general could be ambiguous in evaluating multiple rounds of polyploidization and that the phylogenomic approach can be more insightful (Thomas et al., 2017). To address the genome evolution of polyploid bamboos more deeply, we took a phylogenomic approach based on syntenic genes with the ‘‘perfectly’’ expected number of copies based on subgenome characterization and ploidy levels (Supplemental Figure 9). 187 ‘‘perfect-copy’’ syntenic blocks containing 769 gene sets calibrated by genetic mapping of D. latiflorus were identified from O. latifolia, P. edulis, and B. amplexicaulis (Supplemental Table 13 and Supplemental Data 5). Genes from each syntenic block were concatenated for phylogenetic analysis to identify the subgenomes. Comparative analyses were performed on 155 highly supported phylogenetic trees deduced from these blocks (Supplemental Data 6). An overwhelming 94% of topologies supported a common subgenome shared by hexaploid B. amplexicaulis and tetraploid P. edulis (Figure 4A and Supplemental Figure 10), designated as subgenome C, and the remaining subgenome in P. edulis was designated as D. Another two subgenomes in B. amplexicaulis have different origins from subgenome C, as

well as being different from each other (see Supplemental Information). To distinguish the subgenomes further, we used the other tetraploid G. angustifolia to identify another shared subgenome of B. amplexicaulis, designated as subgenome B, supported by 80% of topologies (Figure 4B and Supplemental Figure 11; Supplemental Data 7 and 8), and the remaining subgenome in B. amplexicaulis was designated as A. These woody bamboo subgenomes formed a monophyletic group sister to the herbaceous diploid O. latifolia genome (Triplett et al., 2014). They exhibited a very similar level of genetic divergence with nearly overlapping Ks distribution peaks of around 0.15 (Supplemental Figures 12 and 13) based on 636 gene sets from syntenic blocks. This also could explain the unexpected single peak in 4DTv values observed for B. amplexicaulis (Supplemental Figure 8). All distributions of Ks between woody bamboo subgenomes and the herbaceous genome showed peaks at a larger Ks value around 0.25 (Supplemental Figure 13), reflecting the more recent divergence of woody bamboo subgenomes. Identifying subgenomes provided an unprecedented opportunity to reconstruct the evolutionary history of polyploid woody bamboos with a long-extinct direct diploid progenitor species. We derived 61 ‘‘perfect-copy’’ syntenic genes common to all sampled woody bamboos for phylogenetic inference and divergence time estimates (Supplemental Data 9). In the resulting tree, the progenitor genomes of woody bamboos split from the herbaceous bamboo lineage ca. 42 million years ago (Ma) and rapidly radiated into four distinct lineages, giving rise to ancestral subgenomes A to D with A + B and C + D sisters to each other (Figure 4C and Supplemental Figure 14). Dating results showed that this radiation occurred from 33 to 31 Ma in the early Oligocene after the end of abrupt global temperate decline around the Eocene-Oligocene boundary (Zachos et al., 2001). Having undergone approximately 10 million years of independent evolution, the ancestral B/C and C/D genomes hybridized to generate two allotetraploid lineages at ca. 22 Ma, one represented by the neotropical woody bamboo clade (G. angustifolia, BBCC) in the tropics and another by the TWB clade in temperate or montane areas (P. edulis, CCDD) (Figure 4C). These events may have been induced by the rapid climatic fluctuation at the Oligocene– Miocene boundary. A third hybridization occurred between the ancestral A and B + C genomes around 19 Ma in the early Miocene, indicated by the nearly simultaneous divergence between the B and C subgenomes in tropical woody bamboos, leading to the origin of the allohexaploid palaeotropical woody bamboo clade (B. amplexicaulis, AABBCC). This hybridization could have been followed by the onset and intensification of the East Asian Monsoon (Tada et al., 2016) and may have facilitated the successful adaptation of woody bamboos to the mesic-to-wet forests. However, it should be noted that there is some degree of uncertainty in the estimated divergence times (see age ranges in Supplemental Figure 14), and the correlation of divergence and hybridization events to major climate upheaval needs more investigation. The C subgenome is shared by all the extant woody bamboo lineages and, based on the genetic map, has undergone more intrachromosome inversions and rearrangements relative to subgenomes A and B (Figure 2). Therefore, the progenitor of subgenome C could be distinct, and was critical for the Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

5

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant A

Reticulate Origin of Woody Bamboos

BamC

origin and survival of polyploids harboring subgenome C during evolutionary history.

PedC

GanC

Genomic Basis Underlying the Woody Trait of Bamboos

PedD

BamA

BamB

BamB

BamA

GanB

Ola

Ola

rice

rice

B

BamC

80%

94%

Time (Ma)

C

A H

~33.4 B

C

Oligocene

~31.8

Eocene

~42.1

~31.1 ~22

D

40

30

20 Miocene

BC CD ~19.7 ABC ~19.4

10

Plio. Plt.

Ola

(HB)

Bam

(PWB)

Gan

(NWB)

Ped

(TWB)

0

4

8 12

0

Temperature (°C)

Figure 4. Evolutionary Model of the Polyploidizations in Woody Bamboos. (A) Identification of the subgenomes in Bonia amplexicaulis (Bam) and Phyllostachys edulis (Ped). The phylogenetic tree shows the predominant topology when trees of syntenic genes are combined, and identifies a subgenome shared by Bam and Ped, defined as subgenome C (pink). This BamC-PedC clade is supported by 94% of 155 analyzed syntenic blocks (pie chart). The other subgenome of Ped is defined as subgenome D (PedD). Another two subgenomes of Bam are defined as subgenomes A (BamA) and B (BamB). The dashed box indicates that no predominant relationship among the subgenomes PedD, BamB, and BamA can be concluded from this analysis. Olyra latifolia (Ola) is used as the herbaceous lineage control. The rice genome is used as an outgroup. (B) Identification of the subgenomes in B. amplexicaulis and Guadua angustifolia (Gan). The predominant topology supported by 80% of 145 phylogenetic trees (pie chart) indicates that subgenomes C (blue) and B (pink) of Bam are also shared by Gan. The corresponding subgenomes of Gan are defined as subgenomes C (GanC) and B (GanB). (C) Origins of herbaceous bamboos (H) and the diploid ancestor of woody bamboos that diverged 42.1 million years ago (Ma). After splitting from the herbaceous bamboo (HB, Ola) lineage, diversification of A, B, C, and D

6

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

Many bamboo species are well known as woody and tree-like due to their tall and highly lignified culms. We compared two woody species (P. edulis and B. amplexicaulis) and two herbaceous species (O. latifolia and R. guianensis) to explore the underlying genetic basis of the woody trait, focusing on cellulose and lignin pathways (Mcfarlane et al., 2014; Barros et al., 2015). Using rice as a reference, we detected 27 and 26 cellulose synthase (CesA) genes in B. amplexicaulis and P. edulis, respectively, but only 12 and 10 in O. latifolia and R. guianensis, respectively. Likewise, 55, 51, 40, and 35 cellulose synthase-like (Csl) genes were identified in B. amplexicaulis, P. edulis, O. latifolia, and R. guianensis, respectively (Supplemental Table 14 and Supplemental Data 10). A total of 54, 39, 26, 33 genes involved in the lignin biosynthesis pathway were detected in B. amplexicaulis, P. edulis, O. latifolia, and R. guianensis, respectively (Supplemental Table 15 and Supplemental Data 10). Overall, the absolute copy number of both cellulose- and lignin-related genes increased in polyploid woody bamboo species compared with diploid herbaceous species. In further investigation of the duplication events that resulted in extra copies of these genes, an apparently distinct overall pattern was found in woody versus herbaceous species. Fewer and randomly scattered duplication events occurred in the herbaceous species O. latifolia (20 duplication events) and R. guianensis (19 events), while many more duplication events occurred in the woody species B. amplexicaulis (49 events) and P. edulis (36 events), with considerable enrichment of duplicated genes with a Ks distance of 0.1–0.2 (30 of 49 in B. amplexicaulis and 23 of 36 in P. edulis) (Figure 5 and Supplemental Data 11), which was associated with the polyploidization of the woody bamboos (Supplemental Figure 12). The same pattern was found for specific genes such as HCT (hydroxycinnamoylCoA, shikimate/quinate) and CCR (cinnamoyl CoA reductase), which encode key enzymes catalyzing the conversion of phenylpropanoid pathway products into the material for lignin biosynthesis (Vanholme et al., 2010) (Figure 5). Furthermore, we anchored the PAL (encodes phenylalanine ammonia-lyase, the first enzyme in the phenylpropanoid and lignin pathway) gene copies to subgenomes in B. amplexicaulis (Supplemental Figure 15), confirming that the duplicated gene copies resulted from polyploidization. We also compared the copy numbers of lignin pathway genes of bamboos with those of other model plants with different levels of lignification (Hamberger et al., 2007) and found the degree of lignification to increase as the gene copy number grew (Supplemental Figure 16), suggesting that polyploidization and extra copies were relevant to the origin of the woody trait in bamboos.

progenitors occurred in a short interval from 31.1 to 33.4 Ma. Three successive polyploidization events occurring 19 to 22 Ma gave rise to the three major lineages, a hexaploid (ABC) and two tetraploids (BC and CD). These three lineages, e.g., palaeotropical woody bamboo (PWB), neotropical woody bamboo (NWB), and temperate woody bamboo (TWB), are represented by the extant species, Bam, Gan, and Ped, respectively. Climatic sequence of events on the right is modified from Zachos et al. (2001).

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

Reticulate Origin of Woody Bamboos Woody bamboo

Herbaceous bamboo Ola

Rgu

Lignin biosynthesis

0.75

0.90

Ped

Cellulose synthase CesA

C4H

p-coumairc acid

cinnamic acid

PAL

Csl

phenylalanine

0.45

0.60

4CL

p-coumaroyl -CoA

HCT p-coumaroyl shikimic acid

C3H

caffeoyl shikimic acid

CoA

CCoAOMT

feruloyl-CoA CCR

CCR sinapaldehyde p-coumaraldehyde

0.30

HCT caffeoyl-

coniferaldehyde CAD

CAD

CAD

sinapyl alcohol

coniferyl alcohol

0.15

p-coumaryl alcohol

F5H COMT 5-hydroxyconiferaldehyde

H lignin

S lignin

G lignin

0

Synonymous substitutions per site ( Ks ) of gene pairs

Bam

Figure 5. Evolution of the Genes Involved in Cellulose and Lignin Biosynthesis Pathways. Comparison of recent duplications for the genes involved in cellulose and lignin biosynthesis pathways. The colorful symbols (circle, triangle, square, and star) represent genes involved in the cellulose and lignin biosynthesis pathway (modified from Vanholme et al., 2010 and Kumar et al., 2016). Each symbol indicates a duplication event on the Ks distribution plot. The y axis indicates the calculated Ks value of each duplicated gene pair. The blue boxes show that the woody bamboos Bonia amplexicaulis (Bam) and Phyllostachys edulis (Ped) underwent more gene duplications, with Ks values clustering at 0.10 to 0.20, compared with the herbaceous bamboos Olyra latifolia (Ola) and Raddia guianensis (Rgu). PAL, phenylalanine ammonia lyase; C4H, cinnamate4-hydroxylase; 4CL, 4-coumarate:CoA ligase; HCT, hydroxycinnamoyl-CoA:shikimate/quinate hydroxycinnamoyltransferase; C3H, p-coumaroyl shikimate 30 -hydroxylase/coumaroyl 3-hydroxylase; CCoAOMT, trans-caffeoyl-CoA 3-O-methyltransferase; CCR, cinnamoyl-CoA reductase; F5H, ferulate 5-hydroxylase; COMT, caffeic acid 3-O-methyltransferase; CAD, cinnamyl alcohol dehydrogenase; CesA, cellulose synthase; Csl, cellulose synthase-like.

To address the potential functional diversification of paralogs from different subgenomes, we analyzed the expression of cellulose- and lignin-related genes in various tissues of hexaploid B. amplexicaulis and determined the expression pattern of 96 gene copies assigned to specific subgenomes (Supplemental Data 12). Despite gene copy numbers being generally similar among the three subgenomes (Supplemental Data 12), we found that gene copies from subgenome C were significantly enriched (p < 0.05; hypergeometric distribution) in group I genes with high expression in the shoot and leaf, which represent the degree of lignification (Supplemental Figure 17). This result indicated that subgenome C could have played a critical role in the origin of the bamboo woody trait. However, further experimental evidence is needed to confirm this deduction.

Genetic Clue to the Unique Flowering Behavior of Woody Bamboos Woody bamboos typically exhibit gregarious flowering cycles of up to 120 years followed by death of the parent plants, which is unique among flowering plants with the possible exception of Strobilanthes (Janzen, 1976; Clark et al., 2015). Because herbaceous bamboos feature annual flowering behavior like rice and other typical grasses, woody bamboos can be considered ‘‘evolutionary mutants’’ of their herbaceous relatives (Hu and Saedler, 2007). To identify genetic variation potentially involved in flowering regulatory networks specific for woody bamboos (Figure 6A), we examined putative orthologs of 414 Arabidopsis and 234 rice flowering genes (Supplemental Data 13) and detected no strong bias in rates of frame-shift and premature mutations in five bamboo genomes (Supplemental Table 16).

However, differences in the gene structures of OsSOC1 orthologs, especially in the 50 untranslated regions (UTRs), introns, and protein domains, clearly distinguished the woody bamboos from herbaceous forms (Figure 6B and Supplemental Table 17). The herbaceous O. latifolia and R. guianensis orthologs displayed an identical gene structure and the same single copy number as in rice. For both tetraploid (G. angustifolia and P. edulis) and hexaploid (B. amplexicaulis) woody bamboos, two OsSOC1 orthologs were identified. The orthologs Gan-1, Gan-2, and Ped-2 had no or partial MADSbox domains and 50 UTRs. Bam-1 and Bam-2 had no 50 UTR regions and an exon 7 with very low sequence identity (<20%) to OsSOC1. Although the MADS-box domain and 50 UTR are still present in Ped-1, a very long intron containing insertions of multiple LTR-type transposons of different sizes was found. Since transcription factor binding sites are often present in the 50 UTR regions and introns of MADS-box genes, and insertions of TEs are associated with epigenetic modification of chromatin that alters transcriptional activity (Hepworth et al., 2002; Helliwell et al., 2006; Liu et al., 2008; Rebollo et al., 2011; Dong et al., 2012), all of these SOC1-like genes in woody bamboos are presumed to be lost or malfunctioning analogs of OsSOC1. As a key integrator sensing signals in almost all the flowering pathways, changes in the protein structure of SOC1 orthologs leads to strong variation in flowering time (Samach et al., 2000; Hepworth et al., 2002; Immink et al., 2012). Thus, these changes in SOC1 orthologs are potentially the cause of the extremely long vegetative growth phase in woody bamboos. Additional genes distinguishing woody bamboos from their herbaceous relatives and other grasses are the circadian rhythm regulators, OsPRR95/AtPRR9-like genes. Five of the six Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

7

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

Reticulate Origin of Woody Bamboos

A

M

B integrators 1

sugar and aging

2

I 1kb

K

C AtSOC1

3

4

5

67

8

Bam-2

10.40kb b

miR156

SPLss

SOC1 /soc1

active GAs de-active GAs

27.63kb

FLC

15.40kb b

GA2OXs

OsSOC1 Rgu-1

FD FT

Bam-1

23.40kb

SVP

Ped-2 19.96kb

86 58

Gan-2 Ped-1

GA3

57.84kb

PRR95

99

97

Ola-1

27.04kb

autonomous/ temperature related

100

74 50

Gan-1 gibberellin acids

photoperiod & circadian clock

DNA

LINE

LTR

Figure 6. Molecular Regulation of the Floral Transition in Woody Bamboos. (A) A simplified gene regulatory network for the floral transition in woody bamboos. Solid and dashed lines indicate direct and indirect regulation, respectively. Arrows indicate positive regulation and blocks indicate negative regulation. Loss/malfunction of SOC1-like genes (names in lowercase in blue), positive selection of OsPRR95-like genes in the photoperiod and circadian clock pathway (boxed in orange), and copy number variation (red arrow, increased; blue arrow, decreased) of GA pathway enzymes (light green) associated well with the long vegetative phase in woody bamboos. (B) Gene structure variation of SOC1-like genes in woody (gene names in blue) and herbaceous bamboos. Phylogeny (right panel; neighbor-joining tree; only branches with bootstrap support above 50% were kept) and gene structure (left panel) of the SOC1-like orthologs in woody bamboos are presented. Exons (numbered below) are shown as boxes, while lines mark the introns. Gene domains are shown with different color (red, 50 UTR; blue, MADSdomain/M; brown, I-region/I; light green, K-box/K; yellow, C-terminus/C). Boxes filled by stretched lines indicate incomplete exons. Numbers below the lines indicate the length of intron between M and I for all genes except Gan-1 and Gan-2 (their MADS-domain was undetermined). The occurrence of different types of TEs in this intron is indicated by filled blue stars (DNA, DNA transposons), orange parallelograms (LINE, long interspersed nuclear elements), and purple ellipses (LTR, long terminal repeat retrotransposons). Note the loss of 50 UTRs in Gan-1, Gan-2, and Ped-2, and low sequence identity for exon 7 (in gray) in Bam-1 and Bam-2.

OsPRR95-like genes identified in woody bamboos were under strong selection pressure (Supplemental Figure 18 and Supplemental Data 14). OsPRR95/AtPRR9 encodes a transcriptional repressor of CCA1/LHY and activator of LWD1/ LWD2, thereby controlling circadian rhythms, with loss of function causing very late flowering (Nakamichi et al., 2005, 2007; Salome et al., 2010). Whether the selected alterations in woody bamboo sequences cause a delay in circadian rhythms awaits further investigation. We further analyzed copy number variation of all flowering-related genes and detected 65 genes (Supplemental Data 15) that show obvious deviation from the expected copy number in woody bamboos. Interestingly, GA3-like genes, which encode key enzymes essential for synthesis of gibberellin acids (GA), decreased dramatically in woody bamboos to one to two copies compared with the five in herbaceous species. In contrast, the number of GA2OX genes, which encode GA degradation enzymes, increased significantly (Figure 6A and Supplemental Table 18). These changes in the number of GA3-like and GA2OX genes appear to have led to a low level of active GAs in woody bamboos (Wang et al., 2015). In summary, loss/malfunction of SOC1-like genes, positively selection of OsPRR95-like genes, and copy number variation of GA pathway enzymes together 8

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

(Figure 6) contributed to the regulation of the long vegetative growth phase in woody bamboos. Hybridization and polyploidization are major drivers of plant evolution (Ferna´ndez-Mazuecos and Glover, 2017; Van de Peer et al., 2017). The most developed examples of allopolyploidization from a genomic perspective are the crop plants wheat and rape, both with clearly known donor species (Marcussen et al., 2014; Cheng et al., 2016). In broad-scale evolutionary history, however, the donor species in ancient hybridization events of plants are extinct (Soltis et al., 2003). We have showcased how ancient reticulate evolution of bamboo at the subgenome level can be successfully elucidated through whole-genome sequencing. This study can serve as a potential model for furthering research on the complicated evolution and origins of other angiosperms. For bamboos, a peculiar subfamily of grasses, there has been a ‘‘black box’’ of knowledge regarding chromosomal and genomic evolution due to their particularly long life cycle and unpredictable flowering. Our construction of the first high-density genetic linkage map for a hexaploid woody bamboo, in association with subgenome analyses, will substantially advance comparative genomics studies of the grass family. Identification of a shared subgenome illuminates the origin of the woody trait of bamboo, which has long been a major question in

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

Reticulate Origin of Woody Bamboos grass evolution. Together with novel insights into unique flowering behavior, these findings will greatly enhance further research on bamboo biology and on plant biology in general.

METHODS Sampling and Planting We sampled B. amplexicaulis, a representative hexaploid woody species adapted to limestone hills, in Longzhou County, Guangxi Province, China. G. angustifolia, a representative of the tetraploid tropical clade with a woody culm up to 30 m tall and 22 cm in diameter, was sampled from Xishuangbanna Tropical Botanical Garden. O. latifolia and R. guianensis, representatives of the herbaceous bamboos with herbaceous culms and perennial flowering characteristics, were sampled in the greenhouse at the Germplasm Bank of Wild Species in Kunming. A first-generation inbred population (S1 population) derived from selfing a D. latiflorus parent plant (S0) was developed in Mile County, Yunnan Province. We collected about 4000 mature seeds to germinate 600 seedlings, which were planted in the greenhouse at the Germplasm Bank of Wild Species in September 2014. A total of 190 seedlings were randomly selected for the construction of a genetic map.

De Novo Genome Assembly The genomes of B. amplexicaulis, O. latifolia, G. angustifolia, and R. guianensis were assembled using SOAPdenovo (version 2.04) (Luo et al., 2012) with the K-mers set at 57, 73, 61, and 61 bp, respectively. SSPACE (version 3.0) (Boetzer et al., 2011) with the parameters -x 0, -m 30, -o 15, -k 5, -a 0.70, -n 15, -p 0, -v 0, -z 0, and -g2 was used to build scaffolds from the mate-pair libraries of B. amplexicaulis and O. latifolia. Gaps in the scaffolds were filled using the GapCloser (version 1.12) module from SOAPdenovo. The completeness of the assembled genomes was assessed using the BUSCO gene-mapping method to identify conserved protein-coding genes (Sima˜o et al., 2015).

Genetic Map Construction and Assembly Validation A double-digest restriction-site-associated DNA (ddRAD) library was constructed for the parent and 190 offspring from 200 ng of DNA following an optimized protocol described in Yang et al. (2016) and sequenced on an Illumina Hiseq X10 (Illumina, San Diego, CA); about 500 Gb of clean 150 bp pair-end read data was obtained. Two individuals with an insufficient amount of clean data and tags (<2 million reads, <50 000 tags) were excluded from the final genotype matrix. The clean reads were aligned to a D. latiflorus survey genome (G.-Q. Y., unpublished data) with bowtie (-n 3) (Langmead, 2010), and the STACKs package (version 1.41) (Catchen et al., 2013) was used to identify SNPs. ‘‘pstacks’’ was used to extract stacks and identify SNPs (-m 10 for parents and -m 3 for the offspring), and ‘‘cstacks’’ was used to create a set of consensus loci, merging alleles together for the parent. Tags from each progeny were matched against the catalog to determine progeny alleles with ‘‘sstacks,’’, and genotype matrices for the parent and offspring were encoded as the CP map type by the ‘‘genotypes’’ module. A heterozygous genotype was called when the minor allele frequency was a minimum of 0.1 (MAF R0.1). Because the only type of separation that occurs in the parent is hk 3 hk, the heterozygous genotype was represented by ‘‘hk,’’ the homozygous genotype by ‘‘hh’’ or ‘‘kk,’’ and a missing genotype by ‘‘-.’’ The genetic map was built using markers present in more than 85% of the offspring (160 individuals, missing data points %15%) with JoinMap (version 4.0) (Van Ooijen, 2006). The regression algorithm was used to calculate the order of markers (recombination frequency %0.4, logarithm of odds [LOD] >0.1, jump threshold 5, add one locus after which perform a ripple). Map distance was then calculated using the Kosambi function. After the preliminary mapping, the ‘‘genotype probabilities’’ function was used to detect double recombination events, the suspicious genotypes were replaced with missing values (Isidore

et al., 2003; Van Os et al., 2005), and marker order was calculated with an additional round of linkage map reconstruction. Markers exhibiting significantly distorted segregation were excluded from the map (c2 test, p < 0.01, df = 2) during the linkage analysis. Redundant markers (similarity = 1) were also excluded from the map. The minimum value of LOD is 6 for creating different LGs. Single markers that were not assigned to any group and LGs with fewer than two markers were excluded. Distorted markers that did not alter the order of the surrounding markers were finally added to the constructed LGs. The map was drawn with MapChart (version 2.2) (Voorrips, 2002). The genome coverage was calculated as follows: c = 1  e  2dn/L (d is the average map distance, n is the total number of markers, and L is the total length of the genetic map) (Wang et al., 2009; Palomar et al., 2017). The collinearity of the constructed genetic map with the rice genetic map was determined based on the homologs of ddRAD tags with BLASTN scores of e < 1e5. The constructed genetic map was finally used for anchoring the B. amplexicaulis genome assembly with alignment of the scaffolds by BLASTN (e < 1e20, identity R0.90, and coverage R50%).

Annotation of Repeats We predicted tandem repeats using Tandem Repeats Finder (version 4.07) (Benson, 1999) with the parameters: Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPeriod = 2000. We identified and classified repeat sequences using a combination of homology-based and de novo methods. The homology-based method was performed using RepeatMasker (version 4.03) (http://www. repeatmasker.org/RMDownload.html) against the Repbase (version 20.05) database (http://www.girinst.org/). The ab initio prediction program Repeat Modeler (version 1.08) was employed to build a de novo repeat library from the bamboo genomes. This program combines two de novo repeat-finding programs, RECON (version 1.08) (Bao and Eddy, 2002) and RepeatScout (version 1.05) (Price et al., 2005), for identifying repeat element boundaries and family relationships. TEs in the genomes were classified into two main classes described by RepeatMasker: Class I (retro-transposon elements) and Class II (DNA transposons). Class I elements primarily contained LTR retro-elements, short interspersed nuclear elements, and long interspersed nuclear elements (LINEs), whereas Class II consisted of En-Spam, Mudra, and other sequences.

Prediction of Protein-Coding Genes We employed three methods to predict protein-coding genes: homologybased, de novo, and transcriptome-based. We downloaded the protein sequences of Arabidopsis, Brachypodium, and Sorghum from Phytozome (version 10.0; https://phytozome.jgi.doe.gov), rice genes from Rapdb (IRGSP-1.0; http://rapdb.dna.affrc.go.jp), and moso bamboo (P. edulis) genes from the official website (version 2; http://202.127.18.221/ bamboo/). According to the alignment of the reference sequences to the repeat-masked genomes, the gene models were extracted using GeneWise (version 2.2.0) (Birney et al., 2004). The de novo gene models were predicted using two ab initio gene-prediction software tools, Augustus (version 3.1) (Stanke and Waack, 2003) and GENSCAN (version 1.0) (Burge and Karlin, 1997). RNA-sequencing data were mapped to the reference genome using TopHat (version 2.1.1) (Trapnell et al., 2009) and then assembled using Cufflinks (version 2.2.1) (Trapnell et al., 2012). All of the homology-based, de novo, and transcriptome-based gene sets were merged to form a comprehensive and non-redundant reference gene set using the software GLEAN (http://sourceforge.net/projects/ glean-gene/).

Identification of Non-coding RNA Genes Four types of non-coding RNA genes, tRNAs, rRNAs, miRNAs, and snRNAs, were predicted in the four sequenced bamboo genomes. The tRNA genes were predicted using tRNAscan-SE (version 1.3.1) (Lowe and Eddy, 1997) with eukaryote parameters. Identification of the rRNA genes was conducted by aligning known Arabidopsis and rice rRNAs

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

9

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant against the four bamboo genomes using BLASTN (e < 1e5, identity R0.85, and aligned length R50 bp). The miRNA and snRNA genes were predicted using INFERNAL (Nawrocki et al., 2009) against the Rfam database (Bao et al., 2015).

Annotation of Gene Functions We annotated the gene functions based on the highest scoring match to proteins in the SwissProt and TrEMBL databases (release 2015-10; http:// www.uniprot.org) using BLASTP with an e-value cutoff of less than 1e5. Motifs and conserved functional domains were also identified by performing InterProScan (version 5.8.49) (Jones et al., 2014) searches against different protein databases, such as ProDom, PANTHER, PRINTS, PROSITE, PfamA, and SMART. Genes involved metabolic pathways were identified using the KEGG Automatic Annotation Server (Moriya et al., 2007).

Identification and Analyses of Gene Clusters Genome annotations of Brachypodium, rice, and P. edulis were downloaded, and the transposable-element-liked genes and coding sequences of length less than 150 bp were discarded. An all-against-all BLASTP was then performed with an e-value cutoff of less than 1e5. The BLASTP outputs were clustered into orthologous groups using OrthoMCL (version2.0.9) (Li et al., 2003) with the parameter I set at 1.5. A total of 1012 one-to-one single-copy gene clusters shared by all of the analyzed genomes were extracted, and the protein sequences were aligned by MAFFT (version 7.221) (Katoh and Standley, 2013) and back translated into coding sequences. A phylogenetic tree was inferred from the concatenated sequences of 1012 single-copy gene clusters using RAxML (version 8.2.4) (Stamatakis, 2014) with the GTRGAMMA model. Based on this phylogenetic tree, the extent of gene family expansion and contraction across all of the genomes was calculated using CAFE (version 3.0) (Han et al., 2013) with a stochastic birth-and-death model.

Identifying Gene Collinearity between Bamboos and Rice The genes of four sequenced bamboo genomes and P. edulis were aligned to the reference rice gene models using BLASTP with an e-value cutoff of 1e20. When calling syntenic gene blocks from the scaffolds, we used the criteria as follows: number of genes on the syntenic blocks R2, and number of non-syntenic bamboo genes between two adjacent syntenic genes %10. All syntenic blocks were checked manually. Each syntenic block was anchored on the rice genomes. Based on the identified blocks syntenic with rice, a dotplot map between diploid O. latifolia and hexaploid B. amplexicaulis was generated using MCScanX (Wang et al., 2012) with default parameters.

Polyploidization History and Subgenome Evolution To investigate the WGD events in woody bamboo genomes, we plotted the distribution of 4DTv values for syntenic gene pairs; we used a total of 7287, 349, and 6196 gene pairs to calculate the 4DTv values for B. amplexicaulis, G. angustifolia, and P. edulis, respectively. The 4DTv values were obtained using the maximum likelihood (ML) method implemented in CodeML of the PAML package (version 4.8) (Yang, 2007). We further employed phylogenomic analyses of ‘‘perfect-copy’’ syntenic genes to infer polyploidization events for the woody bamboos and evolution of their subgenomes; an overall workflow is shown in Supplemental Figure 9. In brief, we first identified syntenic blocks containing the expected number of gene copies orthologous to rice based on the polyploidy level (i.e., two and three copies in tetraploid and hexaploid bamboos, respectively). Two datasets were assembled: one included O. latifolia, P. edulis, and B. amplexicaulis and another included O. latifolia, G. angustifolia, and B. amplexicaulis, with rice as outgroup in both datasets. The 769 orthologous gene sets from 187 syntenic blocks (Supplemental Data 5) identified from the former dataset and 251 orthologous gene sets (Supplemental Data 7) identified from the latter

10

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

Reticulate Origin of Woody Bamboos one were used for subsequent phylogenetic tree building. Individual orthologous gene sets were aligned using ClustalW (Larkin et al., 2007) with the default set as amino acid followed by manual check in MEGA 6.0 (Tamura et al., 2013) to exclude regions that were difficult to be aligned and then converted back into nucleotide sequences. Alignments of genes within the same syntenic block were concatenated for the O. latifolia, P. edulis, and B. amplexicaulis dataset. ML phylogenetic analyses using RAxML (version 8.2.4) under the GTRGAMMA model were performed in searching for the best scoring ML tree combined with 1000 rapid bootstrap replicates. The subgenomes in polyploid bamboos were identified through comparative analyses of inferred phylogenetic trees. Additional details are available in the Supplemental Information. The 61 ‘‘perfect-copy’’ syntenic genes shared by O. latifolia, G. angustifolia, P. edulis, and B. amplexicaulis that could be assigned to subgenomes (Supplemental Data 9) were aligned and concatenated to infer the evolutionary relationships among the subgenomes and the polyploidization history of woody bamboos. In addition to unpartitioned ML analyses as described above, the alignment was also partitioned by codon positions. To place the evolution of bamboo subgenomes on a time scale, we estimated divergence times among them using the program MCMCtree in the PAML package (version 4.8). The split between rice and bamboos was set to 48.6–54 Ma to calibrateion the root of the ML phylogenetic tree based on previous studies (The International Brachypodium Initiative, 2010; Peng et al., 2013). The Markov chain Monte Carlo process was run for 2 million iterations with a 10% burn-in, sampling every 100 iterations. Convergence was assessed using Tracer (version 1.7.1) (Rambaut et al., 2018) and confirmed by two independent runs. Additionally, out of the 769 orthologous gene sets identified above, 636 assigned to subgenomes were used to measure genetic divergence between subgenomes. For the 636 gene sets, the Ks values of each syntenic gene pair between any two (sub)genomes of O. latifolia, P. edulis, and B. amplexicaulis were calculated using the ML method implemented in CodeML of the PAML package (version 4.8).

Analyses of Genes Associated with the Woody Trait Genes related to cellulose synthase (CesA), cellulose synthase-like (Csl), and phenylpropanoid-lignin biosynthesis were investigated in the genomes of B. amplexicaulis, O. latifolia, and R. guianensis and another six model plants. We first collected the homologous sequences of CesA, Csl, and phenylpropanoid-lignin biosynthesis genes in Arabidopsis, rice, sorghum, poplar, and maize (Yin et al., 2009), then aligned the predicted amino acid sequences to the proteomes of three bamboo genomes using BLASTP with an e-value threshold of 1e10. The genes with alignment hits covering over 200 amino acids and at least 50% protein sequence identity were considered to be candidate genes. We further constructed a neighbor-joining tree in MEGA 6.0 using the reported genes as seeds to manually filter the candidate genes. To detect gene duplication events and history, we conducted Ks analysis of CesA, Csl, and phenylpropanoid-lignin genes among four bamboo genomes. Protein sequences of the identified genes were separately multi-aligned using MUSCLE (version 3.8.31) (Edgar, 2004) with default parameters. The alignment files were used to construct phylogenetic trees using the RAxML package (version 8.2.4). The gene pairs on the phylogenetic trees were selected to calculate Ks using PAML (version 4.8). We further investigated the expression patterns of CesA, Csl, and phenylpropanoid-lignin genes in B. amplexicaulis using transcriptome sequencing datasets from shoots, leaves, buds, and flowers. The highquality reads were mapped to the bamboo genome assembly using TopHat (version 2.1.1), and the generated alignment files were fed to Cufflinks (version 2.2.1) to calculate the expression level of each gene located on the bamboo genome. The expression levels were measured using fragments per kilobase of transcripts per million fragments mapped (FPKM). We visualized the expression dynamics of each bamboo cell-wall

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant

Reticulate Origin of Woody Bamboos biosynthesis gene using the ‘‘pheatmap’’ package (version 1.0.8) in R (version 3.4.1). Enrichment of A/B/C subgenome-derived genes in each expression cluster was assessed using a hypergeometric distribution test. A p value of less than 0.05 was defined as siginificantly enriched.

Identification and Analysis of Genes Involved in Flowering Pathways The genes reported to be involved in flowering pathways in Arabidopsis and rice were collected as the seed sequences to search for candidates using BLASTP (e < 1e5) in five bamboo genomes. To facilitate the clustering of orthologous genes, we used the protein sequences of the flowering genes from Arabidopsis, rice, Brachypodium, and Oropetium thomaeum to perform a second round of alignment to the protein sequences of the candidate bamboos genes (e < 1e5, coverage >100 amino acids, identity >50%). The top 100 alignments were retrieved for multiple sequence alignments using ClustalW and construction of ML trees with FastTree (version 2.1) (Price et al., 2010). The orthologous genes in each bamboo genome were further identified by a manual check of all the trees for the clades containing the seed sequences, and the syntenic relationship to seed genes of rice was checked for the ambiguous candidates. To detect sequence variation, we scanned the genes identified in the bamboo genome for frame-shifts and premature mutations using GeneWise (version 2.2.0). For detailed determination of structure variation, we also manually searched for the loss of exons and insertions of repetitive elements (e.g., DNA transposon, LTR, and LINE) in the introns, 50 UTRs, and promoter regions. All gene variation was confirmed by raw sequencing reads. Positive selection on all flowering genes of five bamboos was tested using PAML (version 4.8). Two likelihood ratio tests (LRTs) were performed by means of the terminal branch-site modes of codon evolution using the program CodeML (Zhang et al., 2005) in the package PAML. Significance (p < 0.05) of differences in the compared LRTs was determined using the program ‘‘chi2’’ from the package PAML. Positive selection was calculated using a combination of three herbaceous plant genome backgrounds (O. sativa, B. distachyon, and O. thomaeum) and one bamboo genome foreground (O. latifolia, R. guianensis, G. angustifolia, P. edulis, or B. amplexicaulis). Each of the genes detected to be under positive selection was manually checked.

ACCESSION NUMBERS All the raw data of genomic sequence, transcriptome, and genetic map have been deposited in the NCBI Sequence Read Archive under accession numbers PRJNA432605 and PRJNA417139. The genome assemblies and annotation files are available at http://www.genobank.org/ bamboo.

SUPPLEMENTAL INFORMATION Supplemental Information is available at Molecular Plant Online.

FUNDING This work was supported by funding from Strategic Priority Program of the Chinese Academy of Sciences (CAS) (XDB31000000 to D.-Z.L.), the National Natural Science Foundation of China (grants 31430011 to D.-Z.L. and 31670227 to Z.-H.G.), Leading Talents Program of Yunnan Province (2017HA014 to D.-Z.L.), CAS Youth Innovation Promotion Association (2015321 to P.-F.M.), a grant from Germplasm Bank of Wild Species (Y77P4412Z1 to Z.-H.G.), and CAS Pioneer Hundred Talents Program (292015312D11035 to J.-Y.H.).

AUTHOR CONTRIBUTIONS D.-Z.L., Z.-H.G., P.-F.M., and C.-S.L. conceived and designed the research. Z.-H.G. and Y.L. managed the project. P.-F.M. and Z.-H.G. performed the evolutionary analysis. G.-Q.Y. constructed the genetic map. Y.L., Y.-L.L, L.Z., G.-L.S., Y.-X.X., and Y.-J.Z. performed the genome sequencing, assembly, and bioinformatics. J.-Y.H. and M.-C.Z. per-

formed analysis on bamboo flowering genes. E.-H.X. performed analysis on woody feature-related genes. Y.-C.Z., Y.-X.Z., P.-F.M., X.-M.Z., and Y.Y. prepared the samples. M.-Y.Z., C.G., Y.G., J.-X.L., X.-Y.Y., Y.-M.C., and B.H. contributed to data analysis. D.-Z.L., Z.-H.G., P.-F.M., Y.L., J.-Y.H., and G.Y. wrote and revised the manuscript.

ACKNOWLEDGMENTS We acknowledge Prof. Jeffrey L. Bennetzen (University of Georgia, USA) for considerable discussion and suggestions, Prof. John W. Stiller (East Carolina University, USA) for deeply revising the manuscript, Dr. ChengJun Zhang for discussion on data analysis, Dr. Ping-Yuan Wang and Dr. Yang Luo for providing bamboo pictures, and Dr. Jian-Jun Jin, KeCheng Qian, and Zu-Chang Xu for assistance in data analysis. No conflict of interest declared. Received: March 10, 2019 Revised: May 1, 2019 Accepted: May 20, 2019 Published: May 27, 2019

REFERENCES Bamboo Phylogeny Group. (2012). An updated tribal and subtribal classification of the bamboos (Poaceae, Bambusoideae). Proceedings of the 9th World Bamboo Congress, Gielis, J. and Potters G. (eds), pp 3–27. Bao, W.D., Kojima, K.K., and Kohany, O. (2015). Repbase update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6:11. Bao, Z., and Eddy, S.R. (2002). Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12:1269–1276. Barros, J., Serk, H., Granlund, I., and Pesquet, E. (2015). The cell biology of lignification in higher plants. Ann. Bot. 115:1053–1074. Benson, G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580. Birney, E., Clamp, M., and Durbin, R. (2004). GeneWise and genomewise. Genome Res. 14:988–995. Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., and Pirovano, W. (2011). Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578–579. Burge, C., and Karlin, S. (1997). Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78–94. Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., and Cresko, W.A. (2013). Stacks: an analysis tool set for population genomics. Mol. Ecol. 22:3124–3140. Chen, R.Y., Li, X.L., Song, W.Q., Liang, G.L., Zhang, P.X., Lin, R.X., Zong, W.X., Chen, C.B., and Fung, X.L. (2003). Chromosome Atlas of Various Bamboo Species. Chromosome Atlas of Major Economic Plants Genome in China, Vol. IV (Beijing: Science Press). Cheng, F., Sun, R., Hou, X., Zheng, H., Zhang, F., Zhang, Y., Liu, B., Liang, J., Zhuang, M., Liu, Y., et al. (2016). Subgenome parallel selection is associated with morphotype diversification and convergent crop domestication in Brassica rapa and Brassica oleracea. Nat. Genet. 48:1218–1224. Clark, L.G., London˜o, X., and Ruiz-Sanchez, E. (2015). In Bamboo Taxonomy and Habitat. Bamboo, the Plant and its Uses, Vol. 10, W. € hl, eds. (Switzerland: Springer International Liese and M. Ko Publishing), pp. 1–30. Cosgrove, D.J. (2005). Growth of the plant cell wall. Nat. Rev. Mol. Cell Biol. 6:850–861. € bel, U., Engelhorn, J., He, F., Schoof, H., and Dong, X., Reimer, J., Go Turck, F. (2012). Natural variation of H3K27me3 distribution between

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

11

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Molecular Plant two Arabidopsis accessions and its association with flanking transposable elements. Genome Biol. 13:R117. Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797. Ferna´ndez-Mazuecos, M., and Glover, B.J. (2017). The evo-devo of plant speciation. Nat. Ecol. Evol. 1:110. Hamberger, B., Ellis, M., Friedmann, M., de Azevedo Souza, C., Barbazuk, B., and Douglas, C.J. (2007). Genome-wide analyses of phenylpropanoid-related genes in Populus trichocarpa, Arabidopsis thaliana, and Oryza sativa: the Populus lignin toolbox and conservation and diversification of angiosperm gene families. Can. J. Bot. 85:1182–1201. Han, M.V., Thomas, G.W.C., Lugo-Martinez, J., and Hahn, M.W. (2013). Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 30:1987–1997.

Reticulate Origin of Woody Bamboos Liu, C., Chen, H., Er, H.L., Soo, H.M., Kumar, P.P., Han, J.H., Liou, Y.C., and Yu, H. (2008). Direct interaction of AGL24 and SOC1 integrates flowering signals in Arabidopsis. Development 135:1481–1491. Lowe, T.M., and Eddy, S.R. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964. Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., et al. (2012). SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18. Marcussen, T., Sandve, S.R., Heier, L., Spannagl, M., Pfeifer, M., International Wheat Genome Sequencing Consortium, Jakobsen, K.S., Wulff, B.B., Steuernagel, B., Mayer, K.F., et al. (2014). Ancient hybridizations among the ancestral genomes of bread wheat. Science 345:1250092. € ring, A., and Persson, S. (2014). The cell biology of Mcfarlane, H.E., Do cellulose synthesis. Annu. Rev. Plant Biol. 65:69–94.

Helliwell, C.A., Wood, C.C., Robertson, M., James Peacock, W., and Dennis, E.S. (2006). The Arabidopsis FLC protein interacts directly in vivo with SOC1 and FT chromatin and is part of a high-molecularweight protein complex. Plant J. 46:183–192.

Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C., and Kanehisa, M. (2007). KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35:W182–W185.

Hepworth, S.R., Valverde, F., Ravenscroft, D., Mouradov, A., and Coupland, G. (2002). Antagonistic regulation of flowering-time gene SOC1 by CONSTANS and FLC via separate promoter motifs. EMBO J. 21:4327–4337.

Nakamichi, N., Kita, M., Ito, S., Yamashino, T., and Mizuno, T. (2005). Pseudo-response regulators, PRR9, PRR7 and PRR5, together play essential roles close to the circadian clock of Arabidopsis thaliana. Plant Cell Physiol. 46:686–698.

Hilu, K.W. (2004). Phylogenetics and chromosomal evolution in the Poaceae (grasses). Aust. J. Bot. 52:13–22.

Nakamichi, N., Kita, M., Niinuma, K., Ito, S., Yamashino, T., Mizoguchi, T., and Mizuno, T. (2007). Arabidopsis clock-associated pseudo-response regulators PRR9, PRR7 and PRR5 coordinately and positively regulate flowering time through the canonical CONSTANS-dependent photoperiodic pathway. Plant Cell Physiol. 48:822–832.

Hu, J.Y., and Saedler, H. (2007). Evolution of the inflated calyx syndrome in Solanaceae. Mol. Biol. Evol. 24:2443–2453. Immink, R.G.H., Pose´, D., Ferrario, S., Ott, F., Kaufmann, K., Valentim, F.L., de Folter, S., van der Wal, F., van Dijk, A.D.J., Schmid, M., et al. (2012). Characterization of SOC1’s central role in flowering by the identification of its upstream and downstream regulators. Plant Physiol. 160:433–449. Isidore, E., van Os, H., Andrzejewski, S., Bakker, J., Barrena, I., Bryan, G.J., Caromel, B., van Eck, H., Ghareeb, B., de Jong, W., van Koert, P., et al. (2003). Toward a marker-dense meiotic map of the potato genome: lessons from linkage group I. Genetics 165:2107–2116. Janzen, D.H. (1976). Why bamboos wait so long to flower. Ann. Rev. Ecol. Syst. 7:347–391. Jones, P., Binns, D., Chang, H.Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., et al. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. Judziewicz, E.J., Clark, L.G., London˜o, X., and Stern, M.J. (1999). American Bamboos (Washington and London: Smithsonian Institution Press). Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30:772–780. Kumar, M., Campbell, L., and Turner, S. (2016). Secondary cell walls: biosynthesis and manipulation. J. Exp. Bot. 67:515–531. Langmead, B. (2010). Aligning short sequencing reads with Bowtie. Curr. Protoc. Bioinformatics, Chapter 11:Unit 11.17. https://doi.org/10. 1002/0471250953.bi1107s32. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., et al. (2007). Clustal W and clustal X version 2.0. Bioinformatics 23:2947–2948. Li, L., Stoeckert, C.J., and Roos, D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178–2189.

12

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

Nawrocki, E.P., Kolbe, D.L., and Eddy, S.R. (2009). Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335–1337. € gi, A., Matsuba, C., Nicieza, A.G., and Palomar, G., Ahmad, F., Vasema Cano, J.M. (2017). Comparative high-density linkage mapping reveals conserved genome structure but variation in levels of heterochiasmy and location of recombination cold spots in the common frog. G3 (Bethesda) 7:637–645. Peng, Z., Lu, Y., Li, L., Zhao, Q., Feng, Q., Gao, Z., Lu, H., Hu, T., Yao, N., Liu, K., et al. (2013). The draft genome of the fast-growing nontimber forest species moso bamboo (Phyllostachys heterocycla). Nat. Genet. 45:456–461. Perrin, R.M., DeRocher, A.E., Bar-Peled, M., Zeng, W., Norambuena, L., Orellana, A., Raikhel, N.V., and Keegstra, K. (1999). Xyloglucan fucosyltransferase, an enzyme involved in plant cell wall biosynthesis. Science 284:1976–1979. Price, A.L., Jones, N.C., and Pevzner, P.A. (2005). De novo identification of repeat families in large genomes. Bioinformatics 21:i351–i358. Price, M.N., Dehal, P.S., and Arkin, A.P. (2010). FastTree 2— approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490. Rambaut, A., Drummond, A.J., Xie, D., Baele, G., and Suchard, M.A. (2018). Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67:901–904. Rebollo, R., Karimi, M.M., Bilenky, M., Gagnier, L., Miceli-Royer, K., Zhang, Y., Goyal, P., Keane, T.M., Jones, S., Hirst, M., et al. (2011). Retrotransposon-induced heterochromatin spreading in the mouse revealed by insertional polymorphisms. PLoS Genet. 7:e1002301. Salome, P.A., Weigel, D., and McClung, C.R. (2010). The role of the Arabidopsis morning loop components CCA1, LHY, PRR7, and PRR9 in temperature compensation. Plant Cell 22:3650–3661.

Please cite this article in press as: Guo et al., Genome Sequences Provide Insights into the Reticulate Origin and Unique Traits of Woody Bamboos, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.05.009

Reticulate Origin of Woody Bamboos Samach, A., Onouchi, H., Gold, S.E., Ditta, G.S., Schwarz-Sommer, Z., Yanofsky, M.F., and Coupland, G. (2000). Distinct roles of CONSTANS target genes in reproductive development of Arabidopsis. Science 288:1613–1616. Sima˜o, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. (2015). BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. Soderstrom, T.R. (1981). Some evolutionary trends in the Bambusoideae (Poaceae). Ann. Mo. Bot. Gar. 68:15–47. Soltis, D.E., Soltis, P.S., and Tate, J.A. (2003). Advances in the study of polyploidy since plant speciation. New Phytol. 161:173–191. Soreng, R.J., Peterson, P.M., Romaschenko, K., Davidse, G., Zuloaga, F.O., Judziewicz, E.J., Filgueiras, T.S., Davis, J.I., and Morrone, O. (2017). A worldwide phylogenetic classification of the Poaceae (Gramineae) II, an update and a comparison of two 2015 classifications. J. Syst. Evol. 55:259–290. Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. Stanke, M., and Waack, S. (2003). Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19:ii215–ii225.

Molecular Plant Van Os, H., Stam, P., Visser, R.G., and Van Eck, H.J. (2005). SMOOTH: a statistical method for successful removal of genotyping errors from high-density genetic linkage data. Theor. Appl. Genet. 112:187–194. Vanholme, R., Demedts, B., Morreel, K., Ralph, J., and Boerjan, W. (2010). Lignin biosynthesis and structure. Plant Physiol. 153:895–905. Voorrips, R.E. (2002). MapChart: software for the graphical presentation of linkage maps and QTLs. J. Hered. 93:77–78. Wang, H.Y., Cui, K., He, C.Y., Zeng, Y.F., Liao, S.X., and Zhang, J.G. (2015). Endogenous hormonal equilibrium linked to bamboo culm development. Genet. Mol. Res. 14:11312–11323. Wang, S., Zhang, L., Meyer, E., and Matz, M.V. (2009). Construction of a high-resolution genetic linkage map and comparative genome analysis for the reef-building coral Acropora millepora. Genome Biol. 10:R126. Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H., Marler, B., Guo, H., et al. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40:e49. Wicker, T., Sabot, F., Hua-Van, A., Bennetzen, J.L., Capy, P., Chalhoub, B., Flavell, A., Leroy, P., Morgante, M., Panaud, O., et al. (2007). A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8:973–982.

Sungkaew, S., Stapleton, C.M., Salamin, N., and Hodkinson, T.R. (2009). Non-monophyly of the woody bamboos (Bambuseae; Poaceae): a multigene region phylogenetic analysis of Bambusoideae s.s. J. Plant Res. 122:95–108.

Wysocki, W.P., Clark, L.G., Attigala, L., Ruiz-Sanchez, E., and Duvall, M.R. (2015). Evolution of the bamboos (Bambusoideae; Poaceae): a full plastome phylogenomic analysis. BMC Evol. Biol. 15:50.

Tada, R., Zheng, H.B., and Clift, P.D. (2016). Evolution and variability of the Asian monsoon and its potential linkage with uplift of the Himalaya and Tibetan Plateau. Prog. Earth Planet. Sci. 3:4.

Wysocki, W.P., Ruiz-Sanchez, E., Yin, Y., and Duvall, M.R. (2016). The floral transcriptomes of four bamboo species (Bambusoideae; Poaceae): support for common ancestry among woody bamboos. BMC Genomics 17:384.

Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30:2725–2729. The International Brachypodium Initiative. (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463:763–768. Thomas, G.W.C., Ather, S.H., and Hahn, M.W. (2017). Gene-tree reconciliation with MUL-trees to resolve polyploidy events. Syst. Biol. 66:1007–1018. Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105–1111. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7:562–578. Triplett, J.K., Clark, L.G., Fisher, A.E., and Wen, J. (2014). Independent allopolyploidization events preceded speciation in the temperate and tropical woody bamboos. New Phytol. 204:66–73.  ic  , J., Lunn, J.E., Stitt, M., and Persson, S. (2018). Carbon Verbanc supply and the regulation of cell wall synthesis. Mol. Plant 11:75–94.

Yang, G.Q., Chen, Y.M., Wang, J.P., Guo, C., Zhao, L., Wang, X.Y., Guo, Y., Li, L., Li, D.Z., Guo, Z.H., et al. (2016). Development of a universal and simplified ddRAD library preparation approach for SNP discovery and genotyping in angiosperm plants. Plant Methods 12:39. Yang, Z.H. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24:1586–1591. Yin, Y., Huang, J., and Xu, Y. (2009). The cellulose synthase superfamily in fully sequenced plants and algae. BMC Plant Biol. 9:99. Zachos, J., Pagani, M., Sloan, L., Thomas, E., and Billups, K. (2001). Trends, rhythms, and aberrations in global climate 65 Ma to present. Science 292:686–693. Zhang, J.Z., Nielsen, R., and Yang, Z.H. (2005). Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22:2472–2479. Zhao, H., Gao, Z., Wang, L., Wang, J., Wang, S., Fei, B., Chen, C., Shi, C., Liu, X., Zhang, H., et al. (2018). Chromosome-level reference genome and alternative splicing atlas of moso bamboo (Phyllostachys edulis). GigaScience 7:1–12.

Van de Peer, Y., Mizrachi, E., and Marchal, K. (2017). The evolutionary significance of polyploidy. Nat. Rev. Genet. 18:411–424.

Zhou, M., Xu, C., Shen, L., Xiang, W., and Tang, D. (2017a). Evolution of genome sizes in Chinese Bambusoideae (Poaceae) in relation to karyotype. Trees 31:41–48.

Van Ooijen, J.W. (2006). JoinMap 4.0: Software for the Calculation of Genetic Linkage Maps in Experimental Population (Wageningen, The Netherlands: Kyazma).

Zhou, M.Y., Zhang, Y.X., Haevermans, T., and Li, D.Z. (2017b). Towards a complete generic-level plastid phylogeny of the paleotropical woody bamboos (Poaceae: Bambusoideae). Taxon 66:539–553.

Molecular Plant --, 1–13, -- 2019 ª The Author 2019.

13