Multilocus data reveal deep phylogenetic relationships and intercontinental biogeography of the Eurasian-North American genus Corylus (Betulaceae)

Journal Pre-proofs

Tiantian Zhao, Guixi Wang, Qinghua Ma, Lisong Liang, Zhen Yang

PII: S1055-7903(18)30594-3
DOI: https://doi.org/10.1016/j.ympev.2019.106658
Reference: YMPEV 106658

To appear in: Molecular Phylogenetics and Evolution

Received Date: 18 September 2018
Revised Date: 14 October 2019
Accepted Date: 17 October 2019

Please cite this article as: Zhao, T., Wang, G., Ma, Q., Liang, L., Yang, Z., Multilocus data reveal deep phylogenetic relationships and intercontinental biogeography of the Eurasian-North American genus Corylus (Betulaceae), Molecular Phylogenetics and Evolution (2019), doi: https://doi.org/10.1016/j.ympev.2019.106658

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier Inc.

Tiantian Zhao, Guixi Wang, Qinghua Ma, Lisong Liang, Zhen Yang*

Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China.

* Correspondence: Zhen Yang [email protected]

Abstract

The evolutionary history of the genus Corylus, a Tertiary disjunct lineage consisting of approximately 15-20 taxa with a New and Old World distribution, has not been fully studied using molecular tools. In this research, we reconstructed comprehensive phylogenies of this genus using multiple datasets (genome-wide SNPs, complete chloroplast genomes, and nuclear ribosomal ITS sequences) based on detailed sampling of the 17 Corylus species currently recognized. Divergence times were estimated using a fossil-calibrated relaxed clock model, and ancestral areas were reconstructed using the Bayesian binary MCMC (BBM) method and the dispersal–extinction–cladogenesis (DEC) model. Phylogenetic incongruence was detected among datasets, with the nuclear SNP and ITS phylogenies supporting four major clades that correspond well with morphological traits, while the chloroplast phylogeny revealed geographic partitioning. Recombination and introgressive hybridization played important roles in Corylus diversification. Molecular dating and biogeographical analyses unambiguously revealed that Corylus originated in Southwest China during the middle Eocene. The westward migration of Phyllochlamys (Clade C) and Colurnae (Clade D) and the uplift of the Qinghai-Tibet Plateau drove the formation of European taxa, whereas the transoceanic migration of Siphonochlamys (Clade B) and Phyllochlamys (Clade C) across the Bering Land Bridge led to the occurrence of North American taxa. The topographic heterogeneity and climatic oscillations from the Miocene to the Pleistocene made East Asia the diversity center for Corylus. This study offers important insights into the phylogenetic relationships and biogeographic history of the genus Corylus.

Keywords: Corylus; multilocus phylogeny; recombination; introgressive hybridization; molecular dating; biogeography inference

1. Introduction

The historical biogeography of intercontinental disjunction among eastern Asia, the European Mediterranean, and North America in the north temperate zones has long fascinated botanists and biogeographers (Deng et al., 2015; Kim et al., 2015; Semerikova et al., 2018). The geographic origin, potential migration routes, and divergence times of many disjunct taxa have been assessed based on fossil records and phylogenetic methods (Zeng et al., 2014; Barba-Montoya et al., 2018). For some ancient lineages, it is suggested that vicariance events induced by the splitting of Gondwanaland are the most reasonable explanation for this disjunct distribution (Mao et al., 2012; Doyle et al., 2004). However, the formation of younger disjunctions is proposed to be associated with the fragmentation of the Arcto-Tertiary flora, caused by global climatic changes since the Tertiary as well as the breaking of two intercontinental land bridges: the North Atlantic Land Bridge and the Beringian Land Bridge (Ickert-Bond and Wen, 2006; Kim et al., 2015; Wen, 2016). In addition, some studies have documented that long-distance dispersal, possibly accomplished through ocean currents, winds, or animals, is also an important explanation for the intercontinental disjunctions of many plant taxa (Lo et al., 2014; Deng et al., 2015).
Nevertheless, due to the complex evolution of the Northern Hemisphere flora, much remains to be explored about the formation history of the intercontinental disjunctions and the underlying mechanisms of species diversification.

Corylus L. is a small genus of the subfamily Coryleae (Betulaceae), which is well known for its edible nuts and high-quality wood. This genus is easily identified by its characteristic fruits, with a large, oil-rich seed enclosed by a tubular, leafy or spinous involucre (Chen et al., 1999). The chromosome number of the genus is 2n = 2x = 22 (Thompson et al., 1996). The monophyly of Corylus has been confirmed by phylogenetic inference of Betulaceae based on morphology and molecular fragments of the chloroplast rbcL gene and nuclear internal transcribed spacer (ITS) regions (Chen et al., 1999). Members of Corylus have been divided into two sections (Acanthochlamys and Corylus) based on morphological and anatomical characters (Whitcher and Wen, 2001; Bassil et al., 2013). In addition, this genus has been the subject of several molecular phylogenetic studies based on nuclear ITS sequences and the chloroplast matK gene (Erdogan and Mehlenbacher, 2000; Whitcher and Wen, 2001), as well as nSSRs and cpSSRs (Bassil et al., 2013). Interestingly, all of the above studies revealed strong phylogenetic incongruence between nuclear and chloroplast genes. By comparison, nuclear phylogenies were more congruent with morphological traits and interspecific hybridization in Corylus species (Erdogan and Mehlenbacher, 2000). However, due to incomplete taxon sampling and the low resolution of traditional markers, some difficult issues, including the taxonomic status of several species (e.g., C. wangii, C. fargesii) and the relationships among species and clades, are still unresolved.
Corylus (15-20 species) exhibits a typical disjunct distribution across the Northern Hemisphere, with approximately eleven species occurring in eastern Asia, three in North America, two in the European Mediterranean, and one in the Himalayas (Whitcher and Wen, 2001; Bassil et al., 2013). Fossil records of Corylus are relatively abundant (Wolfe and Wehr, 1987; Takhtajan, 1982) and can be combined with DNA evidence for biogeographic inference. A biogeographic analysis based on ITS regions revealed exchange between East Asia and North America, migration from East Asia to the European Mediterranean, and long-distance dispersal from the European Mediterranean to North America (Whitcher and Wen, 2001). Nevertheless, this biogeographic inference was performed using a poorly supported phylogeny reconstructed from only 45 parsimony-informative sites.

Phylogenomics uses genomic data to rebuild the evolutionary history of organisms (Delsuc et al., 2005) and is well suited for examining rapidly radiated clades (Ruhfel et al., 2014; Wickett et al., 2014) and hybridization events in diverse lineages (Sun et al., 2015). The non-recombining nature and uniparental inheritance of chloroplast genomes have allowed them to be successfully applied to reconstruct the phylogeny of plant taxa at the tribal, generic, or even species level (Xi et al., 2012; Barrett et al., 2014). Meanwhile, a streamlined restriction site associated DNA genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases, defined as 2b-RAD, has been increasingly applied to genotype a large number of single nucleotide polymorphisms (SNPs) across the whole genome (Wang et al., 2012). The 2b-RAD method is particularly suitable for phylogenetic studies of closely related taxa, as it also allows the detection of introgression (Díaz-Arce et al., 2016).

In this research, we used multiple datasets (genome-wide SNPs, the nuclear ITS region, and chloroplast genomes) to infer the phylogenetic relationships of the universally recognized species of Corylus based on complete taxon sampling. The obtained phylogeny was employed as a framework to estimate divergence times and determine the biogeographic history of the genus.

2. Materials and Methods

2.1. Taxon sampling and DNA extraction

We collected all species of Corylus based on the taxonomic treatment and geographic distribution proposed by Zhang et al. (2005), Bassil et al. (2013), and Flora of China (Fu et al., 1999).
A total of 42 accessions representing 17 Corylus species were collected from natural populations worldwide. Each species was represented by at least two accessions, with the exception of C. kweichowensis var. brevipes (Table 1). Since the genus Ostryopsis was revealed as sister to Corylus in previous studies (Chen et al., 1999; Zhu et al., 2017), we also sampled three accessions of Ostryopsis davidiana as outgroups. Voucher specimens were deposited in the Nonwood Forest Research Lab of the Research Institute of Forestry, Chinese Academy of Forestry. Total genomic DNA was extracted from silica-dried leaves using a Plant Genomic DNA kit (Tiangen Biotech (Beijing) Co., Ltd.) following the manufacturer's specifications. The quality and quantity of DNA were assessed with a Qubit 2.0 fluorometer (Invitrogen, Thermo Fisher Scientific, Waltham, Massachusetts, USA).

2.2. 2b-RAD library preparation, sequencing and genotyping

The 2b-RAD libraries were prepared for all accessions following the method of Wang et al. (2012). The concentration of DNA was adjusted to 200 ng/μL. High-quality DNA (200 ng) from each accession (n = 45) was digested with 2 U of the type IIB restriction endonuclease BsaXI (New England BioLabs) at 37°C for 1 h, generating a pool of fragments of uniform length. A comparable amount of DNA was digested simultaneously to check the digestion performance by 1% agarose gel electrophoresis. The digested products were then ligated to two library-specific adaptors, and the ligation reaction was performed at 16°C for 1 h. The 2b-RAD tags were subjected to PCR amplification with two pairs of primers for 1~16 cycles to incorporate specific barcodes and the annealing sites. Individual libraries were pooled in equimolar amounts and run on an agarose gel to exclude primer dimers. Each pool was purified using the SPRIselect purification kit (Beckman Coulter, Pasadena, California, USA) to retain only the targeted restriction fragments. All 45 purified libraries were sequenced on an Illumina HiSeq X-Ten platform by paired-end sequencing (2×150 bp) at Shanghai OE Biomedical Technology Co., Ltd. (Beijing, China), which also performed data demultiplexing.

The quality of raw reads was preliminarily verified with FastQC software (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then, a custom Perl script described by Pecoraro et al. (2016) was run for subsequent read filtering. Raw reads were first trimmed to remove adaptor sequences. The terminal 3 bp positions were excluded from each read to eliminate artefacts that might have arisen at ligation sites. Reads with no restriction sites, or containing ambiguous base calls (N), long homopolymer regions (>10 bp), or excessive numbers of low-quality positions (>5 positions with quality <10) were removed. Following the genotyping strategy of RADtyping, the remaining trimmed, high-quality reads were aligned against the known BsaXI sites in the Betula nana genome (EMBL accession number ERP001867) (Wang et al., 2013) using SOAP2 software (parameters: -V 2 -M 4 -R 0) (Li et al., 2009). Genotypes were assigned to sites using a maximum-likelihood (ML) approach. To ensure the accuracy of genotyping, SNPs were further filtered using the RADtyping program (Fu et al., 2013) under the following criteria: polymorphic loci with more than two alleles were discarded; only one bi-allelic SNP at each locus was retained; SNPs had to be present in at least 80% of the individuals; the minor allele frequency (MAF) had to be ≥ 0.05; and the threshold of missing loci per individual was set at 30%. To assign genotypes at heterozygous sites, we randomly selected one of the two alternate alleles.

2.3. Chloroplast genome sequencing, assembly and annotation

Chloroplast genome sequencing was conducted on an Illumina HiSeq 2500-PE125 platform at Novogene Bioinformatics Technology Co., Ltd. (Beijing, China). For library construction, a paired-end library with a 500 bp insert (A-tailed fragments ligated to paired-end adaptors and PCR amplified) and a mate-pair library with an insert size of 5 kb were used. PCR adaptor reads and low-quality reads were filtered with the quality control pipeline. Then, high-quality paired reads were qualitatively assessed and assembled using SPAdes 3.6.1 (Bankevich et al., 2012). We further screened contigs using the BLAST program (Altschul et al., 1990), with the C. heterophylla plastome (KX822769.2) as the reference sequence.
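As an illustration of the locus-level SNP filters listed in Section 2.2 (bi-allelic loci only, presence in at least 80% of individuals, MAF ≥ 0.05), the following Python sketch applies those three thresholds to a toy genotype table. It is not the RADtyping implementation: the function names, the data layout (one `(allele, allele)` tuple or `None` per individual) and the example loci are assumptions made for illustration, and the per-individual missing-data threshold and the one-SNP-per-locus rule are not modelled.

```python
from collections import Counter


def passes_filters(genotypes, min_call_rate=0.80, min_maf=0.05):
    """Return True if one SNP passes the three locus-level filters.

    genotypes: list with one entry per individual, either a
    (allele, allele) tuple or None when the genotype is missing.
    (Hypothetical layout, chosen for this sketch only.)
    """
    called = [g for g in genotypes if g is not None]
    # Filter 1: the SNP must be genotyped in enough individuals.
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    # Filter 2: keep strictly bi-allelic SNPs (drops monomorphic
    # sites and sites with more than two alleles).
    allele_counts = Counter(a for g in called for a in g)
    if len(allele_counts) != 2:
        return False
    # Filter 3: minor allele frequency threshold.
    maf = min(allele_counts.values()) / sum(allele_counts.values())
    return maf >= min_maf


def filter_snps(matrix, **thresholds):
    """Keep only the loci of a {locus: genotypes} table that pass."""
    return {locus: g for locus, g in matrix.items()
            if passes_filters(g, **thresholds)}


# Toy data for 20 individuals (invented values).
toy = {
    "snp1": [("A", "G")] * 10 + [("A", "A")] * 10,      # passes
    "snp2": [("A", "G")] * 15 + [None] * 5,             # call rate 0.75
    "snp3": [("A", "C"), ("A", "G")] + [("A", "A")] * 18,  # three alleles
    "snp4": [("T", "C")] + [("T", "T")] * 19,           # MAF 0.025
}
print(sorted(filter_snps(toy)))  # ['snp1']
```

Passing the thresholds as keyword arguments keeps the values reported in the text as defaults while allowing other cut-offs to be explored.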
These contigs were then assembled using Sequencher v5.4 with default parameters. Small gaps between contigs in the assemblies were filled by PCR amplification and Sanger sequencing. In addition, specific primers were designed to verify the junctions between single copy regions (SCs) and inverted repeats (IRs) through PCR amplification in each chloroplast genome. PCR amplification was performed on a SimpliAmp Thermal Cycler (Applied Biosystems, USA) in a 20 µL reaction system. The PCR program began with a 4 min initial denaturation at 94°C, followed by 35 cycles of 1 min denaturation at 94°C, 1 min annealing at the abovementioned Tm, and 1.5 min extension at 72°C; a final extension was run for 5 min at 72°C. PCR products were sequenced in both directions with the same primers using an ABI 3730xl automated sequencer (Applied Biosystems, USA). We annotated the chloroplast genomes using DOGMA (Wyman et al., 2004), with subsequent manual correction in Geneious 8.1 (Kearse et al., 2012) by aligning with homologous genes in the C. heterophylla plastome.

2.4. Nuclear ITS processing

The ITS1-ITS4 universal primers (White et al., 1990) were used to obtain complete ITS sequences from both directions. PCR amplification was performed in a 20 µL volume that contained 10-40 ng of plant DNA, 2.0 µL of dNTPs (10 mM), 1.0 µL of MgCl2 (25 mM), 0.8 µL of each primer, 2 µL of 10× PCR buffer, 0.2 µL of Taq DNA polymerase (5 U/µL) (Biotech International) and 11.2 µL of sterile water. The following cycling parameters were used: an initial 4 min denaturation at 94°C, followed by 10 cycles of 30 s at 94°C, 30 s of annealing at 60°C, and a 2 min extension at 72°C, then 26 cycles of 30 s at 94°C, 30 s of annealing at 50°C, and a 2 min extension at 72°C, with a final extension for 10 min at 72°C. PCR products were purified with a QIAquick PCR purification kit (BioTeke, Beijing, China) according to the manufacturer's instructions and then sequenced on an ABI 3730xl automatic DNA sequencer (Applied Biosystems, USA). To avoid the generation of heterozygous loci during sequencing, at least five cloned PCR products were randomly selected and sequenced to include the full ITS sequences from the donor species.

2.5. Recombination assessment

Phylogeny inference using SNPs can be severely distorted by recombination, either between individuals within the dataset or with an unobserved individual. Similarly, phylogenetic discordance among different plastome regions may also imply interspecific recombination (Martin et al., 2015). To detect potential recombination among different ecotypes, we employed RDP4 (Recombination Detection Program v.4.7; Martin et al., 2015) to implement recombination tests prior to performing phylogenetic analyses. This program can simultaneously identify the location of breakpoints as well as the most likely recombinant and parental sequences, and has been widely applied to test for nucleotide recombination in plants, viruses, bacteria, and animals (Sullivan et al., 2017; Pfeil et al., 2017; Saleem et al., 2016; Kelly et al., 2009). The global alignments of three datasets (the genome-wide SNP matrix, the whole chloroplast genome (CPG) matrix, and the ITS matrix) were input into RDP4 for recombination analysis. Recombination detection was conducted with seven algorithms: RDP (Martin and Rybicki, 2000), GENECONV (Padidam et al., 1999), BootScan (Salminen et al., 1995), MaxChi (Smith, 1992), Chimaera (Posada and Crandall, 2001), SiScan (Padidam et al., 1999), and 3Seq (Boni et al., 2007). Only recombination events supported by more than four methods were accepted as authentic.

2.6. Phylogenetic analyses

We performed independent phylogenetic analyses for each dataset using ML and Bayesian inference (BI) methods. For the RAD data, we evaluated the phylogenies based on two matrices: the COMPLETE dataset, which included all SNP loci, and the NO-RECOMBINATION dataset, which excluded recombinant loci. ML analysis was conducted with IQ-TREE (Nguyen et al., 2015) using 1,000 replicates of ultrafast bootstrapping (UFBoot) (Quang et al., 2013) and 1,000 replicates of the Shimodaira/Hasegawa approximate likelihood-ratio test (SH-aLRT) (Guindon et al., 2010). The best substitution model was selected using the ModelFinder program implemented in IQ-TREE based on the Bayesian information criterion. BI analysis was performed with MrBayes v.3.2.6 (Ronquist et al., 2012). Two independent runs were performed using the GTR+G model and random starting trees. Each was run for two million generations, with four Markov chains under the default heating settings and sampling every 100 generations.
Stationarity was considered to be reached when the average standard deviation of split frequencies was < 0.01. The first 25% of generations were discarded as burn-in, and the remaining trees were used to construct a majority-rule consensus tree and estimate the posterior probabilities (PP). All resulting trees were visualized using FigTree v1.4 (Rambaut, 2012).
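The burn-in and posterior probability steps described above can be illustrated with a short Python sketch. It does not reproduce how MrBayes is implemented: each sampled tree is represented simply as a collection of clades (groups of taxon names), and the function names and the toy sample are hypothetical. A clade's posterior probability is taken as the fraction of post-burn-in trees that contain it, and a majority-rule consensus keeps clades found in more than half of those trees.

```python
from collections import Counter


def clade_posteriors(sampled_trees, burnin_fraction=0.25):
    """Map each clade to its frequency among post-burn-in trees.

    sampled_trees: list of trees in sampling order; each tree is an
    iterable of clades, each clade an iterable of taxon names.
    (Hypothetical representation, for illustration only.)
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]  # discard the first 25% by default
    counts = Counter()
    for tree in kept:
        # A set ensures each clade is counted once per tree.
        for clade in {frozenset(c) for c in tree}:
            counts[clade] += 1
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors, threshold=0.5):
    """Keep clades present in more than `threshold` of the trees."""
    return {c: p for c, p in posteriors.items() if p > threshold}


# Toy sample of eight trees; the first two fall in the burn-in.
sample = [
    [("A", "C"), ("B", "D")],
    [("A", "C"), ("B", "D")],
    [("A", "B"), ("C", "D")],
    [("A", "B"), ("C", "D")],
    [("A", "B"), ("C", "D")],
    [("A", "B")],
    [("A", "B")],
    [("A", "C")],
]
pp = clade_posteriors(sample)
consensus = majority_rule(pp)
print(sorted((sorted(c), round(p, 3)) for c, p in consensus.items()))
# [(['A', 'B'], 0.833)]
```

In the toy sample the clade (C, D) occurs in exactly half of the retained trees and is therefore excluded by the strict majority threshold.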

For the chloroplast data, 26 plastome sequences were used to reconstruct the phylogeny, including 17 ingroup and nine outgroup plastomes (Table S2). Four datasets, namely the whole chloroplast genome (CPG) sequence, the large single copy (LSC) region, the small single copy (SSC) region, and the inverted repeat (IR) region, were separately aligned using MAFFT v.7.031b (Katoh and Standley, 2013); poorly aligned regions were removed using Gblocks v.0.91b (Talavera and Castresana, 2007). The phylogenetic analyses were performed using the same methods as those described above, which were also applied to the ITS dataset.

2.7. Neighbor-Net analysis

Previous studies indicated that the interspecific hybridization potential and genetic diversity of wild Corylus species were high (Thompson et al., 1996; Erdogan, 1999; Erdogan and Mehlenbacher, 2000). Especially for sympatric or parapatric species, interspecific gene flow caused by introgressive hybridization may have affected the evolutionary history of the genus. To visually illustrate the global reticulate signals and evaluate possible phylogenetic conflicts, Neighbor-Net analysis was conducted with SplitsTree 4 (Huson and Bryant, 2006). This method uses sequence data to generate an unrooted network, which can reveal complicated evolutionary processes such as hybridization and recombination. In this analysis, the outgroup taxa were excluded to better show character conflicts among ingroup taxa. Splits were created from uncorrected p-distances and visualized as a neighbor net in which each node represented a species accession.

2.8. Time-calibrated species tree

In addition to phylogenetic analyses, we also estimated a time-calibrated species tree under a multispecies coalescent model using the StarBEAST template implemented in BEAST 2.4.8 (Bouckaert et al., 2014).
Because the ITS phylogeny did not provide enough resolution in two species complexes, and the plastome phylogeny mainly reflected geographical differentiation, we calibrated the NO-RECOMBINATION tree, which represented the optimal phylogenetic inference (see the results of the phylogenetic analyses). StarBEAST requires that multiple accessions of each species be collapsed down to "taxon sets", assigned to be monophyletic, and then infers a species tree with these taxon sets as terminals. We applied an uncorrelated lognormal relaxed clock, a Yule speciation process to model the tree prior, and the GTR substitution model set internally in BEAST 2.4.8. To calibrate the species tree, two calibrations were used: (1) The most recent common ancestor (MRCA) of the extant genus Corylus was dated to the middle Eocene (37.0-49.0 Ma) based on the oldest fossil fruits discovered in the Republic flora of northern Washington that can unequivocally be assigned to Corylus (Wolfe and Wehr, 1987; Pigg et al., 2003). The prior of this constraint was assigned a normal distribution with a standard deviation of 3.5 Ma. (2) Abundant fossil fruits resembling the modern C. colurna or C. chinensis imply that section Colurnae had diverged into different taxa by the Miocene, but no reliable fossil calibration has yet been defined. An alternative estimation of nucleotide substitution rates based on Tamura-Nei distances and fossil evidence indicated that this group began to diverge between 8.74 Ma and 10.9 Ma (Takhtajan, 1982). This constraint was used to calibrate the crown node of section Colurnae and was also assigned a normal distribution, with a standard deviation of 0.4 Ma. BEAST was run for a total of 100 million generations with a sampling frequency of 1,000 generations. Parameter convergence was checked using Tracer v1.6, requiring effective sample size values > 200. The first 10% of trees were discarded as burn-in. Then, the time-calibrated species tree was summarized as a Maximum Clade Credibility (MCC) tree using TreeAnnotator v2.4.8 (Rambaut and Drummond, 2014). Divergence times with 95% highest posterior density (HPD) intervals were visualized in FigTree v1.4 (Rambaut, 2012).

2.9. Ancestral area reconstruction

To reconstruct the broad-scale biogeographical history of Corylus, we coded the distribution of each extant species as a character with nine states according to the floristic regions defined by Zhang et al. (2005) and Bassil et al. (2013) (Table 1): A, Northeast Asia; B, Korean peninsula and the Japanese archipelago; C, Central Plains and Qinling Mountains of China; D, Central and East China; E, Southwest China; F, the Himalayas; G, European-Mediterranean region; H, eastern North America; I, western North America. We estimated ancestral areas based on the collapsed MCC tree from the NO-RECOMBINATION dataset using two approaches: the Bayesian binary MCMC (BBM) method and the dispersal–extinction–cladogenesis (DEC) model, both implemented in RASP 4.0 (Yu et al., 2015). For the BBM analysis, we ran 10 million generations on four chains, sampled every 2,000 generations, used the F81 model for state frequencies, and specified a gamma distribution for the among-site rate variation. For the DEC analysis, the probability of dispersal between areas was modelled as equal, and all values in the dispersal constraint matrix were set to 1. In both analyses, we presented only the most likely state (MLS) at the center of the pie chart for each node.

3. Results

3.1. Characteristics of the datasets

A total of 204,761,295 clean reads were generated from the 45 RAD libraries, with an average of 4,550,251 reads per sample (Table S1). The average sequencing depth was 57.24×. Overall, approximately 60.40% of the high-quality reads for each sample were uniquely mapped onto the Betula nana genome. Finally, a total of 4,894 unlinked SNPs were genotyped. The RAD data have been submitted to the Sequence Read Archive database of the NCBI under accession number SRP150981.

Sixteen plastomes were newly sequenced and deposited in GenBank with accession numbers MH628446-MH628448 and MH628450-MH628462 (Table S2). The genome size, structure, and gene content were similar to those of previously published Corylus plastomes (Hu et al., 2016; Yang et al., 2018). After integrating ten other Betulaceae plastomes, global alignments of the four structural regions were conducted. Overall, the four partitioned alignments had aligned lengths of 161,297 bp, 90,101 bp, 18,976 bp, and 26,120 bp for the CPG, LSC, SSC, and IR datasets, respectively (Table 2). Among the aligned sites, 4,974 parsimony-informative sites (PICs) were detected in the CPG alignment, of which 3,617, 974, and 204 PICs were found within the LSC, SSC and IR alignments, respectively.
The final ITS matrix had 43 sequences and an aligned length of 615 bp, in which 52 PICs were detected.

3.2. Recombination assessment

Within the genome-wide SNP matrix, the seven methods revealed three putative recombination events (Figure 1). The significance level of each recombination event supported by each test method is shown in Table 2. The first two events occurred in Clade B (Figures 1, 2), with all individuals detected as recombinants. Event 1 was supported by five of the seven methods. C. ferox var. thibetica from Clade A and C. kweichowensis var. brevipes from Clade C were identified as the major parent and minor parent, respectively. The ML breakpoints in Event 1 were located within loci 2,169-2,661. Event 2 was supported by six of the seven methods, with the breakpoints ranging from locus 3,673 to locus 4,597. The recombinant segments originated from the major parent C. ferox var. thibetica of Clade A and the minor parent C. jacquemontii of Clade D. Event 3 occurred in Clade D, with three of the five species (C. chinensis, C. avellana and C. jacquemontii) identified as recombinants. Approximately 200 loci (4,165-4,395) may have originated from two unknown parents similar to C. ferox var. thibetica (minor parent) and C. heterophylla (major parent). By contrast, no recombination was detected within the ingroup in the ITS matrix or the CPG alignment; only a few micro-recombination events, all associated with the outgroup, were revealed.

3.3. Phylogenetic analyses and incongruence evaluation

Altogether, ML and BI analyses revealed almost identical trees from each dataset. For the RAD data, the COMPLETE phylogeny revealed four well-supported clades (A-D) within the ingroup and resolved the species relationships within clades (Figure 2). The four clades fell into two large lineages, in which Clades A and B, and Clades C and D, each formed sister pairs. Clade A (SH-aLRT/UFboot/PP: 100/100/1) consisted of three Chinese species: C. ferox, C. ferox var. thibetica and C. wangii.
Clade B (100/100/1), a species complex, included four shrub species distributed disjunctly between North America and eastern Asia. Clade C (99.5/99/1) comprised five morphologically similar shrubs: C. americana, C. yunnanensis, C. heterophylla, C. kweichowensis, and C. kweichowensis var. brevipes. The latter four species of Chinese origin formed the C. heterophylla complex, while C. americana of North American origin was sister to this complex. Clade D (88.7/73/1) was a group of multiple geographic origins comprising five geographically isolated species: C. fargesii and C. chinensis from China, C. jacquemontii from the Himalayas, and C. colurna and C. avellana from the European-Mediterranean region. The topology of the NO-RECOMBINATION phylogeny was largely consistent with the COMPLETE topology and also supported the division of Corylus into four major clades. However, the phylogenetic position of C. avellana differed: it was placed in Clade D in the COMPLETE phylogeny (Figure 2) but in Clade C in the NO-RECOMBINATION phylogeny (Figure 3).

Although the ITS phylogeny revealed four distinct clades that corresponded to those in the NO-RECOMBINATION phylogeny, topological differences were still observed (Figure S1; Figure 3). Clade A was sister to the remainder of the genus Corylus in the ITS phylogeny, rather than sister to Clade B as in the NO-RECOMBINATION phylogeny. Clades A and B were well supported by ultrafast bootstraps and posterior probabilities. However, Clades C and D were weakly supported, with very short, comb-like branches and low UFboot/PP values assigned to their subclades. Thus, the ITS region had low resolution in the two species complexes.

The topological consistency and support values of the four phylogenies inferred from the chloroplast datasets were evaluated. Overall, the CPG and LSC datasets revealed identical topologies, and both identified five highly supported clades (A-E) (Figure 4, Figure S2). In the SSC phylogenetic tree (Figure S3), although the stem group of Corylus and several internal nodes were weakly or moderately supported, the topology was similar to that of the CPG and LSC datasets.
By comparison, the IR phylogeny did not fully resolve the phylogenetic relationships of Corylus: many internal nodes were weakly supported by both ultrafast bootstraps and posterior probabilities, and in particular the two tree clades (Clade C and Clade D) were not clearly distinguished (Figure S4). The low levels of support in the SSC and IR phylogenies are probably caused by the limited number of parsimony-informative sites in these two regions (Table 3). The following description is therefore based largely on the CPG dataset, which revealed distinct clades with high support. Clade A occupied the basal position of the ingroup and consisted of three sympatric or parapatric species (C. ferox, C. yunnanensis, and C. wangii) from Southwest China, the Central Plains and Qinling Mountains of China, and Central and East China. Clade B included the three American species (C. americana, C. cornuta, and C. californica). Clade C was a single-species lineage consisting only of C. jacquemontii from the Himalayas. Clade D comprised two European-Mediterranean species (C. colurna and C. avellana) and three taxa from the Central Plains and Qinling Mountains of China and Southwest China (C. ferox var. thibetica, C. fargesii, and C. chinensis). In Clade E, two sympatric species from Northeast Asia (C. mandshurica and C. heterophylla) and their parapatric relative from the Korean peninsula and the Japanese archipelago (C. sieboldiana) grouped into one subclade, while two parapatric taxa from the Central Plains and Qinling Mountains of China (C. kweichowensis and C. kweichowensis var. brevipes) formed its sister subclade. Compared with the RAD and ITS phylogenies, the chloroplast phylogeny revealed a distinctly different topology, dividing Corylus into several clades that correspond largely to geographic distribution rather than to authentic phylogenetic relationships.
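The mismatch between chloroplast clades and taxonomic sections can be made concrete with a small cross-tabulation. The sketch below is illustrative only and is not part of the original analysis. The chloroplast clade memberships are transcribed from the description above, whereas the sectional assignments (Acanthochlamys, Siphonochlamys, Phyllochlamys, Colurnae) follow the sectional treatment discussed in Section 4.2. Placing C. yunnanensis, C. kweichowensis, and C. kweichowensis var. brevipes in section Phyllochlamys as members of the C. heterophylla complex is an assumption made here for illustration, and all variable and function names are hypothetical.

```python
# Chloroplast (CPG) clade membership, transcribed from the description above.
CHLOROPLAST_CLADES = {
    "A": ["C. ferox", "C. yunnanensis", "C. wangii"],
    "B": ["C. americana", "C. cornuta", "C. californica"],
    "C": ["C. jacquemontii"],
    "D": ["C. colurna", "C. avellana", "C. ferox var. thibetica",
          "C. fargesii", "C. chinensis"],
    "E": ["C. mandshurica", "C. heterophylla", "C. sieboldiana",
          "C. kweichowensis", "C. kweichowensis var. brevipes"],
}

# Sectional assignment of each taxon.
# Assumption: follows the sectional treatment discussed in Section 4.2,
# with the C. heterophylla complex placed in Phyllochlamys.
SECTION_OF = {
    "C. ferox": "Acanthochlamys",
    "C. ferox var. thibetica": "Acanthochlamys",
    "C. wangii": "Acanthochlamys",
    "C. mandshurica": "Siphonochlamys",
    "C. sieboldiana": "Siphonochlamys",
    "C. cornuta": "Siphonochlamys",
    "C. californica": "Siphonochlamys",
    "C. heterophylla": "Phyllochlamys",
    "C. kweichowensis": "Phyllochlamys",
    "C. kweichowensis var. brevipes": "Phyllochlamys",
    "C. yunnanensis": "Phyllochlamys",
    "C. avellana": "Phyllochlamys",
    "C. americana": "Phyllochlamys",
    "C. colurna": "Colurnae",
    "C. jacquemontii": "Colurnae",
    "C. chinensis": "Colurnae",
    "C. fargesii": "Colurnae",
}


def sections_per_clade(clades, section_of):
    """Return, for each chloroplast clade, the set of sections it contains."""
    return {clade: {section_of[taxon] for taxon in taxa}
            for clade, taxa in clades.items()}


def mixed_clades(clades, section_of):
    """Return the sorted names of clades that span more than one section."""
    per_clade = sections_per_clade(clades, section_of)
    return sorted(clade for clade, secs in per_clade.items() if len(secs) > 1)


# Print one line per chloroplast clade with the sections it contains.
for clade, secs in sorted(sections_per_clade(CHLOROPLAST_CLADES, SECTION_OF).items()):
    print(clade, sorted(secs))
```

Under these assumptions, four of the five chloroplast clades (A, B, D, and E) combine taxa from more than one section, and only the single-species Clade C does not.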

Nucleo-cytoplasmic discordance is particularly evident in the mirrored cladograms derived from the CPG dataset and the collapsed NO-RECOMBINATION dataset (Figure 5). The phylogenetic positions of many taxa varied greatly between the trees inferred from these two sources of evidence.

3.4. Neighbor-Net analyses

The Neighbor-Net network of the genome-wide SNP matrix revealed clear reticulate signals consistent with introgressive hybridization among species (Figure 6A), especially between C. wangii and C. ferox and between C. chinensis and C. avellana. By contrast, few conflicting signals were found in the chloroplast and ITS Neighbor-Net networks (Figure 6B, 6C). Overall, the network relationships recovered from each dataset were predominantly tree-like and highly consistent with the corresponding phylogenetic tree.

3.5. Time-calibrated species tree

The StarBEAST analysis (Figure 7) based on the NO-RECOMBINATION dataset supported the tree topology of the collapsed ML/BI phylogenetic
reconstruction (Figure 5A). Corylus and its sister genus Ostryopsis were estimated to have diverged from their common ancestor at about 53.9 Ma (95% HPD: 78.7-36.2 Ma). The origin of Corylus, marked by the split between Clade A+B and Clade C+D, was estimated to have occurred in the middle Eocene, approximately 40.7 Ma (95% HPD: 47.6-33.7 Ma). Clades A and B were estimated to have diverged during the late Eocene (36.2 Ma; 43.2-15.5 Ma), whereas Clades C and D diverged during the late Oligocene (24.5 Ma; 33.7-13.7 Ma). The earliest ancestors of the four clades appeared during the Miocene, with rapid diversification from the middle Miocene to the Pliocene. The earliest divergence within the four extant clades occurred in Clade A, with the split between C. wangii and section Acanthochlamys estimated at 17.3 Ma. In Clade B, the intercontinental divergence between North America and eastern Asia occurred around 15.8 Ma, and the split within each subclade occurred approximately 9.5 Ma. Species diversification in Clade C was gradual, with C. avellana and C. americana splitting off successively at approximately 19.3 and 14.8 Ma, whereas speciation within the C. heterophylla complex occurred more recently (11.27-5.06 Ma). In Clade D, C. chinensis and C. fargesii separated successively at 9.9 and 7.9 Ma, while the split between C. colurna and C. jacquemontii was estimated at 6.4 Ma.

3.6. Biogeographical history

Results from the BBM and DEC analyses revealed similar biogeographical patterns (Figure 8, Figure S5), with both models suggesting Southwest China as the most likely ancestral area of Corylus. We therefore present only the result of the BBM inference. A mixture of dispersal and vicariance events appears to have shaped the current distribution of Corylus. Combined with the molecular dating analysis (Figure 7), two independent long-distance dispersals from East Asia to the
European-Mediterranean region (E-G) were observed, resulting in the speciation of C. colurna and C. avellana in the early Miocene (~17.87 Ma) and early Pliocene (~4.74 Ma), respectively. In addition, a long-distance dispersal route across the Bering land bridge (E-C-A-H) was responsible for the occurrence of C. cornuta (Clade B) and C. americana (Clade C) in North America. These two clades arrived in North America at nearly the same time during the middle Miocene (~15.69 and 13.73 Ma, respectively). Within East Asia, three important dispersals (E-C-A-B; E-D; and E-F) and subsequent vicariance led to the colonization of mainland China, the Korean peninsula and the Japanese archipelago, and the Himalayas by Corylus.

4. Discussion

4.1. Topology incongruence: recombination and introgressive hybridization

Incongruence among the nuclear 2b-RAD, nuclear ribosomal ITS, and chloroplast datasets was found at the species level, especially between the nuclear and organellar phylogenies. Within the nuclear datasets, the incongruence stems partly from recombination events within the genome-wide SNPs (Figure 1) and partly from the limited number of parsimony-informative sites in the ITS region (Table 3). Phylogenetic inference generally assumes that no recombination occurs among nucleotide sequences. However, recombination may occur spontaneously in nature and bias phylogenetic analyses without being noticed, leading to inaccurate results (Martin et al., 2015; Kiil and Østerlund, 2018). RAD sequencing, which samples numerous SNPs from the nuclear genome, is very likely to capture recombination events (Martin et al., 2015), whereas the plastome is uniparentally inherited and rarely undergoes recombination following hybridization (Birky, 1995; Jansen and Ruhlman, 2012). Nevertheless, plastome recombination has been reported in various genera such as Cycas (Huang et al., 2001), Picea (Sullivan et al., 2017), and Lachemilla (Morales-Briones et al., 2018). In our recombination tests, the SNP matrix showed strong recombination signals, whereas no evidence of recombination was found in the CPG and ITS matrices. This difference is likely associated with the different inheritance patterns of the datasets.
Topological differences among the nuclear phylogenies (COMPLETE, NO-RECOMBINATION, and ITS) (Figure 2, 3) were visible but did not create major conflict, because all three revealed similar clades that correspond well to morphological classification. Of the three, the relationships within and among clades inferred from the NO-RECOMBINATION dataset are most closely aligned with the traditional taxonomy and biogeography of our ingroup. We also inferred the phylogenetic relationships of Corylus based on four structural regions of the chloroplast genome. The CPG and LSC phylogenies showed high levels of support at most nodes (Figure 4, Figure S2), whereas the IR phylogeny was strongly affected by the highly conserved nature of this region (containing only 204 PICs), which produced very short branches and low support for many nodes (Figure S4). Even so, these phylogenetic inferences were not fundamentally in conflict; they simply reflect the different resolving power of different chloroplast regions for Corylus species, depending on their structural variation. In contrast, the striking incongruence is that the chloroplast relationships of Corylus are largely inconsistent with morphological taxonomy, and especially with the relationships inferred from the nuclear RAD and ITS datasets (Figure 5, 7), because the chloroplast phylogenies show clear geographic clustering of clades across taxonomic groups (Figure 4, S2-4). Nucleo-cytoplasmic discordance has been well documented in previous studies of higher plants (Bonnet et al., 2017). Three hypotheses may explain this incongruence: (1) convergent evolution of shared chloroplast sequences; (2) incomplete lineage sorting of ancestral polymorphisms; and (3) introgressive hybridization (Degnan and Rosenberg, 2009; Acosta and Premoli, 2010). Convergent evolution occurs when independent species evolve in the same direction and thus independently acquire similar characteristics. This hypothesis is unlikely because of the high differentiation within each chloroplast clade yet similar characteristics among clades (Figure 4, S2-4). Due to the stochastic nature of coalescence, incomplete lineage sorting may yield random patterns of interspecific relationships (Buckley et al., 2006) and thus lead to discordance between gene trees and species trees.
In the present study, incomplete lineage sorting cannot account for the incongruence, because this process should produce a random pattern across the geographic range of taxa (Avise, 2004), whereas the species clusters in Corylus display a strong geographic pattern. Accordingly, the incongruence is best explained by chloroplast capture, that is, the introgression of a plastid from one plant species into another following hybridization and backcrossing. Introgression-induced chloroplast capture has been discovered frequently in sympatric or parapatric plant taxa (Soltis and Kuzoff, 1995; Tsitrone et al., 2003; Delgado et al., 2007; Acosta and Premoli, 2010), and often occurs in the absence of detectable nuclear introgression (Soltis and Soltis, 1995; Fehrer et al., 2007). In plants, the chloroplast genome introgresses more easily than the nuclear genome, probably because of its maternal inheritance, the limited influence of selection, and its lack of recombination (Yi et al., 2015; Martinsen et al., 2001; Avise, 2004). Hence, although the network inferred from the SNP dataset revealed extensive gene flow and reticulate signals that represent nuclear introgression (Figure 6), chloroplast introgression is a separate process that reflects the heterospecific origin of chloroplast genomes. When a Corylus species colonized the range of another, native species, hybridization and subsequent introgression may quickly have resulted in most offspring carrying the whole chloroplast genome of the donor species together with most of the nuclear DNA of the native species. Multiple chloroplast captures within Corylus may have been facilitated by the sympatric distribution of closely related species and by frequent colonization events through long-distance dispersal of nuts by birds or other vertebrates. A key signature of chloroplast capture is a chloroplast phylogeny that correlates with geographic distribution rather than with taxonomic relationships (Acosta and Premoli, 2010), which is consistent with the pattern we observed in Corylus. Thus, the evidence presented here leads us to conclude that the chloroplast phylogeny clearly deviates from the most likely phylogenetic history, as inferred from the nuclear genome.

4.2. Relationships among major clades and taxonomic implications

The taxonomy of Corylus has undergone several revisions since the mid-19th century, with different taxonomists emphasizing particular morphological characters (Kasapligil, 1972; Krussmann, 1976; Huxley et al., 1992) or molecular markers (Erdogan and Mehlenbacher, 2000; Whitcher and Wen, 2001).
However, none of these studies provided a comprehensive evaluation of the phylogenetic relationships, for two major reasons: incomplete taxon sampling and a lack of effective molecular markers. In this study, we sampled almost all the extant Corylus species proposed by Zhang et al. (2005) and Bassil et al. (2013), including several species that have rarely been studied. Compared with single loci (e.g., matK and rbcL) and genome-skimming markers (e.g., nSSRs and cpSSRs), the integrated genomic data of genome-wide SNPs and chloroplast genomes have higher resolution for species identification in Corylus. Comparative analyses indicate that the NO-RECOMBINATION phylogeny is, to our knowledge, the best available phylogenetic inference; it divides Corylus into four major clades that correspond roughly to the four sections (Acanthochlamys, Siphonochlamys, Phyllochlamys, and Colurnae) proposed in previous treatments (Bassil et al., 2013; Whitcher and Wen, 2001; Thompson et al., 1996). We do not favor the COMPLETE phylogeny because it assigns C. avellana to section Colurnae (Clade D), which does not conform to its traditional placement in section Phyllochlamys (Clade C). Moreover, we verified that this discrepancy is mainly caused by recombinant SNP loci in the COMPLETE dataset. C. avellana is expected to belong to section Phyllochlamys because its husk and nut characteristics are most similar to those of the wild species in this section (e.g., C. americana and the C. heterophylla complex). In addition, C. avellana is reported to hybridize easily with C. americana and C. heterophylla, suggesting a close affinity (Erdogan, 1999). Taken together, three lines of evidence (the NO-RECOMBINATION phylogeny, morphology and phenology, and interspecific hybridization) indicate that these species are closely related. The following discussion therefore treats each of the four sections on the basis of the NO-RECOMBINATION phylogeny.

4.2.1. Section Acanthochlamys-Clade A

C. ferox and its variety C. ferox var. thibetica are invariably placed in section Acanthochlamys. In addition, C. wangii, a distinctive species native to Weixi County in China, was also included in this section. Although C.
ferox and its variety were not separated in the molecular phylogeny, stable variation in leaf morphology may be an effective marker for distinguishing them, with the species and its variety having broadly ovoid and narrow leaves, respectively. C. wangii has rarely been included in molecular studies, which has inevitably led to controversial classification. A recent study of Chinese Corylus based on nSSR STRUCTURE analysis and ITS phylogeny revealed that the genetic components of C. wangii were derived from C. ferox and C. yunnanensis, and proposed for the first time that C. wangii originated from sympatric hybridization of the latter two (Lu, 2017). Similarly, the nuclear and chloroplast phylogenies at the genomic level in our study jointly revealed close relationships among these three species (Figure 2-4), which may also point to a hybrid origin of C. wangii. Future population-level studies may help to test this hypothesis.

4.2.2. Section Siphonochlamys-Clade B

Section Siphonochlamys is a typical species complex consisting of two East Asian species (C. mandshurica and C. sieboldiana) and two North American species (C. cornuta and C. californica). The robust evolutionary relationships within this clade have been repeatedly confirmed (Erdogan and Mehlenbacher, 2000; Whitcher and Wen, 2001; Bassil et al., 2013). Despite their intercontinental distribution, these species are highly similar in morphology, especially in their tubular and beaked husks. The low levels of sequence divergence indicate recent diversification within this species complex. Although the intercontinental divergence between East Asia and North America is detected by the RAD phylogeny, genetic relationships within each subclade are still controversial. Within the East Asian subclade, C. mandshurica was once recognized as a botanical variety of C. sieboldiana (Thompson et al., 1996), whereas recent classifications list it as a distinct species (Whitcher and Wen, 2001; Zhang et al., 2005). Considering the geographic distribution of each taxon and the dispersal routes inferred from the biogeographical analysis, we support C. mandshurica as the original species and C. sieboldiana as a variety or distinct species that arose after long-term isolation. Within the North American subclade, C. californica, with its narrow geographical distribution in western North America, is often viewed as a botanical variety (Thompson et al., 1996; Huxley et al., 1992) of C. cornuta, a widespread species in eastern North America.
Some taxonomists, however, have treated it as a distinct species (Krussmann, 1976; Erdogan and Mehlenbacher, 2000). Hypotheses on speciation between Eurasia and North America assign significant roles to both the Beringian Land Bridge and the North Atlantic Land Bridge. Nevertheless, the Miocene origin of the North American subclade highlights the significance of the Beringian Land Bridge and rules out a role for the North Atlantic Land Bridge, because the latter no longer existed after the Oligocene. Differentiation between C. cornuta and C. californica occurred after their Asian ancestor crossed the Beringian Land Bridge to North America, and was probably triggered by the rise of the Sierra Nevada and the Coast Ranges.

4.2.3. Section Phyllochlamys-Clade C

Section Phyllochlamys comprises the Northeast Asian C. heterophylla complex, the European-Mediterranean C. avellana, and the eastern North American C. americana. The high degree of morphological similarity has left relationships among these species unresolved, especially within the C. heterophylla complex. Traditional molecular markers have been shown to have low resolution in identifying species of this section. However, the RAD phylogeny successfully detected the intercontinental divergence, with C. avellana, C. americana, and the C. heterophylla complex each forming an independent subclade (Figure 3, 5A). Species delimitation within the C. heterophylla complex is difficult because of the relatively little genetic variation resulting from recent divergence and frequent gene flow. Hence, taxonomists have placed members of this complex at various taxonomic levels and under different names. C. kweichowensis and C. yunnanensis were both recognized as varieties of C. heterophylla in early research (Thompson et al., 1996) but have been identified as distinct species more recently (Ma et al., 2014). Additionally, C. kweichowensis var. brevipes, a newly defined variety with a marked brachypodous trait, has also been proposed (Zhang et al., 2005). In any case, these unstable interspecific relationships reflect the incomplete differentiation within this complex. In practice, a population genetic approach should be applied to investigate these relationships in depth.

4.2.4. Section Colurnae-Clade D

Section Colurnae consists of four tree species: C. colurna, C. jacquemontii, C. chinensis, and C. fargesii.
The close relationships of the former three species are supported by multiple lines of evidence, including morphological traits and hybridization characteristics (Erdogan, 1999), nSSRs (Bassil et al., 2013), and the nuclear RAD and ITS datasets. C. colurna and C. jacquemontii are highly similar in tree habit and in husk and nut characters, while C. colurna and C. chinensis hybridize readily and are similar in leaf shape. C. fargesii, also called paperbark hazel, differs from the other tree species in that its bark peels off like that of paper birch. To date, molecular phylogenies including C. fargesii have scarcely been constructed, because this species occurs in limited regions of China and is difficult to sample. In the present study, multiple lines of evidence, including the phylogenies, the time-calibrated species tree, and the network, consistently placed C. fargesii in Clade D. Hence, we support section Colurnae as a robust clade containing all four species described above.

4.3. Diversification dynamics and biogeographic history

Numerous studies have indicated that current biogeographic patterns and species diversification in the north temperate zone are closely related to long-term climate fluctuations and geological events since the Tertiary (Xiang et al., 2014). According to the molecular dating and biogeographic analyses (Figure 7, 8), Corylus and its relative Ostryopsis are estimated to have split in the early Eocene (~53.9 Ma), which is very close to the time of the Paleocene-Eocene thermal maximum (~55 Ma). Thus, it is likely that temperature played an important role in the early divergence of Corylus. As the global climate slowly cooled during the middle to late Eocene, the Corylus crown group arose at about 40.7 Ma in southwest China, and most extant species in this genus originated from radiation events in the late Cenozoic. Southwest China, ranging from the eastern Himalayas and Yarlung Zangbo Canyon to the entire Hengduan Mountains, is one of the 34 biodiversity hotspots recognized by Conservation International. Unsurprisingly, this region was identified as the center of origin of Corylus, a hypothesis that is consistent with other biogeographic studies (Donoghue and Smith, 2004; Khan et al., 2018). According to the results of the ancestral area reconstruction (Figure 8), intercontinental migrations occurred in Siphonochlamys (Clade B), Phyllochlamys (Clade C), and Colurnae (Clade D), whereas Acanthochlamys (Clade A) underwent only limited dispersal around its area of origin.
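The epoch labels attached to the divergence dates in Section 3.5 can be checked with a simple lookup. The following is a minimal sketch, not part of the original analysis: the boundary ages are rounded values from the international chronostratigraphic chart and are an assumption of this example (they differ slightly between chart versions), and the names `SUBEPOCHS` and `subepoch` are hypothetical.

```python
# Sub-epoch boundaries in Ma as (name, older bound, younger bound).
# Assumption: rounded values from the international chronostratigraphic
# chart; exact figures vary slightly between chart versions.
SUBEPOCHS = [
    ("early Eocene", 56.0, 47.8),
    ("middle Eocene", 47.8, 37.8),
    ("late Eocene", 37.8, 33.9),
    ("early Oligocene", 33.9, 27.8),
    ("late Oligocene", 27.8, 23.0),
    ("early Miocene", 23.0, 16.0),
    ("middle Miocene", 16.0, 11.6),
    ("late Miocene", 11.6, 5.3),
    ("early Pliocene", 5.3, 3.6),
    ("late Pliocene", 3.6, 2.6),
]


def subepoch(age_ma):
    """Return the sub-epoch whose interval (younger, older] contains age_ma."""
    for name, older, younger in SUBEPOCHS:
        if younger < age_ma <= older:
            return name
    raise ValueError(f"{age_ma} Ma is outside the Eocene-Pliocene range covered here")


# Node ages quoted in Section 3.5 (point estimates only).
for age in (53.9, 40.7, 36.2, 24.5, 15.8, 6.4):
    print(f"{age:5.1f} Ma -> {subepoch(age)}")
```

With these rounded boundaries, the point estimates of 53.9, 40.7, 36.2, and 24.5 Ma fall in the early Eocene, middle Eocene, late Eocene, and late Oligocene, respectively, matching the labels used in Section 3.5. The lookup classifies point estimates only and ignores the width of the 95% HPD intervals, which often span several sub-epochs.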
Moreover, our ITS and chloroplast phylogenies (Figure 4, S1) as well as previous studies (Chen et al., 1999; Whitcher and Wen, 2001) all support the basal position of Acanthochlamys in Corylus. These observations suggest that this oldest clade is adapted to a cool alpine climate resembling that of southwest China, which in turn limits its dispersal potential. Siphonochlamys shows a disjunction between northeastern Asia and western and eastern North America, which can be explained by the following hypothesis: an ancestral taxon of Siphonochlamys spread from southwest to northeast China and then crossed the Beringian Land Bridge into North America in the middle Miocene, after which the uplift of the Sierra Nevada and the Coast Ranges induced the east-west differentiation that gave rise to C. cornuta and C. californica. Meanwhile, the northeastern Asian subclade split to form C. mandshurica and C. sieboldiana, probably owing to the isolation of the Korean peninsula and the Japanese archipelago. A similar scenario can explain the northeastern Asian-eastern North American disjunction of Phyllochlamys, that is, the occurrence of C. americana in North America. However, the western-eastern Eurasian disjunction of Phyllochlamys and Colurnae requires another scenario: the East Asian ancestors of Phyllochlamys and Colurnae dispersed westward along the Tethys coast to western Eurasia, where C. avellana and C. colurna arose. The two migrations occurred at different times, in the early and late Miocene, respectively. Following the rapid uplift of the Qinghai-Tibet Plateau at about 20 Ma (Shi et al., 1999) and the increased aridity in interior Eurasia since 14 Ma (Fortelius et al., 2006), C. avellana diverged from its Asian relatives (the C. heterophylla complex) and gradually colonized almost all of Europe. The divergence between C. colurna and C. jacquemontii (distributed in the Himalayas) was relatively late because a close biogeographic connection has been reported between the Mediterranean region and the Qinghai-Tibet Plateau (Mao et al., 2010), providing more opportunities for gene flow.

Acknowledgements

This study was supported by the Special Fund for Basic Scientific Research Business of Central Public Research Institutes of the Chinese Academy of Forestry (Grant Nos. CAFYBB2018SY011 and RIF2014-12).
The authors sincerely thank Dr. Yanfei Zeng of the Chinese Academy of Forestry and Beijing Novogene Bioinformatics Technology Co., Ltd. for assistance in data analysis.

References Acosta, M.C., Premoli, A.C., 2010. Evidence of chloroplast capture in South American Nothofagus (subgenus Nothofagus, Nothofagaceae). Mol. Phylogenet. Evol. 54: 235-242. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215(3), 403-410. Avise, J.C., 2004. Molecular markers, natural history, and evolution, 2nd ed. Sunderland, MA: Sinauer Associates Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19:455-477. Barrett, C.F., Specht, C.D., Leebens-Mack, J., Stevenson, D.W., Zomlefer, W.B., Davis, J.I., 2014. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales). Ann Bot. 113: 119-133. Bassil, N.V., Boccacci, P., Botta, R., Postman, J., Mehlenbacher, S., 2013. Nuclear and chloroplast microsatellite markers to assess genetic diversity and evolution in hazelnut species, hybrids and cultivars. Genet. Resour. Crop. Ev. 60(2): 543-568. Birky, C.W., 1995. Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proc. Natl. Acad. Sci. USA. 92(25):11331-11338. Bonnet, T., Leblois, R., Rousset, F., Crochet, P.A., 2017. A reassessment of explanations for discordant introgressions of mitochondrial and nuclear genomes. Evolution 71, 2140-2158. Boni, M.F., Posada, D., Feldman, M.W., 2007. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 176:1035-1047. Bouckaert, R., Heled, J., Kühnert, D, Vaughan, T., Wu, C.H., Xie, D., Suchard, M.A., Rambaut, A., Drummond, A.J., 2014. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Computat. Biol. 10, e1003537. Buckley, T.R., Cordeiro, M., Marshall, D.C., Simon, C., 2006. Differentiating between

hypotheses of lineage sorting and introgression in New Zealand alpine cicadas (Maoricicada Dugdale). Syst. Biol. 55, 411-425. Chen, Z.D., Manchester, S.R., Sun, H.Y., 1999. Phylogeny and evolution of the Betulaceae as inferred from DNA sequences, morphology, and paleobotany. Am. J. Bot. 86, 1168-1181. Degnan, J.H., Rosenberg, N.A., 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24(6):332-340. Delgado, P., Salas-Lizana, R., Vázquez-Lobo, A., Wegier, A., Anzidei, M., Alvarez-Buylla, E.R., Vendramin, G.G., Pinero, D., 2007. Introgressive hybridization in Pinus montezumae Lamb and Pinus pseudostrobus Lindl. (Pinaceae): morphological and molecular (cpSSR) evidence. Int. J. Plant. Sci. 168(6), 861-875. Delsuc, F., Brinkmann, H., Philippe, H., 2005. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6:361-375. Deng, J.B., Drew, B.T., Mavrodiev, E.V., Gitzendanner, M.A., Soltis, P.S., Soltis, D.E., 2015. Phylogeny, divergence times, and historical biogeography of the angiosperm family Saxifragaceae. Mol. Phylogenet. Evol. 83, 86-98. Díaz-Arce, N., Arrizabalaga, H., Murua, H., Irigoien, X., Rodríguez-Ezpeleta, N., 2016. RAD-seq derived genome-wide nuclear markers resolve the phylogeny of tunas. Mol. Phylogenet. Evol. 102:202-207. Donoghue, M.J., Smith, S.A., 2004. Patterns in the assembly of temperate forests around the Northern Hemisphere. Phil. Trans. R. Soc. B 359, 1633-1644. Doyle, J.A., Sauquet, H., Scharaschkin, T., Le Thomas, A., 2004. Phylogeny, molecular and fossil dating, and biogeographic history of Annonaceae and Myristicaceae (Magnoliales). Int. J. Plant. Sci. 165(S4), S55-S67. Erdogan, V., 1999. Genetic Relationships among Hazelnut (Corylus) Species. PhD thesis, Oregon State University, Corvallis, U.S.A. Erdogan, V., Mehlenbacher, S.A., 2000. Phylogenetic relationships of Corylus species (Betulaceae) based on nuclear ribosomal DNA ITS region and chloroplast matK gene sequences. Syst. Bot. 25(4): 727-737.

Evanno, G., Regnaut, S., Goudet, J., 2005. Detecting the number of clusters of individuals using the software structure: a simulation study. Mol. Ecol. 14, 2611-2620. Fortelius, M., Eronen, J., Liu, L., Pushkina, D., Tesakov, A., Vislobokova, I., Zhang, Z., 2006. Late Miocene and Pliocene large land mammals and climatic changes in Eurasia. Palaeogeogr. Palaeoclimatol. Palaeoecol. 238(1-4), 219-227. Fu, X., Dou, J., Mao, J., Su, H., Jiao, W., Zhang, L., Hu, X., Huang, X., Wang, S., Bao, Z., 2013. RADtyping: an integrated package for accurate de novo codominant and dominant RAD genotyping in mapping populations. PLoS One 8(11), e79960. Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W., Gascuel, O., 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59(3), 307-321. Hu, G., Cheng, L., Lan, Y., Cao, Q., Wang, X., Huang, W., 2016. The complete chloroplast genome sequence of the endangered Chinese endemic tree Corylus fargesii. Conserv. Genet. Resour. 9(2):1-3. Huang, S., Chiang, Y.C., Schaal, B.A., Chou, C.H., Chiang, T.Y., 2001. Organelle DNA phylogeography of Cycas taitungensis, a relict species in Taiwan. Mol. Ecol. 10:2669-2681. Huson, D.H., Bryant, D., 2006. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23:254-267. Huxley, A., Griffiths, M., Margot, L., 1992. The new Royal Horticultural Society dictionary of gardening. Vol. 1. London: Macmillan Press Ltd. Ickert-Bond, S.M., Wen, J., 2006. Phylogeny and biogeography of Altingiaceae: evidence from combined analysis of five non-coding chloroplast regions. Mol. Phylogenet. Evol. 39(2), 512-528. Jansen, R.K., Ruhlman, T.A., 2012. Plastid genomes of seed plants. In: Bock, R., Knoop, V. (Eds.), Genomics of Chloroplasts and Mitochondria. Springer, Netherlands, pp. 103-126. Johnson, L.A., Soltis, D.E., 1995. Phylogenetic inference in Saxifragaceae s. s. and Gilia (Polemoniaceae) using matK sequences. Ann. Mo. Bot. Gard. 82: 149-175.

Kasapligil, B., 1972. A bibliography on Corylus (Betulaceae) with annotations. Annu. Rpt. Northern. Nut. Growers. Assn. 63:107-162. Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772-780. Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28(12):1647-1649. Kelly, L.J., Leitch, A.R., Clarkson, J.J., Hunter, R.B., Knapp, S., Chase, M.W., 2009. Intragenic recombination events and evidence for hybrid speciation in Nicotiana (Solanaceae). Mol. Biol. Evol. 27(4), 781-799. Khan, G., Zhang, F., Gao, Q., Fu, P., Zhang, Y., Chen, S., 2018. Spiroides shrubs on Qinghai-Tibetan Plateau: multilocus phylogeography and palaeodistributional reconstruction of Spiraea alpina and S. mongolica (Rosaceae). Mol. Phylogenet. Evol. 123: 137-148. Kiil, K., Østerlund, M., 2018. CleanRecomb, a quick tool for recombination detection in SNP based cluster analysis. bioRxiv 317131. Kim, C., Deng, T., Wen, J., Nie, Z.L., Sun, H., 2015. Systematics, biogeography, and character evolution of Deutzia (Hydrangeaceae) inferred from nuclear and chloroplast DNA sequences. Mol. Phylogenet. Evol. 87, 91-104. Krussmann, G., 1976. Manual of cultivated broad-leaved trees and shrubs. Portland, Ore.: Timber Press. Li, R., Yu, C., Li, Y., Lam, T.W., Yiu, S.M., Karsten, K., Wang, J., 2009. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15):1966-1967. Lo, E.Y., Duke, N.C., Sun, M., 2014. Phylogeographic pattern of Rhizophora (Rhizophoraceae) reveals the importance of both vicariance and long-distance oceanic dispersal to modern mangrove distribution. BMC. Evol. Biol. 14(1), 83. Lu, Z.Q., 2017. Species Delimitation in the Subfamily Coryloideae of Betulaceae in China. PhD thesis, Lanzhou: Lanzhou University.

Ma, Q.H., Huo, H.L., Chen, X., Zhao, T.T., Liang, W.J., Wang, G.X., 2014. Study on the Taxonomy, Distribution, Development and Utilization of Corylus
kweichowensis Hu. J. Plant. Gene. Resour. 15(6): 1223-1231. Mao, K., Milne, R.I., Zhang, L., Peng, Y., Liu, J., Thomas, P., Mill, R.R., Renner, S.S., 2012. Distribution of living Cupressaceae reflects the breakup of Pangea. Proc. Natl. Acad. Sci. USA. 109(20), 7793-7798. Mao, K., Hao, G., Liu, J., Adams, R.P., Milne, R.I., 2010. Diversification and biogeography of Juniperus (Cupressaceae): variable diversification rates and multiple intercontinental dispersals. New Phytol. 188(1), 254-272. Martin, D.P., Murrell, B., Golden, M., Khoosal, A., Muhire, B., 2015. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus. Evol. 1:1-5. Martin, D.P., Rybicki, E., 2000. RDP: detection of recombination amongst aligned sequences. Bioinformatics 16:562-563. Martinsen, G.D., Whitham, T.G., Turek, R.J., Keim, P., 2001. Hybrid populations selectively filter gene introgression between species. Evolution 55, 1325-1335. Morales-Briones, D.F., Liston, A., Tank, D.C., 2018. Phylogenomic analyses reveal a deep history of hybridization and polyploidy in the Neotropical genus Lachemilla (Rosaceae). New Phytol. 218(4): 1668-1684. Nguyen, L.T., Schmidt, H.A., von Haeseler, A., Minh, B.Q., 2014. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32(1), 268-274. Padidam, M., Sawyer, S., Fauquet, C.M., 1999. Possible emergence of new geminiviruses by frequent recombination. Virology 265:218-225. Pecoraro, C., Babbucci, M., Villamor, A., Franch, R., Papetti, C., Leroy, B., Ortega-Garcia, S., Muir, J., Rooker, J., Arocha, F., 2016. Methodological assessment of 2b-RAD genotyping technique for population structure inferences in yellowfin tuna (Thunnus albacares). Mar. Genom. 25, 43-48. Pfeil, B.E., Toprak, Z., Oxelman, B., 2017. Recombination provides evidence for ancient hybridisation in the Silene aegyptiaca (Caryophyllaceae) complex. Org. Divers. Evol. 17(4), 717-726. Pigg, K.B., Manchester, S.R., Wehr, W.C., 2003. Corylus, Carpinus, and Palaeocarpinus (Betulaceae) from the middle Eocene Klondike Mountain and Allenby formations of northwestern North America. Int. J. Plant. Sci. 164(5), 807-822. Posada, D., Crandall, K.A., 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc. Natl. Acad. Sci. USA. 98:13757-13762. Quang, M.B., Thi, NM A., Arndt, V.H., 2013. Ultrafast Approximation for Phylogenetic Bootstrap. Mol. Biol. Evol. 30(5):1188-1195. Rambaut, A., 2012. FigTree v1.4. University of Edinburgh, Edinburgh, UK. Available at: http://tree.bio.ed.ac.uk/software/figtree. Rambaut, A., Drummond, A.J., 2014. TreeAnnotator v2.1.2. Edinburgh: University of Edinburgh, Institute of Evolutionary Biology. Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M.A., Huelsenbeck, J.P., 2012. MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539-542. Ruhfel, B.R., Gitzendanner, M.A., Soltis, P.S., Soltis, D.E., Burleigh, J.G., 2014. From algae to angiosperms-inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol. Biol. 14(1), 23. Salminen, M.O., Carr, J.K., Burke, D.S., McCutchan, F.E., 1995. Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS. Res. Hum. Retrov. 11:1423-1425. Saleem, H., Nahid, N., Shakir, S., Ijaz, S., Murtaza, G., Khan, A.A., Mubin, M., Nawaz-ul-Rehman, M.S., 2016. Diversity, mutation and recombination analysis of cotton leaf curl geminiviruses. PLoS One 11(3), e0151161. Semerikova, S.A., Khrunyk, Y.Y., Lascoux, M., Semerikov, V.L., 2018. From America to Eurasia: a multigenomes history of the genus Abies. Mol. Phylogenet. Evol. 125:14-28.

Shi, Y., Li, J., Li, B., 1999. Uplift of the Qinghai-Xizang (Tibetan) plateau and east Asia environmental change during late Cenozoic. Acta. Geographica Sinica-Chinese Edition. 54, 20-28.

Smith, J.M., 1992. Analyzing the mosaic structure of genes. J. Mol. Evol. 34(2), 126-129.

Sun, M., Soltis, D.E., Soltis, P.S., Zhu, X., Burleigh, J.G., Chen, Z., 2015. Deep phylogenetic incongruence in the angiosperm clade Rosidae. Mol. Phylogenet. Evol. 83, 156-166.

Sullivan, A.R., Schiffthaler, B., Thompson, S.L., Street, N.R., Wang, X.R., 2017. Interspecific plastome recombination reflects ancient reticulate evolution in Picea (Pinaceae). Mol. Biol. Evol. 34(7), 1689-1701.

Talavera, G., Castresana, J., 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564-577.

Takhtajan, A., 1982. Fossil flowering plants of the USSR. Vol 2. Ulmaceae-Betulaceae. Nauka, Leningrad. (In Russian).

Thompson, M.M., Lagerstedt, H.B., Mehlenbacher, S.A., 1996. Hazelnuts. In: Janick, J., Moore, J.N. (Eds.), Fruit breeding: nuts, vol 3. Wiley, New York, pp. 125-184.

Wang, N., Thomson, M., Bodles, W.J., Crawford, R.M., Hunt, H.V., Featherstone, A.W., Buggs, R.J., 2013. Genome sequence of dwarf birch (Betula nana) and cross-species RAD markers. Mol. Ecol. 22(11), 3098-3111.

Wang, S., Meyer, E., McKay, J.K., Matz, M.V., 2012. 2b-RAD: a simple and flexible method for genome-wide genotyping. Nat. Methods. 9, 808-810.

Wen, J., Nie, Z., Ickert-Bond, S.M., 2016. Intercontinental disjunctions between eastern Asia and western North America in vascular plants highlight the biogeographic importance of the Bering land bridge from late Cretaceous to Neogene. J. Syst. Evol. 54(5), 469-490.

White, T.J., Bruns, T.D., Lee, S.B., Taylor, J.L., 1990. Amplification and Direct Sequencing of Fungal Ribosomal RNA Genes for Phylogenetics. PCR Protocols, 315-322.

Whitcher, I.N., Wen, J., 2001. Phylogeny and biogeography of Corylus (Betulaceae): inferences from ITS sequences. Syst. Bot. 26(2), 283-298.

Wickett, N.J., Mirarab, S., Nguyen, N., Warnow, T., Carpenter, E., Matasci, N., Ayyampalayam, S., Barker, M.S., Burleigh, J.G., Gitzendanner, M.A., 2014. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Natl. Acad. Sci. USA. 111, E4859-E4868.

Wolfe, J.A., Wehr, W.C., 1987. Middle Eocene dicotyledonous plants from Republic, northeastern Washington. US Geol. Surv. Bull. 1597, 1-25.

Wyman, S.K., Jansen, R.K., Boore, J.L., 2004. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252-3255.

Xi, Z., Ruhfel, B.R., Schaefer, H., Amorim, A.M., Sugumaran, M., Wurdack, K.J., 2012. Phylogenomics and a posteriori data partitioning resolve the Cretaceous angiosperm radiation Malpighiales. Proc. Natl. Acad. Sci. USA. 109, 17519-17524.

Xiang, X.G., Wang, W., Li, R.Q., Lin, L., Liu, Y., Zhou, Z.K., Li, Z.Y., Chen, Z.D., 2014. Large-scale phylogenetic analyses reveal fagalean diversification promoted by the interplay of diaspores and environments in the Paleogene. Perspect. Plant Ecol. Evol. Syst. 16, 101-110.

Yang, Z., Zhao, T.T., Ma, Q.H., Liang, L.S., Wang, G.X., 2018. Comparative Genomics and Phylogenetic Analysis Revealed the Chloroplast Genome Variation and Interspecific Relationships of Corylus (Betulaceae) Species. Front. Plant. Sci. 9, 927.

Yi, T.S., Jin, G.H., Wen, J., 2015. Chloroplast capture and intra- and inter-continental biogeographic diversification in the Asian–New World disjunct plant genus Osmorhiza (Apiaceae). Mol. Phylogenet. Evol. 85, 10-21.

Yu, Y., Harris, A.J., Blair, C., He, X., 2015. RASP (Reconstruct Ancestral State in Phylogenies): a tool for historical biogeography. Mol. Phylogenet. Evol. 87, 46-49.

Zachos, J., Pagani, M., Sloan, L., Thomas, E., Billups, K., 2001. Trends, rhythms, and aberration in global climate 65 Ma to present. Science 292, 686-693.

Zeng, L., Zhang, Q., Sun, R., Kong, H., Zhang, N., Ma, H., 2014. Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nat. Commun. 5, 4956.

Zhang, Y.H., Liu, L., Liang, W.J., Zhang, Y.M., 2005. Chinese Fruit Tree, volume Chestnut and Hazelnut. China Forestry Publishing House.

FIGURE LEGENDS

Figure 1. Recombination tests using RDP4 within the Corylus 2b-RAD matrix. Three recombination events (Events 1-3) that directly affect the nucleotide composition were detected; the putative recombinant segments associated with each event are shown in different colors, and the positions of all recombination breakpoints are labeled near the recombinant regions.

Figure 2. Maximum likelihood phylogeny of Corylus inferred from the COMPLETE dataset. The four distinct ingroup clades (A-D) assigned in the ML tree are highlighted with vertical bars and colored branches on the cladogram; SH-aLRT values above 85%, UFboot values above 70%, and PP values above 0.9 are shown on the branches; a hyphen indicates that the SH-aLRT, UFboot, or PP value is below the threshold.

Figure 3. Comparison between the NO-RECOMBINATION cladogram and the ITS cladogram of Corylus. Clades are shown in different colors in both trees; topological incongruences between the two cladograms are indicated by lines connecting the same species in both trees.

Figure 4. Maximum likelihood phylogeny of Corylus inferred from the CPG dataset. The five major ingroup clades (A-E) are marked with vertical bars, and the blocks on the right correspond to the distribution of the species at each tip; nine regions were defined: (A) Northeast Asia; (B) Korean peninsula and the Japanese archipelago; (C) Central Plains and Qinling Mountains of China; (D) Central and East China; (E) Southwest China; (F) the Himalayas; (G) European-Mediterranean region; (H) eastern North America; (I) western North America; SH-aLRT values above 85%, UFboot values above 70%, and PP values above 0.9 are shown on the branches; a hyphen indicates that the SH-aLRT, UFboot, or PP value is below the threshold.

Figure 5. Comparison between the NO-RECOMBINATION cladogram and the CPG cladogram of Corylus. Species represented by multiple accessions in the NO-RECOMBINATION cladogram are collapsed; topological incongruences between the two cladograms are indicated by lines connecting the same species in both trees.

Figure 6. Neighbor-Net trees of Corylus species based on: (A) the genome-wide SNP matrix, (B) the nuclear ITS region, and (C) the chloroplast genome.

Figure 7. Time-calibrated species tree of Corylus inferred from the StarBEAST 2 analysis. Divergence dating is based on two fossil constraints (see text for more details). MRCA times and their 95% HPD intervals (in Ma) are shown on the branches; blue bars represent 95% HPD intervals for node ages.

Figure 8. Ancestral area reconstruction based on the BBM model in RASP. The inset map indicates the distribution of the Corylus species used in the reconstruction. Nine regions were defined: (A) Northeast Asia; (B) Korean peninsula and the Japanese archipelago; (C) Central Plains and Qinling Mountains of China; (D) Central and East China; (E) Southwest China; (F) the Himalayas; (G) European-Mediterranean region; (H) eastern North America; (I) western North America. Letters and colors in the legend represent the extant and ancestral areas and their combinations; the pie chart at each node, labeled with letters, indicates the most likely ancestral area.

Figure S1. Maximum likelihood phylogeny of Corylus inferred from the ITS dataset.

Figure S2. Maximum likelihood phylogeny of Corylus inferred from the LSC dataset.

Figure S3. Maximum likelihood phylogeny of Corylus inferred from the SSC dataset.

Figure S4. Maximum likelihood phylogeny of Corylus inferred from the IR dataset.

Figure S5. Ancestral area reconstruction based on the DEC model in RASP.

Table S1. Summary of 2b-RAD sequencing.

Table S2. Summary of chloroplast genome organization and accession numbers.

Table 1. Details of taxon code, sample code and sampling location of the 45 individuals used in the study.

| Number | Sample code | Sample location | Abbreviations of distribution regions |
|---|---|---|---|
| 1 | C. colurna-1 | Mazowieckie, Poland | EM |
| 2 | C. colurna-2 | Karlovarsky, Czech | EM |
| 3 | C. colurna-3 | Tbilisi, Georgia | EM |
| 4 | C. jacquemontii-1 | Punjab, Pakistan | HIM |
| 5 | C. jacquemontii-2 | Sindh, Pakistan | HIM |
| 6 | C. jacquemontii-3 | Kathmandu, Nepal | HIM |
| 7 | C. fargesii-1 | Gansu, China | CQC, SC |
| 8 | C. fargesii-2 | Gansu, China | CQC, SC |
| 9 | C. chinensis-1 | Yunnan, China | CQC, CEC, SC |
| 10 | C. chinensis-2 | Shanxi, China | CQC, CEC, SC |
| 11 | C. chinensis-3 | Shanxi, China | CQC, CEC, SC |
| 12 | C. avellana-1 | Giresun, Turkey | EM |
| 13 | C. avellana-2 | Thuringia, Germany | EM |
| 14 | C. avellana-3 | Caserta, Italy | EM |
| 15 | C. americana-1 | Michigan, USA | ENA |
| 16 | C. americana-2 | Oregon, USA | ENA |
| 17 | C. americana-3 | Oregon, USA | ENA |
| 18 | C. yunnanensis-1 | Yunnan, China | SC |
| 19 | C. yunnanensis-2 | Yunnan, China | SC |
| 20 | C. yunnanensis-3 | Yunnan, China | SC |
| 21 | C. kweichowensis-1 | Shanxi, China | CQC, CEC |
| 22 | C. kweichowensis-2 | Anhui, China | CQC, CEC |
| 23 | C. kweichowensis var. brevipes | Jiangxi, China | CQC, CEC |
| 24 | C. heterophylla-1 | Hebei, China | NEA |
| 25 | C. heterophylla-2 | Jilin, China | NEA |
| 26 | C. sieboldiana-1 | Saitama, Japan | KJ |
| 27 | C. sieboldiana-2 | Saitama, Japan | KJ |
| 28 | C. sieboldiana-3 | Sejong, Korea | KJ |
| 29 | C. mandshurica-1 | Beijing, China | NEA |
| 30 | C. mandshurica-2 | Hebei, China | NEA |
| 31 | C. californica-1 | California, USA | WNA |
| 32 | C. californica-2 | California, USA | WNA |
| 33 | C. californica-3 | Oregon, USA | WNA |
| 34 | C. cornuta-1 | Minnesota, USA | ENA |
| 35 | C. cornuta-2 | Minnesota, USA | ENA |
| 36 | C. cornuta-3 | New York, USA | ENA |
| 37 | C. ferox-1 | Yunnan, China | CQC, CEC, SC |
| 38 | C. ferox-2 | Shanxi, China | CQC, CEC, SC |
| 39 | C. ferox var. thibetica-1 | Shanxi, China | CQC, SC |
| 40 | C. ferox var. thibetica-2 | Shanxi, China | CQC, SC |
| 41 | C. wangii-1 | Yunnan, China | SC |
| 42 | C. wangii-2 | Yunnan, China | SC |
| 43 | Ostryopsis davidiana-1 | Neimenggu, China | NEA, CQC, SC |
| 44 | Ostryopsis davidiana-2 | Neimenggu, China | NEA, CQC, SC |
| 45 | Ostryopsis davidiana-3 | Neimenggu, China | NEA, CQC, SC |

Abbreviations: NEA=Northeast Asia; KJ=the Korean peninsula and the Japanese archipelago; CQC=Central Plains and Qinling Mountains of China; CEC=Central and East China; SC=Southwest China; HIM=the Himalayas; EM=European-Mediterranean area; ENA=eastern North America; WNA=western North America.

Table 2. Bonferroni-corrected P values for the three recombination events detected in Clades B and D.

Recombination tests (methods) | Event 1 (p value) | Event 2 (p value) | Event 3 (p value)

-01

-01

5.06×10-03

RDP

2.10×10

GENECONV

--

BootScan

--

1.11×10

MaxChi Chimaera

-04

2.72×10

--03

5.38×10-03

1.73×10-02

-02

-03

3.41×10

1.08×10-02

SiScan

2.89×10

3Seq

8.16×10

1.94×10-02 -4.76×10-02

6.75×10-04

--

1.39×10-02

2.03×10-02

Table 3. Data characteristics, selected model and resolved relationships among tribes in each dataset.

| Data origin | Data partition | Number of sites | Parsimony-informative sites (%) | Best-fit model (ML) | Best-fit model (BI) |
|---|---|---|---|---|---|
| 2b-RAD | COMPLETE | 4,894 | 2,723 (55.64) | TVMe+R2 | GTR+G |
| 2b-RAD | 43-COMPLETE | 4,894 | 2,700 (55.17) | TVMe+R2 | GTR+G |
| 2b-RAD | NO-RECOMBINATION | 4,894 | 2,225 (45.46) | TIM3e+R2 | GTR+G |
| Nuclear ITS | ITS | 615 | 52 (8.46) | TN+F | GTR+G |
| Chloroplast genome | CPG | 161,297 | 4,974 (3.08) | TVM+F+I+G4 | GTR+G+I |
| Chloroplast genome | LSC | 90,101 | 3,617 (4.01) | TVM+F+R3 | GTR+G+I |
| Chloroplast genome | SSC | 18,976 | 974 (5.13) | TVM+F+R3 | GTR+G+I |
| Chloroplast genome | IR | 26,120 | 204 (0.77) | K3Pu+F+I | GTR+G+I |

Highlights

Conflicting phylogenetic signals were identified between nuclear and chloroplast topologies.

Nuclear phylogenies resolved the evolutionary relationships, origin, and diversification history of Corylus, while chloroplast phylogenies revealed geographic differentiation.

The genus Corylus can be divided into four major clades corresponding to the previous classification based on morphological traits.

Corylus originated in the middle Eocene in Southwest China.

Lineage diversification was associated with the Miocene and with habitat associations.