Geographic differentiation and phylogeographic relationships among world soybean populations

Geographic differentiation and phylogeographic relationships among world soybean populations

Journal Pre-proof Geographic differentiation and phylogeographic relationships among world soybean populations Xueqin Liu, Jianbo He, Yufeng Wang, Gu...

2MB Sizes 0 Downloads 21 Views

Journal Pre-proof Geographic differentiation and phylogeographic relationships among world soybean populations

Xueqin Liu, Jianbo He, Yufeng Wang, Guangnan Xing, Yan Li, Shouping Yang, Tuanjie Zhao, Junyi Gai PII:

S2214-5141(19)30151-5

DOI:

https://doi.org/10.1016/j.cj.2019.09.010

Reference:

CJ 427

To appear in:

The Crop Journal

Received date:

5 April 2019

Revised date:

24 June 2019

Accepted date:

3 December 2019

Please cite this article as: X. Liu, J. He, Y. Wang, et al., Geographic differentiation and phylogeographic relationships among world soybean populations, The Crop Journal(2019), https://doi.org/10.1016/j.cj.2019.09.010

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

Journal Pre-proof

Geographic differentiation and phylogeographic relationships among world soybean populations Xueqin Liu, Jianbo He, Yufeng Wang, Guangnan Xing, Yan Li, Shouping Yang, Tuanjie Zhao, Junyi Gai* Soybean Research Institute, Nanjing Agricultural University; MARA National Center for Soybean Improvement, Nanjing Agricultural University; MARA Key Laboratory of Biology and Genetic Improvement of Soybean (General), Nanjing Agricultural University; State Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University; Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China

Abstract: A fast-growing protein and oil crop, soybean was domesticated in ancient China and disseminated early in Asia and afterwards to other continents, in particular the Americas in recent centuries. After adaptation, locally

f

developed landraces and cultivars formed a diversity of geographic-populations. In an investigation of their

oo

phylogeographic features, marker-derived traits were combined with geography-related photo- and temperature-sensitive traits to study 13 geographic-populations comprising 371 accessions. Extreme

pr

differentiation among geographic-populations was observed for flowering date (33–94 days), maturity date (79– 181 days), and main stem node number (6–25 nodes). Restriction-site associated DNA sequencing revealed strong

e-

genetic differentiation among these geographic-populations, including genetic richness (alleles, 35,242–44,986) and specific-present alleles (SPAs, 0–67). More SPAs (28–67) emerged in some secondary and tertiary centers

Pr

than in centers of origin (8–11). Phenotypic and genotypic clustering divided 11 of the 13 geographic-populations into the same five sets of sensitivity-similar geographic-populations and grouped the populations of northeast China and northern North America rather than center-of-origin populations as secondary centers, indicating the

al

importance of geography-related traits in determining genetic differences among geographic-populations. A

rn

model of four soybean dissemination paths is presented: from the center of origin to the north, east, and south in Asia and from northeast China to Europe and the Americas. These findings provide a detailed phylogeographic

Jo u

understanding of worldwide soybeans.

Keywords: Genetic diversity; Geographic differentiation; Phylogeographic relationship; Clustering; Dissemination

1. Introduction

Originating in ancient China, soybean (Glycine max (L.) Merr.) is now cultivated worldwide. It has been recognized as a miracle crop in North and South America over the past six decades for its nutritional value (the seed contains about 40% protein and 20% oil) and high economic profitability [1]. Worldwide soybean production is increasing year by year and amounted to 337 Mt in 2017/2018, reaching a historical peak (Table S1). In China, the major soybean-producing areas are the northern single-cropping area (including northeast and northwest China), the Huang-Huai (Yellow RiverHuai River) double cropping area, and the southern multiple-cropping area [2–4]. For the center of origin of soybean, the Huang-Huai [4–6] and Changjiang valleys and the region to their south (southern China) [7–9] have been suggested. Over its long history, farmers in ancient *

Corresponding author: Junyi Gai, E-mail address: [email protected].

Received: 2019-04-05; Revised: 2019-06-24; Accepted: 2019-12-03. 1

Journal Pre-proof China developed many soybean landraces for their own production in each geographic region. Scientific breeding of soybean cultivars began in the early 20th century and about 2500 cultivars have been released. These landraces and released cultivars constitute the soybean germplasm collection. Singh and Hymowitz [10] roughly mentioned on the basis of historical records and ancient legends that soybean was disseminated from China to Asia, Europe, and the USA. The literature [11–13] records its dissemination to Canada, South America, and Africa. The putative paths of soybean dissemination from China to other regions of the world are shown in Fig. 1. Soybean reached Asian countries around the first century A.D. and other continents

Jo u

rn

al

Pr

e-

pr

oo

f

in the last three centuries [10, 14].

Fig. 1 – The paths of soybean dissemination from the center of origin to other world regions. Arrows indicate directions of soybean dissemination. Thirteen regional populations are shown in different colors.

With the dissemination and adaptation of soybean to other continents, soybean landraces were developed by local farmers before scientific breeding began, and commercial soybean cultivars were developed by breeders after that point. Given that introductions to different ecological-regions may have come from different and even multiple sources, genetic backgrounds may differ among geographic- and ecological-regions. In particular, landraces and improved varieties adapted to individual regions were developed as a consequence of adaptation to local environments and diverse breeding objectives, resulting in differentiation among regional populations and differing agronomic performance [15–17]. As a short-day plant, soybean is greatly affected by daylength and temperature variation among growing regions, and depends particularly on local latitude and altitude as well as the sowing seasons in multiple cropping systems. During dissemination and adaptation, sharp differentiation is expressed in traits influenced by daylength 2

Journal Pre-proof and temperature, such as flowering date, maturity date, and main stem node number [18–20]. Apart from these geographic-ecological traits, the genetic diversity and differentiation of soybeans can be detected as genotypic variation. Abe et al. [21] used 20 SSRs to genotype 131 soybean accessions introduced from 14 Asian countries and found two germplasm pools, namely Japanese and parts of Korean peninsula populations clustered in one group, and most populations in Southeast and South and Central Asia with Chinese and parts of Korean peninsula populations clustered in the other group. Ude et al. [22] used five AFLPs to investigate the genetic diversity and relationships among North American soybean ancestors (NASA), North American soybean cultivars (NASC), and modern Chinese and Japanese cultivars, and found that Japanese and Chinese cultivars were distinct from NASA and NASC. However, knowledge of worldwide soybean variation and differentiation is limited. In particular, little is known about the phylogeographic relationships among soybean populations worldwide. This is because of difficulties in collecting worldwide soybean germplasm, given that governments have diverse germplasm

f

exchange policies and it is difficult to test multiple maturity groups (MGs) in the same natural environment

oo

(location and season).

pr

Efforts have been made [10, 14] to collect a representative sample of soybean accessions worldwide as with the aim of investigating phylogeographic relationships among soybeans in diverse geographic regions. The present

e-

study aimed to characterize the geographic-ecological differentiation among soybeans

worldwide,

phylogeographic relationships among geographic-regional populations, and ultimately the historical dissemination evolutionary

development

of

cultivated

soybean.

The

phylogeographic

relationships

among

Pr

and

geographic-regional populations involved two axes, namely time (history) and area (geographic region). In the present study, the area axis was real, but no time axis could be constructed because the accessions collected for

al

this study were modern rather than historical (already extinct) accessions. Accordingly, the historical phylogeographic relationship among geographic-regional populations could be inferred only from present-day

rn

geographic-regional populations (13 geographic-populations composed of 371 accessions). For this purpose, two indicators were used to describe geographic patterns. One was geographically and ecologically characterized traits

Jo u

(geographic-ecological traits) sensitive to geographically associated daylength and temperature (flowering date, maturity date, main stem node number, geographic-ecological-trait-based cluster). The other was gene/allele-derived or marker/haplotype-derived traits, which are genetic traits passed in regional populations: allele richness, allele frequency dispersion, specific-present allele (SPA), specific-deficient allele (SDA), and allele constitution, as well as gene/allele-based or marker/haplotype-based cluster.

2. Materials and methods 2.1. Plant materials Based on available information about soybean dissemination, world soybean production, and collected soybean accessions, the 13 geographic regions were grouped as shown in Fig. 1. A total of 371 accessions from the 13 geographic regions were sampled from the germplasm collection of the National Center for Soybean Improvement (NCSI), Nanjing Agricultural University, Nanjing, China (Table S2). The 13 regional populations were labeled with the assumed stages of their dissemination for convenience of description. Here, O represents the centers of origin in the Huang-Huai Valleys (O-HCHN) and Changjiang Valley and south of Changjiang Valley (O-SCHN) in China because they were both nominated in the literature [4–7, 9, 23] as centers of origin. A, B, C, and D 3

Journal Pre-proof represent four secondary centers extending outward from the center of origin. “A” represents northeast China (A-NCHN, including a few accessions from northwest China and in fact introduced mainly from northeast China in recent decades. These two parts comprise the northern single-cropping soybean area, which is conventionally abbreviated as northeast China). “B” represents the Korean peninsula (B-KORP) and Japanese islands (B-JPAN). “C” represents Southeast Asia (C-SEAS). “D” represents northern North America (D-NNAM) and southern North America (D-SNAM). A1, A2, C1, C2, and D1 represent tertiary centers derived from A, B, C, and D, including the far east of Russia (A1-RUFE), southern Sweden (A2-SSWE, which ought to be a sample collected from Europe, but only accessions from southern Sweden could be used because of few European accessions in the germplasm collection), South Asia (C1-SASI), Africa (C2-AFRI), and Central and South America (D1-CSAM). No accessions from Australia were included in the study, owing to the short history of soybean cultivation in Australia and the absence of Australian soybeans from the NCSI collection. The 371 accessions originated in

oo

f

China, the USA, Brazil, and 24 other countries, spanning the belt between roughly 58°N and 34°S (Table S2). 2.2. Field experiments

pr

From 2011 to 2012, the 371 accessions were tested in one-row plots length 1 m, an interval of 0.4 m, and two replications at Jiangpu Experimental Station (32°07'N, 118°62'E) of Nanjing Agricultural University. At this

e-

latitude, all accessions could mature or nearly mature under spring sowing conditions, contributing to the comparability of their geographic-ecological-traits. Sowing dates were April 28 in 2011 and April 25 in 2012. This

Pr

was a full-season test (rather than local double cropping with sowing after the harvest of wheat). Geographic-ecological-traits were observed, including phenology stages of emergence (VE), beginning bloom (R1) and full maturity (R8) following Fehr and Caviness [24]. Flowering and maturity dates were calculated as the

al

periods from sowing to R1 and to R8, respectively. If some accessions were unable to mature normally, the

rn

maturity date was estimated by adding the maturity date of MG VIII to the difference in flowering date between MG VIII and immature MG IX and X. Main stem node number was recorded at maturity with the cotyledonary

Jo u

node numbered zero, with counting to the top of the main stem, in 2012 only. 2.3. Statistical and clustering analysis of geographic-ecological-traits PROC MEANS in SAS 9.1 (SAS Institute Inc., Cary, NC, USA) was used to calculate the descriptive statistics (mean, minimum, and maximum) of flowering date, maturity date, and main stem node number. PROC GLM was used to fit an analysis of variance (ANOVA) from which the genetic coefficient of variation (GCV) was obtained from the equation GCV(%) = σg/μ × 100%, where σg is the genotypic standard deviation calculated according to the expected mean squares in ANOVA and μ is the population mean. PROC CLUSTER was used to cluster the 371 accessions and 13 regional populations to obtain phenotypic data using the method of average linkage. 2.4. Genotyping Restriction-site-associated DNA sequencing (RAD-seq) was performed by BGI Tech, Shenzhen, China. Genomic DNA of all 371 accessions was extracted from young leaves using the CTAB method [25]. The sequence depth (120.17 Gb of sequence) of the 371 accessions was approximately 4.08× with 4.64% coverage. Sequence was generated with an Illumina HiSeq 2000 instrument using multiplexed shotgun genotyping [26]. All sequences were aligned against Williams 82 [27] using SOAP 2 [28], using sequence similarity, paired-end relationships, and sequence quality. Using RealSFS software [29], 98,482 single nucleotide polymorphism (SNPs) were identified 4

Journal Pre-proof for the 371 accessions with a proportion of missing and heterozygous alleles ≤ 30% and minor-allele frequencies ≥ 1%. FastPHASE software [30] was used for SNP genotyping imputation after heterozygous loci were transformed into missing alleles. After that, SNPs within a linkage disequilibrium (LD) block (D’ > 0.7) were organized into an SNPLDB (SNP linkage disequilibrium block) marker with its haplotypes using Haploview software [31, 32]. A total of 20,701 SNPLDBs and their 55,404 haplotypes were identified. Liu et al. [33] used the same genotype dataset in a previous study. 2.5. Analysis of the genetic marker-haplotypes (gene-alleles) of populations Because marker-haplotype data were used for the genetic analysis of the populations a marker with its haplotype was analogous to a gene or QTL (quantitative trait locus) with its alleles. In the following description, gene-allele and marker-haplotype, which are analogous, are used alternately for simplicity. Genetic or gene diversity was analyzed in its broad sense, including allele richness (the first-rank statistic of gene diversity) and allele frequency

oo

f

dispersion (also known as gene diversity but the second-rank statistic of gene diversity in a narrow sense). These statistics were calculated with PowerMarker 3.25 [34]. Allele richness (A) refers to the total number of alleles at

 Ai , where Ai represents the number of alleles at the ith

pr

all marker loci in a population and is calculated as A 

locus in the population. Allele frequency dispersion (D) means the variation of allele frequency in the population

D l represents the mean allele frequency dispersion at the lth locus,

e-

and is calculated as Dl  1  k ~ p 2 , where u 1 lu

Pr

~p lu represents the frequency of the uth allele at the lth locus, and k is the total number of alleles at the lth locus. Wen et al. [35] defined the SPA of a population as an allele present in that population and in no other population

al

and SDA as an allele present in all other populations but absent from that population. These two indicators were used to evaluate the specificity of the 13 geographic populations.

rn

2.6. Genetic differentiation and phylogeographic analysis





p

Jo u

Arlequin 3.11 [36] was used to calculate differentiation coefficient FST statistics as p

FST   ni FSTi  /  ni , i 1



i 1

where ni is the allele number of the ith population, P is the total number of populations, and 1

𝐹𝑆𝑇𝑖 = [

1

𝑛 𝑃−1

SSD(𝐴𝑃) −

1

𝑁

𝑛𝑖 𝑁−𝑃

SSD(𝑊𝑃𝑖 )] /𝜎𝑇2,

where SSD (AP) is the sum of the squares of populations, SSD (WPi) is the sum of the squares of the ith population, N is the total number of genotypes, and



2

T

is the sum of expected mean squares.

Genetic distance (dij) was calculated for all pairs of accessions: genetic similarity, calculated as

sij 

 n / 2m , k

ijk

5

d ij  1  sij , where sij is the coefficient of

Journal Pre-proof where nijk is the common allele number of the ith and jth individual at the kth locus, and m is the total number of loci. Clustering was performed using the genetic distance matrix of dij for both the 371 accessions and the 13 regional populations with the neighbor-joining procedure of PowerMarker 3.25 and displayed by MEGA 4 [37]. On this basis, the results of geographic-ecological traits were combined to infer phylogeographic relationships among the geographic populations.

3. Results 3.1. Phylogeographic relationships among geographic-regional populations in terms of geographic-ecological traits

f

As shown in Table 1, world soybean accessions with their MGs from 000 to X covering all kinds of MGs had

oo

flowering dates ranging from 30 to 123 days, maturity dates from 75 to 200 days, and main stem node number from 4 to 35 nodes under uniform full-season environmental conditions in Nanjing, China. This result revealed an

pr

extremely wide range of variation in these traits. Mean flowering dates, maturity dates, and main stem node numbers in centers of origin (O) were respectively 51–62 days (ranging from 37 to 104 days), 128–141 days

e-

(ranging from 95 to 178 days) and 15–16 nodes (ranging from 8 to 28 nodes). Those in northern centers (A, A1 and A2) were shortened; those in the Korean peninsula and Japanese centers (B) were similar to those in centers

Pr

of origin; those in southern centers (C, C1, C2) were markedly increased; those in new American centers (D) showed both lower and higher values. Table 1 shows the differences in means and ranges of these three traits among the 13 regional populations and even within each population, indicating the adaptation of soybeans to

Jo u

rn

different areas around the world.

al

different geographic environments (daylength and temperature conditions) with the dissemination of soybeans to

6

Journal Pre-proof Table 1 – Phenotypic diversity and differentiation among regional populations of three daylength- and temperature-sensitive traits of soybeans grown in Nanjing (32°07'N, 118°62'E), China. Flowering date Maturity date Main stem node number Regional No. of Maturity group population accessions (MG) Mean Range GCV (%) Mean Range GCV (%) Mean Range GCV (%)

rn

al

Pr

e-

pr

oo

f

O-HCHN 31 II–IV 51 f 40–62 11.15 128 f 107–147 6.34 15 ed 8–28 29.46 O-SCHN 34 II–VIII 62 e 37–104 27.11 141 d 95–178 18.28 16 cd 9–28 29.83 A-NCHN 51 000–IV 38 ih 30–51 14.04 102 h 81–136 13.15 9h 4–18 27.58 A1-RUFE 18 00–IV 37 i 33–46 7.53 101 h 82–135 11.65 11 hg 6–22 35.52 A2-SSWE 17 000 33 j 32–36 2.39 79 i 75–85 2.06 6i 4–8 13.19 B-KORP 19 0–VII 50 f 32–68 19.30 129 ef 89–168 18.33 12 g 6–29 44.15 B-JPAN 34 0–VII 48 g 35–86 20.94 131 e 87–175 20.93 11 g 6–28 35.09 C-SEAS 26 III–IV, VII–X 94 a 51–121 19.33 175 b 123–200 11.03 25 a 10–35 26.83 C1-SASI 15 VI–X 87 c 52–123 29.41 181 a 152–200 7.25 22 b 9–35 36.33 C2-AFRI 7 V–X 91 b 55–118 21.43 180 a 149–200 9.68 24 a 16–31 21.11 D-NNAM 64 000–IV, VI–VII 39 h 31–70 17.93 111 g 77–172 15.37 13 f 6–33 35.45 D-SNAM 30 IV–VIII 62 e 41–78 11.12 160 c 130–181 7.09 13 ef 8–20 16.94 D1-CSAM 25 V–VIII, X 71 d 62–118 16.14 162 c 144–200 7.66 17 c 12–27 19.41 All 371 000–X 54 30–123 39.36 131 75–200 24.80 14 4–35 45.29 Regional population: O represents the centers of origin in the Huang-Huai Valleys (O-HCHN) and Changjiang Valleys and south of Changjiang Valleys (O-SCHN) in China; A, B, C, and D represent four secondary centers extending outward from the center of origin. “A” represents northeast China (A-NCHN, including a few accessions from northwest China and in fact introduced mainly from northeast China in recent decades. These two parts comprise the northern single-cropping soybean area, which is conventionally abbreviated as northeast China). “B” represents the Korean peninsula (B-KORP) and Japanese islands (B-JPAN). “C” represents Southeast Asia (C-SEAS). “D” represents northern North America (D-NNAM) and southern North America (D-SNAM). A1, A2, C1, C2, and D1 represent tertiary centers derived from A, B, C, and D, including the far east of Russia (A1-RUFE), southern Sweden (A2-SSWE, which ought to be a sample collected from Europe, but only accessions from southern Sweden could be used because of few European accessions in the germplasm collection), South Asia (C1-SASI), Africa (C2-AFRI), and Central and South America (D1-CSAM). The same is true for the later tables and figures. MG, the maturity groups of accessions, following Liu et al. [33]. Flowering and maturity dates are means of 2011 and 2012 and main stem node number is the mean of 2012. GCV, genotypic coefficient of variation. Different letters indicate that means within a column are significantly different (P < 0.01).

Jo u

The 13 geographic ecological-populations were phenotypically clustered based on the three daylength- and temperature-sensitive traits (Fig. 2-A). Generally, A-NCHN, A1-RUFE, A2-SSWE, and D-NNAM were in a daylength and temperature most-insensitive group; O-HCHN, O-SCHN, B-KORP, and B-JPAN were in a less sensitive group; and C-SEAS, C1-SASI and C2-AFRI, D-SNAM, and D1-CSAM were in a sensitive group, with “C” groups more sensitive than “D” groups. For individual traits, the clustering results for maturity date and main stem node number were very similar to those for the three traits, whereas the clustering results for flowering date were somewhat different from those of the other two, with D-SNAM and D1-CSAM joining the O-SCHN group (Fig. S1, Fig. 2-A).

7

Journal Pre-proof

Fig. 2 – Unrooted trees showing the relationships among 13 regional populations. (A) Phenotypic clustering by three daylength-

f

and temperature-sensitive traits using the method of average linkage. (B) Genotypic clustering using neighbor joining.

oo

3.2. Phylogeographic relationships among geographic-regional populations in terms of genetic marker-haplotype (gene-allele) traits

pr

The analysis of molecular variance showed highly significant genetic differences among and within regional populations (Table S3). In the centers of origin (primary centers), O-HCHN and O-SCHN harbored 42,519 and

e-

43,349 alleles accounting for 76.7% and 78.2% of the alleles (55,404) in all the populations (Table 2), which meaning that 23.3% and 21.8% of the alleles arose in geographic regions other than the center-of-origin regions.

Pr

In the secondary centers, A-NCHN, B-KORP, and B-JPAN harbored 41,439 (74.8%), 41,200 (74.4%), and 43,942 (79.3%) alleles, respectively, showing proportions similar to those in the center of origin, although their areas

al

were much smaller. However, in other secondary centers, C-SEAS and D-NNAM harbored 44,141 (79.7%) and 44,986 (81.2%) alleles, respectively, showing a marked increases. Allele richness in the newly formed tertiary

rn

centers A2-SSWE, C2-AFRI, and D1-CSAM was lower than that in the major centers O, A, B, C, and D.

Jo u

A total of 247 SPAs and 4112 SDAs were detected. In the older centers O-HCHN, O-SCHN, B-KORP, and C-SEAS, SPAs were only 8, 11, 2, and 4, respectively, whereas in the newly formed center D-NNAM and in the tertiary centers A1-RUFE, A2-SSWE, and C1-SASI, SPAs were as many as 28, 41, 41, and 67, respectively (Table 2, Fig. 3). To be specific, 24 SPAs were found in the older center B-JPAN. No SPAs were found in the younger tertiary centers C2-AFRI and D1-CSAM. SDAs followed similar trends.

8

Journal Pre-proof Table 2 – Genetic diversity and differentiation of 13 regional populations. Regional No. of Allelic richness (A) Allele frequency population accessions (%) dispersion (D)

Specific-present allele (SPA)

Specific-deficient allele (SDA)

31

42,519

0.218

8

103

O-SCHN

34

43,349

0.204

11

72

A-NCHN

51

41,439

0.181

19

60

A1-RUFE

18

42,586

0.254

41

205

A2-SSWE

17

30,038

0.111

41

1,772

B-KORP

19

41,200

0.209

2

33

B-JPAN

34

43,942

0.214

24

106

C-SEAS

26

44,141

0.235

4

149

C1-SASI

15

42,880

0.287

67

375

C2-AFRI

7

35,242

0.204

0

746

D-NNAM

64

44,986

0.175

28

17

D-SNAM

30

36,190

0.157

2

168

D1-CSAM

25

36,066

0.164

Total

371

55,404

0.238

0

306

247

4,112

Jo u

rn

al

Pr

e-

pr

oo

f

O-HCHN

Fig. 3 – The distribution of specific-present alleles (SPAs) in regional populations other than C2-AFRI and D1-CSAM. The vertical axis, labeled “Allele”, shows the specific-present alleles and the horizontal axis, labeled “Accession”, shows accessions of the 11 regional populations. Different colors represent the sources of regional populations with 247 SPAs in total. 9

Journal Pre-proof

Gene diversity, D in a narrow sense (allele frequency dispersion) indicates the evenness of allele frequencies. In the process of evolution, new mutants led to a low D-value, which increased to equilibrium after a long period of adaptation. In the present study, the center of origin “O” and secondary centers “B” and “C” showed relatively stable D-values of 0.204–0.235. However, new secondary centers D-NNAM and D-SNAM showed relatively unbalanced D-values of 0.157–0.175, as did the secondary center A-NCHN and the tertiary centers A2-SSWE and D1-CSAM. This finding indicates the emergence or introduction of more new alleles in these secondary and tertiary centers (Table 2). The above measurements of genetic diversity showed differences of allele frequency dispersion among geographic populations, while differentiation coefficient (Fst) and genetic similarity coefficient (sij) measured the

f

degree of differentiation between two populations (Table 3). The overall mean Fst between populations was 0.188,

oo

with individual Fst values ranging from 0.036 to 0.422. Very small Fst was observed between O-HCHN and O-SCHN (0.054), indicating high consistency between the two center-of-origin populations. Similarities were

pr

observed between the two “B”s (0.053) but not the two “D”s (0.153), indicating a larger differentiation between D-NNAM and D-SNAM. The differentiation of the center of origin “O” from the secondary center “B” was not

e-

large (0.110–0.136), but was larger from the secondary center A-NCHN (0.158–0.196) and still larger from D-NNAM and D-SNAM (0.196–0.233). A2-SSWE was quite different from all other populations, with Fst ranging

Pr

from 0.279–0.422 and a mean of 0.354. Moreover, “D” centers differed greatly from Asian and African

al

populations (0.200–0.212).

Table 3 – Genetic differentiation among 13 regional populations.

O-HCHN

0.908

rn

O-HCHN O-SCHN A-NCHN A1-RUFE A2-SSWE B-KORP B-JPAN C-SEAS C1-SASI C2-AFRI D-NNAM D-SNAM D1-CSAM Mean 0.858

0.845

0.782

0.865

0.862

0.868

0.825

0.852

0.842

0.840

0.839

0.849

0.843

0.827

0.778

0.857

0.860

0.877

0.815

0.854

0.831

0.836

0.834

0.843

0.872

0.816

0.881

0.871

0.830

0.785

0.830

0.896

0.870

0.862

0.851

0.782

0.859

0.839

0.826

0.815

0.827

0.853

0.828

0.823

0.833

0.805

0.815

0.768

0.726

0.771

0.805

0.795

0.797

0.787

0.903

0.860

0.803

0.844

0.866

0.866

0.858

0.856

0.858

0.790

0.842

0.858

0.854

0.851

0.850

0.834

0.890

0.823

0.824

0.831

0.841

0.821

0.781

0.780

0.781

0.796

0.820

0.824

0.831

0.834

0.888

0.874

0.845

0.903

0.842

0.054

A-NCHN

0.158

0.196

A1-RUFE

0.127

0.163

0.103

A2-SSWE

0.338

0.366

0.330

0.300

B-KORP

0.110

0.136

0.114

0.093

0.321

B-JPAN

0.120

0.133

0.132

0.130

0.279

0.053

C-SEAS

0.093

0.087

0.188

0.135

0.340

0.105

0.114

C1-SASI

0.141

0.162

0.236

0.128

0.366

0.160

0.189

0.106

C2-AFRI

0.113

0.126

0.199

0.125

0.416

0.130

0.136

0.036

0.106

D-NNAM

0.196

0.233

0.115

0.144

0.360

0.154

0.165

0.216

0.257

0.232

D-SNAM

0.203

0.223

0.180

0.193

0.422

0.160

0.179

0.211

0.254

0.245

0.153

D1-CSAM

0.194

0.218

0.190

0.189

0.408

0.164

0.173

0.190

0.238

0.222

0.174

0.117

Mean

0.154

0.175

0.178

0.153

0.354

0.142

0.150

0.152

0.195

0.174

0.200

0.212

Jo u

O-SCHN

Values on the upper diagonal are similarity coefficients sij and those on the lower diagonal are differentiation coefficients Fst.

10

0.840 0.206

Journal Pre-proof Thus, allele richness and allele frequency dispersion were usually higher, SPA and Fst were not higher in centers of origin (“O”s) and older secondary centers (such as “B”s), but these genetic diversity values in younger secondary centers (such as “C” and “D”) and tertiary centers may be higher or lower than those in older centers. This finding means that alleles emerged or vanished after dissemination and adaptation, giving rise to changes in allele frequency and thus differentiation among 13 regional populations. Geographic and genetic clustering revealed the relationship among all the populations on the basis of the genetic structure of each population. Fig. 2-B shows the results of clustering using genetic distances, dividing the 13 populations into five groups. (1) From the top, one group is composed only of the A2-SSWE population, which was genetically quite distinct from other groups. (2) The second group consists of all secondary and tertiary centers in the Americas, including D-NNAM, D-SNAM, and D1-CSAM, among which D-SNAM and D1-CSAM

f

show a closer genetic relationship because the latter was developed based on the former. (3) The third group

oo

contains only the A-NCHN population included in the upper cluster with three American populations rather than centers of origin in China. This composition matches the results of geographic-ecological clustering and suggests

pr

a close relationship between major sources of American populations and A-NCHN. This explains why A-NCHN was not included in center of origin but nominated as a secondary center. (4) The fourth group was largest,

e-

consisting of six regional populations all from Asia. In this group, the center-of-origin populations O-HCHN and O-SCHN were included showing a closer relationship with C-SEAS, C1-SASI, and C2-AFRI than with A-NCHN.

Pr

(5) The fifth group contains the secondary centers B-KORP and B-JPAN, which were genetically more distant from the centers of origin O-HCHN and O-SCHN than from the derived centers C-SEAS, C1-SASI, and C2-AFRI.

al

A-NCHN, A1-RUFE, and A2-SSWE were clustered in three different groups with A1-RUFE closer to “O”s

rn

than to A-NCHN. The reason may be that its source was mainly from the earlier accessions of northeast China, closer to O-HCHN, whereas some new introductions from other populations were included in the present A1-RUFE.

Jo u

northeast China population, whereas newly developed early-maturing accessions had not yet been introduced to

3.3. Phylogeographic relationships among world soybean accessions Clustering individual accessions may provide further information on the genetic structure of an ecological-population. In the clustering analysis of 371 world soybean accessions based on three geographic-ecological-traits, these accessions were grouped into six clusters characterized by different magnitudes of the three traits (Table 4, Fig. S2). (1) The largest group (group IV) was composed of 160 accessions insensitive to daylength and temperature and from all regional populations except the earliest-maturing population A2-SSWE and the latest-maturing populations C2-AFRI and D1-CSAM. (2) The second largest cluster (group VI) comprised 89 accessions sensitive to daylength and temperature and mainly from O-SCHN, B-JPAN, D-SNAM, and D1-CSAM. (3) The third largest cluster (group V) consisted of 76 accessions very insensitive to daylength and temperature and mainly from A-NCHN, A1-RUFE, A2-SSWE, B-JPAN, and D-NNAM. (4) The fourth cluster (group III) comprised 34 accessions very sensitive to daylength and temperature and mainly from O-SCHN, C-SEAS, C1-SASI, C2-AFRI, and D1-CSAM. (5) The other two clusters (groups I and II) contained respectively only two and 10 accessions, which were sensitive to daylength and temperature. 11

Journal Pre-proof

Table 4 – The frequency distribution of geographic accessions in six phenotypic clusters of world soybeans based on three daylength- and temperature-sensitive traits. Average linkage cluster

Regional population

i

O-HCHN O-SCHN A-NCHN

ii

iii

5

2

9

1 2 2

105 186 27

9 17 2 6

46 2

16

160

76

5 12 3 4 1 2 28 23 89

45 120 13

34 87 8

63 161 15

pr

86 169 22

e-

56 189 15

2 34

1 25

vi 1 10

Total 31 34 51 18 17 19 34 26 15 7 64 30 25 371

Pr

Flowering date Maturity date Main stem node number χ2 = 508.5, χ20.01,60 = 88.4, P < 0.01.

10

v

f

18 8 4

2

2

12 15 3 1

oo

A1-RUFE A2-SSWE B-KORP B-JPAN C-SEAS C1-SASI C2-AFRI D-NNAM D-SNAM D1-CSAM Total

iv 30 16 26

al

In Table 4, O-SCHN, B-JPAN, C-SEAS, and C1-SASI are scattered across respectively 5, 4, 4, and 4 clusters among geographic-regional populations, indicating a wide range of daylength- and temperature-response features

rn

in these regions. However, O-HCHN, A-NCHN, A1-RUFE, A2-SSWE, D-SNAM, and D1-CSAM were scattered across respectively only 2, 2, 2, 1, 2, and 2 clusters and concentrated on daylength- and temperature-insensitive or

Jo u

-sensitive clusters. By sensitivity to daylength and temperature, clusters ranked iii, ii, vi, i, iv, and v in descending order, while geographic regions ranked O-SCHN (covering five clusters), B-JPAN, C-SEAS, B1-SASI (covering four clusters), B-KORP, C2-AFRI, D-NNAM (covering three clusters), O-HCHN, A-NCHN, A1-RUFE, D-SNAM, and D1-CSAM (covering two clusters), and A2-SSWE (covering only one cluster) in descending order of sensitivity. Thus, there was higher geographic-ecological variation in the O-SCHN and “B” groups and lower variation in the O-HCHN and “A” and “D” groups, possibly owing to the difference between multiple planting seasons and mainly summer-planting season. In genotypic clustering based on individual accessions, the 371 accessions were classified into six genetically different groups (Table 5, Fig. 4-A). (1) Cluster IV was the largest cluster, comprising 132 accessions from all geographic regions by A2-SSWE, including almost all accessions from the centers of origin O-HCHN and O-SCHN. This cluster contained major parts of C-SEAS, C1-SASI, and C2-AFRI, including parts from A-NCHN, B-KORP, and B-JPAN, which maintained closer genetic relationships with accessions of the center of origin. (2) Cluster V was the second largest, containing 117 accessions from nine geographic regions, including a major part from A-NCHN, D-NNAM, and A2-SSWE. This result indicated that both American and European early-maturing accessions could trace their sources to northeast China. (3) Cluster III comprised 47 accessions from seven 12

Journal Pre-proof geographic regions, including a major part from B-KORP, B-JPAN, and A1-RUFE. B-KORP and B-JPAN germplasm exerted no obvious genetic influence on American accessions, given that not many “D” accessions were included in this cluster. (4) Cluster VI comprised 61 accessions from six geographic regions, mainly from D-SNAM and D1-CSAM and partly from D-NNAM, indicating that the accessions in D-SNAM had their germplasm from D-NNAM but that the genetic structure of D-SNAM diverged from that of D-NNAM and formed a separate cluster along with D1-CSAM after intensive breeding. (5) Cluster I was very specific with only one accession from a specific type in Africa (there were others in cluster IV). (6) Cluster II contained 13 accessions including 11 from A-NCHN and two from A1-RUFE, all of which were early-maturing, but genetically different from early-maturing ones in clusters III, IV, and V.

f

Table 5 – The frequency distribution of geographic accessions in six genotypic clusters of world soybeans.

O-HCHN O-SCHN A-NCHN A1-RUFE A2-SSWE B-KORP B-JPAN

II

III

2 7

e-

11 2

Pr

13 21

C-SEAS C1-SASI

2 1

C2-AFRI

1

1

3 10

47

75 169 25

44 113 11

49 129 11

67 147 18

rn

1 34 5 16 1 3

VI

1 2 1

6

13

13

V

23 14

1

Jo u

Flowering date Maturity date Main stem node number χ2 = 727.3, χ20.01,60 = 88.4, P < 0.01.

31 33 4 4

2 1 1 132

al

D-NNAM D-SNAM D1-CSAM Total

IV

pr

I

oo

Genetic distance cluster

Regional population

Total 31 34 51 18 17 19 34 26 15 7

55 1 1 117

6 28 23 61

36 101 11

64 157 15

64 30 25 371

rn

al

Pr

e-

pr

oo

f

Journal Pre-proof

Jo u

Fig. 4 – Unrooted trees showing genetic relationships among 371 world soybean accessions. (A) Unrooted tree showing the genetic relationships among 371 world soybean accessions. Colors represent the sources of regional populations. Roman numerals beside vertical bars indicate six clusters assembled from 371 accessions; (B) Second-order genotypic clusters based on the first-order clusters.

It is interesting that the accessions in the centers of origin O-HCHN and O-SCHN were concentrated in cluster IV, indicating the similar genetic bases of the two populations. However, accessions in other 11 geographic regions were distributed in several clusters, indicating that multiple sources were introduced from outside regions and that genetic differentiation occurred because of multiple introductions. In particular, O-SCHN covered only two clusters genetically, but five clusters geographic-ecologically. Accessions in A-NCHN and A1-RUFE were all early-maturing, but covered four clusters, suggesting that early-maturing accessions might have quite different genetic structures. Accessions in secondary centers B-KORP and B-JPAN were distributed in four or three clusters genetically and geographic-ecologically, indicating a large variation of their maturity groups and mixed genetic sources (Table 5). These six genotypic clusters were further clustered. Fig. 4-B shows a close genetic relationship between 14

Journal Pre-proof clusters III and IV as well as clusters II and V. Most accessions of cluster IV were from O-HCHN, O-SCHN, C-SEAS, and C1-SASI, whereas cluster III comprised accessions mainly from B-KORP and B-JPAN, suggesting that soybeans were introduced from China into the Korean peninsula, Japan, Southeast Asia, and South Asia and that soybeans in the Korean peninsula and Japanese islands formed different germplasm pools later. The formation of new germplasm pools might be attributable to the long history of breeding efforts of historical farmers and modern breeders in the Korean peninsula and the Japanese islands. Another reason might be genomic introgression through interspecific hybridization, given the abundance of local wild soybeans naturally present in the Korean peninsula and the Japanese islands, as described by Xu et al. [38] and Wang et al. [39]. Cluster V comprised 117 accessions, most of which were from high latitude regions including A1-RUFE, A2-SSWE, A-NCHN, and D-NNAM. Cluster VI comprised 61 accessions, of which 51 (84%) came from D-SNAM and D1-CSAM. The differentiation between clusters V and VI suggests that soybeans in the south of North America

f

and Central and South America formed a specific genetic structure after their introductions from northern North

oo

America.

pr

4. Discussion

4.1. Features of the genotypic and phenotypic clustering of the world soybeans

e-

Integrating genetic marker-haplotype (gene-allele) and geographic-ecological-trait clustering analysis, the present

Pr

study focused on the same world soybean population. Both genotypic and phenotypic clustering put O-HCHN and O-SCHN into the same cluster and also did so for A-NCHN and D-NNAM; C-SEAS, C1-SASI, and C2-AFRI; B-KORP and B-JPAN; and D-SNAM and D1-CSAM (Fig. 2). Thus, 11 of the 13 geographic-populations were

al

grouped into the same five sets of sensitivity-similar geographic-populations. Such consistency must reflect common factors, of which the major one might be the common genetic constitutions of daylength- and

rn

temperature-sensitive traits. In addition to consistent sets, however, some differences were also found, and may

Jo u

reflect different genetic constitutions of other traits. Both approaches showed the largest clusters covering most of the regional populations (iv vs. IV), namely a cluster covering a major part of D-SNAM and D1-CSAM (vi vs. VI) and a cluster covering O-SCHN, A-NCHN, A1-RUFE, A2-SSWE, B-KORP, B-JPAN, D-NNAM (v vs. V) in the phenotypic and genotypic clustering among the accessions. These three pairs of major clusters displayed similarities among the three traits (Table 4 and Table 5). However, the two clustering results also showed some differences. For example, three minor clusters in both clustering results had quite different components; O-SCHN covered five clusters in phenotypic clustering but only two clusters in genetic clustering, and the same disparity was observed for A-NCHN and A1-RUFE (2 vs. 4 clusters) and C1-SASI (4 vs. 2 clusters). This finding suggests that different genetic constitutions may show similar daylength and temperature responses and vice versa. The results of geographic-ecological and genetic clustering showed some consistency at a lower cluster level and some inconsistency at a higher cluster level. This finding may be explained as follows. First, the genotypic clustering reflected the relationship among accessions based on the whole genome, whereas geographic-ecological clustering involved only the part of the genomic information that conditioned the geographic-ecological traits. Second, multiple differing sources of introductions and germplasm exchanges might result in differential genetic constitutions and even similar phenotypes of geographic-ecological traits. Third, in addition to geographic 15

Journal Pre-proof conditions a variety of breeding goals are likely to have caused extra genetic variation. In summary, both geographic-ecological clustering and genotypic clustering shed light on the relationships among regional populations, and their results can be integrated to reveal the phylogeographic features of soybeans. 4.2. Historical dissemination paths of soybeans summarized from present phylogeographic analysis Historical dissemination paths of soybeans may be summarized by integration of the phylogeographic results obtained from the analysis of geographic-ecological-traits and molecular marker-haplotype (gene-allele) traits with information in the literature. The first path of dissemination was from the center of origin to northeast China, although previous researchers [10] considered northeast China as part of the center of origin. However, though it is part of China, it is not necessarily part of the center of origin of soybean. Population A-NCHN clustered apart from O-HCHN and

oo

f

O-SCHN with the earliest-maturing groups and SPAs more than those in the centers of origin. For this reason, we assigned it as the first secondary center A-NCHN. A-NCHN, A1-RUFE, and A2-SSWE were not in the same cluster in Fig. 2-B, but shared the same genetic cluster V in Table 5, while A1-RUFE was very similar to

pr

A-NCHN in clustering distribution (Table 5, cluster II, III, IV, and V). Accordingly, A-NCHN, A1-RUFE, and A2-SSWE were placed in Group “A”, with the latter two as populations derived from A-NCHN. Here, it was

e-

understood that the early soybeans in A-NCHN were introduced mainly from the center of origin, where very

Pr

early-maturing varieties, such as MG 00 and MG 000, were developed intensively and independently along with introductions from other regions used in breeding programs, which caused the present A-NCHN, A1-RUFE, and A2-SSWE grouped into different cluster branches. This grouping is in accord with a previous [14] grouping of

al

Russia and northeast China as a soybean pool, Swedish and Russian varieties maintain a close relationship [11]

rn

and several Swedish accessions originated in northern Manchuria (northern northeast China) [40]. The second path of dissemination was from the center of origin eastward to B-KORP and then to B-JPAN from

Jo u

about the first century A.D. Multiple paths were taken, one through the border area between northeast China and the Korean peninsula and others via ocean voyages, which possibly brought soybeans directly to B-KORP or B-JPAN. These two secondary “B”s span the same range of maturity groups 0–VII even in a small territory, with a wide range of flowering and maturity dates, they were genetically clustered in a separate cluster apart from that of the center of origin (Fig. 2-B) and contained wider genetic clusters (II, IV, V, and even VI). These secondary “B”s adapted to maritime conditions and formed a new geographic group. In the literature [3, 21, 41], Korean peninsula and Japanese soybeans are genetically similar, but different from Chinese soybeans, in agreement with the present finding that B-KORP and B-JPAN shared a common origin but are genetically grouped into a cluster other than that containing the “O” population. The third path of dissemination was from the center of origin southward to Southeast Asia via commerce and migration, afterwards to South Asia, and then to Africa (C-SEAS, C1-SASI, and C2-AFRI). Soybeans were first cultivated in Africa in 1896 [12]. They were grown first in Algeria and then introduced to South Africa, Mauritius, and Tanzania. We supposed that this was a joint path with B-KORP or B-JPAN. However, geographic-ecological data showed that their maturity groups were mainly later ones (MG III-IV and MG VII-X in C-SEAS, MG VI-X in C1-SASI and C2-AFRI) and adapted to short daylength and warm temperature, with the greatest number of new SPAs in C1-SASI. These derived populations were genetically clustered mainly in the same cluster (Group 16

Journal Pre-proof IV) with centers of origin O-HCHN and O-SCHN (Fig. 2-B, Table 5), suggesting that these three populations showed small genetic differences from “O”s. Thus, C-SEAS was a secondary center and the other two were tertiary centers. The fourth or the most recent dissemination path was mainly from China and later Japan westward to Europe with missionaries and sailors, with soybeans then grown in France and England in the late 18th century [42]. Soybeans were introduced by Samuel Bowen from China into the USA via London in 1765 [43]. In 1893, Zavitz introduced soybeans into Canada, and bred forage soybean cultivars [11]. Afterwards, soybean germplasm was reintroduced from China several times, especially from northeast China to the USA. The first soybean accessions were introduced from the southern USA into Brazil around the end of the 19th century [13], followed by the introduction of some accessions from Philippines, Thailand and Pakistan. Soybeans were brought from Brazil

f

directly to Argentina and other countries. With suitable climate and soil conditions, Brazil and Argentina became

oo

large soybean producers. In the meantime, breeding progress was made, contributing to the formation of a potential new gene center of soybeans there. The present study genetically clustered the secondary centers

pr

D-NNAM and D-SNAM and the tertiary center D1-CSAM in the same group with A-NCHN (Fig. 2-B) grouped with the major part (55 of 64 accessions) of D-NNAM in the same major group V, meaning that the germplasm of

e-

D-NNAM, D-SNAM, and D1-CSAM was mainly from northeast China. Thus, the fourth path of dissemination was actually from the center of origin to A-NCHN via Europe or directly to North America, in general agreement

Pr

with the literature [44], which indicates that many accessions in USDA soybean collection were from China and the Korean peninsula, with these introductions forming the basis of North American germplasm. These introductions were also used in the breeding programs of southern North America. After adapting to southern

al

North America, D-SNAM was different from D-NNAM. Its major cluster was accordingly Group VI in Table 5. D1-CSAM after a few decades.

rn

Soybeans in Central and South America were originally introduced from D-SNAM and formed the tertiary center

Jo u

Thus, different paths of dissemination resulted in adaptation to different geographic-ecological environments, improvements in meeting local requirements and the formation of different geographic-populations. Among the paths, the dissemination to the north and then to the west (Americas) was the most important because the soybean production of the western world began with the soybean germplasm of northeast China. However, the southward path may represent the starting point for breeding later-maturing soybeans in tropical and subtropical areas.

5. Conclusions World soybean accessions numbering 371 from 13 geographic ecological-regions, their phylogeographic features, and the relationships among their geographic-populations were studied using two sets of trait data: molecular gene/allele or marker/haplotype traits and geography-related photo- and temperature-sensitive traits (flowering date, maturity date, and main stem node number). The latitude (daylength and temperature) of the testing site Nanjing in China allows all materials to blossom and mature or approximately mature naturally, favoring the comparability of geographic-ecological characteristics. The results confirmed the historical differentiation forming geographic populations, including two center-of-origin populations, four secondary centers, and five tertiary centers with different genetic and geo-ecological components. The results of both genetic and phenotypic clustering showed that 11 of 13 geographic-populations were grouped into the same five sets of sensitivity-similar 17

Journal Pre-proof geographic-populations, indicating that geographic-related photo- and temperature-sensitive traits were influential in determining the genetic differences among geographic-populations. Both clustering results grouped the population of northeast China into a secondary center rather than the center-of-origin populations together with the population of northern North America, correcting the conventional model considering it part of the center of origin. The phylogeographic analysis integrated with historical information also provided an explanation for the model of four dissemination paths: from centers of origin to the north (northeast China, Russia, and Sweden/Europe in turn), the east (Korean peninsula and Japan), the south (Southeast Asia, South Asia and Africa) and from northeast China to the west (Europe, northern USA, southern USA, and South America). Among these, the most recent path of dissemination to the Americas via northeast China was the most influential, given that the soybean production of the Western world has grown rapidly in the past half century, accounting for around 90% of current world soybean production. Thus, the soybean geographic-population of northeast China was the one

oo

f

most profoundly influencing the development of soybean production worldwide in modern society.

Acknowledgments

pr

This work was supported by the National Key Research and Development Program of China (2017YFD0101500, 2016YFD0100304), the National Natural Science Foundation of China (31701447, 31671718, 31571695), the

e-

Ministry of Education 111 Project (B08025), the Ministry of Education Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT_17R55), the China Agriculture Research System (CARS-04),

Pr

the Priority Academic Program Development of Jiangsu Higher Education Institutions, the Fundamental Research Funds for the Central Universities and the the Jiangsu Collaborative Innovation Center for Modern Crop

Supplementary material

al

Production, Natural Science Foundation of Jiangsu Higher Education Institutions (19KJB210010).

rn

Supplementary figures and tables for this article can be found online at https://dx.doi.org/10.1016/j.cj 201x.xx.xxx.

Jo u

Fig. S1 – Unrooted trees showing the relationships among 13 regional populations based on three daylength- and temperature-sensitive traits- using the method of average linkage. Fig. S2 – Unrooted trees showing the phenotypic relationships among 371 world soybean accessions. Table S1 – World soybean production in 2010–2018 (1000 t). Table S2 – The sources of 371 soybean accessions. Table S3 – Analysis of molecular variance of the world soybean based on SNPLDB markers

References [1] T. Zhao, J. Gai, The origin and evolution of cultivated soybean [Glycine max (L.) Merr.], Sci. Agric. Sin. 37 (2004) 954–962 (in Chinese with English abstract). [2] M. Bu, T. Pan, A study on the regionalization of soybean producing area in China, Soybean Sci. 1 (1982) 105–121 (in Chinese with English abstract). [3] Z. Li, L. Qiu, J.A. Thompson, M.M. Welsh, R.L. Nelson, Molecular genetic analysis of US and Chinese soybean ancestral lines, Crop Sci. 41 (2001) 1330–1336. [4] Y. Li, R. Guan, Z. Liu, Y. Ma, L. Wang, L. Li, F. Lin, W. Luan, P. Chen, Z. Yan, Y. Guan, L. Zhu, X. Ning, M.J. Smulders, W. Li, R. Piao, Y. Cui, Z. Yu, M. Guan, R. Chang, A. Hou, A. Shi, B. Zhang, S. Zhu, L. Qiu, Genetic structure and diversity of cultivated soybean (Glycine max (L.) Merr.) landraces in China, Theor. Appl. Genet. 117 (2008) 857–871. 18

Journal Pre-proof [5] Y. Dong, L. Zhao, B. Liu, Z. Wang, Z. Jin, H. Sun, The genetic diversity of cultivated soybean grown in China, Theor. Appl. Genet. 108 (2004) 931–936. [6] Y. Li, C. Zhang, M.J.M. Smulders, W. Li, Y. Ma, Q. Xu, R. Chang, L. Qiu, Analysis of average standardized SSR allele size supports domestication of soybean along the Yellow River, Genet. Resour. Crop Evol. 60 (2013) 763–776. [7] J. Gai, D. Xu, Z. Gao, Y. Shimamoto, J. Abe, H. Fukushi, S. Kitajima, Studies on the evolutionary relationship among eco-types of G. max and G. soja in China, Acta Agron. Sin. 26 (2000) 513–520 (in Chinese with English abstract). [8] J. Guo, Y. Wang, C. Song, J. Zhou, L. Qiu, H. Huang, Y. Wang, A single origin and moderate bottleneck during domestication of soybean (Glycine max): implications from microsatellites and nucleotide sequences, Ann. Bot. 106 (2010) 505–514. [9] Z. Wen, T. Zhao, Y. Ding, J. Gai, Genetic diversity, geographic differentiation and evolutionary relationship among ecotypes of Glycine max and G. soja in China, Chinese Sci. Bull. 54 (2009) 4393–4403. [10] R. Singh, T. Hymowitz, Soybean genetic resources and crop improvement, Genome 42 (1999) 605–616.

oo

f

[11] Y.B. Fu, G.W. Peterson, M.J. Morrison, Genetic diversity of Canadian soybean cultivars and exotic germplasm revealed by simple sequence repeat markers, Crop Sci. 47 (2007) 1947–1954.

[12] W. Shurtleff, A. Aoyagi, History of soybeans and soyfoods in Africa, Soyfoods Center, Lafayate, California, 2007.

pr

[13] P.T. Wysmierski, N.A. Vello, The genetic base of Brazilian soybean cultivars: evolution over time and breeding implications, Genet. Mol. Biol. 36 (2013) 547–555.

e-

[14] T. Hymowitz, N. Kaizuma, Soybean seed protein electrophoresis profiles from 15 Asian countries or regions: hypotheses on paths of dissemination of soybeans from China, Econ. Bot. 35 (1981) 10–23.

Pr

[15] J.R. Harlan, Crops and Man, American Society of Agronomy and Crop Science Society of America, Madison, WI, USA, 1992. [16] J.G. Hawkes, The Diversity of Crop Plants, Harvard University Press, Cambridge, MA, USA, 1983. [17] F. Schwanitz, The Origin of Cultivated Plants, Harvard University Press, Cambridge, MA, USA, 1966.

al

[18] J.F. Doebley, B.S. Gaut, B.D. Smith, The molecular genetics of crop domestication, Cell 127 (2006) 1309–1321.

rn

[19] D.Q. Fuller, Contrasting patterns in crop domestication and domestication rates: recent archaeobotanical insights from the Old World, Ann. Bot. 100 (2007) 903–924.

1037–1045.

Jo u

[20] E.M.K. Koinange, S.P. Singh, P. Gepts, Genetic control of the domestication syndrome in common bean, Crop Sci. 36 (1996)

[21] J. Abe, D.H. Xu, Y. Suzuki, A. Kanazawa, Y. Shimamoto, Soybean germplasm pools in Asia revealed by nuclear SSRs, Theor. Appl. Genet. 106 (2003) 445–453.

[22] G.N. Ude, W.J. Kenworthy, J.M. Costa, P.B. Cregan, J. Alvernaz, Genetic diversity of soybean cultivars from China, Japan, North America, and North American ancestral lines determined by amplified fragment length polymorphism, Crop Sci. 43 (2003) 1858–1867. [23] T. Hymowitz, Soybeans: the success story, in: J. Janick, J.E. Simon (Eds), Advances in New Crops, Timber Press, Portland, OR, USA, 1990, pp. 159–163. [24] W.R. Fehr, C.E. Caviness, Stages of soybean development, Special Report 80, Iowa Agricultural Experiment Station, Iowa Cooperative External Series, Iowa State University, Ames, IA, USA. [25] M.G. Murray, W.F. Thompson, Rapid isolation of high molecular weight plant DNA, Nucleic Acids Res. 8 (1980) 4321–4325. [26] P. Andolfatto, D. Davison, D. Erezyilmaz, T.T. Hu, J. Mast, T. Sunayama-Morita, D.L. Stern, Multiplexed shotgun genotyping for rapid and efficient genetic mapping, Genome Res. 21 (2011) 610–617. [27] J. Schmutz, S.B. Cannon, J. Schlueter, J. Ma, T. Mitros, W. Nelson, D.L. Hyten, Q. Song, J.J. Thelen, J. Cheng, D. Xu, U. Hellsten, G.D. May, Y. Yu, T. Sakurai, T. Umezawa, M.K. Bhattacharyya, D. Sandhu, B. Valliyodan, E. Lindquist, M. Peto, D. 19

Journal Pre-proof Grant, S. Shu, D. Goodstein, K. Barry, M. Futrell-Griggs, B. Abernathy, J. Du, Z. Tian, L. Zhu, N. Gill, T. Joshi, M. Libault, A. Sethuraman, X.C. Zhang, K. Shinozaki, H.T. Nguyen, R.A. Wing, P. Cregan, J. Specht, J. Grimwood, D. Rokhsar, G. Stacey, R.C. Shoemaker, S.A. Jackson, Genome sequence of the palaeopolyploid soybean, Nature 463 (2010) 178–183. [28] R. Li, C. Yu, Y. Li, T.W. Lam, S.M. Yiu, K. Kristiansen, J. Wang, SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics 25 (2009) 1966–1967. [29] X. Yi, Y. Liang, E. Huerta-Sanchez, X. Jin, Z.X. Cuo, J.E. Pool, X. Xu, H. Jiang, N. Vinckenbosch, T.S. Korneliussen, H. Zheng, T. Liu, W. He, K. Li, R. Luo, X. Nie, H. Wu, M. Zhao, H. Cao, J. Zou, Y. Shan, S. Li, Q. Yang, Asan, P. Ni, G. Tian, J. Xu, X. Liu, T. Jiang, R. Wu, G. Zhou, M. Tang, J. Qin, T. Wang, S. Feng, G. Li, Huasang, J. Luosang, W. Wang, F. Chen, Y. Wang, X. Zheng, Z. Li, Z. Bianba, G. Yang, X. Wang, S. Tang, G. Gao, Y. Chen, Z. Luo, L. Gusang, Z. Cao, Q. Zhang, W. Ouyang, X. Ren, H. Liang, H. Zheng, Y. Huang, J. Li, L. Bolund, K. Kristiansen, Y. Li, Y. Zhang, X. Zhang, R. Li, S. Li, H. Yang, R. Nielsen, J. Wang, J. Wang, Sequencing of 50 human exomes reveals adaptation to high altitude, Science 329 (2010) 75–

oo

f

78.

[30] P. Scheet, M. Stephens, A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase, Am. J. Hum. Genet. 78 (2006) 629–644.

pr

[31] J.C. Barrett, B. Fry, J. Maller, M.J. Daly, Haploview: analysis and visualization of LD and haplotype maps, Bioinformatics 21 (2005) 263–265.

e-

[32] J.D. Wall, J.K. Pritchard, Haplotype blocks and linkage disequilibrium in the human genome, Nat. Rev. Genet. 4 (2003) 587– 597.

Pr

[33] X.Q. Liu, J.A. Wu, H.X. Ren, Y.X. Qi, C.Y. Li, J.Q. Cao, X.Y. Zhang, Z.P. Zhang, Z.Y. Cai, J.Y. Gai, Genetic variation of world soybean maturity date and geographic distribution of maturity groups, Breed. Sci. 67 (2017) 221–232. [34] K. Liu, S.V. Muse, PowerMarker: an integrated analysis environment for genetic marker analysis, Bioinformatics, 21 (2005)

al

2128–2129.

rn

[35] Z. Wen, Y. Ding, T. Zhao, J. Gai, Genetic diversity and peculiarity of annual wild soybean (G. soja Sieb. et Zucc.) from various eco-regions in China, Theor. Appl. Genet. 119 (2009) 371–381.

Jo u

[36] B.S. Weir, W.G. Hill, Estimating F-statistics, Annu. Rev. Genet. 36 (2002) 721–750. [37] K. Tamura, J. Dudley, M. Nei, S. Kumar, MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0, Mol. Biol. Evol. 24 (2007) 1596–1599.

[38] D. Xu, J. Abe, J. Gai, Y. Shimamoto, Diversity of chloroplast DNA SSRs in wild and cultivated soybeans: evidence for multiple origins of cultivated soybean, Theor. Appl. Genet. 105 (2002) 645–653. [39] X. Wang, L. Chen, J. Ma, Genomic introgression through interspecific hybridization counteracts genecitc bottleneck during soybean domestication, Genome Biol. 20 (2019) 22. [40] S. Holmbeeg, Soybean trials in Sweden, Soybean Digest 11 (1950) 13. [41] M. Perry, M. McIntosh, Geographical patterns of variation in the USDA soybean germplasm collection: I. Morphological traits, Crop Sci. 31 (1991) 1350–1355. [42] W.J. Morse, J. Cartter, L.F. Williams, Soybeans: Culture and Varieties, Farmers’ Bulletin 1520, U.S. Department of Agriculture, Washington D.C., USA, 1949. [43] T. Hymowitz, J.R. Harlan, Introduction of soybean to North America by Samuel Bowen in 1765, Econ. Bot. 37 (1983) 371–379. [44] G. Brown-Guedira, J. Thompson, R. Nelson, M. Warburton, Evaluation of genetic diversity of soybean introductions and North American ancestors using RAPD and SSR markers, Crop Sci. 40 (2000) 815–823.

20