Genetic diversity and differentiation of the African wild rice (Oryza longistaminata chev. et roehr) in Ethiopia

Genetic diversity and differentiation of the African wild rice (Oryza longistaminata chev. et roehr) in Ethiopia

Scientific African 6 (2019) e00138 Contents lists available at ScienceDirect Scientific African journal homepage: www.elsevier.com/locate/sciaf Genet...

3MB Sizes 0 Downloads 58 Views

Scientific African 6 (2019) e00138

Contents lists available at ScienceDirect

Scientific African journal homepage: www.elsevier.com/locate/sciaf

Genetic diversity and differentiation of the African wild rice (Oryza longistaminata chev. et roehr) in Ethiopia Getachew Melaku a,b, Marlee Labroo c, Huang Liyu a, Zhang Shilai a, Huang Guangfu a, Zhang Jing a, Kassahun Tesfaye d, Teklehaimanot Haileselassie d, Fengyi Hu a,∗ a

Research Center of Perennial Rice Engineering and Technology in Yunnan, School of Agriculture, Yunnan University, Kunming 650091, China b Department of Microbial, Cellular and Molecular Biology, College of Natural and Computational Sciences, Addis Ababa University, Addis Ababa 1176, Ethiopia c Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA d Institute of Biotechnology, College of Natural and Computational Sciences, Addis Ababa University, Addis Ababa 1176, Ethiopia

a r t i c l e

i n f o

Article history: Received 21 December 2018 Revised 23 August 2019 Accepted 26 August 2019

Editor: Dr. B. Gyampoh Keywords: Oryza longistaminata Differentiation Genetic diversity Ethiopia

a b s t r a c t Undomesticated rice species (Oryza longistaminata) which possessed AA genome as O. sativa, is native to Africa and a good candidate for widening the cultivated rice gene pool. So as to use O. longistaminata as a genetic resource in rice breeding, its level of genetic diversity and differentiation must be resolved. Thus, 57 SSR markers across the whole rice genome were used to assess the genetic diversity of 360 O. longistaminata accessions from 12 agroecologies of the Amhara and Gambella regions of Ethiopia. RADSeq based analysis on 87 early maturing O. longistaminata accessions was also done to assess genetic differentiation among the Amhara and Gambella regions of Ethiopia. In this study, the Ethiopian O. longistaminata populations showed high genetic diversity (PIC = 76.5%) with a region based genetic differentiation (Fst = 0.08). Furthermore, genetic relatedness among populations of a region was mainly laid on their geographic provenance. Hence, the high genetic diversity and region based differentiation observed in the Ethiopian O. longistaminata populations could be used as a potential genetic resource in rice improvement programs. © 2019 The Authors. Published by Elsevier B.V. on behalf of African Institute of Mathematical Sciences / Next Einstein Initiative. This is an open access article under the CC BY license. (http://creativecommons.org/licenses/by/4.0/)

Introduction Rice is a staple food for over half of the world’s human population [31]. During domestication, 50–60% of alleles were lost from the nearest wild relatives of the cultivated rice [2]. Since the limited genetic variation for stress-related traits of cultivated rice affecting its productivity [1], there must be an urgent need to explore diverse sources of genes for tolerance to various stress factors [31]. ∗

Corresponding author. E-mail addresses: [email protected] (G. Melaku), [email protected] (M. Labroo), [email protected] (H. Liyu), [email protected] (Z. Shilai), [email protected] (H. Guangfu), [email protected] (Z. Jing), [email protected] (K. Tesfaye), [email protected] (T. Haileselassie), [email protected] (F. Hu). https://doi.org/10.1016/j.sciaf.2019.e00138 2468-2276/© 2019 The Authors. Published by Elsevier B.V. on behalf of African Institute of Mathematical Sciences / Next Einstein Initiative. This is an open access article under the CC BY license. (http://creativecommons.org/licenses/by/4.0/)

2

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

Wild Oryza species are important sources of agronomically useful traits and could be used in rice improvement programs [2]. In spite of their rich genetic diversity, only a small proportion of wild Oryza species such as O. nivara and O. glumaepatula have been utilized in rice breeding programs [1]. The vast majority of wild rice genes remain untapped to date, as it is often difficult to transfer these genes into the cultivated rice [31]. Therefore, precise genetic manipulation of desirable traits should be started from a prior and thorough assessment of genetic diversity, population structure and genetic differentiation of the natural wild rice populations. Beyond its use as cereals for human food and beverage, O. longistaminata is a good source of forage [15]. Accessions of O. longistaminata are immune to rice yellow mottle sobemovirus, bacterial blight (Xanthomonas oryzae pv. oryzae) and rice blast (Magnaporthe grisea) [1]. Along with the disease resistant gene, high pollen production, long stigmas and drought tolerance traits of O. longistaminata were transferred to the cultivated rice [15]. Such useful gene transfer tasks from O. longistaminata to O. sativa can be accomplished through conventional hybridization, selection and backcrossing procedures [1]. However, effective introgression is typically hampered by embryo abortion at the initial cross and in the early backcross generations [21]. The strongly perennial and vigorously rhizomatous feature of O. longistaminata enables its extensive distribution throughout Africa [31]. Despite of the excellent gene pool from O. longistaminata, its presence in a rice farm bear a competitive effect on the cultivated rice and serve as an alternate host for some important rice pests and pathogens [3]. In some cases, its seed may be contaminated with the commercial rice grain and hence reduce quality [23]. Thus, local farmers consider O. longistaminata as noxious weed [15]. In Ethiopia, climatic change, overgrazing and human population pressure further shrinks the wetland habitats of O. longistaminata [19]. Studies using AFLP and microsatellite markers have attempted to explore the genetic diversity of O. longistaminata [15,19]. However, detail investigations on its genetic diversity, population structuring and differentiation need to rely on adequate number of accessions and highly resolving molecular tools. Therefore, genetic diversity indices such as; Na, Ne, Ho, He, PIC and Fst from the high length variation of SSR markers along with the highly refined phylogeographic analysis generated from the RADSeq based genome wide SNPs will provide adequate information on the diversity status and phylogeographic relationships of natural populations [9,30]. In the present study, 57 polymorphic SSR markers from each of the 12 chromosomes were used to investigate the genetic diversity and population structure of 360 O. longistaminata accessions from Ethiopia. Moreover, another 87 early maturing O. longistaminata accessions collected from six unique locations of Ethiopia were sampled for the estimation of regional differentiation and phylogenetic tree reconstruction via a multiplexed RADseq based population genomic analyses. Materials and methods Plant material For the SSR analysis, a total of 360 O. longistaminata accessions were randomly sampled from 12 ecologically distinct populations of Ethiopia (Fig. 1; Additional file 1). Additionally, seeds of 87 morphologically vigorous and early maturing O. longistaminata accessions were collected from six narrow niches and sown for the RADSeq analysis (Additional file 2). To collect both leaf and seed samples of the O. longistaminata accessions in the wild and farmers filed, permission was obtained from the local managers and the farmers respectively. Formal identification of the samples was undertaken based on the description listed on flora of Ethiopia and Eritrea [26]. DNA extraction and SSR markers polymerase chain reaction (PCR) Total genomic DNA was extracted from silica gel dried leaves of each accession using CTAB protocol [10]. Quality of the extracted DNA was determined by electrophoresis in a 1% agarose gel and quantification was accomplished using a spectrophotometer. A total of 57 Simple Sequence Repeat markers, with an average of 4.75 per chromosome were used to amplify their targets (Additional file 3). Genome wide coverage and higher PIC value (> 75%) reports from [7,30] respectively, were used to select primers for Polymerase Chain Reaction (PCR). PCR amplification was done in a 10 μl reaction mixture containing 4 μl of genomic DNA, 0.5 μl of each of the two primers (at a concentration of 10 mM), 1.75 μl of a 10X Taq buffer, 0.5 μl of a 2.5 mMdNTP mixture, 0.1 μl of 2.5 U/ μlTaq DNA polymerase and 3 μl of double distilled water. The temperature profile used for PCR amplification consisted of 5 min preheating at 94 °C followed by 34 cycles of denaturation at 94 °C for 45 s, annealing at 55 °C to 67 °C (Additional File 3) for 30 s, extension at 72 °C for 30 s, and a final extension at 72 °C for 7 min. Polyacrylamide gel electrophoresis and SSR alleles scoring The amplified PCR products were electrophoresed in an 8.0% polyacrylamide gel and detected through silver staining as described by Panaud et al. [22] and size was determined using a standardized DNA ladder. Based on the expected PCR product size given in the GRAMENE markers database (http://www.gramene.org), size of the most intensely amplified bands was identified as alleles for the SSR loci. Differently sized amplified bands were scored as different genotypes.

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

3

Fig. 1. Site maps for the Amhara and Gambella O. longistaminata populations of Ethiopia (dots in the two study regions designated locations of each population).

SSR data analysis The number of alleles per locus, expected and observed heterozygosity, Polymorphic Information Content (PIC) and F statistics such as Genetic differentiation (Fst ), Wrights fixation index (Fis ) and Total inbreeding coefficient (Fit ) were calculated using GenAlEx 6.502 [24]. The major allelic frequency and PIC of each marker were computed using Power Marker Version 3.25 [17]. Genetic distance and identity along with the Analysis of Molecular Variance (AMOVA) and principal components analysis (PCA) for the 12 O. longistaminata populations of Ethiopia were also conducted by GenAlEx6.502. Based on their computed pair wise genetic distance value, a dendrogram was drawn for the 360 Oryza longistaminata accessions by using DARwin V6 software [25]. Unweighted Pair Group Method with Arithmetic Mean (UPGMA) for the 12 O. longistaminata populations was drawn by the Poptree2 (http://www.ualberta.ca/∼fyeh/fyeh) program. Population structure was inferred via the STRUCTURE v. 2.3 model [28]. Three replications were run with a burn-in period of 150,0 0 0 and 150,0 0 0 repeats of the MCMC method for K value 1 to 12. The optimal number of subpopulations (K) was determined using the K approach as described by Evanno et al. [12] using STRUCTUREHARVESTER v. 0.6.8. [11]. RADSeq based study RAD library preparation and sequencing Total genomic DNA extracted from young leaves using a standard CTAB procedure [20] was concentrated, quantified by PicoGreen and normalized to a 50 ng/ul concentration [27]. The RAD tag library was prepared as described by Clark et al. [8]. Briefly, genomic DNA of 87 O. longistaminata accessions from six locations of Ethiopia were fragmented by PstI and MspI enzymes in a 96-well format. The digestion was held in a 15 μl reaction volume containing 8μl nuclease free water, 0.25 μl of 20,0 0 0 U/ml of MspI, 0.25 μl of PstI-HF (High-Fidelity), 1.5 μl 1X NE Buffer and 5 μl DNA for 3 h at 37 °C followed by enzymes inactivation at 80 °C for 20 min. In the same tube/plate, sequencing adapters with sample specific indices (barcodes) were added and ligated to the PstI end. An 8.5 μl of ligation master mix consisting of 1 μl 10X ligase buffer with ATP, 0.5 μl 10 uM MspI adapter, 1.5 μl 10 mM ATP, 0.1 μL 2 M U/ml T4 ligase and 5.4 μl nuclease free water was used for the ligation reaction at 27 °C for 2 h and followed by ligase inactivation at 65 °C for 20 min.

4

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138 Table 1 Population based genetic diversity parameters of Oryza longistaminata in Ethiopia. Population

N

Na

Ne

Ho

He

F

PIC

Gambella 1 Gambella 2 Gambella 3 Gambella 4 Gambella 5 Gambella 6 Amhara 1 Amhara 2 Amhara 3 Amhara 4 Amhara 5 Amhara 6 Grand Mean ±SE

28.93 28.68 28.37 27.95 28.02 25.9 28.42 28.21 27.56 26.84 27.7 27.41 27.83±0.15

2.32 2.58 2.32 2.41 2.09 2.28 2.46 2.56 2.37 2.47 2.25 1.7 2.32±0.04

1.64 1.7 1.64 1.76 1.44 1.62 1.65 1.64 1.57 1.54 1.48 1.35 1.59±0.02

0.17 0.19 0.18 0.14 0.16 0.19 0.18 0.17 0.2 0.21 0.16 0.16 0.18±0.009

0.32 0.33 0.3 0.32 0.24 0.31 0.33 0.32 0.31 0.28 0.26 0.18 0.29±0.009

0.46 0.45 0.42 0.55 0.31 0.36 0.38 0.43 0.40 0.37 0.40 0.19 0.40±0.02

77.19% 84.21% 71.93% 84.21% 70.182% 71.93% 85.96% 87.72% 85.96% 84.21% 70.18% 43.86% 76.46%±3.56%

Where; N= Sample size, Na= Actual number of alleles, Ne= Effective number of Alleles, Ho= Observed Heterozygosity, He= Expected Heterozygosity, F= Fixation Index, PIC- Polymorphic Information Content, SE= Standard Error.

The ligated products were pooled in equal volumes and run for 20 min at 100 V in a 2% agarose gel. A smear between 20 0 and 50 0 bp were excised and purified with the QIAquick PCR Purification Kit (Qiagen). As described by Clark et al. [8], PCR was carried out by paired primers that can anneal to the PstI and MspI overhangs. The gel extracted PCR products with a 20 0–50 0 bp size were purified again and a 150-bp paired-end reads were generated by Illumina sequencing. During Illumina based sequencing by synthesis, each fragment was isothermally amplified and form clusters that can be excited by a characteristic fluorescing signal [6]. Signal intensity along with the emission wave length directed the base call. Based on the unique indices introduced during the sample preparation, sequences from a pooled library were separated [16]. RADSeq data processing and analysis In accordance to the barcodes used for each O. longistaminata accession, the process_radtags program from the Stacks v1.48 package was used to demultiplex and sort the FASTQ sequence [6]. During this process, reads having high sequencing quality with the right barcode and an unambiguous RAD site were retained. The demultiplexed reads were aligned against the Oryza glaberima reference genome sequence (version 1, Ensembl release 43) via the Bawtie2 Alignment Tool version 0.11.3 [16]. Based on the alignment positions, population stacks (pstacks) used the Bowtie sequence alignments and built stacks [5]. From such cleaned reads that aligned to a reference genome, polymorphic nucleotide sites (SNPs) were discovered. A set of all possible loci in the populations were merged in to a catalog through the catalog stacks (cstacks) program [6]. The search stacks (sstacks) merged SNPs of the matched loci in the catalog and adjusted haplotype calls [5]. Using the adegenet package of R, genetic relationships among the Ethiopian O. longistaminata accessions were analyzed by the Discriminant Analysis of Principal Components (DAPC) [14]. The UNEAK pipeline in TASSEL 3.0.162 was also used for the base recalibration and phylogenetic tree construction at a minimum call rate of 0.5 and a minimum minor allele frequency of 0.01 [18]. Based on the complete data produced by Stacks, Nei’s genetic distance, pairwise population differentiation (Fst ) and fixation index (Fis ) between the Amhara and Gambella O. longistaminata populations were computed. Results SSR genotyping and comparative polymorphism among the Ethiopian O. longistaminata populations From the 360 O. longistaminata accessions of Ethiopia, a total of 254 alleles ranging from 2 (RM14, RM19, RM134 and RM159) to 8 (RM184 and RM225) with an average of 4.46 alleles per locus were genotyped (Additional file 4). From the assessed microsatellite primers, average polymorphic information content (PIC) of 0.45 was reported. Based on their PIC value, 26 (45.6%) of the 57 SSR markers were highly informative (PIC >0.5). The higher actual and effective number of alleles, observed heterozygosity, gene diversity and PIC values showed majority of the Ethiopian O. longistaminata populations as genetically diverse and highly polymorphic (Table 1). However, Amhara 6 was found to be the least in all genetic diversity parameters. In particular, highest actual (2.58) and effective (1.76) number of alleles were found in Gambella 2 and Gambella 4 populations, respectively. Due to its maximum actual number of alleles, Gambella 2 had higher gene diversity (0.3). Majority of the populations with higher Na, Ne, He also showed higher Fixation Index (>0.46) and Polymorphism (>84%). On pooling the 12 O. longistaminata populations from Ethiopia as one, the SSR polymorphism showed a grand mean of 0.29 gene diversity and 76.46% PIC (Table 1). Generally, all of the assessed O. longistaminata populations showed lower observed than expected heterozygosity.

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

5

Table 2 F statistic and probability P (rand > = data) of the 12 Ethiopian Oryza longistaminata populations. F-Statistics

Value

P

Fst Fis Fit Nm

0.33 0.35 0.54 0.92

0.001 0.001 0.001

Where; Fst = Genetic differentiation, Fis = Wrights fixation index, Fit = Total inbreeding coefficient and Nm = Gene flow. Table 3 Analysis of molecular variance (AMOVA) of the 12 Ethiopian Oryza longistaminata populations at three levels. Source

Df

SS

MS

Est. Var.

Percentage

Among populations Among accessions Within accession Total

11 348 360 719

3103.608 5592.333 1693.00 10,388.942

282.146 16.07 4.703

4.435 5.684 4.703 14.821

30% 38% 32% 100%

Where; Df= Degrees of Freedom; SS= Sum of squares; MS= Mean squares; Est. Var.= estimated variance and%= proportion of genetic variability. Table 4 Nei’s unbiased measures of genetic distance (below diagonal) and genetic identity (above diagonal) among each pair of the twelve O. longistaminata populations. Gambella 1 Gambella 2 Gambella 3 Gambella 4 Gambella 5 Gambella 6 Amhara 1 Amhara 2 Amhara 3 Amhara 4 Amhara 5 Amhara 6 Gambella 1 Gambella 2 Gambella 3 Gambella 4 Gambella 5 Gambella 6 Amhara 1 Amhara 2 Amhara 3 Amhara 4 Amhara 5 Amhara 6

∗∗∗∗

0.11 0.16 0.14 0.20 0.20 0.21 0.22 0.27 0.31 0.30 0.47

0.90

∗∗∗∗

0.12 0.17 0.26 0.26 0.25 0.28 0.29 0.31 0.30 0.50

0.85 0.89

∗∗∗∗

0.87 0.84 0.83

0.18 0.23 0.25 0.24 0.25 0.26 0.32 0.33 0.45

∗∗∗∗

0.82 0.77 0.79 0.84

0.17 0.21 0.24 0.23 0.28 0.38 0.37 0.54

∗∗∗∗

0.82 0.77 0.78 0.81 0.89

0.12 0.27 0.27 0.27 0.34 0.30 0.49

∗∗∗∗

0.81 0.78 0.78 0.79 0.76 0.75

0.29 0.31 0.32 0.34 0.34 0.49

∗∗∗∗

0.80 0.75 0.78 0.79 0.76 0.74 0.88

0.13 0.17 0.33 0.20 0.27

∗∗∗∗

0.77 0.75 0.77 0.75 0.76 0.73 0.84 0.84

0.17 0.32 0.20 0.38

∗∗∗∗

0.73 0.73 0.72 0.68 0.71 0.71 0.72 0.73 0.74

0.31 0.24 0.29

∗∗∗∗

0.74 0.74 0.72 0.69 0.74 0.71 0.82 0.82 0.79 0.79

0.24 0.31

0.32

∗∗∗∗

0.63 0.60 0.64 0.58 0.61 0.61 0.76 0.68 0.75 0.73 0.73 ∗∗∗∗

Genetic differentiation and variance analysis of O. longistaminata in Ethiopia Using the 57 SSR loci, the 12 O. longistaminata populations showed a highly significant (P> 0.001) Genetic differentiation (Fst ) = 0.33, Wrights fixation index (Fis ) = 0.35 and Total inbreeding coefficient (Fit ) = 0.54 (Table 2). An indirect estimate of gene flow using Wright’s Fst showed that the average number of migrants per generation across the populations was Nm = 0.92, with higher gene flow among adjacent populations. For instance, the three highest rates of gene flow computed from the pairwise Fst values of Additional file 5 were between Gambella 1 and Gambella 2 (Nm = 2.73), Gambella 2 and Gambella 3 (Nm = 2.35) and Amhara 1 and Amhara 2 (Nm = 2.27). In particular, the strongly significant Fst value showed a higher divergence or differentiation among the 12 O. longistaminata populations from Ethiopia. The indirect estimation of gene flow using Fst was 0.92. Based on these 12 O. longistaminata populations, the Analysis of molecular variance (AMOVA) showed 30% variance among populations, 38% variance among accessions and 32% variance within accessions (Table 3). Genetic distance and identity among the twelve O. longistaminata populations Based on pairwise comparison of the twelve Ethiopian O. longistaminata populations, the highest distance (0.54) was observed between populations of Gambella 4 and Amhara 6. The least genetic identity (0.6) was found among Gambella 2 and Amhara 6. To the contrary, both highest identity (0.9) and smallest genetic distance (0.11) were found between Gambella 1 and Gambella 2 (Table 4). SSR markers based cluster and structure analysis From the scatter plot, the first two principal components of the 360 O. longistaminata accessions accounted 15.87% of the total variations. The first coordinate precisely separated the Amhara popoulations from those of Gambella. In particular, the

6

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

Fig. 2. Two-dimensional scaling of principal component analysis for 12 Oryza longistaminata populations from 57 SSR markers.

Fig. 3. UPGMA tree of the 12 Oryza longistaminata groups based on corrected genetic distance.

whole accessions from Amhara 6 formed a distinct cluster (Fig. 2). Moreover, the relatively distant position of Amhara 6 to the whole Gambella populations was coincided with the higher genetic distance or lower genetic identity among Amhara 6 and the rest Gambella Populations (Table 4). Unlike the PCA, UPGMA grouped the 12 O. longistaminata populations into two main clusters comprising Amhara 4 and Amhara 6 in the first and the rest 10 Populations in the other (Fig. 3). The second cluster was partitioned in to two subclusters having all the 6 Gambella populations in the first and the remaining 4 Amhara populations in the second. Moreover, the Amhara populations were separated from the Gambella’s and undergone continuous partitioning on the basis of their geographic provenance. Similar to the UPGMA based dendrogram, the hierarchical clustering of the 360 Oryza longistaminata accessions showed 2 main groups categorizing all Amhara accessions in one and the rest Gambella accessions in the other (Fig. 4). As showed in the PCA analysis (Fig. 2), there is no intermingled accession among Amhara 6 and Amhara 4. Despite, the UPGMA implicated their relatedness. Regarding population admixture, almost all O. longistaminata accessions were residing in their own populations. However, very few accessions were observed as admixed. STRUCTURE analysis suggested that the optimal number of subpopulations (K) for all O. longistaminata individuals in the study is two (Fig. 5). At K = 2, all Gambella populations comprised one cluster and the whole Amhara’s hold the second (Fig. 6). Regional genetic diversity indices Geographic region based genetic diversity of the Ethiopian O. longistaminata groups was further assessed by the 57 SSR markers. The Amhara region showed more polymorphism (PIC = 98.25%) and gene diversity (0.42) than the Gambella (He = 0.4 and PIC = 94.74%) (Table 5). Pairwise genetic differentiation, Nei genetic distance and identity among the two

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

7

Fig. 4. Dendrogram based on Ward’s minimum variance for 360 O. longistaminata accessions.

Fig. 5. The modal value K at k = 2 (the highest peak).

Fig. 6. STRUCTURE bar graph of Oryza longistaminata populations at k = 2.

regions were 0.08, 0.15, 0.87 and 0.87, respectively. A Wrights fixation index of 0.5 and total inbreeding coefficient of 0.05 were also recorded in between the Amhara and Gambella regions of Ethiopia (Table 5).

8

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138 Table 5 Allelic diversity, genetic diversity and pairwise differentiation among the Amhara and Gambella regions of Ethiopia. Region

Gambella

Amhara

Actual number of alleles (Na) Effective number of alleles (Ne) Expected heterozygosity (He) Polymorphic Information Content (PIC) Pairwise Nei genetic distance Pairwise Nei genetic identity Pairwise genetic differentiation (Fst) Wrights fixation index (Fis) Total inbreeding coefficient (Fit)

3.89 1.94 0.4 94.74%

3.93 1.98 0.42 98.25% 0.15 0.87 0.08 0.5 0.05

Table 6 Sequence types generated after running the process_radtags program of Stacks. Type of sequences

No of reads

Frequency of reads

Retained reads Ambiguous bar code drops Low quality read drops Ambiguous rad tags drops Total

652,611,547 7,454,638 624,658 3,639,933 664,330,776

0.982 0.011 0.001 0.005 1.00

RADSeq analysis RAD tag generation In this study, a multiplexed and reduced genome sequencing strategy was used to explore genetic diversity and regional differentiation of O. longistaminata in Ethiopia. Whole genome RAD tag libraries at PstI cut sites were created from individually barcoded and sequenced DNA fragments of 87 O. longistaminata accessions of 6 different populations. The processed_radtags program retained 98.2% (652,611,547) reads of the Fastq data. However, the remaining 1.1%, 0.1% and 0.5% represented the ambigious barcode drops, low quality read drops and ambiguous RAD tag drops, respectively (Table 6). Phylogeny and discriminant analysis of principal components (DAPC) Reduced genome multiple alignments by the UNEAK pipeline in TASSEL was used to identify single nucleotide polymorphism markers. From such single nucleotide polymorphisms (SNPs), phylogenetic relationship among 87 O. longistaminata accessions of Ethiopia was visualized by a neighbor-joining tree (Fig. 7). The phylogenetic tree generated in this study was principally organized by the accessions’ geographic origins. For instance, the most distantly related Amhara accessions were in a clear and well-separated cluster. However, accessions from the five Gambella populations were found to be admixtures. Like the Neighbor-joining tree, discriminant analysis of principal components (DAPC) from the adegenet R package of the 87 O. longistaminata accessions of Ethiopia showed lack of genetic differentiation among the Gambella accessions (Fig. 8). However, accessions from Amhara were the most differentiated than that of Gambella. Region based differentiation The Bayesian information criterion (BIC) versus number of clusters graph of this study showed a clear decrease of BIC until k = 2 clusters and after which BIC increased (Fig. 9). In this case, the elbow in the curve also matched with the smallest BIC and clearly indicated retaining of the 2 clusters. From analysis of the resulting multiple alignment, a clear geographical stratification of the Amhara population from the Gambellas along with evidence of admixture between the Gambella populations were observed. This pattern of clustering was totally alike with grouping of the whole accessions in to two clusters (Fig. 10). The RADSeq analysis in the present study showed the Nei’s genetic distance for the Amhara and Gambella regions as 0.2478 and 0.2237, respectively. Pairwise comparison or genetic differentiation (Fst ) between the Amhara and Gambella regions was estimated as 0.0602 (Table 7). Moreover, the genome wide pattern of Wright’s inbreeding coefficient (Fis ) value across the two regions was 0.3268. Discussion Patterns of molecular diversity of O. longistaminata in Ethiopia In this study, the twelve Ethiopian O. longistaminata populations showed high genetic diversity. Inclusion of a number of O. longistaminata accessions from different agroecologies of Ethiopia could be a factor for such high polymorphism.

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

9

Fig. 7. 30,256 RAD Tag SNPs based phylogenetic tree of 87 O. longistaminata accessions of Ethiopia. (branches in a blue are the Amhara accessions and the rest Gambella accessions are designated by red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

10

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

Fig. 8. Accession based DAPC from 30,256 nuclear SNPs. Principal Component I showed regional differentiation and principal component II revealed genetic difference among the accessions. Here, each dot represents a given O. longistaminata accession and the different colors stand for population of each accession. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 9. Graph of BIC versus clusters number of the Ethiopian O. longistaminata accessions from the RADSeq SNPs. Table 7 Region level genetic distance along with pairwise comparison of genetic distance (Fst ) and genetic identity (Fis ) values from the RADSeq analysis. Amhara Nei’s genetic distance Fst with Nei’s correction Wright’s inbreeding coefficient (Fis )

Gambella

0.2478 ± 0.0011 0.2237 ± 0.0009 0.0602 ± 0.0005 0.3268 ± 0.0017

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

11

Fig. 10. Region based DAPC plot generated by adegenet. Principal coordinate II clearly differentiated the Amhara accessions from the Gambella. And principal coordinate I displayed genetic differentiation among accessions of a region.

Genotyping of 192 cultivated rice lines using 61 SSR markers reported a very low PIC than this study [20]. Actually, number of SSR markers used, repeats size of those SSR markers and their position across the genome will complicate the contrast between studies [[7,30]]. Of the diversified Ethiopian O. longistaminata populations, Gambella 2 and Gambella 4 showed the highest genetic diversity indices and hence suggested for the intraspecific crossing. Exceptionally, Amhara 6 was found to be less diverse. As shown in additional file 1, Amhara 6 accessions were the only collections from a rice farm. Hence, the very low diversity status can be associated with the strong weeding out of O. longistaminata by farmers. Despite of its very low diversity status, the high genetic distance of Amhara 6 with Gambella 2 and Gambella 4 is an ideal resource for an interspecific crossing and widening the gene pool of O. longistaminata. Genetic differentiation and feature of natural O. longistaminata populations Among the twelve Ethiopian O. longistaminata populations, highly significant F statistic values have been reported (Table 3). The Fst value in this study showed a wider differentiation among the Ethiopian O. longistaminata groups. Such population differentiation is expected as genetic differentiation among populations is not evenly distributed across the sampled area [4]. On the contrary, the AMOVA implicated the highest component of variance (38%) among accessions (Table 4). Thus, higher genetic variation was found within than among the Ethiopian O. longistaminata populations. The self-incompatible feature of O. longistaminata possibly raised the gene flow and variation among accessions of a given population. All of the Ethiopian O. longistaminata populations showed lower observed heterozygosity than the expected (Table 1). This observation inferred a significant deviation of the Ethiopian O. longistaminata populations from the Hardy–Weinberg expectation. Fixation index of this study (Fis =0.35) was however lower than the Fis =0.75 of the 320 O. longistaminata accessions of [19] and Fis =0.958 from 900 cultivated, wild and weedy rice individuals of [4]. Population structure, regional differentiation and geographic factors As shown in Fig. 2, coordinate 1 of the PCoA existed in between the Amhara and Gambella O. longistaminata accessions. Almost all of those accessions were clustered on the basis of their geographic locations. Similarly, both UPGMA and hierarchical clustering methods showed further division of the main clusters based on respective geographic locations. However, very few O. longistaminata accessions looked admixed. Such intermingled accessions in a cluster could be attributed to the high gene flow. Population structure of this study also sorted the Amhara and Gambella O. longistaminata populations of Ethiopia in to two independent clusters (Fig. 6). From this structure analysis, it is very clear that the Amhara and Gambella O. longistaminata groups of Ethiopia represent two different gene pools. Pairwise geographical differentiation analysis among the Amhara and Gambella regions showed a higher differentiation (Fst = 0.08) than differentiation report of the same regions (Fst = 0.064) by Melaku et al. [19]. Comparison between the two regions for all genetic variability parameters such as Na, Ne, He and PIC showed the Amhara to be higher than Gambella. This relatively wider genepool in the Amhara region can be attributed to its altitude, environmental heterogeneity and scarce water level [15]. It is known that elevation has a particular importance on affecting phenology, population size, population density, gene flow, genetic drift and genetic structure. However, genetic differentiation between the two regions might be a function of climatic difference as well. Hence, the very wide altitudinal, mean temperature, annual rainfall differences along with diverse modes of adaptations to variable soil types and ecological conditions affected the genetic differentiation among the two eco-geographic regions of Ethiopia.

12

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

Similar to the pattern of regional comparison from the SSR markers of this study (Table 5), the genetic distance generated from the RADSeq analysis was higher in the Amhara accessions than with in the Gambella region (Table 7). This result is expected in populations experiencing significant levels of gene flow [19]. The DAPC and phylogenetic tree generated from the genome wide RADSeq data showed the strong impact of regional difference on the genetic relationships of the Ethiopian O. longistaminata accessions. The admixtured Gambella accessions in both phylogenetic and DAPC analysis explicitly implicated the strong effect of edaphic, altitudinal and other physical factors. The two independent clusters inferred from the BIC versus number of clusters graph of the whole accessions were also in harmony with the region based DAPC. In the present study, both SSR and RADSeq data were used to draw pairwise comparisons of the two ecological regions of Ethiopia. Based on the RADSeq data, genetic differentiation between the Amhara and Gambella regions was estimated as 0.0607. This Fst value which is very close to zero could not show a biologically significant differentiation among the regions. In contrast, the resolving power generated from the SSR markers showed a better level of genetic differentiation (Fst ) between the Amhara and Gambella regions. This variation among the two molecular tools could be associated with the higher variance in the allele frequency of SSR markers. Wright’s inbreeding coefficient (Fis ), which showed loss of heterozygosity with respect to the expectation under Hardy– Weinberg Equilibrium theory is used to detect a hidden population structure [13]. The positive Fis from the whole genome reduction technique indicated the nonrandom mating or cryptic population structure [29]. According to Catchen et al. [6], many genomic regions exhibited higher Fis values from the recurrent introgressive hybridization of the differentiated populations. However, Fis from the genome wide analysis of the two regions was very less than from the value of the 57 SSR markers of this study. This pattern of low average Fis and variance can be an indication for the older and stable populations or assortative mating with in the regions [5]. Conclusion This study assessed the genetic diversity and relationship of Ethiopian O. longistaminata populations via 57 SSR markers across rice genome and their regional differentiation using 30, 256 SNPs that were generated from RADSeq technique. The population structuring on the basis of the 57 SSR markers also indicated two gene pools comprising the Amhara populations in one and the Gambella populations in the second subgroup. On the basis of their geographic regions, the assessed O. longistaminata populations of Ethiopia were genetically different and their further subgrouping was concurrent with geographical proximity. Similarly, the RADSeq analysis showed region based genetic distinctness of O. longistaminata in Ethiopia. Moreover, the high level of genetic variation from the natural O. longistaminata populations can be utilized in rice improvement programs. Declaration of Competing Interest The authors declare that they have no competing interests. Acknowledgments This work is part of the first author’s PhD thesis. The authors would like to thank Addis Ababa University and Yunnan University for the material and technical supports of this research. We would also like to thank the Ethiopian Biodiversity Institute (EBI) for allowing the transfer of the study material to China. Funding This work is financially supported by grants from the Yunnan province Science and Technology Department, China [2016BB001] for the SSR markers based genetic diversity study, National key research and development plan of China [2016YFD0300508] for the regional differentiation study of the Ethiopian O. longistaminata accessions via RADSeq technique and Yunnan University services in Yunnan [2016ZD03] for providing the whole laboratory facilities. Authors’ contributions FH and GH conceived the project. GM, SZ and JZ collected and prepared the samples. FH and GM designed the experiment. GM, LH and GH performed the experiment. GM and ML performed the genotyping, data analysis and drafting of the manuscript. FH, TH, and KT revised the manuscript. All authors read and approved the final manuscript. Supplementary materials Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.sciaf.2019.e00138.

G. Melaku, M. Labroo and H. Liyu et al. / Scientific African 6 (2019) e00138

13

References [1] D.S. Brar, G.S. Khush, Alien gene introgression in rice, Plant Mol. Biol. 35 (1997) 35–47. [2] D.S. Brar, G.S. Khush, Utilization of wild species of genus oryza in rice improvement, in: J.S. Nanda, S.D. Sharma (Eds.), Monograph on Genus Oryza, Science Publ., Enfield, 2003, pp. 283–309. [3] I. Buddenhagen, Bacterial blight of rice appears to be indigenous on wild rice species in west Africa, Int. Rice Res. Newsl. 7 (1982) 5. [4] Q. Cao, B. Lu, H. Xia, J. Rong, F. Sala, A. Spada, F. Grassi, Genetic diversity and origin of weedy rice (Oryza sativa f. spontanea) populations found in north-eastern china revealed by simple sequence repeat (SSR) markers, Ann. Bot. 98 (2006) 1241–1252. [5] J. Catchen, S. Bassham, T. Wilson, M. Currey, C. O’Brien, Q. Yeates, W.A. Cresko, The population structure and recent colonization history of oregon threespine stickleback determined using restriction-site associated DNA-sequencing, Mol. Ecol. 22 (2013) 2864–2883. [6] J. Catchen, P. Hohenlohe, S. Bassham, A. Amores, W. Cresko, W. Stacks, an analysis tool set for population genomics, Mol. Ecol. 22 (2013) 3124–3140. [7] X. Chen, S. Temnykh, Y. Xu, G.Y. Cho, R.S. McCouch, Development of a microsatellite framework map providing genome-wide coverage in rice (Oryza sativa L.), Theor. Appl. Genet. 95 (1997) 553–567. [8] V.L. Clark, E.J. Brummer, K. Głowacka, M. Hall, K. Heo, J. Peng, T. Yamada, H.J. Yoo, Y.C. Yu, H. Zhao, P.S. Long, J.E. Sacks, A footprint of past climate change on the diversity and population structure of miscanthus sinensis.", Ann. Bot. (2014), doi:10.1093/aob/mcu084. [9] J.W. Davey, M.L. Blaxter, RADSeq: next-generation population genetics, Brief Funct Genom. 9 (2010) 416–423. [10] J.J. Doyle, J.L. Doyle, A rapid dna isolation procedure for small quantities of fresh leaf tissue, Phytochem. Bull 19 (1987) 11–15. [11] D.A. Earl, B.M. von Holdt, Structure harvester: a website and program for visualizing structure output and implementing the evanno method, Conserv. Genet. Resour. (2011), doi:10.1007/s12686- 011- 9548- 7. [12] G. Evanno, S. Regnaut, J. Goudet, Detecting the number of clusters of individuals using the software structure: a simulation study, Mol. Ecol. 14 (2005) 2611–2620. [13] K.E. Holsinger, B.S. Weir, Genetics in geographically structure populations: defining, estimating and interpreting fst, Nat. Rev. Genet. 10 (2009) 639–650. [14] T. Jombart, S. Devillard, F. Balloux, Discriminant analysis of principal components: a new method for the analysis of genetically structured populations, BMC Genet. 11 (2010) 94. [15] D.K. Kiambi, H.J. Newbury, B.V. Ford-Lloyd, I. Dawson, Contrasting genetic diversity among oryza longistaminata (A. chev. et roehr) populations from different geographic origins using aflp, Afri. J. Biotechnol. 4 (2005) 308–317. [16] B. Langmead, Aligning short sequencing reads with bowtie, Curr. Protoc. Bioinform. (2010), doi:10.1002/0471250953.bi1107s32. [17] K. Liu, S.V. Muse, PowerMarker: integrated analysis environment for genetic marker data, Bioinformatics 21 (2005) 2128–2129. [18] et F. Lu, A.E. Lipka, J. Glaubitz, R. Elshire, Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based snp discovery protocol, PLoS Genet. 9 (2013) e1003215. [19] G. Melaku, T. Haileselassie, T. Feyissa, S. Kiboi, Genetic diversity of the African wild rice (Oryza longistaminata chev. et roehr) from Ethiopia as revealed by SSR markers, Genet. Resour. Crop Evol. 60 (2013) 1047–1056. [20] V.V. Nachimuthu, R. Muthurajan, S. Duraialaguraja, R. Sivakami, A.B. Pandian, G. Ponniah, K. Gunasekaran, M. Swaminathan, K.K. Suji, R. Sabariappan, Analysis of population structure and genetic diversity in rice germplasm using ssr markers: an initiative towards association mapping of agronomic traits in oryza sativa, Rice 8 (2015) 1–24. [21] H.I. Oka, Origin of Cultivated Rice, Japan Scientific Societies Press, Tokyo, 1988. [22] O. Panaud, X. Chen, S.R. McCouch, Development of microsatellite markers and characterization of simple sequence length polymorphism (SSLP) in rice (Oryza sativa L.), Mol. Gen. Genet. 252 (1996) 597–607. [23] C. Parker, M.L. Dean, Control of wild rice in rice, Pestic. Sci. 7 (1976) 403–416. [24] R. Peakall, P.E. Smouse, GenAlEx 6.5: genetic analysis in excel. population genetic software for teaching and research-an update, Bioinformatics 28 (2012) 2537–2539. [25] X. Perrier, J.P. Jacquemoud-Collet. (2006). DARwin software. htp:/ww.darwin.cirad.fr/darwin. [26] S. Phillips, Poaceae (Gramineae), in: I. Hedberg, S. Edwards (Eds.), Flora of Ethiopia and Eritrea, 7, The national herbarium, Addis Ababa University, Ethiopia and department of systematic botany, Uppsala University, Sweden, 1995, pp. 9–10. [27] J.A. Poland, P.J. Brown, M.E. Sorrells, J.L. Jannink, Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping by sequencing approach, PLoS One 7 (2) (2012) e32253, doi:10.1371/journal.pone.0032253. [28] J.K. Pritchard, M. Stephens, P. Donnelly, Inference of population structure using multilocus genotype data, Genetics 155 (20 0 0) 945–959. [29] M. Slatkin, Inbreeding coefficients and coalescence times, Genet. Res. 58 (1991) 167–175. [30] S. Temnykh, W.D. Park, N. Ayers, S. Cartinhour, N. Hauck, L. Lipovich, Y.G. Cho, T. Ishii, S.R. McCouch, Mapping and genome organization of microsatellite sequences in rice (Oryza sativa L.), Theor. Appl. Genet. 100 (2000) 697–712. [31] P.W. Wambugu, M. Brozynska, A. Furtado, D.L. Waters, R.J. Henry, Relationships of wild and domesticated rices (Oryza aa genome species) based upon whole chloroplast genome sequences, Sci. Rep. 5 (2015) 13957.