De novo assembly and characterization of transcriptome using Illumina sequencing and development of twenty five microsatellite markers for an endemic tree Juglans hopeiensis Hu in China


Biochemical Systematics and Ecology 63 (2015) 201–211

Contents lists available at ScienceDirect

Biochemical Systematics and Ecology journal homepage: www.elsevier.com/locate/biochemsyseco

Yi-Heng Hu a, Peng Zhao a,*, Qiang Zhang a, Yang Wang a, Xiao-Xiao Gao a, Tian Zhang a, Hui-Juan Zhou a, Meng Dang a, Keith E. Woeste b

a Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, No. 229, Street Taibaibei, Xi'an, Shaanxi, 710069, China

b USDA Forest Service Hardwood Tree Improvement and Regeneration Center (HTIRC), Department of Forestry and Natural Resources, Purdue University, 715 West State Street, West Lafayette, IN, 47907, USA

Article info

Article history: Received 20 June 2015; Received in revised form 4 October 2015; Accepted 17 October 2015; Available online 2 November 2015

Keywords: Microsatellites; Juglans hopeiensis; Transcriptome; Endemic; Chinese walnut; Genetic diversity

Abstract

The Chinese walnut (Juglans hopeiensis Hu) is an endemic temperate tree species that is narrowly distributed in China. Few species-specific molecular markers are available for studying the genetic diversity of this walnut. In this study, more than 44 million sequencing reads were generated using Illumina sequencing technology. De novo assembly yielded 93,822 unigenes with an average length of 731 bp. Based on sequence similarity searches against known proteins, a total of 39,708 (42.3%) unigenes were annotated. A search against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database returned matches for 15,903 (17.0%) unigenes. To contribute to the conservation and management of the species, twenty-five microsatellite markers identified in J. hopeiensis were screened for polymorphism across 27 Chinese walnut individuals from two locations. The number of alleles ranged from two to nine, observed heterozygosity ranged from 0.016 to 0.933 (mean 0.468), and expected heterozygosity ranged from 0.022 to 0.823 (mean 0.462). The polymorphic information content (PIC) ranged from 0.012 to 0.831 (mean 0.437). These new microsatellite markers will be useful for studying the population genetic structure, evolutionary ecology, conservation, and genetic breeding of this endemic walnut and of other Juglans species.

© 2015 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel./fax: +86 02988302411.
http://dx.doi.org/10.1016/j.bse.2015.10.011
0305-1978/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

The Chinese walnut (Juglans hopeiensis Hu) is moderately supported as sister to the remaining three Asian butternuts, Juglans mandshurica Maxim., Juglans cathayensis Dode and Juglans ailantifolia Carr. It belongs to Juglans section Cardiocaryon, whose members are monoecious, wind-pollinated deciduous trees native to East Asia (Aradhya et al., 2007). The species is valuable for its wood and nuts. The high-quality wood can be used to make gun butts and aircraft. The nut shell is hard, large and ornamental, and the nuts are used as art objects in China. This walnut is an endangered and endemic tree species that is narrowly distributed in northern Hebei province near Beijing. As an artistically important resource, J. hopeiensis has suffered severe population decline due to habitat destruction and a lack of conservation awareness. Previous studies on J. hopeiensis have mainly focused on molecular phylogeny (Aradhya et al., 2007), traditional genetic breeding, and genetic relationships and diversity among germplasm collections of this species.

Depending on the sequences from which they are identified, simple sequence repeats (SSRs) can be divided into genomic SSRs and expressed sequence tag SSRs (EST-SSRs). Traditional methods to isolate and identify genomic SSRs are costly, labor-intensive, and time-consuming (Zane et al., 2002; Squirrell et al., 2003). Genomic SSRs have proved to be useful and effective molecular tools in surveys of genetic variation and population structure for several decades (Robichaud et al., 2006; Kalia et al., 2011; Almeida et al., 2014). EST-SSRs, in contrast, lie in transcribed regions that are more evolutionarily conserved than non-coding sequences, and therefore have relatively high transferability. To date, a large number of microsatellite markers have been employed in Juglans (walnut) to investigate the genetic diversity and differentiation of populations. Nuclear microsatellite markers have been developed in other Juglans species, such as black walnut (Juglans nigra L.) (Woeste et al., 2002), Persian walnut (Juglans regia) (Dangl et al., 2005), and butternut (Juglans cinerea L.) (Ross-Davis and Woeste, 2008).
EST-SSR development through transcriptome sequencing is becoming preferred over other methods in both model and non-model plants (Zeng et al., 2010; Li et al., 2012; Ji et al., 2014). Although some SSR markers have been obtained by mining EST databases for J. regia (Zhang et al., 2010) and J. cathayensis (Dang et al., 2015), neither nuclear genomic SSR markers nor EST-SSR markers have been developed for J. hopeiensis. Thus, there is a need to develop novel, species-specific microsatellite markers to investigate genetic variation in J. hopeiensis.

Large collections of transcriptome or EST sequences are an efficient way to generate functional genomic-level data for non-model organisms and are invaluable for the development of molecular markers (Wei et al., 2011). Recently, an increasing number of EST datasets have become available for model and non-model organisms, but relatively few ESTs are currently available for J. hopeiensis. In this study, we used Illumina paired-end sequencing to characterize the pooled transcriptome of buds, leaves, and fruit of J. hopeiensis and to develop polymorphic EST-derived SSR markers. We screened 27 individuals from two locations within its geographic range. These EST-SSRs will be useful for investigating the population genetics, phylogeography, conservation, and genetic history of J. hopeiensis, and for evolutionary ecology and genetic breeding in other Juglans species.

2. Materials and methods

2.1. Sample collections, DNA extraction, and RNA extraction

Twenty-seven leaf and seed samples were collected in 2013 from natural populations of J. hopeiensis at two locations in China (LS = 15, 115°33′ E, 39°32′ N; LM = 12, 115°26′ E, 39°58′ N) (Table 1). DNA was extracted following the methods described by Doyle and Doyle (1987) and Zhao and Woeste (2011), and stored at −20 °C.
Fresh leaves, buds, and fruit were collected separately from one Chinese walnut tree (J. hopeiensis) in the Xiaolongmen National Forest Park, immediately frozen in liquid nitrogen and then stored at −80 °C. Total RNA was extracted using the Plant RNA Kit (OMEGA Bio-Tek, Norcross, GA, USA). RNA degradation and contamination were monitored on 1% agarose gels. RNA purity was assayed using the NanoPhotometer® spectrophotometer (IMPLEN, Westlake Village, CA, USA), and RNA concentration was measured using the Qubit® RNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).

2.2. RNA-seq library preparation for transcriptome sequencing

RNA-seq libraries were generated using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, Beverly, MA, USA) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from 3 µg of total RNA using poly-T oligo-attached magnetic beads. Remaining overhangs were converted into blunt ends via the exonuclease/polymerase activities of DNA polymerase and RNase H. After adenylation of the 3′ ends of the DNA fragments, NEBNext Adaptors with a hairpin loop structure were ligated to prepare for hybridization. To enrich for cDNA fragments of approximately 150–200 bp, the library was first purified using the AMPure XP system (Beckman Coulter, Beverly, MA, USA). Then 3 µL of USER Enzyme (NEB, Beverly, MA, USA) was incubated with the size-selected, adaptor-ligated cDNA at 37 °C for 15 min, followed by 5 min at 95 °C. Next, PCR was performed using Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. Finally, PCR products were purified (AMPure XP system) and library quality was assessed on DNA high-sensitivity chips in the Agilent Bioanalyzer 2100 system.

2.3. Transcriptome assembly and gene annotation

Illumina HiSeq2000 sequencing was performed by Novogene Bioinformatics Technology Co., Ltd., Beijing, China (www.novogene.cn). De novo transcriptome assembly was performed with Trinity (Grabherr et al., 2011), with all parameters set to their defaults. To annotate the assembly, the J. hopeiensis unigenes were searched against public databases, including the NCBI non-redundant protein sequences (NR) database, the NCBI nucleotide sequences (NT) database, the eukaryotic ortholog groups (KOG) database, the KEGG ortholog (KO) database, the Swiss-Prot protein database, the Gene Ontology (GO) database, and the protein family (PFAM) database.

Fig. 1. The assembly length distribution of Juglans hopeiensis (a, the transcript length distribution; b, the unigene length distribution).

2.4. Discovery of EST-SSRs, primer design

Microsatellites were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html). Sequences with at least five uninterrupted repeats of a motif were randomly selected for primer design with Primer3 (http://primer3.sourceforge.net/releases.php). Primer design parameters were set as follows: length range = 18–23 nucleotides, with 21 as optimum; PCR product size range = 100–400 bp; optimum annealing temperature = 55 °C; and GC content = 40–60%, with 50% as optimum. Leaves from a total of 27 individuals from two populations were used for DNA extraction (Table 1).

2.5. Amplification conditions and validation of primers

DNA was resuspended in 50 µL of water and diluted to a final concentration of 10 ng/µL, then stored at −20 °C. PCR amplification was carried out on a PTC-200 Thermal Cycler (MJ Research, Waltham, MA, USA) in 10 µL reaction volumes (5 µL of 2× PCR Master Mix, 0.2 µL of each primer, 1 µL of BSA, and 1 µL of 10 ng/µL DNA). The PCR program was 3 min at 94 °C, followed by 35 cycles of 15 s at 93 °C, 1 min at the annealing temperature (TA) (Table 2), and 30 s at 72 °C, with a final extension of 10 min at 72 °C. PCR products were separated by size on 10% polyacrylamide gels and visualized by silver staining. Fragment sizes at each locus were estimated using Quantity One Software (Bio-Rad Laboratories, Hercules, CA, USA) and a 50 bp ladder size standard.
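The selection criteria stated in Section 2.4 (at least five uninterrupted repeats of a non-mononucleotide motif; primers of 18–23 nucleotides with 40–60% GC content) can be illustrated with a minimal Python sketch. This is not the MISA or Primer3 implementation: the function names, the regular-expression search, and the flanking sequence in the example are assumptions introduced here for illustration only, and the sketch ignores annealing temperature and product size.

```python
import re


def find_ssrs(sequence, min_repeats=5, motif_lengths=range(2, 7)):
    """Return (motif, repeat_count, start) for each perfect SSR found.

    Mononucleotide runs are skipped by default (motif_lengths starts at 2).
    """
    sequence = sequence.upper()
    hits = []
    for k in motif_lengths:
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Ignore motifs that are only a shorter unit repeated, e.g. "ATAT".
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((motif, len(match.group(0)) // k, match.start()))
    return hits


def primer_within_limits(primer, min_len=18, max_len=23, gc_min=40.0, gc_max=60.0):
    """Check primer length and GC content against the stated design ranges."""
    primer = primer.upper()
    gc_percent = 100.0 * (primer.count("G") + primer.count("C")) / len(primer)
    return min_len <= len(primer) <= max_len and gc_min <= gc_percent <= gc_max


# Hypothetical unigene fragment: invented flanks around a (GGT)6 repeat.
example = "AACC" + "GGT" * 6 + "AACC"
print(find_ssrs(example))
# Forward primer of locus JH6514 as listed in Table 2.
print(primer_within_limits("CGTTACGTCGGGAGGATGAG"))
```

The motif filter keeps a (AT)10 run from also being reported as (ATAT)5; dedicated tools handle compound and interrupted repeats, which this sketch does not attempt.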
Twenty-seven individuals from two populations were used to evaluate amplification success and polymorphism of the microsatellite loci.

2.6. Microsatellites data analysis

Genetic diversity per locus and population was evaluated through the following descriptive summary statistics: number of alleles (NA), observed (HO) and expected (HE) heterozygosity, and inbreeding coefficient (FIS), using the program GenAlEx 6.5 (Peakall and Smouse, 2012). GENEPOP version 4.2 (Rousset, 2008) and Arlequin 3.5 (Excoffier and Lischer, 2010) were used to test for Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) at all loci. The program CERVUS version 3.0 (Kalinowski et al., 2007) was used to calculate polymorphic information content (PIC). The presence of null alleles was checked using MICRO-CHECKER 2.2.3 (Van Oosterhout et al., 2004). Principal coordinate analysis (PCoA) was performed using GenAlEx 6.5 (Peakall and Smouse, 2012).

3. Results

3.1. Sequence assembly and microsatellites (SSR) enrichment

A total of 4.44 G of clean, high-quality data, comprising 44,431,989 reads, was used for de novo transcriptome assembly with Trinity (Grabherr et al., 2011). GC content was 46.6%. De novo assembly generated 217,803 transcripts, including 93,822 unigenes. Transcript length varied from 201 bp to 13,169 bp, with an average of 1058 bp and an N50 of 1677 bp. Unigene length varied from 201 bp to 13,169 bp, with an average of 731 bp and an N50 of 1259 bp (Fig. 1). The maximum read count was 1,460,613.54 and the maximum RPKM was 26,837.25 (Appendix Fig. S1). Transcripts longer than 500 bp accounted for about 61.9% of the total, while 83,080 transcripts were between 200 bp and 500 bp long. Unigenes longer than 500 bp accounted for about 40.7% of the total, while 55,665 unigenes were between 200 bp and 500 bp long (Fig. 1; Appendix Fig. S2; Appendix Fig. S3).
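For readers unfamiliar with the N50 statistic reported above: it is the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative assumptions and are not taken from the J. hopeiensis assembly or from Trinity's own scripts.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy lengths (bp), not real assembly data.
toy_lengths = [201, 350, 731, 1259, 2400]
print(n50(toy_lengths))
```

Because long sequences are weighted by their length, the N50 is usually larger than the mean length, as in the values reported for both transcripts and unigenes.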
Table 1. Twenty-seven individuals from two locations of Juglans hopeiensis used in this study.

Collection site | Population code | Species | Sample size | Longitude (E) | Latitude (N) | Elevation/m
Laishui, Hebei | LS | J. hopeiensis | 15 | 115.5635 | 39.5639 | 311
Xiaolongmen, Beijing | LM | J. hopeiensis | 12 | 115.4494 | 39.9839 | 1267
Total | | | 27 | | |

Table 2. Characterization of 25 microsatellite loci in 27 individuals of Juglans hopeiensis Hu.

Locus | Primer sequences (5′–3′) | Unigene size (bp) | Repeat | TA (°C) | NA | Size range (bp) | HO (LS/LM) | HE (LS/LM) | PHW (LS/LM)
JH6514 | F: CGTTACGTCGGGAGGATGAG; R: CCTCGTTCGTAGTCTCAGCC | 1453 | (TTAGGG)6 | 55 | 4 | 132–162 | 0.467/0.250 | 0.358/0.219 | 0.239/0.621
JH2678 | F: CGACAGCCTCACCACTTTCT; R: GAAGGTGGATTCGCAACAGC | 4283 | (CT)8t(TC)10 | 50 | 4 | 203–215 | 0.400/0.917 | 0.611/0.497 | 0.399/0.003**
JH9978 | F: ACCTTCCCTGCTCCTCTCTT; R: GAGCCTTGTGGAAGCAAACG | 4384 | (GGT)6 | 55 | 6 | 189–204 | 0.867/0.333 | 0.800/0.597 | 0.062/0.012**
JH2753 | F: CAGTTTTGGCCAGCTGCAAT; R: TGTGCCCATGCTAAGACTGG | 454 | (GCT)6 | 55 | 9 | 188–212 | 0.800/0.250 | 0.716/0.823 | 0.115/0.014**
JH2751 | F: ATTGTTGTTGTTGCGGAGGC; R: TCACACCTTTCCTCTCCCCT | 341 | (GTGCG)5 | 55 | 5 | 170–190 | 0.333/0.833 | 0.436/0.747 | 0.391/0.722
JH8883 | F: CCGAACAGCTTGCACTTGAC; R: ATGGTGTACCTAACGAGCCT | 963 | (TAAT)5 | 55 | 3 | 213–229 | 0.267/0.500 | 0.231/0.625 | 0.551/0.008***
JH1908 | F: GAAAAGCATGGTCCTGCTGC; R: ATTGAGCGACGAAAAGGGGT | 663 | (CTG)7 | 55 | 6 | 211–238 | 0.867/0.750 | 0.607/0.649 | 0.020*/0.084
JH6876 | F: CGAACCTAAGGCGTTCCTGT; R: TGGCGGCCATGGAGTATTTT | 1440 | (CT)8t(TC)10 | 50 | 2 | 266–280 | 0.667/0.917 | 0.480/0.497 | 0.132/0.034*
JH6044 | F: CCTCGTCTCCTCCCCTAACA; R: GTAGGATAGTGTGGCGTCGG | 1403 | (CCA)7 | 60 | 6 | 222–240 | 0.600/0.622 | 0.651/0.028 | ND/ND
JH0405 | F: GTAGGATAGTGTGGCGTCGG; R: CCTCGTCTCCTCCCCTAACA | 1285 | (TGG)7 | 55 | 5 | 218–240 | 0.600/0.083 | 0.642/0.080 | 0.013*/0.880
JH8061 | F: AGCAATCGAACATGAGGCCA; R: TTCCCCTTACTTGACGCACC | 2294 | (AAG)6 | 50 | 2 | 322–328 | 0.028/0.250 | 0.041/0.219 | ND/0.621
JH1424 | F: ACTGTATCGTCTTCCAATGTGGA; R: CGCTTTGGATAAATGGGGCG | 386 | (TCA)6 | 55 | 3 | 251–260 | 0.400/0.417 | 0.551/0.330 | 0.002**/0.362
JH9127 | F: AGTCACAGTACTGGATAAAAGCAGA; R: TGCTCAGGAGACAGTTGACT | 397 | (ATT)7 | 55 | 2 | 241–253 | 0.032/0.900 | 0.038/0.495 | ND/0.010**
JH9543 | F: TCTCAACCTCGGCTTGTGTC; R: CCTCTAAACCATCGCACCCA | 1858 | (TA)7tgggg(A)10 | 58 | 2 | 275–290 | 0.800/0.061 | 0.500/0.068 | 0.000***/ND
JH2096 | F: AAGCTATGTTGGCTGCTGGT; R: ATTGTTCAGCGGTTGCCCTA | 2234 | (GCA)7 | 53 | 4 | 258–276 | 0.933/0.583 | 0.558/0.684 | 0.002**/0.364
JH4093 | F: CACCCACCAAGCCATACCAT; R: CTTCGGCGTCCTTCTTCGTA | 467 | (CTC)6 | 58 | 3 | 278–296 | 0.400/0.083 | 0.491/0.219 | 0.107/0.032*
JH4548 | F: TCTGAGGAAGCTGCATGGAA; R: AACTCTGGACACATGCCGC | 2220 | (TGCA)6 | 55 | 8 | 264–312 | 0.800/0.917 | 0.816/0.719 | 0.016*/0.008**
JH1195 | F: GGTGCGAGGGATGATGATGT; R: CGCAGGTTGAAGTCCTCTGT | 2194 | (GA)10 | 58 | 3 | 204–208 | 0.267/0.250 | 0.584/0.469 | 0.033*/0.106
JH0608 | F: GAAGGAGACAATGGCATTAGCT; R: AGATCGGCCTCCAGCTAGTT | 2264 | (CTAG)5 | 55 | 4 | 296–328 | 0.467/0.583 | 0.584/0.622 | 0.005**/0.512
JH8168 | F: GTGTGTGGTCCTCCAACTCT; R: TTTCAAAGCGACTGCACAGC | 3848 | (TA)10 | 55 | 7 | 208–230 | 0.356/0.417 | 0.624/0.601 | 0.030*/0.073
JH0190 | F: CTCGGAAATCCTCGGCAACT; R: ACAAGTGGCTTTTTGCGAGT | 402 | (CA)6tatgg(T)13 | 55 | 2 | 149–161 | 0.016/0.083 | 0.022/0.080 | ND/0.880
JH2576 | F: AGTGTGAGACAACCTTGGTGG; R: TCCTGCAAATTGACATGGTTTTCT | 1867 | (GTCT)5 | 55 | 3 | 213–225 | 0.354/0.258 | 0.452/0.325 | 0.210/0.062
JH9664 | F: TGTCGAGTTGTGCTTGTGGA; R: TGAGAAGGGAGATGGCGAGA | 1001 | (AAG)8 | 53 | 2 | 197–203 | 0.521/0.368 | 0.542/0.421 | ND/ND
JH4114 | F: TGTACCTGTACGTGCTGCTG; R: TGCAATCCAACGGCTCTGAT | 1807 | (GGC)7 | 55 | 2 | 217–223 | 0.325/0.230 | 0.423/0.368 | ND/ND
JH5908 | F: ACTTGCTCCTTTGTCACCCC; R: GTGCACCCCCGAAAAATTCC | 1859 | (TC)10 | 53 | 2 | 240–244 | 0.562/0.354 | 0.651/0.352 | ND/ND

ND, not done; PHW, probability of Hardy–Weinberg equilibrium (*P < 0.05; **P < 0.01; ***P < 0.001); TA, annealing temperature; NA, number of alleles; HO, observed heterozygosity; HE, expected heterozygosity.

In total, 16,699 sequences contained SSRs, which is 17.8% of all unigenes from the transcriptome, and the distribution density was 181.26 per Mb. Among these loci, mononucleotide repeats were the most frequent (7208, or 43.2%), followed by dinucleotide repeats (6180, or 37.0%) and trinucleotide repeats (2288, or 13.7%) (Fig. 2). SSRs with ten tandem repeats (3682, or 21.0%) were the most common, followed by six tandem repeats (14.3%), eleven tandem repeats (9.8%), seven tandem repeats (9.3%), five tandem repeats (9.0%), eight tandem repeats (6.7%), and >11 tandem

repeats (20.6%). The dominant repeat motif in EST-SSRs was A/T (44.3%), followed by AG/CT (32.8%), AAG/CTT (4.7%), AT/AT (3.6%), and AC/GT (2.7%) (Appendix Table S1).

Fig. 2. Simple sequence repeat (SSR) motif distribution of Juglans hopeiensis Hu. Mono = single nucleotide repeats, Di = double nucleotide repeats (dinucleotide), Tri = three nucleotide repeats (trinucleotide), Tetra = four nucleotide repeats, Penta = five nucleotide repeats, and Hexa = six nucleotide repeats.

3.2. Gene annotation of J. hopeiensis transcriptomes

In this study, the Blast2GO program (Conesa et al., 2005) was first used to obtain GO annotations for the assembled unigenes, and the GO functional classification of these unigenes was then performed with WEGO software (Ye et al., 2006). Of the 93,822 unigenes, 39,708 (42.3%) were annotated to GO classes (Fig. 3). Assignments to biological process made up the majority (103,733, 45.9%), followed by cellular component (71,745, 31.8%) and molecular function (50,278, 22.3%). Cellular process (23,927, 10.6%) and metabolic process (22,848, 10.1%) were the first and second largest of the 21 biological process categories, while the smallest were rhythmic process (35, 0.015%) and cell killing (30, 0.013%). For cellular component, cell (14,690, 20.5%), cell part (14,657, 20.4%) and organelle (10,322, 14.4%) were the largest of the 19 level-3 categories, while the smallest were synapse part and synapse. For molecular function, binding (23,336, 46.4%) and catalytic activity (19,739, 39.3%) were the largest of the 13 level-2 categories, while the smallest was extracellular matrix. The most highly represented terms in the 11 level-2 categories were binding and catalytic activity, while the least represented were receptor regulator activity and metallochaperone activity (Fig. 3).

3.3. Functional classification by the orthologous groups (COG)

The COG protein database is an attempt to phylogenetically classify the complete complement of proteins encoded in a complete genome. Each COG is a group of three or more proteins that are inferred to be orthologs, i.e., direct evolutionary counterparts. The COG therefore reflects one-to-many and many-to-many orthologous relationships as well as simple one-to-one relationships. All unigenes were aligned to the COG database to predict and classify possible functions. Out of 27,435 Nr hits, 18,001 sequences were assigned to COG classifications (Fig. 4). Among the 25 COG categories, the cluster for General function prediction only (3306, 18.4%) was the largest group, followed by Posttranslational modification, protein turnover, chaperones (2380, 13.2%), Signal transduction mechanisms (1703, 9.5%), Transcription (1137, 6.3%), Translation, ribosomal structure and biogenesis (1117, 6.2%), Intracellular trafficking, secretion, and vesicular transport (1200, 6.0%), and Translation, ribosomal structure and biogenesis (1103, 6.1%), whereas only a few unigenes were assigned to Extracellular structures, Cell motility, and Unnamed protein (Fig. 4).

3.4. Functional classification by the KEGG pathway

To further analyze the transcriptome of J. hopeiensis, all of the unigenes were analyzed in the KEGG pathway database.
The KEGG pathway database is a knowledge base for the systematic analysis of gene functions in terms of networks of genes and molecules in cells and their variants specific to particular organisms. Out of the 15,903 unigenes, 6532 (41.1%) with significant matches in the database were assigned to 5 main categories, including 32 KEGG pathways (Fig. 5). Among these, the largest group was translation (1430, 9.0%), followed by carbohydrate metabolism (1379, 8.7%), folding, sorting and degradation (1355, 8.5%), signal transduction (1337, 8.4%) and overview information processing (1031, 6.5%). These results indicate that active metabolic processes were ongoing.

Fig. 3. Gene Ontology classification of assembled unigenes. The results are summarized in three main categories: Biological process, Cellular component and Molecular function. In total, 93,822 unigenes with BLAST matches to known proteins were assigned to gene ontology.

Fig. 4. Histogram presentation of clusters of orthologous groups (COG) classification. All unigenes were aligned to the COG database to predict and classify possible functions. Out of 27,435 Nr hits, 18,001 sequences were assigned to 25 COG classifications. RNA processing and modification (A), chromatin structure and dynamics (B), energy production and conversion (C), cell cycle control, cell division, chromosome partitioning (D), amino acid transport and metabolism (E), nucleotide transport and metabolism (F), carbohydrate transport and metabolism (G), coenzyme transport and metabolism (H), lipid transport and metabolism (I), translation, ribosomal structure and biogenesis (J), transcription (K), replication, recombination and repair (L), cell wall/membrane/envelope biogenesis (M), cell motility (N), posttranslational modification, protein turnover, chaperones (O), inorganic ion transport and metabolism (P), secondary metabolites biosynthesis, transport and catabolism (Q), general function prediction only (R), function unknown (S), signal transduction mechanisms (T), intracellular trafficking, secretion, and vesicular transport (U), defense mechanisms (V), extracellular structures (W), unnamed protein (X), nuclear structure (Y), cytoskeleton (Z).

3.5. SSR primers screening and verification

Thirty-three unigenes were randomly selected for primer design from the 16,699 microsatellite-containing sequences. The selected loci had at least five repeats and an expected PCR product size of 150 bp to 280 bp, and excluded mononucleotide repeats. PCR products could be amplified from genomic DNA. Genomic DNA from 27 individuals of J. hopeiensis was used for PCR amplification and polymorphism analysis of the 33 microsatellite loci; products were separated by non-denaturing polyacrylamide gel (8%) electrophoresis and visualized by silver staining. Among the 33 microsatellite loci, 25 (75.8%) were successfully amplified and polymorphic in these two populations (Table 2; Appendix Table S2). The 25 primer pairs amplified bands in J. hopeiensis with high specificity (amplification results are shown in Appendix Fig. S4). The number of alleles per locus (NA) ranged from 2 to 9, with a mean of 3.96. The observed heterozygosity (HO) and expected heterozygosity (HE) varied from 0.016 to 0.933 (mean 0.468) and from 0.022 to 0.823 (mean 0.462), respectively. Polymorphic information content (PIC) ranged from 0.012 to 0.831, with a mean of 0.437. Three loci showed significant departure from Hardy–Weinberg equilibrium (HWE) across all individuals when tested with 1000 permutations in GENEPOP (Table 2). This may be caused by the presence of null alleles and by sampling effects. The polymorphic microsatellite loci developed here are located in transcribed genes. Thus, they may be particularly useful for detecting population differentiation that is related to local adaptation.

Microsatellites revealed a clear genetic distinction between the two populations (FST = 0.283, P < 0.001). These results were corroborated by analysis of molecular variance (AMOVA), in which most of the variance was found within individuals based on the 25 SSRs (Table 2). Gene flow between the two populations was estimated at Nm = 0.632. The two populations were discriminated along the first two coordinates of the principal coordinate analysis (Fig. 6, accounting for 61.9% of the observed variance). The PCoA showed that the two natural populations (LS and LM) clustered as two groups based on the 25 EST-SSRs.

4. Discussion

The Chinese walnut (J. hopeiensis) is one of the most important domesticated walnut species in China for its artistic value. Marker-assisted selection in J. hopeiensis breeding has stagnated for a long time for three main reasons.
First, no reliable DNA markers had been developed from the transcriptome or ESTs of J. hopeiensis itself. Second, it is difficult to record and obtain reliable quantitative trait data from monoecious, wind-pollinated deciduous trees and their populations. Third, J. hopeiensis may have hybridized with J. mandshurica, J. cathayensis, or J. regia, and human selection of cultivars from natural populations has produced a complex genetic background.

Transcriptome sequencing is an effective method to obtain EST sequences, which are essential for developing molecular markers (Wei et al., 2011; Salgado et al., 2014; Shrivastava et al., 2014; Zhou et al., 2015; Neri et al., 2015). Prior to sequencing, cDNAs were obtained from leaf, bud, and fruit tissue mixed together in order to increase the sequencing efficiency of rare transcripts. This pooling also gave a more informative transcriptome. The average contig length for the target species in this study (731 bp) was comparable to those observed in other studies: 843 bp in mung bean (Vigna radiata) (Ellwood et al., 2008); 719 bp in Pinus contorta and 500 bp in P. taeda (Parchman et al., 2010); 790 bp in sweet potato (Schafleitner et al., 2010); 454 bp in field pea (Pisum sativum) and faba bean (Vicia faba) (Franssen et al., 2011); and 770 bp in lentil (Lens culinaris) (Kaur et al., 2012). Mononucleotide repeats were the most frequent SSR motif type (43.2%). This finding was consistent with results reported for oil palm, coconut (Cocos nucifera), field pea (Pisum sativum) and faba bean (V. faba), but different from those for Arabidopsis, peanut, canola, sugar beet, cabbage, soybean, sunflower, sweet potato, pea, white poplar, and grape (Kumpatla and Mukhopadhyay, 2005; Du et al., 2012).

In this study, 25 (75.8%) of the 33 loci of interest were successfully amplified by PCR. The size of the 25 SSR-containing unigenes ranged from 341 bp to 4384 bp, with a mean of 1671 bp. Some PCR products were longer than expected, probably because of the presence of introns; for example, this was the case for four loci, JH2678, JH8883, JH6876, and JH8061. Some microsatellites had only two alleles among the 27 individuals of J. hopeiensis, which may be caused by the presence of null alleles and by sampling effects. We have successfully developed polymorphic EST-SSR markers (Table 2), which we used in this paper to study the genetic diversity of introduced and collected materials. The polymorphic microsatellite loci developed here are a useful tool for population genetic studies, evolutionary applications, and conservation management of J. hopeiensis. These loci are located in transcribed genes. A total of 14 unigenes (64.0%) were found in the Gene Ontology (GO) functional classification database. The biological process descriptions of these 14 unigenes included cellular protein modification process, transcription, pyrimidine nucleobase metabolic process, cell wall biogenesis, and lipid metabolic process (Appendix Table S3). Thus, the loci may be particularly useful for detecting population differentiation that is related to local adaptation.

Fig. 5. Pathway assignment based on the Kyoto Encyclopedia of Genes and Genomes (KEGG). (A) Classification based on cellular process categories, (B) classification based on environmental information processing categories, (C) classification based on genetic information processing categories, (D) classification based on metabolism categories, and (E) classification based on organismal systems categories.

Fig. 6. Principal coordinate analysis (PCoA) of two Chinese walnut (J. hopeiensis) populations resolved into two genotype groups based on 25 microsatellite loci. Green circle: Cluster I; red circle: Cluster II.

Acknowledgments

The authors wish to thank Hai-Long Xia, Zhong-Hu Li, Jia Yang, Li Feng, and Tao Zhou for sample collection, and two anonymous reviewers for helpful comments on the manuscript.
Mention of a trademark, proprietary product, or vendor does not constitute a guarantee or warranty of the product by the U.S. Department of Agriculture and does not imply its approval to the exclusion of other products or vendors that also may be suitable. This work was supported by the National Natural Science Foundation of China (No. 31200500; No. 41471038; No. J1210063), the Program for Excellent Young Academic Backbones funded by Northwest University, the Northwest University Training Programs of Innovation and Entrepreneurship for Undergraduates (No. 2014003), the Education Department of Shaanxi Province Natural Science Funding (No. 12JK0829), and Changjiang Scholars and Innovative Research Team in University (No. IRT1174).

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.bse.2015.10.011.

References

Aradhya, M.K., Potter, D., Gao, F., Simon, C.J., 2007. Molecular phylogeny of Juglans (Juglandaceae): a biogeographic perspective. Tree Genet. Genomes 3, 363–378.
Almeida, N.F., Leitão, S.T., Caminero, C., Torres, A.M., Rubiales, D., Patto, M.C.V., 2014. Transferability of molecular markers from major legumes to Lathyrus spp. for their application in mapping and diversity studies. Mol. Biol. Rep. 41, 269–283.
Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M., Robles, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676.
Dang, M., Liu, Z.X., Chen, X., Zhang, T., Zhou, H.J., Hu, Y.H., Zhao, P., 2015. Identification, development, and application of 12 polymorphic EST-SSR markers for an endemic Chinese walnut (Juglans cathayensis L.) using next-generation sequencing technology. Biochem. Syst. Ecol. 60, 74–80.
Dangl, G.S., Woeste, K., Aradhya, M.K., Koehmstedt, A., Simon, C., Potter, D., Leslie, C.A., McGranahan, G., 2005. Characterization of 14 microsatellite markers for genetic analysis and cultivar identification of walnut. J. Am. Soc. Hortic. Sci. 130, 348–354.
Du, Q., Gong, C., Pan, W., Zhang, D., 2012. Development and application of microsatellites in candidate genes related to wood properties in the Chinese white poplar (Populus tomentosa Carr.). DNA Res. 20, 31–44.
Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15.
Ellwood, S.R., Phan, H.T.T., Jordan, M., Hane, J., Torres, A.M., Avila, C.M., Cruz-Izquierdo, S., Oliver, R.P., 2008. Construction of a comparative genetic map in faba bean (Vicia faba L.), conservation of genome structure with Lens culinaris. BMC Genomics 9, 380.
Excoffier, L., Lischer, H.E.L., 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567.
Franssen, S.U., Shrestha, R.P., Bräutigam, A., Bornberg-Bauer, E., Weber, A.P.M., 2011. Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing. BMC Genomics 12, 227.
Grabherr, M., Haas, B., Yassour, M., Levin, J., Thompson, D., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A., 2011. Full length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652.
Ji, L., Teixeira da Silva, J.A., Zhang, J., Tang, Z., Yu, X., 2014. Development and application of 15 novel polymorphic microsatellite markers for sect. Paeonia (Paeonia L.). Biochem. Syst. Ecol. 54, 257–266.
Kalia, R.K., Rai, M.K., Kalia, S., Singh, R., Dhawan, A.K., 2011. Microsatellite markers: an overview of the recent progress in plants. Euphytica 177, 309–334.
Kalinowski, S.T., Taper, M.L., Marshall, T.C., 2007. Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol. Ecol. 16, 1099–1106.
Kaur, S., Pembleton, L.W., Cogan, N.O., Savin, K.W., Leonforte, T., Paull, J., Materne, M., Forster, J.W., 2012. Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers. BMC Genomics 13, 104.
Kumpatla, S.P., Mukhopadhyay, S., 2005. Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome 48, 985–998.
Li, D., Deng, Z., Qin, B., Liu, X., Men, Z., 2012. De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST-SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.). BMC Genomics 13, 192.
Neri, J., Nazareno, A.G., Wendt, T., Palma-Silva, C., 2015. Development and characterization of microsatellite markers for Vriesea simplex (Bromeliaceae) and cross-amplification in other species of Bromeliaceae. Biochem. Syst. Ecol. 58, 34–37.


Parchman, T.L., Geist, K.S., Grahnen, J.A., Benkman, C.W., Buerkle, C.A., 2010. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics 11, 180.
Peakall, R., Smouse, P.E., 2012. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research-an update. Bioinformatics 28, 2537–2539.
Robichaud, R.L., Glaubitz, J.C., Rhodes Jr., O.E., Woeste, K., 2006. A robust set of black walnut microsatellites for parentage and clonal identification. New For. 32, 179–196.
Ross-Davis, A., Woeste, K., 2008. Microsatellite markers for Juglans cinerea L. and their utility in other Juglandaceae species. Conserv. Genet. 9, 465–469.
Rousset, F., 2008. Genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Mol. Ecol. Resour. 8, 103–106.
Salgado, L.R., Koop, D.M., Pinheiro, D.G., Rivallan, R., Le Guen, V., Nicolás, M.F., Almeida, P., Rocha, V.R., Magalhães, M., Gerber, A.L., Figueira, A., Cascardo, J.C.M., Vasconcelos, A.T.R., Silva, W.A., Coutinho, L.L., Garcia, D., 2014. De novo transcriptome analysis of Hevea brasiliensis tissues by RNA-seq and screening for molecular markers. BMC Genomics 15, 236.
Schafleitner, R., Tincopa, L.R., Palomino, O., Rossel, G., Robles, R.F., Alagon, R., Rivera, C., Quispe, C., Rojas, L., Pacheco, J.A., Solis, J., Cerna, D., Kim, J.Y., Hou, J., Simon, R., 2010. A sweetpotato gene index established by de novo assembly of pyrosequencing and Sanger sequences and mining for gene-based microsatellite markers. BMC Genomics 11, 604.
Shrivastava, D., Verma, P., Bhatia, S., 2014. Expanding the repertoire of microsatellite markers for polymorphism studies in Indian accessions of mung bean (Vigna radiata L. Wilczek). Mol. Biol. Rep. 41, 5669–5680.
Squirrell, J., Hollingsworth, P.M., Woodhead, M., Russell, J., Lowe, A.J., Gibby, M., Powell, W., 2003. How much effort is required to isolate nuclear microsatellites from plants? Mol. Ecol. 12, 1339–1348.
Van Oosterhout, C., Hutchinson, W.F., Wills, D.P.M., Shipley, P., 2004. MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Mol. Ecol. Notes 4, 535–538.
Woeste, K., Burns, R., Rhodes, O., Michler, C., 2002. Thirty polymorphic nuclear microsatellite loci from black walnut. J. Hered. 93, 58–60.
Wei, W., Qi, X., Wang, L., Zhang, Y., Hua, W., Li, D., Lv, H., Zhang, X., 2011. Global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics 12, 451.
Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., 2006. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 34, W293–W297.
Zane, L., Bargelloni, L., Patarnello, T., 2002. Strategies for microsatellite isolation: a review. Mol. Ecol. 11, 1–16.
Zeng, S., Xiao, G., Guo, J., Fei, Z., Xu, Y., Roe, B.A., Wang, Y., 2010. Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim. BMC Genomics 11, 94.
Zhang, R., Zhu, A., Wang, X., Yu, J., Zhang, H., Gao, J., Cheng, Y., Deng, X., 2010. Development of Juglans regia SSR markers by data mining of the EST database. Plant Mol. Biol. Rep. 28, 646–653.
Zhao, P., Woeste, K., 2011. DNA markers identify hybrids between butternut (Juglans cinerea L.), and Japanese walnut (Juglans ailantifolia Carr.). Tree Genet. Genomes 7, 511–533.
Zhou, X.J., Wang, Y.Y., Xu, Y.N., Yan, R.S., Zhao, P., Liu, W.Z., 2015. De Novo characterization of flower bud transcriptomes and the development of EST-SSR markers for the endangered tree Tapiscia sinensis. Int. J. Mol. Sci. 16, 12855–12870.