Evolution of DUF1313 family members across plant species and their association with maize photoperiod sensitivity



YGENO-08792; No. of pages: 9; 4C: Genomics xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Jia Li a,b,1, Erliang Hu a,b,1, Xueying Chen a,b, Jie Xu a,b, Hai Lan a,b, Chuan Li a,b, Yaodong Hu c, Yanli Lu a,b,⁎

a Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, Sichuan, China
b Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, China
c Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, Sichuan, China

Article info

Article history:
Received 6 October 2015
Received in revised form 16 December 2015
Accepted 4 January 2016
Available online xxxx

Keywords: Association analysis; DUF1313 family; Maize; Phylogenetic analysis; Selection pressure

Abstract

Proteins of the DUF1313 family contain a highly conserved domain and are found only in plants, where they play important roles in many plant functions. In this study, 269 DUF1313 genes from 81 photoautotrophic species were identified; they were classified into three major types based on the amino acid substitutions in the conserved region: IARV, I(S/T/F)(K/R)V, and IRRV. A phylogenetic tree constructed from 51 DUF1313 genes from graminoids revealed three clades: A, B1, and B2. Clade B1 was found to have undergone episodic positive selection after a gene duplication event and included four amino acid sites under positive selection. Association analysis between DUF1313 family members and traits investigated in maize indicated that three of the four maize genes (GRMZM2G025646, GRMZM5G877647, GRMZM2G359322, and GRMZM2G382774) were associated with target traits such as days to silking, days to tasselling, and plant height. The nucleotide diversity of the most primitive and highly conserved DUF1313 gene, ELF4-like4, was the highest in Tripsacum and the lowest in maize. Tajima's D and Fu and Li's D tests revealed that significant purifying selection had occurred in the coding sequence region of this DUF1313 gene in teosinte and maize. No significant signal was detected in the 5′-untranslated region of this gene in any of the three species (maize, teosinte, and Tripsacum) or in any gene region of Tripsacum. Phylogenetic analyses revealed that the 103 accessions of maize, teosinte, and Tripsacum can be grouped into four clades based on ELF4-like4 gene sequence similarity. Thus, this gene can be used to determine the relationships between maize and its relatives, and the DUF1313 family members and alleles identified in this study might be valuable genetic resources for molecular marker-assisted breeding in maize.

© 2015 Elsevier Inc. All rights reserved.

⁎ Corresponding author at: Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, Sichuan, China. E-mail address: [email protected] (Y. Lu).
1 These authors contributed equally to this work.

1. Introduction

Maize is a facultative short-day plant: it flowers earlier when grown under short-day conditions [1]. Although most temperate maize cultivars do not respond to changes in day length, tropical maize cultivars are usually sensitive to such variation. In addition, like its wild relatives, tropical maize shows greater genetic variation than temperate maize. Thus, when grown in temperate regions, tropical maize cultivars require improved adaptability to long-day conditions [2]. Plant breeders are currently investigating strategies to improve the adaptability of tropical maize by exploiting its high level of genetic diversity.

A domain of unknown function (DUF) is a protein domain whose function has not been characterized. Protein families with such domains are deposited together in the Pfam database (http://pfam.xfam.org/family) and labeled with the prefix “DUF” followed by a number [3]. More than 20% of all protein domains are currently annotated as DUFs, many of which are highly conserved, indicating that they play important roles in the biology of the plants in which they are found [4,5].

The DUF1313 family consists of several hypothetical plant proteins of around 100 residues in length. DUF1313 genes are known to be present only in plants and contain a highly conserved domain [6]. Early Flowering 4 (ELF4) belongs to the DUF1313 family of plant proteins [7]. In Arabidopsis, ELF4 encodes a protein of 111 amino acids without a known protein signature. The Arabidopsis elf4 mutants flower early under short-day conditions because of a reduced ability to sense variations in day length. The early flowering of elf4 mutants is thought to result from elevated expression of CONSTANS (CO) [6]. Furthermore, under red-light conditions, normal expression of ELF4 is induced by phytochrome B (PHYB), whereas the phyb mutation decreases the expression of ELF4. ELF4 also plays a role in the PHYB-mediated induction of de-etiolation of seedlings under red-light conditions; elf4 mutant seedlings exhibit relatively weak sensitivity to red light, leading to delayed de-etiolation [8]. The ELF4-homologous gene Die Neutralis (DNE) was found to affect the rhythmic expression of clock genes in peas under continuous light and dark conditions [9].

Traits associated with target genes have been previously identified; this approach can be used to determine the relationship between the

http://dx.doi.org/10.1016/j.ygeno.2016.01.003 0888-7543/© 2015 Elsevier Inc. All rights reserved.

Please cite this article as: J. Li, et al., Genomics (2015), http://dx.doi.org/10.1016/j.ygeno.2016.01.003


DUF1313 gene family members and photoperiod sensitivity in maize. Association mapping has been successfully used to explore new functional genes in maize. Wilson [10] used an association approach to evaluate six candidate maize genes associated with kernel starch biosynthesis. Liu et al. [11] used a diverse group of 368 maize inbreds to evaluate the association between nucleic acid variation of each ZmDREB gene and drought tolerance. They found a significant association between the genetic variation of ZmDREB2.7 and drought tolerance during the seedling stage. Buckler et al. [12] used a large nested association mapping (NAM) population containing 25 recombinant inbred line (RIL) populations to investigate variations in flowering time. They found numerous small-effect quantitative trait loci (QTLs) that were common among families, as well as allelic effects that differed across founder lines. These findings suggest that identifying favorable allelic variations within either gene families or a single gene is possible by using a diverse population or genetic linkage population. This study aimed to (1) identify DUF1313 family genes in plants and analyze the phylogenetic relationships among them; (2) use association mapping to identify the gene members associated with photoperiod sensitivity in maize; (3) evaluate the associations between nucleic acid variations of each candidate gene and photoperiod sensitivity by using a NAM population; and (4) clone ELF4-like4 homologous genes and analyze the evolutionary history of maize and its wild relatives. These results might allow better understanding of the evolutionary origins of the DUF1313 gene family and its association with photoperiod sensitivity. In addition, the gene members and alleles associated with photoperiod sensitivity identified in this study might become valuable genetic resources for molecular marker-assisted breeding in maize.

2. Materials and methods

2.1. Data sets

The NCBI (http://www.ncbi.nlm.nih.gov/), JGI (http://phytozome.jgi.doe.gov/pz/portal.html), Pfam (http://pfam.xfam.org/), and UniprotKB (http://www.uniprot.org/uniprot/) databases were used to search for Arabidopsis ELF4 homologues in plant species.

An association panel of 513 maize inbred lines was used to identify single nucleotide polymorphisms (SNPs) in genes of the DUF1313 family that are significantly associated with photoperiod sensitivity. The genotype data for this panel comprised 556,809 high-quality SNPs [13,14] and were downloaded from http://www.maizego.org/Resources.html. The traits related to photoperiod sensitivity, including days to silking (DS), days to tasselling (DT), pollen shed (PS), plant and ear height (PH and EH), tassel length (TL), tassel branch number (TBN), and the number of leaves above the ear (LN) [15], were investigated in five different environments, i.e., Ya'an (30°N, 103°E), Sanya (18°N, 109°E), and Yunnan (25°N, 102°E) in 2009, and Guangxi (23°N, 110°E) and Yunnan in 2010. The SNPs localized to four gene regions of the DUF1313 family in maize were extracted for further candidate gene-based association mapping of the photoperiod sensitivity-related traits.

Genotyping-by-sequencing (GBS) analyses were conducted on a NAM population containing 25 RIL populations [12]; eight photoperiod sensitivity traits, including DS, DT, days to silking at growing degree day (GDD) (GDD-DS), days to tasselling at GDD (GDD-DT), anthesis-silking interval at GDD (GDD-ASI), tassel length (TL), and PH and EH, were measured in multiple environments during 2006–2007. The GBS and phenotypic data were downloaded from http://www.panzea.org. The SNPs localized to the four maize DUF1313 genes were extracted to verify the associations between favorable alleles and traits related to photoperiod sensitivity in the 25 RIL populations.

2.2. Identification and reconstruction of phylogeny of the DUF1313 gene family in plants

In this study, ELF4 homologues in plant species were searched for in the Pfam, UniprotKB, NCBI, and JGI databases by using BLASTP and the amino acid sequence of AtELF4 (NC_003071). To ensure that the complete set of sequences for each species was retrieved, the search results were repeatedly resubmitted to the databases as query terms until no new genes were found. Since the contents of the different databases partially overlapped, the sequences of DUF1313 genes were integrated and redundancy was removed by manual screening. Subsequently, all gene sequences obtained were analyzed using the online software programs SMART and Pfam to predict the conserved domains and remove redundant and low-homology sequences. DUF1313 family members were then individually screened and identified in the different plant species.

MAFFT software has greater speed and higher accuracy than CLUSTALW for alignment of amino acid or nucleotide sequences [16]. Therefore, multiple sequence alignment analysis was performed using MAFFT online software version 7 (http://mafft.cbrc.jp/alignment/server/). The conserved amino acid motifs were analyzed using the MEME–MAST program (http://meme-suite.org/tools/meme). Neighbor-Joining (NJ) trees were constructed using calculated amino acid substitution models in MEGA v5.0 [17]. A Jones–Taylor–Thornton + Gamma-distributed model was selected using ProtTest v2.4 [18], and a bootstrap test with 1000 replications was performed. Maximum likelihood (ML) trees were constructed using PhyML v3.1 [19], with default parameters and the number of bootstrap replications set to 1000. Further, Bayesian inference (BI) trees were constructed using MrBayes v3.2 [20]. Four Markov Chain Monte Carlo (MCMC) chains were run for 2,000,000 generations. The trees were then viewed in Figtree v1.4.2 (http://tree.bio.ed.ac.uk/software/).

2.3. dN:dS ratio estimates and selection pressure of DUF1313 genes in graminoids

To determine how selection pressure operates on the major graminoid phylogenetic clades, 51 DUF1313 genes were extracted from different graminoid species. The sequence alignment of the 51 DUF1313 proteins and the corresponding coding sequences (CDSs) were converted to a codon alignment by using PAL2NAL (http://www.bork.embl.de/pal2nal). Phylogenetic trees were reconstructed using the relaxed clock model in BEAST v1.6.1 [21]. The general time reversible + I + G substitution model was selected using jModelTest, and a Yule tree prior was used for the analyses. The analysis was run for at least 200,000,000 generations, sampling every 1000 states. Next, a branch model (model = 2, NSsites = 0) was used to estimate the nonsynonymous to synonymous substitution ratio (dN:dS) for each major clade by using the CODEML program in the PAML 4.7 package [22]. The branches of each major clade were set as the foreground, and the remaining branches of the full gene tree were assigned to the background. The dN:dS values were estimated using the following settings: fix_omega = 0; omega = 1.

Duplication events of DUF1313 genes were inferred using Notung 2.6 [23], which offers a unified framework for incorporating duplication-loss parsimony into phylogenetic tasks. First, a rooted species tree (graminoid species tree) and a gene tree (DUF1313 gene tree) were constructed using the NCBI Taxonomy Browser (http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi) and the gene nucleotide sequences, respectively. Subsequently, both trees were used to infer gene duplication and loss events during species evolution. Whether the major ancestral gene duplications were followed by a strong shift in selective constraints was determined by setting these nodes as the foreground branches in a branch model analysis (model = 2, NSsites = 0), with all remaining branches of the full gene tree set as background.

The amino acid sites that had experienced a shift in selective pressure throughout the evolution of the gene family were identified by conducting branch-site tests. Branch-site model analyses were performed with the branches following the ancestral gene duplications set as foreground and the remaining subtree branches set as background. The branch-site model (model = 2, NSsites = 2)


Fig. 1. Conserved amino acid motifs of DUF1313 genes in plants. Abscissa indicates amino acid residue number, and vertical axis indicates residue height, as described by the percentage of residues in the type. A, B, and C indicate IARV-type, I(S/T/F)(K/R)V-type, and IRRV-type genes, respectively. Amino acid residues in parentheses indicate variable residues. The black box indicates the four conserved amino acid residues of the DUF1313 family, among the three types.
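The three motif types named in Fig. 1 can be expressed as simple patterns over the four variable-position residues. The following Python sketch is illustrative only: the function name and pattern table are hypothetical and are not taken from the study. It is also purely pattern-based, whereas the assignments reported in the paper additionally rely on tree placement (for example, an ITRV motif is described in the text as derived from the IRRV type), so a pattern match should be read as a first-pass label.

```python
import re

# Hypothetical pattern table written directly from the three types named in Fig. 1.
MOTIF_TYPES = [
    ("IARV", re.compile(r"IARV")),
    ("I(S/T/F)(K/R)V", re.compile(r"I[STF][KR]V")),
    ("IRRV", re.compile(r"IRRV")),
]


def classify_motif(tetrapeptide: str) -> str:
    """Return the major motif type for a four-residue string, or 'other'."""
    motif = tetrapeptide.upper()
    for name, pattern in MOTIF_TYPES:
        if pattern.fullmatch(motif):
            return name
    return "other"
```

Motifs that match none of the three patterns (for example LKRL) fall into "other" and would need manual inspection against the phylogeny.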

with a varying dN:dS value (fix_omega = 0, omega = 2) was used to calculate the likelihood of positive selection at each site along a branch.

2.4. Candidate gene-based association mapping of DUF1313 family members in maize

In all, 18, 17, 26, and 30 SNPs were detected in the gene regions of four DUF1313 genes (GRMZM2G025646, GRMZM2G382774, GRMZM5G877647, and GRMZM2G359322, respectively) from 513 maize inbred lines. The influence of environmental factors was eliminated by generating the best linear unbiased predictors for each trait in different environments by using SAS v8 software. The general linear model (GLM) and GLM combined with Q matrix (population structure) in TASSEL v3.0 [24] were used to conduct association mapping between the SNPs of candidate genes and the eight traits in the 513 maize inbred lines; p < 0.01 was considered as the significance threshold.
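As a rough illustration of what a single-marker test measures, the sketch below computes the share of phenotypic variance explained by the allele class of one SNP (a one-way ANOVA R²). It is a simplification under stated assumptions: it is not the TASSEL implementation, it omits the Q-matrix covariates and the F-test p-value, and the function name and toy values are invented for illustration.

```python
from statistics import mean


def snp_r_squared(genotypes, phenotypes):
    """Fraction of phenotypic variance explained by allele class (one-way ANOVA R^2)."""
    groups = {}
    for allele, value in zip(genotypes, phenotypes):
        groups.setdefault(allele, []).append(value)
    grand_mean = mean(phenotypes)
    # Total and between-class sums of squares.
    ss_total = sum((y - grand_mean) ** 2 for y in phenotypes)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy data (invented): allele carried by each inbred line and a BLUP-like trait value.
alleles = ["C", "C", "C", "T", "T", "T"]
days_to_silking = [70.0, 71.0, 72.0, 74.0, 75.0, 76.0]
r2 = snp_r_squared(alleles, days_to_silking)
```

Because population structure is ignored here, values from this sketch would generally be larger than R² estimates from a model that includes Q.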

2.5. Favorable allele detection of DUF1313 genes in the 25 RIL populations in maize

The GBS data were used to identify 8, 1, 5, and 11 SNPs in the four DUF1313 family genes (GRMZM2G025646, GRMZM2G382774, GRMZM5G877647, and GRMZM2G359322, respectively). The associations between the SNPs found in DUF1313 genes and the eight photoperiod sensitivity traits were analyzed using a t-test for the 25 RIL populations, where p < 0.001 was set as the significance threshold. Finally, the t-test results of the 25 RIL populations were combined to calculate the rate of significant sites as a percentage of the total number of sites. The percentage values were then used to draw heatmaps by using HemI v1.0 [25].

2.6. Polymorphism, linkage disequilibrium, and evolution analyses of the ELF4-like4 genes in maize and its relatives

Because ELF4-like4 is the most conserved DUF1313 family member in maize, ELF4-like4 gene sequences were generated for 29 maize inbred lines, 58 teosinte accessions from six species, and 16 Tripsacum accessions from six species (Table S1) by using ELF4-specific primers (forward primer 5′-GAGTATCTTGCGGATTATGTG-3′ and reverse primer 5′-CTCGCCTTGACATGAAGCACT-3′). The PCR products were then cloned into the pMD19-T vector (TaKaRa) according to the manufacturer's instructions, and the positive clones were sequenced after three replications. Multiple sequences were aligned using MAFFT online software. Unique substitutions in single clones were ignored, and several identical sequences were represented by a single sequence in the alignments. The linkage disequilibrium (LD) levels of the ELF4-like4 genes of maize, teosinte,

Fig. 2. Phylogenetic neighbor-joining (NJ) tree of 269 DUF1313 family genes from 81 species. The letters A, B, C, and D indicate DUF1313 genes from algae, Pinus, Musci, and Pteridophytes, respectively. The branches marked with Roman numerals I, II, and III indicate the three graminaceous groups. The branches marked with four-pointed black stars and black triangles indicate two subgroups of the IRRV type: IRTV-type and (V/M)RRV-type, respectively. The black dots and numbers indicate major clades with bootstrap values of >50%.


and Tripsacum were calculated in TASSEL v3.0 by using the SNPs and insertions/deletions called from the tested lines. Tests of neutrality such as Tajima's D and Fu and Li's D statistic were performed using DnaSP v5 [26]. Phylogenetic analyses were conducted using the NJ method in MEGA 5.0. The bootstrap test was performed with 1000 replicates. An ML tree was constructed using PhyML 3.1, with a bootstrap test of 1000 replicates. In addition, BI analysis was performed using MrBayes v3.0. Four MCMC chains (one cold and three heated) were run for 1,000,000 generations. Tree files were modified and exported using Figtree v1.4.2.

3. Results

3.1. Reconstructing the phylogeny of DUF1313 genes in plants

In all, 269 DUF1313 genes from 81 photoautotrophic species of eukaryotic organisms, including marine photoautotrophic algae (Coccomyxa subellipsoidea), terrestrial moss (Physcomitrella patens), terrestrial tall trees (Picea sitchensis), and graminoids (Zea mays), were identified using the four databases (Table S2). The prediction of the amino acid motifs of DUF1313 genes showed that 49 amino acid residues were highly conserved, and only 0.01% of the genes had undergone insertion or deletion in the conserved motif region. Notably, four amino acid residues located between sites 42 and 45 exhibited regular substitution: Ile (I), Ala (A)/Ser (S)/Thr (T)/Phe (F)/Arg (R), Arg (R)/Lys (K), and Val (V) (Fig. 1). The NJ, ML, and BI trees had similar topology (Figs. 2 and S1). According to the substitution of the four amino acid residues, the 269 DUF1313 genes from 81 species were classified into three major types: IARV-type, I(S/T/F)(K/R)V-type, and IRRV-type (Figs. 2 and S1),

although no special conserved character was found in the DUF1313 genes to clearly differentiate each of the species. The IARV-type included 17 genes from graminoids that are closely related to algae and are highly conserved. The I(S/T/F)(K/R)V-type included 69 genes, three of which are from Pinus and the others are from dicotyledonous plants. The IRRV-type includes 158 genes from monocotyledonous and dicotyledonous plants, of which 32 genes were from graminoids that are closely related to Pinus, Musci, and Pteridophytes. The IRRV-type is the largest and can be divided into two subgroups: the IRTV-type and (V/M)RRV-type (Figs. 2 and S1), which mainly include cruciferous plants.

A few DUF1313 genes (black labels in Fig. S1, except those in Pinus, Musci, Pteridophytes, and algae) differed from the three major groups; this could be attributed to mutations of the major types. For example, the genes from Citrus sinensis (ID: Orange1.1g033651m), Populus trichocarpa (ID: Potri.019G131700.1), Eucalyptus grandis (ID: Eucgr.C03559.2), Helianthus annuus (ID: C6ZKH6), Musa acuminata subsp. malaccensis (ID: M0TLN3), and Setaria italica (ID: K4A1Z2) belonged to the ITRV, IRRA, IKQV, IRKV, IQRV, and IRHV types (Table S2), respectively, which are presumably mutations of the IRRV-type. Moreover, the Ba02170 gene (MARV-type) from Panicum virgatum is presumably a mutation of the IARV-type. Interestingly, the Spipo2G0024500 gene from Spirodela polyrhiza belongs to the LKRL-type, which might be an archaic DUF1313 gene of monocotyledons and is closely related to Pinus. Three genes (ID: Phpat.006G022300.2, D8SJV9, and D8RCC1) from Musci and Pteridophytes are of the IGKV, IGQV, and IKRV types, respectively (Table S2, Fig. 2); the IRRV types have evolved from these genes. Rapid expansion of the DUF1313 gene family occurred after the divergence of terrestrial plants, in particular,

Fig. 3. Maximum clade credibility tree of 51 DUF1313 genes in graminoids. The numbers above the branches indicate the posterior probability values. The branch labels are tagged with abbreviations of species names and gene ID numbers, separated by an underline. The black dots indicate predicted gene duplication events in the nodes. A and B indicate the two major groups that were divided by a major gene duplication. Group B is divided into two subgroups, B1 and B2. The asterisk indicates the branch with sites of significant positive selection.
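Branch-site comparisons of the kind marked by the asterisk in Fig. 3 are commonly assessed with a likelihood-ratio test between a null and an alternative CODEML fit. The text does not state how significance was assessed here, so the following Python sketch is an assumption: it takes two log-likelihoods (hypothetical inputs that a user would read from CODEML output) and compares twice their difference against a chi-square distribution with one degree of freedom.

```python
import math


def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> float:
    """P-value of 2 * (lnL_alt - lnL_null) against chi-square with 1 degree of freedom."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0.0:
        # The alternative model fits no better than the null.
        return 1.0
    # For 1 d.f., the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))
```

The one-degree-of-freedom reference distribution is itself a convention rather than something reported in the paper; other reference distributions are sometimes used for this test.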


after the divergence of gymnosperms (Fig. 2). Notably, the DUF1313 genes of graminoids are divided into three groups (I, II, and III in Fig. 2), and their bootstrap support rate is very high (≥90%). This result indicates that the DUF1313 genes of graminoids have a separate differentiation mechanism. In all, nine maize DUF1313 protein sequences were identified (Table S2), but only four corresponding genes were found in the B73 reference genome by using BLASTP analysis. The results showed that three protein sequences (B6TE20, E2J7X8, and B6TR40) exhibited a high level of similarity with B4FX35, corresponding to gene GRMZM2G025646. Two sequences (B6U7D2 and B6TYS7) were highly similar to B7ZZS3, corresponding to gene GRMZM5G877647. The remaining two protein sequences (B6SH35 and B6SXG0) were identified as GRMZM2G359322 and GRMZM2G382774, respectively. Gene GRMZM2G025646 belongs to the IARV-type, whereas genes GRMZM5G877647, GRMZM2G359322, and GRMZM2G382774 belong to the IRRV-type. These results indicate that GRMZM2G025646 is more primitive and conserved, whereas GRMZM5G877647 formed relatively late and showed higher rates of polymorphism. These four genes were used for further analysis. Previous studies identified five ELF4-like genes in Arabidopsis: ELF4, ELF4-like1, ELF4-like2, ELF4-like3, and ELF4-like4 [8]. GRMZM2G025646, GRMZM2G382774, and GRMZM2G359322 in maize are homologous


to the ELF4-like4 gene in Arabidopsis, whereas GRMZM5G877647 is homologous to the ELF4-like3 gene. These results suggest that plants retained most of the original members of the gene family, and new genes were formed during the evolution of the DUF1313 family. We speculate that functional differentiation had occurred in these genes, and they were selected for further analysis.

3.2. Selection pressure analysis of DUF1313 genes in graminoids

The selection pressure exerted on the different phylogenetic clades of graminoids was investigated by selecting 51 graminoid DUF1313 genes and dividing them into two major clades, A and B, via the reconstruction of a maximum clade credibility tree. Clade B was divided into two subgroups, B1 and B2. The three groups included 18, 13, and 20 genes, respectively (Fig. 3). The classification of the three groups was consistent with clades I, II, and III (Fig. 2). The Notung 2.6 program inferred that DUF1313 genes had undergone 25 instances of gene duplication during the evolution of graminoids (indicated as black nodes in Fig. 3). Two or three of these gene duplication events were shared by all graminoid species. The dN:dS ratio is indicative of the change in selective pressure. Nonsynonymous nucleotide substitution (N) produces changes in the

Table 1 Association mapping of DUF1313 family genes with photoperiod sensitivity traits in maize. Genes GRMZM2G382774

Chr 1

GRMZM2G025646

7

GRMZM2G359322

9

Position (bp)

Genotypea

MAF

71,214,602

C/T

0.12

71,214,679

G/T

0.08

71,214,744

C/T

0.10

71,214,747

G/T

0.10

106,230,666 106,230,817 106,230,819 122,956,024

C/T A/G C/G G/T

0.07 0.06 0.05 0.14

122,956,042

C/T

0.38

122,956,100

C/G

0.06

122,956,101

C/G

0.06

122,956,179

A/T

0.33

122,956,186

G/T

0.32

122,956,589 122,957,077 122,957,078 122,957,079 122,957,081 122,957,162 122,957,164 122,957,165 122,957,178

C/G C/T C/G C/T C/G A/G G/T C/G C/G

0.39 0.07 0.19 0.24 0.12 0.17 0.17 0.17 0.05

Trait LN PH DS LN PH DS LN PH LN PH DS LN LN LN PH DT LN PH EH DT PH PS DS EH DS PH PS DS DT PS DS DT PS DS DT TL TL TL TL LN LN LN EH DT

P value

2.25 × 10−4 4.04 × 10−4 3.64 × 10−3 4.70 × 10−3 6.48 × 10−3 9.89 × 10−3 2.72 × 10−4 4.95 × 10−4 2.72 × 10−4 4.95 × 10−4 6.45 × 10−3 5.87 × 10−3 2.53 × 10−3 4.52 × 10−3 6.12 × 10−3 7.26 × 10−3 8.09 × 10−3 4.01 × 10−3 8.28 × 10−4 5.19 × 10−4 1.82 × 10−3 3.15 × 10−4 2.91 × 10−3 8.28 × 10−4 5.19 × 10−4 1.82 × 10−3 3.15 × 10−4 2.91 × 10−3 1.23 × 10−3 7.61 × 10−3 7.35 × 10−3 9.71 × 10−4 4.74 × 10−3 7.41 × 10−3 3.45 × 10−3 8.62 × 10−3 2.58 × 10−4 3.68 × 10−4 3.50 × 10−3 2.45 × 10−3 2.45 × 10−3 2.45 × 10−3 7.82 × 10−3 3.77 × 10−3

R2

0.024 0.025 0.015 0.014 0.015 0.012 0.024 0.024 0.024 0.024 0.012 0.013 0.016 0.015 0.016 0.012 0.013 0.017 0.021 0.018 0.019 0.020 0.015 0.021 0.018 0.019 0.020 0.015 0.017 0.011 0.013 0.018 0.013 0.013 0.014 0.014 0.028 0.027 0.018 0.016 0.016 0.016 0.013 0.013

Chr: chromosome; MAF: minor allele frequency; DS: days to silking; DT: days to tasselling; PS: pollen shed; PH: plant height; EH: ear height; TL: tassel length; LN: the number of leaves above the ear. R2: explained phenotypic variation. a Underlined allele indicates minor allele.


amino acid residues in an encoded protein, whereas synonymous substitutions (S) do not. Thus, positive selection is indicated by dN:dS ratios greater than 1.0, while stationary (stabilizing) or purifying (negative) selection is indicated by dN:dS ratios that approach zero, where dN and dS represent the rates of non-synonymous and synonymous nucleotide substitution, respectively [27,28]. The dN:dS ratios of clades A and B were 0.19 and 0.22, respectively (Fig. 3), implying the occurrence of stationary selection in the two clades. The selective forces within subclades were determined by considering the amino acid residues predicted by the branch-site model of the CODEML program to be under positive selection (Table S3). Subclade B1 was found to have undergone episodic positive selection after a gene duplication event. Four amino acid sites (codon sites) were found to have undergone positive selection in subclade B1 (Fig. 3, Table S3), but no site showed significant positive selection in subclade B2. This indicates that some graminoid DUF1313 genes underwent purifying selection, and the others underwent positive selection after a gene duplication event. Thus, the genes in this family not only retained their original functions, but also developed new functions, which provides a basis for the genetic diversity of graminoids.

3.3. DUF1313 family members associated with photoperiod sensitivity identified in maize

Analysis of graminoid DUF1313 genes suggested the presence of significant positive selection in maize; in addition, the DUF1313 family was predicted to have undergone functional divergence. Therefore, whether the natural variation of the DUF1313 family members was associated with photoperiod sensitivity in maize was further investigated. In all, 17, 18, 26, and 30 SNPs in the GRMZM2G382774, GRMZM2G025646, GRMZM5G877647, and GRMZM2G359322 regions, respectively, were used to determine their association with the target traits.
Gene GRMZM2G359322 contained 30 SNPs, of which 15 were significantly associated with different traits having minor allele frequencies of ≥ 0.05. Seven SNPs, each significantly associated with several traits, were found. Four SNPs were significantly associated with DT, PS, and DS (Table 1). In gene GRMZM2G382774, four of the 17 SNPs were

significantly associated with three traits. LN and PH shared four significant SNPs, whereas DS was associated with two significant SNPs (Table 1). In gene GRMZM2G025646, two SNPs were significantly associated with LN, and one SNP was associated with DS (Table 1). No significant associations with photoperiod sensitivity-related traits were detected in gene GRMZM5G877647 across the 513 maize inbred lines.

3.4. Favorable alleles for maize photoperiod sensitivity identified in the DUF1313 genes

The association of the DUF1313 family members with photoperiod sensitivity was validated by further identifying favorable alleles for the target traits by using 25 RIL populations. In all, five and eleven polymorphic SNPs (MAF > 0.05) in genes GRMZM5G877647 and GRMZM2G359322, respectively, were identified in all the populations (Table S4). Further, eight and one polymorphic SNPs in genes GRMZM2G025646 and GRMZM2G382774, respectively, were found in seven RIL populations (Table S4). A t-test was conducted (p < 0.001) with the polymorphic SNPs and the eight selected photoperiod sensitivity traits (Tables S4 and S5). The results indicated that one polymorphic SNP in the GRMZM2G382774 gene on chromosome 1 was significantly associated (minimum p value = 2.72 × 10−7) with five traits (GDD-DT, GDD-DS, DL, DS, and DT) collected from different environments in three RIL populations. In GRMZM5G877647, five SNPs were significantly associated with the different tested traits. The three SNPs on chromosome 4 were significantly associated with six traits (DS, DT, GDD-ASI, GDD-DS, GDD-DT, and PH) collected from a single environment in five RIL populations. In GRMZM2G025646, three SNPs on chromosome 7 were significantly associated with four traits (GDD-ASI, GDD-DS, GDD-DT, and TL) collected from eight environments in five RIL populations.
In GRMZM2G359322, four SNPs on chromosome 9 were significantly associated with the eight traits collected from almost all environments in 19 populations. Association mapping revealed that one of these SNPs was significantly associated with DT in the 513 maize inbred lines. The significance level of the polymorphic SNPs found in the four genes was clearly determined by calculating the percentage of

Fig. 4. Percentage of single nucleotide polymorphisms (SNPs) located in the four maize genes associated with eight photoperiod sensitivity traits identified using 25 recombinant inbred line (RIL) populations. DS: days to silking; DT: days to tasselling; GDD-DS: days to silking at growing degree days; GDD-DT: days to tasselling at growing degree days; GDD-ASI: anthesis-silking interval at growing degree days; TL: tassel length; PH: plant height; EH: ear height. The SNPs indicated by the darker blocks are more closely associated with the target traits.
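The percentages plotted in Fig. 4 summarize how often a test passed the p < 0.001 threshold. A minimal Python sketch of that summary step is given below; the function name and the example p-values are invented for illustration, and the exact grouping of tests (by population, environment, or both) used to build the heatmap is an assumption.

```python
def percent_significant(p_values, threshold=0.001):
    """Percentage of tests with p below the threshold; 0.0 for an empty list."""
    if not p_values:
        return 0.0
    hits = sum(1 for p in p_values if p < threshold)
    return 100.0 * hits / len(p_values)


# Invented p-values for one SNP-trait pair across several hypothetical tests.
example = [2.72e-7, 4.0e-4, 0.02, 0.3]
share = percent_significant(example)
```

One such value per SNP-trait cell would then feed a heatmap tool; here two of the four invented tests pass the threshold.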

Please cite this article as: J. Li, et al., Genomics (2015), http://dx.doi.org/10.1016/j.ygeno.2016.01.003


Table 2. ELF4-like4 gene nucleotide diversity and neutrality test.

| Region | Species | Site (bp) | S | H | π | θω | Tajima's D | Fu and Li's D |
|---|---|---|---|---|---|---|---|---|
| CDS | Maize | 438 | 28 | 16 | 0.006 | 0.017 | −2.215⁎⁎ | −3.613⁎⁎ |
| CDS | Teosinte | 442 | 58 | 43 | 0.011 | 0.030 | −2.071⁎ | −5.26⁎⁎ |
| CDS | Tripsacum | 441 | 39 | 15 | 0.017 | 0.027 | −1.676 | −1.950 |
| CDS | Total population | 456 | 109 | 75 | 0.016 | 0.049 | −2.315⁎⁎ | −6.619⁎⁎ |
| Exon | Maize | 1365 | 57 | 26 | 0.008 | 0.016 | −1.778 | −2.684⁎ |
| Exon | Teosinte | 1699 | 149 | 59 | 0.013 | 0.035 | −2.201⁎⁎ | −4.870⁎⁎ |
| Exon | Tripsacum | 1225 | 104 | 16 | 0.021 | 0.031 | −1.493 | −1.817 |
| Exon | Total population | 1764 | 221 | 98 | 0.016 | 0.050 | −2.355⁎⁎ | −6.309⁎⁎ |
| 3′UTR | Maize | 789 | 23 | 17 | 0.008 | 0.012 | −1.181 | −1.351 |
| 3′UTR | Teosinte | 1123 | 80 | 49 | 0.015 | 0.037 | −2.198⁎⁎ | −4.022⁎⁎ |
| 3′UTR | Tripsacum | 662 | 56 | 15 | 0.023 | 0.033 | −1.322 | −1.720 |
| 3′UTR | Total population | 1158 | 98 | 69 | 0.017 | 0.048 | −2.237⁎⁎ | −5.170⁎⁎ |
| 5′UTR | Maize | 139 | 11 | 7 | 0.023 | 0.038 | −1.321 | −0.049 |
| 5′UTR | Teosinte | 134 | 14 | 10 | 0.022 | 0.043 | −1.556 | −1.340 |
| 5′UTR | Tripsacum | 115 | 9 | 10 | 0.024 | 0.035 | −1.213 | −1.011 |
| 5′UTR | Total population | 149 | 17 | 18 | 0.025 | 0.069 | −1.781⁎ | −3.399⁎⁎ |
| Overall | Maize | 1845 | 79 | 29 | 0.007 | 0.015 | −1.992⁎ | −3.077⁎ |
| Overall | Teosinte | 2228 | 199 | 61 | 0.013 | 0.035 | −2.319⁎⁎ | −5.245⁎⁎ |
| Overall | Tripsacum | 1831 | 152 | 16 | 0.022 | 0.032 | −1.568 | −1.922 |
| Overall | Total population | 2417 | 281 | 103 | 0.015 | 0.051 | −2.437⁎⁎ | −6.722⁎⁎ |

S: number of segregating sites (excluding indels); H: number of haplotypes; π: average nucleotide difference between any two sequences; θω: diversity based on the number of segregating sites. Total population includes maize, teosinte, and Tripsacum. ⁎ p < 0.05. ⁎⁎ p < 0.01.
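As an illustration of how the statistics in Table 2 relate to one another, the following minimal sketch computes S, π, the segregating-site estimator θ, and Tajima's D from a small alignment. It is not the DnaSP procedure used in the study; it assumes equal-length aligned sequences, skips any site containing a gap or ambiguous base, and uses the standard Tajima (1989) constants. The toy alignment and the function name are hypothetical.

```python
from math import sqrt


def diversity_stats(seqs):
    """Return S, per-site pi, per-site theta_w and Tajima's D for aligned sequences."""
    n = len(seqs)
    sites_used = 0
    seg_sites = 0
    pairwise_diffs = 0.0  # summed over all sequence pairs and all sites
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        if any(base not in "ACGT" for base in column):
            continue  # skip gaps and ambiguous bases
        sites_used += 1
        counts = {}
        for base in column:
            counts[base] = counts.get(base, 0) + 1
        if len(counts) > 1:
            seg_sites += 1
            # number of sequence pairs that differ at this site
            pairwise_diffs += (n * n - sum(c * c for c in counts.values())) / 2

    k = pairwise_diffs / (n * (n - 1) / 2)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta = seg_sites / a1                  # segregating-site estimator

    # Tajima's D: scaled difference between k and theta
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (k - theta) / sqrt(var) if var > 0 else None

    return {
        "S": seg_sites,
        "pi": k / sites_used,
        "theta_w": theta / sites_used,
        "tajima_d": tajima_d,
    }


# Hypothetical toy alignment of four sequences.
stats = diversity_stats(["ACGTA", "ACGTA", "ACGTT", "ACCTT"])
```

A negative D arises when θ exceeds π, that is, when there is an excess of low-frequency variants; this is the pattern reported for the CDS region of maize and teosinte in Table 2.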

significant SNPs in each RIL population by using a heatmap (Fig. 4). DUF1313 genes exhibited obvious functional differences: genes GRMZM2G025646 and GRMZM2G382774 were most likely associated with flowering traits and GRMZM2G359322 with plant height, ear height, and flowering traits; however, GRMZM5G877647 was not related to any of the target traits.

3.5. Polymorphism, neutral test, and LD analysis of the ELF4-like4 gene in maize and its relatives

GRMZM2G025646, a homolog of the Arabidopsis ELF4-like4 gene, is the most highly conserved DUF1313 gene in maize. Its polymorphisms, LD, and selection pressure in maize and its relatives were investigated

by obtaining 103 gene sequences, including 29 from maize, 58 from teosinte, and 16 from Tripsacum (Table S1). Two overall measures of nucleotide diversity, π and θω, were calculated separately for the different sequence regions of the ELF4-like4 genes in the three species (Table 2). The estimates of nucleotide diversity in the ELF4-like4 gene sequence were the highest in Tripsacum and the lowest in maize. Tajima's D and Fu and Li's D tests of each region of the ELF4-like4 gene revealed purifying selection in the CDS region of teosinte and maize, with significant negative values. No significant signal was detected in the 5′-untranslated region of any of the three species or in any gene regions of Tripsacum. The LD patterns of maize, teosinte, and Tripsacum (Fig. 5) suggested significant LD blocks in teosinte (Fig. 5A) and maize (Fig. 5B), whereas

Fig. 5. Linkage disequilibrium (LD) block (A, B, and C) and LD decay (D) of ELF4-like4 genes in teosinte, maize, and Tripsacum.
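The LD decay shown in Fig. 5D relates pairwise LD to the physical distance between sites. A minimal sketch of that calculation follows, assuming that each sequence can be treated as one phased haplotype, that sites are biallelic, and that LD is measured as r²; the source does not state which LD measure was plotted. The positions, columns, and function names are hypothetical.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, given one allele per haplotype at each site."""
    n = len(site_a)
    a_ref, b_ref = site_a[0], site_b[0]
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(y == b_ref for y in site_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def ld_by_distance(positions, columns):
    """List of (distance in bp, r^2) for every pair of sites."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        distance = abs(positions[j] - positions[i])
        pairs.append((distance, r_squared(columns[i], columns[j])))
    return pairs


# Hypothetical example: three sites scored in four haplotypes.
positions = [10, 50, 400]
columns = ["AAGG", "AAGG", "AGAG"]
decay = ld_by_distance(positions, columns)
```

Plotting such pairs and fitting a trend line gives a decay curve; the distance at which the curve falls to a chosen threshold is the kind of value quoted per species in the text.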


few LD blocks were observed in Tripsacum (Fig. 5C). Either a high LD level or complete LD was noted among sites in different regions of maize and teosinte. The LD of ELF4-like4 decayed to 0.1 within 1300 bp in maize and within 600 bp in teosinte (Fig. 5D), whereas in Tripsacum it decayed rapidly to almost 0.1 at 300 bp, which is consistent with the polymorphisms found in the three species.

3.6. Phylogenetic analysis of the ELF4-like4 gene in maize and its relatives

The GRMZM2G025646 (ELF4-like4) gene is the most primitive and conserved among the DUF1313 family genes in maize. An evolutionary analysis of the DUF1313 genes in maize and its relatives was conducted by reconstructing an NJ tree containing 103 sequences, with >50%

bootstrap support (Fig. 6). The results of ML and BI trees were consistent with those of the NJ tree. The species were grouped into two major clades, A and B. Clade A mainly included Tripsacum species, and clade B included teosinte and maize, each with 99% bootstrap support. A subsequent analysis of clade B showed that it could be divided into three clades, B1, B2, and B3. The B1 clade mainly included Zea luxurians and Zea nicaraguensis (teosinte species); the B2 clade included Zea perennis and Zea diploperennis (perennial teosinte); and the B3 clade included Z. mays subsp. parviglumis, Z. mays subsp. huehuetenangensis, Z. mays subsp. mexicana, and Z. mays. Tripsacum species (clade A) comprised a monophyletic clade sister to clades B1 and B2. B1 and B2 are sister clades and together comprise a monophyletic clade sister to clade B3. Individuals of Z. mays subsp. mexicana, Z. mays subsp. parviglumis, Z. mays subsp. huehuetenangensis, and Z. mays were largely but not completely separated. However, within clade B3, the branches that included Z. mays were poorly supported in the NJ tree, with bootstrap proportions of <50%. These results are consistent with the morphological classification of Maydeae described by Doebley and Iltis [29].

4. Discussion

Fig. 6. Phylogenetic neighbor-joining (NJ) tree of the ELF4-like4 gene based on nucleotide sequences in maize, teosinte, and Tripsacum (bootstrap support, >50%). The nodes of branches with the bootstrap support of >50% are marked with black dots, and the main clades are marked with the bootstrap value. The branch labels are tagged with a species name and source number.
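An NJ tree such as the one in Fig. 6 is built from a matrix of pairwise distances between sequences. As a minimal sketch of that input step only, the code below computes uncorrected p-distances; this is a simplification chosen for illustration, since the source does not state which substitution model was used for the NJ tree. The sequence labels and function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ; gapped or ambiguous sites are ignored."""
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(seqs):
    """Symmetric nested dict of pairwise p-distances keyed by sequence name."""
    names = list(seqs)
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(seqs[a], seqs[b])
        dist[a][b] = d
        dist[b][a] = d
    return dist


# Hypothetical toy alignment; labels are illustrative only.
toy = {
    "maize": "ACGTACGT",
    "teosinte": "ACGTACGA",
    "tripsacum": "TCGAACGA",
}
dist = distance_matrix(toy)
```

In this toy case the maize and teosinte sequences are closest, so a distance-based method would join them first and place the third sequence outside that pair, which mirrors the clade A versus clade B split described in the text.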

Previous studies have shown that DUF1313 genes exist only in plants and encode proteins with a highly conserved domain of unknown function [6]. In this study, 269 DUF1313 family genes from 81 photoautotrophic eukaryotic genomes were identified. Phylogenetic analysis showed that this gene family underwent rapid expansion after the divergence of gymnosperms, with graminoids having retained most of the original sub-family genes (IARV-type). The DUF1313 family genes were found to have undergone at least two or three large-scale gene duplication events and were subjected to positive selection in different graminoid species. Most graminoid species retained the original DUF1313 genes after the gene duplication events, suggesting that these genes play an essential role in various physiological and biochemical processes. The selection pressure analysis of the DUF1313 family in graminoids showed that DUF1313 genes have undergone functional differentiation during the evolution of graminoids, and that GRMZM2G025646 is a key conserved gene in maize.

We used association approaches to determine the functions of the DUF1313 gene family in maize. Previous studies have shown that the Arabidopsis elf4 mutant flowers early under short-day conditions [6], and that the pea ELF4 homolog DNE alters the expression of circadian rhythm genes under continuous light and dark conditions [9]. Candidate gene-based association mapping indicated that GRMZM2G025646, GRMZM2G382774, and GRMZM2G359322 were significantly associated with photoperiod sensitivity-related traits, but GRMZM5G877647 was not. GRMZM5G877647 was evolutionarily distant from the other three genes, suggesting that it is a younger gene that diverged during the latest gene duplication event. Accordingly, the function of this gene differed from those of the others. The reliability of the association results was further verified by conducting analysis using RIL populations.
Polymorphic SNPs for photoperiod sensitivity traits were detected in GRMZM2G025646, GRMZM2G382774, and GRMZM2G359322 in several RIL populations in different environments; this finding is consistent with those obtained using association mapping. These results indicate that the association approach is suitable for identifying the function of DUF1313 family genes. The composition and phylogenetic patterns of DUF1313 genes in graminoids are similar; therefore, determination of DUF1313 gene function in maize might provide a basis for conducting similar analyses in other graminoid species.

GRMZM2G025646, a homolog of the Arabidopsis ELF4-like4 gene, is the most conserved DUF1313 gene in maize. Sequence polymorphism analysis indicated that polymorphism of this gene increased progressively from maize to teosinte to Tripsacum. The polymorphisms also varied across different gene regions among the three species. Genetic diversity in maize populations is significantly lower than that in teosinte and


Tripsacum, suggesting the occurrence of a recent expansion in the maize population. The neutral test showed that selection pressure varied among the different gene regions in maize, teosinte, and Tripsacum. Significant selection signals were detected in the CDS region of maize and teosinte, but not in Tripsacum. The ELF4-like4 nucleotide sequences of maize, teosinte, and Tripsacum were used for phylogenetic analysis. Ten quantitative traits, including the shape of tassel branches and tassel spikelets external to the glume, seven quality characteristics, and ear spikelet cupules, were used by Doebley and Iltis [29] to classify Zea into two subgenera: Zea and Luxurians. The Zea subgenus includes one species (Z. mays) and three subspecies: the cultivated maize subspecies (Z. mays subsp. mays), Z. mays subsp. mexicana, and Z. mays subsp. parviglumis. The Luxurians subgenus includes three species: Zea luxurians, Zea perennis, and Zea diploperennis. This classification is consistent with the results of our phylogenetic analysis of the ELF4-like4 gene. Previous studies have shown that maize was domesticated from teosinte (Z. mays subsp. parviglumis) in the Balsas River drainage in southern Mexico about 9,000 years ago [30]. Our results support this hypothesis, but we also showed that Z. mays subsp. huehuetenangensis and Z. mays subsp. mexicana played an important role in the evolution of Z. mays. We consider that Zea diploperennis might have been derived from Zea perennis. Field hybridization experiments also indicated that the seed setting rate of hybrids of Z. mays with Z. mays subsp. parviglumis and Z. mays subsp. mexicana is very high, whereas that of hybrids of Z. mays with Zea perennis, Zea diploperennis, Zea luxurians, and Zea nicaraguensis is low. Natural hybrids between maize and Tripsacum are not robust. These results also indicate greater genetic distance between these plants.
Thus, the ELF4-like4 gene can also be used to classify the relationships among maize and its relatives, and the DUF1313 family members and alleles identified in this study might be valuable genetic resources for conducting molecular marker-assisted breeding in maize.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.ygeno.2016.01.003.

Acknowledgments

This work was supported by the Fok Ying-Tong Education Foundation of China (20135103210002), the Sichuan Youth Science and Technology Innovation team of China (2013TD0014), and the National High Technology Research and Development Program of China (2012AA101104). We thank Dr. Jianbing Yan (National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University) for providing data of 513 maize inbred lines, and other researchers for sharing data in the public databases.

References

[1] R. Emerson, A genetic view of sex expression in the flowering plants, Science 59 (1924) 176–182.
[2] Q. Yang, Z. Li, W. Li, L. Ku, C. Wang, J. Ye, K. Li, N. Yang, Y. Li, T. Zhong, J. Li, Y. Chen, J. Yan, X. Yang, M. Xu, CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize, Proc. Natl. Acad. Sci. 110 (2013) 16969–16974.
[3] A. Bateman, P. Coggill, R.D. Finn, DUFs: families in search of function, Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 66 (2010) 1148–1152.
[4] N.F. Goodacre, D.L. Gerloff, P. Uetz, Protein domains of unknown function are essential in bacteria, mBio 5 (2013) e00744-13.


[5] A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L.L. Sonnhammer, D.J. Studholme, C. Yeats, S.R. Eddy, The Pfam protein families database, Nucleic Acids Res. 32 (2004) D138–D141.
[6] M.R. Doyle, S.J. Davis, R.M. Bastow, H.G. McWatters, L. Kozma-Bognár, F. Nagy, A.J. Millar, R.M. Amasino, The ELF4 gene controls circadian rhythms and flowering time in Arabidopsis thaliana, Nature 419 (2002) 74–77.
[7] K. Holm, T. Källman, N. Gyllenstrand, H. Hedman, U. Lagercrantz, Does the core circadian clock in the moss Physcomitrella patens (Bryophyta) comprise a single loop? BMC Plant Biol. 10 (2010) 109.
[8] R. Khanna, EARLY FLOWERING 4 functions in phytochrome B-regulated seedling de-etiolation, Plant Physiol. 133 (2003) 1530–1538.
[9] L.C. Liew, V. Hecht, R.E. Laurie, C.L. Knowles, J.K. Vander Schoor, R.C. Macknight, J.L. Weller, DIE NEUTRALIS and LATE BLOOMER 1 contribute to regulation of the Pea circadian clock, Plant Cell 21 (2009) 3198–3211.
[10] L.M. Wilson, Dissection of maize kernel composition and starch production by candidate gene association, Plant Cell Online 16 (2004) 2719–2733.
[11] S. Liu, X. Wang, H. Wang, H. Xin, X. Yang, J. Yan, J. Li, L.S. Tran, K. Shinozaki, K. Yamaguchi-Shinozaki, F. Qin, Genome-wide analysis of ZmDREB genes and their association with natural variation in drought tolerance at seedling stage of Zea mays L., PLoS Genet. 9 (2013) e1003790.
[12] E.S. Buckler, J.B. Holland, P.J. Bradbury, C.B. Acharya, P.J. Brown, C. Browne, E. Ersoz, S. Flint-Garcia, A. Garcia, J.C. Glaubitz, M.M. Goodman, C. Harjes, K. Guill, D.E. Kroon, S. Larsson, N.K. Lepak, H. Li, S.E. Mitchell, G. Pressoir, J.A. Peiffer, M.O. Rosas, T.R. Rocheford, M.C. Romay, S. Romero, S. Salvo, H. Sanchez Villeda, H.S. da Silva, Q. Sun, F. Tian, N. Upadyayula, D. Ware, H. Yates, J. Yu, Z. Zhang, S. Kresovich, M.D. McMullen, The genetic architecture of maize flowering time, Science 325 (2009) 714–718.
[13] J. Fu, Y. Cheng, J. Linghu, X. Yang, L. Kang, Z. Zhang, J. Zhang, C. He, X. Du, Z. Peng, RNA sequencing reveals the complex regulatory network in the maize kernel, Nat. Commun. 4 (2013).
[14] H. Li, Z. Peng, X. Yang, W. Wang, J. Fu, J. Wang, Y. Han, Y. Chai, T. Guo, N. Yang, Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels, Nat. Genet. 45 (2013) 43–50.
[15] N. Yang, Y. Lu, X. Yang, J. Huang, Y. Zhou, F. Ali, W. Wen, J. Liu, J. Li, J. Yan, Genome wide association studies using a new nonparametric model reveal the genetic architecture of 17 agronomic traits in an enlarged maize association panel, PLoS Genet. 10 (2014) e1004573.
[16] K.M. Wong, M.A. Suchard, J.P. Huelsenbeck, Alignment uncertainty and genomic analysis, Science 319 (2008) 473–476.
[17] K. Tamura, J. Dudley, M. Nei, S. Kumar, MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0, Mol. Biol. Evol. 24 (2007) 1596–1599.
[18] D. Darriba, G.L. Taboada, R. Doallo, D. Posada, ProtTest 3: fast selection of best-fit models of protein evolution, Bioinformatics 27 (2011) 1164–1165.
[19] S. Guindon, J.-F. Dufayard, V. Lefort, M. Anisimova, W. Hordijk, O. Gascuel, New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0, Syst. Biol. 59 (2010) 307–321.
[20] F. Ronquist, J.P. Huelsenbeck, MrBayes 3: Bayesian phylogenetic inference under mixed models, Bioinformatics 19 (2003) 1572–1574.
[21] A.J. Drummond, M.A. Suchard, D. Xie, A. Rambaut, Bayesian phylogenetics with BEAUti and the BEAST 1.7, Mol. Biol. Evol. 29 (2012) 1969–1973.
[22] Z. Yang, PAML 4: phylogenetic analysis by maximum likelihood, Mol. Biol. Evol. 24 (2007) 1586–1591.
[23] D. Durand, B.V. Halldórsson, B. Vernot, A hybrid micro–macroevolutionary approach to gene tree reconstruction, J. Comput. Biol. 13 (2006) 320–335.
[24] P.J. Bradbury, Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, E.S. Buckler, TASSEL: software for association mapping of complex traits in diverse samples, Bioinformatics 23 (2007) 2633–2635.
[25] W. Deng, Y. Wang, Z. Liu, H. Cheng, Y. Xue, HemI: a toolkit for illustrating heatmaps, PLoS One 9 (2014) e111988.
[26] P. Librado, J. Rozas, DnaSP v5: a software for comprehensive analysis of DNA polymorphism data, Bioinformatics 25 (2009) 1451–1452.
[27] Z. Yang, J.P. Bielawski, Statistical methods for detecting molecular adaptation, Trends Ecol. Evol. 15 (2000) 496–503.
[28] J.G. Schwerdt, K. MacKenzie, F. Wright, D. Oehme, J.M. Wagner, A.J. Harvey, N.J. Shirley, R.A. Burton, M. Schreiber, C. Halpin, J. Zimmer, D.F. Marshall, R. Waugh, G.B. Fincher, Evolutionary dynamics of the cellulose synthase gene superfamily in grasses, Plant Physiol. 168 (2015) 968–983.
[29] J.F. Doebley, H.H. Iltis, Taxonomy of Zea (Gramineae). I. A subgeneric classification with key to taxa, Am. J. Bot. 67 (1980) 982–993.
[30] Y. Matsuoka, Y. Vigouroux, M.M. Goodman, J.G. Sanchez, E. Buckler, J. Doebley, A single domestication for maize shown by multilocus microsatellite genotyping, Proc. Natl. Acad. Sci. 99 (2002) 6080–6084.
