Identification and expression analysis of phosphatidy ethanolamine-binding protein (PEBP) gene family in cotton


Genomics xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Original Article


Min Wang a,⁎, Yangguang Tan b,c, Caiping Cai c,d, Baohong Zhang b,c

a Beijing Key Laboratory of Plant Resources Research and Development, Beijing Technology and Business University, Beijing 100048, People's Republic of China
b Henan Collaborative Innovation Center of Modern Biological Breeding, Henan Institute of Sciences and Technology, Xinxiang 453003, Henan, China
c Department of Biology, East Carolina University, Greenville 27858, NC, USA
d State Key Laboratory of Crop Genetics & Germplasm Enhancement, Hybrid Cotton R & D Engineering Research Center, Ministry of Education, Nanjing Agricultural University, Nanjing 210095, China

ARTICLE INFO

Keywords: PEBP; Phylogenetic analysis; Structural analysis; Expression pattern; Cotton

ABSTRACT

The phosphatidylethanolamine-binding proteins (PEBPs) play an important role in controlling flower development and phase change. Here, a total of 61 PEBP genes were identified, of which 20, 21, 10 and 10 were from the tetraploids Gossypium hirsutum (AD1) and G. barbadense (AD2) and the diploids G. raimondii (D5) and G. arboreum (A2), respectively. In G. hirsutum, the 20 identified PEBP genes were unevenly distributed on 12 chromosomes. The identified PEBP genes were classified into four groups (TFL1, MFT, FT and FT-like), of which the FT-like group is unique to cotton. Most PEBP genes had a similar intron/exon distribution, whereas the divergence among PEBP genes suggests possible functional diversification. The expression of PEBP genes varied among tissues. This study provides new insights into the genome-wide identification of PEBP genes in cotton and a foundation for breeding early-maturing cotton cultivars.

1. Introduction

Phosphatidylethanolamine-binding proteins (PEBPs) are an evolutionarily conserved class of proteins in plants [1–3]. PEBP genes are involved in many biological processes in plants, notably the regulation of the timing of flower development and the control of plant architecture [4]. PEBP proteins form a large family that can be divided into three subfamilies: the FLOWERING LOCUS T (FT)-like proteins, the TERMINAL FLOWER1 (TFL1)-like proteins, and the MOTHER OF FT AND TFL1 (MFT)-like proteins [6]. The FT gene is a floral activator, whereas the TFL1 gene is a floral repressor [7]. FT encodes a protein that is thought to be the plant's florigen [8–10]. Although the functions of FT-like and TFL1-like genes are opposite, both are involved in flowering regulation [11]. Additionally, MFT-like genes play a significant role in seed germination and development [12]. MFT-like is the ancestor of the FT-like and TFL1-like subfamilies: moss and plantations have no FT or TFL1 homologs but have 4 and 2 MFT homologs, respectively, indicating that FT/TFL1-like genes arose during the evolution of seed plants [13].

To date, PEBP family members have been identified in many plant species, including Arabidopsis [14], maize [6], grapevine [15], Populus [16], legume [17], rice [9], barley [18], Cucurbita moschata [19], apple (Malus × domestica Borkh.) [20], and the perennial vine kiwifruit [21]. In Arabidopsis, the two FT-like genes, FT and TWIN SISTER OF FT (TSF), are floral activators; mutations in these genes cause late flowering under long-day conditions [22]. FT-like is a member of the small CENTRORADIALIS/TERMINAL FLOWER 1/SELF-PRUNING (CETS) protein family [23]. The expression of TSF and FT is up-regulated in phloem companion cells in the leaves [24,25]. TFL1, BROTHER OF FT (BFT), and Arabidopsis thaliana CENTRORADIALIS (ATC) belong to the TFL1-like subfamily [26]; the TFL1 gene is expressed at a high level in the inflorescence meristem of Arabidopsis [27]. The MFT-like subfamily contains only the MFT gene. MFT-like genes are related to gamete, sporophyte and seed development in bryophytes, whereas FT/TFL1-like genes play an important role in the transition from vegetative to reproductive growth [4]. In Arabidopsis, MFT expression is regulated by the ABA-INSENSITIVE3 (ABI3) and ABA-INSENSITIVE5 (ABI5) transcription factors in the ABA signaling pathway [28].

Cotton (Gossypium hirsutum), an important fiber crop, is grown extensively around the world. Besides providing valuable fiber, cotton seed is an important source of edible oil and of seed cake used in animal feed [29]. Cotton originated in tropical regions, and its growth is very sensitive to low temperature; the objective of most cotton breeding is early flowering [30]. The PEBP family genes are important in regulating flowering time and the inflorescence meristem in cotton [5]. Although great progress has been made in other plant species, only limited work has been reported on the cotton PEBP family; so far, only a few PEBP genes have been identified and cloned [5,25,31]. One study showed that GhFT1 may play a role in regulating flowering time and fiber development [31]. Recently, four cotton species, G. hirsutum acc. TM-1 [32], G. barbadense acc. 3-79 [33], G. raimondii [34] and G. arboreum [35], were sequenced, providing a powerful resource for investigating the PEBP family genes in cotton.

In this study, we first identified all potential PEBP family genes in the G. hirsutum genome and in three other cotton genomes. We analyzed their phylogenetic relationships, chromosomal locations, gene duplication, gene structures and motifs, as well as their expression in a wide range of tissues. Our results will help further investigation of the detailed molecular and biological functions of PEBP members and provide a reference for molecular breeding in cotton.

⁎ Corresponding author. E-mail address: [email protected] (M. Wang).

https://doi.org/10.1016/j.ygeno.2018.09.009
Received 17 March 2018; Received in revised form 4 September 2018; Accepted 15 September 2018
0888-7543/ © 2018 Published by Elsevier Inc.

2. Results

2.1. Identification and phylogenetic analysis of the PEBP gene family in cotton

To identify all potential PEBP genes in cotton, we used the PEBP protein domain (PF01161) to search the protein databases of four cotton species, G. hirsutum acc. TM-1 [32], G. barbadense acc. 3-79 [33], G. raimondii [34] and G. arboreum [35], with HMMER version 3.0 [36]. Initially, a total of 20, 21, 10 and 10 candidate protein sequences were confirmed by Pfam 31.0 and SMART [37,38] in G. hirsutum acc. TM-1 (Supplementary Table S1), G. barbadense acc. 3-79 (Supplementary Table S2), and G. raimondii and G. arboreum (Supplementary Table S3), respectively. In theory, one gene in diploid G. raimondii and G. arboreum should correspond to two orthologous genes in tetraploid G. hirsutum acc. TM-1 and in tetraploid G. barbadense acc. 3-79. Accordingly, the genes in G. arboreum and G. raimondii corresponded to 20 and 19 orthologous PEBP genes in G. hirsutum acc. TM-1 and G. barbadense acc. 3-79, respectively (Supplementary Table S4); the remaining genes without a counterpart may result from duplication during cotton evolution, assembly errors in some chromosomal regions, or differences in sequencing methods [39].

The PEBPs in G. hirsutum acc. TM-1 were named according to their phylogenetic relationships. The systematically designated PEBP sequences were used as BLAST queries against the Arabidopsis genome, and the genes were then named TFL1-1A, 1D, 2A, 2D, 3A and 3D; MFT1A, 1D, 2D, 3A, 3D, 4A, 4D and 4S; FT1A and 1D; and FTL1A, 1D, 2A and 2D. The suffixes A, D and S denote genes from the A-subgenome, the D-subgenome and scaffolds, respectively (Supplementary Table S1). The corresponding orthologs in G. raimondii, G. arboreum, G. hirsutum acc. TM-1 and G. barbadense acc. 3-79 were named GrPEBP, GaPEBP, GhPEBP and GbPEBP, respectively. Because G. raimondii and G. arboreum correspond to the D- and A-subgenomes of G. hirsutum acc. TM-1 and G. barbadense acc. 3-79, the PEBP genes of the two diploids were named in the same way as those of the tetraploids (Supplementary Table S4).

To investigate the phylogenetic relationships between the upland cotton PEBP family genes and those of other plant species, an unrooted phylogenetic tree was constructed with MEGA6.0 [40]. The G. hirsutum acc. TM-1 PEBP protein sequences were aligned with 104 PEBP protein sequences from G. arboreum, G. raimondii, G. barbadense acc. 3-79, Arabidopsis, rice, maize, poplar and grapevine (Fig. 1). Consistent with their orthologs in Arabidopsis and other species, the 124 PEBP genes clustered into four subgroups. The TFL1-subgroup contains a total of 35 genes, including AtATC and AtTFL1 from Arabidopsis, six genes from maize, two from poplar, one from grapevine and four from rice. The MFT-subgroup, the largest subgroup in upland cotton, contains eight upland cotton genes as well as genes from other plant species, including AtMFT, VvMFT, OsMFT and PoptMFT. The FT-subgroup includes AtFT and AtTSF from Arabidopsis, four genes from poplar, one from grapevine, thirteen from rice, fourteen from maize, and two (FT1D and FT1A) from cotton. Notably, the genes of the FT-like-subgroup (FTL1A, 1D, 2A and 2D) were found only in cotton (Table 1). In general, all subgroups except the FT-like-subgroup contain genes from at least five plant species and show an analogous distribution. This may imply that PEBP genes with similar functions tend to cluster in the same clade in cotton. G. barbadense has one more PEBP gene than G. hirsutum; this gene may have arisen by a duplication event after the two species diverged.

2.2. Chromosomal location and gene duplication of PEBP gene family

To elucidate their chromosomal distribution, the 20 identified upland cotton PEBP genes were mapped to the genome of G. hirsutum acc. TM-1: 16 genes were distributed on 12 chromosomes, and the remaining four were located on scaffolds (Fig. 2A and B). No PEBP gene was detected on the homologous chromosomes A02/D02, A03/D03, A05, A06/D06, A09, A10/D10, A12/D12 or A13/D13. The distribution of PEBP genes was uneven. Six genes were located on the A-subgenome: chromosome 4 carried the most (GhFTL1A and GhTFL1-3A), whereas chromosomes 1, 7, 8 and 11 each carried only one. Ten PEBP genes were located on the D-subgenome: two on each of chromosomes D4, D8 and D9, and one on each of D1, D5, D7 and D11.

Duplicated genes affect agronomically important traits and the evolution of plant species [41]. To elucidate the gene duplication status, we searched for duplicated genes in G. raimondii, whose genome has undergone at least two rounds of genome-wide duplication [42]. The 10 PEBP genes were mapped onto the chromosomes of G. raimondii. By searching the PGDD database, we identified three duplicated pairs; the members of each pair were highly similar paralogs in the same subgroup and shared a high degree of protein sequence identity (Supplementary Table S6) [43]. The three pairs were GrTFL1-2D and GhTFL1-3D; GhMFT1D and GhMFT2D; and GhFTL1D and GhFTL2D (Fig. 2C). Among these duplicated genes, the synonymous (Ks) and non-synonymous (Ka) substitution rates differed slightly, with Ks lower than Ka (Supplementary Table S6). This suggests that genome duplication may also have played a significant role in the expansion of the PEBP gene family.
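The comparison of Ka and Ks reported for the duplicated pairs is conventionally summarized as a Ka/Ks ratio: values above 1 are usually read as positive selection, values below 1 as purifying selection, and values near 1 as neutral evolution. The following is a minimal sketch of that convention only; the function name, the tolerance around 1 and the example numbers are illustrative assumptions and are not taken from Supplementary Table S6.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Classify a duplicated gene pair by its Ka/Ks ratio.

    ka: non-synonymous substitution rate
    ks: synonymous substitution rate
    tolerance: hypothetical band around 1.0 treated as "neutral"
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Hypothetical values for illustration only (Ks lower than Ka).
example = classify_selection(ka=0.12, ks=0.10)
```

With Ks lower than Ka, as stated for the three G. raimondii pairs, the ratio exceeds 1 and the pair falls into the "positive" class under this convention.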

2.3. Gene structure analysis and motif detection in upland cotton

To gain further insight into the structural diversity of upland cotton PEBP genes, we used the online Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn/) to examine the intron-exon structure by aligning the complete CDS and genomic DNA sequences [44]. As shown in Fig. 3, most PEBP genes within the same subgroup had similar gene structures with regard to the numbers and lengths of introns and exons. The number of exons varied from 1 to 4: fifteen PEBP genes contained 4 exons, one gene (GhFTL2D) contained 3 exons, three genes contained 2 exons, and only GhMFT4S had a single exon. These results agree with those reported by other researchers [25].

Conserved motifs in the 20 PEBP proteins were identified using the MEME online tool (Fig. 3B) [45]. In total, 12 motifs, named motifs 1 to 12, were identified in the upland cotton PEBP proteins; they were 6 to 50 amino acids long. The number of conserved motifs in each PEBP varied between 3 and 7, indicating that members of the same subgroup share one or more identical motifs. Fifteen PEBP genes shared motifs 1, 2, 3, 4, 5 and 10. As with exon structure, members of the same subgroup also showed similar motif composition, suggesting functional similarity. In addition, some motifs were present only in a particular subgroup, indicating that they may perform subgroup-specific functions [46]. For example, GhMFT4S has only motifs 2, 5 and 10, and GhFTL1A, 1D and 2A share motifs 3, 5, 6, 7, 9 and 12. Remarkably, the exon/intron positions and motifs of GhFTL2D differ markedly from those of the other genes.

2.4. Expression profiling of PEBP genes in different tissues of G. hirsutum acc. TM-1

Transcriptome data from G. hirsutum acc. TM-1 vegetative organs (root, stem and leaf), floral organs (petal and stamen), and ovule and fiber tissues at seven developmental stages (−3, 0, 3, 5, 10, 20 and 25 DPA) were used to analyze the expression patterns of the 20 identified PEBP genes [32]. The heat map was constructed with MeV 4.9.0 (Fig. 4). Four genes (TFL1-1A, TFL1-2A, TFL1-2D and MFT4S) were not expressed in these tissues; the remaining 16 PEBP genes were expressed in the tested tissues of G. hirsutum acc. TM-1 with tissue-dependent patterns. Most PEBP genes differed in expression level, although some showed similar expression patterns. For example, GhMFT3D, GhMFT3A and GhFTL2A were preferentially expressed at high levels in petals or stamens. Notably, GhMFT3D, GhFT1A, GhFTL1A and GhFTL1D had extremely high expression levels in 10, 20 and 25 DPA fibers, suggesting that these genes play a positive role in fiber development at different stages. GhFT1A and GhFT1D were highly expressed in leaves. Interestingly, GhFTL2D was expressed in all tissues (roots, stems, leaves, petals, stamens, −3, 0 and 3 DPA ovules, and 5, 10, 20 and 25 DPA fibers), although at different levels; its expression was highest in the leaf, suggesting that it may play crucial roles in multiple tissues. In contrast, the remaining genes were expressed at low levels across tissues (Supplementary Table S4).

Table 1
The number of identified PEBP genes in cotton.

Cotton species    TFL1   MFT   FT   FT-like   Total
G. hirsutum          6     8    2         4      20
G. barbadense        7     8    2         4      21
G. arboreum          3     4    1         2      10
G. raimondii         3     4    1         2      10

Fig. 1. Neighbor-joining phylogenetic tree of 124 PEBP genes from four cotton species and five other plant species. The phylogenetic tree was generated by MEGA 6.0 software with the JTT model and the complete deletion option. The PEBP gene family is divided into four subgroups: TFL1, MFT, FT and FTL. Numbers on branches are bootstrap proportions from 1000 replicates. The FTL subgroup is composed only of cotton genes.

3. Discussion

3.1. Phylogenetic characterization of the PEBP family genes in cotton

The evolutionary analysis showed that the PEBP gene family in cotton is divided into four subfamilies (TFL1, MFT, FT and FT-like) (Fig. 1). The TFL1-subgroup includes homologs of TFL1 and ATC from Arabidopsis and other species; notably, the six GhTFL1 genes of G. hirsutum acc. TM-1 within this class had a similar intron/exon organization with regard to intron/exon number (Fig. 3C). The TFL1-subgroup was most homologous to TFL1 in poplar and also included TFL1 homologs from Arabidopsis, maize and grapevine. Interestingly, among the identified PEBP genes, GhFTL2D and GhMFT4S lacked introns and were smaller. We do not know what caused this difference. It may be unique to cotton and have arisen during its long evolutionary history; alternatively, these genes may be processed pseudogenes or artifacts of poor sequencing. Thus, these two genes need more detailed re-examination in the future.

Fig. 2. Chromosomal locations of the PEBP genes mapped to the A- (A) and D- (B) subgenomes of G. hirsutum acc. TM-1, and PEBP gene duplication in G. raimondii (C). The chromosome number is indicated at the top of each chromosome. The scale is in megabases (Mb).

Fig. 3. The phylogenetic tree, motifs and intron/exon structure of the 20 identified PEBP genes in upland cotton. (A) The phylogenetic tree was generated using the Neighbor-Joining (NJ) method implemented in the MEGA 6.0 software with the JTT model and the pairwise gap deletion option. The bootstrap analysis was conducted with 1000 iterations. (B) The motif compositions of PEBP genes. Each color represents a specific motif. (C) The exon/intron distribution of PEBP genes. Exons and introns are represented by black boxes and black lines, respectively.

Fig. 4. Expression patterns of the 20 identified genes in different tissues and organs of G. hirsutum acc. TM-1. The twelve tissues and organs were root, stem, leaf, petal, stamen, −3, 0 and 3 DPA ovules, and 5, 10, 20 and 25 DPA fibers. The color represents PEBP expression levels as Log2 (FPKM): red indicates high expression and green indicates low expression. The phylogenetic relationship is shown on the left.

The MFT-subgroup, the largest class, has 8 members, accounting for 40% of all PEBP genes, and can be further divided into four groups, GhMFT1, GhMFT2, GhMFT3 and GhMFT4, containing 2, 1, 2 and 3 members, respectively. All GhMFT members of G. hirsutum acc. TM-1 within the same class had very similar intron/exon numbers except GhMFT4S (Fig. 3C). GhMFT4S is located on a scaffold of the G. hirsutum acc. TM-1 genome and has only one exon. These results indicate that PEBP family genes are structurally highly conserved in monocotyledonous and dicotyledonous plants. In the FT-subgroup, we identified two cotton PEBP genes. In Arabidopsis, FT and TSF have similar functions in flowering time [47], and some FT family genes in other species have no TSF or FT counterpart [48]. The FT-like-subgroup exists only in cotton; because it is phylogenetically very close to the FT-subgroup, it was named the FT-like-subgroup. Notably, the domain and intron-exon structure of GhFTL2D was abnormal: in addition to the PEBP domain, it contained three leucine-rich repeat N-terminal domains at its front end. The reason for this anomaly is unclear and requires further study. Apart from the FT-like-subgroup, the phylogenetic relationships of the other three subgroups were similar to those in Arabidopsis. According to previous reports, upland cotton (G. hirsutum acc. TM-1) and island cotton (G. barbadense acc. 3-79) are AD tetraploids that evolved from the A-genome diploid G. arboreum and the D-genome diploid G. raimondii around 1–2 Mya [49]. In this study, the diploids G. raimondii and G. arboreum each had 10 PEBP family genes, which corresponded to the 20 PEBP gene sequences of the tetraploids G. hirsutum acc. TM-1 and G. barbadense acc. 3-79 (Supplementary Table S4); this indicates that the number of PEBP family genes in cotton is highly conserved.

3.2. Expression of PEBP genes in G. hirsutum

PEBP genes had diverse expression patterns in different cotton tissues. For example, GhFT1D, GhFT1A and GhFTL2D were preferentially expressed in leaves; GhFTL2A, GhMFT3A and GhMFT3D showed distinctive, high expression in the petal and stamen; and GhMFT3D, GhFTL1A, GhFTL1D and GhFTL2D were highly expressed at different stages of fiber development (Fig. 4). Genes of the FT-like subgroup have not been reported in other plant species. In different cotton genotypes, FT-like genes are involved in photoperiodic response and flowering time regulation [25]. Overexpression of GhFT1 in Arabidopsis produced clear early-flowering phenotypes under both long-day and short-day conditions, indicating that GhFT1 is a putative FT ortholog in G. hirsutum that regulates floral transition, as FT does in Arabidopsis [31]. GhFT1A and GhFT1D were highly expressed in fiber. Recent studies have shown that MFT-like genes regulate seed development in plants [13]. GhMFT3D and GhFTL2D were highly expressed in roots (Supplementary Table S5). These expression patterns were similar to those in a previous report, but the molecular mechanisms and functions of these genes in roots are not well understood [5]. Although genes of the MFT and TFL1 subgroups shared similar structures (Fig. 3), their expression differed: MFT genes were highly expressed in the petal, stamen and fibers, whereas TFL1 genes showed low expression in the root, stem and leaf (Fig. 4).

3.3. Comparison of the PEBP family genes between G. arboreum and G. raimondii

We identified and characterized 20 and 21 PEBP genes from the two tetraploid cottons G. hirsutum acc. TM-1 and G. barbadense acc. 3-79, respectively (Supplementary Tables S1 and S2). The A-genome diploid G. arboreum and the D-genome diploid G. raimondii each contain 10 PEBP genes (Supplementary Table S3). Comparison of PEBP gene structures between G. arboreum and G. raimondii indicated that the PEBP genes are highly conserved in the distribution of exons and introns and in intron number. Additionally, their chromosomal distribution varied greatly. The 10 PEBP genes were unevenly distributed across 7 of the 13 G. raimondii chromosomes (chromosomes 1, 4, 5, 7, 8, 9 and 11): chromosomes 4, 8 and 9 each had two genes, and the remaining chromosomes each had one (Supplementary Table S1). The PEBP genes were even more unevenly distributed in G. arboreum, across 6 chromosomes (1, 2, 3, 10, 11 and 12): chromosomes 3, 11 and 12 had 2, 3 and 2 genes, respectively, whereas chromosomes 1 and 10 each had only one (Supplementary Table S1). The reason for this variation is still unclear. It may be related to the high long terminal repeat (LTR) activity that contributed to the twofold increase in the size of the G. arboreum genome [46].

4. Materials and methods

4.1. Database search and identification of PEBP family genes

The genome sequences of four cotton species, G. raimondii, G. arboreum, G. hirsutum acc. TM-1 and G. barbadense acc. 3-79, were downloaded from http://www.phytozome.net/, http://cgp.genomics.org.cn, http://mascotton.njau.edu.cn/ and https://www.cottongen.org/, respectively. The protein sequences of Arabidopsis, rice and maize were obtained from The Arabidopsis Information Resource (TAIR: http://www.arabidopsis.org), the Rice Genome Annotation Project Database (http://rice.plantbiology.msu.edu/index.shtml) and http://www.phytozome.net/, respectively; the protein databases of poplar and grapevine were obtained from http://www.phytozome.net/. To identify the PEBP family in the four cotton species, the PEBP domain (PF01161), downloaded from Pfam 31.0 (http://pfam.xfam.org/) [37], was used as the query to run HMMER version 3.0 [36] against the three cotton protein databases [46]. To confirm the genes identified above, the PEBP sequences were manually inspected with Pfam 31.0 (http://pfam.xfam.org/) and SMART (http://smart.embl-heidelberg.de/) for the presence of the conserved PEBP domain and corresponding motifs [38].

4.2. Phylogenetic analysis of the PEBP family genes

MEGA6.0 (http://www.megasoftware.net/) was used to construct the phylogenetic tree by the Neighbor-Joining (NJ) method [40]. The parameters were as follows: bootstrap analysis with 1000 replicates; Jones-Taylor-Thornton (JTT) model; uniform rates; and the complete deletion option [50]. To further understand the phylogenetic relationships of PEBP proteins in three cotton species and other model plant species, the phylogenetic tree was constructed using all PEBP protein sequences obtained from Arabidopsis, rice, maize, poplar and grapevine as well as three cotton species.

4.3. Chromosomal location and gene duplication of PEBP family genes

Chromosomal location information for the upland cotton PEBP family genes, including chromosome length and gene start position, was obtained from the Cotton Functional Genomics Database (https://cottonfgd.org/about/download.html). To determine the distribution of the 20 identified PEBP genes on the chromosomes of G. hirsutum acc. TM-1, the MapInspect software (http://mapinspect.software.informer.com/) was used to map them onto the A-subgenome, D-subgenome and scaffolds. It has long been recognized that gene duplication occurs throughout plant evolution and can give a gene a new function [51,52]. Gene duplication at the whole-genome level (whole genome duplication) provides genetic redundancy and chromosomal rearrangement, and further causes substantial gene family expansion and evolutionary novelty [53]. According to a recent report, an event will


(31170263). We greatly appreciate Brandon Widrick for his critical proofreading of this manuscript.

be considered a gene duplication if it meets the following criteria: (1) the alignable nucleotide sequence covers more than 80% of the longer gene, and (2) the region of identity between the two sequences encompasses more than 75% of the alignable region. Tandem duplication pairs were defined as genes separated by five or fewer genes, and segmental duplication pairs were identified using the Plant Genome Duplication Database (http://chibba.agtec.uga.edu/duplication) (Supplementary Table S6) [43,54].
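The two duplication criteria and the tandem rule stated above translate directly into a small check. The sketch below is only an illustration under stated assumptions: the class and function names are hypothetical, lengths are in nucleotides, and gene positions are taken to be ordinal indices of genes along one chromosome; none of this is the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class PairAlignment:
    """Summary of a pairwise nucleotide alignment (hypothetical container)."""
    alignable_len: int  # length of the alignable region
    identical_len: int  # length of the identical region within it
    len_a: int          # length of gene A
    len_b: int          # length of gene B


def is_duplicated_pair(aln: PairAlignment,
                       min_coverage: float = 0.80,
                       min_identity: float = 0.75) -> bool:
    """Apply criteria (1) and (2): coverage of the longer gene > 80%
    and identity over the alignable region > 75%."""
    if aln.alignable_len <= 0:
        return False
    longer = max(aln.len_a, aln.len_b)
    coverage = aln.alignable_len / longer
    identity = aln.identical_len / aln.alignable_len
    return coverage > min_coverage and identity > min_identity


def is_tandem_pair(index_a: int, index_b: int, max_intervening: int = 5) -> bool:
    """Two genes on the same chromosome are a tandem pair when five or
    fewer genes lie between them (indices are ordinal gene positions)."""
    if index_a == index_b:
        return False
    intervening = abs(index_a - index_b) - 1
    return intervening <= max_intervening
```

For example, a pair whose alignable region is 900 nt of a 1000 nt longer gene, with 720 identical positions, passes both thresholds (coverage 0.90, identity 0.80).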

Appendix A. Supplementary data Supplementary data to this article can be found online at https:// doi.org/10.1016/j.ygeno.2018.09.009. References

4.4. Gene structure analysis and motif detection in upland cotton [1] P. Jolle's, From structure to function: possible biological roles of a new widespread protein family binding hydrophobic ligands and displaying a nucleotide binding site, FEBS Lett. 369 (1995). [2] D.C. Bradley, Rosemary; Copsey, Lucy; Vincent, coral control of inflorescence architecture in Antirrhinum, Nature 379 (1996) 791–797. [3] J.J.B. Mark J Banfield, Anthony C.F. Perry, R. Leo Brady, Function from structure the crystal structure of human phosphatidylethanolaminebinding protein suggests a role in membrane signal transduction, Curr. Biol. 6 (1998) 1245–1254. [4] A. Karlgren, et al., Evolution of the PEBP gene family in plants: functional diversification in seed plant evolution, Plant Physiol. 156 (2011) 1967–1977. [5] A. Argiriou, G. Michailidis, A.S. Tsaftaris, Characterization and expression analysis of TERMINAL FLOWER1 homologs from cultivated alloteraploid cotton (Gossypium hirsutum) and its diploid progenitors, J. Plant Physiol. 165 (2008) 1636–1646. [6] O.N. Danilevskaya, X. Meng, Z. Hou, E.V. Ananiev, C.R. Simmons, A genomic and expression compendium of the expanded PEBP gene family from maize, Plant Physiol. 146 (2008) 250–264. [7] S. Hanano, K. Goto, Arabidopsis TERMINAL FLOWER1 is involved in the regulation of flowering time and inflorescence development through transcriptional repression, Plant Cell 23 (2011) 3172–3184. [8] L. Corbesier, C. Vincent, S. Jang, F. Fornara, Q. Fan, I. Searle, A. Giakountis, S. Farrona, L. Gissot, C. Turnbull, G. Coupland, FT proteinmovement contributes to long-distance signaling in floral induction of Arabidopsis, Science 316 (2007) 1030–1033. [9] S.M. Shojiro Tamaki, Hann Ling Wong, Shuji Yokoi, Shimamoto Ko, Hd3a protein is a mobile flowering signal in rice, Science 316 (2007) 1033–1036. [10] E. Lifschitz, T. Eviatar, A. Rozman, A. Shalit, A. 
Goldshmidt, The tomato FT ortholog triggers systemic signals that regulate growth and flowering and substitute for diverse environmental stimuli, PNAS 103 (2006) 6398–6403. [11] D. Bradley, O. R., C. Vincent, R. Carpenter, E. Coen, Inflorescence commitment and architecture in Arabidopsis, Science 275 (1997) 80–83. [12] Y.B. Tao, L. Luo, L.L. He, J. Ni, Z.F. Xu, A promoter analysis of MOTHER OF FT AND TFL1 1 (JcMFT1), a seed-preferential gene from the biofuel plant Jatropha curcas, J. Plant Res. 127 (2014) 513–524. [13] Q. Li, et al., Identification of a soybean MOTHER OF FT AND TFL1 homolog involved in regulation of seed germination, PLoS One 9 (2014) e99642. [14] Y. Kobayashi, A Pair of related genes with antagonistic roles in mediating flowering signals, Science 286 (1999) 1960–1962. [15] M.J. Carmona, M. Calonje, J.M. Martinez-Zapater, The FT/TFL1 gene family in grapevine, Plant Mol. Biol. 63 (2007) 637–650. [16] R. Mohamed, et al., Populus CEN/TFL1 regulates first onset of flowering, axillary meristem identity and dormancy release in Populus, Plant J. 62 (2010) 674–688. [17] V. Hecht, et al., The pea GIGAS gene is a FLOWERING LOCUS T homolog necessary for graft-transmissible specification of flowering but not for responsiveness to photoperiod, Plant Cell 23 (2011) 147–161. [18] S. Faure, J. Higgins, A. Turner, D.A. Laurie, The FLOWERING LOCUS T-like gene family in barley (Hordeum vulgare), Genetics 176 (2007) 599–609. [19] M.K. Lin, et al., FLOWERING LOCUS T protein may act as the long-distance florigenic signal in the cucurbits, Plant Cell 19 (2007) 1488–1506. [20] N. Kotoda, et al., Molecular characterization of FLOWERING LOCUS T-like genes of apple (Malus x domestica Borkh.), Plant Cell Physiol. 51 (2010) 561–575. [21] E. Varkonyi-Gasic, et al., Homologs of FT, CEN and FD respond to developmental and environmental signals affecting growth and flowering in the perennial vine kiwifruit, New Phytol. 198 (2013) 732–746. [22] I. Kardailsky, V. S., J.H. Ahn, N. 
Dagenais, S.K. Christensen, J.T. Nguyen, et al., Activation tagging of the floral inducer FT, Science 286 (5446) (1999). [23] L. Pnueli, T. Gutfinger, D. Hareven, O. Ben-Naim, N. Ron, N. Adir, E. Lifschitz, Tomato SP-interacting proteins define a conserved signaling system that regulates shoot architecture and flowering, Plant Cell 13 (2001) 2687–2702. [24] S. Jang, S. Torti, G. Coupland, Genetic and spatial interactions between FT, TSF and SVP during the early stages of floral induction in Arabidopsis, Plant J. 60 (2009) 614–625. [25] X. Zhang, Characterization and functional analysis of PEBP family genes in upland cotton (Gossypium hirsutum L.), PLoS One 11 (2016) e0161080. [26] S.J. Yoo, et al., Brother Of FT and TFL1 (BFT) has TFL1-like activity and functions redundantly with TFL1 in inflorescence meristem development in Arabidopsis, Plant J. 63 (2010) 241–253. [27] N. Mimida, K. Goto, Kobayashi functional divergence of the TFL1-like gene family in Arabidopsis revealed by characterization of a novelhomologue, Genes Cells 6 (2001) 327–336. [28] W. Xi, C. Liu, X. Hou, H. Yu, Mother of FT and TFL1 regulates seed germination through a negative feedback loop modulating ABA signaling in Arabidopsis, Plant Cell 22 (2010) 1733–1748. [29] A.K. Singh, K. Paritosh, U. Kant, P.K. Burma, D. Pental, High expression of Cry1Ac

Gene structure provides valuable information, including relationships within a gene family. The intron-exon structure pattern serves as an independent line of evidence supporting the subgroup designations from phylogenetic analysis [50]. Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn) was used to identify the exon/intron organization of the PEBP genes [55]. The information on the PEBP family genes, including coding DNA sequences (CDS), gene sequences and protein sequences, was obtained from the Cotton Functional Genomics Database (https://cottonfgd.org/) and the Phytozome database. Additionally, ExPASy (http://web.expasy.org/compute_pi/) was used to compute the isoelectric point (pI) and molecular weight [42] of the PEBP proteins (Supplementary Table S1) [56]. Conserved motifs in the upland cotton PEBP protein sequences were identified using Multiple Em for Motif Elicitation (MEME Suite 4.11.4, http://meme-suite.org) [45]. The optimized parameters were as follows: motif width from 6 to 50, a maximum of 20 motifs, and zero or one occurrence per sequence. All other options were left at their default values. MEME usually reports the most statistically significant (lowest E-value) motifs first, and motifs with an E-value larger than 0.05 are rarely considered. Here, only motifs with an E-value < 1e-4 were retained for further analysis [57].

4.5. Expression pattern analysis of PEBP family genes in G. hirsutum acc. TM-1

To better understand the tissue-specific expression profiles of PEBP genes in cotton, high-throughput RNA-sequencing data for G. hirsutum acc. TM-1 [32] were used to analyze the expression of PEBP genes in different cotton tissues and organs, including vegetative tissues (root, stem, and leaf), floral tissues (petal and stamen), ovule tissues (−3, 0, and 3 days post anthesis (DPA)), and fiber tissues (5, 10, 20, and 25 DPA).
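Expression profiles of this kind are commonly grouped by unsupervised hierarchical clustering before a heat map is drawn. The following is a minimal sketch of that idea in plain Python, not the MeV implementation: the gene names, the expression values, the Pearson-correlation distance and the average-linkage rule are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    if norm == 0:  # a flat profile has no defined correlation
        return 1.0
    return 1.0 - sum(a * b for a, b in zip(dx, dy)) / norm


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(pearson_distance(profiles[g], profiles[h])
                for g in cluster_a for h in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clusters(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[gene] for gene in sorted(profiles)]
    while len(clusters) > n_clusters:
        # combinations() always yields i < j, so deleting index j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Hypothetical expression values (for example FPKM) in root, stem, leaf, petal.
# These numbers are invented for illustration and are not data from the study.
example_profiles = {
    "geneA": [10.0, 12.0, 30.0, 2.0],
    "geneB": [5.0, 6.0, 15.0, 1.0],
    "geneC": [1.0, 2.0, 1.5, 20.0],
    "geneD": [0.5, 1.0, 0.8, 9.0],
}

example_clusters = hierarchical_clusters(example_profiles, 2)
print(example_clusters)
```

With these invented values, the two leaf-biased profiles (geneA, geneB) end up in one cluster and the two petal-biased profiles (geneC, geneD) in the other. A correlation-based distance groups genes by the shape of their tissue profile and ignores absolute expression level, which is one common choice for heat maps; other distance and linkage settings would be equally valid.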
Cluster analysis was performed using Multi Experiment Viewer (MeV) (https://sourceforge.net/projects/mev-tm4/) with an unsupervised hierarchical clustering method, and a heat map of expression patterns was constructed from the expression data of all PEBP genes [56].

Competing financial interests

The authors declare no competing financial interests.

Author contributions statement

M.W. and B.Z. conceived the experiments; M.W., Y.T. and C.C. conducted the experiments; M.W., Y.T., C.C. and B.Z. analyzed the results; M.W., Y.T., C.C. and B.Z. wrote the paper. All authors reviewed the manuscript.

Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (2016YFD0101413). This research was also supported by the Collaborative Innovation Center of Modern Biological Breeding in Henan Province; Major Science and Technology Projects in Henan Province (161100510100); Science Foundation of China

protein in cotton (Gossypium hirsutum) by combining independent transgenic events that target the protein to cytoplasm and plastids, PLoS One 11 (2016) e0158603.
[30] C. Li, et al., Promoting flowering, lateral shoot outgrowth, leaf development, and flower abscission in tobacco plants overexpressing cotton FLOWERING LOCUS T (FT)-like gene GhFT1, Front. Plant Sci. 6 (2015) 454.
[31] D. Guo, et al., Molecular cloning and functional analysis of the FLOWERING LOCUS T (FT) homolog GhFT1 from Gossypium hirsutum, J. Integr. Plant Biol. 57 (2015) 522–533.
[32] T. Zhang, et al., Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM1) provides a resource for fiber improvement, Nat. Biotechnol. 33 (2015) 531–537.
[33] D. Yuan, et al., The genome sequence of Sea-Island cotton (Gossypium barbadense) provides insights into the allopolyploidization and development of superior spinnable fibres, Sci. Rep. 5 (2015) 17662.
[34] K. Wang, et al., The draft genome of a diploid cotton Gossypium raimondii, Nat. Genet. 44 (2012) 1098–1103.
[35] F. Li, et al., Genome sequence of the cultivated cotton Gossypium arboreum, Nat. Genet. 46 (2014) 567–572.
[36] R.D. Finn, J. Clements, S.R. Eddy, HMMER web server: interactive sequence similarity searching, Nucleic Acids Res. 39 (2011) W29–W37.
[37] R.D. Finn, et al., Pfam: the protein families database, Nucleic Acids Res. 42 (2014) D222–D230.
[38] I. Letunic, T. Doerks, P. Bork, SMART: recent updates, new developments and status in 2015, Nucleic Acids Res. 43 (2015) D257–D260.
[39] J. Xu, et al., Discovery and identification of candidate genes from the chitinase gene family for Verticillium dahliae resistance in cotton, Sci. Rep. 6 (2016) 29022.
[40] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: molecular evolutionary genetics analysis version 6.0, Mol. Biol. Evol. 30 (2013) 2725–2729.
[41] N. Panchy, M. Lehti-Shiu, S.H. Shiu, Evolution of gene duplication in plants, Plant Physiol. 171 (2016) 2294–2316.
[42] A.H. Paterson, et al., Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres, Nature 492 (2012) 423–427.
[43] T.H. Lee, H. Tang, X. Wang, A.H. Paterson, PGDD: a database of gene and genome duplication in plants, Nucleic Acids Res. 41 (2013) D1152–D1158.
[44] B. Hu, et al., GSDS 2.0: an upgraded gene feature visualization server, Bioinformatics 31 (2015) 1296–1297.
[45] T.L. Bailey, N. Williams, C. Misleh, W.W. Li, MEME: discovering and analyzing DNA and protein sequence motifs, Nucleic Acids Res. 34 (2006) W369–W373.
[46] J. Ma, et al., Comprehensive analysis of TCP transcription factors and their expression during cotton (Gossypium arboreum) fiber early development, Sci. Rep. 6 (2016) 21535.
[47] M. D'Aloia, et al., Cytokinin promotes flowering of Arabidopsis via transcriptional activation of the FT paralogue TSF, Plant J. 65 (2011) 972–979.
[48] C.E. Grover, J.P. Gallagher, J.F. Wendel, Candidate gene identification of flowering time genes in cotton, Plant Genome 8 (2015) e0.
[49] K.B. Xie, et al., Systematic discovery and characterization of stress-related microRNA genes in Oryza sativa, Biologia (Bratislava) 70 (2015) 75–84.
[50] Q. He, et al., Genome-wide identification of R2R3-MYB genes and expression analyses during abiotic stress in Gossypium raimondii, Sci. Rep. 6 (2016) 22980.
[51] W.J. Kent, B. R., A. Hinrichs, W. Miller, D. Haussler, Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes, PNAS 100 (2003) 11484–11489.
[52] S.B. Cannon, A. Mitra, A. Baumgarten, N.D. Young, G. May, The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana, BMC Plant Biol. 4 (2004) 10.
[53] J. Zhang, Evolution by gene duplication: an update, Trends Ecol. Evol. 18 (2003) 292–298.
[54] E. Niu, et al., Comprehensive analysis of the COBRA-Like (COBL) gene family in Gossypium identifies two COBLs potentially associated with fiber quality, PLoS One 10 (2015) e0145725.
[55] E. Gasteiger, ExPASy: the proteomics server for in-depth protein knowledge and analysis, Nucleic Acids Res. 31 (2003) 3784–3788.
[56] R. Sun, Genome-wide identification of auxin response factor (ARF) genes and its tissue-specific prominent expression in Gossypium raimondii, Funct. Integr. Genomics 15 (2015) 481–493.
[57] W. Ma, W.S. Noble, T.L. Bailey, Motif-based analysis of large nucleotide data sets using MEME-ChIP, Nat. Protoc. 9 (2014) 1428–1450.