Accepted Manuscript Genome-wide identification, evolution, and expression analysis of GATA transcription factors in apple (Malus×domestica Borkh.)
Hongfei Chen, Hongxia Shao, Ke Li, Dong Zhang, Sheng Fan, Youmei Li, Mingyu Han
PII: S0378-1119(17)30499-7
DOI: 10.1016/j.gene.2017.06.049
Reference: GENE 42012
To appear in: Gene
Received date: 14 January 2017
Revised date: 16 June 2017
Accepted date: 28 June 2017
Please cite this article as: Hongfei Chen, Hongxia Shao, Ke Li, Dong Zhang, Sheng Fan, Youmei Li, Mingyu Han, Genome-wide identification, evolution, and expression analysis of GATA transcription factors in apple (Malus×domestica Borkh.), Gene (2017), doi: 10.1016/j.gene.2017.06.049
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Genome-wide identification, evolution, and expression analysis of GATA transcription factors in apple (Malus × domestica Borkh.)
Hongfei Chen, Hongxia Shao, Ke Li, Dong Zhang, Sheng Fan, Youmei Li, Mingyu Han*
College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
*Corresponding author
College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
Tel & Fax: +86-029-87082849
Email: [email protected]
Abstract
Plant GATA transcription factors are type-IV zinc-finger proteins that play important regulatory roles in plant growth and development. In this study, we identified 35 GATA genes classified into four groups in the whole genome sequence of Malus domestica. A physicochemical property analysis indicated that GATA proteins are largely unstable hydrophilic proteins. An analysis of conserved protein motifs
uncovered three highly conserved motifs in addition to the GATA motif found in all MdGATA proteins. These three motifs, CCT, TIFY, and ASXH, were found to occur in specific GATA groups and may be
related to GATA gene function. We identified 10 pairs of putative paralogs, indicating that MdGATA genes have mainly undergone whole genome duplication. Eighteen orthologous gene pairs were also
identified between Arabidopsis thaliana and M. domestica. Furthermore, many light-responsive cis-elements were found in MdGATA gene promoters. Tissue-specific expression analysis performed by
quantitative real-time reverse transcription PCR showed that MdGATA genes were preferentially expressed in flowers, leaves, and buds. Apple seedlings maintained in darkness for 7 days exhibited a
moderate decline in chlorophyll content along with significant down-regulation of most MdGATA genes, suggesting that MdGATA genes may be involved in light-responsive development and chlorophyll-level
regulation. The distinctly higher expression levels observed for many MdGATA genes during three
stages of floral induction also indicate that MdGATA genes may play a role in the apple flowering transition. The results presented here lay the foundation for further investigation of the putative functions of the MdGATA gene family and for the improvement of apple yields.
Keywords: GATA; apple; evolution; expression
Abbreviations
ASXH, additional sex homology; GDR, Genome Database for Rosaceae; GRAVY, grand average of hydropathicity; GSDS, Gene Structure Display Server; MeJA, methyl jasmonate; NCBI CDD, National Center for Biotechnology Information Conserved Domains Database; PlantCARE, Plant Cis-Acting Regulatory Element; RGAP, Rice Genome Annotation Project; TAIR, The Arabidopsis Information Resource
1. Introduction
GATA transcription factors are a group of regulatory proteins that exist in a wide range of eukaryotic organisms including fungi, metazoans, and plants. These proteins are known for their specific binding of the consensus sequence WGATAR (W = T or A; R = G or A) (Lowry and Atchley, 2000). In animals, a (T/A)GATA(A/G) sequence to which six transcription factors (GATA1 to GATA6)
can bind was initially identified in the chicken globin promoter (Evans et al., 1988). In plants, the first GATA transcription factor was identified from tobacco and named NTL1 based on its similarity to a
protein in Neurospora crassa (Daniel-Vedele and Caboche, 1993). Type-IV zinc-finger motifs related
to GATA transcription factors were subsequently identified in Arabidopsis thaliana (Teakle and Gilmartin, 1998). Although structures of GATA proteins differ among species, each GATA protein
contains a highly conserved type-IV zinc-finger motif consisting of the amino acid sequence CX2CX17–20CX2C followed by a highly basic region (Reyes et al., 2004). In animals, GATA proteins
contain two typical conserved zinc fingers (CX2CX17CX2C), but only the C-terminal zinc finger is related to DNA binding (Lowry and Atchley, 2000). Previous research has shown that the N-terminal zinc finger (N-finger) can modulate the C-terminal zinc finger to bind DNA with different specificities
(Newton et al., 2001; Patient and McGhee, 2002; Trainor et al., 2000) or mediate interactions between
GATA transcription factors and transcription cofactors composing the Friend of GATA (FOG) family (Tsang et al., 1997). The synergy between the two zinc fingers can improve DNA-binding affinity and
increase the recognition range of the finger domain. Most fungal GATA proteins, such as those in yeast, contain only a single CX2CX17CX2C or CX2CX18CX2C domain (Scazzocchio, 2000). A study of
GATA transcription factors in Arabidopsis thaliana and Oryza sativa indicated that most plant GATA proteins contain only the single zinc-finger domain C-X2-C-X18-C-X2-C, with a few containing two zinc-finger domains or the C-X2-C-X20-C-X2-C zinc-finger domain (Reyes et al., 2004). That study identified 29 GATA family members in A. thaliana and 30 in O. sativa divided into four and six subfamilies, respectively, according to their evolutionary relationships, domain structures, and exon–intron structures. Among the subfamilies in A. thaliana, subfamily I contained 14 members with two exons, subfamily II contained 10 members with two to three exons, subfamily III contained three members with seven exons, and subfamily IV members had no characteristic gene structure. In addition to the GATA domains present in all GATA proteins, acid domains and CCT domains were identified in
subfamilies I and III, respectively. Similar divisions were also seen in O. sativa. This study of the GATA gene family in A. thaliana and O. sativa provided a foundation for further identification, evolutionary analysis, and functional analysis of GATA transcription factors in other species.
Functional analysis has suggested that GATA transcription factors play significant roles in the regulation of plant growth and development. GATA elements have been found in many regulatory
regions of light-responsive genes. Electrophoretic mobility shift assays and DNase I footprinting experiments have subsequently demonstrated that GATA transcription factors can bind to these
elements, thereby implicating GATA transcription factors in light-mediated processes (Borello et al., 1993; Lam and Chua, 1989; Schindler and Cashmore, 1990; Terzaghi and Cashmore, 1995). A study
of A. thaliana revealed that the expression of many GATA genes is responsive to light culture, dark culture, and changes in circadian rhythms (Manfield et al., 2007). GATA2 (AT2G45050) can regulate
the expression of light-responsive genes and plays an important role in photomorphogenesis (Luo et al., 2010). Besides light response, GATA transcription factors are also involved in floral development.
An RNA gel blot experiment indicated that ZIM (At4g24470), a GATA gene, is highly expressed in shoot apices of immature flowers and in flowers in the reproductive phase (Nishii et al., 2000).
HANABA TARANU, a GATA transcription factor, is believed to be involved in the establishment of
boundaries between the meristem and its newly initiated organ primordia. This transcription factor affects the expression of WUSCHEL, a gene functioning near the boundary of the central zone and rib meristem in shoot and floral meristems, that can promote meristem activities (Zhao et al., 2004; Mayer
et al., 1998). Extensive research suggests that GATA transcription factors also play a role in the regulation of carbon and nitrogen metabolism, chlorophyll levels, chloroplast size, and photosynthetic
efficiency (An et al., 2014; Bi et al., 2005; Chiang et al., 2012; Hudson et al., 2011).
Apple (Malus domestica) is known as the king of temperate fruits, and its importance in terms of human production and livelihood is self-evident. Some apple cultivars, however, especially ‘Fuji’, which accounts for 65% of apple cultivation in China, do not readily flower (Xing et al., 2016). Because this difficulty in flowering severely restricts apple production and economic benefits, the study of the molecular mechanisms of apple flowering is essential to increase apple yields. Despite efforts made to investigate the GATA gene family in model plants such as A. thaliana and O. sativa, this family is relatively uncharacterized in apple. In this study, we therefore performed a genome-wide search for GATA genes in the apple genome and analyzed their chromosomal distributions, evolutionary
mechanisms, gene structures, functional domains, and cis-elements. We also analyzed their expression patterns in different tissues, in response to light, and during floral differentiation by quantitative real-time reverse transcription PCR (qRT-PCR) and RNA sequencing (RNA-seq). Our findings should not only contribute to future investigations of MdGATA gene regulation of light response and flowering in apple, but also provide guidance useful for increasing apple production.
2. Materials and methods
2.1 Identification and chromosomal distribution of GATA genes in M. domestica
To identify all members of the M. domestica GATA gene family, the hidden Markov model profile of the GATA domain (PF00320) was downloaded from the Pfam database (http://pfam.xfam.org/family/PF00320/hmm). This model was used as a query to search for GATA genes in the M. domestica complete genome based on an expected value (E-value) cutoff of 0.01 in HMMER 2.0 (Finn et al., 2011), which resulted in the initial identification of 39 M. domestica GATA genes. To further confirm the 39 predicted genes as GATA family members, the Pfam database (http://pfam.xfam.org/search/sequence) and NCBI CDD (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) were used to examine the integrity of the GATA domain based on an E-value cutoff of 0.01 (Wang et al., 2016). This analysis resulted in the identification of 35 confirmed MdGATA genes from the M. domestica complete genome. To obtain chromosomal location information for MdGATA genes, the DNA sequence of each MdGATA gene was used in BLASTN searches of the M. domestica genome in the Genome Database for Rosaceae (GDR) (http://www.rosaceae.org/tools/ncbi_blast). The 35 MdGATA genes were then named MdGATA1 to MdGATA35 based on their chromosomal locations and were mapped onto chromosomes using the software program MapInspect (http://www.plantbreeding.wur.nl/UK/software_mapinspect.html). The ProtParam tool on the ExPASy server (http://web.expasy.org/protparam/) was used to predict physicochemical characteristics of MdGATA proteins.
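One of the physicochemical characteristics reported later, the GRAVY value, is the summed Kyte-Doolittle hydropathy of all residues divided by the sequence length; a negative value indicates a hydrophilic protein. The following is a minimal Python sketch of that calculation, not the server's implementation. The function names and the short test peptide are illustrative assumptions and do not come from the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: summed hydropathy / sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # A non-standard residue (e.g. X) raises KeyError rather than being guessed.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def is_hydrophilic(sequence: str) -> bool:
    """A protein is called hydrophilic here when its GRAVY value is below zero."""
    return gravy(sequence) < 0


if __name__ == "__main__":
    peptide = "KKDE"  # hypothetical peptide, not an MdGATA sequence
    print(round(gravy(peptide), 2), is_hydrophilic(peptide))
```

Applied to each predicted MdGATA protein sequence, the same function would give the per-protein values that a table of physicochemical properties summarizes.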
2.2 Multiple sequence alignment and phylogenetic analysis
Based on the results of previous studies (Reyes et al., 2004; Ao et al., 2015), A. thaliana, O. sativa, and Ricinus communis GATA proteins were downloaded from The Arabidopsis Information Resource (http://www.arabidopsis.org/index.jsp), the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/cgi-bin/ORF_infopage.cgi), and the Castor Bean Genome Annotation database (http://castorbean.jcvi.org/index.php), respectively. Multiple sequence alignment of GATA proteins was carried out using DNAMAN software (Zhang et al., 2016). To further understand the characteristics of MdGATA proteins, the online tool WebLogo (http://weblogo.berkeley.edu/) was used to examine sequence identities in the multiple sequence alignment. MEGA 7.0.14 (Kumar et al., 2016) was used for phylogenetic analysis of the GATA transcription factor family in A. thaliana, O. sativa, M. domestica, and R. communis. The phylogenetic tree was constructed using the neighbor-joining method with the following parameters: Poisson substitution model, pairwise deletion, and 1,000 bootstrap replicates.
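The Poisson model with pairwise deletion converts the proportion of differing residues p between two aligned sequences into a distance d = -ln(1 - p), ignoring only those alignment columns where one of the two sequences being compared has a gap. The sketch below illustrates that distance in Python under these assumptions; it is not MEGA's code, and the function name and toy sequences are hypothetical.

```python
import math


def poisson_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Poisson-corrected distance between two aligned protein sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Pairwise deletion: drop a column only if one of these two sequences has a gap there.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # every comparable site differs; the correction is undefined
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy aligned fragments, not real GATA sequences.
    print(poisson_distance("AC-E", "ACDF"))
```

A neighbor-joining tree would then be built from the matrix of such pairwise distances, with bootstrap replicates obtained by resampling alignment columns.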
2.3 Detection of homologous gene pairs and synteny analysis
Whole-genome protein sequences from A. thaliana and M. domestica were combined and compared for homology using BLASTP with an E-value cutoff of 1 × 10^−5. The OrthoMCL algorithm with default parameters was applied to detect paralogous and orthologous gene pairs, and MdGATA paralogous gene pairs within the M. domestica genome and GATA orthologous gene pairs between M.
domestica and A. thaliana genomes were extracted (Li et al., 2003). The homologous gene pairs obtained in this fashion were used to further identify syntenic blocks using the MCScan algorithm with default parameters (Wang et al., 2012). Syntenic blocks within the M. domestica genome and between
M. domestica and A. thaliana genomes were downloaded from the Plant Genome Duplication Database
(http://chibba.agtec.uga.edu/duplication/) and used to identify syntenic blocks containing MdGATA genes. Detected orthologous, paralogous, and syntenic relationships were illustrated using Circos
(Krzywinski et al., 2009).
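The first step of this section, keeping only BLASTP hits below the E-value cutoff, can be sketched in Python as follows. The sketch assumes the hits are stored in the 12-column tab-separated BLAST tabular layout (query, subject, ..., E-value in column 11, bit score in column 12), which the text does not specify; the function name and example identifiers are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=1e-5):
    """Yield (query, subject, evalue) for non-self hits that pass the E-value cutoff.

    Assumes 12-column tab-separated BLAST tabular output.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue <= max_evalue:
            yield query, subject, evalue


if __name__ == "__main__":
    # Hypothetical hits; identifiers are placeholders, not real gene models.
    example = [
        "geneA\tgeneB\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
        "geneA\tgeneA\t100.0\t200\t0\t0\t1\t200\t1\t200\t0.0\t410",
        "geneA\tgeneC\t30.0\t60\t42\t0\t1\t60\t1\t60\t0.5\t25",
    ]
    print(list(filter_blast_hits(example)))
```

The retained pairs are the kind of input that a clustering step such as OrthoMCL groups into paralogs and orthologs.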
2.4 Gene structure, conserved motif, and promoter sequence analysis
Information on MdGATA and AtGATA gene structures was downloaded respectively from the M. domestica genome database in Phytozome (http://www.phytozome.net/apple) and the TAIR database (http://www.arabidopsis.org/index.jsp). The MEME (http://meme-suite.org/) server was used to determine conserved motifs in MdGATA proteins using default parameters and a conserved motif number of 30. The conserved motifs were annotated using the Pfam database. The online tool GSDS 2.0 (http://gsds.cbi.pku.edu.cn/) was used to display exon–intron layouts. The 1,500-bp genomic DNA sequence upstream of the start codon of each MdGATA gene was obtained from the apple genome.
Cis-elements in promoters were then identified using the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/).
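Extracting the 1,500-bp region upstream of a start codon has to account for gene orientation: for a gene on the reverse strand, the upstream region lies at higher forward-strand coordinates and must be reverse-complemented. The Python sketch below shows one way to do this under assumed conventions (0-based forward-strand coordinates, '+' or '-' strand labels); the function names and toy sequences are illustrative and are not taken from the study.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, start_pos: int, strand: str, length: int = 1500) -> str:
    """Return up to `length` bp upstream of a start codon, read 5'->3' on the gene strand.

    start_pos is the 0-based forward-strand index of the first base of the start
    codon as read on the gene's own strand (for '-' genes, the highest forward
    coordinate of the codon). The region is truncated at chromosome ends.
    """
    if strand == "+":
        return chrom_seq[max(0, start_pos - length):start_pos]
    if strand == "-":
        return reverse_complement(chrom_seq[start_pos + 1:start_pos + 1 + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # Toy sequences: ATG at index 6 on the forward strand, and its reverse-strand mirror.
    print(upstream_region("AAACCCATGTTT", 6, "+", length=3))
    print(upstream_region("TTTCATGGGAAA", 5, "-", length=3))
```

The returned sequences are the kind of input that would be submitted to a cis-element scanner such as PlantCARE.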
2.5 Analysis of MdGATA microarray expression data
Gene expression data from different tissues of ‘Golden Delicious’ and a range of apple hybrids (M14, M20, M49, M67, M74, and X8877) were downloaded from the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE42873. We then manually extracted the MdGATA gene expression data and visualized the results using MeV 4.9.0 software.
2.6 Plant material, RNA deep sequencing, library construction, and determination of chlorophyll content
Plant material was collected from 6-year-old apple trees of ‘Nagafu No. 2’ (a ‘Fuji’ cultivar) grown on M.26 rootstocks. RNA extracted from buds at three time points—namely, early, middle, and
late stages of flower bud differentiation—was used for cDNA library construction and RNA-seq by the Biomarker Biotechnology Corporation (Beijing, China). The methods used for library construction,
RNA deep sequencing, and data processing are described in detail in Xing et al. (2015). The RNA-seq expression profiles were visualized using MeV 4.9.0 software. Five types of young tissue (stem, leaf, flower, fruit, and bud) of mature ‘Nagafu No. 2’ were used as materials for tissue-specific expression
analysis. To compare light- vs. dark-cultivated material, 21-day-old tissue-cultured ‘Nagafu No. 2’
apple seedlings were grown on Murashige-Skoog medium in a tissue culture room at 25°C under 2,000 lux light intensity. The flasks for controls were subjected to a 16:8 h light:dark photoperiod for 7 days,
while flasks for dark treatments were covered with two layers of aluminum foil. Seedlings under these respective growth conditions were harvested into liquid nitrogen after subjective dawn on the 7th day
(Manfield et al., 2007). Both sets of seedlings were divided into 0.2-g portions, which were cut into small pieces (≤0.2 cm) and soaked in 25 ml of 95% ethanol at 4°C in the dark for 3 days to extract chlorophyll. Total chlorophyll content was measured spectrophotometrically (Fukuda and Terao, 2015).
2.7 RNA extraction, purification, cDNA synthesis, and qRT-PCR
Total RNA was purified by the cetyltrimethylammonium bromide method and treated with RNase-free DNase I (Invitrogen, Shanghai, China) to remove any residual genomic DNA. First-strand cDNA was synthesized from 1 μl of total RNA using a SYBR Prime Script RT-PCR Kit II (Takara, Shanghai). Expression profiles of MdGATA genes in different tissues and during photomorphogenesis and skotomorphogenesis were analyzed by qRT-PCR. The qRT-PCR amplifications were performed with three technical replicates in 20-μl volumes containing 10 μl of 2× SYBR Premix Ex Taq II (Tli RNaseH Plus) (Takara, Beijing, China) on a Bio-Rad CFX Connect Real-Time PCR Detection system. The specific PCR primers used in this study were designed using PrimerQuest (http://www.idtdna.com/Primerquest/). PCR amplification conditions were as follows: 95°C for 5 min, followed by 40 cycles of 94°C for 5 s, 60°C for 15 s, and 72°C for 10 s. Analysis of mRNA relative expression levels was performed using the 2^(−ΔΔCT) method, with the apple actin gene used as an internal control for gene expression normalization.
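In the 2^(−ΔΔCT) method, the CT of the target gene is first normalized to the internal control within each sample (ΔCT), and the treated sample is then compared with the reference sample (ΔΔCT). A minimal Python sketch of that arithmetic follows; the function name and the CT values are hypothetical and are used only to show the calculation, with technical replicates averaged before normalization as one common convention.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change by the 2^(-ddCT) method.

    Each argument is a list of CT values from technical replicates:
    target gene and reference gene (e.g. actin) in the sample of interest,
    then the same two genes in the control (calibrator) sample.
    """
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical CT values: the target amplifies two cycles earlier in the sample.
    fold = relative_expression([24.0, 24.0, 24.0], [18.0, 18.0, 18.0],
                               [26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
    print(fold)
```

A ΔΔCT of −2 corresponds to a four-fold higher relative expression in the sample than in the control, and a ΔΔCT of 0 to no change.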
3. Results
3.1 Genome-wide identification of M. domestica GATA genes
Using HMMER 2.0 software, 35 members of the GATA gene family were identified in M.
domestica (Table 1). To determine their chromosomal distributions, the DNA sequence of each MdGATA gene was searched using BLASTN against the M. domestica genome in the GDR database
(http://www.rosaceae.org/tools/ncbi_blast). A total of 31 MdGATA genes were mapped onto 13 M. domestica chromosomes, representing all chromosomes except for 4, 5, 10, and 14, while the other four
MdGATA genes were located on unanchored scaffolds. Chromosomes 8 and 15 contained the largest
number of MdGATA genes, representing 6% and 17.14% of the total number, respectively. Seven chromosomes, namely, 1, 3, 6, 7, 12, 13, and 16, each contained only one MdGATA gene: MdGATA1, 5, 6, 7, 19, 20, and 27, respectively (Fig. 1). On the basis of their chromosomal locations, the 35 GATA
genes were named MdGATA1–MdGATA35. Information on these 35 genes, including gene accession numbers, genomic locations, amino acid numbers, molecular weights, instability indexes, aliphatic
indexes, grand average of hydropathicity (GRAVY) values, and coding sequence (CDS) lengths, is given in Table 1. MdGATA gene CDSs were between 273 and 3,486 bp long, with the molecular weights of predicted MdGATA proteins accordingly ranging from approximately 10 to 130 kDa. The instability index, a measure of protein stability (Guruprasad et al., 1990), was greater than 40 for each protein except for MdGATA25, MdGATA29, and MdGATA30. Theoretical isoelectric points of the 35 MdGATA proteins ranged from 5.23 to 10.62, with the majority above 7. Most MdGATA proteins within the same group had similar theoretical isoelectric points. All theoretical isoelectric points of MdGATA proteins in group B were higher than 8, while those of group C proteins were lower than 8. GRAVY values of all MdGATA proteins were less than zero.
3.2 Phylogenetic analysis and sequence alignment
To better analyze the evolutionary relationships of MdGATA genes, an unrooted phylogenetic tree was generated using the 35 MdGATA proteins and 19 R. communis, 29 O. sativa, and 30 A. thaliana GATA proteins (Fig. 2; Table S1). According to the results of a cluster analysis combined with previous GATA gene studies in model plants, we divided the MdGATA genes into four groups (A, B, C, and D).
Examination of the tree revealed that group A contained 20 GATA members, accounting for more than half of the total number of MdGATA genes (57.1%). With only three members, namely, MdGATA6,
MdGATA27, and MdGATA35, group D contained the lowest number of MdGATA genes (8.6%). To
further analyze the sequence features of the 35 MdGATA proteins, their amino acid sequences were aligned. This multiple alignment revealed that most MdGATA proteins contained the intact conserved domain C-X2-C-X17-20-C-X2-C, whereas others, including MdGATA29, MdGATA30, and MdGATA35, had lost some of these amino acids (Fig. 3). Amino acid sequence
characteristics of MdGATA proteins in each group were also generally consistent with previously studied GATA proteins in A. thaliana (Reyes et al., 2004). For example, M. domestica GATA proteins of group A were characterized by the presence of conserved Gln and Thr amino acids in positions 15
and 25, respectively, of the zinc-finger loop. In addition, a series of conserved sequences was present in
the α-helix and the unstructured amino-terminal region of all group A members. Group-B MdGATA proteins were characterized by the presence of a Ser residue in the 25th position of the zinc-finger loop
and an Ile residue in the 32nd position. Similar to group-C GATA proteins of other species, all MdGATA group C members had an insertion of two amino acids (Reyes et al., 2004). The MdGATA
proteins of group D were characterized by the presence of a Val residue in the first position before the zinc-finger loop, with an 18-residue loop containing almost no conserved amino acid sites except for a His residue in the fifth position. The GATA motifs and conserved amino acid sites in MdGATA proteins may contribute to the various functions of these GATA proteins.
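The conserved zinc-finger pattern C-X2-C-X17-20-C-X2-C can be written as a regular expression and used to check whether a protein sequence carries an intact finger and how long its loop is. The following Python sketch does this with the standard `re` module; the function names and the synthetic test sequence are assumptions made for illustration, and a pattern match is a much cruder test than the domain searches used in the study.

```python
import re

# C, two residues, C, a loop of 17-20 residues, C, two residues, C.
GATA_FINGER = re.compile(r"C.{2}C.{17,20}C.{2}C")


def find_gata_fingers(protein: str):
    """Return (start, matched_text) for each non-overlapping finger-like match."""
    return [(m.start(), m.group()) for m in GATA_FINGER.finditer(protein.upper())]


def loop_length(finger: str) -> int:
    """Loop length X_n: total match length minus the 4 cysteines and 2 x 2 spacer residues."""
    return len(finger) - 8


if __name__ == "__main__":
    # Synthetic sequence with an 18-residue loop; not a real MdGATA protein.
    synthetic = "MK" + "CAAC" + "G" * 18 + "CAAC" + "RK"
    for start, finger in find_gata_fingers(synthetic):
        print(start, loop_length(finger))
```

Because `.` also matches cysteine, a match only indicates a candidate finger; proteins that have lost part of the domain simply return no match.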
3.3 Homologous gene pairs and synteny analysis
To analyze MdGATA gene duplication events, we identified 10 pairs of putative paralogous MdGATA genes within the MdGATA gene family (Fig. 4a). Tandem and segmental duplications are reported to be the two main mechanisms underlying gene family expansion (Cannon et al., 2004). Tandem duplication is thought to have occurred when two closely related genes are located within the
same chromosomal region and separated by fewer than 20 genes (Xu et al., 2009). Segmental duplications can be divided into two classes: interchromosomal and intrachromosomal. Members of the interchromosomal class are duplicated on non-homologous chromosomes and many localize to pericentromeric and subtelomeric chromosomal regions. Intrachromosomal duplications, referred to as region- or chromosome-specific low-copy repeats, are typically found on a single chromosome or in a
single chromosomal band (Emanuel and Shaikh, 2001). A whole genome duplication has also occurred in M. domestica, as shown by the distribution of gene pairs across different but homologous
chromosomes, including pairings on chromosomes 1-7, 2-15, 8-15, 3-11, 4-12, 5-10, 6-14, 9-17, and 13-16 (Velasco et al., 2010). Our results indicate that MdGATA2/MdGATA11, MdGATA5/MdGATA17,
MdGATA12/MdGATA15, and MdGATA21/MdGATA23 have undergone segmental duplications and MdGATA2/MdGATA21, MdGATA2/MdGATA23, MdGATA11/MdGATA21, MdGATA11/MdGATA23,
MdGATA12/MdGATA22, and MdGATA13/MdGATA26 have been formed by whole genome duplication events. Orthologous gene pairs can provide effective information about evolutionary
relationships between species (Wei et al., 2015). We therefore investigated orthologous GATA gene pairs between M. domestica and A. thaliana, which resulted in the identification of eighteen gene pairs
across the two species (Fig. 4a; Table S2). Using these homologous genes, we then carried out a synteny analysis, which revealed that MdGATA2/MdGATA23, MdGATA11/MdGATA21, MdGATA12/MdGATA22, MdGATA13/MdGATA26, MdGATA2/AtGATA22, MdGATA13/AtGATA25, AtGATA22/MdGATA23, and AtGATA25/MdGATA26 were located in syntenic blocks (Fig. 4b; Table S3).
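The duplication categories described in this section can be expressed as a simple decision rule: same chromosome and fewer than 20 intervening genes suggests tandem duplication, a pair on homologous chromosomes suggests whole genome duplication, and the remaining pairs are treated as segmental. The Python sketch below encodes that rule using the homologous chromosome pairings listed above. The order in which the rules are applied, the function name, and the gene-order indices in the example are assumptions for illustration; the study itself relied on OrthoMCL and MCScan.

```python
# Homologous chromosome pairings listed in the text (Velasco et al., 2010).
HOMOLOGOUS_CHROMOSOMES = {
    frozenset(pair)
    for pair in [(1, 7), (2, 15), (8, 15), (3, 11), (4, 12),
                 (5, 10), (6, 14), (9, 17), (13, 16)]
}


def classify_duplication(chrom_a, index_a, chrom_b, index_b, max_intervening=20):
    """Classify a paralogous pair from chromosome numbers and gene-order indices.

    index_a and index_b are the genes' ordinal positions along their chromosomes,
    so abs(index_a - index_b) - 1 is the number of genes lying between them.
    """
    if chrom_a == chrom_b:
        intervening = abs(index_a - index_b) - 1
        if intervening < max_intervening:
            return "tandem"
        return "segmental (intrachromosomal)"
    if frozenset((chrom_a, chrom_b)) in HOMOLOGOUS_CHROMOSOMES:
        return "whole genome duplication"
    return "segmental (interchromosomal)"


if __name__ == "__main__":
    # Hypothetical gene-order indices; only the chromosome numbers matter here.
    print(classify_duplication(2, 100, 15, 40))
    print(classify_duplication(2, 100, 8, 40))
```

Under this rule a pair on chromosomes 2 and 15 is labeled as arising from whole genome duplication, whereas a pair on chromosomes 2 and 8 is labeled as interchromosomal segmental duplication.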
3.4 Gene structure and conserved motif analysis
To better analyze the structure of GATA genes and the conserved motifs of GATA proteins, we constructed a phylogenetic tree of AtGATA and MdGATA proteins. Exon–intron structures of 65 GATA genes were determined based on their full-length CDSs and corresponding genomic DNA sequences, thereby revealing that GATA genes have between 1 (MdGATA32 and MdGATA35) and 20 (MdGATA19) exons. Group A had the lowest average number of CDSs per gene, 2.4, while group C had the highest, 8.6. Furthermore, GATA genes within the same group had analogous gene structures. For example, each GATA gene in group C was composed of more than five CDSs, and the average length of their introns, 3,686 bp (data not shown), was greater than that of any other group. We identified 30 motifs (designated as motifs 1 to 30) in the 65 GATA proteins, with most GATA proteins
in the same group containing similar motifs (Fig. 5). For example, GATA proteins in group D had an average of nine conserved motifs, including motif 1, which was assigned to the GATA zinc finger according to annotations in the Pfam database, and unique motifs 10, 20, 21, 23, and 27 (Table S4). In addition to motif 1, all GATA proteins in group C contained conserved motifs 4 and 8 representing CCT and TIFY domains, respectively. Apart from MdGATA35, all GATA proteins in group D contained
motif 11, an ASXH domain. Taken together, these results reveal that different clades differ significantly from each other. The order of exons encoding GATA motifs was also similar within each group. For
example, most GATA domains of group-B and group-C GATA proteins were encoded by the last two
and fifth exons, respectively.
3.5 Analysis of the promoter sequences of MdGATA genes
To further explore the function and regulatory patterns of MdGATA genes, a 1,500-bp region of the genomic DNA sequence of each gene was scanned for putative cis-regulatory elements using the
PlantCARE database. This search identified 11 main types of cis-elements (Fig. 6a and b). More than 20 types of light-responsive cis-elements, such as ACE, L-box, and Sp1, were observed across the 35 MdGATA genes. Most notably, light-responsive cis-elements were found to constitute the bulk (up to
63%) of presumptive cis-elements (Fig. 6b). In addition, various cis-elements involved in hormone
response (e.g., MeJA, salicylic acid, gibberellins, auxin, and ethylene), stress response (e.g., drought, low temperature, and heat), meristem expression, and circadian control were also identified in promoter
sequences of the MdGATA genes.
3.6 Chlorophyll content and expression profile analysis of MdGATA genes
To elucidate the expression patterns of MdGATA genes during apple growth and development, the expression patterns of individual MdGATA family members were analyzed in various tissues, including roots, stems, leaves, flowers, seeds, and seedlings, using microarray expression data. A heat map was generated to show the expression profiles (Fig. S1). MdGATA genes exhibited strong preferential expression in leaves, flowers, and fruits. On the basis of these results, 10 family members that were highly expressed in seedlings and leaves were subjected to qRT-PCR analysis (MdGATA2, MdGATA7, MdGATA13, MdGATA18, MdGATA19, MdGATA23, MdGATA26, MdGATA29, MdGATA32, and MdGATA34; Table S5). The qRT-PCR analysis uncovered high expression of MdGATA7, MdGATA13, MdGATA18, MdGATA19, and MdGATA26 in buds; high expression of
MdGATA19, MdGATA26, and MdGATA29 in flowers; and high expression of MdGATA2, MdGATA13, MdGATA23, MdGATA26, MdGATA29, and MdGATA32 in leaves (Fig. 7). After 7 days in darkness, seedlings appeared to be etiolated (Fig. 8a). To further investigate the effect of shading on the chlorophyll content of apple, we determined the total chlorophyll content of light-grown and dark-grown seedlings. Our results revealed that the chlorophyll content of apple seedlings experienced
a moderate decrease after dark treatment (Fig. 8b). Notably, all MdGATA genes highly expressed in leaves according to qRT-PCR were expressed at higher levels in light-grown seedlings than in
dark-grown seedlings (Fig. 8c); for MdGATA23, in particular, this expression difference was greater than 2-fold. Other genes, including MdGATA7, MdGATA18, MdGATA32, and MdGATA34, showed
higher expression in dark-grown seedlings. To explore the possible roles of MdGATA genes in apple flowering, expression profiles during three stages of flower bud physiological differentiation were also
analyzed by RNA-seq. As shown in Fig. S2, the expression of 16 MdGATA genes was detected during the three stages in this study. Expression of MdGATA1, MdGATA3, MdGATA5, MdGATA6,
MdGATA13, MdGATA17, MdGATA20, and MdGATA26 remained at high levels during all three stages of flower bud physiological differentiation, while MdGATA12, MdGATA22, and
MdGATA24 were only significantly up-regulated during early and middle stages.
4. Discussion
GATA transcription factors play significant roles in various plant growth and development
processes. In conjunction with the rapid development of modern bioinformatics, genome-wide analysis of the GATA gene family has been performed in model plants such as A. thaliana and O. sativa. In this
study, 35 MdGATA genes were identified and classified into four subfamilies designated as groups A to D. Consistent with A. thaliana and O. sativa, group A contained the most MdGATA genes (Reyes et al., 2004). All the GATA homologous gene pairs identified in this study were tightly grouped together in the phylogenetic trees (Figs. 2, 4, and 5), indicating that the tree topologies were to some extent consistent with the synteny analysis. Protein stability may be a useful indicator to estimate the suitability of a protein for medical and industrial production. Our analysis indicated that the instability indexes of most MdGATA proteins are above 40, suggesting their possible instability (Guruprasad et al., 1990). GRAVY values of all identified MdGATA proteins were less than zero, indicating that they are
hydrophilic. These results are consistent with those of a previous study of GATA genes in castor bean (Ao et al., 2015). Taken together, these findings suggest that the physicochemical properties of GATA proteins are widely conserved across different species.
Gene duplication events are thought to be an important mechanism in the evolution of plant genomes (Vision et al., 2000; Cannon et al., 2004; Zhou et al., 2004; Yang et al., 2008). Orthologous
relationship analysis of M. domestica and A. thaliana indicated that many AtGATA genes have two or more counterparts in M. domestica, which suggests that the expansion of the MdGATA gene family in
M. domestica may have resulted from genome duplication events. Further investigation revealed the presence of four and six gene pairs inferred to have arisen from segmental duplication
and whole genome duplication events, respectively. This finding suggests that whole genome duplication is the main mechanism by which the MdGATA gene family has expanded in apple. The
identification of orthologous GATA gene pairs between apple and A. thaliana, a model plant in which many GATA genes have been functionally characterized, has provided reference information about the
evolutionary relationships and expression patterns of GATA genes in M. domestica. Patterns of synteny can provide insight into the evolutionary history of a genome. However, some homologous GATA
genes may not have been mappable to any syntenic blocks because of chromosomal rearrangements,
fusions, and selective gene loss, thus obscuring the identification of chromosomal syntenies (Zhang et al., 2012). We detected an interesting case where apple duplications corresponded to Arabidopsis duplications, namely MdGATA13/MdGATA26 and AtGATA24/AtGATA25/AtGATA28. AtGATA24 (ZML1), AtGATA25 (ZIM), and AtGATA28 (ZML2) were reported to be homologous genes in previous research (Shikata et al., 2004), which supports the reliability of our results.
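The observation that several AtGATA genes have two or more apple counterparts can be illustrated with a minimal sketch. It uses only the orthologous relationships named in this Discussion; expanding the AtGATA24/AtGATA25/AtGATA28 group into individual pairs is an assumption made for illustration, and the function names are hypothetical. The full set of pairs is given in Table S2.

```python
from collections import defaultdict

# (Arabidopsis gene, apple gene) pairs named in the Discussion.
# Assumption: each of AtGATA24/25/28 is paired with both MdGATA13 and MdGATA26.
ORTHOLOG_PAIRS = [
    ("AtGATA18", "MdGATA24"),
    ("AtGATA24", "MdGATA13"),
    ("AtGATA24", "MdGATA26"),
    ("AtGATA25", "MdGATA13"),
    ("AtGATA25", "MdGATA26"),
    ("AtGATA28", "MdGATA13"),
    ("AtGATA28", "MdGATA26"),
]


def apple_counterparts(pairs):
    """Group apple genes by their Arabidopsis ortholog."""
    groups = defaultdict(set)
    for at_gene, md_gene in pairs:
        groups[at_gene].add(md_gene)
    return {at_gene: sorted(md_genes) for at_gene, md_genes in groups.items()}


def expanded_in_apple(pairs, min_copies=2):
    """Return Arabidopsis genes that have at least min_copies apple counterparts."""
    return {
        at_gene: md_genes
        for at_gene, md_genes in apple_counterparts(pairs).items()
        if len(md_genes) >= min_copies
    }


if __name__ == "__main__":
    for at_gene, md_genes in sorted(expanded_in_apple(ORTHOLOG_PAIRS).items()):
        print(at_gene, "->", ", ".join(md_genes))
```

Genes returned by `expanded_in_apple` are candidates for duplication-driven expansion; whether a given pair arose by segmental or whole genome duplication still has to come from the synteny analysis.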
In addition to the GATA motif found in all GATA proteins, various MdGATA groups contained other conserved domains, including two new domains in GATA proteins, namely, ASXH and TIFY, and the CCT domain identified in A. thaliana and rice in previous research (Reyes et al., 2004). The CCT domain was first identified in the CONSTANS (CO) protein related to the circadian clock and flowering control in A. thaliana (Suárez-López et al., 2001). The fully conserved TIFY domain has previously been found to characterize a large family of transcription factors (Vanholme et al., 2007). Ongoing studies have suggested that the TIFY domain mediates homomeric and heteromeric interactions between TIFY proteins and other specific proteins (Melotto et al., 2008; Chini et al., 2009). The TIFY domain has also been recently found to be widespread in the JASMONATE ZIM-domain
protein family and PEAPOD proteins and to be related to the jasmonic acid pathway (Bai et al., 2011). Current research on the ASXH domain has focused mostly on animals, where it is considered to regulate the combination of polycomb-group proteins (Aravind and Iyer, 2012). The presence of these different highly conserved domains may therefore be related to various MdGATA protein functions.
Exon gain/loss has occurred extensively within many gene families during their evolution. In A. thaliana, most GATA genes in group A contain only two exons. In MdGATA group A, however, MdGATA32 contains one exon, while MdGATA15, MdGATA29, and MdGATA31, among others, possess more than two exons. Such differences also occur in other groups. Taken together, these results suggest that GATA genes have undergone moderate divergence in structure, and possibly function, over the course of evolution.
Leaf tissue is significantly involved in the light-signaling pathway and photosynthesis during plant
growth and development. Being very sensitive to environmental changes, tissue-cultured seedlings of apple are an ideal material for light regulation research (Li et al., 2012). To further analyze
light-regulated expression in seedlings, we therefore selected MdGATA genes that were strongly expressed in leaves and seedlings according to microarray data. Tissue-specific gene expression
analysis based on qRT-PCR revealed the highest expressions of MdGATA genes in leaves, flowers, and
buds, thus implicating MdGATA genes in the biological processes occurring in these tissues. For a few genes, however, the tissue-specific expression profiles obtained through qRT-PCR of mature tissues were not fully consistent with the RNA-seq data; for example, MdGATA7 showed lower expression in
leaves according to qRT-PCR, possibly because of the different plant materials used. The identification of many light-responsive cis-elements in the promoters of MdGATA genes implies that their functions
may be related to light regulation of development. In A. thaliana, GNC (AtGATA21) and CGA1 (AtGATA22) are both considered to be widely involved in the regulation of chlorophyll levels, chloroplast size, photosynthetic efficiency, and carbon and nitrogen metabolism (Bi et al., 2005; Hudson et al., 2011; Chiang et al., 2012). GNC homologs in rice and poplar, Os02g12790 (OsGATA11) and PdGNC, respectively, also play essential roles in regulating chlorophyll levels and carbon and nitrogen metabolism (Hudson et al., 2013; An et al., 2014). 29794.m003323, a homolog of the CGA1 gene in R. communis, has also been suggested to function in physiological processes of light regulation (Ao et al., 2015). The CGA1 ortholog MdGATA23, which is closely related to CGA1, GNC, 29794.m003323, and even Os02g12790 according to our phylogenetic analyses, exhibited the greatest difference in expression levels between photomorphogenesis and skotomorphogenesis of any analyzed MdGATA gene. Compared with the other MdGATA genes subjected to qRT-PCR expression analysis, MdGATA23 also contained the largest number of light-responsive cis-elements (i.e., 12), suggesting potential functions of MdGATA23 in light response and chlorophyll-level regulation. The highly consistent expression patterns of MdGATA
genes between mature leaves and light-grown seedlings according to qRT-PCR performed in this study also suggest that MdGATA genes are related to light-mediated regulation.
Environmental conditions are a fundamental factor influencing flowering in apple. Light affects apple flowering not only by photoperiodic induction (Suárez-López et al., 2016), but also via photosynthesis in the floral induction period, as shown in our previous research (Fan et al., 2016). It remains unclear, however, whether MdGATA genes can function through the photoperiod pathway similar to other light-responsive genes such as CO (Onouchi et al., 2000) that control flower formation by responding to light changes and subsequently activating flowering-related genes via various signal transduction pathways. Nevertheless, a moderate decline in chlorophyll content accompanied the down-regulated expression of most MdGATA genes in our study, suggesting that MdGATA genes
might be able to regulate chlorophyll levels and photosynthetic efficiency to indirectly regulate apple
flowering. Understanding the details of this process, however, will require further research. Floral induction is a decisive period for flowering in apple. In a previous study, ZIM (AtGATA25) and its two homologous genes AtGATA24 and AtGATA28 were all found to be highly expressed in shoot apices of
the vegetative phase and inflorescences of the reproductive phase in A. thaliana (Shikata et al., 2004). Similarly, AtGATA18 (HANABA TARANU) has also been reported to affect shoot apical meristem
development and function at the boundaries between the meristem and its newly initiated organ primordia and at the boundaries between different floral whorls (Zhao et al., 2004). According to our orthologous relationship analysis, the ortholog of AtGATA18 (HANABA TARANU) in apple is MdGATA24, and the orthologs of AtGATA24, AtGATA25, and AtGATA28 in apple are MdGATA13 and MdGATA26 (Table S2). Interestingly, RNA-seq during floral development indicated that MdGATA13 and MdGATA26 had higher expression during the three stages of flower bud physiological differentiation, while MdGATA24 was also highly expressed during early and middle stages. These results suggest that MdGATA13, MdGATA24, and MdGATA26 may play a role in floral induction in apple. The fact that the expression of the remaining seven MdGATA genes (MdGATA1, MdGATA3, MdGATA5, MdGATA6, MdGATA17, and MdGATA20) remained elevated throughout the bud differentiation period also implies that they potentially have various functions in apple floral development. Despite these insights, the way in which GATA genes function in shoot apices to regulate flowering, such as by hormone induction or other types of signal transduction, is unknown. Application of various measures to improve flowering in apple has always been a topic of
importance. Here, we have directly and indirectly provided two possible perspectives to understand the functions of MdGATA genes in flowering in apple, with the ultimate goal of offering some useful
information for improving apple yield.
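The light versus dark comparisons discussed above rest on relative expression values from qRT-PCR. As a minimal sketch of how such a fold change can be computed, the example below uses the widely used 2^-ΔΔCt calculation. This section does not state which quantification method was applied, so the method is an assumption here, and the Ct values and the function name are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by the 2^-ddCt calculation.

    Each Ct is a threshold cycle; the reference gene normalizes for input amount.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: dark-grown seedlings as "treated",
    # light-grown seedlings as "control".
    fold = relative_expression(
        ct_target_treated=27.0, ct_ref_treated=18.0,
        ct_target_control=24.0, ct_ref_control=18.0,
    )
    print(f"dark/light fold change: {fold:.3f}")
```

A value below 1 corresponds to down-regulation in darkness, which is the direction reported here for most of the analyzed MdGATA genes.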
Overall, the whole-genome, evolutionary, and expression analyses of the GATA gene family in M.
domestica carried out in our study resulted in the identification and characterization of 35 MdGATA genes. Based on their phylogenetic relationships, the 35 MdGATA genes were divided into four groups.
MdGATA genes within the same group had similar conserved motifs, amino acid sites, and exon–intron organizations. Through our analysis of gene duplication events, we believe that whole genome
duplication may be the primary mechanism underlying expansion of the MdGATA gene family. Our analysis of promoter sequences revealed that many light-response-related cis-acting elements are
prevalent in the promoter regions of MdGATA genes. Tissue-specific expression analysis suggested that
MdGATA genes are expressed mainly in leaves, flowers, and buds. The RNA-seq results showed that 11 MdGATA genes were highly expressed at different stages of flower bud physiological differentiation, suggesting that MdGATA genes may play a role in floral induction in apple. Moreover, most MdGATA
genes, especially MdGATA23, were significantly down-regulated after 7 days of dark culture, suggesting a potential function in light-regulated transcription. Here, we have provided basic
information about MdGATA genes that can be usefully applied to different methods, such as genetic engineering and light control, to increase apple flowering and yield.
Acknowledgements
This work was supported by the National Science and Technology Supporting Project (2013BAD20B03) and the National Apple Industry Technology System of the Agricultural Ministry of China (CARS-28).
Conflict of interest
The authors declare that they have no conflict of interest.
References
An, Y., Han, X., Tang, S., Xia, X., Yin, W., 2014. Poplar GATA transcription factor PdGNC is capable of regulating chloroplast ultrastructure, photosynthesis, and vegetative growth in Arabidopsis under varying nitrogen levels. Plant Cell Tiss. Org. 119, 313-327.
Ao, T., Liao, X.J., Xu, W., Liu, A.Z., 2015. Identification and Characterization of GATA Gene Family in Castor Bean (Ricinus communis). Plant Diver. Resour. 37, 453-462.
Aravind, L., Iyer, L.M., 2012. The HARE-HTH and associated domains: novel modules in the coordination of epigenetic DNA and protein modifications. Cell Cycle. 11, 119-131.
Bai, Y., Meng, Y., Huang, D., Chen, M., Q, Y., 2011. Origin and evolutionary analysis of the plant-specific TIFY transcription factor family. Genomics. 98, 128-136.
Bi, Y.M., Zhang, Y., Signorelli, T., Zhao, R., Zhu, T., Rothstein, S., 2005. Genetic analysis of Arabidopsis GATA transcription factor gene family reveals a nitrate-inducible member important for chlorophyll synthesis and glucose sensitivity. Plant J. 44, 680-692.
Borello, U., Ceccarelli, E., Giuliano, G., 1993. Constitutive, light-responsive and circadian clock-responsive factors compete for the different I box elements in plant light-regulated promoters. Plant J. 4, 611-619.
Cannon, S.B., Mitra, A., Baumgarten, A., Young, N.D., May, G., 2004. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4, 10.
Chiang, Y.H., Zubo, Y.O., Tapken, W., Kim, H.J., Lavanway, A.M., Howard, L., Pilon, M., Kieber, J.J., Schaller, G.E., 2012. Functional Characterization of the GATA Transcription Factors GNC and CGA1 Reveals Their Key Role in Chloroplast Development, Growth, and Division in Arabidopsis. Plant Physiol. 160, 332-348.
Chini, A., Fonseca, S., Chico, J.M., Fernández-Calvo, P., Solano, R., 2009. The ZIM domain mediates homo- and heteromeric interactions between Arabidopsis JAZ proteins. Plant J. 59, 77-87.
Chehadeh, W., Albaksami, O., Altawalah, H., Ahmad, S., Madi, N., John, S.E., Abraham, P.S., Al-Nakib, W., 2015. Phylogenetic analysis of HIV-1 subtypes and drug resistance profile among treatment-naïve people in Kuwait. J. Med. Virol. 87, 1521-1526.
Daniel-Vedele, F., Caboche, M., 1993. A tobacco cDNA clone encoding a GATA-1 zinc finger protein homologous to regulators of nitrogen metabolism in fungi. Mol. Genet. Genomics. 240, 365-373.
Emanuel, B.S., Shaikh, T.H., 2001. Segmental duplications: an 'expanding' role in genomic instability and disease. Nat. Rev. Genet. 2, 791-800.
Evans, T., Reitman, M., Felsenfeld, G., 1988. An erythrocyte-specific DNA binding factor recognizes a regulatory sequence common to all chicken globin genes. Proc. Natl. Acad. Sci. USA. 85, 5976-5980.
Fan, S., Zhang, D., Lei, C., Chen, H.F., Xing, L.B., Ma, J.J., Zhao, C.P., Han, M.Y., 2016. Proteome Analyses Using iTRAQ Labeling Reveal Critical Mechanisms in Alternate Bearing Malus prunifolia. J. Proteome Res. 15, 3602-3616.
Finn, R.D., Clements, J., Eddy, S.R., 2011. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29-W37.
Fukuda, A., Terao, T., 2015. QTLs for Shoot Length and Chlorophyll Content of Rice Seedlings Grown under Low-Temperature Conditions, using a Cross between Indica and Japonica Cultivars. Plant Prod. Sci. 18, 128-136.
Guruprasad, K., Reddy, B.B., Pandit, M.W., 1990. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 4, 155-161.
Higgins, D.G., Sharp, P.M., 1988. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene. 73, 237-244.
Hudson, D., Guevara, D.R., Hand, A.J., Xu, Z., Hao, L., Chen, X., Zhu, T., Bi, Y.M., Rothstein, S.J., 2013. Rice cytokinin GATA transcription Factor1 regulates chloroplast development and plant architecture. Plant Physiol. 162, 132-144.
Hudson, D., Guevara, D., Yaish, M.W., Hannam, C., Long, N., Clarke, J.D., Bi, Y.M., Rothstein, S.J., 2011. GNC and CGA1 modulate chlorophyll biosynthesis and glutamate synthase (GLU1/Fd-GOGAT) expression in Arabidopsis. PLoS One. 6, e26765.
Komeda, Y., 2004. Genetic regulation of time to flower in Arabidopsis thaliana. Annu. Rev. Plant Biol. 55, 521-535.
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J., Marra, M.A., 2009. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639-1645.
Kumar, S., Stecher, G., Tamura, K., 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870-1874.
Lam, E., Chua, N.H., 1989. ASF-2: a factor that binds to the cauliflower mosaic virus 35S promoter and a conserved GATA motif in Cab promoters. Plant Cell. 1, 1147-1156.
Li, L., Stoeckert, C.J., Roos, D.S., 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178-2189.
Lowry, J.A., Atchley, W.R., 2000. Molecular evolution of the GATA family of transcription factors: conservation within the DNA-binding domain. J. Mol. Evol. 50, 103-115.
Luo, X.M., Lin, W.H., Zhu, S., Zhu, J.Y., Sun, Y., Fan, X.Y., Cheng, M., Hao, Y.Q., Oh, E., Tian, M., Liu, L., Zhang, M., Xie, Q., Chong, K., Wang, Z.Y., 2010. Integration of light- and brassinosteroid-signaling pathways by a GATA transcription factor in Arabidopsis. Dev. Cell. 19, 872-883.
Li, Y.Y., Mao, K., Zhao, C., Zhao, X.Y., Zhang, H.L., Shu, H.R., Hao, Y.J., 2012. MdCOP1 ubiquitin E3 ligases interact with MdMYB1 to regulate light-induced anthocyanin biosynthesis and red fruit coloration in apple. Plant Physiol. 160, 1011-1022.
Liu, A., Yong, W., Dang, C., Zhang, D., Song, H., Yao, Q., Chen, K., 2012. A genome-wide identification and analysis of the basic helix-loop-helix transcription factors in the ponerine ant, Harpegnathos saltator. BMC Evol. Biol. 12, 165.
Manfield, I.W., Devlin, P.F., Jen, C.H., Westhead, D.R., Gilmartin, P.M., 2007. Conservation, convergence, and divergence of light-responsive, circadian-regulated, and tissue-specific expression patterns during evolution of the Arabidopsis GATA gene family. Plant Physiol. 143, 941-958.
Mayer, K.F., Schoof, H., Haecker, A., Lenhard, M., Jurgens, G., Laux, T., 1998. Role of WUSCHEL in regulating stem cell fate in the Arabidopsis shoot meristem. Cell. 95, 805-815.
Melotto, M., Mecey, C., Niu, Y., Chung, H.S., Katsir, L., Yao, J., Zeng, W., Thines, B., Staswick, P., Browse, J., Howe, G.A., 2008. A critical role of two positively charged amino acids in the Jas motif of Arabidopsis JAZ proteins in mediating coronatine- and jasmonoyl isoleucine-dependent interactions with the COI1 F-box protein. Plant J. 55, 979-988.
Onouchi, H., Igeño, M.I., Périlleux, C., Graves, K., Coupland, G., 2000. Mutagenesis of plants overexpressing CONSTANS demonstrates novel interactions among Arabidopsis flowering-time genes. Plant Cell. 12, 885-900.
Newton, A., Mackay, J., Crossley, M., 2001. The N-terminal zinc finger of the erythroid transcription factor GATA-1 binds GATC motifs in DNA. J. Biol. Chem. 276, 35794-35801.
Nishii, A., Takemura, M., Fujita, H., Shikata, M., Yokota, A., Kohchi, T., 2000. Characterization of a novel gene encoding a putative single zinc-finger protein, ZIM, expressed during the reproductive phase in Arabidopsis thaliana. Biosci. Biotechnol. Biochem. 64, 1402-1409.
Patient, R.K., McGhee, J.D., 2002. The GATA family (vertebrates and invertebrates). Curr. Opin. Genet. Dev. 12, 416-422.
Reyes, J.C., Muro-Pastor, M.I., Florencio, F.J., 2004. The GATA family of transcription factors in Arabidopsis and rice. Plant Physiol. 134, 1718-1732.
Schindler, U., Cashmore, A.R., 1990. Photoregulated gene expression may involve ubiquitous DNA binding proteins. EMBO J. 9, 3415-3427.
Scazzocchio, C., 2000. The fungal GATA factors. Curr. Opin. Microbiol. 3, 126-131.
Suárez-López, P., Wheatley, K., Robson, F., Onouchi, H., Valverde, F., Coupland, G., 2001. CONSTANS mediates between the circadian clock and the control of flowering in Arabidopsis. Nature. 410, 1116-1120.
Shikata, M., Matsuda, Y., Ando, K., Nishii, A., Takemura, M., Yokota, A., Kohchi, T., 2004. Characterization of Arabidopsis ZIM, a member of a novel plant-specific GATA factor gene family. J. Exp. Bot. 55, 631-639.
Teakle, G.R., Gilmartin, P.M., 1998. Two forms of type IV zinc-finger motif and their kingdom-specific distribution between the flora, fauna and fungi. Trends Biochem. Sci. 23, 100-102.
Terzaghi, W.B., Cashmore, A.R., 1995. Light-regulated transcription. Annu. Rev. Plant Physiol. 46, 445-474.
Trainor, C.D., Ghirlando, R., Simpson, M.A., 2000. GATA zinc finger interactions modulate DNA binding and transactivation. J. Biol. Chem. 275, 28157-28166.
Tsang, A.P., Visvader, J.E., Turner, C.A., Fujiwara, Y., Yu, C., Weiss, M.J., Crossley, M., Orkin, S.H., 1997. FOG, a multitype zinc finger protein, acts as a cofactor for transcription factor GATA-1 in erythroid and megakaryocytic differentiation. Cell. 90, 109-119.
Vanholme, B., Grunewald, W., Bateman, A., Kohchi, T., Gheysen, G., 2007. The tify family previously known as ZIM. Trends Plant Sci. 12, 239-244.
Velasco, R., Zharkikh, A., Affourtit, J., Dhingra, A., Cestaro, A., Kalyanaraman, A., Fontana, P., Bhatnagar, S.K., Troggio, M., Pruss, D., et al., 2010. The genome of the domesticated apple (Malus domestica Borkh.). Nat. Genet. 42, 833-839.
Vision, T.J., Brown, D.G., Tanksley, S.D., 2000. The origins of genomic duplications in Arabidopsis. Science. 290, 2114-2117.
Wang, W., Wu, P., Li, Y., Hou, X.L., 2016. Genome-wide analysis and expression patterns of ZF-HD transcription factors under different developmental tissues and abiotic stresses in Chinese cabbage. Mol. Genet. Genomics. 291, 1451-1464.
Wang, Y., Tang, H., DeBarry, J.D., Tan, X., Li, J., Wang, X., Lee, T., Jin, H., Marler, B., Guo, H., Kissinger, J.C., Paterson, A.H., 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49.
Wei, X., Wang, L., Yu, J., Zhang, Y.X., Li, D.H., Zhang, X.R., 2015. Genome-wide identification and analysis of the MADS-box gene family in sesame. Gene. 569, 66-76.
Xing, L., Zhang, D., Song, X., Weng, K., Shen, Y., Li, Y., Zhao, C., Ma, J., An, N., Han, M., 2016. Identifying genome-wide sequence variations and comparing floral-associated traits based on re-sequencing of two varieties of apple (Malus domestica Borkh.) ‘Nagafu No. 2’ and ‘Qinguan’. Front. Plant Sci. 7.
Xing, L.B., Zhang, D., Li, Y.M., Shen, Y.W., Zhao, C.P., Ma, J.J., An, N., Han, M., 2015. Transcription profiles reveal sugar and hormone signaling pathways mediating flower induction in apple (Malus domestica Borkh.). Plant Cell Physiol. 56, 2052-2068.
Xu, G., Ma, H., Nei, M., Kong, H., 2009. Evolution of F-box genes in plants: different modes of sequence divergence and their relationships with functional diversification. Proc. Natl. Acad. Sci. USA. 106, 835-840.
Yang, S., Zhang, X., Yue, J.X., Tian, D., Chen, J.Q., 2008. Recent duplications dominate NBS-encoding gene expansion in two woody species. Mol. Genet. Genomics. 280, 187-198.
Zhang, Y.C., Gao, M., Singer, S.D., Fei, Z.J., Wang, H., Wang, X.P., 2012. Genome-wide identification and analysis of the TIFY gene family in grape. PLoS One. 7, e44465.
Zhang, H.X., Jin, J.H., He, Y.M., Lu, B.Y., Li, D.W., Chai, W.G., Khan, A., Gong, Z.H., 2016. Genome-wide identification and analysis of the SBP-box family genes under Phytophthora capsici stress in pepper (Capsicum annuum L.). Front. Plant Sci. 7.
Zhao, Y., Medrano, L., Ohashi, K., Fletcher, J.C., Yu, H., Sakai, H., Meyerowitz, E.M., 2004. HANABA TARANU is a GATA transcription factor that regulates shoot apical meristem and flower development in Arabidopsis. Plant Cell. 16, 2586-2600.
Zhou, T., Wang, Y., Chen, J.Q., Araki, H., Jing, Z., Jiang, K., Shen, J., Tian, D., 2004. Genome-wide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes. Mol. Genet. Genomics. 271, 402-415.
Table legends
Table 1 The 35 putative GATA genes identified in Malus domestica in this study along with their predicted and tallied physiochemical properties
Table S1 Thirty Arabidopsis thaliana GATA genes and 29 Oryza sativa GATA genes used in phylogenetic analyses
Table S2 Eighteen orthologous GATA gene pairs between Malus domestica and Arabidopsis thaliana
Table S3 Syntenic block regions detected using MCScanX toolkit
Table S4 Distribution of motifs in MdGATA proteins
Table S5 MdGATA gene-specific primers used for qRT-PCR analysis
Figure legends
Fig. 1 Chromosomal distribution of GATA genes in Malus domestica. The scale is in kilobases (kb). The chromosome number is shown at the top of each chromosome.
Fig. 2 An unrooted phylogenetic tree representing the relationships of GATA genes in Malus domestica, Oryza sativa, Ricinus communis and Arabidopsis thaliana. In total, 35 MdGATA, 29 OsGATA, 19 RcGATA, and 30 AtGATA proteins were used to construct the tree. GATA proteins from different species are indicated by different symbols: M. domestica by filled circles, A. thaliana by solid yellow triangles, O. sativa by solid squares and R. communis by solid prisms. The four different groups are represented by different colors. Numbers at nodes represent bootstrap values based on 1,000 replicates. Bootstrap values below 50% are not shown (Liu et al., 2012; Chehadeh et al., 2015).
Fig. 3 Alignment of amino acid sequences from 35 putative GATA genes in Malus domestica. GATA motifs and amino acid sites are marked at the top, and sequence identities are shown at the bottom.
Fig. 4 Syntenic relationships of GATA genes in apple and Arabidopsis thaliana. (a) Results of paralogous relationship analysis of MdGATA genes and orthologous relationship analysis of GATA genes between apple and A. thaliana. (b) Results of synteny analyses of MdGATA and GATA genes between apple and A. thaliana. Homologous genes and syntenic gene regions are connected by colored curves.
Fig. 5 Exon–intron structures of GATA genes and a schematic diagram of the amino acid motifs of GATA proteins in Malus domestica and Arabidopsis thaliana. The position of the sequence encoding the GATA motif in GATA genes is shown by feature. The phylogenetic tree representing the relationships of GATA genes was constructed using MEGA 7.0.14 according to the neighbor-joining method with 1,000 bootstrap test replicates. Bootstrap values below 50% are not shown.
Fig. 6 Distribution of cis-elements in the promoters of putative MdGATA genes. (a) The number of various cis-elements in the promoters of each MdGATA gene. (b) The relative proportions of different cis-elements in the promoters of MdGATA genes are indicated by the pie chart. Cis-elements sharing identical or similar functions are represented by the same color.
Fig. 7 Expression profiles of 10 MdGATA genes in different tissues, including stems, leaves, flowers, fruits, and buds, investigated by qRT-PCR. Values are means of three replicates ± SE. Small letters indicate significant differences at the 0.05 level.
Fig. 8 Comparative analysis of light- and dark-cultured apple seedlings. (a) Phenotypes of seedlings grown in light (L) and in darkness (D). (b–c) Chlorophyll content of light-grown and dark-grown seedlings (b) and expression changes of 10 MdGATA genes during photomorphogenesis (L) and skotomorphogenesis (D) using real-time quantitative reverse transcription PCR (c). Values are means of three replicates ± SE. The ratio of chlorophyll content and expression level in light-grown and dark-grown seedlings is shown.
Fig. S1 Transcriptional profile of MdGATA genes in tissues of ‘Golden Delicious’ and different apple hybrids. GSM numbers correspond to different RNA sequencing samples. The bar at the top of heat map represents relative expression values.
Fig. S2 Flowering-related MdGATA gene expression profiles at early (ES), middle (MS), and late (LS) stages of flower bud physiological differentiation. Fragments Per Kilobase of transcript per million mapped reads (FPKM) values and the hierarchical method were used for the cluster analysis.
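The FPKM values named in the legend of Fig. S2 follow the standard definition: fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments in the library. A minimal sketch of that calculation is given below; the function name and the counts in the example are hypothetical and are not taken from this study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical example: 500 fragments on a 1,000 bp transcript
    # in a library of 20 million mapped fragments.
    print(fpkm(500, 1000, 20_000_000))
```

Because FPKM normalizes for both transcript length and library size, values from the three stages can be compared within a gene before clustering.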
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Table 1 The 35 putative GATA genes in M. domestica with their predicted and tallied physiochemical properties.
acids(aa)
Molecular weight/D
pI
Instability
Aliphatic
Hydroph
index
index
obicity
MDP0000220844
1245
414
44895.9
6.43
65.9
62.15
-0.577
MdGATA9
MDP0000255235
1032
343
37820.1
8.29
67.31
53.99
-0.829
MdGATA8
MDP0000824445
996
331
36377.2
6.29
69.74
55.35
-0.758
MdGATA20
MDP0000172464
966
321
35536.2
5.66
52.46
71.31
-0.45
MdGATA16
MDP0000777336
972
323
35496.8
5.88
52.22
70.28
-0.533
MdGATA7
MDP0000528111
732
243
27417.9
9.37
58.46
43.05
-0.9
MdGATA15
MDP0000462997
1305
434
47505.7
6.07
50.32
69.49
-0.503
MdGATA22
MDP0000290156
1125
374
40848.2
6.01
52.28
59.2
-0.724
MdGATA12
MDP0000248210
1125
374
40859.4
6.31
54.71
58.66
-0.694
MdGATA28
MDP0000542350
852
283
31634.2
8.95
42.52
56.18
-0.673
MdGATA33
MDP0000566760
1584
527
59551
9.16
51.17
58.52
-0.729
MdGATA29
MDP0000542351
273
90
9935.4
10.29
39.39
43.33
-1.036
MdGATA30
MDP0000401351
357
118
13170.1
8.68
34.2
85.85
-0.205
MdGATA32
MDP0000224540
582
193
21172.9
9.22
58.15
56.58
-0.715
MdGATA31
MDP0000182176
1092
363
39844.9
8.61
49.48
59.39
-0.753
MdGATA10
MDP0000137305
1056
351
37928.1
5.86
60.18
69.23
-0.489
MdGATA1
MDP0000248942
1884
627
72000.7
10.62
62.96
85.39
-0.434
MdGATA5
MDP0000166889
906
301
33653.7
8.34
57.03
61.3
-0.734
MdGATA18
MDP0000338280
1086
363
40470.9
7.69
57.29
69.06
-0.575
MdGATA17
MDP0000275252
819
272
30448.4
8.77
57.18
62.06
-0.757
MdGATA4
MDP0000310271
1044
347
38351.5
9.8
50.89
59.63
-0.623
MdGATA24
MDP0000253174
810
269
29607.5
8.32
55.83
46.77
-0.798
MdGATA25
MDP0000263391
543
180
20096.4
10.09
26.6
46.11
-0.663
MdGATA19
MDP0000309902
3486
1161
129736.1
8.98
51.02
87.08
-0.299
MdGATA2
MDP0000131803
1029
342
37077.6
9.49
60.76
58.57
-0.644
MdGATA23
MDP0000190038
1080
359
38758.3
9.35
56.5
57.13
-0.667
MdGATA21
MDP0000739900
1074
357
39063.5
9.17
40.59
61.79
-0.638
MdGATA11
MDP0000237740
1068
355
38684.4
9.2
44.76
60.25
-0.655
MDP0000283079
1554
517
55849.1
7.66
42.95
65.86
-0.591
MDP0000316985
1653
550
60344
5.31
45.42
58.15
-0.827
MdGATA34
MDP0000303048
921
306
33338.8
5.53
56.27
61.54
-0.701
MdGATA26
MDP0000192617
1212
403
44360.6
5.23
48.54
68.39
-0.674
MdGATA6
MDP0000309356
1875
624
70048.3
7.01
52.25
75.69
-0.467
MdGATA27
MDP0000129092
1824
607
67711.5
6.56
58.32
71.3
-0.575
MdGATA35
MDP0000703990
663
220
24582.1
10.61
76.74
73.14
-0.67
MdGATA14
MdGATA3
MdGATA13
D
of amino
C
size(bp)
B
Gene locus
A
Gene name
Group
Number
CDS
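The two thresholds used in the Discussion to interpret Table 1 (an instability index above 40 indicating a probably unstable protein, and a negative GRAVY value indicating a hydrophilic protein) can be applied row by row. The sketch below does this for four gene loci whose instability index and hydrophobicity values are copied from Table 1; the function and variable names are hypothetical.

```python
# Gene locus -> (instability index, hydrophobicity/GRAVY), values from Table 1.
TABLE1_SUBSET = {
    "MDP0000220844": (65.9, -0.577),
    "MDP0000542351": (39.39, -1.036),
    "MDP0000401351": (34.2, -0.205),
    "MDP0000263391": (26.6, -0.663),
}

INSTABILITY_CUTOFF = 40.0  # above this value a protein is predicted to be unstable


def classify(instability_index, gravy):
    """Return (stability class, hydropathy class) for one protein."""
    stability = "unstable" if instability_index > INSTABILITY_CUTOFF else "stable"
    hydropathy = "hydrophilic" if gravy < 0 else "hydrophobic"
    return stability, hydropathy


if __name__ == "__main__":
    for locus, (instability_index, gravy) in TABLE1_SUBSET.items():
        stability, hydropathy = classify(instability_index, gravy)
        print(f"{locus}: {stability}, {hydropathy}")
```

Run over all 35 rows, the same function gives the counts behind the statements that most MdGATA proteins are unstable and that all are hydrophilic.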
Highlights
1. A total of 35 Malus domestica GATA genes were identified and divided into four groups based on phylogenetic analysis.
2. Eighteen orthologous gene pairs were found between Arabidopsis and M. domestica. Segmental and whole-genome duplications may account for the expansion of GATA genes in Malus domestica.
3. Four main conserved motifs were identified in GATA proteins and the exon-intron physical and encoding layouts were then analyzed.
4. RT-qPCR and RNA-seq results showed that MdGATA genes were expressed mainly in leaf, flower, and bud; most MdGATA genes were significantly down-regulated after 7 days of dark culture, suggesting a potential function in light-regulated transcription; 11 MdGATA genes were highly expressed in different stages of flower bud physiological differentiation, suggesting that MdGATA genes might play a role during floral induction in apple.