Aspartic proteases gene family in rice: Gene structure and expression, predicted protein features and phylogenetic relation

Gene 442 (2009) 108–118

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Jiongjiong Chen 1, Yidan Ouyang 1, Lei Wang, Weibo Xie, Qifa Zhang ⁎

National Key Laboratory of Crop Genetic Improvement and National Centre of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan 430070, China

Article info

Article history:
Received 10 December 2008
Received in revised form 22 April 2009
Accepted 23 April 2009
Available online 3 May 2009
Received by J.G. Zhang

Keywords: Aspartic protease; Rice; Microarray; Expression profile; Phytohormone

Abstract

Aspartic proteases constitute a large family in eukaryotes and play fundamental roles in protein processing, maturation and degradation. In this study, we identified 96 OsAP genes in rice (Oryza sativa L.), the model plant for monocots, by a reiterative database search. The analysis of the complete set of OsAP genes is presented, including chromosomal location, phylogenetic relationships, classification and gene structure. Moreover, a comprehensive expression analysis of OsAP family genes was performed using 24 tissues during the plant life cycle of two rice cultivars. Sixty-six OsAP genes were found to be expressed in at least one of the examined developmental stages, and these were divided into 3 classes based on their transcript levels. OsAP genes were also found to be differentially up- or down-regulated in rice seedlings in response to treatments with phytohormones, as well as in plumules/radicles under light/dark conditions. The comprehensive annotation and expression profiling undertaken in this research add to our understanding of OsAP family genes in rice growth and development. Our results also provide a basis for selecting candidate genes for functional validation in future studies. © 2009 Elsevier B.V. All rights reserved.

Abbreviations: OsAP, Oryza sativa aspartic protease; KOME, Knowledge-based Oryza Molecular biological Encyclopedia; NCBI, National Center for Biotechnology Information; TIGR, The Institute for Genomic Research; NAA, naphthalene acetic acid; GA3, gibberellic acid; KT, kinetin.
⁎ Corresponding author. Tel.: +86 27 87282429; fax: +86 27 87287092. E-mail address: [email protected] (Q. Zhang).
1 These authors contributed equally to this work.
0378-1119/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2009.04.021

1. Introduction

Aspartic proteases (APs; EC 3.4.23) are widely distributed in all living organisms and constitute one of the four superfamilies of proteolytic enzymes (Davies, 1990; Barrett, 1992; Rawlings and Barrett, 1999). Most APs contain two aspartic acid residues at the active site. They are active at acidic pH and are specifically inhibited by pepstatin A. The catalytic Asp residues are located within the conserved Asp-Thr/Ser-Gly (DT/SG) motif. In most of the known APs, a pair of aspartic residues act together to bind and activate the catalytic water molecule. However, in some APs, other amino acid residues replace the second Asp (Barrett et al., 2004). APs are synthesized as single-chain preproenzymes and converted to mature two-chain enzymes during activation.

Plant APs, especially Arabidopsis APs, are divided into three categories: typical APs, nucellin-like APs and atypical APs (Faro and Gal, 2005). The characteristic feature of typical plant AP precursors is the presence of an extra protein domain of approximately 50–100 amino acids known as the plant specific insert (PSI). This segment, inserted into the C-terminal domain of typical plant AP precursors, is usually removed during protein maturation. The PSI domain shows no homology with animal or microbial APs in sequence, structure or function, but is highly similar to saposin-like proteins (Mutlu and Gal, 1999). The nucellin-like APs are proteins similar to nucellin, first detected in barley nucellar cells (Chen and Foolad, 1997). Atypical APs display intermediate features between the typical and nucellin-like sequences (Faro and Gal, 2005).

APs have been detected in or purified from both monocotyledons and dicotyledons as well as gymnosperms. They are expressed in different plant organs, such as seed, grain, tuber, leaf, flower, petal, root and pollen, as well as in the digestive fluids of carnivorous plants. Plant APs have been implicated in protein processing and/or degradation in different plant organs, as well as in plant senescence, stress response, programmed cell death, and reproduction (Simões and Faro, 2004). In rice (Oryza sativa L.), a typical aspartic protease, oryzasin, was purified from seed and has been characterized in detail (Asakura et al., 1995, 1997, 2000). Recently, a rice atypical aspartic protease was identified as playing a major role in regulating indica–japonica hybrid sterility and wide compatibility by conditioning embryo-sac fertility (Chen et al., 2008).

Genome-wide identification and phylogenetic analyses of AP genes have been reported in Arabidopsis (Beers et al., 2004; Faro and Gal, 2005; Takahashi et al., 2008). In addition, the phylogenetic relationships of different eukaryotic AP genes have been investigated (Hughes et al., 2003; Pimentel et al., 2007). However, a complete analysis of the AP content of the O. sativa genome is still lacking.

J. Chen et al. / Gene 442 (2009) 108–118

In this paper, we report on the systematic identification and phylogenetic analysis of rice AP (OsAP) genes. We also investigated the expression profiles of the OsAP genes in two rice genotypes during the plant life cycle under physiological growth conditions and upon light/dark and phytohormone treatments. In addition, the expression of duplicated OsAP genes was analyzed, showing that duplication has played an important role in the functional diversification of the OsAP genes.

2. Materials and methods

2.1. Database search

A BLASTP search was carried out against all the annotated proteins in TIGR (release 5) (http://www.tigr.org), the KOME full-length cDNA database (http://cdna01.dna.affrc.go.jp/cDNA/) and NCBI RefSeq (http://blast.ncbi.nlm.nih.gov/Blast.cgi) using the Hidden Markov Model profile. The search was conducted using the 352 amino acids of the ASP domain (PF00026) downloaded from Pfam (http://pfam.sanger.ac.uk/) as the query, followed by removal of the redundant sequences from the different databases. An e-value cutoff of 10 was adopted for the BLASTP search. Meanwhile, the keyword “aspartic proteinase” was searched in the TIGR 5 database. Motif scans were performed using SMART (http://smart.embl-heidelberg.de/), NCBI-CD (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) or INTERPRO (http://www.ebi.ac.uk/Tools/InterProScan/) with filter off. The MW, pI, and Titration curve program available through Expasy (http://www.expasy.org/tools/) was employed to calculate the molecular weight and pI of the proteins for which these values were not available in TIGR.

2.2. Determination of chromosomal localization and gene duplication

The OsAP genes were positioned on the rice chromosome pseudomolecules available at TIGR (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml) using BLASTP. The information on segmental genome duplication of rice available at TIGR (http://www.tigr.org./tdb/e2k1/osa1/semental_dup/index.shtml) was used to determine the presence of OsAP genes on duplicated chromosomal segments, with a maximal distance of 500 kb permitted between collinear gene pairs. Tandem duplications were arbitrarily defined as ones that occur within a sequence distance of 50 kb (Riechmann et al., 2000). Identity among duplicated genes was calculated using MEGA 4 software.

2.3. Phylogenetic analysis

A set of 96 amino acid sequences of the conserved ASP domain was identified from Pfam. Because their conserved ASP domains were incomplete, OsAP11, OsAP12, OsAP33, OsAP35, OsAP76, OsAP88 and OsAP90 were excluded from the phylogenetic analyses. The remaining OsAPs were aligned by means of ClustalX 2.0.9 (Larkin et al., 2007). To compare and define the subgroups, we integrated the conserved ASP domain sequences of CDR1, CND41, PCS1, Nepenthesin I and Nucellin into this dataset. The neighbor-joining tree was constructed with the p-distance substitution model by MEGA 4 (Tamura et al., 2007), and the bootstrap support values (2000 replications) were also calculated. Porcine pepsin A was used as the outgroup. The tree obtained in this way was then viewed using the MEGA 4 software.

2.4. Analysis of gene and protein structure and their organization

All the publicly available details of the OsAP genes were taken into account (see Table S1). The lengths of the signal sequences of all OsAPs in rice were predicted by SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP/) and Signal Peptide Prediction (http://bioinformatics.leeds.ac.uk/prot_analysis/Signal.html). Predictions of the subcellular localizations were performed with the aid of the programs PSORT, TargetP 1.1, MITOPROT and ChloroP 1.1 available in Expasy and PredictNLS (http://cubic.bioc.columbia.edu/cgi/-var/nair/resonline.pl). Scans for the active sites of OsAPs were performed using PROSITE. Domains other than ASP were searched for in the SMART, Pfam and Interproscan databases. Exon–intron organization was determined using the genome browser tool in TIGR (http://www.tigr.org/tigr-scripts/osa1_web/gbrowse/rice/). Manual assessment of the gene structure was done for each predicted OsAP.

2.5. Plant materials and growth conditions

Two indica (O. sativa ssp. indica) cultivars, Minghui 63 and Zhenshan 97, were used to study the expression levels of the OsAP genes at different developmental stages and under a range of growth conditions (Table S2) using the Affymetrix rice microarray. Seeds were soaked in water and germinated at 28 °C. For the light/dark treatments, seedlings at the plumule and radicle stages were placed under 48 h of continuous light or darkness. For the phytohormone treatments, seedlings at the trefoil stage were treated with 0.1 mM of the phytohormones gibberellic acid (GA3), naphthalene acetic acid (NAA), or kinetin (KT), and the treated leaves were collected after 5, 15, 30 and 60 min.

2.6. Expression profile analysis

The expression profile data of OsAP genes in both rice cultivars were extracted from the CREP database (http://crep.ncpgr.cn). Thirty-one RNA samples of several tissues at different developmental stages or subjected to hormonal or dark/light treatments were used to hybridize with Affymetrix rice gene chips. The developmental stages and organs of the tissues are described in Table S2. If more than one probe was available for a gene on the chip, the probe with the maximum average expression signal across all test tissues was used for expression analysis. To identify the differentially expressed genes, the expression level in each of the tissues was compared against the expression in seed. The average of two biological replicates for each sample, except for tissues 1, 2, 14, 15 and 16 (six biological replicates), was used for analysis. A gene was considered up- or down-regulated in a tissue if the signal ratio was >2 or <0.5 and the difference from seed was significant at P < 0.05 by the t-test. Similarly, a gene was regarded as showing differential regulation by phytohormone or light treatments if the signal ratio between the treatment and the control was >2 or <0.5 and the difference was significant at P < 0.05 by the t-test.
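The two-part criterion described above (signal ratio >2 or <0.5, together with P < 0.05) can be sketched as a small decision function. This is only an illustration: the function name, the parameter names and the example values are hypothetical, and the P values are assumed to have been computed beforehand by a t-test.

```python
def classify_regulation(signal_ratio, p_value, fold=2.0, alpha=0.05):
    """Label a gene 'up', 'down' or 'unchanged'.

    signal_ratio: tissue/seed (or treatment/control) signal ratio.
    p_value: P value from a t-test computed elsewhere (assumed input).
    """
    if p_value >= alpha:
        return "unchanged"          # not significant, whatever the ratio
    if signal_ratio > fold:
        return "up"                 # ratio > 2
    if signal_ratio < 1.0 / fold:
        return "down"               # ratio < 0.5
    return "unchanged"              # significant but below the fold cutoff


# Hypothetical (ratio, P) pairs, not values taken from the paper.
examples = [(3.1, 0.01), (0.4, 0.03), (2.5, 0.20), (1.3, 0.001)]
calls = [classify_regulation(ratio, p) for ratio, p in examples]
print(calls)
```

Note that both conditions must hold: a large ratio with a non-significant P value, or a significant but small change, is treated as unchanged.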
Cluster analysis based on expression profiles of the genes was performed by hierarchical clustering using a Euclidean distance matrix.

3. Results

3.1. Identification of OsAP genes in rice

BLASTP searches against all the annotated proteins in the TIGR pseudomolecules (version 5), with the ASP domain as the query and using the Hidden Markov Model profile, found 130 sequences, of which 108 were unique. Meanwhile, 111 sequences were obtained from KOME and 73 sequences from the NCBI RefSeq database with the same method. In addition, a keyword search in the TIGR database for “aspartic proteinase” resulted in 76 gene models. A total of 130 sequences were obtained from the three databases after the redundant sequences were removed manually. These sequences were checked for the presence of the ASP domain with confidence (e-value less than 1.0) in the Pfam database, resulting in 107 sequences. Further scanning of the 107 sequences for the ASP domain by motif scan using SMART, NCBI-CD or INTERPRO searches, with filter off by default setting, reduced the total number of OsAP family genes to 96. Five of the 96 genes had been reported in the literature and were named OsAsp1, OsAsp2 and OsAsp3 (Bi et al., 2005), Oryzasin (Asakura et al., 1995) and S5 (Chen et al., 2008). We kept the names OsAsp1 to OsAsp3 and named the remaining 93 genes OsAP4 to OsAP96, with Oryzasin renamed OsAP41 and S5 renamed OsAP51. The TIGR locus, ORF length, protein length, and chromosomal location of each gene are given in Table S1.

3.2. Chromosomal localization and gene duplication

In total, 93 of the 96 OsAP genes were localized on the rice chromosome pseudomolecules available in TIGR. The remaining three OsAP genes were present as cDNAs only in the KOME database. These genes might be products of alternative splicing, or might be located in the unsequenced gap regions of the rice genome. Fourteen of the OsAP genes were located on chromosome 1, 13 on chromosome 6, 12 on chromosome 10, 10 on chromosome 4, 7 each on chromosomes 2 and 7, 6 each on chromosomes 3, 5 and 9, 5 on chromosome 11, 4 on chromosome 12, and 3 on chromosome 8. The distribution of OsAP genes on the 12 rice chromosomes is shown in Fig. 1, and the exact position of each gene on the rice chromosome pseudomolecules is given in Table S1.

Both tandem and segmental duplications of the OsAP genes on the chromosomes were evident (Fig. 1). In total, 36 OsAP genes were considered tandem duplications according to the criterion adopted in our analysis, including a large cluster of 11 members on chromosome 10, three four-gene clusters on chromosomes 3, 6 and 7, one three-gene cluster on chromosome 2, and five two-gene clusters on chromosomes 4, 6, 9 and 11. A total of 23 OsAP genes, forming 12 pairs, were involved in segmental duplications (Fig. 1 and Table S3b), and some of the tandemly duplicated genes were also involved in segmental duplications.

Fig. 1. Localization of 93 OsAP genes on the rice chromosomes. The chromosome numbers are indicated at the top of the figure. The ORFs of the genes placed on the left side of each chromosome are in the reverse direction relative to those on the right side. Dashed lines indicate genes on duplicated segments. Grey shaded boxes indicate tandemly duplicated genes, which are named clusters I to X. The position (bp) and orientation of each OsAP gene on the TIGR rice chromosome pseudomolecules (release 5) are given in Table S2.

3.3. Phylogenetic analysis of OsAP family genes

A phylogenetic tree was constructed using the conserved ASP domain sequences of the OsAP proteins (Fig. 2).
Because of their incomplete conserved ASP domains, 7 OsAPs were not included in constructing the phylogenetic tree. Three categories (A, B and C) were resolved, similar to those described in Arabidopsis by Faro and Gal (2005). Two OsAPs (OsAP94 and OsAP96) could not be classified into any category. Category A, containing 6 members, represented typical aspartic proteases. Category B, with 13 OsAPs, consisted of nucellin-like APs. Category C, composed of atypical aspartic proteases, was the largest group with 68 OsAP members. Some members in this category were further subdivided into five groups (C1, C2, C3, C4 and C5) according to the APs already reported and the predicted structure, with 9 OsAPs in C1, 3 in C2, 2 in C3, 1 in C4, and 11 in C5. The C1 group was homologous to the CND41 protein, a tobacco protein with DNA binding and proteolytic activity found in the chloroplast nucleoid (Nakano et al., 1997). S5 was classified into the C2 group; this protein was reported as a major regulator of indica–japonica hybrid sterility and wide compatibility by conditioning embryo-sac fertility (Chen et al., 2008). The C3 group was homologous to the PCS1 protein, an Arabidopsis aspartic protease that plays an important role in determining the fate of cells in embryonic development and in reproduction processes (Ge et al., 2005). The C4 group was remarkably homologous to the CDR1 protein, an Arabidopsis protein involved in the signaling of disease resistance (Xia et al., 2004). However, there were no OsAPs highly homologous to Nepenthesin I, a member of a novel subfamily of aspartic proteases (Takahashi et al., 2005). In OsAPs of the C5 group, the first aspartic acid (D) of the DTG motif was highly conserved, while the other residues in this motif were less conserved.

An interesting relationship was found between the phylogenetic grouping and genomic duplication of OsAP genes. All tandemly duplicated OsAP genes belonged to category C, with the exception of cluster X in category B on chromosome 11. Unlike the tandemly duplicated genes, OsAP genes located on segmentally duplicated chromosomal regions were dispersed across all three categories (Figs. 1 and 2; Table S3).

3.4. Predicted structure of the OsAP proteins and genes

By consensus, a typical OsAP protein has the following basic structure: a signal peptide, a propeptide, and an ASP domain with two active sites (Barrett et al., 2004). All of the 96 identified OsAP proteins were predicted to contain at least one ASP domain, although the length of this domain was highly variable (Fig. 3; Table S4). Twenty-four OsAP proteins (25.00%) had the basic structure of a signal peptide and two active sites. Fifty-eight OsAP proteins (60.42%) lacked either the signal peptide or the active site, and the remaining 14 (14.58%) had the ASP domain without the signal sequence or active sites. Only six OsAPs were identified as typical OsAP proteins, five of which had a SapB domain located in the plant specific insert (PSI) sequence. In addition, one atypical protein, OsAP76, had a leucine rich repeat N-terminal (LRRNT_2) domain (Fig. 3).
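The active-site check described above relies on the conserved Asp-Thr/Ser-Gly (DT/SG) motif. As a rough illustration only, the sketch below scans a protein sequence for that tripeptide with a regular expression. It is not the PROSITE scan used in the study, whose pattern is more specific than a bare tripeptide; the function name and the toy sequence are hypothetical.

```python
import re

# D followed by T or S, then G. The lookahead lets the scan report
# matches that start inside a previous match as well.
DTSG = re.compile(r"(?=(D[TS]G))")


def find_active_site_motifs(protein_seq):
    """Return (1-based position, motif) for every D-T/S-G tripeptide."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in DTSG.finditer(seq)]


# Toy sequence invented for illustration; not a real OsAP protein.
toy_seq = "MKLVFDTGSSNLWVPAAYIVDSGTSLLAG"
hits = find_active_site_motifs(toy_seq)
has_two_sites = len(hits) >= 2
print(hits, has_two_sites)
```

A bare tripeptide will also match by chance in unrelated proteins, so a count of two hits is at best a first-pass filter before a proper domain or pattern search.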

Fig. 2. Phylogenetic analysis of the conserved ASP domain from rice OsAPs and other known AP proteins. Nucellin (U87148) from barley, CDR1 (NP_198319) and PCS1 (NP_195839) from Arabidopsis thaliana, CND41 (D26015) from tobacco, Nepenthesin I (BAD07474) from digestive fluids of carnivorous plants, and porcine pepsin A (outgroup) were included for comparison. The tree was constructed with the p-distance substitution model by MEGA 4. OsAP proteins grouped into three distinct categories (A, B, and C). Category C was further divided into 5 groups. Bootstrap values <60 are not shown. Scale bar represents 0.2 amino acid substitutions per site.
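The p-distance underlying the tree in Fig. 2 is the proportion of aligned sites at which two sequences differ. A minimal sketch follows, assuming two pre-aligned sequences of equal length and pairwise deletion of gapped sites. The gap handling is an assumption, since the text does not state which option was used in MEGA 4, and the example strings are invented.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented aligned fragments: 8 comparable sites, 2 differences.
d = p_distance("DTGSS-LWV", "DSGSSNLWI")
print(d)
```

Because the p-distance applies no correction for multiple substitutions at the same site, it tends to underestimate divergence between distantly related sequences.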

Fig. 3. Protein structure and gene structure of representative OsAP proteins. A, Motif organization of representative OsAP proteins from each group. Different motifs of OsAP proteins are shown in different colors and shapes. B, Exon–intron structure of OsAP genes corresponding to the protein structures. The number in brackets shows the number of genes with the same gene structure in the same category.

The PSORT, TargetP, MITOPROT, ChloroP and PredictNLS programs were used to predict the subcellular sites to which the OsAP proteins were targeted. Fifty-one OsAP proteins were predicted to be extracellular, 15 in the plasma membrane, 11 in the endoplasmic reticulum membrane, 6 in the cytoplasm, 3 in mitochondria, and one each in the peroxisome and the vacuole. The details of the predicted subcellular sites of the OsAP proteins are given in Table S4.

Analysis of the gene structure showed that OsAP genes in category A had 10 or 12 introns. Intron numbers of the OsAP genes in category B ranged from 7 to 11. In contrast, OsAP genes in category C had fewer than 3 introns (Table S4). Thus, there were fewer introns in atypical OsAP genes than in typical aspartic protease genes.

3.5. Expression profiling of OsAP genes in different tissues of rice

Eighty-five of the 96 OsAP genes were represented on the Affymetrix chip. OsAP81 and OsAP83 shared the same probeset because of the high sequence homology between them; thus, the expression level measured by this probeset represented the sum of the two transcripts. The signal values of the 85 OsAP genes are given in Table S5. Sixty-six genes had a “present” call in at least one of the stages analyzed, whereas the remaining 19 genes were regarded as “absent” in these tissues in Minghui 63. The z-scores of log-transformed expression values for each gene in Minghui 63 were used in the analyses. Cluster analysis was performed on the basis of correlation coefficients using the hierarchical cluster method of “complete linkage”. Expression patterns of the genes were grouped into three main classes (Fig. 4).

Class I was subdivided into four subclasses. Nine genes (OsAP8, OsAP10, OsAP12, OsAP32, OsAP39, OsAP43, OsAP82, OsAP89 and OsAP94) were in Class Iα and two genes (OsAP69 and OsAP67) were in Class Iγ; all of them (except OsAP10) showed low expression levels in all tissues analyzed. Class Iβ consisted of seven genes (OsAP19, OsAP30, OsAP63, OsAP71, OsAP78, OsAP81 and OsAP79) and Class Iδ consisted of four genes (OsAP34, OsAP35, OsAP65 and OsAP76). The expression patterns of the genes within each subclass were very similar. Class Iβ and Class Iδ showed particularly high expression in root and in stamen at 1 day before flowering, respectively, compared to the remaining tissues analyzed. Surprisingly, OsAP30 was 369-fold up-regulated in root compared with seed, showing the greatest divergence. Moreover, OsAP34 was 135-fold up-regulated in stamen at 1 day before flowering compared with seed.

Class II comprised 26 genes, most of which showed low expression in seed as well as in the developing young panicle, but relatively high transcript accumulation in vegetative tissues, such as root, leaf, sheath and stem. Seven OsAP genes in the C1 group were homologous to CND41, which showed accumulation in leaf, stem, and root (Nakano et al., 1997).

In contrast, 17 genes belonging to Class III showed a high expression level during the development of the young panicle and a relatively high level in seed, but low expression in most vegetative tissues. Five OsAP genes (OsAP16, OsAP17, OsAP25, OsAP26 and OsAP66) in Class III were homologous to PCS1, which shows a higher expression level in reproductive tissues and a lower expression level in vegetative tissues (Ge et al., 2005); the expression patterns of two of them were also similar to that of the PCS1 gene. These genes might have functions similar to that of the PCS1 gene.

Not all OsAP genes in the same branch showed similar expression patterns. For example, OsAP8 and OsAP21 were in the same branch of the C1 group. However, OsAP8 was in expression Class Iα, whereas OsAP21 was in Class II. The expression patterns of all OsAP genes in Zhenshan 97 were similar to those in Minghui 63. The calculated fold change values of gene expression with respect to seed are listed in Table S6.
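The clustering input described in this section (z-scores of log-transformed expression values, compared by correlation coefficient) can be illustrated with a short sketch. The expression values are invented, and the use of the sample standard deviation and of 1 - r as the distance are assumptions, since the text does not specify them.

```python
import math


def zscores(values):
    """z-scores of log2-transformed expression values for one gene."""
    logs = [math.log2(v) for v in values]
    mean = sum(logs) / len(logs)
    # Sample standard deviation (n - 1); a flat profile would give zero here.
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / (len(logs) - 1))
    return [(x - mean) / sd for x in logs]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )


# Invented signal values across four tissues.
gene_a = [100, 200, 400, 800]
gene_b = [10, 20, 40, 80]      # same shape as gene_a, ten-fold lower signal
gene_c = [800, 400, 200, 100]  # opposite trend

za, zb, zc = zscores(gene_a), zscores(gene_b), zscores(gene_c)
dist_ab = 1 - pearson(za, zb)  # close to 0: the profiles cluster together
dist_ac = 1 - pearson(za, zc)  # close to 2: the profiles are anti-correlated
print(round(dist_ab, 6), round(dist_ac, 6))
```

The example shows why z-scores are used: gene_a and gene_b differ ten-fold in absolute signal but have the same shape across tissues, so they end up at essentially zero distance.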

Fig. 4. Expression profiles of OsAP genes in the plant life cycle in rice. Hierarchical cluster displaying expression profiles for 66 aspartic protease genes with detectable expression in at least one of the tissues in Minghui 63 (color bar at the base represents the z-score values transformed from log2-based expression values: green color represents low level, black indicates medium level, and red signifies high level). Different color lines represent different groups.

3.6. Expression of OsAP genes under light, dark and phytohormone treatments

To investigate possible light regulation of OsAP genes, microarray data were obtained for plumule and radicle tissues treated with light or dark for 48 h at 48 h after emergence. A total of 14 OsAP genes were differentially expressed between light and dark treatments in the two tissues in the two varieties (Fig. 5, Table S7). The expression of 10 genes was up-regulated by more than 2-fold in plumule tissue in Minghui 63 by light compared to dark, but only 3 genes showed elevated expression under light in Zhenshan 97. All differentially expressed genes in plumule were up-regulated under light compared to dark. In contrast, three of the four differentially expressed genes in radicle were down-regulated under light by more than 2-fold, and only one gene was up-regulated by light by more than 2-fold in radicle of Minghui 63.

To understand the regulatory mechanisms of the differential expression of the OsAP genes under the light/dark treatments, putative promoter regions of 2 kb upstream of the translation start site were analyzed using PLACE (http://www.dna.affrc.go.jp/PLACE/signalscan.html). All 14 genes identified in the light treatment contained light regulation cis-elements, such as ASF1MOTIFCAMV, IBOXCORE, INRNTPSADB, SORLIP1AT and SORLIP2AT (Terzaghi and Cashmore, 1995; Nakamura et al., 2002; Hudson and Quail, 2003) (Table S8). A phytochrome A-repressed motif, SORLIP3AT, was found in the OsAP30 promoter sequence, and indeed this gene was up-regulated under dark treatment by more than 9-fold in radicle. Two typical OsAP genes (OsAP6 and OsAP9) exhibited differential expression during dark/light treatment, but the light regulation cis-elements showed no difference between them. It could be inferred that the large 5′ introns upstream of the initiation codon were not involved in the regulation. This result was consistent with the report on typical plant AP genes in cardoon (Pimentel et al., 2007).

Fig. 5. Differential expression detected for 14 OsAP genes regulated by light and dark treatments. (A) Eleven genes showing differential expression in plumule stage (48 h after emergence) with light and dark treatments in Minghui 63 and Zhenshan 97. (B) Four genes showing differential expression in radicle (48 h after emergence) with light and dark treatments in Minghui 63 and Zhenshan 97. The scores are the average expression values obtained from microarrays. Error bars represent standard errors for data obtained in two biological replicates.

For phytohormone treatments, nine OsAP genes in the two varieties exhibited differential expression in response to NAA, GA3 and KT treatments, of which three (OsAP15, OsAP45 and OsAP52) were common to the two varieties (Fig. 6, Table S7). OsAP52 was down-regulated to various extents by the phytohormone treatments, and all the remaining genes showed various degrees of up-regulation by the phytohormones. The 2 kb regions upstream of the translation start sites of the genes were also analyzed using PLACE. There were two auxin elements (GGTCCCATGMSAUR and NTBBF1ARROLB) and one gibberellin element (WRKY71OS) in the OsAP15 gene promoter sequence (Xu et al., 1997; Baumann et al., 1999; Zhang et al., 2004). Promoter regions of two OsAP genes contained the auxin element ARFAT (Ulmasov et al., 1999), and nine genes contained gibberellin elements such as ARFAT, GADOWNAT, GAREAT, MYBGAHV, PYRIMIDINEBOXOSRAMY1A, PYRIMIDINEBOXHVEPB1, RBENTGA3, and WRKY71OS (Gubler et al., 1995; Cercos et al., 1999; Fukazawa et al., 2000; Mena et al., 2002; Ogawa et al., 2003). Two genes (OsAP18 and OsAP40) contained the cytokinin element CPBCSPOR (Fusada et al., 2005).

3.7. Comparison of expression profiles of duplicated OsAP genes in rice

All 12 pairs of genes located in the segmentally duplicated regions were represented on the Affymetrix GeneChip, which provided an opportunity to compare their expression patterns (Fig. 7). Two genes (OsAP14 and OsAP57), belonging to two different pairs (OsAP7 and OsAP14, OsAP20 and OsAP57), showed no expression on the chips, and no ESTs were found for them in the O. sativa ssp. japonica EST databases (http://www.ncbi.nlm.nih.gov/). This suggested that one member of each of these duplicates might have lost its function, or become a pseudogene, after duplication during the course of evolution. The expression patterns of the remaining pairs were different in most of the tissues tested. These results indicated that one member of the duplicates might have gained a new function.

Of the tandemly duplicated genes, five (OsAP28, OsAP29, OsAP61, OsAP62 and OsAP77) were not present on the Affymetrix GeneChip. All four members of the gene cluster on chromosome 6 showed similar expression patterns, suggesting that the functions of these genes might not have diverged after duplication (Fig. 8). In the largest cluster of 11 members on chromosome 10, the expression patterns of three tandemly duplicated genes, OsAP78, OsAP79 and OsAP81, were extremely similar in the two varieties. Two genes in this cluster were not present on the Affymetrix GeneChip and the remaining six were expressed at an extremely low level in all tissues in both varieties. This result indicated that most of the genes in this cluster might have lost (or be losing) their functions during the course of evolution. However, OsAP20, OsAP21 and OsAP22, forming a triplet on chromosome 2, were expressed differentially in most of the tissues studied, indicating diversification of their functions, although the historical development of the divergence remains unclear (Lynch and Katju, 2004).

Gene duplications are one of the primary driving forces in genome evolution (Moore and Purugganan, 2003). For most gene families, the dramatic variations in family size and distribution were affected by tandem duplications and segmental duplications (Cannon et al., 2004). In this study we showed that a large number of OsAP genes are involved in segmental duplications and tandem duplications in the rice genome. Comparison of the expression profiles of the genes in segmentally duplicated regions and tandemly duplicated regions showed similar expression, differential expression, or almost no expression for one of the partners, reflecting conservation, neofunctionalization or pseudogenization after the duplication event (Lynch and Conery, 2000).
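The tandem clusters compared above were defined by the 50 kb criterion given in Section 2.2. One way such clusters could be derived from gene coordinates is sketched below. The gene names and positions are invented, and chaining neighbouring genes that lie within 50 kb of each other is one possible reading of the criterion rather than the authors' documented procedure.

```python
def tandem_clusters(genes, max_gap=50_000):
    """Group genes on the same chromosome into tandem clusters.

    genes: iterable of (name, chromosome, position_bp).
    Neighbouring genes no more than max_gap bp apart are chained
    into one cluster (an assumption of this sketch); singletons
    are dropped.
    """
    clusters = []
    current = []
    for name, chrom, pos in sorted(genes, key=lambda g: (g[1], g[2])):
        same_run = (
            current
            and chrom == current[-1][1]
            and pos - current[-1][2] <= max_gap
        )
        if same_run:
            current.append((name, chrom, pos))
        else:
            if len(current) > 1:
                clusters.append([g[0] for g in current])
            current = [(name, chrom, pos)]
    if len(current) > 1:
        clusters.append([g[0] for g in current])
    return clusters


# Invented coordinates for illustration only.
toy_genes = [
    ("geneA", 2, 100_000),
    ("geneB", 2, 130_000),
    ("geneC", 2, 170_000),
    ("geneD", 2, 400_000),
    ("geneE", 6, 120_000),
    ("geneF", 6, 5_000_000),
]
clusters = tandem_clusters(toy_genes)
print(clusters)
```

With chaining, the two ends of a long cluster can be more than 50 kb apart as long as each neighbouring pair satisfies the cutoff; a stricter reading would bound the span of the whole cluster instead.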

4. Discussion

4.1. Correlation between phylogenetic relationship and expression pattern

Expression profiling of the OsAP gene family demonstrated that not all OsAP genes in the same category had similar expression patterns. For example, OsAP41, belonging to category A, showed a high expression level in all the developmental stages. However, the expression patterns of other OsAP genes in this category, such as OsAP10 and OsAP44, were very different. In addition, two OsAP genes (OsAsp2 and OsAsp3) were classified in expression Class II, which differed from OsAsp1 in Class III, although all three of these OsAPs belonged to category B. Moreover, even OsAPs clustered with an extremely high bootstrap support value in the phylogenetic tree may show different expression patterns. Examples were provided by OsAP15, OsAP19 and OsAP34, which form a closely related clade in category B but were classified into three expression patterns: II, Iβ and Iδ. In addition, OsAP78 and OsAP82 were classified into expression patterns Iβ and Iα although they were in the same clade. Conversely, OsAP genes classified in the same expression pattern may have a distant phylogenetic relationship. Class Iδ genes showed a high expression level in stamen. However, the four OsAP genes in this expression pattern did not belong to the same phylogenetic category.

Fig. 6. Clustering of expression profiles of nine OsAP genes in trefoil stage of seedlings showing response to treatments of three phytohormones (NAA, GA3 and KT). Color bar at the base represents the z-score values transformed from log2-based expression values: green color represents low level, black indicates medium level, and red signifies high level.
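The transformation named in the legend of Fig. 6 (log2-based expression values converted to z-scores for one gene across samples) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name is hypothetical, and the use of the population standard deviation is an assumption, since the legend does not specify the estimator.

```python
import math


def zscores_from_raw(raw_values):
    """Convert one gene's raw expression values to z-scores of log2 values.

    Assumes all raw values are positive. The population standard deviation
    is used here, which is an assumption made for this sketch.
    """
    logs = [math.log2(v) for v in raw_values]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    sd = math.sqrt(var)
    if sd < 1e-12:
        # A gene with constant expression carries no contrast to display.
        return [0.0] * len(logs)
    return [(x - mean) / sd for x in logs]
```

Under this scaling, negative values would map to the green (low) end of the color bar, values near zero to black, and positive values to the red (high) end.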

Fig. 7. Comparison of expression profiles, in two varieties, of aspartic protease genes that are localized in the duplicated segments of the rice genome. X-axis indicates the developmental stages as given in Table S1. Y-axis represents the normalized raw expression values obtained using microarrays. The number in each series name denotes the gene number, e.g., MH2 and ZS2 represent the gene OsAP2 in Minghui 63 and Zhenshan 97, respectively.

Fig. 8. The expression pattern of tandemly duplicated OsAP genes in rice. X-axis represents the developmental stages as given in Table S1. Y-axis represents the raw expression values obtained from microarrays. The number in each series name denotes the gene number.

All these results suggested that high sequence similarity was not necessarily correlated with similar expression patterns, because proteins with the same or very similar sequences, presumably performing very similar biochemical functions, are needed in different tissues during growth and development. Conversely, similar expression patterns exhibited by OsAPs with dissimilar sequences, which presumably perform different biochemical functions, suggest that they function together in the same tissues during growth and development.

4.2. Different functions and multiple roles of the OsAP proteins in rice development

Genome-wide expression profiling of the OsAP gene family, covering the plant life cycle under physiological growth conditions, showed that some genes are expressed in many tissues such as seed, root, panicle and endosperm, suggesting multiple functions of these proteases in rice. Thirteen genes, OsAsp2, OsAsp3, OsAP6, OsAP7, OsAP10, OsAP15, OsAP18, OsAP19, OsAP23, OsAP35, OsAP41, OsAP44 and OsAP65, showed high transcript accumulation in all the tissues analyzed in the two varieties. These OsAP genes might be involved in housekeeping functions during rice growth and development.

There has been no previous report on the function of aspartic proteases in roots. Our data revealed that 26 OsAP genes were highly expressed in roots, of which three (OsAP30, OsAP79 and OsAP81) were expressed exclusively in root; in the extreme case, OsAP30 was up-regulated 273-fold in root compared with seed. This suggests that OsAPs play important roles in root, although their detailed biochemical functions remain to be characterized.

Many studies have identified aspartic proteases in reproductive organs. In rice, for example, OsAsp1 transcripts were detectable in spikelets at 0–5 days after flowering (DAF) and became abundant at 2 DAF (Bi et al., 2005). In our study the expression of OsAsp1 was low or nondetectable in most of the tested tissues.
However, it was up-regulated 54-fold in spikelets at 3 days after pollination in Zhenshan 97. An aspartic protease has also been reported to be involved in pollen-tube germination in maize (Radłowski et al., 1996). Interestingly, the Class Iδ genes showed high expression in stamen, indicating that these genes may function in pollination. It was also reported that a rice aspartic protease named S5 is specifically expressed in young panicles and functions to regulate embryo-sac fertility (Chen et al., 2008). These results clearly indicated that OsAPs have important functions in reproductive development in rice.

Aspartic proteases have been purified from the seeds of many plants, including wheat (Belozersky et al., 1989), Arabidopsis thaliana (Mutlu and Gal, 1999), sorghum (Macedo et al., 1999), buckwheat (Timotijevic et al., 2006), barley (Zhang and Jones, 1999), sunflower (Park et al., 2000) and Theobroma cacao (Guilloteau et al., 2005), where they are thought to mediate protein degradation. Germination of seeds requires the action of a number of proteolytic enzymes that hydrolyze the seed-storage proteins to provide amino acids to the growing plant (Higgins, 1984). It is thus likely that many aspartic proteases are involved in the degradation of seed-storage proteins during germination. This is supported by our data, in which 20 OsAP genes showed high expression during seed germination.

4.3. Duplication provides expression divergence in OsAP gene family

Gene duplications are one of the primary driving forces in genome evolution (Moore and Purugganan, 2003). Whole genome duplication provides raw genetic material for sequence and expression evolution of duplicate genes (Ha et al., 2009). For most gene families, dramatic variations in family size and distribution are related to tandem and/or segmental duplications, giving rise to duplicate genes with new functions (Long et al., 2003; Cannon et al., 2004).
In this study we showed that a large number of OsAP genes were involved in segmental duplications and tandem duplications in the rice genome. Comparison of the expression profiles of the OsAP genes in segmentally and tandemly duplicated regions showed similar, differential or silenced expression relative to other members of the duplicates, signifying conservation, neofunctionalization or pseudogenization after the duplication events (Lynch and Conery, 2000). Gene expression may be altered by multiple mechanisms, such as mutations in the cis-regulatory regions (Smith et al., 2006) or mutations affecting the related regulatory network (Wang et al., 2007; Xing et al., 2007). A large part of expression divergence is considered to have been brought about by duplications in the course of evolution (Cannon et al., 2004). Duplications in which only partial upstream segments are copied may lose cis-elements and thus cause shifts in patterns of gene expression (Haberer et al., 2004). It follows that rates of expression divergence are likely to be affected by such changes.

The OsAP gene family presents an opportunity to study how expression has diverged following gene duplication. As mentioned above, duplicated OsAP genes are destined for conservation, neofunctionalization or pseudogenization in terms of their functions. The evolution of expression patterns can be explained similarly. Similar expression behavior between duplicated genes suggests conservation of the mechanism regulating expression. Divergence in expression patterns, and thus neofunctionalization of the gene, indicates the acquisition of novel regulatory characteristics. Silencing of gene expression after duplication, leading to nonfunctionalization of the gene, implies a large disruption of the regulatory mechanism. Thus, duplication is not only a driving force for the evolution of genome size, but also plays an important role in diversification of gene functions.
Such diversification results not only from mutations in the coding sequences of the duplicated copies, but also from changes in the expression patterns of the duplicated copies in the multigene families.

Acknowledgements

This research was supported in part by grants from the National Special Key Project of China on Functional Genomics of Major Plants and Animals, and the National Natural Science Foundation of China.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.gene.2009.04.021.

References

Asakura, T., Watanabe, H., Abe, K., Arai, S., 1995. Rice aspartic proteinase, oryzasin, expressed during seed ripening and germination, has a gene organization distinct from those of animal and microbial aspartic proteinases. Eur. J. Biochem. 232, 77–83.
Asakura, T., Watanabe, H., Abe, K., Arai, S., 1997. Oryzasin as an aspartic proteinase occurring in rice seeds: purification, characterization, and application to milk clotting. J. Agric. Food Chem. 45, 1070–1075.
Asakura, T., Matsumoto, I., Funaki, J., Arai, S., Abe, K., 2000. The plant aspartic proteinase-specific polypeptide insert is not directly related to the activity of oryzasin 1. Eur. J. Biochem. 267, 5115–5122.
Barrett, A.J., 1992. Cellular proteolysis: an overview. Ann. N.Y. Acad. Sci. 674, 1–15.
Barrett, A.J., Rawlings, N.D., Woessner, J.F., 2004. Handbook of Proteolytic Enzymes, 2nd ed. Elsevier Acad. Press, Amsterdam, pp. 3–4.
Baumann, K., De Paolis, A., Costantino, P., Gualberti, G., 1999. The DNA binding site of the Dof protein NtBBF1 is essential for tissue-specific and auxin-regulated expression of the rolB oncogene in plants. Plant Cell 11, 323–334.
Beers, E.P., Jones, A.M., Dickerman, A.W., 2004. The S8 serine, C1A cysteine and A1 aspartic protease families in Arabidopsis. Phytochemistry 65, 43–58.
Belozersky, M.A., Sarbakanova, S.T., Dunaevsky, Y.E., 1989. Aspartic proteinase from wheat seeds: isolation, properties and action on gliadin. Planta 177, 321–326.
Bi, X., Khush, G.S., Bennett, J., 2005. The rice nucellin gene ortholog OsAsp1 encodes an active aspartic protease without a plant-specific insert and is strongly expressed in early embryo. Plant Cell Physiol. 46, 87–98.
Cannon, S.B., Mitra, A., Baumgarten, A., Young, N.D., May, G., 2004. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4, 10.

Cercos, M., Gomez-Cadenas, A., Ho, T.H., 1999. Hormonal regulation of a cysteine proteinase gene, EPB-1, in barley aleurone layers: cis- and trans-acting elements involved in the co-ordinated gene expression regulated by gibberellins and abscisic acid. Plant J. 19, 107–118.
Chen, F., Foolad, M.R., 1997. Molecular organization of a gene in barley which encodes a protein similar to aspartic protease and its specific expression in nucellar cells during degeneration. Plant Mol. Biol. 35, 821–831.
Chen, J., et al., 2008. A triallelic system of S5 is a major regulator of the reproductive barrier and compatibility of indica–japonica hybrids in rice. Proc. Natl. Acad. Sci. U. S. A. 105, 11436–11441.
Davies, D.R., 1990. The structure and function of the aspartic proteinases. Annu. Rev. Biophys. Biophys. Chem. 19, 189–215.
Faro, C., Gal, S., 2005. Aspartic proteinase content of the Arabidopsis genome. Curr. Protein Pept. Sci. 6, 493–500.
Fukazawa, J., Sakai, T., Ishida, S., Yamaguchi, I., Kamiya, Y., Takahashi, Y., 2000. Repression of shoot growth, a bZIP transcriptional activator, regulates cell elongation by controlling the level of gibberellins. Plant Cell 12, 901–915.
Fusada, N., Masuda, T., Kuroda, H., Shimada, H., Ohta, H., Takamiya, K., 2005. Identification of a novel cis-element exhibiting cytokinin-dependent protein binding in vitro in the 5′-region of NADPH-protochlorophyllide oxidoreductase gene in cucumber. Plant Mol. Biol. 59, 631–645.
Ge, X., Dietrich, C., Matsuno, M., Li, G., Berg, H., Xia, Y., 2005. An Arabidopsis aspartic protease functions as an anti-cell-death component in reproduction and embryogenesis. EMBO Rep. 6, 282–288.
Gubler, F., Kalla, R., Roberts, J.K., Jacobsen, J.V., 1995. Gibberellin-regulated expression of a myb gene in barley aleurone cells: evidence for Myb transactivation of a high-pI alpha-amylase gene promoter. Plant Cell 7, 1879–1891.
Guilloteau, M., Laloi, M., Michaux, S., Bucheli, P., McCarthy, J., 2005. Identification and characterisation of the major aspartic proteinase activity in Theobroma cacao seeds. J. Sci. Food Agric. 85, 549–562.
Haberer, G., Hindemitt, T., Meyers, B.C., Mayer, K.F., 2004. Transcriptional similarities, dissimilarities, and conservation of cis-elements in duplicated genes of Arabidopsis. Plant Physiol. 136, 3009–3022.
Ha, M., Kim, E.D., Chen, Z.J., 2009. Duplicate genes increase expression diversity in closely related species and allopolyploids. Proc. Natl. Acad. Sci. U. S. A. 106, 2295–2300.
Higgins, T.J.V., 1984. Synthesis and regulation of major proteins in seeds. Annu. Rev. Plant Physiol. 35, 191–221.
Hudson, M.E., Quail, P.H., 2003. Identification of promoter motifs involved in the network of phytochrome A-regulated gene expression by combined analysis of genomic sequence and microarray data. Plant Physiol. 133, 1605–1616.
Hughes, A.L., Green, J.A., Piontkivska, H., Roberts, R.M., 2003. Aspartic proteinase phylogeny and the origin of pregnancy-associated glycoproteins. Mol. Biol. Evol. 20, 1940–1945.
Larkin, M.A., et al., 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948.
Long, M., Betran, E., Thornton, K., Wang, W., 2003. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4, 865–875.
Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155.
Lynch, M., Katju, V., 2004. The altered evolutionary trajectories of gene duplicates. Trends Genet. 20, 544–549.
Macedo, I.Q., Marques, P., Delgadillo, I., 1999. Pepstatin-sensitive proteolytic activity of sorghum seeds. Biotechnol. Tech. 13, 817–820.
Mena, M., Cejudo, F.J., Isabel-Lamoneda, I., Carbonero, P., 2002. A role for the DOF transcription factor BPBF in the regulation of gibberellin-responsive genes in barley aleurone. Plant Physiol. 130, 111–119.
Moore, R.C., Purugganan, M.D., 2003. The early stages of duplicate gene evolution. Proc. Natl. Acad. Sci. U. S. A. 100, 15682–15687.
Mutlu, A., Gal, S., 1999. Plant aspartic proteinases: enzymes on the way to a function. Physiol. Plant. 105, 569–576.
Nakamura, M., Tsunoda, T., Obokata, J., 2002. Photosynthesis nuclear genes generally lack TATA-boxes: a tobacco photosystem I gene responds to light through an initiator. Plant J. 29, 1–10.
Nakano, T., Murakami, S., Shoji, T., Yoshida, S., Yamada, Y., Sato, F., 1997. A novel protein with DNA binding activity from tobacco chloroplast nucleoids. Plant Cell 9, 1673–1682.
Ogawa, M., Hanada, A., Yamauchi, Y., Kuwahara, A., Kamiya, Y., Yamaguchi, S., 2003. Gibberellin biosynthesis and response during Arabidopsis seed germination. Plant Cell 15, 1591–1604.
Park, H., Yamanaka, N., Mikkonen, A., Kusakabe, I., Kobayashi, H., 2000. Purification and characterization of aspartic proteinase from sunflower seeds. Biosci. Biotechnol. Biochem. 64, 931–939.
Pimentel, C., Van Der Straeten, D., Pires, E., Faro, C., Rodrigues-Pousada, C., 2007. Characterization and expression analysis of the aspartic protease gene family of Cynara cardunculus L. FEBS J. 274, 2523–2539.
Radłowski, M., Kalinowski, A., Adamczyk, J., Królikowski, Z., Bartkowiak, S., 1996. Proteolytic activity in the maize pollen wall. Physiol. Plant. 98, 172–178.
Rawlings, N.D., Barrett, A.J., 1999. MEROPS: the peptidase database. Nucleic Acids Res. 27, 325–331.
Riechmann, J.L., et al., 2000. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290, 2105–2110.
Simões, I., Faro, C., 2004. Structure and function of plant aspartic proteinases. Eur. J. Biochem. 271, 2067–2075.
Smith, A.D., Sumazin, P., Xuan, Z., Zhang, M.Q., 2006. DNA motifs in human and mouse proximal promoters predict tissue-specific expression. Proc. Natl. Acad. Sci. U. S. A. 103, 6275–6280.
Takahashi, K., et al., 2005. Nepenthesin, a unique member of a novel subfamily of aspartic proteinases: enzymatic and structural characteristics. Curr. Protein Pept. Sci. 6, 513–525.
Takahashi, K., Niwa, H., Yokota, N., Kubota, K., Inoue, H., 2008. Widespread tissue expression of Nepenthesin-like aspartic protease genes in Arabidopsis thaliana. Plant Physiol. Biochem. 46, 724–729.
Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599.
Terzaghi, W.B., Cashmore, A.R., 1995. Light-regulated transcription. Annu. Rev. Plant Physiol. Plant Mol. Biol. 46, 445–474.
Timotijevic, G.S., Radovic, S.R., Konstantinovic, M.M., 2006. Aspartic proteinases from buckwheat (Fagopyrum esculentum Moench) seeds: purification and properties of the 47 kDa enzyme. Arch. Biol. Sci., Belgrade 58, 171–177.
Ulmasov, T., Hagen, G., Guilfoyle, T.J., 1999. Dimerization and DNA binding of auxin response factors. Plant J. 19, 309–319.
Wang, D., et al., 2007. Expression evolution in yeast genes of single-input modules is mainly due to changes in trans-acting factors. Genome Res. 17, 1161–1169.
Xia, Y., et al., 2004. An extracellular aspartic protease functions in Arabidopsis disease resistance signaling. EMBO J. 23, 980–988.
Xing, Y., Ouyang, Z., Kapur, K., Scott, M.P., Wong, W.H., 2007. Assessing the conservation of mammalian gene expression using high-density exon arrays. Mol. Biol. Evol. 24, 1283–1285.
Xu, N., Hagen, G., Guilfoyle, T., 1997. Multiple auxin response modules in the soybean SAUR 15A promoter. Plant Sci. 126, 193–201.
Zhang, N., Jones, B.L., 1999. Polymorphism of aspartic proteinases in resting and germinating barley seeds. Cereal Chem. 76, 134–138.
Zhang, Z.L., Xie, Z., Zou, X., Casaretto, J., Ho, T.H., Shen, Q.J., 2004. A rice WRKY gene encodes a transcriptional repressor of the gibberellin signaling pathway in aleurone cells. Plant Physiol. 134, 1500–1513.