Molecular characterization of gene sequences coding for protein disulfide isomerase (PDI) in durum wheat (Triticum turgidum ssp. durum)


Gene 265 (2001) 147–156

www.elsevier.com/locate/gene

M. Ciaffi, A.R. Paolacci, L. Dominici, O.A. Tanzarella, E. Porceddu*

Department of Agrobiology and Agrochemistry, University of Tuscia, 01100 Viterbo, Italy

Received 7 December 2000; received in revised form 15 December 2000; accepted 9 January 2001
Received by M. D'Urso

Abstract

The organisation of the durum wheat genomic sequence (3.5 kb) coding for protein disulfide isomerase (PDI), deduced by comparison between genomic fragments and cDNA sequences (1.5 kb) isolated from immature caryopses, is described. The gene structure consists of ten exons and nine introns. Consensus sequences involved in splicing, such as intron-exon junctions and branchpoints, were observed and are discussed. Although the deduced wheat PDI amino acid sequence exhibited an overall identity of only 31% to that of human PDI, their modular architecture in terms of number, size, location and secondary structure propensities of the constituent domains is remarkably similar. Comparison of the amino acid sequence with the eight available plant PDI-like sequences showed high identity with four of them and low identity with the remaining ones. Analyses of transcription levels showed that the PDI mRNA was present in all analysed tissues, with much higher expression in immature caryopses. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Gene structure; Gene expression; pre-mRNA splicing; Protein domain; Binding protein (BiP)

Abbreviations: bp, base pair; BiP, binding protein; CS, Triticum aestivum cv Chinese Spring; ER, endoplasmic reticulum; LNG, Triticum turgidum ssp. durum cv Langdon; mRNA, messenger RNA; PDI, protein disulfide isomerase; rRNA, ribosomal RNA; RT-PCR, reverse transcriptase polymerase chain reaction; SP, signal peptide

The nucleotide sequence data reported in this paper will appear in the DDBJ/EMBL/GenBank Nucleotide Sequence Databases with the following Accession Nos: PDIGDW-F1-R1, AJ277377; PDIGDW-F1-R3, AJ277378; PDICDW-F1-R1, AJ277379; PDICDW-F1-R3, AJ277380.

* Corresponding author. Tel.: +39-076-135-7231; fax: +39-076-1357256. E-mail address: [email protected] (E. Porceddu).

0378-1119/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0378-1119(01)00348-1

1. Introduction

Protein disulfide isomerase (PDI) (EC 5.3.4.1) is an abundant protein in the lumen of the endoplasmic reticulum (ER) that acts as a catalyst for disulfide bond formation during the biosynthesis of various secretory and cell surface proteins (Freedman et al., 1994). The structure and function of PDI have been extensively studied in vertebrates, including mammalian species. The vertebrate enzyme is a dimer consisting of two identical subunits of about 57 kDa; intron-exon boundaries, sequence homologies and some proteolysis studies suggest that PDI is a modular protein comprising five domains: a, b, b′, a′ and c (Edman et al., 1985; Freedman et al., 1994). The a and a′ domains are homologous to thioredoxin, a well characterized small protein that participates in many cytoplasmic redox reactions (Edman et al., 1985; Freedman et al., 1994). In addition, PDIs have functions such as peptide binding, cell adhesion and perhaps chaperone activities (for review see Ferrari and Soling, 1999).

The characteristics of plant PDI, the regulation of its expression during plant development and its physiological role are less well known. Studies on its expression and intracellular localisation in wheat and maize (Shimoni et al., 1995a; Li and Larkins, 1996) indicated that the enzyme may play an important role in the folding of plant secretory proteins, and particularly in the formation of endosperm protein bodies. PDI or PDI-like cDNA sequences have been cloned and sequenced from species such as alfalfa (Shorrosh and Dixon, 1991; 1992), barley (Chen and Hayes, 1994), common wheat (Shimoni et al., 1995b), maize (Li and Larkins, 1996), castor bean (Coughlan et al., 1996) and tobacco (EMBL ac. Y11209), whereas genomic DNA sequences encoding proteins belonging to the PDI family have so far been reported only in Arabidopsis (EMBL accession numbers AC003033 and AC002535). PDI gene sequences are located in all three genomes of the hexaploid wheat T. aestivum, one each in chromosome arms 4AL, 4DS, 4BS and 1BS (Ciaffi et al., 1999).

Genomic fragments spanning the whole open reading frame of PDI gene sequences from durum wheat, Triticum turgidum ssp. durum (Desf.) cv Langdon (LNG), have been cloned and sequenced as part of a study on the structure and expression of the gene. The exon-intron organization of the entire genomic sequence, deduced by comparison between the genomic fragments and the corresponding cDNA sequences isolated from immature caryopses, is reported in this paper. Special attention was devoted to possible similarities to, and differences from, vertebrate PDI coding sequences. Transcript analyses made it possible to describe the variation in gene expression in different tissues and at different stages of caryopsis development.

2. Materials and methods

2.1. Plant material

Seedlings and young roots of durum wheat cv LNG were collected 20 days after germination. Leaves were harvested from mature plants 30 and 20 days before anthesis, and 5 and 21 days after anthesis, and florets just prior to anthesis. Developing caryopses were collected between 5 and 37 days after anthesis. All plant material was immediately frozen in liquid nitrogen and stored at −80 °C until use. Chromosome assignment was performed using nulli-tetrasomic lines of the common wheat cultivar Chinese Spring (CS) (Sears, 1966).

2.2. DNA and RNA isolation

DNA was isolated from 5 g of leaves from single plants as reported in Ciaffi et al. (1999). Total RNA was extracted from developing caryopses 10 days after anthesis as described by Colot et al. (1989), and poly(A)+ mRNA was purified using a magnetic mRNA isolation procedure (PolyATtract system IV, Promega). The first-strand cDNA was synthesized from 50 ng of poly(A)+ mRNA using the Expand Reverse Transcriptase (RT) (Boehringer Mannheim, Germany).

2.3. Northern blot analysis

Ten micrograms of total RNA from different tissues were separated on a 1.4% agarose gel containing 2.2 M formaldehyde and transferred to a nylon membrane (S&S Nytran N, Schleicher & Schuell).
Prehybridization (4 h) and hybridization (16–20 h) reactions were carried out at 65 °C using standard procedures (Sambrook et al., 1989). Immunological detection was performed using Anti-Digoxigenin-AP, Fab Fragments (Boehringer Mannheim). Digoxigenin-11-dUTP was incorporated into DNA by PCR as described in Ciaffi et al. (1999). Two different probes were used for Northern hybridization: an RT-PCR product containing the entire PDI coding region from durum wheat (PDICDW-F1-R1 clone) and a partial cDNA clone coding for the glucose regulated protein 78 (GRP78), also known as the lumenal binding protein (BiP), isolated from bread wheat (Grimwade et al., 1996).

2.4. PCR analysis

The Expand Long Template PCR System (Boehringer Mannheim) was used as directed in the package insert with the supplied buffer 3. PCR reactions were performed in 50 µl using 0.75 µl of the enzyme mix containing Taq and Pwo DNA polymerases, 1× reaction buffer, 500 µM each of dATP, dTTP, dCTP and dGTP, 0.25 µM of each primer and 300 ng of genomic DNA or 5 µl of the RT reaction. DNA was subjected to an initial denaturation step at 94 °C for 2 min; the amplification conditions were 30 cycles of 94 °C for 30 s, 63 °C for 1 min and 68 °C for 5 min, followed by a final extension step at 68 °C for 7 min. Primers for amplifying different regions of the PDI gene were synthesised on the basis of the common wheat cv CS cDNA sequence (Shimoni et al., 1995b):

(F1) 5′ CATGGCGATCTCCAAGGTCTGG 3′
(F2) 5′ CATGCCAACCATCTCCCAGC 3′
(F3) 5′ TATTCCTGAGGCCAACAATGAGCC 3′
(R1) 5′ CAACCAACAACCAACTCTCGTCCC 3′
(R2) 5′ CAAGAGGTAAGGATGGTTGTCAGGG 3′
(R3) 5′ AATAGGCTCGGACTTCCTGAATGG 3′

Amplified products were separated on a 1.5% agarose gel; λ DNA digested with HindIII and EcoRI was used as a molecular weight marker. Southern blots of PCR products were hybridized with a digoxigenin-labelled PDI cDNA of bread wheat cv CS (Shimoni et al., 1995b) as described in Ciaffi et al. (1999).

2.5. Cloning, nucleotide sequencing and computer analysis

Genomic and cDNA amplified products were cloned into the modified EcoRV site of the pGEM-T plasmid vector (Promega) and sequenced using an ABI Prism Dye Terminator sequencing kit (Perkin Elmer) and either vector-specific or sequence-specific primers. The PC/GENE computer package (IntelliGenetics, Inc., USA) and the GCG sequence analysis software (Genetics Computer Group Incorporated, WI) were used to analyze the sequence data. Intron splice sites were predicted with the software developed by Hebsgaard et al. (1996) (www.cbs.dtu.dk/services/NetGene2). The deduced amino acid sequences of the wheat PDI clones were searched against all SWISS-PROT (SwissProt release 29.0) database entries using the program blastp and the Blosum62 score table. Alignments were produced with the BLAST 2 Sequences software (Tatusova and Madden, 1999). Secondary structure analysis was performed using the secondary structure consensus prediction procedure (Deleage et al., 1997) (http://pbil.ibc.fr/NPSA). CLUSTALW 1.6 (Thompson et al., 1994) was used to align PDICDW-F1-R1 with all the known plant PDI-like protein sequences, and a gene tree was constructed with Phylip 3.5c (Felsenstein, 1993) on the basis of the distances between the sequences estimated using PROTDIST (Categories distance corrections).


The tree was built by the neighbor-joining method (Saitou and Nei, 1987) and rooted on the most divergent PDI-like sequence (arabidopsis1, AC003033), as revealed by the guide tree (CLUSTALW) computed from the distance matrix giving the divergence of each pair of plant PDI-like sequences.

3. Results

3.1. Isolation of genomic and cDNA sequences of wheat PDI genes by PCR amplification

Genomic sequences from durum wheat cv LNG were amplified using six combinations of primer pairs, which amplified the entire PDI gene and several partially overlapping regions spanning the whole gene sequence (Fig. 1a). PCR amplification products comprised the whole coding region plus a short segment of the untranslated 3′ flanking sequence (Fig. 1a). Analyses of the PCR outputs showed single and specific amplification products for all the combinations of primer pairs. Southern-blot analysis with the PDI cDNA probe from T. aestivum cv CS confirmed that all the amplification products derived from the PDI gene sequence (data not shown). All genomic amplification fragments were much larger than expected on the basis of the T. aestivum cDNA sequence, indicating the presence of intron sequences. The comparison of the amplification products of genomic DNA (about 3.5 kb) and cDNA (about 1.6 kb) obtained with the primers flanking the entire coding region (F1-R1) indicated the presence of about 1.9 kb of intronic sequences (Fig. 1a,b).

Nulli-tetrasomic analysis carried out using the six primer combinations showed that three of them (primer pairs F1-R1, F2-R1 and F3-R1) amplified two distinct sequences, assigned to chromosomes 4A and 4D respectively, whereas the products of the remaining three primer pairs (F1-R3, F1-R2 and F2-R3) could not be located, probably because of co-migrating PDI gene sequences located in different chromosomes (data not shown). The relative intensity of the F1-R3, F1-R2 and F2-R3 PCR products in CS, LNG and the CS nulli-tetrasomic lines indicated that the amplified sequences derived from the B genome (either chromosome 4B or 1B) and/or that these combinations of primer pairs preferentially amplified the PDI gene sequence located in chromosome 4B (data not shown).

3.2. Structure of the wheat PDI gene

The nucleotide sequences of the six genomic clones, designated PDIGDW-F1-R1, PDIGDW-F1-R3, PDIGDW-F1-R2, PDIGDW-F2-R1, PDIGDW-F2-R3 and PDIGDW-F3-R1, were 3476, 2722, 1517, 2329, 1555 and 763 bp long, respectively (Fig. 1b), and were classified into two groups by sequence analysis: the overlapping regions of the three amplified sequences located in chromosome 4A (PDIGDW-F1-R1, PDIGDW-F2-R1 and PDIGDW-F3-R1 clones) were identical, whereas those of the remaining clones did not match the first group completely.
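As a rough cross-check of the intron content quoted above, the following minimal sketch subtracts the cDNA length from the genomic length for the F1-R1 primer pair. The clone lengths are the values reported in the text (the 1661 bp cDNA length is given in Section 3.3); the function name and dictionary layout are illustrative only, and the calculation assumes that both clones span exactly the same primer-to-primer region.

```python
# Clone lengths (bp) as reported in the text; names of the Python objects are illustrative.
GENOMIC_CLONES = {
    "PDIGDW-F1-R1": 3476,
    "PDIGDW-F1-R3": 2722,
    "PDIGDW-F1-R2": 1517,
    "PDIGDW-F2-R1": 2329,
    "PDIGDW-F2-R3": 1555,
    "PDIGDW-F3-R1": 763,
}
CDNA_F1_R1 = 1661  # bp, PDICDW-F1-R1 cDNA clone (Section 3.3)


def intron_content(genomic_bp: int, cdna_bp: int) -> int:
    """Total intron length implied by the genomic/cDNA size difference.

    Assumes both fragments were amplified with the same primer pair,
    so every extra genomic base pair is intronic.
    """
    if cdna_bp > genomic_bp:
        raise ValueError("cDNA cannot be longer than its genomic counterpart")
    return genomic_bp - cdna_bp


total_introns = intron_content(GENOMIC_CLONES["PDIGDW-F1-R1"], CDNA_F1_R1)
print(f"Intron content of the F1-R1 fragment: {total_introns} bp")
```

With the exact clone lengths the difference is 1815 bp, which agrees with the figure of about 1.9 kb obtained from the rounded fragment sizes.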

Fig. 1. PCR amplification of different genomic regions of the PDI genes from T. turgidum ssp. durum cv LNG and structure of the gene. (a) Diagram of the PDI cDNA clone from T. aestivum cv CS (Shimoni et al., 1995b); arrows indicate the position of the primers used for PCR analyses. (b) Schematic representation of the six fragments amplified from genomic DNA of LNG; size (bp) and chromosome assignment are shown. (c) Intron-exon structures of the PDIGDW-F1-R1 and PDIGDW-F1-R3 clones. The open boxes indicate exons and the solid black boxes denote introns; numbers represent exon and intron size (bp). The positions of the putative N-terminal signal peptide (SP), of the two thioredoxin-like active sites (CGHC) and of the C-terminal KDEL signal sequence for ER retention are also indicated.
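The primers positioned in Fig. 1a are listed in Section 2.4. The sketch below shows one simple way to tabulate their length, GC content and reverse complement, for example when checking where a reverse primer anneals on the sense strand. The primer sequences are copied from the text; the helper functions are illustrative and are not part of the original analysis.

```python
# Primer sequences (5' to 3') as listed in Section 2.4.
PRIMERS = {
    "F1": "CATGGCGATCTCCAAGGTCTGG",
    "F2": "CATGCCAACCATCTCCCAGC",
    "F3": "TATTCCTGAGGCCAACAATGAGCC",
    "R1": "CAACCAACAACCAACTCTCGTCCC",
    "R2": "CAAGAGGTAAGGATGGTTGTCAGGG",
    "R3": "AATAGGCTCGGACTTCCTGAATGG",
}

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in PRIMERS.items():
    print(
        f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}, "
        f"reverse complement = {reverse_complement(seq)}"
    )
```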


The two largest clones, PDIGDW-F1-R1 and PDIGDW-F1-R3, representative of the two groups, were chosen to determine the intron-exon structure of the gene. Alignment of the nucleotide sequence of the genomic clone PDIGDW-F1-R1 with the corresponding cDNA revealed a complete nucleotide match in their overlapping regions and indicated that the isolated PDI gene was composed of ten exons (Fig. 1c). The intron size ranged from 85 bp (9th intron) to 684 bp (5th, central intron). Exon 1 started with a 75 bp nucleotide sequence, including the ATG initiation codon, which coded for a putative N-terminal signal peptide of 25 amino acids (von Heijne, 1986). Exon 10 consisted of 113 bp from the 3′ untranslated region and 225 bp from the coding region; the latter contained a nucleotide sequence encoding the tetrapeptide KDEL, a consensus signal for protein retention in the lumen of the ER (Denecke et al., 1992), located just before the stop triplet (TGA). The codons for the two catalytic sites (CGHC amino acid sequences) for protein disulfide isomerase activity were located 1 bp after the beginning of exon 2 and 26 bp after the beginning of exon 9. Moreover, the triplets for a potential N-glycosylation site (NFS) were present 15 bp after the beginning of exon 6.
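The sequence features described above (the CGHC active sites, the NFS glycosylation sequon and the C-terminal KDEL signal) can be located in a translated sequence with simple pattern searches. The sketch below is illustrative only: the toy peptide is hypothetical and is not the wheat PDI sequence, and the sequon rule used (N-X-S/T with X other than proline) is the general consensus for N-glycosylation rather than a criterion stated in the paper.

```python
import re


def find_motif(protein: str, motif: str) -> list[int]:
    """Return the 1-based start positions of every occurrence of `motif`."""
    # A lookahead is used so that overlapping occurrences are also reported.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() + 1 for m in pattern.finditer(protein)]


def glycosylation_sequons(protein: str) -> list[int]:
    """Return 1-based positions of N-X-S/T sequons (X is any residue except P)."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein)]


def has_er_retention_signal(protein: str, signal: str = "KDEL") -> bool:
    """True if the protein ends with the ER retention tetrapeptide."""
    return protein.endswith(signal)


# Hypothetical toy peptide, used only to exercise the functions.
TOY_PROTEIN = "MAAACGHCKKNFSAAACGHCAAKDEL"

active_sites = find_motif(TOY_PROTEIN, "CGHC")
print("CGHC sites at:", active_sites)
print("Residues between site starts:", active_sites[1] - active_sites[0])
print("N-glycosylation sequons at:", glycosylation_sequons(TOY_PROTEIN))
print("KDEL at C-terminus:", has_er_retention_signal(TOY_PROTEIN))
```

Applied to a real deduced sequence, the same calls would give the spacing between the two thioredoxin-like sites that is used later in the text to distinguish plant PDI subfamilies.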

The nucleotide sequence of the genomic clone PDIGDW-F1-R3 was identical to the corresponding cDNA sequence, whereas it exhibited 92.7% identity along the 2721 nucleotides overlapping with PDIGDW-F1-R1. As expected, the comparison of nucleotide sequences in the two genomic clones indicated lower identity between introns (from 89.2%, intron 3, to 91.5%, intron 6) than between exons (from 93.5%, exon 2, to 100%, exon 7). Moreover, the sequence divergence in the exon regions was due only to nucleotide substitutions, whereas that in the intron regions comprised substitutions, insertions and/or deletions, as also shown by the different sizes of five out of the seven introns (Fig. 1c).

Sequences around the exon/intron junctions of the PDIGDW-F1-R1 clone and a list of putative branchpoint sequences were compared with the plant consensus sequences of 5′ and 3′ splice sites and putative branchpoints, determined by combining data from Simpson and Filipowicz (1996), Simpson et al. (1996) and Brown et al. (1996) (Table 1). Sequences flanking the putative 5′ and 3′ splice sites of the PDIGDW-F1-R1 clone fitted the consensus sequences for splice donor and acceptor sites of plant genes quite well (Table 1). Dinucleotide sequences found at the 5′ and 3′ ends of each intron followed the universal GT-AG rule, but individual introns exhibited some variation around the highly conserved GT-AG dinucleotides, as generally observed for other plant and vertebrate introns (Simpson and Filipowicz, 1996). T was the most abundant nucleotide (50–70%) in seven out of the nine introns of the clone PDIGDW-F1-R1 (Table 1) between nucleotides −5 and −14 in the 3′ splice site region, where the polypyrimidine tract of vertebrate introns usually occurs. The region directly upstream of the splice acceptor site of introns 5 and 9 was relatively rich in A and C (50%), respectively, and contained only 30% T. This was unexpected, since the region is often thymine-rich and seems to be important in branchpoint definition and 3′ splice site selection, as in vertebrate and yeast introns (Baynton et al., 1996). However, a closer inspection of the sequence flanking this region indicated the presence of a poly-thymine tract directly upstream (intron 5) or downstream (intron 9) of the branchpoint consensus sequences. These two thymine-rich sequences, located between positions −30 and −22 (TTCTGTTCT) of intron 5 and between positions −22 and −14 (TGTTTTCTC) of intron 9 from the 3′ splice site, might be involved in the recognition of the 3′ border of these two introns. Only recently have plant branchpoint sequences been shown to be important splicing signals, and consensus sequences (YTNAN) similar to those of vertebrates and yeast (CTAAC) have been found between 20 and 60 nucleotides upstream of the 3′ splice site of most plant introns (Simpson et al., 1996). All PDIGDW-F1-R1 introns matched the degenerate plant branchpoint consensus sequence in the region 16 (intron 5) to 36 (intron 2) nucleotides upstream of the 3′ splice site (Table 1). Eight out of the nine introns contained the most frequent plant branchpoint consensus sequence CTNAN, whereas only intron 7 possessed the alternative consensus sequence TTNAN. It is interesting to note that the intron sequences reported in Table 1, and in general the nucleotide sequences around the 3′ and 5′ splice sites, were highly conserved between the two isolated clones PDIGDW-F1-R1 and PDIGDW-F1-R3, suggesting that these sequences might be involved in some important function, such as the splicing process.

Table 1. Sequences of exon/intron junctions and putative intron branchpoints in the PDIGDW-F1-R1 clone.
a Residues conserved with the plant consensus sequences are shown in bold.
b Number of nucleotides between adjacent sequences.
c Occurrence (%) of the most conserved nucleotides in the consensus sequences is given in subscript; for the 3′ splice site in positions −6 to −14 the percentage occurrence of T ranged from 45 to 53%.
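The three checks discussed above (the GT-AG rule, the T content of the −5 to −14 window and the search for a YTNAN branchpoint consensus) can be sketched as follows. This is a minimal illustration and not the procedure used in the paper, which relied on NetGene2 and on comparison with published consensus sequences. The toy intron is hypothetical, position −1 is assumed to be the last nucleotide of the intron, and the default distance limits are illustrative values.

```python
import re

# Degenerate plant branchpoint consensus YTNAN (Y = C or T, N = any base).
# The lookahead lets overlapping candidates be reported.
BRANCHPOINT = re.compile(r"(?=([CT]T[ACGT]A[ACGT]))")


def follows_gt_ag(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG."""
    return intron.startswith("GT") and intron.endswith("AG")


def t_fraction(intron: str, start: int = -14, end: int = -5) -> float:
    """Fraction of T in the window from `start` to `end` (inclusive).

    Positions are counted from the 3' splice site; -1 is assumed to be
    the last nucleotide of the intron.
    """
    n = len(intron)
    window = intron[n + start : n + end + 1]
    return window.count("T") / len(window)


def branchpoint_candidates(intron: str, min_dist: int = 10, max_dist: int = 60):
    """Return (motif, distance) pairs for YTNAN matches.

    `distance` is the number of nucleotides from the conserved A
    (fourth base of the motif) to the 3' end of the intron, inclusive.
    """
    hits = []
    for match in BRANCHPOINT.finditer(intron):
        a_index = match.start() + 3
        distance = len(intron) - a_index
        if min_dist <= distance <= max_dist:
            hits.append((match.group(1), distance))
    return hits


# Hypothetical toy intron: donor site, spacer, branchpoint, T-rich tract, acceptor.
TOY_INTRON = "GTAAGT" + "ACGCAGCG" + "CTGAT" + "GCATTTCTTTTCTTTTGC" + "AG"

print("GT-AG rule:", follows_gt_ag(TOY_INTRON))
print("T fraction (-14 to -5):", t_fraction(TOY_INTRON))
print("Branchpoint candidates:", branchpoint_candidates(TOY_INTRON))
```

For introns such as 5 and 9, where the window next to the acceptor site is poor in T, shifting `start` and `end` further upstream gives a quick way to look for the displaced thymine-rich tracts described in the text.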


3.3. Comparison of the deduced amino acid sequence of wheat PDI with other plant PDIs

The nucleotide sequences of cDNA clones PDICDW-F1-R1 and PDICDW-F1-R3 were 1661 and 1128 bp long, respectively. The 1661-bp insert of PDICDW-F1-R1 consisted of 116 bp from the 3′ untranslated region and a 1545-bp open reading frame coding for a polypeptide of 515 amino acids rich in acidic residues (16%), with an estimated molecular weight of 56.6 kDa and a pI of 4.7. Its nucleotide sequence was almost identical to that of the PDI clone isolated from a root-tip cDNA library of T. aestivum (EMBL ac. U11496, Shimoni et al., 1995b), differing only in six nucleotides, which determine three amino acid changes (data not shown). Its overlapping nucleotide sequence also differed from that of PDICDW-F1-R3 (97.2% identity) over the 1128 compared nucleotides of the coding region. Only five out of the 32 single nucleotide substitutions detected between the coding regions of the two durum wheat PDI clones altered the amino acid sequence, and only two of these amino acid changes (positions 294 and 325) could modify the biochemical characteristics of the protein.

The deduced amino acid sequence of PDICDW-F1-R1 exhibited between 30 and 40% overall identity with PDIs from a wide range of evolutionarily distant organisms, including yeast, Humicola insolens, Drosophila, mouse, rabbit, cattle and human. Comparison of its deduced amino acid sequence with those from other plants led to the identification of three potentially different gene subfamilies (Fig. 2). It had strong identity with four of the eight plant PDI-like sequences reported to date, ranging from 96.5% (barley, EMBL ac. L33251) to 59.8% (castor bean, U41385), whereas it showed only 33.2, 32.9 and 31.3% identity to the sequences isolated from Arabidopsis (arabidopsis2, AC002535), tobacco (Y11209) and alfalfa (alfalfa2, P38661), respectively. These last three sequences showed more than 75% identity to each other and differed from the former plant PDI subfamily in the shorter size of the encoded proteins (about 360 residues), the absence of potential N-glycosylation sites and the distance between the pair of thioredoxin-like active sites (about 110 amino acids, as opposed to approximately 335 amino acids in the other plant PDIs). Furthermore, although these three proteins contain a potential ER-translocation signal, they lack an ER-retention signal, suggesting either that they are targeted to a different subcellular location or that they are retained as part of a heteromeric complex by interaction with subunits which do contain such a signal. A second sequence isolated from Arabidopsis (arabidopsis1, AC003033) has an average of 22 and 26% identity with the amino acid sequences of the first and second subfamily, respectively, suggesting that it is a member of a third distinct group of plant PDI-like sequences. The protein is 440 amino acid residues long and the two thioredoxin-like active sites are connected by a polypeptide of 124 amino acid residues. The protein does not contain any potential N-glycosylation site, but the C-terminal KDEL signal for ER retention is present.

Fig. 2. Phenogram of the plant PDI-like proteins reported to date. The sequence of arabidopsis1 (AC003033) was used as root for the phenogram.

3.4. Secondary structure prediction analysis of the protein translated from PDICDW-F1-R1

Analyses of the deduced amino acid sequence of PDICDW-F1-R1 indicated that the primary structure of the protein contained two regions of about 110 residues with internal homology. On this basis, the mature protein was divided into four regions designated a, a′, b and c, as for the human PDI. Region a comprised amino acids 14–127 and was homologous (43% identity and 61% similarity) to region a′ (amino acids 357–466) (Fig. 3a). Near the beginning of the two regions there was a highly conserved stretch of 17 amino acids (residues 35–51 and 379–395), with only two amino acid differences between the counterparts (Val 36, Ile 380 and Ser 48, Lys 91), and two Cys residues separated by a Gly and a His, representing two thioredoxin-like active sites. Regions a and a′ of wheat PDI showed strong similarities with two segments of the mature human PDI (Fig. 3b,c), which correspond almost exactly to the two domains a and a′, whose folded structure limits were recently determined experimentally (Darby et al., 1996, 1999). More specifically, the a region of PDICDW-F1-R1 shared 69% similarity (55% identity) with a segment spanning the first N-terminal half of the mature human PDI (amino acids 7–119), whereas the a′ region exhibited 62% similarity (52% identity) with a human PDI fragment near the C-terminus (residues 357–458). Secondary structure analysis of PDICDW-F1-R1 revealed that the two regions a and a′ possessed an α/β fold consisting of a central core of a five-stranded β-sheet surrounded by three α-helices (Fig. 3).
This structure was very similar to that determined for the a domain of the human PDI (Kemmink et al., 1995, 1996) and also fitted well with the predicted structure of the a′ domain, for which no experimental data are available (Fig. 3c).

No significant internal homologies were detected in region b of PDICDW-F1-R1, although it showed weak but significant similarity (about 45 and 22% identity) with regions corresponding to domains b and b′ of PDI-like proteins from humans, bovines, chickens and Drosophila. To identify the limits of the putative b and b′ domains in PDICDW-F1-R1, region b was aligned with the middle section of the human PDI containing the homologous b and b′ modules (Darby et al., 1996, 1999). Despite considerable divergence in primary structure between the putative b (amino acids 128–226 in Fig. 3d) and b′ (amino acids 227–356) domains of PDICDW-F1-R1, their predicted secondary structures were remarkably similar (Fig. 3d) and also fitted well with those predicted for the a and a′ domains. Furthermore, the analysis of the primary structure of the putative b and b′ domains of PDICDW-F1-R1, carried out using the procedures 'gonnet + predds' and 'gonnet + predds + Mult2' (Fischer and Eisenberg, 1996), identified the thioredoxin-like fold as the most likely structure adopted by these domains, with the thioredoxin (PDB Id SRX) and human PDI (PDB Id 1MEK) structures showing the highest Z-scores (detailed results not shown). These results were consistent with NMR spectroscopic data on human PDI reported by Kemmink et al. (1997), where the arrangement of the secondary structure elements (β-α-β-α-β-α-β-β-α) of the b domain was identical to that found in the a domain of PDI and in thioredoxin.

Except for the KDEL motif, the c region of PDICDW-F1-R1 (the last 23 amino acids of the C-terminal region) showed no significant sequence similarity either to the C-terminal segment of vertebrate PDI designated c (28 amino acids) or to other reported sequences. However, the predicted secondary structure of the PDICDW-F1-R1 c region was completely α-helical, like that determined for the highly acidic C-terminal segment of vertebrate PDI (data not shown).

3.5. Analysis of PDI mRNA transcribed in different tissues and during caryopsis development

To study the expression of the PDI genes, the level of their transcription during plant development was compared with that of another ER-resident protein, the Glucose Regulated Protein 78 (GRP78), also known as the lumenal binding protein (BiP) (Pedrazzini and Vitale, 1996).
Total RNAs from different plant tissues were probed, by Northern analysis, with PDI and BiP cDNA sequences (Fig. 4a). For PDI, hybridisation to an RNA species of about 1700 bases was detected; this is consistent with the size of the PDICDW-F1-R1 cDNA sequence. The PDI mRNA was constitutively present in all the tissues tested, with very strong expression in immature caryopses (lane 5 in Fig. 4a). The level of PDI transcripts was quite similar in all the remaining tissues examined, although it seemed slightly higher in root tips (lane 1 in Fig. 4a) and in pre-anthesis florets (lane 4) than in coleoptiles and in leaves (lanes 2 and 3 in Fig. 4a, respectively). The transcription levels in adult leaves were approximately similar at all the analysed developmental stages (leaves collected 30 days before anthesis, and 5 and 21 days after anthesis; results not shown). Conversely, mRNA for BiP, the other analysed ER-resident protein, was not preferentially expressed in developing caryopses, and its level was slightly higher in pre-anthesis florets and in shoots of young plants than in other tissues.

Transcription of PDI was very high in the early to mid stages of grain filling (Fig. 4b). Conversely, the transcription level of BiP was relatively lower and approximately constant during all the stages of grain filling. The PDI mRNA content was already relatively high 5 days after anthesis; it increased notably between 9 and 17 days after anthesis, reached the highest detectable level of expression 13 days after anthesis, and then declined.

Fig. 3. Sequence alignment of the two homologous regions a and a′ within PDICDW-F1-R1 (a) and their comparison with the corresponding a and a′ domains of the human PDI (b and c, respectively) (Darby et al., 1996, 1999). (d) Sequence alignment of the PDICDW-F1-R1 b region with the middle section of the human PDI containing the b and b′ domains, as determined experimentally by Darby et al. (1996, 1999). Identical residues are shaded in black, similar residues are in bold type and underlined, and non-homologous residues are in normal type. Gaps are indicated by dashed lines. The elements of secondary structure are specified by symbols: open bars indicate residues present in α-helices, whereas those delimited by arrowheads indicate residues present in β-strands.

Fig. 4. Northern blot analyses of PDI and BiP mRNAs in wheat tissues (a) and in developing caryopses (b) collected between 5 and 37 days after anthesis (DPA). (a) 1 = root tips of wheat seedlings; 2 = wheat coleoptiles; 3 = leaves of mature plants 20 days before anthesis; 4 = pre-anthesis florets; 5 = caryopses 5 days after anthesis. Equal amounts of total RNA (10 µg) were loaded in each track, as shown in the gel stained with ethidium bromide.

4. Discussion

Two different wheat genomic sequences that share several structural characteristics with the mammalian PDI genes have been isolated and sequenced. Alignment of the nucleotide sequences of the genomic and cDNA clones indicated that the isolated PDI genes were composed of 10 exons and extended over about 3.5 kb. To our knowledge, this is the first detailed report of the complete genomic sequence structure of a plant PDI gene. The PDICDW-F1-R1 sequence represents the wheat homolog of PDI on the basis of the following features: (a) the size of the translated protein (515 amino acids), with an estimated molecular weight of 56.6 kDa and a predicted pI of 4.7; (b) the

presence in the encoded polypeptide of a putative signal sequence at the N-terminus and of the ER retention signal KDEL at the C-terminus; (c) the presence of two thioredoxin-like catalitic sites (CGHC) separated by 340 amino acids, which is typical for PDIs (Freedman et al., 1994). Furthermore, although the deduced amino acid sequence of PDICDW-F1-R1 exhibited an overall identity of only 31% to that of the human PDI, their modular architectures in terms of number, size, location and secondary structurepropensities of the constituent domains are remarkably similar. Sequence homologies, both internally and to the human PDI, indicated that the protein coded by PDICDWF1-R1 is composed of four major regions that correspond almost exactly to the a, b, b 0 and a 0 domains of the human PDI. Secondary structure analysis revealed that the a and a 0 domains of PDICDW-F1-R1, which are homologous both to each other (43% identity) and to thioredoxin, adopted a thioredoxin-like folding. However, it was also observed that both the putative b and b 0 domains of PDICDW-F1R1 had folding patterns very similar to those of the a and a 0 modules, although the extension of sequence identity between the b and b 0 regions was not suf®cient to prove that they are internal repeats; moreover, sequence homology was not detected between them and any thioredoxin or thioredoxin-like domain. Although experimental studies would be necessary to identify the domain boundaries and to determine their structure unambiguously, the proposed multidomain structure of PDICDW-F1-R1 suggests that in plants, like in mammalians, the four PDI domains may derive from partial gene duplication or shuf¯ing from a common thioredoxin ancestral gene, followed by sequence divergence. 
The four domains probably arose before the appearance of most eukaryotic species, because homologous PDI sequences are present in eukaryotes as diverse as fungi, insects, mammals and plants (Freedman et al., 1994; Sahrawy et al., 1996; Ferrari et al., 1998). All eukaryotic PDIs have recognisable a and a′ modules, whereas the putative b and b′ modules have diverged to such an extent within and between species that their homology is doubtful; however, the corresponding segments always contain approximately the same number of amino acids and retain some elements of the thioredoxin fold.

Both PDI and BiP are components of the machinery that assists in the folding, assembly and sorting of secretory proteins via the ER (Denecke, 1996). Expression analysis of gene sequences coding for these two proteins during plant development in durum wheat showed that the relative levels of PDI and BiP transcripts varied in different tissues. The mRNA for PDI was strongly expressed in developing caryopses, whereas that for BiP was more abundant in pre-anthesis florets and in coleoptiles. These results are consistent with those of Coughlan et al. (1996), who showed that PDI mRNA was constitutively present in different castor bean tissues. However, its expression in developing seed endosperm was stronger than that of other ER-resident proteins involved in the secretory pathway, such as BiP,
calnexin and calreticulin. Moreover, PDI and BiP showed different expression patterns during wheat grain development. These differences suggest that the transcription of these genes is not subject to co-ordinated regulation and that PDI gene expression might be under a regulatory mechanism which differs from those of other ER-resident proteins.

Previous results suggested that there could be up to four and three enzymatic forms in T. aestivum and T. durum, respectively (Ciaffi et al., 1999). At present it is not known whether all these different gene sequences are functional and differ in the coding regions and/or in the flanking sequences involved in their regulation. Sequence differences among wheat PDI genes might affect their functions and the regulation of their expression at the level of timing, tissue specificity and transcription rates. The comparison of nucleotide and deduced amino acid sequences of the PDICDW-F1-R1 and PDICDW-F1-R3 cDNA clones and of their corresponding genomic sequences, which likely correspond to two out of the three PDI gene sequences identified in durum wheat, suggests that the slight differences detected do not justify a functional differentiation between the two putative gene products. Future studies should focus on the isolation and functional analysis of the promoter regions from different wheat PDI genes in order to elucidate the regulatory mechanism controlling their spatially and temporally specific expression.

Acknowledgements

Research supported by the Italian Ministry of Agriculture, Project: Plant Biotechnology, Research Area 5: Resistance to abiotic stresses (M.D. 48/94).

References

Baynton, C.E., Potthoff, S.J., McCullough, A.J., Schuler, M.A., 1996. U-rich tracts enhance 3′-splice-site recognition in plant nuclei. Plant J. 10, 703–711.
Brown, J.W.S., Smith, P., Simpson, C.G., 1996. Arabidopsis consensus intron sequences. Plant Mol. Biol. 32, 531–535.
Chen, F., Hayes, P.M., 1994. Nucleotide sequence and developmental expression of duplicated genes encoding protein disulfide isomerase in barley (Hordeum vulgare L.). Plant Physiol. 106, 1705–1706.
Ciaffi, M., Dominici, L., Tanzarella, O.A., Porceddu, E., 1999. Chromosomal assignment of gene sequences coding for protein disulphide isomerase (PDI) in wheat. Theor. Appl. Genet. 98, 405–410.
Colot, V., Bartels, D., Thompson, R., Flavell, R., 1989. Molecular characterization of an active LMW glutenin gene and its relation to wheat and barley prolamin genes. Mol. Gen. Genet. 216, 81–90.
Coughlan, S.J., Hastings, C., Winfrey, R.J., 1996. Molecular characterization of plant endoplasmic reticulum: identification of protein disulfide-isomerase as the major reticuloplasmin. Eur. J. Biochem. 235, 215–224.
Darby, N.J., Kemmink, J., Creighton, T.E., 1996. Identifying and characterizing a structural domain of protein disulfide isomerase. Biochemistry 35, 10517–10528.
Darby, N.J., van Straaten, M., Penka, E., Vincentelli, R., Kemmink, J., 1999. Identifying and characterizing a second structural domain of protein disulfide isomerase. FEBS Lett. 448, 167–172.

Deleage, G., Blanchet, C., Geourjon, C., 1997. Protein structure prediction. Implications for the biologist. Biochimie 79, 681–686.
Denecke, J., 1996. Soluble endoplasmic reticulum resident proteins and their function in protein synthesis and transport. Plant Physiol. Biochem. 34, 197–205.
Denecke, J., De Rycke, R., Botterman, J., 1992. Plant and mammalian sorting signals for protein retention in the endoplasmic reticulum contain a conserved epitope. EMBO J. 11, 2345–2355.
Edman, J.C., Ellis, L., Blacher, R.W., Roth, R.A., Rutter, W.J., 1985. Sequence of protein disulphide isomerase and implications of its relationship to thioredoxin. Nature 317, 267–270.
Felsenstein, J., 1993. PHYLIP (phylogeny inference package) version 3.5c. Department of Genetics, University of Washington, Seattle, WA.
Ferrari, D.M., Nguyen Van, P., Kratzin, H.D., Soling, H.D., 1998. ERp28, a human endoplasmic-reticulum-lumenal protein, is a member of the protein disulfide isomerase family but lacks a CXXC thioredoxin-box motif. Eur. J. Biochem. 255, 570–579.
Ferrari, D.M., Soling, H.D., 1999. The protein disulphide-isomerase family: unravelling a string of folds. Biochem. J. 339, 1–10.
Fischer, D., Eisenberg, D., 1996. Fold recognition using sequence-derived predictions. Protein Sci. 5, 947–955.
Freedman, R.B., Hirst, T.R., Tuite, M.F., 1994. Protein disulphide isomerase: building bridges in protein folding. Trends Biochem. Sci. 19, 331–336.
Grimwade, B., Tatham, A.S., Freedman, R.B., Shewry, P.R., Napier, J.A., 1996. Comparison of the expression patterns of genes coding for wheat gluten proteins and proteins involved in the secretory pathway in developing caryopses of wheat. Plant Mol. Biol. 30, 1067–1073.
Hebsgaard, S.M., Korning, P.G., Tolstrup, N., Engelbrecht, J., Rouze, P., Brunak, S., 1996. Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res. 24, 3439–3452.
Kemmink, J., Darby, N.J., Dijkstra, K., Scheek, R.M., Creighton, T.E., 1995. Nuclear-magnetic-resonance characterization of the N-terminal thioredoxin-like domain of protein disulfide-isomerase. Protein Sci. 4, 2587–2593.
Kemmink, J., Darby, N.J., Dijkstra, K., Nilges, M., Creighton, T.E., 1996. Structure determination of the N-terminal thioredoxin-like domain of protein disulfide isomerase using multidimensional heteronuclear 13C/15N NMR spectroscopy. Biochemistry 35, 7648–7691.
Kemmink, J., Darby, N.J., Dijkstra, K., Nilges, M., Creighton, T.E., 1997. The folding catalyst protein disulfide isomerase is constructed of active and inactive thioredoxin modules. Curr. Biol. 7, 239–245.
Li, C.P., Larkins, B.A., 1996. Expression of protein disulfide isomerase is elevated in the endosperm of the maize floury-2 mutant. Plant Mol. Biol. 30, 873–882.
Pedrazzini, E., Vitale, A., 1996. The binding protein (BiP) and the synthesis of secretory proteins. Plant Physiol. Biochem. 34, 207–216.
Sahrawy, M., Hecht, V., Lopez-Jaramillo, J., Chueca, A., Chartier, Y., Meyer, Y., 1996. Intron position as an evolutionary marker of thioredoxin and thioredoxin domains. J. Mol. Evol. 42, 422–431.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sears, E.R., 1966. Nullisomic-tetrasomic combination in hexaploid wheat. In: Riley, R., Lewis, K.R. (Eds.), Chromosome Manipulation and Plant Genetics. Oliver and Boyd, Edinburgh, pp. 29–45.
Shimoni, Y., Zhu, X., Levanony, H., Segal, G., Galili, G., 1995a. Purification, characterization, and intracellular localization of glycosylated protein disulfide isomerase from wheat grains. Plant Physiol. 108, 327–335.
Shimoni, Y., Segal, G., Zhu, X., Galili, G., 1995b. Nucleotide sequence of a wheat cDNA encoding protein disulfide isomerase. Plant Physiol. 107, 281.
Shorrosh, B.S., Dixon, R.A., 1991. Molecular cloning of a putative plant
endomembrane protein resembling vertebrate protein disulfide-isomerase and a phosphatidylinositol-specific phospholipase C. Proc. Natl. Acad. Sci. USA 88, 10941–10945.
Shorrosh, B.S., Dixon, R.A., 1992. Molecular characterization and expression of an alfalfa protein with sequence similarity to mammalian ERp72, a glucose-regulated endoplasmic reticulum protein containing active site sequences of protein disulphide isomerase. Plant J. 2, 51–58.
Simpson, G.G., Filipowicz, W., 1996. Splicing of precursors to mRNA in higher plants: mechanism, regulation and sub-nuclear organisation of the spliceosomal machinery. Plant Mol. Biol. 32, 1–41.
Simpson, C.G., Clark, G., Davidson, D., Smith, P., Brown, J.W.S., 1996. Mutation of putative branchpoint consensus sequences in plant introns reduces splicing efficiency. Plant J. 9, 369–380.
Tatusova, T.A., Madden, T.L., 1999. Blast 2 sequences - a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174, 247–250.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
von Heijne, G., 1986. A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. 14, 4683–4690.