Molecular and Biochemical Parasitology, 56 (1992) 353-356
© 1992 Elsevier Science Publishers B.V. All rights reserved. / 0166-6851/92/$05.00 MOLBIO 01882
Short Communication

Nucleotide sequence of a Plasmodium falciparum stress protein with similarity to mammalian 78-kDa glucose-regulated protein

Nirbhay Kumar* and Hong Zheng

Department of Immunology and Infectious Diseases, School of Hygiene and Public Health, The Johns Hopkins University, Baltimore, USA

Key words: Plasmodium falciparum; Heat shock protein; Stress proteins
Genes for two members of the heat shock protein 70 family have been cloned from Plasmodium falciparum. These encode proteins of 75 kDa (Pfhsp) and 72 kDa (Pfgrp), which share sequence similarity with a eukaryotic heat shock protein of 70 kDa and a glucose-regulated protein of 78 kDa, respectively [1-4]. While barely detectable in the salivary gland sporozoites, these proteins are expressed at elevated levels in parasites undergoing development in liver cells (exoerythrocytic stages) (Kumar et al., submitted for publication). Temperature shift studies in blood-stage parasites of P. falciparum and P. berghei have also shown that the hsp70-like protein is heat-inducible [5]. These proteins are localized in the nuclear and cytoplasmic compartments of the parasite [5]. In contrast, known stimulators of grp78 expression in mammalian cells (glucose deprivation, tunicamycin, 2-deoxyglucose) [6] and temperature shift had no apparent effect on the expression of Pfgrp in P. falciparum [5]. The mechanism of regulation of Pfgrp expression remains unclear.

Correspondence address: Nirbhay Kumar, DIID-SHPH-JHU, 615 N. Wolfe St., Baltimore, MD 21205, USA.

Note: Nucleotide sequence data reported in this paper have been submitted to the GenBank™ data base with the accession number L02822.
Abbreviations: HSP, heat shock protein; GRP, glucose-regulated protein; PCR, polymerase chain reaction.
A tetrapeptide sequence at the carboxy terminus of grp78, KDEL in the mammalian proteins and HDEL in the Saccharomyces cerevisiae protein, plays a critical role in the retention of these proteins in the lumen of the endoplasmic reticulum [7]. The grp78-like protein in P. falciparum contains SDEL at the carboxy terminus and is also localized in the endoplasmic reticulum-like membranous compartment, as revealed by immunoelectron microscopy (ref. 5, and Kumar et al., submitted). Thus the sequence SDEL appears to serve the same function as KDEL in mammalian cells and HDEL in yeasts. Knowledge of the complete sequence, including the promoter region, would facilitate understanding of the functions and the regulation of the Pfgrp gene in P. falciparum.

Here we describe the complete nucleotide sequence of the P. falciparum (3D7 clone of NF54 isolate) gene encoding Pfgrp. The sequences of a genomic clone previously designated T-114 [1], a cDNA clone, and a genomic clone obtained by inverse PCR and finally by RNA-PCR [8] were compiled to obtain the complete sequence of Pfgrp (Fig. 1). The genomic clone in Fig. 1A (1417-2268 bp; nucleotide positions are based on the final complete sequence shown in Fig. 2) was described earlier [1]. A cDNA clone (Fig. 1B), isolated from a P. falciparum cDNA library in lambda NM1149, contained the sequence from 689 to the 3' end of the gene.
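The coordinates just quoted imply that the 5' part of the gene was not yet represented in either clone. A minimal Python sketch of that bookkeeping follows. The helper name `uncovered` is hypothetical, and the end coordinate used for the cDNA clone (2268) is an assumption made only for illustration, because the text states merely that the clone extends to the 3' end of the gene.

```python
def uncovered(start, end, intervals):
    """Return the parts of [start, end] (1-based, inclusive) not covered by any interval."""
    gaps = []
    cursor = start
    for lo, hi in sorted(intervals):
        if lo > cursor:
            # Positions from cursor up to lo - 1 are not covered by any clone.
            gaps.append((cursor, min(lo - 1, end)))
        cursor = max(cursor, hi + 1)
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


GENOMIC_CLONE_A = (1417, 2268)  # Fig. 1A, as stated in the text
CDNA_CLONE_B = (689, 2268)      # Fig. 1B; the end coordinate is an assumption

missing = uncovered(1, 2268, [GENOMIC_CLONE_A, CDNA_CLONE_B])
print(missing)  # region that still had to be obtained from the 5' side
```

Under these assumptions the only gap is positions 1-688, which is the region that the inverse PCR step was designed to recover.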
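[Fig. 1 graphic (schematic of clones A-E and gel) not reproduced. Recoverable labels: b marks BglII sites; panel F shows the intron ends, 5'-GTAAGTAT ... ATTTTTTAG-3'; panel G is a gel with lanes 1-4.]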
Fig. 1. Cloning and sequencing strategy. (A) represents a genomic fragment (1417-2268; numbers based on the complete sequence as in Fig. 2) [1]. (B) represents a cDNA clone (from 689 to the 3' end of the gene). The clone was isolated from a P. falciparum cDNA library using oligonucleotides based on the genomic fragment (A, above) as probes. The insert was cloned into pUC19 for sequencing by the dideoxy chain termination procedure using a 'Sequenase' kit [10]. (C) represents a BglII genomic fragment (restriction site at 2152, identified by b) hybridizing to oligonucleotide 20 (CAGAGTATGAAAGCAACTG, 1904-2012). BglII-digested genomic DNA was diluted to 1 and 2 µg ml⁻¹ and circularized by self-ligation. Oligonucleotides identified as 1 (AGGTGTCTTAATTCAAGT, 1563-1580) and 16 (GCACTAATTTGTTCAGGAGC, complement of 704-723) were used to amplify the DNA by PCR. The 'inverted PCR' product was blunt-end cloned into the SmaI site of pUC19 for sequence analysis. (D) shows the genomic fragment representing the complete sequence from the translation initiation site to the termination codon and an intron (dark box). (E) represents the coding sequence of the gene after defining the boundaries of the intron by sequencing the mRNA according to the RNA-PCR protocol [8]. Briefly, P. falciparum RNA was isolated [12] and treated with RNase-free DNase I. After reverse transcription using Moloney murine leukemia virus reverse transcriptase and anti-sense oligonucleotide 31 (GGGTTATAATTCGTCACTATCTAC), based on the known 3'-end sequence of the gene, the first-strand cDNA was amplified by PCR using oligonucleotides 250 (ATGAACCAAATTAGGCC, 1-17), corresponding to the 5'-end of the coding sequence, and 16. (F) shows the sequences at the ends of the intron. (G) shows a gel of the PCR products obtained using genomic DNA (lane 3) and the first-strand cDNA product (lane 4) from E (above) as template DNA and oligonucleotides 250 and 16 as primers. Lanes 1 and 2 show HindIII-digested lambda DNA and 123-bp ladder DNA size markers, respectively. The ethidium bromide-stained gel was photographed as a negative image using a 'Stratagene' Eagle-Eye.
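The oligonucleotides listed in the legend can be checked against their stated coordinates with a few lines of Python. The sketch below is illustrative only: the primer sequences and coordinates are those given in the legend, the sense-strand fragment is transcribed from the printed sequence in Fig. 2 around positions 704-723, and the helper name `reverse_complement` is our own. Oligonucleotide 20 is left out because its printed coordinates do not match its length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Primer sequences and coordinates as given in the legend to Fig. 1.
PRIMERS = {
    "250": ("ATGAACCAAATTAGGCC", 1, 17),
    "1": ("AGGTGTCTTAATTCAAGT", 1563, 1580),
    "16": ("GCACTAATTTGTTCAGGAGC", 704, 723),
}

# Sense-strand stretch around positions 704-723, transcribed from Fig. 2.
SENSE_FRAGMENT = "TTTGCTCCTGAACAAATTAGTGCTATGG"

# Each primer length should equal the length of its coordinate range.
length_ok = {
    name: len(seq) == last - first + 1
    for name, (seq, first, last) in PRIMERS.items()
}

# Oligonucleotide 16 is described as the complement of 704-723, so its
# reverse complement should be found on the sense strand.
target_16 = reverse_complement(PRIMERS["16"][0])
print(length_ok, target_16, target_16 in SENSE_FRAGMENT)
```

The reverse complement of oligonucleotide 16 is found in the sense-strand fragment, as expected for an anti-sense primer.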
'Inverted' PCR [9] was then used to obtain the sequence at the 5'-end of the gene. Hybridization of BglII-digested P. falciparum genomic DNA with oligonucleotide 20 identified an approximately 2.5-kb fragment (Fig. 1C). The BglII-digested P. falciparum DNA was circularized by self-ligation and used as template DNA for PCR. The PCR product, containing a major band of the expected size (~1.5 kb), was treated with T4 polymerase and blunt-end cloned into the SmaI site of pUC19 for sequencing (Fig. 1D) by the dideoxy chain termination method [10]. The sequence was independently confirmed using the linear amplification DNA sequencing method [11].
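The expected size of an inverse PCR product follows from simple geometry: the two primers face outwards on the circularized fragment, so everything on the circle is amplified except the stretch lying between them. The Python sketch below is a rough consistency check, not the authors' calculation. The function name is hypothetical, and the circle length of 2500 bp is taken from the approximate 2.5-kb gel estimate, so the result can only be approximate.

```python
def inverse_pcr_product_length(circle_len, fwd_primer_start, rev_primer_end):
    """Estimate the product length for outward-facing primers on a circle.

    fwd_primer_start: first position of the sense primer (extends downstream).
    rev_primer_end: last position of the stretch to which the anti-sense
        primer is complementary (the primer extends upstream).
    The bases strictly between the two primers are not amplified.
    """
    skipped = fwd_primer_start - rev_primer_end - 1
    if skipped < 0 or skipped >= circle_len:
        raise ValueError("primer positions are inconsistent with the circle")
    return circle_len - skipped


# Oligonucleotide 1 covers 1563-1580; oligonucleotide 16 is the complement
# of 704-723 (coordinates from the legend to Fig. 1).
# 2500 bp is an assumption based on the approximately 2.5-kb BglII fragment.
estimate = inverse_pcr_product_length(2500, 1563, 723)
print(estimate)
```

With these inputs the estimate is close to 1.7 kb, in the same range as the ~1.5 kb band reported, bearing in mind that the size of the BglII fragment is itself a gel estimate.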
Figure 2 shows the complete nucleotide sequence of the gene for Pfgrp, including an intron-like non-coding region near the 5' end revealed by analysis of the sequence [13]. The ATG codon identified as nucleotide 1, which is followed by a sequence encoding a long stretch of hydrophobic amino acids typical of a leader sequence, is assigned as the translation initiation codon. The sequence upstream of this ATG (approximately 200 bp, not shown) lacked any open reading frame. To define the boundaries of the intron exactly, RNA from P. falciparum was reverse transcribed and amplified using oligonucleotides 250 and 16 as primers, cloned into pUC19 and sequenced
[Fig. 2 row start positions: -22, 1, 121, 241, 361, 481, 601, 721, 841, 961, 1081, 1201, 1321, 1441, 1561, 1681, 1801, 1921, 2041, 2161]
TT ATCATATATA TAATTCAAAA ATGAACCAAA TTAGGCCATATATTTTACTATTAATTGTTTCCTTATTAAAATTTATAAGT GCCGTTGACTCAAACAGTAA GTATTATATAATTTTAAAGAAAGTATTTATGTTTTTTTAA ~T~AATAAAAAAAAAAAAAA ATATATATATATATATATATGTAATTAAAAAAATCTCTGA TGAAGCATAT TATATTATGTGTAAAAATATATTACACTTATGTATTTATTACACATATAT ATTTATATATTTATAATACACTTATGTATGTATTACACATATGTATGTATTTGAAGTATA TGTAATTATA TATATGTAATATGTGTATATATAAAAAATT TGAATCTTTTATTTTAATTT TTTAGTTGAGGGACCCGTTATTGGTATTGACTTGGGTACCACTTATAGTTGCGTTGGTGTTTTTAAAAAT GGAAGAGTTGAAATATTGAA TAATGAATTAGGTAATCGTATTACCCCATC ATATGTTTCCTTTGTAGATGGAGAAAGGAAAGTTGGTGAGGCAGCTAAATTAGAAGCTAC TGTACATCCTACTCAAACAG TTTTTGATGTAAAGAGATTAATAGGAAGAAAATTTGATGA CCAAGAAGTTGTTAAAGATCGTTCTTTATTACCATATGAAATTGTAAATAATCAAGGCAA ACCAAATATT AAGGTACAAA TAAAGGATAAAGATACTACA TTTGCTCCTGAACAAATTAG TGCTATGGTTTTAGAAAAAA TGAAAGAAATAGCTCAATCATTTTTAGGTAAACCAGTAAA AAATGCAGTT GTTACTGTCCCTGCTTATTTTAATGATGCTCAAAGACAAG CAACAAAAGA TGCTGGTACTATAGCTGGATTGAACATT~TTCGTATTATTATCAATCAAC CAACTGCTGC TGCTTTAGCA TATGCTTTAGATAAGAAAGA AGAGACCAGT ATTTTAGTATACGATTTAGG TGGTGGTACTTTTGATGTTTCTATTCTTGTTATTGACAATGGTGTTTTTGAAGTATATGC TACTGCTGGTAATACTCATTTAGGAGGTGAAGATTTTGATCAAAGAGTTATGGACTATTT TATAAAAATGTTCAAGAAAAAAAACAATAT CGATTTAAGAACTGACAAAA GAGCTATTCA GAAATTAAGA AAAGAAGTTGAAATAGCAAA AAGAAACTTA TCTGTTGTTCACTCAACACA AATCGAAATTGAAGATATAGTTGAAGGACATAATTTTTCTGAAACCTTAACAAGAGCCAA ATTTGAAGAA TTAAATGATGATTTATTTAGAGAAACCTTA GAGCCAGTAAAAAAAGTTTT GGATGATGCTAAATATGAAA AAAGTAAAAT TGATGAAATTGTTTTAGTAGGAGGTTCAACACGTATTCCA AAAATTCAAC AAATTATCAA AGAATTCGAA TTCTTTAATGGTAAAGAACC AAATAGAGGTATAAATCCTGATGAAGCTGTTGCTTATGGTGCTGCTATCCAAGCAGGTAT TATTTTAGGTGAAGAATTACAAGACGTTGTTTTATTAGATGTTACTCCATTAACTTTAGG TATAGAAACTGTGGGTGGTATTATGACACAATTAATTAAAAGAAATACTGTCATCCCAAC CAAAAAATCA CAAACCTTTTCAACATATCAAGATAACCAA CCAGCTGTCTTAATTCAAGT TTTTGAAGGAGAAAGAGCAT TAACCAAAGA TAATCACCTTTTAGGAAAGTTTGAATTATCTGGTATTCCACCAGCACAAA GAGGAGTACCCAAAATTGAAGTTACCTTTACCGTAGACAA AAATGGTATCTTACATGTTGAAGCTGAAGA CAAAGGTACAGGTAAAAGTAGAGGTATAAC TATTACTAAT GACAAAGGTAGATTATCGAAAGAACAAATC
GAAAAAATGA TTAATGATGC AGAAAAATTCGCAGATGAAGATAAAAACTT AAGAGAAAAA GTTGAAGCCAAAAATAAACT TGATAATTAT ATACAGAGTATGAAAGCAACTGTTGAAGATAAAGATAAAT TAGCTGATAA AATCGAAAAAGAAGATAAAA ATACTATCCTTTCAGCTGTTAAAGATGCTGAAGATTGGTT AAATAATAAC TCGAATGCTGATTCTGAAGCATTAAAACAA AAATTAAAAG ATCTTGAAGC TGTATGCCAACCAATCATTGTTAAATTATATGGTCAACCAGGAGGACCTTCACCACAACC TAGTGGAGAC GAAGATGTAGATAGTGACGAATTATAAT CTTCACAG
Fig. 2. Nucleotide sequence of the Pfgrp gene numbered relative to the putative translation initiation site. The termination site TAA is identified at 2255-2257 by the shaded area. The underlined sequence (77-365) denotes the intron.
(Fig. 1E). The intron sequence (77-365) of 289 bp contains GTAAGTA at the 5' splice site and TTTAG at the 3' splice site (Fig. 1F). The PCR results in Fig. 1G further confirm the presence of the intron. Bands of the expected sizes (723 bp from genomic DNA and 434 bp from cDNA) were obtained on amplification by PCR using oligonucleotides 250 and 16. The exon-intron boundary follows the GT-AG splicing rule [14] and matches exactly the consensus sequence [15]. While the Pfgrp gene contains a single intron, it is interesting to note that the gene for the other member of the hsp70 family in P. falciparum (Pfhsp) does not contain an intron [4]. The coding sequence (1965 bp) can thus code for a protein of 655 amino acids with a predicted molecular mass of 72 693. The coding and intron sequences contain 68.9% and 87.6% A/T respectively, typical of P. falciparum genes [15].

The deduced amino acid sequence of the entire open reading frame is shown in Fig. 3. Comparison of the sequence of Pfgrp with that of human grp78 revealed 64% (72% including conservative changes) similarity. The protein is very hydrophilic, with only a few hydrophobic regions (data not shown). Included in the regions of similarity are eight domains (shaded sequences in Fig. 3) that are 80% conserved among heat-inducible proteins of human, Xenopus, Drosophila, yeast, and E. coli [16]. The signal sequence at the amino terminus and the canonical (S)DEL sequence at the carboxy terminus presumably facilitate translocation and retention of these proteins in the endoplasmic reticulum.

[Fig. 3 alignment graphic not reproduced. Legible portions of the Pfgrp row (upper) include the amino terminus, MNQIRPYILLLIVSLLKFISAVDSNIE, and the carboxy terminus, ...GQPGGPSPQPSGDEDVDS-DEL.]

Fig. 3. Comparison of the amino acid sequence encoded by the Pfgrp open reading frame (1965 bp; upper row) with that of human grp78 (lower row). Sequences were aligned using the PRTALN program included in the NIH-Molecular biology programs. Computer-generated insertions are indicated by dashes. Amino acid residues in the human grp78 that are identical to those of Pfgrp have been replaced by dots. Shaded areas represent the eight domains conserved in heat-inducible proteins of several species [13]. An arrow in the human grp78 sequence identifies the cleavage site for the signal peptide present at the amino terminus.
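The lengths reported for the intron, the two PCR products and the coding region are mutually consistent, which can be verified from the coordinates alone. The short Python sketch below does this arithmetic; the coordinates are those given in the text and in the legends to Figs. 1 and 2, while the variable names and the helper `span` are ours.

```python
def span(first, last):
    """Length of an inclusive, 1-based coordinate range."""
    return last - first + 1


INTRON = (77, 365)          # legend to Fig. 2
STOP_CODON = (2255, 2257)   # TAA, legend to Fig. 2
PRIMER_250_START = 1        # oligonucleotide 250 covers 1-17
PRIMER_16_END = 723         # oligonucleotide 16 is the complement of 704-723

intron_len = span(*INTRON)

# The genomic product runs from the first base of primer 250 to the last
# base covered by primer 16; the cDNA product lacks the intron.
genomic_amplicon = span(PRIMER_250_START, PRIMER_16_END)
cdna_amplicon = genomic_amplicon - intron_len

# Coding length: everything from the ATG up to, but not including, the stop
# codon, minus the intron.
coding_len = (STOP_CODON[0] - 1) - intron_len
residues, remainder = divmod(coding_len, 3)

print(intron_len, genomic_amplicon, cdna_amplicon, coding_len, residues)
```

The values obtained (289, 723, 434, 1965 and 655) agree with those stated in the text, and the coding length is an exact multiple of three.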
Acknowledgements

The authors thank Pichart Uparanukraw for helpful discussions. These studies were supported by research grants from the NIH (AI24704 and AI31589) and the John D. and Catherine T. MacArthur Foundation.
References

1 Kumar, N., Syin, C., Carter, R., Quakyi, I. and Miller, L.H. (1988) Plasmodium falciparum gene encoding a protein similar in sequence to the 78 kDa rat glucose-regulated stress protein. Proc. Natl. Acad. Sci. USA 85, 6277-6281.
2 Ardeshir, F., Flint, J.E., Richman, S. and Reese, R.T. (1987) A 75 kDa merozoite surface protein of Plasmodium falciparum which is related to the 70 kDa heat-shock protein. EMBO J. 6, 493-499.
3 Bianco, A.E., Favaloro, J.M., Burkot, T.R., Culvenor, J.G., Crewther, P.E., Brown, G.V., Anders, R., Coppel, R.L. and Kemp, D.J. (1986) A repetitive antigen of Plasmodium falciparum that is homologous to heat shock protein 70 of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 83, 8713-8717.
4 Yang, Y-F., Tan-ariya, P., Sharma, Y.D. and Kilejian, A. (1987) The primary structure of a Plasmodium falciparum polypeptide related to heat shock proteins. Mol. Biochem. Parasitol. 26, 61-68.
5 Kumar, N., Koski, G., Harada, M., Aikawa, M. and Zheng, H. (1991) Induction and localization of Plasmodium falciparum stress proteins related to the heat shock protein 70 family. Mol. Biochem. Parasitol. 48, 47-58.
6 Lee, A.S. (1987) Coordinated regulation of a set of genes by glucose and calcium ionophores in mammalian cells. Trends Biochem. Sci. 12, 21-23.
7 Pelham, H.R.B. (1989) Control of protein exit from the endoplasmic reticulum. Annu. Rev. Cell Biol. 5, 1-23.
8 Kawasaki, E.S. and Robinson, I.B. (1989) PCR technology: Principles and Application of DNA Amplification (Ehrlich, H.A. ed.), pp. 89-97, Stockton, New York.
9 Triglia, T., Peterson, M.G. and Kemp, D.J. (1988) A procedure for in vitro amplification of DNA segments that lie outside the boundaries of known sequences. Nucleic Acids Res. 16, 8186.
10 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
11 Murray, V. (1989) Improved double stranded DNA sequencing using the linear polymerase chain reaction. Nucleic Acids Res. 17, 8889.
12 Chomczynski, P. and Sacchi, N. (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162, 156-159.
13 Mount, D. (1985) Computer analysis of sequence, structure and function of biological macromolecules. Biotechniques 3, 102-112.
14 Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. and Chambon, P. (1978) Ovalbumin gene: Evidence for a leader sequence in mRNA and DNA sequence at the exon-intron boundaries. Proc. Natl. Acad. Sci. USA 75, 4853-4857.
15 Weber, J.L. (1988) Molecular biology of malaria parasite. Exp. Parasitol. 66, 143-170.
16 Ting, J. and Lee, A.S. (1988) Human gene encoding the 78 000-dalton glucose-regulated protein and its pseudogene: Structure, Conservation, and Regulation. DNA 7, 275-286.