Molecular and Biochemical Parasitology, 24 (1987) 289-294
Elsevier MBP 00834
Circumsporozoite gene of a Plasmodium falciparum strain from Thailand
Hernando A. del Portillo, Ruth S. Nussenzweig and Vincenzo Enea
Department of Medical and Molecular Parasitology, New York University Medical Center, New York, NY, U.S.A. (Received 22 December 1986; accepted 15 April 1987)
The nucleotide and deduced amino acid sequences of the CS gene of a Plasmodium falciparum strain from Thailand (T4) are presented. Comparison with the nucleotide sequences of two other P. falciparum CS genes, 7G8 from Brazil and Wellcome from West Africa, shows that: (1) the coding regions outside the repeats of T4 and 7G8 are co-extensive and lack 30 nucleotides present in the Wellcome strain 5' to the repeats; (2) in this region, T4 also differs at 3 nucleotide positions from the 7G8 and the Wellcome strains; (3) in the region 3' to the repeats, T4 differs at two positions from 7G8 and at two other positions from the Wellcome strain; remarkably, all of these differences result in amino acid substitutions; (4) the structure of the tandem repeats in the CS gene of T4 is, 5' to 3', [NANP-NVDP] × 3, [NANP] × 38, which differs from that of the two other strains. Owing to the use of synonymous codons, the repetition of the sequence is more precise at the amino acid level than at the nucleotide level. These features contrast with those observed in the CS genes of other plasmodial species.
Key words: P. falciparum; Circumsporozoite gene; DNA sequence
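The repeat structure reported for T4 can be cross-checked against the repeat count (44) and the repeat coordinates (bp 449-976) given in the Results. The following Python sketch expands the notation [NANP-NVDP] × 3, [NANP] × 38 into an amino acid string and compares the implied lengths; the function and variable names are illustrative assumptions and are not part of the original work.

```python
def expand_repeats(blocks):
    """Expand a list of (unit, copies) pairs into one amino acid string."""
    return "".join(unit * copies for unit, copies in blocks)

# Repeat structure of T4 as stated in the abstract, 5' to 3'.
T4_BLOCKS = [("NANPNVDP", 3), ("NANP", 38)]

repeat_protein = expand_repeats(T4_BLOCKS)

# Each repeat is a tetrapeptide, and each amino acid is encoded by 3 nucleotides.
n_tetrapeptides = len(repeat_protein) // 4
n_nucleotides = len(repeat_protein) * 3

# Length of the repeat region from the coordinates reported in the text
# (bp 449-976, inclusive).
span_from_coordinates = 976 - 449 + 1

print(n_tetrapeptides, n_nucleotides, span_from_coordinates)
```

Under these assumptions the three figures agree: 44 tetrapeptide repeats correspond to 528 nucleotides, which is the length of the interval bp 449-976.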
Introduction
The circumsporozoite (CS) gene of Plasmodium falciparum encodes a protein containing a central portion composed of tandemly repeated sequences, two charged regions (I and II), found to be highly conserved among the CS genes of other plasmodial species, and putative anchoring and leader sequences [1-5]. Significant polymorphism of this gene has been detected among different geographical isolates [6-8]. However, all appear to bear similar central repetitive sequences [2,6,8]. In contrast, the CS repeats of other plasmodial species are frequently strain specific [5,9,10]. To gain an insight into why the relationship between polymorphism and antigenic diversity seems to be different in the CS gene of P. falciparum, it will be necessary to collect and collate the nucleotide sequences of several polymorphs. Here we present the complete nucleotide sequence of the CS gene of a Thailand isolate, T4, of P. falciparum and compare it with the CS genes of two other isolates.
Correspondence address: Dr. Hernando A. del Portillo, Department of Medical and Molecular Parasitology, New York University Medical Center, 341 East 25th Street, New York, NY 10010, U.S.A.
Abbreviations: CS, circumsporozoite; bp, base pair; SDS, sodium dodecyl sulfate; SSC, sodium saline citrate.
Materials and Methods
Parasite DNA. The parasites used in this study were obtained from Thailand, and a strain, T4, was maintained in continuous in vitro culture as described [11]. Sal I digests of bloodstream parasite DNA, obtained as described [12], were size-fractionated on a 10-20% sucrose density gradient, and the fraction hybridizing to radiolabelled p277-19 (a cDNA clone encoding the CS repeats [13]) was ligated to preannealed Sal I arms of lambda EMBL4 and packaged using Gigapack (VCS). The library was developed in Escherichia coli LE 392 and screened with p277-19 as described [4]. All DNA protocols followed standard methodologies [14]. Radiolabelled nucleotides were obtained from Amersham and enzymes from New England Biolabs and Bethesda Research Laboratories.
0166-6851/87/$03.50 © 1987 Elsevier Science Publishers B.V. (Biomedical Division)
Oligodeoxynucleotides. Two 40-base-long oligodeoxynucleotides, corresponding exactly to the first 40 bp and the last 40 bp of the coding region of the CS gene of a Brazilian isolate of P. falciparum [2], were a gift from Chiron Corp., Emeryville, CA. The 40-mers were end-labelled (specific activity 10⁷ counts min⁻¹ µg⁻¹ DNA) and used in standard hybridizations of Southern blots (6 × sodium saline citrate (SSC), 50°C, 2 × 10⁵ counts min⁻¹ ml⁻¹) [14], followed by 3 washes in 0.2 × SSC, 0.1% sodium dodecyl sulfate (SDS) at 37°C for 2 h.
DNA sequencing. Five chemical cleavage reactions (G, A+G, C+T, C and A>C) were used to determine the sequence of radiolabelled fragments in pUC 19 [15]. The sequence of both strands was determined for the entire coding and non-coding regions presented.
Results and Discussion
The phage genomic library of T4 was screened with the CS-specific cDNA clone p277-19 [13]. Positive clones and their subclones were mapped with p277-19 and with two oligomers representing 40 nucleotides of the 5' and 3' coding regions of the 7G8 CS gene [2]. An outline of the mapping and subcloning strategy is presented in Fig. 1. The complete nucleotide and deduced amino acid sequences of the CS gene of T4 are presented in Fig. 2. The sequence includes all the characteristic features found in other CS genes: putative leader (bp 80-120) and anchoring (bp 1291-1354) sequences, charged regions I (bp 398-442) and II (bp 1144-1183), flanking a central portion of tandem repeats (bp 449-976), in addition to AT-rich non-coding regions [1-5]. Remarkably, the regions outside the repeats showed no silent nucleotide substitutions among these strains. The T4 sequence differs at 4 amino acid positions from either the 7G8 or the Wellcome strain. The nucleotide substitutions in the region 5' to the repeats are close to a region which is quasi-repeated in some strains [7,8]. Thus, as far as the non-repetitive regions are concerned, these strains show an unusual relationship, i.e., nearly identical degrees of homology at the nucleotide level and at the amino acid level. Two similar cases involve the rabbit β-globin sequence discussed by Perler et al. [16], and two allelic forms of the Cκ genes [17]. In contrast, the central repetitive region displays numerous single nucleotide substitutions both within and between genes. A precise alignment of the repetitive sequences with respect to the 7G8 or Wellcome strains cannot be made since they differ in the number of repeats (T4=44, 7G8=41, Wellcome=46). Nonetheless, the first 6 repeats in the T4 and 7G8 sequences are made up of alternating NANP-NVDP, i.e., of three 8-amino acid repeats. This is reminiscent of the CS repeats of P. cynomolgi (Ceylon), which are composed of two different sequences, one a subset of the other [5]. Also worthy of note is that one particular NANP-coding sequence (AAT.GCA.AAC.CCA) is by far the most abundant in all of these strains (Fig. 3). The central repetitive sequences of the CS gene of P. falciparum appear to differ strikingly from analogous regions of the CS gene of P. berghei [4], P. knowlesi [9], and P. cynomolgi [5]. In those
Fig. 1. Restriction map and sequencing strategy of p007-2 containing the CS gene from T4. Only three subclones were produced to obtain the sequence of the entire coding and non-coding regions presented. The sequence of the repeats was obtained from p007-10.1 by labelling at the BstNI site 5' to the repeats with polynucleotide kinase or Klenow fragment, and at the Tth111 I site 3' to the repeats with polynucleotide kinase or T4 DNA polymerase.
Fig. 2. Nucleotide and deduced amino acid sequences of the CS gene of T4. Nucleotide substitutions in the coding regions outside the repeats (underlined) between T4 and 7G8 are boxed, and those between T4 and the Wellcome strain are indicated by asterisks. Amino acid substitutions are enclosed in parentheses. (***) = region 5' to the repeats containing 30 more nucleotides in the Wellcome strain.
cases the maintenance and evolution of the CS repeats are best explained by a mechanism operating primarily at the DNA level [4,5,10]. The repeats of the CS genes of P. falciparum display a large number of mostly silent nucleotide substitutions and hence a relatively well conserved amino acid sequence. In this situation, the repetitiveness of the sequence certainly seems to be maintained primarily at the protein level rather than at the DNA level. There are no obvious differences between the CS genes of P. falciparum and those of other plasmodial species to account for these contrasting patterns. In fact, the repetitive sequences of the P. falciparum CS gene are typical of other CS repetitive sequences in terms of both amino acid composition and codon usage, and there is no reason to believe that the function of this CS protein should be different. Thus, one should not exclude the CS gene of P. falciparum from the evolutionary context of other plasmodial CS genes. We have no clues to explain why the relationship between the extent of overall polymorphism and the occurrence of antigenic diversity should be so different among plasmodial species. Further sequence analysis will be required to reconcile this contrast and to gain insights into the mechanism responsible for the evolution and maintenance of parasite genes coding for proteins with repetitive sequences [18-23].
Fig. 3. Sequence variation within the central repetitive regions of the CS genes of three P. falciparum strains: T4 (Thailand), 7G8 (Brazil), Wellcome (West Africa). The numbers under the strains indicate the position of a repeat of a given sequence in a given strain. Repeats are numbered progressively in a 5' to 3' direction and the longest continuous arrays are underlined. The top panel contains the NANP-coding repeats and the bottom one the NVDP-coding repeats.
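The observation that the repeats are more precisely conserved at the amino acid level than at the nucleotide level can be illustrated with a short Python sketch. It translates several NANP-coding sequences with a minimal codon table drawn from the standard genetic code. Only the first sequence (AAT.GCA.AAC.CCA) is taken from the text; the other variants, like all names in the code, are hypothetical examples chosen for illustration and are not read from Fig. 3.

```python
# Minimal codon table (standard genetic code), restricted to the codons
# needed for the NANP and NVDP repeats in this sketch.
CODON_TABLE = {
    "AAT": "N", "AAC": "N",
    "GCA": "A", "GCT": "A", "GCC": "A", "GCG": "A",
    "CCA": "P", "CCT": "P", "CCC": "P", "CCG": "P",
    "GTA": "V", "GTT": "V", "GTC": "V", "GTG": "V",
    "GAT": "D", "GAC": "D",
}

def translate(dna):
    """Translate a DNA string codon by codon using CODON_TABLE."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

# First entry: the most abundant NANP-coding sequence reported in the text.
# Remaining entries: hypothetical synonymous variants (illustration only).
nanp_variants = [
    "AATGCAAACCCA",
    "AATGCAAATCCA",
    "AACGCAAACCCT",
    "AATGCTAACCCC",
]

distinct_dna = len(set(nanp_variants))
peptides = {translate(variant) for variant in nanp_variants}

print(distinct_dna, "distinct nucleotide sequences encode", sorted(peptides))
```

In this sketch, four distinct nucleotide sequences collapse to the single tetrapeptide NANP, which is the pattern expected when most substitutions within the repeats are silent.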
Acknowledgement
We thank Dr. Daniel Eichinger for helpful discussions, Dr. Phillip Barr for kindly supplying the oligonucleotide probes, Ms. Sharon Robles and Mrs. Laurie Weinstein for excellent technical assistance and Mrs. Beatrice Robles for typing the manuscript. Supported by the Agency for International Development DPE 0453-A-00-5012-00, the MacArthur Foundation and the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases. H.A.P. also received funding to support his salary under the World Health Organization grant number T16/181/M2/21 (L) and V.E. is a recipient of the Irma T. Hirschl Trust Fellowship.
References
1 Ozaki, L.S., Svec, P., Nussenzweig, R.S., Nussenzweig, V. and Godson, G.N. (1983) Structure of the Plasmodium knowlesi gene coding for the circumsporozoite protein. Cell 34, 815-822.
2 Dame, J.B., Williams, J.L., McCutchan, T.F., Weber, J.L., Wirtz, R.A., Hockmeyer, W.T., Maloy, W.L., Haynes, J.D., Schneider, I., Roberts, D., Sanders, G.S., Reddy, E.P., Diggs, C.L. and Miller, L.H. (1984) Structure of the gene encoding the immunodominant surface antigen on the sporozoite of the human malaria parasite Plasmodium falciparum. Science 225, 593-599.
3 Arnot, D.E., Barnwell, J.W., Tam, J.P., Nussenzweig, V., Nussenzweig, R.S. and Enea, V. (1985) Circumsporozoite protein of Plasmodium vivax: gene cloning and characterization of the immunodominant epitope. Science 230, 815-818.
4 Eichinger, D.J., Arnot, D.E., Tam, J.P., Nussenzweig, V. and Enea, V. (1986) The circumsporozoite protein of Plasmodium berghei: gene cloning and identification of the immunodominant epitopes. Mol. Cell. Biol. 6, 3965-3972.
5 Galinski, M.R., Arnot, D.E., Cochrane, A.H., Barnwell, J.W., Nussenzweig, R.S. and Enea, V. (1987) The circumsporozoite gene of the Plasmodium cynomolgi complex. Cell 48, 311-319.
6 Weber, J.L. and Hockmeyer, W.T. (1985) Structure of the circumsporozoite gene in 18 strains of Plasmodium falciparum. Mol. Biochem. Parasitol. 15, 305-316.
7 de la Cruz, V.F. and McCutchan, T.F. (1986) Heterogeneity at the 5' end of the circumsporozoite protein gene of Plasmodium falciparum is due to a previously undescribed repeat sequence. Nucleic Acids Res. 14, 4635.
8 Lockyer, M.J. and Schwarz, R.T. (1987) Strain variation in the circumsporozoite protein gene of Plasmodium falciparum. Mol. Biochem. Parasitol. 22, 101-108.
9 Sharma, S., Svec, P., Mitchell, G.H. and Godson, G.N. (1985) Diversity of circumsporozoite antigen genes from two strains of the malarial parasite Plasmodium knowlesi. Science 229, 779-782.
10 Enea, V., Galinski, M.R., Schmidt, E., Gwadz, R. and Nussenzweig, R.S. (1986) Evolutionary profile of the circumsporozoite gene of the Plasmodium cynomolgi complex. J. Mol. Biol. 188, 721-726.
11 Trager, W. and Jensen, J.B. (1976) Human malarial parasites in continuous culture. Science 193, 673-675.
12 Ferreira, A., Enea, V., Morimoto, T. and Nussenzweig, V. (1986) Infectivity of Plasmodium berghei sporozoites measured with a DNA probe. Mol. Biochem. Parasitol. 19, 103-109.
13 Enea, V., Ellis, J., Zavala, F., Arnot, D.E., Asavanich, A., Masuda, A., Quakyi, I. and Nussenzweig, R.S. (1984) DNA cloning of Plasmodium falciparum circumsporozoite gene: amino acid sequence of repetitive epitope. Science 225, 628-630.
14 Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
15 Maxam, A.M. and Gilbert, W. (1980) Sequencing end labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65, 499-560.
16 Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R. and Dodgson, J. (1980) The evolution of genes: the chicken preproinsulin gene. Cell 20, 555-566.
17 Sheppard, H.W. and Gutman, G.A. (1981) Allelic forms of rat κ chain genes: evidence for strong selection at the level of nucleotide sequence. Proc. Natl. Acad. Sci. USA 78, 7064-7068.
18 Cowman, A.F., Saint, R.B., Coppel, R.L., Brown, G.V., Anders, R.F. and Kemp, D.J. (1985) Conserved sequences flank variable tandem repeats in two S-antigen genes of Plasmodium falciparum. Cell 40, 775-783.
19 Ravetch, J.V., Feder, R., Pavlovec, A. and Blobel, G. (1984) Primary structure and genomic organization of the histidine-rich protein of the malaria parasite Plasmodium lophurae. Nature 312, 616-620.
20 Bobek, L., Rekosh, D.M., van Keulen, H. and Loverde, P.T. (1986) Characterization of a female-specific cDNA derived from a developmentally regulated mRNA in the human blood fluke Schistosoma mansoni. Proc. Natl. Acad. Sci. USA 83, 5544-5548.
21 Peterson, D.S., Wrightsman, R.A. and Manning, J.E. (1986) Cloning of a major surface-antigen gene of Trypanosoma cruzi and identification of a nonapeptide repeat. Nature 322, 566-568.
22 Kochan, J., Perkins, M. and Ravetch, J.V. (1986) A tandemly repeated sequence determines the binding domain for an erythrocyte receptor binding protein of P. falciparum. Cell 44, 689-696.
23 Bianco, A.E., Favaloro, J.M., Burkot, T.R., Culvenor, J.G., Crewther, P.E., Brown, G.V., Anders, R.F., Coppel, R.L. and Kemp, D.J. (1986) A repetitive antigen of Plasmodium falciparum that is homologous to heat shock protein 70 of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 83, 8713-8717.