J. Mol. Biol. (1982) 157, 527-546
Complete Amino Acid Sequence of a Variant Surface Glycoprotein (VSG 117) from Trypanosoma brucei
GEOFFREY ALLEN, LYNN P. GURNETT AND GEORGE A. M. CROSS
Department of Immunochemistry
Wellcome Research Laboratories, Langley Court, Beckenham, Kent BR3 3BS, U.K.
(Received 14 November 1981, and in revised form 16 February 1982)
The amino acid sequence of a variant surface glycoprotein (VSG 117) of Trypanosoma brucei has been determined by manual sequencing of tryptic, staphylococcal protease and cyanogen bromide peptides and fragments derived from these peptides. Some overlaps needed for completion of the sequence were deduced from the nucleotide sequence of complementary DNA derived from messenger RNA coding for VSG 117. The glycoprotein consists of 470 amino acid residues with two carbohydrate chains attached at Asn420 and Asp470. No pronounced hydrophobic regions, which are characteristic of many membrane proteins, are present in the isolated glycoprotein, and the carboxy-terminal region, which is close to the membrane, is remarkably hydrophilic. These observations indicate that the molecule probably does not penetrate the lipid bilayer of the plasma membrane. The high proportion of charged residues in the carboxy-terminal region is more consistent with electrostatic interaction with the polar head groups of the phospholipids.
1. Introduction
African trypanosomes, of which Trypanosoma brucei is the most extensively studied species, are able to evade the mammalian host's immune system by the sequential expression of genes specifying variant surface coat glycoproteins. These variant glycoproteins are immunologically distinct on intact cells, but cross-reacting determinants are present on some (Barbet & McGuire, 1978), but not all (Cross, 1979), of the purified glycoproteins. The cross-reacting antigenic determinants have been located on the carbohydrate groups of glycopeptides at the carboxy-termini of the glycoproteins (Holder & Cross, 1981). Variant surface glycoproteins have been purified from several clones of T. brucei and characterized. They are of comparable molecular weight (Mr = 65,000) but differ widely in isoelectric point, peptide composition, amino acid composition (Cross, 1975,1977), carbohydrate composition (Johnson & Cross, 1977) and N-terminal amino acid sequence (Bridgen et al., 1976). The carbohydrate was shown to be attached in the C-terminal region of some VSGs† (Johnson & Cross, 1979), apparently at the membrane/protein interface (Cross & Johnson, 1976).
† Abbreviations used: VSG, variant surface glycoprotein; cDNA, complementary DNA.
© 1982 Academic Press Inc. (London) Ltd.
The ability of salivarian trypanosomes to express sequentially a large number (perhaps several hundred) of related, but different, surface coat proteins presents a fascinating field of study, and one that is of major importance in the potential control of trypanosomiasis by vaccination. We have determined the complete sequence of one member of this class of protein, VSG 117. The nucleotide sequence of the complementary DNA derived from the messenger RNA specifying this protein, which is reported in the accompanying paper (Boothroyd et al., 1982), was used to provide evidence for overlapping peptides at seven positions; otherwise the evidence presented was obtained entirely independently, and the results from the parallel investigations are in complete agreement. Details of the methods used to purify each peptide, including chromatographic elution profiles and peptide maps, and Tables of amino acid analyses of peptides are given in a Supplement†.
2. Materials and Methods
Guanidinium chloride and urea were "AristaR" grade from BDH. 4-N,N-Dimethylaminoazobenzene-4'-isothiocyanate was from Fluka, and phenyl isothiocyanate and N,N-dimethylallylamine were from Pierce; other reagents were of analytical grade. Pyridine was redistilled from ninhydrin. Iodo-2-[14C]acetic acid was from Amersham International; it was diluted to about 2000 cts min-1 nmol-1 before use. Fisofluor-1 liquid scintillation cocktail (Fisons) was used in a Packard liquid-scintillation counter, model PLD Tri-Carb.
Antigenic variants of a cloned line of T. brucei strain 427 were cloned as described by Cross (1975). VSG 117 was purified essentially as described (Cross, 1975,1977) from Molteno Institute trypanozoon antigen type MITat 1.4, subclone 117, serologically equivalent to clone 049 (Cross, 1977). A total of 220 mg (about 4 µmol) of the purified glycoprotein was used in the present work.
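As a rough check on the quantity quoted above, the conversion from mass to amount can be sketched in Python. This is an illustrative calculation only: the function name is hypothetical, and the two relative molecular masses are the values given elsewhere in this paper (53,156 calculated for the glycoprotein, and about 65,000 estimated by gel electrophoresis).

```python
def micromoles(mass_mg: float, relative_molecular_mass: float) -> float:
    """Convert a mass in mg to an amount in micromoles.

    mg divided by g/mol gives mmol; multiplying by 1000 gives micromoles.
    """
    return mass_mg / relative_molecular_mass * 1000.0


# Hypothetical helper applied to the 220 mg of VSG 117 used in this work.
# Mr calculated from the sequence plus carbohydrate:
amount_calculated_mr = micromoles(220.0, 53_156)
# Mr estimated by SDS/polyacrylamide gel electrophoresis:
amount_gel_mr = micromoles(220.0, 65_000)

print(f"{amount_calculated_mr:.2f} umol with Mr 53,156")
print(f"{amount_gel_mr:.2f} umol with Mr 65,000")
```

With the calculated molecular weight the result is close to the "about 4 µmol" stated in the text; the gel estimate gives a somewhat lower figure.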
VSG 117 was reduced with dithiothreitol and carboxymethylated with iodo-2-[14C]acetate in a solution of 6 M-guanidinium chloride, 1 mM-EDTA and 0.1 M-Tris, adjusted to pH 8.3 with HCl. Details of the method were as described elsewhere (Allen, 1981). The product had 12.4 [14C]carboxymethyl groups per molecule.
The freeze-dried and carboxymethylated VSG 117 (55 mg) was suspended in 1.8 ml of 64 mM-NH4HCO3, 14 µM-CaCl2 at 37°C, and TPCK-trypsin (Worthington Biochemicals) (0.55 mg) was added. The suspension was stirred, and became clear after 5 min. Digestion was continued for 110 min, and the solution was applied directly to a column of Sephadex G-50 (superfine) (Fig. 1). A second tryptic digest, of 16 mg VSG 117, was prepared similarly, but with a protein/trypsin weight ratio of 1 : 62, and a total digestion time of 3 h.
† The Supplement (69 pages), with the same title as the paper, has been deposited with the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, England, and copies may be obtained from them by quoting No. SUP 40013.
(d) Tryptic digestion of reduced, S-carboxymethylated and succinylated VSG 117
A sample of reduced and S-carboxymethylated VSG 117 (72 mg) in 4.3 M-guanidinium chloride, 0.04 M-Tris·HCl (pH 8.3), was treated with an equal weight of succinic anhydride, added gradually over 30 min with rapid stirring; 2 M-NaOH was added during the reaction to maintain pH 8.0 ± 0.5. The mixture was dialysed exhaustively against water, then against 50 mM-NH4HCO3/0.01% (v/v) thiodiglycol, and freeze-dried. The protein was dissolved in 2.5 ml 40 µM-CaCl2, pH 8.3 (adjusted with NH3) and digested with 0.7 mg trypsin at 37°C for 1 h. A further 0.7 mg trypsin was added and digestion was continued for 1 h. The digest was separated in 2 batches on a column of Sephadex G-75 (superfine).
(e) Staphylococcus aureus (V8) protease digestion of VSG 117
Reduced, S-[14C]carboxymethylated VSG 117 (48 mg) was dissolved in 11.5 ml 0.1 M-NH4HCO3; 20 mM-EDTA (0.05 ml) was added. S. aureus (V8) protease (0.50 mg) (Miles Biochemicals) was added, and the solution was incubated at 37°C for 4 h with stirring. Digestion was stopped by freeze-drying. The initial fractionation of peptides was by gel filtration.
(f) Cyanogen bromide cleavage of VSG 117
VSG 117 (10 mg) was dissolved in 72% (v/v) formic acid (0.5 ml). A solution of CNBr (10 mg) in 72% formic acid (0.1 ml) was added, and the solution was kept under N2 at 20°C for 48 h in the dark. The mixture was dried in vacuo, dissolved in 4.5% (v/v) formic acid, and applied to a column (0.9 cm x 59 cm) of Sephadex G-100 (superfine) in 4.5% formic acid. Fractions of 0.4 ml were collected and the absorbance of the eluate was monitored. A second digest, of the reduced and S-carboxymethylated protein (19 mg), was prepared using 40 mg CNBr in the presence of 3.4 mg L-tryptophan at 23°C for 24 h. This digest was freeze-dried and peptides were initially separated on a column of Sephadex G-75 (superfine) (1 cm x 198 cm) in 50 mM-NH4HCO3.
(g) Isolation of peptides
Peptides were purified by suitable combinations of successive chromatographic and electrophoretic methods. Initial fractionation was by gel filtration. Larger peptides (of more than about 15 residues) were then separated by ion-exchange chromatography on columns generally of DEAE-cellulose, while smaller peptides were separated by 2-dimensional thin-layer peptide mapping (Heiland et al., 1976) or by ion-exchange chromatography on Aminex 50W-X4 (Bio-Rad) resin columns. Peptides were detected on thin layers with dilute fluorescamine (Vandekerkhove & Van Montagu, 1974) and in column fractions by ultraviolet light absorbance at low (215 to 225 nm) and high (280 nm) wavelengths, or by thin-layer chromatography and ninhydrin staining, as appropriate. Radiolabelled peptides were detected by liquid scintillation counting or by radioautography. Glycopeptides were detected in column fractions by the phenol/sulphuric acid method (Dubois et al., 1956), with 0.1 of the reagent volumes. Final purification, if required, was by a further chromatographic or electrophoretic step. Thin-layer chromatography was performed on silica gel G (Macherey-Nagel) in butanol/acetic acid/water/pyridine (15 : 3 : 12 : 10, by vol.) unless otherwise stated. Detailed information about purification methods for each peptide is contained in the Supplement. Purity of peptides was investigated by amino-terminal analysis using the dansyl chloride method (Bruton & Hartley, 1970), by amino acid analysis, and by studying their chromatographic and electrophoretic behaviour.
(h) Amino acid sequence determination of peptides
Amino acid analyses of peptides were performed on 1 to 5 nmol using a Rank-Hilger Chromaspek analyser with a fluorescence attachment, for detection using o-phthalaldehyde/mercaptoethanol. Smaller peptides were sequenced using either the micro-"dansyl-Edman" method, slightly modified from that of Bruton & Hartley (1970), or the dimethylaminoazobenzene isothiocyanate method (Chang et al., 1978), modified by reduction in reaction volumes. Full experimental details were as described elsewhere (Allen, 1981). Some peptides were sequenced by both procedures. Larger peptides (of more than 15 to 20 residues) were digested further, generally with thermolysin but also with chymotrypsin or trypsin where appropriate. The subfragments were purified by gel filtration followed by thin-layer chromatography or peptide mapping, or by peptide mapping alone, and their structures were determined as described above. Glutamic and aspartic acids were distinguished from their respective amides either directly, using the dimethylaminoazobenzene isothiocyanate method, or from the electrophoretic mobilities of small peptides on thin-layer cellulose MN-300 (Macherey-Nagel) at pH 6.5, or both. It was found that mobilities relative to that of aspartic acid were in good agreement with those predicted for paper electrophoresis (Offord, 1966), except for a single pair of histidine-containing peptides, T7a and T7b (see below). Tryptophan residues were determined directly using the dimethylaminoazobenzene isothiocyanate method. Tryptophan was identified in small peptides by the Ehrlich test (see Allen, 1981) or otherwise spectrophotometrically, and placed in sequence as a gap in a peptide sequence completely determined by the "dansyl-Edman" method. [14C]Carboxymethylated residues were identified by release of radioactivity from peptides and by thin-layer chromatography using the dimethylaminoazobenzene isothiocyanate method.
3. Results
Tryptic, S. aureus (V8) protease and CNBr digests yielded peptides covering the whole length of VSG 117, and these peptides are numbered according to the digestion method (T, SP, CB, respectively) and their order in the complete sequence. Overlapping peptides, the products of incomplete digestion, are labelled, e.g. T(1,2). Subfragments are similarly numbered, with thermolysin-derived peptides labelled Th and chymotryptic peptides Ch. Peptides from the tryptic digest of the succinylated protein, derived mainly by cleavage at arginine residues, are labelled by R, followed by their position in the complete sequence specified by residue numbers. The methods of purification of the various peptides and Tables of amino acid analyses of peptides are given in the Supplement. The separation of tryptic peptides from VSG 117 by gel filtration is presented as an example in Figure 1. A large portion of the peptide ultraviolet light absorbance and the radioactivity, and all the carbohydrate, in the tryptic digest of the succinylated S-[14C]carboxymethylated VSG 117 was eluted from a Sephadex G-75 column at, or close to, the void volume. Sodium dodecyl sulphate/polyacrylamide gel electrophoresis using the method of Chua & Bennoun (1975) showed a doublet of Mr approximately 43,000, compared with Mr 70,000 for the succinylated VSG 117.
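The peptide naming convention introduced at the start of this section lends itself to a small parser, which can help when cross-referencing the Figures. The following is a minimal sketch, not part of the original work: the function `parse_label`, the pattern `LABEL` and the dictionary keys are hypothetical names chosen for illustration, and only the simple forms (for example T7a, T(18,19), SP12, CB8 and R(39-49)) are handled; subfragment labels such as SP12.T3 and primed labels are not.

```python
import re

# Digest codes as defined in the text; the descriptions are paraphrased.
DIGESTS = {
    "T": "trypsin",
    "SP": "staphylococcal (V8) protease",
    "CB": "cyanogen bromide",
    "R": "trypsin, succinylated protein",
}

# Hypothetical pattern: a digest prefix, then either a number or a
# parenthesised group, then an optional lower-case suffix (the "a" in T7a).
LABEL = re.compile(r"^(SP|CB|T|R)(?:(\d+)|\(([\d,\-]+)\))([a-z]?)$")


def parse_label(label: str) -> dict:
    """Split a simple peptide label into digest, numbers and variant suffix."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised peptide label: {label!r}")
    prefix, single, group, suffix = match.groups()
    parsed = {"digest": DIGESTS[prefix], "variant": suffix or None}
    if prefix == "R":
        # R peptides are specified by residue numbers, e.g. R(39-49).
        if group is None or group.count("-") != 1 or "," in group:
            raise ValueError(f"expected a residue range in {label!r}")
        start, end = (int(part) for part in group.split("-"))
        parsed["residues"] = (start, end)
    else:
        # Other peptides are numbered by their order in the sequence;
        # T(18,19) is an overlapping peptide spanning T18 and T19.
        if group is not None and "-" in group:
            raise ValueError(f"unexpected residue range in {label!r}")
        numbers = [single] if single is not None else group.split(",")
        parsed["positions"] = [int(number) for number in numbers]
    return parsed


for example in ("T7a", "T(18,19)", "SP12", "CB8", "R(39-49)"):
    print(example, parse_label(example))
```

Keeping the residue-range form of the R peptides separate from the ordinal numbering of the other digests reflects the distinction drawn in the text.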
Amino acid analyses and thin-layer electrophoretic mobilities of purified peptides (Supplement) were in all but a few cases in agreement with the compositions calculated from their sequences. A few discrepancies in amino acid analyses were due to the presence of impurities, incomplete hydrolysis (e.g. of Val-Ile bonds) or partial destruction (e.g. of Ser). The analyses of peptides T(18,19),
FIG. 1. Separation of tryptic peptides from VSG 117 by gel filtration. The tryptic digest (5.5 mg) was chromatographed on a column (11 mm x 1960 mm) of Sephadex G-50 (superfine) in 50 mM-NH4HCO3/0.01% (v/v) thiodiglycol. The flow rate was 6 ml/h and fractions of 1.5 ml were collected. Fractions were assayed by ultraviolet light absorbance (continuous line), by liquid scintillation counting of 20-µl samples (14C cts/min, broken line) and for carbohydrate by the phenol/sulphuric acid reaction using 50-µl samples (dotted line). Fractions were pooled as indicated at the base of the Figure (pools T-A to T-I, plotted against fraction number) and the pools were freeze-dried. Dinitrophenyl-lysine added immediately following the sample emerged at fraction 146 (arrow).
T18b, T(18,19)a, T22d, T22e, T(54,55,56), SP3, SP3', SP16, CB1 and CB3 were not in satisfactory agreement with the proposed sequence. However, subfragments of these peptides and peptides covering the same parts of the sequence, but derived from other digests, had analyses that agreed with the proposed complete sequence. The discrepancies, which were not great, were presumably due to impurities. The determination of the amino acid sequences of tryptic peptides is described in Figure 2. Peptide T7 was isolated in two forms that were indistinguishable by amino acid analysis and sequencing, but that had different electrophoretic mobilities at pH 6.5. The reason for this peculiarity was not determined, although the isolation in the same narrow peak from the cation-exchange resin column suggests that it might be due to an artefact arising after the purification of the peptide. Peptide T32 was isolated in low yield apparently with N-terminal glycine in addition to the N-terminal asparagine found at this position in peptides T(32,33) and SP16 and predicted from the nucleotide sequence of cDNA. Whether this is the result of a mutational event (which seems unlikely since it would involve two base changes) or due to contamination of the peptide with glycine was not resolved. Complete sequences of the long tryptic peptides T(18,19), T21 and T22 were determined from thermolysin fragments and non-specific trypsin digestion products aligned by partial identity with sequences in these fragments; remaining gaps in the sequences were determined from peptides isolated from other digests, as
FIG. 2. Sequences determined from tryptic peptides of VSG 117. Amino acid residues were determined either by using the "dansyl-Edman" method or by the dimethylaminoazobenzene isothiocyanate (DABITC) method, or both, each indicated by its own symbol. If there was any doubt about the identification of a residue, either because of low yield or the presence of impurities, this is indicated by a broken underlining. Residues identified by amino acid analysis but not by sequential degradation are enclosed in parentheses. Cysteine residues were identified as [14C]carboxymethylcysteine residues. mAsp, electrophoretic mobility at pH 6.5 relative to that of aspartic acid; a negative sign indicates migration towards the anode.
shown in Figures 3 to 5. Thermolysin digestions of 90 to 110 nmol of each peptide were performed in 0.1 M-NH4HCO3 for 70 minutes at 45°C with a weight ratio of enzyme to substrate of 1/30. The fragments were separated on a column of Sephadex G-50 (superfine) in 50 mM-NH4HCO3, followed by thin-layer chromatography on cellulose MN-300 in n-butanol/acetic acid/water/pyridine (15 : 3 : 12 : 10, by vol.) or, where resolution was insufficient, by thin-layer peptide mapping. Amino acid analyses of the peptides are given in the Supplement and sequence determinations are shown in Figures 3 to 5. Of the peptides from the tryptic digest of succinylated VSG 117, the only sequence giving unique information for the determination of the total sequence was that of peptide R(39-49): Lys-Lys-Leu-Glx-(-)-Met-Glx-Thr-Lys-Leu-(-).
FIG. 3. Determination of the sequence of peptide T(18,19). The sequence was reconstructed from the sequences of tryptic peptides from VSG 117 derived by non-specific cleavage and from thermolysin fragments (T(18,19)Th1-9) of peptide T(18,19), and was confirmed by sequences determined in staphylococcal protease and CNBr peptides. The symbols used are described in the legend to Fig. 2.
FIG. 4. Sequence of peptide T21 from VSG 117. The sequence was determined from peptide T21, its thermolysin fragments and from sequences in fragments of staphylococcal protease peptide SP12. Symbols as in Fig. 2.
FIG. 5. Sequence of peptide T22 of VSG 117. The sequence was reconstructed from sequences determined in the whole peptide, from tryptic peptides derived by non-specific cleavage of VSG 117, and from thermolysin fragments of peptide T22, and was confirmed by sequencing CNBr peptides CB5 and CB6 of VSG 117. Tyr210 was determined by difference analysis.
The lysine residues were determined as succinylated derivatives. Some of the tryptic peptides from the succinylated protein were digested with thermolysin and the limited information from this set of peptides is given in the Supplement. This information was useful confirmatory evidence for the sequence of residues 1 to 75. The determination of the sequences of staphylococcal protease peptides is described in Figure 6. The 117-residue peptide SP12 (20 nmol) was digested with trypsin and peptides were isolated from thin-layer peptide maps. Peptides T23 (SP12.T3), T24 (SP12.T4), T25 (SP12.T5), T26 (SP12.T6), T27 (SP12.T7) and T28 (SP12.T8) were identified by N-terminal and amino acid analyses (see Supplement). Peptide SP12.T1 was identified as the C-terminal portion of peptide T21. These tryptic peptides were overlapped by chymotryptic peptides as described in Figure 7. Peptide SP12 (90 nmol) was digested with TLCK-treated α-chymotrypsin (Kostka & Carpenter, 1964) (1/100, w/w) in 1.5 M-urea, 50 mM-NH4HCO3 at 37°C for two hours. The chymotryptic peptides were purified by gel filtration on a column of Sephadex G-50 (superfine) (1 cm x 195 cm) in 50 mM-NH4HCO3, 0.01% (v/v) thiodiglycol, followed by thin-layer peptide mapping of pooled peaks. Amino acid analyses of the isolated peptides are given in the Supplement. The order of the peptides T23 and T24 and confirmation of the sequence were obtained from CNBr peptides, as shown in Figure 7. Peptide SP(10,11) (25 nmol) was digested with trypsin and peptides were isolated by thin-layer peptide mapping. Peptide SP(10,11)Ta was identified as peptide T19, SP(10,11)Tb as T20, SP(10,11)Tc as the C-terminal portion of peptide T(18,19) and
FIG. 6. Sequences of staphylococcal protease peptides from VSG 117. Symbols are as in Fig. 2, with, in addition, Lie = Leu or Ile (which are not easily distinguished using the DABITC method).
peptide SP(10,11)Td as the N-terminal portion of peptide T21. This information sufficed, together with peptides SP10 and SP11, to complete the sequence of residues 126 to 158. Peptide SP(16,17) was similarly digested with trypsin and the analyses and partial sequences of the tryptic fragments confirmed the overlap of tryptic peptides T(32,33) and T(34,35). The determination of the N-terminal sequences of the CNBr peptides is shown in Figure 8.
(h) Alignment of peptides and the complete sequence
The complete sequence of VSG 117 is shown in Figure 9, together with the locations of the peptides from which the complete sequence was constructed. The N-terminal sequence of 35 residues has been determined (Bridgen et al., 1976). One difference from the earlier result was found: Gly25 rather than the Thr25 reported earlier. The complete sequence of residues 1 to 303 was determined from the peptide sequences described in this paper. However, the carboxyl-terminal region of the protein, contained within the large CNBr fragment, CB8, is rich in glutamic acid and lysine residues, and at seven positions (Glu302/Lys303, Lys337/Glu338, Glu368/Lys369, Lys408/Glu409, Glu414/Lys415, Lys425/Glu426 and Lys457/Glu458) unambiguous ordering of peptides was not possible, since cleavage by trypsin occurred at all these lysine residues and by staphylococcal protease at all these glutamic acid residues. However, the nucleotide sequence of cDNA coding for the C-terminal portion (corresponding to amino acid residues 374-470 together with a C-terminal extension) was determined during the course of this work (Boothroyd et al., 1980) and only the first three of these overlaps remained to be confirmed. The peptide sequences (303-337) and (338-369) can only be placed in one order, but reliance on single-residue overlaps in a protein of this size is not satisfactory, and for supporting evidence we rely upon the nucleotide sequence of the cDNA coding for the whole of VSG 117 described in the accompanying paper
FIG. 7. Sequence of peptide SP12 from VSG 117. The sequence was deduced from tryptic fragments of SP12, identified as known tryptic peptides from VSG 117, overlapped by chymotryptic fragments of SP12, and completed from the sequences of CNBr peptides from VSG 117. Symbols are as in Fig. 2.
(Boothroyd et al., 1982). The identification of Glu441 from the peptide sequences was also unsatisfactory, relying on the specificity of staphylococcal protease. The sequence (142-470) is devoid of arginine residues, in agreement with the observation of an Mr 43,000 fragment after tryptic digestion of the succinylated protein. The amino acid composition of the protein, calculated from the sequence, is in excellent agreement with the amino acid analysis determined previously (Cross, 1975), as shown in Table 1. The calculated molecular weight of the protein portion is 50,207, and that of the glycoprotein is 53,156, using the values from carbohydrate analysis determined by Holder & Cross (1981).
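The ordering problem described in this section, where trypsin and staphylococcal protease both cut at adjacent Lys and Glu residues, can be illustrated with a small in-silico digestion. This is a minimal sketch under stated assumptions, not the procedure used in this work: the 21-residue sequence `TOY` is invented for illustration and is not part of VSG 117, the function names are hypothetical, trypsin is taken to cut after every Lys or Arg and staphylococcal protease after every Glu, and exceptions such as resistant Glu-Pro bonds are ignored.

```python
def cleave_after(sequence, residues):
    """Return (start, end, peptide) tuples, 1-indexed, cutting after each listed residue."""
    peptides, start = [], 0
    for index, residue in enumerate(sequence):
        if residue in residues:
            peptides.append((start + 1, index + 1, sequence[start:index + 1]))
            start = index + 1
    if start < len(sequence):
        # Keep the C-terminal remainder when the sequence does not end at a cut site.
        peptides.append((start + 1, len(sequence), sequence[start:]))
    return peptides


def junction_overlaps(cut_peptides, bridging_peptides):
    """For each junction between consecutive peptides of one digest, report how many
    residues the bridging digest shares with the shorter of the two flanks."""
    overlaps = {}
    for _, end, _ in cut_peptides[:-1]:
        overlaps[end] = 0  # stays 0 if no bridging peptide spans the junction
        for b_start, b_end, _ in bridging_peptides:
            if b_start <= end < b_end:
                overlaps[end] = min(end - b_start + 1, b_end - end)
    return overlaps


# Hypothetical toy sequence (one-letter code): a Lys-Glu pair at 5/6,
# an isolated Lys at 13 and an isolated Glu at 19.
TOY = "GASLKEVTAGLDKTSAVLEGF"

tryptic = cleave_after(TOY, "KR")   # assumed trypsin specificity
v8 = cleave_after(TOY, "E")         # assumed staphylococcal protease specificity

print(tryptic)
print(v8)
print(junction_overlaps(tryptic, v8))
```

In this toy case the tryptic junction at the Lys-Glu pair is bridged by only a single residue of the staphylococcal protease peptide, whereas the junction after the isolated lysine is bridged by six residues on its shorter side, which is why independent evidence is preferable at the Lys/Glu positions listed above.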
4. Discussion
The entire sequence of VSG 117 was represented in the sets of tryptic, staphylococcal protease and CNBr peptides. The common problems in protein sequence determinations, such as cyclization of N-terminal glutamine residues under acidic conditions and incomplete digestion by trypsin at lysine residues adjacent to acidic residues, were encountered. Purification of tryptic peptides containing peptide T55 was not achieved, and residue Glu441 was identified only from the amino acid analyses of peptides SP29 and SP(29,30), and from the cleavage specificity of staphylococcal protease. However, this part of the sequence was already known from the nucleotide sequence of cDNA coding for the C-terminus of VSG 117 (Boothroyd et al., 1980). Overlaps at seven positions in the C-terminal region were also identified from the cDNA sequence (Boothroyd et al., 1980,1982). The two investigations were performed concurrently (although the protein sequence analysis was at an advanced stage before nucleotide sequences were determined) and interchange of information was found to be useful. In a few positions, sequences that had been only tentatively identified were subsequently confirmed or corrected in the light of knowledge of nucleotide sequences. Similarly, the initial interpretation of nucleotide sequencing results was occasionally aided by foreknowledge of peptide sequences. There is complete agreement between the
FIG. 9. Complete amino acid sequence of VSG 117. The sequence was reconstructed by alignment of tryptic, staphylococcal protease and CNBr peptides, with some information from previously published determinations of the N-terminal sequence (Bridgen et al., 1976) and the C-terminal tryptic glycopeptide (Holder & Cross, 1981), and from tryptic peptides from the succinylated protein. Alignment at 7 points in the C-terminal region (marked *) was deduced from the nucleotide sequence of cDNA (Boothroyd et al., 1980,1982), and confirmation of some residues, which were only tentatively identified by protein sequence analysis, was also derived from the nucleotide sequence (see text). Residues Asn420 and Asp470 bear glycosyl groups.
TABLE 1
Amino acid analysis of variant glycoprotein 049 and the composition calculated from the sequence of VSG 117
Columns: Residue; Amino acid analysis† (residues/molecule); Amino acid composition of VSG 117 calculated from the sequence (residues/molecule).
Residues tabulated: Asx‡, Thr, Ser, Glx§, Pro, Gly, Ala, Val, Cys, Met, Ile, Leu, Tyr, Phe, His, Lys, Arg, Trp∥.
† Cross (1977); corrected for a sequence of 470 residues. ‡ Asp23, Asn17. § Glu40, Gln22. ∥ Trp determined spectrophotometrically (Edelhoch, 1967) in the present work.
results of the two investigations. As is generally found, the speed of DNA sequence analysis was several times greater than that of protein sequence analysis.
Protease cleavage specificities were generally as expected. Prolonged treatment with trypsin led to cleavage at Tyr106, His192, Tyr198, Tyr210, Leu213, Leu267, Tyr234 and Tyr236. Cleavage by staphylococcal protease (V8) occurred at all glutamic acid residues except those at positions 156, 293, 294, 315, 394 and 395, and two of these (Glu315 and Glu395) were in Glu-Pro sequences. Partial cleavage was observed at the C-terminal side of Ser33, Asp328 and Ser353.
The two tryptic glycopeptides in VSG 117 (T52 and T61) have been described previously (Holder & Cross, 1981). The carbohydrate chain attached to the C-terminal residue, predicted to be aspartic acid from the nucleotide sequence of cDNA (Boothroyd et al., 1980), carries the cross-reacting antigenic determinants (Holder & Cross, 1981).
A probable secondary structure prediction has been made by Dr N. M. Green (National Institute for Medical Research, Mill Hill, London), using the method of Geisow & Roberts (1980), based on Chou & Fasman (1978) parameters. Regions 1-8, 38-47, 64-77, 81-95, 98-103, 246-252, 275-286 and 361-378 are predicted to form α-helices (19%). Regions 48-52, 196-201, 212-216, 262-273, 295-298, 317-320, 324-328, 351-359 and 429-436 are predicted to form β-sheet (12%). β-Turns are predicted for residues 12-14, 54-57, 114-117, 123-141, 152-154, 160-165, 170-178, 190-196, 205-209, 241-243, 289-292, 303-313, 333-337, 381-390, 395-402 and 458-470 (24%). Some other regions are weakly predicted to form α-helices (144-148, 155-159, 406-416 and 424-427). The remainder is not predicted to form any particular structure, and the amount of structure predicted is overall rather low.
The calculated molecular weight of the glycoprotein, 53,156, is smaller than that estimated by sodium dodecyl sulphate/polyacrylamide gel electrophoresis (Mr 65,000) (Cross, 1975), but glycoproteins commonly behave anomalously.
The distribution of acidic and basic residues along the polypeptide chain has been studied. There is a stretch of 33 residues (8-40) containing ten basic but no acidic residues, but otherwise there is a fairly even distribution of acidic and basic residues along the chain. With a total of 61 acidic and 71 basic (including 9 histidine) residues, the expected slightly alkaline isoelectric point is consistent with that found (pI 7.6) (Cross, 1975).
The total sequence shown in Figure 9 is remarkable for a membrane protein for its lack of the hydrophobic stretches of amino acid residues characteristic of many integral membrane proteins, such as glycophorin (Tomita et al., 1978), cytochrome b5 (Ozols & Gerard, 1977), D-alanine carboxypeptidase (Waxman & Strominger, 1981), Ca2+-ATPase of sarcoplasmic reticulum (Allen et al., 1980), influenza virus haemagglutinin (Skehel & Waterfield, 1975) and proteins of the major histocompatibility complex of mice (Shimada & Nathenson, 1969) and men (Springer & Strominger, 1976). Short stretches of hydrophobic sequence, including -Met-Cys-Ala-Leu-Phe- (181-185) and -Leu-Leu-Ala-Val-Leu-Val-Ala- (266-272), are present in VSG 117, but these are not untypical of soluble proteins, and did not give rise to the problems of aggregation or insolubility that are characteristic of lipid bilayer-associated peptides. The hydrophilic nature of the C-terminal 112 residues is particularly marked, and
this region contains the membrane attachment site (Cross & Johnson, 1976). This portion is very rich in lysine (24 residues) and glutamic acid (17), but poor in the hydrophobic residues proline (2), valine (3), methionine (0), isoleucine (3), leucine (3), tyrosine (0), phenylalanine (1) and tryptophan (2). There are, however, eight half-cystine residues involved in disulphide bonds with each other. Studies on the location of the disulphide bonds (G. Allen & L. P. Gurnett, unpublished data) are in progress. This portion of the protein is unusually resistant to proteolysis if the disulphide bonds are intact, and this indicates a tightly folded structure. The distribution of half-cystine residues is also striking: the five half-cystine residues nearest the C-terminus are in -Cys-Lys- sequences. There is a single free cysteine (Cys244) in the molecule.
Cleavage of native VSG 117 by trypsin yields several C-terminal glycopeptide fragments (Johnson & Cross, 1979). The amino acid composition of a mixture of these is in reasonable agreement with the composition of residues 373 to 470, suggesting that a peptide bond, or bonds, around Lys372 is exposed and susceptible to cleavage by trypsin. However, the fragments were not purified, and a definite identification of the cleavage point is not possible.
How the mature VSG 117 remains attached to the surface membrane of the trypomastigote is still an open question. The hydrophilic C-terminal region is unlikely to be even partially embedded within the lipid bilayer. The lack of aggregation of peptides from VSG 117 indicates strongly that there is no hydrophobic prosthetic group, such as a lipid, as has been found with rat brain Thy-1 glycoprotein (Campbell et al., 1981), Escherichia coli murein lipoprotein (Hantke & Braun, 1973) and glycoprotein of vesicular stomatitis virus (Schmidt & Schlesinger, 1979). Gas chromatographic analysis after saponification of VSG 117 revealed no detectable fatty acids (A. A. Holder & P. Overath, personal communication). The lack of distinctive hydrophobic structure in VSG 117 is consistent with the ready removal of the VSG from the membrane, without detergent disruption of the lipid bilayer.
The nucleotide sequence of mRNA for VSG 117 predicts that both a hydrophobic N-terminal “signal sequence” (Boothroyd et al., 1981) and a hydrophobic C-terminal extension (Boothroyd et al., 1980) are present in the nascent protein. As discussed previously (Boothroyd et al., 1980), it is possible that the C-terminal hydrophobic “tail” is involved in binding the glycoprotein to the membrane and that the rapid release is concomitant with a proteolytic event. However, a more plausible hypothesis is that this “tail” serves as an additional “signal peptide” during processing and intracellular transport of the VSG to the plasma membrane, and that binding of the mature glycoprotein to the membrane is by specific electrostatic interaction with the polar head groups of the phospholipids, consistent with the model proposed by Cross (1978). The rigid structure imposed by the disulphide bonds and the peculiar, repeating Cys-Lys sequences may be involved in such bonding.
Comparison of the peptide sequence of VSG 117 with those of other VSGs will provide an indication of the extent of structural differences between different variant antigens and possibly common elements of structure, which may underlie the gross similarities in properties of these glycoproteins. It has recently been shown that the C-terminal 92 residue sequence of VSG 117 differs at only three
positions from a sequence in another variant, AnTat 14; a third variant, AnTat 1.1, also shows considerable homology in its C-terminal region with VSG 117 (Matthyssens et al., 1981), as do two more variants, ILTAT 12 and ILTAT 1.3 (Rice-Ficht et al., 1981). All eight cysteine residues are conserved in these sequences as far as has been determined, and it may be anticipated that cysteine residues involved in disulphide bonds will often be found to be conserved, as is the case with variant influenza virus haemagglutinin molecules (Gething et al., 1980). However, not all VSGs have homologous C-terminal regions (Holder & Cross, 1981; Rice-Ficht et al., 1981), and it has been suggested that VSGs may be classified into subsets based on C-terminal homology, two of which have been detected to date (Rice-Ficht et al., 1981). Homology in the C-terminal hydrophobic tail, which is absent from the mature glycoprotein, is greater than that within the mature VSGs (Rice-Ficht et al., 1981; Majumder et al., 1981).
We thank Dr N. M. Green for the secondary structure prediction, Ms L. Dave? for assistance in the purification of the variant-specific glycoprotein, and Dr A. A. Holder for helpful discussions.
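The percentages quoted for the secondary structure prediction in the Discussion can be checked by simple arithmetic. The Python sketch below is an illustration only and is not part of the original analysis. It assumes that each quoted range is inclusive, that the chain length is the 470 residues stated for the mature glycoprotein, and that the run-together figures "l-8", "317320" and "170178" in the extracted text stand for the ranges 1-8, 317-320 and 170-178. The names `residues_covered` and `percent_of_chain` are hypothetical helpers introduced here.

```python
# Tally the residues assigned to each predicted structure class and express
# them as a percentage of the mature chain. Ranges are copied from the text
# and treated as inclusive (an assumption of this sketch).

TOTAL_RESIDUES = 470  # length of mature VSG 117 as stated in the paper

ALPHA_HELIX = [(1, 8), (38, 47), (64, 77), (81, 95), (98, 103),
               (246, 252), (275, 286), (361, 378)]

BETA_SHEET = [(48, 52), (196, 201), (212, 216), (262, 273), (295, 298),
              (317, 320), (324, 328), (351, 359), (429, 436)]

BETA_TURN = [(12, 14), (54, 57), (114, 117), (123, 141), (152, 154),
             (160, 165), (170, 178), (190, 196), (205, 209), (241, 243),
             (289, 292), (303, 313), (333, 337), (381, 390), (395, 402),
             (458, 470)]


def residues_covered(regions):
    """Sum the lengths of inclusive (start, end) residue ranges."""
    return sum(end - start + 1 for start, end in regions)


def percent_of_chain(regions, total=TOTAL_RESIDUES):
    """Percentage of the chain covered by the given ranges."""
    return 100.0 * residues_covered(regions) / total


if __name__ == "__main__":
    classes = (("alpha-helix", ALPHA_HELIX),
               ("beta-sheet", BETA_SHEET),
               ("beta-turn", BETA_TURN))
    for name, regions in classes:
        count = residues_covered(regions)
        print(f"{name}: {count} residues, {percent_of_chain(regions):.1f}%")
```

Under these assumptions the tallies are 90, 58 and 114 residues, i.e. 19.1%, 12.3% and 24.3% of the chain, in agreement with the rounded figures of 19%, 12% and 24% given in the text. The weakly predicted helices (144-148, 155-159, 406-416 and 424-427) are left out, as they appear to be in the quoted percentages.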
REFERENCES
Allen, G. (1981). Laboratory Techniques in Biochemistry and Molecular Biology (Work, T. S. & Burdon, R. H., eds), vol. 9, Elsevier/North-Holland Biomedical Press, Amsterdam.
Allen, G., Trinnaman, B. J. & Green, N. M. (1980). Biochem. J. 187, 591-616.
Barbet, A. F. & McGuire, T. C. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 1989-1993.
Boothroyd, J. C., Cross, G. A. M., Hoeijmakers, J. H. J. & Borst, P. (1980). Nature (London), 288, 624-626.
Boothroyd, J. C., Paynter, C. A., Cross, G. A. M., Bernards, A. & Borst, P. (1981). Nucl. Acids Res. 9, 4735-4743.
Boothroyd, J. C., Paynter, C. A., Coleman, S. L. & Cross, G. A. M. (1982). J. Mol. Biol. 157, 547-556.
Bridgen, P. J., Cross, G. A. M. & Bridgen, J. (1976). Nature (London), 263, 613-614.
Bruton, C. J. & Hartley, B. S. (1970). J. Mol. Biol. 52, 165-178.
Campbell, D. G., Gagnon, J., Reid, K. B. M. & Williams, A. F. (1981). Biochem. J. 195, 15-30.
Chang, J. Y., Brauer, D. & Wittmann-Liebold, B. (1978). FEBS Letters, 93, 205-214.
Chou, P. Y. & Fasman, G. D. (1978). Annu. Rev. Biochem. 47, 251-276.
Chua, N.-H. & Bennoun, P. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 2175-2179.
Cross, G. A. M. (1975). Parasitology, 71, 393-417.
Cross, G. A. M. (1977). Ann. Soc. Belge Med. Trop. 57, 389-399.
Cross, G. A. M. (1978). Proc. Roy. Soc. ser. B, 202, 55-72.
Cross, G. A. M. (1979). Nature (London), 277, 310-312.
Cross, G. A. M. & Johnson, J. G. (1976). In Biochemistry of Parasites and Host-Parasite Relationships (Van den Bossche, H., ed.), pp. 413-420, Elsevier/North-Holland Biomedical Press, Amsterdam.
Dubois, M., Gilles, K. A., Hamilton, J. K., Rebers, P. A. & Smith, F. (1956). Anal. Chem. 28, 350-356.
Edelhoch, H. (1967). Biochemistry, 6, 1948-1954.
Geisow, M. & Roberts, R. D. B. (1980). Internat. J. Macromol. 2, 387-389.
Gething, M.-J., Bye, J., Skehel, J. & Waterfield, M. (1980). Nature (London), 287, 301-306.
Hantke, K. & Braun, V. (1973). Eur. J. Biochem. 34, 284-296.
Heiland, I., Brauer, D. & Wittmann-Liebold, B. (1976). Hoppe-Seyler's Z. Physiol. Chem. 357, 1751-1770.
Holder, A. A. & Cross, G. A. M. (1981). Mol. Biochem. Parasitol. 2, 135-150.
Johnson, J. G. & Cross, G. A. M. (1977). J. Protozool. 24, 587-591.
Johnson, J. G. & Cross, G. A. M. (1979). Biochem. J. 178, 689-697.
Kostka, V. & Carpenter, F. H. (1964). J. Biol. Chem. 239, 1799-1803.
Majumder, H. K., Boothroyd, J. C. & Weber, H. (1981). Nucl. Acids Res. 9, 4745-4753.
Matthyssens, G., Michiels, F., Hamers, R., Pays, E. & Steinert, M. (1981). Nature (London), 293, 230-233.
Offord, R. E. (1966). Nature (London), 211, 591-593.
Ozols, J. & Gerard, C. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 3725-3729.
Rice-Ficht, A. C., Chen, K. K. & Donelson, J. E. (1981). Nature (London), 294, 53-57.
Schmidt, M. F. G. & Schlesinger, M. J. (1979). Cell, 17, 813-819.
Shimada, A. & Nathenson, S. G. (1969). Biochemistry, 8, 4048-4062.
Skehel, J. J. & Waterfield, M. D. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 93-97.
Springer, T. A. & Strominger, J. L. (1976). Proc. Nat. Acad. Sci., U.S.A. 73, 2481-2485.
Tomita, M., Furthmayr, H. & Marchesi, V. T. (1978). Biochemistry, 17, 4756-4770.
Vandekerckhove, J. & Van Montagu, M. (1974). Eur. J. Biochem. 44, 279-288.
Waxman, D. J. & Strominger, J. L. (1981). J. Biol. Chem. 256, 2067-2077.