J. theor. Biol. (1986) 121, 293-306
Is There Evidence for a Common Amino Acid Sequence in Proteins with Membrane Attaching Ability?
M. J. TAYLOR,† C. J. DUGGLEBY‡ AND T. ATKINSON†
Microbial Technology Laboratory† and Molecular Genetics Laboratory,‡ P.H.L.S. Centre for Applied Microbiology and Research, Porton Down, Salisbury SP4 0JG, U.K.
(Received 6 October 1985, and in revised form 14 January 1986)
A comparison between the primary sequence of staphylococcal protein A (SpA) and a wide range of published protein sequences revealed a limited, but striking, homology in approximately two-thirds of them. The region in the SpA sequence with the common homology was identified as the octapeptide repeats comprising the cell wall peptidoglycan-binding domain. Available structural information and the known location of the proteins within their host cells suggest that this common octapeptide may be important in interaction of the protein with the cell surface (either membrane or wall).
Introduction
Staphylococcal protein A (SpA), the IgG-binding protein produced by many strains of Staphylococcus aureus, is an extremely valuable immunochemical reagent in diagnostic assays and in immunoglobulin purification (e.g. of monoclonal antibodies), and is potentially important in cancer therapy and in the treatment of immune-deficiency diseases (e.g. AIDS). Its value derives from the ability of SpA to bind the Fc region of a wide range of mammalian immunoglobulins, an interaction quite distinct from an antibody-antigen interaction (for a review on the properties and uses of SpA see Langone, 1982). Recently the genes encoding SpA from two different S. aureus strains, NCTC 8325 (Lofdahl et al., 1983) and Cowan I (Duggleby & Jones, 1983; Colbert & Anilionis, 1983), have been cloned in Escherichia coli and their nucleotide sequences determined (Colbert & Anilionis, 1983; Uhlen et al., 1984; Crocker et al., manuscript in preparation). Analysis of the protein primary structure along with previous studies on the protein itself (Lofdahl et al., 1983; Sjodahl, 1977; Goding, 1978; Deisenhofer, 1981) has shown that it has a molecular weight of approximately 42 000 and comprises two functionally distinct regions: an amino-terminal region responsible for immunoglobulin binding, and a carboxy-terminal region, termed region X, involved in binding to the peptidoglycan of the bacterial cell wall. A comparison between the primary sequence of this region X and a wide variety of other proteins suggests that an octapeptide common to them is important in the cell wall, membrane or receptor interactions of proteins.
We have recently determined the nucleotide sequence of the SpA gene from S. aureus Cowan I (Crocker et al., manuscript in preparation). Our interest here lies in the sequence of region X (SpAx), that responsible for the peptidoglycan-binding
0022-5193/86/150293+14 $03.00/0
© 1986 Academic Press Inc. (London) Ltd
property (Guss et al., 1984). The amino acid sequence reveals that this region consists of 11 repeating units, each unit consisting of an octapeptide:

GLY-ASN-LYS-PRO-GLY-LYS-GLU-ASP
 1   2   3   4   5   6   7   8

This repeating structure is also found in the SpA sequences from a similar, though from our data not identical, strain of Cowan I (Colbert & Anilionis, 1983; Uhlen et al., 1984) and from strain NCTC 8325 (Uhlen et al., 1984), although the number of repeating units varies from 10 to 12. Between the strains and within the region X of a given strain there exist differences in the sequence of the octapeptide, nearly all changes being confined to the first and second amino acids of the octapeptide (GLY and ASN in our sequence of Cowan I, GLY/ASN and ASN in NCTC 8325-4, GLY/ASN and LYS in Cowan I). The highly conserved nature of this periodic repeating unit suggests that strong evolutionary pressure exists to retain this functional sequence, and it therefore stimulated this investigation into the interactions of a large number of proteins with the cell wall or membrane.
Homology Matrices
The homology matrices shown in Figs 1-4 are derived from a comparison of pairs of amino acid sequences. A "block" of amino acids from sequence A (in this work a block of five was chosen: residues 1-5 on the Y-axis) is compared with the first block of amino acids from sequence B (again a block of five: residues 1-5 on the X-axis). The block from sequence B is then advanced one residue at a time (i.e. residues 2-6, 3-7, 4-8 etc.) and each time compared with the same block of residues from sequence A. Once all of sequence B has been scanned, the block in sequence A is advanced by one residue (now residues 2-6) and again compared with all of sequence B as described. This process is repeated until the whole of sequence A has been compared with the whole of sequence B. If, during a comparison, a pre-determined number of amino acids occur in the same position within both blocks, a point is plotted at the coordinate corresponding to the middle of both blocks. We chose 60% homology (3 or more amino acids out of 5) as the threshold level in this work. A continuous diagonal through the origin is observed when both sequences are identical (Fig. 1), and other regions of homology (internal homology in the case of a protein compared against itself) appear as short lines parallel to the diagonal. In Figs 2-4 the diagonals (representative of comparisons between identical sequences) are absent.
Evidence
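The block comparison described under Homology Matrices can be illustrated with a minimal Python sketch. This is not the program of Fristensky et al. (1982); the function name `dot_matrix`, its parameters, and the use of one-letter residue codes are assumptions made for illustration only. It uses the stated block of five and the threshold of three matching positions, and reports each hit at the 1-based residue number of the block centres.

```python
def dot_matrix(seq_a, seq_b, window=5, min_matches=3):
    """Return (x, y) points for every pair of blocks that meets the threshold.

    seq_a is plotted on the Y-axis and seq_b on the X-axis. A point is
    recorded when at least `min_matches` residues occupy the same position
    within the two blocks. Coordinates are 1-based block centres.
    (Hypothetical helper, not the published program.)
    """
    points = []
    half = window // 2
    for i in range(len(seq_a) - window + 1):       # advance the block in sequence A
        block_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):   # scan all of sequence B
            block_b = seq_b[j:j + window]
            matches = sum(a == b for a, b in zip(block_a, block_b))
            if matches >= min_matches:
                points.append((j + half + 1, i + half + 1))
    return points


# Three copies of the region X octapeptide (GLY-ASN-LYS-PRO-GLY-LYS-GLU-ASP)
# in one-letter code, compared against itself.
repeat = "GNKPGKED" * 3
points = dot_matrix(repeat, repeat)
```

Comparing the repeat with itself yields the main diagonal plus parallel lines offset by multiples of eight residues, which is the pattern the text describes for internal homology.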
Homology matrix analysis of our Cowan I SpA sequence against itself (Fig. 1), using the microcomputer programs of Fristensky et al. (1982), shows extensive homology (more than 60%) both between the IgG-binding regions (E, D, A, B, C) and between the 11 internal octapeptide repeats in the X-region. Employing the same matrix analysis we have compared the SpA amino acid sequence with those of 96
[Figure 1: dot matrix plot of S. aureus protein A against itself. Axis ticks at residues 1, 57, 118, 176, 254, 292, 391 and 473. Annotated regions: region S (56 amino acid signal peptide); regions E, D, A, B and C (Ig-binding); region Xr (contains octapeptide repeats; binds S. aureus peptidoglycan); region Xc, including a hydrophobic membrane-spanning domain marked near residues 444-464.]
FIG. 1. Internal homology of Cowan I SpA. The dot matrix analysis at 60% homology was done using the program described by Fristensky et al. (1982). The complete primary sequence of Cowan I SpA is represented on both axes. Blocks of 5 amino acids of the SpA sequence (horizontal axis) were sequentially compared with segments of the SpA sequence (vertical axis) and homologies between them were stored using a matrix. In the above plot a point was plotted when 3 or more amino acids matched within a span of 5.
proteins of which 63 are believed to be membrane or cell wall associated or to have a receptor function. The remaining 33 proteins are generally considered to be cytoplasmic and not membrane-interactive. The purpose of this work was to investigate whether sequences similar to the region X octapeptide occurred in other proteins with similar wall or membrane attachment properties, or indeed in proteins believed to have a receptor activity. The repeating structure of region X makes the presence of a region (or regions) homologous to SpAx distinctive against the background, and the absence of any homology to SpAx readily apparent. In this survey we found 100% correlation between the nature of the protein and its homology with region X. That is, those proteins with membrane or cell wall association showed homology, whereas no homology was evident within reported cytoplasmic proteins. Below we detail specific cases of such homology in an attempt to correlate it with the available structural information.
In our first example, where we examined the homology between the SpA amino acid sequence and that of an E. coli sensory transducer protein (the Tar gene product) (Krikos et al., 1983), we found three regions of homology with SpAx occurring approximately around amino acid residues 61, 404 and 489 (marked by
[Figure 2: dot matrix plot with S. aureus protein A on the horizontal axis (ticks at residues 1, 57, 118, 176, 254, 292, 391 and 473) and the E. coli sensory transducer protein on the vertical axis. Annotations on the vertical bar: periplasmic domain (periplasmic surface); hydrophobic membrane-spanning domain (residues 189-214); signalling domain containing two regions labelled highly conserved, with residue labels 405, 447, 461 and 520.]
FIG. 2. A dot matrix analysis of homology (at 60%) between SpA and E. coli sensory transducer protein. The complete primary sequence of Cowan I SpA is represented on the horizontal axis (numbered bar corresponds to annotation in Fig. 1) and the sequence of the sensory transducer protein on the vertical axis. The annotated vertical bar on the right-hand axis summarizes available structural information on the E. coli protein. The plot was constructed as in Fig. 1. Arrows denote regions of homology with SpAx.
arrows in Fig. 2). The locations of these "sites" within the protein were compared with the known structure and function of the protein. The Tar gene product acts as a chemoreceptor, monitoring the chemical environment and affecting the frequency of reversal of the bacterial flagellar motion. It is an integral membrane protein with the amino-terminal 190 amino acids assembled at the periplasmic surface, a hydrophobic region around residue 200 locating it within the lipid bilayer, and its carboxy-terminus folded on the cytoplasmic side of the membrane (Krikos et al., 1983). Figure 2 shows that a site of homology with SpAx occurs after the signal peptide (within the periplasmic domain) and is thus proposed here as the means of locating the chemoreceptor on the periplasmic membrane surface. Our results show that two sites of homology with SpAx exist in the cytoplasmic domain of the protein. Previous workers have suggested that the protein must possess a means of signalling changes to the flagellar motors and presumably, therefore, a means also of receiving signals. The fact that the site of homology with SpAx around residue 404 is highly conserved throughout several E. coli transducers (e.g. the Tap and Tsr gene products) (Krikos et al., 1983) suggests that this region may be involved in transmitting a signal to the flagellar motors, since both the signal and signal-receptor must remain compatible.
In our second example we looked for homology with a classic case of one of the most common families of proteins exhibiting cell-receptor function: those associated with the outer coat proteins of mammalian viruses. Virus uptake occurs at specialized regions of the plasma membrane called "coated pits" (Helenius et al., 1980; Bretscher & Pearse, 1984), which are plasma membrane invaginations with a characteristic electron-dense layer on the cytoplasmic side. A major component of the coat material of these pits is clathrin. The virus initially binds to the target cell at these pits by means of its surface glycoproteins. The haemagglutinin spikes of influenza virus are known to be important in this attachment to the target cell (Air, 1981; Krystal et al., 1982; Vaerhoeyen et al., 1983; Van Rompuy et al., 1983; Knossow et al., 1984). We found that human type B influenza virus haemagglutinin had a large number of sites exhibiting homology with SpAx (Fig. 3). Structural information on this protein (Air, 1981; Krystal et al., 1982; Vaerhoeyen et al., 1983; Van Rompuy et al., 1983;
[Figure 3: dot matrix plot with S. aureus protein A on the horizontal axis (ticks at residues 1, 57, 118, 176, 254, 292, 391 and 473) and human influenza virus haemagglutinin on the vertical axis. Annotations on the vertical bar: 15 amino acid signal peptide; HA1, containing the antigenic determinant (high percentage substitution) and a conserved region; HA2; membrane-spanning domain.]
FIG. 3. A dot matrix analysis of homology (at 60%) between SpA and human influenza virus haemagglutinin. The complete primary sequence of Cowan I SpA is represented on the horizontal axis (numbered bar corresponds to annotation in Fig. 1) and the sequence of the haemagglutinin on the vertical axis. The annotated vertical bar on the right-hand axis summarizes available structural information on the haemagglutinin. The plot was constructed as in Fig. 1.
Knossow et al., 1984) suggests that the antigenic site occurs at residues 127-137, an area distinct from the proposed cell receptor site (Fig. 3). Approximately 25% of all substitutions occur within the antigenic site, the remainder of the protein being highly conserved, with only 6.2% amino acid changes in 39 years. This sequence conservation has been noted in the region of several proteins proposed to carry the cell membrane receptor site (Air, 1981; Krystal et al., 1982; Van Rompuy et al., 1983). If the proposal that the haemagglutinin carries the viral receptor is correct, then further corroborating evidence is provided by the fact that only antibodies directed against influenza virus haemagglutinin (and not cell surface neuraminidase) antigens are able to prevent infection (Vaerhoeyen et al., 1983).
[Figure 4: dot matrix plot with S. aureus protein A on the horizontal axis (ticks at residues 1, 57, 118, 176, 254, 292, 391 and 473) and the human EGF receptor on the vertical axis. Annotations on the vertical bar: 24 amino acid signal peptide; extracellular domain (contains human epidermal growth factor binding sites); α-helix spanning the membrane (residues 619-644); cytoplasmic domain containing tyrosine-specific protein kinase activity (residues 694-940); autophosphorylation sites; carboxy-terminus at residue 1186.]
FIG. 4. A dot matrix analysis of homology (at 60%) between SpA and human epidermal growth factor (EGF) receptor. The complete primary sequence of Cowan I SpA is represented on the horizontal axis (numbered bar corresponds to annotation in Fig. 1) and the sequence of the EGF receptor protein on the vertical axis. The annotated vertical bar on the right-hand axis summarizes available structural information on the receptor protein. The plot was constructed as in Fig. 1.
Our last example shows the homology of human epidermal growth factor (EGF) receptor protein (Ullrich et al., 1984) with SpAx (Fig. 4). EGF binds to the surface of target cells and triggers an intracellular chain of events resulting in the induction of tyrosine kinase activity intrinsic to the EGF-receptor. Shortly after binding, EGF-receptor complexes are localized in clathrin-coated regions of the plasma membrane, before being internalized by the cell. This sequence of events finally results in the stimulation of DNA synthesis and cellular proliferation (Ullrich et al., 1984; Xu et al., 1984; Downward et al., 1984b; Hunter et al., 1984). A related protein, avian erythroblastosis virus (AEV) v-erb-B transforming protein (Ullrich et al., 1984; Xu et al., 1984; Downward et al., 1984a, b; Privalsky et al., 1984; Yamamoto et al., 1983), lacks most of the extracellular domain responsible for EGF receptor binding but is similar over a 376 residue long core region beginning at the cytoplasmic junction with the transmembrane domain in both sequences (beginning at approximately residue 576 in EGF-receptor protein) (Ullrich et al., 1984).
TABLE 1
A list of the proteins whose primary sequence was analyzed for homology with Cowan I region X, the peptidoglycan-binding region of the protein. In the list of proteins showing homology with SpAx those marked "†" are proteins which are thought to be cell wall or membrane locating or to have receptor function. The proteins marked "‡" show homology to SpAx but are not thought to have any of the above functions. In the section of proteins found not to have homology with SpAx those marked "†" are not thought to be cell wall or membrane locating or to have receptor function. All unmarked proteins are those about which no relevant information could be found.

Proteins possessing SpA cell-attachment site

Outer membrane proteins
  † Coliphage lambda attachment site (LamB)
  † Enterobacter aerogenes OmpA
  † Escherichia coli OmpA (porin)
  † E. coli OmpC (porin)
  † E. coli OmpF
  † E. coli phosphate limitation inducible outer membrane pore protein (PhoE)
  † Serratia marcescens OmpA
  † Salmonella typhimurium OmpA
  † Shigella dysenteriae OmpA

Toxins
  † Clostridium perfringens type A toxin
  † Corynebacterium diphtheriae toxin fragment B (CB4, N-terminus 1-44 a.a.'s)
  † E. coli heat labile toxin A & B
  † Ricin

Mammalian proteins
(a) Human
  † Complement factor B
  † Epidermal growth factor receptor
  † Growth hormone
  † Immunoglobulin E
  † Interleukin 2 receptor
  † Luteinizing hormone-releasing hormone
  † Lysozyme
    Rennin
  † Rhodopsin
  † Serum albumin
  † Transferrin receptor
(b) Other
  † Bovine fibronectin-collagen binding domain
  † Bovine heart cytochrome c oxidase subunit V
    Bovine lactalbumin
  † Chicken lysozyme
  † Duck lysozyme
  † Guinea-pig lactalbumin
  † Hamster 3-hydroxy-3-methyl-glutaryl coenzyme A reductase
    Porcine kidney D-amino acid oxidase
  † Porcine liver microsomal cytochrome b5
    Sheep's liver sorbitol dehydrogenase

Viral proteins
  † Avian erythroblastosis virus v-erb-B transforming protein
  † Epstein-Barr virus 93 k & 34 k
  † Gene III of filamentous bacteriophage f1
  † Hepatitis B surface antigen
  † Herpes simplex virus
  † Human type B influenza virus haemagglutinin
  † Immediate early antigen of human cytomegalovirus

Miscellaneous bacterial proteins
    Azotobacter vinelandii 7 Fe ferredoxin
  † Bacillus cereus β-lactamase 1 presecretory protein
  † B. licheniformis β-lactamase
  ‡ B. stearothermophilus D-glyceraldehyde-3-phosphate dehydrogenase
  ‡ B. stearothermophilus triosephosphate isomerase
  † E. coli alkaline phosphatase
  † E. coli ATP synthase Fo complex subunits A & B
  † E. coli fumarase
  † E. coli NADH dehydrogenase
  † E. coli K12 pyruvate dehydrogenase complex
  † E. coli TAR sensory transducer protein
  † E. coli TonB gene product
  † Pseudomonas putida cytochrome P
    Ps. putida catechol-2,3-dioxygenase
  † Staphylococcus aureus protein A
  † Staphylococcus aureus staphylokinase
  † Streptococcal protein M
  † Transposon 903 aminoglycosidase

Other protein
  † Saccharomyces cerevisiae β-D-fructofuranoside fructohydrolase (invertase)

Proteins not possessing SpA cell-attachment site

Toxins
  † Corynebacterium diphtheriae toxin fragment B (CB1, CB2, CB3, CB5)
    E. coli heat stable toxins I & IA
    Staphylococcal-toxin

Mammalian proteins
(a) Human
    Apoprotein E
    Lactalbumin
  † Retinol binding protein
    Somatostatin 1 (including propeptide)
(b) Other
  † Bovine heart mitochondria oligomycin sensitivity-conferring protein
  † Bovine protein C light & heavy chains
  † Chicken ovalbumin
    Chicken avidin
    Japanese quail egg lysozyme

Viral proteins
  † Bacteriophage T4/T2 lysozyme
    Gene X of filamentous bacteriophage f1
  † Vesicular stomatitis virus G protein (NH2 fragment only)

Miscellaneous bacterial proteins
  † B. licheniformis α-amylase
  † B. subtilis thymidine synthase (ThyP3)
  † E. coli cyanate hydrolase (cyanase)
    E. coli pyruvate dehydrogenase complex gene A
  † E. coli ATP synthase Fo complex subunit C
  † E. coli ATP synthase F1 complex α- & δ-subunits
  † E. coli thymidine synthase (ThyA)
  † E. coli transposon 5 (Tn5)
  † Pseudomonas fluorescens p-hydroxybenzoate hydroxylase
  † S. typhimurium His operon P-protein

Other proteins
    Bee melittin
  † Saccharomyces cerevisiae cytochrome c oxidase subunit VI
  † S. cerevisiae cytochrome c peroxidase precursor
  † S. cerevisiae Pho5 gene product
Homology with SpAx is apparent around residue 606 in the EGF-receptor, and such homology was found in both the AEV and the EGF-receptor proteins. The known structural regions responsible for specific functions in the EGF-receptor protein are annotated in Fig. 4. The amino-terminal hydrophobic signal sequence of 24 amino acids, responsible for transporting the protein across the lipid bilayer, shows no SpAx homology. The extracellular domain, encompassing the region approximately up to residue 619 and reported to contain the EGF-receptor binding site(s) (Ullrich et al., 1984; Xu et al., 1984; Downward et al., 1984a, b; Privalsky et al., 1984; Yamamoto et al., 1983), shows four sites of homology with SpAx, around residues 102, 296, 446 and 607. Residues 619-644, reported as spanning the lipid bilayer, show no homology with SpAx. In accordance with our proposal no homology with SpAx is observed within the cytoplasmic domain (residues 646-1186) containing the tyrosine-specific protein kinase region (amino acids 694-940). However, a site of homology around residue 1036 may indicate a membrane-binding site proposed to occur in this region from structural data. Such homology is absent in the AEV protein, confirming the reports that this oncogenic protein lacks most of the receptor sites present in the cytoplasmic domain of the EGF-receptor protein. The area of SpAx homology common to both EGF-receptor and the v-erb-B protein also appears in a series of related oncogenic proteins from sarcoma viruses (v-src, v-yes, v-fes, v-fps, v-mos), in bovine protein kinase, and in several protein kinases thought to be involved in the processing of precursors to polypeptide hormones and growth factors
TABLE 2
Analysis of regions of homology. This table shows the relative frequency of occurrence (as a percentage) of residues in the regions of the 63 proteins that have homology with SpAx. The columns represent the position relative to the Pro-Gly doublet in SpAx (-6 to -1, Pro, Gly, +1 to +6), and the upper row the observed frequency of each residue at that position. Where there is a significant occurrence, fo/fe > 1 (see text), this value is shown underneath. At the bottom of the table, underneath what might be regarded as a significance frequency profile, examples of individual regions of homology are given (human EGFR, OTC'ase, Ps. cytochrome P450, E. coli Fo complex A, haemagglutinin and Tar receptor), together with the most frequently occurring residues and SpA region X (+ variations). Underlined residues indicate an exact match with SpAx, whereas broken underscoring indicates an acceptable substitution according to Shotton & Hartley (1970).
[Table body: one row per amino acid (Ala, Cys, Asp, Glu, Phe, Gly, His, Ile, Lys, Leu, Met, Asn, Pro, Gln, Arg, Ser, Thr, Val, Trp, Tyr) and one column per position; the individual numerical entries are not recoverable here.]
(Ullrich et al., 1984; Privalsky et al., 1984; Yamamoto et al., 1983; Lundgren et al., 1984). During the course of this investigation only two known anomalies were found in 78 known proteins: the Bacillus stearothermophilus proteins triosephosphate isomerase and D-glyceraldehyde-3-phosphate dehydrogenase (Table 1). Both enzymes showed multiple sites of homology with SpAx. Neither enzyme has been reported to be membrane-associated, but it is possible that association with membranes within the cell confers increased thermostability upon the enzymes; this has been observed in some immobilized systems, and thermophilic bacteria are often highly membranous in structure (Gibson, 1982). It was also possible to predict that gene III from bacteriophage f1 is responsible for locating it within the bacterial membrane. This homology site is in agreement with the previous prediction that one or more of the f1 genes is responsible for its membrane location (Beck & Zink, 1981). Similarly, the immediate early antigen of human cytomegalovirus shows four regions of homology with SpAx, thus suggesting that this antigen is membrane associated. Preliminary results (Dr G. Farrar, personal communication) indicate that this antigen may be associated with the nuclear membrane of the cytomegalovirus.
Discussion
The frequency of occurrence of the amino acids in 63 sequences observed to have significant (>60%) homology with SpA region X is shown in Table 2. The data are derived from the frequency of occurrence (fo) of specific amino acids in the positions from -6 (N-terminal) through the common Pro-Gly to +6 (C-terminal). From this, fo/fe represents the ratio of observed amino acid frequency in that position to expected amino acid distribution frequency (Hunt et al., 1978), where for unbiased occurrence fo/fe = 1. Although weighted by the 60% homology requirement, the occurrence of amino acids in the positions from -6 to +6 relative to the striking Pro-Gly dipeptide is significant. Analysis of the residues N-terminal to position -6 reveals one of two types of sequence: either it displays a partial repeat of the X-region octapeptide or it has an abnormally high occurrence of Met, Ser, Tyr, Trp or Asn (fo/fe = 2.2-9.3). Where the partial repeat is seen, the latter residues again predominate N-terminal to that partial repeat. Analysis of the residues beyond +6, C-terminal to the Pro-Gly, shows similar features. In addition, amino acids can be identified which never occur in specific positions from -6 to +6 relative to the predominant Pro-Gly, notably His, Trp and Met. Two other points are worth considering with respect to the octapeptide sequence. Firstly, because the sequence is multiply repeated it is not clear where a cell membrane/wall recognition sequence commences and finishes within the sequence. Secondly, the sequence around the Pro-Gly does bear a resemblance to the frequency hierarchy of amino acid residues in the β-turn of proteins (Chou & Fasman, 1978), an observation not inconsistent with the possible role of such a sequence acting as a recognition signal for membrane attachment.
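The fo/fe ratio defined above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the function `position_ratios`, the toy aligned windows, and the expected frequencies in `EXPECTED` are all hypothetical placeholders (they are not the values of Hunt et al., 1978). Each window is assumed to be already aligned on the Pro-Gly doublet.

```python
from collections import Counter


def position_ratios(aligned_windows, expected):
    """For each column, map residue -> (fo as a percentage, fo/fe).

    aligned_windows: equal-length strings in one-letter code, aligned on Pro-Gly.
    expected: residue -> expected frequency (percentage) for unbiased occurrence.
    (Hypothetical helper for illustration.)
    """
    n = len(aligned_windows)
    width = len(aligned_windows[0])
    table = []
    for col in range(width):
        counts = Counter(window[col] for window in aligned_windows)
        column = {}
        for residue, count in counts.items():
            fo = 100.0 * count / n                 # observed frequency, per cent
            column[residue] = (fo, fo / expected[residue])
        table.append(column)
    return table


# Toy data: positions -2, -1, Pro, Gly, +1, +2 for four invented windows.
windows = ["NKPGKE", "NKPGKE", "SKPGTE", "NRPGKD"]
# Illustrative expected frequencies only (assumed, not from the source).
EXPECTED = {"N": 4.0, "K": 7.0, "P": 5.0, "G": 8.0, "E": 6.0,
            "S": 7.0, "T": 6.0, "R": 5.0, "D": 5.0}
table = position_ratios(windows, EXPECTED)
```

A ratio well above 1 in a column marks a residue that is over-represented at that position, which is how the significant entries of Table 2 are flagged.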
In summary, we have shown from a detailed analysis of 96 proteins (Table 1) that of 63 exhibiting significant homology with the X region of SpA (at the level of 3 amino acids in 5), 54 are believed to have cell wall or membrane binding capacity or receptor function, as illustrated in the previous examples; whilst of 33 proteins showing no homology with SpA region X, 22 are believed to be cytoplasmic and not to interact with membrane or cell wall systems. The remaining proteins are those about which no structural information could be found to determine their cellular location (Table 1). This clearly indicates that the conservation of the SpA X region is not only crucial to the cell wall attachment of this molecule in Staphylococcus aureus, but that this fundamental primary sequence has been highly conserved throughout evolution for those proteins which have an association with cell walls, membranes or receptors. Analysis of the data from the SpAx region and those proteins with known cell wall or membrane attachment or receptor function indicates the evolutionary conservation of the hexapeptide ASP-GLY-ASN-LYS-PRO-GLY. Since it is often believed that the physical binding of a molecule to the cell wall/membrane involves a large hydrophobic interaction, it is highly probable that this predominantly hydrophilic and conserved sequence is not necessarily a primary binding sequence but, rather, a lock-key mechanism to locate the protein to a specific site on the wall or membrane prior to further hydrophobic interaction or binding.
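The counts quoted in this summary can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and it assumes that the two B. stearothermophilus anomalies are counted among the 63 homologous proteins, as Table 1 indicates.

```python
# Figures quoted in the text.
total = 96
homologous = 63
homologous_membrane = 54         # wall/membrane binding or receptor function
non_homologous = 33
non_homologous_cytoplasmic = 22
anomalies = 2                    # homologous but not known to be membrane-associated

# Proteins of known location: agrees with the "78 known proteins" in the text.
known = homologous_membrane + anomalies + non_homologous_cytoplasmic

# Proteins for which no location information was found, by difference.
homologous_unknown = homologous - homologous_membrane - anomalies
non_homologous_unknown = non_homologous - non_homologous_cytoplasmic

print(homologous + non_homologous == total)    # the two groups partition the survey
print(known, homologous_unknown, non_homologous_unknown)
```

On these assumptions the 18 proteins without location information split into 7 that show homology and 11 that do not.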
Note Added During Revision of the Manuscript
Satake, Coligan, Elango, Norrby and Venkatesan (Nucleic Acids Research 13, 7795-7811, 1985) have recently published the sequence and novel structure of the respiratory syncytial virus envelope glycoprotein which binds to human cell membranes. The authors propose that the 23 amino acid hydrophobic domain located 41 residues from the N-terminus is responsible for membrane insertion. We have noted an extremely high (>80%) degree of homology between SpAx and a sequence in the middle of the molecule (residues 191 to 233) comprising a tetrapeptide/octapeptide repeat. The relevant sequence is shown below for comparison:

ASN-LYS-LYS-PRO-GLY-LYS-LYS-THR-THR-THR-LYS-PRO-THR-LYS-
LYS-PRO-THR-LEU-LYS-THR-THR-LYS-LYS-ASP-PRO-LYS-PRO-GLN-
THR-THR-LYS-SER-LYS-GLU-VAL-PRO-THR-THR-LYS-PRO-THR-
GLU-GLU

REFERENCES
AIR, G. M. (1981). Proc. natn. Acad. Sci. U.S.A. 78, 7639.
BECK, E. & ZINK, B. (1981). Gene 16, 35.
BRETSCHER, M. S. & PEARSE, B. M. F. (1984). Cell 38, 3.
CHOU, P. Y. & FASMAN, G. D. (1978). Adv. Enzymol. 47, 45.
COLBERT, D. A. & ANILIONIS, A. (inventors) (1983). European Patent Publication no. EP 0 107 509 A2. Repligen Corporation.
DEISENHOFER, J. (1981). Biochemistry 20, 2361.
DOWNWARD, J., YARDEN, Y., MAYES, E., SCRACE, G., TOTTY, N., STOCKWELL, P., ULLRICH, A., SCHLESSINGER, J. & WATERFIELD, M. D. (1984a). Nature 307, 521.
DOWNWARD, J., PARKER, P. & WATERFIELD, M. D. (1984b). Nature 311, 483.
DUGGLEBY, C. J. & JONES, S. A. (1983). Nucleic Acids Res. 11, 3065.
FRISTENSKY, B., LIS, J. & WU, R. (1982). Nucleic Acids Res. 10, 6451.
GIBSON, F. (1982). Biochem. Soc. Trans. 11, 229.
GODING, J. W. (1978). J. Immunol. Meth. 20, 241.
GUSS, B., UHLEN, M., NILSSON, B., LINDBERG, M., SJOQUIST, J. & SJODAHL, J. (1984). Eur. J. Biochem. 138, 413.
HELENIUS, A., MARSH, M. & WHITE, J. (1980). TIBS 5, 104.
HUNT, L. T., BARKER, W. C. & DAYHOFF, M. O. (1978). In Atlas of protein sequence and structure 5, (Supp. 3), 25.
HUNTER, T., LING, N. & COOPER, J. A. (1984). Nature 311, 480.
KNOSSOW, M., DANIELS, R. S., DOUGLAS, A. R., SKEHEL, J. J. & WILEY, D. C. (1984). Nature 311, 678.
KRIKOS, A., MUTOH, N., BOYD, A. & SIMON, M. I. (1983). Cell 33, 615.
KRYSTAL, M., ELLIOT, R. M., BENZ, E. W. JR, YOUNG, J. F. & PALESE, P. (1982). Proc. natn. Acad. Sci. U.S.A. 79, 4800.
LANGONE, J. J. (1982). Adv. Immunol. 32, 157.
LOFDAHL, S., GUSS, B., UHLEN, M., PHILIPSON, L. & LINDBERG, M. (1983). Proc. natn. Acad. Sci. U.S.A. 80, 697.
LUNDGREN, S., RONNE, H., RASK, L. & PETERSON, P. A. (1984). J. biol. Chem. 259, 7780.
PRIVALSKY, M. L., RALSTON, R. & BISHOP, J. M. (1984). Proc. natn. Acad. Sci. U.S.A. 81, 704.
SHOTTON, D. M. & HARTLEY, B. S. (1970). Nature 225, 802.
SJODAHL, J. (1977). Eur. J. Biochem. 78, 471.
UHLEN, M., GUSS, B., NILSSON, B., GOTZ, F. & LINDBERG, M. (1984). J. Bacteriol. 159, 713.
ULLRICH, A., COUSSENS, L., HAYFLICK, J. S., DULL, T. J., GRAY, A., TAM, A. W., LEE, J., YARDEN, Y., LIBERMANN, T. A., SCHLESSINGER, J., DOWNWARD, J., MAYES, E. L. V., WHITTLE, N., WATERFIELD, M. D. & SEEBURG, P. H. (1984). Nature 309, 418.
VAERHOEYEN, M., VAN ROMPUY, L., MIN JOU, W., HUYLEBROECK, D. & FIERS, W. (1983). Nucleic Acids Res. 11, 4703.
VAN ROMPUY, L., MIN JOU, M., VERHOEYEN, D., HUYLEBROECK, D. & FIERS, W. (1983). TIBS 8, 414.
XU, Y-H., ISHII, S., CLARK, A. J. L., SULLIVAN, M., WILSON, R. K., MA, D. P., ROE, B. A., MERLINO, G. T. & PASTAN, I. (1984). Nature 309, 806.
YAMAMOTO, T., NISHIDA, T., MIYAJIMA, N., KAWAI, S., OOI, T. & TOYOSHIMA, K. (1983). Cell 35, 71.