Comp. Biochem. Physiol. Vol. 77B, No. 2, pp. 399-412, 1984
0305-0491/84 $3.00 + 0.00 Pergamon Press Ltd
Printed in Great Britain
A COMPARISON OF THE AMINO ACID SEQUENCES OF DIFFERENT CLASSES OF IMMUNOGLOBULIN AND HISTOCOMPATIBILITY C-DOMAINS FROM DIFFERENT MAMMALIAN SPECIES: CORRELATIONS BETWEEN CONSERVANCY AND THREE DIMENSIONAL STRUCTURE
D. BEALE Agricultural Research Council, Institute of Animal Physiology, Babraham, Cambridge, UK (Received 22 June 1983)
Abstract--1. A comparison has been made of the amino acid sequences of 77 different immunoglobulin C domains and 22 histocompatibility C-like domains from different species and classes. Conserved positions have been correlated with structural features observed on the high resolution X-ray crystallographic models of human immunoglobulin fragments.
2. The only invariant residues are two cysteines which form an intra-domain disulphide bridge.
3. There are six positions which always contain amino acid residues with hydrophobic side chains. On the crystallographic models these side chains point to the domain interior and make essential contributions to its hydrophobic nature.
4. There are four other positions which always contain amino acid residues with uncharged and generally hydrophobic side chains. On the crystallographic models these side chains also point towards the domain interior but tend to lie at the periphery of the β-sheets. As such they can contribute towards the hydrophobic interior of the domain but need not necessarily do so.
5. There are also four positions which are uncharged and generally hydrophobic except for a single occurrence of a charged residue in each case. These exceptions could be erroneous and need confirmation. On the crystallographic models the side chains at these positions point to the domain interior but tend to lie at the periphery of the β-sheets.
6. Immunoglobulin C-domains and histocompatibility C-like domains show small but characteristic divergences from each other. The different types of C-domain also show characteristic divergences from each other as do the different types of histocompatibility C-like domains. A particular type of domain shows less divergence in different species than different types of domain do in the same species.
7. The relationship of immunoglobulin C-domains and histocompatibility C-like domains to immunoglobulin V domains, other histocompatibility domains, Thy-1 antigen and superoxide dismutase is discussed.
8. Lateral and longitudinal contacts between immunoglobulin domains are almost entirely conserved regardless of species or class. Similar contacts could exist between class II histocompatibility α2 and β2 domains but a somewhat modified type of contact might occur between β2-microglobulin and class I histocompatibility α3 domain.
INTRODUCTION
The structures of immunoglobulins are based on a subunit consisting of two light (L) polypeptide chains of mol. wt ca. 22,500 and two heavy (H) polypeptide chains of mol. wt 45,000-75,000. The four chains are linked by disulphide bridges but there are also strong non-covalent interactions between them. There are two types of L-chain (κ and λ) common to all immunoglobulins and several different H-chains (γ1, γ2, γ3, γ4, α1, α2, μ, δ and ε) which determine the class and subclass of the immunoglobulin (IgG1, IgG2, IgG3, IgG4, IgA1, IgA2, IgM, IgD and IgE). Proteolytic enzymes can cleave immunoglobulins into fragments such as Fab (containing an L-chain and the N-terminal half of the H-chain) and Fc (containing the C-terminal half of the H-chain) (see review by Beale and Feinstein, 1976).

X-ray crystallographic analysis of a human L-chain dimer (Schiffer et al., 1973), human (IgG) Fab (Poljak et al., 1973), and Fc fragments (Deisenhofer et al., 1976), a mouse (IgA) Fab fragment (Segal et al., 1974) and several human IgG proteins (see reviews by Davies et al., 1975, Amzel and Poljak, 1979) revealed that the peptide chains are folded into conformationally related globular domains, confirming earlier predictions by Edelman and Gall (1969). The basic structure of a domain was found to consist of two sheets of antiparallel β-structure joined by an intra-domain disulphide bridge. Examination of the amino acid sequences of other species and classes of immunoglobulins showed that all their chains are probably folded into basically similar domains (reviewed by Beale and Feinstein, 1976). L-chains have two domains of which the N-terminal one (VL) has a variable amino acid sequence and contributes towards the antigen binding site. The C-terminal domain (CL) has a constant amino acid sequence within each type of L-chain. The N-terminal domain (VH) of all H-chains also has a variable amino acid sequence and also contributes towards the antigen binding site. Each H-chain has 3-4 other domains (CH1, CH2, CH3, CH4) having a constant amino acid sequence within each class and subclass. Studies on genes coding for mouse IgG1, IgG2a and IgG2b (Sakano et al., 1979; Yamawaki-Kataoka et al., 1980; Ollo et al., 1981) showed that the domains of a chain are encoded by separate DNA segments (exons) interspersed with non-coding segments (introns). This also applies to the mouse μ-chain gene (Gough et al., 1980) and human ε-gene (Max et al., 1982; Kenten et al., 1982).

The structures of histocompatibility antigens (see reviews by Strominger et al., 1981; Shackleford et al., 1982) are also based on peptide chains folded into domains but in this case some of the domains have no apparent relationship. Class I antigens consist of a light chain (β2-microglobulin) of 12,000 mol. wt non-covalently associated with a heavy chain (α) of 44,000 mol. wt. β2-Microglobulin has a constant amino acid sequence within a species and is closely related to immunoglobulin C-domains (Cunningham, 1974). The heavy chain probably has three domains (α1, α2, α3). The C-terminal one is closely related to immunoglobulin C-domains but the other two appear to have little or no relationship (Orr et al., 1979; Coligan et al., 1981; Tragardh et al., 1980). Class II antigens consist of a 34,000 mol. wt heavy chain (α) and a 29,000 mol. wt light chain (β). Each chain has two domains, the C-terminal ones (α2, β2) being closely related to immunoglobulin C-domains whereas the N-terminal domains (α1, β1) show no such relationship (Kratzin et al., 1981; Larhammar et al., 1982a,b; Korman et al., 1982; Lee et al., 1982; Auffrey et al., 1982; Kaufman and Strominger, 1982, 1983; Benoist et al., 1983). Structural studies on genes coding for human class I and II antigens (Malissen et al., 1982; Korman et al., 1982; Lee et al., 1982; Benoist et al., 1983) showed that domains are encoded by separate exons interspersed by introns.

The present paper examines the available amino acid sequences of different species and classes of immunoglobulins and histocompatibility antigens in an attempt to assess the structural features that are most probably required by C domains and C-like domains as well as factors which might determine the type of domain (CH1, CH2 etc.). Immunoglobulin V-domains, although conformationally related to C-domains, have diverged in specialised ways (Edmundson et al., 1975; Poljak et al., 1974) and are not included in the present comparison. The domains of histocompatibility antigens that appear to have little or no relationship to immunoglobulin C domains are also excluded. The same applies to Thy-1 antigen, which is related to immunoglobulins (Campbell et al., 1981) but is more V-like than C-like (Moriuchi et al., 1983). However, all these domains are included in the discussion.
METHODS
Amino acid sequences, not all complete, were taken from Kabat et al. (1983). These sequences have been determined from the proteins and/or by translation of nucleotide sequences. In many cases more than one sequence is available for a particular domain. These are usually identical or differ in one or two unimportant positions. Only one of these sequences is used in Table 2. The rare cases of important discrepancies amongst sequences will be mentioned in the text. Amino acid sequences for mouse class II α2 domains were obtained by Benoist et al. (1983) by translation of nucleotide sequences.

A total of 77 immunoglobulin domains has been used consisting of CL domains from human κ and λ chains; mouse κ, λ1, λ2 and λ3; rabbit λ and κ chains (including the B4, B5, B6 and B9 allotypes); rat κ chain and pig λ; CH domains from human IgG1, IgG2, IgG3, IgG4, IgA1, IgA2, IgD, IgM and IgE; mouse IgG1, IgG2a, IgG2b, IgA, IgM, IgD and IgE; rat IgE, rabbit IgG, dog IgM and guinea pig IgG1 and IgG2. A comparison of 22 histocompatibility C-like domains has been made including class I α3 domains from human HLA-B7, -A2 and -A28 antigens, a human HLA-A,B,C pool and a human HLA clone; three mouse H2 antigens and three mouse H2 clones of undetermined specificity; β2-microglobulin from man, mouse, rabbit, cattle and guinea pig; class II α2 domains from two human I antigens and two mouse I antigens and class II β2 domains from two human I antigens.

In order to compare amino acid sequences they were aligned as described by Beale and Feinstein (1976) so that they could be related to X-ray crystallographic results by the method of Poljak et al. (1974). The comparison involved the use of crystallographic models built by Mrs Mary Saunders from co-ordinates kindly provided by Dr R. Poljak and Dr Deisenhofer. Predictions for β-structure were made using the methods of Lim (1974) and Chou and Fasman (1978). Information about the orientation of side chains and contact residues was also obtained from Poljak et al. (1974); Edmundson et al. (1975); Saul et al. (1978); Deisenhofer (1981).

RESULTS
General comments
In Table 1 the amino acid sequences of the Cλ and Cγ1 domains of Fab (New) are aligned with each other and with the Cγ2 and Cγ3 domains of Fc (Kol); every tenth residue is numbered according to the X-ray crystallographers (Poljak et al., 1974; Deisenhofer et al., 1981). This means that residues falling into the same column in Table 1 have different sequence numbers. Hence column numbers will be used in the text. For example the first invariant cysteine residues have sequence numbers 136, 144, 261 and 367 but column number 32. In some columns such as 77 and 78 the Fab and Fc domains do not record a residue. This is because some domains such as Cμ2 have extra residues relative to CL, Cγ1, Cγ2 and Cγ3 domains. The amino acid residues forming segments of β-structure in the crystallographic models (Poljak et al., 1974; Deisenhofer et al., 1981) are underlined and those whose side chains point to the interior of the domain are indicated by small circles below the sequences. Segments are labelled according to Beale and Feinstein (1976). The amino acid sequence of histocompatibility antigen HLA-B7 is aligned for convenience although no three dimensional structure is yet available. All available amino acid sequences of immunoglobulin C-domains and histocompatibility C-like domains were aligned in accordance with Table 1 and columns showing a marked tendency to conserve particular types of amino acid residues are given in Table 2. A single letter code for amino acids based on that introduced by Dayhoff and Eck (1968) is used. A number placed after a letter indicates the number of recordings of that residue.

The amino acid sequences are compared in groups with increasing degrees of domain specialization. The first group consists of all available immunoglobulin C-domains and histocompatibility C-like domains (a total of 99). The second group consists of all immunoglobulin C-domains (a total of 77) and the third group of all histocompatibility C-like domains (a total of 22). Groups 4-8 deal with the different immunoglobulin C-domains: CL (13), CH1 (19), C-extra (5), CH2 (19) and CH3 (21). It should be remembered that C-extra refers to Cμ2 and Cε2 domains; that Cμ3 and Cε3 are grouped with CH2 and that Cμ4 and Cε4 are grouped with CH3. Groups 9-11 deal with β2-microglobulin (5), Class I α3 domains (11) and Class II α2 domains (4) and β2 domains (2). When there are four or fewer sequences in a group only columns recording identical residues are shown.

It is a common occurrence that during the amino acid sequencing of several hundred residues a few errors will arise. Therefore when sequences are compared a unique exception to conservancy points to a possible error and requires confirmation. Errors in nucleotide sequencing are very rare but there is the danger that a pseudogene has been processed instead of the true gene. A pseudogene could code for a pseudoprotein lacking one or more essential structural criteria. In rare instances where there is a difference between a residue obtained by protein sequencing and a residue obtained by translation of a nucleotide sequence, and this difference involves a unique exception to conservancy, only the conservative residue is recorded in Table 2. However the discrepancy is mentioned in the text. It will be seen that the number of columns showing a tendency to conserve certain types of amino acid residue increases with group number (i.e. with domain specialization), although of course fewer sequences are being compared.

Beale and Feinstein (1976) have already pointed out that there is a greater homology between a particular type of domain from different species (e.g. human, mouse, rabbit, guinea pig Cγ1) than there is between different domains (e.g. Cγ1, Cγ2, Cγ3) from the same species. This is because a particular type of domain has tended to

Table 1. Alignment used in the comparison of the amino acid sequences of immunoglobulin C-domains and histocompatibility C-like domains

First part of the alignment
Fab (New) CL domain           QPKAAPSVTLFPPSSEELQ---ANKATLVCLISDFYP--GAVTVAWKADSS--
Fab (New) CH1 domain          ASTKGPSVFPLAPSSKSTS---GGTAALGCLVKDYFP--EPVTVSWN-SG----
Fc (Kol) CH2 domain         APELLGGPSVFLFPPKPKDT-LMISRTPEVTCVVVDVSHEDPQVKFNWY-VDGVQ
Fc (Kol) CH3 domain           QP-REPQVYTLPPSREE---MTKNQVSLTCLVKGFYP--SDIAVEWE-SNGQP
HLA-B7 (Class I α3 domain)       DPPKTHVTHHPIS------DHEATLRCWALGFYP--AEITLTWQRDGL--
Sequence numbers marked above the rows: CL 110, 120, 130, 140, 155; CH1 120, 130, 140, 150, 160; CH2 240, 250, 260, 270, 280; CH3 350, 360, 370, 380.
Column numbers marked below the rows: 10, 20, 30, 40, 50.

Second part of the alignment
Fab (New) CL domain         PVKA-GVETTTPSKQ--SNN--KYAASSYLSLTPEQ
Fab (New) CH1 domain        ALTS-GVHTFPAVLQ--SSG--LYSLSSVVTVPSSS
Fc (Kol) CH2 domain         --HNAKTKPR-EQQYNS-----TYRVVSVLTVLHQN
Fc (Kol) CH3 domain         -N---NYKTTPPVLD--SDG--SFFLYSKLTVDKSR
HLA-B7 (Class I α3 domain)  DQT-QDTELVETRPA--GDR--TFEKWAAVVVPSG-----EEQRYTCHVQH--EG-LPKPLT (second and third parts)
Sequence numbers marked above the rows: CL 160, 170, 180; CH1 170, 180, 190; CH2 290, 300, 310; CH3 390, 400, 410; HLA-B7 230, 240, 250, 260, 270.

Third part of the alignment
Fab (New) CL domain         -WKSHKSYSCQVTH--EG-STVEKTVAPTEC
Fab (New) CH1 domain        --LGTQTYICNVNHKPSN-TKVDKKVEPKSC
Fc (Kol) CH2 domain         -WLDGKEYKCKVSNK-ALPAPIEKTISKAKG
Fc (Kol) CH3 domain         -WQQGNVFSCSVMHEALHNHYTQKSLSLSPG
Sequence numbers marked above the rows: CL 200, 210; CH1 200, 210, 220; CH2 320, 330, 340; CH3 430, 440.
Column numbers marked below the second and third parts: 60, 70, 80, 90, 100, 110, 120.

Segment number, in order along the alignment: fx1, b1, fx2, b2, fy1, b3, fx3, b4, fx4, b5, fy2, b6, fy3.

A single letter code for amino acid residues is used as follows: A, alanine; B, aspartic acid or asparagine; C, cysteine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine; Z, glutamic acid or glutamine. The amino acid sequences of the CL and CH1 domains of Fab (New) have been aligned by the method of Poljak et al. (1974) so that residues occupying similar positions in three dimensional space fall into the same column. The amino acid sequences of the CH2 and CH3 domains of Fc (Kol) have been similarly aligned using the information on three dimensional structure given by Deisenhofer (1981). Residues which form the strands of the two anti-parallel β-sheets are underlined and β-structured and non-β-structured segments are numbered according to Beale and Feinstein (1976). Columns containing residues whose side chains point towards the domain interior have a small circle below them. The sequence numbers used by Poljak et al. (1974) and Deisenhofer (1981) are placed above each sequence but since residues falling into the same column will have different sequence numbers it is necessary to use column numbers which are placed at the bottom of the alignments. The sequence of the α3 domain of HLA-B7 (Orr et al., 1979) has been aligned as described by Beale and Feinstein (1976) for ease of reference but no X-ray crystallographic results are yet available. The sequences of all other immunoglobulin C-domains and histocompatibility C-like domains have been aligned in a similar manner to those shown above. Columns in which there is a marked tendency to conserve a particular type of amino acid residue are given in Table 2 and their correlation with three dimensional features given above and shown in Fig. 1 is discussed in the text.
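The relation between column numbers and crystallographers' sequence numbers in Table 1 can be illustrated with a short sketch. It assumes the first part of the alignment reads as transcribed in the strings below (a leading space means the row has not yet started, a hyphen is an alignment gap) and that the first residues of the four rows are numbered 110, 118, 231 and 342. These starting numbers are not stated in the text; they are chosen here only because they reproduce the cysteine numbers 136, 144, 261 and 367 quoted for column 32. The function name is hypothetical.

```python
# First part of the Table 1 alignment as read from the table (an assumption:
# spaces pad rows that start after column 1, '-' marks an alignment gap).
# The second item is the assumed sequence number of the first residue.
ROWS = {
    "CL (Fab New)":  ("  QPKAAPSVTLFPPSSEELQ---ANKATLVCLISDFYP--GAVTVAWKADSS--", 110),
    "CH1 (Fab New)": ("  ASTKGPSVFPLAPSSKSTS---GGTAALGCLVKDYFP--EPVTVSWN-SG----", 118),
    "CH2 (Fc Kol)":  ("APELLGGPSVFLFPPKPKDT-LMISRTPEVTCVVVDVSHEDPQVKFNWY-VDGVQ", 231),
    "CH3 (Fc Kol)":  ("  QP-REPQVYTLPPSREE---MTKNQVSLTCLVKGFYP--SDIAVEWE-SNGQP", 342),
}


def sequence_number(row, first_number, column):
    """Return (residue, sequence number) at a 1-based column, or None for a gap."""
    if column < 1 or column > len(row):
        return None
    residue = row[column - 1]
    if residue in (" ", "-"):
        return None
    # Count the residues (not gaps or padding) that precede this column.
    preceding = sum(ch not in " -" for ch in row[: column - 1])
    return residue, first_number + preceding


if __name__ == "__main__":
    for name, (row, first) in ROWS.items():
        print(name, "column 32 ->", sequence_number(row, first, 32))
```

Under these assumptions the same column (32) maps to a different sequence number in each domain, which is why the text refers to positions by column number.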
Col. No. in Table 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
V58,*14,T1,G1,N1
*67,T9
V65,*24,T6,G1,N1
*87,T9,H1
V42,*34
F38,Y16,*19,S2,H1
P55,X21
V38,*36
V47,*29
V51,*47
F58,*37,S2,H1
P77,X21
V41,I44,*11
V53,*45
C98
L46,*30 T35,*24,G15,S2 C76
L67,*31
*65,S5,T4,R2
P62,*12,S2
Group 2. imm' C-domains
*22
I19,V3
P22
F20,I2
C22 *20,S2 *22
*10,T11 L21,I1
*20,H1 *10,h12
*17,T4
P21
V13
V7,*4
P13
013
*13
*13 h13 *13 V13 C13
L12,V1
*13 F13 P13 P13 S11,A2
V12,L1
A9,h3 P12,A1
Group 3. his' C-like domains   Group 4. CL
V12,*6
P10,h6 V10,*8
P18
017,S1
L12,*6 G15,A3 C18 L18 V12,*6
V12,*6,T1
V14,*4 F14,Y4 P17,L1 L17,I1
P18
Group 5. CH1
*4
I3,V1
04,H1 h4,I1 P4,L1
*5 h4,Y1
L5 *5 C5
*5 P3,L2
*4,G1
V3,P2
P3,T2
I3,V2
Group 6. C-extra
F11,*8
V15,*4
*18,S1
*19 T18,S1 C19 *16,T3 V16,L3
P12,A6,T1
L16,I3 *19 I11,L6,N1,Q1
D15,N2,S2
P10,*9
P18,V1
*17,T1,N1 F13,Y2,S3 L10,*7,T2
P12,*6
Group 7. CH2
V19,L2
I16,V5
P20,S1
G14,X7 F19,Y1,L1
L18,V2,I1 H18,L2,A1 C21 L15,M4,E2 V13,I6,A2
*17,S2,R2
L10,*9,G1
P17,*3,G1 P18,L2,T1 P10,h8,A3
L8,*6,T7
V18,L2,T1
P18,V2,S1
Group 8. CH3
Table 2. Conservancy of amino acid residues in immunoglobulin C-domains and histocompatibility C-like domains
P83,*12,S2
Group 1. imm' C-domains and his' C-like domains
A11 D6,E5 I11 T11 L11 T11
P4,S1 I5 E4,D1 I3,V2
G5 F5 H5 P5
*11 T11 L11 R11 C11 W11 A11 L11 G11 F11 Y11 P11
S8,P3
P10 K10 A6,T4 H10 *9,H1 T11 H11 H11
D9
N5 *5 L4,I1 N5 C5 Y5 V5
G5 K5
I3,V2 Q5 V5 Y5 S5 R5 H5 P5 A3,P2 E5 N5
*3,T2 P5
Group 9. β2-micro   Group 10. Class I α3
P4 V4 V4 N4 *4 T4
P4
*4
L4 I4 C4 *4 *4 D4
*4 N4
I4
S4 P4 V4
V4 T4 V4 *4
*4 P4
V2 R2
I2
F2 Y2 P2
H2 H2 N2 L2 L2 V2 C2 S2 V2 h2
*2 L2
P2 K2 *2 T2
V2 T2 *2
P2
R2 R2 V2
Q2
T2 *2
Group 11. 12. Class II α2 β2
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
063,*7,T1,Q1,K1
083,*7,T2,Q1,K1
L58,V13,A3
V35,*35,G3,h2
L64,*29,T3
V46,*45,G3,T2,S1
S68,G3,*2
*62,h10,Q1
W76
W93,L4,M1
L13 h12,Q1
*19,T3 *21,T1 *16,T6
H10,X2
W6,Y7
L13
*22
Y12,F1
h13
G12,N1 *12,S1
*13
W13
*13 S13 S13
h18,L2 F20
D22
h16,*6
W17,*5
V13,L5 T12,*3,S3 V10,L6,h2 P12,*2,h4
Y16,K1 h11,*5 *10,h6 h14,*4 S15,*2
P15,F3,A1
W18
W19,S2
W18 W5 L4,M1
G16,R1
V19,L1,M1
L20
L15,A3 P10,T6,S2 *15,G3
L5 T4,N1 I4,L1
S20
*18,T2 S15,G3
V14,A3,I1,Q1
V3,T2
018,T1,W1
W21
S5
013,*4,T1,Q1
W19
Y3,A2
*4,T1
V4,A1
T3,*2
*4,T1
*4,T1
G4,R1
W5 L4,S1
D4,N1 E4,T1
F5 T5 P5
T9 F9
S4,T1 F5 Y5 L4,I1 L5 *4,S1 H4,Y1 T3,A2
E11 Q10,L1
K9 W11 A10,V1 A6,S5 V11 V11 V11 P11 S5,L6 G11
G11 D11
E11 L11 V11 E11 T11 R11 P11 A11
Q11 D10,E1
T10,I1
G11 E11
W11 Q11
K5 D5 W5
S5 D5 L4,M1 S5 F5 S4,N1
V4 E5
K4,R1 I5
L4,M1 L5 K5 N4,D1 G5
-4 -4 *4
F4 *4 P4 $4
Y4 L4
K4 *4
F4
D4 W4
F4 *4
E4 T4
G4 V4
V4 T4
N4 G4 +4
W4 L4
G2 -2 V2
P2
L2 V2 *2 L2 E2
T2 F2 Q2
D2
L2 I2
V2 $2 T2
A2 G2
E2 E2
Q2
N2
W2 F2
Table 2. (contd)
Col. No. in Table 1; Group 1. imm' C-domains and his' C-like domains; Group 2. imm' C-domains; Group 3. his' C-like domains; Group 4. CL; Group 5. CH1; Group 6. C-extra; Group 7. CH2; Group 8. CH3; Group 9. β2-micro; Group 10. Class I α3; Group 11. 12. Class II α2 β2
101: Y22 *11,S1 *16,T1,H1 05 F11,Y5,V2 F12,Y9 Y5 Y11 Y4 Y2 F30,Y28,*15,h2,H1 Y50,F30,*15,h2,H1
102: T5 h19,I1,Q1 S3,A2 T11 D4 T2
103: C22 C12 C19 C5 C19 C21 C5 C11 C4 C2 C76 C98
104: R5 H8,R3
105: V22 V11,A1 V17,I2 V5 V15,A2,L1 V20,A1 V5 V11 V4 V2 V68,*7 V90,*7
106: -4 E2
107: H22 H7,X3 H17,Y1,K1 H4,Y1 H8,X10 H21 H5 H11 H4 H2 H57,X16 H79,X16
108: E20,G1
109:
110: *14,S2,H1 *5 W4 P2
111: T4,S1 G11 G4 S2
112:
113: *21 L4,M1 L10 L4 *2
114: *18,Q1,+2 P11 4
115: V10,I2 E3,Q2 E4 S2
116: P20 P5 P9 P4 P2
117: L9 *4 *2
118: I2,T2 T9 T2
119: *12 V12,*4,G1 I15,*2,T1 *18,S2,T1 V5 L9 K4 V2
120: R9 H4 E2
121: W19 K18,R1 W5 W8 W4 W2
122:
123:
124: C13

The amino acid sequences of immunoglobulin C-domains and histocompatibility C-like domains were aligned in accordance with Table 1 and columns which showed a marked tendency to conserve particular amino acid residues are listed above together with the residues involved. The single letter code for amino acid residues is the same as that used in Table 1 with the following additions: *, hydrophobic residues; h, hydroxyamino acid residues; 0, aromatic residues; +, positively charged residues; -, negatively charged residues; X, any residue. Numbers placed after the code letters indicate the number of times the respective amino acid residues occurred in a particular column. Sequences were compared in different groups as indicated above. C-extra domains involve Cμ2 and Cε2. CH2 domains include Cμ3 and Cε3. CH3 domains include Cμ4 and Cε4.
evolve to carry out a particular function (e.g. Cγ2 and complement fixation). Tables 1 and 2 also show that most of the columns recording conservancy in the first few groups correspond to residues lying in the β-strands of the crystallographic models, suggesting that these are some of the essential factors for a typical C-domain structure. Later groups (more specialized domains but fewer sequences) show further conservancy in columns corresponding to residues involved in the non-β-structured segments of the crystallographic models, suggesting that specialization partly involves features that are additional to the main C-domain criteria.

Isoleucine (I), leucine (L), valine (V), methionine (M), alanine (A), proline (P), tryptophan (W), phenylalanine (F) and tyrosine (Y) are classed as hydrophobic residues and given a general symbol * in Table 2. However, the smallness of the alanine side chain means that it makes little contribution to the hydrophobic interiors of proteins. Proline is limited in its usefulness at hydrophobic sites since its aliphatic side chain forms a ring with the peptide chain backbone. The aromatic ring of a tyrosine residue is often orientated so that the hydroxyl group is directed towards the exterior of protein molecules although most of the aromatic ring can occupy an internal position. Alternatively the hydroxyl group can be hydrogen bonded to the peptide chain backbone or to an adjacent polar group. The polar residue threonine (T) is found fairly frequently in certain positions generally occupied by hydrophobic residues. Here again the hydroxyl group is usually orientated to the protein exterior or is hydrogen bonded. On a few occasions serine (S) can also occupy some positions generally filled by hydrophobic residues. Here again the hydroxyl group is directed outwards or hydrogen bonded. As such serine plays a neutral role in that it does not disturb the hydrophobic interior but does not make any contribution to it. Glycine, with no side chain, can also be regarded as neutral. Asparagine (N) and glutamine (Q) residues, having relatively large polar groups, are very rarely found at positions that normally have hydrophobic residues. The charged residues glutamic acid (E), aspartic acid (D), arginine (R) and histidine (H) cannot be accommodated in hydrophobic regions. However the charged group of lysine (K) is at the end of an aliphatic chain that is sometimes long enough to lead the charged group to the exterior, but this is a rare event.
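The residue classes described above lend themselves to a small illustration. The sketch below is only one possible reading: it treats the nine residues given the symbol * as hydrophobic and E, D, R, H and K as charged, and labels a column by the residues recorded in it. The function name and the labels are hypothetical, and the example columns 32, 30 and 10 are read off the five sequences of Table 1 as transcribed, not off all 99 sequences.

```python
HYDROPHOBIC = set("ILVMAPWFY")  # residues given the general symbol * in Table 2
CHARGED = set("EDRHK")          # E, D, R and H, plus lysine (K)


def describe_column(residues):
    """Label a column from the one-letter residues recorded in it ('-' = gap)."""
    recorded = [r for r in residues if r != "-"]
    if not recorded:
        return "no residues recorded"
    if len(set(recorded)) == 1:
        return "invariant"
    if all(r in HYDROPHOBIC for r in recorded):
        return "exclusively hydrophobic"
    if not any(r in CHARGED for r in recorded):
        return "uncharged"
    return "contains a charged residue"


if __name__ == "__main__":
    # Columns as read from the five rows of Table 1 (CL, CH1, CH2, CH3, HLA-B7).
    print(32, describe_column("CCCCC"))
    print(30, describe_column("LLVLL"))
    print(10, describe_column("VVVVT"))
    # Hypothetical column with a single lysine among aromatic residues.
    print("example", describe_column("FYFYK"))
```

A threonine or serine leaves a column "uncharged" rather than "exclusively hydrophobic", which mirrors the distinction drawn in the text between the two kinds of conserved position.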
Conservancy in all immunoglobulin C domains and histocompatibility C-like domains (group 1)
It will be seen from Table 2 (group 1) that the only invariant residues throughout the 99 sequences compared fall into columns 32 and 103 (Table 1). They contain the two cysteine residues which form the intra-domain disulphide bridge. There are six positions which always contain amino acid residues with hydrophobic side chains and involve columns 30, 34, 44, 46, 48 and 105. These six exclusively hydrophobic positions correspond to residues on the crystallographic models which lie in the strands of the β-pleated sheets (fx and fy in Fig. 1), have their side chains pointing into the interior of the domain and make essential contributions to its hydrophobic nature. Ninety-three of the 98 sequences available for comparison as regards col. 48 record a tryptophan residue. Only in the five β2-microglobulin sequences (group 9) are other hydrophobic residues found in this position. For col. 105, 90 out of 97 sequences record a valine residue. This position is occupied by an alanine residue in the mouse Cκ domain, the Cα2 domains of human IgA1 and IgA2, and in the human Cε4 domain. Isoleucine is found at this position in mouse Cμ1 and Cδ1 domains and there is a leucine residue in the human C~2 domain. There is a discrepancy between the sequences of human Cε2 domain as regards col. 30. Analysis of the protein is reported to give a glutamine or glutamic acid residue whereas translation of the nucleotide sequence gives a leucine residue. This difference is probably due to a protein sequencing error so a leucine residue is recorded in Table 2.

Fig. 1. A schematic drawing of a CL domain based on high resolution X-ray crystallography of a Bence Jones dimer (Edmundson et al., 1975). It is used in the text to represent a typical immunoglobulin C domain. The numbering of the β-structured and non-β-structured segments is the same as that used by Beale and Feinstein (1976).

There are four positions (cols 8, 10, 86 and 88) which always contain residues with uncharged side chains which are nearly always hydrophobic. These residues correspond to ones on the crystallographic models which lie within or at the ends of β-strands and the side chains are directed towards the domain interior. These side chains generally but not necessarily contribute towards its hydrophobic nature. Eighty-three of the recordings for col. 8 are due to proline residues. On the crystallographic models this proline bends the peptide chain into the plane of the β-pleated sheet. Twelve of the remaining recordings are due to other hydrophobic residues. Indeed, the only exceptions to the hydrophobic nature of this position are findings of serine residues in the Cγ2 domain of mouse IgG1 and the Cδ3 domain of mouse IgD. Column 10 contains 89 hydrophobic residues out of a total of 97. There are six recordings of threonine, a residue which occurs fairly frequently in hydrophobic positions, and single occurrences of glycine and asparagine residues. The latter two are found in
the dog Cμ2 domain and the mouse Cμ3 domain. There is some discrepancy here since the asparagine residue was obtained from the protein sequence of MOPC 104E whereas translation of the nucleotide sequences from two IgM-producing mouse cell clones gives threonine residues. There are only three exceptions to the hydrophobic nature of col. 86. Threonine residues are found in β2-microglobulin of man, mouse and rabbit. Ninety-one of the recordings for col. 88 are due to hydrophobic residues. The Cα2 domains of human IgA1 and IgA2 and mouse IgA have glycine residues at this position. The human DR antigen β2 domain and IgD Cδ1 domain have threonine residues and the mouse Cε1 domain has a serine residue.

There are four other positions (cols 12, 37, 80 and 101) which would contain exclusively residues with uncharged side chains but for a unique exception in each case. On the crystallographic models these sites lie at the ends of β-strands and the side chains point towards the domain interior. These sites can contribute towards the hydrophobic nature of the domain interior but need not necessarily do so. Column 12 records 87 hydrophobic residues, 6 threonine residues and 1 histidine residue. This unique exception to the uncharged nature of this site occurs in the α3 domain of the human class I antigen HLA-A2. It could be due to a sequencing error and needs confirming. Ninety-five hydrophobic residues are recorded for col. 37, 58 of which are due to phenylalanine. The mouse Cδ1 domain and human Cδ2 domain have serine residues in this position. The unique exception to the uncharged nature of this site is due to a histidine residue in the mouse Cε2 domain. This was obtained by translation of a nucleotide sequence and obviously needs confirming. For column 80 there are 83 recordings of phenylalanine or tyrosine residues out of a total of 94. The requirement for this is not clear from the crystallographic models since the aromatic side chain is only partially buried. It could help to shield the side of the domain at a place where the two β-sheets are twisting away from each other. Seven of the remaining residues have hydrophobic side chains. Mouse Cμ3 and rat Cε3 domains have threonine residues, human Cδ2 has glutamine and mouse Cε1 domain has lysine. This single exception to the uncharged nature of col. 80 has been obtained by translation of a nucleotide sequence and needs confirming. However, since the site is at the edge of the fx β-pleated sheet and the charged group of lysine lies at the end of a long aliphatic chain it could reach the exterior of the domain. Eighty residues with aromatic side chains are recorded for col. 101 and fifteen more have other hydrophobic side chains. The λ1 chain of MOPC 104E gives a serine residue but this could be a protein sequencing error since translation of the nucleotide sequence from a λ1-producing mouse cell clone gives a tyrosine residue. A similar situation arises with the Cγ1 domain of MOPC 173 where the protein sequence gives a threonine residue but translation of nucleotide sequences from two IgG2a-producing mouse cell lines gives isoleucine. The exception to the uncharged nature of this site occurs in the mouse Cδ1 domain where translation of a nucleotide sequence gives a histidine residue. Again confirmation is needed.

There are two other interesting cases of conservancy. Column 39 records 77 proline residues and column 107, 79 histidine residues. All histocompatibility C-like domains, CL domains, CH1 domains and all but one of the CH3 domains have the proline residue which helps to determine the shape of segment b1 (Fig. 1). All histocompatibility C-like domains, CH3 domains and most CH1 domains have the histidine residue. In the crystallographic models it is hydrogen bonded to several neighbouring residues and helps to determine the shape of segment b6 (Fig. 1).

In the case of most of the 99 sequences used in this comparison prediction methods gave stretches of β-structure corresponding with segments fx1, fx2, fx4, fy1 and fy2 of the crystallographic models (Fig. 1). These are the regions where conserved columns predominate (Table 2) and homology is generally good. In only about half of the sequences did prediction methods indicate β-structure corresponding to segments fx3 and fy3. There are no or very few columns showing overall conservancy in these parts of the sequences and homology is generally poor.
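The counts quoted in this section can be checked against the group 1 entries of Table 2 with a short sketch. It assumes the entries read as transcribed here (for example `V65,*24,T6,G1,N1` for column 10), that the symbols * and 0 both count as hydrophobic, and that a "unique exception" means a charged residue recorded exactly once. The helper names are hypothetical.

```python
HYDROPHOBIC = set("ILVMAPWFY")  # letters covered by the symbol *
CHARGED = set("EDRHK")


def parse_entry(entry):
    """Split a Table 2 entry such as 'V65,*24,T6' into (symbol, count) pairs."""
    pairs = []
    for token in entry.replace(" ", "").split(","):
        pairs.append((token[0], int(token[1:])))
    return pairs


def summarise(entry):
    """Return (total, hydrophobic count, charged symbols recorded only once)."""
    pairs = parse_entry(entry)
    total = sum(count for _, count in pairs)
    hydrophobic = sum(
        count for symbol, count in pairs
        if symbol in HYDROPHOBIC or symbol in ("*", "0")
    )
    unique_charged = [
        symbol for symbol, count in pairs
        if count == 1 and (symbol in CHARGED or symbol in ("+", "-"))
    ]
    return total, hydrophobic, unique_charged


if __name__ == "__main__":
    group1 = {
        10: "V65,*24,T6,G1,N1",
        37: "F58,*37,S2,H1",
        80: "083,*7,T2,Q1,K1",
        105: "V90,*7",
    }
    for column, entry in group1.items():
        print(column, summarise(entry))
```

With these entries column 10 gives 89 hydrophobic residues out of 97, column 37 gives 95 hydrophobic residues with a single histidine, and column 80 gives a total of 94 with a single lysine, in line with the figures in the text.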
Conserved features in all immunoglobulin C-domains (group 2)
As well as the conservancy just described for all immunoglobulin C-domains and histocompatibility C-like domains, the former show additional conservancy within themselves. Thus it will be seen in Table 2 (group 2) that for column 28, 65 of the 76 available sequences have a hydrophobic residue. Nine of the remaining sequences have threonine or serine and there are two recordings of arginine residues. These two exceptions to the uncharged nature of this position occur in the human and mouse Cε4 domains. In the crystallographic models the residues corresponding to col. 28 are partly buried at the very end of the fx β-pleated sheet. This is clearly not a feature of Cε4 domains because the charged arginine side chain must lie externally. All residues of col. 31 are uncharged. Column 82 also now contains only uncharged residues and 62 of them have hydrophobic side chains. In the crystallographic models the side chains at this site point to the domain interior. Sixty-eight out of 73 C-domain sequences place a serine residue in col. 84. This serine is probably hydrogen bonded to the tryptophan residue (invariant in immunoglobulin C-domains) in col. 48, a situation which must be a great advantage to domain structure. The amino acid sequence of the mouse IgG2a Cγ1 domain gives a valine residue in col. 84; however, this could be a sequencing error since reading from the nucleotide sequence of the gene gives a serine residue. The mouse Cδ1 domain gives a leucine residue translated from the nucleotide sequence. However, as mentioned in other parts of the text, the Cδ1 sequence shows several unusual features and could be derived from a pseudogene. The Cμ2 domains of human, mouse and dog IgM have glycine residues in this position.
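The column tallies quoted above (for example, 65 hydrophobic residues, nine threonine or serine and two arginine among the 76 sequences in col. 28) amount to sorting the residues of one alignment column by side-chain type. A minimal Python sketch of such a tally follows. The residue groupings, the function name `classify_column` and the toy column are assumptions made for illustration only; histidine is counted as charged because the text treats it so, but the hydrophobic set is a guess and not the classification used to build Table 2.

```python
from collections import Counter

# Side-chain groupings assumed for this illustration only.
CHARGED = set("DEKRH")         # histidine counted as charged, as in the text
HYDROPHOBIC = set("AVLIMFWY")  # assumed hydrophobic set (hypothetical choice)


def classify_column(residues):
    """Tally one alignment column (one-letter codes) by side-chain type."""
    # Gaps ("-") and undetermined residues ("?") are ignored.
    observed = [r for r in residues if r not in "-?"]
    if not observed:
        return {"total": 0, "hydrophobic": 0, "charged": 0, "label": "no data"}
    counts = Counter(observed)
    n_hydrophobic = sum(c for r, c in counts.items() if r in HYDROPHOBIC)
    n_charged = sum(c for r, c in counts.items() if r in CHARGED)
    if len(counts) == 1:
        label = "invariant"
    elif n_hydrophobic == len(observed):
        label = "always hydrophobic"
    elif n_charged == 0:
        label = "always uncharged"
    else:
        label = "not conserved by type"
    return {
        "total": len(observed),
        "hydrophobic": n_hydrophobic,
        "charged": n_charged,
        "label": label,
    }


# Toy column with the same type counts as col. 28 (the individual residues
# chosen within each type are made up).
toy_col_28 = "V" * 65 + "T" * 5 + "S" * 4 + "R" * 2
print(classify_column(toy_col_28))
```

With these assumed groupings the toy column is reported as 76 residues, 65 of them hydrophobic and 2 charged, so it is not labelled as conserved by type; this mirrors the way the two arginine recordings break the otherwise uncharged character of col. 28.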
Conserved features in all histocompatibility C-like domains (group 3)
It should be remembered that relatively few sequences are available and some of these differ at only a few positions. Therefore the results given in Table 2 might tend to be biased. However, accepting this limitation, some of the conserved features discussed above for all immunoglobulin C-domains and histocompatibility C-like domains (group 1) become even more pronounced when histocompatibility C-like domains alone are considered. Thus these domains all have a proline residue in col. 8, large hydrophobic residues in col. 30, proline residues in col. 39, phenylalanine residues in col. 80, tyrosine residues in col. 101, valine residues in col. 105 and histidine residues in col. 107. The histocompatibility C-like domains also show additional features, such as in cols. 13, 29 and 33 where all residues are uncharged. Twenty of the 22 recordings in col. 33 are hydrophobic residues. All 22 sequences available for comparison as regards col. 83 place residues with large hydrophobic side chains in this position. This feature is not found in any of the immunoglobulin C-domain groups; indeed, CL and CH1 domains (groups 4 and 5) show a marked tendency to conserve the hydroxy amino acid residues threonine and serine at this position as inter-domain contacts. It is possible that this site is an important hydrophobic contact between histocompatibility C-like domains. However, the situation is complicated by the fact that the existence of regular β-structure between cols 80 and 85, corresponding to segment fx4, seems uncertain in at least some of the histocompatibility domains when prediction methods are used. The class I α3 and class II α2 domains (groups 10 and 11) have a lysine residue in col. 82, so it seems highly unlikely that this can be an internal site as in immunoglobulin C-domains. The distance would be too great for the charged group to reach the domain exterior.
Hence there might be a shortening or distortion of β-structure relative to the crystallographic models. Continuing with the additional features of histocompatibility C-like domains, it will be seen that col. 89 now contains only uncharged residues, 16 of which are hydrophobic and the remaining 6 are threonines. The conserved histidine residue in column 107 is also largely conserved in immunoglobulin CH1 and CH3 domains. It is near to residues in the conserved cols 37 and 39 and to a lesser extent col. 8. It could also hydrogen bond to one of several nearby polar residues. As such this histidine plays an important part in shaping segment b2 (Fig. 1). Column 113 now contains exclusively hydrophobic residues, col. 116 proline residues, and in column 121 the 19 available sequences contain a tryptophan residue. Only some of these features can be found in a few immunoglobulin C-domain groups, such as hydrophobic residues in col. 116 of the Cμ2 domain sequences. It seems that this segment of the histocompatibility C-like domain sequences, corresponding to the β-strand fy3 of immunoglobulin C-domains (Fig. 1), has developed more specialized features. This may be related to the fact that, at least in the case of class I α3 domains and class II α2 and β2 domains, this segment lies adjacent to the cell
surface. Indeed the tryptophan residue in col. 121 is encoded by the last codon of the α3 exon in the class I α gene (Malissen et al., 1982) and the last codon of the α2 exon in the class II α gene (Korman et al., 1982; Lee et al., 1982; Benoist et al., 1983). In both cases the following exon codes for the transmembrane part of the sequence.
CL domains (group 4)
Table 2 shows that conserved tendencies are noticeably increased relative to former groups. The role of the residues in some of the columns is not clear but in other cases there are interesting correlations between conservancy and three dimensional structure apart from those already discussed above. Thus the residues in cols. 13 and 14 form contacts with the CH1 domain and the conserved proline residue in col. 15 is a major factor in determining the direction of segment b1 in Fig. 1. The hydrophobic residue in col. 20 is near to the conserved proline in col. 15 and the conserved hydrophobic residue in col. 28. This cluster of side chains almost certainly plays an important role in determining the shape of segment b1. Column 31, which contains exclusively valine, is another important site of contact with the CH1 domain, as is the serine in col. 83. Column 95, which contains residues with large aromatic side chains, is near to the tyrosine residue in col. 101 and the conserved leucine residue in col. 88. It helps to establish the spatial relationship between segments fx4, b5 and fy1. Columns 115 and 119 both have conserved hydrophobic residues which contribute towards the domain interior.
CH1 domains (group 5)
Apart from the conserved features discussed above for domains in general, CH1 domains show other characteristics. Thus aromatic residues are conserved in col. 11 and hydrophobic ones in col. 13. Both these sites make important contacts with the CL domain, as do the conserved leucine residues in col. 33. The residues in cols. 81, 83 and 87 can also make contacts with the CL domain. There is no noticeable tendency for CH1 domains to conserve an aromatic residue (usually tryptophan) in col. 95 as do CL domains (discussed above), CH2 domains and most CH3 domains (groups 4, 7 and 8). Of the 19 CH1 sequences, 17 have a histidine residue in col. 107. The role of this has already been discussed above (group 3).
Column 119 corresponds to a site on the crystallographic models where the amino acid side chain contributes to the hydrophobic interior of the domain. CH1 domains conserve this except for the Cδ1 domain of mouse IgD. As discussed above, this amino acid sequence has been read from the nucleotide sequence of the gene and there are other instances (cols 85, 101, 107) in Table 1 where this Cδ1 sequence deviates from other CH1 domains.
C extra domains (Cμ2, Cε2) (group 6)
Homology considerations (Low et al., 1976; Beale and Feinstein, 1976) and model building (Beale and Feinstein, 1976; Feinstein and Beale, 1977) indicate that the Cμ4 domain corresponds to Cγ3, and Cμ3 to Cγ2. The Cμ2 domain has no counterpart in IgG unless it is regarded as replacing the hinge region. The Cε2 domain can also be regarded as an extra domain.
Table 2 shows several characteristic differences between C extra domains and CH2 domains (group 7). The latter have conserved hydrophobic residues for cols 15, 17, 22 and 23 which the former do not have. C extra domains have conserved hydrophobic residues in cols 13 and 39 but CH2 domains do not have these features. In col. 31 C extra domains have conserved hydrophobic residues whereas CH2 domains have conserved hydroxyamino acids. The conserved hydrophobic residues could be involved in inter-domain contacts since their position is analogous to contacts used by CL and CH3 domains.
CH2 domains (group 7)
As well as the general domain characteristics discussed above, CH2 domains conserve proline residues in col. 15 except in the case of mouse IgG2a. This, however, could be an amino acid sequencing error since reading from the nucleotide sequence gives the expected proline residue. CL domains (group 4) and nearly all CH3 domains (group 8) also have this proline residue, which plays an important role in shaping the segment b1 in Fig. 1. The conserved hydrophobic residues in cols 17, 22 and 23 and basic residues in col. 121 can make important longitudinal contacts with the CH3 domains. It should be noted that Cμ3 and Cε3 domains have such residues to form potential contacts with Cμ4 and Cε4 domains. All CH2 domain sequences place a tryptophan residue in col. 95. The large side chain is directed towards the interior of the domain and is spatially close to the side chains of the conserved hydrophobic residue in col. 15, the conserved hydrophobic side chain of col. 17, the conserved hydrophobic side chain of col. 34 and the conserved hydrophobic, usually aromatic, residue in col. 101.
CH3 domains (group 8)
There is a strong tendency for proline residues to be conserved in cols 14, 15 and 16. Indeed nearly all sequences have at least two of these sites containing proline. The Cδ3 domains of human and mouse IgD are a notable exception since they have no prolines in this part of the sequence, and the Cε4 domain of human IgE has proline only in col. 16. These proline residues put characteristic turns in segment b1 (Fig. 1) and their absence from the Cδ3 domain suggests that the shape requirements for this segment are different. Columns 31 and 33 are of particular interest since they would be potential sites for CH3-CH3 contact. Of the 21 recordings in col. 31, 18 are hydroxyamino acids, 16 of them due to threonine residues. Again the Cδ3 domains of man and mouse differ from the other CH3 domains by having leucine residues in this position, and the Cε3 domain of man has an alanine residue. Of the 20 residues recorded in col. 33, 19 have large hydrophobic side chains ideally suited for CH3-CH3 contact. The two remaining sequences, human and mouse Cδ3 domains, have a glutamic acid residue which is rather incompatible with CH3-CH3 contact. Indeed in all C-domains and C-like domains the only other instance of a charged residue in this position occurs in the Cμ2 domain of mouse IgM. CH3 domains, including Cμ4 and Cε4,
conserve glutamic acid residues in col. 108 to make contacts with CH2 domains, including Cμ3 and Cε3. The conserved residues in cols 8, 34, 37, 39 and 107 are spatially near to each other. Such a cluster of conserved residues helps to characterise the shape of this part of the CH3 domain. Similarly the largely conserved tryptophan residue in col. 95 is spatially near to the proline residues in cols 14 and 15 (already discussed), the largely conserved hydrophobic site in col. 28, the conserved hydrophobic residue in col. 88 and the largely conserved hydrophobic site in col. 119. Again such a cluster must help to characterise the shape of the CH3 domain. The only CH3 domains which do not have a tryptophan residue in col. 95 are the Cδ3 domains of mouse and human IgD. They also do not have a hydrophobic residue in the spatially related col. 119. Again this suggests that the Cδ3 domains may have developed some characteristically divergent features relative to other CH3 domains.
β2-microglobulin (group 9)
The homology between β2-microglobulin from different species is quite high, as evidenced by the number of columns showing conservancy (Table 2). One striking feature of this type of domain is the absence of a tryptophan residue in col. 48. A tryptophan residue in this position is usually important for shielding the intra-domain disulphide bridge. Wolfe and Cebra (1980) have suggested that the tryptophan residue in col. 121 might take over the role of the tryptophan usually found in col. 48. Unfortunately an examination of the crystallographic models reveals that a tryptophan residue in col. 121 is too far from the intra-domain disulphide bridge to offer any shielding. Further, the tryptophan residue in col. 121 has already been discussed above as a unique feature in all histocompatibility C-like domains, even those with the usual tryptophan residue in col. 48. It seems very unlikely that both these tryptophan residues could be accommodated into the environment of the intra-domain disulphide bridge. β2-Microglobulin could partially compensate for the absence of tryptophan in col. 48 by conserving large hydrophobic side chains at many internal sites, such as valine or isoleucine in col. 10, valine in col. 12, leucine or isoleucine in col. 30, valine in col. 34, phenylalanine in col. 80, leucine or isoleucine in col. 82 and so on. This topic is dealt with further under general discussion. β2-Microglobulin interacts non-covalently with the class I α-chain. Since the α3 domain has pronounced C-like features it is reasonable to presume that there might be C-like contacts between these two histocompatibility domains analogous to those found in the crystallographic models of IgG Fab and Fc fragments. Potential contact sites are col. 13 where β2-microglobulin conserves tyrosine, col. 29 where large hydrophobic side-chains are conserved, col. 33 which conserves tyrosine, col. 81 which also conserves tyrosine and col. 83 where leucine is conserved.
Further, cols. 11 and 31 do not contain charged residues. Unfortunately the potential of the class I α3 domain to form a C-like contact is less clear, as discussed below.
Class I α3 domains (group 10)
At present the group consists of very homologous sequences so that the number of conserved columns (Table 2) is quite high. The recording of a histidine in col. 12 due to the HLA-A2 protein sequence must be considered dubious. None of the other 99 domains considered in this paper has a charged residue at this position and all other histocompatibility C-like domains have residues with large hydrophobic side chains. These side chains would be expected to contribute towards the hydrophobic interior of the domain. Due to the strong homology between the class I α3 domain and immunoglobulin C-domains one might have expected C-like contacts to be involved in the interaction with β2-microglobulin. However, there are features of the class I α3 domain sequence which do not favour such a type of contact. Indeed the conserved lysine residues in col. 9, conserved histidine residues in col. 11 and conserved arginine residues in col. 31 seem to be deliberately designed to prevent C-like contact. Since the class I α3 domain has no features characteristic of immunoglobulin V-domains (Schiffer et al., 1974; Poljak et al., 1974; Beale and Feinstein, 1976) a V-like contact would not be expected. In any case the charged residues in cols. 100 and 104 would not favour such a contact. It therefore seems likely that β2-microglobulin and the class I α3 domain interact in a unique way. β2-Microglobulin also probably interacts with the class I α2 domain, which shows little or no homology with C-domains. It has already been mentioned above that the conserved lysine residue in col. 82 of class I α3 suggests some divergence from C-domains. Usually this is a hydrophobic site lying on the β-strand fx3 and contributes to the domain interior. It is possible that the β-structure is distorted here (prediction methods give a low probability for β-formation) so that the charged group can be directed outwards.
Table 2 shows many other conserved sites for class I α3 domains but the limited number of sequences being compared makes any correlation with three dimensional structure rather premature.
Class II α2 and β2 domains (groups 11 and 12)
Three identical sequences have been reported for the human DR antigen α2 domain by translating the nucleotide sequence from cDNA (Larhammar et al., 1982b; Lee et al., 1982) and from the gene (Korman et al., 1982). These are used as one sequence in Table 2. The other α2 domain sequences are from human DC1 antigen (Auffray et al., 1982) and mouse Ia-E and Ia-A antigens (Benoist et al., 1983). Due to the small number of sequences only positions showing identical or a similar type of residue are recorded in Table 2. The number of such recordings is quite high, indicating that there are conserved features which have withstood mutational pressure. However, the smallness of the group does not allow any meaningful correlation with three dimensional structure to be made apart from that already discussed for histocompatibility C-like domains in general (group 3). It is worth noting here that 82% of the residues in human DR α2 domain are identical with those in
mouse Ia-E α2 domain and 75% of the residues in human DC1 α2 domain are identical with those in mouse Ia-A α2 domain. In both cases many of the remaining positions have conservative substitutions. Human DR and DC1 α2 domains have only 64% identity while mouse Ia-A and Ia-E α2 domains have 60%. Hence a particular type of α2 domain (DR and I-E or DC1 and I-A) shows less divergence in different species (man and mouse) than different types of α2 domain (DR and DC1 or I-A and I-E) in the same species (man or mouse). Beale and Feinstein (1976) have pointed out a similar occurrence in immunoglobulin C-domains. Little can be said at present about the β2 domains since only two sequences from human antigens are available, but of course the comments made about domains in groups 1 and 3 apply. Due to their C-like nature the class II α2 and β2 domains might be expected to interact in a similar manner to the CL and Cγ1 domains of Fab (New) and the Cγ3 domains of Fc (Kol). The conserved threonine residues in cols 11 and 67, and hydrophobic residues in cols 13, 29, 33 and 65, are in suitable positions for such a type of interaction.
DISCUSSION
The various groups of X-ray crystallographers have pointed out correlations between three dimensional structure and amino acid sequences of immunoglobulins, particularly as regards the invariant cysteine and tryptophan residues, the alternating hydrophobic residues in the β-strands and certain contact residues (Schiffer et al., 1973; Poljak et al., 1973; Segal et al., 1974; Deisenhofer et al., 1976). Beale and Feinstein (1976) aligned all thirty immunoglobulin C-domain sequences available at the time from different species, classes and subclasses and related them to the crystallographic models of Fab (New) (Poljak et al., 1974) and L-chain dimer (Mcg) (Edmundson et al., 1975).
They showed that residues which corresponded to those whose chains pointed to the domain interior of the crystallographic models were nearly always hydrophobic and tended to form clusters. Residues in some non-β-structured segments also tended to be conserved and form clusters which seemed to influence the shape of these segments. Residues used as contacts between the domains of the crystallographic models could often be seen in other domains. Lesk and Chothia (1982) have carried out a particularly detailed analysis of the relationship between the eleven available high resolution structures of immunoglobulin V and C domains and key amino acid residues. They define a domain "pin" consisting of the invariant cysteine disulphide bridge (cols 32 and 103 in Table 1), the invariant tryptophan residue (col. 48) and a hydrophobic residue (col. 30) next but one before the invariant cysteine residue in strand fx2. This "pin" limits the relative movement of the two β-sheets and the spatial relationship of the side chains of the "pin" residues is almost identical in the eleven domains studied. Lesk and Chothia (1982) also define a domain "core" consisting of residues in β-strand positions equivalent in all eleven V and C domains. The side
chains of the core residues which point towards the domain interior generally make important contributions to its hydrophobic nature. There is considerable variation in volume of each of these side chains with little if any local compensation by nearby side chains. Lesk and Chothia (1982) find that compensation is more likely to be made by shifts in the relative positions of the β-sheets, lateral insertion of residues in non-β-structural segments, formation of a β-bulge and changes in the orientation of side chains. The results given in the present paper show that there are six sites which are always hydrophobic in all immunoglobulin C-domains and histocompatibility C-like domains. One of them is the residue in col. 48 in Table 1, which is an invariant tryptophan in immunoglobulin V and C domains but leucine or methionine in β2-microglobulin. As already mentioned, the loss of volume could at least in part be compensated by the presence of large hydrophobic side chains at several other positions. It should be noted here that β2-microglobulin is the only domain included in Table 2 which exists naturally as an individual domain and not as part of a multi-domain chain. Interestingly enough, Thy-1 antigen (Campbell et al., 1981; Moriuchi et al., 1983) also exists as a single domain closely related to immunoglobulin V domains and has a leucine residue in place of the tryptophan residue. Another site which is always hydrophobic in immunoglobulin C-domains and histocompatibility C-like domains is col. 30 in Table 1. This is next but one before the invariant cysteine residue in the fx2 strand and is part of the "pin" as defined by Lesk and Chothia (1982). All amino acid sequences have a large hydrophobic side chain in this position except in the case of the Cγ2 domain of guinea pig IgG, which has an alanine residue. Unfortunately part of the sequence is incomplete so it is difficult to assess if there is any local compensation.
Thy-1 antigen has a leucine residue in this position. Column 34 is another of the six hydrophobic sites. Although this is a core residue as defined by Lesk and Chothia (1982) it is often not hydrophobic in V-domains but remains uncharged. This constitutes one of the characteristic differences between V and C domains. Thy-1 antigen has a histidine residue in this position and thereby differs from V and C domains. Cμ1 domains have an alanine residue at this site and the smallness of the side chain could be compensated by a phenylalanine residue in col. 46. However, several other domains have an alanine residue in this position and there is no obvious local compensation. All immunoglobulin C-domains and histocompatibility C-like domains have a hydrophobic residue in col. 44. This is not defined as a core residue by Lesk and Chothia (1982) and is not usually hydrophobic in V-domains. It is another characteristic difference between V and C domains. Thy-1 has a histidine residue in this position. Another of the six sites is col. 46, which records exclusively hydrophobic residues with large side chains. It is part of the "core" and is also always hydrophobic in V domains. Thy-1 antigen has a phenylalanine residue. The last of the six sites is col. 105, which is the next but one residue to the invariant cysteine in
strand fy2. Human C~2 domains have an alanine residue in this position and there could be some local compensation due to phenylalanine in col. 46. This site is part of the core as defined by Lesk and Chothia (1982) but is not hydrophobic and often charged in V-domains. Thy-1 antigen has leucine in this position. There are four positions (cols 8, 10, 86, 88) which are always uncharged and generally hydrophobic in immunoglobulin C-domains and histocompatibility C-like domains. Cols 10 and 86 are "core" positions and are uncharged and hydrophobic respectively in V-domains. Thy-1 has hydrophobic residues in cols 10, 86 and 88. There are a further four positions (cols 12, 37, 80 and 101) which in C-domains and C-like domains are generally hydrophobic and always uncharged except for a unique exception in each case. As pointed out in previous sections of this paper these exceptions need confirming. Cols 12 and 101 are "core" residues and the latter is a tyrosine residue in V domains and Thy-1. All the fourteen sites discussed above correspond to residues on the C domain crystallographic models that have their side chains pointing towards the domain interior and nearly always contribute towards its hydrophobic nature. In the case of the V domain crystallographic models some of these sites are present but there are also others unique to V domains. Some mention should be made of the conservancy found in cols 37, 39 and 107. Nearly all histocompatibility C-like domains, CL, CH1, C extra and CH3 domains, regardless of species or class, have this conservancy but CH2 domains, V domains and Thy-1 do not. On the crystallographic models the aromatic residue in col. 37 lies near the end of the β-strand fx2 and the side chain points towards the domain interior. The proline residue in col. 39 begins the bend of segment b2 and the side chain is in close proximity to that of histidine in col. 107 lying at the end of β-strand fy2 and the beginning of the bend of segment b6.
Hence an interesting spatial arrangement is established between these strands and segments. Why CH2 domains do not conserve this is a matter for speculation. The biological functions of immunoglobulins and histocompatibility antigens are partially dependent on conserving the basic structure discussed above for some of their domains but conserving contacts between domains is also important. The present paper shows that residues involved in CL/CH1, CH3/CH3 and CH2/CH3 contacts are almost entirely conserved throughout the different species and classes studied. The very few class II histocompatibility antigen sequences available indicate that there are suitable sites for the α2 and β2 domains to make similar contacts. The situation with class I antigens is less clear. As pointed out in the present paper, β2-microglobulin has suitable sites for CL/CH1 and CH3/CH3 type contacts and Parker and Strominger (1983) have obtained experimental evidence that at least some of these (cols 13, 33 and 81) are situated in the region of contact with the class I α-chain. However, the class I α3 domain has charged residues in cols 9, 11 and 31 which seem incompatible with CL/CH1 and
CH3/CH3 type contacts and a somewhat different type of contact may have evolved. Although comparisons of class I α1 and α2 domains and class II α1 and β1 domains have not been made in the present paper, the following features are of some interest. Class I α1 and class II α1 domains do not have an intrachain disulphide bridge and show no homology with immunoglobulin V or C domains. Class I α2 and class II β1 domains have an intrachain disulphide bridge and the number of residues lying between the two invariant cysteine residues is similar to that found in immunoglobulin domains. These histocompatibility domains do not have an invariant tryptophan residue but a tyrosine residue can be placed in this position. The position next but one before the first cysteine residue is not necessarily hydrophobic in these domains and can even be charged. However, the few sequences that are available all have residues with large hydrophobic side chains in a position next but one after the cysteine residue. Hence these domains could have a modified "pin" structure analogous to that described by Lesk and Chothia (1982) for immunoglobulin domains. The class I α2 and class II β1 domains also appear to conserve hydrophobic and uncharged sites but few of these correspond with those found in immunoglobulin domains. It is possible that the basic structures of these domains are distantly related. It would seem that nature has devised an intriguing series of structural variations. Immunoglobulin C domains, β2-microglobulin, histocompatibility class I α3 domains and class II α2 and β2 domains are closely related to each other but each shows a number of divergent characteristics. Small divergences are also apparent within C-domains themselves so that CL, CH1, CH2 and CH3 domains each have some characteristic features, as do β2-microglobulin, class I α3 domains, and class II α2 and β2 domains.
Even within a group such as CH3 domains the Cδ3 domain and the Cε4 domain show some characteristic differences from Cγ3, Cα3 and Cμ4. Immunoglobulin V-domains, although clearly related to C-domains, are quite distinguishable from them. Thy-1 is more related to V-domains than C-domains but again has characteristic differences. The histocompatibility class I α2 domains and class II β1 domains might be distantly related to immunoglobulin domains but class I α1 and class II α1 appear to have no relationship. However, even here some caution is needed. Superoxide dismutase, which is quite unrelated functionally to immunoglobulins and histocompatibility antigens, and does not have an intra-domain disulphide bridge, has a barrel-like structure of eight β-strands which are spatially arranged in a similar manner to those found in immunoglobulin domains (Richardson et al., 1976).
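The percentage identities quoted for the class II α2 domains (for example, 82% between human DR and mouse Ia-E) are the kind of figure obtained by counting matching positions in two aligned sequences. A minimal Python sketch of that arithmetic is given below. The function name `percent_identity`, the decision to exclude gapped positions from the comparison and the toy sequences are assumptions for illustration; the text does not state how gaps were treated, and the strings are not real domain sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues between two aligned sequences.

    Positions where either sequence has a gap are left out of the
    comparison (an assumed convention, not one stated in the text).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, used only to show the calculation.
fragment_1 = "PEVTVFSKSP-VTLGQ"
fragment_2 = "PEVTVLSRSPAVTLGQ"
print(round(percent_identity(fragment_1, fragment_2), 1))
```

Applied pairwise to the four α2 domain sequences, a calculation of this kind gives the within-type and between-type comparisons on which the divergence argument rests.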
REFERENCES
Amzel L. M. and Poljak R. J. (1979) Three-dimensional structure of immunoglobulins. A. Rev. Biochem. 48, 961-997.
Auffray C., Korman A. J., Roux-Dosetto M., Bono R. and Strominger J. L. (1982) cDNA clone for the heavy chain of the human B cell alloantigen DC1: strong sequence
homology to the HLA-DR heavy chain. Proc. natn. Acad. Sci. U.S.A. 79, 6337-6341.
Beale D. and Feinstein A. (1976) Structure and function of the constant regions of immunoglobulins. Q. Rev. Biophys. 9, 135-180.
Benoist C. O., Mathis D. J., Kanter M. R., Williams V. E. and McDevitt H. O. (1983) The murine Ia α chains, Eα and Aα, show a surprising degree of sequence homology. Proc. natn. Acad. Sci. U.S.A. 80, 534-538.
Campbell D. G., Gagnon J., Reid B. M. and Williams A. F. (1981) Rat brain Thy-1 glycoprotein. The amino acid sequence, disulphide bonds, and an unusual hydrophobic region. Biochem. J. 195, 15-30.
Chou P. Y. and Fasman G. D. (1978) Prediction of the secondary structure of proteins from their amino acid sequence. Adv. Enzym. 47, 45-148.
Coligan J. E., Kindt T. J., Uehara H., Martinko J. and Nathenson S. G. (1981) Primary structure of a murine transplantation antigen. Nature, Lond. 291, 35-39.
Cunningham B. A. (1974) β2-Microglobulin: an immunoglobulin domain associated with cell surfaces. In Progress in Immunology (Edited by Brent L. and Holborow J.), Vol. II (I), pp. 5-12. North Holland, Amsterdam.
Davies R. D., Padlan E. A. and Segal D. M. (1975) Three dimensional structure of immunoglobulins. A. Rev. Biochem. 44, 639-667.
Dayhoff M. O. and Eck R. V. (1968) Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Silver Spring, Maryland, U.S.A.
Deisenhofer J. (1981) Crystallographic refinement and atomic models of a human Fc fragment and its complex with fragment B of Protein A from Staphylococcus aureus at 2.9 Å and 2.8 Å resolution. Biochemistry 20, 2361-2370.
Deisenhofer J., Colman P. M., Epp O. and Huber R. (1976) Crystallographic structural studies of a human Fc fragment. A complete model based on a Fourier map at 3.5 Å resolution. Hoppe-Seyler's Z. physiol. Chem. 357, 1421-1434.
Edelman G. M. and Gall W. E. (1969) The antibody problem. A. Rev. Biochem. 38, 415-466.
Edmundson A. B., Ely K. R., Abola E. E., Schiffer M.
and Panagiotopoulos N. (1975) Rotational allomerism and divergent evolution of domains. Biochemistry 14, 3953-3961. Feinstein A. and Beale D. (1977) Models of immunoglobulins and antigen-antibody complexes. In Immunochemistry: An Advanced Textbook. (Edited by Glynn L. E. and Steward M. W.), pp. 263-306. Wiley & Sons, New York. Gough N. M., Kemp D. J., Tyler B. M., Adams J. M. and Cory S. (1980) Intervening sequences divide the gene for the constant region of mouse immunoglobulin p-chains into segments each encoding a domain. Proc. natn. Acad. Sci. U.S.A. 77, 554-558. Kabat E. A., Wu T. T., Bilofsky H., Reid-Miller M. and Perry H. (1983) Sequences of Proteins of Immunological Interest. U.S. Department of Health and Human Services, Bethesda, MD. Kaufman J. F. and Strominger J. L. (1982) HLA-DR light chain has a polymorphic N-terminal region and a conserved immunoglobulin-like C-terminal region. Nature, Lond. 297, 694-697. Kaufman J. F. and Strominger J. L. (1983) The extracellular region of light chains from human and murine MHC class II antigens consists of two domains. J, Immun. 130, 808-817. Kenten J. H., Molgaard H. V., Houghton M., Derbyshire R. B., Viney J., Bell L. O. and Gould H. J. (1982) Cloning and sequence determination of the gene for the human immunoglobulin e chain expressed in a myeloma cell line. Proc. natn. Acad. Sci. U.S.A. 79, 6661-6665.
Korman A. J., Auffray C., Schamboeck A. and Strominger J. L. (1982) The amino acid sequence and gene organization of the heavy chain of the HLA-DR antigen: homology to immunoglobulins. Proc. natn. Acad. Sci. U.S.A. 79, 6013-6017.
Kratzin H., Yang C-Y., Götz H., Pauly E., Kölbel S., Egert G., Thinnes F. P., Wernet P., Altevogt P. and Hilschmann N. (1981) Primärstruktur menschlicher Histokompatibilitätsantigene der Klasse II. Mitteilung Aminosäuresequenz der N-terminalen 198 Reste der β-Kette des HLA-Dw,2;DR2-Alloantigens. Hoppe-Seyler's Z. physiol. Chem. 362, 1665-1669.
Larhammer D., Gustafsson K., Claesson L., Bill P., Wiman K., Schenning L., Sundelin J., Widmark E., Peterson P. A. and Rask L. (1982b) Alpha chain of HLA-DR transplantation antigens is a member of the same protein super-family as the immunoglobulins. Cell 30, 153-161.
Larhammer D., Schenning L., Gustafsson K., Wiman K., Claesson L., Rask L. and Peterson P. A. (1982a) Complete amino acid sequence of an HLA-DR antigen-like β-chain as predicted from the nucleotide sequence. Similarities with immunoglobulins and HLA-A, -B and -C antigens. Proc. natn. Acad. Sci. U.S.A. 79, 3687-3691.
Lee J. S., Trowsdale J., Travers P. J., Carey J., Grosveld F., Jenkins J. and Bodmer W. F. (1982) Sequence of an HLA-DR α-chain cDNA clone and intron-exon organisation of the corresponding gene. Nature, Lond. 299, 750-752.
Lesk A. M. and Chothia C. (1982) The evolution of proteins formed by β-sheets. II. The core of the immunoglobulin domains. J. molec. Biol. 160, 325-342.
Lim V. I. (1974) Algorithms for prediction of α-helical and β-structural regions in globular proteins. J. molec. Biol. 88, 873-894.
Low T. L. K., Liu V. Y. and Putnam F. W. (1976) Structure, function and evolutionary relationship of the Fc domains of human immunoglobulins A, G, M and E. Science, N.Y. 191, 390-391.
Malissen M., Malissen B. and Jordan B. R. (1982) Exon-intron organisation and complete nucleotide sequence of an HLA gene. Proc. natn. Acad. Sci. U.S.A. 79, 893-897.
Max E. E., Battey J., Ney R., Kirsch I. R. and Leder P. (1982) Duplication and deletion in the human immunoglobulin ε genes. Cell 29, 691-699.
Moriuchi T., Chang H-C., Denome R. and Silver J. (1983) Thy-1 cDNA sequence suggests a novel regulatory mechanism. Nature, Lond. 301, 80-82.
Ollo R., Auffray C., Morchamps C. and Rougeon F. (1981) Comparison of mouse immunoglobulin γ2a and γ2b chain genes suggests that exons can be exchanged between genes in a multigenic family. Proc. natn. Acad. Sci. U.S.A. 78, 2442-2446.
Orr H. T., Lancet D., Robb R. J., Lopez De Castro J. A. and Strominger J. L. (1979) The heavy chain of human histocompatibility antigen HLA-B7 contains an immunoglobulin-like region. Nature, Lond. 282, 266-270.
Parker K. C. and Strominger J. L. (1983) Localization of the sites of iodination of human β2-microglobulin: quaternary structure implications for histocompatibility antigens. Biochemistry, N.Y. 22, 1145-1153.
Poljak R. J., Amzel L. M., Avey H. P., Chen B. L., Phizackerley R. P. and Saul F. (1973) Three dimensional structure of the Fab' fragment of a human immunoglobulin at 2.8 Å resolution. Proc. natn. Acad. Sci. U.S.A. 70, 3305-3310.
Poljak R. J., Amzel L. M., Chen B. L., Phizackerley R. P. and Saul F. (1974) The three dimensional structure of the Fab' fragment of a human myeloma immunoglobulin at 2.0 Å resolution. Proc. natn. Acad. Sci. U.S.A. 71, 3440-3444.
Richardson J. S., Richardson D. C., Thomas K. A., Silverton E. W. and Davies D. R. (1976) Similarity of three dimensional structure between the immunoglobulin domain and the copper, zinc superoxide dismutase subunit. J. molec. Biol. 102, 221-235.
Sakano H., Rogers J. H., Huppi K., Brack C., Traunecker A., Maki R., Wall R. and Tonegawa S. (1979) Domains and the hinge region of an immunoglobulin heavy chain are encoded by separate DNA segments. Nature, Lond. 277, 627-633.
Saul F. A., Amzel L. M. and Poljak R. J. (1978) Preliminary refinement and structural analysis of the Fab fragment from human immunoglobulin New at 2.0 Å resolution. J. biol. Chem. 253, 585-597.
Schiffer M., Girling R. L., Ely K. R. and Edmundson A. B. (1973) Structure of a λ-type Bence Jones protein at 3.5 Å resolution. Biochemistry, N.Y. 12, 4620-4631.
Segal D. M., Padlan E. A., Cohen G. H., Rudikoff S., Potter M. and Davies D. R. (1974) The three dimensional structure of a phosphorylcholine-binding mouse immunoglobulin Fab and the nature of the antigen binding site. Proc. natn. Acad. Sci. U.S.A. 71, 4298-4302.
Shackleford D. A., Kaufman J. F., Korman A. J. and Strominger J. L. (1982) HLA-DR antigens: structure, separation of subpopulations, gene cloning and function. Immunological Rev. 66, 133-187.
Strominger J. L., Engelhard V. H., Fuks A., Guild B. C., Hyafil F., Kaufman J. F., Korman A. J., Kostyk T. G., Krangel M. S., Lancet D., Lopez de Castro J. A., Mann D. L., Orr H. T., Parham P. R., Parker K. C., Ploegh H. L., Pober J. S., Robb R. J. and Shackleford D. A. (1981) Biochemical analysis of products of the MHC. In The Role of the Major Histocompatibility Complex in Immunobiology (Edited by Dorf M. E.), pp. 115-145. Garland Press, New York.
Tragardh L., Rask L., Wiman K., Fohlman J. and Peterson P. A. (1980) Complete amino acid sequence of pooled papain solubilized HLA-A, -B and -C antigens. Relation to immunoglobulins and internal homologies. Proc. natn. Acad. Sci. U.S.A. 77, 1129-1133.
Wolfe P. B. and Cebra J. J. (1980) The primary structure of guinea pig β2-microglobulin. Molec. Immun. 17, 1493-1505.
Yamawaki-Kataoka Y., Kataoka T., Takahashi N., Obata M. and Honjo T. (1980) Complete nucleotide sequence of immunoglobulin γ2b chain gene cloned from newborn mouse DNA. Nature, Lond. 283, 786-789.