ANALYTICAL BIOCHEMISTRY 91, 13-31 (1978)

Circular Dichroic Analysis of Protein Conformation: Inclusion of the β-Turns¹

CHIANG TUNG CHANG, CHUEN-SHANG C. WU, AND JEN TSI YANG²

Cardiovascular Research Institute and Department of Biochemistry and Biophysics, University of California, San Francisco, California 94143

Received June 2, 1978
The mean residue ellipticity, $[\theta]$, at any wavelength, λ, of a protein in aqueous solution is expressed as $[\theta]_\lambda = f_H[\theta]_H^\infty(1 - k/\bar{n}) + f_\beta[\theta]_\beta + f_t[\theta]_t + f_R[\theta]_R$ with two constraints: $1 \ge f_j \ge 0$ and $\sum f_j = 1$. The subscripts H, β, t, and R refer to the helix, β-form, β-turn, and unordered form. The fractions, $f_j$'s, of 15 proteins are based on X-ray crystallography. $f_t$ refers to the net β-turn after cancelling those residues having dihedral angles of opposite sign. The $[\theta]_H^\infty$ of an infinite helix and its chain-length dependence factor, $k$, were computed from the myoglobin data (Chen et al., 1974, Biochemistry 13, 3350). The average number of residues per helical segment, $\bar{n}$, for 15 proteins was about 10, a value that can be used for proteins of unknown structure. The reference spectra of the other three structural elements are computed by a least-squares method. Once the reference spectra are chosen, the same equation can be used to estimate the fractions of the secondary structure of a protein from its CD data points between 190 and 240 nm at 1-nm intervals. The computed helical content is usually good to excellent (concanavalin A is a notable exception). Inclusion of the β-turn in the analysis improves the correlation for the estimates of the β-form, but the computed $f_t$ values are not significantly correlated with the X-ray results. Matrix formulation proves the equivalence of the least-squares method and the integral curve-fitting.
¹ Presented at the 61st Annual Meeting of the Federation of American Societies for Experimental Biology, Chicago, April 1977.
² To whom all correspondence should be addressed.
³ Abbreviations used: CD, circular dichroism; ORD, optical rotatory dispersion.

0003-2697/78/0911-0013$02.00/0
Copyright © 1978 by Academic Press, Inc. All rights of reproduction in any form reserved.

The CD³ or ORD of the structural elements in a protein molecule is assumed to be additive (1); this idea is now commonly accepted. With the reference spectra of the helix, β-form, and unordered (or aperiodic) form known, the corresponding fractions of these elements can be solved from the experimental CD (or ORD) spectrum of a protein. One school uses synthetic polypeptides as model compounds for the three conformations (2-4); the other school computes the reference spectra from the CD of proteins of known three-dimensional structure (5-8).

Synthetic polypeptides, however, do not resemble real proteins, whose structural elements differ from ideal models. Helical polypeptides of high molecular weight such as deprotonated poly(L-lysine) are unlike short helical segments in a protein, whose CD is chain-length dependent (7). The β-form in a protein is more complicated than Pauling's pleated sheets. Furthermore, even the same β-polypeptide may have a different CD profile or magnitude or both under different conditions; for example, β-(Lys)$_n$ in aqueous solution (3) and in sodium dodecyl sulfate solution (9) has the same profile but different magnitudes. The "coiled" form of (Lys)$_n$ in water (7) is more extended than that of (Ser)$_n$ in 8 M LiCl (10), thus resulting in different CD. In either case, the so-called "coiled" form is not the same as the unordered form in a globular protein, which is compact and rigid.

In contrast to the simplicity of model polypeptides, the variants of each conformation in a protein molecule present a major problem for the choice of reference spectra based on proteins whose structure has been solved by X-ray diffraction methods. For the sake of simplicity, the 3$_{10}$-helix is combined with the α-helix in the counting, but their difference in CD may not be too serious (11). The β-form is even more diverse; its CD depends on the polarity (parallel versus antiparallel), number of strands, and residues per strand (12-14), not to mention the distortions that have been observed in proteins. What is computed as the reference spectrum is at best a statistical average of numerous variants. The same is true for the unordered form; therefore, the CD of both forms will depend to some extent on the number of reference proteins chosen.

In early studies only the helix and β-form were considered as regular structural elements. The β-turn (or β-bend, reverse turn, or 3$_{10}$ bend) has recently gained attention; X-ray crystallographers now also discuss this structural element. Woody (15) has calculated the CD contribution of the β-turns.
For many globular proteins, perhaps as much as one-quarter of the amino acid residues can be recognized as β-turns. To make matters worse, these reverse turns have numerous variants, but three types, which have similar CD profiles, account for about 80% of the occurrences in proteins (15). More than the minimum set of three proteins for three conformations was used to minimize variations in determining the reference spectra (6-8). Here we employ 15 reference proteins; the CD analysis also includes a β-turn term, neglect of which has been a cause for concern. Our purposes are threefold: (i) to minimize the possible uncertainties in the estimates of helix and β-form that arise because the β-turn is not considered; (ii) to determine whether inclusion of the β-turn will improve the method of analysis; and (iii) to find out whether a rough estimate of the β-turn is possible in view of its many variants. We find that the calculated $f_H$ remains good to excellent as in previous work (8). This applies even to proteins having an $f_H$ of less than 0.1 such as α-chymotrypsin and elastase; concanavalin A (X-ray: $f_H$ = 0.02) is a notable exception, and our analysis vastly overestimates the helical content of this protein. Inclusion of the
β-turns improves the estimates of the β-form (cf. Ref. (8)), but the correlation between the $f_t$ estimates and the X-ray results is not significant. In all methods of CD analysis the contributions due to nonpeptide chromophores are assumed to be comparatively small and are therefore neglected.

MATERIALS AND METHODS
Materials. Among the 18 proteins studied, sperm whale myoglobin, egg white lysozyme, dogfish lactate dehydrogenase, papain from papaya, ribonuclease A from bovine pancreas, bovine insulin, horse heart cytochrome c, Staphylococcus aureus nuclease, and α-chymotrypsin from bovine pancreas were reported previously (6-8). The other nine proteins were porcine adenylate kinase, thermolysin, subtilisin BPN', trypsin inhibitor from bovine pancreas, and ribonuclease S from bovine pancreas (all from Sigma); bovine carboxypeptidase A (Mann); porcine elastase (Worthington); concanavalin A (a gift of Dr. F. C. Hartman); and parvalbumin B (extracted from carp (Cyprinus carpio) and purified by the method of Pecherer et al. (16)). Proteins were dissolved in and dialyzed against 0.01 M phosphate buffer (pH 7.0), except that 0.01 M acetate buffer (pH 6.4) was used for thermolysin. The protein solutions were filtered through 3-μm Millipore filters before use. Their concentrations were determined by micro-Kjeldahl nitrogen analysis; the percentage nitrogen was based on the available amino acid composition.

Method. CD was measured with a Jasco SS-10 spectropolarimeter (modified by Sproul Scientific Instruments) under constant nitrogen flush at 25°C. The instrument had been calibrated with d-10-camphorsulfonic acid (17,18). The path length of the optical cells was checked with a National Bureau of Standards sucrose (sample 17a) of known rotations on a Cary 60 spectropolarimeter (19). The CD data were expressed in terms of mean residue ellipticity in deg·cm²·dmol⁻¹. The mean residue weight of each protein was calculated from its amino acid composition. We repeated the nine CD spectra of Chen et al. (8); only the CD of papain was about 10% higher than previously reported, probably because an accurate percentage nitrogen (17.7%) was used in this work.

METHOD OF ANALYSIS

The mean residue ellipticity, $[\theta]$, at any wavelength, λ, can be expressed as

$$[\theta]_\lambda = f_H[\theta]_H^{\bar{n}} + f_\beta[\theta]_\beta + f_t[\theta]_t + f_R[\theta]_R \qquad [1]$$

The $[\theta]$'s on the right side of Eq. [1] are the reference values for the helix (H), β-form (β), β-turn (t), and unordered form (R) (for brevity we omit the subscript λ). The $f_j$'s are the corresponding fractions of the structural elements. We also introduce two constraints: $\sum f_j = 1$ and $1 \ge f_j \ge 0$.
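Equation [1] and its two constraints can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `mean_residue_ellipticity`, the dictionary keys, and the numerical reference values in the usage note are hypothetical and are not taken from this work.

```python
def mean_residue_ellipticity(fractions, references, tol=1e-6):
    """Evaluate Eq. [1] at a single wavelength.

    fractions  -- dict with keys "H", "beta", "t", "R" (the f_j's)
    references -- dict with the same keys holding the reference
                  ellipticities at that wavelength (deg cm^2 dmol^-1)
    """
    keys = ("H", "beta", "t", "R")
    # Constraint (i): every fraction lies between 0 and 1.
    for key in keys:
        if not (-tol <= fractions[key] <= 1.0 + tol):
            raise ValueError("fraction %s is outside [0, 1]" % key)
    # Constraint (ii): the fractions sum to unity.
    if abs(sum(fractions[key] for key in keys) - 1.0) > tol:
        raise ValueError("fractions do not sum to 1")
    # Eq. [1]: additive contributions of the four structural elements.
    return sum(fractions[key] * references[key] for key in keys)
```

For example, with made-up reference values of -30000, -10000, 5000, and -2000 for H, β, t, and R, fractions of 0.5, 0.2, 0.1, and 0.2 give 0.5(-30000) + 0.2(-10000) + 0.1(5000) + 0.2(-2000) = -16900.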
The superscript $\bar{n}$ refers to the average number of amino acid residues per helical segment. Previously, the reference $[\theta]_j$'s (neglecting the β-turn) at each wavelength were determined from the CD of five or eight proteins of known $f_j$'s by solving five or eight simultaneous equations with a least-squares method (8). A CD spectrum between 185 and 240 nm has 50 or more data points ($[\theta]_\lambda$'s) at 1-nm intervals. Once the reference $[\theta]_j$'s are known, we can estimate the $f_j$'s in Eq. [1] by solving 50 or more simultaneous equations, again with the least-squares method. To account for the chain-length dependence, the helical reference value in Eq. [1] can be replaced by

$$[\theta]_H^{\bar{n}} = [\theta]_H^{\infty}(1 - k/\bar{n}), \qquad [2]$$

where the superscript ∞ refers to a helix of infinite length and $k$ is a wavelength-dependent constant (8). The CD spectrum of a helix between 185 and 240 nm can be resolved into three Gaussian bands representing the $n$-$\pi^*$, $\pi$-$\pi^*_{\parallel}$, and $\pi$-$\pi^*_{\perp}$ transitions:

$$[\theta]_H^{\infty} = \sum_{i=1}^{3} [\theta_i^0]_H^{\infty} \exp[-(\lambda - \lambda_i)^2/\Delta_i^2] \qquad [3]$$

and

$$k[\theta]_H^{\infty} = \sum_{i=1}^{3} k_i[\theta_i^0]_H^{\infty} \exp[-(\lambda - \lambda_i)^2/\Delta_i^2]. \qquad [4]$$

Based on the CD spectrum of myoglobin, Eq. [3], combined with Eqs. [2] and [4], was found to be (8):

$$\begin{aligned}
[\theta]_H^{\bar{n}} ={} & -3.73 \times 10^4\,(1 - 2.50/\bar{n}) \exp[-(\lambda - 223.4)^2/10.8^2] \\
& -3.72 \times 10^4\,(1 - 3.50/\bar{n}) \exp[-(\lambda - 206.6)^2/8.9^2] \\
& +10.1 \times 10^4\,(1 - 2.50/\bar{n}) \exp[-(\lambda - 193.5)^2/8.4^2]
\end{aligned} \qquad [5]$$
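Equation [5] is straightforward to evaluate numerically. The following is a minimal Python sketch rather than the program used in this work; the name `helix_reference` and the tuple layout of `BANDS` are assumptions made here, while the band amplitudes, $k_i$ values, positions, and widths are copied from Eq. [5].

```python
import math

# One tuple per Gaussian band in Eq. [5]:
# (amplitude in deg cm^2 dmol^-1, k_i, band position in nm, bandwidth in nm)
BANDS = (
    (-3.73e4, 2.50, 223.4, 10.8),
    (-3.72e4, 3.50, 206.6, 8.9),
    (10.1e4, 2.50, 193.5, 8.4),
)


def helix_reference(wavelength_nm, n_bar=10.0):
    """Reference helix ellipticity for an average segment length n_bar (Eq. [5])."""
    total = 0.0
    for amplitude, k_i, center, width in BANDS:
        chain_factor = 1.0 - k_i / n_bar          # chain-length term of Eq. [2]
        gaussian = math.exp(-((wavelength_nm - center) / width) ** 2)
        total += amplitude * chain_factor * gaussian
    return total
```

With `n_bar = 10` the chain-length factors of the three bands are 0.75, 0.65, and 0.75, so the short-helix spectrum is appreciably weaker than that of an infinite helix.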
In Eq. [5] all λ's are in nanometers. In principle, we can combine Eqs. [1], [2], and [5] and solve for the $f_j$'s and also $\bar{n}$. Chen et al. (8) have proposed several methods, but the $\bar{n}$ so estimated was not always reliable. Because most proteins studied have an $\bar{n}$ of 10 to 11, we arbitrarily choose $\bar{n}$ = 10 for an unknown protein and determine the reference $[\theta]_H^{\bar{n}}$ values from Eq. [2]. The error involved in the estimated $f_H$ was small in most cases.

In this work we use 15 proteins of known $f_j$'s to solve for the reference $[\theta]_j$'s. To minimize uncertainties further, we subtract the helical term, $f_H[\theta]_H^{\infty}(1 - k/\bar{n})$ based on Eq. [5], from the experimental $[\theta]_\lambda$ for each protein and solve for the $[\theta]_\beta$, $[\theta]_t$, and $[\theta]_R$ values at different wavelengths by the least-squares method. For $f_j$ determinations the BMD07RT, UCLA, computer program was used. This is a nonlinear least-squares method. The reference CD spectra were determined with a linear least-squares program.

The counting of the β-turn residues deserves an explanation. Four residues are involved in a chain reversal but only three peptide chromophores are considered theoretically. Residue i or i + 3 or both can be a part
of a helix or β-form. We therefore compromised by normally counting 4 residues for each β-turn and reducing the count to 3 or 2 if residue i or i + 3 or both of them were counted in the ordered structures. In two consecutive β-turns the 2 or 3 shared residues were counted only once. Venkatachalam (20) first classified 15 possible types of β-turn structures found in globular proteins. Lewis et al. (21,22) used more relaxed criteria by considering the distance between C$_\alpha$(i) and C$_\alpha$(i + 3) to be less than 7 Å without requiring a hydrogen bond and residues (i + 1) and (i + 2) to be nonhelical. Lewis et al. (22) have grouped the allowed conformations into 10 types and termed three of them fundamental types I, II, and III. Their approximate mirror images I', II', and III' do not occur frequently, but where they do, their optical activities mostly cancel those of the corresponding fundamental types. Thus, we computed only the net β-turn residues and counted the "canceled" residues as part of the unordered form. Type IV was defined as any bend of types I through III' with two or more dihedral angles of the β-turn differing by at least 40° from the given angles of the six types (22). Types V, VI, and VII are less frequently encountered and were therefore neglected.

Lewis et al. (22) listed the types of β-turns for 8 of our 18 proteins (Nos. 1, 6-8, 10, 13-16 in Table 1 under Results); ribonuclease A (No. 15) is close to ribonuclease S (No. 13). The dihedral angles of parvalbumin (No. 2) were taken from Moews and Kretsinger (23), and those of insulin were read from Blundell et al. (Ref. (4) in the Table 1 footnote). The total β-turns of adenylate kinase (No. 3) were given by Schulz et al. (24), and those of concanavalin A (No. 18) are based on the results of Reeke et al. (Ref. (18) in Table 1). The dihedral angles of the other proteins (Nos. 5, 9, 11, 12, and 17) were determined from their atomic coordinates provided by the Langridge Protein Data, Lawrence Berkeley Laboratory, University of California-Berkeley, and the β-turns were computed by the criteria of Lewis et al. (22).

RESULTS
Figure 1 shows the reference CD spectra of the four conformations. Those of the β-form, β-turn, and unordered form are based on 15 proteins. Three proteins in Table 1 (see below) are excluded from the computations: adenylate kinase, whose net β-turn content is not available, and α-chymotrypsin and elastase, whose CD spectra do not resemble those of β-rich proteins. The spectrum of the helix having an average of 10 residues per segment is calculated from Eq. [5]. The $[\theta]_H^{\bar{n}}$ has its characteristic double minimum at 222 and 210 nm and a maximum at 193 nm. The CD of the β-form has a minimum at 213 nm and a maximum at 198 nm, and that of the β-turn has two maxima at 224 and 202 nm and a strong negative band below 190 nm. The unordered form shows a strong negative band near 200 nm and another negative one around 225 nm.
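The way such reference values can be obtained at one wavelength, by subtracting the helical term and solving for $[\theta]_\beta$, $[\theta]_t$, and $[\theta]_R$ by linear least squares, is sketched below in Python. This is a hedged illustration and not the original program: the normal-equations solver, the function names, and the dictionary keys are assumptions introduced here.

```python
def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def reference_values(proteins, helix_ref):
    """Least-squares [theta]_beta, [theta]_t, [theta]_R at one wavelength.

    proteins  -- list of dicts with X-ray fractions "f_H", "f_beta", "f_t",
                 "f_R" and the measured ellipticity "theta"
    helix_ref -- helical reference value at this wavelength (e.g. from Eq. [5])
    """
    design = [[p["f_beta"], p["f_t"], p["f_R"]] for p in proteins]
    # Remove the helical contribution before fitting the other three forms.
    y = [p["theta"] - p["f_H"] * helix_ref for p in proteins]
    # Normal equations: (A^T A) x = A^T y.
    ata = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    aty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    return solve_linear(ata, aty)
```

Repeating the call at every wavelength between 190 and 240 nm would trace out three reference spectra; at least three proteins with linearly independent fractions are needed, and more proteins average over the structural variants.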
TABLE 1

COMPARISON OF THE CD ESTIMATES OF THE STRUCTURAL ELEMENTS WITH THE X-RAY RESULTS OF SEVERAL PROTEINS (a,b)

No.(c)  Protein (No. of residues)     Row     f_H     n̄      f_β     f_t    f_t (net)   f_R     Σf_j
 1.     Myoglobin (153)               X-ray   0.79    13.4   0       0.11   0.05        0.16
                                      (1)     0.80           0              0.02        0.18
                                      (2)     0.83           0.05           0.06        0.22    1.16
                                      (3)     0.83           0.05           0.07        0.23    1.18
 2.     Parvalbumin (108)             X-ray   0.62    11.2   0.05    0.17   0.17        0.16
                                      (1)     0.49           0              0.26        0.25
                                      (2)     0.58           0.13           0.39        0.39    1.49
                                      (3)     0.58           0.10           0.38        0.39    1.45
 3.     Adenylate kinase (193)        X-ray   0.54    10.5   0.12    0.19               0.15
                                      (1)     0.46           0.26           0.08        0.20
                                      (2)     0.48           0.09           0.09        0.18    0.84
                                      (3)     0.48           0.09           0.10        0.18    0.85
 4.     Insulin (51)                  X-ray   0.51    8.7    0.24    0.16   0.12        0.13
                                      (1)     0.46           0.22           0.19        0.13
                                      (2)     0.44           0.46           0.18        0.16    1.24
                                      (3)     0.44           0.45           0.17        0.16    1.23
 5.     Lactate dehydrogenase (332)   X-ray   0.45    12.5   0.24    0.09   0.06        0.25
                                      (1)     0.45           0.18           0.13        0.24
                                      (2)     0.46           0              0.13        0.22    0.81
                                      (3)     0.47           -0.06          0.15        0.23    0.79
 6.     Lysozyme (129)                X-ray   0.41    7.6    0.16    0.37   0.23        0.20
                                      (1)     0.32           0.29           0.08        0.31
                                      (2)     0.32           0.34           0.07        0.32    1.05
                                      (3)     0.32           0.33           0.08        0.33    1.06
 7.     Cytochrome c (104)            X-ray   0.39    6.8    0       0.24   0.24        0.37
                                      (1)     0.44           0              0.28        0.28
                                      (2)     0.44           0              0.29        0.30    1.03
                                      (3)     0.48           -0.16          0.33        0.31    0.96
 8.     Carboxypeptidase A (307)      X-ray   0.37    14.1   0.15    0.28   0.26        0.22
                                      (1)     0.45           0              0.37        0.18
                                      (2)     0.43           0              0.33        0.15    0.90
                                      (3)     0.48           -0.25          0.38        0.17    0.78
 9.     Thermolysin (316)             X-ray   0.36    16.1   0.22    0.27   0.18        0.24
                                      (1)     0.28           0.56           0.05        0.11
                                      (2)     0.30           0.26           0.06        0.09    0.71
                                      (3)     0.30           0.26           0.06        0.09    0.71
10.     Subtilisin BPN' (275)         X-ray   0.31    10.8   0.10    0.28   0.22        0.37
                                      (1)     0.15           0.58           0.04        0.23
                                      (2)     0.19           0.15           0.06        0.18    0.58
                                      (3)     0.19           0.15           0.06        0.18    0.58
11.     Papain (212)                  X-ray   0.28    12.0   0.14    0.18   0.17        0.41
                                      (1)     0.29           0              0.15        0.56
                                      (2)     0.29           0.06           0.16        0.59    1.10
                                      (3)     0.30           0.06           0.18        0.60    1.14
12.     Trypsin inhibitor (58)        X-ray   0.28    8.0    0.33    0.03   0.03        0.36
                                      (1)     0.07           0.32           0           0.61
                                      (2)     0.04           0.98           0           0.71    1.73
                                      (3)     -0.02          1.16           0.06        0.68    1.76
13.     Ribonuclease S (124)          X-ray   0.26    10.7   0.44    0.19   0.13        0.17
                                      (1)     0.24           0.33           0.14        0.29
                                      (2)     0.20           0.74           0.12        0.34    1.40
                                      (3)     0.20           0.74           0.12        0.35    1.41
14.     Nuclease (149)                X-ray   0.24    12.0   0.15    0.25   0.18        0.43
                                      (1)     0.30           0.21           0.12        0.37
                                      (2)     0.29           0.31           0.12        0.38    1.10
                                      (3)     0.29           0.30           0.12        0.39    1.10
15.     Ribonuclease A (124)          X-ray   0.23    9.7    0.40    0.19   0.13        0.24
                                      (1)     0.21           0.39           0.10        0.30
                                      (2)     0.17           0.79           0.08        0.35    1.39
                                      (3)     0.17           0.79           0.09        0.36    1.41
16.     α-Chymotrypsin (214)          X-ray   0.09    11.0   0.34    0.38   0.34        0.23
                                      (1)     0.05           0.53           0.02        0.40
                                      (2)     0.06           0.35           0.03        0.39    0.83
                                      (3)     0.07           0.35           0.04        0.39    0.85
17.     Elastase (240)                X-ray   0.07    8.0    0.52    0.30   0.26        0.15
                                      (1)     0              0.46           0.07        0.47
                                      (2)     0              0.47           0.07        0.47    1.04
                                      (3)     -0.08          0.72           0.01        0.44    1.07
18.     Concanavalin A (237)          X-ray   0.02    4.0    0.51    0.14   0.09        0.38
                                      (1)     0.25           0.46           0.20        0.09
                                      (2)     0.24           0              0.16        0       0.40
                                      (3)     0.33           -0.56          0.25        -0.02   0

(a) f_H and f_β on the first row of each protein are based on X-ray results; β-turns are calculated from the dihedral angles of X-ray data.
(b) The computed values based on CD spectra are listed after the parentheses: (1) with both constraints (i) 1 ≥ f_j ≥ 0 and (ii) Σf_j = 1; (2) with constraint (i) only; (3) with no constraints (Baker-Isenberg method).
(c) The references for the X-ray studies:
1. Kendrew, J. C., Dickerson, R. E., Strandberg, B. E., Hart, R. G., Davies, D. R., Phillips, D. C., and Shore, V. C. (1960) Nature (London) 185, 422-427.
2. Kretsinger, R. H., and Nockolds, E. C. (1973) J. Biol. Chem. 248, 3313-3334.
3. Schulz, G. E., Elzinga, M., Marx, F., and Schirmer, R. H. (1974) Nature (London) 250, 120-123.
4. Blundell, T. L., Dodson, G., Hodgkin, D., and Mercola, D. (1972) Advan. Protein Chem. 26, 279-402.
5. Rossmann, M. G., Adams, M. J., Buehner, M., Ford, G. C., Hackert, M. L., Lentz, P. J., Jr., McPherson, A., Jr., Schevitz, R. W., and Smiley, I. E. (1972) Cold Spring Harbor Symp. Quant. Biol. 36, 179-191.
6. Blake, C. C. F., Koenig, D. F., Mair, G. A., North, A. C. T., Phillips, D. C., and Sarma, V. R. (1965) Nature (London) 206, 757-761.
7. Dickerson, R. E., Takano, T., Eisenberg, D., Kallai, O. B., Samson, L., Cooper, A., and Margoliash, E. (1971) J. Biol. Chem. 246, 1511-1535.
8. Quiocho, F. A., and Lipscomb, W. N. (1971) Advan. Protein Chem. 25, 1-78.
9. Colman, P. M., Jansonius, J. N., and Matthews, B. W. (1972) J. Mol. Biol. 70, 701-724.
10. Wright, C. S., Alden, R. A., and Kraut, J. (1969) Nature (London) 221, 235-242.
11. Drenth, J., Jansonius, J. N., Koekoek, R., and Wolthers, B. G. (1971) Enzymes 3, 484-499.
12. Huber, R., Kukla, E., Bode, W., Schwager, P., Bartels, K., Deisenhofer, J., and Steigemann, W. (1974) J. Mol. Biol. 89, 73-101.
13. Wyckoff, H. W., Tsernoglou, D., Hanson, A. W., Knox, J. R., Lee, B., and Richards, F. M. (1970) J. Biol. Chem. 245, 305-328.
14. Arnone, A., Bier, C. J., Cotton, F. A., Day, V. W., Hazen, E. E., Jr., Richardson, D. C., Richardson, J. S., and Yonath, A. (1971) J. Biol. Chem. 246, 2302-2316.
15. Carlisle, C. H., Palmer, R. A., Mazumdar, S. K., Gorinsky, B. A., and Yeates, D. G. R. (1974) J. Mol. Biol. 85, 1-18.
16. Birktoft, J. J., and Blow, D. M. (1972) J. Mol. Biol. 68, 187-240.
17. Shotton, D. M., White, N. J., and Watson, H. C. (1972) Cold Spring Harbor Symp. Quant. Biol. 36, 91-105.
18. Reeke, G. N., Becker, J. W., and Edelman, G. M. (1975) J. Biol. Chem. 250, 1525-1547.
The $[\theta]_H^{\infty}$ spectrum is based on the CD of myoglobin, which contains 79% helix and no β-form; its magnitudes are close to those of helical polypeptides such as poly(L-glutamic acid) and poly(L-lysine) (25). Previously, Chen et al. (8) did not consider the β-turn and computed the $[\theta]_R$ from five reference proteins. With the newly determined $[\theta]_R$ (neglecting the contribution of the 5% net β-turn in myoglobin), we rechecked the three $[\theta_i^0]_H^{\infty}$'s and the corresponding $k_i$'s in Eq. [5] (8), which turned out to be -35,600, -37,000, and 101,000 deg·cm²·dmol⁻¹ and 2.5, 3.0, and 2.5, respectively. In this computation the three bandwidths were again taken from the work of Straus et al. (26; see also Ref. (8)). The differences between the new and old values for Eq. [5] are not large enough to affect the CD spectrum. We therefore retain the old values (see Discussion).

Because of the many variants of both the β-form and β-turn in a protein molecule, the reference CD spectra for the two conformations could vary with the reference proteins chosen. The same is also true for the unordered form. Figure 2 illustrates the similarities in the CD profiles for each of the three conformations in spite of the different numbers of reference proteins chosen. In addition to the curves shown in Fig. 1, we determined the reference CD spectra based on 13, 11, 9, and 7 reference proteins. The criterion used was to exclude successively the two proteins with the highest fractions of the unordered form. They were, in descending order: nuclease, papain, concanavalin A, subtilisin BPN', cytochrome c, trypsin inhibitor, lactate dehydrogenase, and ribonuclease A. Curves for the β-form and β-turn from 13 and 7 proteins (not shown) were similar to those based on 15 proteins.
FIG. 1. Reference spectra of the helix (H), β-form (β), β-turn (t), and unordered form (R); abscissa: λ, nm. The helical spectrum assumes $\bar{n}$ = 10 and the other three spectra are computed from the circular dichroism of 15 proteins. Numerical values at 1-nm intervals will be provided upon request.
Table 1 summarizes the CD analysis of 18 proteins. We compute the fractions of helix, β-form, β-turn, and unordered form in Eq. [1] under three conditions: (1) with both constraints (i) $1 \ge f_j \ge 0$ and (ii) $\sum f_j = 1$, (2) with constraint (i) only, and (3) with no constraints (see Discussion). For comparison the X-ray results are also listed in the first row for each protein.

FIG. 2. Variations of the reference spectra of the β-form (left), β-turn (middle), and unordered form (right) with the number of reference proteins; abscissa: λ, nm. See text for details.

Figure 3 shows the correlation curves of the three ordered conformations based on method (1) for 15 reference proteins and three other proteins. The $f_H$ values computed from CD agree well with the X-ray results in most cases. A notable exception is concanavalin A, which according to the X-ray results should have essentially no helix. The curve-fitting of the CD spectrum for this protein is also poor (see Discussion). The agreement between computed and observed values for $f_\beta$ and $f_t$ is relatively unsatisfactory. Such uncertainties probably arise from the many variants of these two conformations. With the constraint $\sum f_j = 1$, the $f_R$ value depends on the values of the other three conformations. By and large, the estimates of the $f_j$'s are as good as the empirical CD can provide at present.
FIG. 3. Correlation curves of the fractions of the three conformations between CD analyses and X-ray results; abscissa: fractions (X-ray). The numerals refer to the 18 proteins listed in Table 1.
DISCUSSION

Since the CD bands of the helix are strong, a double minimum near 222 and 210 nm in the spectrum is a good indication that the protein contains some helical segments. In addition, the variations of the helix are small in number: most are found to be the α-helix, some adopt the 3$_{10}$-helix, and others may be further distorted. Thus, the estimated $f_H$ values are reliable in most cases. Previously, Chen et al. (7,8) introduced the chain-length dependence factor, $k$ (Eq. [2]). If $\bar{n}$ = 10, the factor $(1 - k/\bar{n})$ between 190 and 240 nm varied from about 0.65 to 0.75, neglect of which could lower the apparent $f_H$ value by as much as 30%. Thus, we do not use synthetic helical polypeptides for determining the reference CD spectra.

In the procedures of Chen et al. (8), $\bar{n}$ is considered as an unknown and Eq. [5] is incorporated into Eqs. [1] and [2]. Here we fix the $\bar{n}$ value of a protein and solve Eq. [2] without the three Gaussian terms (Eq. [5]). This simplification is partly prompted by recent developments in predicting the secondary structure of a protein from its primary sequence. Schulz et al. (24) have tested many of these empirical methods on adenylate kinase; the predictions agreed with the X-ray results with various degrees of accuracy. For a protein whose sequence is known but for which no X-ray results are available, we can use the Chou-Fasman method (27,28), for instance, to estimate the average number of helical residues per segment or the number of helical segments, $i$, noting that $f_H = i\bar{n}/N$ ($N$ is the total number of residues), so that $f_H(1 - k/\bar{n})$ simply becomes $(f_H - ik/N)$. We find that for most proteins $\bar{n}$ can be taken as 10. The results listed in Table 1 assume $\bar{n}$ = 10.4 (an average for the 18 proteins studied), which simplifies the computations.
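The substitution mentioned above follows in one line from $f_H = i\bar{n}/N$; the short derivation below only spells out the algebra and introduces no new quantity.

```latex
\begin{aligned}
f_H\left(1 - \frac{k}{\bar{n}}\right)
  &= f_H - \frac{f_H\,k}{\bar{n}}
   = f_H - \frac{i\bar{n}}{N}\cdot\frac{k}{\bar{n}}
   = f_H - \frac{ik}{N}.
\end{aligned}
```

The helical term therefore depends on $\bar{n}$ only through the number of helical segments, $i$, which is why a sequence-based estimate of $i$ would suffice when no X-ray structure is available.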
If the actual $\bar{n}$ of a protein is smaller (or larger) than 10, the computed $f_H$ might often be underestimated (or overestimated). An exception is concanavalin A, which is rich in β-form; its CD shows a minimum at 225 nm, which overlaps the 222-nm minimum of the helix. Numerous attempts to improve the curve-fitting failed; the computed results in Table 1 should therefore be discounted because our reference spectra (Fig. 1) do not seem to be applicable in this case. We also tried to analyze the CD spectrum of concanavalin A by presetting $f_H$ = 0; the results gave 100% β, which is unacceptable, noting that our reference spectrum for the β-form has a CD minimum at about 213 to 214 nm.

The three Gaussian bands in Eq. [5] involve nine parameters, three each for $[\theta_i^0]_H^{\infty}$, $\Delta_i$, and $k_i$, which reduce to six if the $\Delta_i$'s are taken from Straus et al. (26). With so many parameters, the numerical values given in Eq. [5] may not be unique. We use Eq. [5] because the $[\theta]_H^{\bar{n}}$ versus $1/\bar{n}$ plot gave a slope virtually identical with that based on theoretical calculations, although the experimental and theoretical values at various $\bar{n}$'s did differ (8).
The computed CD spectrum of the β-form shows a strong positive band near 200 nm and a negative one below 220 nm. Qualitatively, they agree with the theoretical treatment of Woody (12), which also predicts that the positions of the bands can shift by several nanometers and that their magnitudes vary among the many variants of the β-form.

Our computed CD spectrum for the β-turn only partially agrees with the theoretical calculations. Woody (15) predicts that the three fundamental types I, II, and III of the β-turns are characterized by a weak negative CD band at about 225 nm, a positive one at about 205 nm, and another negative one below 190 nm. Urry et al. (29) and Brahms et al. (30) reported that the CD of synthetic polypeptides having repeating β-turns agreed with the theoretical values at least semiquantitatively. But our computed CD of the β-turn shows a positive band at about 224 nm instead (Figs. 1 and 2); the signs and positions of the other two bands agree with Woody's treatment. Aromatic groups such as tryptophan may have a positive CD band in the 220- to 230-nm region, but it is difficult to explain the large magnitude, which is close to that of $[\theta]_H^{\bar{n}}$ at 222 nm (see Fig. 1). Nor had the nonpeptide chromophores been anticipated to contribute to $[\theta]_t$ rather than $[\theta]_R$.

Prior to the present work, Chen et al. (31) reported that the CD of cobra neurotoxin also shows a weak positive band at 228 nm, a very weak negative one at 215 nm, and another negative one below 190 nm. These authors suggested that this 228-nm band probably belongs to the aromatic groups; its mean residue ellipticity at pH 7 and 25°C (about 3500 deg·cm²·dmol⁻¹) is only one-fifth to one-sixth that of the computed β-turn band in Fig. 2. Cobra toxin has 62 amino acid residues with four disulfide bonds. The amino acid sequence method predicts no helix, 14% β-form, and 44% β-turn (32); no X-ray study is available.
However, the structure of erabutoxin, an analogue of cobra toxin, has been solved (33); it contains about 40% twisted antiparallel β-sheets, about 26% β-turns, and no helix. In view of the sequence homologies it appears that the toxins of this class are all rich in the β-form and β-turn and have little helix. Because we do not know the net β-turn content of cobra toxin, it is not possible to predict whether its 228-nm band will have the same magnitude as that shown in Fig. 2 when its observed value is extrapolated to 100% net β-turns. Therefore, it is still an open question whether the β-turns or the aromatic groups or both are responsible for this strong positive band between 220 and 230 nm. We also cannot detect any trend in the magnitude of this band with respect to the number of aromatic groups found in the β-turns.

The least-squares method. A least-squares estimate is a curve-fitting technique. A good fit between the experimental and computed curves does not necessarily mean a correct solution, but a poor fit often points to an imperfection in the method of analysis (8). The most important source of error is of course the choice of reference spectra of various structural
elements. First, our $[\theta]_j$'s are computed from the CD spectra of proteins of known structure. In some cases the percentages of helix and β-form have been revised because the X-ray diffraction results were refined (6-8). This in turn will alter our reference spectra and thereby the computed $f_j$'s. More troublesome is the fact that the proper reference spectra may vary from one protein to another. We can only assume that such variations are not too large in most cases.

We have introduced two constraints: (i) $1 \ge f_j \ge 0$ and (ii) $\sum f_j = 1$; $f_R$ is therefore replaced by $(1 - f_H - f_\beta - f_t)$. The first condition prevents any $f_j$ from taking an unreasonable value that is negative or greater than 1. The second condition merely removes one variable because the more variables there are, the less certain the estimates will be. Recently, Baker and Isenberg have employed integrals over the CD data, which are supposed to bypass the curve-fitting requirements (34). By matrix formulation, we have shown that their method is equivalent to the least-squares method (35; see the Appendix). Similarly, Hammonds (36) has also shown that a continuous least-squares fit is identical to a discrete least-squares fit. Baker and Isenberg advocate no prior requirement of the unity sum of the $f_j$'s. Instead, they reject those analyses which fail to pass the unity test. This method seems very sensitive to the chosen spectra; any $f_j$ value that is negative or greater than one is of course unacceptable.

We have analyzed the CD spectra of 18 proteins with the integral procedure by including a term for the β-turn (Table 1, (3) rows) and recomputed our results by removing the unity sum constraint (Table 1, (2) rows). The $f_H$ values usually appear to be good to excellent with or without any constraints. Notable exceptions are concanavalin A and trypsin inhibitor, whose $f_H$ is, respectively, overestimated and underestimated by all three methods (we have no explanation for these results).
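The effect of constraint (ii), replacing $f_R$ by $(1 - f_H - f_\beta - f_t)$, can be sketched in Python as follows. This is an illustration only, with hypothetical function names and data layout: it uses a plain linear least-squares solve and merely checks constraint (i) afterward, whereas the analysis reported here imposed the bounds within a nonlinear least-squares program.

```python
def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_fractions_unity(theta, refs):
    """Fit f_H, f_beta, f_t with f_R = 1 - (f_H + f_beta + f_t).

    theta -- measured ellipticities, one per wavelength
    refs  -- dict with keys "H", "beta", "t", "R"; each value is a list of
             reference ellipticities at the same wavelengths
    """
    keys = ("H", "beta", "t")
    # Substituting f_R moves [theta]_R to the left-hand side of Eq. [1].
    design = [[refs[k][i] - refs["R"][i] for k in keys]
              for i in range(len(theta))]
    y = [theta[i] - refs["R"][i] for i in range(len(theta))]
    ata = [[sum(r[i] * r[j] for r in design) for j in range(3)]
           for i in range(3)]
    aty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(3)]
    fractions = dict(zip(keys, solve_linear(ata, aty)))
    fractions["R"] = 1.0 - sum(fractions.values())
    return fractions


def within_bounds(fractions, tol=1e-9):
    """Constraint (i): every fraction must lie between 0 and 1."""
    return all(-tol <= value <= 1.0 + tol for value in fractions.values())
```

Dropping the substitution and fitting all four fractions freely corresponds to the unconstrained rows of Table 1, where the sum of the fitted fractions then serves as the unity test.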
For a given set of reference spectra the constraint that anyfj must be between zero and one often improves the results of the analysis, for instance, cytochrome c (No. 7 in Table I), insulin (No. 4) and ribonuclease A (No. 15). We also did not find a negativef, value in our computations. No trend can be detected for the unity sum test. The results of myoglobin are excellent by all three methods and yet its Cf;: = 1.17 (No. 1 in Table l), which suggests an overestimate offR in this case. Lacetate dehydrogenase (No. 5) has a CJ’J of only 0.8 and ribonuclease A (No. 15) of 1.40, but their estimatedfii’s are close to the X-ray results. With the limited number of proteins studied, the estimates of helix and p-form seem to be better with the two constraints than without them. The Baker-Isenberg method is essentially identical with our method. Perhaps the major difference is that we introduce the two constraints in the analysis beforehand, whereas Baker and Isenberg use them afterward. Ironically, only lysozyme (No. 6 in Table 1) seems to meet both requirements without prior assumptions but itsf, is underestimated andf, overestimated, and
26
CHANG,
WLJ. AND YANG
6
200
230
200
230
200
230
200
230
A, nm
FIG. 4. Comparison of the experimental (line) and computed (circles) CD of eight proteins. Open circle, with both constraints: 1 zf, 2 0 and CJ; = 1; full circle, without the second condition. See text for details.
its curve-fitting is also not satisfactory. In all fairness, whether the two constraints should be applied beforehand or afterward remains an open issue at present. Figure 4 illustrates the goodness of curve-fitting, or the lack of it, for eight proteins by our method. Myoglobin represents one extreme; its high helicity dominates the CD spectrum. Thus, the curve-fitting for this protein is good. All three computations in Table 1 are good; if anything, our method with two constraints is slightly better than the other two methods. At the other extreme, the curve-fitting of concanavalin A is extremely poor and the computed results are also bad (Table 1). This protein has virtually no helix, but this may not be the reason for the poor analysis because both elastase and α-chymotrypsin have an $f_H$ less than 0.1 and their curve-fitting is reasonably good. The choice of reference CD spectra for various conformations does present a problem for an empirical method. But it is impractical to expand the number of terms in Eq. [1] to accommodate numerous variations of the structural elements. All one can hope for is that the chosen reference
spectra might be reasonably close to what one would find for the average CD of each of the many structural elements in most proteins. Exceptions are bound to occur and interpretations of these estimates should be viewed with caution.

Rank analysis. We analyzed the CD spectra of 18 proteins between 190 and 240 nm at 2-nm intervals by the Wallace-Katz procedure (37). The data matrix is simply B in Eq. [A2] of the Appendix. The reduced data matrices and reduced error matrices were formed by assuming the error level at 7, 10, and 20%, respectively. The rank of the system, which defines the number of linearly independent components, is the number of nonzero diagonal elements in the reduced data matrices. We considered an element nonzero if it was more than twice the corresponding element in the reduced error matrix. By this criterion the rank of the 18-protein system was four if the error limits were between 7 and 20% (three for errors larger than 20%); our CD analysis is therefore justified. The CD spectra of five reference proteins (8) had been analyzed for the wavelength range between 205 and 240 nm at 2.5-nm intervals; the rank was three (38,39). Such analysis only gives the number of independent variables; it does not necessarily exclude many other conformations with similar CD profiles, for instance, the parallel and antiparallel β-form and types I, II, and III of the β-turns. Inspection of Fig. 2 also indicates some resemblance between the β-form and β-turn (except the 224-nm band). Thus, we confine our CD analysis to four components.

Statistical tests. Because of the empirical nature of CD analysis we tested the associations between our computed values and the X-ray results by both Spearman rank order and Pearson product-moment correlations. The Spearman correlation coefficient, $r_s$, is defined as

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad \text{[6]}$$
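where $d_i$ is the rank difference between the observed and computed values and $n$ is the number of observations (18 for 18 proteins). The Pearson correlation coefficient, $r_p$, is defined as

$$r_p = \frac{\sum x_i y_i - \sum x_i \sum y_i / n}{\left\{\left[\sum x_i^2 - \left(\sum x_i\right)^2 / n\right]\left[\sum y_i^2 - \left(\sum y_i\right)^2 / n\right]\right\}^{1/2}}. \qquad \text{[7]}$$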
Here again the summation ranges from $i = 1$ to $n$; $x_i$ and $y_i$ are the fractions of observed and computed structures, respectively. A correlation coefficient near +1 indicates a successful prediction, whereas a value near zero predicts no better than a random assignment and a value close to -1 means a total disagreement between observed and computed results. The $r_s$ values for $f_H$, $f_\beta$, and $f_t$ with two constraints (Table 1, (1) rows) are 0.87, 0.59, and 0.09 and the corresponding $r_p$ values are 0.87, 0.61, and 0.15,
respectively. Essentially the same correlation coefficients were obtained when one constraint, $\sum f_j = 1$, was removed (Table 1, (2) rows). Therefore, the $f_H$ values based on CD analysis correlate highly with the X-ray results. The $f_\beta$'s show significant correlation, but the $f_t$'s have little or no association with the X-ray results. Previously, Chen et al. (8) solved Eq. [1] without the β-turn term. The correlation coefficient for $f_H$ of 18 proteins remains about the same, that is, $r_s = 0.77$ and $r_p = 0.85$, when the β-turn is excluded from the analysis. However, the corresponding $r_s$ and $r_p$ for the β-form drop to 0.19 and 0.28, respectively, suggesting that the computed values do not correlate significantly with the X-ray results. Statistically, inclusion of the β-turn term improves the estimates of the β-form and does not affect the $f_H$ estimates, which remain highly significant. But the estimates of the β-turn are uncertain by the present method. The net $f_t$ of the 17 proteins studied varies from 0.03 to 0.34 with an average of 0.17; this narrow range of net $f_t$ values may have contributed to the poor correlation by the CD analysis.

Now that the X-ray diffraction method determines the three-dimensional structure of crystalline proteins, one may justifiably question whether the CD analysis of protein conformation is desirable or even futile. It is commonly accepted that the protein structure is identical in the solid state and in solution. Notable exceptions are some polypeptides and small proteins such as β-endorphin and β-lipotropin (40), which are not compact and rigid in aqueous solution. The very simplicity of the CD analysis is an attractive feature. More important, it can monitor any conformational changes of a protein in solution. But to achieve this goal even semiquantitatively the CD analysis of the native protein under study must be fairly reliable; otherwise, it is meaningless to interpret the observed changes in conformation in terms of the helix and β-form.
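As a supplementary illustration of the statistical tests, the two coefficients of Eqs. [6] and [7] can be sketched in a few lines of Python. This is a minimal sketch rather than the program used for Table 1: the function names and the five pairs of fractions are hypothetical, and tied values are given their average rank, a convention assumed here (Eq. [6] is exact only when there are no ties).

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order coefficient, Eq. [6]."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def pearson(x, y):
    """Pearson product-moment coefficient, Eq. [7]."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (sxy - sx * sy / n) / sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))


# Hypothetical fractions for five proteins (not taken from Table 1).
observed = [0.10, 0.25, 0.40, 0.55, 0.80]
computed = [0.15, 0.20, 0.45, 0.50, 0.75]

r_s = spearman(observed, computed)
r_p = pearson(observed, computed)
```

Because the two made-up series have the same rank order, every $d_i$ is zero and $r_s$ equals 1, whereas $r_p$ falls slightly below 1 since the values themselves differ.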
The test of reliability is simple for proteins of known structure. If the primary structure of a protein is available, we can also estimate its various conformations by sequence-predictive methods.

The choice of reference CD spectra is indeed problematic. Recently, Grosse et al. (41) have studied the statistical behavior and suitability of such reference spectra. They conclude that the conformational specificity rather than the curve-fitting capability characterizes the suitability of the basis spectra and that their standard deviation is reduced when the conformational diversity of reference proteins is increased. Indeed, in our analysis we have focused our attention on conformational specificity. Because of the lack of X-ray results, not enough proteins rich in β-turn and β-form, such as cobra toxin, can be included in the analysis, which may affect the requirement of conformational diversity. At present CD is a powerful technique for studying protein conformation in solution; it is still developing and will continue to be refined. However, we should not add another reference protein and revise the reference spectra each time the structure of another protein is solved by the X-ray method. Rather, our CD analysis will be applied to such a protein. As the number of proteins of known structure accumulates, we can decide on the general applicability, or the lack of it, of our reference spectra, which in turn will determine whether or not a new set of reference spectra will be needed.

APPENDIX
Matrix formulation. For the sake of convenience, we rewrite Eq. [1] as

$$b_i = \sum_{j=1}^{r} X_{ij} f_j + e_i. \qquad \text{[A1]}$$

Here $b_i$ is the experimental $[\theta]$ at wavelength $i$; the $f_j$'s are the fractions of structural elements; $X_{ij}$ are the reference mean residue ellipticities at wavelength $i$; $e_i$ is the difference between the experimental and the calculated ellipticity. In our case $r = 4$ (see Eq. [1]). For $n$ discrete wavelengths at 1-nm intervals, Eq. [A1] actually consists of $n$ simultaneous equations. In matrix formulation, it becomes

$$B = XF + E. \qquad \text{[A2]}$$

In the least-squares method, the quantity to be minimized, designated as $R$, is

$$R = \sum_{i=1}^{n}\left[\sum_{j=1}^{r} X_{ij} f_j - b_i\right]^2 = \sum_{i=1}^{n} e_i^2. \qquad \text{[A3]}$$

Hence, we have

$$R = (XF - B)^T(XF - B), \qquad \text{[A4]}$$

where $B$ is an $n \times 1$ column matrix, $X$ is an $n \times r$ rectangular matrix, $F$ is an $r \times 1$ column matrix, and $R$ is a scalar number. Differentiating with respect to $f_j$ gives

$$\frac{\partial R}{\partial f_j} = 0 = (0 \ldots 1 \ldots 0)\,X^T(XF - B) + (XF - B)^T X\,(0 \ldots 1 \ldots 0)^T. \qquad \text{[A5]}$$

The two terms on the right-hand side of Eq. [A5] are transposes of each other and are scalar numbers. Therefore, they must be identical, that is,

$$(0 \ldots 1 \ldots 0)\,X^T(XF - B) = 0. \qquad \text{[A6]}$$

There are $r$ equations of this sort as $j = 1$ to $r$. They are equivalent to the matrix product

$$\mathbf{1}\,X^T(XF - B) = \mathbf{0}, \qquad \text{[A7]}$$

where $\mathbf{1}$ and $\mathbf{0}$ are unit and null matrices of order $r$. Equation [A7] is just

$$X^T(XF - B) = 0 \qquad \text{[A8]}$$

or

$$F = (X^T X)^{-1} X^T B, \qquad \text{[A9]}$$

which is the solution to Eq. [A2].
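The least-squares solution of Eq. [A9] can be illustrated numerically. The sketch below uses only the Python standard library and solves the normal equations of Eq. [A8] by Gauss-Jordan elimination instead of forming the inverse explicitly; the helper names, the 6 x 4 matrix X, and the fractions F_TRUE are made-up values for illustration and are not reference spectra from this work. The unconstrained solution is shown; neither of the two constraints is imposed.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a f = b (a is r x r, b is r x 1) by Gauss-Jordan elimination."""
    r = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i][0])] for i in range(r)]
    for col in range(r):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, r), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the reference spectra are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(r):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [[aug[i][r] / aug[i][i]] for i in range(r)]


def least_squares_fractions(x, b):
    """Solve the normal equations X^T X F = X^T B, i.e. Eq. [A8]."""
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, b))


# Hypothetical reference matrix: n = 6 wavelengths by r = 4 components.
X = [
    [4.0, 1.0, 0.5, -1.0],
    [0.0, 3.0, 1.0, 0.5],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, -1.5],
    [-3.0, -1.0, 0.5, 0.2],
    [-2.0, -0.5, -0.3, -0.1],
]
F_TRUE = [[0.30], [0.20], [0.15], [0.35]]  # made-up fractions that sum to 1
B = matmul(X, F_TRUE)                      # noise-free synthetic data, B = XF
F = least_squares_fractions(X, B)          # should reproduce F_TRUE
```

With noise-free synthetic data the error matrix E of Eq. [A2] is zero, so the recovered F agrees with F_TRUE to rounding error.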
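The integral method of Baker and Isenberg (34) uses the following reciprocal functions:

$$y_j = \sum_{k} a_{kj} X_k \ \text{(in our terminology)}, \qquad j = 1 \text{ to } r \qquad \text{[A10]}$$

which satisfy the relation

$$\int y_j X_k \, d\lambda = \delta_{jk}, \qquad \text{[A11]}$$

where $\delta_{jk}$ is the Kronecker delta function. Hence, the $f_j$'s can be obtained from

$$f_j = \int b\, y_j \, d\lambda. \qquad \text{[A12]}$$

In matrix form, Eqs. [A10] to [A12] become

$$Y = XA, \qquad \text{[A13]}$$

$$Y^T X = \mathbf{1}, \qquad \text{[A14]}$$

and

$$F = Y^T B. \qquad \text{[A15]}$$

From Eqs. [A13] and [A14], we have

$$(XA)^T X = \mathbf{1}, \qquad \text{[A16]}$$

$$A^T X^T X = \mathbf{1}, \qquad \text{[A17]}$$

$$A^T = (X^T X)^{-1}, \qquad \text{[A18]}$$

or

$$A = (X^T X)^{-1}. \qquad \text{[A19]}$$

Substituting Eq. [A19] into Eq. [A13], we have

$$Y = X(X^T X)^{-1}, \qquad \text{[A20]}$$

$$Y^T = (X^T X)^{-1} X^T. \qquad \text{[A21]}$$

Replacing $Y^T$ in Eq. [A15] by Eq. [A21], we have

$$F = (X^T X)^{-1} X^T B,$$

which is just Eq. [A9] based on the least-squares method.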
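The equivalence derived above can also be checked numerically. The following sketch, again in standard-library Python with hypothetical names and made-up numbers, builds the reciprocal matrix Y from Eqs. [A19] and [A13] with the integrals replaced by sums over discrete wavelengths, confirms Eq. [A14], and then verifies that F = Y^T B of Eq. [A15] satisfies the least-squares condition of Eq. [A8] even when B is not an exact combination of the reference spectra.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in m[i]] + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # scale the pivot row to a unit diagonal
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * q for v, q in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical reference matrix (n = 6 wavelengths, r = 4 components).
X = [
    [4.0, 1.0, 0.5, -1.0],
    [0.0, 3.0, 1.0, 0.5],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, -1.5],
    [-3.0, -1.0, 0.5, 0.2],
    [-2.0, -0.5, -0.3, -0.1],
]
# Made-up "experimental" data that X cannot reproduce exactly.
B = [[2.1], [1.4], [0.9], [-0.4], [-1.2], [-0.8]]

A = inverse(matmul(transpose(X), X))        # Eq. [A19]
Y = matmul(X, A)                            # Eq. [A13]
unit_check = matmul(transpose(Y), X)        # Eq. [A14]: should be the unit matrix
F = matmul(transpose(Y), B)                 # Eq. [A15]

# Least-squares condition, Eq. [A8]: X^T (XF - B) should vanish.
misfit = [[xf[0] - b[0]] for xf, b in zip(matmul(X, F), B)]
normal_residual = matmul(transpose(X), misfit)
```

If the derivation holds, unit_check is the 4 x 4 unit matrix and every element of normal_residual is zero to within rounding error, which is what the least-squares solution of Eq. [A9] requires.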
ACKNOWLEDGMENT

We thank Professor L. Peller for his valuable discussion and comments on the matrix derivation. All the computations were done on an IBM 370/148 at the UCSF Computer Center. The statistical tests were performed by Dr. A. Bostrom. This work was aided by USPHS Grants GM-10880 and HL-06285 (Program Project).
REFERENCES

1. Yang, J. T., and Doty, P. (1957) J. Amer. Chem. Soc. 79, 761-775.
2. Greenfield, N., Davidson, B., and Fasman, G. D. (1967) Biochemistry 6, 1630-1637.
3. Greenfield, N., and Fasman, G. D. (1969) Biochemistry 8, 4108-4116.
4. Rosenkranz, H., and Scholtan, W. (1971) Hoppe-Seyler's Z. Physiol. Chem. 352, 896-904.
5. Saxena, V. P., and Wetlaufer, D. B. (1971) Proc. Nat. Acad. Sci. USA 68, 969-972.
6. Chen, Y. H., and Yang, J. T. (1971) Biochem. Biophys. Res. Commun. 44, 1285-1291.
7. Chen, Y. H., Yang, J. T., and Martinez, H. M. (1972) Biochemistry 11, 4120-4131.
8. Chen, Y. H., Yang, J. T., and Chau, K. H. (1974) Biochemistry 13, 3350-3359.
9. Li, L.-K., and Spector, A. (1969) J. Amer. Chem. Soc. 91, 220-222.
10. Quadrifoglio, F., and Urry, D. W. (1968) J. Amer. Chem. Soc. 90, 2760-2765.
11. Woody, R. W., and Tinoco, I., Jr. (1967) J. Chem. Phys. 46, 4927-4945.
12. Woody, R. W. (1969) Biopolymers, 669-683.
13. Pysh, E. S. (1970) J. Chem. Phys. 52, 4723-4733.
14. Madison, V., and Schellman, J. (1972) Biopolymers 11, 1040-1075.
15. Woody, R. W. (1974) in Peptides, Polypeptides and Proteins (Blout, E. R., Bovey, F. A., Goodman, M., and Lotan, N., eds.), pp. 338-350, Wiley, New York.
16. Pechere, G. F., Demaille, J., and Capony, J. P. (1971) Biochim. Biophys. Acta 236, 391-408.
17. Cassim, J. Y., and Yang, J. T. (1969) Biochemistry 8, 1947-1951.
18. Chen, G. C., and Yang, J. T. (1977) Anal. Lett. 10, 1195-1207.
19. Samejima, T., and Yang, J. T. (1964) Biochemistry 3, 613-616.
20. Venkatachalam, C. M. (1968) Biopolymers 6, 1425-1436.
21. Lewis, P. N., Momany, F. A., and Scheraga, H. A. (1970) Proc. Nat. Acad. Sci. USA 68, 2293-2297.
22. Lewis, P. N., Momany, F. A., and Scheraga, H. A. (1973) Biochim. Biophys. Acta 303, 221-229.
23. Moews, P. C., and Kretsinger, R. H. (1975) J. Mol. Biol. 91, 201-228.
24. Schulz, G. E., Barry, C. D., Chou, P. Y., Finkelstein, A. V., Lim, V. I., Ptitsyn, O. B., Kabat, E. A., Wu, T. T., Levitt, M., Robson, B., and Nagano, K. (1974) Nature (London) 250, 140-142.
25. Cassim, J. Y., and Yang, J. T. (1970) Biopolymers 9, 1475-1502.
26. Straus, J. H., Gordon, A. S., and Wallach, D. F. H. (1969) Eur. J. Biochem. 11, 201-212.
27. Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13, 222-245.
28. Fasman, G. D., Chou, P. Y., and Adler, A. (1976) Biophys. J. 16, 1201-1238.
29. Urry, D. W., Long, M. M., Ohnishi, T., and Jacobs, M. (1974) Biochem. Biophys. Res. Commun. 61, 1427-1433.
30. Brahms, S., Brahms, J., Spach, G., and Brack, A. (1977) Proc. Nat. Acad. Sci. USA 74, 3208-3212.
31. Chen, Y. H., Lo, T. B., and Yang, J. T. (1977) Biochemistry 16, 1826-1830.
32. Chen, Y. H., Lu, H. S., and Lo, T. B. (1975) J. Chinese Biochem. Soc. (Taipei) 4, 69-82.
33. Low, B. W., Preston, H. S., Sato, A., Rosen, L. S., Searl, J. E., Rudko, A. D., and Richardson, J. S. (1976) Proc. Nat. Acad. Sci. USA 73, 2991-2994.
34. Baker, C. C., and Isenberg, I. (1976) Biochemistry 15, 629-634.
35. Chang, C. T., Wu, C. S. C., and Yang, J. T. (1977) Fed. Proc. 36, 839.
36. Hammonds, R. G., Jr. (1977) Eur. J. Biochem. 74, 421-424.
37. Wallace, R. M., and Katz, S. M. (1964) J. Phys. Chem. 68, 3890-3892.
38. Bannister, W. H., and Bannister, J. V. (1974) Int. J. Biochem. 5, 679-686.
39. Coombs, R. W., Verpoorte, J. A., and Easterbrook, K. B. (1976) Biopolymers 15, 2353-2369.
40. Yang, J. T., Bewley, T., Chen, G. C., and Li, C. H. (1977) Proc. Nat. Acad. Sci. USA 74, 3235-3238.
41. Grosse, R., Malur, J., Meiske, W., and Repke, K. R. H. (1974) Biochim. Biophys. Acta 359, 33-46.