Int. J. Biochem.,
1974, Vol. 5, pp. 673 to 677.
Pergamon Press.
Printed in Great Britain
MINIREVIEW
CIRCULAR DICHROISM AND PROTEIN STRUCTURE
W. H. BANNISTER AND J. V. BANNISTER
Department of Physiology and Biochemistry, Royal University of Malta, Msida, Malta
(Received 8 324 1974)

ABSTRACT
1. Current methods for fitting protein circular dichroism (CD) spectra with basis spectra for helix, β-sheet and unordered structure to estimate the fractions of the secondary structures are reviewed.
2. It is pointed out that the most satisfactory and promising approach is to use calculated basis spectra derived from the CD spectra of reference proteins of known structural composition.
3. Numerical procedures and precautions are outlined.

CIRCULAR dichroism is currently recognised as one of the most useful methods for studying the conformation and conformational changes of globular proteins in solution. The secondary structures of globular proteins contribute to the circular dichroism (CD) in the far ultraviolet, and much work has been carried out to unravel the structural information in the CD. A reasonable assessment of current methods would be that with proper safeguards a tentative estimation of the main forms of secondary structure may be made from the CD spectrum. Protein structure can be determined exactly by X-ray diffraction analysis of crystals. Circular dichroism is a poor structural tool compared to X-ray diffraction. However, current CD methods can give useful estimates of the helix, β-sheet and unordered structure in a protein in the absence of X-ray diffraction data. The problem of estimating the secondary structures from CD data is considerably complicated by the types of structure found in globular proteins. The ordered structures, helix and β-sheet, occur in short segments with much variation. In addition to Pauling-Corey α-helix, 3₁₀-helix and various distorted forms of α-helix can occur. The β-sheet can be parallel or antiparallel or a mixture of both. As a general rule it is partially distorted. The unordered structures are not true random coils
but have a rigid, definite, nonrepeating structure. To make structural interpretations from protein CD spectra at the present time it has to be assumed that the CD contributions of the secondary structures are due to helix, β-sheet and unordered structure operationally defined by their model spectra. If CD contributions of nonpeptide chromophores including disulphide bonds, aromatic side chains and prosthetic groups are absent or negligibly small, the problem becomes one of three-component fitting of protein CD spectra. This approach was introduced by Greenfield & Fasman (1969).

THREE-COMPONENT FITTING
It is thought that the CD contributions of the secondary structures are linearly additive. With a linear three-component model, the CD of a protein at any wavelength λ in the far ultraviolet can be expressed as
$$d_\lambda = f_H x_{\lambda H} + f_\beta x_{\lambda\beta} + f_R x_{\lambda R} \tag{1}$$
The notation is based on that of Chen, Yang & Martinez (1972); $d_\lambda$ is the CD, expressed as mean residue ellipticity, of the protein; $x_{\lambda H}$, $x_{\lambda\beta}$ and $x_{\lambda R}$ are the mean residue ellipticities for pure helix (H), β-sheet (β) and unordered structure (R), respectively; the fs can be considered as the distributive coefficients of $x_{\lambda H}$, $x_{\lambda\beta}$ and $x_{\lambda R}$, or as the
fractions of helix, β-sheet and unordered structure in the protein with sum equal to unity. The fs can be found by writing equation (1) at three wavelengths and solving the three simultaneous equations obtained. The computations are simplified if the three wavelengths are chosen to be isodichroic points of the spectra for pure helix, β-sheet and unordered structure (Myer, 1970). Solutions for the fs at three wavelengths can be wavelength-dependent because of the approximate nature of the three-component model and the model spectra for the secondary structures. It is preferable to determine the fs by fitting at several wavelengths in the far ultraviolet. For a series of m wavelengths the three-component model can be expressed by the matrix equation
$$d = Xf \tag{2}$$
Here d is a column vector representing the protein CD spectrum digitized at m wavelengths; X is an m × 3 matrix whose columns consist respectively of the CD spectra of pure helix, β-sheet and unordered structure digitized at the same wavelengths as the protein CD spectrum; f is a column vector consisting of the three f values, $f_H$, $f_\beta$ and $f_R$. Given d and X, the fs can be found by least squares methods (Magar, 1972) which minimize the sum of the squares of the residuals between the protein spectrum and the fitted spectrum given by the product of X and the computed fs. The least squares solution can be unconstrained. In this case the fs give the distribution coefficients of the spectra of pure helix, β-sheet and unordered structure for the protein, which can, however, be expressed as fractions of Σf (the sum of the fs) to obtain measures of structural composition. Alternatively, the least squares solution can be made subject to the linear constraint Σf = 1, when the fs give the fractions of helix, β-sheet and unordered structure for the protein. Least squares fitting has been most successful in the 205-240 nm. wavelength region (Greenfield & Fasman, 1969; Chen et al., 1972). Below 205 nm. the signal-to-noise ratio of CD spectra decreases. This is the main reason why fitting has often been
terminated at 205 nm. To fit below 204 nm. a weighted least squares approximation is advisable (Barela & Darnall, 1974), but this type of fitting has not been reported so far. Least squares fitting can give a negative value for one (or two) of the fs which is physically meaningless. To avoid this Magar (1971) has recommended fitting by linear programming where the fs are constrained to be positive. However, large negative f values with respect to least squares indicate that the three-component model is inadequate or that the model spectra for the secondary structures are inappropriate or both. Analogously to equation (2), for a set of n protein CD spectra digitized at m wavelengths the three-component model gives the matrix equation
$$D = XF \tag{3}$$
Here D is an m × n matrix whose columns consist of the spectra of the proteins; X is an m × 3 matrix whose columns consist respectively of the spectra of pure helix, β-sheet and unordered structure as before; F is a 3 × n matrix whose columns consist of the three f values for each protein. In general the rank of matrix D will be the same as the rank of matrix X (Wallace, 1960), which is three by the assumption of a three-component model. Matrix rank analysis (Wallace & Katz, 1964) can therefore show whether the three-component model is valid for a set of protein CD spectra. Since the spectra are subject to experimental error the computed rank is a numerical rank (Noble, 1969) consistent with determinable error bounds for the CD measurements. The error bounds for rank three should therefore be compared with the experimental error of the CD measurements. In this way it has been shown that a three-component model is justifiable for the CD spectra of a number of proteins (Bannister & Bannister, 1974a). Factor analysis may be used as an alternative to matrix rank analysis with similar findings, but more computation is involved (Bannister & Bannister, 1974c).

BASIS SPECTRA
The spectra chosen for pure helix, β-sheet and unordered structure constitute the main
problem of the three-component model for protein CD spectra. The first spectra to be utilized to form a three-component basis for protein CD spectra were those of the synthetic homopolypeptide poly-L-lysine in α-helical, β-sheet and random coil conformation, respectively (Greenfield & Fasman, 1969). However, these spectra have proved to be unsatisfactory in actual practice with many proteins of known structural composition. Despite the better fitting obtained with alternative homopolypeptide spectra for β-sheet and unordered structure (Rosenkranz & Scholtan, 1971), homopolypeptides cannot be considered as satisfactory models for protein secondary structures. In contrast to synthetic polypeptides, which form long chain helix and β-sheet and extended random coils, the ordered segments of globular proteins are short and the unordered segments are constrained. Of major concern is the fact that helix CD, which dominates that of the other secondary structures, is chain length-dependent. This is particularly true below 210 nm. in both ideal and real helices (Madison & Schellman, 1972). The CD of β-structures is relatively insensitive to aggregate size, but depends strongly on the local conformation. For this reason estimates of protein β-sheet based on the CD of polypeptide models are unlikely to be quantitative (Madison & Schellman, 1972). To overcome the objections to the use of homopolypeptide spectra, Saxena & Wetlaufer (1971) and Chen & Yang (1971) proposed the use of basis spectra derived from the CD spectra of a set of three or more proteins of known structural composition. A detailed treatment of this approach is given by Chen et al. (1972), who used five reference proteins (myoglobin, lysozyme, ribonuclease, lactate dehydrogenase and papain) to derive statistically-average basis spectra for helix, β-sheet and unordered structure.
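As a rough numerical illustration of deriving basis spectra from reference proteins of known composition, the sketch below solves, by least squares, the linear relation between the reference spectra and their structural fractions at all wavelengths at once. It assumes NumPy. The function name `basis_spectra`, the Gaussian bands and the five rows of fractions are hypothetical placeholders chosen only to make the example run; they are not the data of Chen et al. (1972) and not values from X-ray studies.

```python
import numpy as np


def basis_spectra(F, D_ref):
    """Least squares basis spectra from reference proteins (hypothetical helper).

    F     : (n_proteins, 3) fractions of helix, beta-sheet, unordered structure
    D_ref : (n_proteins, m) CD spectra of the reference proteins, one per row
    Returns an (m, 3) array whose columns are the estimated basis spectra.
    """
    F = np.asarray(F, dtype=float)
    D_ref = np.asarray(D_ref, dtype=float)
    # Each column of D_ref holds one wavelength, so a single lstsq call
    # solves the overdetermined system for every wavelength.
    X_T, _, rank, _ = np.linalg.lstsq(F, D_ref, rcond=None)
    if rank < F.shape[1]:
        raise ValueError("reference fractions do not determine three basis spectra")
    return X_T.T


# Synthetic stand-ins for basis spectra: Gaussian bands, NOT measured CD data.
wavelengths = np.arange(205.0, 241.0, 1.0)


def band(center, width, height):
    return height * np.exp(-((wavelengths - center) / width) ** 2)


X_true = np.column_stack([
    band(222.0, 12.0, -30000.0) + band(208.0, 8.0, -28000.0),  # "helix"
    band(215.0, 12.0, -10000.0),                               # "beta-sheet"
    band(200.0, 10.0, -15000.0),                               # "unordered"
])

# Hypothetical fractions for five imaginary reference proteins (rows sum to 1).
F_ref = np.array([
    [0.80, 0.00, 0.20],
    [0.30, 0.10, 0.60],
    [0.20, 0.40, 0.40],
    [0.40, 0.20, 0.40],
    [0.20, 0.15, 0.65],
])

D_ref = F_ref @ X_true.T          # noise-free synthetic reference spectra
X_est = basis_spectra(F_ref, D_ref)
```

With noise-free synthetic input the estimated columns reproduce the assumed bands; with real spectra the result would depend on the experimental error and on how the residues are counted in the X-ray structures.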
The calculation of the basis spectra at any wavelength proceeds by writing equation (1) for the five reference proteins which gives the matrix equation
$$d = Fx \tag{4}$$
Here d is a column vector consisting of the
mean residue ellipticities of the five reference proteins; F is a 5 × 3 matrix whose columns consist respectively of the fractions (f values) of helix, β-sheet and unordered structure for the proteins; x is a column vector consisting of the mean residue ellipticities for helix, β-sheet and unordered structure which are to be determined. The components of x are found by a least squares solution. The f values for the reference proteins are taken from the results of X-ray diffraction studies. This introduces some difficulties. The counting of amino acid residues in various secondary structures and the assignment of some secondary structures are not always clear-cut. Saxena & Wetlaufer (1971) chose as $f_H$ the average of the lower limit representing true, regular α-helix and the upper limit representing total helix including 3₁₀- and distorted helices. Chen et al. (1972) adopted the upper limit. Also no distinction is made between different forms of β-sheet. A three-component basis is assumed for the CD spectra of the reference proteins. This can be checked by matrix rank analysis of the spectra (Bannister & Bannister, 1974b). Matrix rank analysis has another useful application. Before using the calculated basis spectra to fit the CD spectra of proteins of unknown structure, it is advisable to check that the reference and unknown proteins have the same three-component basis for their CD. This can be done by matrix rank analysis of the CD spectra of the set of reference and unknown proteins (Bannister & Bannister, 1973, 1974b). The basis spectra calculated by Chen et al. (1972) are of considerable interest as presumptive spectra for protein helix, β-sheet and unordered structure. The helix spectrum closely resembled that of synthetic polypeptides with a double minimum at 222 (-) and 209 (-) nm. and a maximum at 193 (+) nm., but the magnitude of the CD was about 20% smaller than in synthetic polypeptides.
This was attributed to the end effects of the short helical segments with an average of 11 residues per segment in the reference proteins. The β-sheet spectrum showed a minimum at 214 (-) nm. and a maximum at 194 (+) nm. These are close to
similar extrema at 217 to 218 (-) nm. and 196 (+) nm., respectively, in the CD spectrum of β-poly-L-lysine, but the magnitude of the computed extrema was only one-third to one-fourth of that for β-poly-L-lysine. In addition the β-sheet spectrum showed two small positive bands between 225 and 250 nm. not shown by β-poly-L-lysine. The spectrum for unordered structure showed a minimum at 194 (-) nm. whose magnitude was about one-third that of the minimum of random coil polypeptides at 197 (-) nm. Besides this extremum there was a small negative band near 225 nm. This is not shown by the ordinary CD spectrum of random coil polypeptides which has a small positive band at 217 to 218 nm. and a much smaller negative band near 240 nm. It is not clear to what extent the small bands in the calculated basis spectra for protein β-sheet and unordered structure are significant. These may be the bands subject to the most revision with small revisions of the X-ray diffraction data for the secondary structures of the reference proteins. The β-sheet spectrum of Chen et al. (1972) is quite similar in major extrema and magnitude of the CD to the spectrum of β-poly-L-lysine in 1% sodium dodecyl sulphate, which has been proposed as a model for the CD of protein β-sheet by Rosenkranz & Scholtan (1971). The unordered structure spectrum of Chen et al. (1972) is also quite similar qualitatively and in amplitude to the spectrum of poly-L-serine in 8 M LiCl, which has been proposed by Rosenkranz & Scholtan (1971) as a model for the CD of protein unordered structure. However, these resemblances may be purely fortuitous. Theoretical calculations of protein helix CD by Madison & Schellman (1972) for myoglobin, lysozyme, ribonuclease-S and α-chymotrypsin support the approach for the fraction of helix and the findings for helix CD of Chen et al. (1972). The theoretical CD of the protein β-regions showed a small negative band at about 222 nm.
and an order-of-magnitude larger positive band at 200 nm. The negative band at 214 nm. in the β-sheet spectrum of Chen et al. (1972) was of approximately equal magnitude to the positive band at
194 nm. The theory of Madison & Schellman (1972) failed to give the negative band at about 200 nm. expected for the CD of protein unordered structure from residual spectra (except for lysozyme residues 36 to 41). At about 200 nm. the theory predicted a large positive band from the protein crystal coordinates. A small negative band was also predicted at about 220 nm. (and a large negative band at about 180 nm.). In summary, the calculation of basis spectra for protein helix, β-sheet and unordered structure from the CD spectra of reference proteins whose structural composition is interpretable from the results of X-ray diffraction studies represents the most promising empirical approach to the understanding of the CD of protein secondary structures in the absence of satisfactory experimental models. The calculated basis spectra are also the most satisfactory for fitting the CD of proteins of unknown structure with the numerical precaution of matrix rank analysis to show that the reference and unknown proteins have the same basis for their CD. The best basis spectra which can be expected are statistically-average spectra. These should emerge conclusively from extension of the work of Chen et al. (1972) to other reference proteins.
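The numerical procedures reviewed above (least squares fitting of equation (2), with or without the constraint that the fs sum to unity, and a rank check on equation (3)) can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' program. The function names, the Gaussian bands standing in for basis spectra, the fractions and the error bound are all hypothetical. The numerical rank is estimated here from singular values, which is a substituted technique and not the elimination procedure of Wallace & Katz (1964).

```python
import numpy as np


def fit_unconstrained(X, d):
    """Unconstrained least squares solution of d ~ X f."""
    f, *_ = np.linalg.lstsq(X, d, rcond=None)
    return f


def fit_sum_to_one(X, d):
    """Least squares solution of d ~ X f subject to sum(f) = 1.

    Solved with a Lagrange multiplier. X and d are divided by a common
    factor first, which leaves f unchanged but keeps the system well scaled.
    """
    scale = np.abs(X).max()
    Xs, ds = X / scale, d / scale
    k = X.shape[1]
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2.0 * Xs.T @ Xs   # gradient of the sum of squared residuals
    kkt[:k, k] = 1.0                # multiplier column
    kkt[k, :k] = 1.0                # constraint row: f_H + f_beta + f_R = 1
    rhs = np.append(2.0 * Xs.T @ ds, 1.0)
    return np.linalg.solve(kkt, rhs)[:k]


def numerical_rank(D, error_bound):
    """Number of singular values of D above an assumed error bound."""
    s = np.linalg.svd(D, compute_uv=False)
    return int(np.sum(s > error_bound))


# Synthetic stand-ins for basis spectra: Gaussian bands, NOT measured CD data.
wavelengths = np.arange(205.0, 241.0, 1.0)


def band(center, width, height):
    return height * np.exp(-((wavelengths - center) / width) ** 2)


X = np.column_stack([
    band(222.0, 12.0, -30000.0) + band(208.0, 8.0, -28000.0),  # "helix"
    band(215.0, 12.0, -10000.0),                               # "beta-sheet"
    band(200.0, 10.0, -15000.0),                               # "unordered"
])

# One imaginary protein with assumed fractions, noise-free.
f_true = np.array([0.3, 0.2, 0.5])
d = X @ f_true
f_free = fit_unconstrained(X, d)
f_unit = fit_sum_to_one(X, d)

# Five imaginary proteins (columns of F) for the rank check on D = X F.
F = np.array([
    [0.80, 0.30, 0.20, 0.40, 0.20],
    [0.00, 0.10, 0.40, 0.20, 0.15],
    [0.20, 0.60, 0.40, 0.40, 0.65],
])
D = X @ F
rank = numerical_rank(D, error_bound=1e-6)  # bound suits noise-free data only
```

For measured spectra the error bound would have to reflect the experimental error of the CD measurements, and a negative f from the unconstrained fit would point to an inadequate model or inappropriate basis spectra, as discussed in the text.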
ACKNOWLEDGEMENT
The authors thank the Wellcome Trust for a grant for circular dichroism studies.
REFERENCES
BANNISTER, W. H., & BANNISTER, J. V. (1973), 'Conformational analysis of proteins from circular dichroism spectra with reference to human erythrocuprein', Experientia, 29, 1343-
BANNISTER, W. H., & BANNISTER, J. V. (1974a), 'Evidence for the validity of three-component fitting of protein circular dichroism spectra', Z. Naturforsch., 29c, 9-11.
BANNISTER, W. H., & BANNISTER, J. V. (1974b), 'On the estimation of the secondary structures of proteins from circular dichroism spectra', Int. J. Biochem., 5, 123-127.
BANNISTER, W. H., & BANNISTER, J. V. (1974c), 'A study of three-component fitting of protein circular dichroism spectra', Int. J. Biochem., 5, 679-686.
BARELA, T. D., & DARNALL, D. W. (1974), 'Practical aspects of calculating protein secondary structure from circular dichroism spectra', Biochemistry, 13, 1694-1700.
CHEN, Y.-H., & YANG, J. T. (1971), 'A new approach to the calculation of secondary structures of globular proteins by optical rotatory dispersion and circular dichroism', Biochem. Biophys. Res. Commun., 44, 1285-1291.
CHEN, Y.-H., YANG, J. T., & MARTINEZ, H. M. (1972), 'Determination of the secondary structures of proteins by circular dichroism and optical rotatory dispersion', Biochemistry, 11, 4120-4131.
GREENFIELD, N., & FASMAN, G. D. (1969), 'Computed circular dichroism spectra for the evaluation of protein conformation', Biochemistry, 8, 4108-4116.
MADISON, V., & SCHELLMAN, J. (1972), 'Optical activity of polypeptides and proteins', Biopolymers, 11, 1041-1075.
MAGAR, M. E. (1971), 'On the possibility of determining the secondary structure of proteins in solution', J. Theor. Biol., 33, 105-119.
MAGAR, M. E. (1972), 'Data Analysis in Biochemistry and Biophysics', pp. 104-117. New York: Academic Press.
MYER, Y. P. (1970), 'A new method for the conformational analysis of proteins and polypeptides from circular dichroism spectra', Res. Commun. Chem. Pathol. Pharmacol., 1, 607-616.
NOBLE, B. (1969), 'Applied Linear Algebra', pp. 241-246. New Jersey: Prentice-Hall.
ROSENKRANZ, H., & SCHOLTAN, W. (1971), 'Eine verbesserte Methode zur Konformationsbestimmung von helicalen Proteinen aus Messungen des Circulardichroismus', Hoppe-Seyler's Z. Physiol. Chem., 352, 896-904.
SAXENA, V. P., & WETLAUFER, D. B. (1971), 'A new basis for interpreting the circular d[...]', [...], 68, 969-972.
WALLACE, R. M. (1960), 'Analysis of absorption spectra of multi-component mixtures', J. Phys. Chem., Ithaca, 64, 899-901.
WALLACE, R. M., & KATZ, S. M. (1964), 'A method for the determination of rank in the analysis of absorption spectra of multi-component mixtures', J. Phys. Chem., Ithaca, 68, 3890-3892.

Key Word Index: Protein secondary structures, protein circular dichroism, basis spectra, matrix rank analysis.