[11] Calculation of protein conformation from circular dichroism


MACROMOLECULAR CONFORMATION: SPECTROSCOPY


By JEN TSI YANG, CHUEN-SHANG C. WU, and HUGO M. MARTINEZ

METHODS IN ENZYMOLOGY, VOL. 130
Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.

Optical rotatory dispersion (ORD) and circular dichroism (CD) are two chiroptical phenomena that differentiate two enantiomers. They are caused by different interactions of left- and right-circularly polarized light with chiral molecules. Chirality is a geometric property of molecules; the corresponding substances are therefore optically active. The application of ORD and CD to the study of protein conformation was extensively reviewed in Volume XXVII of this series by Adler et al. in 1973.¹ Today CD has replaced ORD for the conformational analysis of proteins because CD bands that are characteristic of various secondary structures can be directly observed. In this chapter we will discuss some recent developments in the estimation of various conformations in a protein molecule from its CD spectrum in the ultraviolet region.

Historical Background

It is to Biot in 1812 that we owe the laws of rotatory polarization and rotatory dispersion. It was Cotton in 1896 who described the CD of solutions. Pasteur's separation of a racemic mixture of sodium ammonium tartrate in the 1840s was another milestone. However, the fertile era of ORD ended with the invention of the Bunsen burner in 1866, which made it almost too easy to work with the nearly monochromatic light of the sodium flame (for an account of the early history, see Lowry's classical monograph of 1935²). In the 1950s we witnessed a resurgence of interest in ORD, which stemmed from, first, conformational studies of steroids³ and, second, the discovery of the helical ORD of synthetic polypeptides.⁴⁻⁶ Progress was speeded up by the introduction of the first manual-type Rudolph spectropolarimeter in 1955.

In the early 1950s the idea that the Pauling-Corey α-helix must have a positive optical rotation at the sodium D line, so that proteins become more levorotatory upon denaturation (cf. Ref. 7), was received with some skepticism. A polypeptide chain was thought to have an equal probability of winding into a right- or a left-handed helix, although earlier Huggins⁸ had pointed out that L-polypeptides favor a right-handed helix because the β-carbon of each side chain sterically interferes with the carbonyl oxygen of the same residue in a left-handed helix. The dextrorotation of a helical polypeptide in the visible region was first observed with a crude polarimeter equipped with a Na lamp and a Hg lamp with green and blue filters for three wavelengths at 589, 546, and 436 nm.⁹ Today modern instruments with a data processor seem to make it almost too easy to record data. Ironically, some American manufacturers in the early 1960s did not think it possible to design a precise and accurate circular dichrometer. At present the latest JASCO J-500 model spectropolarimeters virtually monopolize the market in this country.

Crick and Kendrew in 1957¹⁰ stated that "this type of evidence [optical rotation of helices] is suggestive but falls short of being conclusive. It leads to a strong presumption that some sort of helical configuration is present, and of a single hand." But they went on to say that "It may be remarked that there is an encouraging parallelism between the X-ray and optical results." The structure of myoglobin was then not yet completely determined. Actually, Pasteur in 1860¹¹ foresaw the chiroptical phenomenon of a helix: "Imagine a winding stair, the steps of which shall be cubes, or any other object with a superposable image. Destroy the stair, and the dissymmetry will have disappeared. The dissymmetry of the stair was the result only of the mode of putting together its elementary steps." Fresnel in 1824 even predated Pasteur by anticipating the optical rotatory power of a helicoidal arrangement (see Ref. 2).

The late W. Moffitt developed an exciton theory of optical rotation for the α-helix¹² just for fun because he had frequently heard biochemists talk about this structure. The simplified form as proposed by Moffitt and Yang⁵

$$[m'] = \frac{a_0 \lambda_0^2}{\lambda^2 - \lambda_0^2} + \frac{b_0 \lambda_0^4}{(\lambda^2 - \lambda_0^2)^2} \tag{1}$$

was widely used for studying the conformation of polypeptides and proteins up to the 1970s. Here [m'] is the reduced mean residue rotation, which equals the mean residue rotation [m] multiplied by the Lorentz correction factor, 3/(n² + 2), n being the wavelength-dependent refractive index of the solvent. [Yang's minor contribution to the Moffitt equation was to replace frequency by wavelength and regroup all constants into two parameters, a₀ and b₀. Plotting [m'](λ² - λ₀²) against 1/(λ² - λ₀²) yields a straight line, provided a third parameter λ₀ can be preset.⁹ Moffitt was skeptical about the graphical determination of three parameters and had wanted to use the then novel computer at Harvard.] Simultaneously, Fitts and the late J. G. Kirkwood applied the polarizability theory of rotatory power¹³ to explain the change of specific rotation in a helix-coil transition.¹⁴,¹⁵ The dispute between Moffitt and Kirkwood was subsequently resolved in a joint publication.¹⁶ Both theories require revisions and Eq. (1) is now regarded as empirical. Experimentally, with λ₀ preset at 212 nm a b₀ value of -630 deg cm² dmol⁻¹ is taken to represent a 100% right-handed helix; this ORD method still works reasonably well. Moffitt's prediction of a right-handed helix with a negative b₀ value happens to agree with the later X-ray results. This is a fortunate coincidence. Early experiments were done on poly(γ-benzyl-L-glutamate) in poor solvents; its helix turned out to be right-handed. Had poly(β-benzyl-L-aspartate) been used, its positive b₀ value of about +600 deg cm² dmol⁻¹ might have raised questions as to why the handedness of helices in polypeptides differs from that in proteins, which usually give a negative b₀ value. (For a comprehensive review of early ORD results of polypeptides and proteins, see Ref. 17.) Because of the lack of circular dichrometers, CD remained overlooked until the middle 1960s.

¹ A. J. Adler, N. J. Greenfield, and G. D. Fasman, this series, Vol. 27, p. 675.
² T. M. Lowry, "Optical Rotatory Power." Longmans, Green, London, 1935; Dover, New York, 1964.
³ C. Djerassi, "Optical Rotatory Dispersion." McGraw-Hill, New York, 1960.
⁴ P. Doty and J. T. Yang, J. Am. Chem. Soc. 78, 498 (1956).
⁵ W. Moffitt and J. T. Yang, Proc. Natl. Acad. Sci. U.S.A. 42, 596 (1956).
⁶ J. T. Yang and P. Doty, J. Am. Chem. Soc. 79, 761 (1957).
⁷ C. Cohen, Nature (London) 175, 129 (1955).
⁸ M. L. Huggins, J. Am. Chem. Soc. 74, 3963 (1952).
⁹ J. T. Yang, in "Conformation of Biopolymers" (G. N. Ramachandran, ed.), p. 157. Academic Press, New York, 1967.
¹⁰ F. H. C. Crick and J. C. Kendrew, Adv. Protein Chem. 12, 133 (1957).
¹¹ L. Pasteur, in two lectures on "Researches on the Molecular Dissymmetry of Natural Organic Products" presented to the Chemical Society of Paris on 20 January and 3 February, 1860. Translated from "Leçons de chimie professées en 1860," by W. S. W. Ruschenberger, Am. J. Pharm. 34 (Ser. 3, Vol. 10), 1, 97 (1862).
¹² W. Moffitt, J. Chem. Phys. 25, 467 (1956).
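
The straight-line plot described for Eq. (1) can be reproduced numerically. The following is only a minimal sketch, not the authors' procedure: it assumes λ₀ is preset (212 nm by default), uses an ordinary least-squares line, and the function names (`moffitt_yang`, `moffitt_yang_fit`, `helix_fraction_from_b0`) are hypothetical. Reading b₀/(-630) as a helix fraction additionally assumes that the non-helical residues contribute b₀ = 0, which the text does not state.

```python
def moffitt_yang(wavelength_nm, a0, b0, lambda0_nm=212.0):
    """Reduced mean residue rotation [m'] from Eq. (1)."""
    d = wavelength_nm ** 2 - lambda0_nm ** 2
    return a0 * lambda0_nm ** 2 / d + b0 * lambda0_nm ** 4 / d ** 2


def moffitt_yang_fit(wavelengths_nm, m_reduced, lambda0_nm=212.0):
    """Estimate (a0, b0) from ORD data with lambda0 preset.

    Multiplying Eq. (1) by (lambda^2 - lambda0^2) gives
        y = a0*lambda0^2 + b0*lambda0^4 * x,
    with y = [m'](lambda^2 - lambda0^2) and x = 1/(lambda^2 - lambda0^2),
    so a least-squares line through (x, y) yields both parameters.
    """
    xs, ys = [], []
    for lam, m in zip(wavelengths_nm, m_reduced):
        d = lam ** 2 - lambda0_nm ** 2
        xs.append(1.0 / d)
        ys.append(m * d)
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    a0 = intercept / lambda0_nm ** 2   # intercept = a0 * lambda0^2
    b0 = slope / lambda0_nm ** 4       # slope = b0 * lambda0^4
    return a0, b0


def helix_fraction_from_b0(b0, b0_full_helix=-630.0):
    """Assumption: b0 scales linearly with helix content and is 0 otherwise."""
    return b0 / b0_full_helix


# Synthetic illustration only (a0 = 100 is an arbitrary, made-up value).
wavelengths = [300.0 + 25.0 * i for i in range(13)]
rotations = [moffitt_yang(w, a0=100.0, b0=-315.0) for w in wavelengths]
a0_fit, b0_fit = moffitt_yang_fit(wavelengths, rotations)
print(a0_fit, b0_fit, helix_fraction_from_b0(b0_fit))
```

Because Eq. (1) is now regarded as empirical, a fit of this kind is best read as a bookkeeping device for ORD data rather than as a structural measurement.
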
Holzwarth and Doty were the first to measure the CD spectrum of an α-helix.¹⁸ It shows a typical double minimum at 222 and 208-210 nm and a maximum at 191-193 nm (Fig. 1),¹⁹,²⁰ which represent the n-π* transition²¹,²² and the π-π∥* and π-π⊥* transitions,¹²,²²,²³ respectively. Current commercial instruments can record CD spectra down to about 184 nm. With a vacuum ultraviolet CD (VUCD) instrument the CD spectra can be further extended to 140 nm under ideal conditions; for aqueous solutions the lower limit is about 165 to 178 nm. At present there are only four such circular dichrometers in this country and no commercial VUCD instruments. The VUCD of the helix²⁴,²⁵ shows a positive shoulder near 175 nm and one negative and one positive band at shorter wavelengths (Fig. 2).

[Figure 1 not reproduced. Abscissa: λ, nm.]

FIG. 1. CD spectra of the helix, β-form, and unordered form based on (Lys)ₙ (molecular weight 193,000) in water at 25°. Curves: R, unordered form at neutral pH; H, α-helix at pH 10.8; β, β-form at pH 11.1 after heating for 15 min at 52° and cooling back to 25°. Concentration of (Lys)ₙ: 0.01%. (From Yang and Kubota²⁰ with the permission of Plenum and copyrighted by Plenum.)

[Figure 2 not reproduced. Ordinates: Δε and ε × 10⁻³; abscissa: λ, nm.]

FIG. 2. Vacuum ultraviolet CD and absorption spectra of poly(γ-methyl-L-glutamate) in hexafluoroisopropanol. (Redrawn from Johnson and Tinoco²⁴ with the permission of the American Chemical Society, copyright 1972.)

In 1966 the CD spectrum of Pauling and Corey's β-pleated sheet (or β-form) was found for (Lys)ₙ in sodium dodecyl sulfate solution,²⁶ (Lys)ₙ at alkaline pH (Fig. 1),²⁷ and silk fibroin in methanol-water.²⁸ It shows a negative band near 216-218 nm and a positive one between 195 and 200 nm, but the intensities of these CD bands are different in the three cases. For some homopolypeptides containing alkylated or acylated Ser, Thr, and Cys side chains, the extrema of the CD spectra of the β-form can be red-shifted; in particular, the n-π* transition was located above 225 nm, but the π-π* transition remained below 200 nm.²⁹,³⁰ The similarity of the red-shifted n-π* transition to that predicted for β-turns (see below) suggests that these homopolypeptides may have a large fraction of residues in β-turns,³¹ which also agrees with the observation that most of these polypeptides form cross-β structures according to infrared dichroism criteria. Alternatively, the red-shifted n-π* transition could result from the coupling of a strong transition near 194 nm characteristic of dialkyl sulfides with the peptide transitions.³² The VUCD spectrum of the β-form has another negative band below 170 nm (not shown).³³⁻³⁵ Parallel and antiparallel β-sheets have been predicted to have qualitatively similar CD spectra.³²,³⁶⁻³⁹

¹³ J. G. Kirkwood, J. Chem. Phys. 5, 479 (1937).
¹⁴ D. D. Fitts and J. G. Kirkwood, Proc. Natl. Acad. Sci. U.S.A. 42, 33 (1956).
¹⁵ D. D. Fitts and J. G. Kirkwood, J. Am. Chem. Soc. 78, 2650 (1956).
¹⁶ W. Moffitt, D. D. Fitts, and J. G. Kirkwood, Proc. Natl. Acad. Sci. U.S.A. 43, 723 (1957).
¹⁷ P. Urnes and P. Doty, Adv. Protein Chem. 16, 401 (1961).
¹⁸ G. M. Holzwarth and P. Doty, J. Am. Chem. Soc. 87, 218 (1965).
¹⁹ N. Greenfield and G. D. Fasman, Biochemistry 8, 4108 (1969).
²⁰ J. T. Yang and S. Kubota, in "Microdomains in Polymer Solutions" (P. L. Dubin, ed.), p. 311. Plenum, New York, 1985.
²¹ J. A. Schellman and P. Oriel, J. Chem. Phys. 37, 2114 (1962).
²² I. Tinoco, Jr., R. W. Woody, and D. F. Bradley, J. Chem. Phys. 38, 1317 (1963).
²³ R. W. Woody and I. Tinoco, Jr., J. Chem. Phys. 46, 4927 (1967).
²⁴ W. C. Johnson, Jr. and I. Tinoco, Jr., J. Am. Chem. Soc. 94, 4389 (1972).
²⁵ M. A. Young and E. S. Pysh, Macromolecules 6, 790 (1973).
²⁶ P. K. Sarkar and P. Doty, Proc. Natl. Acad. Sci. U.S.A. 55, 981 (1966).
²⁷ R. Townend, R. F. Kumosinski, S. N. Timasheff, G. D. Fasman, and B. Davidson, Biochem. Biophys. Res. Commun. 23, 163 (1966).
²⁸ E. Iizuka and J. T. Yang, Proc. Natl. Acad. Sci. U.S.A. 55, 1175 (1966).
²⁹ G. D. Fasman and J. Potter, Biochem. Biophys. Res. Commun. 27, 209 (1967).
³⁰ L. Stevens, R. Townend, S. N. Timasheff, G. D. Fasman, and J. Potter, Biochemistry 7, 3717 (1968).
³¹ R. W. Woody, in "The Peptides" (V. Hruby, ed.), Vol. 7, p. 15. Academic Press, New York, 1985.

The CD spectrum of the unordered form (Fig. 1) generally shows a strong negative band near 200 nm and a very weak band around 220 nm, which can be either a positive band or a negative shoulder. Theoretical attempts to explain this general feature have been reviewed by Woody.⁴⁰ Experimentally, the magnitude of the 200-nm band can vary with the polypeptides studied and the experimental conditions (see section on "Reference Spectra Based on Model Polypeptides").

Until the middle 1970s, the CD of β-turns (or β-bend, 3₁₀-bend, hairpin bend, β-loop, 1-4 bend, reverse turn, U-bend, etc.) had been neglected, although this conformation is now recognized to be abundant in protein molecules. Chou and Fasman⁴¹ searched 29 proteins whose X-ray diffraction data were available and found that as many as one-third to one-fourth of the amino acid residues could be identified in β-turns. [For an extensive review on reverse turns in peptides and proteins, see Ref. 42. γ-Turns have a hydrogen bond between the C=O of the first residue and the NH of the third residue instead of the fourth residue as found in β-turns, but these turns occur infrequently in proteins and their CD spectrum is much less known than that of β-turns.] Earlier, Venkatachalam⁴³ classified 15 possible types of β-turns in globular proteins. By using more relaxed criteria, Lewis et al.⁴⁴,⁴⁵ have grouped 10 types and termed three fundamental types as types I, II, and III, which account for about 80% of the occurrence in proteins (see Ref. 41). Types I', II', and III' are approximate mirror images of types I, II, and III but they do not occur frequently.

Woody's theoretical analysis⁴⁶ predicts that types I, II, and III should have similar CD spectra, which generally resemble the spectrum of the β-form (identified as a class A spectrum) but which have a weak negative band red-shifted to 220-230 nm, a positive band between 200 and 210 nm, and a strong negative band between 180 and 190 nm. This CD spectrum is referred to as a class B spectrum, the most common class associated with any β-turn conformation. Type II' β-turns may have a helix-like CD spectrum, called a class C spectrum. Type I β-turns can have any L-amino acid as the second and third residues of the 4-residue reverse turn, i.e., the corner residues; type II has a glycine as the third residue; type III is a variant of type I in which the dihedral angles of the corner residues are roughly identical, thus forming one turn of 3₁₀-helix.

Experimentally, the CD spectra of β-turns generally support the theoretical predictions of Woody,⁴⁶ as illustrated in Fig. 3.⁴⁷ [Brahms and Brahms have proposed that (Ala₂-Gly₂)ₙ and N-isobutyl-L-Pro-D-Ala represented type II, N-acetyl-L-Pro-Gly-L-Leu type I (although a possible type II was not excluded), and cyclo(D-Ala-L-Ala₂-D-Ala₂-L-Ala) type IV' and partially type III.] As Woody⁴⁶ points out, many cyclic peptides whose type II structure has been established by NMR or X-ray diffraction studies show the class B spectrum,⁴⁸⁻⁵¹ as do several other systems for which β-turns have been proposed.⁴⁷,⁵²⁻⁵⁶ A number of peptides show the class C (or its image, C') spectrum.⁴⁸,⁴⁹,⁵⁷⁻⁵⁹ However, Woody⁴⁶ indicates that the broad range of conformations in β-turns, combined with the sensitivity of CD to the conformation, implies that the CD spectra of some β-turns may be quite different from the consensus pattern. He also indicates that one aspect of his theoretical calculations needs revision because the CD spectra of two well-authenticated type I β-turns, viz. cyclo(Gly-L-Pro-Ala)₂⁴⁹ and cyclo(L-Ala-L-Ala-ε-aminocaproyl),⁵⁰ resemble the spectrum of an α-helix (class C spectrum). According to the data compiled by Chou and Fasman,⁴¹ the β-turns of 29 proteins are 45% type I, 22% type III, and 18% type II. Therefore, Woody⁴⁶ states that the crucial question is whether type I β-turns generally give a class C spectrum or whether, because of their cyclic character, the two authenticated systems show CD spectra that deviate from class B spectra. The answer to this question is important in the CD calculations to be described in the section on "Methods of CD Analysis."

[Figure 3 not reproduced. Abscissa: λ, nm.]

FIG. 3. Vacuum ultraviolet CD spectra of model β-turns. Curves: 1, (Ala₂-Gly₂)ₙ in water; 2, N-isobutyl-L-proline-isopropylamide film cast from trifluoroethanol solution; 3, N-acetyl-L-Pro-Gly-L-Leu in trifluoroethanol at -60°; 4, cyclo(D-Ala-L-Ala-L-Ala-D-Ala-D-Ala-L-Ala) in D₂O. The intensities of (Ala₂-Gly₂)ₙ have been multiplied by 0.5. (Redrawn from Brahms and Brahms.⁴⁷)

³² J. Applequist, Biopolymers 21, 779 (1982).
³³ J. S. Balcerski, E. S. Pysh, G. M. Bonora, and C. Toniolo, J. Am. Chem. Soc. 98, 3470 (1976).
³⁴ M. M. Kelly, E. S. Pysh, G. M. Bonora, and C. Toniolo, J. Am. Chem. Soc. 99, 3264 (1977).
³⁵ S. Brahms, J. Brahms, G. Spach, and A. Brack, Proc. Natl. Acad. Sci. U.S.A. 74, 3208 (1977).
³⁶ E. S. Pysh, Proc. Natl. Acad. Sci. U.S.A. 56, 825 (1966).
³⁷ R. W. Woody, Biopolymers 8, 669 (1969).
³⁸ E. S. Pysh, J. Chem. Phys. 52, 4723 (1970).
³⁹ V. Madison and J. A. Schellman, Biopolymers 11, 1041 (1972).
⁴⁰ R. W. Woody, J. Polymer Sci. Macromol. Rev. 12, 181 (1977).
⁴¹ P. Y. Chou and G. D. Fasman, J. Mol. Biol. 115, 135 (1977).
⁴² J. A. Smith and L. G. Pease, CRC Crit. Rev. Biochem. 8, 315 (1980).
⁴³ C. M. Venkatachalam, Biopolymers 6, 1425 (1968).
⁴⁴ P. N. Lewis, F. A. Momany, and H. A. Scheraga, Proc. Natl. Acad. Sci. U.S.A. 68, 2293 (1971).
⁴⁵ P. N. Lewis, F. A. Momany, and H. A. Scheraga, Biochim. Biophys. Acta 303, 221 (1973).
⁴⁶ R. W. Woody, in "Peptides, Polypeptides and Proteins" (E. R. Blout, F. A. Bovey, M. Goodman, and N. Lotan, eds.), p. 338. Wiley, New York, 1974.
⁴⁷ S. Brahms and J. Brahms, J. Mol. Biol. 138, 149 (1980).

⁴⁸ C. A. Bush, S. K. Sarkar, and K. D. Kopple, Biochemistry 17, 4951 (1978).
⁴⁹ L. M. Gierasch, C. M. Deber, V. Madison, C.-H. Niu, and E. R. Blout, Biochemistry 20, 4730 (1981).
⁵⁰ J. Bandekar, D. J. Evans, S. Krimm, S. J. Leach, S. Lee, J. R. McQuie, E. Minasian, G. Némethy, M. S. Pottle, H. A. Scheraga, E. R. Stimson, and R. W. Woody, Int. J. Pept. Protein Res. 19, 187 (1982).
⁵¹ R. Deslauriers, D. J. Evans, S. J. Leach, Y. C. Meinwald, E. Minasian, G. Némethy, I. D. Rae, H. A. Scheraga, R. L. Somorjai, E. R. Somorjai, E. R. Stimson, J. W. Van Nispen, and R. W. Woody, Macromolecules 14, 985 (1981).
⁵² D. W. Urry, M. M. Long, T. Ohnishi, and M. Jacobs, Biochem. Biophys. Res. Commun. 61, 1427 (1974).
⁵³ S. Brahms, J. Brahms, G. Spach, and A. Brack, Proc. Natl. Acad. Sci. U.S.A. 74, 3208 (1977).
⁵⁴ M. Kawai and G. D. Fasman, J. Am. Chem. Soc. 100, 3630 (1978).
⁵⁵ S. K. Brahmachari, V. S. Ananthanarayanan, S. Brahms, J. Brahms, R. S. Rapaka, and R. S. Bhatnagar, Biochem. Biophys. Res. Commun. 86, 605 (1979).
⁵⁶ S. K. Brahmachari, T. N. Bhat, V. Sudhakar, M. Vijayan, and V. S. Ananthanarayanan, J. Am. Chem. Soc. 103, 1703 (1981).
⁵⁷ D. W. Urry, A. L. Ruiter, B. C. Starcher, and T. A. Hinners, Antimicrob. Agents Chemother. 87 (1968).
⁵⁸ S. Laiken, M. Printz, and L. C. Craig, J. Biol. Chem. 244, 4454 (1969).
⁵⁹ V. S. Ananthanarayanan and N. Shyamasundar, Biochem. Biophys. Res. Commun. 102, 295 (1981).


In the 1950s almost all biophysical studies of synthetic polypeptides were limited to two states, the helix and the coil. The theories of Zimm and Bragg and others (see Ref. 60) provide a solid foundation for the interpretation of experimental results. In retrospect it is obvious why the helix was overemphasized. Myoglobin and hemoglobin, the first two proteins studied by X-ray diffraction, are rich in helices and have no β-form. In fact, the then prevailing view was that the Pauling-Corey α-helix is the only regular secondary structure in globular proteins, and their β-pleated sheets were thought to exist in some fibrous proteins such as silk fibroin and perhaps in aggregates of polypeptides and proteins. Now we know that β-sheets and β-turns are abundant in many globular proteins. Therefore, the discovery of the CD spectra of these conformations in addition to the helix challenges us to improve and refine the analysis of the chiroptical properties of proteins.

Expression of Experimental Data⁶¹

Linearly polarized light can be resolved into two circularly polarized components. By definition CD is a measure of the differential absorbance between the left- and right-circularly polarized light, AL - AR. The emerging light from an optically active medium becomes elliptically polarized. Thus, an alternate measure of CD is the ellipticity, ψ. Strictly, the tangent of ψ is the ratio of the minor axis to the major axis of the resultant ellipse; because ψ is extremely small, ψ itself (in radians) simply becomes this ratio. Most commercial instruments, which directly measure AL - AR, are actually calibrated in units of ψ, and many publications report CD in ellipticity rather than in differential absorbance. The two measures at wavelength λ are related by

$$\psi(\lambda) = 33\,(A_L - A_R)(\lambda) \tag{2}$$

where ψ is in degrees. (A more exact conversion factor is 32.982; however, 33.0 will suffice for three significant figures.) In analogy to specific rotation, specific ellipticity, [ψ], is defined as

$$[\psi](\lambda) = \psi(\lambda)/lc \tag{3}$$

Here l is the light path in dm and c the concentration in g ml⁻¹. The dimension of [ψ] is deg cm² dag⁻¹ (or 1.745 × 10⁻⁴ rad m² kg⁻¹). Biot in 1836 introduced the term specific rotation in which decimeters instead of centimeters were used "in order that the significant figures may not be uselessly preceded by two zeros." This awkward tradition is retained because the wealth of published data for more than 150 years forbids any new definition, which will merely cause confusion. The word "specific" is also not in accordance with IUPAC rules but is retained for historical reasons.

For biopolymers it is more convenient to express the data in terms of mean residues than in molar quantities. Thus, one measure of CD is simply the differential absorption coefficient, εL - εR, and its dimension is cm⁻¹ M⁻¹ (or 0.1 m² mol⁻¹) (it is understood that the molar concentration refers to mean residue moles per liter). The symbol Δε is often used when there is no risk of ambiguity; the same symbol has long been used for difference in absorption coefficients in absorbance spectroscopy. An alternate measure of CD is the mean residue ellipticity, [θ]:

$$[\theta](\lambda) = M_0[\psi](\lambda)/100 \tag{4a}$$

or

$$[\theta](\lambda) = 100\,\psi(\lambda)/l'c' \tag{4b}$$

where M₀ is the mean residue weight, sometimes referred to as MRW, l' the light path in cm, and c' the concentration in mean residue moles per liter. The dimension of [θ] is deg cm² dmol⁻¹ (or 1.745 × 10⁻⁵ rad m² mol⁻¹). For most proteins M₀ is around 115, which can be used if the molar mass (or molecular weight) and the number of amino acid residues, or the amino acid composition of the protein, is not known. In Eq. (4a) the factor 100 is again introduced by tradition; it merely reduces the magnitude by two orders of magnitude. The two measures of CD are related by

$$[\theta](\lambda) = 3300\,(\varepsilon_L - \varepsilon_R)(\lambda) \tag{5}$$

Sometimes molar CD instead of mean residue CD may also be used. In this case the concentration in εL - εR is simply moles per liter. Likewise, in Eq. (4a) the molar mass (or molecular weight), M, replaces the mean residue weight, M₀. Obviously, if the number of amino acid residues in a protein molecule is n, M = nM₀, that is, the molar ellipticity is n times the mean residue ellipticity, [θ]. The latter quantity is used to estimate the fractions of various conformations in a protein molecule regardless of its molar mass (or molecular weight).

Experimental Measurements

CD instruments have been greatly improved within the last decade. The Pockels cell is now replaced by a photoelastic modulator, which eliminates much noise; multiple scannings are aided with a data processor. Nevertheless, an instrument must be properly calibrated; otherwise, any quantitative analysis of experimental data would be meaningless. Assuming that the wavelength scale of commercial instruments is correctly adjusted by the manufacturer, the accuracy of the readings at various wavelengths must still be carefully checked. While a spectropolarimeter for ORD can easily be calibrated with a solution of sucrose (National Bureau of Standards sample), no standard for CD has been universally accepted. One frequently used compound is d-10-camphorsulfonic acid (this compound is dextrorotatory in the visible region and therefore has a prefix d by tradition). However, the compound is hygroscopic. Neglect of its water of hydration results in a smaller CD amplitude than the true value and has caused much confusion in the literature. Indeed, early values recommended by some manufacturers were incorrect. In addition, commercial products often contain colored impurities.

In the late 1960s a computerized calibration of the circular dichrometer against a standardized spectropolarimeter was proposed.⁶² This is made possible by using the Kronig-Kramers transform from CD to ORD. The ratio of molar ellipticity to molar rotation of d-10-camphorsulfonic acid at their extrema was found to be [θ]₂₉₀.₅/[M]₃₀₆ = 1.76. The advantage of this method is that the ratio remains unaffected by the water of hydration and impurities, provided that the impurities are optically inactive. Now that ORD is rarely used, a two-point calibration of the circular dichrometer by using purified, dried d-10-camphorsulfonic acid has been proposed (Table I).⁶³ One such procedure is to dissolve the compound in ethyl acetate at 75°; charcoal is added to remove colored impurities. The solution is filtered and the compound crystallized upon cooling. The impure compound is recrystallized and washed several times with 1:1 ethyl acetate-ether. The crystals are vacuum dried over P₂O₅ in a desiccator at 50° and stored in the desiccator at room temperature. Alternatively, d-10-camphorsulfonic acid can be converted to its monohydrate by storing the compound in a desiccator at 50% relative humidity for several days. To ensure accuracy the concentration of d-10-camphorsulfonic acid should be further determined spectroscopically (see Table I).

TABLE I
OPTICAL PROPERTIES OF d-10-CAMPHORSULFONIC ACID IN WATER (a, b)

Quantity                              Value
ε₂₈₅                                  34.5
(εL - εR)₂₉₀.₅                        2.36
(εL - εR)₁₉₂.₅                        -4.72
(εL - εR)₁₉₂.₅/(εL - εR)₂₉₀.₅         -2.00
[θ]₂₉₀.₅                              7,800
[θ]₁₉₂.₅                              -15,600
[θ]₁₉₂.₅/[θ]₂₉₀.₅                     -2.00

a From Chen and Yang.⁶³
b Units: ε and (εL - εR) are in cm⁻¹ M⁻¹ and [θ] is in deg cm² dmol⁻¹. All subscripts refer to wavelengths in nanometers.

The [θ]₂₉₀.₅ value (Table I) is several percent larger than that recommended by some manufacturers. During a 10-year period of testing, the ratio [θ]₁₉₂.₅/[θ]₂₉₀.₅ of one instrument was found to vary between -1.96 and -2.04, i.e., -2.00 ± 0.04. Occasionally, the ratio could be as low as -1.90. This was usually traced to the aging of the xenon lamp, the cleanness of optical cells (especially those of short path lengths), improper adjustment of the scale range of the instrument, or combinations of these factors. Hennessey and Johnson⁶⁴ also reported a Δε of 2.36 cm⁻¹ M⁻¹ at 290.5 nm, but their Δε at 192.5 nm was -4.9 instead of -4.72 cm⁻¹ M⁻¹ as listed in Table I. Thus, their ratio Δε₁₉₂.₅/Δε₂₉₀.₅ was -2.08.

⁶⁰ D. Poland and H. A. Scheraga, "Theory of Helix-Coil Transitions in Biopolymers." Academic Press, New York, 1970.
⁶¹ J. T. Yang, in "A Laboratory Manual of Analytical Methods of Protein Chemistry Including Polypeptides" (P. Alexander and H. P. Lundgren, eds.), Vol. 5, p. 25. Pergamon, New York, 1969.
⁶² J. Y. Cassim and J. T. Yang, Biopolymers 9, 1475 (1969).
⁶³ G. C. Chen and J. T. Yang, Anal. Lett. 10, 1195 (1977).

Users often take for granted that commercial instruments have been properly adjusted by the manufacturer. For routine probes of protein conformation, operational errors in a particular instrument may be overlooked. However, if the CD spectrum of a protein is used to estimate the secondary structure, we must demand a high degree of accuracy for the experimental data collected. First, a blue or red shift of 1 or 2 nm in the wavelength scale could affect the CD analysis. This can easily be checked by observing the extrema of, say, d-10-camphorsulfonic acid (Table I) during calibration of the instrument.
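
The unit conversions of Eqs. (2)-(5) and the two-point check against Table I are simple enough to script. The sketch below is illustrative only: the function names are hypothetical, the factor 33.0 (rather than 32.982) is chosen to match the rounding used in the text, and the ±0.04 acceptance window merely restates the range reported above for one instrument, not a general standard.

```python
DEG_PER_ABSORBANCE = 33.0  # Eq. (2); 32.982 is the more exact factor


def ellipticity_deg(delta_absorbance):
    """Eq. (2): ellipticity psi (deg) from A_L - A_R."""
    return DEG_PER_ABSORBANCE * delta_absorbance


def specific_ellipticity(psi_deg, path_dm, conc_g_per_ml):
    """Eq. (3): [psi] with the light path in dm and concentration in g/ml."""
    return psi_deg / (path_dm * conc_g_per_ml)


def mean_residue_ellipticity_4a(specific, mean_residue_weight=115.0):
    """Eq. (4a): [theta] = M0 [psi] / 100 (M0 = 115 is the typical value)."""
    return mean_residue_weight * specific / 100.0


def mean_residue_ellipticity_4b(psi_deg, path_cm, conc_residue_mol_per_l):
    """Eq. (4b): [theta] = 100 psi / (l' c'), l' in cm, c' in residue mol/L."""
    return 100.0 * psi_deg / (path_cm * conc_residue_mol_per_l)


def theta_from_delta_epsilon(delta_epsilon):
    """Eq. (5): [theta] = 3300 (eps_L - eps_R)."""
    return 100.0 * DEG_PER_ABSORBANCE * delta_epsilon


def csa_ratio_in_range(theta_192_5, theta_290_5, target=-2.00, tolerance=0.04):
    """Two-point check: is [theta]192.5/[theta]290.5 within target ± tolerance?"""
    return abs(theta_192_5 / theta_290_5 - target) <= tolerance


# Table I values: 3300 * 2.36 and 3300 * (-4.72) round to 7,800 and -15,600.
print(theta_from_delta_epsilon(2.36), theta_from_delta_epsilon(-4.72))
print(csa_ratio_in_range(-15600.0, 7800.0))
```

Equations (4a) and (4b) give the same number when the same sample is described in both unit systems (for example 0.01 dm and 10⁻⁴ g ml⁻¹ versus 0.1 cm and the corresponding mean residue molarity), which is a convenient self-check on a data-reduction script.
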
Since a CD spectrum is scanned from long to short wavelengths, care should also be taken to avoid backlash from the gears, that is, the scanning should begin several nanometers longer than the starting point. Hennessey and Johnson⁶⁴ tested a wavelength shift of up to 2 nm for the CD spectrum of lactate dehydrogenase and found that the total error in the estimated secondary structure could amount to 1% or more (see section on "Comparison of Methods for CD Analysis"). Second, the small difference in absorbance between left- and right-circularly polarized light is inherently noisy. This is particularly true when the scanning approaches the limits of the instrument. Thus, the signal-to-noise ratio decreases rapidly at wavelengths below 200 nm for commercial instruments. One way to reduce the noise is to increase the bandwidth so that more light will reach the photomultiplier tube. This operation is done at the expense of spectral purity, which in turn can distort the CD spectrum. As a rule of thumb the bandwidths should always be kept within 2 nm. Third, the noise of the CD spectrum can be reduced by raising the time constant. However, the rate of scanning must be slowed so that the instrument can respond to signal changes. An instrument equipped with a multiple scanning device and a data processor often can speed up the rate of scanning, but within limits. Hennessey and Johnson⁶⁴ recommend that the product of time constant and rate of scanning be kept below 0.33 nm. For instance, at a rate of scanning of 2 nm/min, the time constant should be set below 10 sec. On the other hand, too high a time constant may not be desirable because of possible instrumental drift. In our experience the product is usually kept below 0.1 nm if a data processor for repeated scannings is not used.

⁶⁴ J. P. Hennessey, Jr. and W. C. Johnson, Jr., Anal. Biochem. 125, 177 (1982).

To record a spectrum, the circular dichrometer must first be warmed up for about 30 min to 1 hr so that the light source and the electric components of the instrument will reach a steady state of operation. To detect any possible shift the baseline should be measured before and after each sample spectrum. For many proteins the spectrum can be scanned from wavelengths longer than 240-250 nm where the CD of the sample and baseline coincide, provided of course that the sample has no observable CD in the near ultraviolet region. Aromatic groups such as tyrosine and tryptophan of proteins often show small CD bands up to 300 nm, neglect of which will introduce errors in the alignment of the baseline.

Experimental errors by individual users are unavoidable, but they can be minimized by taking certain precautions. For routine measurements, fused quartz cells should be free of birefringence. Their path lengths must be accurately determined, especially for cells of 1 mm or less. This can easily be done by measuring the absorbance of a solution of known absorption coefficient such as an aqueous solution of d-10-camphorsulfonic acid (Table I).
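
Two of the numerical rules above lend themselves to a quick check before a run: the limit on the product of time constant and scanning rate, and the determination of a cell path length from the absorbance of d-10-camphorsulfonic acid at 285 nm. The following is a minimal sketch with hypothetical function names; the default ε₂₈₅ = 34.5 cm⁻¹ M⁻¹ is taken from Table I, the path-length function is simply the Beer-Lambert law, and the example absorbance reading is made up.

```python
def scan_product_nm(scan_rate_nm_per_min, time_constant_s):
    """Product of scanning rate and time constant, expressed in nm."""
    return scan_rate_nm_per_min * time_constant_s / 60.0


def max_time_constant_s(scan_rate_nm_per_min, limit_nm=0.33):
    """Largest time constant (sec) that keeps the product at or below limit_nm.

    limit_nm = 0.33 is the value attributed to Hennessey and Johnson in the
    text; 0.1 is the stricter value the authors report using without a data
    processor for repeated scannings.
    """
    return limit_nm * 60.0 / scan_rate_nm_per_min


def path_length_cm(absorbance, molar_conc, epsilon=34.5):
    """Beer-Lambert path length, l = A / (epsilon * c), in cm.

    Defaults to epsilon(285 nm) of d-10-camphorsulfonic acid from Table I;
    molar_conc is in mol/L.
    """
    return absorbance / (epsilon * molar_conc)


print(max_time_constant_s(2.0))        # limit at 2 nm/min with the 0.33-nm rule
print(max_time_constant_s(2.0, 0.1))   # limit with the stricter 0.1-nm rule
print(path_length_cm(0.345, 0.1))      # hypothetical reading for a nominal 1-mm cell
```

At 2 nm/min the first call returns a limit just under 10 sec, in line with the example given in the text.
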
Short path lengths can also be calculated by counting the interference fringes of the empty cell in a spectrometer. The cell must be thoroughly cleaned; a dirty inner wall can easily trap air bubbles and thereby distort the CD signal. If a constant temperature is critical, the cell must be thermostated. Whether a jacketed cell or a cell holder with a water jacket is used, the cell should be positioned firmly and reproducibly. If more than one cell is used, it is advisable to overlap portions of the spectrum obtained from two cells, which will indicate the precision of the data.

The choice of the protein concentration is a compromise between enough solute to improve the precision of the data and not too much absorbance at the wavelengths studied. Usually the absorbance of the solution in the cell is kept around one, that is, at least 10% of the light is transmitted. The absorbance should never exceed two, beyond which less than 1% of the light is transmitted. If the supply of sample is limited, extremely dilute solutions may have to be used, and loss of sample due to adsorption on the cell wall can introduce serious errors in the protein concentration.

Water is the usual solvent for proteins. Buffers such as Tris and acetate all absorb at low wavelengths and therefore increase the noise. Potassium fluoride or perchlorate may be used instead of NaCl to obtain the desired ionic strength because chloride ion absorbs strongly at low wavelengths (KF should not be used in acidic solution, to avoid the production of HF).

Perhaps a major source of uncertainty is the preparation of proteins. The same protein studied by different laboratories can sometimes show considerable variations in its CD spectrum, probably because of the use of a poor preparation or of errors in the determination of protein concentration. But sometimes the positions of the extrema and even the entire profile can be different among published data. This poses a serious problem in the CD analysis to be described in the section on "Comparison of Methods for CD Analysis."

The concentration of a protein solution must be known accurately, but this is not always easy to achieve. Thus, the method used to determine concentrations should be reported for the sake of comparison by other workers. If there is enough sample, the micro-Kjeldahl method can be used for determining the nitrogen content, provided that the amino acid composition of the protein is known and the solvent used does not contain nitrogen compounds. Colorimetric assays such as the Lowry method or the ninhydrin test can be used with caution. These methods often vary from protein to protein even when they are standardized against the protein to be measured. Ideally, these assays should be calibrated against an amino acid analysis of the protein so that a correct absorption coefficient can be determined for routine determinations.
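The absorbance limits given above for choosing the protein concentration follow from the usual relation between absorbance A and transmitted light, T = 10^(-A). A minimal sketch, with the hypothetical helper names `percent_transmitted` and `absorbance_ok`:

```python
def percent_transmitted(absorbance):
    """Percent of the incident light transmitted at a given absorbance,
    using T = 10**(-A)."""
    return 100.0 * 10.0 ** (-absorbance)


def absorbance_ok(absorbance, upper_limit=2.0):
    """True if the absorbance does not exceed the upper limit
    recommended in the text (two)."""
    return absorbance <= upper_limit


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 2.5):
        print(a, round(percent_transmitted(a), 2), absorbance_ok(a))
```

An absorbance of one corresponds to 10% transmission and an absorbance of two to 1%, which are the two figures quoted in the text.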
Significant variations in the CD spectra reported by workers in various laboratories are disquieting and should be reinvestigated, perhaps through collaborative efforts.

Methods of CD Analysis

The chiroptical properties of various conformations in a protein molecule are assumed to be additive. The contributions due to non-peptide chromophores below 250 nm are frequently neglected, presumably because aromatic groups and disulfide bonds account for only about 10% of the residues and apparently may not significantly perturb the CD spectrum due to various conformations. Thus, the experimental CD spectrum of a protein at each wavelength λ can be expressed as

$$X(\lambda) = \sum_{i=1} f_i X_i(\lambda) \tag{6}$$

where $X(\lambda)$ is the mean residue CD, that is, $\varepsilon_L - \varepsilon_R$ or $[\theta]$; $f_i$ is the fraction of the ith conformation and $X_i(\lambda)$ is the corresponding reference CD. If the reference spectra can be evaluated, the $f_i$'s can be solved from a series of simultaneous equations (one equation for each λ) by a least-squares method. One school uses synthetic polypeptides as model compounds to provide reference spectra. An alternative approach is to compute reference spectra from CD spectra of proteins of known secondary structure. Recently, a third approach was proposed: to directly analyze the CD spectrum of a protein as a linear combination of the CD spectra of proteins of known secondary structure, thus avoiding the problem of defining reference spectra of individual conformations.
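The least-squares solution of Eq. (6) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' program: the helper `lstsq` is a hypothetical name, it solves the normal equations directly, and the three reference spectra below are invented round numbers (in units of 10^3 deg cm^2 dmol^-1) chosen only to resemble a helix, a β-form, and an unordered form. They are not reference data.

```python
def lstsq(columns, y):
    """Least-squares coefficients c_j for y ~ sum_j c_j * columns[j],
    obtained from the normal equations by Gauss-Jordan elimination."""
    k = len(columns)
    aug = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(p * v for p, v in zip(columns[i], y))]
           for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("reference spectra are linearly dependent")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical reference values at six wavelengths (one equation per wavelength).
WAVELENGTHS = [240, 230, 222, 217, 208, 202]
X_H = [-4.0, -25.0, -37.0, -36.0, -43.0, -8.0]
X_BETA = [-1.0, -8.0, -16.0, -19.0, -5.0, 17.0]
X_R = [0.0, 1.0, 3.0, 4.0, -7.0, -30.0]
REFERENCES = [X_H, X_BETA, X_R]

# Build a synthetic "observed" spectrum from known fractions, then recover them.
TRUE_FRACTIONS = [0.40, 0.20, 0.40]
observed = [sum(f * ref[i] for f, ref in zip(TRUE_FRACTIONS, REFERENCES))
            for i in range(len(WAVELENGTHS))]
fitted = lstsq(REFERENCES, observed)

if __name__ == "__main__":
    print([round(f, 3) for f in fitted])
```

Because the synthetic spectrum is noise-free, the fit returns the fractions used to build it; with real data the residuals and the sum of the fractions indicate how well the additivity assumption holds.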

Reference Spectra Based on Model Polypeptides

(Lys)n as a model compound is quite attractive because it can adopt three conformations merely by varying the pH and temperature of its aqueous solution (Fig. 1). The polypeptide behaves as a polyelectrolyte in neutral or acidic solution and is therefore a "random coil." Deprotonation of the polypeptide at pH above 10 induces a coil-to-helix transition. Mild heating of the helix at around 50° for 10 min or so converts the helix to the β-form. In 1969 Greenfield and Fasman19 first proposed estimation of the amounts of α-helix (H), β-form (β), and unordered form (R) in a protein molecule by utilizing three reference spectra of (Lys)n for Eq. (6):

$$X(\lambda) = f_H X_H(\lambda) + f_\beta X_\beta(\lambda) + f_R X_R(\lambda) \tag{7}$$

with $\sum f = 1$ (Table II).65 Rosenkranz and Scholtan66 used the same approach but substituted (Ser)n in 8 M LiCl for (Lys)n at neutral pH for the unordered form (Table II). In 1980 Brahms and Brahms47 included β-turns (t) in Eq. (6):

$$X(\lambda) = f_H X_H(\lambda) + f_\beta X_\beta(\lambda) + f_t X_t(\lambda) + f_R X_R(\lambda) \tag{8}$$

and have extended CD measurements into the vacuum ultraviolet region down to 165 nm. The reference spectrum for the helix was taken from the CD spectrum of myoglobin (79% helix, 6% B-turn, and no B-form) normal-

65 J. T. Yang, G. C. Chen, and B. Jirgensons, in "Handbook of Biochemistry and Molecular Biology, Proteins" (G. D. Fasman, ed.), Vol. III, 3rd Ed., p. 3. CRC Press, Cleveland, Ohio, 1976.
66 H. Rosenkranz and W. Scholtan, Hoppe-Seyler's Z. Physiol. Chem. 352, 896 (1971).


TABLE II
REFERENCE CD SPECTRA OF THREE CONFORMATIONS BASED ON SYNTHETIC POLYPEPTIDES; MEAN RESIDUE ELLIPTICITIES IN deg cm² dmol⁻¹ ᵃ

(Lys)n in waterᵇ

[θ]H × 10⁻³

[0]~ × 10 3

A

B

A

0

0

0

-3.30 -3.80 -4.30 -6.20 -8.00 10.0 -11.4 14.0 17.0 19.2 -21.9 -23.9 -26.0 -28.8 -30.8 -32.4 -33.7 -35.0 -35.7 -36.0 -35.3 -35.5 -35.0 -33.1 -32.5 -31.4 -31.0 -31.0 -31.0 -32.1 -32.4 -32.3 -32.6 -32.5 -29.5 -25.0 -20.5 -12.5 0

-3.94 -5.01 -6.44 -8.23 -10.0 - 12.2 -14.9 - 17.4 -20.0 -22.7 -25.2 -27.9 -30.1 -31.9 -33.7 -35.4 -36.3 -37.2 -37.6 -37.2 -36.9 -36.5 -36.2 -35.8 -36.2 -36.2 -36.5 -37.2 -38.5 -40.1 -41.5 -43.0 -43.3 -41.2 36.5 -30.1 -23.3 -16.1 -7.52

X(nm) 25O 24O 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 22O 219 218 217 216 215 214 213 212 211 210 209 2O8 2O7 206 2O5 204 203 202

-

-

-

-

0.700 -1.40

-3.60

-6.40

- 11.4

-13.8 -15.7

-18.4 -17.9 -16.4

-12.1 -10.8 -4.70

5.70

19.3

B

0 -1.16 -1.43 -1.97 -2.33 2.86 -3.58 -4.12 -4.83 -5.73 6.80 -7.70 -8.59 -9.67 -10.4 - 11.8 - 12.8 -14.0 - 14.9 -15.8 - 16.8 -17.4 -17.9 - 18.4 -18.6 - 18.3 -17.6 -17.0 -15.8 -14.1 -12.5 -11.1 -8.06 -5.37 -2.15 1.79 6.27 8.95 13.2 17.1

[0]R × 10 3 A

0 -0.150 -0.140

0

0.800

2.70

3.90 4.40

4.60 4.10 3.50

0 -1.40 -3.40

-14.5

-25.6

B

(Lys)n in SDS c [0]~ × 10 3

(Ser). in 8 M LiC1 d [0]R × 10 3

-0.250 -0.333 -0.500 -0.670 -0.750 -0.834 -1.17 - 1.32 - 1.67 -2.00 -2.16 -2.50 -3.00 -3.50 -4.00 4.50 -5.00 -5.50 -6.00 -6.50 -6.83 -7.33 -7.66 -7.83 -7.83 -7.50 -7.00 -6.16 -5.50 -4.83 -4.00 -3.33 -2.50 -0.834 0 3.33 5.84 7.80 8.50

0 -0.100 -0.120 -0.190 -0.200 -0.250 -0.290 -0.300 -0.400 0.450 -0.500 -0.600 -0.700 -0.750 -0.800 -0.900 -0.950 - 1.00 -1.10 - 1.20 -1.30 -1.40 - 1.50 -1.70 - 1.80 -1.90 -2.10 -2.20 -2.30 -2.60 -2.80 -3.20 -3.60 -3.90 4.30 -4.80 -5.30 -5.90 -6.40

0 -0.125 -0.143 -0.143 0.125 -0.107 -0.054 0.054 0.143 0.250 0.446 0.643 0.928 1.29 1.46 1.79 2.18 2.50 2.82 3.00 3.36 3.57 3.71 3.75 3.68 3.50 3.14 2.50 1.79 0.928 -0.678 -2.39 -4.21 -6.78 -9.64 -12.9 -17.1 -21.8 -25.3 -29.6

TABLE II (continued)

(Lys)n in waterᵇ

h(nm) 201 200 199 198 197 196 195 194 193 192.5 192 191 190

[0]H × 10-3

[0]8 × 10-3

A

A

6.00 14.3 25.0 35.0 44.3 55.0 64.3

B 0 13.2 26.9 40.5 53.7 66.2 80.1 85.1 87.2

73.3 76.9 74.8

24.3

30.0 31.9

B 21.4 24.6 26.8 28.6 30.3 32.8 33.9 33.2 32.1

30.0 87.4 84.0 80.6

25.3 22.4

[0]R × 10-3 A

-36.4

-41.9 -41.0

B -33.9 -36.1 -38.9 -40.3 -40.7 -39.3 -38.2 -36.4 -33.9

(Lys)n in SDS c [0]t~ × 10-3 9.60 10.4 11.4 12.0 12.6 13.0 13.4

(Ser)~ in 8 M LiC1d [0]R × 10 3 -6.70 -7.00 -7.10 -7.30 -7.40 -7.50 -7.40

-37.5 31.8 28.6 25.0

-34.7 -32.2

-32.0 -29.3 -26.1

ᵃ From Yang et al.65 by courtesy of CRC Press, and from Yang and Kubota20 with the permission of Plenum (copyrighted by Plenum). Because circular dichrometers scan from long to short wavelengths, the λ values are listed in descending order.
ᵇ A, Helix, pH 11.1 at 22°; β-form, pH 11.1, heated at 52° for 15 min and cooled to 22°; unordered form, pH 5.7 at 22°. C = 0.01%. From Greenfield and Fasman.19 Additional values for the helix in 0.1 M KF, pH 10.6 to 10.8, were taken from Holzwarth and Doty.18 B, Repeated experiments20 under the same conditions as those used by Greenfield and Fasman in A.
ᶜ From L. K. Li and S. Spector, J. Am. Chem. Soc. 91, 220 (1969).
ᵈ From F. Quadrifoglio and D. W. Urry, J. Am. Chem. Soc. 90, 2760 (1968).

ized to 100% helix. The model system for the β-form was (Lys-Leu)n in 0.1 M NaF at pH 7 and that for the unordered form was (Pro-Lys-Leu-Lys-Leu)n in salt-free solution. The β-turns were represented by L-Pro-D-Ala, (Ala₂-Gly₂)n, and Pro-Gly-Leu (see Fig. 3). Brahms and Brahms tested 13 proteins comprising α-helix-rich, β-rich, α/β (alternating α and β segments), and α+β (mixtures of all-α and all-β segments) classes and reported a surprisingly good agreement between observed and calculated data, except for one protein, rubredoxin. They further indicated that the use of three sets of β-turns did not improve estimates of the secondary structure as compared with the use of (Ala₂-Gly₂)n alone. Thus, the CD spectrum of this polypeptide can be used as the reference spectrum for the β-turn, although its intensities were arbitrarily reduced to one-half of the experimental values.


The major advantage of this approach is its simplicity. The reference spectra for the four conformations are directly measurable. The disadvantages are several-fold. First, the CD of the helix varies to some extent among different helical polypeptides.1 In addition, uncharged polypeptides can easily aggregate; this will in turn distort the CD spectrum. The difference in numerical values for (Lys)n in water from two laboratories (Table II) may be partly due to the uncertainty in the degree of aggregation and partly due to experimental errors from different instruments, noting that instrumentation has been improved during the past decade. Second, the choice of the reference spectrum for the β-form is more problematic. (Lys)n in neutral SDS solution also adopts a β-form and yet its CD intensities are about one-half of those obtained by mild heating at pH 11 (Table II). Which reference spectrum better represents the β-form is a matter of conjecture. Third, the so-called "random coil" is an extended polyion, which may not be an appropriate model for the unordered segments in proteins. The unordered form of (Ser)n in 8 M LiCl has a CD magnitude about one-fifth of that of (Lys)n in neutral solution (Table II). This uncertainty is again not yet resolved.

In spite of these problems this method gives a reasonable estimate of the helicity, mainly because the CD spectrum of the helix usually predominates over that of the β-form and the unordered form. However, the use of helical (Lys)n does not take into consideration the chain-length dependence of the helical CD spectrum (see section below). Indeed, Straus et al.67 resolved the CD spectra of myoglobin, hemoglobin, and lysozyme into several bands and found that the rotational strengths of the helical bands were always less than those of the helical polypeptides. Therefore, neglect of this chain-length factor will yield smaller fractions of helices than the true values.

Reference Spectra Based on Proteins

With proteins of known secondary structure, it is possible to determine the reference spectra from the CD spectra of these proteins by using Eq. (6). If only the helix, β-form, and unordered form are considered, solving the X parameters in Eq. (7) requires a minimum of three simultaneous equations with the f's of three proteins deduced from X-ray diffraction studies. Saxena and Wetlaufer68 used the CD spectra of myoglobin, lysozyme, and ribonuclease and solved the $X_i$'s at a series of wavelengths. The resultant reference spectra of the helix and β-form are quite similar to those obtained from synthetic polypeptides (Fig. 1). Independently, Chen and Yang69,70 used five proteins (myoglobin, lysozyme, ribonuclease, papain, and lactate dehydrogenase) whose secondary structures were then known and solved five simultaneous equations [Eq. (7)] at each wavelength by a least-squares method with the constraint $\sum f = 1$. The reference spectra so obtained are again similar to those of model compounds. If only three proteins were used, the calculated reference spectra varied from one set to another, but they were heavily weighted by the inclusion of myoglobin, which has the highest helicity among the five proteins studied. [Inadvertently, Saxena and Wetlaufer68 used reduced mean residue ellipticities of ribonuclease in their calculations. Thus, the numerical values of [θ] for this protein were about 25% lower than their true values because of the Lorentz correction factor (see footnote a in Ref. 70). However, their reference spectra were qualitatively correct.]

This method of CD analysis has since been evaluated and modified by several laboratories.71-75 As X-ray diffraction results were updated, Chen et al.76 reinvestigated the reference spectra of the helix, β-form, and unordered form based on five reference proteins according to Eq. (7). With the availability of new X-ray data on proteins, Chen et al.76 also determined the reference spectra based on eight proteins (by adding insulin, cytochrome c, and nuclease). The reference spectrum of the helix so obtained was close to that reported previously,70 but those of the β-form and unordered form changed significantly. To account for the chain-length dependence of CD for the helix,23,39 Chen et al.76 replaced $X_H(\lambda)$ in Eq. (7) by $X_H^{\bar n}(\lambda)$ and introduced the relation

$$X_H^{\bar n}(\lambda) = X_H^{\infty}(\lambda)\,(1 - k/\bar n) \tag{9}$$

Here the superscripts $\bar n$ and $\infty$ refer to the average number of amino acid residues per helical segment in a protein molecule, mostly about 10 to 11, and the chain length of an infinite helix, respectively, and k is a wavelength-dependent constant. In practice an infinite helix applies to helical polypeptides such as (Lys)n. On the basis of the CD spectrum of myoglobin,67,76 the reference spectrum of the helix can be calculated from its three Gaussian bands between 184 and 240 nm:

$$\begin{aligned} X_H^{\bar n}(\lambda) = {} & -3.73 \times 10^4\,(1 - 2.50/\bar n)\,\exp[-(\lambda - 223.4)^2/10.8^2] \\ & - 3.73 \times 10^4\,(1 - 3.50/\bar n)\,\exp[-(\lambda - 206.6)^2/8.9^2] \\ & + 10.1 \times 10^4\,(1 - 2.50/\bar n)\,\exp[-(\lambda - 193.5)^2/8.4^2] \end{aligned} \tag{10}$$

(all λ's are in nanometers). Thus, the CD magnitude of the helix decreases with decreasing chain length (Fig. 4).

FIG. 4. Computed CD spectra for helices of different chain lengths. Numerals refer to the numbers of amino acid residues. (Redrawn from Chen et al.76 with the permission of the American Chemical Society, copyright 1974.)

67 J. H. Straus, A. S. Gordon, and D. F. H. Wallace, Eur. J. Biochem. 11, 201 (1969).
68 V. P. Saxena and D. B. Wetlaufer, Proc. Natl. Acad. Sci. U.S.A. 68, 969 (1971).
69 Y.-H. Chen and J. T. Yang, Biochem. Biophys. Res. Commun. 44, 1285 (1971).
70 Y.-H. Chen, J. T. Yang, and H. M. Martinez, Biochemistry 11, 4120 (1972).
71 R. Grosse, J. Malur, W. Meiske, J. G. Reich, and K. R. H. Repke, Acta Biol. Med. Germ. 29, 777 (1972).
72 R. Grosse, J. Malur, W. Meiske, and K. R. H. Repke, Biochim. Biophys. Acta 359, 33 (1974).
73 J. Markussen and A. Vølund, Int. J. Pept. Protein Res. 7, 47 (1975).
74 J. B. Siegel, W. E. Steinmetz, and G. L. Long, Anal. Biochem. 104, 160 (1980).
75 I. A. Bolotina, V. O. Chekhov, V. Lugauskas, A. V. Finkel'stein, and O. B. Ptitsyn, Mol. Biol. (USSR) 14, 891 (1980); English translation, p. 701 (1981).
76 Y.-H. Chen, J. T. Yang, and K. H. Chau, Biochemistry 13, 3350 (1974).
77 C. T. Chang, C.-S. C. Wu, and J. T. Yang, Anal. Biochem. 91, 12 (1978).

In 1978 Chang et al.77 enlarged the number of reference proteins to 15 (see footnote a in Table III) and added to Eq. (7) a term for the β-turn (t) (previously, the CD of β-turns was obscured in that of the other conformations):

$$X(\lambda) = f_H X_H^{\bar n}(\lambda) + f_\beta X_\beta(\lambda) + f_t X_t(\lambda) + f_R X_R(\lambda) \tag{11}$$

which is the same as Eq. (8) except that $X_H^{\bar n}$ substitutes for $X_H$. The net β-turn was used by subtracting from types I, II, and III their mirror images,


TABLE III
REFERENCE CD SPECTRA OF FOUR CONFORMATIONS BASED ON FIFTEEN PROTEINS: MEAN RESIDUE ELLIPTICITIES IN deg cm² dmol⁻¹ ᵃ

λ (nm)  [θ]H∞ × 10⁻³ ᵇ  kᵇ      [θ]β × 10⁻³ ᶜ   [θ]t × 10⁻³ ᶜ   [θ]R × 10⁻³ ᶜ
240     -3.51           2.50    1.16            1.43            -2.04
239     -4.63           2.50    1.47            1.71            -2.46
238     -6.00           2.50    1.74            2.20            -2.81
237     -7.64           2.50    1.92            2.85            -3.20
236     -9.56           2.50    2.00            3.39            -3.45
235     -11.8           2.50    2.11            3.88            -3.72
234     -14.2           2.50    2.32            4.63            -4.33
233     -16.2           2.50    2.58            4.87            -5.30
232     -18.9           2.50    2.97            6.31            -6.62
231     -21.5           2.50    2.19            7.42            -7.33
230     -24.0           2.50    3.16            9.96            -9.67
229     -26.7           2.50    2.88            12.6            -11.0
228     -29.0           2.50    3.12            14.3            -12.8
227     -31.2           2.50    2.72            16.1            -13.7
226     -33.2           2.50    2.31            17.4            -14.3
225     -35.5           2.50    1.78            19.3            -14.7
224     -37.0           2.50    0.93            21.0            -14.6
223     -37.5           2.50    0.40            20.3            -14.4
222     -37.4           2.50    0.35            19.3            -13.8
221     -36.9           2.50    -1.04           18.0            -13.4
220     -36.3           2.60    -1.79           16.0            -12.7
219     -35.7           2.60    -2.49           14.5            -11.8
218     -35.3           2.70    -3.32           12.7            -10.8
217     -35.0           2.70    -3.97           11.6            -9.64
216     -34.8           2.80    -4.55           9.77            -8.58
215     -35.0           2.90    -4.93           8.55            -7.75
214     -35.3           3.00    -5.19           7.33            -7.17
213     -36.0           3.10    -5.17           6.58            -6.94
212     -36.8           3.10    -5.00           6.07            -6.64
211     -37.3           3.20    -4.44           4.91            -6.94
210     -38.0           3.30    -3.77           5.57            -8.02
209     -37.5           3.40    -3.45           6.00            -9.19
208     -36.0           3.50    -3.14           6.45            -9.87
207     -33.2           3.60    -2.40           8.63            -11.8
206     -28.8           3.70    -1.52           11.4            -14.0
205     -22.6           4.10    -0.62           13.9            -16.7
204     -14.5           4.80    0.55            16.3            -20.1
203     -4.53           9.40    1.35            17.9            -23.2
202     7.06            1.50    2.26            20.1            -25.1
201     20.0            1.20    4.93            18.0            -31.1
200     36.7            1.80    8.05            12.4            -36.0
199     47.6            2.10    9.39            11.3            -35.6
198     61.0            2.20    9.58            -1.17           -33.1
197     73.2            2.30    8.56            -13.3           -28.8
196     83.4            2.30    6.52            -27.7           -22.9
195     91.0            2.40    4.23            -39.4           -16.5
194     95.6            2.40    0.52            -51.5           -9.64
193     97.0            2.40    3.05            -49.9           0.28
192     95.3            2.40    -6.91           -70.3           6.41
191     90.7            2.40    -9.42           -75.7           13.4
190     83.8            2.40    -1.23           -78.4           20.1

ᵃ The proteins used were myoglobin, parvalbumin, insulin, lactate dehydrogenase, lysozyme, cytochrome c, carboxypeptidase A, thermolysin, subtilisin BPN', papain, trypsin inhibitor, ribonuclease S, nuclease, ribonuclease A, and concanavalin A. Solvent: 0.01 M phosphate buffer (pH 7.0), except 0.01 M acetate buffer (pH 6.4) for thermolysin. See Chang et al.77
ᵇ $[\theta]_H^{\bar n} = [\theta]_H^{\infty}(1 - k/\bar n)$. Based on the CD spectra of myoglobin [Eq. (10)]; $\bar n$ = 13.4 for myoglobin.
ᶜ Solved by the least-squares method for Eq. (8) after subtracting $f_H[\theta]_H^{\bar n}$ from the corresponding experimental [θ]'s at each wavelength for the 15 proteins.

types I′, II′, and III′, respectively. With four reference spectra determined from reference proteins, the fractions of the helix (H), β-form (β), β-turn (t), and unordered form (R) can be estimated by a least-squares method. Chang et al.77 introduced two constraints: $\sum f = 1$ and $1 \ge f \ge 0$. For experimental data between 190 and 240 nm, there are 51 data points at 1-nm intervals. Fifty-one simultaneous equations of the form of Eq. (11) can be used to solve for the f's. Since the reference spectrum of the helix can be calculated from Eq. (10), Chang et al.77 solved the reference spectra of the β-form, β-turn, and unordered form by first subtracting $f_H X_H^{\bar n}(\lambda)$ (assuming $\bar n$ = 10.4) from the experimental $X(\lambda)$ for each of the 15 proteins used, instead of directly determining the four reference spectra according to Eq. (11) (Table III). [Because the BMD07RT, UCLA computer program used previously70 is not readily available for many computers, a program written in C language is now listed in Appendix A.]

The CD spectrum of the β-form as computed by Chang et al.77 resembles that shown in Fig. 1; it has a strong positive band near 200 nm and a negative one above 210 nm, which qualitatively agrees with Woody's theoretical treatment.37 The computed CD spectrum for the β-turn has a positive band near 224 nm, another positive one around 200 nm, and an intense negative one below 190 nm. A positive rather than a negative band above 220 nm was unexpected, but the sign and positions of the other two bands agreed with Woody's treatment.46 Chang et al.77 also successively excluded two proteins with the highest content of unordered form from the set of reference proteins and determined the reference spectra based on 13, 11, 9, and 7 proteins. The spectra for the β-form, β-turn, and unordered form so obtained were qualitatively similar to those based on 15 proteins. These uncertainties in the reference spectra of the β-form, β-turn, and unordered form will undoubtedly affect the CD analysis of proteins (see section on "Comparison of Methods for CD Analysis").

In an attempt to bypass the curve-fitting requirements, Baker and Isenberg78 employed integrals over the CD data and calculated the helix, β-form, and unordered form of proteins from such integrals. But by matrix formulation this approach is equivalent to the least-squares method.77 Similarly, Hammonds79 has shown that a continuous least-squares fit is identical to a discrete least-squares fit.

The introduction of the constraint $\sum f = 1$ reduces the number of variables by one by replacing $f_R$ by $(1 - f_H - f_\beta - f_t)$. Removal of this constraint can provide the unity sum test. A CD analysis may be in doubt if it fails to pass this test. However, Chang et al.77 could not detect a trend in their studies (to be discussed later). In some cases it made no difference whether this constraint was introduced or not. In other cases there were good agreements between CD estimates and X-ray results for the ordered structures, even though the sum of f's could be much greater or less than unity. Conversely, poor results could be obtained even when the unity sum test was met. The introduction of the constraint $1 \ge f \ge 0$ avoids a possible negative f value or one that is greater than unity, both of which are unacceptable. On the other hand, such unreasonable values can point up the uncertainty in the analysis, which would have otherwise escaped detection. Chang et al.77 found that the introduction of this constraint often improved the estimation of various conformations. In all fairness the use of the two constraints or the lack of them is still an issue at present. Perhaps one compromise is to determine the secondary structures both with and without the two constraints.
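The comparison of fits with and without the unit-sum constraint can be sketched as follows. This is an illustration only, not the program of Appendix A. The reference values are six rows of Table III (230, 222, 218, 210, 205, and 200 nm), the helix column is scaled by (1 - k/n̄) with n̄ = 10.4 as in the text, and the "observed" spectrum is synthetic: it is built from hypothetical fractions and then inflated by 10% to mimic a concentration error. The helper names are hypothetical, and the bounds 1 ≥ f ≥ 0 are not enforced here.

```python
def lstsq(columns, y):
    """Least-squares coefficients for y ~ sum_j c_j * columns[j] (normal equations)."""
    k = len(columns)
    aug = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(p * v for p, v in zip(columns[i], y))]
           for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("reference spectra are linearly dependent")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Table III rows at 230, 222, 218, 210, 205, 200 nm (10^3 deg cm^2 dmol^-1).
HELIX_INF = [-24.0, -37.4, -35.3, -38.0, -22.6, 36.7]
K = [2.50, 2.50, 2.70, 3.30, 4.10, 1.80]
BETA = [3.16, 0.35, -3.32, -3.77, -0.62, 8.05]
TURN = [9.96, 19.3, 12.7, 5.57, 13.9, 12.4]
UNORDERED = [-9.67, -13.8, -10.8, -8.02, -16.7, -36.0]
N_BAR = 10.4
HELIX = [h * (1.0 - k / N_BAR) for h, k in zip(HELIX_INF, K)]
REFS = [HELIX, BETA, TURN, UNORDERED]


def fit_unconstrained(observed):
    """Solve Eq. (11) for f_H, f_beta, f_t, f_R with no constraint."""
    return lstsq(REFS, observed)


def fit_unit_sum(observed):
    """Impose sum f = 1 by replacing f_R with 1 - f_H - f_beta - f_t."""
    last = REFS[-1]
    shifted_refs = [[x - r for x, r in zip(ref, last)] for ref in REFS[:-1]]
    shifted_obs = [x - r for x, r in zip(observed, last)]
    f = lstsq(shifted_refs, shifted_obs)
    return f + [1.0 - sum(f)]


TRUE_F = [0.30, 0.25, 0.15, 0.30]          # hypothetical fractions
exact = [sum(f * ref[i] for f, ref in zip(TRUE_F, REFS)) for i in range(len(HELIX))]
observed = [1.10 * x for x in exact]        # simulated 10% concentration error

free = fit_unconstrained(observed)
constrained = fit_unit_sum(observed)

if __name__ == "__main__":
    print("unconstrained:", [round(f, 3) for f in free], "sum =", round(sum(free), 3))
    print("unit sum:     ", [round(f, 3) for f in constrained])
```

In this synthetic case the unconstrained fractions sum to 1.10, so the unity sum test exposes the scale error, whereas the constrained fit hides it by redistributing the excess among the four fractions.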
Any unusual disagreement between the two results will caution against a too literal interpretation of the results.

78 C. C. Baker and I. Isenberg, Biochemistry 15, 629 (1976).
79 R. G. Hammonds, Jr., Eur. J. Biochem. 74, 421 (1977).
80 I. A. Bolotina, V. O. Chekhov, V. Lugauskas, and O. B. Ptitsyn, Mol. Biol. (USSR) 14, 902 (1980); English translation, p. 709 (1981).
81 I. A. Bolotina, V. O. Chekhov, V. Lugauskas, and O. B. Ptitsyn, Mol. Biol. (USSR) 15, 167 (1981); English translation, p. 130 (1981).
82 A. V. Finkel'stein, O. B. Ptitsyn, and S. A. Kozitsyn, Biopolymers 16, 497 (1977).

Recently Bolotina et al.75,80,81 reinvestigated the method of Chen et al.77 but counted the secondary structures from the X-ray diffraction data based on a single "rigid" criterion proposed by Finkel'stein et al.82 The assigned f values are often smaller than those based on what they termed the "mild" criterion proposed by Levitt and Greer.83 In their first paper Bolotina et al.75 used the same five proteins (myoglobin, lysozyme, ribonuclease A, papain, and lactate dehydrogenase) that had been studied by Chen et al.69,70,76 The calculated reference spectra of the helix, β-form, and unordered form were found to agree well with the CD spectra of the corresponding conformations of (Lys)n (see Fig. 1). This observation contrasts with the finding of Straus et al.67 that the rotational strengths, and thus the intensities, of the helical CD bands of proteins are always lower than those of helical synthetic polypeptides. It also disagrees with the chain-length dependence of the helical bands as found by Chen et al.76 Nevertheless, the secondary structures of the five proteins as redetermined from their CD spectra were in good agreement with the X-ray data based on the "rigid" criterion.

In their second paper Bolotina et al.80 also considered the contributions of the β-turns and determined the four reference spectra in Eq. (11) (Table IV). For the β-turns they used the data of Chou and Fasman on 29 proteins41 but considered only the second and third residues of the four-residue β-turns to contribute to the CD reference spectrum. As in the method of Chang et al.,77 only net β-turns were counted. The four reference spectra so obtained were used to analyze the secondary structures of five reference proteins and five additional proteins (subtilisin BPN', glyceraldehyde-3-phosphate dehydrogenase, insulin, concanavalin A, and cytochrome c). In their third paper Bolotina et al.81 extended the number of reference proteins to six by adding subtilisin BPN' or seven by further adding glyceraldehyde-3-phosphate dehydrogenase. In addition, the β-form was separated into parallel and antiparallel β-forms.
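The chain-length dependence referred to here, Eq. (9) together with the Gaussian bands of Eq. (10), can be evaluated numerically. The sketch below copies the band constants from Eq. (10) as printed; the function names `helix_cd` and `finite_helix_from_infinite` are hypothetical. Values computed from the three Gaussians need not coincide exactly with the tabulated entries of Table III.

```python
import math

# Band parameters from Eq. (10): (amplitude, k, band center in nm, band width in nm).
BANDS = (
    (-3.73e4, 2.50, 223.4, 10.8),
    (-3.73e4, 3.50, 206.6, 8.9),
    (10.1e4, 2.50, 193.5, 8.4),
)


def helix_cd(wavelength_nm, n_residues=math.inf):
    """Mean residue ellipticity (deg cm^2 dmol^-1) of a helix with an average of
    n_residues per helical segment, following Eq. (10)."""
    total = 0.0
    for amplitude, k, center, width in BANDS:
        chain_factor = 1.0 - k / n_residues   # equals 1 for an infinite helix
        total += amplitude * chain_factor * math.exp(-((wavelength_nm - center) / width) ** 2)
    return total


def finite_helix_from_infinite(theta_infinite, k, n_residues):
    """Eq. (9): scale the infinite-helix value by (1 - k/n)."""
    return theta_infinite * (1.0 - k / n_residues)


if __name__ == "__main__":
    for n in (math.inf, 20, 10, 5):
        print(n, round(helix_cd(222.0, n)))
    # 222-nm entries of Table III: [theta]_H(infinite) = -37,400 and k = 2.50
    print(finite_helix_from_infinite(-37400.0, 2.50, 10))
```

With the 222-nm entries of Table III, Eq. (9) gives -28,050 deg cm² dmol⁻¹ for a helix of 10 residues, that is, three-quarters of the infinite-helix value.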
The reference spectra of the helix, β-turn, and unordered form remained the same regardless of whether five, six, or seven reference proteins were used. Whether or not the β-form in Eq. (11) was split into two terms, the changes were 5% or less between 200 and 250 nm and well within the uncertainties of the method of analysis (O. B. Ptitsyn, personal communication).

The reference spectra, $X_i(\lambda)$ in Eq. (6), obtained from proteins depend on the assigned f values of the reference proteins. For a given set of CD spectra, $X(\lambda)$, the use of smaller f's would lead to larger $X_i(\lambda)$'s, which may partially account for the reference spectrum of the helix as found by Bolotina et al.80 For instance, the mean residue ellipticities at 222 nm for helical (Lys)n, an infinite helix ($[\theta]_H^{\infty}$), a helix with 10 residues ($[\theta]_H^{10}$), and the helix based on the "rigid" criterion were -37,600, -37,400, -28,100, and -36,100 deg cm² dmol⁻¹, respectively. O. B. Ptitsyn (personal communication) finds the chain-length dependence of the helix unnecessary in their empirical method. He further points out that the reference spectrum of the helix is practically the same for the "rigid" method of Bolotina et al.80 and for the early results of Chen et al.70 [The $f_H$ values for papain (0.21) and lactate dehydrogenase (0.29) were lower than the later updated ones (0.28 and 0.45, respectively) used by Chen et al.76 and Chang et al.77] That the counting of f's in a protein molecule is a serious problem will be discussed in the section on "Comparison of Methods for CD Analysis."

O. B. Ptitsyn (personal communication) suggests that (Lys)n in aqueous solution at pH above 10 is not a good model for an ideal infinite helix. It may not be completely helical and the fluctuating helical regions are rather short and probably a little distorted (its low intrinsic viscosities were even smaller than the corresponding ones of the coiled form). Therefore, the reference spectra of helical (Lys)n and of the helix based on the rigid method of Bolotina et al.80 practically coincided. To what extent helical (Lys)n deviates from a perfect helix is not known. We tend to think that a helix consisting of long interrupted segments virtually approaches an infinite helix according to Eq. (9). On the other hand, the aggregation of an uncharged polypeptide can distort its CD spectrum.

83 M. Levitt and J. Greer, J. Mol. Biol. 114, 181 (1977).

TABLE IV
REFERENCE CD SPECTRA OF FOUR CONFORMATIONS BASED ON FIVE PROTEINS: MEAN RESIDUE ELLIPTICITIES IN deg cm² dmol⁻¹ ᵃ

λ (nm)  [θ]H × 10⁻³     [θ]β × 10⁻³     [θ]t × 10⁻³     [θ]R × 10⁻³
250     0               0               0               0
248     -0.15           -0.02           0.92            -0.28
245     -0.85           0.30            0.64            -0.19
242     -2.14           0.71            2.42            -0.80
240     -3.83           1.97            1.24            -0.26
239     -4.62           2.70            0.99            -0.40
238     -6.03           2.52            1.95            -0.28
237     -7.29           3.46            1.86            -0.45
236     -8.87           3.62            1.93            -0.34
235     -10.7           5.12            -0.34           0.05
234     -12.8           5.57            -1.83           0.47
233     -15.4           7.16            -0.40           -0.07
232     -17.9           6.04            -0.32           0.22
231     -20.3           6.97            1.47            -0.69
230     -23.0           6.48            5.31            -1.55
229     -26.4           3.86            11.7            -2.33
228     -29.1           2.20            17.3            -3.27
227     -31.5           -1.17           22.2            -3.73
226     -33.4           -4.15           28.4            -4.81
225     -34.7           -6.71           30.9            -5.10
224     -35.4           -9.15           31.4            -5.13
223     -35.9           -11.4           31.8            -5.03
222     -36.1           -14.2           31.3            -4.49
221     -36.0           -15.8           30.6            -4.27
220     -35.6           -16.9           29.7            -4.21
219     -34.3           -17.0           27.5            -4.34
218     -33.4           -17.9           24.7            -3.87
217     -32.6           -18.7           23.0            -3.54
216     -32.2           -18.4           20.9            -3.17
215     -31.4           -18.4           20.1            -3.36
214     -30.6           -18.0           18.3            -3.50
213     -30.0           -16.7           16.0            -3.74
212     -29.9           -15.6           16.4            -4.38
211     -30.3           -13.6           14.6            -4.65
210     -30.4           -11.3           13.6            -5.41
209     -30.5           -9.61           11.9            -5.78
208     -29.7           -6.66           8.48            -6.47
207     -28.0           -3.99           5.62            -7.23
206     -25.2           0.69            0.60            -7.93
205     -20.6           4.47            -5.22           -8.53
204     -14.6           7.21            -9.40           -9.37
203     -8.03           11.5            -12.9           -10.7
202     -0.69           13.7            -10.8           -13.3
201     8.05            19.3            -6.0            -17.7
200     16.6            25.0            6.42            -23.7

ᵃ The proteins used were myoglobin, lactate dehydrogenase, lysozyme, papain, and ribonuclease A. From Bolotina et al.75 with the permission of Plenum (copyrighted by Plenum).

The use of proteins for the determination of reference spectra is more realistic than the use of synthetic polypeptides. The disadvantages of this approach are several-fold. First, the choice of reference proteins is by necessity arbitrary. Initially, only proteins whose three-dimensional structures are available were used. Ideally, the set of reference proteins should cover a wide range of $f_H$, $f_\beta$, and $f_t$. This condition is difficult to meet.
Today many proteins of known secondary structure can be used as reference proteins, but what constitutes a representative set of reference proteins remains obscure. Thus, different sets of reference proteins may give different intensities of reference spectra and even different profiles for the fl-turn and unordered form. Second, the determination of the secondary structures is not straightforward. X-Ray crystallographers often use different criteria for identifying the secondary structure. Users of X-ray diffraction data may also set their own criteria. For instance, Saxena and Wetlaufer6s chose for fH a mean value between the lower limit representing regular a-helix and the upper limit representing total helices, including 3~0- and distorted helices, a practice that is not followed by other workers. Chen e t al. 70'76 and Chang e t a l . 77 used the values given by the X-ray crystallographers. No distinction was made between a-helix and 3j0- and distorted helices, nor between parallel and antiparallel fl-forms. In the early days X-ray results were often subject to revision when data at higher resolutions were obtained. [In the mid-1960s ORD studies of cytochrome c suggested a helicity of 17-


27%, whereas X-ray diffraction at 4 Å indicated little or no α-helix (see, for example, Ref. 84). Doubts were therefore cast on the ORD analysis, which after all is empirical. Now we know that cytochrome c has about 40% helix and no β-form, in full support of CD analyses of this protein.] Of the five proteins used by Chen et al., 70 the fH and fβ of lactate dehydrogenase were later updated from 0.29 to 0.45 and 0.20 to 0.24, respectively (even the number of amino acid residues was raised from 311 to 331) and those of papain from 0.21 to 0.28 and 0.05 to 0.14, respectively. Obviously, such revision will affect the reference spectra; while the CD of the helix did not alter much, that of the β-form and, in particular, the unordered form varied significantly. Bolotina et al. 75,80,81 preferred to use a "rigid" criterion for the secondary structures, 82 which differs from the criteria used by X-ray crystallographers or the relaxed criteria set by Levitt and Greer. 83 At present, there are no generally accepted criteria. Third, the secondary structure of a protein molecule is usually far from ideal. At present only the reference spectrum of the helix is well understood. Even here the 3₁₀-helix and distorted helices are lumped together with the α-helix, and the matter of chain-length dependence of the helix is still under investigation. But these assumptions seem to be supported by experimental results and theoretical calculations. Unlike the helix, the β-form is often twisted and nonplanar. The degree of twisting varies from one protein to another, but the sense of twist is always the same. The dependence of CD on the chain length and sheet width is still not settled. Even the separation of parallel and antiparallel β-forms can present problems. In the hypothetical case of a 3-strand sheet, two β-strands are parallel to each other and a third β-strand is antiparallel to the second strand. It is difficult to decide whether the middle strand can be considered as half parallel and half antiparallel. However, the β-form appears to have a single characteristic CD spectrum, albeit with considerable variations. The types of β-turns are more diverse and their CD spectra are not well characterized even for model compounds. The unordered form is more complicated than the ordered one; we can only assume a statistical average for the observed CD. All these problems will certainly complicate the CD analysis of proteins.

84 S. Beychok, Annu. Rev. Biochem. 37, 437 (1968).
85 S. W. Provencher and J. Glöckner, Biochemistry 20, 33 (1981).

Linear Combination of CD Spectra of Proteins

Recently, Provencher and Glöckner 85 proposed direct analysis of the CD spectrum of a protein by linear combination of the CD spectra of
reference proteins of known secondary structure. This approach may circumvent the dilemma between a stable but inadequate model and a realistic but unstable one (see section above). The instability of many parameters in Eq. (6) is controlled by a simple constrained regularization procedure. By introducing a coefficient γj to be determined and expressing X(λ) in Eq. (6) as

$$X(\lambda) = \sum_{j=1}^{N_\gamma} \gamma_j R_j(\lambda) \tag{12}$$

the f values are simply

$$f_i = \sum_{j=1}^{N_\gamma} \gamma_j F_{ji} \tag{13}$$

Here Rj(λ) is the CD of the jth reference protein at wavelength λ and Nγ is the number of reference proteins used. Fji is the fraction of the ith conformation of the jth protein. The γj values are determined by minimizing a quantity ε:

$$\varepsilon = \sum_{\lambda=1}^{N_y} [Y(\lambda) - X(\lambda)]^2 + \alpha \sum_{j=1}^{N_\gamma} (\gamma_j - 1/N_\gamma)^2 \tag{14}$$

again with the constraints: Σfi = 1 and fi ≥ 0. X(λ) and Y(λ) are the experimental and calculated values, respectively, and Ny is the number of data points. For α = 0, Eq. (14) is reduced to the least-squares solution. For α > 0, the second term on the right side of Eq. (14) tends to stabilize the solution by keeping each γj small, i.e., near 1/Nγ, unless the corresponding Rj(λ) happens to have components that fit X(λ) well and thereby significantly reduce the least-squares term, the first term on the right side of Eq. (14). A general FORTRAN regularization package has been developed and a user-oriented version is available upon request from the authors (see Appendix B). Provencher and Glöckner 85 used the same CD data of 18 proteins provided by Chang et al. 77 except that the CD spectra of thermolysin and subtilisin BPN' were excluded in their analysis in order to improve the correlation coefficients.

86 J. P. Hennessey, Jr. and W. C. Johnson, Jr., Biochemistry 20, 1085 (1981).

Independently, Hennessey and Johnson 86 used a similar approach but applied an eigenvector method of multicomponent analysis with an unconstrained least-squares method. Their analysis was based on matrix calculation of orthogonal basis CD spectra from the CD spectra of reference proteins. Fifteen proteins and one helical polypeptide, protonated (Glu)n, were studied (see footnote a of Table V) and their CD spectra were extended to 178 nm in the vacuum ultraviolet region. Hennessey and Johnson concluded that the five most significant basis spectra were needed to reconstruct the original CD spectra of 15 proteins and one polypeptide (Fig. 5) and that the remaining 11 basis spectra were probably noise. With only five basis CD spectra, no more than five independent types of secondary structure can be learned from a CD spectrum. However, Hennessey and Johnson actually considered eight types of secondary structures: the helix, the parallel and antiparallel β-forms, the four types of β-turns (types I, II, and III and type T, which combines the remaining β-turns), and the unordered form. They are able to do this because the same eigenvector method applied to the X-ray structural data gives only five important basis structure vectors. Thus only five independent secondary superstructures are necessary to describe the eight standard secondary structures, which are not independent to within a small error in the protein analyses. Since each basis CD spectrum corresponds to a mixture of secondary structures, the same coefficients for the reconstruction of the CD spectrum of a protein can be used to estimate the secondary structure of the protein. A FORTRAN program, "PROSTP," developed for this method of CD analysis of proteins is given in Appendix C. The five basis spectra from 250 to 178 nm (Fig. 5) can only be used to analyze the conformation of a protein whose CD spectrum covers the same range of wavelengths. This limitation poses a practical problem because no commercial instruments for vacuum ultraviolet CD (VUCD) will be available in the foreseeable future unless there is such a demand. Consequently, the majority of users are compelled to limit their CD measurements to about 184 nm under ideal conditions; most reported CD spectra were cut off at 190 nm or even longer wavelengths.
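The regularized criterion of Eq. (14) can be illustrated with a minimal sketch. It is not the FORTRAN package mentioned above: it omits the constraints Σfi = 1 and fi ≥ 0, in which case setting the gradient of ε to zero gives the linear system (RRᵀ + αI)γ = Ry + (α/Nγ)1. The function names, the toy reference spectra, the toy f values, and the two α values are all hypothetical and chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regularized_gammas(ref_spectra, observed, alpha):
    """Unconstrained minimizer of Eq. (14): (R R^T + alpha I) g = R y + alpha / N."""
    n_ref = len(ref_spectra)
    normal = [[sum(rj * rk for rj, rk in zip(ref_spectra[j], ref_spectra[k]))
               + (alpha if j == k else 0.0)
               for k in range(n_ref)] for j in range(n_ref)]
    rhs = [sum(r * y for r, y in zip(ref_spectra[j], observed)) + alpha / n_ref
           for j in range(n_ref)]
    return solve_linear(normal, rhs)


def fractions(gammas, ref_fractions):
    """Eq. (13): f_i = sum over j of gamma_j * F_ji."""
    n_conf = len(ref_fractions[0])
    return [sum(g * row[i] for g, row in zip(gammas, ref_fractions))
            for i in range(n_conf)]


# Hypothetical toy data: two "reference proteins", three wavelengths,
# three conformations per protein (each row of REF_FRACTIONS sums to 1).
REF_SPECTRA = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
REF_FRACTIONS = [[0.8, 0.0, 0.2], [0.1, 0.5, 0.4]]
OBSERVED = [0.3, 0.7, 1.3]  # built as 0.3 * first + 0.7 * second spectrum

gammas_ls = regularized_gammas(REF_SPECTRA, OBSERVED, alpha=0.0)     # least squares
gammas_reg = regularized_gammas(REF_SPECTRA, OBSERVED, alpha=1.0e6)  # heavy damping
f_ls = fractions(gammas_ls, REF_FRACTIONS)
print(gammas_ls, gammas_reg, f_ls)
```

With α = 0 the sketch recovers the coefficients used to build the toy spectrum, and with a very large α every γj is pulled toward 1/Nγ, which is the stabilizing behavior described for the second term of Eq. (14).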
Hennessey and Johnson analyzed the effect of data truncation on the CD analysis and found that the 184-nm cut-off still made little difference in the results. As long as no major CD band is eliminated, their method of analysis is fairly insensitive to the wavelength range. However, truncation to 190 nm resulted in striking changes, which were reflected almost solely in the estimates of the β-form and unordered form as well as the totals (Σf). Further truncation to 200 nm accentuated some of these changes but actually corrected others. Table V cannot be used for the method of Hennessey and Johnson over wavelength ranges other than 250 to 178 nm. Instead, the basis CD spectra should be regenerated by using Program "BAVGEN" in Appendix C so that the vectors will be orthogonal.

TABLE V
FIVE BASIS CD SPECTRA (εL - εR) DERIVED FROM FIFTEEN PROTEINS AND ONE POLYPEPTIDE a

λ (nm)       1       2       3       4       5
250          0       0       0       0       0
248          0       0       0       0       0
246       -0.2     0.2    -0.1    -0.1    -0.1
244       -0.3     0.3    -0.2    -0.1    -0.2
242       -0.6     0.5    -0.3    -0.1    -0.3
240       -1.4     0.6    -0.4    -0.1    -0.4
238       -2.6     0.7    -0.5    -0.1    -0.5
236       -4.4     0.8    -0.6       0    -0.7
234       -6.8     0.9    -0.8     0.1    -0.7
232       -9.1     1.0    -1.1     0.3    -0.7
230      -11.7     1.2    -1.5     0.5    -0.8
228      -14.8     1.7    -1.8     0.9    -0.7
226      -17.2     1.7    -2.0     0.9    -0.8
224      -19.3     1.3    -2.1     0.8    -0.8
222      -20.3     1.2    -2.3     0.7    -0.7
220      -20.1     1.4    -2.8     0.5    -0.6
218      -19.4     1.6    -3.1     0.5    -0.5
216      -18.9     1.4    -3.4     0.3    -0.3
214      -18.4     1.0    -3.6     0.3    -0.1
212      -18.3     0.6    -3.8     0.3     0.1
210      -18.5     0.2    -3.9     0.5     0.5
208      -18.3    -0.9    -4.0     0.5     1.0
206      -16.8    -2.6    -3.6     0.2     1.4
204      -11.7    -4.4    -3.3    -0.2     1.4
202       -3.8    -6.1    -3.0    -0.8     1.1
200        5.4    -8.5    -2.5    -1.1     0.2
198       17.4    -9.6    -2.5    -1.1    -0.9
196       28.0    -9.6    -2.1    -0.5    -1.1
194       36.3    -6.0    -2.5     0.9    -0.8
192       41.2    -1.2    -2.9     2.0    -0.2
190       40.0     2.3    -2.6     2.3     0.4
188       36.4     5.3    -2.5     1.9     0.5
186       30.8     6.6    -2.3     0.7     0.3
184       24.4     6.7    -2.1    -0.7     0.2
182       18.6     6.0    -2.3    -2.1     0.2
180       13.5     4.9    -2.9    -3.3    -0.1
178       10.4     4.6    -3.6    -3.6    -0.5

a The biopolymers used were α-chymotrypsin, cytochrome c, elastase, flavodoxin, glyceraldehyde-3-phosphate dehydrogenase, hemoglobin, lactate dehydrogenase, lysozyme, myoglobin, papain, prealbumin, ribonuclease, subtilisin BPN', subtilisin Novo, triosephosphate isomerase, and helical poly(L-glutamic acid). All data are expressed as mean residue CD in cm⁻¹ M⁻¹. From Hennessey and Johnson 86 with the permission of the American Chemical Society, copyright 1981.

[Figure 5: plot of the five basis CD spectra against wavelength (λ, nm; axis marked from 180 to 240 nm).]

FIG. 5. Five most significant basis CD spectra generated by the eigenvector method of multicomponent matrix analysis in descending order of significance. (Redrawn from Hennessey and Johnson 86 with the permission of the American Chemical Society, copyright 1981.)

Provencher and Glöckner 85 analyzed the CD spectrum of a protein from CD spectra of 16 proteins (Ref. 77), excluding the spectrum of the protein to be analyzed. This is equivalent to analysis with 15 basis spectra in the method of Hennessey and Johnson, 86 if the two groups had studied the same reference proteins. However, there are differences between the two methods. Provencher and Glöckner introduced the two constraints: Σfi = 1 and fi ≥ 0, but Hennessey and Johnson considered such constraints artificial. Instead, in their published work they normalized the calculated fi values to Σfi = 1. Hennessey and Johnson used only the five most significant basis spectra (Fig. 5). The third spectrum has no nodes, the first one node, the second two nodes, the fourth essentially three nodes, and the fifth four nodes. The trend appears to be more nodes and lower intensities for less important basis spectra. The less significant spectra probably reflect noise, and are therefore eliminated from the CD analysis (W. C. Johnson, Jr., personal communication). With orthogonal basis spectra, the fit for the CD spectrum of a protein other than one of the original set of 16 proteins will be unique. Hennessey and Johnson did not take into consideration the chain-length dependence of the CD of the


helix. How the exclusion of (Glu)n, a helical polypeptide, will affect their basis spectra remains to be seen. However, W. C. Johnson, Jr. points out that their most significant basis spectra average any variability in the CD contributions due to other sources such as side chains, tertiary structure, chain length of the secondary structure, as well as errors in concentration measurements, CD measurement, etc. These variations are contained in the discarded basis spectra, which are merely "noise." 87 Since the basis spectra are obtained from proteins of known secondary structure, the advantages and disadvantages of the Provencher-Glöckner and Hennessey-Johnson methods are the same as those described in the section on "Reference Spectra Based on Proteins." However, these two methods have one attractive feature: they circumvent the determination of reference spectra of the helix, β-form, β-turn, and unordered form. Of course these methods will only apply to proteins whose structural characteristics are well represented in the set of reference proteins used to construct the basis spectra. Hennessey and Johnson extended their CD measurements to the vacuum ultraviolet region just as Brahms and Brahms did in the section on "Reference Spectra Based on Model Polypeptides." Unfortunately, users of commercial instruments do not have access to wavelengths below 184 nm at present.
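The node counts quoted for the basis spectra can be checked against the tabulated values. The sketch below assumes that a "node" may be taken as a sign change between successive nonzero entries of a column of Table V (this definition and the function name are assumptions made for illustration). Only the first three basis spectra are transcribed, from 250 to 178 nm in 2-nm steps.

```python
def count_nodes(spectrum):
    """Count sign changes along a spectrum, skipping entries that are exactly zero."""
    signs = [1 if value > 0 else -1 for value in spectrum if value != 0]
    return sum(1 for left, right in zip(signs, signs[1:]) if left != right)


# Columns 1 to 3 of Table V, 250 nm down to 178 nm.
BASIS_1 = [0, 0, -0.2, -0.3, -0.6, -1.4, -2.6, -4.4, -6.8, -9.1, -11.7, -14.8,
           -17.2, -19.3, -20.3, -20.1, -19.4, -18.9, -18.4, -18.3, -18.5, -18.3,
           -16.8, -11.7, -3.8, 5.4, 17.4, 28.0, 36.3, 41.2, 40.0, 36.4, 30.8,
           24.4, 18.6, 13.5, 10.4]
BASIS_2 = [0, 0, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.7, 1.7, 1.3,
           1.2, 1.4, 1.6, 1.4, 1.0, 0.6, 0.2, -0.9, -2.6, -4.4, -6.1, -8.5,
           -9.6, -9.6, -6.0, -1.2, 2.3, 5.3, 6.6, 6.7, 6.0, 4.9, 4.6]
BASIS_3 = [0, 0, -0.1, -0.2, -0.3, -0.4, -0.5, -0.6, -0.8, -1.1, -1.5, -1.8,
           -2.0, -2.1, -2.3, -2.8, -3.1, -3.4, -3.6, -3.8, -3.9, -4.0, -3.6,
           -3.3, -3.0, -2.5, -2.5, -2.1, -2.5, -2.9, -2.6, -2.5, -2.3, -2.1,
           -2.3, -2.9, -3.6]

node_counts = [count_nodes(s) for s in (BASIS_1, BASIS_2, BASIS_3)]
print(node_counts)
```

Under this definition the first, second, and third spectra give one, two, and zero sign changes, respectively, in agreement with the description of Fig. 5.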

87 W. C. Johnson, Jr., in "Methods of Biochemical Analysis" (D. Glick, ed.), p. 61, Wiley (Interscience), New York, 1985.
88 H. Rosenkranz, Z. Klin. Chem. Klin. Biochem. 12, 415 (1974).

Comparison of Methods for CD Analysis

The very nature of any empirical method unavoidably invites adverse criticism, especially when various methods give different results. Before we rush to conclude that the methods of CD analysis are not suitable for determining protein conformation, it seems appropriate to compare the results based on some current methods and point out the uncertainties and possible improvements. Earlier, Rosenkranz 88 compared various methods that were available in 1974 and concluded that the reference spectra developed by Rosenkranz and Scholtan 66 and Chen et al. 70 gave good agreement between experimental and calculated CD spectra for several proteins tested. He further indicated that these two methods were highly suitable for determining the conformation of proteins having CD bands of the helix or β-form or both. He also found that the new reference spectra obtained by Chen et al. 76 were less satisfactory than those used previously. 70 Suffice it to say, a good fit between the experimental and calculated curves does not guarantee a correct solution of the secondary structure of a protein, but a poor fit often points up the imperfection in the

method of analysis. Thus, we should also compare CD estimates with X-ray results, if possible. All current methods for CD analysis are developed to determine not only the helix and β-form but also the β-turn and unordered forms. The correlation coefficients between CD and X-ray diffraction analysis from five laboratories can be calculated, although the proteins used were not all identical (Table VI): (I) Chang et al. 77 and (II) Bolotina et al. 80 determined reference spectra of proteins for the four conformations, (III) Provencher and Glöckner 85 and (IV) Hennessey and Johnson 86 used a linear combination of basis spectra of proteins, and (V) Brahms and Brahms 47 computed the CD spectrum of a protein from the spectra of model compounds. A correlation coefficient, r, near 1 indicates a successful prediction, whereas an r value near zero predicts no better than a random assignment and an r value close to -1 indicates a total disagreement between experimental and computed results. On this basis the estimated fH values correlated highly between CD and X-ray results among all five laboratories. The CD estimates for the β-form show significant correlation, especially for the Hennessey-Johnson and Brahms-Brahms results. But in all five methods the correlation coefficient for β-turns was poor when unbiased results were compared. The results from proteins used to calculate correlation coefficients by Chang et al. 77 and Bolotina et al. 80 had a bias effect, viz. the reference proteins included the protein to be analyzed. Provencher and Glöckner 85 evaluated the data of Chang et al. 77 and showed that the unbiased correlation coefficient for the β-form dropped sharply and that for the β-turn became negative (Table VI). Hennessey and Johnson determined the correlation coefficients both with and without the CD spectrum of the protein to be analyzed in the construction of their basis spectra. 86 The r values for the β-forms and β-turns were poorer with the unbiased data than with the biased ones. Bolotina et al. 80 found that the CD spectrum of concanavalin A was poorly described by their reference spectra and they attributed this anomaly to large CD contributions from aromatic chromophores in accordance with the work of Hermann et al. 89 Correction for these nonpeptide contributions led to good correlation coefficients, but when such corrections should be applied to an unknown protein and how they can be applied become problematic. The method of Brahms and Brahms is also unbiased because they used model peptides and myoglobin for the reference spectra, although their arbitrary reduction of the CD magnitude of (Ala2Gly2)n for the β-turn by one-half should be noted.

89 M. S. Hermann, C. E. Richardson, L. M. Setzler, W. D. Behnke, and R. E. Thompson, Biopolymers 17, 2107 (1978).


TABLE VI
CORRELATION COEFFICIENTS BETWEEN CD ESTIMATES AND X-RAY RESULTS OF THE f VALUES

Method                              Helix   β-Form       β-Turn   Unordered form
Chang et al. (1978) a
  1                                 0.87    0.61          0.15    -
  2                                 0.92    0.83          0.23    0.37
  3                                 0.85    0.25         -0.31    0.46
Bolotina et al. (1980) b
  1                                 0.93    0.93          0.84    -
  2                                 0.95    0.57          0.39    0.39
Provencher and Glöckner (1981) c    0.96    0.94          0.31    0.49
Hennessey and Johnson (1981) d
  1                                 0.98    0.83; 0.71    0.78    0.88
  2                                 0.95    0.66; 0.51    0.25    0.72
Brahms and Brahms (1980) e          0.92    0.93          0.33    0.65

a (1) From Chang et al. 77 based on 18 proteins (see footnote a of Table III plus adenylate kinase, α-chymotrypsin, and elastase). The Pearson product-moment correlation coefficient, r, is defined as

$$r = \frac{\sum X_i Y_i - \sum X_i \sum Y_i / n}{\left\{\left[\sum X_i^2 - \left(\sum X_i\right)^2 / n\right] \left[\sum Y_i^2 - \left(\sum Y_i\right)^2 / n\right]\right\}^{1/2}}$$

Here Xi and Yi are the experimental and calculated values, respectively, and n is the number of samples studied. (2) From Provencher and Glöckner. 85 The data on 16 of the 18 proteins used by Chang et al. in (1), excluding subtilisin BPN' and thermolysin, were recalculated. (3) Same as in (2) but with 15 proteins. The protein to be analyzed was excluded from the reference proteins.
b (1) From Table 3 of Bolotina et al. 80 based on 13 proteins (see the text). The CD spectrum of concanavalin A was corrected. Glyceraldehyde-3-phosphate dehydrogenase was omitted in the calculation of the r value for the β-turn. (2) From Woody. 31 The data for 10 proteins from Table 3 of Bolotina et al. 80 including the uncorrected data for concanavalin A (see the text) were recalculated. Again glyceraldehyde-3-phosphate dehydrogenase was excluded for the β-turn calculation.
c From Table II of Provencher and Glöckner in footnote a(2) based on the CD spectra of Chang et al. in footnote a(1). The protein to be analyzed was not used in the construction of base spectra.
d From Table VIII of Hennessey and Johnson. 86 The two numbers under the β-form refer to antiparallel and parallel β-forms. The r value for the β-turn was based on all β-turns. (1) Analyzed from basis CD spectra; (2) analyzed from basis CD spectra constructed without the spectrum of the protein to be analyzed. The individual r values for the β-turns (types I, II, and III and other β-turns combined) were (1) 0.53, 0.73, 0.36, and 0.71, respectively, and (2) -0.07, 0.51, -0.44, and 0.38, respectively.
e From Woody 31 in footnote b(2) based on data from 17 proteins in Table 2 of Brahms and Brahms, 47 excluding rubredoxin and two other proteins. Brahms and Brahms suggested that the secondary structure of rubredoxin was different in crystals and in solution. The two other proteins were not analyzed by Levitt and Greer, whose method was used by Woody for determining the secondary structures.
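The correlation coefficient defined in footnote a of Table VI can be evaluated directly from paired X-ray and CD values. The sketch below is only an illustration: the function name is an assumption, and the sample data are the fH values listed under Method III for the ten proteins of Table VII, so the result is not expected to reproduce the coefficient in Table VI, which was based on a larger set of proteins.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation, written as in footnote a of Table VI."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator


# fH values for Method III in Table VII, in the order myoglobin, lactate
# dehydrogenase, lysozyme, cytochrome c, subtilisin BPN', papain,
# ribonuclease A, alpha-chymotrypsin, elastase, concanavalin A.
FH_XRAY = [0.79, 0.45, 0.41, 0.39, 0.31, 0.28, 0.23, 0.09, 0.07, 0.02]
FH_CD = [0.86, 0.40, 0.45, 0.33, 0.15, 0.27, 0.26, 0.09, 0.04, 0.08]

r_helix = pearson_r(FH_XRAY, FH_CD)
print(round(r_helix, 2))
```

A value of r near 1 for this subset is consistent with the statement that helix estimates correlate highly with the X-ray results.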


A further test of various methods can be made by comparing the CD estimates of the secondary structure of several proteins, most of which had been studied by the five laboratories (Table VII). Several features emerge. First, there are considerable differences in the way secondary structures are totalled from X-ray diffraction results by different laboratories. Following Bolotina et al., 75 we illustrate in Fig. 6 the secondary structures of five proteins from X-ray analysis. X-Ray crystallographers inspect their models and often determine the local conformation of a residue in relation to neighboring residues and the pattern of hydrogen bonds involving these residues. According to Levitt and Greer, 83 the most successful criteria should be based on peptide hydrogen bonds, inter-Cα distances, and inter-Cα torsion angles. Small errors in the dihedral angles could lead to quite a different assignment. Accordingly, these authors have used precise rules and developed a computer program to automatically analyze the atomic coordinates of many globular proteins. A list of α-helices, β-sheets, and β-turns of almost all proteins of known secondary and tertiary structure before 1977 was compiled. The assignment of Levitt and Greer does not always correspond to that by X-ray crystallographers. Bolotina et al. 80 used rigid criteria proposed by Finkel'stein et al. 82 for the helix and β-form and counted only the second and third residues of the 4-residue β-turns based on the Chou-Fasman work with 29 proteins. Chang et al., 77 Bolotina et al., 80 and Provencher and Glöckner 85 considered only the net β-turns by canceling types I', II', and III' from their mirror images, types I, II, and III. Hennessey and Johnson 86 lumped together all β-turns other than types I, II, and III in one class. Clearly, these different procedures will yield different f values. As Fig.
6 illustrates, myoglobin has an fH of 0.88 according to Levitt and Greer, but most workers count it slightly less than 0.80. The fH and fβ values for lysozyme are 0.45 and 0.19, respectively, by the method of Levitt and Greer, 83 but only 0.30 and 0.09, respectively, according to Bolotina et al. 80; the ft value for lysozyme is taken as 0.32 by Hennessey and Johnson 86 and 0.19 by Bolotina et al. 80 Likewise, the fβ value for ribonuclease A varies from 0.23 found by Bolotina et al. 80 to 0.40 by Chang et al. 77 and to 0.46 by Levitt and Greer. 83 The f values used by five laboratories (Table VII) are not exactly identical in most cases. These differences will in turn affect the reference or base spectra of proteins of known structure, not to mention that different sets of reference proteins may also alter these reference or base spectra.

TABLE VII
COMPARISON OF THE SECONDARY STRUCTURE OF TEN PROTEINS BETWEEN CD ESTIMATES AND X-RAY RESULTS

Protein                 Method a  Data     fH         fβ      ft      fR         Σfi b
Myoglobin               I         X-ray    0.79       0       0.05    0.16
                                  CD (1)   0.80       0       0.02    0.18
                                  CD (2)   0.83       0.05    0.07    0.33       1.18
                        II        X-ray    0.76       0       0.06    0.18
                                  CD       0.79       0.01    0.07    0.13
                        III       X-ray    0.79       0       0.05    0.16
                                  CD       0.86       0       0       0.14
                        IV        X-ray    0.78       0       0.12    0.10
                                  CD (1)   0.75       -0.01   0.08    0.07       0.89
                                  CD (2)   0.75       -0.03   0.06    0.07       0.87
Lactate dehydrogenase   I         X-ray    0.45       0.24    0.06    0.25
                                  CD (1)   0.45       0.18    0.13    0.24
                                  CD (2)   0.47       0       0.13    0.22       0.81
                        II        X-ray    0.39       0.14    0.11    0.36
                                  CD       0.32       0.11    0.09    0.48
                        III       X-ray    0.45       0.24    0.06    0.25
                                  CD       0.40       0.22    0.13    0.26
                        IV        X-ray    0.41       0.17    0.11    0.31
                                  CD (1)   0.40       0.19    0.13    0.29       1.01
                                  CD (2)   0.42       0.04    0.11    0.24       0.81
                        V         X-ray    0.42       0.26    0.19    0.13
                                  CD       0.41       0.22    0.17    0.20
Lysozyme                I         X-ray    0.41       0.16    0.23    0.20
                                  CD (1)   0.32       0.29    0.08    0.31
                                  CD (2)   0.32       0.33    0.08    0.33       1.06
                        II        X-ray    0.30       0.07    0.19    0.44
                                  CD       0.30       0.07    0.19    0.44
                        III       X-ray    0.41       0.16    0.23    0.20
                                  CD       0.45       0.21    0.26    0.08
                        IV        X-ray    0.36       0.09    0.32    0.23
                                  CD (1)   0.34       0.32    0.26    0.25       1.17
                                  CD (2)   0.36       0.20    0.28    0.18       1.02
                        V         X-ray    0.45       0.19    0.14    0.22
                                  CD       0.43       0.12    0.08    0.37
Cytochrome c            I         X-ray    0.39       0       0.24    0.37
                                  CD (1)   0.44       0       0.28    0.28
                                  CD (2)   0.48       -0.16   0.33    0.31       0.96
                        II        X-ray    0.43       0       0.14    0.43
                                  CD       0.33       0.09    0.08    0.50
                        III       X-ray    0.39       0       0.24    0.37
                                  CD       0.33       0.09    0.17    0.41
                        IV        X-ray    0.38       0       0.17    0.45
                                  CD (1)   0.40       0.04    0.14    0.36       0.93
                                  CD (2)   0.32       -0.01   0       0.16       0.46
                        V         X-ray    0.43-0.48  0.09    0.23    0.25-0.20
                                  CD       0.46       0       0.22    0.31
Subtilisin BPN'         I         X-ray    0.31       0.10    0.22    0.37
                                  CD (1)   0.15       0.58    0.04    0.23
                                  CD (2)   0.19       0.15    0.06    0.18       0.58
                        II        X-ray    0.27       0.16    0.13    0.44
                                  CD       0.26       0.16    0.12    0.46
                        III       X-ray    0.31       0.10    0.22    0.37
                                  CD       0.15       0.48    0.18    0.18
                        IV        X-ray    0.30       0.09    0.21    0.40
                                  CD (1)   0.24       0.22    0.14    0.29       0.89
                                  CD (2)   0.26       0.19    0.14    0.40       1.00
Papain                  I         X-ray    0.28       0.14    0.17    0.41
                                  CD (1)   0.29       0       0.15    0.56
                                  CD (2)   0.30       0.06    0.18    0.60       1.14
                        II        X-ray    0.24       0.11    0.10    0.55
                                  CD       0.25       0.11    0.10    0.54
                        III       X-ray    0.28       0.14    0.17    0.41
                                  CD       0.27       0.05    0.31    0.36
                        IV        X-ray    0.28       0.09    0.14    0.49
                                  CD (1)   0.27       0.14    0.15    0.50       1.06
                                  CD (2)   0.25       0.19    0.17    0.45       1.06
Ribonuclease A          I         X-ray    0.23       0.40    0.13    0.24
                                  CD (1)   0.21       0.39    0.10    0.30
                                  CD (2)   0.17       0.79    0.09    0.36       1.41
                        II        X-ray    0.17       0.23    0.15    0.45
                                  CD       0.21       0.25    0.16    0.38
                        III       X-ray    0.23       0.40    0.13    0.24
                                  CD       0.26       0.44    0.11    0.19
                        IV        X-ray    0.24       0.33    0.14    0.29
                                  CD (1)   0.25       0.18    0.20    0.22       0.86
                                  CD (2)   0.22       0.38    0.15    0.26       1.02
α-Chymotrypsin          I         X-ray    0.09       0.34    0.34    0.23
                                  CD (1)   0.05       0.53    0.02    0.40
                                  CD (2)   0.07       0.35    0.04    0.39       0.85
                        III       X-ray    0.09       0.34    0.34    0.23
                                  CD       0.09       0.29    0.22    0.40
                        IV        X-ray    0.10       0.34    0.20    0.36
                                  CD (1)   0.13       0.25    0.19    0.32       0.89
                                  CD (2)   0.12       0.09    0.15    0.20       0.56
Elastase                I         X-ray    0.07       0.52    0.26    0.15
                                  CD (1)   0          0.46    0.07    0.47
                                  CD (2)   -0.08      0.72    -0.01   0.44       1.07
                        III       X-ray    0.07       0.52    0.26    0.15
                                  CD       0.04       0.49    0.14    0.32
                        IV        X-ray    0.10       0.37    0.22    0.31
                                  CD (1)   0.09       0.30    0.22    0.36       0.97
                                  CD (2)   0.09       0.35    0.23    0.38       1.05
Concanavalin A          I         X-ray    0.02       0.51    0.09    0.38
                                  CD (1)   0.25       0.46    0.20    0.09
                                  CD (2)   0.33       -0.56   0.25    -0.02      0
                        II        X-ray    0.02       0.46    0.16    0.34
                                  CD       0          0.12    0.02    0.86
                        III       X-ray    0.02       0.51    0.09    0.38
                                  CD       0.08       0.41    0.15    0.36
                        V         X-ray    0.06       0.59    0.20    0.25
                                  CD       0.03       0.49    0.19    0.29

a I, From Table I of Chang et al. 77 (1) With the constraints: Σf = 1 and 1 ≥ f ≥ 0; (2) without the constraints. II, From Table 3 of Bolotina et al. 80 If the β-form was split into parallel and antiparallel β-forms, the results were slightly different (see Table 1 of Bolotina et al. 81). III, From Table I of Provencher and Glöckner, 85 based on the same CD spectra and X-ray results used by Chang et al. in footnote aI. IV, Based on Table IX of Hennessey and Johnson. 86 The results listed here were provided by W. C. Johnson, Jr. (private communication) who pointed out that the f values by X-ray diffraction were slightly different from the published work. Also unlike the published results, the CD analysis was not adjusted for intensity so that Σf = 1.00. The fβ value represents the sum of fβ(parallel) and fβ(antiparallel). The ft value represents the sum of ft(I), ft(II), ft(III), and ft(T). However, the results listed here have the bias effect, i.e., the protein to be analyzed was included in the set of reference proteins. V, From Table 2 of Brahms and Brahms. 47 The f values listed here were based on Levitt and Greer. 83 Method III was unbiased; the protein to be analyzed was removed from the set of reference proteins. Method V also had no bias effect since the reference spectra were based on model peptides and myoglobin.
b For X-ray results, Σf is always unity and is therefore not listed. For CD results, the absence of a number in this column indicates that the constraint Σf = 1 was used.

[Figure 6: bar chart for myoglobin, lysozyme, RNase A, papain, and lactate dehydrogenase.]

FIG. 6. The fractions of helix and β-form of five proteins based on X-ray diffraction results. Open bar, according to "rigid" criteria proposed by Finkel'stein et al. 82 and used in Bolotina et al. 75,80 Solid bar, according to the criteria of Levitt and Greer. 83 Stippled bars, based on the data provided by X-ray crystallographers and used in Hennessey and Johnson 86 (left) and Chen et al. 76 (right).

Another complication involving the counting of secondary structure often seems to have been overlooked. Because the CD bands of various conformations arise from peptide chromophores, peptide bonds rather than amino acid residues should be used to determine the fractions of helix, β-form, and β-turn. As far as we are aware, Hennessey and Johnson did use peptide bonds as basic units. 86 For a protein having i helical segments, the number of bonds is the total number of amino acid residues in these segments minus i residues. The same is true for counting β-strands. The number of bonds is the total number of residues minus i residues for i strands. Chen et al. 76 also recognized this discrepancy in the counting of β-turns. Each β-turn involves four residues but only three peptide bonds are considered in theoretical calculations; thus, for n β-turns, which have 4n residues, the number of peptide bonds is 3n, which is a 25% reduction. The situation is even more confusing if a β-turn is linked to a helix or a β-strand. How to identify and distribute these residues among the three conformations remains problematic. For an unknown protein we do not know the number of helical segments, β-strands,

and β-turns. Thus, the secondary structures are usually counted by the number of amino acid residues. Second, CD estimates for the helix are usually excellent regardless of the method used. This is perhaps due to the high intensities of the helical CD bands, which usually predominate over the CD contributions of other conformations if the protein contains a moderate amount of helix. In addition, the helical segments found in proteins are mostly close to the ideal α-helix, and do not have many 3₁₀-helices or distorted helices. Chen et al. 70 found that the reference spectra varied from one set of reference proteins to another, but such variations were minimized by the inclusion of myoglobin, which has the highest helicity among the reference proteins studied. Chang et al. 77 found that their method did not apply to concanavalin A, which has almost no helices; however, CD estimates for the low helicity in α-chymotrypsin and elastase were good. Bolotina et al. 80 reported a good estimate for the helicity, but not for the β-forms and β-turns of concanavalin A after correction for the CD contributions of aromatic chromophores. On the other hand, Provencher and Glöckner 85 obtained a good fit between the observed and calculated CD spectrum of concanavalin A without resorting to such corrections, but their estimates


appeared to be low in relation to the X-ray results. Results for the β-forms and β-turns of the proteins listed in Table VII were often unsatisfactory by the method of Chang et al. 77 However, no single method can guarantee reliable results in every case. If |Δf| ≥ 0.10 between the CD estimate and X-ray result for any conformation was considered to be unsatisfactory, the method of Bolotina et al. 80 gave poor results for cytochrome c, thermolysin (not shown), and concanavalin A (based on an uncorrected CD spectrum). The method of Provencher and Glöckner 85 was unsatisfactory for β-turn estimates in papain, α-chymotrypsin, and elastase (Table VII) and also in insulin, parvalbumin, and trypsin inhibitor (not shown). Estimates for the helix and β-forms of subtilisin BPN' were also poor, probably due to inaccurate CD data for this protein used in their analysis. The method of Hennessey and Johnson 86 did not do too well for β-estimates in α-chymotrypsin, lysozyme, subtilisin BPN', and ribonuclease (Table VII) and in flavodoxin and prealbumin (not shown). Indeed, there are uncertainties in each method. However attractive a particular method may be, it must be subjected to extensive tests on many proteins of known secondary and tertiary structure. Whether and how the use of a different set of reference proteins or additional proteins to the current sets will alter the CD estimates remain to be investigated. Third, Chen et al. 69,70 first introduced two constraints: Σf = 1 and 1 ≥ f ≥ 0. This will reduce by one the number of parameters to be determined (fR) and also will avoid any negative numbers in the computed f values. Such constraints are of course artificial, but Chang et al. 77 found that their CD estimates improved with the constraints. Subtilisin BPN' was a notable exception, but the calculated Σf without the constraints was too far from unity. The results with concanavalin A without the constraints were even worse (Σf became zero).
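The two checks discussed here, the |Δf| ≥ 0.10 criterion and the sum unity test, are easily applied to any row of Table VII. The following sketch uses the ribonuclease A entries for Method I without the constraints; the function name, the dictionary keys, and the rounding to two decimals are assumptions made for illustration.

```python
def check_estimate(xray, cd, threshold=0.10):
    """Return (differences CD minus X-ray, conformations with |difference| >= threshold, sum of CD f values)."""
    deltas = {name: round(cd[name] - xray[name], 2) for name in xray}
    poor = [name for name, delta in deltas.items() if abs(delta) >= threshold]
    total = round(sum(cd.values()), 2)
    return deltas, poor, total


# Ribonuclease A, Method I of Table VII: X-ray values and CD (2) estimates
# (CD analysis without the constraints).
RNASE_XRAY = {"helix": 0.23, "beta": 0.40, "turn": 0.13, "unordered": 0.24}
RNASE_CD = {"helix": 0.17, "beta": 0.79, "turn": 0.09, "unordered": 0.36}

deltas, poor, total = check_estimate(RNASE_XRAY, RNASE_CD)
print(deltas, poor, total)
```

For this row the β-form and unordered-form estimates exceed the 0.10 limit and the unconstrained sum is 1.41, the value listed in the last column of Table VII.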
However, the sum unity test did not seem to guarantee the correctness of the CD estimates. For instance, Σf was close to one for lysozyme, but the β-estimate was too high and the β-turn estimate too low without the two constraints. On the other hand, the α-estimate for myoglobin was excellent, even though it failed the sum unity test. Provencher and Glöckner85 used the two constraints, whereas Hennessey and Johnson86 did not. It is not known whether the removal of the two constraints would affect the Provencher-Glöckner analysis.

Fourth, Hennessey and Johnson86 advocated the use of vacuum ultraviolet CD and suggested that even the basis spectra measured to 178 nm might not be enough for all independent pieces of information.87 But CD estimates by the Hennessey-Johnson method with a 200-nm cut-off were rather close to those with a 178-nm cut-off in several cases (Table VII). Notable exceptions were the β-estimates for lactate dehydrogenase, lysozyme, and α-chymotrypsin and the β-turn estimate for cytochrome c. On the


other hand, the β-estimate for ribonuclease A was better with the 200-nm cut-off. Truncation at 190 nm led to some CD estimates that were not as good as with truncation at 200 nm (not shown). This could be merely a fortuitous coincidence, but it deserves further investigation. In the spectral analysis by Provencher and Glöckner,85 the data below 190 or 210 nm were given lower statistical weights or completely discarded, and yet their results were quite insensitive to this adjustment.

The CD analysis of protein conformation has made considerable progress during the past decade, but the methods described here are still not perfect and need extensive tests on proteins of known secondary structure. Some methods appear to be good in most cases, but there is no guarantee that CD estimates of the secondary structure of an unknown protein will always be correct. Because β-forms have a much broader range of conformations than helices, the CD spectra of β-forms are expected to be more variable than those of helices. The geometry of β-turns in proteins is even more variable than that of either helices or β-forms. What constitutes representative CD spectra for β-turns is still an open question. Perhaps there is a whole range of CD curves for β-turns, which depend on the types of β-turns and also the amino acid residues in them. One can only hope that the reference proteins used for the determination of reference or basis spectra happen to adequately represent an unknown protein to be analyzed. If possible, calculations should be complemented by other evidence, such as the sequence-predictive method.

If a CD spectrum of a protein shows a well-defined double minimum at 222 and 208-210 nm or a single minimum between 210 and 220 nm, the CD estimate of the amount of helix or the β-forms may be accepted with some confidence. On the other hand, if the negative CD bands between 200 and 240 nm are broad, or skewed, or have shoulders instead of distinct minima and the positive band below 200 nm is unusually weak, the CD estimates may be very uncertain. At present the estimation of β-turns by CD analysis is most uncertain. For proteins containing a mixture of β-forms and β-turns and no helix, such as cobra neurotoxin90 and α-bungarotoxin,91 the CD contributions due to nonpeptide chromophores seem no longer to be negligible. Therefore, these "atypical" CD spectra are difficult to analyze by current CD calculations. Occasionally, one finds CD estimates in the literature good to the first decimal. Such practice goes beyond the limits of CD analysis. The methodology of CD analysis also deserves further investigation.

90 Y.-H. Chen, T.-B. Lo, and J. T. Yang, Biochemistry 16, 1826 (1977).
91 Y.-H. Chen, J.-C. Tai, W.-J. Huang, M.-Z. Lai, M.-C. Hung, M.-D. Lai, and J. T. Yang, Biochemistry 21, 2592 (1982).


Among the uncertainties, the totaling of secondary structure from X-ray data remains problematic. The rigid criteria used by Bolotina et al.75,80 versus the relaxed criteria proposed by Levitt and Greer83 or some other criteria to be developed must be further tested; eventually, a general consensus may be reached for CD analysis. The use of peptide bonds rather than amino acid residues as the basic units in the counting is logical, but the number of helical segments, β-strands, and β-turns of an unknown protein cannot be predetermined and this problem must be overcome. The use of vacuum ultraviolet CD (VUCD) is attractive, but for practical purposes it must yet be decided whether commercial circular dichrometers with a cut-off around 184 nm will be equally adequate for the CD analysis of proteins. The assumption that the CD contributions due to nonpeptide chromophores can be neglected may be permissible for most proteins because the mole fractions of aromatic groups and cystine residues are usually small. Yet there are bound to be some exceptions, and failure to detect them will affect CD estimates of various conformations in a protein (for an early discussion on this subject, see Ref. 92).

Despite these reservations, we feel confident that current methods for the CD analysis of proteins can still be refined and improved. It is immaterial whether one method is better than another. What is important is that one or more methods can provide a reasonable estimate of the secondary structure of proteins with confidence.

Appendix A lists a computer program for the Chang-Wu-Yang method; Appendix B provides information concerning the Provencher-Glöckner program; Appendix C is the Hennessey-Johnson program. Tables VIII and IX list experimental CD data for 25 proteins from two laboratories. The purpose is to illustrate differences for the same proteins studied by different laboratories and also to provide for future comparisons.
Requests from many researchers for these data have convinced us that they are being used for testing new methods of CD analysis. The numerical values for the proteins studied by Brahms and Brahms47 and Bolotina et al.75,80,81 are not available.

Concluding Remarks

Since X-ray diffraction provides the three-dimensional structure of crystalline proteins, one may justifiably question whether the CD analysis of proteins is useful or perhaps futile. The chiroptical method does not challenge but can complement the precise determination of the secondary structure of proteins by crystallography. First, not all proteins are easily crystallized. More important, crystallographic analysis may be insufficient in extrapolating static information to the dynamic properties of proteins in solution. It is here that physical techniques such as CD will realize their full potential. The very simplicity of the CD method and the short time required for the measurements are extremely attractive. To study conformation and conformational changes of proteins in solution, we must first know that the CD analysis of native proteins is fairly reliable, albeit empirical. This is the raison d'être for numerous attempts to improve the estimates of secondary structure of proteins from their CD spectra. To further ensure reliability, it is often advantageous to compare the CD analysis with empirical predictions of secondary structure of proteins from sequences, if such are available. The very nature of any empirical method does caution us against a too literal interpretation of experimental analysis. At present CD is a powerful tool for studying protein conformation. It will remain so and continue to be refined or improved unless a better method to determine the secondary structure of proteins in solution can be found.

92 G. D. Fasman, PAABS Rev. 2, 587 (1974).

Appendix A. Program for the Chang-Wu-Yang Method77

The program in C language consists of two parts. (1) It generates the reference spectra of the helix, β-form, β-turn, and unordered form or, if the reference spectrum of the helix is predetermined from the CD spectrum of myoglobin [Eq. (10)], the reference spectra of the other three conformations, from the CD spectra of a set of reference proteins according to Eq. (11) in the text. (2) It uses the four reference spectra to analyze the CD spectrum of a protein for secondary structure.
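Part (2) of the program fits the four fractions by a coarse scan of the constrained region followed by a finer scan around the best coarse point. The C sketch below illustrates that idea on a toy problem and is not the authors' listing: the names `sse`, `scan`, and `fit_fractions`, the use of six wavelengths, and the integer bookkeeping (fractions counted in units of 0.005) are assumptions made for the example. Counting in integer units keeps every trial point exactly on the grid and guarantees that the fractions sum to one.

```c
#include <float.h>

#define NW     6    /* wavelengths in this toy example (assumption) */
#define NC     4    /* helix, beta-form, beta-turn, unordered */
#define FINE   200  /* fractions are multiples of 1/FINE = 0.005 */
#define COARSE 10   /* coarse stride: 10/200 = 0.05 */

/* Sum of squared residuals between the spectrum x and the mixture of the
   reference spectra ref with fractions n[j]/FINE. */
static double sse(double ref[NW][NC], const double x[NW], const int n[NC])
{
    double s = 0.0;
    int k, j;

    for (k = 0; k < NW; k++) {
        double fit = 0.0;
        for (j = 0; j < NC; j++)
            fit += ref[k][j] * n[j] / (double) FINE;
        s += (x[k] - fit) * (x[k] - fit);
    }
    return s;
}

/* Exhaustive scan of the first three fractions between lo and hi with the
   given stride; the fourth fraction follows from the sum-to-one constraint.
   best and *emin are updated only when a lower error is found. */
static void scan(double ref[NW][NC], const double x[NW], const int lo[3],
                 const int hi[3], int stride, int best[NC], double *emin)
{
    int n[NC], j;

    for (n[0] = lo[0]; n[0] <= hi[0]; n[0] += stride)
        for (n[1] = lo[1]; n[1] <= hi[1] && n[1] <= FINE - n[0]; n[1] += stride)
            for (n[2] = lo[2];
                 n[2] <= hi[2] && n[2] <= FINE - n[0] - n[1]; n[2] += stride) {
                double e;
                n[3] = FINE - n[0] - n[1] - n[2];
                e = sse(ref, x, n);
                if (e < *emin) {
                    *emin = e;
                    for (j = 0; j < NC; j++)
                        best[j] = n[j];
                }
            }
}

/* Two-stage search: step 0.05 over the whole region, then step 0.005
   within 0.05 of the coarse optimum. Result is returned in f. */
void fit_fractions(double ref[NW][NC], const double x[NW], double f[NC])
{
    int lo[3] = {0, 0, 0}, hi[3] = {FINE, FINE, FINE};
    int best[NC] = {0, 0, 0, FINE};
    double emin = DBL_MAX;
    int j;

    scan(ref, x, lo, hi, COARSE, best, &emin);
    for (j = 0; j < 3; j++) {
        lo[j] = best[j] - COARSE < 0 ? 0 : best[j] - COARSE;
        hi[j] = best[j] + COARSE > FINE ? FINE : best[j] + COARSE;
    }
    scan(ref, x, lo, hi, 1, best, &emin);
    for (j = 0; j < NC; j++)
        f[j] = best[j] / (double) FINE;
}
```

The refinement can only recover the true minimum if the coarse optimum lies within one coarse step of it in every coordinate. That is likely when the reference spectra are well separated, but it is not guaranteed when two of them are nearly proportional, which is one more reason to treat β-turn estimates with caution.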

Appendix B. Program for the Provencher-Glöckner Method85

S. W. Provencher (private communication) has developed a constrained regularization method for inverting data represented by linear algebraic or integral equations. A portable FORTRAN IV package called CONTIN was written for inverting noisy linear operator equations and seeking an optimal solution. The program for CD analysis of proteins is one of the USER subprograms in the general purpose package, which has nearly 6000 lines of code. Application packages are documented in S. W. Provencher, CONTIN Users Manual, EMBL Technical Report DA05, European Molecular Biology Laboratory (1982). The program was also described in detail in two recent publications: S. W. Provencher, Computer Phys. Commun. 27, 213-227, 229-242 (1982). A listing or tape is available upon request from the author.

/*
    Copyright by the Regents of the University of California, Nov. 16, 1983.
    All rights reserved.

    Prepared in the Biomathematics Computation Laboratory, Dept. of
    Biochemistry and Biophysics, Univ. of Calif. at San Francisco,
    San Francisco, Calif. 94131, by Hugo M. Martinez.
    The program is written in the C language.

    CIRCULAR DICHROISM ANALYSIS OF PROTEIN CONFORMATION PARAMETERS

    Given the CD spectrum X() of a protein, estimates of the helical H,
    beta B, turns T and random R fractional amounts are obtained. The
    estimation procedure is the method of least squares and is based on the
    assumption that the CD value X(l) of the protein at the wavelength l can
    be expressed as

        X(l) = H*h(l) + B*b(l) + T*t(l) + R*r(l)

    in which h() is the CD spectrum of a protein for which H = 1, b() is the
    CD spectrum of a protein for which B = 1, etc. Minimizing the sum of the
    squares of the deviations between the experimental X(l) and predicted
    values H*h(l) + B*b(l) + T*t(l) + R*r(l) is subject to the constraint
    that the fraction unknowns H, B, T and R are non-negative and must sum
    to the value 1.

    The required minimization is obtained by a stepwise exhaustive search
    strategy. Taking three of the unknowns as independent, the minimizing
    solution lies in a 3-dimensional cube in which each coordinate is
    restricted to values between 0 and 1. Using a step size of .05 for each
    coordinate, the best point of the unit cube is first found. This
    estimate is then refined by adopting a step size of .005 and restricting
    the exhaustive search to a coordinate variation of no more than .05
    about the first estimate. The assured accuracy of the solution is
    therefore to within plus or minus .05 for each fraction and very likely
    accurate to within plus or minus .005.

    The CD spectrum of a sample protein is assumed to be in a
    user-designated file. It is to be in a 2-column format, with the first
    column containing the wavelengths in decreasing order (normally from
    240 to 190 nm) and the second column containing the corresponding CD
    values. The number of wavelengths is given by the parameter NLAM. Its
    value is defined in the source code and is normally set to 51,
    corresponding to the normal range of 240 to 190 nm. If a different
    range is to be used, the value of NLAM must be redefined and the source
    code recompiled.

    The reference "pure" spectra h(), b(), t() and r() constitute a data
    base contained in a user-designated file. The program gives the option
    to construct such a file by providing it with the name of a directory
    in which there is a file for each reference protein. Each reference
    protein file contains the H, B, T and R values for the corresponding
    protein, followed by its NLAM CD spectrum values X(l) in decreasing
    order of l. It is separately prepared by the user as a text file with
    a text editor.

    Construction of the reference pure spectra data base, given the
    reference protein directory, is by the method of least squares based on
    the above linear relation. Since there are no constraints on the values
    of the pure reference spectra, the normal equations of the least
    squares problem are solved directly using the Gauss-Jordan method of
    solving linear equations.

    There are two construction procedures of the reference spectra. In
    construction procedure A all four reference spectra are computed. In
    construction procedure B only the b(), t() and r() spectra are computed
    since h() is regarded as corresponding to the spectrum of myoglobin
    modified by a wavelength and protein dependent factor (1 - k/n). Thus,
    the above relation becomes

        X(l) = H*hmyo(l)*(1 - k(l)/n) + B*b(l) + T*t(l) + R*r(l)

    in which hmyo() is the myoglobin spectrum, k(l) is the wavelength
    dependent parameter and n is the protein dependent parameter. In order
    to carry out construction procedure B, there is a data file called
    thetainf.tbl in which successive lines contain the l, hmyo(l) and k(l)
    values in decreasing order of l from 240 to 190. Additionally, the
    first line of each reference protein file contains a fifth entry
    corresponding to its n value. For an interpretation of the k and n
    parameters, refer to the article by Chang, C.T., Wu, C.-S.C. and
    Yang, J.T. (1978) Anal. Biochem. 91, 13-31.

    Note: If a range different from 240 to 190 for the wavelengths is to
    be used, the file thetainf.tbl must be edited accordingly.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define NLAM    51    /* number of wavelengths */
#define NREFMAX 20    /* maximum number of reference proteins */
#define NFINE   200   /* fractions are scanned in units of 1/NFINE = .005 */
#define NCOARSE 10    /* 1st order scan uses every 10th unit (step .05) */

float  spectra[NLAM+1][5];    /* spectra[k][0] is the sample spectrum;
                                 spectra[k][1..4] are the reference pure
                                 spectra h, b, t and r */
double w[5][5], p[5];         /* working arrays used by estimate() */
int    nref;                  /* number of reference proteins */
float  f[NREFMAX+1][6];       /* Reference conformation fractions.
                                 H = f[j][1], B = f[j][2], T = f[j][3] and
                                 R = f[j][4] for the jth reference protein;
                                 f[j][5] is its n value. */
float  cd[NREFMAX+1][NLAM+1]; /* CD spectra of the reference proteins.
                                 cd[j][l] is X(l) for the jth protein. */
double a[4][5];               /* Augmented matrix used in solving the normal
                                 equations of the "pure" spectra problem. */
int    cp;                    /* construction procedure flag; cp is 0 for
                                 procedure A and 1 for procedure B */
float  hmyo[NLAM+1];          /* CD spectrum of myoglobin, also known as
                                 theta inf */
float  kmyo[NLAM+1];          /* wavelength dependent parameter k in the
                                 factor (1 - k(l)/n) */

void   title(void);
char   option(void);
void   estimate(void);
void   construct(void);
void   getdata(void);
void   mkeqns(int l);
int    solve(int neq);
double error(void);
double trial(int n1, int n2, int n3);

int main(void)
{
    title();
    if (option() == 'e')
        estimate();
    else
        construct();
    return 0;
}

void estimate(void)
{
    char refname[50], sampname[50];
    int k, i, j, np, n1, n2, n3;
    int best[4], lo[4], hi[4];
    double eave, emin, ecurrent;
    double step1 = 1.0 / NFINE, step = NCOARSE * step1;
    float dum;
    FILE *fp;

    printf("\n* It is assumed that the reference spectra exist in an");
    printf("\n* ascii file of NLAM rows corresponding to the NLAM wavelengths");
    printf("\n* l in decreasing order. Each row has four");
    printf("\n* columns corresponding, respectively, to the helical, beta,");
    printf("\n* turn and random reference CD values.\n");
    printf("\n* The reference spectra file is normally prepared with");
    printf("\n* the 'construction' option and given the name 'refcdA.out'");
    printf("\n* or 'refcdB.out' depending upon whether construction");
    printf("\n* procedure A or B was used.");
    printf("\n");
    printf("\nEnter name of reference spectra file: ");
    scanf("%49s", refname);
    if ((fp = fopen(refname, "r")) == NULL) {
        printf("\n\7The reference spectra file %s does not exist!", refname);
        exit(1);
    }
    for (k = 1; k <= NLAM; k++)
        if (fscanf(fp, "%f%f%f%f", &spectra[k][1], &spectra[k][2],
                   &spectra[k][3], &spectra[k][4]) != 4) {
            printf("\n\7The file %s has fewer than NLAM complete rows!", refname);
            exit(1);
        }
    fclose(fp);

    printf("\n* In the following request for the sample protein file");
    printf("\n* it is assumed that it is an ascii file with the format:");
    printf("\n* There are %d lines of two entries each. The", NLAM);
    printf("\n* 1st is the wavelength and the 2nd is the CD value.");
    printf("\n* Wavelengths are in decreasing order.");
    printf("\n");
    printf("\nEnter name of data file for the sample protein: ");
    scanf("%49s", sampname);
    if ((fp = fopen(sampname, "r")) == NULL) {
        printf("\n\7The sample file %s does not exist!", sampname);
        exit(1);
    }
    for (k = 1; k <= NLAM; k++)
        if (fscanf(fp, "%f%f", &dum, &spectra[k][0]) != 2) {
            printf("\n\7The file %s has fewer than NLAM complete rows!", sampname);
            exit(1);
        }
    fclose(fp);

    fprintf(stderr, "\nComputation in progress.\n");
    if ((fp = fopen("confpar.out", "w")) == NULL) {
        fprintf(stderr, "\ncannot open confpar.out");
        exit(1);
    }
    fprintf(fp, "\n* Conformation parameter estimates of the protein");
    fprintf(fp, "\n* %s using the ref. spectra file %s.", sampname, refname);
    fprintf(fp, "\n\n");

    /* compute coefficients of error function */
    for (i = 0; i <= 4; i++)
        for (j = 0; j <= 4; j++) {
            w[i][j] = 0;
            for (k = 1; k <= NLAM; k++)
                w[i][j] += (double) spectra[k][i] * spectra[k][j];
        }

    /* 1st order scan. Integer counters are used so that repeated addition
       of the step cannot drift off the grid or past the end points. */
    eave = 0; emin = 1.e20; np = 0;
    p[0] = -1;                  /* the sample spectrum enters with weight -1 */
    best[1] = best[2] = best[3] = 0;
    for (n1 = 0; n1 <= NFINE; n1 += NCOARSE)
        for (n2 = 0; n2 <= NFINE - n1; n2 += NCOARSE)
            for (n3 = 0; n3 <= NFINE - n1 - n2; n3 += NCOARSE) {
                np++;
                ecurrent = trial(n1, n2, n3);
                eave += ecurrent;
                if (ecurrent < emin) {
                    emin = ecurrent;
                    best[1] = n1; best[2] = n2; best[3] = n3;
                }
            }

    /* output results of 1st order scan */
    eave = eave / np;
    fprintf(fp, "\n\n First Order Scan Results with step = %f", step);
    fprintf(fp, "\n\n\t average rms error = %f, minimum rms error = %f",
            eave, emin);
    fprintf(fp, "\n\n\t Fh = %6.4f, Fb = %6.4f, Ft = %6.4f, Fr = %6.4f",
            best[1] * step1, best[2] * step1, best[3] * step1,
            (NFINE - best[1] - best[2] - best[3]) * step1);

    /* 2nd order scan */
    for (k = 1; k <= 3; k++) {
        lo[k] = best[k] - NCOARSE < 0 ? 0 : best[k] - NCOARSE;
        hi[k] = best[k] + NCOARSE > NFINE ? NFINE : best[k] + NCOARSE;
    }
    for (n1 = lo[1]; n1 <= hi[1]; n1++)
        for (n2 = lo[2]; n2 <= hi[2] && n2 <= NFINE - n1; n2++)
            for (n3 = lo[3]; n3 <= hi[3] && n3 <= NFINE - n1 - n2; n3++) {
                ecurrent = trial(n1, n2, n3);
                if (ecurrent < emin) {
                    emin = ecurrent;
                    best[1] = n1; best[2] = n2; best[3] = n3;
                }
            }

    /* print results of 2nd order scan */
    fprintf(fp, "\n\n Second Order Scan Results with step = %f", step1);
    fprintf(fp, "\n\n\t minimum rms error = %f", emin);
    fprintf(fp, "\n\n\t Fh = %6.4f, Fb = %6.4f, Ft = %6.4f, Fr = %6.4f\n",
            best[1] * step1, best[2] * step1, best[3] * step1,
            (NFINE - best[1] - best[2] - best[3]) * step1);
    fclose(fp);
    fprintf(stderr, "Conformation parameter estimates are in the ");
    fprintf(stderr, "'confpar.out' file.\n");
}

double trial(int n1, int n2, int n3)
/* set the trial fractions p[1..4] from the grid indices and return the
   corresponding rms error */
{
    p[1] = (double) n1 / NFINE;
    p[2] = (double) n2 / NFINE;
    p[3] = (double) n3 / NFINE;
    p[4] = (double) (NFINE - n1 - n2 - n3) / NFINE;
    return error();
}

double error(void)
/* rms deviation between the sample spectrum and the spectrum predicted
   from the fractions p[1..4]; with p[0] = -1 the sum of squared residuals
   is the quadratic form p'wp */
{
    double err = 0.;
    int i, j;

    for (i = 0; i <= 4; i++)
        for (j = 0; j <= 4; j++)
            err += p[i] * p[j] * w[i][j];
    if (err < 0.)
        err = 0.;               /* guard against rounding below zero */
    return sqrt(err / (NLAM - 3));
}

void title(void)
{
    printf("\n\n* Program for estimating conformation parameters (fractions)");
    printf("\n* of a protein from its CD spectrum, or for constructing a");
    printf("\n* reference 'pure' spectra data base.\n");
}

char option(void)
{
    char ans[5];

    while (1) {
        printf("\nConformation estimate (e) or reference spectra ");
        printf("construction (c) ? ");
        if (scanf("%4s", ans) != 1)
            exit(1);            /* end of input */
        if (*ans != 'e' && *ans != 'c') {
            printf("\7");       /* sound bell */
            continue;
        }
        else
            break;
    }
    return (*ans);
}

void construct(void)            /* construct the reference spectra */
{
    int l, k;
    char pans[5];
    FILE *fpout;
    float nave = 0, h;

    printf("\n* There are two construction procedures: (A) for the");
    printf("\n* construction of all 4 reference spectra, or (B) for");
    printf("\n* the construction of just the beta, turn and random,");
    printf("\n* with the helical being computed from the myoglobin ");
    printf("\n* spectrum contained in the thetainf.tbl file.\n");
    printf("\nProcedure A or B? ");
    scanf("%4s", pans);
    if (*pans == 'A' || *pans == 'a')
        cp = 0;
    else
        cp = 1;
    getdata();      /* components and spectra of the reference proteins */
    fprintf(stderr, "\nComputation in progress.\n");
    if (cp) {                   /* if construction procedure B */
        fpout = fopen("refcdB.out", "w");
        for (k = 1; k <= nref; k++)
            nave += f[k][5];
        nave /= nref;
    }
    else
        fpout = fopen("refcdA.out", "w");
    if (fpout == NULL) {
        fprintf(stderr, "\ncannot open the output file");
        exit(1);
    }
    for (l = 1; l <= NLAM; l++) {
        mkeqns(l);              /* make the normal equations */
        if (solve(4 - cp) != 0) {   /* solve the normal equations */
            fprintf(stderr, "\nnormal equations are singular at row %d", l);
            exit(1);
        }
        if (cp) {               /* if procedure B */
            h = hmyo[l] * (1 - kmyo[l] / nave);
            fprintf(fpout, "%8.2f ", h);
        }
        for (k = 1; k <= 4 - cp; k++)
            fprintf(fpout, "%8.2f ", a[k-1][4-cp]);
        fprintf(fpout, "\n");
    }
    fclose(fpout);
    if (cp)
        printf("\nThe ref. spectra are in the file 'refcdB.out'.");
    else
        printf("\nThe ref. spectra are in the file 'refcdA.out'.");
}

void getdata(void)
/* get the reference data: fractional components of the reference
   proteins and their cd spectra */
{
    int j, k;
    char prodir[50];            /* reference protein directory */
    char command[80];
    char pname[25];
    char rpname[100];
    float dum;
    FILE *fp, *fpx;

    printf("\n* It is assumed that the reference protein data are all");
    printf("\n* in a single directory containing a file for each protein.");
    printf("\n* Each reference protein file is an ascii file of NLAM+1 lines.");
    printf("\n* Line one are the helical, beta, turn and random confor-");
    printf("\n* mation fraction values followed by its n value.");
    printf("\n* Subsequent lines contain a wavelength");
    printf("\n* and the corresponding CD value. The wavelengths");
    printf("\n* are in decreasing order.\n");
    printf("\nEnter name of the reference protein directory: ");
    scanf("%49s", prodir);

    /* list the directory into the scratch file tprodir (Unix "ls") */
    strcpy(command, "ls ");
    strcat(command, prodir);
    strcat(command, " > tprodir");
    system(command);
    if ((fp = fopen("tprodir", "r")) == NULL) {
        fprintf(stderr, "cannot open tprodir");
        exit(1);
    }
    j = 1;
    while (j <= NREFMAX && fscanf(fp, "%24s", pname) == 1) {
        strcpy(rpname, prodir);
        strcat(rpname, "/");
        strcat(rpname, pname);
        fpx = fopen(rpname, "r");
        if (fpx == NULL) {
            fprintf(stderr, "cannot open %s", rpname);
            exit(1);
        }
        for (k = 1; k <= 5; k++)
            fscanf(fpx, "%f", &f[j][k]);
        k = 1;
        while (k <= NLAM && fscanf(fpx, "%f%f", &dum, &cd[j][k]) == 2)
            k++;
        fclose(fpx);
        j++;
    }
    fclose(fp);
    nref = j - 1;

    if (cp) {                   /* if construction procedure B */
        if ((fpx = fopen("thetainf.tbl", "r")) == NULL) {
            fprintf(stderr, "cannot open thetainf.tbl");
            exit(1);
        }
        for (k = 1; k <= NLAM; k++)
            fscanf(fpx, "%f%f%f", &dum, &hmyo[k], &kmyo[k]);
        fclose(fpx);
    }
}

void mkeqns(int l)
/* make the normal equations in the form of deriving an augmented
   matrix to pass on to the solve() routine; the value of l is the
   wavelength number */
{
    int i, j, k;
    int rhs = 4 - cp;           /* column holding the right-hand side */
    double cdfact;

    for (i = 1; i <= 4 - cp; i++) {
        for (j = 1; j <= 4 - cp; j++) {
            a[i-1][j-1] = 0;
            for (k = 1; k <= nref; k++)
                a[i-1][j-1] += f[k][i+cp] * f[k][j+cp];
        }
        a[i-1][rhs] = 0;
        for (k = 1; k <= nref; k++) {
            if (cp)             /* if procedure B */
                cdfact = cd[k][l]
                         - f[k][1] * hmyo[l] * (1 - kmyo[l] / f[k][5]);
            else
                cdfact = cd[k][l];
            a[i-1][rhs] += f[k][i+cp] * cdfact;
        }
    }
}

int solve(int neq)
/* For solving neq linear equations having augmented matrix a.
   Solution is left in a[][neq]. The Gauss-Jordan elimination
   method is used. */
{
    int i, j, k;
    double z;

    /* implement the Gauss-Jordan algorithm */
    for (k = 0; k < neq; k++) {
        for (i = k; i < neq && a[i][k] == 0; i++)
            ;
        if (i == neq)
            return (-1);        /* to indicate eqns are singular */
        if (i != k) {           /* interchange rows k and i */
            for (j = 0; j <= neq; j++) {
                z = a[k][j];
                a[k][j] = a[i][j];
                a[i][j] = z;
            }
        }
        z = a[k][k];            /* normalize row k */
        for (j = 0; j <= neq; j++)
            a[k][j] = a[k][j] / z;
        /* insert 0's above and below k-th row in k-th col. */
        for (i = 0; i < neq; i++) {
            z = a[i][k];
            if (i != k)
                for (j = 0; j <= neq; j++)
                    a[i][j] -= z * a[k][j];
        }
    }
    return (0);                 /* to indicate that solution OK */
}

Appendix C. The H e n n e s s e y - J o h n s o n Program 86 The program B A V G E N generates the basis CD spectra and their corresponding secondary structures from the CD spectra of the reference proteins and their secondary structures determined from X-ray analysis. The program PROSTP uses the basis CD spectra and their corresponding secondary structures to analyze the CD spectrum o f a protein for secondary structure. C ............................................................................... C P~OCRAM~AME: BAVGEN (BASIS VECTORS GENERATION) C C COMPUTATIONOF SINGULAR VALUE DECOMPOSITION OF A MATRIX AND C GENERATION OF BASIS VECTORS C C THIS PROGRAM CALLS THE SUBROUTINES SVD AND BASSP, SVD HAS BEEN TAKEN C FROM "COMPUTER METHODS FOR MATHEMATICAL COMPUTATION" BY G.E. FORSYTHE, C M , A , MALCOLM AND C.B. MOLER (19777 CHAPTER 9, PRENTICE-HALL INC, N.J. C THIS SUBROUTINE COMPUTES THE SINOULAR VALUE DECOMPOSITION OF A

264

MACROMOLECULAR CONFORMATION: SPECTROSCOPY

[ 11]

C RECTANGULAR MATRIX.
C
C INPUT DATA VARIABLES
C .....................
C
C CD:     CD DATA MATRIX.
C INT:    WAVELENGTH INTERVAL BETWEEN TWO DATA POINTS.
C IWF:    WAVELENGTH OF THE LAST DATA POINT.
C IWS:    WAVELENGTH OF THE FIRST DATA POINT.
C MATXU:  MATXU=0 (OR 1) WOULD ASSIGN MATU 'FALSE' (OR 'TRUE').
C         THIS SHOULD BE SET TO 'TRUE' IF OUTPUT MATRIX 'U' IS DESIRED.
C MATXV:  MATXV=0 (OR 1) WOULD ASSIGN MATV 'FALSE' (OR 'TRUE').
C         THIS MUST BE SET TO 'TRUE' TO GET THE OUTPUT MATRIX 'V' AND
C         TO GENERATE THE BASIS VECTORS.
C NBASV:  NUMBER OF BASIS VECTORS TO BE GENERATED FROM 'CD' AND 'V'
C         MATRICES. PROGRAM GENERATES THE MOST IMPORTANT BASIS VECTORS
C         FROM FIRST TO A MAXIMUM OF NBASV='NSAM'.
C         NBASV=0 WOULD NOT GENERATE ANY BASIS VECTORS AND
C         TERMINATE THE PROGRAM AFTER COMPUTING THE 'U' AND 'V' MATRIX.
C NPTS:   NUMBER OF CD DATA POINTS.
C NSAM:   NUMBER OF PROTEINS.
C NSTR:   NUMBER OF DIFFERENT SECONDARY STRUCTURES TO BE CONSIDERED
C         (EG. HX,BA,BP,TN,OT). NSTR=0 WOULD NOT COMPUTE 'BVSS' MATRIX
C         AND TERMINATE THE PROGRAM AFTER COMPUTING 'U', 'V' AND
C         'BVCD' MATRICES.
C PRNAME: NAME OF PROTEINS INCLUDED IN THE 'CD' MATRIX. THE MAXIMUM
C         LENGTH OF THE TITLE IS 40 CHARACTERS.
C SS:     PROTEIN SECONDARY STRUCTURE MATRIX CORRESPONDING TO CD DATA
C         MATRIX FROM X-RAY DATA.
C SSTR:   TYPE OF SECONDARY STRUCTURE.
C         (HX:HELIX;BA:ANTIPARALLEL BETA SHEET;BP:PARALLEL BETA SHEET;
C         TN:TURNS;OT:OTHER STRUCTURES).
C
C OUTPUT DATA VARIABLES
C .....................
C
C BVCD:   BASIS CD VECTORS.
C BVSS:   SECONDARY STRUCTURE MATRIX CORRESPONDING TO THE BASIS CD
C         VECTORS DEFINED BY 'BVCD'.
C SIGMA:  SINGULAR VALUE MATRIX. THE SIZE OF THE SINGULAR VALUE
C         INDICATES THE IMPORTANCE OF THE CORRESPONDING CD BASIS VECTOR.
C         ONLY CD BASIS VECTORS WITH SIGNIFICANT SINGULAR VALUES SHOULD
C         BE USED IN AN ANALYSIS. SEE HENNESSEY & JOHNSON (1981)
C         BIOCHEMISTRY, VOL.20, 1085-1094.
C U:      U MATRIX AND WILL BE PRINTED WHEN MATXU=1.
C V:      EIGEN VECTOR MATRIX V AND WILL BE PRINTED WHEN MATXV=1.
C
C......................................................................
C......................................................................
C
      REAL CD(50,25),U(50,25),V(50,25),SIGMA(25),WORK(25)
      INTEGER I,IERR,J,NSAM,NPTS
      DIMENSION BVCD(25,50),BVSS(25,50),SSTR(10),TOT(25),SS(50,25)
      DIMENSION PRNAME(40)
      LOGICAL MATU,MATV
C
C INITIAL INPUT DATA FOR SINGULAR VALUE DECOMPOSITION
C
      WRITE(6,2000)
      WRITE(6,1000)
      READ(5,1001)NSAM,IWS,IWF,INT,NPTS,NBASV,NSTR,MATXU,MATXV
      IM=1
      MATU=IM.EQ.MATXU
      MATV=IM.EQ.MATXV
      WRITE(6,1002)NSAM,IWS,IWF,INT,NPTS
      WRITE(6,1005)
      DO 10 I=1,NSAM
      READ(5,1006)PRNAME
      READ(5,1010)(CD(J,I),J=1,NPTS)
      WRITE(6,1008)PRNAME
      WRITE(6,1020)(CD(J,I),J=1,NPTS)
   10 CONTINUE

C
C SUBROUTINE SVD IS CALLED FOR SINGULAR VALUE DECOMPOSITION
C THE VARIABLE NM MUST BE SET AT LEAST AS LARGE AS THE MAXIMUM OF
C ROW DIMENSION OF 'CD'. SEE SUBROUTINE SVD FOR FURTHER COMMENTS.
C
      NM=50
C
      CALL SVD(NM,NPTS,NSAM,CD,SIGMA,MATU,U,MATV,V,IERR,WORK)
      IF(IERR .NE. 0)WRITE(6,1025) IERR
      WRITE(6,1030)
      WRITE(6,1035)(SIGMA(I),I=1,NSAM)
      IF(.NOT. MATU) GO TO 17
      WRITE(6,1040)
      DO 15 I=1,NSAM
   15 WRITE(6,1020)(U(J,I),J=1,NPTS)
   17 IF(.NOT. MATV) GO TO 22
      WRITE(6,1045)
      DO 20 I=1,NSAM
   20 WRITE(6,1020)(V(I,J),J=1,NSAM)
C
C GENERATION OF BASIS CD VECTORS FROM CD DATA AND V MATRIX
C
   22 IF(NBASV.EQ.0) GO TO 40
      WRITE(6,2000)
      CALL BASSP(CD,V,NSAM,NBASV,NPTS,BVCD)
      WRITE(6,1050)
      WRITE(6,1055)NSAM,NBASV
      DO 25 I=1,NBASV
      WRITE(6,1060)I,SIGMA(I)
   25 WRITE(6,1020)(BVCD(I,J),J=1,NPTS)
C
C GENERATION OF SECONDARY STRUCTURES CORRESPONDING TO BASIS CD
C VECTORS USING PROTEIN STRUCTURE DATA AND V MATRIX
C
      IF(NSTR.EQ.0) GO TO 40
      WRITE(6,2000)
      WRITE(6,1065)
      WRITE(6,1070)
      READ(5,1075)(SSTR(I),I=1,NSTR)
      WRITE(6,1080)(SSTR(I),I=1,NSTR)
      DO 30 I=1,NSAM
      READ(5,1006)PRNAME
      READ(5,1010)(SS(J,I),J=1,NSTR)
   30 WRITE(6,1090)PRNAME,(SS(J,I),J=1,NSTR)
      CALL BASSP(SS,V,NSAM,NBASV,NSTR,BVSS)
      DO 35 I=1,NBASV
      WRITE(6,1085)I,SIGMA(I)
   35 WRITE(6,1020)(BVSS(I,J),J=1,NSTR)
   40 CONTINUE
C
C END OF PROGRAM
C
 1000 FORMAT(/,2X,'COMPUTATION OF SINGULAR VALUE DECOMPOSITION FOR A
     $ RECTANGULAR MATRIX',//)
 1001 FORMAT(10I4)
 1002 FORMAT(10X,'NUMBER OF CD SPECTRA =',I5,/,
     $10X,'WAVELENGTH RANGE IN NM =',I5,' TO',I5,' AT',I5,' NM INTERVAL'
     $,/,10X,'TOTAL NUMBER OF CD DATA POINTS =',I5,///)
 1005 FORMAT(10X,' PROTEIN CD MATRIX: CD ',/)
 1006 FORMAT(1X,40A1)
 1008 FORMAT(5X,40A1)
 1010 FORMAT(E10.5)
 1020 FORMAT(10X,15F6.2)
 1025 FORMAT(' TROUBLE, IERR=',I4)
 1030 FORMAT(///,10X,' LIST OF SINGULAR VALUES',//)
 1035 FORMAT(5X,10F10.2)
 1040 FORMAT(///,10X,'U MATRIX',/)
 1045 FORMAT(///,10X,'V MATRIX',/)
 1050 FORMAT(10X,'GENERATION OF BASIS CD VECTORS BVCD
     $ USING MATRICES CD AND V',///)
 1055 FORMAT(10X,'NUMBER OF CD SPECTRA=',I5,/,
     $10X,'NUMBER OF BASIS VECTORS TO BE GENERATED=',I5,///)
 1060 FORMAT(//,10X,'THIS IS BASIS CD VECTOR NUMBER',I5,
     $13X,'SIGMA=',F10.3,/)
 1065 FORMAT(2X,'GENERATION OF SECONDARY STRUCTURE MATRIX BVSS CORRES
     $PONDING TO BASIS CD VECTORS USING MATRICES SS AND V'///)
 1070 FORMAT(10X,'PROTEIN SECONDARY STRUCTURE MATRIX FROM X-RAY DATA'/)
 1075 FORMAT(A4)
 1080 FORMAT(/50X,5(2X,A4))
 1085 FORMAT(//,10X,'THIS IS THE STRUCTURE FOR BASIS CD VECTOR
     $NUMBER',I3,13X,'SIGMA=',F10.3,/)
 1090 FORMAT(10X,40A1,10F6.2)
 2000 FORMAT(1H1)
      END
      SUBROUTINE BASSP(S,V,N,NBASV,NPTS,BSPEC)

C
      DIMENSION S(50,25),V(50,25),BSPEC(25,50),TOT(25)
C
C THIS PROGRAM IS DESIGNED TO COMPUTE THE BASIS VECTORS FOR A
C SAMPLE MATRIX USING DATA S AND V.
C
      DO 20 I=1,N
      TOT(I)=0.0
      DO 20 J=1,NPTS
   20 BSPEC(I,J)=0.0
C
C COMPUTE BASIS VECTORS.
C
      DO 700 I=1,NBASV
      DO 70 J=1,NPTS
      DO 70 K=1,N
      BSPEC(I,J)=BSPEC(I,J)+V(K,I)*S(J,K)
   70 CONTINUE
      DO 700 K=1,NPTS
  700 TOT(I)=TOT(I)+BSPEC(I,K)
      RETURN
      END
C......................................................................
C......................................................................
C PROGRAM NAME: PROSTP (PROTEIN STRUCTURE PREDICTION)
C
C THIS PROGRAM COMPUTES THE SECONDARY STRUCTURE FOR A NUMBER OF PROTEINS
C FROM THE MATRIX OF THEIR CD DATA 'SAMV'. THE MATRIX OF BASIS CD VECTORS
C 'BVCD' AND THEIR CORRESPONDING STRUCTURE MATRIX 'BVSS' USED FOR THIS
C CALCULATION ARE GENERATED BY A SEPARATE PROGRAM CALLED 'BAVGEN'.
C WHENEVER A NEW 'BVCD' MATRIX IS GENERATED, THE CORRESPONDING STRUCTURE
C MATRIX 'BVSS' MUST ALSO BE COMPUTED. SEE HENNESSEY & JOHNSON (1981)
C BIOCHEMISTRY, VOL.20, 1085-1094 FOR MORE DETAILS ABOUT THE METHOD.
C
C THE NUMBER OF DATA POINTS, WAVELENGTH RANGE AND WAVELENGTH INTERVAL
C BETWEEN TWO DATA POINTS SHOULD BE SAME FOR 'SAMV' AND 'BVCD'.
C
C THIS PROGRAM CALLS THE SUBROUTINES LSTSQ, LINV1F AND MATPRO. LINV1F
C IS THE STANDARD LIBRARY SUBROUTINE TO INVERT A SQUARE MATRIX
C DEVELOPED BY IMSL, INC.
C
C THE MAXIMUM NUMBER OF PROTEINS AND DATA POINTS HAVE BEEN DIMENSIONED TO
C 20 AND 50 RESPECTIVELY. THIS CAN BE EXTENDED BY CHANGING THE DIMENSIONS.
C
C INPUT DATA VARIABLES
C .....................
C
C BVCD:   BASIS CD SPECTRA MATRIX.
C BVSS:   SECONDARY STRUCTURE MATRIX CORRESPONDING TO THE SPECTRA
C         DEFINED BY 'BVCD'.
C INT:    WAVELENGTH INTERVAL BETWEEN TWO DATA POINTS.
C IWF:    WAVELENGTH OF THE LAST DATA POINT.
C IWS:    WAVELENGTH OF THE FIRST DATA POINT.
C NBASV:  NUMBER OF BASIS CD VECTORS. ONLY CD BASIS VECTORS WITH
C         SIGNIFICANT SINGULAR VALUES SHOULD BE USED IN AN ANALYSIS.
C         THESE VALUES ARE COMPUTED IN THE PROGRAM 'BAVGEN'. SEE
C         HENNESSEY & JOHNSON (1981), BIOCHEMISTRY, VOL.20, 1085-1094.

C NSAMV:  NUMBER OF SAMPLE PROTEINS FOR WHICH SECONDARY STRUCTURES
C         ARE TO BE DETERMINED.
C NPTS:   NUMBER OF CD DATA POINTS.
C NSTR:   NUMBER OF DIFFERENT SECONDARY STRUCTURES (EG. HX,BA,BP,TN,OT).
C PRNAME: NAME OF THE PROTEINS INCLUDED IN 'SAMV' MATRIX. MAXIMUM LENGTH
C         FOR EACH TITLE IS 40 CHARACTERS.
C SAMV:   CD DATA MATRIX OF CD SPECTRA FOR 'NSAMV' SAMPLE PROTEINS.
C SSTR:   TYPE OF SECONDARY STRUCTURE.
C         (HX:HELIX;BA:ANTIPARALLEL B-SHEET;BP:PARALLEL B-SHEET;
C         TN:TURNS;OT:OTHERS). TOT(TOTAL) SHOULD ALSO BE INCLUDED IN
C         THE END.
C
C OUTPUT DATA VARIABLES
C .....................
C
C PROST:  SECONDARY STRUCTURE MATRIX FOR THE 'NSAMV' SAMPLE PROTEINS
C         COMPUTED FROM 'BVSS' AND 'X'. THIS IS CARRIED OUT BY THE
C         SUBROUTINE 'MATPRO'.
C TOTAL:  SUM OF THE PREDICTED SECONDARY STRUCTURE OF A GIVEN PROTEIN.
C         THIS NEED NOT BE EQUAL TO ONE AS THE METHOD IS UNCONSTRAINED.
C X:      THE COEFFICIENTS DETERMINED USING THE MATRICES 'BVCD' AND
C         'SAMV'. THIS IS CARRIED OUT USING 'LSTSQ' AND 'LINV1F'
C         SUBROUTINES.
C
C......................................................................
C......................................................................

      DIMENSION BVCD(20,50),SAMV(20,50),X(20,20),BVSS(20,20),
     $PROST(20,20),TOTAL(20),SSTR(10),PRNAME(20,40)
C
      WRITE(6,2000)
C
      WRITE(6,1000)
C
C READS THE INITIAL DATA
C
      READ(5,1005)NSAMV,NPTS,IWS,IWF,INT,NBASV,NSTR
C
C READS THE BASIS CD VECTORS
C
      WRITE(6,1010)NSAMV,IWS,IWF,INT,NPTS,NBASV,NSTR
      WRITE(6,1012)
      DO 10 I=1,NBASV
      READ(5,1020)(BVCD(I,J),J=1,NPTS)
   10 WRITE(6,1030)(BVCD(I,J),J=1,NPTS)
C
C READS THE CD DATA OF PROTEINS TO BE PREDICTED
C
      WRITE(6,1015)NSAMV
      DO 20 I=1,NSAMV
      READ(5,1016)(PRNAME(I,K),K=1,40)
      READ(5,1020)(SAMV(I,J),J=1,NPTS)
      WRITE(6,1018)(PRNAME(I,K),K=1,40)
   20 WRITE(6,1030)(SAMV(I,J),J=1,NPTS)
C
      CALL LSTSQ(BVCD,SAMV,NPTS,NBASV,NSAMV,X)
C
C PRINT OUT RESULTS
C
      WRITE(6,1040)
      DO 50 I=1,NSAMV
   50 WRITE(6,1030)(X(I,J),J=1,NBASV)
C
C READS THE SECONDARY STRUCTURE CORRESPONDING TO BASIS CD VECTORS
C
      WRITE(6,1045)NSTR
      READ(5,1065)(SSTR(I),I=1,NSTR+1)
      WRITE(6,1058)(SSTR(I),I=1,NSTR)
      WRITE(6,1070)
      DO 60 I=1,NBASV
      READ(5,1020)(BVSS(I,J),J=1,NSTR)
   60 WRITE(6,1030)(BVSS(I,J),J=1,NSTR)
      CALL MATPRO(NSAMV,NBASV,NSTR,X,BVSS,PROST,TOTAL)
C
C PRINT OUT RESULTS
C
      WRITE(6,1055)
      WRITE(6,1060)(SSTR(I),I=1,NSTR+1)
      DO 70 I=1,NSAMV
   70 WRITE(6,1080)(PRNAME(I,K),K=1,40),(PROST(I,J),J=1,NSTR),
     $TOTAL(I)
C
C END OF PROGRAM
C

 1000 FORMAT(//15X,'COMPUTATION OF SECONDARY STRUCTURE OF PROTEINS FROM
     $ CD DATA',///)
 1005 FORMAT(10I4)
 1010 FORMAT(10X,'NUMBER OF PROTEINS FOR PREDICTION =',I5,/,
     $10X,'WAVELENGTH RANGE IN NM =',I5,' TO',I5,' AT',I5,' NM
     $ INTERVAL',/,10X,'NUMBER OF DATA POINTS=',I5,/,
     $10X,'NUMBER OF BASIS CD VECTORS=',I5,/,
     $10X,'NUMBER OF SECONDARY STRUCTURES=',I5,/////)
 1012 FORMAT(10X,'BASIS CD VECTORS',//)
 1015 FORMAT(////,10X,'CD DATA OF',I5,' PROTEINS FOR PREDICTION',//)
 1016 FORMAT(1X,40A1)
 1018 FORMAT(10X,40A1)
 1020 FORMAT(F10.5)
 1030 FORMAT(10X,15F6.2)
 1040 FORMAT(////,10X,'MATRIX X',/)
 1045 FORMAT(///,10X,'PROTEIN STRUCTURE VECTORS CORRESPONDING
     $ TO BASIS CD VECTORS'/,10X,'NUMBER OF SECONDARY STRUCTURE=',I5,/)
 1055 FORMAT(/////10X,'SECONDARY STRUCTURE PREDICTION'//)
 1058 FORMAT(//10X,10(2X,A4))
 1060 FORMAT(//50X,10(2X,A4),/)
 1065 FORMAT(A4)
 1070 FORMAT(/)
 1080 FORMAT(10X,40A1,10F6.2)
 2000 FORMAT(1H1)
      END
C
C
C
C
      SUBROUTINE LSTSQ(BASV,SAMV,NPTS,NBASV,NSAMV,BCOEF)
      DIMENSION BASV(20,50),SAMV(20,50),BCOEF(20,20),BST(20,20),
     $BBT(20,20),BBTI(20,20),WK(20)
C
C INITIALIZE REGISTERS
C
      DO 3 I=1,20
      DO 2 J=1,20
      BBT(I,J)=0.0
      BBTI(I,J)=0.0
      BCOEF(I,J)=0.0
    2 BST(I,J)=0.0
    3 WK(I)=0.0
C
C SET UP 'PRO' MATRIX
C
      DO 30 J=1,NBASV
      DO 20 K=1,NBASV
      DO 10 L=1,NPTS
   10 BBT(J,K)=BBT(J,K)+BASV(J,L)*BASV(K,L)
   20 CONTINUE
   30 CONTINUE
      IDGT=3
C
C INVERT 'PRO' MATRIX
C
      CALL LINV1F(BBT,NBASV,20,BBTI,IDGT,WK,IER)
C
      DO 500 I=1,NSAMV
      DO 100 J=1,NBASV
      DO 90 K=1,NPTS
   90 BST(I,J)=BST(I,J)+SAMV(I,K)*BASV(J,K)
  100 CONTINUE
      DO 200 J=1,NBASV
      DO 200 K=1,NBASV
      BCOEF(I,J)=BCOEF(I,J)+BST(I,K)*BBTI(K,J)
  200 CONTINUE
  500 CONTINUE
      RETURN
      END
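For readers working in a modern language, the normal-equations solution accumulated in LSTSQ, X = S B^T (B B^T)^-1, can be sketched in Python. This is an illustration under stated assumptions, not a verified translation of the listing: the function names `invert` and `lstsq_coeffs` are hypothetical, a plain Gauss-Jordan elimination stands in for the IMSL routine LINV1F (whose internal algorithm is not reproduced here), and the two basis vectors are made-up numbers rather than CD data.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with
    partial pivoting (a stand-in for LINV1F, not its algorithm)."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def lstsq_coeffs(basv, samv):
    """Return X, where X[i][j] is the coefficient of basis vector j in
    sample spectrum i, computed as S B^T (B B^T)^-1."""
    nb = len(basv)
    # B B^T: the matrix the listing calls 'PRO' (array BBT).
    bbt = [[sum(p * q for p, q in zip(basv[j], basv[k])) for k in range(nb)]
           for j in range(nb)]
    bbti = invert(bbt)
    coeffs = []
    for spectrum in samv:
        # One row of S B^T (array BST in the listing).
        bst = [sum(s * b for s, b in zip(spectrum, basv[j]))
               for j in range(nb)]
        coeffs.append([sum(bst[k] * bbti[k][j] for k in range(nb))
                       for j in range(nb)])
    return coeffs


# Made-up example: two basis vectors sampled at four wavelengths.
basis = [[1.0, 0.0, 2.0, 1.0],
         [0.0, 1.0, 1.0, -1.0]]
# A spectrum constructed as 0.6*basis[0] + 0.3*basis[1].
sample = [[0.6 * u + 0.3 * v for u, v in zip(basis[0], basis[1])]]
x = lstsq_coeffs(basis, sample)
print([round(c, 6) for c in x[0]])
```

Because the sample spectrum lies exactly in the span of the two basis vectors, the fit should return the coefficients used to build it; real spectra contain noise and will not be reproduced exactly.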

C
C
C
C
C PROGRAM MATPRO.FOR
      SUBROUTINE MATPRO(IX,IY,IZ,A,B,C,SUM)
      DIMENSION A(20,20),B(20,20),C(20,20),SUM(20)
C
C THIS PROGRAM CALCULATES THE PRODUCT OF TWO INPUT MATRICES.
C
C INITIALIZATION OF REGISTERS.
C
      DO 3 I=1,20
      DO 2 J=1,20
    2 C(I,J)=0.0
    3 SUM(I)=0.0
C
C CALCULATE PRODUCT OF MATRICES AND SUMS OF THE PRODUCT ROWS.
C
      DO 50 I=1,IX
      DO 49 J=1,IZ
      DO 48 K=1,IY
   48 C(I,J)=C(I,J)+A(I,K)*B(K,J)
   49 CONTINUE
   50 CONTINUE
      DO 53 I=1,IX
      DO 52 J=1,IZ
   52 SUM(I)=SUM(I)+C(I,J)
   53 CONTINUE
C
      RETURN
      END
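The final step carried out by MATPRO, multiplying the coefficient matrix X by the structure matrix BVSS and summing each row of the product, can be sketched in a few lines of Python. The function name `matpro` simply mirrors the subroutine, and all numbers below are invented for illustration; they are not taken from any protein data set.

```python
def matpro(x, bvss):
    """Multiply x (proteins x basis vectors) by bvss (basis vectors x
    structure classes) and return the product with its row sums."""
    n_basis = len(bvss)
    n_struct = len(bvss[0])
    prost = [[sum(row[k] * bvss[k][j] for k in range(n_basis))
              for j in range(n_struct)]
             for row in x]
    # The row sum corresponds to TOTAL in the listing; the method is
    # unconstrained, so it need not equal one.
    total = [sum(row) for row in prost]
    return prost, total


# Invented numbers: one protein, two basis vectors, three structure classes.
x = [[0.6, 0.3]]
bvss = [[0.50, 0.20, 0.25],
        [0.10, 0.40, 0.60]]
prost, total = matpro(x, bvss)
print([round(v, 4) for v in prost[0]], round(total[0], 4))
```

In this toy case the predicted fractions sum to 0.90 rather than 1.00, which illustrates why the listing reports TOTAL alongside the individual fractions.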

Addendum

This review was completed in December 1983. L. A. Compton and W. C. Johnson, Jr. [Biophys. J. 49, 494a (1986)] have now reported a simplified method of computing the least-squares solution to the CD spectra of proteins by using a simple matrix multiplication. It is based on the generalized (Moore-Penrose) inverse matrix theorem, which does not depend on the standard matrix diagonalization or inversion subroutines described in Appendix C.

Acknowledgments

We thank Professor W. C. Johnson, Jr. for providing us with his computer program and unpublished data quoted in this work and Professor R. W. Woody for sending us a preprint of his review and the English translations of three USSR publications. We are indebted to both of them for their valuable comments and discussion. Thanks are also due Mrs. Y. M. L. Yang for her assistance in the preparation of this chapter. This work was supported by U.S. Public Health Service Grant GM-10880-24 and National Science Foundation grant PCM 83-14716.
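As a note on the Addendum, the idea of reducing the analysis to a single matrix multiplication can be sketched in Python. The sketch is not a reconstruction of the Compton and Johnson procedure, whose details are not given here. It only assumes that the basis matrix B has linearly independent rows, in which case its Moore-Penrose inverse is B^T (B B^T)^-1, and it is restricted to two basis vectors so that the 2 x 2 inverse can be written in closed form. The helper names and all numbers are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def pinv_two_rows(b):
    """Moore-Penrose inverse of a 2 x n matrix with independent rows,
    B+ = B^T (B B^T)^-1, using the closed-form 2 x 2 inverse."""
    bt = transpose(b)
    g = matmul(b, bt)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if abs(det) < 1e-12:
        raise ValueError("rows of the basis matrix are not independent")
    ginv = [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]
    return matmul(bt, ginv)


# Invented basis vectors (2 x 4) and structure matrix (2 x 3).
basis = [[1.0, 0.0, 2.0, 1.0],
         [0.0, 1.0, 1.0, -1.0]]
bvss = [[0.50, 0.20, 0.25],
        [0.10, 0.40, 0.60]]

# Computed once: maps a spectrum directly to structure fractions.
m = matmul(pinv_two_rows(basis), bvss)

# Each new spectrum then needs only one matrix multiplication.
spectrum = [[0.6, 0.3, 1.5, 0.3]]
fractions = matmul(spectrum, m)
print([round(v, 4) for v in fractions[0]])
```

The design point is that the matrix `m` depends only on the reference set, so it can be stored and reused; no inversion is needed when a new protein spectrum is analyzed.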