Protein secondary structure from fourier transform infrared spectroscopy: A data base analysis


ANALYTICAL BIOCHEMISTRY 194, 89-100 (1991)

Ronald W. Sarver, Jr., and William C. Krueger

Department of Physical and Analytical Chemistry, The Upjohn Co., Kalamazoo, Michigan 49001

Received June 29, 1990

An infrared (ir) method to determine the secondary structure of proteins in solution using the amide I region of the spectrum has been devised. The method is based on the circular dichroism (CD) matrix method for secondary structure analysis given by Compton and Johnson (L. A. Compton and W. C. Johnson, 1986, Anal. Biochem. 155, 155-167). The infrared data matrix was constructed from the normalized Fourier transform infrared spectra from 1700 to 1600 cm⁻¹ of 17 commercially available proteins. The secondary structure matrix was constructed from the X-ray data of the 17 proteins with secondary structure elements of helix, β-sheet, β-turn, and other (random). The CD and ir methods were compared by analyzing the proteins of the CD and ir databases as unknowns. Both methods produce similar results compared to structures obtained by X-ray crystallographic means, with the CD slightly better for helix conformation and the ir slightly better for β-sheet. The relatively good ir analysis for concanavalin A and α-chymotrypsin indicates that the ir method is less affected by the presence of aromatic groups. The concentration of the protein and the cell path length need not be known for the ir analysis since the spectra can be normalized to the total ir intensity in the amide I region. The ir spectra for helix, β-sheet, β-turn, and other, as extracted from the database, agree with the literature band assignments. The ir data matrix and the inverse matrix necessary to analyze unknown proteins are presented. © 1991 Academic Press, Inc.

Kabsch and Sander classified protein secondary structure based primarily on the analysis of hydrogen bonding patterns and geometrical features extracted from X-ray coordinates (1). The amide I (1700-1600 cm⁻¹) region of the infrared (ir) spectrum of a protein is sensitive to the strength of hydrogen bonding (2). Therefore, the amide I region presents a means to examine protein secondary structure. Often, though, the ir frequency shifts induced by hydrogen bonding of various secondary structures are small, resulting in one broad amide absorption. The underlying component bands can be mathematically resolved by deconvolution techniques and the resolved bands related to protein structure (3-12). Unfortunately, because the band frequencies for the same structural feature vary, correlation of bands to secondary structure poses difficulties. At times, deuterium exchange of the amide hydrogens in proteins can aid with band correlations, as Olinger et al. demonstrated (3).

Many of the ir methods devised to quantitate protein conformation are based on band areas obtained from the deconvoluted amide I absorptions (5-7). These methods depend not only on proper deconvolution but also on correct assignment of component bands to secondary structures (7,12). Eckert et al. used a least-squares curve-fitting of the amide I region of deuterated proteins to estimate secondary structure (13). This approach avoided deconvolution and prior assignment of component bands. As Eckert showed, the method was limited by variable deuteration of the proteins used in the analysis, limited spectrophotometer resolution, uncertainties in protein conformation, and calculations assuming only three conformations.

Two recent articles have appeared in the literature dealing with the determination of protein secondary structure using either partial least-squares or factor analysis methods (14,15). Both of these methods use a calibration matrix of aqueous ir spectra of proteins with secondary structures determined from X-ray crystallographic analysis. Further, both methods avoid deconvolution techniques and prior assignment of spectral bands. Dousseau and Pezolet (14) obtained the best agreement between ir and X-ray determined secondary structures of thirteen proteins when both the amide I and II regions of the spectra were used to generate a calibration set. A partial least-squares method was used to analyze the calibration set. Four structures were analyzed to produce the best results for this method: helix structure broken into ordered and disordered, β-sheet, and undefined. The method of Lee et al. (15) applied factor analysis and multiple linear regression to aqueous ir spectra of 18 proteins with known secondary structures. The best results for α-helix, β-sheet, and turn structure were obtained by this method when just the amide I region from normalized ir spectra was used to construct the calibration set. Mahalanobis' statistic (16,17) was also introduced to evaluate whether an unknown lies within the vector space of the basis set.

In this investigation, we apply the circular dichroism (CD) matrix method of Compton and Johnson (18) to the analysis of protein secondary structure with ir spectra replacing the CD spectra. This method applies singular value decomposition (SVD)¹ techniques to determine the mathematical relationship between the X-ray secondary structures of known proteins and some physical observable, in the present work the ir spectra of the known proteins in the amide I region. With some modification of the method, we show that moderately successful analyses can be obtained without assigning the ir bands to individual secondary structures and without determining the cell path length or protein concentration.

MATERIALS AND METHODS

Protein Sources

Grade XII-S ribonuclease S (bovine pancreas), type II myoglobin (sperm whale skeletal muscle), grade I lysozyme (chicken egg white), type IV papain (papaya latex), type I-S α-chymotrypsin (bovine pancreas), type X triose phosphate isomerase (rabbit muscle), type XI cytochrome c (tuna heart), type IV concanavalin A (Canavalia ensiformis), grade III-S pepsinogen (porcine stomach), parvalbumin (rabbit muscle), azurin (Pseudomonas aeruginosa), immunoglobulin G (bovine), and ferredoxin (Spirulina platensis) were purchased from Sigma Chemical Co. Prealbumin (human), hemoglobin (bovine), elastase (porcine pancreas), and trypsin inhibitor (soybean) were purchased from Calbiochem-Behring. The proteins were dissolved in 10 mM sodium phosphate buffer, pH 7.15, at a concentration of 50 mg/ml (total volume of about 0.025 ml).

CD Analysis

The database of CD spectra was taken from Ref. (18) and the secondary structure matrix from Ref. (19). The data of these matrices and our computer programs for the CD calculations were verified by calculating the secondary structure of each protein in the database (with that protein being analyzed removed). These calculations reproduce exactly the secondary structure data given in Ref. (19) (HJ calculation, Table 1) when we use the matrix methods and CD data between 260 and 178 nm given in Ref. (18), and the X-ray structure elements of Ref. (19).

Singular Value Decomposition Techniques

The method described briefly here is given in more detail in Ref. (18). We start with the system of linear equations given in matrix notation as

$X R = F$,  [1]

where F is an m × n matrix of m secondary structures determined from the analysis of X-ray structures for the n proteins used to construct a basis. The matrix R contains the normalized ir data for the n proteins, one spectrum per column, and the matrix X fits the ir data to F. Since R contains more amide I ir spectra than the information content of the data, a least-squares method of solving Eq. [1] for X can be unstable (18). Therefore, SVD (20) is used to reduce the R matrix rank to a level consistent with the information content. Decomposing R using SVD results in

$R = U S V^{T}$,  [2]

with U an orthogonal matrix of eigenvectors, S a matrix with singular values on the diagonal and zeros elsewhere, and $V^{T}$ the transpose of V, an orthogonal matrix of coefficients. Since U and V are unitary matrices, we can apply principles of SVD theory to obtain the generalized inverse of R and the solution for X,

$X = F (V S^{-1} U^{T})$,  [3]

where $S^{-1}$ is the pseudo-inverse of S and $S S^{-1} = I$, where I is the identity matrix. The three matrices comprising the generalized inverse of R are truncated to include only the most significant eigenvectors. The X matrix can then be used to obtain secondary structure, $f_i$, from the experimental ir spectrum, i, of a protein through the relationship

$f_i = X i = F V S^{-1} U^{T} i$.  [4]
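Eqs. [1]-[4] can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' program (which was written in FORTRAN with IMSL routines): the function names `transformation_matrix` and `leave_one_out` are hypothetical, and the matrices are synthetic, noise-free stand-ins rather than the published data. The second function mirrors the practice of removing the protein being analyzed from the basis.

```python
import numpy as np


def transformation_matrix(R, F, rank):
    """Solve X R = F (Eq. [1]) through a rank-truncated SVD of R (Eqs. [2]-[3])."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)      # R = U S V^T
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank, :]  # keep the largest `rank` terms
    R_pinv = Vt_k.T @ np.diag(1.0 / s_k) @ U_k.T          # generalized inverse V S^-1 U^T
    return F @ R_pinv                                     # X, one row per structure type


def leave_one_out(R, F, rank):
    """Estimate each protein's structure from a basis that excludes that protein."""
    n_proteins = R.shape[1]
    estimates = np.empty_like(F, dtype=float)
    for j in range(n_proteins):
        keep = np.arange(n_proteins) != j
        X_j = transformation_matrix(R[:, keep], F[:, keep], rank)
        estimates[:, j] = X_j @ R[:, j]                   # Eq. [4] for the removed protein
    return estimates


# Synthetic, noise-free stand-in data (hypothetical, not the published matrices).
rng = np.random.default_rng(0)
n_points, n_proteins, n_structures = 52, 17, 4
pure_spectra = rng.random((n_points, n_structures))       # invented component spectra
F = rng.random((n_structures, n_proteins))
F = 100.0 * F / F.sum(axis=0)                             # each column sums to 100%
R = pure_spectra @ F                                      # exactly rank 4 by construction

X = transformation_matrix(R, F, rank=4)
f_first = X @ R[:, 0]                                     # Eq. [4] for the first protein
F_loo = leave_one_out(R, F, rank=4)
```

Because the synthetic R has exactly four independent components and no noise, both the in-basis and the leave-one-out estimates reproduce F; with measured spectra the agreement would be only approximate.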

¹ Abbreviations used: SVD, singular value decomposition; CPV, cumulative percent variance.

Normalized Fourier Transform ir Spectra-ir Data Matrix

Five microliters of the protein solution was deposited between two 13 × 2 mm CaF₂ disks separated by a thin aluminum spacer. The disk edges were wrapped with Parafilm to minimize protein dehydration. The path length, estimated at 7 μm, was determined using ir interference fringes. Infrared absorbance spectra were measured at room temperature (22°C) through the coaddition of 512 scans at an optical resolution of 2 cm⁻¹ from 3800 to 900 cm⁻¹ using a Digilab FTS-40 Fourier transform ir spectrophotometer equipped with a Spectra-bench beam condenser (Spectra-Tech Inc.) and a mercury-cadmium-telluride detector. The spectrophotometer was purged with nitrogen to minimize water vapor and carbon dioxide absorptions.

A reference spectrum, measured using identical conditions with only buffer between the disks, was digitally subtracted, using Digilab's interactive SUBTRACT software, from the 50 mg/ml solution spectra to produce the protein spectra used for the matrix analysis. The buffer reference spectrum was scaled to produce protein spectra with flat baselines in two regions: (i) from 3800 cm⁻¹ to the point just before where the buffer started to totally absorb the incident radiation; and (ii) from 1900 to 1750 cm⁻¹. Further, no large negative absorptions between 2000 and 1000 cm⁻¹ were allowed. The resulting protein spectra were then baseline-corrected to zero absorbance at 1750 cm⁻¹ and smoothed with a 13-point Savitzky-Golay (21) function. Spectral data were transferred through RS-232C cabling to a Harris computer for normalization and conformational analysis by the matrix method.

Each ir protein spectrum was normalized by dividing absorbance values from 1700 to 1600 cm⁻¹ by the total area under the ir spectrum in the same region. Every other normalized data point between 1700 and 1600 cm⁻¹ was multiplied by one hundred, simply to scale the results, and placed in the 52 × 17 ir data matrix, R. The normalized ir spectra are directly proportional to the molar ir spectra if the area under the ir curve is directly proportional to the product of the amide concentration and the cell path length.
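The normalization step can be illustrated as follows. This is a sketch under stated assumptions: the function name `normalize_amide_i`, the 104-point grid, and the Gaussian test band are all hypothetical, and the trapezoidal rule is an assumed way of computing "the total area", which the text does not specify. The point of the example is that scaling the absorbance (a different concentration or path length) leaves the normalized column unchanged.

```python
import numpy as np


def normalize_amide_i(wavenumbers, absorbance, lo=1600.0, hi=1700.0, scale=100.0):
    """Return every other area-normalized absorbance between `hi` and `lo` cm^-1."""
    wn = np.asarray(wavenumbers, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    order = np.argsort(wn)[::-1]                 # sort from high to low wavenumber
    wn, ab = wn[order], ab[order]
    inside = (wn >= lo) & (wn <= hi)
    wn, ab = wn[inside], ab[inside]
    # Trapezoidal area under the band (an assumed integration rule).
    area = np.abs(np.sum(0.5 * (ab[1:] + ab[:-1]) * np.diff(wn)))
    return wn[::2], scale * ab[::2] / area       # every other point, scaled by 100


# Hypothetical band: a single Gaussian near 1655 cm^-1 on a 104-point grid.
grid = np.linspace(1700.0, 1600.0, 104)
band = np.exp(-0.5 * ((grid - 1655.0) / 15.0) ** 2)

wn_a, column_a = normalize_amide_i(grid, 0.2 * band)   # weaker absorbance
wn_b, column_b = normalize_amide_i(grid, 0.6 * band)   # three times the absorbance
```

Both calls return the same 52-point column, which is why neither the protein concentration nor the cell path length needs to be known.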
This treatment of the data assumes that every amide group contributes the same to the ir spectrum in the 1700-1600 cm⁻¹ region regardless of protein sequence or secondary structure, and that the non-amide contribution to the intensity is small. The same assumptions for the 190-nm region of the ultraviolet spectrum for application to CD analyses would probably not be valid because the CD effect is dependent on the relative orientation of the amide groups and because significant intensity contributions occur from non-amide groups.

Programs for the Harris computer, used for all computations reported in this work, were written in FORTRAN, and IMSL subroutines were used for the eigenvector analysis.

RESULTS AND DISCUSSION

IR Data Matrix

The 52 × 17 matrix, R (Table 1), contains 52 points, every other normalized ir absorbance from 1700 to 1600 cm⁻¹, for each of the 17 proteins used in the analysis. The strongest unnormalized absorption in the amide I region for all proteins was less than 3.0 absorbance units before the buffer spectrum was subtracted. After subtraction of the buffer spectrum, the strongest absorption was under 0.6 absorbance units.

The number of independent components in the R matrix is determined from analysis of the eigenvalues from the SVD of the R matrix. Using the singular values or eigenvalues, a variety of methods exist to determine the number of primary factors (eigenvectors with more true information than noise) without knowledge of the experimental error. These methods are discussed in more detail by Wang and Isenhour (22). We will consider and give data for two methods of determining the number of significant eigenvectors. In the end, our choice as to the number of eigenvectors to retain depends as much on what structures we can reliably extract from the X-ray data as it does on the analyses given below.

One method considers the most significant eigenvectors as those with eigenvalues greater than the average eigenvalue. The eigenvalues for the R matrix are 45.974, 5.116, 4.986, 1.369, 1.038, 0.729, 0.480, 0.299, 0.209, 0.161, 0.127, 0.119, 0.085, 0.056, 0.039, 0.025, and 0.018. Only the first three eigenvalues are larger than the average eigenvalue of 3.578, which suggests they are most significant. The cumulative percent variance (CPV) method (Eq. [5]) provides the percent variance of the data in n dimensions accounted for by the data reduced to c dimensions:

$\mathrm{CPV} = \left( \sum_{i=1}^{c} \lambda_i \Big/ \sum_{i=1}^{n} \lambda_i \right) \times 100$,  [5]

where $\lambda_i$, the eigenvalues, are summed for each dimension. Using this approach, the CPV for the first through the seventh most significant eigenvectors is 75.6, 84.3, 92.2, 94.4, 96.1, 97.3, and 98.1, respectively. This method indicates that the four most significant eigenvectors are sufficient to reconstruct the data in the R matrix to within 94% of the variance in n dimensions. Since four is currently the number of secondary structures that are reliably extractable from the X-ray data, we will retain four eigenvalues for the R matrix in the calculations. The most significant eigenvectors, weighted by their corresponding eigenvalues, form the basis for the vector space described by the ir spectra of the proteins.

TABLE 1. Normalized Infrared Spectral Data of Proteins from 1700 to 1600 cm⁻¹ at 1.93-cm⁻¹ Intervals (10² × Absorbance/Area) (Matrix R)

Ribonuclease S
  0.34 0.40 0.48 0.56 0.66 0.75 0.85 0.93 1.00 1.07 1.13 1.20 1.28 1.37 1.46 1.55 1.63 1.71
  1.77 1.82 1.86 1.89 1.92 1.96 1.99 2.02 2.05 2.08 2.12 2.15 2.15 2.13 2.08 1.99 1.88 1.75
  1.62 1.49 1.38 1.26 1.15 1.06 0.98 0.92 0.86 0.80 0.74 0.69 0.65 0.62 0.60 0.58

Hemoglobin
  0.02 0.04 0.08 0.13 0.19 0.27 0.33 0.39 0.46 0.54 0.63 0.73 0.84 0.95 1.08 1.24 1.41 1.59
  1.79 2.00 2.23 2.46 2.64 2.75 2.75 2.66 2.54 2.41 2.27 2.14 2.00 1.85 1.72 1.58 1.44 1.30
  1.18 1.06 0.95 0.86 0.77 0.68 0.60 0.53 0.47 0.43 0.39 0.34 0.31 0.28 0.27 0.26

Prealbumin
  0.15 0.18 0.21 0.24 0.28 0.31 0.35 0.39 0.43 0.47 0.50 0.55 0.60 0.66 0.73 0.79 0.86 0.94
  1.03 1.14 1.26 1.39 1.55 1.77 1.96 2.14 2.32 2.49 2.66 2.78 2.85 2.86 2.86 2.78 2.64 2.42
  2.16 1.89 1.62 1.36 1.15 0.97 0.83 0.71 0.61 0.53 0.46 0.42 0.37 0.35 0.32 0.31

Papain
  0.48 0.56 0.62 0.67 0.76 0.84 0.96 1.08 1.18 1.27 1.35 1.44 1.54 1.64 1.74 1.81 1.86 1.91
  1.97 2.02 2.08 2.13 2.19 2.27 2.33 2.36 2.36 2.33 2.29 2.23 2.15 2.08 2.05 2.00 1.93 1.84
  1.73 1.63 1.55 1.47 1.38 1.31 1.24 1.18 1.12 1.05 0.98 0.94 0.90 0.87 0.85 0.85

Myoglobin
  0.36 0.41 0.43 0.42 0.47 0.47 0.57 0.67 0.75 0.82 0.86 0.91 1.01 1.11 1.23 1.35 1.47 1.64
  1.82 2.02 2.25 2.46 2.66 2.89 3.01 2.98 2.86 2.62 2.42 2.24 2.05 1.90 1.83 1.74 1.64 1.51
  1.38 1.27 1.19 1.10 1.00 0.93 0.88 0.82 0.76 0.69 0.62 0.59 0.56 0.53 0.52 0.51

Cytochrome c
  0.33 0.38 0.44 0.51 0.60 0.68 0.82 0.95 1.08 1.19 1.27 1.36 1.47 1.55 1.66 1.76 1.87 2.01
  2.17 2.33 2.49 2.59 2.64 2.67 2.66 2.59 2.49 2.36 2.21 2.06 1.92 1.81 1.72 1.66 1.57 1.48
  1.37 1.27 1.19 1.10 1.02 0.97 0.95 0.91 0.89 0.84 0.78 0.75 0.71 0.69 0.68 0.68

Lysozyme
  0.25 0.31 0.35 0.39 0.47 0.50 0.63 0.75 0.85 0.94 1.00 1.08 1.21 1.32 1.42 1.51 1.58 1.68
  1.79 1.92 2.07 2.18 2.28 2.39 2.41 2.35 2.26 2.14 2.07 1.97 1.84 1.71 1.62 1.50 1.37 1.23
  1.08 0.98 0.91 0.83 0.75 0.70 0.66 0.61 0.56 0.51 0.44 0.41 0.38 0.36 0.33 0.32

Elastase
  0.58 0.66 0.72 0.78 0.88 0.95 1.08 1.19 1.27 1.33 1.36 1.39 1.49 1.56 1.63 1.68 1.70 1.75
  1.78 1.80 1.83 1.85 1.89 1.95 2.01 2.04 2.08 2.09 2.16 2.23 2.29 2.32 2.36 2.36 2.29 2.16
  2.02 1.89 1.75 1.61 1.48 1.38 1.31 1.23 1.16 1.08 1.00 0.96 0.91 0.88 0.85 0.83

α-Chymotrypsin
  0.40 0.48 0.53 0.58 0.68 0.72 0.86 0.97 1.05 1.12 1.14 1.17 1.25 1.31 1.36 1.41 1.43 1.48
  1.50 1.53 1.57 1.62 1.67 1.79 1.87 1.93 1.97 1.97 2.02 2.03 2.03 2.05 2.09 2.08 2.02 1.90
  1.74 1.60 1.46 1.32 1.19 1.08 1.00 0.92 0.84 0.75 0.66 0.62 0.58 0.53 0.51 0.47

Triose phosphate isomerase
  0.17 0.20 0.22 0.24 0.28 0.30 0.35 0.40 0.45 0.51 0.56 0.62 0.69 0.77 0.86 0.95 1.06 1.17
  1.30 1.45 1.60 1.76 1.95 2.22 2.47 2.63 2.78 2.86 2.96 3.02 2.97 2.83 2.68 2.44 2.15 1.87
  1.59 1.37 1.19 1.04 0.91 0.81 0.73 0.67 0.62 0.57 0.51 0.47 0.44 0.41 0.39 0.38

Trypsin inhibitor
  0.20 0.25 0.30 0.36 0.44 0.51 0.58 0.66 0.73 0.80 0.88 0.95 1.03 1.10 1.18 1.27 1.35 1.45
  1.55 1.65 1.76 1.88 2.00 2.11 2.20 2.29 2.37 2.45 2.49 2.52 2.51 2.46 2.39 2.30 2.16 2.00
  1.81 1.63 1.47 1.32 1.19 1.07 0.97 0.89 0.82 0.76 0.70 0.66 0.62 0.59 0.57 0.56

Concanavalin A
  0.51 0.62 0.70 0.76 0.79 0.79 0.84 0.93 0.98 1.01 1.04 1.06 1.12 1.18 1.23 1.29 1.32 1.37
  1.43 1.46 1.52 1.54 1.60 1.69 1.74 1.80 1.86 1.86 1.91 1.97 2.03 2.12 2.23 2.32 2.32 2.23
  2.09 1.91 1.80 1.69 1.57 1.43 1.32 1.20 1.06 0.95 0.84 0.79 0.70 0.65 0.62 0.60

Pepsinogen
  0.22 0.28 0.31 0.35 0.43 0.45 0.57 0.66 0.72 0.77 0.79 0.84 0.93 1.01 1.10 1.18 1.24 1.32
  1.41 1.51 1.63 1.72 1.84 2.02 2.13 2.21 2.27 2.27 2.33 2.36 2.34 2.28 2.22 2.11 1.96 1.81
  1.66 1.57 1.48 1.37 1.23 1.11 1.01 0.91 0.83 0.73 0.62 0.57 0.51 0.48 0.45 0.43

Parvalbumin
  0.97 1.01 1.04 1.06 1.10 1.13 1.21 1.31 1.41 1.50 1.59 1.69 1.81 1.93 2.08 2.23 2.37 2.53
  2.69 2.85 3.03 3.18 3.33 3.49 3.56 3.56 3.51 3.40 3.31 3.21 3.09 2.96 2.87 2.75 2.62 2.47
  2.32 2.18 2.06 1.93 1.80 1.70 1.62 1.55 1.50 1.45 1.40 1.38 1.36 1.35 1.34 1.34

Azurin
  0.57 0.66 0.73 0.75 0.80 0.82 0.93 1.05 1.14 1.23 1.28 1.35 1.46 1.53 1.62 1.68 1.72 1.76
  1.81 1.85 1.91 1.95 2.02 2.13 2.22 2.26 2.30 2.27 2.29 2.29 2.27 2.27 2.29 2.26 2.18 2.06
  1.92 1.78 1.66 1.53 1.41 1.33 1.25 1.18 1.11 1.05 0.96 0.93 0.89 0.85 0.82 0.79

Ferredoxin
  0.38 0.44 0.47 0.50 0.57 0.60 0.73 0.83 0.92 0.99 1.04 1.10 1.20 1.29 1.39 1.47 1.54 1.65
  1.74 1.84 1.95 2.03 2.15 2.31 2.44 2.51 2.57 2.56 2.59 2.60 2.54 2.48 2.43 2.31 2.17 1.99
  1.82 1.68 1.56 1.43 1.30 1.21 1.13 1.05 0.98 0.91 0.83 0.79 0.75 0.72 0.71 0.70

Immunoglobulin G
  0.39 0.46 0.55 0.66 0.76 0.86 0.93 0.99 1.04 1.11 1.19 1.27 1.32 1.37 1.42 1.47 1.51 1.53
  1.57 1.60 1.63 1.67 1.71 1.74 1.79 1.88 1.96 2.04 2.13 2.22 2.29 2.34 2.34 2.30 2.23 2.14
  2.03 1.90 1.78 1.68 1.57 1.48 1.40 1.32 1.24 1.16 1.09 1.02 0.95 0.89 0.85 0.82

Secondary Structure Matrix

Since eigenvector analysis of the R matrix suggests the presence of four independent components, the 4 × 17 matrix, F (Table 2), is constructed using four secondary structure types for each of the 17 proteins. Except for the secondary structure of immunoglobulin G Fab, taken from Ref. (23), the secondary structures were determined from analysis of the X-ray diffraction data found in the Brookhaven Protein Data Bank (24). Analysis of the diffraction data using the program DSSP (1) provides a structural summary which is based mainly on hydrogen bonding patterns but also on geometrical features extracted from X-ray coordinates. The structural types established include α-, 3₁₀-, and π-helices; turns that are not part of the helices mentioned; bends (areas of high chain curvature); β-ladders and sheets; and other or random. The program assigns each residue in a protein to one of the structural types. Overlap of structural types is avoided by assigning precedence to 3₁₀-helix, β-ladder, β-sheet, α-helix, π-helix, turn, bend, and other, respectively.

The number of residues assigned to the three helix structures was totalled, divided by the total number of protein residues, and multiplied by 100 to yield the percent secondary structure due to helix. Similarly, residues assigned to ladders and sheets were combined to yield the percent due to β-sheet; residues assigned to turns and bends were combined to yield the percent due to β-turn; and all other residues were totalled to yield the percent due to other structure. The structure percentages obtained in this manner differ for some proteins from those given by Levitt and Greer (23), but an X-ray structure matrix could also be constructed using Levitt and Greer data.

TABLE 2. Protein Secondary Structure from X-ray Data (Matrix F)

                              Secondary structure (%)
Protein                       Helix   β-Sheet   β-Turn   Other
Ribonuclease S                  21      38        19      23
Hemoglobin                      76       0        13      11
Prealbumin                       7      50        22      21
Papain                          26      22        23      30
Myoglobin                       77       0        12      11
Cytochrome c                    42       2        22      34
Lysozyme                        36      10        36      19
Elastase                        10      40        26      24
α-Chymotrypsin                  11      35        25      30
Triose phosphate isomerase      46      18        15      21
Trypsin inhibitor               21      26        24      29
Concanavalin A                   3      44        25      28
Pepsinogen                      20      40        20      21
Parvalbumin                     48       6        27      19
Azurin                          11      30        34      26
Ferredoxin                      13      17        27      43
Immunoglobulin G                 3      67        18      12

To maximize the vector space, proteins were selected for inclusion in the F matrix to encompass a wide variety of structural types. The fraction of helix structure ranges from 3 to 77%; β-sheet, from 0 to 67%; β-turn,
from 12 to 36%; and other, from 11 to 43%. Selection was also aimed to include proteins with well-defined atomic coordinates as determined from X-ray diffraction analysis. Every protein except for azurin has been analyzed to a resolution of 2.5 Å or better, with 11 analyzed to better than 2.1 Å.

The Transformation Matrix X

The 4 × 52 transformation matrix, X (Table 3), is obtained through Eq. [3] using the F matrix and the generalized inverse of the R matrix truncated to the four most significant eigenvectors (vide supra). Figure 1 plots each row of X; the rows are the inverse ir spectra corresponding to the four secondary structures: helix, β-sheet, β-turn, and other. Similar to the inverse CD spectra generated by Compton and Johnson (18), the inverse ir spectra undergo periodic oscillation, with three crests and four nodal areas. The positions of the maxima at 1674, 1655, and 1636 cm⁻¹ suggest where most of the structural information occurs. These frequencies are consistent with previous literature assignments for β-turn and/or unordered, α-helix, and β-sheet structure, respectively (3,7,8). The curve structure below 1620 cm⁻¹ does not correspond to any previously mentioned secondary structural feature and is possibly due to slight aromatic contributions, but attempts to include a fifth eigenvector to account for aromatic contributions (unpublished results) have yielded little improvement over the results obtained using four eigenvectors.

The curves for β-turn and other tell at a glance that the analysis of a given protein will yield approximately the same values for these structures, and this is the case. This result should be reflected in the database of secondary structures. Indeed, the X-ray structures given in Table 2 show that the database does not include a wide range of these structures as compared to helix and β-sheet. A database with a wider range of these structures might improve the analysis.

TABLE 3. Generalized Inverse Infrared Spectral Data for Four Secondary Structures from 1700 to 1600 cm⁻¹ at 1.93-cm⁻¹ Intervals (10² × Inverse Normalized Absorbance) (Matrix X)

Helix:
   1.4637   0.9190   0.3492  -0.3036  -1.0124  -1.5255  -2.1388  -2.5301  -2.6745  -2.6964
  -2.6432  -2.5707  -2.5810  -2.4163  -2.0798  -1.5214  -0.7991   0.0296   0.9724   2.0727
   3.2513   4.3569   5.3872   6.3122   6.6450   6.2361   5.2852   3.8871   2.3291   0.9681
  -0.3691  -1.7098  -2.7064  -3.5061  -3.8447  -3.8245  -3.5257  -3.0782  -2.5491  -2.0969
  -1.6400  -1.2989  -0.9762  -0.6210  -0.2650   0.1411   0.5619   0.8490   1.1703   1.4908
   1.7142   1.9558

β-Sheet:
  -3.9246  -3.1589  -2.3087  -1.2888  -0.0806   0.8148   1.9414   2.6491   3.0138   3.2141
   3.2606   3.3210   3.5439   3.5205   3.2589   2.6968   1.9729   1.1321   0.2067  -0.8421
  -2.0139  -3.0984  -4.1923  -5.2278  -5.6469  -5.2525  -4.1073  -2.3229  -0.3214   1.3068
   2.7705   4.1735   5.1028   5.7142   5.6782   5.2049   4.3813   3.4900   2.4983   1.6437
   0.8125   0.2634  -0.2726  -0.8321  -1.3549  -1.9849  -2.6549  -3.1334  -3.6187  -4.0922
  -4.4369  -4.8145

β-Turn:
  -2.3031  -1.9014  -1.4491  -0.8956  -0.1890   0.3430   1.0653   1.5353   1.8337   2.0399
   2.1316   2.2420   2.4718   2.5541   2.5178   2.3236   2.0659   1.7630   1.4370   1.0739
   0.6327   0.1859  -0.3713  -1.0091  -1.4500  -1.5863  -1.3923  -0.8734  -0.2452   0.2128
   0.6349   1.1043   1.3991   1.6224   1.5799   1.3876   1.0488   0.7181   0.3346   0.0015
  -0.3392  -0.5128  -0.6976  -0.9135  -1.1156  -1.4045  -1.7374  -1.9711  -2.1963  -2.4253
  -2.5959  -2.7839

Other:
  -2.7165  -2.2673  -1.7579  -1.1323  -0.3368   0.2623   1.0674   1.5863   1.9117   2.1353
   2.2332   2.3530   2.6033   2.6914   2.6449   2.4220   2.1297   1.7854   1.4172   1.0100
   0.5160   0.0247  -0.5806  -1.2640  -1.7180  -1.8233  -1.5450  -0.8966  -0.1260   0.4411
   0.9487   1.4896   1.8205   2.0513   1.9708   1.7136   1.2885   0.8755   0.4035  -0.0054
  -0.4169  -0.6350  -0.8643  -1.1234  -1.3631  -1.6979  -2.0791  -2.3496  -2.6093  -2.8711
  -3.0664  -3.2823

FIG. 1. Generalized inverse ir spectra for four secondary structures (helix, β-sheet, β-turn, and other), plotted against frequency (cm⁻¹) from 1700 to 1600.

Calculated Secondary Structure Spectra

We calculated the ir spectra of the amide I region corresponding to each of the four secondary structures using the matrix equation R = BF, where the ir data matrix R is constructed from the four most significant eigenvectors and F is the X-ray structure matrix. The equation is solved for B, the matrix which contains the calculated ir spectra of the amide I region. The solution, $B = R (V_f S_f^{-1} U_f^{T})$, is determined by applying SVD theory to obtain the generalized inverse of F, where $F = U_f S_f V_f^{T}$. Figure 2 shows the calculated amide I region for each secondary structure.

FIG. 2. ir spectra reconstructed from the four most significant eigenvectors for four secondary structures (helix, β-sheet, β-turn, and other), plotted against frequency (cm⁻¹) from 1700 to 1600.

The peak positions of 1660, 1653, 1650, and 1634 cm⁻¹ for the β-turn, helix, other, and β-sheet conformation, respectively, are consistent with reported literature values of 1666-1688, 1655-1657, 1650, and 1627-1642 cm⁻¹, respectively (3,7,8). We point out one more correlation which may be only fortuitous mathematics. That is, the spectrum associated with β-sheet conformation also shows a broad absorption centered near 1680 cm⁻¹, consistent with the higher frequency component described in the literature (4,7). The curve for β-turn does not look ideal. We would much rather have seen the wing absorptions approach zero at 1700 and 1600 cm⁻¹.
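The relation R = BF and its solution through the generalized inverse of F can be sketched as below. The example is illustrative only: the function names, the Gaussian component bands, and their centers are hypothetical, and the basis is synthetic and noise-free, so the recovered B matches the invented bands exactly, which would not hold for measured spectra.

```python
import numpy as np


def rank_k(R, k):
    """Rebuild R from its k most significant singular vectors."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def structure_spectra(R, F, k=4):
    """Solve R = B F for B using the generalized inverse of F."""
    R_k = rank_k(R, k)
    U_f, s_f, Vt_f = np.linalg.svd(F, full_matrices=False)   # F = U_f S_f V_f^T
    F_pinv = Vt_f.T @ np.diag(1.0 / s_f) @ U_f.T             # V_f S_f^-1 U_f^T
    return R_k @ F_pinv                                      # one column per structure


# Hypothetical component bands on a 52-point grid (not the published spectra).
wavenumbers = np.linspace(1700.0, 1600.0, 52)
centers = np.array([1655.0, 1634.0, 1675.0, 1648.0])         # invented band centers
B_true = np.exp(-0.5 * ((wavenumbers[:, None] - centers[None, :]) / 10.0) ** 2)

rng = np.random.default_rng(1)
F = rng.random((4, 17))
F = 100.0 * F / F.sum(axis=0)                                # columns sum to 100%
R = B_true @ F                                               # synthetic data matrix

B = structure_spectra(R, F, k=4)
peaks = wavenumbers[np.argmax(B, axis=0)]                    # peak position per structure
```

Reading off `peaks` corresponds to locating the band maxima of the calculated spectra, as is done for Figure 2.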

Secondary Structure Analysis of Proteins of the ir Database

The accuracy of the ir method can be assessed by taking the dot product of the ir data matrix R with each inverse ir spectrum from the X matrix. The secondary structures generated can then be compared to the X-ray structure matrix F. For example, the dot product of the data for the first protein of the R matrix, ribonuclease S, with the first generalized inverse of the X matrix (helix) equals 17, which is the calculated percent of helix for ribonuclease S. We did this for all proteins in the database, with and without removing the protein being analyzed from the database. Table 4 lists the secondary structure percentages obtained along with the X-ray structures.

Consider first the results obtained without removing the protein being analyzed from the database. The calculated secondary structures of Table 4 analyze to within 8% of the X-ray structures for 9 of the 17 proteins and within 15% for all except ferredoxin and immunoglobulin G. Pearson product-moment correlation coefficients (25) between ir and X-ray secondary structures (Table 4) show that most variation occurs in structure attributable to other. Root mean square error between the ir protein spectra and those reconstructed using four eigenvectors is between 0.024 and 0.085 normalized absorbance units. Notice that the goodness-of-fit between the experimental and the calculated ir curves does not correlate very well with the accuracy of the analysis. For example, the fit for concanavalin A is not good according to this parameter (0.085), but the analysis is reasonably accurate. Notice also that the total of the calculated structure always sums closely to 100 even though only 9 of the 17 proteins analyze correctly (8% limit). This suggests that the total parameter also does not correlate very well with the accuracy of the analysis. The agreement between the X-ray and calculated structures is not as good when the protein being analyzed is removed from the database.
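The ribonuclease S example can be checked directly from the tabulated data. The sketch below copies the ribonuclease S entries of Table 1 and the helix entries of Table 3 (52 points each, 1700 to 1600 cm⁻¹) and takes their dot product; the variable names are illustrative.

```python
import numpy as np

# Ribonuclease S column of R (Table 1), 1700 -> 1600 cm^-1.
ribonuclease_s = np.array([
    0.34, 0.40, 0.48, 0.56, 0.66, 0.75, 0.85, 0.93, 1.00, 1.07, 1.13, 1.20, 1.28,
    1.37, 1.46, 1.55, 1.63, 1.71, 1.77, 1.82, 1.86, 1.89, 1.92, 1.96, 1.99, 2.02,
    2.05, 2.08, 2.12, 2.15, 2.15, 2.13, 2.08, 1.99, 1.88, 1.75, 1.62, 1.49, 1.38,
    1.26, 1.15, 1.06, 0.98, 0.92, 0.86, 0.80, 0.74, 0.69, 0.65, 0.62, 0.60, 0.58,
])

# Helix row of X (Table 3), same frequency order.
x_helix = np.array([
    1.4637, 0.9190, 0.3492, -0.3036, -1.0124, -1.5255, -2.1388, -2.5301, -2.6745, -2.6964,
    -2.6432, -2.5707, -2.5810, -2.4163, -2.0798, -1.5214, -0.7991, 0.0296, 0.9724, 2.0727,
    3.2513, 4.3569, 5.3872, 6.3122, 6.6450, 6.2361, 5.2852, 3.8871, 2.3291, 0.9681,
    -0.3691, -1.7098, -2.7064, -3.5061, -3.8447, -3.8245, -3.5257, -3.0782, -2.5491, -2.0969,
    -1.6400, -1.2989, -0.9762, -0.6210, -0.2650, 0.1411, 0.5619, 0.8490, 1.1703, 1.4908,
    1.7142, 1.9558,
])

# Dot product of the spectrum with the inverse helix spectrum: percent helix.
helix_percent = float(ribonuclease_s @ x_helix)
print(round(helix_percent))
```

The same dot product with the β-sheet, β-turn, and other rows of Table 3 gives the remaining three percentages for this protein.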
With the protein removed to unbias the results, 5 of the 17 proteins analyze to within 8% of the X-ray structure. These are prealbumin, papain, α-chymotrypsin, soybean trypsin inhibitor, and concanavalin A. All estimated secondary structures are within 15% of the X-ray structure except for hemoglobin, myoglobin, lysozyme, parvalbumin, ferredoxin, and immunoglobulin G. Hemoglobin and myoglobin show slightly less, and lysozyme slightly more, helix content compared to the X-ray structures. Parvalbumin shows more and immunoglobulin G less β-sheet structure than the X-ray results. The correlation coefficients for these unbiased results (Table 4) indicate good agreement between the ir and X-ray secondary structures for helix and β-sheet conformations and poor agreement for β-turn and other structure. The average percent difference between the ir and X-ray calculated secondary structures is 8.3% for helix, 8.4% for β-sheet, 5.1% for β-turn, and 7.3% for other. The sum of the secondary structures ranges from 86 to 111%, with the exception of 148% for parvalbumin, which also shows the worst fit between the experimental and calculated ir spectra according to the root mean square error.

Concanavalin A and α-chymotrypsin have been shown to produce anomalies in the CD spectra (26) due to contributions from aromatic side chains. Since the ir-determined structures for these two proteins agree well with X-ray obtained structures, any absorptions in the amide I region due to aromatic side chains must have a minimal effect on the analysis, in agreement with Eckert's earlier findings (13).

To test for experimental reproducibility, ir spectral data for two proteins, myoglobin and concanavalin A (the first high in helix content, the second high in β-sheet content), were collected at concentrations of about 80 and 40 mg/ml (total volume of about 0.025 ml). The secondary structures were determined at these two concentrations using the basis determined from all 17 proteins. The results for both proteins (Table 5) reproduce the results presented in Table 4 to within 5%. In these ir experiments (as in the CD experiments), the analyst must strike a balance between cell path length and concentration to yield a high enough absorbance for accurate measurement and a short enough path length for minimum background absorption by water. Too low a concentration will not produce accurate analyses since the increase in path length necessary to yield measurable absorbance data will produce intolerable background absorption by water.

Comparison of the ir and CD Methods

We can crudely compare the ir results to published CD analyses by determining the number of proteins in the database for which the calculated structures agree to 8% or better. The unbiased results for the ir analysis (see above) indicate that five out of the seventeen proteins analyze correctly within the 8% limit. The results from the CD analysis, which uses five eigenvectors corresponding to a known mixture of helix, antiparallel ,&sheet, parallel P-sheet, P-turn, and other (HJ method, Ref. (19)), indicate that 4 out of 16 proteins analyze correctly for the unbiased calculation (with summation of the antiparallel and parallel P-sheet results). The two methods are comparable in this analysis. The (unbiased) correlation coefficients between the calculated and X-ray structures for helix, P-sheet, P-turn, and other are 0.98,0.54,0.30, and 0.61 for CD (19) and 0.90, 0.81, 0.22, and -0.03 for the ir (Table 4). These results indicate that CD does better than ir for helix and other structures, ir does better than CD for P-sheet, and both


TABLE 4

Comparison of the X-ray and Infrared Determined Secondary Structure of the 17 Database Proteins

Secondary structure^a (%)

| Protein | Method | H | B | T | O | Total | rms^b |
|---|---|---|---|---|---|---|---|
| Ribonuclease S | X-ray | 21 | 38 | 19 | 23 | 101 | |
| | ir^c | 17 | 33 | 27 | 28 | 105 | 0.036 |
| | ir^d | 17 | 32 | 28* | 28 | 105 | 0.046 |
| Hemoglobin | X-ray | 76 | 0 | 13 | 11 | 100 | |
| | ir^c | 65* | -4 | 18 | 19 | 98 | 0.067 |
| | ir^d | 60* | -7 | 20 | 21* | 94 | 0.099 |
| Prealbumin | X-ray | 7 | 50 | 22 | 21 | 100 | |
| | ir^c | 11 | 50 | 20 | 24 | 105 | 0.038 |
| | ir^d | 15 | 50 | 19 | 27 | 111 | 0.071 |
| Papain | X-ray | 26 | 22 | 23 | 30 | 101 | |
| | ir^c | 23 | 25 | 26 | 26 | 100 | 0.037 |
| | ir^d | 23 | 26 | 26 | 25 | 100 | 0.043 |
| Myoglobin | X-ray | 77 | 0 | 12 | 11 | 100 | |
| | ir^c | 64* | -7 | 16 | 15 | 88 | 0.062 |
| | ir^d | 57* | -7 | 18 | 18 | 86 | 0.090 |
| Cytochrome c | X-ray | 42 | 2 | 22 | 34 | 100 | |
| | ir^c | 49 | 5 | 25 | 24* | 103 | 0.029 |
| | ir^d | 51* | 6 | 26 | 21* | 104 | 0.039 |
| Lysozyme | X-ray | 36 | 10 | 36 | 19 | 101 | |
| | ir^c | 46* | 8 | 24* | 24 | 102 | 0.036 |
| | ir^d | 51* | 7 | 18* | 25 | 101 | 0.062 |
| Elastase | X-ray | 10 | 40 | 26 | 24 | 100 | |
| | ir^c | 2 | 45 | 28 | 29 | 104 | 0.024 |
| | ir^d | 1* | 45 | 28 | 30 | 104 | 0.030 |
| α-Chymotrypsin | X-ray | 11 | 35 | 25 | 30 | 101 | |
| | ir^c | 6 | 41 | 27 | 28 | 102 | 0.042 |
| | ir^d | 6 | 42 | 26 | 27 | 101 | 0.054 |
| Triose phosphate isomerase | X-ray | 46 | 18 | 15 | 21 | 100 | |
| | ir^c | 42 | 23 | 17 | 19 | 101 | 0.077 |
| | ir^d | 39 | 28* | 19 | 20 | 106 | 0.120 |
| Trypsin inhibitor | X-ray | 21 | 26 | 24 | 29 | 100 | |
| | ir^c | 25 | 32 | 22 | 24 | 103 | 0.032 |
| | ir^d | 26 | 32 | 22 | 23 | 103 | 0.030 |
| Concanavalin A | X-ray | 3 | 44 | 25 | 28 | 100 | |
| | ir^c | -3 | 46 | 23 | 24 | 90 | 0.085 |
| | ir^d | -4 | 46 | 23 | 24 | 89 | 0.100 |
| Pepsinogen | X-ray | 20 | 40 | 20 | 21 | 101 | |
| | ir^c | 23 | 29* | 20 | 22 | 94 | 0.045 |
| | ir^d | 24 | 28* | 20 | 22 | 94 | 0.049 |
| Parvalbumin | X-ray | 48 | 6 | 27 | 19 | 100 | |
| | ir^c | 54 | 11 | 24 | 23 | 112 | 0.038 |
| | ir^d | 52 | 32* | 31 | 33* | 148 | 0.150 |
| Azurin | X-ray | 11 | 30 | 34 | 26 | 101 | |
| | ir^c | 14 | 34 | 25* | 26 | 99 | 0.025 |
| | ir^d | 15 | 35 | 24* | 26 | 100 | 0.028 |
| Ferredoxin | X-ray | 13 | 17 | 27 | 43 | 100 | |
| | ir^c | 29* | 27* | 23 | 23* | 102 | 0.026 |
| | ir^d | 30* | 28* | 22 | 22* | 102 | 0.028 |
| Immunoglobulin G | X-ray | 3 | 67 | 18 | 12 | 100 | |
| | ir^c | 0 | 44* | 24 | 25* | 93 | 0.057 |
| | ir^d | -1 | 39* | 25 | 28* | 91 | 0.070 |
| Correlation coeff.^c | | 0.94 | 0.91 | 0.60 | 0.44 | | |
| Correlation coeff.^d | | 0.90 | 0.81 | 0.22 | -0.03 | | |

Note. * Indicates ir results deviate from X-ray structure by >8%.
^a Secondary structures: H, (α, 3₁₀, and π) helix; B, parallel and antiparallel β-sheet; T, β-turn; O, other.
^b Root mean square error between experimental ir and ir calculated using four eigenvectors (units, normalized absorbance).
^c Results with protein being analyzed included in the database.
^d Results with protein being analyzed removed from the database.
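The agreement statistics quoted for Table 4 can be checked from the tabulated values. The sketch below does this for the helix column only, under two assumptions that this section does not spell out: that the "average percent difference" is the mean absolute difference in percentage points, and that the correlation coefficient is the ordinary Pearson coefficient. The function names are illustrative and not taken from the paper.

```python
import math

# Helix percentages transcribed from Table 4 (17 database proteins, table order).
xray_helix = [21, 76, 7, 26, 77, 42, 36, 10, 11, 46, 21, 3, 20, 48, 11, 13, 3]
# ir estimates with the analyzed protein removed from the database (unbiased rows).
ir_helix = [17, 60, 15, 23, 57, 51, 51, 1, 6, 39, 26, -4, 24, 52, 15, 30, -1]


def mean_abs_difference(reference, estimate):
    """Mean absolute difference in percentage points (assumed definition)."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


helix_difference = mean_abs_difference(xray_helix, ir_helix)
helix_correlation = pearson_r(xray_helix, ir_helix)
print(f"helix: mean |difference| = {helix_difference:.1f}, r = {helix_correlation:.2f}")
```

Under these assumptions the helix column gives roughly 8.3 percentage points and a correlation near 0.90, in line with the values quoted in the text; the same two functions can be applied to the B, T, and O columns.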

TABLE 5

Infrared Determined Secondary Structures of Myoglobin and Concanavalin A at Two Different Concentrations

Secondary structure^a (%)

| Protein | Concentration (mg/ml) | H | B | T | O | Total |
|---|---|---|---|---|---|---|
| Myoglobin | 80 | 64 | -9 | 16 | 15 | 86 |
| | 40 | 64 | -4 | 14 | 14 | 88 |
| Concanavalin A | 80 | -1 | 48 | 21 | 23 | 91 |
| | 40 | -2 | 45 | 21 | 22 | 86 |

^a Secondary structure abbreviations as given in Table 4.
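The reproducibility in Table 5 relies on the spectra being normalized to the total amide I intensity, so that concentration and path length drop out. A minimal sketch of why this works is given below. It assumes a simple normalization by the summed absorbance over 1700 to 1600 cm⁻¹ (the paper's own normalization procedure is defined elsewhere and may differ in detail) and uses a hypothetical single Gaussian band and a hypothetical path-length factor purely for illustration.

```python
import numpy as np


def normalize_amide_i(absorbance):
    """Scale a spectrum so that its summed amide I absorbance equals 1."""
    absorbance = np.asarray(absorbance, dtype=float)
    total = absorbance.sum()
    if total <= 0:
        raise ValueError("spectrum has no net positive absorbance")
    return absorbance / total


# Hypothetical amide I band: one Gaussian centred at 1655 cm^-1 (illustration only).
wavenumbers = np.arange(1600.0, 1701.0, 1.0)
band_shape = np.exp(-0.5 * ((wavenumbers - 1655.0) / 12.0) ** 2)

# Beer-Lambert law: absorbance is proportional to concentration times path length.
path_factor = 0.0006  # hypothetical absorptivity-times-path-length constant
spectrum_80 = 80.0 * path_factor * band_shape  # about 80 mg/ml
spectrum_40 = 40.0 * path_factor * band_shape  # about 40 mg/ml, same cell

norm_80 = normalize_amide_i(spectrum_80)
norm_40 = normalize_amide_i(spectrum_40)
print(np.allclose(norm_80, norm_40))
```

Because the scale factor cancels in the ratio, the two normalized spectra are identical here; in real measurements the residual differences come from noise and buffer subtraction rather than from the concentration itself.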

techniques do about equally poorly for β-turn. The ir correlation coefficient for other structure is almost zero for this database (-0.03) but improves considerably if the protein being analyzed is included in the database (to 0.44, Table 4).

For a more direct comparison between the ir and CD results, the X-ray matrices for both techniques were trimmed to include just the 10 common proteins. The CD results were calculated using five eigenvectors and then the parallel and antiparallel β-sheet results were summed together. The X-ray structure matrices used for the ir and CD analyses are not identical, and this is perhaps as it should be, since the ir transitions are more local in molecular origin than the electronic transitions of the CD spectra. It is likely that the methods used to extract secondary structures from the X-ray data for CD analyses (27) will not be valid for ir analyses. Nevertheless, we can compare both methods on the basis of how well the X-ray structures can be reconstructed by the techniques. Table 6 lists the results of the biased and unbiased calculations for the database comprised of the 10 common proteins. Both sets of calculations yield the same general conclusion as obtained in the analysis given in the above paragraphs: one method is about as good as the other, but the CD does a little better for helix, and the ir does a little better for β-sheet. With the protein included in the basis before analysis, seven of the 10 proteins analyze to within 8% using ir and CD methods. For the unbiased results, two proteins analyze at the 8% error level by ir (papain and α-chymotrypsin), and one by CD (triose phosphate isomerase). The average percent difference between the IR (unbiased results with 10 proteins) and X-ray calculated secondary structures is 10.3% for helix, 6.5% for β-sheet, 6.7% for β-turn, and 6.3% for other. The average percent difference between the CD (unbiased results with 10 proteins) and X-ray calculated secondary structures is 3.8% for helix, 15.7% for β-sheet, 5.0% for β-turn, and 9.3% for other. The correlation coefficients from the unbiased calculations (Table 6) also show that CD does better for helix and β-turn, IR does better for β-sheet,


but both methods do poorly for other. A detailed inspection of the correlation coefficients from Tables 4 and 6 indicates that these numbers are sensitive to the proteins in the database. This is not surprising. Until we understand what constitutes a good database, a detailed interpretation of the correlation coefficients from these calculations is probably not warranted. Indeed, the correlation coefficient for β-turn conformation from the CD method is better than that obtained from the ir method (Table 6, unbiased results), but the average percent difference between the calculated and X-ray determined β-turn conformation for both techniques is nearly the same.

We were unable to improve the ir method significantly by the retention of only three principal values corresponding to helix, β-sheet, and other. For the biased calculations, 6 out of 17 proteins met the 8% or less error criterion, as compared to 9 out of 17 for the four principal value analysis. The correlation coefficients for this analysis were 0.93, 0.85, and 0.28 (helix, sheet, other) as compared to 0.94, 0.91, 0.60, and 0.44 (helix, sheet, turn, and other).

TABLE 6

Comparison of the Infrared and CD Analyses Using a Database of 10 Common Proteins

Secondary structure^a (%)

| Protein | Method | H | B | T | O | Total | rms^b |
|---|---|---|---|---|---|---|---|
| Ribonuclease S | X-ray^e | 21 | 38 | 19 | 23 | 101 | |
| | IR^c | 15 | 31 | 27 | 27 | 100 | 0.02 |
| | IR^d | 14 | 29* | 29* | 28 | 100 | 0.03 |
| Ribonuclease A | X-ray^f | 24 | 33 | 14 | 29 | 100 | |
| | CD^c | 26 | 21* | 20 | 33 | 100 | 0.24 |
| | CD^d | 26 | 18* | 21 | 35 | 100 | 0.30 |
| Hemoglobin | X-ray^e | 76 | 0 | 13 | 11 | 100 | |
| | IR^c | 68 | -2 | 18 | 15 | 99 | 0.05 |
| | IR^d | 65* | -4 | 19 | 17 | 97 | 0.09 |
| | X-ray^f | 75 | 0 | 14 | 11 | 100 | |
| | CD^c | 75 | -2 | 16 | 14 | 103 | 0.08 |
| | CD^d | 74 | 1 | 19 | 24* | 118 | 0.28 |
| Prealbumin | X-ray^e | 7 | 50 | 22 | 21 | 100 | |
| | IR^c | 15 | 45 | 18 | 20 | 98 | 0.05 |
| | IR^d | 27* | 38* | 13* | 19 | 97 | 0.11 |
| | X-ray^f | 7 | 45 | 14 | 34 | 100 | |
| | CD^c | 9 | 44 | 15 | 32 | 100 | 0.06 |
| | CD^d | 12 | 13* | 19 | -4* | 40 | 0.45 |
| Papain | X-ray^e | 26 | 22 | 23 | 30 | 101 | |
| | IR^c | 29 | 24 | 22 | 27 | 102 | 0.04 |
| | IR^d | 27 | 28 | 24 | 27 | 106 | 0.06 |
| | X-ray^f | 28 | 9 | 14 | 49 | 100 | |
| | CD^c | 27 | 17 | 15 | 50 | 109 | 0.10 |
| | CD^d | 25 | 25* | 15 | 37* | 102 | 0.27 |
| Myoglobin | X-ray^e | 77 | 0 | 12 | 11 | 100 | |
| | IR^c | 76 | -3 | 10 | 14 | 97 | 0.03 |
| | IR^d | 57* | 5 | 22* | 22* | 106 | 0.11 |
| | X-ray^f | 78 | 0 | 12 | 10 | 100 | |
| | CD^c | 80 | -5 | 13 | 9 | 97 | 0.09 |
| | CD^d | 90* | -21* | 19 | 18 | 106 | 0.48 |
| Cytochrome c | X-ray^e | 42 | 2 | 22 | 34 | 100 | |
| | IR^c | 51* | 7 | 24 | 24* | 106 | 0.03 |
| | IR^d | 54* | 9 | 24 | 21* | 108 | 0.04 |
| | X-ray^f | 38 | 0 | 17 | 45 | 100 | |
| | CD^c | 38 | 6 | 12 | 38 | 94 | 0.10 |
| | CD^d | 37 | 11* | 8* | 28* | 84 | 0.20 |
| Lysozyme | X-ray^e | 36 | 10 | 36 | 19 | 101 | |
| | IR^c | 41 | 10 | 27* | 22 | 100 | 0.04 |
| | IR^d | 54* | 3 | 15* | 18 | 90 | 0.09 |
| | X-ray^f | 36 | 9 | 32 | 23 | 100 | |
| | CD^c | 33 | 27* | 27 | 21 | 108 | 0.10 |
| | CD^d | 32 | 46* | 22* | 29 | 129 | 0.25 |
| Elastase | X-ray^e | 10 | 40 | 26 | 24 | 100 | |
| | IR^c | 7 | 41 | 24 | 30 | 102 | 0.02 |
| | IR^d | 4 | 41 | 23 | 35* | 103 | 0.03 |
| | X-ray^f | 10 | 37 | 22 | 31 | 100 | |
| | CD^c | 8 | 32 | 22 | 39 | 101 | 0.07 |
| | CD^d | 8 | 29 | 23 | 43* | 102 | 0.10 |
| α-Chymotrypsin | X-ray^e | 11 | 35 | 25 | 30 | 101 | |
| | IR^c | 7 | 36 | 25 | 27 | 95 | 0.04 |
| | IR^d | 8 | 35 | 24 | 25 | 92 | 0.06 |
| | X-ray^f | 10 | 34 | 20 | 36 | 100 | |
| | CD^c | 12 | 22* | 20 | 28 | 82 | 0.13 |
| | CD^d | 13 | 20* | 19 | 27* | 79 | 0.17 |
| Triose phosphate isomerase | X-ray^e | 46 | 18 | 15 | 21 | 100 | |
| | IR^c | 43 | 27* | 18 | 18 | 106 | 0.08 |
| | IR^d | 41 | 32* | 19 | 16 | 108 | 0.13 |
| | X-ray^f | 52 | 14 | 11 | 23 | 100 | |
| | CD^c | 49 | 14 | 9 | 25 | 97 | 0.10 |
| | CD^d | 47 | 12 | 7 | 21 | 87 | 0.18 |
| IR correlation coeff.^c | | 0.97 | 0.96 | 0.74 | 0.76 | | |
| IR correlation coeff.^d | | 0.86 | 0.90 | -0.21 | 0.38 | | |
| CD correlation coeff.^c | | 1.0 | 0.83/0.83 | 0.86 | 0.92 | | |
| CD correlation coeff.^d | | 0.98 | 0.40/0.20 | 0.46 | 0.24 | | |

Note. * Deviates from X-ray structure by >8%.
^a Secondary structure abbreviations as given in Table 4 (parallel and antiparallel β-sheet were summed for the CD method).
^b Root mean square error between experimental and calculated spectra (units: IR, normalized absorbance; CD, Δε).
^c Results with protein included in basis before analysis.
^d Results with protein removed from basis before analysis.
^e From analysis of the Brookhaven Data Bank as described under Results and Discussion.
^f Reference (19).

CONCLUSIONS

The matrix method to estimate protein secondary structure from the IR spectra of the amide I region provides results that agree well with the X-ray structures for helix and β-sheet conformations, but shows poorer agreement for β-turn and other structure. As evidenced by the results for concanavalin A and α-chymotrypsin, the ir method is less influenced by aromatic side chain absorptions than the CD method. The IR method complements the CD method in that the CD method provides better results for helix determination and the IR method provides better results for β-sheet determination. One experimental advantage of the ir method is that accurate concentration and cell path length measurements are not critical for a good analysis if the ir spectra are normalized according to the procedure given in this work. This procedure, as applied in the ir analysis, assumes that only amides absorb in the 1700 to 1600 cm⁻¹ region of the spectrum and that each amide group contributes the same to the ir spectrum (except for frequency shifts) with no intensity dependence on secondary structure or amino acid sequence. Evidently, these assumptions are valid enough to yield ir results comparable to those from CD methods. The normalization procedure may provide better agreement in the analysis of the same protein using different ir instruments, but we have not obtained data as yet to answer this question. The analogous situation, in the case of the CD method, would be if all proteins showed about the same band intensity in their electronic absorption spectrum at 190 nm. This situation may not be obtained for absorbance data, however, as evidenced by the range of the molar absorptivity at 190 nm for a series of proteins, from 8510 for lactate dehydrogenase to 11,460 for lysozyme (27). To our knowledge, the intensities obtained by band areas in the 190 nm region of the spectrum have not been compared.

Perhaps the ir method can be fine tuned by a judicious choice of what proteins to include in the database, by choosing a secondary structure matrix from the X-ray data more suited for the ir analysis, by using a cell of accurately known path length (a difficult task), by more rigorous control over the buffer subtraction (28,29), by using protein solutions with well determined protein concentrations, by using protein solutions in D₂O rather than H₂O, by applying the variable selection method (19), and so forth. However, in the end, one must always be ready for an analytical disaster. The reason for this is that even if the experimental error is small, and the mathematical treatment exact, there exist certain basic assumptions which must be met for a successful analysis. These assumptions (that the crystal and solution structures are the same, that there are no tertiary structure effects, that only the amide chromophore absorbs in the analytical region, and that a single spectral curve describes a given secondary structure) have recently been reviewed by Manning in the case of the CD method (26). These assumptions apply equally well to the ir analysis. A breakdown of one of these assumptions can produce an unsuccessful analysis, even though the experiment is accurate. As regards the variable selection method, we do not have enough experience with the ir method at this time to judge whether or not an unknown has been analyzed successfully. This work suggests that the structure total or the curve fit parameter will not be able to differentiate a good analysis from a poor one, but this remains to be explored.
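The biased and unbiased calculations discussed in this work, together with the structure total and curve-fit (rms) diagnostics, can be illustrated with a short sketch. It assumes a common truncated singular value decomposition formulation of the matrix method, in which an operator X is built so that F ≈ X·A for the spectral matrix A and the structure matrix F; the exact formulas used by the authors are not restated here, so the function names, the projection used for the rms fit, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def structure_operator(spectra, fractions, n_keep=4):
    """Build X with fractions ~= X @ spectra from a truncated SVD.

    spectra:   (n_points, n_proteins) normalized spectra, one column per protein
    fractions: (n_classes, n_proteins) X-ray secondary structure fractions
    n_keep:    number of principal (singular) values retained
    """
    u, s, vt = np.linalg.svd(spectra, full_matrices=False)
    u, s, vt = u[:, :n_keep], s[:n_keep], vt[:n_keep, :]
    # Truncated generalized inverse of the spectral matrix is V S^-1 U^T.
    operator = fractions @ vt.T @ np.diag(1.0 / s) @ u.T
    return operator, u


def leave_one_out(spectra, fractions, n_keep=4):
    """Analyze each protein with itself removed from the basis (unbiased)."""
    n_proteins = spectra.shape[1]
    results = []
    for i in range(n_proteins):
        keep = np.arange(n_proteins) != i
        operator, basis = structure_operator(
            spectra[:, keep], fractions[:, keep], n_keep
        )
        unknown = spectra[:, i]
        estimate = operator @ unknown
        # Assumed curve-fit diagnostic: rms residual after projecting the
        # unknown spectrum onto the retained basis spectra.
        fitted = basis @ (basis.T @ unknown)
        rms = float(np.sqrt(np.mean((unknown - fitted) ** 2)))
        results.append((estimate, float(estimate.sum()), rms))
    return results


# Synthetic, noise-free stand-in for a 17-protein database (not real spectra).
rng = np.random.default_rng(0)
pure_spectra = rng.random((101, 4))                    # four hypothetical pure-structure spectra
true_fractions = rng.dirichlet(np.ones(4), size=17).T  # (4, 17), each column sums to 1
spectra = pure_spectra @ true_fractions

results = leave_one_out(spectra, true_fractions)
max_error = max(
    np.abs(estimate - true_fractions[:, i]).max()
    for i, (estimate, _, _) in enumerate(results)
)
print(f"largest leave-one-out error on synthetic data: {max_error:.2e}")
```

On noise-free synthetic data the leave-one-out estimates recover the input fractions, the totals are 1, and the rms residuals vanish. Real spectra violate the underlying assumptions to varying degrees, which is why the structure total and rms in Tables 4 and 6 cannot by themselves be taken as proof of a good analysis.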

ACKNOWLEDGMENTS

We thank Howard Einspahr and Barry Finzel for consultations about the X-ray structures of the proteins used to construct the basis, Bruno Vansina for programming support, and Dave Elrod for help with the analysis of the Brookhaven Protein Data Bank.

REFERENCES

1. Kabsch, W., and Sander, C. (1983) Biopolymers 22, 2577-2637.
2. Parker, F. S. (1971) Applications of Infrared Spectroscopy in Biochemistry, Biology, and Medicine, Chapter 10, Plenum, New York.
3. Olinger, J. M., Hill, D. M., Jakobsen, R. J., and Brody, R. S. (1986) Biochim. Biophys. Acta 869, 89-98.
4. Haris, P. I., Lee, D. C., and Chapman, D. (1986) Biochim. Biophys. Acta 874, 255-265.
5. Byler, D. M., Brouillette, J. N., and Susi, H. (1986) Spectroscopy 1(3), 29-32.
6. Byler, D. M., and Susi, H. (1986) Biopolymers 25, 469-487.
7. Surewicz, W. K., and Mantsch, H. H. (1988) Biochim. Biophys. Acta 952, 115-130.
8. Gorga, J. C., Dong, A., Manning, M. C., Woody, R. W., Caughey, W. S., and Strominger, J. L. (1989) Proc. Natl. Acad. Sci. 86, 2321-2325.
9. Haris, P. I., Chapman, D., Harrison, R. A., Smith, K. F., and Perkins, S. J. (1990) Biochemistry 29, 1377-1380.
10. Dev, S. B., Rha, C. K., and Walder, F. (1984) J. Biomol. Struct. Dyn. 2, 431-442.
11. Yang, W. J., Griffiths, P. R., Byler, D. M., and Susi, H. (1985) Appl. Spectrosc. 39, 282-287.
12. Mantsch, H. H., Surewicz, W. K., Muga, A., Moffatt, D. J., and Casal, H. L. (1989) 7th International Conference on Fourier Transform Spectroscopy (Cameron, D. G., Ed.), pp. 580-581, Proceedings SPIE, Bellingham, WA [Abstract 1145].
13. Eckert, K., Grosse, R., Malur, J., and Repke, K. R. H. (1977) Biopolymers 16, 2549-2563.
14. Dousseau, F., and Pezolet, M. (1990) Biochemistry 29, 8771-8779.
15. Lee, D. C., Haris, P. I., Chapman, D., and Mitchell, R. C. (1990) Biochemistry 29, 9185-9193.
16. Fredericks, P. M., Lee, J. R., Osborn, P. R., and Swinkels, D. A. J. (1985) Appl. Spectrosc. 39, 303-310.
17. Fredericks, P. M., Lee, J. R., Osborn, P. R., and Swinkels, D. A. J. (1985) Appl. Spectrosc. 39, 311-316.
18. Compton, L. A., and Johnson, W. C., Jr. (1986) Anal. Biochem. 155, 155-167.
19. Manavalan, P., and Johnson, W. C., Jr. (1987) Anal. Biochem. 167, 76-85.
20. Forsythe, G. E., Malcolm, M. A., and Moler, C. B. (1977) Computer Methods for Mathematical Computations, pp. 192-239, Prentice-Hall, Englewood Cliffs, NJ.
21. Savitsky, A., and Golay, M. J. E. (1964) Anal. Chem. 36, 1627-1639.
22. Wang, C. P., and Isenhour, T. L. (1987) Appl. Spectrosc. 41, 185-195.
23. Levitt, M., and Greer, J. (1977) J. Mol. Biol. 114, 181-239.
24. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rogers, J. R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977) J. Mol. Biol. 112, 535-542.
25. Young, H. D. (1962) Statistical Treatment of Experimental Data, pp. 126-132, McGraw-Hill, New York.
26. Manning, M. C. (1989) J. Pharm. Biomed. Anal. 7, 1103-1119.
27. Hennessey, J. P., Jr., and Johnson, W. C., Jr. (1981) Biochemistry 20, 1085-1094.
28. Dousseau, F., Therrien, M., and Pezolet, M. (1989) Appl. Spectrosc. 43, 538-542.
29. Powell, J. R., Wasacz, F. M., and Jakobsen, R. J. (1986) Appl. Spectrosc. 40, 339-344.