ANALYTICAL BIOCHEMISTRY 257, 1–11 (1998)
ARTICLE NO. AB972502
Infrared Absorbances of Protein Side Chains
Kim Rahmelow, Wigand Hübner, and Th. Ackermann
Institut für Physikalische Chemie, Albert-Ludwigs Universität, Albertstraße 23a, D-79104 Freiburg, Germany
Received February 18, 1997
The spectral parameters of amino acid residue side chain and peptide bond absorptions in the region 1800–1440 cm⁻¹ have been obtained by using an inverse matrix method applied to the infrared spectra of 42 amino acids, dipeptides, and higher peptides in aqueous solution. In addition, the pH-dependent extinction coefficients of the amino acid and peptide COO⁻/NH₃⁺ end groups were derived. It is shown that the secondary structure prediction accuracy of proteins by multivariate data analysis methods increases slightly if the side chain absorbances of the residues asparagine, glutamine, aspartic acid, glutamic acid, arginine, tyrosine, and lysine are subtracted from the amide I and amide II region. © 1998 Academic Press

Key Words: infrared spectroscopy; protein secondary structure; side chains; peptides; quantitative spectroscopy; multivariate data analysis; prediction accuracy.
0003-2697/98 $25.00 Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.

In recent years a number of publications have shown that the secondary structure content of aqueous protein solutions can be determined by infrared spectroscopy (1–5). The results are based on the sensitivity of the peptide group absorption bands to the conformation of the polypeptide chain and to the hydrogen bonding patterns (6–8). To extract quantitative protein secondary structure information, a contour shape analysis of the amide I band of proteins in D₂O, including Fourier self-deconvolution, curve fitting procedures, and the assignment of the resultant amide I components to different secondary structure types, was presented by Byler and Susi (9). However, the possibility of an incomplete H/D exchange and uncertainties in the assignment of all spectral components within the amide I band (10) have led to the application of multivariate data analysis methods to circumvent these problems. With the aid of factor analysis and partial least squares procedures (2–5) the secondary structures of proteins in H₂O were obtained by analyzing the spectral region of the amide I/II bands. Although a calibration matrix of IR spectra of proteins with known secondary structure contents, determined by X-ray crystallography, is needed, it is advantageous that deconvolution, curve fitting techniques, and prior assignment of spectral components can be avoided by these methods.

All the investigations reported above on quantitative secondary structure prediction were based upon the assumption that the IR spectra in the amide I/II region are solely a summation of the backbone peptide group absorbances occurring in the different structural elements, like helix, sheet, turn, and random conformation. Unfortunately, several other characteristic group frequencies absorb in the spectral range of the amide bands, which might influence the prediction of secondary structures by deconvolution or various multivariate data analysis techniques. This has to be considered even for high-purity protein solutions, since other parts of the protein cause absorbances in the region where the amide I band is also found. The main disturbances to be mentioned are the side chains of the amino acid residues. For a protein with unusual composition (e.g., high glutamine content) the side chain contribution has to be subtracted prior to secondary structure prediction.

An estimation of the side chain effects on the spectral proportions of proteins in D₂O was given by Chirgadze et al. (12). In a more recent publication Venyaminov and Kalnin (13) calculated the spectral parameters of the amino acid residue absorption bands in H₂O. Band assignments and intensities were derived from amino acids and amino acid derivatives by curve fitting procedures with Gaussian/Lorentzian lineshapes.

Besides the peptide bonds, which are the basis of the secondary structure prediction, one –COO⁻ and one –NH₃⁺ group at the ends of the protein backbone chain and some functional groups in the side chains of the different amino acid residues contribute to the spectral region 1750–1480 cm⁻¹.
Intense absorption in the amide I/II region is caused by the side chain carboxylate group of aspartic and glutamic acid, by the
side chain amide group of asparagine and glutamine, by the guanidinium group of arginine, by the amino group in the side chain of lysine, and by the ring vibrations of the phenolic side chain group in tyrosine. In our approach we have used matrix inversion techniques, applied to a data set of 42 IR spectra including amino acids, di- and higher peptides dissolved in H₂O, to derive the extinction coefficients of amino acid side chains, peptide bonds, and protein end groups in the range 1800–1440 cm⁻¹.

MATERIALS AND METHODS
As model substances for deriving the side chain absorptions of proteins we use amino acids and short peptides. These are available with various combinations of side chains. In contrast to proteins, their side chain absorbances are not obscured by dominating amide bands. Assuming that strong coupling is absent, i.e., that the spectrum of a peptide is a linear combination of the contributions of the peptide bonds, the end groups, and the various side chains, the absorbances of the individual groups can be separated by means of matrix inversion techniques. The principle of the method used here is to generate a matrix C (concentrations), which contains the respective amounts of the absorbing groups, and to multiply its inverse with the matrix A (absorbances) made up of the measured extinction coefficients.¹ The rows of the matrix A contain the experimental extinction coefficients (per mole amino acid residue) of the amino acids, di-, tri-, and higher peptides. Each row of C consists of the linear coefficients needed to assemble the peptides from the individual groups. Based upon the matrix equation

$$A = RC, \qquad [1]$$

spectra for the distinguishable absorbing amino acid side chains, the peptide bond, and the –COO⁻/–NH₃⁺ end groups are calculated according to the matrix equation

$$R = A C^{T} (C C^{T})^{-1}, \qquad [2]$$

where T denotes a transposed matrix. In this special case each row of the matrix R is thus made up of the molar extinction coefficients (E0 in liter/mol cm) of one of the above-mentioned residues between 1800 and 1440 cm⁻¹. The quality of the calculated resultant component spectra substantially depends on the exact knowledge of all factors that contribute to the absorption in the examined spectral range. If components are considered that actually do not absorb in the given wavenumber range, virtual spectra are generated for these species, which have no physical meaning but explain the measurements mathematically in the best possible sense. On the other hand, the spectral contents of omitted species are distributed upon the other group absorbances, thus leading to a deterioration of the results.

¹ At this stage it may be advisable to employ factor analysis techniques to stabilize the result.

Peptides Used

To apply the described principle we have measured nine amino acids, 23 dipeptides, and seven tri- or tetrapeptides, as well as three polypeptides in aqueous solution. The exact composition of the data set is documented in Table 1. The amino acids and di-, tri-, tetra-, and polypeptides used for the determination of side chain absorbances were obtained from Sigma (except Lys–Lys, from Serva Heidelberg). Extinction coefficients were determined by exactly weighing the dry peptides and by determining the exact cell thickness through a two-area water subtraction routine (19). Given the limitations of this type of measurement, an overall error of about 10% can be expected.

Infrared Spectroscopy

Infrared spectra were obtained with a Bruker IFS 113v FTIR² spectrometer equipped with an MCT detector. A total of 1024 scans were averaged for each sample. All spectra were recorded at a resolution of 2 cm⁻¹ and apodized with a triangular function and a zerofilling factor of 2, resulting in data encoded every 1 cm⁻¹. The sample compartment was continuously purged with dry air. Concentrations of the sample solutions were approximately 30–40 mg/ml, and 7.5 µl thereof was measured in a demountable cell with two CaF₂ windows separated by a 6-µm Teflon spacer. Water subtraction for the measurements was carried out in two spectral ranges with straight baseline: 2300–1800 and 4000–3650 cm⁻¹.

RESULTS AND DISCUSSION
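For orientation, the matrix relations of Eqs. [1] and [2] can be sketched numerically. The following Python/NumPy fragment is only a minimal sketch and not the program used for this work. It assumes that each sample occupies one column of A and of C, which is the orientation for which A = RC is dimensionally consistent; the function name `group_spectra` and the synthetic data are hypothetical.

```python
import numpy as np

def group_spectra(A, C):
    """Least-squares solution of A = R C, i.e. R = A C^T (C C^T)^-1.

    A : (n_wavenumbers, n_samples) extinction coefficients, one column per sample
    C : (n_groups, n_samples) amount of each absorbing group per residue and sample
    Returns R : (n_wavenumbers, n_groups), one column per absorbing group.
    """
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    # lstsq solves C^T R^T = A^T; this avoids forming (C C^T)^-1 explicitly
    R_transposed, *_ = np.linalg.lstsq(C.T, A.T, rcond=None)
    return R_transposed.T

# Synthetic check with made-up numbers: 3 groups, 8 samples, 5 wavenumber points
rng = np.random.default_rng(0)
R_true = rng.random((5, 3))
C = rng.random((3, 8))
A = R_true @ C

R = group_spectra(A, C)
R_explicit = A @ C.T @ np.linalg.inv(C @ C.T)   # Eq. [2] written out literally
```

Both routes give the same result for a well-conditioned C. The least-squares call is used here because it degrades more gracefully when two groups are nearly indistinguishable, which is the situation the distinguishability check described in the text is meant to detect.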
The 42 samples were compiled in such a manner that for each relevant amino acid residue at least three different compounds were measured. Additionally, it has to be verified that all selected groups are distinguishable and thus predictable in the investigated spectral range. For this purpose we have developed a verification procedure based on predictions using singular value decomposition (11). To this end, we have calculated relative standard errors of prediction by means of column cross validation as described in a previous article (16).

² Abbreviations used: FTIR, Fourier transform infrared; MCT, mercury cadmium telluride; SVD, singular value decomposition; SEP, standard error of prediction.

TABLE 1
Composition of the Peptides Contained in the Data Set

Sample               Order No.   pH     Concentration
Alanine              A-7627      6.2    0.7116
Aspartic acid        A-9256      6.0    0.33316
Glutamic acid        G-1251      8.0    0.3665
Phenylalanine        P-2126      8.4    0.16035
Glycine              G-7126      6.3    0.6287
Lysine               L-5626      5.8    0.28149
Asparagine           A-0884      8.6    0.4029
Glutamine            G-3126      8.3    0.25507
Arginine             A-5131      6.1    0.29521
Gly–Ala–Ala          G-4504      5.5    1.0870
Glu–Gly–Phe          G-3501      7.3    0.50898
Ala–Gly–Gly          A-1378      5.6    0.8784
Ala–Leu–Ala          A-3671      5.7    0.4863
(Ala–Leu)₂           A-3546      8.0    0.31968
Ala₃                 A-9627      5.8    0.6537
Ala₄                 A-4900      8.0    0.55243
Ala–Ala              A-9502      5.7    0.7603
Ala–Asp              A-0253      7.0    0.50542
Ala–Glu              A-0378      7.0    0.59855
Ala–Gly              A-0878      7.4    0.57994
Ala–Asn              A-9877      5.65   0.3382
Asp–Ala              A-1277      6.7    0.60329
Asp–Asp              A-6416      7.5    0.41960
Asp–Glu              A-1916      6.7    0.37491
Asp–Phe              A-7660      7.7    0.32703
Asp–Gly              A-8634      7.0    0.43315
Asp–Gln              A-1791      7.0    0.23664
Glu–Glu              G-3640      7.8    0.29166
Glu–Lys              G-3390      6.0    0.32184
Glu–Gln              G-2634      7.5    0.40795
Gly–Ala              G-0502      5.9    0.6489
Lys–Ala              L-5127      8.0    0.27998
Lys–Gly              L-5752      5.7    0.5223
Lys–Lys              51990       5.2    0.39635
Polyaspartic acid    P-5387      7.3    0.3654
Polylysine           P-7886      6.9    0.25722
Polyasparagine       P-8137      5.0    0.4384
Arg–Glu              A-0261      5.9    0.4366
Arg–Gly              A-6301      6.6    0.42839
Tyr–Leu              T-9878      8.0    0.21520
Tyr–Glu              T-2382      6.3    0.21994
Tyr–Tyr              T-5504      7.8    0.19854

Note. Specified are the catalog number, the pH value, and the concentrations for the respective measurements, the latter in mole residues per liter.

In the following sections we will illustrate the spectra of the distinguishable (relative SEP < 1) absorbing groups. The unit of the extinction coefficient for all resulting spectra is liter/mol cm. To emphasize that our approach makes no assumptions about underlying components we deliberately did not resolve the resulting spectra into single bands by fitting certain band profiles. For comparison, however, we include the approximate band parameters wavenumber and extinction coefficient at the band maximum, a rough estimate of
the band width, and an assignment (13) to a vibrational mode for all residues in Table 2. The integrated intensity values are listed in Table 3.

Side Chain Spectra

Aspartic and glutamic acid. Both aspartate and glutamate have terminal side chain carboxyl groups and, therefore, their spectra are similar (Fig. 1). Mainly the antisymmetric –COO⁻ stretching vibration appears in the relevant spectral region. The remaining differences result from the different side chain length.

Asparagine and glutamine. Asparagine and glutamine both contain amide groups at the end of the side chain. The similarities of their side group spectra are clearly indicated in Fig. 2. In the amide region the –C=O stretching vibration and the –NH₂ deformation vibration give rise to two absorbance bands. The slight shift to higher wavenumbers of the band assigned to the –C=O stretching vibration of asparagine in comparison to glutamine, and the intensity differences between the two NH₂ deformation bands, are due to the increase of the side chain length by one CH₂ group.

Arginine and lysine. Arginine and lysine both have basic amino side chain end groups. Owing to the different structural characteristics of the guanidinium group in arginine and the protonated primary amine group in lysine, their side group spectra are quite different (Fig. 3). The symmetric and antisymmetric –CN₃H₅ stretching vibrations of the arginine side chain give rise to two strong absorbance bands, whereas the symmetric and antisymmetric –NH₃⁺ bending vibrations of lysine cause only two bands with low integral intensity.

Tyrosine. The phenolic side chain absorption of tyrosine in the amide region is caused by skeletal vibrations of the aromatic ring (Fig. 3). The three absorption bands at 1614, 1599, and 1517 cm⁻¹ are also characteristic marker bands in phenol derivatives like cresols.

The values given in Tables 2 and 3 can be compared with amino acid side chain absorptions published by Venyaminov and Kalnin (13).
Although this group used curve fitting procedures, applied to amino acids and amino acid derivatives, to obtain the spectral characteristics, the results are in good agreement with those obtained by our simple linear matrix model. It should be noted that in the case of overlapping bands our half-width and maxima values are only rough estimates, which could be refined by fitting procedures. The overall band shape (compare Figs. 1–3 with the figures in Ref. (13)) and the integrated intensities are in remarkable agreement. With the exception of lysine,
TABLE 2
Spectral Parameters of Bands in the Amide Region (Intensities in liter/cm mol)

Side chain             Position (cm⁻¹)   Intensity   Width     Assignment
Aspartate              1579              289         55        –COO⁻, stretch, antisymmetric
Glutamate              1556              449         45        –COO⁻, stretch, antisymmetric
Asparagine             1677              325         30        –C=O, stretch
Asparagine             1612              144         39        –NH₂, deformation
Glutamine              1668              377         26        –C=O, stretch
Glutamine              1611              238         25        –NH₂, deformation
Arginine               1672              485         30        –CN₃H₅, stretch, antisymmetric
Arginine               1636              336         16        –CN₃H₅, stretch, symmetric
Lysine                 1626              63          61        –NH₃⁺, deformation, antisymmetric
Lysine                 1527              69          44        –NH₃⁺, deformation, symmetric
Tyrosine               1614              85          8         Unassigned ring vibration
Tyrosine               1599              72          9         Unassigned ring vibration
Tyrosine               1517              339         10        Ring vibration
Amino acid end group   1630              401         83        –NH₃⁺, deformation, antisymmetric
Amino acid end group   ≈1600             ≈375        Covered   –COO⁻, stretch, antisymmetric
Amino acid end group   1517              189         38        –NH₃⁺, deformation, symmetric
Peptide end group      1677              247         24        Not assigned
Peptide end group      ≈1630             ≈330        Covered   –NH₃⁺, deformation, antisymmetric
Peptide end group      1582              575         55        –COO⁻, stretch, antisymmetric
Peptide bond           1647              312         40        Amide I
Peptide bond           1546              173         60        Amide II
all calculated extinction coefficients are within the given error range.³

³ In Ref. (13), the integrated intensity values are calculated incorrectly and have to be divided by a factor of 2.

End Groups

Since the amino and carboxyl end groups always occur as pairs in amino acids as well as in peptides and proteins, they cannot be separated with the described method. On the other hand, it is not of interest to distinguish between them and, therefore, only one common spectrum is assigned here. However, it proves insufficient to treat all end groups uniformly and independently of their environment. In fact, a strong effect of the pH value upon the end group absorptions is observed. Apart from that, it also makes a difference whether the end groups are part of an amino acid or of a peptide. The differences between end groups of di-, tri-, or longer peptides are less severe, which is in accordance with the observed pK values (17). The pK value changes by about 1 unit when going from an amino acid to the respective dipeptide, and only a minor increase is observed for longer peptides. This behavior explains the spectroscopic differences and, therefore, the end groups of these two species have to be considered separately. They are predicted independently of each other and hence give rise to two different spectra.

Dependence upon the pH Value

By assuming a linear dependence of the end group spectra on pH between 5 and 9, the pH values of the different end groups are recognized by SVD in addition to the side chain contributions. Moreover, the reconstruction of the experimental spectra from the calculated structure spectra is improved. By including the measured pH values in our calculation we obtained a better model for the description of the peptide spectra and, as a result, refined spectra of the end groups. Figure 4 displays the spectrum of the end groups in amino acids at pH 7 as well as the difference spectrum induced by an increase of one pH unit. The pH-dependent spectral characteristics for the end groups of peptides are given in Fig. 5.

For amino acid end groups a change from pH 7 to 8 results mainly in an intensity increase of a band at 1565 cm⁻¹, which can be assigned to an –NH₂ deformation vibration. One possible explanation is an increased deprotonation of the amino groups as the medium becomes more basic. An alternative explanation of the intensity change around 1565 cm⁻¹ at higher pH values could be that the electrostatic and hydrogen bonding interactions between the –NH₃⁺ and –COO⁻ groups are decreased by the formation of –NH₂ groups. In this case the –COO⁻ groups behave like undisturbed carboxylates, with absorption bands as in, e.g., glutamate. In the case of peptidic end groups a change of 1 pH unit is observed mainly as a gain in intensity of the –NH₃⁺ deformation vibration at 1636 cm⁻¹. These observations show that the spectral range in which the NH deformation vibration occurs is very sensitive to a pH change, and absorption changes are observed even for the lysine residue. With increasing
TABLE 3
Integrated Intensities and Frequencies for the Various Absorbing Groups in the Amide Range and Relative Contribution of These Bands to an Average Protein Absorbance

Amino acid      Integrated intensity   Frequency (%)   Absorbance   Contribution (%)
Aspartate       22,590                 5.2             1175         3.0
Glutamic acid   32,430                 4.9             1589         4.1
Arginine        35,500                 3.3             1171         3.0
Lysine          5,960                  6.9             411          1.0
Asparagine      23,220                 5.0             1161         3.0
Glutamine       29,220                 4.0             1169         3.0
Tyrosine        7,680                  3.7             284          0.7
End groups      61,640                 0.6             370          0.9
Peptide bond    32,050                 99.4            31858        81.3

Note. The integrated intensities are in liter/cm² mol.
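The arithmetic behind Table 3 can be retraced in a few lines. The sketch below is illustrative only: it takes the integrated intensities and frequencies from Table 3 and forms the weighted absorbance of each group and its share of the total. The dictionary layout and the function name `contributions` are our own choices and are not part of the original evaluation.

```python
# Integrated intensity (liter/cm^2 mol) and frequency (%) as listed in Table 3
table3 = {
    "Aspartate":     (22590, 5.2),
    "Glutamic acid": (32430, 4.9),
    "Arginine":      (35500, 3.3),
    "Lysine":        (5960, 6.9),
    "Asparagine":    (23220, 5.0),
    "Glutamine":     (29220, 4.0),
    "Tyrosine":      (7680, 3.7),
    "End groups":    (61640, 0.6),
    "Peptide bond":  (32050, 99.4),
}

def contributions(table):
    """Weight each integrated intensity by its frequency and express it as a share of the sum."""
    absorbance = {name: intensity * freq / 100.0
                  for name, (intensity, freq) in table.items()}
    total = sum(absorbance.values())
    percent = {name: 100.0 * value / total for name, value in absorbance.items()}
    return absorbance, total, percent

absorbance, total, percent = contributions(table3)
non_peptide_share = 100.0 - percent["Peptide bond"]   # side chains plus end groups
```

With the tabulated values the sum of the weighted absorbances comes to roughly 39,200 liter/cm² mol, of which the peptide bond accounts for about 81%.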
pH the intensity of the antisymmetric –NH₃⁺ deformation vibration at 1626 cm⁻¹ increases and the extinction coefficient of the symmetric –NH₃⁺ vibration at 1527 cm⁻¹ decreases.

It is important to note that we have measured our samples in the pH range of 5–9 and, therefore, changes in the protonated/deprotonated state, especially of the amino groups, are inherent in our spectral data set. In this limited pH range, which is also applied for most investigations of proteins, undissociated side chain –COOH groups are largely absent. We are aware that a linear model for the pH-induced spectral changes is only an approximation of the real situation. However, the errors introduced at pH values of 7 ± 1 are of minor importance. In addition, a more correct model would require exact knowledge of all individual pK values.

Peptide Bond

Due to differences between various protein tests for the determination of protein concentrations, accurate extinction coefficients are rarely measured. If they are not known, one usually resorts to a normalization of the integrated intensity. To establish the contribution of a side chain, a precise knowledge of the peptide bond spectrum of the protein backbone is required. With the calculated integral intensities of a peptide bond and the known number of peptide bonds
FIG. 1. Side chain spectra of aspartate at pH 6.9 (—) and glutamate at pH 7.1 (– –).
FIG. 2. Side chain spectra of asparagine at pH 6.7 (– –) and glutamine at pH 7.8 (—).
it is possible to obtain the relative spectral contribution of the side chains in a protein from the normalized spectrum. Since several peptide bonds are inherent in our data set, the necessary extinction coefficients are automatically calculated. The obtained spectrum, displayed in Fig. 6, corresponds to a peptide bond without any preferred conformation and should be independent of the respective residues. This may be different from the spectrum of an isolated peptide such as N-methylacetamide; however, it should be remembered
FIG. 3. Side chain spectra of arginine at pH 6.1 (—), lysine at pH 6.0 (- - -), and tyrosine at pH 7.4 (– –).
FIG. 4. Amino acid end group spectrum at pH 7 (—) and spectral changes according to an increase of 1 pH unit.
that these spectral features are derived from the spectra of various peptides and, therefore, should represent the amide I/II spectrum of a protein peptide backbone and should be more appropriate for the evaluation of protein spectra.
The only exception is the peptide bond of a proline residue, which possesses no additional NH proton because of the ring formation, and hence the corresponding amide II band is absent. This is illustrated by the spectrum of polyproline in Fig. 6.
FIG. 5. Peptidic end group spectrum at pH 7 (—) and spectral changes according to an increase of 1 pH unit.
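One way to read the linear pH model behind Figs. 4 and 5 is that each end group contributes a reference spectrum at pH 7 plus a difference spectrum scaled by (pH − 7). The following Python/NumPy sketch shows how such a term could enter the composition matrix. It is a sketch under that assumption, not the procedure actually used; the function names and all numbers are hypothetical.

```python
import numpy as np

def composition_with_ph(side_chain_counts, end_group_counts, pH, ref_pH=7.0):
    """Build C (groups x samples) with an extra row carrying the pH dependence.

    Row order: side chain rows, end group at ref_pH, end group change per pH unit.
    """
    side = np.atleast_2d(np.asarray(side_chain_counts, dtype=float))
    ends = np.asarray(end_group_counts, dtype=float)
    shift = ends * (np.asarray(pH, dtype=float) - ref_pH)
    return np.vstack([side, ends, shift])

def end_group_spectrum(R, pH_value, ref_pH=7.0):
    """End group spectrum at a given pH from the last two columns of R."""
    return R[:, -2] + (pH_value - ref_pH) * R[:, -1]

# Synthetic data: one side chain type, one end group type, six samples
rng = np.random.default_rng(1)
eps_side = rng.random(4)           # side chain spectrum (4 wavenumber points)
eps_end_7 = rng.random(4)          # end group spectrum at pH 7
d_eps_end = 0.1 * rng.random(4)    # change per pH unit
R_true = np.column_stack([eps_side, eps_end_7, d_eps_end])

side_counts = np.array([0.0, 1.0, 0.5, 0.5, 1.0, 0.0])
end_counts = np.array([1.0, 1.0, 0.5, 0.5, 0.33, 0.25])
pH = np.array([5.5, 6.0, 7.0, 8.0, 7.5, 8.5])

C = composition_with_ph(side_counts, end_counts, pH)
A = R_true @ C                                        # simulated measurements
R_fit = np.linalg.lstsq(C.T, A.T, rcond=None)[0].T    # same solve as for Eq. [2]
```

Calling `end_group_spectrum(R_fit, 8.0)` then returns the pH 7 spectrum plus one difference spectrum, which corresponds to the two traces shown in each of Figs. 4 and 5.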
FIG. 6. Calculated spectrum for a peptide bond (—) and polyproline spectrum measured at pH 4.3.
Contribution to a Protein Spectrum

The part of a protein spectrum (assumed to be additively composed) that is made up by the various side chains depends on the frequency of the different amino acids in the polypeptide or protein. This relation may vary considerably, but as a first approximation the side chain absorption in the amide region can be obtained by averaging the known composition of a number of different proteins.

Integrated Intensity

Two factors determine the quantitative contribution of the side chains to the spectrum: the intensity of the bands and the respective frequency of the residues in the amino acid sequence. The integrated intensities in the range of 1700–1480 cm⁻¹ weighted with the average occurrence per residue (in 39 proteins investigated by infrared spectroscopy (16)) are compiled in Table 3. For such an average protein spectrum an integrated intensity in the amide I and II region of about 39,200 liter/cm² mol results. Almost 20% of the intensity is due to the side chain absorptions. Figures 7 and 8 serve as examples of how protein spectra are composed and how the side chain contributions may vary. For concanavalin A the side chain contribution amounts to 16.7%, and for erabutoxin to 22%.

The fact that 20% of a protein absorption in the amide region is caused by the side chains does not automatically imply an effect on the quality of a secondary structure prediction by multivariate data analysis. The procedures will cope with a constant absorbance contribution because it is simply treated as an offset. Only a large variation of the residues with absorbing groups within the different proteins will result in an interpretation of these contributions by a structure prediction and hence cause a disturbance. Therefore, it is important to know the standard deviation within the side chain contributions. Figure 9 displays the absolute side chain absorbance compared to the total protein absorbance and the standard deviations of protein spectra and side chain spectra. The SVD method will use variations in the spectra to calculate the structural properties of proteins as deviations from the average content. Therefore, Baumruk et al. used difference spectra to improve the numerical stability of their factor analysis method (5). However, using our method for difference spectra, no improved predictions are observed, because all coefficients are included for the prediction and thus a model for the difference spectra is equivalent to the model for the original spectra with one additional factor. When comparing the standard deviation of the different side chains with the standard deviation of the protein spectra, one recognizes that about 25% of the respective differences is caused by the amino acid composition. A precise secondary structure determination has to take this into account.

Subtraction of Side Chain Contributions and Prediction

Subtracting from protein spectra the corresponding fractions that do not arise from peptide bonds should
FIG. 7. Spectrum of concanavalin A (—) separated into peptide (– –) and side chain contribution (- - -).
lead to pure peptide spectra, which ought to possess better correlations with the secondary structures. As an example, this is illustrated by the two spectra of concanavalin A and erabutoxin. Both proteins exhibit high β-sheet (40.5% for concanavalin A and 43.5% for erabutoxin) and no helical content (calculated from X-ray data by the program DSSP (18)). However, the spectra are quite different, and a structure prediction directly from the raw IR spectra is in both cases not particularly successful. The β-sheet content of the
FIG. 8. Spectrum of erabutoxin (—) separated into peptide (– –) and side chain contribution (- - -).
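The separation shown in Figs. 7 and 8 amounts to subtracting the composition-weighted side chain spectra from the protein spectrum and normalizing what remains. The Python/NumPy fragment below is a minimal sketch of that step. It assumes that the protein spectrum is already on a per-residue extinction coefficient scale and that the wavenumber axis is ascending, and it omits the fine tuning of the water subtraction; the function names and the synthetic Gaussian bands are hypothetical.

```python
import numpy as np

def integrate(wavenumbers, spectrum):
    """Trapezoidal integral; wavenumbers are assumed to be in ascending order."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(spectrum, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def peptide_spectrum(wavenumbers, protein, side_chain_spectra, residue_fractions):
    """Subtract weighted side chain spectra, then normalize to unit integrated intensity."""
    side = np.asarray(residue_fractions, dtype=float) @ np.asarray(side_chain_spectra, dtype=float)
    corrected = np.asarray(protein, dtype=float) - side
    return corrected / integrate(wavenumbers, corrected)

def gaussian(x, center, height, width):
    """Synthetic band used only to build the example below."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

wn = np.arange(1480.0, 1701.0)   # 1480 to 1700 cm^-1 in steps of 1 cm^-1
backbone = gaussian(wn, 1647, 312, 17) + gaussian(wn, 1546, 173, 25)
side_chains = np.vstack([gaussian(wn, 1579, 289, 23),    # e.g. an aspartate-like band
                         gaussian(wn, 1672, 485, 13)])   # e.g. an arginine-like band
fractions = np.array([0.052, 0.033])                     # residues per residue of protein
protein = backbone + fractions @ side_chains

pure = peptide_spectrum(wn, protein, side_chains, fractions)
```

In this synthetic case `pure` equals the backbone spectrum scaled to unit area, which is the form in which spectra of different proteins can be compared.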
FIG. 9. Averaged protein spectrum (—), standard deviation of the 39 protein spectra (– –), spectrum of the side chain contributions (- - -), and standard deviation of the side chain spectra (· · ·).
toxin is underestimated and a high helical content is predicted, whereas in the other case the β-sheet content estimate is too high (the average errors are 20 and 17.4%, respectively). After a correction for side chain absorptions the spectra appear more similar, and the prediction errors drop to 14.2 and 5.7%, respectively. For individual proteins one may always find improved predictions. Therefore, all 39 spectra of the data set are tested by column cross validation and the results are judged based on the standard error of prediction as explained in a previous article (16).

Normalization and Amide Intensity Ratio

Before starting with this systematic verification, some assumptions have to be checked to determine whether they are still fulfilled after the corrections are made for side chain absorbances. The normalization has to be mentioned first, since it remains a basic assumption that all secondary structure types exhibit the same integrated intensity. The standard of a common band intensity has to be achieved by a normalization after compensating for other variable contributions. Furthermore, our water subtraction strategy for proteins in solution (19) is based on constant intensity ratios of the amide bands. Hence, a fine tuning of the water subtraction, which readjusts the intensity ratio of the two amide bands to 1.393, is inserted before normalizing. This constant amide I/amide II intensity ratio is derived from an investigation of dried protein films with different secondary structure content (19). Very little variation of this factor was observed, while the proteins used covered the whole range of structural contents, thus making it possible to incorporate it into an accurate subtraction algorithm.

The assumption that the integrated intensity of both amide bands is independent of the secondary structure type is widely used. However, an investigation on polypeptides and fibrous proteins indicates that the integrated intensity of the amide I band of a pure α helix or a pure β sheet is larger than that of a pure random conformation, whereas all three conformations exhibit equal values for the amide II band (14, 15). In our opinion the possibility of an erroneous water subtraction and their use of extreme pH values, as well as the inclusion of ORD results for the calculation of a 100% helix and sheet structure, may explain the differences. It is interesting to note that the integrated intensity of the amide I band in a 100% random structure obtained directly from a measurement of polyalanine at pH 7 is in accordance with our results (compare Fig. 1 of Ref. (14) with the spectrum in Fig. 6). This group also reported that for an α helix the integrated intensity of the amide I band in D₂O equals that of the random conformation in H₂O. In summary, their results of a constant amide II band intensity and our findings of a constant amide I/amide II ratio lead to the conclusion that the integrated intensity of amide bands can be considered as
independent of the peptide backbone conformation. This justifies a normalization of the different protein spectra after subtraction of the side chain absorbances.

For the thus-pretreated data of 39 proteins with known X-ray structure (16), an SVD analysis indeed yields a better secondary structure prediction when compared to the same data without side chain subtraction. The earlier reported optimum for seven factors with SEP values of 9.2 for β sheet and 12.3 for helix is improved, especially for β sheets (8.2 and 12.2). Similar results are obtained for other sets of parameters as well (e.g., with six factors 8.1 and 12.2). The improvement of the prediction accuracy is small but significant. More profound effects can be expected if proteins with unusually high amounts of absorbing side chains are investigated.

CONCLUSIONS
Based on our investigations we will condense our results to several characteristic points:
● Useful estimates of protein side chain absorbances can be obtained from measurements of amino acids and di-, tri-, and polypeptides. In this context it is remarkable how the spectral parameters derived by our matrix method are in agreement with the parameters derived by Venyaminov and Kalnin (13) using curve fitting procedures.
● –COO⁻ and –NH₃⁺ end groups of amino acids and peptides differ in their spectroscopic properties. Their extinction coefficients are strongly dependent on the pH, but the effects can be sufficiently described by a simple linear matrix model.
● Almost 20% of the protein absorption in the amide region is caused by the side chains.
● The variation of the amino acid side chains is large, and suggests compensation for side chain absorbance.
● This subtraction improves the secondary structure prediction. The better correlation of the pure peptide absorbances is documented by lower SEP values for various multivariate data analysis methods.
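For readers unfamiliar with SEP values, the following Python/NumPy sketch shows one generic way to obtain a cross-validated standard error of prediction from a truncated SVD regression. It is not the column cross validation procedure of Ref. (16), whose details are not reproduced here. We assume a plain leave-one-out scheme without mean centering, and the function name `loo_sep` and the synthetic data are hypothetical.

```python
import numpy as np

def loo_sep(spectra, content, n_factors):
    """Leave-one-out SEP for a truncated-SVD regression of content on spectra.

    spectra : (n_proteins, n_wavenumbers) one pretreated spectrum per row
    content : (n_proteins,) known structure content, e.g. from X-ray data
    """
    spectra = np.asarray(spectra, dtype=float)
    content = np.asarray(content, dtype=float)
    n = spectra.shape[0]
    predicted = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        U, s, Vt = np.linalg.svd(spectra[keep], full_matrices=False)
        k = n_factors
        # Regression vector from the k largest singular values only
        coef = Vt[:k].T @ ((U[:, :k].T @ content[keep]) / s[:k])
        predicted[i] = spectra[i] @ coef
    sep = float(np.sqrt(np.mean((predicted - content) ** 2)))
    return sep, predicted

# Synthetic, noise-free example with three underlying factors
rng = np.random.default_rng(2)
scores = rng.random((12, 3))
loadings = rng.random((3, 20))
spectra = scores @ loadings
content = scores @ np.array([40.0, 25.0, 10.0])

sep, predicted = loo_sep(spectra, content, n_factors=3)
```

For these noise-free synthetic data the SEP is zero within rounding error. With measured spectra the value depends on the number of factors retained, which is why results for six and seven factors are quoted in the text.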
REFERENCES
1. Goormaghtigh, E., Cabiaux, V., and Ruysschaert, J. M. (1990) Eur. J. Biochem. 193, 409–420.
2. Dousseau, F., and Pézolet, M. (1990) Biochemistry 29, 8771–8779.
3. Lee, D. C., Haris, P. I., Chapman, D., and Mitchell, R. C. (1990) Biochemistry 29, 9185–9193.
4. Sarver, R. W., and Krueger, W. C. (1991) Anal. Biochem. 194, 89–100.
5. Baumruk, V., Pancoska, P., and Keiderling, A. (1996) J. Mol. Biol. 259, 774–791.
6. Elliot, A., and Ambrose, E. J. (1950) Nature 165, 921–922.
7. Miyazawa, T. (1960) J. Chem. Phys. 32, 1647–1652.
8. Krimm, S., and Bandekar, J. (1986) Adv. Prot. Chem. 38, 181.
9. Susi, H., and Byler, D. M. (1986) Methods Enzymol. 130, 290–311.
10. Surewicz, W. K., Mantsch, H. H., and Chapman, D. (1993) Biochemistry 32, 389–394.
11. Compton, L. A., and Johnson, W. C. (1986) Anal. Biochem. 155, 155–167.
12. Chirgadze, Y. N., Fedorov, O. V., and Trushina, N. P. (1975) Biopolymers 14, 679–694.
13. Venyaminov, S. Y., and Kalnin, N. N. (1990) Biopolymers 30, 1243–1257.
14. Venyaminov, S. Y., and Kalnin, N. N. (1990) Biopolymers 30, 1259–1271.
15. Kalnin, N. N., Baikalov, I. A., and Venyaminov, S. Y. (1990) Biopolymers 30, 1273–1280.
16. Rahmelow, K., and Hübner, W. (1996) Anal. Biochem. 241, 5–13.
17. Cantor, C. R., and Schimmel, P. R. (1980) Biophysical Chemistry, Freeman, New York.
18. Kabsch, W., and Sander, C. (1983) Biopolymers 22, 2577–2637.
19. Rahmelow, K., and Hübner, W. (1997) Appl. Spectrosc. 51, 160–170.