Vibrational Spectroscopy 35 (2004) 87–92
Structural characterization of proteins and viruses using Raman optical activity
Ewan W. Blanch a,*, Iain H. McColl b, Lutz Hecht b, Kurt Nielsen c, Laurence D. Barron b,1
a Department of Biomolecular Sciences, UMIST, P.O. Box 88, Manchester M60 1QD, UK
b Department of Chemistry, University of Glasgow, Glasgow G12 8QQ, UK
c Department of Chemistry, DTU 207, Technical University of Denmark, DK-2800 Lyngby, Denmark
Received 17 September 2003; received in revised form 27 November 2003; accepted 1 December 2003
Available online 28 January 2004
Abstract
The sensitivity to chirality of Raman optical activity (ROA), which we measure as a small difference in vibrational Raman scattering from chiral molecules in right- and left-circularly polarized incident light, makes it a powerful probe of the structure and behaviour of biomolecules in aqueous solution. Protein ROA spectra provide information on the secondary and tertiary structure of the polypeptide backbone and effects of the local environment, including hydration and the side chain conformations of particular residues. The large number of structure-sensitive bands in protein ROA spectra is especially favourable for fold determination using pattern recognition techniques. Intact viruses may also be studied and their ROA spectra provide information not only on the folds of the coat proteins but also about the conformation of the encapsidated nucleic acid. In this article we present the ROA spectra of several proteins and viruses in order to illustrate some of the applications of ROA spectroscopy in biomolecular research.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Raman optical activity; Chirality; Protein conformation; Fold recognition; Virus structure
* Corresponding author. Tel.: +44-161-200-5819; fax: +44-161-236-0409. E-mail address: [email protected] (E.W. Blanch).
1 Fax: + [email protected].
0924-2031/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.vibspec.2003.12.005
1. Introduction
High resolution nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography dominate the field of structural biology due to their ability to reveal the details of biomolecular structure at atomic resolution and will continue to be invaluable in the future. However, there are limitations to their applicability as many biomolecules are difficult to crystallize while many others are too large to be solved by current NMR methods. Although vibrational spectroscopic techniques do not usually provide information at atomic resolution, they will become increasingly valuable since they can be routinely applied to a wide range of biological systems and provide a large amount of information about structure and dynamics [1–4]. A promising vibrational spectroscopic technique for the study of biological molecules is Raman optical activity (ROA), a novel form of chiroptical spectroscopy which measures a tiny difference in the intensity of vibrational Raman scattering from chiral molecules in right- and left-circularly polarized incident light or, equivalently, a small circularly polarized component in the scattered light [3–6]. The incisiveness of ROA in probing biomolecular structure and behaviour derives from the fact that, like the complementary technique of vibrational circular dichroism (VCD) [3,4,6,7], it is a form of vibrational optical activity and is therefore sensitive to chirality associated with all the 3N − 6 fundamental molecular vibrational transitions, where N is the number of atoms. This article presents a brief survey of the application of ROA to biomolecular science from several recent studies on proteins and viruses.
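As a rough illustration of the quantity described above, the sketch below forms the ROA signal as the point-by-point difference between spectra recorded in right- and left-circularly polarized incident light, and evaluates 3N − 6 for a nonlinear molecule. The function names are hypothetical, and the normalized ratio (the dimensionless circular intensity difference) is a common convention in the ROA literature rather than something defined in this article.

```python
def roa_difference(i_right, i_left):
    """Point-by-point ROA signal I_R - I_L for two equally sampled spectra."""
    if len(i_right) != len(i_left):
        raise ValueError("spectra must share the same wavenumber grid")
    return [r - l for r, l in zip(i_right, i_left)]


def circular_intensity_difference(i_right, i_left):
    """Dimensionless ratio (I_R - I_L) / (I_R + I_L) at each spectral point.

    Assumption: this normalization is a literature convention, not taken
    from the article. Points with no Raman intensity are reported as 0.0.
    """
    if len(i_right) != len(i_left):
        raise ValueError("spectra must share the same wavenumber grid")
    ratios = []
    for r, l in zip(i_right, i_left):
        total = r + l
        ratios.append((r - l) / total if total > 0 else 0.0)
    return ratios


def fundamental_modes(n_atoms):
    """Number of fundamental vibrations, 3N - 6, of a nonlinear molecule."""
    if n_atoms < 3:
        raise ValueError("a nonlinear molecule needs at least three atoms")
    return 3 * n_atoms - 6
```

Because the difference is small compared with the parent Raman intensity, the two input spectra need to be acquired under identical conditions before they are subtracted.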
2. Experimental
The normal vibrational modes of biopolymers are highly complex as they contain contributions from local vibrational coordinates in both the backbone and the side chains. ROA is able to cut through the complexity of the corresponding vibrational spectra as the largest ROA signals originate from those vibrational coordinates which sample the most rigid
and chiral parts of the structure, which are usually found within the backbone and give rise to ROA band patterns characteristic of the backbone conformation. As a consequence, whereas the conventional Raman spectra of proteins are usually dominated by bands arising from the amino acid side chains, which may obscure the peptide backbone bands, ROA spectra are dominated by bands characteristic of secondary, loop and turn structures. The apparatus and sample conditions used to measure the ROA spectra presented here have been described previously [8–10].
3. Results and discussion
3.1. Proteins
Fig. 1a–c shows the backscattered Raman and ROA spectra of the α-helical protein human serum albumin (HSA), the jelly roll β-barrel protein jack bean concanavalin A and the α + β protein hen lysozyme, respectively. The folds of these proteins are very different to each other and this is reflected in the large differences between their ROA spectra. Furthermore, the main features of each ROA spectrum correlate with the details of the corresponding X-ray crystal structure from the Protein Data Bank (HSA PDB code 1ao6 reports 69.2% α-helix, 1.7% 3₁₀-helix with the rest being loops and turns; jack bean concanavalin A PDB code 2cna reports 43.5% β-strand, 1.7% α-helix and 1.3% 3₁₀-helix with the remainder being hairpin bends and long loops; hen lysozyme PDB code 1lse reports 28.7% α-helix, 10.9% 3₁₀-helix and 6.2% β-sheet).
Fig. 1. Backscattered Raman and ROA spectra of (a) human serum albumin, (b) jack bean concanavalin A, and (c) hen lysozyme at pH 5.4 and 20 °C.
The strong sharp positive bands at 1340 and 1342 cm⁻¹ in the ROA spectra of HSA and hen lysozyme, respectively, are assigned to a hydrated form of α-helix while the positive bands at 1300 and 1299 cm⁻¹, respectively, appear to be associated with α-helix in a more hydrophobic environment. The relative intensities of these two bands appear to correlate with the exposure of the polypeptide backbone to the solvent within the elements of α-helix in each case, as previously reported for α-helical polypeptides and other globular proteins [11,12] and the coat proteins of intact filamentous bacteriophages [13]. Conventional Raman bands at similar wavenumbers have been assigned to α-helix in polypeptides [14] and filamentous bacteriophages [15]. The ROA bands assigned to α-helix in this region may be related to a number of these Raman bands, with the ROA intensities and exact wavenumbers being a function of the perturbations (geometric and/or due to various types of hydration) to which the particular helical sequence is subjected.
The amide I ROA couplets of HSA, negative at 1640 cm⁻¹ and positive at 1665 cm⁻¹, and hen lysozyme, negative at 1641 cm⁻¹ and positive at 1665 cm⁻¹, are also characteristic of α-helix and correspond to the wavenumber range 1645–1655 cm⁻¹ for α-helix bands in conventional Raman spectra [2,16,17]. Other ROA bands with positive intensity originating from α-helix appear in the range 870–950 cm⁻¹ with the detailed band structure in this region appearing to show a dependence upon side chain composition, helix length and the presence of irregularities. The large positive peak at 1297–1300 cm⁻¹ observed for proteins with a lysozyme-type fold may be boosted by bands other than from α-helix. Proteins with a lysozyme-type fold often contain an unusually high 3₁₀-helix component. It is possible that the positive band at 1299 cm⁻¹ contains a contribution from a 3₁₀-helix band, with a positive signal at 1295 cm⁻¹. There may also be bands from turns contributing in this region. A sharp negative band at 1241 cm⁻¹ in the ROA spectrum of concanavalin A is assigned to β-structure. Similar bands have been observed in the ROA spectra of other proteins containing β-sheet [11], as has another negative band at 1220 cm⁻¹ which appears to be associated with a distinct variant of β-structure, possibly hydrated. It appears that the true signature of β-sheet in the amide III
region may be a couplet, negative at low wavenumber and positive at high [10]. Hydration, side chain interactions and structural irregularity may influence the positions of the negative and positive bands, and bands from loops and turns may also contribute in this region. We assign the positive band at 1295 cm⁻¹ in the spectrum of concanavalin A as being the high wavenumber signal of the β-sheet couplet for this protein. Amide III bands from β-sheet in conventional Raman spectroscopy are assigned to the region 1230–1245 cm⁻¹ [17]. The ROA bands assigned to β-sheet in this region are more intense than the corresponding Raman bands as ROA band intensities are sensitive to chirality and, therefore, the dynamics of conformational groups in the vicinity of the chiral centres. Residues within less mobile structural elements such as β-sheets give rise to ROA signatures with greater intensity than those residues in more flexible or dynamic structural elements. The amide I couplet, negative at 1658 cm⁻¹ and positive at 1677 cm⁻¹, is another signature of β-sheet and is easily distinguished from the amide I couplet produced by α-helix, which typically occurs 5–20 cm⁻¹ lower. This correlates with β-sheet amide I bands in conventional Raman spectroscopy which occur in the range 1665–1680 cm⁻¹ [17]. The small shoulder on the amide I couplet in the ROA spectrum of hen lysozyme at 1683 cm⁻¹ corresponds to the presence of the small amount of β-sheet in that protein. A number of ROA bands associated with β-sheet may appear in the backbone skeletal stretch region, as found here for concanavalin A and hen lysozyme. However, the details of wavenumber, intensity and band shape are variable, possibly reflecting differences in local conformations and amino acid compositions found within different β-sheets. Proteins containing a significant amount of β-sheet often display additional ROA signals originating in loops and turns.
Negative ROA bands in the range 1340–1380 cm⁻¹ appear to originate in β-hairpin bends, for example the band of medium intensity appearing at 1345 cm⁻¹ in the spectrum of concanavalin A. Many β-sheet proteins also show a strong positive ROA band at 1314–1325 cm⁻¹ thought to be a signature of the PPII helical elements known from X-ray crystal structures to occur in some of the longer loops between elements of secondary structure [18,19]. An example of such a PPII signal is found at 1316 cm⁻¹ in the ROA spectrum of jack bean concanavalin A. The ROA spectrum of hen lysozyme also shows a relatively large positive band at 1554 cm⁻¹ assigned to the W3-type vibrational mode of the indole ring of tryptophan residues. This has been discussed in greater detail elsewhere [10,20].
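The assignments discussed in this section can be collected into a small lookup table. The sketch below is only a reading aid: the wavenumbers and labels are those quoted in the text above, while the table layout, the function name candidate_assignments and the 3 cm⁻¹ matching tolerance are assumptions made for illustration. Several of these assignments are described as tentative in the text, so a match is a suggestion and not an identification.

```python
# Each entry: (low, high, sign, assignment), wavenumbers in cm^-1 as quoted
# in Section 3.1. Single-wavenumber bands use low == high.
BAND_ASSIGNMENTS = [
    (870, 950, "+", "alpha-helix"),
    (1220, 1220, "-", "beta-structure variant, possibly hydrated"),
    (1241, 1241, "-", "beta-structure"),
    (1299, 1300, "+", "alpha-helix in a more hydrophobic environment"),
    (1314, 1325, "+", "PPII helix in longer loops"),
    (1340, 1342, "+", "hydrated alpha-helix"),
    (1340, 1380, "-", "beta-hairpin bends"),
    (1554, 1554, "+", "tryptophan W3-type indole mode"),
]


def candidate_assignments(wavenumber, sign, tolerance=3.0):
    """List assignments whose quoted range contains the observed band.

    sign is "+" or "-" for a positive or negative ROA band. The tolerance
    (in cm^-1) is an illustrative assumption, not a value from the article.
    """
    if sign not in ("+", "-"):
        raise ValueError("sign must be '+' or '-'")
    return [
        label
        for low, high, band_sign, label in BAND_ASSIGNMENTS
        if band_sign == sign and low - tolerance <= wavenumber <= high + tolerance
    ]
```

Keeping the sign in the key matters because bands of opposite sign at nearly the same wavenumber, such as those near 1340 cm⁻¹, are assigned to different structural elements.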
3.2. Principal component analysis
As protein ROA spectra contain bands characteristic of loops and turns in addition to bands characteristic of secondary structure, they should provide information about the overall three-dimensional solution structure of proteins. We are developing a pattern recognition program, based on principal component analysis (PCA), to identify protein folds from ROA spectral band patterns [10]. The method is similar to one developed for the determination of the structure of proteins from VCD [21] and UVCD [22] spectra, but is expected to provide enhanced discrimination between different structural types since protein ROA spectra contain many more structure-sensitive bands than either VCD or UVCD. Fig. 2 shows a scatter plot of the expansion coefficients of the two most important PCA basis functions for our current database of 75 protein, polypeptide and virus spectra. The protein positions are colour-coded in terms of the seven structural types listed in the figure caption. This serves to separate the spectra into clusters corresponding to different types of protein structure, which enables structural similarities between proteins of unknown structure and those of known structure to be identified. This also provides a useful initial classification that will be refined in later work to provide, among other things, quantitative estimates of the various types of structural elements, such as α-helix, β-sheet, loops and turns. MOLSCRIPT diagrams [23] are shown for a number of proteins with well-defined fold types which serve to illustrate the trends of increasing α-helix content to the left, increasing β-sheet content to the right, and increasing disordered or irregular structure from bottom to top.
Most of the polypeptides and proteins in our set of ROA spectra may be divided into two sets, one containing α-helices and β-sheets in various amounts, the other containing mainly disordered or irregular structures. This is reflected in the expansion coefficients of the two most important basis functions. For the first and most significant basis function, large coefficients are associated with polypeptides and proteins containing high proportions of α-helix or β-sheet. Since the α-helix and β-sheet contents are inversely correlated, the coefficients associated with α-helix and β-sheet are opposite in sign, negative for α-helix and positive for β-sheet in Fig. 2. The coefficients of the second basis function reflect the amount of disordered structure, with large positive coefficients being associated with mainly unfolded polypeptides and proteins. This leads to the inverted V-shape in the distribution of data points shown in Fig. 2 as unfolded or disordered proteins contain little or no α-helix or β-sheet and so have small coefficients, either positive or negative in sign, for the first basis function. The raw basis functions do not have any direct physical interpretation beyond this. Higher-order basis functions will also be related to important spectral features and these will be incorporated in future developments of the PCA programme.
Fig. 2. Plot of the PCA coefficients for the two most important basis functions for a set of 75 polypeptide, protein and virus ROA spectra. Definitions of the structural types are as follows: all alpha, 60% α-helix with little or no other secondary structure; mainly alpha, 35% α-helix and a small amount of β-sheet (5–15%); alpha beta, significant amounts of α-helix and β-sheet; mainly beta, 35% β-sheet and a small amount of α-helix (5–15%); all beta, 45% β-sheet with little or no other secondary structure; mainly disordered/irregular, little secondary structure; all disordered/irregular, no secondary structure. MOLSCRIPT diagrams of the following proteins, with their corresponding PDB codes listed in brackets, are shown as examples: HSA (1ao6), tobacco mosaic virus coat protein (1vtm), rat metallothionein (4mt2), Bowman–Birk protease inhibitor (1pi2), jack bean concanavalin A (2cna), MS2 phage coat protein (1msc), bovine ribonuclease A (1rbx), human prion C-terminal domain (1qlz), hen lysozyme (1lse), bovine insulin (2ins), and fd phage coat protein (1ifj).
3.3. Nucleic acid conformation in viruses
Knowledge of the structure of viruses at the molecular level is essential for enterprises such as structure-guided antiviral drug design [24]. However, the application of techniques such as X-ray crystallography or fibre diffraction is often hampered by practical difficulties. Conventional Raman is valuable in studies of intact viruses at the molecular level as it is able to simultaneously probe both the protein and nucleic acid constituents [17,25]. The additional incisiveness of ROA further enhances the value of Raman spectroscopy in structural virology [9,13,20,26].
Determination of the structures of the nucleic acid component of intact viruses has proven difficult in even the best-resolved X-ray crystal structures. However, we have used ROA to determine the conformation of nucleic acid molecules in cowpea mosaic virus (CPMV), the type member of the comovirus genus [9]. CPMV has a bipartite genome, with two RNA molecules (called RNA-1 and RNA-2) being separately encapsidated. CPMV particles can be separated into four distinct isolates by ultracentrifugation with a CsCl gradient [27]. The top component (T-CPMV) consists of empty protein capsids, the middle component (M-CPMV) contains capsids with a molecule of RNA-2 bound, the bottom-upper component (BU-CPMV) contains capsids with
RNA-1 bound, and the bottom-lower component (BL-CPMV) contains capsids with RNA-2 bound plus Cs⁺ ions permeating the interior of the particles. The Raman and ROA spectra of T-CPMV, M-CPMV and BU-CPMV are shown in Fig. 3a–c, respectively.
Fig. 3. Backscattered Raman and ROA spectra of the icosahedral viruses (a) T-CPMV, (b) M-CPMV and (c) BU-CPMV at pH 7.0 and 20 °C.
CPMV has an icosahedral capsid of T = 3 symmetry, with two protein chains forming a large (L) coat protein and a third protein chain forming a small (S) coat protein. Despite the low sequence homology between these three protein chains, X-ray crystallography shows that they exhibit very similar eight-stranded jelly roll β-barrel folds [28]. The details of the T-CPMV ROA spectrum are consistent with those of a jelly roll β-barrel fold (see the ROA spectrum of concanavalin A in Fig. 1b).
There are several large changes in the ROA spectra of M-CPMV and BU-CPMV due to new bands originating from the RNA-2 and RNA-1 molecules, which constitute 24 and 34.5%, respectively, of the particles by mass. These RNA bands are isolated in Fig. 4a and b, which show the Raman and ROA spectra of the bound RNA-2 and RNA-1 molecules obtained by the subtractions (M-CPMV)-(T-CPMV) and (BU-CPMV)-(T-CPMV), respectively.
Fig. 4. Difference Raman and ROA spectra for (a) (M-T)-CPMV and (b) (BU-T)-CPMV, and (c) backscattered Raman and ROA spectra of Mg²⁺-free tRNAPhe at pH 6.8 and 20 °C.
Both the parent Raman and ROA spectra were normalized with respect to sample concentrations and experimental conditions before subtraction to produce Fig. 4a and b. Most of the bands in these difference Raman and ROA spectra are insensitive to variations of 20% in the scaling factors used for the normalization procedure. Only the couplets in the amide I regions of the difference ROA spectra demonstrate much sensitivity to this subtraction process within the variation of 20% of scaling factors, presumably because most of the ROA intensity in this region originates in the protein component. Small distortions in the baselines of the difference spectra were also subtracted. This is usually a straightforward process when calculating ROA difference spectra as the parent ROA spectra are by definition difference measurements that contain several well-defined regions exhibiting zero intensity. These regions serve to anchor the baseline in the corresponding difference spectrum and the baseline between these regions can be obtained by interpolation. The high signal-to-noise levels observed in the ROA difference
spectra shown in Fig. 4a and b are due to the high RNA contents of the M-CPMV and BU-CPMV particles (24 and 34%, respectively). Bands corresponding to conformational changes induced in the protein structure by nucleic acid binding may also be present in Fig. 4a and b. These ROA difference spectra are obviously very similar in appearance and can be compared with that of Mg²⁺-free tRNAPhe [29], shown in Fig. 4c. The strong negative, positive, negative triplet at 992, 1048 and 1091 cm⁻¹ in the ROA spectrum of tRNAPhe is assigned to the C3′-endo sugar puckers characteristic of the A-type helical conformation and indicates that encapsidated RNA-2 and RNA-1 both adopt A-type helical structures. The ability of ROA difference spectroscopy to probe the conformations of viral subunits is illustrated in this case by the identification of this important triplet feature in the difference spectrum despite the presence of overlapping bands from the capsids in the parent spectra in Fig. 3. As mentioned above, the high RNA contents of the M-CPMV and BU-CPMV particles lead to the high signal-to-noise ratios for the difference spectra shown. This work has provided new information on the RNA structure of CPMV since the nucleic acid is not observed in the X-ray crystal structure [29].
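The subtraction and baseline steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Fig. 4: the function names, the anchor windows and the example scale factors are hypothetical, and the article does not report the actual normalization factors or zero-intensity regions.

```python
def difference_spectrum(loaded, empty, scale=1.0):
    """Subtract the scaled empty-capsid spectrum from the RNA-containing one."""
    if len(loaded) != len(empty):
        raise ValueError("spectra must share the same wavenumber grid")
    return [a - scale * b for a, b in zip(loaded, empty)]


def scaling_sensitivity(loaded, empty, variation=0.20):
    """Difference spectra for scale factors 1 - variation, 1 and 1 + variation."""
    scales = (1.0 - variation, 1.0, 1.0 + variation)
    return [(s, difference_spectrum(loaded, empty, s)) for s in scales]


def interpolated_baseline(wavenumbers, intensities, anchor_windows):
    """Piecewise-linear baseline through regions expected to have zero ROA.

    anchor_windows is a list of (low, high) wavenumber windows (hypothetical
    inputs). Each window contributes one knot at its mean wavenumber and
    mean intensity; the baseline is held constant outside the outer knots.
    """
    knots = []
    for low, high in anchor_windows:
        points = [(w, y) for w, y in zip(wavenumbers, intensities) if low <= w <= high]
        if not points:
            raise ValueError("anchor window contains no data points")
        mean_w = sum(w for w, _ in points) / len(points)
        mean_y = sum(y for _, y in points) / len(points)
        knots.append((mean_w, mean_y))
    knots.sort()

    baseline = []
    for w in wavenumbers:
        if w <= knots[0][0]:
            baseline.append(knots[0][1])
        elif w >= knots[-1][0]:
            baseline.append(knots[-1][1])
        else:
            for (w0, y0), (w1, y1) in zip(knots, knots[1:]):
                if w0 <= w <= w1:
                    baseline.append(y0 + (y1 - y0) * (w - w0) / (w1 - w0))
                    break
    return baseline
```

Comparing the three spectra returned by scaling_sensitivity is one simple way to see which bands survive a change in the scaling factor and which, like the amide I couplets mentioned above, do not.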
4. Conclusion
The sensitivity of ROA to molecular chirality makes it a valuable new tool for investigating biomolecular structure and behaviour in solution. ROA may now be applied routinely to a wide range of proteins and viruses and is already providing information complementary to that obtained from high resolution techniques, such as X-ray crystallography and NMR spectroscopy. The large number of resolved structure-sensitive bands in protein ROA spectra compared with other spectroscopic techniques makes pattern recognition methods, such as PCA, especially valuable. This will greatly facilitate applications of ROA to high throughput protein fold recognition in structural proteomics and to structural virology. With the expected availability of a commercial instrument in the near future, we hope that this brief review will encourage wide use of vibrational ROA spectroscopy in biomolecular science.
Acknowledgements
We thank the EPSRC and BBSRC for research grants, and the students and collaborators who have contributed to the Glasgow biomolecular ROA programme over many years.
References
[1] B.R. Singh (Ed.), Infrared Analysis of Peptides and Proteins: Principles and Applications, ACS Symposium Series 750, American Chemical Society, Washington, DC, 2000.
[2] P.R. Carey, Biochemical Applications of Raman and Resonance Raman Spectroscopies, Academic Press, New York, 1982.
[3] M. Diem, Modern Vibrational Spectroscopy, Wiley, New York, 1993.
[4] P.L. Polavarapu, Vibrational Spectra: Principles and Applications with Emphasis on Optical Activity, Elsevier, Amsterdam, 1998.
[5] L.D. Barron, L. Hecht, in: N. Berova, K. Nakanishi, R.W. Woody (Eds.), Circular Dichroism: Principles and Applications, second ed., Wiley, New York, 2000, p. 667.
[6] L.A. Nafie, Ann. Rev. Phys. Chem. 48 (1997) 357.
[7] T.A. Keiderling, in: N. Berova, K. Nakanishi, R.W. Woody (Eds.), Circular Dichroism: Principles and Applications, second ed., Wiley, New York, 2000, p. 621.
[8] L. Hecht, L.D. Barron, E.W. Blanch, A.F. Bell, L.A. Day, J. Raman Spectrosc. 30 (1999) 815.
[9] E.W. Blanch, L. Hecht, C.D. Syme, V. Volpetti, G.P. Lomonossoff, L.D. Barron, J. Gen. Virol. 83 (2002) 2593.
[10] L.D. Barron, E.W. Blanch, I.H. McColl, C.D. Syme, L. Hecht, K. Nielsen, Spectroscopy 17 (2003) 101.
[11] L.D. Barron, L. Hecht, E.W. Blanch, A.F. Bell, Prog. Biophys. Mol. Biol. 73 (2000) 1.
[12] G. Wilson, L. Hecht, L.D. Barron, J. Chem. Soc. Faraday Trans. 92 (1996) 1503.
[13] E.W. Blanch, A.F. Bell, L. Hecht, L.A. Day, L.D. Barron, J. Mol. Biol. 290 (1999) 1.
[14] S.-H. Lee, S. Krimm, J. Raman Spectrosc. 29 (1998) 73.
[15] M. Tsuboi, M. Suzuki, S.A. Overman, G.J. Thomas Jr., Biochemistry 39 (2000) 2677.
[16] A.T. Tu, Adv. Spectrosc. 13 (1986) 47.
[17] T. Miura, G.J. Thomas Jr., Proteins: Structure, Function and Engineering, in: B.B. Biswas, S. Roy (Eds.), Subcellular Biochemistry, vol. 24, Plenum Press, New York, 1995, p. 55.
[18] A.A. Adzhubei, M.J.E. Sternberg, J. Mol. Biol. 229 (1993) 472.
[19] B.J. Stapley, T.P. Creamer, Protein Sci. 8 (1999) 587.
[20] E.W. Blanch, L. Hecht, L.A. Day, D.M. Pederson, L.D. Barron, J. Am. Chem. Soc. 123 (2001) 4863.
[21] P. Pancoska, S.C. Yasui, T.A. Keiderling, Biochemistry 30 (1991) 5089.
[22] S.Y. Venyaminov, J.T. Yang, in: G.D. Fasman (Ed.), Circular Dichroism and the Conformational Analysis of Biomolecules, Plenum Press, New York, 1996, p. 69.
[23] P.J. Kraulis, J. Appl. Cryst. 24 (1991) 946.
[24] W. Chiu, R.M. Burnett, R.L. Garcea (Eds.), Structural Biology of Viruses, Oxford University Press, New York, 1997.
[25] G.J. Thomas Jr., Ann. Rev. Biophys. Biomol. Struct. 28 (1999) 1.
[26] E.W. Blanch, D.J. Robinson, L. Hecht, C.D. Syme, K. Nielsen, L.D. Barron, J. Gen. Virol. 83 (2002) 241.
[27] G.P. Lomonossoff, J.E. Johnson, Prog. Biophys. Mol. Biol. 55 (1991) 107.
[28] T. Lin, Z. Chen, R. Usha, C.V. Stauffacher, J.-B. Dai, T. Schmidt, J.E. Johnson, Virology 265 (1999) 20.
[29] A.F. Bell, L. Hecht, L.D. Barron, J. Am. Chem. Soc. 120 (1998) 5820.