Principal component analysis based methodology to distinguish protein SERS spectra


Journal of Molecular Structure 993 (2011) 500–505

Contents lists available at ScienceDirect

Journal of Molecular Structure journal homepage: www.elsevier.com/locate/molstruc

G. Das a,⇑, F. Gentile a,b, M.L. Coluccio b, A.M. Perri b, A. Nicastri b, F. Mecarini a, G. Cojoc b, P. Candeloro b, C. Liberale a,b, F. De Angelis a,b, E. Di Fabrizio a,b

a Nanofabrication Div., Italian Institute of Technology, Via Morego 30, 16163 Genoa, Italy
b Lab. BIONEM, Dipartimento di Medicina Sperimentale e Clinica, Università "Magna Graecia" di Catanzaro, Catanzaro, Italy

Article info

Article history: Available online 30 December 2010

Keywords: Conformational analysis; SERS; Proteins; Principal component analysis

Abstract

Surface-enhanced Raman scattering (SERS) substrates were fabricated using electro-plating and e-beam lithography techniques. The resulting nanostructures comprise regular arrays of gold nanoaggregates with a diameter of 80 nm and a mutual distance between the aggregates (gap) ranging from 10 to 30 nm. The nanopatterned SERS substrate gave better control over, and reproducibility of, the generation of plasmon polaritons (PPs). SERS measurements were performed for various proteins, namely bovine serum albumin (BSA), myoglobin, ferritin, lysozyme, RNase-B, α-casein, α-lactalbumin and trypsin. Principal component analysis (PCA) was used to organize and classify the proteins on the basis of their secondary structure. Cluster analysis showed that the classification error was about 14%. The combined use of SERS measurements and PCA is shown to be effective in categorizing proteins on the basis of their secondary structure.

© 2010 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Address: Nanofabrication Division, Italian Institute of Technology, Via Morego 30, 16163 Genoa, Italy. Tel.: +39 010 71781217. E-mail address: [email protected] (G. Das).
0022-2860/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.molstruc.2010.12.044

1. Introduction

Surface-enhanced Raman scattering (SERS) is a novel and promising technique for characterizing biomaterials, owing to its enormous potential to provide chemical information with unprecedented accuracy. The technique is based on the giant increase in Raman signal that occurs when the biomaterial under study (even at very low concentration) is in close proximity to the surface of the SERS substrate [1–3]. When laser radiation interacts with sharp metallic surfaces, the local electric field is enhanced and, as a result, a significant increase in the Raman signal can be observed [4]. It is also important to point out that SERS, like all other Raman-related techniques, is well suited to analysis in an aqueous environment and may thus be conveniently and extensively used in biotechnology applications for the analysis of organic compounds.

Many ways exist to fabricate SERS substrates, ranging from metal sputtering and electro-chemical techniques to produce nanocolloids [5–7], through nanosphere lithography [8], to e-beam lithography techniques [9]. Among these, e-beam lithography coupled with electro-chemical techniques is one of the most promising routes to a controllable and reproducible SERS substrate, with the prospect of detecting molecules at very low concentration (down to attomole) [10].

Once the SERS spectra are obtained, they should be analyzed and deconvoluted to extract the chemical and even structural information. Different methods can be employed for spectral analysis: difference Raman techniques [11], 2D correlation [9], principal component analysis (PCA) [12], cluster analysis [13], etc. In the last few years, PCA has been recognized as a fundamental way to differentiate cells on the basis of their biochemical features, and has been advantageously applied to a number of spectral data-sets regardless of their source, which can be Raman scattering [14,15], nuclear magnetic resonance [16], fluorescence [17] or Raman optical activity [17]. PCA is a multivariate method of analysis whose main purpose is to reduce the dimensionality of a multi-dimensional data-set comprising a large number of variables, while retaining those characteristics of the original data-set that contribute most to its variance. It is an effective technique for classifying or categorizing SERS spectra that are otherwise hardly distinguishable by visual examination. Herein, we utilize this technique to distinguish various proteins and, thereafter, place them in a 2D PCA co-ordinate space based on their secondary structures.

In this paper, we employed PCA and cluster analysis on SERS protein spectra obtained using a novel SERS device [10] comprising gold nanoaggregates on a gold–chromium base-plated Si wafer. The durability of the substrate is guaranteed by the use of gold nanoparticles instead of silver. SERS spectra were recorded for lysozyme, BSA, RNase-B, myoglobin, ferritin, trypsin, and two milk proteins (casein and lactalbumin). In many cases researchers have to distinguish between similar proteins and must resort to X-ray crystallography, NMR spectrometry or dual polarization interferometry to do so. Prior to the measurements they also need to follow a complicated sample preparation. By applying PCA to SERS spectra instead, proteins can be screened very quickly with minimal sample preparation. The scientific motivation of this research is that, with the help of PCA on SERS spectra, proteins can be differentiated on the basis of their secondary structures even when the SERS spectra look alike. In other words, this technique can be used for fast screening of proteins, monitoring them with rich information at the level of molecular vibrations [14,15]. The analysis can be done both for a single protein and for a mixture with a limited number of components (proteins).

The major fabrication novelty of the present work lies in gold nanoaggregates with a very small inter-spatial distance, down to 10 nm, achieved by using high-resolution e-beam patterning [10]. Notably, the substrate with nanoaggregates of gold nanoparticles is the best-suited SERS substrate, and a smaller inter-particle distance further increases the number of hot-spot sites and hence the signal enhancement [18]. Various proteins, namely lysozyme, bovine serum albumin (BSA), myoglobin, ribonuclease-B (RNase-B), ferritin, trypsin, α-casein and α-lactalbumin, are characterized.
A laser with a wavelength of 830 nm was used for all the experiments, which brings a series of advantages to the SERS measurements: (i) a reduction in the fluorescence background; (ii) a lower probability of protein conformational alteration during the measurements, due to lower absorption at this wavelength; and (iii) reduction or disappearance of the Tyr and Phe vibrational bands at around 1615 cm⁻¹ with respect to visible/UV laser excitation [19].

Herein, we introduce and explain a 2D plot in which proteins are grouped and arranged in the plane on the basis of their secondary structure. The plot was obtained by applying PCA to the SERS spectra, as described in Section 2 and Supplementary information A. In the present work, for the first time to the best of our knowledge, PCA was successfully applied to SERS spectra to discriminate proteins and locate them on the basis of their secondary structure. Well-established clustering techniques were finally used to group the proteins: different subsets were found on the basis of a dissimilarity (or distance) function, with a classification error of about 14%.


2. Experimental

2.1. Device fabrication

Reproducible SERS devices were fabricated on an Au–Cr base-plated silicon wafer using high-resolution fabrication techniques, in particular electron-beam lithography, as shown in Fig. 1. A Crestec e-beam lithography system (model CABL-9000C) was used for all the experiments. Patterns of nano-holes were written in a layer of electron-beam resist spun onto the wafers. The resist film was then selectively removed by immersion for 35 s in ZEP developer (ZED-N50). Nanoaggregates of gold nanoparticles were grown inside the nano-holes by electro-plating from a gold–potassium cyanide solution. Gold nanostructures were finally obtained with a diameter of 80 nm and a mutual distance between the aggregates (gap) ranging from 10 to 30 nm [10].

Fig. 1. SEM image of gold nanoaggregate array with different magnifications.

2.2. Sample preparation

BSA, myoglobin and ferritin (all three α-helix dominated, >65%), and lysozyme, RNase-B, α-casein, α-lactalbumin and trypsin (random coil dominated, >45%) (Sigma–Aldrich) were used for SERS measurements without any further treatment. All the proteins were prepared at a concentration of 7 μM in high-purity water. The solutions, one for each protein, were analyzed using the drop-coating deposition Raman (DCDR) method [20], in which the solution of interest was micro-deposited (2 μl) on both SERS and CaF2 substrates. CaF2 substrates, which are Raman-transparent in the near IR, were used as a control to reveal the un-enhanced vibrational bands for comparison with the SERS substrates. The solution was left for around 10 min in a closed micro-cryostat chamber (to avoid any contamination from the surroundings) to let the superficial water evaporate. Protein SERS spectra were taken at different points of the visible ring of protein (the outer edge of the deposition region). The DCDR samples remained in a well-hydrated state after deposition and evaporation [20]. The concentration of protein is found to be of the order of attomole (10⁻¹⁸ mole) and is calculated as described in detail in Ref. [10]. It is well known and widely reported in the literature that DCDR protein spectra do not show substantial differences from normal protein Raman spectra [21].

2.3. SERS measurements and data analysis

MicroRaman spectra (inVia Renishaw) were excited with the 830 nm line of a near-IR diode laser in backscattering geometry through a 50× long-range objective (NA = 0.50). The laser power was maintained at 13 mW throughout all the experiments. An accumulation time of 100 s and a spectral resolution of 1.1 cm⁻¹ were used. Measurements were performed at different locations on the SERS substrate to verify its reproducibility, and reproducible behavior was observed.

All the SERS spectra were baseline-corrected using 4th- to 5th-order polynomials and then normalized to the Raman band centered at 1450 cm⁻¹, attributed to the C–Hx bending vibration [22]. PCA was performed for all the proteins in the range 600–2000 cm⁻¹, which allows the proteins to be grouped (and thus classified) on the basis of their secondary structure. PCA involves the calculation of principal components (PCs) that describe the greatest variance of the spectral data from the mean; 53 PCs were calculated for the purposes of this study. Each of the protein spectra in the data-set can be reconstructed from the 53 PCs by multiplying each PC by a different variable, known as a score. As the PCs are common to all the spectra in the data-set, they can be put aside, allowing each spectrum to be represented by its 53 scores. This allows a massive reduction in the data that need to be processed, with a minimal reduction in functional information. The algorithm for the PCA can be found in Supplementary information A.

Further statistical analysis, namely cluster analysis, was performed to categorize the protein data. Cluster analysis (see details in Supplementary information B) is an unsupervised learning technique used for the classification of data, whereby data elements are partitioned into groups called clusters. These clusters are subsets representing collections of elements that are grouped on the basis of a distance function. The distance function measures how dissimilar data elements are: identical element pairs have zero distance and all others have positive distance.
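The preprocessing and PCA steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their algorithm is in Supplementary information A): the function names, the choice to fit the polynomial to the whole spectrum, the ±10 cm⁻¹ normalization window and the use of an SVD to obtain the PCs are all assumptions made here for illustration.

```python
import numpy as np


def baseline_correct(wavenumbers, intensity, order=5):
    """Subtract a least-squares polynomial of the given order.

    Assumption: the polynomial is fitted to the whole spectrum, which is a
    crude baseline model; the paper only states that 4th- to 5th-order
    polynomials were used.
    """
    poly = np.polynomial.Polynomial.fit(wavenumbers, intensity, deg=order)
    return intensity - poly(wavenumbers)


def normalize_to_band(wavenumbers, intensity, center=1450.0, half_width=10.0):
    """Scale a spectrum so that its maximum within center +/- half_width is 1.

    Assumption: the window half-width is hypothetical; the maximum in the
    window must be positive for the scaling to be meaningful.
    """
    window = np.abs(wavenumbers - center) <= half_width
    return intensity / intensity[window].max()


def pca(spectra, n_components):
    """PCA via SVD of the mean-centred data matrix (one spectrum per row).

    Returns the mean spectrum, the principal components (loadings), the
    scores of each spectrum and the fraction of variance per component.
    """
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    scores = centred @ components.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, components, scores, explained[:n_components]
```

Each spectrum is then approximated by `mean + scores[i] @ components`, which is the reconstruction from PCs and scores described in the text; plotting `scores[:, 0]` against `scores[:, 1]` gives a PC1 vs. PC2 scatter plot.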
The distance function f(e_i, e_j) treats pairs of elements e_i and e_j as less similar when their distance f is larger. If e_i and e_j are vectors (as is the case here), f can be a distance function such as the Euclidean, Manhattan, Hamming or Mahalanobis distance (see Supplementary information B). The Manhattan distance function is used in our case and is given by

$$D(u, v) = \sum_{i} |u_i - v_i| = |u_1 - v_1| + |u_2 - v_2|$$

where u = (u1, u2) and v = (v1, v2) are the position co-ordinates of two points. The principal components of the proteins under study were clustered by partitioning around medoids for a given number of clusters k: the method starts by building a set of k representative objects and clustering around those, iterating until a locally optimal clustering is found.

3. Results and discussion

Various proteins were chosen for our analysis: myoglobin, ferritin and BSA, which carry the highest percentage (>65%) of α-helix structure; α-casein and trypsin, which contain random coil in a percentage greater than 60%; and lysozyme, RNase-B and α-lactalbumin, which hold all the secondary structures (α-helix, β-sheet and random coil). The secondary structure of the proteins under study [23–32] is summarized in Table 1.

Table 1. Secondary structure distribution for the proteins (the data are taken from the reference reported with the protein).

Protein               α-Helix (%)   β-Sheet (%)   Random coil (%)
Myoglobin [23]        83.77         0             16.23
Ferritin [24]         74.86         3.43          21.71
BSA [25]              69.03         0.66          30.31
Lysozyme [26]         37.91         16.64         45.45
RNase-B [27,28]       29.77         16.18         54.05
Trypsin [29,30]       7.32          30.49         62.20
α-Casein [31]         35.05         4.20          60.75
α-Lactalbumin [32]    46.48         9.86          43.66

Raman spectroscopy, a vibrational spectroscopy technique, provides structural and chemical information on the substance under analysis. To access the molecular vibrational information, SERS spectra were collected for all the proteins at room temperature in the range between 600 and 2000 cm⁻¹, as shown in Fig. 2a. Many characteristic vibrational bands may be attributed to various side chains of the proteins: 655, 976 and 1450 cm⁻¹, related to C–S, S–O and C–Hx, respectively. In Fig. 2a and b, various well-known Raman bands related to different amino acids can be observed at around 760 (Trp), 830 (Tyr), 855 (Tyr) [33], 1004 (Phe), 1012 (Trp), 1035 (Phe), 1361 (Trp) and 1618 (Tyr) cm⁻¹. Raman bands and their associated chemical vibrations for the different proteins are summarized in Table 2. The amide I, amide II and amide III vibrational bands can be clearly seen in Fig. 2a and b.

Measurements were also performed in a similar manner with proteins deposited on CaF2 substrates, to compare the SERS spectra with normal microRaman spectra (Fig. 2b). Variations in band intensity were observed, e.g. at 644 cm⁻¹ for lysozyme, at 621 cm⁻¹ for BSA, at 1450 cm⁻¹ for myoglobin, and in the relative intensity of the 1004 cm⁻¹ band with respect to the other bands. Most importantly, these spectra appeared very noisy, with very low intensity. A sharp and very intense peak centered at 322 cm⁻¹ (not shown here), related to the Ca–F vibration, was also observed for the samples on CaF2 substrates.
The myoglobin SERS spectrum showed characteristic features centered at around 935, 1126, 1373 and 1625 cm⁻¹, which are related to skeletal stretching [34], C–N stretching, the oxidized-myoglobin marker of the heme group [35], and the Tyr side chain [34], respectively. SERS measurements on myoglobin were carried out in the past using Ag colloids [34], unlike our case where Au nanopatterned structures are used; as a result there are differences in the peak ratios of different bands, but the characteristic bands were always found to be alike. A broad band centered at 1560 cm⁻¹ was also observed. The SERS spectrum of trypsin was in good agreement with the spectrum observed by Forbes et al. [36]. The vibrational bands observed at around 830 and 855 cm⁻¹ can be attributed to the Fermi resonance of the Tyr residue (sensitive to the H-bonding environment of the phenolic OH group) [37], and were found to be much more intense for trypsin than for the other proteins. The integrated intensity ratio of these two peaks (I855/I830) reveals that trypsin carries the strongest H-bonds, which keep this protein much more stable than the others.

RNase-B showed a vibrational band at 980 cm⁻¹, due to the symmetric stretching of the sulphate ion, which acts as an inhibitor of enzymatic action and stabilizes the native structure of RNase [38]. Lysozyme showed some unique intense peaks at 760, 1011, 1365 and 1554 cm⁻¹, which could be attributed to the Trp normal modes W18, W16, W7 and W3, respectively. BSA showed a broad band at around 760 cm⁻¹, assigned to the indole ring breathing vibration of the Trp residue. The large intensity of the Trp band centered at 760 cm⁻¹, which is sensitive to the amphipathic environment of the indole ring [37], suggests that the lysozyme Trp indole ring is the most hydrophilic whereas the RNase-B Trp indole ring is the most hydrophobic among the measured proteins.
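The integrated intensity ratio I855/I830 of the Tyr doublet mentioned above can be estimated from a baseline-corrected spectrum along the following lines. This is only a sketch: the paper does not state how the bands were integrated, so the trapezoidal rule, the ±10 cm⁻¹ integration windows and the function names are assumptions made here.

```python
import numpy as np


def integrated_intensity(wavenumbers, intensity, lo, hi):
    """Trapezoidal area under the spectrum between lo and hi (in cm^-1)."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    x, y = wavenumbers[mask], intensity[mask]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def tyr_doublet_ratio(wavenumbers, intensity, half_width=10.0):
    """Ratio I855 / I830 of the integrated Tyr Fermi-doublet bands.

    Assumption: each band is integrated over center +/- half_width, with no
    peak fitting or deconvolution of overlapping bands.
    """
    i855 = integrated_intensity(
        wavenumbers, intensity, 855.0 - half_width, 855.0 + half_width
    )
    i830 = integrated_intensity(
        wavenumbers, intensity, 830.0 - half_width, 830.0 + half_width
    )
    return i855 / i830
```

For strongly overlapping bands a fit with two peak profiles would be more appropriate than fixed windows; the fixed-window version is kept here for brevity.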
α-Lactalbumin showed many intense peaks throughout the spectral range (760, 1004, 1010, 1035, 1130 and 1560 cm⁻¹), as also observed in the past [39]. General similarities were observed between the spectra of α-lactalbumin and lysozyme, reflecting the known homologies of these two proteins [40,41]. However, some differences were also observed for the band centered at 760 cm⁻¹, the well-resolved doublet at 1004 (Phe) and 1012 (Trp) cm⁻¹, and the band at 1555 cm⁻¹. The higher intensity of the 1012 cm⁻¹ band for lysozyme with respect to α-lactalbumin could be an indication of the higher content of hydrophobic Trp residues in lysozyme (six Trp) compared with α-lactalbumin (four Trp). Various intense Raman bands were observed in the SERS spectrum of casein. The vibrational band at around 937 cm⁻¹, which carries structural information, was observed for myoglobin, ferritin and BSA, and was attributed to the α-helix secondary structure.

Fig. 2. (a) MicroRaman spectra for the proteins myoglobin, trypsin, ferritin, RNase-B, BSA and lysozyme in the range of 350–2000 cm⁻¹ using DCDR deposition on the gold nanoaggregate array based SERS substrate; (b) protein microRaman spectra for lysozyme, RNase-B and myoglobin DCDR-deposited on CaF2 substrate.

Table 2. Raman bands observed in SERS spectra and their associated chemical bonding. In the table, W, M, S and vS denote relative intensities (weak, moderate, strong and very strong, respectively) with respect to the reference band; ν, δ and ω denote vibration types (stretching, bending and wagging, respectively). The label sh denotes a shoulder band.

Band (cm⁻¹)   Band assignment
621           ν–C–S (cysteine) (BSA-S; RNase-B-M; Casein-M; Trypsin-W)
644           Tyr (Trypsin-vS; BSA-M; RNase-B-vS; Casein-S)
758           Trp (W18) (Lys-vS; Myog-vS; α-Lactalbumin-S; Casein-M)
830           Tyr (Casein-M; RNase-B-M; BSA-M; Trypsin-S; Lactal-M)
855           Tyr (Casein-M; RNase-B-M; BSA-M; Trypsin-S; Lactal-M)
880           Tyr (Casein-M; RNase-B-M; BSA-M; Trypsin-S; Lactal-M)
937           α-helix (Mb-W; Ferritin-W; BSA-W)
980           ν–S–O (SO4²⁻) (RNase-B-vS)
1004          Phe (vS for all the proteins)
1012          Trp (Lys-vS; Lactalbumin-S)
1035          Phe (W for all the proteins)
1128          ν–C–N (Mb-S; Lactal-M; Trypsin-M; Casein-W)
1257          Amide III (Mb-S; RNase-B-M; Trypsin-M; Casein-M)
1340          ω-CHx (Mb-W; RNase-B-W; Trypsin-W; Casein-W)
1373          Heme Gr. (Mb-vS)
1447          δ–C–Hx (Reference band)
1562          Trp (Mb-vS; Lys-S; Lactal-M)
1582 (sh)     Trp (Mb-S)
1625          Tyr (Mb-S; Trp-W)
1674          Amide I (RNase-B-vS; Trypsin-vS; Lys-S; Lactal-M; BSA-M; Casein-M; Ferritin-W; Mb-vW)

Many bands exist that could be used to distinguish one protein from another by direct visual inspection; nevertheless, it is often not possible to differentiate all the proteins as a whole because their spectra are very similar. In this regard, PCA and cluster analysis can play an important role in distinguishing and organizing them according to their similarity.

Fig. 3 illustrates the PCA (PC1 vs. PC2) of the proteins, considering the SERS data in the range of 600–2000 cm⁻¹. PCA is a statistical method that reduces the dimensionality of a multi-dimensional data-set while keeping the characteristics of all the spectra. The information content of each spectrum is described by a limited number of variables, known as principal components (PCs). These PCs contain most of the spectral information. The spectra, in the present case, can be satisfactorily described by 53 components. Plotting the first PC (PC1) against the second PC (PC2) provides valuable insight into the proteins under study (see Supplementary information A). The plot may be subdivided into three horizontal bands (delimited by dotted lines in Fig. 3): (a) the uppermost part, with more than 65% ordered structure (α-helix + β-sheet); (b) the middle one, with more than 45% but less than 65%; and (c) the lowest part, with less than 45% ordered structure.

In Fig. 3, certain rules can be inferred that describe the secondary structure content of the proteins on the basis of their position in the PCA plot. In particular, the further up a protein point lies, the more ordered structure (α-helix + β-sheet) the protein has. A displacement towards the left instead indicates an increase in α-helix content. Generally, an increase in random coil content moves the scatter points towards the lower right of the PCA plot. Notably, the β-sheet content has only a small effect. Myoglobin is found in the extreme upper-left corner of the PCA scatter plot (Fig. 3) because its α-helix content (83.77% α-helix, 0% β-sheet and 16.23% random coil or unordered structure) is higher than that of ferritin (74.86% α-helix, 3.43% β-sheet and 21.71% random coil) or BSA (69.03% α-helix, 0.66% β-sheet and 30.31% random coil). In the figure, ferritin and BSA lie close to each other.
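As a worked check of the three horizontal bands described above, the ordered-structure content (α-helix + β-sheet) of each protein can be computed directly from the Table 1 values. The sketch below only reproduces that arithmetic; it gives the band expected from the literature secondary structure, not a position measured in the PCA plot. The names `TABLE_1` and `expected_band`, the band labels, and the treatment of values exactly at 45% or 65% are assumptions made for illustration.

```python
# Secondary-structure percentages (alpha-helix, beta-sheet, random coil),
# copied from Table 1.
TABLE_1 = {
    "Myoglobin": (83.77, 0.0, 16.23),
    "Ferritin": (74.86, 3.43, 21.71),
    "BSA": (69.03, 0.66, 30.31),
    "Lysozyme": (37.91, 16.64, 45.45),
    "RNase-B": (29.77, 16.18, 54.05),
    "Trypsin": (7.32, 30.49, 62.20),
    "alpha-Casein": (35.05, 4.20, 60.75),
    "alpha-Lactalbumin": (46.48, 9.86, 43.66),
}


def ordered_structure(alpha, beta):
    """Ordered structure content in percent: alpha-helix + beta-sheet."""
    return alpha + beta


def expected_band(ordered):
    """Band of the plot expected from the ordered-structure content.

    Assumption: values exactly equal to 65 or 45 fall into the lower of the
    two neighbouring bands; the paper does not specify this edge case.
    """
    if ordered > 65.0:
        return "upper"
    if ordered > 45.0:
        return "middle"
    return "lower"


ORDERED = {name: ordered_structure(a, b) for name, (a, b, _) in TABLE_1.items()}
BANDS = {name: expected_band(value) for name, value in ORDERED.items()}
```

With these values, myoglobin, ferritin and BSA fall in the upper band, lysozyme, RNase-B and α-lactalbumin in the middle band, and trypsin and α-casein in the lower band.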
On the basis of the above considerations, it is interesting to correlate the position of the proteins in the plot with their secondary structure, as shown in Fig. 3. Let a, b and c be numbers expressing the fractions of α-helix, β-sheet and random coil, respectively, for each protein. The sum of a, b and c is normalized to one, so that a + b + c = 1. Consider now two proteins, say A and B. Let us assume that b is the same for A and B; then, if the α-helix component in A is greater than the α-helix component in B (aA > aB), it follows that the random coil component in A is smaller than that in B (cA < cB), since c = 1 − a − b. Therefore, aA > aB ⇒ cA < cB. Under these circumstances, the position of protein B in the PCA scatter plot will be further to the right with respect to protein A, as can be observed for BSA and myoglobin.

Fig. 3. PCA of the various proteins. On the left and bottom sides, the color code runs from dark red (maximum) to yellow (minimum). Three horizontal groups (>65%, 45–65% and <45% ordered secondary structure) are delimited by dotted lines. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Based on these considerations, α-lactalbumin and lysozyme can be clearly distinguished in Fig. 3, even though they are very similar in their secondary structure. It was also observed that, within the subset constituted by RNase-B, α-casein and trypsin, RNase-B is positioned far to the left, whereas trypsin falls on the right-hand side of the plot. In the case of lysozyme and α-lactalbumin, however, lysozyme is found up and to the right of α-lactalbumin (a_lactalbumin > a_lysozyme), in slight disagreement with the model predictions: the amount of ordered structure in lysozyme is slightly smaller than in α-lactalbumin, which would suggest that lysozyme (α-helix: 37.91%, β-sheet: 16.64%, random coil: 45.45%; total ordered structure: 54.55%) should be placed below α-lactalbumin (α-helix: 46.48%, β-sheet: 9.86%, random coil: 43.66%; total ordered structure: 56.34%). This discrepancy could be explained by considering that the differences in secondary structure are very small, well within the limits of experimental error. This is the first time that proteins have been placed in a PCA graph on the basis of their secondary structures for acquired SERS spectra.

The loading spectra corresponding to PC1, PC2 and PC3 are shown in Fig. 4 to further substantiate the analysis. The loading curve for the first principal component, which accounts for 53% of the spectral variance, mostly reflects the dissimilarities with respect to the myoglobin spectrum. Myoglobin has the highest α-helix content and is the only metalloprotein in the list of proteins, which makes it clearly distinguishable from the others. Indeed, looking at the protein SERS spectra, it is clear that the myoglobin SERS spectrum is quite different from the others. For the PC2 loading curve, which accounts for 15% of the spectral variance, the major variances may be attributed to the bands at 644 (trypsin), 760 (α-lactalbumin, lysozyme and/or trypsin), 830, 855, 880 (trypsin), 940 (myoglobin, BSA and/or ferritin), 980 (RNase-B), the doublet at 1004 and 1012 (lysozyme and/or α-lactalbumin), 1132 (trypsin and/or α-lactalbumin), 1375 (myoglobin), 1558 (α-lactalbumin and/or lysozyme), 1618 (trypsin) and 1676 cm⁻¹. The discrimination in the PC3 loading curve, which accounts for 13% of the spectral variance, is mainly due to the bands associated with α-lactalbumin (760 and 1014 cm⁻¹), trypsin (880 cm⁻¹), RNase-B (978 cm⁻¹) and myoglobin (1373, 1558 and 1628 cm⁻¹).

Fig. 4. PC1, PC2 and PC3 loading plots for all the proteins presented herein.

Fig. 5. Cluster analysis on the protein Raman data. The PCA is shown in the inset for comparison. The associated 3D protein structures are illustrated at the side of the figure.

Cluster analysis was finally employed to categorize the protein data-sets on the basis of the dissimilarity function. The assignment of data points to specific clusters of proteins is shown in Fig. 5 (the PCA scatter plot is also reported in the inset). The figure shows that two points of BSA, three points of α-lactalbumin and five of α-casein are erroneously grouped with other proteins similar to them. The classification error in our case is around 14%.
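The clustering and error estimate discussed above can be illustrated with a small sketch. It clusters points (for example PC scores) around medoids using the Manhattan distance and then counts the points that fall outside the majority protein of their cluster. Note that this is a simplified alternating k-medoids scheme, not the full build/swap procedure of partitioning around medoids, and it is not the authors' code (see Supplementary information B). The function names, the `init` and `seed` parameters and the majority-vote definition of the error are assumptions introduced here.

```python
import numpy as np


def manhattan_matrix(points):
    """Pairwise Manhattan (L1) distances between the rows of points."""
    return np.abs(points[:, None, :] - points[None, :, :]).sum(axis=2)


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Simplified alternating k-medoids clustering with Manhattan distance.

    init: optional indices of the starting medoids (hypothetical parameter);
    if omitted, k distinct points are drawn at random.
    Returns the medoid indices and the cluster label of each point.
    """
    dist = manhattan_matrix(points)
    if init is None:
        rng = np.random.default_rng(seed)
        medoids = rng.choice(len(points), size=k, replace=False)
    else:
        medoids = np.asarray(init, dtype=int)
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:  # keep the old medoid if a cluster is empty
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # locally optimal: medoids no longer change
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


def classification_error(labels, truth):
    """Fraction of points not belonging to the majority class of their cluster."""
    labels = np.asarray(labels)
    truth = np.asarray(truth)
    wrong = 0
    for c in np.unique(labels):
        _, counts = np.unique(truth[labels == c], return_counts=True)
        wrong += counts.sum() - counts.max()
    return wrong / len(truth)
```

Like the method described in the text, this scheme only reaches a locally optimal clustering, so the result can depend on the starting medoids.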

4. Conclusions

A controllable and reproducible SERS substrate, comprising regular arrays of nanograin aggregates with a mutual distance between the aggregates (gap) down to 10 nm, was realized using electro-plating and e-beam lithography techniques. The multivariate analysis method PCA was applied to the data-sets of protein (BSA, lysozyme, myoglobin, ferritin, RNase-B, α-lactalbumin, α-casein and trypsin) SERS spectra to distribute them in a 2D scatter plot on the basis of their secondary structures. PCA was thus successfully applied to proteins to provide direct information on their folding. Cluster analysis was further used to classify the samples according to a dissimilarity (distance) function. Appreciable results were obtained, with a grouping error of about 14%.

This technique could be employed for screening proteins quickly and easily. It was also shown that the method could be used to assess the relative content of the secondary structures (α-helix, β-sheet and random coil) of each protein. It may thus be possible to combine the analytical methods "PCA and cluster analysis" with SERS spectra to differentiate even the same protein under changing external conditions (temperature, solvent pH, buffer effect, etc.).

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.molstruc.2010.12.044.

References

[1] D.A. Stuart, J.M. Yuen, N. Shah, O. Lyandres, C.R. Yonzon, M.R. Glucksberg, J.T. Walsh, R.P. Van Duyne, Anal. Chem. 78 (2006) 7211.
[2] F. De Angelis, M. Patrini, G. Das, I. Maksymov, M. Galli, L. Businaro, L.C. Andreani, E. Di Fabrizio, Nano Lett. 8 (2008) 2321.
[3] F. De Angelis, G. Das, P. Candeloro, M. Patrini, M. Galli, A. Bek, M. Lazzarino, I. Maksymov, C. Liberale, L.C. Andreani, E. Di, Nat. Nanotechnol. 5 (2010) 67–72.
[4] C.L. Haynes, A.D. McFarland, R.P. VanDuyne, Anal. Chem. 77 (2005) 338A.
[5] P. Hildebrandt, M. Stockburger, J. Phys. Chem. 88 (1984) 5935.
[6] K. Kneipp, Y. Wang, H. Kneipp, L.T. Perelman, I. Itzkan, R.R. Dasari, M.S. Feld, Phys. Rev. Lett. 78 (1997) 1667.
[7] S. Nie, S.R. Emory, Science 275 (1997) 1102.
[8] A.J. Haes, J. Zhao, S. Zou, C.S. Own, L.D. Marks, G.C. Schatz, R.P. VanDuyne, J. Phys. Chem. B 109 (2005) 11158.
[9] M. Kahl, E. Voges, S. Kostrewa, C. Viets, W. Hill, Sens. Actuators B: Chem. 51 (1998) 285.
[10] G. Das, F. Mecarini, F. Gentile, F. De Angelis, M.H.G. Kumar, P. Candeloro, C. Liberale, G. Cuda, E. Di Fabrizio, Biosens. Bioelectron. 24 (2009) 1693.
[11] G.J. Thomas Jr., Ann. Rev. Biophys. Biomol. Struct. 28 (1999) 1.


[12] N. Alizadeh-Pasdar, N. Shuryo, E.C.Y. Li-Chan, J. Agric. Food Chem. 50 (2002) 6042.
[13] M. Harz, M. Kraus, T. Bartels, K. Cramer, P. Rosch, J. Popp, Anal. Chem. 80 (2008) 080.
[14] R. Malini, K. Venkatakrishna, J. Kurien, K.M. Pai, L. Rao, V.B. Kartha, C.M. Krishna, Biochemistry 81 (2006) 179.
[15] P. Crow, J.S. Uff, J.A. Farmer, M.P. Wright, N. Stone, BJU Int. 93 (2004) 1232.
[16] C. Hyung-Kyoon, Y. Jung-Hye, Proc. Biochem. 42 (2007) 271.
[17] E.W. Blanch, A.C. Gill, A.G.O. Rhie, J. Hope, L. Hecht, K. Nielsen, L.D. Barron, J. Mol. Biol. 343 (2004) 467.
[18] M.K. Hossain, G.G. Huang, T. Kaneko, Y. Ozaki, Chem. Phys. Lett. 477 (2009) 130.
[19] I.K. Lednev, V.E. Ermolenkov, W. Hei, M. Xu, Deep–UV, Anal. Bioanal. Chem. 381 (2005) 431.
[20] D. Zhang, Y. Xie, M.F. Mrozek, C. Ortiz, V.J. Davisson, D. Ben-Amotz, Anal. Chem. 75 (2003) 5703.
[21] M.J. Pelletier, R. Altkorn, Cell Anal. Chem. 73 (2001) 1393.
[22] G.J. Thomas Jr, B. Prescott, L.A. Day, J. Mol. Biol. 165 (1983) 321.
[23] M. Dautrevaux, Y. Boulanger, K. Han, G. Biserte, Eur. J. Biochem. 11 (1969) 267.
[24] T. Granier, B. Gallonois, A. Dautant, B.L. D'Estaintot, G. Precigoux, Acta Crystallogr. D53 (1997) 580.
[25] K. Hirayama, S. Akashi, M. Furuka, K.-I. Fukuhara, Biochem. Biophys. Res. Commun. 173 (2) (1990) 639.
[26] A. Jung, A.E. Sippel, M. Grez, G. Schutz, PNAS 77 (1980) 5759.
[27] D.G. Smith, W.H. Stein, S. Moore, J. Biol. Chem. 238 (1963) 227.
[28] R.L. Williams, S.M. Greene, A. McPherson, J. Biol. Chem. 262 (33) (1987) 16020.
[29] K.A. Walsh, D.L. Kauffman, K.S.V.S. Kumar, H. Neurath, PNAS 51 (1964) 301.
[30] W. Bode, P. Schwager, J. Mol. Biol. 98 (1975) 693.
[31] G. Kontopidis, C. Holt, L. Sawyer, J. Dairy Sci. 87 (2004) 785.
[32] K. Brew, F.J. Castellino, T.C. Vanaman, R.L. Hill, J. Biol. Chem. 245 (1970) 4570.
[33] M.N. Siamwiza, R.C. Lord, M.C. Chen, T. Takamatsu, I. Harada, H. Matsuura, T. Shimanouchi, Biochemistry 14 (1975) 4870.
[34] T.G. Spiro, B.P. Gaber, Ann. Rev. Biochem. 46 (1977) 553.
[35] H. Xu, E.J. Bjerneld, M. Kall, L. Borjesson, Phys. Rev. Lett. 83 (1999) 4357.
[36] R.T. Forbes, B.W. Barry, A.A. Elkordy, Eur. J. Phar. Sc. 30 (2007) 315.
[37] T. Miura, H. Takeuchi, I. Harada, Biochemistry 30 (1991) 6074.
[38] N. Howell, E. Li-Chan, Int. J. Food Sci. Technol. 31 (1996) 439.
[39] M.C. Chen, R.C. Lord, Biochemistry 15 (1976) 1889.
[40] N. Yu, J. Am. Chem. Soc. 96 (1974) 4664.
[41] G. Wilson, S.J. Ford, A. Cooper, L. Hecht, Z.Q. Wen, L.D. Barron, J. Mol. Biol. 254 (1995) 747.