Chemometrics and Intelligent Laboratory Systems 57 (2001) 93–95 www.elsevier.com/locate/chemometrics
Is information about peptide sequence necessary in multivariate analysis?
Tore Lejon *, Morten B. Strøm, John S. Svendsen
Department of Chemistry, University of Tromsø, N-9037 Tromsø, Norway
Received 2 November 2000; accepted 20 April 2001
Abstract
If information about sequence is omitted, permutation of amino acids within a peptide can cause unwanted effects in multivariate analysis. This effect is readily apparent in principal component analysis, while it is less so in partial least squares analysis, so that care must be exercised when interpreting the results. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Peptide sequence; Multivariate analysis; Amino acid
1. Introduction
Multivariate analysis of peptides can be based, for example, on variables that describe the entire molecule or on variables that describe the individual amino acids in the peptides. The major advantage of describing the entire peptide by means of physicochemical variables is that each molecule is described by a unique set of variables, usually leading to useful structure–activity relationships. The major drawback is that the design of new molecules is difficult, because the response cannot be correlated directly with peptide structure. One way of overcoming this problem has been described by Hellberg et al. [1,2], who used descriptors for the individual amino acids to describe the peptides. This leads to models with good predictive ability. Both of these methods have been used with success in an ongoing project in our laboratory, in which modified pentadecapeptides based on lactoferricins have been investigated for their antibacterial activity [3,4]. However, results from principal component analysis of an extended set of peptides were at first somewhat surprising: the second principal component was not significant and objects were projected on top of each other. This led us to reinvestigate the original set of peptides.

* Corresponding author. E-mail address: [email protected] (T. Lejon).
0169-7439/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0169-7439(01)00126-5

2. Results and discussion
Studies of the antibacterial activity of lactoferricins of different origin indicated that a pentadecapeptide from murine lactoferricin (LFM) would be a suitable starting point for the synthesis of new modified peptides with high antibacterial activity [5]. Based on these initial studies, the amino acid in position 8, asparagine, was exchanged for tryptophan, and since the amino acids in positions 1 and 9 were important for peptide activity, substitutions were also undertaken in these positions. An apparently unfortunate aspect, which caused problems in some of the multivariate analyses, was that the same three amino acids, i.e. glutamic acid, alanine and arginine, had been incorporated in positions 1 and 9. Results were inconclusive regarding the importance of the valine in position 13, so some of the new peptides were also substituted in this position, where tyrosine was introduced. This resulted in a test set consisting of 18 pentadecapeptides in which three positions had been varied.

We were able to show that macroscopic properties describing the peptides could be exchanged for descriptors for the individual amino acids without loss of descriptive power in PLS analysis [3]. In principal component analysis, in which >94% of the variation was described by four components, objects were projected on top of each other. Which objects were not separated depended on the type of pre-treatment applied to the original data set. If data was not centred prior to analysis, pairs of peptides that contained the same amino acids attained the same score values for t1, t2 and t4 (the loadings for positions 1 and 9 in the peptides coincide for p1, p2 and p4), irrespective of the position of the amino acids in the peptide (Table 1), and objects were thus projected on top of each other in these dimensions. On the other hand, if data was centred prior to analysis, peptides that contained the same amino acid in position 1 (and in position 13) were projected on top of each other in the t1, t2 and t4 dimensions (Table 2).

In a PLS analysis, each block of variables is treated as in a PCA, with the additional constraint that the calculated components for the dependent variables should correlate as well as possible with the components for the independent variables. Since the y-values are not the same for peptides in which amino acids are permuted, the objects in the x-matrix are forced apart as compared to the ordinary PCA. Even though our results from the PLS analysis were good (two significant components, R²X = 60.2, R²Y = 98.1, Q² = 96.2), we wished to investigate whether the results could be improved. After including cross-terms for adjacent amino acids in positions 1/2, 9/10 and 13/14, which can be regarded as a way of adding sequence information, a new analysis was performed. This resulted in a model with slightly
Table 1
Scores and loadings for non-centred data

Peptide (a)             t[1]    t[2]    t[3]    t[4]
LFM W8                 4.834  -2.503   0      -1.454
LFM W8 Y13             3.602   2.47    0      -2.254
LFM A1 W8              2.869  -3.089   2.028  -1.837
LFM A9 W8              2.869  -3.089  -2.028  -1.837
LFM A1,9 W8            0.904  -3.675   0      -2.219
LFM R1 W8              6.278  -1.845  -2.354   0.584
LFM R9 W8              6.278  -1.845   2.354   0.584
LFM A1 R9 W8           4.313  -2.431   4.382   0.202
LFM A9 R1 W8           4.313  -2.431  -4.382   0.202
LFM R1,9 W8            7.722  -1.186   0       2.623
LFM A1 W8 Y13          1.636   1.884   2.028  -2.636
LFM A9 W8 Y13          1.636   1.884  -2.028  -2.636
LFM A1,9 W8 Y13       -0.329   1.298   0      -3.018
LFM R1 W8 Y13          5.045   3.128  -2.354  -0.215
LFM R9 W8 Y13          5.045   3.128   2.354  -0.215
LFM A1 R9 W8 Y13       3.08    2.542   4.382  -0.597
LFM A9 R1 W8 Y13       3.08    2.542  -4.382  -0.597
LFM A9 R1 W8 Y13       3.08    2.542  -4.382  -0.597

Amino acid position     p[1]    p[2]    p[3]    p[4]
1 (z1)                 0.492   0.095  -0.311  -0.224
1 (z2)                 0.204   0.133  -0.484   0.477
1 (z3)                -0.328  -0.117   0.411  -0.29
9 (z1)                 0.492   0.095   0.311  -0.224
9 (z2)                 0.204   0.133   0.484   0.477
9 (z3)                -0.328  -0.117  -0.411  -0.29
13 (z1)               -0.431   0.232   0       0.426
13 (z2)               -0.099   0.899   0      -0.299
13 (z3)               -0.149   0.238   0       0.074

(a) Symbols are in accordance with IUPAC recommendations.
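The pattern in Table 1 can be reproduced with a short NumPy sketch. It is an illustration only, not the Simca-P calculation, and it rests on several assumptions: the z-scale values are quoted from memory of the tabulation in refs. [1,2] and should be verified, the 18-peptide design (Glu, Ala or Arg in positions 1 and 9; Val or Tyr in position 13) is read off the peptide names in the tables, and PCA is carried out by a plain singular value decomposition. The names Z, build_design and pca are introduced here for illustration.

```python
import itertools
import numpy as np

# ASSUMPTION: z-scales (z1, z2, z3) quoted from memory of refs. [1,2];
# verify against the original tabulation before reuse.
Z = {
    "E": (3.08, 0.39, -0.07),    # glutamic acid
    "A": (0.07, -1.73, 0.09),    # alanine
    "R": (2.88, 2.52, -3.44),    # arginine
    "V": (-2.69, -2.53, -1.29),  # valine
    "Y": (-1.39, 2.32, 0.01),    # tyrosine
}


def build_design():
    """Return labels (pos1, pos9, pos13) and the 18 x 9 descriptor matrix."""
    labels = list(itertools.product("EAR", "EAR", "VY"))
    X = np.array([Z[a] + Z[b] + Z[c] for a, b, c in labels])
    return labels, X


def pca(X, centre, n_components=4):
    """PCA via SVD. Returns scores, loadings and all singular values."""
    Xw = X - X.mean(axis=0) if centre else X
    U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components], s


labels, X = build_design()
index = {lab: i for i, lab in enumerate(labels)}

# --- non-centred data ---
T, P, _ = pca(X, centre=False)

# Components whose loadings for positions 1 and 9 coincide.
symmetric = [k for k in range(4)
             if np.allclose(P[k, 0:3], P[k, 3:6], atol=1e-8)]

# Largest score difference between a peptide and its partner with
# positions 1 and 9 swapped, taken over those components only.
max_gap = max(
    abs(T[index[(a, b, c)], k] - T[index[(b, a, c)], k])
    for a, b, c in labels
    for k in symmetric
)
print("components with equal loadings for positions 1 and 9:",
      [k + 1 for k in symmetric])
print("largest score gap between permuted peptides:", max_gap)

# --- centred data ---
Xc = X - X.mean(axis=0)
cross_block = Xc[:, 0:3].T @ Xc[:, 3:6]  # position 1 vs position 9 columns
_, _, s = pca(X, centre=True)
explained4 = (s[:4] ** 2).sum() / (s ** 2).sum()
print("largest |cross-product| between positions 1 and 9:",
      np.abs(cross_block).max())
print("fraction of variation in four centred components:", explained4)
```

Under these assumptions, permuted peptides coincide on every component whose loadings are equal for positions 1 and 9, and after centring the columns for the two positions are orthogonal. Because the centred blocks for positions 1 and 9 carry identical variance, an SVD may return any rotation of the corresponding components, so the sketch inspects the cross-product and the explained variation rather than the centred loadings themselves.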
Table 2
Scores and loadings for centred data

Peptide (a)             t[1]    t[2]    t[3]    t[4]
LFM W8                -2.593  -0.154  -0.154   1.505
LFM W8 Y13             2.593  -0.154  -0.154   1.505
LFM A1 W8             -2.593  -3.022  -0.154  -0.809
LFM A9 W8             -2.593  -0.154  -3.022   1.505
LFM A1,9 W8           -2.593  -3.022  -3.022  -0.809
LFM R1 W8             -2.593   3.176  -0.154  -0.697
LFM R9 W8             -2.593  -0.154   3.176   1.505
LFM A1 R9 W8          -2.593  -3.022   3.176  -0.809
LFM A9 R1 W8          -2.593   3.176  -3.022  -0.697
LFM R1,9 W8           -2.593   3.176   3.176  -0.697
LFM A1 W8 Y13          2.593  -3.022  -0.154  -0.809
LFM A9 W8 Y13          2.593  -0.154  -3.022   1.505
LFM A1,9 W8 Y13        2.593  -3.022  -3.022  -0.809
LFM R1 W8 Y13          2.593   3.176  -0.154  -0.697
LFM R9 W8 Y13          2.593  -0.154   3.176   1.505
LFM A1 R9 W8 Y13       2.593  -3.022   3.176  -0.809
LFM A9 R1 W8 Y13       2.593   3.176  -3.022  -0.697
LFM R1,9 W8 Y13        2.593   3.176   3.176  -0.697

Amino acid position     p[1]    p[2]    p[3]    p[4]
1 (z1)                 0       0.44    0       0.756
1 (z2)                 0       0.685   0       0.068
1 (z3)                 0      -0.581   0       0.651
9 (z1)                 0       0       0.44    0
9 (z2)                 0       0       0.685   0
9 (z3)                 0       0      -0.581   0
13 (z1)                0.251   0       0       0
13 (z2)                0.935   0       0       0
13 (z3)                0.251   0       0       0

(a) Symbols are in accordance with IUPAC recommendations.
better predictive power (five significant components, R²X = 99.3, R²Y = 99.0, Q² = 97.9).
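The argument in the discussion that the response forces permuted peptides apart in PLS can be illustrated with the first weight vector of a single-response PLS, which for centred data is proportional to X'y. The sketch below is not the Simca-P model. The response y is HYPOTHETICAL (it is not the measured activity) and is constructed so that positions 1 and 9 contribute differently; the z-scale values are again assumed from memory of refs. [1,2], and the design is read off the peptide names in the tables.

```python
import itertools
import numpy as np

# ASSUMPTION: z-scales (z1, z2, z3) quoted from memory of refs. [1,2].
Z = {
    "E": (3.08, 0.39, -0.07),
    "A": (0.07, -1.73, 0.09),
    "R": (2.88, 2.52, -3.44),
    "V": (-2.69, -2.53, -1.29),
    "Y": (-1.39, 2.32, 0.01),
}

labels = list(itertools.product("EAR", "EAR", "VY"))  # (pos1, pos9, pos13)
X = np.array([Z[a] + Z[b] + Z[c] for a, b, c in labels])
index = {lab: i for i, lab in enumerate(labels)}

# HYPOTHETICAL response: positions 1 and 9 affect y in different ways,
# so permuted peptides receive different y-values.
y = np.array([0.5 * Z[a][0] - 0.2 * Z[b][2] + 0.1 * Z[c][1]
              for a, b, c in labels])

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PLS weight vector: direction in X with maximal covariance with y.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t = Xc @ w  # first X-score for each peptide

# Pairs of peptides containing the same residues with positions 1 and 9 swapped.
pairs = [(index[(a, b, c)], index[(b, a, c)])
         for a, b, c in labels if a < b]
gaps = [abs(t[i] - t[j]) for i, j in pairs]

print("weights, position 1:", w[0:3])
print("weights, position 9:", w[3:6])
print("smallest score gap between permuted peptides:", min(gaps))
```

Because y differs between permuted peptides, the weights for positions 1 and 9 are no longer equal, and the permuted pairs receive different scores already in the first component.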
3. Conclusions
It is clear that the position of a specific amino acid in a peptide is of importance to the physicochemical behaviour of the peptide and that the results presented above are due to the way data is treated in a PCA. The results are not surprising from a mathematical point of view, but for those who use multivariate analysis only as a tool it must be emphasised that problems can arise from a badly designed test set. In order to avoid the problems with objects being projected on top of each other in principal component analysis and to improve the result from partial least squares analysis, information that distinguishes between positions within the peptide can be added.
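The mathematical point can be sketched as follows. This is an outline under two assumptions that are suggested by Tables 1 and 2 but not stated in the text: the test set is a balanced 3 × 3 × 2 design in positions 1, 9 and 13, and the eigenvalues concerned in the non-centred case are simple. The symbols S and Π are introduced here for illustration.

```latex
% X (n x 9): each row is x^T = (z(pos 1), z(pos 9), z(pos 13)).
% S swaps the position-1 and position-9 blocks, so S^T = S and S^2 = I.
% Non-centred data: the set of rows is closed under x -> Sx,
% hence XS = \Pi X for some row permutation \Pi.
\begin{align*}
S\,X^{\mathsf T}X\,S &= X^{\mathsf T}\Pi^{\mathsf T}\Pi X = X^{\mathsf T}X
  \quad\Longrightarrow\quad S p_a = \pm p_a \ \text{(simple eigenvalue)},\\
S p_a = p_a &\quad\Longrightarrow\quad
  t_a(Sx) = (Sx)^{\mathsf T}p_a = x^{\mathsf T}S\,p_a = x^{\mathsf T}p_a = t_a(x).
\end{align*}
% Centred data, balanced design: \tilde X = [\tilde X_1\ \tilde X_9\ \tilde X_{13}].
\begin{align*}
\tilde X_i^{\mathsf T}\tilde X_j = 0 \ (i \neq j)
  \quad\Longrightarrow\quad
\tilde X^{\mathsf T}\tilde X = \operatorname{diag}\bigl(
  \tilde X_1^{\mathsf T}\tilde X_1,\
  \tilde X_9^{\mathsf T}\tilde X_9,\
  \tilde X_{13}^{\mathsf T}\tilde X_{13}\bigr).
\end{align*}
```

In the non-centred case, a component with equal loadings for positions 1 and 9 therefore cannot separate a peptide from its permuted partner. In the centred case, each loading vector can be chosen with non-zero entries for a single position, so the corresponding score depends only on the residue in that position.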
4. Experimental
The program package Simca-P 8.0 from Umetrics (Umeå, Sweden) was used for all calculations. The theoretically derived z-scales were used without scaling, since the variables used for the original analysis had been scaled [3]. Cross-terms were calculated as z1z1, z2z2 and z3z3 for the pairs of amino acids. The logarithms of the activities were used as dependent variables in the PLS analysis.
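One reading of the cross-term definition is the element-wise product of the z-scales of two adjacent residues (z1 with z1, z2 with z2, z3 with z3). The sketch below follows that reading, which is an interpretation rather than a confirmed detail of the original calculation. The z-scales for Glu and Ala are assumed from memory of refs. [1,2], the neighbour descriptors are HYPOTHETICAL placeholders, and the function names are introduced here for illustration. Only positions 1/2 and 9/10 are shown for brevity.

```python
import numpy as np


def cross_terms(z_a, z_b):
    """Element-wise products (z1*z1, z2*z2, z3*z3) for two adjacent residues."""
    return np.asarray(z_a, dtype=float) * np.asarray(z_b, dtype=float)


def describe(varied, neighbours):
    """Concatenate z-scales of the varied positions with their cross-terms.

    varied     -- z-scale triples for the varied positions (e.g. 1 and 9)
    neighbours -- z-scale triples for the following positions (e.g. 2 and 10)
    """
    main = [np.asarray(z, dtype=float) for z in varied]
    cross = [cross_terms(z, n) for z, n in zip(varied, neighbours)]
    return np.concatenate(main + cross)


# ASSUMPTION: z-scales quoted from memory of refs. [1,2]; verify before reuse.
GLU = (3.08, 0.39, -0.07)
ALA = (0.07, -1.73, 0.09)
# HYPOTHETICAL placeholders for the fixed residues in positions 2 and 10.
NEIGHBOUR_2 = (1.0, 0.5, -1.0)
NEIGHBOUR_10 = (-2.0, 0.0, 0.5)

row_a1 = describe([ALA, GLU], [NEIGHBOUR_2, NEIGHBOUR_10])  # Ala in position 1
row_a9 = describe([GLU, ALA], [NEIGHBOUR_2, NEIGHBOUR_10])  # Ala in position 9

# Sums over the two positions: what a loading with equal weights would see.
main_sum_a1 = row_a1[0:3] + row_a1[3:6]
main_sum_a9 = row_a9[0:3] + row_a9[3:6]
cross_sum_a1 = row_a1[6:9] + row_a1[9:12]
cross_sum_a9 = row_a9[6:9] + row_a9[9:12]

print("main effects indistinguishable:", np.allclose(main_sum_a1, main_sum_a9))
print("cross-terms indistinguishable:", np.allclose(cross_sum_a1, cross_sum_a9))
```

As long as the neighbouring residues in positions 2 and 10 differ, the cross-terms distinguish the two permuted peptides even where the main-effect descriptors cannot, which is the sense in which they add sequence information.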
Acknowledgements
Alpharma is gratefully acknowledged for financial support to Morten B. Strøm and John S. Svendsen.
References
[1] S. Hellberg, M. Sjöström, S. Wold, Acta Chem. Scand. B 40 (1986) 135–140.
[2] S. Hellberg, M. Sjöström, B. Skagerberg, S. Wold, J. Med. Chem. 30 (1987) 1126–1135.
[3] M.B. Strøm, Ø. Rekdal, W. Stensen, J.S. Svendsen, J. Pept. Res. 57 (2001) 127–139.
[4] T. Lejon, M.B. Strøm, J.S. Svendsen, J. Pept. Sci. 7 (2001) 74–78.
[5] Patents no. PCT/GB99/02850 and no. PCT/GB99/02851.