THEO CHEM Journal of Molecular Structure (Theochem) 362 (1996) 263-273
Amino acid conformational analyses of proteins (ACAP program)
Péter Hudáky (a), András Perczel (a,b,*,1), Imre G. Csizmadia (a,b)
(a) Department of Organic Chemistry, Eötvös University, H-1117 Budapest, 112 P.O.B. 32, Hungary
(b) Department of Chemistry, University of Toronto, Toronto, Ont. M5S 1A1, Canada
Received 1 August 1995; accepted 15 September 1995
Abstract
The secondary structure analysis of proteins is a powerful tool, efficiently supporting spectroscopic (CD, IR, NMR) and biochemical research projects. However, the precise location of the different secondary structural elements in a sequence incorporates some subjectivity. A linearized notation providing a uniform and objective description of protein backbone structures was developed. This involves a three-dimensional to one-dimensional transformation (3D → 1D). The classification is based on a step-by-step comparison performed between reference conformers (also called template values) and backbone sub-conformations of the protein. The readily provided structure templates may be modified, but the procedure still remains objective and uniform. This linearized notation of protein structures provides a description of the three-dimensional backbone conformation without relying on the traditional concept of secondary structure. Previously, the sequence analysis (primary structure) of proteins resulted in the identification of the 20 natural amino acid residues. Similarly, the backbone conformation analysis of the same proteins confirmed the presence of nine basically different subconformers. This recognition, together with the application of the linearized notation of the 3D structure, makes it easier to compare proteins at the levels of primary to tertiary structure.
Keywords: Ab initio calculation; ACAP program; Amino acid; Conformational analysis; Conformational cluster; Protein
1. Introduction
Until now, no explicit relationship of significant reliability has been found between the amino acid sequence of a protein and its conformer(s). Different approaches have produced a variety of prediction algorithms, but no "fine-tuned" a priori technique is in sight. The objective and uniform classification of the already determined backbone conformation types should be the initial step. The identification of the now familiar conformers started in the 1950s with the pioneering work of Pauling and co-workers [1], continued by Venkatachalam [2], Scheraga and co-workers [3], Chou and Fasman [4], Rose et al. [5] and others. A common characteristic of these approaches was the comprehensive search for "highly similar folds" in proteins using the available X-ray structure databases. Initially, only some dozens of proteins were analysed, and therefore, besides the regular subunits, some irregular (or atypical) ones had to be incorporated among the secondary structural elements. These irregular backbone conformers were collectively referred to as random or unordered structures. Subsequently, with the enlargement of the investigated X-ray databases, more and more aperiodic but regular substructures (e.g. β-turn types) were recognized (Fig. 1). However, the category of irregular secondary structural element had to be preserved.

The last four decades resulted in partial success in the comprehensive analysis of the secondary structure of proteins. Even the classification criteria of the periodic regular backbone units (α-helix, β-pleated sheet, polyproline II, etc.) have problematic aspects. Not only is the determination of the minimal chain lengths ambiguous, but no consensus on the allowed limit for torsion angle deviations has been established either. Moreover, various aperiodic but regular sub-structures (rare β-turn types and/or loops) are often ignored owing to their sporadic occurrence. The division of the backbone conformers was typically achieved by the fragmentation of the X-ray-determined protein backbone structures.

The main obstacle of such a method is that the relative frequency ratios of the above subconformations in the investigated protein database influence the analysis; less frequently observed aperiodic secondary structure types (e.g. type I′, II′, III′, VIa, VIb, VIII β-turns) are often ignored.

Fig. 1. (A) The "ribbon" of a serine protease [6] segment (residues 38-63) (1ST3) from X-ray data [7]. (B) The amino acid sequence of the serine protease (1ST3) fragment as shown, with assigned secondary structures (∼∼∼, α-helices; - - - - - -, β-sheets; |, β-turns; . . . ., unordered). The secondary structure assignment is identical with that reported in the PDB file.
[Panel (B) sequence, residues 38-63: His Pro Asp Leu Asn Ile Arg Gly Gly Ala Ser Phe Val Pro Gly Glu Pro Ser Thr Gln Asp Gly Asn Gly His Gly]

* Corresponding author.
1 E-mail address: [email protected]. Present address: Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK.
0166-1280/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0166-1280(95)04416-7
2. An alternative approach

The conceptual difference between our method and the previous approaches is that the possible backbone conformers are determined by ab initio computations performed on peptide models. This method is therefore independent of the common weaknesses of pattern recognition techniques applied to X-ray databases. The multidimensional conformational analysis [8] of potential-energy hypersurfaces (PEHS) obtained for model peptides, where energy is a function of the torsion angles, may result in the recognition of all possible conformers. Even if a backbone conformer has high energy relative to the global minimum (and therefore a low probability would be expected), it is considered important to acquire complete conformational template sets. The determination of a suitable number of "conformation centres" on a 2n-dimensional Ramachandran map (e.g. up to nine "conformation centres" on a 2D PES, $E = E(\phi, \psi)$) provides the basis for the recognition of discrete backbone subconformations. Our observation that only a limited number of backbone conformations exist is equivalent to the recognition that a finite number of conformations play a role during protein folding. By comparing, from residue to residue, the backbone sub-conformations of a protein with the reference conformers of a template, an objective approach was developed to describe the whole 3D structure of a protein (Fig. 2).

Fig. 2. Block diagram of the ACAP (Amino acid conformation Analyses of Proteins) program.

3. The determination of the template conformers

The conformational analysis requires two types of input data: the geometrical parameters of the investigated protein(s) and the reference backbone conformations accumulated in a template. We recommend the use of templates determined by ab initio computations, since these results are almost "free" from external parameters. The backbone conformational potential-energy hypersurface, $E = E(\mathbf{x})$, of a protein containing n amino acids, where $\mathbf{x} = (\psi_1, \phi_2, \psi_2, \ldots, \phi_{n-1}, \psi_{n-1}, \phi_n)$, may be subdivided into (n − 1 − k) 2k-dimensional Ramachandran-type potential-energy hypersurfaces (PEHS) (k = 1, 2, ..., n − 2); the first and the last amino acid residues cannot be assigned, since $\phi_1$ and $\psi_n$ are undefined.

In a "diamide concept", when k = 1, a total of (n − 2) two-dimensional (2D) Ramachandran maps [9], $E = E(\phi_i, \psi_i)$, are obtained. In this way, the conformational properties of the single amino acid residues composing the primary sequence of the protein are analysed. It has been demonstrated [10] that such an $E = E(\phi_i, \psi_i)$ Ramachandran-type surface has nine or fewer minima, depending on the type of amino acid residue (Fig. 3). These nine minima have been predicted by multidimensional conformational analysis. The initial calculations were carried out by the molecular mechanics (MM) technique on a number of amino acid derivatives of the types Ac-X-NHMe and For-X-NH2 (Ac = CH3CO-; For = HCO-; Me = -CH3). The minima were subsequently subjected to ab initio computations carried out on For-X-NH2, where X = glycine [10c,d], L-alanine [10c,d], p-alanine [10c], L-valine [10d,e], L-serine [11], L-threonine [12], L-cysteine [12] and L-phenylalanine [13] residues. Although ab initio studies revealed the annihilation of certain minima in various cases [10-14], which was not observed when molecular mechanics (ECEPP/2) [15] was applied [8], it should nevertheless be emphasized that an increase in the number of minima beyond nine was never observed.

Fig. 3. The ideal locations of the 9 minima of a backbone conformational unit, as predicted by multidimensional conformational analysis, in a "topology orientated" representation (0° < φ < 360°, 0° < ψ < 360°).

In a "triamide concept", when k = 2, (n − 3) different four-dimensional (4D) Ramachandran maps, all with $E = E(\phi_i, \psi_i, \phi_{i+1}, \psi_{i+1})$, are obtained. In this way, the conformational properties of the triamides of two adjacent amino acid residues are analysed. It has been demonstrated that such an $E = E(\phi_i, \psi_i, \phi_{i+1}, \psi_{i+1})$-type surface may have 9 × 9 = 81 or fewer minima [16,17]. Consequently, $(3^2)^n$, as an upper bound for the number of minima (where n is the number of amino acids in the protein), is a good approximation for the maximum number of main-chain (or backbone) conformers in a protein. The influences of the different side-chains are important [10e,11-13] in terms of modifying the position of a given minimum on the backbone PEHS and also in terms of determining the relative energetic spectrum of the minima. Furthermore, special side-chains may also modify the topology of the peptide conformational PEHS [10-13].
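The counting argument of this section can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and are not part of ACAP, while the expressions (n − 1 − k) and (3^2)^n are those given in the text.

```python
def count_ramachandran_maps(n_residues: int, k: int) -> int:
    """Number of 2k-dimensional Ramachandran-type surfaces, (n - 1 - k).

    k = 1 is the "diamide concept", k = 2 the "triamide concept".
    """
    if not 1 <= k <= n_residues - 2:
        raise ValueError("k must satisfy 1 <= k <= n - 2")
    return n_residues - 1 - k


def max_backbone_conformers(n_residues: int, minima_per_residue: int = 9) -> int:
    """Upper bound (3^2)^n = 9^n on the number of backbone conformers."""
    return minima_per_residue ** n_residues


# A protein of 164 residues gives 162 two-dimensional maps (k = 1)
# and 161 four-dimensional maps (k = 2).
print(count_ramachandran_maps(164, 1), count_ramachandran_maps(164, 2))
# Two residues taken together give at most 9 x 9 = 81 minima.
print(max_backbone_conformers(2))
```

Python integers have arbitrary precision, so the bound can be evaluated exactly even for long chains, although in practice only its order of magnitude is of interest.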
4. Computational implementation

Owing to the linearized notation of the peptide 3D structures, it became possible to create an automatic way to describe, compare and examine protein 3D structures. The basis of our notation is the classification of [φ, ψ] torsional angle pairs into major conformational clusters. The program called ACAP (amino acid conformational analyses of proteins) makes a fully automatic assignment of the overall protein backbone structure. It has additional features that make protein 3D structure examinations simpler and more effective. The input of the program may be either of two common data formats: Cartesian or internal coordinates. The primary result of a protein structure determination (by X-ray or by NMR) is usually a set of [x, y, z] coordinates. (This type of coordinates is to be found most easily in the Brookhaven Protein Data Bank (PDB).) Therefore, the PDB database format is the default input type of the ACAP program, although any other coordinate types can serve as input. When assigning, ACAP does not require the Cartesian coordinates themselves, but uses only the [φ, ψ] torsional angles. The appropriate internal coordinates can be calculated easily from the Cartesian coordinates. The subroutine that calculates the [φ, ψ] torsional angle values in ACAP is the same as that provided by the PDB routine (dihedral.exe). (Note, however, that a small modification was introduced in the presentation of the results: the PDB coordinate file may contain some duplicates originating from uncertainties of the structure refinement. In such cases only one of the alternative sub-structures should remain in the transformed coordinate file.) If side-chain coordinates are available, then χ1, χ2, χ3, χ4 are also calculated. At the current stage of our work, the side-chains are usually not taken into account during conformational analyses. Initial efforts have already been made to differentiate the main-chain assignment depending on the side-chain conformation.

5. Applications in ACAP
Using our method, the assignment and "labelling" of the consecutive protein backbone sub-structures is possible. This is based on the step-by-step comparison of the "measured" sub-structures with the computed ab initio minima collected in "templates". This "linearized" or compact notation of protein backbone conformations has been developed to perform conformational pattern recognition in protein families.

5.1. Description of protein 3D structure: the assignment

Every amino acid residue is assigned, except the first and the last residue of a sequence. In each step, the [φ, ψ] torsional angle pair of the current residue is compared with all of the value pairs incorporated in the selected template. The deviations (dev) are calculated for all template elements ([φ, ψ] torsional angle pairs of the conformational centres) with respect to the torsional angles of the current amino acid residue. The conformational assignment of the residue is based on the lowest deviation. The deviation, which is the Pythagorean distance, is calculated in the following way:

$$\mathrm{dev} = \sqrt{(\phi[\mathrm{residue}] - \phi[\mathrm{templ.}])^2 + (\psi[\mathrm{residue}] - \psi[\mathrm{templ.}])^2}$$

Note that φ[residue] and ψ[residue] are the torsional angles of the examined residue and φ[templ.] and ψ[templ.] are those of the template. At the current stage of the work, the same template is used for all residue types in a protein. This simplification is based on our previous observation that only small differences were calculated between the templates determined for the different types of amino acid diamides [10-14]. Initial efforts have been made to differentiate between the various amino acids during assignments, but no significant improvement in the accuracy has been achieved. The template used in this study is shown in Table 1 [16].

Table 1
Template torsional angle values (a) for the nine conformational centres of amino acid residues in proteins

        αL       αD       βL       γL       γD       δL       δD       εL       εD
φ     -68.6     61.8   -161.6    -84.5     74.3   -126.2   -179.6    -74.7     64.1
ψ     -17.5     31.9    169.9     68.1    -59.5     26.5    -43.7    167.8   -178.6

(a) Data are calculated from the ab initio conformers of For-L-Ala-L-Ala-NH2 and reported according to IUPAC-IUB convention (−180° < φ < +180°, −180° < ψ < +180°).

Fig. 4. A fragment (38-63) of a serine protease (1ST3) [6] and its structure using the linearized description technique. Note that all amino acids have been assigned and the magnitude of deviation with respect to the template is specified. (Ass. = assignment according to ACAP using the template reported in Table 1; Dev. = deviation as shown in Scheme 1.)

Table 2
The complete conformational assignment (a) of a lysozyme and five of its point mutants. All structures have a resolution of 1.7 Å and were taken from the PDB [7]
[Table body: residues 1-164 of PDB1LZM, PDB1L01, PDB1L02, PDB1L03, PDB1L04 and PDB1L05, each listed with its conformational code and deviation asterisks.]
(a) Changes in primary structure, conformational assignment and calculated deviation (* or -) are shown in bold.

This new method differs from all previous approaches as it is based on the topological behaviour of peptides and protein fragments. The current method is a priori rather than a posteriori and therefore meets all the requirements of an objective method. This conformational analysis is an easy procedure, which introduces a complete set of conformational elements (also called conformational codes) describing the folding of the main chain. The assignment results in a 3D → 1D transformation that preserves the geometrical properties of the folded protein in terms of linearized codes (αL, αD, βL, γL, γD, δL, δD, εL, εD). It is hoped that such a "linearized description" of the 3D geometry might lead to useful qualitative comparisons based on quantitative structural data. Fig. 4 shows the assignment of the same serine protease (1ST3) fragment as reported in Fig. 1. Besides the primary sequence (His, Pro, Asp, etc.) of the protein fragment (38-63), the linearized description (γL, εL, αL, βL, etc.) of the 3D structure is also reported. The numbers associated with the deviation express the reliability of the conformational centre assignment, using the code in Scheme 1.

Scheme 1
0: 0° ≤ dev < 10°
1: 10° ≤ dev < 20°
2: 20° ≤ dev < 30°
3: 30° ≤ dev < 40° (*)
4: 40° ≤ dev < 50° (**)
5: 50° ≤ dev < 60° (***)
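The assignment step and the coding of Scheme 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ACAP source: all function names are hypothetical; the template values are transcribed from Table 1, with the γD φ value read as 74.3 (the extracted table shows 14.3); and angular differences are wrapped into [−180°, 180°) before the Pythagorean distance is taken, a detail the text does not state explicitly. The `dihedral` helper shows how a torsion angle may be obtained from four Cartesian points (for φ: C(i−1), N, Cα, C; for ψ: N, Cα, C, N(i+1)).

```python
import math

# (phi, psi) centres in degrees, transcribed from Table 1.
# Assumption: gamma_D phi is taken as 74.3 (printed as 14.3 in the extracted table).
TEMPLATE = {
    "alpha_L": (-68.6, -17.5), "alpha_D": (61.8, 31.9),
    "beta_L": (-161.6, 169.9),
    "gamma_L": (-84.5, 68.1), "gamma_D": (74.3, -59.5),
    "delta_L": (-126.2, 26.5), "delta_D": (-179.6, -43.7),
    "epsilon_L": (-74.7, 167.8), "epsilon_D": (64.1, -178.6),
}


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four Cartesian points."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def wrap(delta):
    """Map an angular difference into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def deviation(phi, psi, centre):
    """Pythagorean distance between a residue and a template centre."""
    return math.hypot(wrap(phi - centre[0]), wrap(psi - centre[1]))


def assign(phi, psi, template=TEMPLATE):
    """Return (code, dev) for the template centre with the lowest deviation."""
    code = min(template, key=lambda name: deviation(phi, psi, template[name]))
    return code, deviation(phi, psi, template[code])


def scheme1_code(dev):
    """Deviation class: one unit per 10 degrees (Scheme 1 lists 0 to 5)."""
    return int(dev // 10)


def scheme1_stars(dev):
    """Asterisk notation; only classes 3, 4 and 5 carry asterisks in Scheme 1."""
    return {3: "*", 4: "**", 5: "***"}.get(scheme1_code(dev), "")


code, dev = assign(-60.0, -45.0)
print(code, round(dev, 1), scheme1_code(dev), scheme1_stars(dev))
```

Wrapping matters only near the ±180° boundary, for example for residues close to the δD or εD centres; without it a residue at φ = 179° would appear almost 360° away from a centre at φ = −179.6°.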
Alternatively, the deviations may also be denoted by a varying number of asterisks. This latter notation is used in the next example. Table 2 shows the complete assignment of lysozyme and five of its point mutants.

5.2. Pattern search

The folded structure of proteins may be decomposed into complex building units such as motifs (e.g. the helix-turn-helix motif), or into smaller elements such as secondary structure types (α-helix, β-sheet, etc.). Both the more complex and the simpler units are built up from individual conformations of the appropriate amino acid residues. The easy examination and characterization of any of the large 3D building elements is now possible by the identification of a sequence of conformational codes. A given sequence of amino acid conformation codes defines a complex 3D unit (e.g. the βL-αL-δL-βL-γL-εL-αL-βL-βL-βL-… sequence stands for the 3D structure shown in Fig. 1). It is convenient to assign and characterize any folding unit by its linear description. ACAP may perform different searches for a sequence determined by its primary or 3D structure.
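A "code-sequence" search of this kind reduces to finding a sub-list in a list of conformational codes. The following Python sketch illustrates the idea; the function name and the short code list are hypothetical and serve only as an example, not as a description of how ACAP implements its search.

```python
def find_pattern(codes, pattern):
    """Return the 0-based start positions at which `pattern` occurs in `codes`."""
    m = len(pattern)
    return [i for i in range(len(codes) - m + 1) if codes[i:i + m] == pattern]


# Hypothetical linearized description of a short backbone fragment.
codes = ["bL", "aL", "aL", "gL", "aD", "bL", "bL", "aL", "gL", "aD", "eL"]

# Search for every occurrence of the three-residue pattern aL-gL-aD.
print(find_pattern(codes, ["aL", "gL", "aD"]))
```

The same function works on residue names instead of conformational codes, which corresponds to a search determined by the primary structure.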
This makes possible the quick identification of complex patterns even in a larger protein (or proteins), based on a simple "code-sequence" search. The method may also be used to determine totally new conformational patterns in protein 3D structures.

5.3. Statistical analysis

The linear description of the secondary structure makes it possible to analyse quantitatively the folding of globular proteins. The linear description created by ACAP can be used to examine the frequency of the different conformational building units of the main chain of proteins. In the present approach, the well known periodic secondary structural elements have their conformational codes (e.g. α-helix → αL, β-sheet → βL, polyproline II → εL), so they can easily be recognized. In addition, the less "well known" secondary structures (e.g. rare β-turn types or loops) may also be analysed in similar studies. This makes it possible to collect homogeneous statistical information from the entire structure of a protein. Different amino acids may prefer different conformational centres. The method can handle the amino acids of a protein separately, so that reports can be made about the conformational preferences of the different residues. The results of the analysis are obtained both in absolute values and in percentages.

5.4. Different output formats

The linear description of the 3D structure of proteins has the advantage that it is easy to handle. The whole "structure" may be printed beside the amino acid sequence. ACAP produces three types of output for the same protein structure. The first format (Format 1) is a vertical one, in which each amino acid residue and its conformational code are printed one after the other. This file format is easy to handle with any text editor or other data management program. The second format (Format 2) is a horizontal one, in which 18 amino acid residues and the corresponding conformational codes are printed in two separate lines. This output format is shorter and more legible, so it can easily be examined visually.

Format 1. [Sample of the vertical output: one residue per line (ALA, ALA, LYS, SER, GLU, LEU, ASP, LYS, ALA, ILE, GLY, ARG, ASN, CYS, ASN, GLY, VAL, ILE, THR, LYS, ASP), each followed by its conformational code (al, ad, gl, el) and deviation asterisks.]

Scheme 2. [Single-letter conformational codes; legible entries: αL → A, βL → Q, γL → G, γD → R, δD → S, εL → E; a further entry maps to D.]
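The horizontal layout described for Format 2 can be sketched in Python as follows. The function name, the column width and the exact spacing are assumptions made for illustration; only the rule of 18 residues per pair of lines is taken from the text.

```python
def format_horizontal(names, codes, per_line=18, width=4):
    """Return text lines with residue names and codes in alternating rows."""
    if len(names) != len(codes):
        raise ValueError("names and codes must have the same length")
    lines = []
    for start in range(0, len(names), per_line):
        name_row = "".join(n.ljust(width) for n in names[start:start + per_line])
        code_row = "".join(c.ljust(width) for c in codes[start:start + per_line])
        lines.extend([name_row.rstrip(), code_row.rstrip()])
    return lines


# Three residues with two-letter lowercase codes, as used in the sample outputs.
for line in format_horizontal(["LYS", "ASP", "THR"], ["el", "el", "al"]):
    print(line)
```

Fixed-width columns keep each code directly under its residue, which is what makes this layout convenient for visual inspection.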
The third format is more specialized. It is a fully compressed format created for alignment and character-oriented programs. In this format, the amino acid residues are denoted by the traditional one-letter codes. The conformational codes are also represented by a single letter, as shown in Scheme 2. Thus it is possible to create output files containing only the amino acid residues, only the conformational codes, or both. ACAP is also able to create these output formats for the results of additional code types. The example below shows a segment of an output based on the Kabsch-Sander secondary structure codes [18], determined by hydrogen bond pattern analysis. (This output works in ACAP only for files that already contain the Kabsch-Sander assignment.)

The names of the amino acid residues in a protein:

AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQD
SGFIEEDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVK

The conformational assignments:

IAGSDQEAAAAAAAAAADAEEPDEQAAAAAAAAPAADEEAAAAAAAAAAAGAA
APQGEAAAAADAAAAAQADEEEEEAAAAAAAAAAAGAAPAPQEEAAAAAAAAA

The Kabsch-Sander notation of the secondary structure:

RRTTSSRHHHHHHHHHHTRSTTRRRRHHHHHHHHTGGGSRHHHHHHHHHHHSTT
SSEERHHHHHTGGGGTSTTRRRRRHHHHHHHHHHHRTTRSSSEEHHHHHHHHH

No.   19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
Name  LYS ASP THR GLU GLY TYR TYR THR ILE GLY ILE GLY HIS LEU LEU THR LYS SER
Code  el  el  al  al  ad  el  el  bl  bl  ed  gl  ad  al  el  gl  el  al  hl
dev           **  *   *       **  *   *   **          *           *   *   **

Format 2.
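The statistical analysis of Section 5.3 can be illustrated on a compressed code string of the kind shown above. The sketch below is an assumption about how such counts might be produced, with hypothetical function names; it reports absolute values and percentages per code, and a per-residue breakdown of conformational preferences.

```python
from collections import Counter


def code_frequencies(codes):
    """Map each code to (absolute count, percentage of all codes)."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {c: (n, 100.0 * n / total) for c, n in counts.most_common()}


def residue_preferences(residues, codes):
    """Count, for each residue type, how often each code is assigned to it."""
    table = {}
    for res, code in zip(residues, codes):
        table.setdefault(res, Counter())[code] += 1
    return table


# First line of the conformational assignment string of the example output.
assignment = "IAGSDQEAAAAAAAAAADAEEPDEQAAAAAAAAPAADEEAAAAAAAAAAAGAA"
for code, (count, percent) in code_frequencies(assignment).items():
    print(f"{code}: {count} ({percent:.1f}%)")
```

Because the compressed format uses one character per residue, plain string and counter operations are sufficient, and the residue and code strings can be paired position by position with `zip`.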
Acknowledgements

This research was supported by a grant from the Hungarian Scientific Research Foundation (OTKA Nos. T 017192 and F 013799). The continued financial support of the NSERC of Canada is gratefully acknowledged. The ACAP program is available free of charge upon request. A PC version of the ACAP program is available from Dr. A. Perczel at the Department of Organic Chemistry, Eötvös University, Budapest, 112 P.O.B. 32, H-1518 Hungary.
References

[1] (a) L. Pauling and R. Corey, Proc. Natl. Acad. Sci. USA, 37 (1951) 729; (b) L. Pauling, R. Corey and H. Branson, Proc. Natl. Acad. Sci. USA, 37 (1951) 205.
[2] C. Venkatachalam, Biopolymers, 6 (1968) 1425.
[3] P. Lewis, F. Momany and H. Scheraga, Biochim. Biophys. Acta, 303 (1973) 211.
[4] P.Y. Chou and G.D. Fasman, J. Mol. Biol., 115 (1977) 135.
[5] G. Rose, L. Gierasch and J. Smith, Adv. Protein Chem., 37 (1985) 1.
[6] D.W. Goddette, C. Paech, S.S. Yang, J.R. Milenez, C. Bystroff, M. Wilke and R.J. Fletterick, J. Mol. Biol., 222 (1992) 580.
[7] Protein Data Bank, Chemistry Department, Building 555, Brookhaven National Laboratory, P.O. Box 5000, January 1994, Release #67.
[8] I.G. Csizmadia, Multidimensional stereochemistry and conformational potential energy surface topology, in J. Bertran (Ed.), New Theoretical Concepts for Understanding Organic Reactions, Reidel, Dordrecht, 1989, pp. 1-31.
[9] R. Chandrasekaran, L. Lakshiminarayanan, U. Pandya and G. Ramachandran, Biochim. Biophys. Acta, 303 (1973) 14-27.
[10] (a) T. Head-Gordon, M. Head-Gordon, M.J. Frisch, C. Brooks, III, and J.A. Pople, Int. J. Quantum Chem., Quantum Biol. Symp., 16 (1989) 311; (b) T. Head-Gordon, M. Head-Gordon, M.J. Frisch, C. Brooks, III, and J.A. Pople, J. Am. Chem. Soc., 113 (1991) 5989; (c) A. Perczel, J.G. Ángyán, M. Kajtar, W. Viviani, J.L. Rivail, J.F. Marcoccia and I.G. Csizmadia, J. Am. Chem. Soc., 113 (1991) 6256; (d) M.A. McAllister, A. Perczel, P. Csaszar, W. Viviani, J.-L. Rivail and I.G. Csizmadia, J. Mol. Struct. (Theochem), 288 (1993) 161; (e) W. Viviani, J.L. Rivail, A. Perczel and I.G. Csizmadia, J. Am. Chem. Soc., 115 (1993) 8321.
[11] A. Perczel, Ö. Farkas and I.G. Csizmadia, J. Comput. Chem., submitted for publication.
[12] A. Perczel, Ö. Farkas and I.G. Csizmadia, J. Mol. Struct. (Theochem), in preparation.
[13] A. Perczel, Ö. Farkas and I.G. Csizmadia, Can. J. Chem., submitted for publication.
[14] L. Schafer, V.J. Klinkowski, F.A. Momany, H. Chuman and C. Van Alsenoy, Biopolymers, 23 (1984) 2335.
[15] M. Vasquez, G. Nemethy and H.A. Scheraga, Macromolecules, 16 (1983) 1043.
[16] (a) A. Perczel, M.A. McAllister, P. Csaszar and I.G. Csizmadia, J. Am. Chem. Soc., 115 (1993) 4849; (b) A. Perczel, M.A. McAllister, P. Csaszar and I.G. Csizmadia, Can. J. Chem., 72 (1994) 2050.
[17] A. Perczel, M. Kajtar, J.F. Marcoccia and I.G. Csizmadia, J. Mol. Struct. (Theochem), 232 (1991) 291.
[18] W. Kabsch and C. Sander, Biopolymers, 22 (1983) 2577.