Amino acid conformational analyses of proteins (ACAP program)


THEO CHEM Journal of Molecular Structure (Theochem) 362 (1996) 263-273

Péter Hudáky (a), András Perczel (a,b,*,1), Imre G. Csizmadia (a,b)

(a) Department of Organic Chemistry, Eötvös University, H-1117 Budapest, 112 P.O.B. 32, Hungary
(b) Department of Chemistry, University of Toronto, Toronto, Ont. M5S 1A1, Canada

Received 1 August 1995; accepted 15 September 1995

Abstract

The secondary structure analysis of proteins is a powerful tool, efficiently supporting spectroscopic (CD, IR, NMR) and biochemical research projects. However, the precise location of the different secondary structural elements in a sequence incorporates some subjectivity. A linearized notation for a uniform and objective description of protein backbone structures was developed. This involves a three-dimensional to one-dimensional transformation (3D → 1D). The classification is based on a step-by-step comparison between reference conformers (also called template values) and backbone sub-conformations of the protein. The structure templates provided may be modified, but the procedure still remains objective and uniform. This linearized notation of protein structures describes the three-dimensional backbone conformation without relying on the traditional concept of secondary structure. Previously, the sequence analysis (primary structure) of proteins resulted in the identification of the 20 natural amino acid residues. Similarly, the backbone conformation analysis of the same proteins confirmed the presence of nine basically different subconformers. This recognition, together with the linearized notation of the 3D structure, makes the comparison of proteins at the levels of primary to tertiary structure easier.

Keywords: Ab initio calculation; ACAP program; Amino acid; Conformational analysis; Conformational cluster; Protein

1. Introduction

Until now, no explicit and reliable relationship has been found between the amino acid sequence of a protein and its conformer(s). Different approaches have produced a variety of prediction algorithms, but no "fine-tuned" a priori technique is in sight. The objective and uniform classification of the already determined backbone conformation types should be the initial step. The identification of the now familiar conformers started in the 1950s with the pioneering work of Pauling and co-workers [1], continued by Venkatachalam [2], Scheraga and co-workers [3], Chou and Fasman [4], Rose et al. [5] and others. A common characteristic of these approaches was the comprehensive search for "highly similar folds" in proteins using the available X-ray structure databases. Initially, only some dozens of proteins were analysed, and therefore, besides the regular subunits, some irregular (or atypical) ones had to be incorporated among the secondary structural elements. These irregular backbone conformers were collectively referred to as random or unordered structures. Subsequently, with the enlargement of the investigated X-ray databases, more and more aperiodic but regular substructures (e.g. β-turn types) were recognized (Fig. 1).

* Corresponding author.
1 E-mail address: [email protected]. Present address: Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK.

0166-1280/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
SSDI 0166-1280(95)04416-7

[Fig. 1: (a) ribbon diagram of the fragment; (b) residue numbers and names with the assigned secondary structure symbols]

No.   38  39  40  41  42  43  44  45  46  47  48  49  50
Name  His Pro Asp Leu Asn Ile Arg Gly Gly Ala Ser Phe Val

No.   51  52  53  54  55  56  57  58  59  60  61  62  63
Name  Pro Gly Glu Pro Ser Thr Gln Asp Gly Asn Gly His Gly

Fig. 1. (A) The "ribbon" of a serine protease [6] segment (residues 38-63) (1ST3) from X-ray data [7]. (B) The amino acid sequence of the serine protease (1ST3) fragment as shown, with assigned secondary structures (α-helices, β-sheets, β-turns and unordered segments, each marked by its own symbol). The secondary structure assignment is identical with that reported in the PDB file.

However, the category of irregular secondary structural elements had to be preserved. The last four decades have brought only partial success in the comprehensive analysis of the secondary structure of proteins. Even the classification criteria of the periodic regular backbone units (α-helix, β-pleated sheet, polyproline II, etc.) have problematic aspects. Not only is the determination of the minimal chain lengths ambiguous, but no consensus on the allowed limit for torsion angle deviations has been established either. Moreover, various aperiodic but regular sub-structures (rare β-turn types and/or loops) are often ignored owing to their sporadic occurrence. The division of the backbone conformers was typically achieved by the fragmentation of the X-ray-determined protein backbone structures.

The main obstacle of such a method is that the relative frequencies of the above subconformations in the investigated protein database influence the analysis; less frequently observed aperiodic secondary structure types (e.g. type I', II', III', VIa, VIb, VIII β-turns) are often ignored.

2. An alternative approach

The conceptual difference between our method and the previous approaches is that the possible backbone conformers are determined by ab initio computations performed on peptide models. This method is therefore independent of the common weaknesses of the pattern recognition techniques applied to X-ray databases. The multidimensional conformational analysis [8] of potential-energy hypersurfaces (PEHS) obtained for model peptides, where energy is a function of the torsion angles, may result in the recognition of all possible conformers. Even if a backbone conformer has high energy relative to the global minimum (and therefore a low probability would be expected), it is considered important to acquire complete conformational template sets. The determination of a suitable number of "conformation centres" on a 2n-dimensional Ramachandran map (e.g. up to nine "conformation centres" on a 2D PES, $E = E(\phi, \psi)$) provides the basis for the recognition of discrete backbone subconformations. Our observation that only a limited number of backbone conformations exist is equivalent to the recognition that a finite number of conformations play a role during protein folding. By comparing, from residue to residue, the backbone sub-conformations of a protein with the reference conformers of a template, an objective approach was developed to describe the whole 3D structure of a protein (Fig. 2).

Fig. 2. Block diagram of the ACAP (Amino acid Conformation Analyses of Proteins) program.

Fig. 3. The ideal locations of the nine minima of a backbone conformational unit, as predicted by multidimensional conformational analysis, in a "topology orientated" representation ($0° < \phi < 360°$, $0° < \psi < 360°$).

3. The determination of the template conformers

The conformational analysis requires two types of input data: the geometrical parameters of the

investigated protein(s) and the reference backbone conformations accumulated in a template. We recommend the use of templates determined by ab initio computations, since these results are almost "free" from external parameters.

The backbone conformational potential-energy hypersurface, $E = E(\mathbf{x})$, of a protein containing $n$ amino acids, where $\mathbf{x} = (\psi_1, \phi_2, \psi_2, \ldots, \phi_{n-1}, \psi_{n-1}, \phi_n)$, may be subdivided into $(n - 1 - k)$ Ramachandran-type potential-energy hypersurfaces (PEHS) of dimension $2k$ ($k = 1, 2, \ldots, n - 2$). The first and the last amino acid residues cannot be assigned, since $\phi_1$ and $\psi_n$ are undefined.

In a "diamide concept", when $k = 1$, a total of $(n - 2)$ two-dimensional (2D) Ramachandran maps [9], $E = E(\phi_i, \psi_i)$, are obtained. In this way, the conformational properties of the single amino acid residues composing the primary sequence of the protein are analysed. It has been demonstrated [10] that such an $E = E(\phi_i, \psi_i)$ Ramachandran-type surface has nine or fewer minima, depending on the type of amino acid residue (Fig. 3). These nine minima have been predicted by multidimensional conformational analysis. The initial calculations were carried out by the molecular mechanics (MM) technique on a number of amino acid derivatives of the types Ac-X-NHMe and For-X-NH2 (Ac = CH3CO-; For = HCO-; Me = -CH3). The minima were subsequently subjected to ab initio computations carried out on For-X-NH2, where X = glycine [10c,d], L-alanine [10c,d], β-alanine [10c], L-valine [10d,e], L-serine [11], L-threonine [12], L-cysteine [12] and L-phenylalanine [13] residues. Although ab initio studies revealed the annihilation of certain minima in various cases [10-14], which has not been observed when molecular mechanics (ECEPP/2) [15] was applied [8], it should nevertheless be emphasized that an increase in the number of minima beyond nine was never observed.

In a "triamide concept", when $k = 2$, $(n - 3)$ different four-dimensional (4D) Ramachandran maps, all with $E = E(\phi_i, \psi_i, \phi_{i+1}, \psi_{i+1})$, can be obtained. In this way, the conformational properties of the triamides of two adjacent amino acid residues are analysed. It has been demonstrated that such an $E = E(\phi_i, \psi_i, \phi_{i+1}, \psi_{i+1})$-type surface may have $9 \times 9 = 81$ or fewer minima [16,17]. Consequently, $(3^2)^n$, as an upper bound for the number of minima (if $n$ is the number of amino acids in the protein), is a good approximation for the maximum number of main-chain (or backbone) conformers in a protein.

The influences of the different side-chains are important [10e,11-13] in terms of modifying the position of a given minimum on the backbone PEHS and also in terms of determining the relative energetic spectrum of the minima. Furthermore, special side-chains may also modify the topology of the peptide conformational PEHS [10-13].
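The counting argument above can be checked with a short sketch. The function names are hypothetical and are not part of ACAP; the code only restates the two formulas of this section, the number of 2k-dimensional maps, $n - 1 - k$, and the upper bound $(3^2)^n$.

```python
def ramachandran_map_count(n: int, k: int) -> int:
    """Number of 2k-dimensional Ramachandran-type maps for a chain of
    n residues, following the (n - 1 - k) rule quoted in the text."""
    if n < 3:
        raise ValueError("at least three residues are needed")
    if not 1 <= k <= n - 2:
        raise ValueError("k must lie in the range 1 .. n - 2")
    return n - 1 - k


def backbone_conformer_upper_bound(n: int, minima_per_residue: int = 9) -> int:
    """Upper bound (3**2)**n on the number of backbone conformers,
    assuming at most nine minima per residue as stated in the text."""
    if n < 1:
        raise ValueError("n must be positive")
    return minima_per_residue ** n


if __name__ == "__main__":
    # Diamide (k = 1) and triamide (k = 2) concepts for a 164-residue chain.
    print(ramachandran_map_count(164, 1))
    print(ramachandran_map_count(164, 2))
    # Two adjacent residues: 9 x 9 minima at most.
    print(backbone_conformer_upper_bound(2))
```

For a chain of 164 residues this gives 162 two-dimensional and 161 four-dimensional maps, in line with the (n - 2) and (n - 3) counts given for the diamide and triamide concepts.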


4. Computational implementation

Owing to the linearized notation of the peptide 3D structures, it became possible to create an automatic way to describe, compare and examine protein 3D structures. The basis of our notation is the classification of [φ, ψ] torsional angle pairs into major conformational clusters. The program called ACAP (amino acid conformational analyses of proteins) makes a fully automatic assignment of the overall protein backbone structure. It has additional features that make the examination of protein 3D structures simpler and more effective.

The input of the program may be either of two common data formats: Cartesian or internal coordinates. The primary result of a protein structure determination (by X-ray or by NMR) is usually a set of [x, y, z] coordinates. (This type of coordinates is found most easily in the Brookhaven Protein Data Bank (PDB).) Therefore, the PDB database format is the default input type of the ACAP program, although any other coordinate type can serve as input. When assigning, ACAP does not require the Cartesian coordinates themselves, but uses only the [φ, ψ] torsional angles. The appropriate internal coordinates can be calculated easily from the Cartesian coordinates. The subroutine performing this calculation of the [φ, ψ] torsional angle values in ACAP is the same as that provided by the PDB routine (dihedral.exe). (Note that a small modification was introduced, however, in the presentation of the results: the PDB coordinate file may contain some duplicates originating from uncertainties of the structure refinement. In such cases only one of the alternative sub-structures should remain in the transformed coordinate file.) If side-chain coordinates are available, then χ1, χ2, χ3, χ4 are also calculated. At the current stage of our work, the side-chains are usually not taken into account during conformational analyses. Initial efforts have already been made to differentiate in the main-chain assignment depending on the side-chain conformation.
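The conversion from Cartesian coordinates to [φ, ψ] torsional angles mentioned above can be sketched as follows. This is not the dihedral.exe routine, whose source is not given here; it is a minimal standard-library illustration of the usual four-point torsion formula, with φ of residue i taken over C(i-1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1). The function names and the dictionary layout of the backbone are assumptions made for the example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """Return (phi, psi) for every interior residue.

    `backbone` is a list of dicts with keys "N", "CA" and "C", each an
    (x, y, z) tuple (hypothetical layout). The first and the last
    residue are skipped, since phi_1 and psi_n are undefined.
    """
    angles = []
    for i in range(1, len(backbone) - 1):
        prev, cur, nxt = backbone[i - 1], backbone[i], backbone[i + 1]
        phi = dihedral(prev["C"], cur["N"], cur["CA"], cur["C"])
        psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
        angles.append((phi, psi))
    return angles
```

Reading the atom coordinates out of a PDB file, and choosing one of several alternative locations for an atom, is left out of the sketch.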

5. Applications in ACAP

Using our method, the assignment and "labelling" of the consecutive protein backbone sub-structures is possible. This is based on the step-by-step comparison of the "measured" substructures with the computed ab initio minima collected in "templates". This "linearized" or compact notation of protein backbone conformations has been developed to perform conformational pattern recognition in protein families.

5.1. Description of protein 3D structure: the assignment

The assignment is performed for each and every amino acid, not counting the first and the last residue of a sequence. During such a step the [φ, ψ] torsional angle pair of the current residue is compared with all of the value pairs incorporated in the selected template. The deviations (dev) are calculated for all template elements ([φ, ψ] torsional angle pairs of the conformational centres) with respect to the torsional angles of the current amino acid residue. The conformational assignment of the residue is based on the lowest deviation. The deviation, which is the Pythagorean distance, is calculated in the following way:

$$\mathrm{dev} = \sqrt{(\phi[\mathrm{residue}] - \phi[\mathrm{templ.}])^2 + (\psi[\mathrm{residue}] - \psi[\mathrm{templ.}])^2}$$

Note that φ[residue] and ψ[residue] are the torsional angles of the examined residue and φ[templ.] and ψ[templ.] are those of the template. At the current stage of the work, the same template is used for all residue types in a protein. This simplification is based on our previous observation that only small differences were calculated between the templates determined for the different types of amino acid diamides [10-14]. Initial efforts have been made to differentiate between the various amino acids during assignments, but no significant improvement in the accuracy has been achieved. The template used in this study is shown in Table 1 [16].

Table 1
Template torsional angle values (a) for the nine conformational centres of amino acid residues in proteins

        αL      αD      βL      γL      γD      δL      δD      εL      εD
φ    -68.6    61.8  -161.6   -84.5    14.3  -126.2  -179.6   -74.7    64.1
ψ    -17.5    31.9   169.9    68.1   -59.5    26.5   -43.7   167.8  -178.6

(a) Data are calculated from the ab initio conformers of For-L-Ala-L-Ala-NH2 and reported according to the IUPAC-IUB convention (-180° < φ < +180°, -180° < ψ < +180°).

This new method differs from all previous approaches as it is based on the topological behaviour of peptides and protein fragments. The current method is a priori rather than a posteriori and therefore meets the requirements of an objective method. This conformational analysis is an easy procedure, which introduces a complete set of conformational elements (also called conformational codes) describing the folding of the main chain. The assignment results in a 3D → 1D transformation that preserves the geometrical properties of the folded protein in terms of linearized codes (αL, αD, βL, γL, γD, δL, δD, εL, εD). It is hoped that such a "linearized description" of the 3D geometry might lead to useful qualitative comparisons based on quantitative structural data.

Fig. 4 shows the assignment of the same serine protease (1ST3) fragment as reported in Fig. 1. Besides the primary sequence (His, Pro, Asp, etc.) of the protein fragment (38-63), the linearized description (γL, εL, αL, βL, etc.) of the 3D structure is also reported. The numbers associated with the deviation express the reliability of the conformational centre assignment, using the code in Scheme 1.

[Fig. 4: for residues 38-63, rows giving the residue number (No.), residue name (Name), conformational assignment (Ass.) and deviation code (Dev.)]

Fig. 4. A fragment (38-63) of a serine protease (1ST3) [6] and its structure using the linearized description technique. Note that all amino acids have been assigned and the magnitude of deviation with respect to the template is specified. (Ass. = assignment according to ACAP using the template reported in Table 1; Dev. = deviation as shown in Scheme 1.)
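The assignment step of Section 5.1 can be sketched in a few lines. This is an illustration under stated assumptions, not the ACAP source: the ASCII labels (aL, aD, bL, ...) and the function names are hypothetical, the template values are copied from Table 1 as printed, and the distance is the plain Pythagorean one of the formula above, without any treatment of the ±180° periodicity. The reliability digit follows the 10° bins of Scheme 1; for deviations of 60° or more, which Scheme 1 does not cover, the sketch simply returns the bin number without asterisks.

```python
import math

# Conformational centres (phi, psi) in degrees, copied from Table 1.
# The ASCII labels are an assumption made for this sketch.
TEMPLATE = {
    "aL": (-68.6, -17.5),
    "aD": (61.8, 31.9),
    "bL": (-161.6, 169.9),
    "gL": (-84.5, 68.1),
    "gD": (14.3, -59.5),
    "dL": (-126.2, 26.5),
    "dD": (-179.6, -43.7),
    "eL": (-74.7, 167.8),
    "eD": (64.1, -178.6),
}


def deviation(phi, psi, centre):
    """Pythagorean distance between (phi, psi) and a template centre."""
    return math.hypot(phi - centre[0], psi - centre[1])


def assign(phi, psi, template=TEMPLATE):
    """Return (code, dev) for the template centre with the lowest deviation."""
    code = min(template, key=lambda name: deviation(phi, psi, template[name]))
    return code, deviation(phi, psi, template[code])


def reliability(dev):
    """Return (digit, asterisks) following the 10-degree bins of Scheme 1.

    Digits 3, 4 and 5 carry one, two and three asterisks. Bins above 5
    are outside Scheme 1 and are returned without asterisks (assumption).
    """
    digit = int(dev // 10)
    stars = "*" * (digit - 2) if 3 <= digit <= 5 else ""
    return digit, stars


if __name__ == "__main__":
    code, dev = assign(-60.0, -45.0)
    print(code, round(dev, 1), reliability(dev))
```

Because every residue is compared with all nine centres, each interior residue always receives a code; the reliability digit is what tells a close match from a distant one.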

0 : 0° ≤ dev < 10°
1 : 10° ≤ dev < 20°
2 : 20° ≤ dev < 30°
3 : 30° ≤ dev < 40° (*)
4 : 40° ≤ dev < 50° (**)
5 : 50° ≤ dev < 60° (***)

Scheme 1

Alternatively, the deviations may also be denoted by a varying number of asterisks. This latter notation is used in the next example. Table 2 shows the complete assignment of lysozyme and five of its point mutants.

Table 2
The complete conformational assignment (a) of a lysozyme and five of its point mutants. All structures have a resolution of 1.7 Å and were taken from the PDB [7]

[Table 2: residues 1-164 of PDB 1LZM, 1L01, 1L02, 1L03, 1L04 and 1L05; for each structure the residue name, its conformational code and the deviation, marked by asterisks, are listed]

(a) Changes in primary structure, conformational assignment and calculated deviation (* or -) are shown in bold.

5.2. Pattern search

The folded structure of proteins may be decomposed into complex building units such as motifs (e.g. the helix-turn-helix motif), or into smaller elements such as secondary structure types (α-helix, β-sheet, etc.). Both the more complex and the simpler units are built up from individual conformations of the appropriate amino acid residues. The easy examination and characterization of any of the large 3D building elements is now possible by the identification of a sequence of conformational codes. A given sequence of amino acid conformation codes defines a complex 3D unit (e.g. the 26-code sequence assigned to residues 38-63 of 1ST3 in Fig. 4 stands for the 3D structure shown in Fig. 1). It is convenient to assign and characterize any folding unit by its linear description. ACAP may perform different searches for a sequence determined by its primary or 3D structure.
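A "code-sequence" search of the kind described in Section 5.2 reduces to finding a contiguous sub-list in the list of conformational codes. The sketch below is an assumption about how such a search could look, not the ACAP implementation; the function name, the ASCII code labels and the use of `None` as a wildcard position are all hypothetical.

```python
def find_pattern(codes, pattern):
    """Return the start indices at which `pattern` occurs in `codes`.

    `codes` is the linearized description of a protein as a list of
    conformational codes; `pattern` is the list searched for. A `None`
    entry in the pattern matches any code (hypothetical wildcard).
    """
    m = len(pattern)
    if m == 0:
        return []
    hits = []
    for start in range(len(codes) - m + 1):
        window = codes[start:start + m]
        if all(p is None or p == c for p, c in zip(pattern, window)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Illustrative code sequence, not taken from a real structure.
    codes = ["eL", "eL", "aL", "aL", "aD", "eL", "eL", "bL", "bL"]
    print(find_pattern(codes, ["eL", "eL"]))
    print(find_pattern(codes, ["aL", None, "eL"]))
```

The same function works on residue names instead of conformational codes, which covers a search by primary structure; adding the number of the first assigned residue to each index turns the result into residue numbers.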

This makes possible the quick identification of complex patterns, even in larger proteins, based on a simple "code-sequence" search. The method may also be used to determine totally new conformational patterns in protein 3D structures.

5.3. Statistical analysis

The linear description of the secondary structure makes it possible to analyse quantitatively the folding of globular proteins. The linear description created by ACAP can be used to examine the frequency of the different conformational building units of the main chain of proteins. In the present approach, the well known periodic secondary structural elements have their own conformational codes (e.g. α-helix → αL, β-sheet → βL, polyproline II → εL), so they can easily be recognized. In addition, the less "well known" secondary structures (e.g. rare β-turn types or loops) may also be analysed in similar studies. This makes it possible to collect homogeneous statistical information from the entire structure of a protein. Different amino acids may prefer different conformational centres. The method can handle the amino acids of a protein separately, so that reports can be made about the conformational preferences of the different residues. The results of the analysis are obtained both in absolute values and in percentages.

5.4. Different output formats

The linear description of the 3D structure of proteins has the advantage that it is easy to handle. The whole "structure" may be printed beside the amino acid sequence. ACAP produces three types of output for the same protein structure. The first format (Format 1) is a vertical one, where the amino acid residue and its conformational code are printed one after the other. This file format is easy to handle with any text editor or other data management program. The second format (Format 2) is a horizontal one, where 18 amino acid residues and the corresponding conformational codes are printed on two separate lines. This output format is shorter and legible, so it can easily be examined visually.

...
ILE  el  ***
THR  el
LYS  al
ASP  al

Format 1.

αL → A
βL → Q
γL → G
γD → R
δL → …
δD → S
εL → E
αD → D

Scheme 2

The third format is more specialized. It is a fully compressed format created for alignment and character-oriented programs. In this format, the amino acid residues are denoted by the traditionally accepted one-letter codes. The conformational codes are also denoted by a single character, as shown in Scheme 2. Thus it is possible to create output files containing only the amino acid residues, only the conformational codes, or both. ACAP is also able to create these output formats for the results of additional code types. The example below is a segment from an output based on the Kabsch-Sander secondary structure codes [18], determined by hydrogen bond pattern analysis. (This output works in ACAP only for files that already contain the Kabsch-Sander assignment.)

The names of the amino acid residues in a protein:

AFAGVLNDADIAAALEACKAADSFNHKAFFAKUGLTSKSADDVKKAFAIIDQD
SGFIEEDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVK

The conformational assignments:

IAGSDQEAAAAAAAAAADAEEPDEQAAAAAAAAPAADEEAAAAAAAAAAAGAA
APQGEAAAAADAAAAAQADEEEEEAAAAAAAAAAAGAAPAPQEEAAAAAAAAA

The Kabsch-Sander notation of the secondary structure:

RRTTSSRHHHHHHHHHHTRSTTRRRRHHHHHHHHTGGGSRHHHHHHHHHHHSTT
SSEERHHHHHTGGGGTSTTRRRRRHHHHHHHHHHHRTTRSSSEEHHHHHHHHH

No.   19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
Name  LYS ASP THR GLU GLY TYR TYR THR ILE GLY ILE GLY HIS LEU LEU THR LYS SER
Code  el  el  al  al  ad  el  el  bl  bl  ed  gl  ad  al  el  gl  el  al  hl

Format 2.
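The statistical analysis of Section 5.3 amounts to counting conformational codes, overall and per residue type, and reporting absolute values and percentages. The sketch below illustrates this with the standard library only; the function names are hypothetical, and the input lists are the 18 residues and codes read from the Format 2 example (residues 19-36), used purely as illustrative data.

```python
from collections import Counter, defaultdict


def code_frequencies(codes):
    """Map each conformational code to (absolute count, percentage)."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: (n, 100.0 * n / total) for code, n in counts.items()}


def preferences_by_residue(residues, codes):
    """Count, for every residue type, how often each code is assigned."""
    if len(residues) != len(codes):
        raise ValueError("residues and codes must have the same length")
    table = defaultdict(Counter)
    for residue, code in zip(residues, codes):
        table[residue][code] += 1
    return table


if __name__ == "__main__":
    residues = ("LYS ASP THR GLU GLY TYR TYR THR ILE GLY ILE GLY "
                "HIS LEU LEU THR LYS SER").split()
    codes = ("el el al al ad el el bl bl ed gl ad "
             "al el gl el al hl").split()
    for code, (n, pct) in sorted(code_frequencies(codes).items()):
        print(f"{code}  {n:3d}  {pct:5.1f}%")
    print(dict(preferences_by_residue(residues, codes)["GLY"]))
```

Pooling the per-residue counters over many proteins would give the conformational preferences of the different amino acids mentioned in the text.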

Acknowledgements

This research was supported by a grant from the Hungarian Scientific Research Foundation (OTKA Nos. T 017192 and F 013799). The continued financial support of the NSERC of Canada is gratefully acknowledged. The ACAP program is available free of charge upon request. A PC version of the ACAP program is available from Dr. A. Perczel at the Department of Organic Chemistry, Eötvös University, Budapest, 112 P.O.B. 32, H-1518 Hungary.

References

[1] (a) L. Pauling and R. Corey, Proc. Natl. Acad. Sci. USA, 37 (1951) 729; (b) L. Pauling, R. Corey and H. Branson, Proc. Natl. Acad. Sci. USA, 37 (1951) 205.
[2] C. Venkatachalam, Biopolymers, 6 (1968) 1425.
[3] P. Lewis, F. Momany and H. Scheraga, Biochim. Biophys. Acta, 303 (1973) 211.
[4] P.Y. Chou and G.D. Fasman, J. Mol. Biol., 115 (1977) 135.
[5] G. Rose, L. Gierasch and J. Smith, Adv. Protein Chem., 37 (1985) 1.
[6] D.W. Goddette, C. Paech, S.S. Yang, J.R. Milenez, C. Bystroff, M. Wilke and R.J. Fletterick, J. Mol. Biol., 222 (1992) 580.
[7] Protein Data Bank, Chemistry Department, Building 555, Brookhaven National Laboratory, P.O. Box 5000, January 1994, Release #67.
[8] I.G. Csizmadia, Multidimensional stereochemistry and conformational potential energy surface topology, in J. Bertran (Ed.), New Theoretical Concepts for Understanding Organic Reactions, Reidel, Dordrecht, 1989, pp. 1-31.
[9] R. Chandrasekaran, L. Lakshiminarayanan, U. Pandya and G. Ramachandran, Biochim. Biophys. Acta, 303 (1973) 14-27.
[10] (a) T. Head-Gordon, M. Head-Gordon, M.J. Frish, C. Brooks, III, and J.A. Pople, Int. J. Quantum Chem., Quantum Biol. Symp., 16 (1989) 311; (b) T. Head-Gordon, M. Head-Gordon, M.J. Frish, C. Brooks, III, and J.A. Pople, J. Am. Chem. Soc., 113 (1991) 5989; (c) A. Perczel, J.G. Ángyán, M. Kajtar, W. Viviani, J.L. Rivail, J.F. Marcoccia and I.G. Csizmadia, J. Am. Chem. Soc., 113 (1991) 6256; (d) M.A. McAllister, A. Perczel, P. Csaszar, W. Viviani, J.-L. Rivail and I.G. Csizmadia, J. Mol. Struct. (Theochem), 288 (1993) 161; (e) W. Viviani, J.L. Rivail, A. Perczel and I.G. Csizmadia, J. Am. Chem. Soc., 115 (1993) 8321.
[11] A. Perczel, Ö. Farkas and I.G. Csizmadia, J. Comput. Chem., submitted for publication.
[12] A. Perczel, Ö. Farkas and I.G. Csizmadia, J. Mol. Struct. (Theochem), in preparation.
[13] A. Perczel, Ö. Farkas and I.G. Csizmadia, Can. J. Chem., submitted for publication.
[14] L. Schafer, V.J. Klinkowski, F.A. Momany, H. Chuman and C. Van Alsenoy, Biopolymers, 23 (1984) 2335.
[15] M. Vasquez, G. Nemethy and H.A. Scheraga, Macromolecules, 16 (1983) 1043.
[16] (a) A. Perczel, M.A. McAllister, P. Csaszar and I.G. Csizmadia, J. Am. Chem. Soc., 115 (1993) 4849; (b) A. Perczel, M.A. McAllister, P. Csaszar and I.G. Csizmadia, Can. J. Chem., 72 (1994) 2050.
[17] A. Perczel, M. Kajtar, J.F. Marcoccia and I.G. Csizmadia, J. Mol. Struct. (Theochem), 232 (1991) 291.
[18] W. Kabsch and C. Sander, Biopolymers, 22 (1983) 2577.