Stepwise assembling of polypeptide chain energy distributions

Stepwise assembling of polypeptide chain energy distributions

Computers and Chemistry 25 (2001) 145 – 159 www.elsevier.com/locate/compchem Stepwise assembling of polypeptide chain energy distributions Saul G. Ja...

314KB Sizes 0 Downloads 99 Views

Computers and Chemistry 25 (2001) 145 – 159 www.elsevier.com/locate/compchem

Stepwise assembling of polypeptide chain energy distributions Saul G. Jacchieri * Fundac¸a˜o Antoˆnio Prudente, Rua Prof. Antoˆnio Prudente 211, Sa˜o Paulo SP 01509 -900, Brazil Received 4 February 2000; accepted 16 March 2000

Abstract The principles and application of conformational analysis software that makes use of a new algorithm are described. It is known that the existence of a local energy minimum in the energy landscape is in general related to the clustering of polypeptide chain conformations near that energy value or, in other words, to a high density of states. A criterion based on this principle is part of an algorithm employed to select subsets of polypeptide chain conformations in broad energy ranges. Chain fragments belonging to these subsets are then combined to build larger polypeptide chains and the corresponding energy distributions. The functionality of the various operations employed in the process is described and the FORTRAN 77 source code that defines the algorithm is listed. The methodology is illustrated with a calculation involving three chain fragments belonging to the cellular prion protein (PrPC). © 2001 Elsevier Science Ltd. All rights reserved. Keywords: Conformational analysis; Density of states; Potential energy landscape; Local energy minima; Matrix algorithm

1. Introduction The exercise of a peptide chain bio-active function often requires a structural transition, i.e. more than one energy minimum is involved. Whereas protein folding calculations intend to predict the native structure, the purpose of calculations about bio-active peptides is usually to find the energy distribution of chain conformations and the biologically important states. Bioactive polypeptide chain conformations are considered to correspond to the global minimum (Scheraga, 1996) or to meta stable states (Honeycutt and Thirumalai, 1992; Karplus, 1997). In either case, a broad sampling of the Potential Energy Landscape is necessary. It is important for the presently described calculations that a comparison between different methodologies shows that in Molecular Dynamics (El* Tel.: +55-11-32725000, ext. 1437; fax: +55-11-32725088. E-mail address: [email protected] (S.G. Jacchieri).

ber, 1996) or Monte Carlo (Ortiz et al., 1998) calculations the energy is part of the algorithm, whereas in calculations based on lattice algorithms (Skolnick et al., 1997; Bahar et al., 1997) conformations may be obtained without an initial regard to their energies. Consequently, a larger breadth of conformational space is sampled. This is also an important feature of a conformational analysis software whose principles and functionality are described here. Based on results described in the literature (Jacchieri, 1997; Shortle et al., 1998) we have replaced the search of energy minima by a search of density of states maxima and minima in the energy distribution. A broad energy interval is considered in the calculation of the energy distribution. Density of states calculations require an energy unbiased algorithm. For instance, the Metropolis Algorithm (Ortiz et al., 1998 and references quoted therein), by its very definition, distorts the density of states near energy minima. In this work a matrix algorithm that generates

0097-8485/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S 0 0 9 7 - 8 4 8 5 ( 0 0 ) 0 0 0 7 6 - 0

146

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

a uniform grid of main chain and side chain rotamers is employed in the conformational search. The present methodology enables a generation of energy distributions rather than a prediction of the native state. However, the native (or active) conformation should be part of a representative distribution. By applying the methodology to a problem that involves a known structural transition we may evaluate the presence in the distribution of the initial and final states belonging to this transition. We chose three regions of the prion protein (PrPC), whose structure has been recently determined (James et al., 1997), to illustrate the sampling of density of states maxima and minima. The tertiary structure of the PrPC protein is formed by a natively unfolded, a-helix, b-sheet and loop regions. It is known that an a-helix to b-sheet transition is associated (Keh-Ming et al., 1993) with pathogenic activity. These are structural features and transitions that we want to describe theoretically. The algorithm described in this manuscript has been applied to the helix-coil transition (Jacchieri and Richards, 1992), a characterization of possible a-MSH structural transitions (Jacchieri and Ito, 1995) from aqueous solution to membrane bound conformations, peptide chain structural transitions (Jacchieri et al., 1998) related to oligopeptidase specificity for conformation and, more recently, to the role (Jacchieri, 1998) of isolated b-strands in the formation of amyloid plaques.

2. Theory

2.1. The matrix algorithm N mer chain conformations are generated by the matrix product Mn = M1 œO2 œO3 œ…œON

(1)

Mi and Oi are matrices of statistical weights, the elements of matrices M and O are, respectively, the values of exp( −E ji /RT) and exp( −DE ji /RT), where j are chain conformations, Ei are internal energies of a chain fragment having residue i in the carboxyl terminus and DEi are interaction energies of residue i with preceding residues. The matrix multiplication in Eq. (1) generates a new set of chain conformations by combining (Jacchieri and Jernigan, 1992) monomer residue main chain rotamers. The internal energies of these conformations are given by E jn = E j1 +DE j2 +... +DE jn

(2)

The FORTRAN 77 source code employed in the execution of Eq. (1) is listed in Appendix A.

An important feature of this matrix product is that for each matrix of statistical weights the range of interaction and the number of main chain rotamers attributed to the actual residue may be varied. It is thus possible to freeze a chain fragment by attributing a single main chain rotamer to each monomer residue belonging to it. This freezing of chain fragments is part of the algorithm described below. When fi side chain conformers are considered per monomer residue i Eq. (1) is repeated f *f 1 *…*f 2 N times.

2.2. An algorithm based on density of states extremes Let us consider that, as shown in the previous section, Mn is the matrix of statistical weights corresponding to the chain fragment 1 –n. Each element of Mn is related to a conformation j and an internal energy E jn . By sorting the E jn ’s in increasing order the density of states h(E), defined as the number of states per unity energy interval, is obtained. As illustrated in Figs. 1 – 3, a plot of h(E) ×E shows the existence of energy intervals where h(E) is a local minimum or maximum. The physical interpretation (Jacchieri, 1997; Shortle et al., 1998) of energy distributions of the kind shown in Figs. 1 – 3 is that the energy intervals corresponding to maximum densities of states are populated by states corresponding to extremes in the potential energy landscape, including local maxima, minima and saddle points and the native structure, supposed (Scheraga, 1996) to occupy the global minimum. We have created a conformational analysis algorithm by extending this reasoning in the following ways. It is assumed that the maximum density of states condition applies to chain fragments as well as to the whole chain. Besides density of states maxima, density of states minima are part of the algorithm. The first assumption is justified by the minimum frustration principle and the second one enables the search of deep sharp energy minima that correspond (Jacchieri, 1997) to a minimum density of states. According to the above discussion, only the states belonging to h(E) maxima and minima are relevant to the conformational analysis, since these states correspond to maxima and minima (and saddle points) in the potential energy landscape and are candidates for biological activity. An important consequence of this reasoning is that a large decrease in the number of states included in the conformational analysis is enabled. This reduction in the number of chain conformations is part of the conformational analysis algorithm described in this paper. The algorithm runs as follows: 1. Carrying out Eq. (1), a tetrameric chain fragment in the carboxy end is submitted to a conformational analysis (source code listed in Appendix A). The

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

remaining part of the chain (the amino end group in the first iteration) stays frozen. The lowest 40 Kcal/ mol energy interval is selected. 2. The conformations and internal energies obtained in the previous stage are submitted to a density of states calculation (source code listed in Appendix B). A subset of chain conformations belonging to energy intervals corresponding to density of states maxima and minima is selected (source code listed in Appendix C). 3. A tetramer fragment is added to each conformation obtained in step two and step one is repeated. This way the polypeptide chain and the corresponding energy distribution grow with the addiction of tetramer fragments.

2.3. Numerical procedure As shown in the previous section the polypeptide chain and the energy distribution are built by adding chain fragments n residues long (where n= 4), so that the chain length increases as 1– n, 1–2n, 1–3n, …,N. N is not necessarily a multiple of n since in the last stage a chain fragment shorter than n may be added to the growing chain.

147

In the first step of the algorithm we generate a set S1 – 4 of conformations belonging to the 1 – 4 chain fragment. This is achieved with Eq. (1). Calculations carried out with the Build Up Procedure (Vasquez and Scheraga, 1985), which is also based on the combination of sets of chain conformations, have shown that the number of chain conformations increases rapidly with chain length even if we discard conformations whose energies are outside of a defined lower energy range. It is thus necessary to adopt a narrow bottleneck to select from S1 – 4 the conformations of 1 – 4 that will be used to build the S1 – 8 set of conformations belonging to the chain fragment 1 – 8, and so on. Following the previous discussion, we generate a h1 – 4 × E plot and select the energy regions that belong to maxima and minima. Since the latter set of chain conformations that we have called s1 – 4 is in general smaller than S1 – 4, avoidance of a combinatorial explosion is ensured. The size of s1 – n also depends on numerical parameters because the computation of densities of states is a numerical procedure. DE is the width of the energy interval used to calculate h(E), given by h(E) = DNs(E)/DE, DNs(E) being the number of states within the energy interval (E, E+ DE). DI is defined as the number of energy intervals of size DE in the immediate

Fig. 1. Various stages in the stepwise assembling of the energy distribution belonging to the PrPC chain fragment 106 – 129 (sequence in the one letter code KTNMKHMAGAAAAGAVVGGLGGYM). In each stage the energy distribution corresponds to step one of the algorithm (see Section 2.2, Appendices A and B) and the selection of maxima and minima in the energy distribution corresponds to step two of the algorithm (see Section 2.2, Appendix C). Density of states (h) and force field energy (E) calculated as shown in Sections 2 and 3. E is scaled so that the minimum energy conformation is the zero energy level. E in Kcal/mol.

148

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

Fig. 2. Various stages in the stepwise assembling of the energy distribution belonging to the PrPC chain fragment 129 – 142 (sequence in the one letter code MLGSAMSRPMMHFG). In each stage the energy distribution corresponds to step one of the algorithm (see Section 2.2, Appendices A and B) and the selection of maxima and minima in the energy distribution corresponds to step two of the algorithm (see Section 2.2, Appendix C). Density of states (h) and force field energy (E) calculated as shown in Theory and Methods. E is scaled so that the minimum energy conformation is the zero energy level. E in Kcal/mol.

vicinity of E that, when having densities of states smaller (or larger) than h(E), indicates that h(E) is maximum (or minimum). s1 – n decreases for decreasing values of DE and increasing values of DI. Our experience has shown that s1 – n should not exceed two thousand conformations, otherwise our computational capability would be sur-

passed. To keep the number of chain conformations within proper bounds, DE is decreased and DI is increased depending on the size of S, as shown in Table 1. The procedure is iterative, in accordance with the above discussion. As the same operations are repeatedly executed to add fragments to the polypeptide chain, the energy distribution changes accordingly.

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

Since chain conformations are given by strings of letters and numerals (Jacchieri and Jernigan, 1992) it is possible to find secondary or tertiary structural motifs and to generate the corresponding energy distributions simply by searching (Jacchieri et al., 1998) strings. The model of the solvent as a continuous is employed to add hydration energies to the energy distribution. For this purpose, molecular surface areas (Connolly, 1983) and hydration energies (Ooi et al., 1987) are calculated.

149

3. Methods Five main chain rotamers (A:f= −57, c = −47; B:f= − 139, c =135; G:f= −60, c = −30, D:f= − 90, c = 0 and E:f= 70, c = −60, for proline residues: A: f= −75, c= 158; B: f = − 75, c = 149) are employed in the search. For side chain conformations we make use of the gauche minus, trans and gauche plus rotamers of the (Ponder and Richards,

Fig. 3. Various stages in the stepwise assembling of the energy distribution belonging to the PrPC chain fragment 160 – 175 (sequence in the one letter code QVYYRPVDQYNNQNNF). In each stage the energy distribution corresponds to step 1 of the algorithm (see Section 2.2, Appendices A and B) and the selection of maxima and minima in the energy distribution corresponds to step two of the algorithm (see Section 2.2, Appendix C). Density of states (h) and force field energy (E) calculated as shown in Sections 2 and 3. E is scaled so that the minimum energy conformation is the zero energy level. E in Kcal/mol.

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

150

Table 1 Values of the DE and DI parameters, described in the theory, adopted for various sizes of the actual set of conformations Size of S (conformations)

DE (cal)

B4000 4000–100 000 100 000–200 000 \ 200 000

500 100 50 10

DI 3 10 50 100

Table 2 Various stages in the building of PrPC chain fragments 106– 129, 129–142 and 160–175, and of the corresponding energy distributions Chain fragment

Sa

106–109 106–113 106–117 106–121 106–125 106–129

3452 138 620 32 866 207 372 20 696 48 006

558 743 1751 1833 700 3349

129–132 129–136 129–140 129–142

1669 356 820 633 878 12 678

424 1620 2551 881

160–163 160–167 168–171 168–175

5547 180 683 24 236 3275

401 679 1209 560

4. Results and discussion The above-described algorithm was used to generate distributions of chain conformations adopted by fragments 106 – 129, 129 – 142 and 160 – 175 of the PrPC polypeptide chain. Fragment 106 – 129 belongs to the natively unfolded PrPC region (James et al., 1997) and undergoes a-helix –b-strand transitions (De Gioia et al., 1994; Zhang et al., 1995). Peptide 129 – 142 whose sequence is part of the PrPC structural core, is also subjected to a-helix –b-strand transitions (De Gioia et al., 1994; Zhang et al., 1995), whereas fragment 160 – 175 forms a loop (James et al., 1997). The chosen set of chain fragments enables a study of structural transi-

sb

a Number of conformations found in the lowest 40 Kcal/mol energy interval. b Number of conformations selected with a density of states criterion.

1987) classification (the numerals 1, 2 and 3 indicate, respectively, the gauche minus, trans and gauche plus rotamers of x1). Gas phase energies are calculated with the ECEPP/2 (Zimmerman et al., 1977) force field. Hydration energies are calculated with hydration potentials (Ooi et al., 1987) and a routine (Connolly, 1983) for surface area computation. The software is entirely written in FORTRAN 77. The Whatif software package is employed for molecular graphics and to evaluate the root mean square deviation between calculated and experimental chain geometries. All calculations are carried out with an Origin 200 Silicon Graphics workstation.

Fig. 4. Best fitting, evaluated by root mean square deviations (see Methods), of minimum energy conformations (blue) of PrPC chain segments 160 – 175 (sequence in the one letter code QVYYRPVDQYNNQNNF) and 129 – 142 (sequence in the one letter code MLGSAMSRPMMHFG) and the NMR (James et al., 1997) structure depicted in red. Letters and numerals indicate (as defined in Section 3) main chain and side chain rotamers.

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

tions and comparisons between the native PrPC structure and chain conformations present in energy distributions. As shown in Table 2, 5547 chain conformations were found in the lowest 40 Kcal/mol energy interval of the 160–163 tetrapeptide fragment. By applying the density of states algorithm this number was decreased to 401 conformations. This set of conformations was then employed in the generation of over six million conformations (401 times 27 combinations of side chain rotamers times 54 main chain conformations) belonging to the 160–167 fragment, 180 683 of which were found to lay in the lowest 40 kcal/mol energy interval. This number was again decreased to 679 conformations by applying the density of states criterion. The procedure continued until 3275 conformations adopted by the 160–175 chain fragment were obtained.

151

A similar procedure was applied to the 129 – 142 and 106 – 129 chain fragments. h × E plots corresponding to consecutive building stages of the energy distributions belonging to the three chain fragments in the study are shown in Figs. 1 – 3. These plots contain important information (Sali et al., 1994; Hao et al., 1995; Jacchieri, 1997) about folding tendency, kinetic barriers, flexibility, closely or loosely packing and number of conformational families. The S160 – 175 and S129 – 142 sets of chain conformations were analyzed in terms of similarity to the PrPC structure. In both cases it was found that it is the least energy structure, shown in Fig. 4, that most closely resembles the native structure with, respectively, root mean square deviations of 4.6 and 4.8 A, . Taking into account that the theoretical model is constrained to five pairs of main chain dihedral angles per monomer

Fig. 5. a-Helix and b-strand energy distributions of PrPC chain fragment 106 – 129 and wire frame and solid ribbon representations of the most stable conformation in each distribution. Letters and numerals indicate (as defined in Section 3) main chain and side chain rotamers. ha and hb are, respectively, densities of a-helical and b-strand states. The energy E is scaled so that the minimum energy conformation is the zero energy level. E in Kcal/mol.

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

152

residue (see methods section for details) and that in the NMR structure (James et al., 1997) there are main chain rotamers like the left handed a-helix, reverse turn IV and inverse gamma turn which are not included in our set of rotamers, we may consider that there is a good fitting between experimental and calculated geometries. The chain fragment 106–129 exhibits (James et al., 1997) a large flexibility and an undefined structure when part of the PrPC structure. Conformational transitions involving this fragment have been predicted (Huang et al., 1996) to be part of the mechanisms of prion infectivity and amyloidosis. A comparison between Figs. 1–3 shows that h106 – 129 is much larger than h129 – 142 and h160 – 175. The larger density of states and the increasing and almost continuous energy distribution corresponding to the 106–129 chain fragment indicate a greater chain flexibility and absence (Sali et al., 1994; Hao et al., 1995; Jacchieri, 1997) of a folding tendency. The 1000 more stable conformations belonging to the energy distribution of the 106–129 fragment were submitted to a hydration energy calculation, as described in methods, and the density of states was re calculated. The new energy distribution was used to calculate the densities of a-helical and b-strand states plotted in Fig. 5. These calculations show that the first alpha-helix–beta-strand transition takes place 12 kcal/ mol above the energy minimum (found in the conformational search). The structures corresponding to the initial and final states of this transition are also depicted in Fig. 5. We have recently proposed (Jacchieri, 1998) that the presence of such isolated hydrophobic bstrands leads to a non-cooperative mechanism for the seeding of amyloid plaques.

chains in stages, each stage comprehending a longer chain fragment. Since it is not possible to pursue the calculation taking into account all generated conformations a criterion must be adopted to select chain conformations and this criterion must not discard important conformations. Thus, in the Build up Procedure (Vasquez and Scheraga, 1985) very high energy conformations are discarded and in CONGEN (Bruccoleri and Karplus, 1987) and COMPOSER (Srinivasan and Blundell, 1993) conformations found in sequentially homologous regions of known proteins are preferentially selected. In the present work we have proposed a criterion based on a physical reasoning. The use of two independent algorithms is shown. The matrix algorithm (Jacchieri and Jernigan, 1992) is used to generate sets of chain conformations and the density of states criterion is used to select sub sets populated by conformations belonging to local extremes in the Potential Energy Landscape. Although restricted with respect to the number of chain conformations, these subsets contain representative samples of Potential Energy Landscape minima and maxima. One of the advantages of the matrix algorithm is that it is energy unbiased. By carrying out Eq. (1) a uniform sampling of Potential Energy Landscape regions is achieved. Distance from local potential energy minima is not taken into account in this sampling. The strategy is, therefore, to accomplish a broad scanning of large Potential Energy Landscape regions which is then submitted to a search of density of states extremes that yields a much smaller set of chain conformations.

Acknowledgements 5. Conclusion Various methodologies are used to build polypeptide

This work was supported by a FAPESP grant. We thank Dr Gerrit Vriend for providing the Whatif software package.

Appendix A. Matrix Algorithm FORTRAN 77 source code Instituto Nacional de Propriedade Industrial register 92004423 CHARACTER RA(200),RAN(5) DIMENSION RANGLE(10,200),PANGLE(80,10,200), 1ILIST(200),ISCH(200),PHI(5,200),PSI(5,200),ANGLES(10,30) DOUBLE PRECISION OP(5,78125),OA(5,78125), 1P(5,78125) INTEGER C,RO,FJ,ENU(200),RON(200),ICOND(200),LIST(200)

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

C RAN IS THE MAIN CHAIN ROTAMER CONFORMATION RAN(1)=’A’ RAN(2) =’B’ RAN(3)=’G’ RAN(4)=’D’ RAN(5)=’E’ C ICOMP is the chain length including the end residues ICP=ICOMP-2 C RON(I)::NUMBER OF CONFORMERS AVAILABLE TO RESIDUE I’S BACKBONE C ILIST(I) is the list of amino acid residues following a numerical code C ENU(I) is the interaction interval of each residue C ISCH(I) is the number of side chain rotamers attributed to each residue C IPCH:TOTAL NUMBER OF SIDE CHAINS CONFORMATIONAL STATES TO BE CONSIDERED IPCH=1 DO 10 I=2,ICOMP-1 IPCH=IPCH*ISCH(I) 10 CONTINUE c PHI and PSI are main chain dihedral angles C PANGLE(K,J,I) --- MATRIX OF SIDE CHAIN DIHEDRAL ANGLES C -----------------K B = ROTATIONAL STATE OF THE SIDE CHAIN OF ACTUAL C -----------------J B =DIHEDRAL ANGLE(CHI1,CHI2,...) C -----------------I B =RESIDUE NUMBER DO 610 ISIDE=1,IPCH DDD= 0.0 IN=0 C THIS LOOP ATRIBUTES A PARTICULAR CONFORMATIONAL STATE C TO EACH SIDE CHAIN OF EACH RESIDUE IROJ=1 DO 100 INUM=2,ICOMP-1 IROK=IROJ*ISCH(INUM) NK= INT((ISIDE +IROK-1)/IROK) NJ= INT((ISIDE + IROJ-1)/IROJ) NRO=NJ-ISCH(INUM)*(NK-1) IROJ=IROK IF(ISCH(INUM).EQ.1)NRO=1 DO 90 JANG=4,10 ANGLES(JANG,INUM)=PANGLE(NRO,JANG,INUM)*RAD TYPE=PANGLE(NRO,JANG,INUM) 90 RANGLE(JANG,INUM) = PANGLE(NRO,JANG,INUM)*RAD 100 CONTINUE INR= 2 NU=ENU(2) LIST(1) =ILIST(1) LIST(2) =ILIST(2) IRO=RON(2) RIDEF=50 DO 160 IS= 1,IRO,1 RA(2) =RAN(IS) ANGLES(1,2) =PHI(IS,2)*RAD RANGLE(1,2)=PHI(IS,2)*RAD ANGLES(2,2)=PSI(IS,2)*RAD RANGLE(2,2) =PSI(IS,2)*RAD ANGLES(3,2)=180.*RAD RANGLE(3,2) =180.*RAD CALL FORCEFIELD

153

154

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

C ETOT is the internal energy of the actual chain conformation C OA and OP are matrices of statistical weights OA(IS,1) =DEXP(-1000.*ETOT/(R*T)) 160 CONTINUE TRIDEF=RIDEF DO 170 IS=1,IRO OA(IS,1) =OA(IS,1)/DEXP(-1000.*RIDEF/(R*T)) 170 CONTINUE JRES =1 ITF =1 DO 490 INC=3,ICOMP-1,1 DDD= DDD+DD IN= INC INR=IN-1 LIST(IN) = ILIST(IN) DO 180 J=4,10 ANGLES(J,IN) =RANGLE(J,IN) 180 CONTINUE NU=ENU(IN) IT= INR-NU+1 IIC=ICOMP-1 IF(IN.NE.IIC)GO TO 190 IN=ICOMP LIST(ICOMP) =ILIST(ICOMP) 190 CONTINUE IF(ICOND(INC).EQ.0)GO TO 490 IRES=JRES JRES =INC-1 IDF=JRES-IRES +ENU(IRES+ 1)-ENU(JRES+1) IF(IDF.NE.0)ITF= 0 INA= IN IF(IN.EQ.ICOMP)INA=IN-1 RO =RON(JRES+1) C MC is the number of main chain conformations MC= 1 IBEG =JRES-ENU(JRES +1)+2 IFIM =JRES DO 191 IPRO=IBEG,IFIM,1 MC =MC*RON(IPRO) 191 CONTINUE DO 360 IRO= 1,RO,1 RA(INA) =RAN(IRO) ANGLES(1,INA)=PHI(IRO,INA)*RAD RANGLE(1,INA)=PHI(IRO,INA)*RAD ANGLES(2,INA) =PSI(IRO,INA)*RAD RANGLE(2,INA)= PSI(IRO,INA)*RAD ANGLES(3,INA) =180.*RAD RANGLE(3,INA)= 180.*RAD DO 350 C=1,MC,1 IBEG=IN-NU +1 IF(IN.EQ.ICOMP)IBEG =IN-NU IFIM = IN-1 IF(IN.EQ.ICOMP)IFIM =IN-2 DO 310 INU=IBEG,IFIM,1 P2=1

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

DO 251 IPROD=IBEG,INU-1,1 P2= P2*RON(IPROD) 251 CONTINUE NRO= 1+INT((C-1)/P2) NRO =NRO-INT((C-1)/(P2*RON(INU)))*RON(INU) RA(INU) =RAN(NRO) ANGLES(1,INU)=PHI(NRO,INU)*RAD RANGLE(1,INU)= PHI(NRO,INU)*RAD ANGLES(2,INU)=PSI(NRO,INU)*RAD RANGLE(2,INU)= PSI(NRO,INU)*RAD ANGLES(3,INU)=180.*RAD RANGLE(3,INU)= 180.*RAD 310 CONTINUE IFIM =NU+ 1 IF(IN.EQ.ICOMP)IFIM= NU+2 IF(ITF.EQ.0)LIST(1) =1 IF((JRES-ENU(JRES + 1)).EQ.0)LIST(1)=ILIST(1) INF=IN-1 IF(IN.EQ.ICOMP)INF =IN-2 DO 330 I= 2,IFIM DO 320 J= 1,10,1 LIST(I) =ILIST(INF-NU+ I) ANGLES(J,1) =0 ANGLES(J,I)= RANGLE(J,INF-NU+ I) 320 CONTINUE 330 CONTINUE CALL FORCEFIELD C Most forcefields may be adapted to this computation. In this work we used C the ECEPP (Zimmerman et al., 1977) ETOT= ETOT-DD EETOT=ETOT+DD IF(IRO.EQ.1.AND. C.EQ.1)RIDEF =ETOT IF(ETOT.LT.RIDEF)RIDEF =ETOT OP(IRO,C) =DEXP(-1000.*ETOT/(R*T)) 350 CONTINUE 360 CONTINUE DO 380 IRO=1,RO DO 370 C=1,MC OP(IRO,C) =OP(IRO,C)/DEXP(-1000.*RIDEF/(R*T)) 370 CONTINUE 380 CONTINUE TRIDEF=TRIDEF + RIDEF C MATRIX OPERATIONS P1=1 P2=1 P3=1 IBEG=IRES-ENU(IRES+1)+2 IFIM =JRES+ 1-ENU(JRES+1) DO 381 IPROD=IBEG,IFIM,1 P1=P1*RON(IPROD) 381 CONTINUE IBEG=JRES-ENU(JRES+1)+2 IFIM=IRES DO 382 IPROD=IBEG,IFIM,1 P2=P2*RON(IPROD) 382 CONTINUE IBEG=IRES-ENU(IRES+1)+2

155

156

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

IFIM= IRES DO 383 IPROD=IBEG,IFIM,1 P3=P3*RON(IPROD) 383 CONTINUE IDF=JRES-IRES +ENU(IRES+1)-ENU(JRES +1) PR= 1 IBEG=JRES-ENU(JRES+1)+2 IFIM =IRES +1-ENU(IRES+1) DO 384 IPROD=IBEG,IFIM,1 PR= PR*RON(IPROD) 384 CONTINUE IF(IDF.EQ.0)P1 =1 IF(IDF.LT.0)P1=1/PR IF(JRES.EQ.2)P1 =1 DO 2100 C =1,MC IW= 1+INT((C-1)/P2)-INT((C-1)/(P2*RON(IRES+ 1)))*RON(IRES+1) IJ= 1+INT((C-1)*P1)-INT((C-1)/P2)*P3 FJ= IJ+INT(P1-1) 2100 CONTINUE DO 430 IRO=1,RO RA(INA)= RAN(IRO) DO 420 C=1,MC RC=1.*C PLIN=P2*RON(IRES+1) IW=1+ INT((RC-1)/P2)-RON(IRES+1)*INT((RC-1)/PLIN) IJ=1+ INT((RC-1)*P1)-INT((RC-1)/P2)*P3 FJ=IJ +INT(P1-1) S=0 IBEG=IN-NU +1 IF(IN.EQ.ICOMP)IBEG=IN-NU IFIM =IN-1 IF(IN.EQ.ICOMP)IFIM=IN-2 DO 390 INU= IBEG,IFIM,1 P2L =1 DO 389 IPROD=IBEG,INU-1,1 P2L=P2L*RON(IPROD) 389 CONTINUE NRO= 1+ INT((C-1)/P2L) NRO= NRO-INT((C-1)/(P2L*RON(INU)))*RON(INU) RA(INU)=RAN(NRO) 390 CONTINUE C P=OP x OA DO 400 J=IJ,FJ,1 S=S+OA(IW,J) 400 CONTINUE P(IRO,C) =OP(IRO,C)*S IF(P(IRO,C).EQ.0)GO TO 410 ENERGY=1000*TRIDEF-R*T*LOG(P(IRO,C))+1000.*DDD GOTO 420 410 ENERGY=0.99999E+30 420 CONTINUE 430 CONTINUE C Z is the partition function Z=0 DO 450 IRO=1,RO DO 440 C= 1,MC Z=Z+P(IRO,C)

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

440 CONTINUE 450 CONTINUE C matrix OA is reseted DO 480 IRO= 1,RO,1 DO 470 C= 1,MC,1 OA(IRO,C) =P(IRO,C) 470 CONTINUE 480 CONTINUE 490 CONTINUE STOP END

Appendix B. Density of States Calculation, FORTRAN 77 source code DIMENSION E(1052010),DNST(20000) c WDW corresponds to Delta E in Section 2.3 c NROWS=ITERMS is the number of chain conformations IF(NROWS.LE.4000)WDW =500. IF(NROWS.GT.4000.AND.NROWS.LE.100000)WDW= 100. IF(NROWS.GT.100000.AND.NROWS.LE.200000)WDW= 50. IF(NROWS.GT.200000)WDW =10. C E(I) is the internal energy of the Ith chain conformation sorted in increasing C order ICNT=1 INMB=0 ECMP =ICNT*WDW DO 20 IT= 1,ITERMS-1 EI =E(IT)-E(1) IF(EI.GT.ECMP)THEN c DNST(ICNT) is the density of states of the ICNTth energy interval DNST(ICNT)=1000.*INMB/WDW DO 25 INTRV= 1,20000 ECMP =ECMP+WDW ICNT =ICNT+1 IF(EI.LE.ECMP)GOTO 19 DNST(ICNT)=0 25 CONTINUE 19 CONTINUE INMB= 0 ENDIF INMB=INMB +1 20 CONTINUE STOP END Appendix C. This routine finds density of states maxima and minima in the energy distribution, FORTRAN 77 source code DIMENSION DNST(700000) INTEGER MIN(1000000),MAX(1000000) c NROWS is the number of chain conformations c WE corresponds to Delta e in Section 2.3 IF(NROWS.LE.4000)WE= 500. IF(NROWS.GT.4000.AND.NROWS.LE.100000)WE=100. IF(NROWS.GT.100000.AND.NROWS.LE.200000)WE= 50. IF(NROWS.GT.200000)WE =10. .

157

158

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

c IWDW corresponds to Delta I in Section 2.3 IF(WE.EQ.500.)IWDW =3 IF(WE.EQ.100.)IWDW =10 IF(WE.EQ.50.)IWDW =50 IF(WE.EQ.10.)IWDW =100 c DNST(I) is the density of states of the Ith energy interval c IESPN is the number of energy intervals NLINES=0 DO I=1,IESPN MIN(I) = 1 MAX(I) =1 END DO IMIN =1 IMAX=1 DO INT=2,IESPN-1 IUB=INT +IWDW ILB =INT-IWDW IF(ILB.LT.1)ILB =1 IF(IUB.GT.IESPN)IUB= IESPN IMIN=1 IMAX=1 DO JNT= ILB,INT-1 IF(DNST(JNT).GE.DNST(INT))THEN c a minimum density of states energy interval IMINC=1 ELSE IMINC=0 END IF IMIN=IMIN*IMINC IF(DNST(JNT).LE.DNST(INT))THEN c a maximum density of states energy interval IMAXC=1 ELSE IMAXC=0 END IF IMAX=IMAX*IMAXC END DO DO JNT= INT+1,IUB IF(DNST(JNT).GT.DNST(INT))THEN c a minimum density of states energy interval IMINC=1 ELSE IMINC =0 END IF IMIN =IMIN*IMINC IF(DNST(JNT).LT.DNST(INT))THEN c a maximum density of states energy interval IMAXC =1 ELSE IMAXC= 0 END IF IMAX =IMAX*IMAXC END DO MIN(INT) =MIN(INT)*IMIN MAX(INT) =MAX(INT)*IMAX END DO STOP END .

S.G. Jacchieri / Computers & Chemistry 25 (2001) 145–159

159

A., Groth, D., Mehlhorn, I., Huang, Z., Fletterick, R.J., Cohen, F.E., Prusiner, S.B., 1993. Proc. Natl. Acad. Sci USA 90, 10962. Ooi, T., Oobatake, M., Nemethy, G., Scheraga, H.A., 1987. Proc. Natl. Acad. Sci. USA 84, 3080. Ortiz, A.R., Kolinski, A., Skolnick, J., 1998. Proc. Natl. Acad. Sci. USA 95, 1020. Ponder, J.W., Richards, F.M., 1987. J. Mol. Biol. 193, 775. Sali, A., Shakhnovich, E., Karplus, M., 1994. J. Mol. Biol. 235, 1614. Scheraga, H.A., 1996. Biophys. Chem. 59, 329. Shortle, D., Simons, K.T., Baker, D., 1998. Proc. Natl. Acad. Sci. USA 95, 11158. Srinivasan, N., Blundell, T.L., 1993. Protein Eng. 6, 501. Skolnick, J., Kolinski, A., Ortiz, A.R., 1997. J. Mol. Biol. 65, 217. Vasquez, M., Scheraga, H.A., 1985. Biopolymers 24, 1437. Zhang, H., Kaneko, K., Nguyen, J.T., Livshits, T.L., Baldwin, M.A., Cohen, F.E., James, T.L., Prusiner, S.B., 1995. J. Mol. Biol. 250, 514. Zimmerman, S.S., Pottle, M.S., Nemethy, G., Scheraga, H.A., 19771Vasques and Scheraga, 1987. Macromolecules 10.

References Bahar, I., Kaplan, M., Jernigan, R.L., 1997. Proteins 29, 292. Bruccoleri, R.E., Karplus, M., 1987. Biopolymers 26, 137. Connolly, M.L., 1983. J. Appl. Cryst. 16, 548. De Gioia, L., Selgaggini, C., Ghibaud, E., Diomede, L., Bugiani, O., Forloni, G., Tagliavini, F., Salmona, M., 1994. J. Biol. Chem. 269, 7859. Elber, R., 1996. Curr. Opin. Struct. Biol. 6, 232. Hao, M.H., Scheraga, H.A., 1995. J. Chem. Phys. 102, 1334. Honeycutt, J.D., Thirumalai, D., 1992. Biopolymers 32, 695. Huang, Z., Prusiner, S.B., Cohen, F.E., 1996. Fold. Des. 1, 13. Jacchieri, S.G., 1997. Int. J. Quant. Chem. 65, 1115. Jacchieri, S.G., 1998. Biophys. Chem. 74, 23. Jacchieri, S.G., Gomes, M.D., Juliano, L., Camargo, A.C.M., 1998. J. Pept. Res. 51, 452. Jacchieri, S.G., Jernigan, R.L., 1992. Biopolymers 32, 1327. Jacchieri, S.G, Ito, A., 1995. Int. J. Quant. Chem. 53, 335. Jacchieri, S.G., Richards, N.G.J., 1992. Biopolymers 33, 325. James, T.L., Jacchieri, S.G., Gomes, M.D., Juliano, L., Camargo, A.C.M., 19987. J. Pept. Res. 51, 452. Karplus, M., 1997. Fold. Des. 2, 69. Keh-Ming, P., Baldwin, M., Nguyen, J., Gasset, M., Serban,

.