COMATOSE, a method for constrained refinement of macromolecular structure based on two-dimensional nuclear overhauser effect spectra

COMATOSE, a method for constrained refinement of macromolecular structure based on two-dimensional nuclear overhauser effect spectra

JOURNAL OF MAGNETIC RESONANCE 79,493-s 12 (1988) COMATOSE, A Method for Constrained Refinement of Macromolecular Structure Based on Two-Dimensiona...

1MB Sizes 0 Downloads 18 Views

JOURNAL

OF MAGNETIC

RESONANCE

79,493-s 12 (1988)

COMATOSE, A Method for Constrained Refinement of Macromolecular Structure Based on Two-Dimensional Nuclear Overhauser Effect Spectra BRANDAN A. BORGIAS AND THOMAS

L. JAMES *

Department of Pharmaceutical Chemistry, University of California, San Francisco, California 9414.3 Received December 3, 1987; revised January 22, 1988 COMATOSE (complete matrix analysis torsion optimized structure) is a structurerefinement program based on the quantitative calculation of 2D NOE intensities taking into account the effects of network relaxation and spin diffusion (J. W. Keepers and T. L. James, J. Magn. Reson. 57, 404 (1984)). The macromolecular structure is defined by fixed bondlengths and two-bond angles, leaving only torsion angles and residue orientations as variable parameters. The performance of this structure refinement algorithm with idealized and experimental 2D NOE intensities for oligonucleotides is discussed. The structural information available from COMATOSE is compared with the distances obtained by analysis of the 2D NOE intensities, and with the distances available from the direct solution of the intensity eigenvalue problem. The most commonly used calculation of distances relies on the assumption that the intensities can be approximated by assuming that the individual spin pairs are essentially isolated from all other protons. This approach systematically results in underestimation of distances. Direct solution of the eigenvalue problem when all intensities (i.e., relaxation pathways) are not accounted for results in large errors especially at longer distances (greater than -3 A). COMATOSE is capable of optimizing the structure in favorable cases and, at least, yielding a set of reliable proton-proton distances (relative error - 10%). In most practical applications, an iterative approach to the development of the structure should probably IX taken. An initial trial structure is subjected to COMATOSE to obtain accurate distance estimates from the 2D NOE. These in turn can be used as constraints in energy minimization of the structure to arrive at an improved structure for recycling into COMATOSE. o 1988 Academic Press, Inc.

INTRODUCTION

The complete high-resolution determination of molecular structure in solution has been a goal of scientists since the development of X-ray diffraction techniques in the early part of this century. With the advent of 2D NMR techniques, coupled with powerful structure development and refinement techniques such as distance geometry, energy minimization, and restrained molecular dynamics, detailed structural models of the molecule in solution are now becoming achievable. The homonuclear 2D NOE (two-dimensional nuclear Overhauser effect) experiment readily yields qualitative structural information. This experiment relies on pro* To whom correspondence should be addressed. 493

0022-2364188 $3.00 Copyright 0 1988 by Academic Press, Inc. All rights of reproduction in anyform reserved.

494

BORGIAS

AND

JAMES

ton-proton dipolar retaxation to yield a correlation map between protons in close spatial proximity (~5 A). The intensities of the cross peaks in the 2D NOE spectrum are related to the distances between the protons and can therefore be used to obtain estimates of those distances. The use of 2D NOE in macromolecular structure determination is becoming widespread in recent years. But, while considerable success has been achieved with this technique, there are certain limitations and precautions that are not always considered which we wish to elaborate below. The 2D NOE experiment was first described (along with the mathematically equivalent experiment for chemical exchange) and given a firm theoretical foundation in Ernst’s laboratory ( 1, 2). We previously described a matrix method for the explicit calculation of the 2D NOE intensities expected for a known molecular structure (3) and compared results so obtained for a rigid test molecule (proflavin) in solution with its X-ray crystal structure (4). The essential concepts described in these seminal papers have been further elaborated by several other laboratories (5-9). We call this general approach to the calculation and interpretation of 2D NOE intensities CORMA for complete relaxation matrix analysis. The critical feature in CORMA is the explicit treatment of the complete relaxation network and specifically spin diffusion in calculating the 2D NOE intensities, An alternative (and widely used) approach assumes that a cross-peak intensity correlating a particular proton-proton pair depends on the distance separating that pair of protons alone. This assumption leads to the isolated spin pair approximation (ISPA). Most analyses of 2D NOE spectra to date have incorporated ISPA in the qualitative or semiquantitative extraction of distance limits based upon relative cross-peak intensities. Gronenborn and Clore ( 10) provide a good review of this approach along with many technical aspects of 2D NOE data collection and analysis. While elaborating the difficulties in performing an accurate analysis of the 2D NOE intensities, we present the results we have been obtaining with a structure refinement program, COMATOSE (complete matrix analysis torsion optimized structure), which is capable of optimizing a model structure with respect to the experimentally obtained 2D NOE intensities. The model structures we have been investigating are right-handed DNA duplexes, but the general methodology and the COMATOSE program have broader applicability. COMPLETE

RELAXATION

MATRIX

ANALYSIS-CORMA

The basic 2D NOE experiment centers around the three-pulse sequence ( 11): (delay time-90”-t,-90”-7,-9V-tZ)n. Double Fourier transformation with respect to times t, and t2 results in the 2D NOE spectrum containing many cross peaks with intensities that are roughly proportional to the inverse sixth power of the distance between them. This approximate proportionality is the basis for the commonly used ISPA approach to the estimation of distances. In order to determine distances from the 2D NOE spectrum as accurately as possible, we choose instead to analyze the intensities by taking into account the complete relaxation network. The cross relaxation during the mixing time 7, is described (2) by the system of equations

dM=-RM at

REFINEMENT

OF

NOE-BASED

495

STRUCTURES

In Eq. [I], M is the magnetization vector describing the deviation from thermal equilibrium (M = M, - MO), and R is the matrix describing the complete dipole-dipole relaxation network. The rate matrix we use is essentially an extension of the two-spin equations of Solomon ( 12) : Rii=2(ni-

l)(W’l+

We)+

C nj(W~+2W~+

W~)+Rli

Pal

j#i R,

=

ni(

WY

-

w;).

WI

Here ni is the number of equivalent spins in a group,such as a methyl rotor, and the zero-, single-, and double-transition probabilities IV,” are given (for isotropic random reorientation of the molecule) by &j=q7c 0

rij

wf=

wL647” 2

Pal

6

1 1.547c rt 1 + (~7,)~ 1 r& 1 + 4(~7,)~ ’

[3cl

where q = 0. ly4h 2. The term Rli represents external sources of relaxation such as paramagnetic impurities and will be ignored in the following discussion. The system of Eqs. [ l] - [ 31 has the well-known solution M(T,)

= a(T,)M(O)

= eeR’mM(0),

141

where a is the matrix of the mixing coefficients, which are proportional to the 2D NOE intensities. This matrix of mixing coefficients is what we need to evaluate if we have a known structure and wish to compare the intensities expected from it with experiment. Alternatively, and preferably, we would like to directly transform these mixing coefficients into distances and ultimately obtain a three-dimensional structure. However, the exponential dependence of the mixing coefficients on the crossrelaxation rates complicates the calculation of intensities (or the distances). As shown below, the calculation of intensities is a relatively easily surmountable problem. Before continuing, we wish to point out that the expression given in Eqs. [2a], [ 2b] for the rate matrix is an approximation in that it neglects cross-correlation terms between separate pairwise and higher-order interactions ( 13, 14). However, the importance of the cross-correlation terms appears to be small for 7, =G7ztp1,the optimal mixing time for maximum cross-peak intensity (14). The expressions also do not account for second-order effects due to strong scalar coupling (15, 16). Once again, the magnitude of error due to the neglect of scalar coupling is small for macromolecules where UT, 9 1 and T, G 7 Z’. APPROXIMATE

DISTANCE

CALCULATIONS

The exact evaluation of intensities (the mixing coefficients aij) according to Eq. [ 41 is hindered by the lack of a general analytical expression for matrix exponentiation.

496

BORGIAS

AND

JAMES

Historically, this has led to the ISPA approach since the exponential into a Taylor series expansion: a(7,)

= emRrm= 1 - RT~ + f RzT& - . . . + -(-1)”

n!

can be recast

RnT; + . . -.

151

Truncating the series after the term linear in R results in a simple approximation for the mixing coefficients that is valid for short (T, + 0) mixing times. The qualitative result of the approximation is that the intensities are inversely proportional to the sixth power of the distance between the correlated protons. The dependence on correlation time can also be eliminated from the calculation by scaling all the distances with respect to a known reference distance which is assumed to have the same correlation time as the proton-proton pair of interest (10). This method suffers from the neglect of network relaxation (i.e., failure of the isolated spin pair assumption: r;j 4 r;k and Yij G for all k atoms). This is usually a poor assumption to make. AS point in fact, there are on average 3.4 neighboring protons within a 3.0 A radius of each proton in B-DNA. Moreover, this protocol references all distances (up to - 5 A) to a single short distance that is not really representative of the majority of distances determined from the 2D NOE spectrum (for DNA, the reference distances typically used are the geminal H2’-H2’ distance of 1.77 A and the H5-H6 distance of 2.46 A in cytosine). rjk

EXACT

CALCULATION

OF

2D

NOE

INTENSITIES

The primary motivation for using the ISPA calculation is its computational simplicity. However, with currently available computer power the exact numerical solution for the mixing coefficient matrix a is a relatively tractable problem. The problem of numerically evaluating this matrix is simply one of solving the eigenvalue problem for the rate matrix: R = xXx-l , where x is the eigenvector matrix and X is the diagonal matrix of eigenvalues. Then, since X is diagonal, the series expansion for the exponential collapses ( 3) : a = 1 - xXX-~T, + ~xXX-‘XXX-~T~

a = ~e-~'mx-'.

- ...

161 [71

We have developed a program, CORMA, i for calculating and displaying the 2D NOE intensities for any set of proton coordinates. This program incorporates Eqs. [ 2 ] and ’ CORMA is written in FORTRAN and is compatible with both VMS and UNIX compilers. The program calculates intensities based on user-supplied coordinates and isotropic or local effective isotropic correlation times. The intensities are printed out in the format of (1) intra- and interresidue submatrices and (2) a formatted list along with distances and the relaxation rates (u’s and p’s). A comparison between input (either experimental or calculated for an alternative model) and calculated intensities can also be made. The RMS error between these intensities is reported for each submatrix as well as for the complete intensity set. Also created are POSTSCRIPT (Adobe Systems, Inc.) files for plotting a schematic plot of the intensities represented as gray squares of variable intensity, or as a number plot with the intensities displayed logarithmically from 0 to 9. Full implementation of the program’s plotting features requires a laser printer (e.g., Apple LaserWriter) which supports POSTSCRIPT language for graphic control of the printer. The program has been successfully mn on molecules ranging up to 500 protons (e.g., BPTI) and is available from the authors.

REFINEMENT

OF NOE-BASED

497

STRUCTURES

[ 3 ] for calculation of the rate matrix and calculates the exponential by diagonalizing R and evaluating Eq. [ 7 1. Calculations using CORMA allow us to determine the contributions to cross-peak intensity due to spin diffusion effects. We can also quantify the error associated with the ISPA determination of distances. As an illustration of the dangers associated with assigning distances based on their proportionality to intensities, consider the buildup and decay of cross peaks corresponding to distances between 2.3 and 2.5 A in a dinucleotide as shown in Fig. 1. The intensities were calculated by the program CORMA assuming an isotropic correlation time of 5 ns, i.e., a correlation time appropriate for the dinucleotide as a part of a larger DNA duplex of ca. 8- 10 base pairs. The error associated with simply relating

0.18

0.16

0.14

I: T

0.12

Fi s

0.10

I ::

0.08

0.06

0.04

0.01

o.oc 100

500

750 MIXING

1000 TlME(tnS)

1500

FIG. 1. Buildup and decay of intensities for cross peaks corresponding to proton-proton distances of 2.3-2.5 A in a dCC fragment extracted from the energy-minimized oligomer. Intensities were calculated using CORMA assuming an isotropic correlation time of 5 ns. Each curve corresponds to a specific protonproton pair and distance: (0) Hl’-H2”(2) 2.318 A;(m) H2’-H3’(1) 2.321 A; (0) H2’-H3’(2) 2.374 A; (O)H5”-H4’(1)2.396~;(~)H5-H6(1)2.397~;(O)Hl’-H2”(1)2.464~;(A)H5’-H3’(1)2.465~;(~) H5’-H4’( 1) 2.486 A; (+) HS-H6( 2) 2.507 A. The gray-shaded line corresponds to the average of all crosspeak intensities.

498

BORGIAS

AND JAMES

distances to intensities is evident. Not only do the intensities span a range of over 50% at 100 ms but their relative ranks do not even correspond to the relative distances. Using CORMA, we generated simulated intensity sets (i.e., theoretical 2D NOE mixing coefficients) for a defined molecular structure. We then calculated distances from these intensities according to the isolated spin pair approximation. Comparison between the ISPA-estimated distances and the true distances from the defined model shows that ISPA systematically underestimates distance even for relatively short mixing times where the approximation is generally presumed to be valid. The systematic underestimation results when the reference pair of protons is close such that little spin diffusion affects their cross-peak intensity; if a more widely separated reference pair is chosen, the calculated shorter distances may actually be too large. As shown in Fig. 2, which represents ISPA distances calculated for the oligomer d( GGTATACC) , even for moderate distances of 3.5 A the error is considerable. The systematic nature of the distance errors suggests the possibility of applying an empirical correction factor to the estimated distances. This factor not only will be a function of the correlation time and mixing time but also will depend on the local environment of the protons. So, assuming a plausible structural model is available to guide the selection of a suitable correction factor, the systematic distance error could be corrected. But note, the range of actual distances corresponding to an estimated distance of 2.0 A at 250 ms is - 1.7 A, so there still remains considerable uncertainty in the distances. Errors of this magnitude preclude the detailed determination of structure where a root-mean-square deviation ( RMSD) of less than 0.5 A is necessary

3.0 A iI 2.0 2.0

3.0

2.0

3.0

4.0

5.0

6.0

3.0 A 1 2.0 4.0 5.0 6.0 IDEAL DISTANCES FIG. 2. Comparison of distances calculated according to the isolated spin pair approximation (ISPA) with actual distances. The ISPA distances were calculated from ideal intensities generated by CORMA for d (GGTATACC) in energy-minimized B-DNA conformation assuming a correlation time of 4 ns and a mixing time of (a) 50 ms and (b) 250 ms.

REFINEMENT

OF NOE-BASED

STRUCTURES

499

to distinguish between equally plausible but different structures (e.g., Arnott B vs energy-minimized B-DNA). The ISPA calculation of distances is not without merit, the inherent errors notwithstanding. Considerable work has been accomplished with this approach, and for the overall determination of macromolecular structure, especially in proteins, it appears to be very successful. Complete structural studies have been performed on proteins in Wiithrich’s lab and in the labs of Clore and Gronenbom. Wiithrich’s group has used the ISPA distances determined from 2D NOE intensities to set limits in distance geometry ( 17-20) calculations for structure analysis of proteins (21-23). Gronenborn and Clore, on the other hand, have applied the complete battery of 2D NOE, distance geometry, and molecular mechanics and dynamics to the determination of the solution structures of proteins (24, 25) and nucleic acids (26, 27). While the distances determined from the 2D NOE spectra in these studies are subject to the errors of the ISPA calculations, reasonably good three-dimensional structures were obtained. This success in the protein structure determinations is largely due to the fact that accurate interproton distances are not required. For tertiary structure, the NOE-derived distances are generally between protons on amino acyl residues separated by at least five intervening residues. The effect of the judicious use of these long-range distances is that the overall structure is reasonably well constrained by important loop-closing interresidue contacts. Similarly, secondary structure classification into cyhelix or /I sheet can be made on the basis of existence or absence of 2D NOE cross peaks without need for accurate distances. The details of the local structure, however, are determined primarily by energetic, steric, and geometric constraints. Therefore, local structure detail will be lost or in error due to the errors associated with the approximation. The loss of short-range detail precludes the analysis of minor structural/chemical variations with respect to the activity of the macromolecule. This is particularly germane to the structural analysis of nucleic acids since the distances typically available here are all “short range”-either intranucleotide or between adjacent nucleotides. But the ability to obtain accurate distances and, consequently, better-defined active sites in proteins in addition to refined tertiary structures, will also be important. STRUCTURE

EVALUATION

BY CORMA

In our laboratory, CORMA has been applied to the investigation of several oligonucleotide sequences (28-31) . Over the course of these studies the sophistication of the analysis has developed from the evaluation of a few models on the basis of selected intensities to a more detailed analysis of many closely related models using all wellresolved intensities. The major drawback of the trial-and-error approach used in these earlier studies is that it is limited by the choice of structural models considered for analysis. It is capable only of discriminating between the proposed structures and also indicating regions of good (or bad) fit between the model and true solution structure. The success of this approach is limited by the ability to generate appropriate trial structures. There are two possible ways to circumvent this limitation. One is the direct determination of distances without making the isolated spin pair approximationessentially using the reverse of CORMA. The other is to automatically refine the structure based on the 2D NOE intensities.

500

BORGIAS DIRECT

DETERMINATION

AND OF

JAMES DISTANCES

BY

CORMA

An obvious solution to the problem of limited (and biased) trial structures is to apply the same computational techniques used in CORMA to the direct calculation of proton-proton distances from the experimental intensities. This approach works well in principle and has been discussed elsewhere (8, 9). Rearrangement of Eq. [4] gives

Fl So, assuming isotropic tumbling and using Eqs. [ 21 and [ 31, distances can be calculated directly from the rate matrix as determined by the experimental intensities. For relatively small systems, in which most of the major peaks (in particular the diagonals) can be resolved and accurately estimated, this is clearly the ideal method of distance determination. However, the effects of low resolution (peak overlap) and low (i.e., generally realistic) signal-to-noise severely hamper the accurate estimation of distances. We have tested the performance of the direct calculation of distances (referred to here as the DIRECT method) in the presence of several different types of data errors: random noise, cross-peak overlap, and diagonal-peak overlap. We used the program CORMA to calculate all intensities, taking into account all dipolar interactions, at mixing times of 50 and 250 ms for a single strand of d( GGTATACC) in a particular conformation, specifically the BDB conformation (31)) as a test model. Intensities were calculated either with random noise (amplitude of kO.003; a typical experimental value) or to a precision of five significant figures with no noise. The effects of crosspeak overlap were examined by selecting for analysis only those intensities which correspond to well-resolved and assignable peaks in an experimental spectrum. This necessarily results in errors, since the complete relaxation pathways will not be represented. The effects of diagonal peak overlap were considered by assigning a single value (averaged over all the calculated diagonal intensities) to each of the diagonals. The major problem of scaling the intensities was not at issue here, but must be considered in any analysis of experimental intensities. Using the CORMA-generated intensities for the test model as “experimental” 2D NOE spectra, the intensities were analyzed using either the DIRECT or the ISPA method to obtain distances. The distances calculated by the DIRECT approach compare favorably with ISPA-calculated distances. The results of these tests are presented in Table 1 and Figs. 2 and 3. Both methods accurately assign the distances for very short interactions and short mixing times. However, even with predicted distances of only 3-4 A, the accuracy of the distance calculations is already questionable. ISPA severely underestimates all the distances due to the neglect of spin diffusion and relaxation from nearby protons, while the DIRECT calculation suffers from the neglect of significant relaxation pathways when all the cross peaks or diagonals cannot be properly assigned or measured. It may be possible to extract considerable usable information from the DIRECT calculation if an approximate structure is known so that estimated distances which are very far off target can be discriminated against while retaining the reasonably accurate distances. The comparatively poor prospects for

REFINEMENT

OF NOE-BASED

501

STRUCTURES

TABLE 1 Comparison of ISPA-, DIRECT-, and COMATOSE’-Derived

Distance Errorsb in d(GGTATACC)

ISPAd

DIRECT

COMATOSE

Intensities used in calculation’

50 ms

250 ms

50 ms

250 ms

50 ms

250 ms

20.003 (A + D) kO.003 (A + (D)) kO.003 (R + D) to.003 (R + (D)) +5 X 1O-6 (A + D) +5 X 1O-6 (A + (D)) +5 X 10-6(R + D) k5 X 1O-6 (R + (D))

0.643 0.657 0.548 0.575 -

1.245 1.328 1.245 1.321 -

0.448 0.214 0.460 0.411 0.001 0.146 0.285 0.319

0.748’ 0.756 0.665e 0.784 0.005 0.421 0.75 1’ 1.188

0.440 0.460 0.480 0.400 -

0.430 0.420 0.390 0.420 -

a Only distances corresponding to intensities used in analysis are compared. The starting structure of optimization was Amott B-DNA. b Errors are RMS errors (A) for ideally 3-4 A distances. ‘Intensities were calculated by CORMA for d(GGTATACC) in the energy-minimized BDB conformation with precisions of 10m3and 10m5with noise at the level indicated. The notations “A” and “R” refer, respectively, to the use of all calculated intensities or only those corresponding to well-resolved and assignable intensities in experimental data. The notations “D” and “(D)” refer, respectively, to the use of calculated diagonal intensities (assuming perfect resolution) or an average diagonal intensity (worst case assignment of diagonals in case of overlap). d The errors in ISPA are predominantly systematic; ISPA always underestimated distances due to neglect of spin diffusion. The range of actual distances corresponding to a calculated distance of 3 A spans - 20.50 A at 50 ms, to - kO.80 A at 250 ms. ’ These calculations resulted in negative values for the two smallest eigenvalues of the intensity matrix.

structure determination by this approach led us to pursue the less direct and more time-consuming approach of iterative least-squares refinement of structure based on the 2D NOE intensities.2 REFINEMENT

OF STRUCTURES

BASED ON CORMA

In view of the relative ease with which we can calculate the intensity matrix from a known structure, we have developed a program for the refinementof macromolecular structure based on 2D NOE intensities. The goal was to optimize a trial structure while minimizing the error E between calculated and observed intensities:

E = c (1; - kz;)2.

191

’ Several reports have been made concerning the suppression of diagonal-peak intensity with the goal of improving the resolution of peaks near the diagonal. Such experimental techniques must be viewed with caution if quantitative analysis of the intensities is to be made. In particular, the “SKEWSY” technique of Bremer, Mendz, and Moore (41) is claimed to be a method that projects diagonal-peak intensity onto cross peaks in a quantifiable manner. However, we have reservations about the actual quantitative utility of the SKEWSY experiment. Our doubts arise from a questionable commutation of matrices leading to the expression for the SKEWSY mixing coefficients (their Eqs. [ 71, [ 81, and [ 91). Moreover, their Eq. [ 91 also appears to be in error. If, however, the SKEWSY experiment is in fact readily quantifiable, it could prove to be very useful in the DIRECT analysis of the 2D NOE intensities to yield distances.

502

BORGIAS

AND JAMES

D I ; 3.0 A N C g 2.0 2.0

5.0

5.0 D I R

b

: 4.0 T

i

FIG. 3. Comparison of distances calculated directly from the diagonalized scaled intensity matrix (DIRECT method) with ideal distances. DIRECT distances were calculated according to Eqs. [ 2]- [ 31 and [ 81. Intensities were calculated by CORMA for d( GGTATACC) in energy-minimized B-DNA conformation assuming an isotropic correlation time of 4 ns. (a) 50 ms mixing time, all calculated intensities >0.003; (b) 50 ms mixing time, intensities selected to correspond to experimentally resolved peaks; (c) 250 ms mixing time, all calculated intensities >0.003; and (d) 250 ms mixing time, intensities selected to correspond to experimentally resolved peaks.

To this end, we have incorporated the intensity calculations of CORMA into a program with a nonlinear least-squares optimizer. Several alternative algorithms for specifying different variable parameters were tried. One promising algorithm makes use of torsion angles and group orientation parameters rather than the full set of Cartesian coordinates. For nucleic acids, the

REFINEMENT

OF

NOE-BASED

503

STRUCTURES

D 1 5.0 R F

n

2.0 20

3.0 IDEAL DtiTANCES

5.0

_i 6.0

D 1 5.0 R i D 4.0 A : NC3-o

2.0

parameters defined for each nucleotide (see Fig. 4) are exocyclic torsion angles, x and y; sugar pucker parameters, P and 0 max; and group orientation parameters, CY’, /3’, y’ and x, y, z. The exocyclic torsion angles are obvious parameter choices. The sugar pucker parameters P and Omaxallow a reduction of parameters since the ring

504

BORGIAS

AND JAMES

FIG. 4. Oligonucleotide atom labeling and definition of COMATOSE conformational parameters. The oligonucleotide is defined by 10 X N parameters (N is the number of nucleotides in the oligomer) . The conformational parameters are the exocyclic torsion angles x and y, the sugar pucker parameters P and 8max1which define the ring torsion angles B,,,8,, and &, and the group orientation parameters: x, y, .z, and a’, ,8’, y’, which define the translation and orientation of the nucleotide relative to the initial trial structure (inset). The chain torsion angles involving the phosphodiester bonds (cy, @,C, and <) are not explicitly included in COMATOSE.

torsions are correlated and constrained by the condition of ring closure. P and B,, are used to define each of the furanose ring torsions according to the equations (32) j=o,

ej=s,,COS[P+(j-2)4a/5],

. ...4

[W

and tanP=

e4 + 8, - e3 - e, 2fI,[sin(a/5) + sin(2a/5)]



[lob1

These equations are only approximate for the sugar ring in DNA since it is not a regular five-membered ring. But, it was noted (32) in the majority of cases that the

REFINEMENT

OF

NOE-BASED

STRUCTURES

505

error between input torsion angles (for calculating P by Eq. [lob]) and those generated according to Eq. [ lOa] is small (< kO.6”). An alternative, and more exact calculation of sugar pucker parameters, is available (33). In these calculations, the pucker is described by a phase and the amplitude of deviation from a mean plane through the sugar atoms. While the accuracy of parameterizing the pucker by this method is laudable, it is computationally cumbersome to reverse the process and generate atomic coordinates for a specified phase and amplitude. Hence, we have chosen not to calculate the parameters according to this latter approach. Finally, each nucleotide is described as a freely floating entity, its position described by the group orientation parameters. Each nucleotide position is completely independent of the others, being held in place only by the constraints imposed by the fit between calculated and observed internucleotide NOE intensities. The internucleotide torsion angles ((Y, ,8, E, and {) are not specifically included in this description of the structure. This yields 10 structural parameters for each nucleotide. An alternative parameterization is to link the entire polymer together, resulting in only 8 parameters per nucleotide. But to do so requires propagating the coordinate changes (and associated errors) along the whole chain (affecting up to half of the atoms) for each change in a, p, y, c, {and Por 8,,. Since the time spent updating coordinates as,the result of parameter changes is relatively minute compared to the time spent diagonalizing the rate matrix, this may be an appropriate alternative and is currently being pursued in this lab. The two primary rewards for choosing torsion angles as variable parameters in the refinement are: (1) A reduction in the number of parameters is achieved. With the parameter set described above there are only 10 structural parameters for each nucleotide versus 24-30 proton coordinates, or - 100 coordinates if all atoms are included. (2) Chemically meaningful structural constraints are automatically applied. By varying only the torsion angles, all bond distances and angles will remain at reasonable values (although no consideration is currently made for unfavorable energetic or van der Waals interactions). As illustration of these benefits, an initial algorithm which varied only the proton coordinates without any restraints or constraints suffered from a pathological tendency for all the protons to gravitate toward the center of mass. This gravitation is partly due to the inclusion of only observed (nonzero) intensities in calculation of E. The consequence of this is that the weakest included intensities will always tend to be too large rather than too small (in which case they tend to be unobserved). This imposes a systematic positive bias on the weakest intensities included in the analysis. The end result is a tendency to underestimate all long distances. Moreover, the end result-a set of proton coordinates-was typically unrelated to any physically achievable structure. The optimization of the structure based on torsion angles, which we call COMATOSE, generally proceeds much more successfully. Tests with a tetranucleotide sequence d( TATA) showed considerable improvement in the estimated proton-proton distances (34). Further tests with the octanucleotide d(GGTATACC), the same sequence used in evaluating the ISPA and DIRECT distances above, also reveal that some improvement in the determination of distances is attainable (Fig. 5 and Table 1). Structures obtained by COMATOSE are represented in Fig. 6. This figure shows the superposition of three initial structures (all essentially BDB conformations but

506

BORGIAS

;

6.0

‘,

5.0

-

AND JAMES

a

: : L

IDEAL

DISTANCES

FIG. 5. Comparison of distances obtained from COMATOSE refinement with ideal distances. Intensities were calculated by CORMA for d( GGTATACC) in energy-minimized BDB-DNA conformation assuming an isotropic correlation time of 4 ns and a mixing time of 250 ms. The starting conformation was the BDB-DNA conformation prior to energy minimization. (a) Prior to COMATOSE refinement. (b) After COMATOSE refinement.

REFINEMENT

OF NOE-BASED

STRUCTURES

FIG. 6. Structures of a single strand ofd(GGTATACC) in BDB conformation before and after COMATOSE refinement. The three superimposed structures on the left are the initial models. The middle structure is the target structure from which the “experimental” intensities were calculated. The three superimposed structures on the right are the structures obtained after successful COMATOSE refinement. The orientations of the structures are determined by the least-squares fit of the coordinates for the C 1’ atoms on residues G, , T3, Ts , and Cs of the model structure to the target structure.

generated by different means), the ideal structure, and the superposition of the three structures after several cycles of COMATOSE. While the fit for proton-proton distances is improved by COMATOSE, a number of problem spots are readily apparent in the structures. In fact, the overall conformations are in many respects worse in the COMATOSE-refined structures than in the initial trial structures. This is not unexpected since long-range structure is relatively poorly determined in oligonucleotides where significant NOES are observed only between adjacent nucleotide units. This points out some of the limitations with the use of this approach alone for structure determination. In particular, clashes between atoms are not precluded (no van der Waals or energy terms are currently included), and in fact, clashes have sometimes occurred with the longer sequences. This is, in fact, an illustration of the difficulty of defining structure on the basis of distances alone (i.e., using distance geometry). Unless many distances are defined with narrow limits, a wide variety of structures can be generated which satisfy the geometric constraints imposed by those distances. Additionally, because COMATOSE is a least-squares process, it is possible that it might allow generation of incorrect structures/distances in the presence of incorrectly assigned peaks. Therefore it is prudent to compare the set of calculated and observed intensities in order to isolate outstanding errors before making a judgment as to the correctness of the structure. The program CORMA makes such a comparison relatively straightforward and allows one to efficiently locate areas in the spectrum which are not well fit. Still, the proton-proton distances corresponding to the cross peaks used in the analysis generally appear to be more reliable than those obtained from the other methods (ISPA and DIRECT). Figure 7 shows the range of

508

BORGIAS 2.0 10.0

E A E

8.0

D I f

6.0

A N C E

3.0

4.0

5.0

AND JAMES 6.0

7.0

8.0

9.0

10.0 11.0

b

7.0

5.0 4.0 3.0 2.0

A 7.0 G E 6.0 D I 5.0 S i 4.0 : E

3.0 2.0 2.0

3.0

4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 MIN and MAX DISTANCES FIG. 7. Plots showing the range ofdistances spanned by the three models before (a) and after(b) COMATOSE refinement. For each intensity contributing to the residual minimized by the refinement program, the average of the corresponding proton-proton distances from each of the three models is plotted against the minimum and maximum distances found in the models.

distances spanned by the models before COMATOSE refinement along with the range of distances spanned after the refinement. Here it is evident that even though there are some steric and energetic problems with the structures obtained upon successful completion of COMATOSE, the proton-proton distances are very well defined. These improved proton-proton distances can potentially be of considerable

REFINEMENT

OF NOE-BASED

509

STRUCTURES

value as targets in molecular mechanics and dynamics calculations, or for generating alternative trial structures with distance geometry. This iterative process involving the use of COMATOSE to arrive at the best available estimates for 2D NOE distances is shown in Fig. 8. EFFECTS OF INTERNAL

MOTION-ESTIMATION

OF CORRELATION

TIMES

The problem of accounting for internal motion in detail is not a trivial one. Earlier work in our lab (35) described the calculations necessary for explicit treatment of internal motion assuming two different lattice jump models to describe the internal motion in oligonucleotides. While this study showed that it is possible to account for the dynamical properties of oligonucleotides in a more or less exact manner, we currently assume efictive isotropic motion in our analyses. For short oligonucleotides having approximately equivalent dimensions parallel and perpendicular to the helix axis, this is a reasonable assumption. Moreover, this assumption has been shown (3) to introduce relatively small errors with respect to distances (distance errors of - 10% ) . Effective isotropic correlation times can be estimated from nonselective T, measurements, or from the ratio between Tr and T2 (30)) 2 T, l/2 7, N 03T2 ’

L-1

[Ill

which is valid in the limit wr, $ 1. Better estimates of the correlation time are obtainable from selective Tl measurements. Errors in the correlation time estimate can result in systematic errors in the distances, so it is important to choose this value carefully. Ideally, the 2D NOE experiment holds all the information necessary to obtain correlation times. So the correlation time is an optional adjustable parameter in COMATOSE. Tests using synthetic intensity sets have shown that refinement of the overall effective isotropic correlation time is feasible along with the COMATOSE APPROXIMATE

DISTANCES

4

4

ISPA

DG

20

OPTIMIZED

DIRECT NOE

I INTENSITIES

4”

EM MD

COMATOSE

MODEL

STRUCTURE

/

DISTANCES

/ DG EM MD

I

’ OPTIMIZED STRUCTURE 8. Scheme for the complete structure determination of a macromolecule. DG, EM, and MD refer to distance geometry, energy minimization, and molecular dynamics, respectively. FIG.

510

BORGIAS

AND

JAMES

structure refinement. For example, the results shown in Fig. 5 represent a refinement in which the correlation time used for input was purposely set too long. The “experimental” intensity set was created as described above using energy-minimized BDB (EBDB) coordinates assuming a 4 ns correlation time. A subset of 129 calculated intensities, only those corresponding to intensities which were experimentally observed and distinctly resolved, were used in the COMATOSE refinement. Prior to refinement the RMS error in the distances corresponding to these intensities was 0.39 A, and there were 27 distances, some as short as 2.2 A, with errors exceeding 10%. COMATOSE was run with an initial isotropic correlation time of 10 ns rather than 4 ns. At the end of refinement, the refined correlation time was 4.19 ns and the RMS error in the 129 proton-proton distances was reduced to 0.29 A. Moreover, there were only 9 distances with errors greater than lo%, all of which were longer than 3.8 A. We are also pursuing the feasibility of refining local effective diffusion times for each nucleotide unit, in a manner reminiscent of the use of isotropic temperature factors in crystallography. The difficulty with this is that while the correlation times do depend on the motion of the individual protons, they are primarily a measure of the relative motion of the proton-proton vector, and therefore will be dependent on the specific geometry and primary modes of internal and librational motion of the molecule. MULTIPLE

EXCHANGING

CONFORMATIONS

A closely related problem to that of internal motion is the possibility of multiple conformations and fluctional behavior. It is well known that the energy barrier separating C3’-endo and C2’-endo conformations of furanose rings is small. In fact the barrier has been calculated to be -0.6 kcal mol-’ (36). Consequently, the possibility of rapid interconversion between multiple conformations is a real one in oligonucleotides (36, 3 7). Since each nucleotide unit can act independently of all the rest, it is unlikely that the whole oligonucleotide strand fluctuates between C3’-endo and C2’endo in concert, even though some torsional motions may be correlated (38). Consequently, for example, there are 2 ’ = 256 possible ways of describing the conformation of a single-strand octamer considering only the two limiting furanose conformations. The 2D NOE intensities actually reflect the time and ensemble average over all these conformations. At worst the problem is totally untractable; there are barely enough observable intensities to warrant refinement of a single structure to say nothing of 256 interconverting structures. However, since well-resolved peaks are observed (multiple conformations are not evident from the spectrum), it is reasonable to assume that the system can be defined by a single structure or perhaps a family of closely related interconverting structures. The structure obtained from analysis of the 2D NOE spectrum will then be an average over all members of this family. COMATOSE currently makes no provision for multiple conformations. The difficulty with doing so is the rapid increase in number of adjustable parameters once one begins to allow alternate conformations, and variable populations and exchange rates. One quickly encounters the situation where the system is defined by more parameters than there are observables-rendering any refinement of such a complex model highly question-

REFINEMENT

OF NOE-BASED

511

STRUCTURES

able. We prefer to keep the model simple (and admittedly “wrong”) in order to reliably extract at least some of the information available from the 2D NOE experiment. CONCLUSIONS

COMATOSE is a method for refining the structure of a macromolecule with respect to experimentally obtained 2D NOE intensities. It incorporates the explicit calculation of 2D NOE intensities based on the complete relaxation matrix analysis (CORMA), which accounts for the effects of spin diffusion and lattice relaxation by multiple neighboring spin systems. Moreover, it automatically incorporates chemically reasonable constraints on the structure in terms of bond distances, bond angles, and connectivity. In doing so, it also limits the number of variable parameters by defining the molecular conformation by a selected set of torsion angles, puckering parameters, and group orientation parameters. The COMATOSE refinement process is much more time consuming than simpler methods for estimating the distances from 2D NOE intensities such as the isolated spin pair approximation or the DIRECT method. However, it does not suffer from the systematic errors associated with such methods. No information is needed from the diagonal-peak intensities (which are typically overlapping and difficult to measure), and overlapping cross peaks pose no problem (other than loss of potential additional information) since these intensities are simply not included in the calculation of the error to be minimized (alternatively the error in the sum of the calculated intensities corresponding to these overlapped peaks can be added to the error). Only well-resolved peaks need be considered, while their intensities are calculated taking into account the complete relaxation matrix analysis (CORMA). A molecule the size of an octanucleotide ( - 70 protons) can be refined in w-20-30 min cpu time on a VAX 8650 computer. The major limitation of the approach is the neglect of energetic and steric constraints during the structure-refinement process. This occasionally allows conformations to be generated which satisfy the distance constraints imposed by the NOE intensities, but which are otherwise unacceptable. These structural “irregularities” can be removed by cycling the structure through a few cycles of COMATOSE and an energy minimization program such as AMBER (39,40). ACKNOWLEDGMENTS We thank Dr. Gerald Zon for providing [d( GGTATACC)]*, and Dr. Nadege Jamin for obtaining the 2D NOE spectra which were used in this work. Model structures for [ d( GGTATACC ) ] 2 were generated by Dr. Manju Bansal. This work was supported by the National Science Foundation (Grant PCM 8404 198), the National Institutes of Health (Grant CA 27343 and Grant RR 0 1695 ), donors of the Petroleum Research Fund administered by the American Chemical Society via Grant 16999-AC47, as well as by instrumentation grants from the National Institutes of Health (RR 016688) and the National Science Foundation (DMB 84-06826). We gratefully acknowledge the use of the University of California, San Francisco, Computer Graphics Laboratory (Director: Dr. R. Langridge, supported by NIH Grant RR01081). REFERENCES 1. J. JEENER, B. H. MEIER, P. BACHMANN, AND R. R. ERNST, 2. S. MACURAAND R. R. ERNST, Mol. Whys. 41,95 (1980). 3. J. W. KEEPERS AND T. L. JAMES, J. Magn. Reson. 57,404

J. Chem.

(1984).

Phys.

71,4546

(1979).

512

BORGIAS

AND

JAMES

4. G. B. YOUNG AND T. L. JAMES, J. Am. Chem. SocL106, 7986 (1984). 5. E. W. ABEL, T. P. J. COSTON, K. G. ORRELL, V. SIK, AND D. STEPHENSON, J. Magn. Reson. 70, 34 (1986). 6. S. MACURA, B. T. FARMER II, AND L. R. BROWN, J. Magn. Reson. 70,493 (1986). 7. W. MASSEFXI, JR. AND P. H. BOLTON, J. Mugn. Reson. 65,526 (1985). 8. E. T. OLEJNICZAK, R. T. GAMPE, JR., AND S. W. FESIK, J. Magn. Reson. 67,28 (1986). 9. P. A. MIRAU, “27th Experimental NMR Conference, Asilomar, California, WK12, 1987.” IO. A. M. GRONENBORN AND G. M. CLORE, in “Progress in Nuclear Magnetic Resonance Spectroscopy” (J. W. Emsley, J. Feeney, and L. H. Sutcliffe, Eds.), Vol. 17, p. 1, Pergamon, Oxford, 1985. 11. A. KUMAR, R. R. ERNST, AND K. W~THRICH, Biochem. Biophys. Res. Commun. 95,1(1980). 12. I. SOLOMON, Phys. Rev. 99,559 (1955). 13. L. WERBELOW AND D. M. GRANT, in “Advances in Magnetic Resonance” (J. S. Waugh, Ed.), Vol. 9, p. 189, Academic Press, New York, 1977. 14. T. E. BULL, J. Magn. Reson. 72,397 (1987). 15. L. E. KAY, J. W. SCARSDALE, D. R. HARE, AND J. H. PRESTEGARD, J. Magn. Reson. 68,5 15 (1986). 16. L. E. KAY, T. A. HOLAK, B. A. JOHNSON, L. M. ARMITAGE, AND J. H. PRESTEGARD, J. Am. Chem. Sot. 108,4242 (1986). 17. I. D. KUNTZ, G. M. CRIPPEN, AND P. A. KOLLMAN, Biopolymers 18,939 (1979). 18. G. M. CRIPPEN, J. Comput. Chem. 5,548 (1984). 19. T. F. HAVELAND K. W~~THRICH, J. Mol. Biol. 182,281(1985). 20. W. BRAUNANDN. G@J. Mol. Biol. 186,61 l(1985). 21. M. P. WILLIAMSON, T. F. HAVEL, AND K. W~HFUCH, J Mol. Biol. 182,295 (1985). 22. A. D. KLINE, W. BRAUN, AND K. W~THRICH, J. Mol. Biol. 189,377 (1986). 23. W. BRAUN, G. WAGNER, E. W~~RG&I-TER, M. VASAK, J. H. R. KXGI, AND K. W~THRICH, J. Mol. Biol. 187, 125 (1986). 24. G. M. CLORE, D. K. SUKUMARAN, M. NILGES, AND A. M. GRONENBORN, Biochemistry 26, 1732 (1987). 25. G. M. CLORE, D. K. SUKUMARAN, M. NILGES, J. ZARBOCK, AND A. M. GRONENBORN, EMBO J. 6, 529 (1987). 26. M. NILGES, G. M. CLORE, A. M. GRONENBORN, A. T. BRUNGER, M. KARPLUS, AND L. NILSSON, Biochemistry26,3718 (1987). 27. M. NILGES, G. M. CLORE, A. M. GRONENBORN, N. PIEL, AND L. MCLAUGHLIN, Biochemistry 26, 3734(1987). 28. M. S. BROIW, T. L. JAMES, G. ZON, AND J. W. KEEPERS, Eur. J Biochem. 150,117 (1985). 29. N. JAMIN, T. L. JAMES, AND G. ZON, Eur. J. Biochem. 152, 157 (1985 ). 30. E. SUZUKI, N. PATTABIRAMAN, G. ZON, AND T. L. JAMES, Biochemistry 25,6854 (1986). 31. N. ZHOU, A. M. BIANUCCI, N. PATTABIRAMAN, AND T. L. JAMES, Biochemistry 26,7905 (1987). 32. C. ALTONA AND M. SUNDARALINGAM, J. Amer. Chem. Sot. 94,8205 (1972). 33. D. CREMERAND J. A. POPLE, J. Am. Chem. Sot. 97,1354 (1975). 34. B. A. BORGIAS AND T. L. JAMES, “27th Experimental NMR Conference, Asilomar, California, MK03, 1987.” 35. J. W. KEEPERSAND T. L. JAMES, J. Am. Chem. Sot. 104,929 (1982). 36. M. LEVI~AND A. WARSHEL, J. Am. Chem. Sot. 100,2607 (1978). 37. G. C. LEW, P. S. MARCHETTI, A. EJCHART, L. F. LEW, A. KUMAR, P. R. HILLIARD, AND R. L. RILL, J. Biomol. Strut. Dyn. 1,795 (1983). 38. K. KITIMURA, A. WAKAHARA, H. MIZUNO, H. BABA, AND K. TOMITA, J. Am. Chem. Sot. 103,3899 (1981). 39. P. K. WEINER, U. C. SINGH, P. A. KOLLMAN, J. CALDWELL, AND P. A. CASE, “AMBER: A Molecular University of California, San Francisco ( 1984). Mechanics and Dynamics Program,” 40. R. W. BEHLING, S. N. RAO, P. K. KOLLMAN, AND D. R. KEARNS, Biochemistry 26,4674 (1987). 41. J. BREMER, G. L. MENDZ, AND W. J. MOORE, J. Am. Chem. Sot. 106,4691(1984).