Relaxation matrix analysis of 2D NMR data

Relaxation matrix analysis of 2D NMR data

Ro#ress in NMR S~fnoxopy. VoL 22 pp. 83-100.1990. Rioted ill Great Britain. All lighu reserved. RELAXATION Q 0079-6565/5um.00+50 1990 Rrpmon Press ...

1MB Sizes 0 Downloads 168 Views

Ro#ress in NMR S~fnoxopy. VoL 22 pp. 83-100.1990. Rioted ill Great Britain. All lighu reserved.

RELAXATION

Q

0079-6565/5um.00+50 1990 Rrpmon Press

plc

MATRIX ANALYSIS OF 2DNMR DATA

BRANDANA. BORGIAS,*MIRIAMGOCHIN, DEBORAHJ.

KERWOOD

and THOMASL. JAMES?

Departments of Pharmaceutical Chemistry and Radiology, University of California, San Francisco, CA 94143, U.S.A. (Received 14 August 1989)

CONTENTS 1. Introduction 2. Complete Relaxation Matrix Analysis-CORMA 3. Approximate Distance Calculations 3.1. Isolated spin pair approximation-ISPA 3.2 Other approximations 4. Distances via CORMA 4.1. Direct solution of the intensity matrix 4.2. Iterative methods 4.2.1. Iterative relaxation matrix approach-IRMA 4.2.2. Matrix analysis of rates for discerning geometry in solution-MARDIGRAS 4.2.2.1. Examples on real systems 4.2.3. Conformational refinement 5. Summary Acknowledgements References

83 84 87 88 89

90

91 91 92 98 100 loo 100

1. INTRODUCTION The ability to deduce a high-resolution structure of a complex molecule in solution has been a scientific ‘Holy Grail’ since the development of X-ray diffraction techniques in the early part of this century. It is generally recognized that the structure of any molecule can be determined if one can obtain a sufficient number of structural constraints, e.g. internuclear distances and bond torsion angles, to use in conjunction with holonomic constraints of bond lengths, bond angles, and steric limitations. A high-resolution structure will require a large number of such constraints. Until recently, scientists have had to settle for low-resolution structures generated from the partial or indirect structural information that is available from techniques exploiting hydrodynamic properties, electron diffraction and other scattering phenomena, circular dichroism (CD), fluorescence decay, extended X-ray absorption fine structure (EXAFS). and magnetic resonance (NMR and EPR). These methods yielded structural defining information which can range from estimates of the overall shape and dimension of a molecule, e.g. from hydrodynamic studies, to a measure of the extent of secondary structure such as the average composition of a-helix and B-sheet which can be determined from CD measurements. At best it has been possible to obtain a measure of a small number of the distances within the molecule using electron diffraction, fluorescence, EXAFS, and NMR or EPR. Conventional studies were able to generate only a very limited number of structural constraints, and consequently only a very low-resolution structural model. A high-resolution picture of the molecular structure (such as we have come to expect from X-ray crystallographic studies) could not be achieved with any of these methods. Suddenly, within the past decade, the whole picture has changed. A revolution in NMR spectroscopy has occurred due largely to the development of two-dimensional nuclear magnetic *Current address: Cray Research, Inc., 3130 Crow Canyon Place, Suite 400, San Ramon, CA 94583, U.S.A. tAuthor to whom correspondence should be addressed. 83

B.

84

A. BORGIASet al.

resonance spectroscopy (2D NMR). Today with 2D NMR techniques it is feasible to study molecules of such size and complexity for which conventional NMR techniques were all but useless. The use of 2D NMR techniques potentially enables us to obtain, in a single experiment, hundreds or thousands of structural constraints, which can ultimately lead to a high-resolution structure of the molecule. The notion of the 2D NMR experiment was first suggested by Jeener nearly two decades ago.‘*’ Today 2D NMR is routinely used in an ever increasing array of specialized experiments designed to aid in spectral assignment and structural characterization of macromolecules. The primary attraction of 2D NMR methods is that they effectively resolve crowded regions in the spectrum by mapping the spectral information onto two frequency axes rather than the conventional (chemical shift) plot of the spectrum. The typical 2D NMR spectrum is composed of peaks at coordinates (w,, w,) corresponding to the frequencies of two nuclei in communication with each other. The intensities of the peaks are dictated by the particular internuclear interaction exploited by the experiment. In particular, the distance-dependent dipole-dipole coupling, which is detected in the 2D nuclear Overhauser effect spectrum (2D NOE or NOESY), has been exploited to the greatest advantage in the study of molecular structure in solution. In this article we will survey some of the approaches uken to derive structural information from the 2D NOE experiment. The underlying physical basis for the methods will be discussed and their limitations will be highlighted. In so doing, we will try to develop guidelines for obtaining the most accurate and realistic structural parameters from the spectral data. Finally, we would like to consider what future developments in the field will need to be pursued in order to achieve the most comprehensive picture of the true dynamic structure of the molecules in solution.

2. COMPLETE

RELAXATION

MATRIX ANALYSIS-CORMA

The effect of cross-relaxation during the mixing time r, is to transfer magnetization between neighboring protons. This is manifested in the 2D NOE spectrum in the form of cross-peaks between spatially proximate protons. The initial build-up of cross-peak intensity between two isolated protons will be exponential with a characteristic time constant that depends primarily on the separation between the protons and the relative motion of their internuclear vector with respect to the magnetic field. If we assume isotropic motion, then the cross-relaxation rate is inversely proportional to the sixth power of the distance between them. This is the basis for all structural analysis of 2D NOE spectra. However, in a complex network of protons, the perturbing effect of the neighboring environment on the relaxation of each proton must be taken into account. Note that the cross-relaxation rates are not affected by the network of protons, but the overall relaxation rate (T,) is. The net effect is that the relative intensities, which depend on the rates in a multiexponential manner, are no longer reliable predictors of the internuclear distance. Moreover, the apparent cross-relaxation rates measured from initial build-up rates of the intensities are also subject to errors due to the approximation. In order to accurately analyze the 2DNOE intensities, the full matrix of distance- and motiondependent rates must be considered. Cross-relaxation during the mixing time 7, is described by the system of equations? . -RM. (1)

““=

In eqn. (1) M is the magnetization vector describing the deviation from thermal equilibrium (M = hfZ - Me), and R is the matrix describing the complete dipole-dipole relaxation network. This matrix R derives from what is essentially an extension of the two-spin equations of Solomon.‘3’ Rii=Z(ni-

l)(Wy+

I+‘:)+ C nj(W$r+Z~‘,/+

W’,i)+R,r;

(2a)

j+i Rij

Here

"i is the

=

q(Wy-

WY).

number of equivalent spins in a group such as a methyl rotor,

W-4

and

the zero, single, and

Relaxation matrix analysis

double transition probabilities

Wz are given (for isotropic random reorientation

85

of the molecule) by: (3)

where ~=7*h2/10, rij is the effective internuclear distance between protons i and j, T, is the correlation time, and w is the Larmor frequency of the proton. The term R ,i represents external sources of relaxation such as paramagnetic impurities and will not be considered further in this treatment. The system of eqns (l-3) has the solution M(r,)=a(r,)M(0)=e-RTm~(O)

(4)

where a is the matrix of so-called mixing coefficients which are proportional to the 2D NOE intensities. This matrix of mixing coefficients is what one needs to evaluate in order to test a structure for consistency with the measured 2D NOE intensities. The term ‘complete relaxation matrix analysis’ as used in this context has drawn some fire in that the expressions used here for the rate matrix are actually approximations. We consciously neglect crosscorrelation terms between separate pairwise and higher order interactions!4. ” This greatly simplifies the problem at the expense ofa relatively small degree oferror. The importance of the cross-correlation terms appears to be small for r, I so,P’,the optimal mixing time for maximum cross-peak intensityj5) Also, the expressions given here do not account for second-order effects which occur when there is strong scalar coupling.‘6*7’ An explicit account of J coupling can also be included in the analysis of the 2D NOE intensities.‘6’ However, studies performed in order to measure the effect of scalar coupling have found at most a 10% error, except in the case of very strong coupling (J/6 2 0.5) and short mixing times (c 50 ms) where the error can be _ 30%. From this we conclude that neglect of J coupling may be justified in most practical cases where cross-peaks arising from strongly coupled protons are generally not resolvable anyway.“’ The multi-exponential dependence of the mixing coefficients on the cross-relaxation rates complicates the calculation of intensities. We will show below that calculation of the distances corresponding to measured intensities is even more troublesome. The exact evaluation of intensities (the mixing coefficients aij) according to eqn. (4) is hindered by the lack of a general analytical expression for matrix exponentiation. The obvious approach to calculating the intensities is to note that the exponential can be recast into a series expansion: a(7,)=eeRr-Z

l-Rr,++R’ri-

. . .+yR”f,+ n.

. . ..

If the mixing time is short the higher order terms obviously vanish. The problem is that ‘short’ is a relative term that depends on the correlation time of the molecule, and the spatial proximity of the protons which govern the magnitude of the relaxation rates in R. For example, intensities were calculated according to eqn. (5) for protons in a fragment of B-DNA assuming a correlation time of 4 ns (at 500 MHz, or, = 12.6). and mixing times of 10.50, 100, 150, and 200ms. The results are given in Table 1 and Fig. 1. At 10 ms, intensities were in error by only 5-15% with a one term approximation to eqn. (5). However, intensities for distances between 1.8A and 2.3 A, were found to be overestimated by 20-90% at 50ms. The intensities for these cross-peaks are expected to be dominated by direct relaxation. Intensities for longer distances, where spin-diffusion is expected to make a significant contribution, were found to be underestimated by 50-70%. These calculations provide an example where diligence has no reward; simply increasing the number of terms in the expansion proves to be a very poor route to obtaining intensities. Figure 1 shows the fluctuations typical of a variety of distances. We find that the series expansion converges slowly. Convergence requires three terms at 10 ms and as many as 13 terms for a 200 ms experiment with a correlation time of 4 ns. Apparently brute force is not an appropriate response to this problem. A more expeditious approach to calculating intensities is to take advantage of linear algebra and the simplifications which arise from working with the characteristic eigenvalues and eigenvectors of a matrix.“’ The rate matrix R can be. represented by a product of matrices: R=~ix’ where x is the

86

B. A.

BORGIAS

er al.

0.6

6 1.769

A

0.6

3.602

I

*

A

H2’11

H613 I

10

-0

20

0

10

No. of TERMS

No.

20

ot TERMS

0.12 4.215

A

I 0

10

20

No. ot TERMS

-0.2 -0.4 4 0

-

H2’11

H3’11

0.3 I

10

20

No. of TERMS

0.2 0.1 0.0 -0.1 -0.2 -0.3 0

10

20

No. of TERMS

0.4

0.002 2.272

0.3

A

7.143

A

0.2 0.1 0.0 -

HYSHS11

1.

0

10

20

10

0

No. of TERMS

20

No. of TERMS

FIG. 1. Plots showing the convergence of intensities calculated using a term-by-term series summation of eqn. (5). Calculations assume a mixing time of 200 ms, a correlation time of 4 ns, and an operating frequency of 500 MHz (mrc = 12.6).Note that the intensity scales are not the same. The distances and specific proton-proton pair for the intensity are indicated.

unitary eigenvector matrix (I- 1=$j and 1 is the diagonal matrix of eigenvalues. The utility of making this transformation is that, since I, is diagonal, the series expansion for its exponential (and consequently that of the mixing coefficient matrix) collapses: a=

1-xIx’T,

a=xe-A%nxT.

++~1~~xifr$

- ,..

(6) (7)

87

Relaxation matrix analysis TABLE

1. Convergence of series expansion for 2D NOE intensity calculation Error* in single term approximation

5* I

N’

ri, = 1.77 to 227 A

10 so 100 150 200

3 4 7 10 13

+5 to +30 to +60 to +90 to + 130 to

15% 90% 205% 410% 570%

ri,=3.60 -0 -45 -50 -60 -60

to 7.1sA

to -30% to to to to

-70% -80% -90% -90%

* Mixing time for calculation in milliseconds. Correlation time is assumed to be 4 ns, and the operating frequency is 500 MHz (OT, = 12.6). ‘Number of terms in series expansion required to achieve less than a 5% deviation in intensities averaged over all calculated intensities. *Error is defined as (I, - f,,.)/l,V. where I, is the single term intensity, and 1, is the intensity calculated after N terms. Errors reported in these columns are typical of those encountered for distances in ‘short’ and ‘long’ ranges, as indicated. The actual proton-proton distances were 1.77, 2.22, 2.27, 3.60, 4.22, 5.53. and 7.15 A.

This calculation allows one to readily calculate all the cross-peak intensities for a proposed structural model. The break-even point in calculation time for performing matrix diagonalization versus series summation occurs at about three terms. If more than three terms are needed (almost certainly), then matrix diagonalization is a faster solution to the problem. Moreover, once the rate matrix R is diagonalized, the calculation of intensities at any mixing time requires only a single matrix-matrix multiplication (which is equivalent in computational effort to adding a single term to the series expansion). Hence only for very short mixing times would a series expansion ever be useful. Below we will compare several different methods of analyzing the 2D NOE spectra for internuclear distance and structural content. This analysis will entail use of hypothetical data sets generated using the program CORMAt9’ rather than actual intensity sets derived experimentally. We use hypothetical data since we need to know the true structure and molecular motion exactly in order to measure the consequences of any random or systematic errors in the experimental intensities on distance calculation and ultimately structure generation. We can calculate, using CORMAJ9’ a theoretical 2D NOE spectrum for a known structure using an arbitrary motional model. In the present case that model is an isotropically reorienting rigid molecule. We can also add random noise at any level desired, and we can consider any number of peaks to be overlapping or otherwise unmeasurable. This will allow us to compare the various methods in their abilities to handle realistic spectral limitations. In other words, for a given structure we can create spectra with various realistic problems. Then we can see how well we are able to deduce the structure using the different methodologies without using our a priori knowledge of the structure. 3. APPROXIMATE

DISTANCE

CALCULATIONS

Having just described the calculations necessary to generate a complete set of hypothetical intensities from a known structure, we now turn to the more interesting problem of deducing a structure from a limited set of measured intensities. The most direct approach is to make the simplifying assumption that each cross peak will be dominated by the cross-relaxation rate between the pair of correlated protons. It should be apparent from the foregoing discussion that this approach is prone to substantial error .t’O) Nevertheless, it has been the underlying approach for most macromolecular structure determinations via 2D NOE analysis to date. With this in mind, we would like to assess the magnitude of the errors in the distance and subsequent structures generated by this approach.

88 3.1. Isolated

B. A. BORGIASet al. Spin Pair Approximation-ISPA

The exact evaluation of intensities (the mixing coefficients aij) according to eqn. (4) is hindered by the lack of a general analytical expression for matrix exponentiation. Historically, this has led to the ISPA approach since the series expansion for the exponential (eqn. 5) can be truncated after the first term which is linear in R. This results in a simple approximation for the mixing coefficients that is valid for short (5,-O) mixing times. The qualitative result of the approximation is that the intensities are inversely proportional to the sixth power of the distance between the correlated protons. The dependence on correlation time can also be eliminated from the calculation by scaling all the distances with respect to a known reference distance which is assumed to have the same correlation time as the proton-proton pair of interest. (“) Equation (8) shows the typical expression used for such a scaling:

arer “6

(>

rij = rrcr -

aij

This method suffers from the neglect of network relaxation (i.e. failure of the isolated spin-pair assumption: rij B rit and ri, < r,k for all k atoms). The assumption that spins are ‘isolated’ is u:ually a poor assumption to make. In fact, there are an average of 3.4 neighboring protons within a 3.0 A radius of each proton in B-DNA. For a globular protein such as BPTI, there are about 4.7 such neighbors in a 3.0 A radius. Moreover, ISPA references all distances (up to -5 A) to short distances that are not really representative of the majority of distances determined from the 2D NOE spectrum (for DNA, the reference distances typically used are the geminal H2’-HZ” distance of 1.77 A, and the H5-H6 distance of 2.46 A in cytosine). As an illustration of the dangers associated with assigning distances based on their proportionality to intensities, consider the build-up and decay of cross-peaks corresponding to distances between 2.3 and 2.5 A in a dinucleotide as shown in Fig. 2. The intensities were calculated by the program

100

SW

750

1000 MIXING TIME (8-m)

l!iOO

FIG. 2. Build-up and decay of intensities for cross-peaks corresponding to proton-proton distances of 0.23-0.25nm in dCC extracted from energy-minimized oligomer. Intensities were calculated using CORMA assuming an isotropic correlation time of 5 ns. The grey shaded line corresponds to the average of all cross-peak intensities.

89

Relaxation matrix analysis

CORMAi9) assuming an isotropic correlation time of 5 ns; i.e. a correlation time appropriate for the dinucleotide as a part of a larger DNA duplex of ca. 8-10 base pairs at low temperature. The error associated with simply relating distances to intensities is evident. Not only do the intensities span a range of over 50% at 100 ms, their relative ranks do not even correspond to the relative distances. In all honesty this result can be turned around in order to state that a 50% error in measured intensity will result in a corresponding distance error of only lo%! This is true, of course, for very strong intensities corresponding to the relatively short distances. However, when the intensity is weaker, the reverse situation obtains: small deviations in intensity can correspond to large distance errors. The curve in Fig. 3 demonstrate this problem. 3.2. Ocher Approximations In view of the pr.oblems associated with ISPA, attempts have been made to determine the distances more accurately without applying the principles of CORMA. (a’ We make brief mention of two approaches to the problem. (“-r4’ A phenomenological approach to the problem of spin diffusion and network relaxation is used to iteratively correct the apparent cross-relaxation rates as determined from the intensities. The etfective cross-relaxation rate is first estimated from the row-normalized 2D NOE intensities:

I

I

0.206

lll111I 0.100

I

o.oso

I

0.020

11111I1 0.010

0.006

I

I

0.002

0.001

INTENSITY

FIG. 3. Expected errors for distances obtained from 2D NOE intensities. The (I priori error for any distance is a function of the cross-peak intensity. The curves are for random errors in the intensities of ~0.001 (-;-), f 0.093 (----),

and &0.005 (-),

and are calculated according to an approximate derivative: Arij z -(kr,)~.A,a,j/6a,,~.

B. A. BORGIASer al.

90

This estimate for the rate is then iteratively adjusted by taking into account all two-step magnetization transfers: Rif’= R$r-$

1

R;yR;If;

R$j=R$L$

kti.j

c

R{j’R$r;.

. ..

(10)

k#i.j

Such an approach generally converges in 2-4 iterations. Note that each iteration effectively involves a matrix multiplication, so the time required is approximately the same as for a single CORMAf9t calculation. Distance errors by this approach are estimated at less than 10%.‘13) On the surface this technique looks a lot like expanding the series; however, since the corrections do not incorporate the diagonal terms, the oscillations found there do not appear in this method. The second method relies on ratios of cross- and diagonal-peak intensities to determine the relaxation rates. Based on two-spin assumptions, it makes use of the relationship between the crossrelaxation rates and the intensities: aAB/aAA = -[l

-e-Rc’-]/[l

+eVRcrm]

(11)

from which the rate is obtained after rearrangement: R, = (k/r,)ln[(

1+x)/( 1-x)1, k=-1

where k = 1 when wr, > 1

(12)

whenwr,
where x is the experimental ratio between the cross- and diagonal-peak intensities. This method is apparently reasonably accurate (errors of -0.3 A) in the test cases performed on it, and does not require the collection of a series of intensities at different mixing times. (*wIt will be susceptible to errors arising from insufficiently resolved diagonal peaks.

4. DISTANCES VIA CORMA Is it practical to apply the same methodology used in CORMA to determine intensities to the estimation of interproton distances also? The answer is a qualified yes. In principle there is no problem with performing the necessary logarithm of the intensities to find the cross-relaxation rates. However, when one is faced with the experimental realities of poorly resolved intensities and both systematic and random intensity errors, the problem becomes much less straightforward. 4.1. Direct Solution of the Intensity Matrix Equation (13) shows the fundamental coefficients as derived from eqn. (4).

logarithmic

-In R=

relationship

a0 ( a(O) > Trn

between the rates and mixing

(13)

Several workers have described their efforts in solving this equation with experimental intensities, and have reported success.(t5-I’) The problem that will be encountered with most systems of increasing molecular complexity is that generally the diagonal intensities will tend to be very strongly overlapped as well as many of the cross-peaks. This precludes the direct evaluation of eqn. (13) since the full matrix a is incomplete. 4.2. Iterative Methods An alternative to directly solving the relaxation matrix relies on the principle that there are many constrained distances that remain invariant regardless of the conformation the molecule assumes, and the assumption that, while network relaxation effects are important in determining the intensities,

Relaxation matrix analysis

91

these effects are secondary and will have relatively little impact on the solution if the conformation is approximately correct. Taking this approach, it becomes apparent that it is possible to ‘fill in’ the gaps created by poorly resolved cross- and diagonal peaks with intensities generated by CORMA’*’ for a reasonable conformational model. Thus it is possible to generate a complete matrix that can be solved directly by the method outlined above. It will have parts that are poorly defined by experimental intensities, from which no distance information will be gleaned, but which will serve to form a foundation for accurately transforming into distances those intensities which are well defined. 4.2.1. Iterative Relaxation Matrix Approach--IRMA. Kaptein and co-workers have recently shown that it is feasible to substitute into the scaled matrix of observed intensities the corresponding mixing coefficients calculated for a reasonable model structure. This yields a full set of mixing coefficients a’ which can be successfully transformed to distances via eqns (13.2.3). Such an approach has been applied in an iterative manner to successfully generate solution structures.“8U19) This approach (called IRMA for Iterative Relaxation Matrix Approach) was initially applied within the context of an evolving structure in which distance geometry (DG) or restrained molecular dynamics (r-MD) was used after successfully obtaining distances from the substituted matrix to generate a new structure that would then be cycled through the process again. 4.2.2. Matrix

Analysis of Rates for Discerning GeometRy in Solution-MARDIGRAS. In our laboratory, we have taken a slightly different approach to this problem in order to probe the capability of the intensity substitution process alone to generate reliable distances.“” Recognizing the computationally intensive nature of DG and r-MD algorithms, we wanted to know how well-defined the model structure needs to be, and how closely the resulting distances match ideal distances. Both the IRMA and MARDIGRAS algorithms are similar in their treatment of the intensities. The central features of the algorithms are (1) the use of CORMA@’ to calculate a new matrix of mixing coefficients (a,) from the initial model structure and the evolving distance set, and (2) the back-calculation of the rate matrix via eqns (13,2,3) from the embedded matrix of mixing coefficients (4’). However, MARDIGRAS differs from IRMA in that it seeks convergence in a self-consistent set of distances which simultaneously satisfy the experimental intensities and the set of model intensities. The process of model intensity calculation, substitution and distance calculation can be repeated approximately 4-7 times while the calculated distances converge toward their ideal values. Several important manipulations are incorporated into the algorithm. (a) Only cross-peak rates which have a corresponding observed intensity are allowed to change. Since the intensity matrix is sparse, allowing all elements to be replaced at each cycle results in spurious distance shifts that can compensate for the real changes demanded by the data. (b) Cross-peak rates that correspond to known distances (i.e. geminal protons and aromatic ring protons) are not allowed to change. These are always reset to their known values at every iteration. (c) Diagonal rate constants (Rii) are replaced by the appropriate sums (cf. eqn. (2)) based on the calculated and constrained cross-peak rates. An important facet of MARDIGRAS is the very weak dependence of the algorithm on the starting structure. Starting from either an extended chain model, or a model based on the target structure but randomized by up to ISA in each atom coordinate resulted in essentially the same distances.(20) The former starting structure has many short-range distances that are not far from wrong, but the long range distances are in gross error. The latter has the long range structure approximately correct, but many local distances that are very bad. In spite of this, the resulting distances are very good regardless of the input model. The distances calculated by MARDIGRAS are dramatically better than the initial model (Figs 4 and 5), and not significantly different from the distances generated from the randomized structure. Another question probed was the significance of using different mixing times for the calculations. Since signal-to-noise improves when the mixing time is increased, a logical question to address is whether the distance estimates from MARDIGRAS are improved by using longer mixing times. The resulting distances from 100, 200 and 400ms mixing time experiments are compared in Fig. 5. The 200ms calculation is slightly better than the 1OOms calculation, but at 400ms the results are

92

B. A.

BORGIAS

et al.

2.0

4.0 3.0 X-RAY DISTANCES

i 7 3.0 A

2.0

3.0 X-RAY

4.0

DISTANCES

2.0 X-RAY

4.0 3.0 DISTANCES

FIG. 4. Results of MARDIGRAS on BPTI. Conditions: T, = 1.8 ns. r, = 100 ms, intensities were calculated by CORMA with noise at +0.003 level. Initial structure was taken from the true BPTI coordinates in which the coordinates were each shifted randomly by up to _t 1.5 A.The RMS error in initial H-H distances was 1.19 A.Panel (a) shows a plot of the initial H-H distances vs. the H-H distances in the X-ray structure from which the intensities were generated. Panel (b) shows the distances as in (a) after one cycle of MARDIGRAS. PanTI (c) shows the final distances after convergence (six cycles). The RMS error in the final H-H distances was 0.33 A for approximately 1100 distances.

considerably worse. At 4OOms, spin diffusion is beginning to impact the results by draining away intensity (and information) from the shortest-distance cross-peaks. The MARDIGRAS program is not capable of combining a series of mixing times (as is possible with IRMA) at the time of writing this paper, but this would undoubtedly improve the results overall. The resulting distances obtained by MARDIGRAS or IRMA are also more accurate than can be achieved by other methods. Figure 6 shows the anticipated error ranges for distances based on the intensity of the corresponding cross-peak. The MARDIGRAS distances are found to lie within the error bounds while distances calculated via ISPA are found to deviate systematically from the ideal values, as well as having a larger dispersion. 4.2.2.1. Examples on real systems. MARDIGRAS has been used in the preliminary analysis of two oligonucleotides: [d(GT).d(CA),] and [d(ATATATAUAT)],. Work in our group on [d(AT)s]$“)and [d(GGTATACC);‘*~*” suggested that a wrinkled D (wD) structure, a member of the B-DNA family, best accounted for the experimental ZDNOE spectra for d(TATA). Further, molecular mechanics calculationst2’-23’ using the program AMBER (24)found that the wD structure was also energetically most favorable in these oligomers. Interesting structural features of wD-DNA include (a) eight base pairs per helical turn, (b) a much narrower minor groove, (c) significant cross-strand hydrophobic interactions, and (d) alternating torsion angle values at TA and AT steps. Part of the driving force for

93

Relaxation matrix analysis 5.0 ? E

4.0

6

C :

3.0

k

S

2.0

2.0

3.0

4.0

2.0

3.0

4.0

3.0

4.0

A

F E

C

!i

4.0

9 6.0

:

* F T

E

X-RAY

DISTANCES

5.0

4.0

2.0 X-RAY

DISTANCES

FIG. 5. Results of MARDIGRAS on BPTI using an extended chain conformation for the initial structure. Intensities are the same as used in the calculations for Fig 4. (a) shows the initial H-H distances vs. the distqtces in the X-ray structure from which the intensities were generated. RMS error in 3-4 A H-H distances: l.64A RMS error in 4-5 A H-H distances: 2.65 A RMS error in all II00 H-H distances: 1.30 A.(b) shows the final results for intensities at r,,, = 100 ms. RMS error in 3-4 A H-H distances: 0.35 A RMS error in 4-S A H-H distances:_O.lS A Final RMS erron 0.32 A. (c) shows the final results for intensities at r m = 200 ms. RMS error in 34 A H-H distances: 0.31 A RMS error in k-5 A H-H distances: 0.48 A Final RMS error: 0.20 A. (d) shows the final results for intensities at r, = 400 ms. RMS error in M A H-H distances: 0.42 A RMS error in 4-5 A H-H distances: 0.92 A Final RMS error: 0.30 A.

the affinity for wD structures has been hypothesized to be caused by stacking of the methyl groups. A question arises as to whether these structural features are characteristic of alternating d(TATA) or whether alternating purine-pyrimidine sequences containing dA-dT base pairs would also manifest wD structural features. Consequently, investigations have been initiated to elucidate the structure of

CWTL*WA),l. The sequence [d(GT),-d(CA),] is of interest for two reasons. Firstly, it consists of an alternating purine-pyrimidine pattern, but has only half the number of methyl groups of an (AT) segment. Secondly, short tracts of GTC/CAC have been found with significant frequency in protein-binding domains of DNA,‘15’ raising the -question of a structure-function relationship. Four 5OOMHz

B.A. BORGIASet al. 4.0

2.0

0.0

-2.0

-4.0 0.200

O.O!iO 0.020 0.010 0.005 INTEIWTV

o.lm2

. r\ 4

4.0

.’

0.2000.1000.050 0,020 0.010 0.00s INlENSiTY FIG.6a.b.

0.002

Relaxation matrix analysis

o.zoa

0.100

o.oso

o.ozo

INlENSrrr

0.010

95

o.ow

0.002

FIG. 6. Comparison of errors in distance calculations by MARDIGRAS and ISPA. The data correspond to a test case on the first 100protons of BPTI. Intensities werecalculated by CORMA with noise at the _+0.003level.In each panel the error between the true distance and the calculated distance is plotted vs. the intensity of the cross Peak. The theoretical error ranges from Fig. 3 are superimposed over each figrue. (a) Results from MARDIGRAS. The intensities analyzed were from a single 200 ms set. Note that only a few points extend beyond the +0.003 boundaries.(b) Results from ISPA using a single exponential fit to the initial rates. Note the systematic error and the large number of points that exceed the ~0.003 range. (c) ISPA distances calculated assuming that the intensities

obtained at 200 ms are reasonable estimators of the initial rates.

2DNOE proton spectra at different mixing times were examined. with CORMA and MARDIGRAS employed in the analysis. Additionally, double quantum-filtered correlated spectroscopy (DQFCOSY) has been employed to obtain an independent measure of the sugar pucker parameters. Previous studies of the DQF-COSY and 2D NOE spectra of DNA segments have been separately described, with little attempt made to correlate the results obtained by the two methods. Yet these methods should provide corroborating structural evidence, validating the model obtained. The [d(GT),*d(CA),l study shows that the results obtained from these two techniques do in fact agree. Rigorous analysis of the 2D NOE spectrum by CORMA requires that the cross-peaks be measured as accurately as possible. To this end a procedure of adding up the intensity of a region enclosed by a specified base contour was developed. For a set of overlapping peaks, the relative contribution of each component peak was determined by combining simulated spectra for the sum of the components to fit the x and y projections of the region. CORMA was then used to compare the calculated intensities for B-DNA, wB-DNA and wD-DNA structures with the experimentally observed intensities. The startling result obtained in all cases was the observation of an oscillation in the intensity error ( Icals- lob) on moving from purine to pyrimidine throughout the chain (Fig. 7). This oscillation was observed for interactions between base and sugar protons of the same and neighboring bases, as well as for interactions between pairs of sugar protons. The effect was most marked for the B-DNA model, clearly indicating a deviation from regular B-like structure, and qualitatively confirming an alternating or wrinkled-like structure. The oscillations were smaller for wB and even smaller for wD models, with the

B. A. BORGIASer al.

s %

*

Hb/O-H1’

*

HW-HY

*

H6i8-lir

*

HSB-H3

0.04

0.02

0.00

-0.02

-0.04 0

2

4

6

a Residue

10

12

14

16

la

20

Number

FIG. 7. Oscillations for [d(GT),.d(CA),]. Difference between calculated and observed intensity is plotted along the y axis as a function of nucleotide position along the chain. The nucleotide numbering is given by 1 = IA, 2 = ZC, 3=3A,..., 8=8C,9=IG, lO=ZT,. . ., 16=8T. The NOE deviations shown are between the base proton H6 or HS and the corresponding sugar protons.

exception of the Hl’-H4’ interaction which remained with a fairly large deviation. Thus the structure is further established as being closer to wD than to B. A quantitative interpretation of the structure was made by applying MARDIGRASzO’ to the data to obtain a set of upper and lower bounds to be used as NOE distance constraints in molecular mechanics and dynamics, using the program AMBER. (24)To ensure that these constraints were not model-dependent, MARDIGRAS was run starting from both B and wD DNA, with excellent convergence between the two (Fig. 8). This result was anticipated from the theoretical tests performed on the program MARDIGRAS. (*O) Minimization of a standard wD structure using the NOE constraints resulted in a structure which fit the NOE intensities well but had a higher free energy than the wD structure itself. A series of molecular dynamics simulations are being run in order to ease the molecule into a lower energy region which is still consistent with the 2D NOE constraints. The sugar puckers were obtained by determining the J couplings between sugar protons. From the DQF-COSY spectrum, coupling constants between the sugar protons were determined from fitting the Hl’-H2’,.Hl’-HZ” and H2’-H3’ cross-peaks using a cross-peak simulation program described by Widmer et al.@) An estimate of the sugar pseudorotation phase P could then be derived using the method of Altona and Rinkel.*’ There is uncertainty in establishing sugar pucker from J couplings and cross-peak structure, in that they can be explained either by a single conformer or by a mixture of standard N and S conformers. Thus two possible alternatives could be derived: (1) pyrimidine sugars P= 144 f 10” and purine sugars P= 185 f 15”, or (2) pyrimidine sugars P= 144 + 10” and purine sugars P= 153” (85% S), 9” (15% N) in both cases assuming a pucker amplitude of 35”. To corroborate these findings and to try to distinguish between the two possibilities. we turned back to the NOE results, and found that the Hl’-H4’ distance bounds calculated by MARDIGRAS fit extremely well to the determined sugar puckers (Fig. 9). Thus the DQF-COSY results support the findings of CORMA and MARDIGRAS. This is extremely encouraging evidence which shows that the simulation methods work well in a real case. However, we are still unable to unequivocally distinguish between single- and multi-conformational states. The distance between the Hl’ and H4 protons in the N( P ==9”) conformer is about the same as for the S( P = 180“) conformer; in general, Hl’-H4’ cross-peak intensities are the most sensitive to sugar pucker. Although refinement and analysis of the [d(GT),.d(CA),] structure is still in progress, it is clear that its structure has many of the characteristics of wD-DNA, most notably, narrow minor groove and base propeller twist. Examination of the molecular mechanics calculations for [d(TATATATATA) J2 revealed that the methyl group on the thymine played-a role in stabilizing the wD structure. Therefore, an investigation

97

Relaxation matrix analysis

(a)

0 B-DNA

dlstancas

distances

Modal

from B-DNA

FIG. 8. Comparison of interproton distances for [d(GT),.d(CA),] in the B-DNA and wD-DNA conformations before (left) and after (right) structure refinement with MARDIGRAS.

V..,-.

-

0.03

- 0.02

-0.01

FIG. 9. Plot of the distance between the Hl’ and H4’ protons in a sugar as a function of pseudorotation angle P, assuming a pucker amplitude of 35”. Superimposed are the calculated upper bounds on the Hl’-H4’ distances for the various nucleotides in [d(GT),.d(CA),] from MARDIGRAS. Also shown is the corresponding intensity for the . Hl’-H4’ cross peak as a function of P calculated from CORMA.

of this observation was begun by studying [d(ATATATAUAT)],, which has one methyl missing in each of the complementary strands. Results to date from this study show that, based on CORMAt9) calculations, the wD structure is a better model than the A. B, or Z conformations (Fig. 10). MARDIGRAS was then used to calculate distances for 2DNOE intensities measured with a 50ms mixing time, using the wD structure as the starting point. The distance constraints were then used in the AMBERfz4) minimization of the model structure. The resulting structure revealed that, in the region of the deoxyuridine, the minor groove is somewhat wider than it is with wD-DNA and base pair propeller twisting is not quite as accentuated.

B. A. BORGIASet al.

98

0

100

200

mixingtime(msee)

FIG. 10. Residual factor R between calculated and observed 2D NOE spectra as a function of mixing time for various DNA structural models of [d(ATATATAUAT)],.

(R=xll, I)

- f,l/~ll,,l). n

4.2.3. Conformational Refinement. IRMA and MARDIGRAS have been shown to be superior methods for generating distances based on 2D NOE intensities, and subsequently complete conformational models (using DG or r-MD techniques). A more holistic approach was taken in designing the COMATOSE (Complete Matrix Analysis Torsion Optimized StructurE) program.“” This program also centers on the CORMA’s) calculation, but has a least-squares minimizer built into it to refine the conformation of the macromolecule. The object of this program is to minimize the error between the calculated and observed 2D NOE intensities while adjusting the parameters defining the conformation. The function minimized with respect to the conformational changes is: (14)

where 1: is calculated cross-peak intensity and klb is the corresponding scaled experimentally obtained intensity. To reduce the problem size and the number of adjustable parameters, the conformation is defined by a limited set of torsion angles, sugar pucker parameters and residue (amino acyl or nucleotide) coordinates and orientations. Bond lengths and angles are maintained at ideal values, and only conformationally significant variables are allowed to change. The reduction in variables from approximately 100 Cartesian coordinates to only 10 conformational variables per nucleotide makes the algorithm plausible for systems of up to 10-15 nucleotides. However, since numerical derivatives are necessary to calculate the gradient for changes in the conformational variables, the program is still computationally demanding. There are other drawbacks to the approach taken by COMATOSE. Since COMATOSE is a leastsquares minimizing routine, it is susceptible to locating the nearest local minimum in the vicinity of the starting conformation, rather than the structure corresponding to the global minimum that presumably reflects most accurately the true structure. The pervasiveness of this problem is illustrated in Fig. 11, which depicts the path taken by COMATOSE in refining the conformation of a single tryptophan residue. Only two degrees of freedom are optimized against ideal intensities (calculated using the program CORMA)t9) in this example. While this is seemingly a trivial problem, the fact that the global minimum is not obtained in all cases is indicative of the inherent problems associated with leastsquares minimizers. The plots show contours of the error function used by COMATOSE which are generated in a conformational scan of all torsion space available to the residue. Superimposed on the contour map are the actual paths taken by COMATOSE in optimizing a number of starting conformations. Note the shallow minimum in the upper right quadrant (SO’, 100”) which traps the refinement for all starting conformations in its vicinity. Also interesting is another shallow minimum in the lower right (-40”. -40”). which is sought by an even larger number of starting conformations. However, in this case, if the minimization is pursued long enough (more than 25 cycles in some cases),

99

Relaxation matrix analysis

120

60

I2 0

-60

-240

-160

-120

60

0

60

120

Ctill

FIG. 11. Paths taken by COMATOSE in refining the conformation of a single tryptophan residue. Contours are drawn at equal error levels determined during a complete conformational scan by COMATOSE and displayed using the NCSA program Imagetool (National Center for Supercomputing Applications, Attn: Lyon Samson 152 Computing Applications Building, 605 East Springfield Ave., Champaign, Illinois 61820). The error function is that defined in eqo. (14) of the text. The lines depicting the COMATOSE refinements were generated during separate runs of COMATOSE from the conformations indicated by the origins ofeach line. Each path is also annotated with the position of each conformer after the cycle number indicated. The dashed paths indicate those refinements that became trapped in a local minimum.

COMATOSE is eventually successful in seeking out the true global minimum. This eventual success will not generally be possible with actual data where the noise in the measured intensities tends to smear out the details of the ‘terrain’, and persistent minimization is not warranted. In addition to the local minimum problem encountered by COMATOSE, there are other practical problems with its application. First, the algorithm used for the conformational optimization requires large amounts of computer memory. This restricts the maximum size of molecule which can be studied by COMATOSE to approximately 70 residues (amino acids, nucleotides or saccharides). Second, the end result from COMATOSE is a coordinate set that fits the experimental intensities reasonably well, but is not necessarily energetically favorable. The way of defining the conformation with freely floating residues (defined only by one set of Cartesian coordinates and a set of rotation angles) resulted in a final structure which is not necessarily well connected. These flaws could be cleaned up by iteratively cycling through COMATOSE and an energy minimization program such as AMBERJz4’ but once this cycling is accepted, it is not clear that COMATOSE yields significantly better information than the distances available from MARDIGRAS. MARDIGRAS, however, yields its results in a fraction (only a few percent!) of the time required for a COMATOSE analysis. It may be that COMATOSE will find a niche for optimizing structures, such as the nucleic acids for which the gross structure is pretty much invariant (and for which it was designed). For protein structure analysis it seems that the MARDIGRAS approach might be a better route to take.

loo

B. A. BORGIAS et al. 5. SUMMARY

Complete relaxation matrix analysis methods are finally gaining general recognition in the analysis of 2D NOE spectra. Currently it is fairly easy to generate the distances which correspond to the measured intensities assuming a single static molecular conformation. The problems of molecular motion and conformational multiplicity have yet to be fully addressed, except in the off-hand recognition of the fact that the molecules are in motion and the intensities somehow reflect that motion. At this point it is not yet possible to define the structure and the motion from 2D NOE data alone. Nor does this capability appear to be on the horizon; there are too many structural and motional variables in proportion to the amount of data available. On the other hand, we are in a position to begin testing various models, e.g. generated by extended molecular dynamics runs, for their consistency with the experimental data. In addition, we can begin to include other experimental insights into molecular motions, e.g. from relaxation time measurements. Meanwhile, we can still make use of the time-averaged structures which can now be generated. Acknowledgements-This work was supported by National Institutes of Health grants GM 39247, CA 27343. and RR 01695. We gratefully acknowledge the use of the University of California, San Francisco, Computer Graphics Laboratory (supported by NIH Grant RR 01081). REFERENCES 1. 2. 3. 4. 5. 6. 7.

JEENER, Ampere International Summer School II, Basko Polje, Yugoslavia (1971). S. MACURA and R. R. ERNST, ~Mol. Phys. 41, 95 (1980). I. SOLOMON, Phys. Rev. 99, 559 (1955). L. WERBELOW and D. M. GRANT. Adu. Magn. Res. 9, 189 (1978). T. E. BULL, J. hfayn. Reson. 72, 397 (1987). L. E. KAY, J. W. SCARSDALE, D. R. HARE and J. H. PRESTEGARD, J. Magn. Reson. 68, 515 (1986). L. E. KAY, T. A. HOLAK, B. A. JOHNSON, I. M. ARMITAGE and J. H. PRESTWARD, J. Am. Chem. Sot. lo&4242 (1986). 8. J. W. KEEPERS and T. L. JAMES, J. Magn. Reson. 57,404 (1984). 9. B. A. BORGIAS, P. D. THOMAS and T. L. JAMES, Complete Relaxation Matrix Analysis (CORMA), University of California, San Francisco (1987, 1989). 10. B. A. BORGIAS and T. L. JAMES, J. Magn. Reson. 79.493-512 (1988). 11. A. M. GRONENBORN and G. M. CLORE, Prog. NMR Spectrosc. 17, l-32 (1985). 12. H. L. EATON and N. H. ANDERSEN, J. Magn. Reson. 74, 212 (1987). 13. X. LAI, T. M. MARSCHNER and N. H. ANDERSEN, 301 Experimental NMR Conference, p. 158 Asilomar,

California (1989). 14. G. ESPOSITO and A. PASTORE, J. Magn. Reson. 76, 331 (1988). 15. 16. 17. 18. 19.

W. MASSEFSKI, JR. and P. H. BOLTON, J. Magn. Reson. 65, 526 (1985). E. T. OLEJNICZAK, R. T. GA.MPE, JR. and S. W. FESIK, J. Magn. Reson. 67, 28 (1986). P. A. MIRAU, J. Magn. Reson. 80. 439 (1988). R. BOELENS, T. M. G. KONING and R. KAPTEIN, J. Mol. Srruct. 173, 299 (1988). R. BOELENS, T. M. G. KONING, G. A. VANDER MAREL, J. H. VANBOOM and R. KAPTEIN, J. Magn. Reson. 82,290

(1989). 20. B. A. BORGIAS and T. L. JAHES, J. Magn. Reson. (in press). 21. E.-I. SUZUKI, N. PATTABIRISIAN,G. ZON and T. L. JAMES, Biochemistry 25, 6845 (1986). 22. N. ZHOU, A. M. BIANUCCI, N. PA~ABIRAMAN and T. L. JAMES, Biochemistry 26, 7905 (1987). 23. N. ZHOU, S. MAN~GARAN, G. ZON and T. L. JAMES, Biochemistry 27, 6013 (1988). 24. U. C. SINGH, P. K. WEINER, J. CALDWELL and P. A. KOLLMAN, AMBER 3.0: A Molecular Mechanics and Dynamics Program, University of California, San Francisco, CA (1986). 25. P. Lu. S. CHEUNG and M. DONLAN, Adv. Biophys. 20, 153 (1985). 26. H. WIDMER and K. WUTHRICH, J. Magn. Reson. 70,270 (1986). 27. J. RINKEL and C. ALTONA, J. Biomol. Strucr. Dynamics 4, 621 (1987).