Utility of fitting two-dimensional NOE spectra


JOURNAL OF MAGNETIC RESONANCE 81, 178-185 (1989)

E. T. OLEJNICZAK, R. T. GAMPE, JR., AND S. W. FESIK

Pharmaceutical Discovery Division, Abbott Laboratories, Abbott Park, Illinois 60064 Received February 15, 1988; revised May 2, 1988

The two-dimensional NOE experiment is the most powerful tool for determining three-dimensional structures of molecules in solution (1, 2). The quality of the structures that are obtained depends on the number of NOEs that can be identified as well as the accuracy with which individual NOE cross peaks can be integrated and analyzed in terms of proton-proton distances. More reliable distances are obtained if higher-order effects, such as spin diffusion, are taken into account in a quantitative analysis of the NOE data, leading to more distance constraints from long-range NOEs (3). A severe limitation of this approach is the requirement for accurate cross- and diagonal-peak volumes, which are especially difficult to measure in complicated 2D NOE spectra of biomacromolecules.

Various strategies have been suggested to quantify NOE peaks (4-9). In particular one might expect that linear prediction algorithms are the most promising for this purpose. However, in our own experience it is difficult to satisfactorily quantitate weak cross-peak intensities in the presence of very intense diagonal peaks using linear prediction. Another approach for obtaining NOE cross-peak intensities involves a simpler least-squares fitting procedure of two-dimensional NOE spectra. This approach (9) has the advantage of using a smaller set of variables than in linear prediction. The procedure was described by Denk et al. (9) and utilizes a set of reference lines to reconstruct the spectra. The procedure reduces the problem to a least-squares fit of the amplitudes of the peaks while the information about possible peak positions and lineshapes is contained in a catalog of reference lines.

In this note we explore different strategies for obtaining reference lines and applying the fitting procedure described by Denk et al. (9). In addition, we test the quality of the fitting procedure and the ability of the method to accurately integrate overlapping cross peaks.
By fitting two-dimensional NOE spectra, many new possibilities are available for data manipulation. Fitted NOE spectra can be used to resolve overlapping peaks and even to confirm cross-peak assignments. Examples of these and other applications will be presented as well as a brief summary of some of the unique features of the program.

0022-2364/89 $3.00 Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

Using the same notation as Denk et al. (9) (i.e., L is an (n by i) matrix consisting of the (n) frequency-domain points of each of the (i) different reference lines, A is the (i by 1) vector of desired peak amplitudes, and S is the (n by 1) vector containing the (n) frequency-domain points to be fit), the spectrum fitting problem can be rewritten as

$$M A = B, \qquad [1]$$

where

$$M = L^{+} L \qquad [2]$$

and

$$B = L^{+} S. \qquad [3]$$

Rewritten in this form it is easy to see that this is a standard least-squares problem. The preferred method to solve this type of problem is to use numerically stable decomposition methods. These methods are much better than explicitly calculating an inverse of matrix M. Programs to do this are available in most mathematical subroutine libraries (10).

The program that was developed was designed to fit very complicated NMR spectra with many proton resonances. The computer routines were written in Fortran and all computations were run on a VAX-780 (Digital Equipment Corporation). The general strategy of the program consists of two distinct parts. The first part is to define the lineshapes of all resonances of interest and store these reference lines in a data file. Reference lines can be obtained from any resolvable cross or diagonal peak in the spectra (9). For example, a reference line for an α proton could be obtained from a well-resolved NOE cross peak between an α and amide proton. The frequency-domain data can be used directly as a reference line, as long as a cross or diagonal peak can be found which is well resolved and where the signal-to-noise is sufficiently high. In practice, this is accomplished by first defining the region of the ω1 or ω2 slice which contains the resonance lineshape, zeroing all data points outside of this region, and then storing the data points containing the desired reference lineshape in a data file.

For very complex spectra, reference lines may be difficult to obtain due to the lack of well-resolved cross peaks. In order to circumvent these problems, we have implemented several strategies for obtaining reference lines in complex spectra. One method to separate a pair of overlapping lines or to better define the shape of a noisy peak is to use a least-squares fitting procedure of the individual lines to a Gaussian (or Lorentzian) lineshape. The fitted lineshape is then stored as the reference line.
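The lineshape-fitting strategy just described can be illustrated with a minimal sketch. It is not the Fortran routine used in this work: the helper names (`gaussian`, `fit_gaussian`), the Gauss-Newton iteration, the starting guesses, and the noise-free synthetic peak are all assumptions chosen for illustration, and only a single isolated Gaussian is fitted rather than a pair of overlapping lines.

```python
import numpy as np

def gaussian(x, amp, center, width):
    """Gaussian lineshape; `width` is the standard deviation in data points."""
    return amp * np.exp(-0.5 * ((x - center) / width) ** 2)

def fit_gaussian(x, y, n_iter=20):
    """Fit one Gaussian to y(x) by Gauss-Newton iterations (hypothetical helper)."""
    # Starting guesses taken from the data: height, position, second moment.
    amp = y.max()
    center = x[np.argmax(y)]
    weights = np.clip(y, 0.0, None)
    width = np.sqrt(np.sum(weights * (x - center) ** 2) / np.sum(weights))
    params = np.array([amp, center, width], dtype=float)
    for _ in range(n_iter):
        amp, center, width = params
        shape = np.exp(-0.5 * ((x - center) / width) ** 2)
        residual = y - amp * shape
        # Jacobian of the model with respect to (amp, center, width).
        jac = np.column_stack([
            shape,
            amp * shape * (x - center) / width**2,
            amp * shape * (x - center) ** 2 / width**3,
        ])
        step, *_ = np.linalg.lstsq(jac, residual, rcond=None)
        params = params + step
    return params

# Synthetic, noise-free slice standing in for a measured peak (assumed values).
x = np.arange(64, dtype=float)
y = gaussian(x, 5.0, 30.3, 3.0)
amp, center, width = fit_gaussian(x, y)

# The fitted shape, scaled to unit amplitude, would be stored as the reference line.
reference_line = gaussian(x, 1.0, center, width)
```

With real, noisy data the starting guesses and the number of iterations would need more care, and a Lorentzian model could be substituted by changing `gaussian` and the Jacobian accordingly.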
When this first strategy fails, we have tried to artificially improve the resolution of our spectra by using the least-squares procedure to remove all of the resonances in the spectra for which reference lines have previously been generated. Unfortunately, in some cases, the badly overlapping cross peaks which are resolved in this manner have severely distorted lineshapes and thus cannot be used directly as reference lines. Another approach for obtaining reference lines is to use data from other 2D correlation experiments which have better resolution for the cross peaks of interest. For example, in a heteronuclear multiple-quantum correlation experiment, the proton resonances are spread in the ω1 dimension as a function of the chemical shift of the attached heteronucleus. This may help to resolve additional proton lineshapes which can be used as reference lines.


Once reference lines are obtained, the second step is to determine the correct amplitude for each reference line in the 2D spectra by performing a least-squares fit using Eqs. [1]-[3]. In order to reduce the computing time in this step and to decrease numerical errors, the algorithm that we have written performs the least-squares fit using a reduced set of reference lines chosen from the full set specifically for each ω1 or ω2 slice to be fitted. The reduced set of reference lines is obtained by identifying regions of the spectra containing signals and then comparing these regions to the known locations of the reference lines. If a reference line overlaps with any region containing peaks, then it is included in the reduced set of reference lines. This section of the code was written conservatively and generally allows about twice as many reference lines as absolutely necessary. It is important to make the criteria generous enough to ensure that all necessary reference lines are included. Any erroneous reference lines included in the set are not a problem, since they will simply be eliminated by the least-squares procedure. The computing time saved by culling the set of reference lines for an individual trace of a 2D spectrum is small. However, in an automated fit of a complete two-dimensional spectrum, a significant amount of time is saved by avoiding a fit to the complete set of reference lines, especially in the rows or columns containing only noise. For protein spectra which may require hundreds of reference lines, it may also be prudent to reduce the dimensionality of the problem in order to avoid computational errors which will increase with the number of reference lines. Ideally, separate reference lines should be obtained for ω1 and ω2 (9). However, in order to minimize the time and effort of the user, the algorithm can use the same set of reference lines when fitting either ω1 or ω2 traces, provided that the data matrices are square.
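The second step can be sketched as follows, purely as an illustration and not as the program described here. The sketch assumes real-valued data (so that L+ in Eqs. [2] and [3] is read as the transpose), Gaussian reference lines, and a simple amplitude threshold for locating signal regions; the function names, the threshold, and the 1% support criterion are hypothetical. It culls the reference lines for one slice, fits the amplitudes with a decomposition-based solver, and checks the result against the normal equations M A = B.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def cull_reference_lines(slice_data, ref_lines, threshold):
    """Return indices of reference lines that overlap any signal region."""
    signal_mask = np.abs(slice_data) > threshold
    keep = []
    for i, line in enumerate(ref_lines.T):
        # Points where this reference line has appreciable intensity.
        support = np.abs(line) > 0.01 * np.abs(line).max()
        if np.any(support & signal_mask):
            keep.append(i)
    return np.array(keep, dtype=int)

def fit_slice(slice_data, ref_lines, threshold):
    """Least-squares amplitudes for one slice; culled lines get amplitude 0."""
    amplitudes = np.zeros(ref_lines.shape[1])
    keep = cull_reference_lines(slice_data, ref_lines, threshold)
    if keep.size:
        # lstsq uses an SVD-based solver, so no explicit inverse of M is formed.
        sol, *_ = np.linalg.lstsq(ref_lines[:, keep], slice_data, rcond=None)
        amplitudes[keep] = sol
    return amplitudes

# Synthetic slice: four reference lines, two of which carry no intensity.
x = np.arange(256, dtype=float)
centers = [40.0, 100.0, 104.0, 200.0]
L = np.column_stack([gaussian(x, c, 3.0) for c in centers])  # (n by i) matrix
true_A = np.array([0.0, 2.0, 0.5, 0.0])
rng = np.random.default_rng(0)
S = L @ true_A + 0.01 * rng.standard_normal(x.size)

A = fit_slice(S, L, threshold=0.1)

# Same amplitudes from the normal equations M A = B on the reduced set.
keep = cull_reference_lines(S, L, threshold=0.1)
M = L[:, keep].T @ L[:, keep]
B = L[:, keep].T @ S
A_normal = np.linalg.solve(M, B)
```

For this well-conditioned toy case the two routes agree; the decomposition-based solver is expected to matter more when many strongly overlapping reference lines make M nearly singular.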
In our hands, this time-saving strategy works surprisingly well for typical low-resolution (1K by 1K) two-dimensional NMR spectra if the number of spectra accumulated in t1 is greater than one-quarter of the number of points in t2. In one of the analysis strategies described here, reference lines are only needed for ω2 and in these examples it was important to have very good resolution in ω2 (i.e., 1.22 Hz/point). These data matrices were not square (e.g., 1K (ω1) by 4K (ω2)).

In Figs. 1A and 1B we show portions of a two-dimensional NOE spectrum of atrial natriuretic factor (ANF(7-23)),

    C-F-G-G-R-I-D-R-I-G-A-Q-S-G-L-G-C
    |_______________________________|

in the presence of an aqueous solution of sodium dodecyl sulfate (SDS-d25) micelles. Thirty-four reference lines were included for the α and amide protons and one β proton found in this region of the spectrum. All of the reference lines were obtained from the fraction of the two-dimensional NOE map shown in Fig. 1, and the same set was used when fitting ω1 or ω2 traces in the spectra. Better reference lines could have been generated by using the full spectrum, but we wanted to demonstrate that reasonably good reference lines could be obtained even if some of them are derived from overlapping cross peaks. Of the 34 reference lines used in the calculation, 12 were obtained from overlapping cross peaks using the lineshape-fitting procedure described earlier. The total user time necessary to create the full set of reference lines was less than an hour.


The quality of the fitting procedure can be judged from a plot (Figs. 1C and 1D) of the difference between the amplitudes of the fitted spectrum and the experimental spectrum plotted on the same scale (just above the thermal noise). As shown in the difference spectrum (Fig. 1D), the diagonal peaks are not completely canceled but are of magnitude similar to that of small cross peaks. The large diagonal signals which remain are due to several small contaminants in the sample. The small contours remaining in the amide (ω2), α (ω1) spectral region (Fig. 1C) do not line up with any assigned resonances, suggesting that these contours correspond to noise in the spectrum. The quality of the fit can be seen more clearly in Fig. 2 where ω1 traces from the experimental spectra (Figs. 2A and 2C) are compared to ω1 traces of the difference spectra (Figs. 2B and 2D). These traces convincingly demonstrate that the errors in the fit of the diagonal peaks are of magnitude similar to that of small cross peaks and that the errors in the fit of the cross peaks are not much greater (less than a factor of 2) than the noise in the spectra. The results in Fig. 2 also suggest that numerical suppression of the diagonal peaks is feasible with the fitting procedure. As a further test of the method, we checked to see if the amplitudes of overlapping

FIG. 1. (A, B) Contour plots of a 2D NOE experiment acquired at 40°C using a 10 mM H2O solution of ANF(7-23) in SDS-d25 (200 mM) micelles. The experiment was performed on a General Electric GN500 NMR spectrometer with a mixing time of 200 ms. The real and imaginary parts of the t1 dimension (280 t1 values) were collected separately and processed using the procedure of States et al. (14). The data were processed using a CSPI minimap array processor using software written by E. R. P. Zuiderweg. All plotting was done using Dr. Dennis Hare's FTNMR program (Hare Research Inc.). Cosine window functions were used in both dimensions and the frequency-domain spectra were baseline corrected in ω1 and ω2. (A) The α (ω1) amide (ω2) spectral region of ANF(7-23). (B) The amide (ω1) amide (ω2) spectral region of ANF(7-23). (C, D) Difference between the experimentally determined spectrum and the fitted peak amplitudes. Both spectra are plotted on the same scale using identical contour levels. (C) Difference spectra of the α (ω1) amide (ω2) spectral region of ANF(7-23). (D) Difference spectra of the amide (ω1) amide (ω2) spectral region of ANF(7-23).


FIG. 2. (A) Trace along ω1 at the I15NH frequency of the experimental 2D NOE spectrum (Fig. 1A). (B) Difference between the experimental trace shown in (A) and the spectrum fitted by the least-squares procedure. (C) Trace along ω1 at the G10NH/R11NH frequency of the experimental 2D NOE spectrum (Fig. 1A). (D) Difference between the experimental trace shown in (C) and the fitted spectrum.

cross peaks could be quantitated correctly from the fitted spectra. In particular, if the lineshapes of overlapping diagonal peaks are different, can the algorithm correctly quantitate the intensities of overlapping cross peaks and also differentiate between the cross peaks (i.e., determine which cross peaks are connected to which diagonal peaks)? In the 2D NOE spectrum of ANF(7-23) (Fig. 1A, B), the NH signals corresponding to Q18/G20 (8.16 ppm), G10/R11 (8.08 ppm), and A17/L21 (7.91 ppm) overlap and cannot be resolved in a conventional 2D NOE experiment. In an earlier study, the overlapping peaks were resolved by performing isotope-filtered 2D NOE experiments using an ANF(7-23) sample that was 15N labeled for the G10, A17, and G20 residues (11). It was of interest to determine whether the fitting procedure could also be used to resolve these peaks from a conventional 2D NOE spectrum. The comparison between the two methods (i.e., fitting versus filtering) is shown in Fig. 3. Figure 3A shows a trace along ω1 at the A17NH/L21NH frequency (ω2) of the 2D NOE experiment. Overlap between the cross peaks of A17NH and L21NH is clearly not resolvable in Fig. 3A. However, due to the difference in the resonance lineshapes of the A17NH and L21NH diagonal peaks obtained from the well-resolved cross peaks (A17NH/A17α, L21NH/L21α), the two sets of cross peaks could


FIG. 3. (A) A trace along ω1 at the A17NH/L21NH frequency (ω2) of a 2D NOE spectrum. This spectrum was accumulated with the same delays as the isotope-filtered spectrum but without the 15N pulses. (B) A trace along ω1 at the A17NH/L21NH frequency (ω2) of a 2D NOE spectrum. In this spectrum the fitted amplitudes of the L21NH cross peaks have been found by the algorithm and subtracted from the trace shown in (A), leaving only the cross peaks and diagonal peak corresponding to A17NH. (C) A trace along ω1 at the A17NH (ω2) frequency obtained from an isotope-filtered 2D NOE experiment (see Ref. (11) for experimental details).

be differentiated using the fitting procedure. This is accomplished by instructing the program to scan all ω2 traces and to only subtract out intensity that can be fitted to the L21NH reference line. This leaves the diagonal and cross peaks of A17NH unaffected (Fig. 3B). This result can be compared directly to the data from the isotope-filtered study in Fig. 3C. The cross peaks corresponding to L21NH that are not present in the isotope-filtered spectrum are also eliminated quantitatively in the “synthetically filtered” spectrum in Fig. 3B. Furthermore, the intensities of the A17NH/G16Hα and the A17NH/(G16NH and/or Q18NH) cross peaks are similar in the fitted (Fig. 3B) and filtered (Fig. 3C) spectra. This example clearly demonstrates that fitted 2D NOE spectra can accurately resolve the amplitudes of badly overlapping cross peaks and can even differentiate between the cross peaks of overlapping diagonal peaks provided that the reference lines of the diagonal peaks are different. One simple method which can be used to analyze overlapping cross peaks is to regenerate the fitted spectra with a new, much narrower Gaussian lineshape. This


capability is a simple extension of the program and has been implemented as described below. As each ω1 column is fitted, the program replaces each peak by a Gaussian of the correct integrated intensity. This is followed by an analogous procedure along each row. Since the final (ω1, ω2) coordinates for any peak depend on position pointers that the user inputs for each of the reference lines, all of the peaks in the spectra can be regenerated such that none of them overlap. In Fig. 4A we illustrate the utility of this approach using an example in which we have regenerated the peaks at the center of the original proton positions. Many of the cross peaks which could not be resolved in Fig. 1A are now easily resolvable.

Fitted 2D NOE spectra can also be used to advantage when comparing simulated NOE experiments to experimental data. Since the fitted NMR data can be regenerated with a known lineshape, direct comparison to fully simulated spectra is facilitated. An example of this is shown in Fig. 4B where we have simulated the 2D NOE spectra for ANF(7-23) based on a proposed three-dimensional structure (12). The simulation program (13) calculates separate free induction decays for the real and imaginary parts of the t1 dimension of a 2D NOE experiment. The data are then processed like conventional phase-sensitive 2D NOE data. Due to the improved resolution of the fitted spectra regenerated with a narrow Gaussian lineshape (Fig. 4A), detailed comparisons can be made between it and the simulated data (Fig. 4B). Several qualitative similarities and differences are apparent in the data. These differences are being evaluated as part of a refinement strategy for the three-dimensional structure determination of the peptide.

In summary, many new analysis strategies can be employed by using the data obtained from fitted 2D NOE spectra. 
We have demonstrated the advantages of the fitting procedure for both quantitation and “resolution” of NOE cross peaks by synthetically filtering out selected resonances and have described several approaches to obtain reference lines required for the fitting procedure. In our experience these methods are useful in the interpretation and analysis of 2D NOE spectra.
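The synthetic filtering and narrow-line regeneration described above can be sketched on simulated data. This is an illustrative toy under stated assumptions, not the program used in this work: the two ω2 reference lines, the peak positions and intensities, the noise-free spectrum, and the display positions are all hypothetical. Only the 0.7-point halfwidth at half-height is taken from the caption of Fig. 4.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

n = 128
w = np.arange(n, dtype=float)

# Hypothetical omega-2 reference lines of two overlapping NH resonances
# with different lineshapes.
ref_a = gaussian(w, 60.0, 2.0)  # stands in for the resonance to keep
ref_b = gaussian(w, 61.0, 4.0)  # stands in for the resonance to filter out
refs = np.column_stack([ref_a, ref_b])

def add_peaks(w1_centers, intensities, ref):
    """Toy 2D spectrum: rows index omega-1, columns index omega-2."""
    w1_profile = sum(h * gaussian(w, c, 2.0)
                     for c, h in zip(w1_centers, intensities))
    return np.outer(w1_profile, ref)

only_a = add_peaks([20.0, 60.0], [0.3, 5.0], ref_a)
only_b = add_peaks([22.0, 61.0, 90.0], [0.4, 5.0, 0.2], ref_b)
spectrum = only_a + only_b

# Fit every omega-2 trace (each row) to both reference lines at once.
amplitudes, *_ = np.linalg.lstsq(refs, spectrum.T, rcond=None)  # shape (2, n)

# Synthetic filter: subtract only the intensity assigned to ref_b.
filtered = spectrum - np.outer(amplitudes[1], ref_b)

# Regenerate one trace with narrow Gaussians of equal integrated intensity,
# placed at user-chosen positions so that the two peaks no longer overlap.
row = 20
hwhm = 0.7                                   # halfwidth at half-height, points
sigma = hwhm / np.sqrt(2.0 * np.log(2.0))
display_positions = [55.0, 65.0]             # assumed position pointers
regenerated = np.zeros(n)
for amp, ref, pos in zip(amplitudes[:, row], refs.T, display_positions):
    narrow = gaussian(w, pos, sigma)
    regenerated += amp * ref.sum() * narrow / narrow.sum()
```

Because the toy data are noise free and the two reference lines differ in shape, the filtered spectrum reproduces the spectrum of the retained resonance; with experimental data the separation would be limited by noise and by how different the two lineshapes are.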

FIG. 4. (A) Stacked plot of the spectra of ANF(7-23) regenerated with the fitted amplitudes and narrow Gaussian lineshape. The halfwidth of the Gaussian at half-height is 0.7 data point. The two glycine α protons of both Gly 10 and Gly 22 are nearly degenerate and were regenerated at the same position. (B) Stacked plot of the simulated NOE spectra based on a proposed three-dimensional structure of ANF(7-23) derived from NMR data (12). The correlation time for overall tumbling of the molecule used in the simulation is 1.7 ns. The correlation time for methyl rotation was assumed to be 0.1 ns. The mixing time in the simulation is 200 ms and is the same as in all of the experimental spectra. Methyl relaxation was treated using the formalism described by Tropp (15) and Woessner (16).


ACKNOWLEDGMENTS

The authors thank E. R. P. Zuiderweg for sharing his data-processing software and for many stimulating discussions and suggestions. The authors also thank G. Wagner for useful discussions and Todd Rockway for the synthesis of ANF(7-23).

REFERENCES

1. G. M. CLORE AND A. M. GRONENBORN, Protein Eng. 1, 275 (1987).
2. K. WÜTHRICH, “NMR of Proteins and Nucleic Acids,” Wiley, New York, 1986.
3. E. T. OLEJNICZAK, R. T. GAMPE, JR., AND S. W. FESIK, J. Magn. Reson. 67, 28 (1986).
4. J. TANG AND J. R. NORRIS, J. Magn. Reson. 69, 180 (1986).
5. H. BARKHUIJSEN, R. DE BEER, W. M. M. J. BOVEE, AND D. VAN ORMONDT, J. Magn. Reson. 61, 465 (1985).
6. T. A. HOLAK, J. N. SCARSDALE, AND J. H. PRESTEGARD, J. Magn. Reson. 74, 546 (1987).
7. V. MANASSEN, G. NAVON, AND C. T. W. MOONEN, J. Magn. Reson. 72, 551 (1987).
8. S. J. NELSON AND T. R. BROWN, J. Magn. Reson. 75, 229 (1987).
9. W. DENK, R. BAUMANN, AND G. WAGNER, J. Magn. Reson. 67, 386 (1986).
10. For example, subroutine LLSQF from the IMSL subroutine library.
11. S. W. FESIK, R. T. GAMPE, JR., AND T. W. ROCKWAY, J. Magn. Reson. 74, 366 (1987).
12. E. T. OLEJNICZAK, R. T. GAMPE, JR., T. W. ROCKWAY, AND S. W. FESIK, Biochemistry 27, 7124 (1988).
13. E. T. OLEJNICZAK, Ph.D. thesis, Harvard University, Cambridge, Massachusetts (1982).
14. D. J. STATES, R. A. HABERKORN, AND D. J. RUBEN, J. Magn. Reson. 70, 336 (1986).
15. J. TROPP, J. Chem. Phys. 72, 6035 (1980).
16. D. E. WOESSNER, J. Chem. Phys. 36, 1 (1962).