JOURNAL OF MAGNETIC RESONANCE 55, 316-321 (1983)
NOTES

A Multivariate Method for Carbon-13 NMR Chemical Shift Predictions Using Partial Least-Squares Data Analysis

DAN JOHNELS, ULF EDLUND, ERIK JOHANSSON, AND SVANTE WOLD

Department of Organic Chemistry and Research Group of Chemometrics, Umeå University, S-901 87 Umeå, Sweden

Received December 7, 1982; revised May 23, 1983
To obtain a complete assignment of the lines in a 13C NMR spectrum of a complex molecule, it is normally necessary to combine several assignment techniques. Various decoupling procedures, spin-lattice relaxation data, shift reagents, 13C-1H coupling constants, etc., all provide useful information for assigning resonances to individual carbons (1). Absolute assignments using 13C- or 2H-labeled compounds are both impractical and expensive. The recent development of sophisticated polarization transfer sequences is helpful only in special cases. Hence, in most complex systems the chemist still has to rely on reasoning by analogy, i.e., on a comparison of the chemical shifts in the compounds studied with those in suitable model compounds (2).

This chemical shift correlation method has its pitfalls, as expected in view of the sensitivity of 13C shieldings to minor steric and conformational changes. Many correlation methods, mainly based on connectivity indices or fragmental, topological descriptors, have recently appeared (3). This approach is likely to be successful only in systems where there are weak bonding interactions between the molecular subunits, i.e., nonconjugated systems.

For aromatic systems the most common methods to interpret and predict 13C chemical shifts are based on dual (DSP) or multiple substituent parameter equations (4a). In some aromatic systems, spectral assignments have been made on the basis of standard tables of SCS values for the four positions in monosubstituted benzenes, but in many cases there are vast discrepancies between predicted and observed shifts (4b).

In our laboratory, the application of principal components (PC) models has proven useful since these models do not require any predetermined, fixed parameters (5). In a recent study of the 13C shifts of more than 80 monosubstituted benzenes, we found clusters in the PC plots, which reflect the chemical shift behavior (6).
Benzene derivatives with acceptor, donor, halogen, and alkyl substituents formed clear subclasses. This indicates that the apparent fit of multiple parameter substituent models mostly accounts for differences between these four subclasses. The limited predictive ability of these models is mainly caused by treating the data analysis of substituent effects as a one-class problem. A better description of the substituent-induced chemical shifts (SCS) was obtained by deriving simple local models for each subclass of substituents (6).

0022-2364/83 $3.00 Copyright © 1983 by Academic Press, Inc. All rights of reproduction in any form reserved.
This type of study has been extended to other systems, i.e., styrenes, indenes, naphthalenes, and triazenes. In all cases a similar cluster pattern was revealed. In the present communication we use this similarity to predict the 13C chemical shifts in two different systems, 2-substituted naphthalenes and 4-substituted p-terphenyls, from the 13C SCS in the benzene system. A list of the substituents used is given in Scheme 1.
SCHEME 1

2-Substituted naphthalenes: X = H, Me, Et, F, Cl, Br, I, COMe, CO2Me, CN, NO2, NH2, NMe2, OMe, OH, NHCOMe, OCOMe, t-Bu, CO2H, CHO, CH2Br [Refs. (11), (7), (12), (13)]

4-Substituted p-terphenyls: X = H, Me, CO2Me, NO2, CN, NH2, NMe2, Cl, Br, I [Ref. (8)]

Benzene chemical shifts from Ref. (14). All shifts were measured in CDCl3 or CCl4.
The choice of the 2-substituted naphthalene system was mainly the result of a search for a sufficiently large 13C chemical shift data matrix. The failure of DSP methods to account for shielding differences (7) for most carbons (except C6 and C10) also made it a challenge to find a better predictive model in this case. It has also been noted that tables of ortho SCS values for the monosubstituted benzenes cannot be used to predict 13C shifts in 2-naphthalenes (4b). The 13C data of 4-substituted p-terphenyls have been thoroughly analyzed by various substituent parameter models, and the need for DSP and extended nonlinear resonance (DSP-NLR) models has been claimed (8).

In the initial step of the data analysis, the shift data were scaled to unit variance, to give all positions the same weight in the modeling (9). For the data analysis the partial least-squares (PLS) method (10) was used. The scaled shift data were divided into two blocks, X and Y, as shown in Fig. 1. Each of these blocks was then modeled by a product of two smaller matrices, a loading matrix (B and C, respectively) and a score matrix (T and U, respectively), i.e., X = TB + E1 and Y = UC + E2, where E1 and E2 are residuals. The matrices B and C were calculated so that both ||X - TB|| and ||Y - UC|| were small and so that U and T were correlated with each other columnwise.

The validated results, i.e., the differences between predicted and experimental values where the “predicted” compound was excluded in the calculation of the PLS model, are reported in Table 1. It should be stressed that the validation procedure used corresponds to the prediction of the 13C SCS of an unassigned compound. This procedure therefore contrasts with other data analysis methods in which the tested compound is included in the regression.
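The scaling, two-block decomposition, and leave-one-out validation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' program: it assumes the widely used NIPALS formulation of PLS, and the function names (`autoscale`, `pls_fit`, `loo_residuals`) and the synthetic data in the demo are hypothetical. In the code, `t` and `p` play the roles of the X scores and loadings (T and B in the text), and `u` and `q` those of the Y scores and loadings (U and C).

```python
import numpy as np

def autoscale(A):
    """Center each column and scale it to unit variance."""
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1)
    return (A - mean) / std, mean, std

def pls_fit(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS PLS on scaled blocks; returns coef such that Y ~ X @ coef."""
    X, Y = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _component in range(n_components):
        # Start from the Y column with the largest remaining variance.
        u = Y[:, [int(np.argmax(Y.var(axis=0)))]]
        for _iteration in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)           # X weight vector
            t = X @ w                        # X score
            q = Y.T @ t / (t.T @ t)          # Y loading
            u_new = Y @ q / (q.T @ q)        # Y score, correlated with t
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = X.T @ t / (t.T @ t)              # X loading
        X -= t @ p.T                         # remove the modeled part of X
        Y -= t @ q.T                         # remove the modeled part of Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    return W @ np.linalg.inv(P.T @ W) @ Q.T

def loo_residuals(X, Y, n_components=3):
    """Predict each compound from a model built without it."""
    n = X.shape[0]
    residuals = np.zeros(Y.shape, dtype=float)
    for i in range(n):
        train = np.arange(n) != i
        Xs, x_mean, x_std = autoscale(X[train])
        Ys, y_mean, y_std = autoscale(Y[train])
        coef = pls_fit(Xs, Ys, n_components)
        y_pred = ((X[i] - x_mean) / x_std) @ coef * y_std + y_mean
        residuals[i] = Y[i] - y_pred
    return residuals

if __name__ == "__main__":
    # Synthetic stand-in data (NOT the published shifts): 12 "compounds",
    # 4 benzene SCS columns in X, 6 positions of a second system in Y.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(12, 3))
    X = latent @ rng.normal(size=(3, 4)) + 0.01 * rng.normal(size=(12, 4))
    Y = latent @ rng.normal(size=(3, 6)) + 0.01 * rng.normal(size=(12, 6))
    res = loo_residuals(X, Y, n_components=3)
    print("SD around the mean:", Y.std(axis=0, ddof=1).round(2))
    print("s(E), validated:   ", res.std(axis=0, ddof=1).round(2))
```

Refitting the scaling parameters inside each leave-one-out round keeps the excluded compound from influencing the model in any way, which is the point of the validation described in the text.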
FIG. 1. Outline of procedure for data analysis. [Schematic: data matrix X (n x p) and data matrix Y (n x r, variables 1...k...r), each with a “test set” block of rows, related through loading matrices and a diagonal matrix.]
If one considers that the chemical shifts were taken from various laboratories, the predictive ability of the PLS method is remarkably good (Table 2). The standard deviation of prediction for remote carbon positions is reduced to a level close to the overall experimental error. The predictions for the ipso and ortho positions are satisfactory but can be further improved by using reduced, unscaled matrices containing only these positions in the Y matrix. As seen in Table 1, the standard deviation of the residuals is then reduced to the same level as for the other positions.

The results show that the four SCS in the benzenes (X) can describe all systematic variation in the shift data of the 2-substituted naphthalenes and the 4-substituted p-terphenyls (Y). This means that all SCS in the latter systems are predicted from the shift values of the corresponding substituted benzenes. Thus, the present results show that the PLS data analysis approach offers a potent tool for predicting and assigning 13C shift values of quite different aromatic systems. Since the model used is directly based on measured data of the actual structures, it will always have a better average descriptive and predictive power than DSP or other multiparameter equations. This statement is valid provided that the same number of parameters is used in the equations and that the same variables or SCS values are handled.

Preliminary results in the 1-substituted naphthalene system indicate, as expected, that the presence of steric effects that are absent in the monosubstituted benzenes and that modulate the substituent effect will reduce the predictive power of the benzene data matrix. It is likely that the present approach can be extended to a variety of conjugated, sterically unperturbed systems, but this needs further investigation.
Hence, a prediction of the 13C NMR chemical shifts of suitable monosubstituted aromatic species can be made provided that a “training set” is available which contains (a) correctly assigned 13C data for at least six to eight representative substituted compounds in the series (Y matrix). Since most of the shift variance is explained by
TABLE 1
RESULTS OF VALIDATED PLS PREDICTIONS

System                    Position    SD^a    s(E)^b    s(E)^c
2-Naphthyl (N = 21)       1            9.7    1.9       1.2
                          2           16.2    3.5       0.3
                          3            4.8    2.1       0.9
                          4            0.8    0.4
                          5            0.2    0.2
                          6            2.1    0.3
                          7            0.6    0.4
                          8            1.3    0.4
                          9            1.0    0.4
                          10           2.7    0.3
4-p-Terphenyl (N = 10)    1            5.8    0.6
                          2            0.6    0.4
                          3            7.7    2.2       0.4
                          4           17.7    4.7       0.3
                          1'           1.0    0.6
                          2'           0.4    0.4
                          3'           0.1    0.1
                          4'           1.0    0.2
                          1''          0.3    0.1
                          2''          0.1    0.01
                          3''          0.1    0.04
                          4''          0.3    0.1

^a Standard deviation around the mean value.
^b Standard deviation after applying a three-component PLS model.
^c Ipso and ortho positions modeled separately.
N = number of substituted compounds.
the interclass behavior, at least one substituent from each of the alkyl, donor, acceptor, and halogen classes is needed. If fluorine is chosen as a representative for the halogens, one additional halogen should be included. (b) correctly assigned benzene data (X matrix) for the same substituents as in the Y matrix. Shift predictions can then be made for unassigned compounds (test set) for which the corresponding benzenes have been measured. As exemplified by the 2-naphthalenes, the PLS approach can provide chemical shift predictions that are considerably better than those obtained by conventional shift correlation or DSP methods (4b, 7).

It is beyond the scope of this report to consider the theoretical implications of the observed general clustering behavior. However, since in both the systems studied all systematic shift variation can be described by the benzene 13C values, there is no need to interpret data in another way than for monosubstituted benzenes. Hence, in the p-terphenyl system we do not find any reason to use a model specific for disubstituted derivatives, i.e., for the 1’ and 4’ carbons (8). Shifts for these carbons, as well as for the others, are well predicted by the 13C shifts of the monosubstituted benzenes. As mentioned above, even better shift predictions may result from the use of local
TABLE 2
DIFFERENCES BETWEEN EXPERIMENTAL AND CALCULATED SHIFTS IN SOME REPRESENTATIVE 2-SUBSTITUTED NAPHTHALENES

Columns: Substituent, Position, δ(exp), Δ(exp - calc), Δ(exp - calc)^a

δ(exp)
Position    Me        NH2       COMe      Br
1           126.10    108.54    130.03    129.88
2           135.09    144.02    134.50    119.15
3           121.90    118.19    123.81    129.16
4           121.54    129.10    128.34    129.46
5           121.42    121.66    121.72    127.78
6           124.13    122.39    128.34    126.16
7           125.62    126.22    126.12    126.40
8           127.06    125.14    129.50    126.82
9           133.60    134.91    132.50    134.55
10          131.62    121.96    135.58    131.80

Δ(exp - calc) and Δ(exp - calc)^a entries (assignment to substituent and position not recoverable from the source layout):
0.04 -0.11
-0.34 -1.07 -0.66 0.31 0.03 0.32 0.20 0.23 0.15 0.18
2.28 -0.12 -0.14 -0.33 -0.15 -0.41 -0.23 -0.45
1.97 4.41
-1.35 -4.83 -2.44 0.31 0.21
0.28 0.13 -0.21
0.63 -0.19 -0.16
0.11 0.49
0.36
2.64 4.33 2.42 -0.12 0.01 -0.14 -0.55 -0.35 0.60 0.05
0.27
0.11 -0.04
0.34
0.01 -0.07

^a Using a Y matrix containing only ipso and ortho positions.
models for the subclasses in the PLS analysis. A limitation of this approach is that a larger number of correctly assigned spectra must be available to represent each subclass accurately.
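The training-set requirements listed earlier as items (a) and (b) lend themselves to a simple check before any model is built. The sketch below is a hypothetical helper, not part of the published procedure: the function name, the message wording, and in particular the class labels in `EXAMPLE_CLASSES` are assumptions made for illustration (the text does not tabulate which substituent belongs to which class).

```python
REQUIRED_CLASSES = ("alkyl", "donor", "acceptor", "halogen")

# Illustrative class labels only; these assignments are an assumption.
EXAMPLE_CLASSES = {
    "Me": "alkyl", "OMe": "donor", "NH2": "donor",
    "NO2": "acceptor", "CN": "acceptor",
    "F": "halogen", "Cl": "halogen",
}

def check_training_set(substituents, class_of, min_size=6,
                       benzene_measured=None):
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    # (a) at least six to eight assigned compounds in the series
    if len(substituents) < min_size:
        problems.append(
            f"need at least {min_size} compounds, got {len(substituents)}")
    # (a) at least one substituent from each of the four classes
    present = {class_of[s] for s in substituents}
    for cls in REQUIRED_CLASSES:
        if cls not in present:
            problems.append(f"no {cls} substituent")
    # (a) fluorine alone should not represent the halogens
    halogens = [s for s in substituents if class_of[s] == "halogen"]
    if "F" in halogens and len(halogens) < 2:
        problems.append("fluorine is the only halogen; add one more halogen")
    # (b) benzene (X matrix) data for the same substituents
    if benzene_measured is not None:
        for s in substituents:
            if s not in benzene_measured:
                problems.append(f"no benzene (X) data for {s}")
    return problems

if __name__ == "__main__":
    print(check_training_set(["Me", "OMe", "NO2", "F", "NH2", "CN"],
                             EXAMPLE_CLASSES))
```

Returning a list of problems rather than a single boolean makes it easy to see which of the rules of thumb a candidate training set fails.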
ACKNOWLEDGMENT

Grants from the Swedish Natural Science Research Council to U.E. and S.W. are gratefully received.
REFERENCES

1. (a) F. W. WEHRLI AND T. WIRTHLIN, “Interpretation of C-13 NMR Spectra,” Chap. 3, Heyden, London, 1976; (b) M. L. MARTIN, J.-J. DELPUECH, AND G. J. MARTIN, “Practical NMR Spectroscopy,” Heyden, London, 1980.
2. H. J. REICH, M. JAUTELAT, M. T. MESSE, F. J. WEIGERT, AND J. D. ROBERTS, J. Am. Chem. Soc. 91, 7445 (1969).
3. N. A. B. GRAY, “Progress in Nuclear Magnetic Resonance Spectroscopy” (J. W. Emsley, J. Feeney, and L. H. Sutcliffe, Eds.), Vol. 15, p. 201, Pergamon, Oxford, 1982.
4. (a) D. F. EWING, “Correlation Analysis in Chemistry, Recent Advances” (N. B. Chapman and J. Shorter, Eds.), Chap. 8, Plenum, New York, 1978; (b) D. J. CRAIK AND B. TERNAI, Org. Magn. Reson. 15, 268 (1981).
5. (a) M. SJÖSTRÖM AND U. EDLUND, J. Magn. Reson. 25, 285 (1977); (b) U. EDLUND AND Å. NORSTRÖM, Org. Magn. Reson. 9, 196 (1977); (c) U. EDLUND AND S. WOLD, J. Magn. Reson. 37, 183 (1980); (d) B. ELIASSON AND U. EDLUND, J. Chem. Soc. Perkin Trans. II, 403 (1981).
6. D. JOHNELS, S. CLEMENTI, W. J. DUNN III, U. EDLUND, H. GRAHN, S. HELLBERG, M. SJÖSTRÖM, AND S. WOLD, J. Chem. Soc. Perkin Trans. II, 863 (1983).
7. W. KITCHING, M. BULLPITT, D. GARTSHORE, W. ADCOCK, T. C. KHOR, D. DODDRELL, AND I. D. RAE, J. Org. Chem. 42, 2411 (1977).
8. N. K. WILSON AND R. D. ZEHR, J. Org. Chem. 47, 1184 (1982).
9. M. P. DERDE, D. COOMANS, AND D. L. MASSART, Anal. Chim. Acta 141, 187 (1982).
10. S. HELLBERG, S. WOLD, AND W. J. DUNN III, “Proceedings of the Symposium on Applied Statistics” (A. Höskuldsson and K. Esbensen, Eds.), Copenhagen, 1982, ISBN 87-88257-00-2.
11. H. TAKAI, A. ODANI, AND Y. SASAKI, Chem. Pharm. Bull. 26, 1966 (1978).
12. J. SEITA, J. SANDSTRÖM, AND T. DRAKENBERG, Org. Magn. Reson. 11, 239 (1978).
13. M. BULLPITT, W. KITCHING, D. DODDRELL, AND W. ADCOCK, J. Org. Chem. 41, 760 (1976).
14. D. F. EWING, Org. Magn. Reson. 12, 499 (1979).