Evaluation of the jackknife technique for fitting multiexponential functions to biochemical data


ANALYTICAL BIOCHEMISTRY 110, 407-411 (1981)

I. A. NIMMO, ANNE BAUERMEISTER,* AND J. E. DALE*

Department of Biochemistry, University of Edinburgh Medical School, Teviot Place, Edinburgh EH8 9AG, Scotland, and *Department of Botany, University of Edinburgh, King's Buildings, Mayfield Road, Edinburgh EH9 3JH, Scotland

Received October 2, 1980

The jackknife technique was tested by fitting a two-exponential function to the time course of disappearance of radioactivity from the area of a wheat leaf that had been fed $^{14}$CO$_2$. The function was fitted by both unweighted and weighted least squares, first without and then with the jackknife. Weighting altered the estimates of the function's parameters, but jackknifing did not. Hence jackknifing did not remove any of the bias introduced by incorrect weighting. The confidence limits of the parameters calculated by jackknifing were greater than those estimated from the variance-covariance matrix of the regression, but similar to those derived from replicate experiments. The jackknife also allowed confidence limits for the rate constants and transit time of the underlying two-compartment model to be derived.

Multiexponential functions describe numerous biochemical and physiological processes, one common example being the disappearance from plasma of some drug or metabolite (1-3). In many instances the data are analyzed in terms of a multicompartment model, and the rate constants for the transfer of material between compartments (the secondary parameters) calculated from appropriate combinations of the coefficients of the exponentials (the primary parameters) (1). The time course of thermal denaturation of a mixture of isoenzymes may also follow a multiexponential decay curve (e.g. (4)); in this case the secondary parameters (the relative activities and thermal decay constants of the enzymes) are equal to the primary parameters.

Ideally the numerical method used to fit the multiexponential function to the experimental data should give estimates of both the primary and secondary parameters that are unbiased and of defined precision. The latter is particularly important, so that differences between individuals or treatments can be assessed statistically. In practice bias should be absent as long as the error in the data is of known distribution (e.g., Gaussian) and variance (so that the individual points can be weighted correctly). Unfortunately the first of these requirements is not as easily met as might be supposed, because the distribution of error in biochemical experiments has not often been determined (5), and the same applies to the second, albeit to a lesser extent (5). The precisions of the primary parameters can be estimated from the final fit of the multiexponential function to the data, but they are of doubtful validity, especially if the model is severely nonlinear (6). The precisions of the secondary parameters are likely to be even less reliable, because when they are calculated the covariances of the primary parameters have to be taken into account as well as their variances (6). Thus the results of statistical tests carried out with precisions estimated from the final fit could easily be misleading.
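For reference, the usual first-order (delta-method) rule for the variance of a secondary parameter $f(\theta)$ computed from primary estimates $\hat{\theta}$ is written out below. The notation is introduced here purely for illustration and is not taken from the original paper; it shows why the covariances (the terms with $i \neq j$) enter alongside the variances, and it is only approximate when $f$ is strongly nonlinear.

```latex
\operatorname{Var}\bigl[f(\hat{\theta})\bigr] \approx
\sum_{i}\sum_{j}
\frac{\partial f}{\partial \theta_i}\,
\frac{\partial f}{\partial \theta_j}\,
\operatorname{Cov}\bigl(\hat{\theta}_i, \hat{\theta}_j\bigr)
```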

0003-2697/81/020407-05$02.00/0
Copyright © 1981 by Academic Press, Inc. All rights of reproduction in any form reserved.


A possible way of reducing the bias in the parameters and of calculating realistic variances for them is to subject the data to the jackknife technique (7). This technique has recently been applied in the fitting of rate equations to enzyme kinetic data (8-11), with qualified success: its capacity to reduce bias was not proven, but it did seem to give reasonable confidence limits for the parameters. In this paper we shall evaluate the performance of the jackknife technique in fitting a two-exponential function, particularly with respect to its capacity for reducing any bias caused by incorrect weighting and for giving reliable confidence limits for the parameters. We have chosen as data the time course of disappearance of radioactivity from the area of a wheat leaf fed with a pulse of $^{14}$CO$_2$ (12), because the weighting factors of the individual points can be calculated simply by assuming that the numbers of counts recorded follow a Poisson distribution. Since the jackknife technique has not been used much by biologists, we shall first describe how it is applied.

THEORY AND METHODS

Jackknife technique. Suppose one has a good method for fitting the equation, but one which may produce biased estimates of the parameters $\theta$. Use it to fit the equation to all $n$ data points, and call the estimates of the parameters $\hat{\theta}$. Now divide the data into $g$ groups of size $h$ ($g \cdot h = n$), and delete the $i$th group from the rest. Fit the equation to the remaining $(g-1) \cdot h$ points, to get $\hat{\theta}_{-i}$. Reinstate the $i$th group of data, delete the $j$th, and calculate $\hat{\theta}_{-j}$. Repeat, dropping each group of data in turn. From the values of $\hat{\theta}$ and $\hat{\theta}_{-i}$, calculate $g$ "pseudovalues" of each of the parameters (e.g., $\tilde{\theta}_i$):

$$\tilde{\theta}_i = g\hat{\theta} - (g-1)\hat{\theta}_{-i}. \qquad [1]$$

($\tilde{\theta}_i$ is the weighted mean of $\hat{\theta}$ and $\hat{\theta}_{-i}$.)
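The deletion scheme and the pseudovalue formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the program used in this study: the function name `jackknife_pseudovalues`, the caller-supplied `fit` routine, and the choice of forming group i from every g-th point are all assumptions made for the example.

```python
def jackknife_pseudovalues(fit, points, g):
    """Delete-one-group jackknife pseudovalues.

    fit    -- hypothetical callable: list of data points -> list of estimates
    points -- the n data points (n is assumed to be a multiple of g)
    g      -- number of groups; group i holds points i, i+g, i+2g, ...
    """
    n = len(points)
    if n % g != 0:
        raise ValueError("this sketch assumes g groups of equal size")

    theta_hat = fit(points)  # estimates from all n points
    pseudovalues = []
    for i in range(g):
        # Drop group i and refit to the remaining (g-1)*h points.
        kept = [p for k, p in enumerate(points) if k % g != i]
        theta_minus_i = fit(kept)
        # Pseudovalue of each parameter: g*theta_hat - (g-1)*theta_minus_i
        pseudovalues.append(
            [g * full - (g - 1) * part
             for full, part in zip(theta_hat, theta_minus_i)]
        )
    return theta_hat, pseudovalues


# Toy usage: the "fit" is simply the sample mean of the points.
def mean_fit(points):
    return [sum(points) / len(points)]


theta_hat, pseudo = jackknife_pseudovalues(mean_fit, list(range(12)), g=6)
```

When the estimator is the sample mean, each pseudovalue reduces to the mean of the deleted group, which makes a convenient check on the bookkeeping before a nonlinear fitting routine is substituted for `mean_fit`.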


The final estimates of the parameters are the means of the $g$ pseudovalues, and their confidence limits are calculated by assuming the pseudovalues follow Student's $t$ distribution with $(g-1)$ df (7).

Data. The experimental data (16 different sets) were for the time course of disappearance of radioactivity from a region of a wheat leaf that had been fed a pulse of $^{14}$CO$_2$ (12). During a typical experiment the radioactivity ($y$) fell from 2000 to 400 cpm over a period of time ($t$) of 24 h, and 42-60 readings were taken.

Numerical method. The two-exponential function $y = X_1 \exp(-\lambda_1 t) + X_2 \exp(-\lambda_2 t)$ was fitted to the data by a versatile nonlinear regression program (13), written in the Edinburgh language IMP and run on an ICL 2970 computer at the Edinburgh Regional Computing Centre. Standard errors of the parameters were calculated from the variance-covariance matrix of the final fit. The experimental points were first weighted equally (hereafter called "unweighted"), and then weighted by the reciprocals of their variances (hereafter called "weighted"), the variances being calculated by assuming the number of counts recorded followed a Poisson distribution. The data were divided into six groups (i.e., $g = 6$) by omitting every sixth point, and pseudovalues of the primary parameters ($X_1$, $\lambda_1$, $X_2$, $\lambda_2$) computed. From them were derived pseudovalues of the secondary parameters, namely, the rate constants $k_{01}$, $k_{12}$, and $k_{21}$ and the transit time $\tau$ of the two-compartment model in Fig. 1 (12). Means of the primary and secondary parameters and their SE's were calculated from the corresponding pseudovalues.

RESULTS

The results for 12 of the 16 experimental sets of data are summarized in Table 1 (the other 4 sets have been omitted for the reason given below). The upper rows of the table show that the weighted regression gave appreciably lower estimates of the exponential parameters ($\lambda_1$ and $\lambda_2$) than did the unweighted one, and there were smaller but similar differences in the preexponential parameters ($X_1$ and $X_2$). On the other hand essentially the same estimates of all four parameters were derived from the jackknifed and corresponding complete sets of data.

TABLE 1
EFFECTS OF WEIGHTING AND JACKKNIFING ON ESTIMATES OF PARAMETERS OF TWO-EXPONENTIAL FUNCTION

                      Unweighted                     Weighted
Parameter        All data    Jackknife          All data       Jackknife

$X_1$            100         100.2 ± 1.4        98.0 ± 1.4     99.2 ± 1.7
                 1.0         1.9                1.9            3.1

$\lambda_1$      100         100.5 ± 2.1        88.3 ± 6.4     89.5 ± 6.2
                 2.0         3.3                2.5            3.2

$X_2$            100         100.0 ± 0.7        93.3 ± 3.5     93.5 ± 3.4
                 1.7         1.8                1.2            1.5

$\lambda_2$      100         100.0 ± 1.6        86.4 ± 5.2     86.7 ± 5.0
                 4.4         4.3                2.9            4.3

Note. The function $y = X_1 \exp(-\lambda_1 t) + X_2 \exp(-\lambda_2 t)$ was fitted to all the data for each experiment by both weighted and unweighted nonlinear least squares. The data were then divided into six subsets and reanalyzed by the jackknife technique (see text). Results are for 12 separate experiments, and have been normalized by setting to 100 the estimates of the four parameters derived for each experiment by unweighted nonlinear least squares. Upper rows: means ± SD (n = 12) of parameter estimates. Lower rows: average normalized SE's (from final variance-covariance matrix to fit to all data or the six pseudovalues of the jackknife) of parameters calculated from each experiment.

Table 1 (lower rows) also gives the mean SE's of the parameters, calculated both from the final variance-covariance matrix of the fit to the complete sets of data and from the pseudoestimates. The SE's from the pseudoestimates are on average higher than their counterparts from the variance-covariance matrix. Since the former are associated with only 5 df (Student's $t$ = 2.571 at $P$ = 0.05), they imply larger confidence limits for the parameters than do the latter, which are based on at least 38 df (Student's $t$ = 2.024 at $P$ = 0.05). Four sets of data were classified as anomalous and excluded from Table 1. "Good" sets gave pseudovalues that were all very similar, whereas the anomalous

TABLE 2
PSEUDOVALUES FROM GOOD AND ANOMALOUS DATA SETS

Points omitted      $X_1$ (cpm)   $10^3\lambda_1$ (min$^{-1}$)   $X_2$ (cpm)   $10^5\lambda_2$ (min$^{-1}$)

Good data set
0                   1099          14.77                          431           73.4
1, 7 ... 37         1050          13.86                          422           72.0
2, 8 ... 38         1141          16.01                          452           79.7
3, 9 ... 39         1101          14.83                          432           74.9
4, 10 ... 40        1101          15.20                          436           74.9
5, 11 ... 41        1095          14.52                          424           70.4
6, 12 ... 42        1099          13.71                          414           68.1

Anomalous data set
0                   1606          14.44                          798           44.1
1, 7 ... 37         1873          18.96                          907           61.9
2, 8 ... 38         1490          13.76                          813           46.5
3, 9 ... 39         1518          14.07                          820           49.1
4, 10 ... 40        1602          13.96                          772           40.6
5, 11 ... 41        1559          14.00                          804           44.8
6, 12 ... 42        1703          13.63                          706           27.1

Note. The pseudoestimates were calculated by applying the jackknife with the unweighted regression (see text).
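As a worked illustration of how pseudovalues lead to a final estimate and its confidence limits, the sketch below uses the six $X_1$ pseudovalues listed for the good data set in Table 2 (the rows in which points were omitted). It assumes the procedure described under Theory and Methods (mean of the pseudovalues, Student's t with g - 1 = 5 df); the variable names are illustrative, and the value t = 2.571 is the one quoted in the Results.

```python
import math
import statistics

# X1 pseudovalues (cpm) for the good data set, Table 2 (rows with points omitted)
pseudovalues = [1050, 1141, 1101, 1101, 1095, 1099]

g = len(pseudovalues)
estimate = statistics.mean(pseudovalues)             # jackknife estimate of X1
se = statistics.stdev(pseudovalues) / math.sqrt(g)   # SE of the mean pseudovalue
t_5df = 2.571                                        # Student's t, 5 df, P = 0.05

lower = estimate - t_5df * se
upper = estimate + t_5df * se

print(f"X1 = {estimate:.0f} cpm, SE = {se:.1f} cpm, "
      f"95% limits {lower:.0f} to {upper:.0f} cpm")
```

For these six values the estimate is about 1098 cpm with limits of roughly ±30 cpm, so the all-data estimate of 1099 cpm lies well inside the interval.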


FIG. 1. Two-compartment model for distribution of radioactivity in leaf fed pulse of $^{14}$CO$_2$. The sum ($q$) of the radioactivities in compartments 1 ($q_1$) and 2 ($q_2$) is recorded. The rates of change with time ($t$) of the latter are: $dq_1/dt = k_{12}q_2 - (k_{01} + k_{21})q_1$, $dq_2/dt = k_{21}q_1 - k_{12}q_2$. The time course of disappearance of radioactivity is $q = X_1 \exp(-\lambda_1 t) + X_2 \exp(-\lambda_2 t)$, and the rate constants and transit time ($\tau$) are (12): $k_{01} = (X_1\lambda_1 + X_2\lambda_2)/(X_1 + X_2)$, $k_{12} = \lambda_1\lambda_2/k_{01}$, $k_{21} = (\lambda_1 + \lambda_2) - (k_{01} + k_{12})$, $\tau = (X_1\lambda_2 + X_2\lambda_1)/(\lambda_1\lambda_2(X_1 + X_2))$.
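The conversion from primary to secondary parameters given in the caption of Fig. 1 can be written as a small function. This is a sketch only: the function name `secondary_parameters` is hypothetical, the expressions are transcribed from the caption, and the input values are the all-data estimates of the good data set in Table 2, converted to min$^{-1}$.

```python
def secondary_parameters(X1, lam1, X2, lam2):
    """Rate constants and transit time of the two-compartment model of Fig. 1.

    X1, X2     -- preexponential parameters (cpm)
    lam1, lam2 -- exponential parameters (min^-1)
    Returns (k01, k12, k21, tau); the k's are in min^-1 and tau in min.
    """
    k01 = (X1 * lam1 + X2 * lam2) / (X1 + X2)
    k12 = lam1 * lam2 / k01
    k21 = (lam1 + lam2) - (k01 + k12)
    tau = (X1 * lam2 + X2 * lam1) / (lam1 * lam2 * (X1 + X2))
    return k01, k12, k21, tau


# Illustrative input: Table 2, good data set, no points omitted.
k01, k12, k21, tau = secondary_parameters(1099, 14.77e-3, 431, 73.4e-5)
print(f"k01 = {k01:.5f}, k12 = {k12:.5f}, k21 = {k21:.5f} min^-1, tau = {tau:.0f} min")
```

Applying the same function to each set of primary pseudovalues in turn gives pseudovalues of the secondary parameters, from which means and SE's follow without any explicit combination of variances and covariances.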

sets produced one pseudovalue that was different from the rest; an example of each type is given in Table 2. These aberrant pseudovalues were always derived from the jackknifed subsets of data that lacked the first data point, and could reflect its having an abnormally high value. (If such an abnormally high point were omitted, $X_1$ and $\lambda_1$ would be expected to fall and the pseudovalues calculated from them to rise: see Eq. [1].) Thus the jackknife technique may help one detect lack of fit in the very early part of the curve.

The variations in the estimates of the secondary parameters ($k_{01}$, $k_{12}$, $k_{21}$, and $\tau$) reflected those of the primary parameters from which they were derived (see Fig. 1). For example, weighting led to reductions in the three rate constants but an increase in the transit time, and the coefficients of variation of the four parameters, calculated from the SE's of the pseudoestimates, were usually between 3 and 5%.

DISCUSSION

The jackknife technique was devised to reduce bias in parameter estimates and produce valid confidence limits for them (7). Incorrect weighting is a common cause of bias when the function being fitted is a sum of exponentials (14) or the rate equation for an enzymic reaction (15). We therefore chose to analyze data for radioactive counts, because the variance of each point, and hence its weighting factor, could be derived by assuming the number of counts recorded followed a Poisson distribution. Ideally for jackknifing the data should be divided into subsets by leaving out one point at a time, but this was clearly impractical, and our choice of six subsets was large enough to give reasonably good means and SE's without requiring an inordinate amount of computer time.

Our results confirmed that different weighting functions produce systematically different estimates of the parameters (especially $\lambda_1$ and $\lambda_2$, in our case) (14). On the other hand, the jackknife made very little difference to these estimates, and therefore does not seem to have reduced any bias incurred by incorrect weighting (a conclusion which does not depend on one's assuming that either of the weighting functions used was "correct").

It is difficult to assess objectively whether the confidence limits derived from the jackknife are valid. Their theoretical advantage is that they do not depend on dubious assumptions about the nature of the final sum-of-squares surface, or, so far as the secondary parameters are concerned, on equally dubious rules for combining variances and covariances (6,8). In practice they were larger than those calculated in the conventional way from the final variance-covariance matrix of the regression, but very similar to those deduced from experiments on different plants (12). Therefore they seem to be reliable.

One of the criticisms of the jackknife in the context of enzyme kinetics has been its tendency to give "out-of-range" estimates: namely, a set of pseudovalues that does not bracket the original estimate based on the complete set of data (9,10). In no instance have we observed out-of-range behavior. Another point of uncertainty is whether the parameter estimates $\hat{\theta}$ and $\hat{\theta}_{-i}$ should be transformed, for example, to their logarithms, before the pseudovalues are calculated from them (cf. Eq. [1]) (7,8): this would stop the pseudovalues from taking biologically impossible negative values. In only 2 instances out of 384 did we encounter pseudovalues of the wrong sign.

We have found the jackknife to be a useful (although subjective) tool for detecting poor fits. One example is the occasional observation of anomalously large values of the pseudoestimates obtained by omitting the first data point, implying this point to be greater than predicted (see Table 2). Another is the behavior of the regression program in converging to the minimum for the different subsets of data: with "good" data the program converged to the minimum from a common starting point in five to seven iterations, whereas with poor data several different starting points and eight or more iterations were required. (In fact it has recently been suggested (16) that a third, very fast, exponential is needed to describe adequately the loss of radioactivity from a wheat leaf fed a pulse of $^{14}$CO$_2$.)

In summary, one can infer from both our results and those for enzymic kinetics (8,9) that the jackknife technique does not reduce the bias in parameter estimates which is caused by incorrect weighting of the experimental data. On the other hand it seems to give reasonable confidence limits for the parameters. This property of the jackknife, together with its capacity to help detect lack of fit, implies that the technique could become a popular way of fitting biochemical equations to data.
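The weighted and unweighted regressions compared in this study can be illustrated with a short sketch. The original fits were made with the IMP program of ref. (13); the version below instead uses SciPy's `curve_fit`, synthetic Poisson-distributed counts, and made-up parameter values and starting guesses, all of which are assumptions for illustration only. It also assumes that each reading is a raw count, so that its variance can be approximated by the count itself.

```python
import numpy as np
from scipy.optimize import curve_fit


def two_exp(t, X1, lam1, X2, lam2):
    """Two-exponential function y = X1*exp(-lam1*t) + X2*exp(-lam2*t)."""
    return X1 * np.exp(-lam1 * t) + X2 * np.exp(-lam2 * t)


def fit_two_exp(t, y, p0, weighted):
    """Fit by nonlinear least squares; return estimates and their SE's.

    weighted=True weights each point by 1/variance, taking variance = y
    (Poisson assumption; requires all counts > 0).
    """
    sigma = np.sqrt(y) if weighted else None
    popt, pcov = curve_fit(two_exp, t, y, p0=p0,
                           sigma=sigma, absolute_sigma=weighted)
    return popt, np.sqrt(np.diag(pcov))


# Synthetic example (hypothetical values): 48 readings over 24 h, t in minutes.
true_params = (1100.0, 0.015, 430.0, 0.00073)
t = np.linspace(0.0, 1440.0, 48)
rng = np.random.default_rng(0)
y = rng.poisson(two_exp(t, *true_params)).astype(float)

p0 = (1000.0, 0.01, 400.0, 0.001)  # rough starting guesses
unweighted, se_unweighted = fit_two_exp(t, y, p0, weighted=False)
weighted, se_weighted = fit_two_exp(t, y, p0, weighted=True)
```

Either call can be wrapped as the fitting routine inside a jackknife loop; the SE's returned here are the conventional ones from the variance-covariance matrix, which are the quantities the jackknife SE's are compared against in Table 1.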


ACKNOWLEDGMENTS

We thank G. L. Atkins and E. G. Beare for their help.

REFERENCES

1. Atkins, G. L. (1969) Multicompartment Models for Biological Systems, Methuen, London.
2. Beckett, G. J., Douglas, J. G., Nimmo, I. A., Finlayson, N. D. C., and Percy-Robb, I. W. (1980) Clin. Chim. Acta 100, 193-200.
3. Nimmo, I. A., Smith, R. H., Dolder, M. A., and Oliver, M. F. (1976) Clin. Sci. Mol. Med. 50, 401-407.
4. Nimmo, I. A., Clapp, J. B., and Strange, R. C. (1979) Comp. Biochem. Physiol. 63B, 423-427.
5. Atkins, G. L., and Nimmo, I. A. (1980) Anal. Biochem. 104, 1-9.
6. Draper, N. R., and Smith, H. (1966) in Applied Regression Analysis, pp. 263-304, Wiley, New York.
7. Miller, R. G. (1974) Biometrika 61, 1-15.
8. Cornish-Bowden, A., and Wong, J. T.-F. (1978) Biochem. J. 175, 969-976.
9. Duggleby, R. G. (1979) Biochem. J. 181, 255-256.
10. Cornish-Bowden, A., and Wong, J. T.-F. (1980) Biochem. J. 185, 535-536.
11. Atkins, G. L., and Nimmo, I. A. (1980) in Kinetic Data Analysis: Design and Analysis of Enzyme and Pharmacokinetic Experiments (Endrenyi, L., ed.), pp. 121-136, Plenum, New York.
12. Bauermeister, A., Dale, J. E., Williams, E. J., and Scobie, J. (1980) J. Exp. Bot. 31, 1199-1209.
13. Atkins, G. L. (1971) Biochim. Biophys. Acta 252, 405-420.
14. Atkins, G. L. (1974) Biochem. J. 138, 125-127.
15. Dowd, J. E., and Riggs, D. S. (1965) J. Biol. Chem. 240, 863-869.
16. Dale, J. E., Bauermeister, A., and Williams, E. J. (1981) in Mathematics and Plant Physiology (Rose, D. A., and Charles-Edwards, D. A., eds.), Academic Press, London/New York, in press.