Time Series Methods in Biological and Medical Data Analysis

Time Series Methods in Biological and Medical Data Analysis

TIME SERIES METHODS IN BIOLOGICAL AND MEDICAL DATA ANALYSIS A. J. Jakeman and P. C. Young Centre for Resource and Environmental Studies, Australian ...

2MB Sizes 1 Downloads 76 Views

TIME SERIES METHODS IN BIOLOGICAL AND MEDICAL DATA ANALYSIS A.

J. Jakeman and P. C. Young

Centre for Resource and Environmental Studies, Australian National University, Canberra, Australia

Abstract . This paper describes how time-series methods such as the recursive instrumental variable estimation procedure can be used to aid the analysis of biological and medical data obtained during transport experiments on the circulatory systems of both man and plants. The simplicity of the stochastic approach used contrasts with the conventional deterministic procedure based on regularisation. The proposed method appears to have applications in many areas of science and could lead to advances in the design of experiments for the collection and analysis of biological and medical data. Keywords . Discrete-time systems, difference equations, linear differential equations, identification, parameter estimation, stochastic systems, transfer functio~s, biology, biomedical. INTRODUCTION

time- varying linear; non- linear but linearin- the-parameters; and muZti - in~ut muZtioutput (Young and Jakeman, 1979). A non-

The use of time series methods for the analysis of biological and medical data has a rich history dating from early works like those of McKendrick (1926), Kingston (1936, 1937), Kendall (1941) and O'Bryan (1948) to later work in the 1950s including Gibson (1950), ~ loriyana (1950), Moran (1950), Utida (1950), Knott (1953), Hannan (1955), Miyashita (1955) and Shinozaki (1958) . Clearly, with the advent of electronic computers, research in this area has increased dramatically even to the point where specialist journals have evolved to cope with the demand for publication. Today each issue of IEEE Transactions on Bio-medicaZ Engineering, for example, contains numerous articles related to the time-series analysis of bio-medical data.

biological example will also be given to indicate a not so obvious application but one which has certain similarities with the bio-medical examples. It will show how IV methods have been used to improve single photon fluorescence decay analysis.

Our stochastic approach will be compared with a conventional deterministic procedure based on r eguZarisation (Tihonov, 1963). Furthermore, it can be extended to applications ~oolsaet and others, 1975, 1976) involving the fitting of exponentials to data. In this latter application, the advantages over most previous methods (Van Mastrigt, 1977, for exampl~ are first that the parameter estimates are consistent and asymptotically unbiased when the noise is non-white; second, that no initial estimate~ of the fitting parameters are required; and , further, that the number of exponentials does not have to be known a pr iori . If the problem is solved in iterative, en-bloc (off-line) form then the programming is simple, with only a slight modification to a linear regression program being necessary . Recursive (on-line) techniques however, may be preferable if the experimenter wishes to exploit the possibility of continuous experimentation: for example, on-line identification and estimation allows the monitoring of experiments and hence their termination when sufficient model accuracy has been obtained.

The contribution of this paper is as follows: we describe the use and applicability of time-series model identification and estimation procedures, based on methods such as instrumental variables (IV), for the analysis of transport data obtained from tracer experiments on the circulatory systems of both man and plants. The examples given are simple but important ones involving the determination of functionals of the impulse response of a noisy, discrete, linear, time-invariant, single input-single output system. In these examples we have preferred to use recursive IV methods of identification and estimation, both for their basic simplicity and their robustness in the presence of noise of a fairly general nature. In addition, we note here that the IV methods have been extended to systems that possess any of the properties: continuous - time;

It appears that the general stochastic approach described here has applications in many areas of science (Jakeman, Steele and Young, 1978; Minchin, 1978; Whitehead and 1219

A. J. J ak eman and P. C. Youn g

1220

Young, 1975 ; Young, 1971) but particularly in biological and medical data analysis. Further possible uses here include the analysis of tracer data in nuclear medicine (Snyder, 1973), the fitting of visco-elastic relaxation curves in investigative urology (Coolsaet and others, 1976) and the calculation of ventilatory responses of differential equation models from inputs of transient stimuli and output recovery data. The latter application provides insight into the role of the carotid body chemoreceptors in the control of ventilation (Smith and Dutton, 1974). PROBLEM FORMULATION AND METHODS OF SOLUTION The aim of this section is to briefly outline the alternative simple formulations with which we will be mainly concerned and to refer the reader to the appropriate literature for further details of the methods of solution . The circulatory applications in sections 3 and 4 and the fluorescence application in section 5 can be modeled as the Volterra convolution equation of the first kind x(t) =

t

J

o

k(t-s) u(s) ds

(1)

with u(t), x(t) and k(t) denoting the input, output and impulse response, respectively, of a noiseless linear system . Using a rational transfer function description of the system, equation (1) can be written as either a differential equation model (1 + a

1

D + .. . + a q Dq) x(t)

= (bo + bID + ... + bpDP) u(t)

(2)

or its "equivalent" difference equation model

(3)

The notation D represents the derivative operator, z-1 the backward shift operator, i.e. z-l xk = xk_ ' and the subscript k 1 denotes the sample of the associated variable at the k-th time instant. The assumptions made in proceeding from (1) to (2) to (3) are well known and have been given in Jakeman and Young (1978). Of course, the stochastic aspects associated with any particular problem can be introduced in a numbe r of different ways. In the discrete-time data case, which is the principle concern of the present oaper, this can be accomplished most simply by assuming that the measured output is

Yk= xk+ t; k·

(4)

The formulation (3) with (4) has appeared often (Box and Jenkins, 1970; Young, 1974) and a consistent, relatively efficient solution can be found by instrumental variable techniques (Young, 1974) if the noise t; k is uncorrelated with the input uk . If, in addit i on, t; k can be assumed as generated from a white noise process e k by an autoregressive moving average (ARMA) model of the form

(5)

then an asymptotically efficient, refined IV technique (Young and Jakeman, 1978) can be used on the model (3) - (5)) In addition, a procedure based on either the basic or refined instrumental variable technique has been proposed for the order identification of models of this form (Young, Jakeman and McMurtrie, 1978) . A large range of simulations is given in this latter reference and from them, it appears that the criteria used for correct model order detection are quite discerning in practical terms. In the ne xt three sections, an overview is given of three problems whose formulations involve a characterisation such as equation (1) and with which we have been closely involved. !n each case, the parameters T ~ = [aI' a 2 , ··· ,a n , bo ' b1 ,· ·· ,bm J

and the order (n, m) are required to estimate the impulse response k(t-s) in the usual manner (Jakeman , Young and Anderssen, 1978) . However, the specific properties of interest usually differ between examples and these will be noted. TRANSPORT FUNCTIONALS OF THE CIRCULATION IN MAN . In this application two curves, the input and the output , are recorded at different sites downstream from the injection of indicator into the circulatory system. The difference between any pair of curves represents the dynamic delay imposed by the passage of blood from one sampling site to another and this can be considered as a frequency distribution of transit times between the two sites. In this way, curves 1

This is sometimes referred to as the Bo x Jenkins or transfer function model .

Time s series methods in bi ol o gi ca l a nd me di ca l dat a ana l ysi s

recorded from different injection sites in the central circulation allow the computation of trans~ort functions descriuins the flow of blood in the portions of circulation between the injection sites . The quantitative properties which seem to be of most interest in this application are the steady state gain (or zeroth moment) and the first three moments of k(t) (Jakeman, Young and Anderssen, 1978; Weinstein and others, 1973) . The steady state gain merely serves as a validity check; the first moment or mean transit time when multiplied by the cardiac output yields the circulating blood volume; the last two moments may be of physiological interest (Weinstein and others, 1973); the second moment about the mean, for instance, measures dispersion. A conventional method of solution for this problem is termed regu~arisation . If we consider a finite difference or movingaverage type approximation to (1) xk = kk =

o + kk_1 u1 + ... + ko uk

U

Ik (z-l) uk '

then regularisation, in simple terms, is the minimisation of the error in this approximation subject to a smoothing constraint on the solution. A typical example would be min l lx k - I k(z-l) uk ll L 2 + 2

Y

2

I lx(t) II w

with 11.ll w2 being an L2-norm (or Sovolev norm) and y called the regularisation parameter. The disadvantages of this method are that y must usually be determined by trial and error and that it is difficult to choose the norm w so that it is consistent with removing the correct type of error in the data. A regularisation solution, given smooth circulation data, was given by Jordinson and others (1976) and this is plotted in FiS. 1 against the impulse response obtained from the type of procedure proposed in section 2. In the regularisation an L -norm was chosen and, as one might 2 expect, this has the undesirable effect of lowering the peak values of the true impulse response. Weinstein and others (1973) have considered a more general problem than the above . They have developed a method for obtaining the distribution of first passage times in the mammalian circulation. Because they deal with the whole circulation rather than just portions, recirculation must be taken into account. The resultant formulation for this new problem is a second kind convolution Volterra equation coupled with one of the first kind which allows for non ideal injection and sampling system distortion. Their method is basically deterministic and it is thought that a stochastic approach based upon that for the simpler problem will be

122 1

relevant here also . TRANSPORT PROPERTIES OF THE PHLOEM IN PLANTS This application is similar to that of the previous section. However, it seems th zt the transfer function modeling approach was ignored here until recently, to the extent that the results obtained were not independent of the type of loading process of the tracer. We were supplied with radioactive tracer data fi'om experiments conducted on Zea mays by Peter Minchin of DSIR, New Zealand. The results of our analysiS appear in Fig. 2 which shows the actual and model predicted concentrations of Carbon 11 at an output point in the plant, using the measurement of input tracer material upstream . The properties of interest here seem to be pathway loss and mean transport speed. The work by Minchin (1978) has shown that the usual frequency of transport measurements are probably too high and that better ~esults for the discrete time model can be obtained by using a larger sampling interval. An alternative would be to use high frequency sampling and the continuous-time (Young and Jakeman, 1979) IV approach. EXTENSION OF IV ESTIMATION TO EXPONENTIAL CURVE FITTING Let us first consider a particular problem encountered in single photon fluorescence decay analysis. Here a compound in liquid or vapour state is excited by pulse-like inputs, each input building up an empirical probability density of fluorescence decay with respect to wavelength, which can be considered as the "output" (Jakeman, Steele and Young, 1978). In this case, the impulse response is defined as the true fluorescence decay. The particular quantities in which we are interested are first, the "fl uorescence 1ifetimes " or time constants of the system; and second, the amplitude of the impulse response. They can be used to ascertain en2rgy properties of the excited state, thereby aiding understanding of the mechanism of the excitation and deactivation process€s. It is often important in this application that the dynamic order of the systerr be identified correctly since a given species may possess more than one decay fro p an excited state and the correct number of decays may be unknown. IV identification proved most useful for this purpose. In Fig. 3 a comparison of the actual and predicted model output is shown for the fluorescence of azulene-h8 which was identified as having a single decay prorerty. This conforms with the theoretical analysis in this case. Regularisation has also been tried for the analysis of this type of data and Knight and

12 2 2

A. J. Jakema n and P. C. Young

Seling er (1971) remark on the difficu lty of choosing an approp riate smoothing norm w. As in this fluores cence decay problem, many physical and biophysical phenomena yield a signal comprising a sum of exponential terms 1 i ke -a Yle 1t + Y2e -a 2t +... + yqe -a qt (6)

the discret e-time equiva lent of (2), as given by model (3), using an observ ation equation (4) and a basic instrumental variab le approach. Then an en-bloc solutio n is given by the normal-type equations -

a

N , 1:

k=1

~

T'

4 ]a

=

N 1:

k=l

~ Yk

(8)

where and it is often thought desirab le to separate such signals into their components ai' Yi from measured data. An example of this occurs in urology where the aim is to describ e quanti tativel y the dynamic behaviour of urinary bladders or bladder wall-s trips from visco- elastic relaxat ion curves obtained by straini ng with a step input. The relaxa tion consta nts ai' or recipro cal time consta nts, charac terise these stress- decrea se curves , as do the amplitudes y. from which the elastic model 1 can be calcula ted. These quanti ties in turn have been used to charac terise the behaviour of the bladder and bladder strips. In fact, the relaxat ion consta nts determined for bladder strips agree quite well with those for whole bladde rs, implying that the value of these consta nts is independent of geometry. (Coolsaetand others , 1975, 1976). A General Approach to the Exponential Decay Curve Fitting Problem. We now turn attenti on to developing a rationale for the exponential curve fitting method to be proposed here which is based on our fluorescence decay researc h . If equation (2) is solved with u(t) defined as a Dirac delta functio n, then the solutio n in Laplace(s) transform terms is x(s) When the denominator (or charac teristi c polynomial) has eigenvalues on the real negativ e axis in the complex plane, the system is charac terised by purely exponential modes of behaviour, so that the response of such a system to a unit impulse is given by x( s)

=

q

Yi

1:

i=1

s+a

(7)

i

and x(t), the inverse transform of (7), is given by (6). As a result , one method of fitting exponentials to noisy data is to assume an impulse input, regard the data to be fitted as the output and proceed by IV identif ication and estima tion. Let us consid er such curve fitting by using

T

4 ~

[-Yk-l ···- Yk-n

6 k ,1··· 6 k_m,l]

"

[-x k_1···-x k_n 6 k,1··· 6 k-m,l]

T

with x being the best estima te of the noisek free output x at sample time k and 6 ,j k k being the Kronecker delta functio n. Of courSl no initial estima tes are necess ary, if an iterati ve en-bloc solutio n is invoked: here initiat ion is by means of the least squares method using 4 !or ~ initial ly, and thereafter computing x from k ,

,

+ bo6 , 1 +. .. + b 6 _ 1 k m k m,

(9)

at the end of each iterati on . In this way , the whole procedure involves little more than solving (8) with a good least squares routine and calcula ting X via (9) for all k after k each iterati on until convergence has been obtained. This simple approach has the disadvantage that the input stimulus is a delta function and so the numerator coeffic ient estima tes are not well defined by the solutio n of (8). A better , albeit more complicated approach, is to utilise the refined IV method (Young and Jakeman, 1978). Here the adaptive prefilters used to induce greate r statist ical efficie ncy perform the additio nal useful service of filteri ng the impulse input signal a~d yieldin g a more complicated input cata structu re. In this manner, estima tes of the numerator coeffic ients are obtained from a "richer " data set and are better smoothed by the estima tion procedure. A caveat , however, must be applied to the above approach: an impulse input is not itself per sisten tly excitin g (Young, 1971; Astrom and Bohlin, 1966) and hence identi fiabili ty problems may sometimes be encountered. Theref ore, the usual statist ical proper ties given to IV techniq ues, like consis tency, must be interpreted loosely . It is felt that, for the most part, this curve fitting procedure will work well in the presence of noise of a quite general charac ter. This is demonstrated by the success that has been achieved when using these methods for the fitting of transfe r functions to unit hydrograph data (Jakeman and Young, 1978) . Finally it should be noted that this general approach to data analys is can also be used to determine the model order

Time s s e ries me thods in bi o l og ic a l a nd me dical data an a l ys i s

(n m) if the criteria suggested in Young, Ja~eman and McMurtrie (1978) are evaluated over all possible model orders . It is then a simple matter to infer the values of the y . and a . as is shown in Jakeman, Steele and 1 1 Young (1978). Curve Fitting: a Cautionary Message The urological application ment~oned ear~ier in this section was chosen spec1ally. W1th the fluorescence application, it can be used to illustrate an important point about other possible applications of cu:ve.fitting. ~t is our feeling that curve f1tt1ng peT se 1S often a poor approach to problem solv~n~, being merely an obvious approach, ent1c1ng in its simplicity of formulation, when knowledge of other more realistic approaches is unknown. But in the field of fluorescence decay analysis, this apparent simplicity coupled with a lack of ade q u ~ te knowl~d g e of sy stems analysis led to ~he 1ntro~uct10n of sophisticated and expens1ve exper1mental equipment, including spec~al ~asers, tha~ could yield inputs appro x1mat1ng the de~lred delta function pulse as closely as poss1ble . In this way, it was hoped that the output or observed fluorescence decay would be close to the "true decay" curve k(t) so that the fitting of exponentials directly to the output could be justified. In Jakeman, Steele and Young (1978) we argue that, with more appropriate "sufficiently exciting" input types, transfer functions, and in turn true decay curves can be more ea~ily and accurately identified perhaps uS1ng less expensive e x periment~ equipment. We note here also that in urology the rela xation curves can be considered as appro ximate step responses to dynamic systems and that a step input as Coolsaet and others ~1975) propose is diffi cult to effect exper1mentall y . Since it seems most likel y that these systems are appro ximately linear and that the approach of section 2 can be implemented, then we would suggest that other more theoretically appropriate inputs, which are also experimentally reproducible, should be considered: in this manner it may well be possible to use the robust, con~istent properties of the IV procedure 1n a fully identifiable conte xt. Conclusions In this paper, we have reviewed briefly certain recent widely applicable developments in the analysis of bio-medical data obtained from planned dyn amic experiments. The new methods seem obvious to the systems analyst, but have apparently not been considered in the various mono-disciplines associated with the practical examples considered in the paper . The robustness of the IV method in the presence of quite general noise contamination makes it an ideal tool for the analysis of dynamic experimental data ?btained from planned experiments where the 1nput signal can be chosen to possess the desirable

1223

statistical properties of "sufficient excitation" and independence from the noise contaminating signals . The IV method does not reouire specific kinds of input perturbation, however, and so can be utilised in its recursive, on-line mode for the analysis of data obtained during the normal operation of the bio-medical system. This could have important implications on future experimental system design . REFERENCES Astrom, K.J., and Bohlin, T., 1966. Numerical identification of linear dynami c systems from normal operating records in; TheoTY of Self Adapti ve ContTol Sy s teme, P. H. Hammond (ed.) Plenum Press: London. Bo x, G.E.P . , and Jenkins, G . ~1., 1970. Time SeTi es Analy s i s , FOTecas ting and ContTol , Holden Day, San Francisco. Coolsaet, B.L.R.A., van Duyl, W.A., van Mastrigt, R. , and Schouten, J.W., 1975. Viscoelastic properties of bladder wall strips. In vest . UTo l . }£, 351-356 . Coolsaet, B.L.R.A., van Mastrigt, R., van Duyl W.A., and Huygen, R.E.F., 1976 . Visco - elastic properties of bladder wall strips at constant elongat i on. Invest . UToz. , 11, 435-440. Gibson, G.D . , 1950. On Gladwin's methods of correlation in tree ring analysis. AmeT. AnthTOp ., 49, 337-340. Hannan, E.J., 1955 . An exact test for correlation between time series. B i ometTika , ~, 316-326. Jakeman , A.J., Steele, L.P., and Young, P.C., 1978. A simple transfer function method for estimating decay functions and its application to fluorescence decay data. CRES Report AS/R26, Australian National University. Jakeman, A.J., and Young, P.C., 1978. Systems identification and estimation for convolution integral equations . ?roe . One Day SeminaT on

Application and NumeTical Sol ution of IntegTal Equa t ions , November 29,1978 , Austral1an

National University. Jakeman, A.J., Young, P.C., and Anderssen, R. S., 1978. Use of the linear systems approach for transport function analysis of portions of the central circulation in man. CRES Report Number AS/R23, Australian National University. Jordinson, R. , Al'nold S., Hinde, B., Miller, G.F., Rodger, J.G., and Kitchin, A.H . , 1976 . Steady state transport function analysis of portions of the central circulation in man. Comput . Biomed . Res ., 2., 291-305 . Kendall, M.G., 1941 . The effect of the elimination of trend on oscillations in time series. J . Roy . Stat . Soc ., 104, 43-52. Kingston, J., 1936-37. On p er~odical ~na~ y sis and its application to productl on s tatlstlcS. Rev is ta de Economia e Es tati s t i ca ., July 25-32 ,

A. J. Jakeman and P . C. Young

1224

October 27-34, January 17-26. Knight, A.E.W., & Selinger, B.K., 1971. The deconvolution of fluorescence decay curves. A non method for real data. Spectrochimica Act~ 27A, 1223-1234. Knott, J . R., 1953. A'.; tomatic frequency analysis. EEG and Clin . Neurophysiol . Supp . , i,17-25. McKendrick, A.G., 1926. Applications of mathematics to medical problems. Proc . Edinburgh Math . Soc ., 44, 1, 98-130. Minchin, P., 197 5. Analysis of tracer profiles in phloem transport. J . Exptal . Botany,~, 1978. Miyashita, K., 1955. Some consideration on the population fluctuation of the rice stem borer. Bull . Nat . Inst . Agric . Sci . Japan , Bull 5, 99-109. Moran, P.A.P., 1950. Some remarks on animal population dynamics . Biometrics , ~, 250-258. Moriyana, M., 1950. Studies on the seasonal variation of Beriberi (time series analysis) . Rev . of Ge~ 23, 353-358 . Studies on the time series analysTS of the seasonal disease. Papers in meteoroZogy and geophysics , 1, 148-152. O'Bryan, D., 1948. Remarks on tree ring analysis techniques in the Southwest. Amer . Anthrop ., 50, 708-713. Shinozaki, K., 1958. Logistic theory of the growth of plant. Bull.Japan.Stat . Soc ., 36-41. Smith, E. J., & Dutton, R.E., 1974. Dynamic model of ventilatory response to changes in P at the carotid body chemoreceptors. IEEE Tran~. Biomed . Eng .,~, 227-231. Snyder, D.L., 1973. Statistical analysis of dynamic tracer data . IEEE Trans . Biomed . Eng ., 20, 11-20. Tlhonov, A.N., 1963. Solution of incorrectly formulated problems and the regularisation me thod. Soviet . Math . Dokl. ,i, 1035-1038.

....

..

Utida, S., 1950. On the equilibrium state of the interacting population of an insect and its parasite. Ecology , 11, 165-175. Van Mastrigt, R., 1977. Constant step approximation of multi-exponential signals using a least squares criterion. Comput.Biol.Med.,1., 231-347. Weinstein, H., Fu, B.S., Acosta, R., Bernstein, B. , & Shaffer,A.B., 1973. Estimation of circulating blood volume from a tracer response curve . IEEE Trans . Biomed.Eng .} 20, 269-277. Whitehead, P.G., & Young, P.C . , 1975. A dynamic stochastic model for water quality in part of the Bedford Ouse river system. In Modeling & rimulation of Water Resource Systems . G.C. Vansteenkiste (ed.) North Holland: Amsterdam. Young, P.C . , 1971. Lectures on recursive approaches to parameter estimation and time series analysis. In: Theory & Practice of Systems Modeling & Identification, ONERA Toulouse Research Centre . Young, P.C., 1974. Recursive approaches to time series analysis. Bull. Inst . Maths . Appl. , 1.Q; 209-224 . Young, P.C., & Jakeman, A.J., 1978. Refined instrumental variable methods of recursive time series analysis Part I: single input, single output systems. Int.J . Control. Young, P.C., Jakeman, A.J., and McMurtrie, R.E., 1978. An instrumental variable ~ethod for model structure identification. CRES Report AS/R22, Australian National University. Young, P.C., & Jakeman, A.J., 1979. Refined instrumental variable methods of recursive time series analysis, Part Ill: Extensions. CRES Report AS/RI7, Australian National Un i vers ity .

N

,

The linear system approximation

.

sa

Jordinson et al (1976) regularization appro ximation

6

Fi.9.. 1.

For' human circulation data, a comparison of the impulse response, obtained by basic instrumental variables on (3) and (4), and a regularisation method.

Times series methods in biological and medical data analysis

1225

......

...

!SI a)

IS!

... wo

.,g

.... !SI

151 N

m

.....

si

...

'" eO ,

e Fig. 2.

2

..

5

8

18

12

14

For phloem tracer data. the actual (e) output and model predicted (-) output using basic instrumental variables on (3) c.nd (in; and their difference.

go & &

. ::2!l

'"ca '~--------r--------r--------r--------.--------'--------.----21 40 6S 8B 182 121a

Fig. 3.

16

For fluorescence data from a species of azulene-h8. the actual (e) output and m~del predicted (-) output using refined instrumental variables on (3) - (5); and ti,eir difference.