( : ()P~ right
Estilll;lli()1I
©
I F:\( : Idt'llIifictttioll ;md S, :-.tl'lll York, l ' K, IQX:)
P
I~H'F),
CLINICAL TIME SERIES: ANALYSIS, MODELLING AND RECURSIVE ESTIMATION R. L. Flood*, M. S. Leaning**, D. G. Cramp*** and E. R. Carson* *Cmlre filr ,HmslIIPmelll IIl1d In/im!laliulI ill Mn/irille, The Cil\' Univenil" NurlhamplulI SqUllre , LUlldUII Eel , UK **Aradnllir Depm/me!/I 0/ Medim/ Physio, Roya/ Free Ho.I/Jilll/ SrlIOO/ 0/ Medicille, Row/alld Hill Slrer! , LOlldoll NW3 , UK ***Deparlme/ll 0/ ChemiclI/ Palhu/o/f.; , Rova/ FI'I'(' Hospila/ Sclwo/ 0/ i'vIedicille, Ruw/alld Hill SI/HI, Lundoll NW3, UK
,r/' /
/
ABstract. The nature and origin of clinical time series data are discussed. A number of model-based techniques of estimation and prediction are outlined and evaluated in worked examples. These examples are relevant to the management of disorders involving fluid/electrolyte acid/base dynamics. A distinction is made between the requirements and potential of both physiologicallybased models and time series models of the Box-Jenkins type. The paper concludes with a proposal as to how such methods, including recursive estimation, may form part of a decision support system. Keywords. Physiological models; time series analysis; recursive estimation; Kalman filtering; fluid-electrolyte balance. INTRODUCTION
(Nelson, 1973; Box and Jenkins, 1976; Chatfield, 1980). In an ARIMA model, the time series Zt is generated from white noise by three filtering operations: a moving average filter, an autoregressive filter, and a non-stationary summation filter. If B denotes t~e time ~elay operator, where BZt=Zt_l' and V the d1fferenc1ng operator, such that VZ = Zt -Zt-l' then an ARIMA (p,d,q) model has t the general form (1 - 0 B - ... - 0 BP) Vdz P t 1 8 + (1 - 8 B - •.• - 8q Bq)a t or 0 l 0(B)Vd zt = 80 + 8 (B)a t (1) where 0 and 8 i ~re the parameters of the autorei gressive and mov1ng average processes of order p and q, respectively.
In general, sequences of measurements of blood and urine substances and other physiological variables are generated in two modes. The first is laboratory-based and has both a low data rate and a low sample size (N ) 30, say). The second arises in online monitoring, as in the intensive therapy unit, and has a fast data rate and a large sample size (N)lOO, say). This paper is concerned with modelbased techniques for estimation and prediction from data that are generated in both modes. In particular, "clinical time series" related to the management of fluid/electrolyte acid/base disorders will be considered. The techniques reviewed include ARIMA (autoregressive integrated moving average) and discrete-time transfer function modelling. These depend solely upon the data. By contrast, the state space model approach considered subsequently embodies representations of the physiological processes underlying the observed variables. An example is given of the recursive estimation of such a state space model. In reviewing each of the above techniques the following format is used: an outline of the technique; a worked example; critical discussion in relation to the clinical time series of interest here. Finally, the various methods are assessed in the wider context of a decision support system for the management of disorders of fluid/electrolyte acid/base dynamics.
The Box-Jenkins methodology (Box and Jenkins, 1976) has three stages: identification; estimation; diagnostic checking. Identification consists of selecting tentative classes of (p,d,q) model and making preliminary estimates of the 0 . and 8. parameters. Identification, which is largely subj~ctive, relies on the estimated autocorrelation function, rk' and the estimated partial autocorrelation function 0kk , (Chatfield, 1980). In estimation the model parameters for a given (p,d,q) model are estimated, usually with maximum likelihood estimation. The validity of the final model is assessed in diagnostic checking, where the whiteness of the residuals (estimates of at) and the significance of adding extra AR or MA terms are considered.
It may be argued that much valuable information in the mass of time series data generated by diagnostic laboratories and on-line monitors is unused. The development of well-based and efficient techniques of obtaining such information is the objective of the work described here.
Example 1: Plasma Potassium Concentration Data. Figure 1 shows a 27 point plasma potassium concentration time series for a patient in the intensive therapy unit. The series appears to have a nOnstationary downward trend. After first differencing of the series to produce stationarity, the autocorrelation and partial autocorrelations are depicted in Fig . 2. The autocorrelation rk falls off quickly after k=2 suggesting an MA(2) process, and the 2& confidence limits (95%) on this hypoth-
ARIMA MODEL BUILDING Autoregressive integrated moving average (ARIMA) stochastic models are generalisations of autoregressive and moving average models capable of dealing with non-stationary univariate time seri e s lfil~
R. 1.. Flood 1'1 (It.
1614
esis show that va lues r3' r4' ... are not significantly different from zero. This finding is confirmed in the partial autocorrelation function 0kk which di es off more slowly. The tentative model identification is thus (0,1,2). Initial estimates of e l ande 2 can be made from rl and r2, either by solution of the non-linear theoretical autocorrelation equations or from chart C in Box and Jenkins (p519, 1976) for MA(2) processes . Allowin g for an approximate standard error or = 0.2 on rk (based on nl), the invertible ranges for e l and 92 are o
1
82 8
3
(Const.)
a.1
dynamic respon se of the patient to be determined and hence good control exercised . An approach to TF model building in the t ime domain was developed ~y Box and Jenkins (~976). 1£ X and Yt ar~ the t 1nput and output ser 1es , then the TF model 1S given by the linear difference equation (expressed in B opera t or form) Y
= w (B)
t
6(B)
X
t-b
+ N
t
(3)
6(B) and w(B) are polynomials of or ders rand s, with 60 = 1. The system "dead time" is band Nt is a noise term assumed to be generated by anARIMA process. Eq. (3) is referred to as an (r,s,b) TF model and may be written in terms of the impulse fun c tion v(B) (v o + vlB + V2B2 + ••• vkBk + ••• ) as (4)
0.589
°i 0.264
0.360
0.264
-0.0102
0.0096
Formal estimation of the (0,1,2) model was performed with maximum likelihood estimation using starting values 81 = 0.4, 82 = 0.6. The results are given in Table 1. The errors (Oi) of the estimates are fairly high (due to the small sample size) and the estimates are near the invertible limit (al + 92 = 0.949). The fitted model is thus ~Zt = -0.0102 + (1 - 0.589B - 0.360B2)a (2) t where s2 = 0.06, the mean sum of squares of the residuats.
In diagnostic checking the residua1s of the model fit were not found to be serially correlated. A number of mixed models (p f 0, q f 0) with p=5 (based on the partial autocorrelation function) were considered, but these did not significantly reduce the sum of squares and were rejected. Finally, forecasts were made with the fitted model (Eq. (2» with a lead time of 20 days. Figure 3 shows the expected value and 95% confidence intervals. The trend of plasma potassium down out of the normal reference range continues. Discussion. In the above example an ARIMA model is fitted to a sequence of n = 27 clinical data points. The forecasts of the model appear reasonable, but no claim is made to its uniqueness. Since the standard error of the au~ocorrelation estimate is approximately equal to nI, some writers observe that at least 50 data points are required in ARIMA model building (Box and Jenkins, 1976). In online patient monitoring, with frequent sampling this sample size is readily achievable, however in data derived from laboratory results much sparser time series are to be expected. For clinical implement a tion, given acceptable data rates, there remains the problem of model identifi ca tion. Automatic identification may be included in parameter estimation by selecting a model which minimises the Akaike Information Criterion, but a unique model cannot be guaranteed (Chatfie1d, 1980). Long-term clinical data are likely to contain seasonal variations at different frequencies. This type of data can be handled by seasonal ARIMA models (Box and Jenkins, 1976). DISCRETE TRANSFER FUNCTION MODEL BUILDING The univariate ARIMA approach cannot take into account relations between different timp series, - relations which may be of clinical importance. One form of bivariate relation, where one time series corresponds to an input and another time series to an observable effect, or output, can be investigated using discrete-time transfer function (TF) models. In principle, such models allow thp
There are three stages to model building: identification; estimation; diagnostic checking. In what follows, only identification is considered as this is most pertinent in application t o clin i cal time series. Before model buildin g , Xt and Yt are differenced in order to achieve stationarity, thus x = ~dXt and Yt = ~dYt' The objective of identiftcation is to determine the dead time b, and the orders rand s of the operator polynomials, and initial estimates for their parameters. The co efficients vk of the impulse function (Eq. (4» are related in a set of linear equations to the estimated cross-correlations rxy(j), j = 0,1,2, ... between x t and y~ at 1ag j. The impulse function has a characterist1c shape depending on the values r, sand b. Thus once vk have been obtained by solving the rxy(j) equations, tentati ve identification can be made. The determination of estimates vk is simplified if the input x t is pre-whitened. This is done by building an ARIMA model in which white noise a t drives x t (5) a t = e~l(B)l;lx(B)Xt The same transformation is applied to the output series Yt
8
= e:l(B)l;lx(B)Y t (6) t Since is now white noise, estimates of the impulse tfunction coefficients, vk' are simply proportional to the c ross-co rrelation r Q8 (k) of a t and 8t
°
r 08 (k) s8
(7)
So
where So and s8 are the standard deviations of Qt and 8t . Having estimated vk ' it is possible to determine r, sand b and preliminary parameterestimates. Example 2: Urine Sodium Response to \,later Int ake. Figure 4 shows time series "data" for wa ter input (X t ) and urine sodium output (Y t ) with n = lOO , obtained from a simulation model of fluid/electrolyte acid/base (FAB) dynamics (Flood and co -worker~ 1984). (The input Xt was generated by a stochastic proces s~ X was identified and fitted as an AR(l) t process (1 - 1;l1BlX t = Qt (8) with ~l = 0.96. Since (l - 0.96B) , (1 - B), this indicates that Xt was generated by a random walk process, Xt = Xt-l + Qt ' The pre-whitened series Qt' and transformed output series Bt were thus ob tained by simple di f ferencin g (9) Qt = ~X t' and Bt = ~t with So = 0.405 x 10- 3 , and s ~ = 0.14 x 10- 3 . Estimates of th: cross-correlation function ~j3(k) , standard error or ' and the impulse function vk are given in Table 2. The values of r j 3(k) for k = 0 , 3 are small compared with th eir erro r s , implying b = 4 . The estimated impulse function is plot -
Clinical Time Series
where u and u are hourly intake rates of water 2 l and sodium (corrected for non-urine losses). ~ l - ~3 are continuous white noises representing model uncertainty and system variability. The measured "clinical" outputs at time tk are given by
ted on Fig. 5. A first order model (1,0,4) may be postulated as an approximate description, shown as curve (a) in Fig. 5. This leads to the model (10) (1 - 0.75B)Y t = 0.lX t -4 The characteristic impulse function of a (2,2,4) model is shown also in Fig. 5 (Curve (b». The final choice must be made using automatic parameter estimation, and selecting the model with the best fit (e.g. with AIC).
TABLE 2
1615
Estimated Cross-Correlations (ro S) and Impulse Response Coefficients (vk) at lag k for Water Intake/Urine Sodium Data. or is the approximate error on roS (= n-!).
k roS(k)
. or
~~r
0
-
1
2
3
4
- .108 - .003
.095
.035
.1
.1
.0333
.0123 -.0378
.1
.1
7
8
5
6
.234
.151
.321
.125
.074
.1
.1
.1
.1
.1
.0529
.112
.0438
.0259
- .0011 .0819
9
-
10
.094 .1
-
-
.127 .1
.0329 - .0445
Discuss~n. As was also found for the univariate data (Example 1), a simple model may be an adequate representation of the data. The data requirements are high, and it appears that 100 sample pairs are not sufficient for the urine sodium/water intake experiment to perform subjective identification. With automatic methods of TF order determination and parameter estimation, the approach offers clinical potential where patient treatment is byphysical input (e.g. drugs, saline, dextrose, etc.) and monitoring enSures a high data rate.
RECURSIVE ESTIMATION OF PHYSIOLOGICALLYBASED MODELS For short time series (e.g. Example 1) it is necessary to limit the set of possible models inorder to achieve successful results. A natural way todo this is to use state-space models (or "control system mOdels", Carson and co-workers, 1933) of the physiological systems and pathological processes underlying the observed variables. Such models may be validated in advance, and then used on-line with recursive estimation. In a simple model of thyroid disease it has been shown that re2sonable parameter estimates and accurate predictions canbe made with as few as 3 or 4 data points (de la Sal le and co-workers, 1984). The success of this approach depends upon the amount of prior effort that is involved in constructing and validating the statespace models, and not solely on the quality of identification procedures at the time of data fitting. Example 3: Recursive Estimation of a Non-Linear Stochastic Compartmental Model of Fluid-Electrolyte Dynamics. The following stochastic model is a highly simplified representation of fluid / electrolyte dynamics. An extended Kalman filter algorithm for joint state-parameter estimation is presented (Jazwinski, 1970; Young, 1974). The model has three states:' x , extracellular fluid volume (1.); x2' extracellular sodium maSs (mEq.); x3' normalised ADH concentration (a.u.). The state equations are
(12) where Yl' Y2 and Y3 are urine flow (l.hr- l ), plasma sodium concentration (mEq.17 l ) and urine flow (mEq.hr- l ). nl - n3 are discrete white measurement noises. It is assumed that parameters PI, k03' P3 and P4 are known a priori, and that parameters kOl, k02 and P2 are unknown. In the extended Kalman filter, the state vector is augmented with the unknown parameters, hence ~ = [xlx2x3kOlk02P2]T. Assuming the parameters are constant, the extended state and output equations are - x 4x l + P l x 3
u
- x 5 x 2 + x6
u
- k03 x 3 + P3 x 2/ x l -P4 x l
0 +
~
r'l'
2
'2 '3 +
0
0
0
0
0
0
0
0
0
{'" -"'1
Ik-
l
+
x 2 /x l
x x - x6 3 2
[::J
(13)
(14)
~ = ~(tk)
or
i<
_f(~)
!k
.& (~( t k »
+ ~ +5 +~
(15)
The Kalman filter updates the estimate ~-l and its estimated covariance Pk-l in a two-step process each time a new data point Ik becomes available. The algorithm requires that the model dynamics are
[ S VOL 2-Q
R. L. Flood 1'1 (If.
1616
in part linearised. 1st step - a priori prediction (16)
~k is the linear state transition matrix from t k l to t , and Q is the discrete time covariance matk rix of ..£.
2nd step - correction (17)
Where ~ is the Kalman gain matrix and R is the covariance matrix of the output noise~. The linearised output matrix Ck has elements igi (evaluated at ~-l) . aX j In a preliminary study, the model and filter algorithm were tested using data simulated from the model, with the following parameter values: -1 -1 -1 kOl 0.0586, hr ; Pl = 0.758 l.hr. au. ; 0.00 44 6 hr -1.,P2 = 0; k03 -- 4 .0 hr -1 , 0.119 a.u.l.hr.- l mEq.-l; 0.904 a.u.hr7 l l- l . In the results presented, P2 (x ) was simulated as a random walk (i.e. 6 T . • ~2 ~3 0 0 ~6 ] ) to allow for over slmpll-
~ = [ ~l
fication of the sodium dynamics (x2)' The noise matrices were Q = diag. [2.5 x 10- 3 , 6.Z5 x 10 2 , 6.Z5 x 10- 4 , 0, 0, 1.0] and R = diag [3.6 x 10- 5 , 1.96, 0.81] . Figure 6 shows recursive parameter estimates when the model is fitted to the simu lated data. kOl and k02 track close to their true values, despite large output variability (Fig. 7). PZ varies in a random walk, as in the simulated system. Figure 7 depicts model outputs and simulated data. The fit is very good. The results indicate that a recursive estimator for a non-linear compartmental FAB model with the types of output and data encountered clinically is in principle feasible. Discussion. The recursive state-space approach appears to offer many possibilities for clinical time series (Brovko and co-workers, 1981; Trimble and co-workers, 1983). A precondition is the existence of well-validated models of the underlying pathophysiological processes. Although the extended Kalman filter used above works in practice (Young, 1974; Wallace and co-workers, 1983), theoretically it is a biased estimator (Ljung, 1979). Furthermore, it requires knowledge of the noise Q and R matrices. Wiberg and Ljung (1981) have produced a modification to the EKF which overcomes these difficulties. CONCLUSIONS The approaches reviewed above differ in their requirements for sample size and knowledge of under lying physiological system dynamics. The Box-Jenkins methods need data sets of N>50 which are large in chronic situations or laboratory terms, but small when sampling is frequent as in monitoring. Multivariable and recursive implementations of discrete transfer function model building exist which have potential in on-line monitoring and control in intensive situations. The omission of physio-
logical processes in these models may appear disadvantageous, but if the system dynamics are not well understood or if they are rapidly changing (e.g. owing to disease evolution), it is of course an advantage. Where the pathophysiological processes underlying the clinical data are codified in validated mathematical models, schemes may be devised where model estimates and predictions are made on only a few data and updated as new data arr i ve. The clinical implication is clear. Such an approach will be equally effective where data rates are high. We propose an integrated system of computational aids for the management of disorders of fluid/electrolyte acid/base (FAB) dynamics. In such a syste~ a number of models (e.g. ARlMA, TF, state -space, and even simple regression models, such as that of Ballardie and co-workers, 1983) will act as data filters and predictors, supplying information on trends, trend changes, etc. to the user or to other software modules. These include a comprehensive FAB simulation model and a rule-based expert system It is intended that such a system be developed for on-line monitoring and control in the intensive therapy unit. REFERENCES Ballardie, F.W., S. Gartside, and N.P. Mallick (1983) Computer prediction of the need for dialysis and transplantation using calculated creatinine clearance. Br. Med. J., 286, 13281331Brovko, 0., D.M. Wiberg, L. Arena, and J.W. Bellville (1981) The extended Kalman filter as a pulmonary blood flow estimator. Automatica, 17, 213-220. Box,-C.E.P., and G.M. Jenkins (1976) Time Series Analysis. Forecasting and Control. Holden-Day, San Francisco. 2nd edn. Carson, E.R., C. Cobelli, and L. Finkelstein (1983) The Mathematical Modelling of Metabolic and Endocrine Systems. Wiley, New York. Chatfield, C.(1980) The Analysis of Time Series. Chapman and Hall, London. 2nd edn. de la Salle, S., M.S. Leaning, E.R. Carson, P.R. Edwards and L. Finkelstein (1984) Control system modelling of hormonal dynamics in the management of thyroid disease. 9th IFAC Congress, Budapest, Hungary. Flood, R.L., E.R. Carson, M.S. Leaning, and D.G. Cramp (1984) The control of body fluid volume in man: a mathematical mOdelling approach . 2nd Colloquium on Theoretical Biology and Medicine, Angers, France, September. Jazwinski, A.H. (1970) Stochastic Processes and Filtering Theory. Academic Press, New York. Ljung, L. (1979) Asymptotic behaviour of the ex tended Kalman filter as parameter estimator for linear systems. IEEE Trans. Autom.Contr., AC-2~ 36-50. -Nelson, C.R .(19 73) App lied Time Series Analysis for Managerial Forecasting. Holden-Day, San Franci sec.
Trimble, I.M., M. West, M.S. Knapp, R. Pownall, and A.F.M . Smith (1983) Detection of renal allograft rejection by computer. Br. Med. J., 286, 1695-1699. Wallace, J.N., and R. Clark (1983) The application of Kalman filtering estimation techniques in power station control systems. IEEE Trans. Autom. Contr., AC-28, 416-427 . Wiberg, D.M., and L. Ljung (1980) Gaussian-optimal on-line parameter estimation. In Proc. 19th IEEE Conference on Decision and Control, pp. 681-683. Young, P. (1974) Recursive approaches to time series analysis. Bulletin Institute of Nathematics and its Applications, 10, 209 224 .
Clinical Time Se ries
1617
l.min
-1
Water Intake
4.0 llUI101.1- 1
3.5
0 3.0 I
0 Fig. 1. ,
v
40
60
10 20 k (days)
80
100
Urine Sodium
30
Plasma potassium, [K+J p , time series. (Dotted line shows lower end of reference range.)
.' rk
l.
20
mEq.min- l 0.12
'V
~kk z
'V z
0.119 0
20 limits
{'V z ,MA(2)}
---------
Fig. 4.
20
40
60
80
100
Time (min) Water intake and urine sodium data generated from comprehensive simulation model.
(a)
~\
:,
\ \ \
Fig. 2.
)~
Estimated autocorrelation rk and partial autocorrelation $kk functions.
-,, 5
[K~
I
0
P
l
- ... , ... ...
"
(b)
20
a -+k
4
3
- o. Fig. 5.
2
o o Fig. 3.
10
20
30
40
50
Data and 20 day (0,1,2) model forecast (solid line) with 95% confidence intervals (dotted lines) .
Estimated impulse response function vk. Long dashed lines are 0 and 20 confidence intervals. Short dashed lines are characteristic (1,0,4) (a) and (2,2,4) (b) models.
161 8
R. L. Flood
04-----------------~
o
25
1'1 fll.
o~----------------~
o
25
5
o -2+---------------~
o
25 k (hour)
Fig. 6.
Recursive parameter estimates for non-linea r compartmental fluid/electrolyte model.
[Na]
200
o~--~~----------~
o
25
(Y2)
P
0+----------------,
o
25
o+-----------------~
o
Fig. 7.
25
k (ho ur) Recursive model outputs and simulated data.