Time series prediction of lung cancer patients' breathing pattern based on nonlinear dynamics


Physica Medica 31 (2015) 257–265


Physica Medica journal homepage: http://www.physicamedica.com

Original paper

R.P. Tolakanahalli a, D.K. Tewatia b, W.A. Tomé c,d,*

a Department of Medical Physics, Hamilton Health Sciences, Hamilton, ON L8V 5C2, Canada
b Department of Human Oncology, University of Wisconsin-Madison, Madison, WI 53705, USA
c Montefiore Medical Center and Institute for Onco-Physics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
d Department of Medical Physics, University of Wisconsin-Madison, Madison, WI 53705, USA



Article history: Received 22 July 2014; received in revised form 30 December 2014; accepted 28 January 2015; available online 26 February 2015.

Abstract

This study focuses on predicting the breathing pattern, which is crucial for dealing with system latency in the treatment of moving lung tumors. Predicting respiratory motion in real time is challenging due to the inherent chaotic nature of breathing patterns, i.e. sensitive dependence on initial conditions. In this work, nonlinear prediction methods are used to predict the short-term evolution of the respiratory system for 62 patients, whose breathing time series were acquired using the respiratory position management (RPM) system. Single-step and N-point multi-step predictions are performed for sampling rates of 5 Hz and 10 Hz. We compare the prediction accuracy of the employed non-linear prediction methods to that of adaptive infinite impulse response (IIR) prediction filters. A local average model (LAM) and local linear models (LLMs), combined with a set of linear regularization techniques to solve ill-posed regression problems, are implemented. For all sampling frequencies, both single-step and N-point multi-step prediction results obtained using LAM and LLMs with regularization methods are better than those of IIR prediction filters for the selected sample patients. Moreover, since the simple LAM performs as well as the more complicated LLMs in our patient sample, its use for non-linear prediction is recommended. © 2015 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

Keywords: Non-linear dynamics; Time delay; Embedding dimension; Prediction; Infinite impulse response prediction filters; Non-linear prediction; Lung cancer; Breathing pattern

Introduction

Tumor motion due to breathing poses challenges for precise radiation dose delivery to a tumor while sparing surrounding healthy organs. Most respiratory-compensating methods developed or under investigation require predictive filters to tackle the inherent system latencies of radiation delivery systems. The synergistic combination of real-time imaging with a powerful and accurate prediction engine is key to successful tumor motion management. The review article by Verma et al. presents mathematical models for all major prediction algorithms developed in the last decade. In summary, the review concludes that predictions over long latencies are error prone and not accurate enough to be implemented clinically [1]. Specifically, the prediction algorithm by Vedam et al. [2] uses least mean squares (LMS) to update the filter coefficients after each prediction. Other adaptive models of the respiration motion

* Corresponding author. Albert Einstein College of Medicine, Bronx, NY 10461, USA. E-mail address: [email protected] (W.A. Tomé).

include neural network models with integrated linear or nonlinear filters [3,4], and the finite state model of Wu et al. [5]. Ruan et al. [6] used subspace projection methods to derive models of periodic respiratory signals. The Ruan projection models use Fourier spectra and least-squared-error analysis to find the best-fit periodicity of the respiration signal. Recently, Ernst et al. [7] have compared normalized least mean square (nLMS), recursive least squares (rLS), and a wavelet-based autoregression (wLMS) as well as a support vector regression algorithm and a Kalman filtering approach for prediction horizons ranging from 77 ms to 307 ms. They conclude that for a prediction horizon of 300 ms, their support vector machine (SVM) regression implementation yields the best results. In this work, we present state-space based non-linear methods for the prediction of respiratory signals for prediction horizons ranging from 400 ms to 3000 ms. The presented non-linear prediction methodologies can be directly implemented on tumor coordinates once they become available with the advent of newer tumor tracking technologies such as the Calypso™ tracking system (Calypso, Seattle, Washington) or the MRIdian™, a real-time MR Radiotherapy system developed by Viewray (Viewray Inc., Cleveland, OH) with which it will be possible to acquire real-time MR images at

http://dx.doi.org/10.1016/j.ejmp.2015.01.018 1120-1797/© 2015 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.


a frame rate of 4 images/sec. While linear predictive (LP) models, such as infinite impulse response (IIR) prediction filters, have been employed with great success for short prediction horizons ranging from 50 to 200 ms for deterministic non-linear systems that exhibit sensitive dependence on initial conditions, their capacity to yield accurate predictions deteriorates for N-point multi-step prediction at different sampling/imaging rates, especially in the presence of measurement noise. In our previous work, we established that the breathing of lung cancer patients can be described as a 5- to 6-dimensional nonlinear, stationary, and deterministic system that exhibits sensitive dependence on initial conditions, and hence any of the existing linear prediction models can only be used successfully for short prediction horizons (≤300 ms) [8]. The purpose of this paper is to investigate non-linear prediction algorithms that yield long prediction horizons (>300 ms). In particular, a one-dimensional time series obtained by measuring the behavior of a multidimensional dynamical system as a function of time can be used to reconstruct the underlying attractor using the time-delay embedding theorem [9]. We have applied a number of nonlinear prediction methods to successfully predict the time evolution of the breathing pattern for 62 lung cancer patients for prediction horizons ranging from 400 to 3000 ms. Both single-step and N-point multi-step predictions are performed for sampling rates of 5 Hz and 10 Hz. For N-point multi-step-ahead prediction, an iterative scheme is used in which the single-step-ahead predictions from previous steps are used to make the prediction at the current step. We have also compared the non-linear prediction methods to the prediction accuracy of an Adaptive Linear Predictive (ALP) model based on an IIR prediction filter.

Methods and materials

1. State space representation

Scalar time series data of respiratory signals were obtained using the respiratory position management (RPM) system at a rate of 30 frames/sec. Suppose x_i are scalar samples acquired at times t_i separated by a fixed time interval t_s, yielding the scalar time series S = {x_i}, i ∈ T, T = {1, …, M}. Using time-delay embedding, the data can be represented in an m-dimensional state space as shown in Eq. (1), where τ is the embedding time delay and m is the embedding dimension.

\tilde{x}_i = \left( x_{i-(m-1)\tau},\; x_{i-(m-2)\tau},\; \ldots,\; x_{i-\tau},\; x_i \right), \qquad i = 1 + (m-1)\tau, \ldots, M \qquad (1)

The reader is referred to [8] for a detailed description of the chaotic characteristics of breathing. Signals were normalized such that the maximum peak-to-peak amplitude is equal to unity.

2. Adaptive linear prediction model (ALP model)

A linear predictor is a system that predicts the future output signal as a linear function of a set of inputs [2,10]. We consider linear predictors that are based on an AR (autoregressive) model, i.e. that have the form

\hat{x}_{t+\Delta} = \sum_{j=1}^{m} a_j \, x_{t - j t_s} \qquad (2)

\varepsilon_{t+\Delta} = \hat{x}_{t+\Delta} - x_{t+\Delta}

where x_t is the amplitude of the scalar signal at time t and m is the order of the AR model. The estimated signal \hat{x}_{t+\Delta} is therefore predicted as a linear combination of the known previous positions x_{t-t_s} through x_{t-m t_s}. Note that the noise component has been suppressed, since for prediction purposes one has to average over the AR part only. In the case of an ALP model, the optimum set of coefficients is continually found by minimizing the mean squared error \varepsilon_{t+\Delta} of predictions on a set of training samples based on the respiratory motion data collected over a signal history length (SHL). Using the so-determined optimum set of coefficients {a_j}, j ∈ {1, …, SHL}, the breathing signal was predicted 400–3000 ms into the future.

3. Model-free local prediction method in state space (LAM model)

The simplest form of non-linear local prediction in state space is to consider the most similar segments of a given scalar time series S = {x_i}, i ∈ T, T = {1, …, M}, in the past. That is, one uses the nearest neighbor vectors of \tilde{x}_t on the time-delay reconstructed attractor formed from the scalar time series S, namely \tilde{x}_{t(i)} with time indices t(i) < t in the past, to predict the time series point N time steps ahead, \hat{x}_{t+N}, by taking the average of the nearest neighbors in the past, \mathcal{N} = \{\tilde{x}_{t(1)}, \ldots, \tilde{x}_{t(k)}\}, that are mapped N time steps ahead on the time-delay reconstructed attractor formed from S. The vectors in \mathcal{N} and \tilde{x}_t are vectors of length m, where m is the embedding dimension. This can be expressed mathematically as follows

\hat{x}_{t+N} = \frac{1}{|\mathcal{N}|} \sum_{\tilde{x}_{t(i)} \in \mathcal{N}} x_{t(i)+N} \qquad (3)

where |\mathcal{N}| denotes the number of elements contained in \mathcal{N}, the set of nearest neighbor vectors. The idea of analogues, i.e. finding similar segments in scalar time series data using a time-delay reconstruction of the underlying attractor, is directly related to the property of recurrence of orbits of dynamical systems, which furnishes the theoretical underpinning for the use of non-linear local predictions [11,12]. Hence, the problem of predicting the future value of x_t N time steps ahead is reduced to finding the nearest neighbor vectors of \tilde{x}_t in the past record on the time-delay reconstruction of the underlying attractor and using them to obtain the prediction \hat{x}_{t+N}. We have employed as our distance measure the Euclidean norm (l2-norm) between the query vector \tilde{x}_t and the input vectors \tilde{x}_{t(i)}, defined as

\|\tilde{x}_t - \tilde{x}_{t(i)}\| = \sqrt{(\tilde{x}_t - \tilde{x}_{t(i)})^T (\tilde{x}_t - \tilde{x}_{t(i)})}

Compared to the other two frequently used norms, namely the l1-norm and the supremum norm (l∞-norm), the Euclidean norm allows one to find an intermediate number of nearest neighbor vectors: the l1-norm yields the fewest and the l∞-norm the most nearest neighbor vectors. Since all breathing waveforms are normalized, this distance measure has no units. Even if the original dynamics is chaotic, close orbits diverge only gradually from each other, and hence some degree of short-term prediction can be achieved using this method [11]. However, if the reconstructed state space dimension is too low, then orbits starting from \tilde{x}_t and its nearest neighbor vectors in the past may not deviate as smoothly on the time-delay reconstructed attractor formed from S as they do on the true underlying attractor.
Therefore, careful state space reconstruction is of immense importance for local prediction, and it relies not only on the selection of the embedding dimension m and the delay time τ but rather on the selection of the embedding window length t_w. The embedding window length t_w = (m − 1)τ is the length of the data segments on the trajectory of the underlying attractor. If one maintains a constant embedding window length t_w, state space reconstructions for varying m are qualitatively the same [13] (adjusting τ accordingly, so that t_w = (m − 1)τ is constant).
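As an illustration of the delay-coordinate construction of Eq. (1) and of the embedding window t_w = (m − 1)τ, the following sketch (in Python with NumPy; the helper name delay_embed and the surrogate sine trace are ours for illustration, not taken from the paper) builds the matrix whose rows are the delay vectors:

```python
import numpy as np

def delay_embed(x, m, tau):
    """Delay-coordinate embedding (cf. Eq. (1)): row i is the vector
    (x[i-(m-1)*tau], ..., x[i-tau], x[i]) for i = (m-1)*tau, ..., len(x)-1."""
    x = np.asarray(x, dtype=float)
    w = (m - 1) * tau                      # embedding window t_w in samples
    n = len(x) - w                         # number of delay vectors
    if n <= 0:
        raise ValueError("time series too short for this (m, tau)")
    return np.column_stack([x[j * tau : j * tau + n] for j in range(m)])

# Surrogate breathing trace sampled at 5 Hz (a 4 s-period sinusoid, for
# illustration only); m = 6 and tau = 5 give t_w = 25 samples = 5 s.
t = np.arange(0, 60, 0.2)
x = np.sin(2 * np.pi * 0.25 * t)
V = delay_embed(x, m=6, tau=5)
```

Keeping t_w constant while varying m (and adjusting τ accordingly) leaves the reconstruction qualitatively unchanged, as noted above.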


This approach, also known as the local average model (LAM), has limited prediction capability for sparse data, but for higher dimensional data it performs as well as, and sometimes better than, more advanced methods [11]. For very noisy data, model-free methods can work very well.
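A minimal sketch of the LAM prediction step of Eq. (3) (Python/NumPy; the function name lam_predict, the neighbour count, and the periodic test trace are illustrative assumptions, not the authors' code):

```python
import numpy as np

def lam_predict(x, m, tau, N, k=5):
    """Local average model (Eq. (3)): predict x[t+N] for the last time index t
    as the mean of the N-step-ahead images of the k nearest neighbours
    (Euclidean norm) of the current delay vector."""
    x = np.asarray(x, dtype=float)
    w = (m - 1) * tau
    # rows of V are the delay vectors; row i corresponds to time index i + w
    V = np.column_stack([x[j * tau : j * tau + len(x) - w] for j in range(m)])
    query = V[-1]                          # current state vector
    library = V[: len(V) - N]              # past vectors with an observed N-step future
    dists = np.linalg.norm(library - query, axis=1)
    nn = np.argsort(dists)[:k]             # k nearest neighbours in the past
    return float(np.mean(x[nn + w + N]))   # average of their N-step-ahead values

# For a periodic trace the nearest neighbours are exact recurrences, so the
# 1 s-ahead (N = 5 at 5 Hz) prediction tracks the true continuation closely.
t = np.arange(0, 60, 0.2)
x = np.sin(2 * np.pi * 0.25 * t)
pred = lam_predict(x, m=6, tau=5, N=5)
```

Real breathing traces are only approximately recurrent, which is why the paper searches within a neighbourhood radius rather than relying on exact repeats.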

4. Functional approximation: local linear models (LLM)

When building a local linear model (LLM) in state space, one constructs an approximation of the tangent plane at a given target vector \tilde{x}_t, as described in detail below. LLMs are AR models that hold only for a region around the target vector \tilde{x}_t formed by the nearest neighbor vectors sometime in the past [12]. The parameters are estimated by ordinary least squares (OLS) minimization. In local linear prediction with noisy data, the OLS solution can have large variance, and thus regularization methods that are designed to be more robust against noise can provide better results. Regularization can be achieved by down-weighting some or all of the singular directions that are due to noise. Two regularization methods that allow one to minimize the influence of noise on predictions when using local linear models are:

i. Principal component regression (PCR): PCR rotates the natural basis of the local state space to match the basis formed by the principal components found using singular value decomposition (SVD) of the input matrix X, which in our case is the matrix formed from the k nearest neighbor vectors [14,15]. The local state space is then projected onto the subspace formed by the principal axes whose corresponding singular values are large compared to the noise level; the solution for the parameters is found in this subspace and then transformed back to the original state space to yield the PCR regularized solution. In this way, the estimated parameters have smaller variance (they are more stable) at the cost of introduced bias.

ii. Partial least squares (PLS): Strategies to select the salient components using principal component analysis (PCA) for the PCR technique use the input matrix X only and do not take into account the output vector y, the response vector formed from the k time-advanced nearest neighbor vectors. In contrast to PCR, PLS finds the components of the input matrix that are also relevant to the output vector y. PLS regression uses a projection into the subspace spanned by the Krylov vectors, which are assumed to be linearly independent. The Krylov subspace K(C, b) is the subspace spanned by the sequence of vectors Cb, C²b, …, C^q b, where C = XX^T and b is any vector in the column space of the input matrix X. The Krylov subspace method iteratively solves the linear system by finding a best-fit solution in the solution space defined by the Krylov vectors.

We now embark on the detailed description of the methods described above and the algorithms associated with them. For data generated by a deterministic dynamical system, there exists a functional dependence of x_{t+N} on \tilde{x}_t: x_{t+N} = F(\tilde{x}_t). The graph of the reconstructed dynamics for N time steps ahead, F_N, is a smooth surface embedded in \mathbb{R}^{m+1}, which can be approximated to first order around a target vector \tilde{x}_t as

y_t = b^T x_t + \varepsilon, \qquad x_t = \tilde{x}_t - \bar{x}, \quad y_t = x_{t+N} - \bar{y}, \quad b^T = \nabla F_N(\bar{x}) \qquad (4)

where x_t and y_t are the centered versions of \tilde{x}_t and x_{t+N}, and \nabla F_N(\bar{x}) is the gradient of F_N evaluated at \bar{x}.

Algorithm.

a) Using the method of delays, the time series data are represented in an m-dimensional embedded space by the delay vectors

\tilde{x}_i = \left[ x_{i-(m-1)\tau},\; x_{i-(m-2)\tau},\; \ldots,\; x_i \right]^T, \qquad i = 1 + (m-1)\tau, \ldots, M \qquad (5)

b) Given a training subset T consisting of delay vectors and a target point \tilde{x}_t that is not included in T, we are interested in the local linear prediction of \tilde{x}_t.

c) The k nearest neighbor vectors of \tilde{x}_t in T are collected as the rows of the matrix \tilde{X} = (\tilde{x}_{t(1)}, \tilde{x}_{t(2)}, \ldots, \tilde{x}_{t(k)})^T \in \mathbb{R}^{k \times m}, and their N-step-ahead values form the response vector \tilde{y} \in \mathbb{R}^k.

d) The means of the nearest neighbor vectors and of the response vector are computed as

\bar{x} = \frac{1}{k} \sum_{i=1}^{k} \tilde{x}_{t(i)} \in \mathbb{R}^m, \qquad \bar{y} = \frac{1}{k} \sum_{i=1}^{k} y(i) \in \mathbb{R} \qquad (6)

e) The centered version X of the matrix \tilde{X} and y of the response vector \tilde{y} are then obtained as follows: X = \tilde{X} - \mathbf{1}\bar{x}^T and y = \tilde{y} - \bar{y}\mathbf{1}, where \mathbf{1} is a k×1 vector of ones. Furthermore, the centered versions of the target point and the response are given by x_t = \tilde{x}_t - \bar{x} and y_t = x_{t+N} - \bar{y}.

f) For prediction, the following model is then assumed for the centered version of the response vector: y = Xb + \varepsilon, where b is a vector of m unknowns and \varepsilon is a random error with zero mean and covariance matrix var(\varepsilon) = \sigma^2 I, where I is the k×k identity matrix. It has been shown that b depends only weakly on the noise, and hence the noise component can be safely ignored in the following steps [14].

g) Prediction estimators: For the prediction estimation, the following singular value decomposition (SVD) of X is assumed.

X = U S V^T, \qquad U^T U = I, \quad V^T V = I, \quad S = \mathrm{diag}(s_1, s_2, \ldots, s_r) \qquad (7)

where r ≤ min(k, m) is the rank of X and s_1 > s_2 > … > s_r are the ordered non-zero singular values of X. The columns of U and V span the r-dimensional range spaces R(X) and R(X^T), respectively.
\hat{b} = V S^{-1} \Lambda U^T y = \sum_{i=1}^{r} \frac{\lambda_i}{s_i} \left( u_i^T y \right) v_i, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_r) \qquad (8)

In fact, Eq. (8) encompasses most of the well-known regularization estimators, which differ only in their choice of the filter factor matrix \Lambda [14,15].

Ordinary least squares regression (OLS): The OLS regression estimate, \hat{b}_{OLS}, is found by minimizing the Euclidean norm of the residual, \min_b \| y - Xb \|_2^2, and can be computed as follows:

\hat{b}_{OLS} = X^{\dagger} y = V S^{-1} U^T y = \sum_{i=1}^{r} \frac{1}{s_i} \left( u_i^T y \right) v_i \qquad (9)

where X^{\dagger} \equiv V S^{-1} U^T denotes the generalized Moore-Penrose pseudo-inverse of X. This estimator is obtained from Eq. (8) by setting the filter factors \lambda_i = 1 for i = 1, 2, …, r. This amounts to allowing all directions of R(X) spanned by the columns of U to contribute to the OLS regression estimate.

Principal components regression (PCR): The PCR regression estimate, on the other hand, uses the subspace of R(X) spanned by the first q < r columns of U, and the filter factors are chosen such that \lambda_i = 1 for i = 1, 2, …, q and \lambda_i = 0 for i = q + 1, …, r. The rationale for this approach is that the last r − q singular values are on the order of the noise level and hence do not provide any additional information about the true y and b. Recall that the singular values are ordered from largest to smallest.

Partial least squares regression (PLS): PLS regression uses a projection into the subspace spanned by the Krylov vectors to estimate \hat{b}_{PLS}. The Krylov vectors are assumed to be linearly independent and are given by Eq. (10):

\left( XX^T \right) y, \; \left( XX^T \right)^2 y, \; \ldots, \; \left( XX^T \right)^q y \qquad (10)

Let us denote by K the k×q matrix formed using the q Krylov vectors as columns. The matrix K is then used to define the following operator:

P = K \left( K^T K \right)^{-1} K^T

One can easily verify that P^2 = P, and hence P defines a projection operator onto the subspace spanned by these q linearly independent Krylov vectors. Using this projection operator, the partial least squares estimator for b can be written as follows:

\hat{b}_{PLS} = X^{\dagger} P y \qquad (11)

Qualitatively, the estimator shrinks the OLS estimate by taking into account the size of the singular values as well as the size of the Fourier coefficients.

5. Prediction error estimator

To validate the predictive power of a model, the available data set is split into two parts, one for fitting and one for testing. Cross-validation procedures could be used here as well. The normalized root mean squared error (NRMSE), computed on all points in the test set, can be used to measure the quality of prediction [3,4,14]. An NRMSE value close to 1 means that the prediction is only as good as a mean value prediction, whereas a value of 0 indicates a perfect prediction. The N-step time prediction error can be measured using the NRMSE as follows:

\mathrm{NRMSE} = \sqrt{ \dfrac{ \frac{1}{M-N-n} \sum_{t=n+1}^{M-N} \left( x_{t+N} - \hat{x}_{t+N} \right)^2 }{ \frac{1}{M} \sum_{s=1}^{M} \left( x_s - \bar{x} \right)^2 } } \qquad (12)

where M denotes the signal length of the entire data set, \bar{x} denotes the mean of the entire data set, x_{t+N} with t ∈ {n + 1, …, M − N} denotes the observed values in the test set, \hat{x}_{t+N} with t ∈ {n + 1, …, M − N} denotes the predicted values, and n denotes the number of time series points in the training set. It should be noted that our definition of the NRMSE differs slightly from that used in prior work [4] in the way the sums in the numerator and denominator are normalized and in how the denominator of Eq. (12) is computed. In both definitions the numerator is computed only for values in the test set. However, in Murphy and Dieterich [4] the sum in the denominator of Eq. (12) is computed only for values in the test data set, whereas in our work this sum is computed over all values of the data set. We compensate for the difference between the signal length of the entire data set and that of the training data set by normalizing each sum by its respective signal length. Therefore, for test-set signal lengths that are much larger than the number N of prediction time steps, our definition yields NRMSE values comparable to those found when using the definition of Murphy and Dieterich [4]. For both the model-free and the local linear prediction methods, the number of nearest neighbors k and the radius of the neighborhood around the target point are two of the main parameters. The radius of the neighborhood is defined as the radius of the m-dimensional sphere around the point of interest within which one picks the k nearest neighbors. The radius of the neighborhood has no units, since the breathing patterns have been normalized such that the peak-to-peak amplitude is equal to 1 for all breathing traces. In the current work the number of neighbors was fixed to 5 for all patient breathing patterns considered, and the radius of the neighborhood was varied from 0.02 to 0.2 units to achieve this.
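The NRMSE of Eq. (12) can be sketched as follows (Python/NumPy; function and variable names are ours). Note the normalization choice discussed above: the numerator averages over the test set, while the denominator averages over the entire data set:

```python
import numpy as np

def nrmse(x_full, test_true, test_pred):
    """Eq. (12) sketch: test-set mean squared prediction error, normalised
    by the variance of the entire data set about its mean."""
    x_full = np.asarray(x_full, dtype=float)
    num = np.mean((np.asarray(test_true) - np.asarray(test_pred)) ** 2)
    den = np.mean((x_full - x_full.mean()) ** 2)
    return float(np.sqrt(num / den))
```

Predicting the global mean everywhere yields an NRMSE near 1 whenever the test set shares the statistics of the full record, while a perfect prediction yields 0, matching the interpretation given above.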
The success of prediction with PCR and PLS relies on the proper selection of the regularization parameter q. For PLS and PCR, cross-validation techniques are commonly used, in which the effect of the shrinkage parameter q on a measure such as the root mean square error (RMSE) is evaluated [16]. Other methods for estimating q include finding a threshold value that represents the noise variance of the time series signal, or requiring that the retained singular values account for at least some proportion of the total data variation in X. In this study, the value of q was chosen as r − 2 for both the PCR and PLS techniques by assessing the NRMSE calculated for various values of q.

The methods described above have been applied to the breathing patterns of 62 patients. The data set for each patient was divided into a training set and a testing set. The length of the training set was varied from 10 to 200 s for all algorithms, depending on the total length of the breathing trace collected for a particular patient, yielding, for sampling rates of 5 Hz, 10 Hz, and 30 Hz, values of n ranging from 50 to 1000, 100 to 2000, and 300 to 6000, respectively. Prediction for all breathing patterns was done using the ALP method and the state-space methods discussed above, i.e. LAM and the LLMs: OLS, PLS, and PCR. The prediction length for N-step prediction was varied from 400 ms to 3000 ms for the 5 Hz and 10 Hz sampling rates.
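The three estimators of Eqs. (8)–(11) can be sketched via the SVD filter factors and the Krylov projection (Python/NumPy; the function names are ours, and for ill-conditioned neighbourhoods the Krylov system should be solved with more care than shown here):

```python
import numpy as np

def filter_factor_estimate(X, y, q=None):
    """Eqs. (8)-(9): b^ = V S^-1 Lambda U^T y. With q = None all filter
    factors are 1 (OLS); with q < r, PCR keeps only the first q singular
    directions and zeroes the noise-dominated ones."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    lam = np.ones_like(s)
    if q is not None:
        lam[q:] = 0.0                       # discard trailing singular directions
    return Vt.T @ ((lam / s) * (U.T @ y))

def pls_estimate(X, y, q):
    """Eqs. (10)-(11): b^_PLS = X^+ P y, with P = K (K^T K)^-1 K^T the
    projector onto span{(X X^T)^j y, j = 1, ..., q}."""
    C = X @ X.T
    K = np.column_stack([np.linalg.matrix_power(C, j) @ y for j in range(1, q + 1)])
    P = K @ np.linalg.solve(K.T @ K, K.T)   # projection operator, P @ P == P
    return np.linalg.pinv(X) @ (P @ y)
```

With q equal to the rank of X, all of these coincide with the least-squares solution; the regularization matters precisely when the trailing singular values are at the noise level, as the text argues.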

Results

Among the 62 randomly selected patients, we present the data retrospectively binned into three separate groups,


hereafter referred to as Bin 1, Bin 2, and Bin 3. Bin 1 consists of 50 prediction cases where the state-space methods are significantly better than ALP, especially for longer prediction times. Bin 2 consists of 9 patients for whom none of the methods yields an adequately good prediction of the breathing pattern, since all prediction models depend mainly on the history of the signal; in other words, a breathing pattern that does not repeat itself to some extent in the observation window cannot be accurately predicted. If a breathing pattern were observed for a very long time, accurate predictions could be made; however, very long observation times are impractical for radiation therapy applications. Bin 3 consists of 3 patients for whom both the non-linear prediction and the adaptive linear prediction methods performed equally well, since their breathing was highly regular.

Figure 1 shows the predicted signal for one of the patients in Bin 1 (Patient 2) using the LAM, OLS, PLS, and PCR prediction methods, contrasted with the ALP prediction. The full cycle including the training and testing set is shown in Fig. 1(a). Fig. 1(b)–(f) show the predicted signal overlaid on the test signal. The time window shown on the x-axis runs from the beginning of prediction (end of training time, 18 s) to the end of prediction (49 s). Fig. 1(b) shows the predicted signal generated using single-step ALP overlaid on the test signal. It can be seen from Fig. 1(c)–(f) that all state-space based regularized prediction methods perform better than the ALP model. Note that the prediction signals shown in Fig. 1(c)–(f) for each of the state-space methodologies and ALP are N-time-step (N = 5) ahead predictions of 1000 ms. Furthermore, Fig. 1 shows that the LAM performs as well as the model-based regularized prediction methods. It can also be clearly seen from Fig. 1 that both the ALP model and the OLS model fail for N-point multi-time-step prediction (N > 1). Note that most studies in the literature involve only single-time-step prediction for different imaging/sampling rates (5–30 Hz). We find that the state-space based methods, except for OLS, yield better N-point multi-time-step prediction than the ALP model for the two imaging rates (10 Hz and 5 Hz). The NRMSE values computed using LAM, ALP, OLS, PCR, and PLS for patients 1 and 2 for 400 ms to 3000 ms ahead prediction are tabulated in Table 1.

Histogram plots for all patients in Bin 1 for 2000 ms prediction time at 10 Hz and 5 Hz are shown in Fig. 2(a) and (b), respectively. The

Figure 1. Five-time-step-ahead prediction for Patient 2 in Bin 1, with the respiratory cycle shown in (a), predicted using (b) ALP single time step, (c) LAM, (d) OLS, (e) PCR, and (f) PLS, compared with the ALP model (5 time steps ahead). Signals were predicted 1000 ms ahead employing a sampling frequency of 5 Hz with an 18 s training set.


Table 1
NRMSE for different prediction times for the sample patients 1–4. Each entry gives the NRMSE at a 5 Hz sampling rate, with the 10 Hz value in parentheses.

Prediction time (s)        0.4          0.8          1            1.5          2            3

ALP prediction: 5 Hz (10 Hz)
Patient 1 (Bin 1)   0.33 (0.29)  0.56 (0.52)  0.63 (0.56)  0.85 (0.75)  0.97 (0.93)  1.1 (1.07)
Patient 2 (Bin 1)   0.26 (0.25)  0.49 (0.49)  0.56 (0.59)  0.81 (0.73)  0.94 (1.03)  1 (1.05)
Patient 3 (Bin 2)   1.56 (1.56)  1.70 (1.87)  1.56 (1.83)  1.78 (2.17)  1.72 (2.10)  1.81 (2.26)
Patient 4 (Bin 3)   0.40 (0.51)  0.47 (0.94)  0.49 (1.05)  0.50 (1.19)  0.50 (1.26)  0.57 (1.32)

LAM prediction: 5 Hz (10 Hz)
Patient 1 (Bin 1)   0.31 (0.27)  0.36 (0.34)  0.37 (0.37)  0.44 (0.43)  0.47 (0.53)  0.52 (0.57)
Patient 2 (Bin 1)   0.22 (0.22)  0.26 (0.26)  0.31 (0.34)  0.34 (0.36)  0.40 (0.42)  0.50 (0.52)
Patient 3 (Bin 2)   0.84 (0.80)  0.97 (0.88)  0.93 (0.91)  1.05 (0.99)  1.05 (1.00)  1.3 (1.22)
Patient 4 (Bin 3)   0.27 (0.30)  0.38 (0.39)  0.43 (0.35)  0.4 (0.40)   0.5 (0.46)   0.54 (0.58)

OLS prediction: 5 Hz (10 Hz)
Patient 1 (Bin 1)   0.70 (0.51)  1.22 (0.83)  0.85 (0.83)  1.45 (0.62)  1.24 (0.97)  1.26 (0.91)
Patient 2 (Bin 1)   2.04 (62.1)  2.28 (3.54)  43.96 (2.02) 4.69 (5.51)  6.27 (2.61)  47.91 (2.54)
Patient 3 (Bin 2)   1.45 (1.95)  3.16 (2.49)  1.73 (2.79)  3.84 (1.90)  2.49 (3.64)  2.13 (1.58)
Patient 4 (Bin 3)   0.77 (0.49)  0.93 (0.76)  2.50 (0.66)  0.81 (0.51)  1.43 (0.96)  3.00 (0.80)

PCR prediction: 5 Hz (10 Hz)
Patient 1 (Bin 1)   0.28 (0.27)  0.37 (0.34)  0.38 (0.37)  0.48 (0.43)  0.50 (0.50)  0.62 (0.57)
Patient 2 (Bin 1)   0.22 (0.23)  0.26 (0.26)  0.32 (0.32)  0.35 (0.35)  0.42 (0.40)  0.48 (0.48)
Patient 3 (Bin 2)   1.05 (0.99)  1.24 (0.90)  1.13 (1.07)  1.19 (1.03)  1.20 (1.26)  1.41 (1.23)
Patient 4 (Bin 3)   0.27 (0.31)  0.38 (0.39)  0.43 (0.35)  0.44 (0.40)  0.50 (0.46)  0.54 (0.58)

PLS prediction: 5 Hz (10 Hz)
Patient 1 (Bin 1)   0.28 (0.28)  0.35 (0.40)  0.39 (0.38)  0.46 (0.43)  0.51 (0.50)  0.59 (0.60)
Patient 2 (Bin 1)   0.22 (0.23)  0.27 (0.30)  0.34 (0.34)  0.35 (0.39)  0.43 (0.47)  0.48 (0.49)
Patient 3 (Bin 2)   0.94 (0.80)  1.32 (0.88)  1.02 (0.91)  1.07 (0.99)  1.17 (1.00)  1.41 (1.22)
Patient 4 (Bin 3)   0.32 (0.30)  0.42 (0.39)  0.48 (0.35)  0.49 (0.40)  0.55 (0.46)  0.60 (0.58)

Figure 2. Histogram distributions for all patients in Bin 1 predicted with LAM (red) and ALP (gray) for 2000 ms prediction at (a) 10 Hz and (b) 5 Hz. The overlapping regions are shaded in pink. (c) shows the NRMSE as a function of prediction time for LAM (red) and ALP (blue) at 10 Hz (dotted) and 5 Hz (solid) sampling rates. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


mean NRMSE as a function of prediction time for all 50 patient signals in Bin 1 is shown in Fig. 2(c) for both the 10 Hz and 5 Hz sampling rates. A histogram plot for the 9 patients in Bin 2 for 2000 ms prediction time at 10 Hz and 5 Hz is shown in Fig. 3(a) and (b), respectively. The mean NRMSE as a function of prediction time for all 9 patient signals in Bin 2 is shown in Fig. 3(c) for the 10 Hz and 5 Hz sampling rates. Histogram plots for the 3 patients in Bin 3 for 2000 ms prediction time at 10 Hz and 5 Hz are shown in Fig. 4(a) and (b), respectively. The mean NRMSE as a function of prediction time for the three patient signals in Bin 3 is shown in Fig. 4(c) for both the 10 Hz and 5 Hz sampling rates.

Discussion and conclusion

For the cohort of patients we have studied, we have found that for N-point multi-step prediction, the non-linear prediction methods, except for ordinary least squares (OLS), yield better prediction results than N-point multi-step prediction using an Adaptive Linear Prediction (ALP) model (cf. Figs. 2 and 3). In particular, we have found that the simple model-free Local Average Model (LAM) performed as well as the more complicated model-based regularized local linear models (LLM), such as Partial Least Squares (PLS) and Principal Components Regression (PCR), in the prediction of breathing patterns for the patient population we have studied.


Therefore, for patients who are good candidates for prediction, the use of the simple local average model to predict the breathing pattern, or the tumor motion if tumor motion coordinates are available, could lead to improved, robust, and accurate long-term predictions that can be used to account for system latencies. It is known that changes in the breathing pattern during treatment will lead to a change in the intrafraction tumor motion trajectory. Recently, two groups have shown how even advanced techniques such as 4DCT can lead to misinterpretation of the tumor motion extent in the case of irregular breathing, and have highlighted the importance of gating window selection for lung SBRT patients [17,18]. Thus, a number of research groups are currently developing or have developed prediction algorithms for targeting moving tumors more efficiently. The error in prediction due to the sensitive dependence on initial conditions can be estimated for all algorithms. This error estimate is closely related to the prediction horizon in the field of nonlinear chaotic dynamics and is simply an estimate of how far into the future one can predict a signal for a given error threshold. We have found that a prediction engine based on nonlinear dynamics has a larger prediction horizon than other methods if an appropriate time delay and embedding dimension are chosen, as can be seen from the results presented for the patients in Bin 1. Proper choice of the time delay and embedding dimension allows one to reconstruct an attractor that has a minimal number of false nearest neighbors, which can otherwise lead to incorrect predictions [11,19].

Figure 3. Histogram distribution for the nine patients in Bin 2 predicted with LAM (red) and ALP (gray) for 2000 ms prediction at (a) 10 Hz and (b) 5 Hz. The overlapping regions are shaded in pink. (c) NRMSE as a function of prediction time for LAM (red) and ALP (blue) at 10 Hz (dotted) and 5 Hz (solid) sampling rates. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)



Figure 4. Histogram distribution for the three patients in Bin 3 predicted with LAM (red) and ALP (gray) for 2000 ms prediction at (a) 10 Hz and (b) 5 Hz. The overlapping regions are shaded in pink. (c) NRMSE as a function of prediction time for LAM (red) and ALP (blue) at 10 Hz (dotted) and 5 Hz (solid) sampling rates. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
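The NRMSE figure of merit reported in these comparisons normalizes the RMS prediction error by the spread of the measured signal, so that a predictor that always outputs the signal mean scores exactly 1. A minimal sketch, assuming the common convention of normalizing by the standard deviation of the true signal:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMS error normalized by the standard deviation of the true signal.
    NRMSE = 0 for a perfect predictor; NRMSE = 1 means the predictor is
    no better than always outputting the signal mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)
```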

As pointed out above, the only non-linear prediction method that performed poorly for all patients studied, when compared to the ALP method, was Ordinary Least Squares (OLS) (cf. Table 1). The prediction capability of OLS deteriorates because the part of the OLS solution associated with directions masked by noise contains no information useful for predicting the future signal [9]. While for a prediction horizon of 400 ms the NRMSE values of the ALP model, the LAM model, and the LLMs (PLS and PCR) are comparable, the NRMSE of the ALP model diverges from that of the nonlinear models for longer prediction horizons (cf. the results shown in Table 1 and Figs. 2–4 comparing the NRMSE of the ALP and LAM models). For very large prediction horizons the NRMSE of the ALP model approaches 1, indicating that its prediction is no better than simply using the mean value of the signal as the predicted value. In other words, based on coefficients optimized from the signal history, ALP in multi-step prediction loses its value after one or two time steps; once the signal history is updated, ALP resets its coefficients, again yielding fairly accurate prediction for only the first one or two steps of the next set of N-point predictions. With the advent of newer tumor-tracking technologies, with which it will be possible to acquire real-time MR images at a frame rate of 4 images/s, the presented nonlinear prediction methods can be implemented directly on the tumor motion coordinates.

It can be seen that the performance of the ALP model is frequency dependent and deteriorates at 10 Hz, mainly owing to the presence of noise, which contributes to the determination of the model parameters. The performance of the LAM and LLM models, in contrast, is not frequency dependent, since it does not significantly worsen at either sampling rate (see Figs. 2 and 3). Thus, based on the cohort of patient breathing patterns analyzed in this study, local state-space prediction models are more desirable for the prediction of real-time tumor coordinates; the LAM model in particular, since it is also computationally inexpensive. The presented methodology greatly increases the prediction horizon, to about 3000 ms, whereas the historical and recent nonlinear methods mentioned in the Introduction show improved and accurate prediction for system latencies only up to 500 ms. Future studies should include analysis of these methods to yield more insight into how they can be employed synergistically for accurate short-term and long-term predictions that address the problem of system latency and duty cycle in radiation therapy. The work presented can also be extended in several ways for tracking tumor motion in real time. One could, for example, model the breathing pattern based on neurophysiologic mechanisms. Another possibility is to use chaotic dynamics to control and guide patients' breathing toward regularity and reproducibility.

References

[1] Verma PS, Wu H, Langer MP, Das IJ, Sandison G. Survey: real-time tumor motion prediction for image-guided radiation treatment. Comput Sci Eng 2010;13:24–35.
[2] Vedam S, Keall P, Docef A, Todor D, Kini V, Mohan R. Predicting respiratory motion for four-dimensional radiotherapy. Med Phys 2004;31:2274–83.

[3] Isaksson M, Jalden J, Murphy MJ. On using an adaptive neural network to predict lung tumor motion during respiration for radiotherapy applications. Med Phys 2005;32:3801–9.
[4] Murphy MJ, Dieterich S. Comparative performance of linear and nonlinear neural networks to predict irregular breathing. Phys Med Biol 2006;51:5903.
[5] Wu H, Sharp GC, Salzberg B, Kaeli D, Shirato H, Jiang SB. A finite state model for respiratory motion analysis in image guided radiation therapy. Phys Med Biol 2004;49:5357.
[6] Ruan D, Fessler JA, Balter JM, Sonke J-J. Exploring breathing pattern irregularity with projection-based method. Med Phys 2006;33:2491–9.
[7] Ernst F, Dürichen R, Schlaefer A, Schweikard A. Evaluating and comparing algorithms for respiratory motion prediction. Phys Med Biol 2013;58:3911.
[8] Tewatia D, Tolakanahalli R, Paliwal B, Tomé WA. Time series analyses of breathing patterns of lung cancer patients using nonlinear dynamical system theory. Phys Med Biol 2011;56:2161.
[9] Takens F. Detecting strange attractors in turbulence. In: Rand D, Young L-S, editors. Dynamical systems and turbulence, Warwick 1980. Berlin, Heidelberg: Springer; 1981. p. 366–81.
[10] Sharp GC, Jiang SB, Shimizu S, Shirato H. Prediction of respiratory tumour motion for real-time image-guided radiotherapy. Phys Med Biol 2004;49:425.
[11] Kantz H, Schreiber T. Nonlinear time series analysis. Cambridge University Press; 2004.


[12] Farmer JD, Sidorowich JJ. Predicting chaotic time series. Phys Rev Lett 1987;59:845.
[13] Kugiumtzis D. State space reconstruction parameters in the analysis of chaotic time series – the role of the time window length. Phys D Nonlinear Phenom 1996;95:13–28.
[14] Kugiumtzis D, Lingjærde O, Christophersen N. Regularized local linear prediction of chaotic time series. Phys D Nonlinear Phenom 1998;112:344–60.
[15] Kugiumtzis D. Modelling and forecasting financial data. Techniques of nonlinear dynamics. 2002. p. 95–113.
[16] Lukas MA. Comparisons of parameter choice methods for regularization with discrete noisy data. Inverse Probl 1998;14:161.
[17] Aznar MC, Persson GF, Kofoed IM, Nygaard DE, Korreman SS. Irregular breathing during 4DCT scanning of lung cancer patients: is the midventilation approach robust? Phys Medica 2014;30:69–75.
[18] Jang SS, Huh GJ, Park SY, Yang PS, Cho EY. The impact of respiratory gating on lung dosimetry in stereotactic body radiotherapy for lung cancer. Phys Medica 2014;30:682–9.
[19] Basharat A, Shah M. Time series prediction by chaotic modeling of nonlinear dynamical systems. In: Computer Vision, 2009 IEEE 12th International Conference on. IEEE; 2009. p. 1941–8.