Consistent estimation of ordinary differential equation when the transformation parameter is unknown

Consistent estimation of ordinary differential equation when the transformation parameter is unknown

Accepted Manuscript Consistent estimation of ordinary differential equation when transformation parameter is unknown Jie Zhou, Hai-Lin Feng PII: DOI: ...

201KB Sizes 0 Downloads 25 Views

Accepted Manuscript Consistent estimation of ordinary differential equation when transformation parameter is unknown Jie Zhou, Hai-Lin Feng PII: DOI: Reference:

S0167-7152(16)00037-7 http://dx.doi.org/10.1016/j.spl.2016.02.008 STAPRO 7545

To appear in:

Statistics and Probability Letters

Received date: 12 May 2015 Revised date: 16 February 2016 Accepted date: 16 February 2016 Please cite this article as: Zhou, J., Feng, H.-L., Consistent estimation of ordinary differential equation when transformation parameter is unknown. Statistics and Probability Letters (2016), http://dx.doi.org/10.1016/j.spl.2016.02.008 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Consistent Estimation of Ordinary Differential Equation When Transformation Parameter is UnknownI Jie Zhou, Hai-Lin Feng Department of Statistics, Xidian University, Xi’an, Shaanxi, P R China

Abstract Two-step estimation of ordinary differential equation (ODE) is investigated when the transformation parameter is unknown. First we build the transformation parameter estimator by profile M-estimation. Then with a new consistency result of M-estimator, two-step estimator is proved to be consistent. Numerical studies validate the effectiveness of the proposed estimator. Key words: Two-step estimation, ODE, Transformation parameter, Weak convergence

1. Introduction Consider the following ordinary differential equation x(t) ˙ = F(x(t), θ ),

t ∈ [0, 1],

(.)

where x(t) is d-dimensional state variable and θ is p-dimensional unknown parameter. If the initial condition x(0) is unknown, then it can also be included in θ . Such model has been widely used in different fields to model the dynamic phenomena. For example in biology the famous Lotka-Volterra equation is used to model the dynamic interaction between the predator species and prey species; in clinical trials of infectious disease, the relationship between the virus and immune system has been widely modeled by ODE models, see Perelson and Nelson (1999), Nowak and May (2000), Robeva et al. (2008) for details. When ODE model (.) is used in practice, the unknown parameters θ should be estimated from the noisy observations of x(t) firstly. However the observations Yi ’s at time ti ’s are often not the direct observations of x(ti )(i = 1, · · · , n) and some scale transformation should be performed firstly in order to estimate θ . For example in clinical trial of AIDS, logarithm transformation is usually applied to the measurements of load of virus when the HIV model is estimated. Let B(·|λ ) denote such transformation family indexed by parameter λ ∈ Λ ⊆ Rd , in which λi , the ith component of λ , corresponds to the transformation applied to the ith component of x(t). The transformed observations are assumed to satisfy the following relationship Yλ ,i , B(Yi |λ ) = x(ti |λ ) + ελ ,i ,

i = 1, · · · , n.

(.)

with Eελ ,i = 0 and Dελ ,i = σλ2 , i.e., x(ti |λ ) = EB(Yi |λ ). We assume there exists a unique λ0 ∈ Λ such that x(t|λ0 ) is the solution of model (.). We call λ0 the true value of λ . The parameter in model (.) corresponding to x(t|λ0 ) is called the true value of θ and is denoted by θ0 . A natural question is how to estimate the transformation parameter λ0 given the observations Yi ’s. Traditionally parameter estimation for ODE is regarded as a problem of nonlinear regression and can be carried out based on the numerical integration approaches such as Runge-Kutta algorithm. Consequently transformation parameter can be estimated by the corresponding profile likelihood approaches. Though I Supported

by NSFC(71271165) Email address: [email protected] (Jie Zhou )

Preprint submitted to Statistics and Probability Letters

April 12, 2016

such estimation approach is reasonable theoretically, in practice the computational loads are notoriously large and the resulted estimators often are unreliable, see Cao and Ramsay (2007), Ramsay et al. (2007) for a detailed discussion of this approach. In this paper we study the estimation of transformation parameter under the framework of two-step estimation of ODE (.). Such estimation approaches were firstly proposed by Varah (1982) and recently studied by Brunel (2008), Liang and Wu (2008), Fang et al. (2011), Gugushvili and Klaassen (2012), Wu et al. (2013) among many others. We define the estimator of λ as a profile M-estimator. In order to gain some insights about such estimator, the property of consistency for ordinary M-estimator is firstly generalized to the model with unknown transformation parameter. Based on this result the consistency of the proposed profile M-estimator is established. Simulation studies show that the proposed transformation parameter estimator is efficient. A real data set from the clinical trial is examined and the results show that a better model fit can be attained by the proposed procedure compared to ordinary logarithm transformation. This paper is organized as follows. Section 2 presents the main results. The numerical studies, including simulated data set and real data set, appears in section 3. The detailed proofs of these results are presented in section 4. 2. Main Results Though the state variable x(t) can be d-dimensional for d ∈ N, in the following we will always assume d = 1 for the simplicity of notation. All the results involved can be generalized to the general multivariate settings straightforwardly. Given λ ∈ Λ, denote the transformed observations by Y = (Yλ ,1 , · · · ,Yλ ,n )T where Yλ ,i = B(Yi |λ ) for i = 1, · · · , n. In order to estimate θ , the two-step approach first estimates x(t) and its derivative x(t). ˙ Several choices have been proposed in this step, including spline based estimates (Varah (1982), Brunel (2008)), kernel smoothing based estimates (Liang and Wu (2008), Gugushvili and Klaassen (2012), Wu et al. (2013)). Here we employ the following Priestley-Chao estimate which had been proposed in Gugushvili and Klaassen (2012),   n t − ti 1 Yλ ,i , (.) x(t|λ ˆ ) = ∑ (ti − ti−1 ) K b b i=1 where b is bandwidth, K(·) is the kernel function. The estimate of x(t) ˙ is assumed to be the derivative of x(t) ˆ with ˙ˆ ˆ˙ = x(t). respect to t, i.e., x(t) In the second step the estimate of parameter θ is defined by θˆ (λ ) = arg minθ ∈Θ

Z 1 0

ˆ˙ [x(t|λ ) − F(x(t|λ ˆ ), θ )]2 w(t)dt

= arg minθ ∈Θ Mn (θ , λ )

(.)

for some set Θ ∈ R p . Here w(t) is weight function with support [δ , 1 − δ ] for some 0 < δ < 1/2. Gugushvili and Klaassen (2012) had investigated the asymptotics of such two-step estimator when λ0√ , the true value of λ , is known. Under some regular conditions they showed that θˆ (λ0 ) converges to θ0 at the rate n. Transformation parameter usually is unknown in practice. Therefore we substitute θˆ (λ ) into the right-hand side of (.) and define the estimator of λ as λˆ

= arg minλ ∈Λ Mn (θˆ (λ ), λ )

(.)

for some set Λ ∈ R. A desired property of an estimator is the consistency. In order to show λˆ is consistent, some insights about the asymptotic properties of θˆ (λ ) are needed. To this end, let x(t|λ ) = EB(Y (t)|λ ) and x(t|λ ˙ ) be the derivative of x(t|λ ) with respect to t. Define θ (λ ) = arg minθ ∈Θ

Z 1 0

[x(t|λ ˙ ) − F(x(t|λ ), θ )]2 w(t)dt

= arg minθ ∈Θ M(θ , λ ),

then we have the following result. 2

(.)

Proposition 1. Suppose b → 0 and nb3 / log n → ∞. For given λ and ∀ε > 0, we assume that there exists a positive constant τλ (ε) such that M(θ (λ ), λ ) + τλ (ε) <

inf

θ ∈Θ:kθ −θ (λ )k>ε

M(θ , λ ),

(.)

p then under the conditions (A1)-(A7) given in section 4 we have θˆ (λ ) → θ (λ ). Note the constant τλ (ε) in (.) depends on both λ and ε. Particularly when λ = λ0 , Proposition 1 is just the Proposition 3.2 of Gugushvili and Klaassen (2012). Though they have the different forms, condition (.) above is in fact equivalent to the condition (3.9) in Gugushvili and Klaassen (2012) √ when λ = λ0 . In that case Gugushvili and Klaassen (2012) showed that the rate of such convergence can attain n. This result can also be extended to the general case for ∀λ ∈ Λ. Define

J(θ (λ )) =

Z 1 0

(F˙θ (x(t|λ ), θ (λ )))T (F˙θ (x(t|λ ), θ (λ )))w(t)dt,

(.)

where F˙θ denotes the partial derivative of F with respect to θ . Then the following result holds. Proposition 2. For given λ ∈ Λ, let θ (λ ) be an interior point of Θ. Assume the conditions (A1)-(A7) together with (.) hold and that J(θ (λ )) is nonsingular. The parameter α defined in condition (A4) in section 4 satisfies α > 4 and the bandwidth b is such that b = O p (n−γ ) holds for 1/2α < γ < 1/6, then √ n(θˆ (λ ) − θ (λ )) = O p (1) holds. Though Proposition 1 and 2 are the generalization of the corresponding results in Gugushvili and Klaassen (2012), with a slight modification, the proofs in Gugushvili and Klaassen (2012) can also be applied here and are therefore omitted. On the other hand the issues that Proposition 1 and 2 address only involve the fixed λ ∈ Λ. In order to investigate the asymptotics of λˆ defined in (.), the uniform convergence of θˆ (λ ) with respect to λ should be established firstly. To this end, we investigate the uniform consistency of the ordinary M-estimator. The following Theorem 1 can be seen as a generalization of Theorem 5.7 in van der Vaart (1998). Theorem 1. Let Mn (θ , λ ) be random functions and let M(θ , λ ) be a fixed function such that sup

θ ∈Θ,λ ∈Λ

p

|Mn (θ , λ ) − M(θ , λ )| → 0.

(.)

We further assume that for every ε > 0 and λ ∈ Λ, there exists a constant τ(ε) > 0 which does not depend on λ such that M(θ (λ ), λ ) + τ(ε) <

inf

θ ∈Θ:kθ −θ (λ )k>ε

M(θ , λ ).

(.)

Then (i) any sequence θˆ (λ ) with Mn (θˆ (λ ), λ ) ≤ Mn (θ (λ ), λ ) uniformly converges to θ (λ ) in probability, i.e., p sup k θˆ (λ ) − θ (λ ) k→ 0.

(.)

λ ∈Λ

(ii) for θˆ (λ ) defined in (i), if we further assume that (a) all the third derivatives of M(θ , λ ) and Mn (θ , λ ) with respect to θ and λ are continuous on Θ × Λ; (b)infλ ∈Λ M¨ θ θ (θ (λ ), λ ) > 0; (c) the following uniform convergence results hold, p

sup

k (M¨ n )θ θ (θ , λ ) − M¨ θ θ (θ , λ ) k→ 0,

(.)

sup

p k (M¨ n )θ λ (θ , λ ) − M¨ θ λ (θ , λ ) k→ 0,

(.)

θ ∈Θ,λ ∈Λ θ ∈Θ,λ ∈Λ

3

where k · k in (.) and (.) denotes the Frobenius-norm of a matrix. Then we have p sup k θ˙ˆ (λ ) − θ˙ (λ ) k→ 0.

λ ∈Λ

(.)

Here M¨ θ θ denotes the second order derivative of M with respect to θ . Similar explanations apply to M¨ θ λ , (M¨n )θ θ and (M¨n )θ λ . Remark 1. (i) The difference between condition (.) and condition (.) is the constant τ. While τ in condition (.) depends on both λ and ε, in condition (.) τ is independent with λ which guarantees θ (λ ) is uniformly identifiable for λ ∈ Λ. √ Condition (.) may not hold for some choices of M(θ , λ ). Consider √ the following specification, M(θ , λ ) = λ1 (θ − λ )2 defined on R × R+ , for which the minimum point is θ (λ ) = λ . It can be easily verified for this criterion function and ∀ε > 0, while condition (.) holds, there does not exist τ(ε) > 0 that guarantees condition (.) holds. (ii) Assumption (b) requires that the second derivatives M¨ θ θ is positive definite at the minimum point θ (λ ) for all λ ∈ Λ. In most cases this condition can be verified straightforwardly. Now we return to θˆ (λ ) and θ (λ ) defined in (.) and (.) respectively, for which, with Theorem 1 in hand, we can get the following results. Theorem 2. Assume all the conditions in Proposition 2 hold. If the uniform identifiability condition (.) is true and infλ ∈Λ |M¨ θ θ (θ (λ ), λ )| > 0, then we have p sup k θˆ (λ ) − θ (λ ) k→ 0.

λ ∈Λ

(.)

0

Furthermore if assumption (A7) is replaced by (A7 ), then we also have p sup k θ˙ˆ (λ ) − θ˙ (λ ) k→ 0.

λ ∈Λ

(.)

Theorem 3. Under the same conditions as for (.) we have p λˆ → λ0 .

Remark 2. With λˆ in hand, we can naturally define the estimator of θ by θˆ (λˆ ). For θˆ (λˆ ) we have |θˆ (λˆ ) − θ0 |

≤ ≤ p

→0

|θˆ (λˆ ) − θˆ (λ0 )| + |θˆ (λ0 ) − θ (λ0 )| sup |θ˙ˆ (λ )|(λˆ − λ0 )| + |θˆ (λ0 ) − θ (λ0 )|

λ ∈Λ

(.)

by Proposition 1, Theorem 3 and the continuity of θ˙ˆ (λ ) on the compact set Λ. So θˆ (λˆ ) is a consistent estimator of θ0 . It is known that the choice of bandwidth b is important for the performance of the estimator θˆ (λ ) in (.) and consequently it can also affect the performance of λˆ in (.). For estimating λˆ , a reasonable choice is to use the optimal bandwidth for the estimation of x(t) and its derivative x(t). ˙ If x(t) is assumed to be α times continuously differentiable and the sample size is n, then such optimal bandwidth should be of order n−1/(2α+1) . However Theorem 3 shows that the optimal bandwidth for the estimation of λˆ should be of order n−γ for 1/(2α) < γ < 1/6 which is smaller than n−1/(2α+1) , i.e., for estimating θ , the curve x(t) and its derivative should be undersmoothed. In the numerical studies in the next section, on a discrete grid of interval 1/(2α) < γ < 1/6, we will select the minimizer of Mn (θˆ (λ ), λ ), which is defined in (.), as the choice of γ. Other more complex strategies for the choice of bandwidth b also have been discussed in literatures, see, e.g., Liang and Wu (2008), Gugushvili and Klaassen (2012).

4

3. Numerical Studies The dynamics of HIV (human immunodeficiency virus) in vivo are usually characterized by the ordinary differential equations. Consider the following model T˙U (t) = s − dTU (t) − k(t)TU (t)V (t), T˙I (t) = k(t)TU (t)V (t) − δ TI (t), V˙ (t) = Nδ TI (t) − cV (t).

(.) (.) (.)

Three state variables TU (t), TI (t),V (t) represent the concentration of uninfected T cell, infected T cell and virus in body respectively . The parameters include (s, d, k(t), δ , N, c) in which s denotes the rate at which new T cell is generated from sources within the body; d is the death rate per uninfected T cell; k(t) is the infection rate which can be time-varying; δ is the death rate per infected T cell; N is the number of virions produced by one infected T cell during its lifetime; c is the death rate per virus. Note though three state variables have been involved in this model, only V (t) and T (t) = TU (t) + TI (t) can be measured in clinical practice. Instead of dealing with model (.)∼(.) directly, a feasible way for this partially measured problem is to derive a system only containing the observed variables V (t) and T (t) = TU (t) + TI (t), which can be shown to be V˙ (t) = α0 + cV (t) + α1 T (t) + α2 T˙ (t),

(.)

where s = −α0 /α2 , d = α1 /α2 , N = α1 /δ − α2 , see Liang and Wu (2008), Fang et al. (2011) for the detailed derivation. Conventionally when the model (.) is fitted, the logarithm transformation is applied to the measured load of virus, which corresponds to the Box-Cox transformation with transformation parameter λ = 0. Certainly logarithm transformation may be not the optimal choice for a given patient. In the following, we will evaluate the performance of the proposed approach based on model (.) using both simulated and real data. Kernel function involved in two-step estimation is set to be,   2 105 315 2 1 − t 2 I|t|≤1 , − t (.) K(t) = 64 64

which is a fourth order kernel proposed by Gugushvili and Klaassen (2012). The bandwidth is chosen based on the formula b = n−γ given in Proposition 2 for 1/8 ≤ γ ≤ 1/6, The weight function w(t) is just assumed to be constant on interior point and zero on the boundary point. Other form of w(t) also is possible, e.g., see formula (3) on page 552 of McMurry and Politis (2004) or formula (4.4) of Gugushvili and Klaassen (2012). 3.1. Simulated Data Consider model (.)∼(.) on the time interval t ∈ [0, 20]. The parameters are set to be s = 36, d = 0.108, k(t) = 9 ∗ 10−5 (1 − 0.9 cos(πt/1000)), δ = 0.5, N = 1000, c = 3. The initial values for the three state variables are set respectively to be TU (0) = 600, TI (0) = 30, V (0) = 30. Runge-Kutta algorithm is employed to generate the states TU (ti ), TI (ti ),V (ti ) at time point ti for i = 1, · · · , 70, which are evenly spaced within [0,20]. The measurements are Y1i = V (ti ) + ε1i , Y2i = TU (ti ) + TI (ti ) + ε2i

(.)

where ε1i ∼ N(0,σ12 ) and ε2i ∼ N(0,σ22 ) are random measurement errors corresponding to virus and total T cell respectively. Three kinds of combination of error variance σ12 and σ22 are considered which are respectively (i) σ12 = 20, σ22 = 100; (ii) σ12 = 30, σ22 = 150; (iii) σ12 = 40, σ22 = 200. The true transformation parameter is set to be λ = 0.5. Four sample sizes n = 30, 100, 150, 200 are investigated. Every case is replicated for 50 times. The results, including the mean of parameter estimate and the standard error, are listed in the Table 1. Note we have transformed the estimates of (α1 , α1 , α2 ) to the estimates of (s, d, N) based on the one-to-one relationship between them. From Table 1 we can see all the estimates are reasonable at the given sample size and the error variances. So the proposed

5

procedure is reliable. s.d. σ1 = 20, σ2 = 100 σ1 = 30, σ2 = 150 σ1 = 40, σ2 = 200 σ1 = 20, σ2 = 100 σ1 = 30, σ2 = 150 σ1 = 40, σ2 = 200 σ1 = 20, σ2 = 100 σ1 = 30, σ2 = 150 σ1 = 40, σ2 = 200 σ1 = 20, σ2 = 100 σ1 = 30, σ2 = 150 σ1 = 40, σ2 = 200

Table 1: Parameter estimation for model (.) s d N c n = 30 35.16(0.412) 0.095 (0.001) 944.3(84.29) 3.46(0.74) 34.48(0.539) 0.092 (0.002) 948.6 (87.52) 3.27(0.87) 37.6(0.874) 0.093 (0.002) 926.1 (90.97) 3.35(0.89) n = 100 35.84 (0.494) 0.098(0.002) 1087.7(73.61) 2.89(0.53) 36.30 (0.669) 0.097(0.003) 1021.5(89.15) 3.27(0.86) 37.80(0.809) 0.095(0.003) 991.2(102.2) 3.53(0.88) n = 150 37.8(0.309) 0.104(0.002) 1286.2(99.91) 2.91(0.23) 35.08(0.705) 0.095(0.003) 979.5 (95.65) 3.12(0.57) 35.29(0.837) 0.096(0.003) 969.2(100.93) 2.95(0.78) n = 200 36.67 (0.445) 0.101(0.001) 1208.6(95.170) 3.25(0.45) 34.13(0.811) 0.09(0.003) 873.4(92.37) 2.98(0.54) 34.47(0.939) 0.092 (0.003) 897.3(98.89) 3.51(0.77)

λ 0.489(0.025) 0.479(0.033) 0.477(0.035) 0.493(0.031) 0.523(0.051) 0.513(0.057) 0.493(0.0285) 0.488(0.031) 0.533(0.056) 0.517(0.078) 0.498(0.087) 0.522(0.089)

3.2. Real Data In the study of HIV dynamics, extensive clinical data have been collected from many clinical trials. Usually the logarithm transformation is applied to the measurements of load of virus before the dynamic model is fitted. Here we investigate the time series of the load of virus and T cells from two AIDS patients. These time series have also been studied by several other authors, e.g., see Liang and Wu (2008) and Fang et al. (2011). This study measured HIV viral load at hours 0, 1, 2, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 28, 32, 40, 46, 52, 58, 64, 70, 144, 240, and 336 during the first two weeks of treatment, and then at weeks 3, 4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 32, 36, 40, 44 and 48 during treatment. At most weekly clinical visits, total CD4 T cell counts were also measured. Here we investigate if there is better choice than logarithm transformation for these two patients. Also we confine ourself in Box-Cox transformations family which is flexible enough for this problem. For these two patients, if the logarithm transformation is adopted, then the integrated squared errors (ISE) defined in (.) turns out to be 1835.5 and 2684.3 respectively. If we do not assume the logarithm transformation, then the estimates of transformation parameter turns out to be λˆ 1 = 0.215, λˆ 2 = 0.324. The corresponding ISE are 1667.6 and 2085.4 respectively. Here in order to compare these ISE, we have applied the inverse Box-Cox transformation to the original ISEs calculated directly from (.) so that these ISE’s are in the same scale. It can be seen that the model fit can be improved by choosing an appropriate transformation parameter, especially for the patient 2. So though logarithm transformation is reasonable generally, for a given patients we can find a transformation that can attain a better model fit than logarithm transformation. 4. Proofs The following conditions are used in the proofs. (A1). The random error terms ελ ,i ’s are independently identically distributed with Eελ ,i = 0, Var(ελ ,i ) = σλ2 < +∞. (A2). The observation times 0 ≤ ti < · · · < tn ≤ 1 are deterministic and known and there exists a constant c0 > 1 such that for all n max2≤i≤n |ti − ti−1 | < cn0 holds. Furthermore there exists a constant c1 ≥ 1 such that for any interval A ⊆ [0, 1] of length |A| and all n ≥ 1 the inequality n1 ∑ni=1 I[ti ∈A] ≤ c1 max(|A|, 1n ) holds. Here I[t∈A] denotes the indictor function of set A. (A3). The set Λ is a compact subset of R; the parameter set Θ for all λ ∈ Λ is the same and is a compact subset of Rp. (A4). For all parameter values θ ∈ Θ, the solution x(t|θ ) of ODE model (.) is uniquely defined on the interval [0,1] and is a Cα [0, 1] function of t for some positive integer α. 6

(A5). The kernel function K(·) is symmetric and twice continuously differentiable, it has support within [-1,1] and R1 R1 l it satisfies the integrability conditions: −1 K(u)du = 1, and −1 u K(u)du = 0 for l = 1, · · · , α − 1. If α = 1, only the first of the two integrability conditions is required. (A6). The weight function w(·) is nonnegative function that is continuously differentiable, supported on the interval (δ , 1 − δ ) for some fixed number δ such that 0 < δ < 1/2 and is such that the Lebesgue measure of the set {t : w(t) > 0} is positive. (A7). The second derivatives F¨θ θ , F¨θ x ,F¨xx of the mapping F : R × Θ → R are continuous; B(y|λ ) is continuously differentiable with respect to y and λ on R × Λ. 0 (A7 ). The third derivatives of F(x, θ ) with respect to x and θ are continuous on R × Θ; The third derivatives of B(y|λ ) with respect to y and λ are continuous on R × Λ. Proof of Theorem 1. (i) By the uniform identifiability condition (.), ∀ε > 0 there exists η > 0 which does not depend on λ such that       P sup k θˆ (λ ) − θ (λ ) k> ε ≤ P inf M(θ (λ ), λ ) − M(θˆ (λ ), λ ) < −η (.) λ ∈Λ



λ ∈Λ

   ˆ = P sup M(θ (λ ), λ ) − M(θ (λ ), λ ) > η .

Note that

λ ∈Λ

Mn (θˆ (λ ), λ ) ≤ Mn (θ (λ ), λ ) = M(θ (λ ), λ ) + u(λ ).

(.)

(.)

The last term u(λ ) in the right-hand side tends to zero uniformly with respect to λ by (.) . Substitution of (.) into (.) yields     ˆ ˆ ˆ P sup k θ (λ ) − θ (λ ) k> ε ≤ P sup |Mn (θ (λ ), λ ) − M(θ (λ ), λ )| > η/2 λ ∈Λ

λ ∈Λ



 +P sup |u(λ )| > η/2 λ ∈Λ

(.)

  ≤ P sup sup |M(θ , λ ) − Mn (θ , λ )| > η/2 + o p (1) → 0, λ ∈Λ θ ∈Θ

as n tends to +∞ by (.), i.e.,

p sup k θˆ (λ ) − θ (λ ) k→ 0.

(ii) By the definition of θˆ (λ ) we have

λ ∈Λ

M˙ n |θ =θˆ (λ ) = 0. Differentiating both sides of (.) with respect to λ results in  −1  θ˙ˆ (λ ) = − (M¨ n )θ θ (θˆ (λ ), λ ) (M¨ n )θ ,λ (θˆ (λ ), λ ) , −(M¨ n )−1 (M¨ n )θ λ . θθ

(.)

(.)

Similarly for θ (λ ) we have

 −1  − M¨ θ θ (θ (λ ), λ ) M¨ θ λ (θ (λ ), λ ) ¨ , −M¨ θ−1 θ Mθ λ .

θ˙ (λ ) =

(.)

Note that sup k (M¨ n )θ θ − M¨ θ θ k

λ ∈Λ



sup k (M¨ n )θ θ (θˆ (λ ), λ ) − M¨ θ θ (θˆ (λ ), λ ) k

λ ∈Λ

+ sup k M¨ θ θ (θˆ (λ ), λ ) − M¨ θ θ (θ (λ ), λ ) k . λ ∈Λ

7

(.)

By (.) the first term in the right-hand side converges to zero uniformly as n tends to infinity. As for the second term consider the (i, j)th entries of the involved matrixes. By condition (a), there exists a constant C such that   p ... ¨ Mθ θ (θˆ (λ ), λ ) − M¨ θ θ (θ (λ ), λ ) i j = ∑ M θi θ j θk (ξi jk , λ )(θˆk (λ ) − θk (λ )) k=1 ... (.) ≤ p max M θi θ j θk (ξi jk , λ ) k θˆ (λ ) − θ (λ ) k≤ C k θˆ (λ ) − θ (λ ) k, i, j,k

which converges to zero uniformly with respect to λ ∈ Λ according to (i). And so we have p sup k (M¨ n )θ θ (θˆ (λ ), λ ) − M¨ θ θ (θ (λ ), λ ) k→ 0,

(.)

λ ∈Λ

which combined with (.) and condition (b) yields p ˆ ¨ −1 sup k (M¨ n )−1 θ θ (θ (λ ), λ ) − Mθ θ (θ (λ ), λ ) k→ 0.

(.)

λ ∈Λ

Similarly to (.) we have p

sup k (M¨ n )θ λ (θˆ (λ ), λ ) − M¨ θ λ (θ (λ ), λ ) k→ 0.

(.)

λ ∈Λ

Then (.), (.) and the compactness of Λ together imply that sup k θ˙ˆ (λ ) − θ˙ (λ ) k

λ ∈Λ

=

p ¨ ¨ −1 ¨ sup k M¨ θ−1 θ Mθ λ − (Mn )θ θ (Mn )θ λ k→ 0.

λ ∈Λ

(.)

This completes the proof of Theorem 1.  Proof of Theorem 2. In order to prove Theorem 2 we only need to verify that the conditions in Theorem 1 are satisfied in present setting, i.e., conditions (.), (.) and (.). The verification is similar to the arguments involved in the proof of Theorem 3 and Lemma 2 in the following. The difference is that in present situation we need to verify that the convergence is uniform with respect to θ and λ while in Theorem 3 and Lemma 2 we need to show the convergence is uniform only with respect to λ . The proof is therefore omitted. The following Lemma 1 and 2, Theorem 3 correspond to the Corollary 3.1 and Proposition 3.2. in Gugushvili and Klaassen (2012) respectively. The proof of Lemma 1 is the same as those in Corollary 3.1 and so it is omitted. As for the proof of Theorem 3 and Lemma 2, we will adopt a technique similar to those from proof of Proposition 3.2 in Gugushvili and Klaassen (2012). Lemma 1. Let x(t|λ ) be α ≥ 2 times continuously differentiable, then under the conditions of Theorem 2 we have ! r 1 log n α sup sup |x(t|λ ˆ ) − x(t|λ )| = O p b + 2 + , nb nb λ ∈Λ t∈[δ ,1−δ ]

sup

α−1

sup

λ ∈Λ t∈[δ ,1−δ ]

ˆ˙ |x(t|λ ) − x(t|λ ˙ )| = O p b

1 + 3+ nb

r

log n nb3

!

.

Lemma 2. Under the conditions (A1)∼(A7) we have sup

Z 1

λ ∈Λ 0

sup

Z 1

λ ∈Λ 0

2 p F(x(t|λ ˆ ), θ (λ )) − F(x(t|λ ˆ ), θˆ (λ )) w(t)dt → 0, p

[F(x(t|λ ), θ (λ )) − F(x(t|λ ˆ ), θ (λ ))]2 w(t)dt → 0. 8

(.) (.)

Proof. Only the proof of (.) is given; (.) can be proved similarly. Define Z 1

U(λ ) =

0

2 F(x(t|λ ˆ ), θ (λ )) − F(x(t|λ ˆ ), θˆ (λ )) w(t)dt,

(.)

then U(λ ) can be regarded as a random element which takes values in C(Λ), the space of all continuous function defined on Λ. In order to prove (.), we will verify that the condition of Theorem 1 in Andrews (1992) are fulfilled. First we note that Λ is totally bounded due to the compactness of Λ. Second for a given λ ∈ Λ, U(λ ) converges to zero by Proposition 1 and the continuity of F(x, θ ) on R × Θ. The last condition that we need to verify is that U(λ ) is equicontinuous on Λ. To this end define ( ) G=

sup

t∈[δ ,1−δ ],λ ∈Λ

|x(t|λ ˆ ) − x(t|λ )| ≤ γ

(.)

for some γ > 0, then we have for any positive ε and any partition Λ1 , · · · , Λm of Λ ! P sup sup |U(λ1 ) −U(λ2 )| ≥ ε l λ1 ,λ2 ∈Λl

!

=

P sup sup |U(λ1 ) −U(λ2 )| ≥ ε; G + P sup sup |U(λ1 ) −U(λ2 )| ≥ ε; G l λ1 ,λ2 ∈Λl

c

l λ1 ,λ2 ∈Λl

For the second term in the right-hand side of (.), we have from Lemma 1, ! P sup sup |U(λ1 ) −U(λ2 )| ≥ ε; Gc l λ1 ,λ2 ∈Λl

!

.

p

≤ P(Gc ) → 0.

(.)

(.)

For the first term in the right-hand side of (.) we have Z 1  ˆ F(x(t|λ ˆ ˆ |U(λ1 ) −U(λ2 )| ≤ 2 1 ), θ (λ1 )) − F(x(t|λ 1 ), θ (λ1 )) 0

o1/2 2 ˆ −F(x(t|λ ˆ ˆ 2 ), θ (λ2 )) + F(x(t|λ 2 ), θ (λ2 )) w(t)dt Z 1  2 ˆ F(x(t|λ ˆ ˆ × 1 ), θ (λ1 )) − F(x(t|λ 1 ), θ (λ1 )) w(t)dt 0

+

Z 1 0

2 ˆ F(x(t|λ ˆ ˆ 2 ), θ (λ2 )) − F(x(t|λ 2 ), θ (λ2 )) w(t)dt

1/2 1/2 2Q1 Q2 .

,

1/2

(.)

As for Q1 consider only the sample point in G, Q1



Z 1 0

2 [F(x(t|λ ˆ ˆ 1 ), θ (λ1 )) − F(x(t|λ 2 ), θ (λ2 ))] w(t)dt

Z 1

2 ˆ ˆ F(x(t|λ ˆ ˆ 1 ), θ (λ1 )) − F(x(t|λ 2 ), θ (λ2 )) w(t)dt 0 !2 Z 1 ∂ x ˆ dθ = F˙x (λ1 − λ2 )2 w(t)dt + F˙θ ∂ λ λ =ξ1 dλ λ =ξ1 0 2 Z 1 ˆ ∂ xˆ dθ  (λ1 − λ2 )2 w(t)dt + F˙x + F˙θ ∂ λ λ =ξ2 dλ 0 +

λ =ξ2

≤ C12 (Λ, γ)(λ1 − λ2 )2 ,

9

(.)

where the last inequality follows from the compactness of Λ, the continuity of θˆ (λ ) and θ (λ ) on Λ and the condition (A4). Here    !2  2  Z 1  ∂ xˆ ˆ ∂ x ˆ dθ d θ  w(t)dt .  F˙x + F˙θ + F˙x + F˙θ (.) C12 (Λ, γ) = sup  ∂λ dλ ∂λ dλ λ ∈Λ  0 Similarly for Q2 there exists a constant C2 (Λ, γ) such that Q2 ≤ C22 (Λ, γ) which combined with (.)-(.) yield ! ! P sup sup |U(λ1 ) −U(λ2 )| ≥ ε; G l λ1 ,λ2 ∈Λl

≤ P sup sup C1 (Λ, γ)C2 (Λ, γ)|λ1 − λ2 | > ε . l λ1 ,λ2 ∈Λl

If we take the partition Λ1 , · · · , Λm such as 0 < diam(Λl ) <

(.)

ε C1 (Λ, γ)C2 (Λ, γ)

holds for all l = 1, · · · , m, then the probability in (.) is zero and so P sup sup |U(λ1 ) −U(λ2 )| ≥ ε l λ1 ,λ2 ∈Λl

!

< ε,

i.e., U(λ ) is stochastic equicontinuous on Λ. From the Theorem 1 of Andrews (1992) we have proved (.). Proof of Theorem 3. Define the asymptotic criterion function to be Mλ (θ (λ )) =

Z 1−δ δ

[x(t|λ ˙ ) − F(x(t|λ ), θ (λ ))]2 w(t)dt.

From the definition of θ (λ ) we know that θ (λ0 ) = θ0 and so Mλ0 (θ (λ0 )) = 0, i.e., λ0 is the minimum point of Mλ (θ (λ )). Now we only need to show that p sup Mn,λ (θˆ (λ )) − Mλ (θ (λ )) → 0. (.) λ ∈Λ

Note that

Mn,λ (θˆ (λ )) − Mλ (θ (λ ))

s  s Z 1   Z 1 2 ˆ˙ x(t|λ ) − F(x(t|λ ˆ ), θˆ (λ )) w(t)dt + [x(t|λ ˙ ) − F(x(t|λ ), θ (λ ))]2 w(t)dt ≤   0 0 Z

1

2 ˆ˙ × x(t|λ ) − x(t|λ ˙ ) − F(x(t|λ ˆ ), θˆ (λ )) + F(x(t|λ ), θ (λ )) w(t)dt 0 √ √ p , ( T1 + T2 ) T3 .

1/2

(.)

For T2 from the compactness of Λ, the continuity of θ (λ ) and F(x, θ ), we have sup

Z 1

λ ∈Λ 0

[x(t|λ ˙ ) − F(x(t|λ ), θ (λ ))]2 w(t)dt = O p (1).

As for T1 , we have by Chauchy-Schwarz inequality, Z 1 0

2 ˆ˙ x(t|λ ) − F(x(t|λ ˆ ), θˆ (λ )) w(t)dt ≤ 10

(.)

4

+

Z 1 0

Z

0

Z 1 2 ˆ˙ x(t|λ ) − x(t|λ ˙ ) w(t)dt + [x(t|λ ˙ ) − F(x(t|λ ), θ (λ ))]2 w(t)dt

1

0

[F(x(t|λ ), θ (λ )) − F(x(t|λ ˆ ), θ (λ ))]2 w(t)dt +

 2 F(x(t|λ ˆ ), θ (λ )) − F(x(t|λ ˆ ), θˆ (λ )) w(t)dt.

Z 1 0

For the first term in the bracket on the right-hand side we have sup

Z 1

λ ∈Λ 0

2 ˆ˙ x(t|λ ) − x(t|λ ˙ ) w(t)dt ≤

sup t∈[δ ,1−δ ],λ ∈Λ

ˆ˙ |x(t|λ ) − x(t|λ ˙ )|2

Z 1 0

p

w(t)dt → 0,

which combined with Lemma 2 and (.) yields sup

Z 1

λ ∈Λ 0

As for T3 we have T3

2 ˆ˙ x(t|λ ) − F(x(t|λ ˆ ), θˆ (λ )) w(t)dt = O p (1).

≤ 2

Z 1 0

+2

(.)

ˆ˙ [x(t|λ ) − x(t|λ ˙ )]2 w(t)dt

Z 1 0

[F(x(t|λ ˆ ), θˆ (λ )) − F(x(t|λ ), θ (λ ))]2 w(t)dt.

The superior of the first term in the right-hand side over λ ∈ Λ has been shown to be o p (1). As for the second term it can also be easily deduced from Lemma 2 that sup

Z 1

λ ∈Λ 0

[F(x(t|λ ˆ ), θˆ (λ )) − F(x(t|λ ), θ (λ ))]2 w(t)dt = o p (1).

(.)

So we have T3 = o p (1) which combined with (.), (.) and (.) then yield the desired result (.). This completes the proof of Theorem 3.  References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14]

Andrews, D. W. K. Generic uniform convergence. Economet Theor, 1992, 8: 241-257. Brunel, N. J. B. Parameter estimation of ODE’s via nonparametric estimators. Electron. J. Stat, 2008, 2: 1242-1267. Cao, J. and Ramsay, J. O. Parameter cascades and profiling in functional data analysis. Computation Stat, 2007, 22: 335-351. Fang, Y., Wu, H. L., Zhu, L. X. A two-stage estimation method for random coefficient differential equation models with application to longitudinal HIV dynamic data Stat.√Sin, 2011, 21: 1145-1170. Gugushvili, S., Klaassen, C. A. J. n−consistent parameter estimation for systems of ordinary differential equation: bypassing numerical integration via smoothing. Bernoulli, 2012, 18: 1061-1098. Liang, H., Wu, H. L. Parameter estimation for differential equation models using a framework of measurement error in regression models. J. Am. Stat. Assoc, 2008, 103: 1570-1583. McMurry, T. L., Politis, D. N. Nonparametric regression with infinite order flat-top kernels. J. Nonparametric. Stat., 2004, 16: 549-562. Nowak, M. A., May, R. M. Virus Dynamics: Mathematical Principles of Immunology and Virology, Oxford: Oxford University Press, 2000. Perelson, A. S., Nelson, P. W. Mathematical analysis of HIV-I dynamics in vivo. SIAM Review, 1999, 41: 3-44. Ramsay, J. O., Hook, G., Campbell. and Cao, J. Parameter estimation for differential equation: a generalized smoothing approach. J. R. Stat. Soc. B, 2007, 69: 741-796. Robeva, R. S., Kirkwood, J. R., Davies, R. L., Farhy, L. S., Johnson, M. L., Kovatchev, B. P., Straume, M. An Invitation to Biomathematics, London, Elsevier Inc, 2008. van der Vaart. Asymptotic Statistics, London, Cambridge University Press, 1998. Varah, J. M. A spline least squares method for numerical parameter estimation in differential equation. SIAM J. Sci. Stat. Comput, 1982, 3: 28-46. Wu, H. L., Lu, T., Liang, H. Sparse additive ODEs for dynamic gene regulatory network . J. Am. Stat. Assoc, 2014,109(506): 343-361.

11