Conditional maximum likelihood estimation for a class of observation-driven time series models for count data


Statistics and Probability Letters 123 (2017) 193–201

Contents lists available at ScienceDirect

Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro

Yunwei Cui a,∗, Qi Zheng b

a Department of Mathematics, Towson University, Towson, MD 21252, USA

b Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY, 40202, USA

Article info

Article history: Received 8 January 2016; received in revised form 31 October 2016; accepted 1 November 2016; available online 28 December 2016.

Keywords: Observation-driven models; One-parameter exponential family; INGARCH(p, q) models; Time series of counts

Abstract

This paper investigates the statistical inference for a class of observation-driven time series models of count data based on the conditional maximum likelihood estimator (CMLE), where the conditional distribution of the observed count given a state process is from the one-parameter exponential family. Under certain regularity conditions, the strong consistency and asymptotic normality of the CMLE of the misspecified likelihood function are established. © 2016 Elsevier B.V. All rights reserved.

1. Introduction

This paper concerns observation-driven time series models for count data. Count time series have received considerable attention recently in the literature. One line of research on count time series employs generalized linear models, where serial dependence among the observed counts is introduced through a state process. Conditioned on the state process, the distribution of the observed count is specified, and the state process often takes the form of the conditional expectation. If the state variable is determined by the history of the observed counts and states, then the model is characterized as "observation-driven".

When analyzing count time series, most studies in the literature specify a model with Poisson deviates (i.e., given the state process, the observed count follows a Poisson distribution). However, empirical evidence indicates that in many situations a count time series is better modeled with non-Poisson deviates; see, for example, Davis and Wu (2009) and Zhu (2011). In this paper we consider observation-driven models with deviates from the one-parameter exponential family, which includes the Poisson distribution, the binomial distribution (with known number of trials), and the negative binomial distribution (with known number of failures). The same setting was first explored by Davis and Liu (2016). Compared with the work by Davis and Liu (2016), we study the general setting in which the past history of the observed counts and states is unobservable and initial values need to be chosen to calculate an approximation of the likelihood function. Under certain regularity conditions, we show that the choice of the initial values does not affect the asymptotic properties of such an approximation, and we derive the strong consistency and asymptotic normality of the CMLE.

The remainder of the paper is organized as follows. In Section 2, we formulate the class of time series models to be studied. In Section 3, we study the asymptotic properties of the CMLE of the model parameters under the condition that the past history of the observed counts and states is unknown. We investigate the applicability of the results in Section 4, where



Corresponding author. E-mail addresses: [email protected] (Y. Cui), [email protected] (Q. Zheng).

http://dx.doi.org/10.1016/j.spl.2016.11.002 0167-7152/© 2016 Elsevier B.V. All rights reserved.


INGARCH(p, q) models with negative binomial deviates are verified to satisfy the regularity conditions and thus the main results of this paper are applicable. All technical proofs are given in the Appendix.

2. Model formulation

We consider observation-driven time series models for count data with deviates from the one-parameter exponential family. Assume that a variable $Y$ belongs to the one-parameter exponential family with its probability density function written in the form

$$p(y \mid \eta) = e^{\eta y - A(\eta)} h(y),$$

where $\eta$ is the natural parameter, and $A(\cdot)$ and $h(\cdot)$ are known functions. It is well known that $E(Y) = A'(\eta)$ and $\mathrm{Var}(Y) = A''(\eta)$. The family includes the distributions that are commonly used for modeling count data. Let $B(\cdot) = A'(\cdot)$. For count data, we have $E(Y) > 0$ and $\mathrm{Var}(Y) > 0$, so both $A(\cdot)$ and $B(\cdot)$ are strictly increasing functions. Moreover, $A(\cdot)$ is convex.

Suppose $\{Y_t\}$ is a sequence of count variables following a distribution of the one-parameter exponential family. Defining the σ-field $\mathcal{F}_t = \sigma\{(Y_t, \lambda_t), (Y_{t-1}, \lambda_{t-1}), \ldots\}$ and the conditional mean $\lambda_t = E(Y_t \mid \mathcal{F}_{t-1})$, we specify a model for the count time series $\{Y_t\}$ as follows:

$$Y_t \mid \mathcal{F}_{t-1} \sim p(y \mid \eta_t), \qquad \eta_t = B^{-1}(\lambda_t), \qquad \lambda_t = f_\theta(Y_{t-1}, \ldots, \lambda_{t-1}, \ldots), \qquad t \in \mathbb{Z}, \tag{1}$$

where $f_\theta(\cdot)$ is a measurable non-negative function which is known up to a parameter vector $\theta \in \Theta$ to be estimated and $\Theta \subset \mathbb{R}^d$ is a compact set. The natural parameter $\eta_t$ can be determined recursively via the last two equations. For the Poisson distribution, $B^{-1}(\cdot) = \ln(\cdot)$; for the binomial distribution, $B^{-1}(\cdot) = \ln\frac{\cdot}{n - \cdot}$, where $n$ is the known total number of trials; and for the negative binomial distribution, $B^{-1}(\cdot) = \ln\frac{\cdot}{r + \cdot}$, where $r$ is the known number of failures. Our model specification is in line with Doukhan et al. (2012) and generalizes the models studied in Davis and Liu (2016), in which

$$Y_t \mid \mathcal{F}_{t-1} \sim p(y \mid \eta_t), \qquad \lambda_t = g_\theta(\lambda_{t-1}, Y_{t-1}). \tag{2}$$
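As an illustration of the maps $B^{-1}(\cdot)$ quoted in this section for the Poisson, binomial and negative binomial cases, the following minimal Python sketch implements them together with the corresponding mean maps $B(\eta) = A'(\eta)$. The function names and the `size` argument (standing for $n$ or $r$) are assumptions made for this example only and are not part of the paper.

```python
import math

def inverse_mean_map(family, lam, size=None):
    """eta = B^{-1}(lam); `size` is n (binomial) or r (negative binomial)."""
    if family == "poisson":
        return math.log(lam)
    if family == "binomial":
        return math.log(lam / (size - lam))
    if family == "negbin":
        return math.log(lam / (size + lam))
    raise ValueError("unknown family: " + family)

def mean_map(family, eta, size=None):
    """lam = B(eta) = A'(eta), the inverse of inverse_mean_map."""
    if family == "poisson":
        return math.exp(eta)
    if family == "binomial":
        return size * math.exp(eta) / (1.0 + math.exp(eta))
    if family == "negbin":
        return size * math.exp(eta) / (1.0 - math.exp(eta))
    raise ValueError("unknown family: " + family)

# Round trip: B(B^{-1}(lam)) should return lam for each family.
for fam, size in (("poisson", None), ("binomial", 10), ("negbin", 5)):
    eta = inverse_mean_map(fam, 2.5, size)
    print(fam, round(mean_map(fam, eta, size), 10))
```

The round trip at the end is only a sanity check that each pair of maps is mutually inverse.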

3. Asymptotic properties of CMLE

To ensure the stability of model (1), various Lipschitz-type conditions are imposed on $f_\theta(\cdot)$. Interested readers are referred to Doukhan et al. (2012, 2013) and Davis and Liu (2016) for details. We assume that model (1) possesses the following stability property:

(S) there exists a unique solution of (1), $\{(Y_t, \lambda_t)\}$, which is a strictly stationary and ergodic process and $E(Y_t) < \infty$.

The strong consistency and asymptotic normality of the CMLE with respect to model (2) have been established by Davis and Liu (2016), based on the strong assumption that there is enough information about the past history that the ergodic conditional mean process can be computed correctly. However, in real-life situations, due to the lack of information about previous counts and the state process, approximations of the conditional mean process are adopted to derive the likelihood function in order to perform the maximum likelihood estimation procedure. In such a case, one works with a misspecified likelihood function, and results about the strong consistency and asymptotic normality of the CMLE for the misspecified likelihood function are desirable. Accordingly, in this paper we study these asymptotic properties of the CMLE of the misspecified likelihood functions in the setting of model (1), which includes many of the widely discussed models in the integer-valued time series literature, including INGARCH(p, q) models (Ferland et al., 2006).

Suppose we are given a stationary count series $\{Y_1, \ldots, Y_n\}$ of (1). Let $\hat{H}_1 = \{(\hat{Y}_t, \hat{\lambda}_t)\}_{t=-\infty}^{0}$ denote a set of arbitrarily chosen values for the history of the observed counts and states, which are otherwise unavailable in practice. By recursively updating the conditional mean process based on $\hat{H}_1$ (henceforth denoted by $\hat{\lambda}_t(\theta)$, $1 \le t \le n$) according to $f_\theta(\cdot)$ given in (1), the misspecified conditional log-likelihood function (up to a constant function of the observations) is given by

$$\hat{L}_n(\theta) = \sum_{t=1}^{n} \hat{\ell}_t(\theta) = \sum_{t=1}^{n} \{\hat{\eta}_t(\theta) Y_t - A(\hat{\eta}_t(\theta))\}, \tag{3}$$

where $\hat{\eta}_t(\theta) = \eta(\hat{\lambda}_t(\theta))$. Let $\theta_0 = (\theta_{01}, \ldots, \theta_{0d})^T \in \mathbb{R}^d$ be the true value of the parameter $\theta$. The CMLE of $\theta_0$ with respect to $\hat{L}_n(\theta)$ is defined as $\hat{\theta}_n = \arg\max_{\theta \in \Theta} \hat{L}_n(\theta)$.

Some of our regularity conditions are related to a concept used in Straumann and Mikosch (2006): a sequence $\{\nu_t\}_{t \in \mathbb{Z}}$ of random elements with values in a normed vector space $(B, \|\cdot\|)$ is said to converge to zero exponentially fast a.s. when $t \to \infty$, denoted by $\nu_t \xrightarrow{e.a.s.} 0$, if there exists $\gamma > 1$ with $\gamma^t \|\nu_t\| \xrightarrow{a.s.} 0$. Given a sequence of random variables $\{\nu_t\}_{t \in \mathbb{Z}}$, a sufficient condition for $\nu_t \xrightarrow{e.a.s.} 0$ is:


(E) There exist some $0 < \rho < 1$ and an integrable random variable $\psi$, $E(|\psi|) < \infty$, such that $|\nu_t| < \rho^t |\psi|$ for all $t \ge 1$.

Although $\hat{L}_n(\theta)$ is based on the approximate value of the state variable, $\hat{\lambda}_t(\theta)$, it can be shown that the strong consistency property of $\hat{\theta}_n$ holds under the following conditions:

(A0) The parameter space $\Theta \subset \mathbb{R}^d$ is compact and includes $\theta_0$ as an interior point;

(A1) Let $\{(Y_t, \lambda_t)\}$ be the strictly stationary and ergodic solution to (1). There exists a measurable function $f_\theta^\infty : \mathbb{N}_0^\infty = \{(n_1, n_2, \ldots),\ n_i \in \mathbb{N}_0,\ i = 1, 2, \ldots\} \to [0, \infty)$ so that $\lambda_t = f_{\theta_0}^\infty(Y_{t-1}, Y_{t-2}, \ldots)$ almost surely; and for any $\theta \in \Theta$, $\lambda_t(\theta) \equiv f_\theta^\infty(Y_{t-1}, Y_{t-2}, \ldots) \ge \lambda_L \in R(B)$, where $\lambda_L > 0$ and $R(B)$ is the range of $B(\eta)$;

(A2) For any $y \in \mathbb{N}_0^\infty$ the mapping $\theta \mapsto f_\theta^\infty(y)$ is continuous;

(A3) $E\left(Y_1 \sup_{\theta \in \Theta} B^{-1}(f_\theta^\infty(Y_0, Y_{-1}, \ldots))\right) < \infty$;

(A4) If $\lambda_t(\theta_0) = \lambda_t(\theta')$ almost surely for certain $t$, then $\theta_0 = \theta'$;

(A5) $\inf_{\theta \in \Theta} B'\left(\eta(f_\theta^\infty(Y_{t-1}, Y_{t-2}, \ldots))\right) \ge \sigma_L^2 > 0$;

(A6) $E\left(\sup_{\theta \in \Theta} \lambda_t(\theta)\right) < \infty$;

(A7) $\hat{H}_1$ is selected such that $\sup_{\theta \in \Theta} |\hat{\lambda}_t(\theta) - \lambda_t(\theta)| \xrightarrow{e.a.s.} 0$.

Assumptions (A0)–(A4) are adapted from Davis and Liu (2016). Since in real life the past history of counts and state process is unknown, three more assumptions, (A5)–(A7), are added to make sure the error of approximating $\lambda_t(\theta)$ by $\hat{\lambda}_t(\theta)$ decreases exponentially fast so that $\hat{L}_n(\theta)/n$ is asymptotically the same as $L_n(\theta)/n$, where

$$L_n(\theta) = \sum_{t=1}^{n} \ell_t(\theta) = \sum_{t=1}^{n} \{\eta_t(\theta) Y_t - A(\eta_t(\theta))\},$$

with $\eta_t(\theta) = \eta(\lambda_t(\theta)) = B^{-1}(\lambda_t(\theta))$ and $\lambda_t(\theta)$ being the ergodic process given in (A1).

Theorem 1. Assume model (1) satisfies (S) and the observed count series $\{Y_1, \ldots, Y_n\}$ is a sample path of the strictly stationary and ergodic solution. Then, under the conditions (A0)–(A7), $\hat{\theta}_n$ is a strongly consistent estimator of $\theta_0$,

$$\hat{\theta}_n \xrightarrow{a.s.} \theta_0, \quad \text{as } n \to \infty.$$

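Davis and Liu (2016) also established the asymptotic normality for $\tilde{\theta}_n = \arg\max_{\theta \in \Theta} L_n(\theta)$ of model (2), and it extends to the infinite-order model (1). We will borrow the ideas given in Lemma 7.4 in Straumann and Mikosch (2006) to prove $n^{1/2}(\tilde{\theta}_n - \hat{\theta}_n) \xrightarrow{a.s.} 0$ and thereby develop the asymptotic normality of $\hat{\theta}_n$ based on Davis and Liu's result. In the sequel, let $\theta_i$ denote the $i$th component of the parameter vector $\theta$, $i \in \{1, \ldots, d\}$. To prove the asymptotic normality of $\hat{\theta}_n$, we need the following additional assumptions:

(A8) $E\left[B'(\eta_1(\theta_0)) \left(\frac{\partial \eta_1(\theta_0)}{\partial \theta_i}\right)^2\right] < \infty$ for all $i \in \{1, \ldots, d\}$; the mapping $\theta \mapsto f_\theta^\infty$ is twice continuously differentiable in $\Theta$;

(A9) On the parameter space $\Theta$, $E\left(\sup_{\theta \in \Theta} \left|\frac{\partial \lambda_t(\theta)}{\partial \theta_i}\right|\right) < \infty$ and $E\left(\sup_{\theta \in \Theta} \left|\frac{\partial^2 \ell_t(\theta)}{\partial \theta_i \partial \theta_j}\right|\right) < \infty$ for all $i, j \in \{1, \ldots, d\}$; $E\left(\frac{\partial^2 \ell_t(\theta_0)}{\partial \theta \partial \theta^T}\right)$ is invertible;

(A10) $\hat{H}_1$ is selected such that

$$\sup_{\theta \in \Theta} \left|\frac{1}{B'(\eta_t(\theta))} - \frac{1}{B'(\hat{\eta}_t(\theta))}\right| \xrightarrow{e.a.s.} 0; \qquad \sup_{\theta \in \Theta} \left|\frac{\lambda_t(\theta)}{B'(\eta_t(\theta))} - \frac{\hat{\lambda}_t(\theta)}{B'(\hat{\eta}_t(\theta))}\right| \xrightarrow{e.a.s.} 0;$$

$$\sup_{\theta \in \Theta} \left|\frac{\partial \lambda_t(\theta)}{\partial \theta_i} - \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta_i}\right| \xrightarrow{e.a.s.} 0, \quad \text{for all } i \in \{1, \ldots, d\}.$$

Combining all the regularity conditions presented above, we have the following result:

Theorem 2. Assume model (1) satisfies (S) and the observed count series $\{Y_1, \ldots, Y_n\}$ is a sample path of the strictly stationary and ergodic solution. Then, under the conditions (A0)–(A10), $\hat{\theta}_n$ is asymptotically normal,

$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{D} N\left(0, \Omega^{-1}\right), \quad \text{as } n \to \infty,$$

where $\Omega = E\left[B'(\eta_t(\theta_0)) \frac{\partial \eta_t(\theta_0)}{\partial \theta} \frac{\partial \eta_t(\theta_0)}{\partial \theta^T}\right]$.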


Remark 1. The proof of Theorem 1 uses the ergodic theorem for extended real-valued random variables, which allows an ergodic and strictly stationary sequence of variables to admit an expectation in R ∪ {+∞} or R ∪ {−∞}. The method was used in Francq and Zakoïan (2010) and Davis and Liu (2016). It is also worth noting that, for model (2), giving only the true value of λ1 is not enough to define the ergodic process required by (A1). The conditional mean process recursively calculated from the true value of λ1 is non-stationary in general unless the recursion is performed with θ = θ0. Therefore, the ergodic theorem is not directly applicable and one must deal with the misspecified likelihood function.

4. Examples

In this section we give examples of count model (1) that satisfy (S) and (A1) through (A10). We first establish a result regarding condition (S), and then investigate INGARCH(p, q) models to illustrate the wide applicability of Theorems 1 and 2 to existing models.

4.1. Models satisfying condition (S)

It is well known that, for observation-driven models, probabilistic properties such as stationarity or ergodicity are in general difficult to establish. Doukhan and Louhichi (1999) first proposed the concept of weak dependence to circumvent similar difficulties arising in other fields. Doukhan et al. (2012, 2013) successfully applied the technique to prove the existence of a unique stationary and ergodic solution to some observation-driven models with Poisson deviates. We adapt the method of Doukhan et al. (2012, 2013) and prove that model (1) with either of the following two specifications is τ-weakly dependent and satisfies the stability condition (S):

(C1) The conditional mean process $\lambda_t$ depends on $Y_{t-1}$ and $\lambda_{t-1}$ explicitly, $\lambda_t = f_\theta(Y_{t-1}, \lambda_{t-1})$, and there exist two non-negative numbers $a$ and $b$ with $a + b < 1$, such that for any $(Y_t, \lambda_t)$ and $(Y_t', \lambda_t')$ in $\mathbb{N}_0 \times (0, \infty)$:

$$\left|f_\theta(Y_t, \lambda_t) - f_\theta(Y_t', \lambda_t')\right| \le a\left|\lambda_t - \lambda_t'\right| + b\left|Y_t - Y_t'\right|.$$

(C2) The conditional mean process $\lambda_t$ assumes an infinite-order model, $\lambda_t = f_\theta((Y_{t-1}, \lambda_{t-1}), \ldots)$. For any $\mathbf{X}_t = (X_t, X_{t-1}, \ldots)$ and $\mathbf{X}_t' = (X_t', X_{t-1}', \ldots)$ in $(\mathbb{N}_0 \times (0, \infty))^{\mathbb{N}}$, where $X_t = (Y_t, \lambda_t)$ and $X_t' = (Y_t', \lambda_t')$ in $\mathbb{N}_0 \times (0, \infty)$, there exists a non-negative sequence $\{\alpha_j\}$ with $\sum_{j=1}^{\infty} \alpha_j < 1$, such that:

$$\left|f_\theta(\mathbf{X}_t) - f_\theta(\mathbf{X}_t')\right| \le \sum_{j=1}^{\infty} \alpha_j \left|Y_{t-j} - Y_{t-j}'\right|.$$

The two preceding specifications with Poisson deviates were discussed in Doukhan et al. (2013). See Doukhan et al. (2012) and references therein for details about τ-weakly dependent processes.

Proposition 1. For the time series model (1), suppose either (C1) or (C2) is true. Then

(i) there exists a unique stationary and ergodic solution $\{(Y_t, \lambda_t)\}$ with $E(Y_t) < \infty$ and $E(\lambda_t) < \infty$;

(ii) $\{(Y_t, \lambda_t)\}$ is τ-weakly dependent;

(iii) there exists a measurable function $f : \mathbb{N}_0^\infty = \{(n_1, n_2, \ldots),\ n_i \in \mathbb{N}_0,\ i = 1, 2, \ldots\} \to [0, \infty)$ such that the conditional mean of the strictly stationary and ergodic solution to (1) can be written as $\lambda_t = f(Y_{t-1}, Y_{t-2}, \ldots)$ almost surely;

(iv) if, in addition, the higher moments of $Y_t$ given $\lambda_t$ have a polynomial expression in terms of $\lambda_t$:

$$E\left(Y_t^m \mid \mathcal{F}_{t-1}\right) = \sum_{j=0}^{m} \beta_{m,j} \lambda_t^j, \tag{4}$$

where $\beta_{m,m} \le 1$ for any $m$, then $Y_t$ has finite moments of any order.

Remark 2. If $Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, then $E\left(Y_t^m \mid \mathcal{F}_{t-1}\right) = \sum_{j=0}^{m} S(m, j) \lambda_t^j$, where $S(m, j)$ is the Stirling number of the second kind with $S(0, 0) = S(m, 1) = S(m, m) = 1$. So by Proposition 1, models satisfying (C1) or (C2) with the Poisson distribution have finite moments of any order. Such models have been studied by many authors, including Ferland et al. (2006), Fokianos et al. (2009), and Doukhan et al. (2012).
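The Poisson moment formula in Remark 2 can be checked numerically. The sketch below is an illustration only (the helper names are assumptions, not the authors' code): it computes $S(m, j)$ by the standard recurrence $S(m, j) = j\,S(m-1, j) + S(m-1, j-1)$ and compares $\sum_j S(m, j)\lambda^j$ with a direct, truncated summation of $k^m$ against the Poisson probabilities.

```python
import math

def stirling2(m, j):
    """Stirling number of the second kind via S(m,j) = j*S(m-1,j) + S(m-1,j-1)."""
    if j < 0 or j > m:
        return 0
    row = [1]  # row for m = 0: S(0, 0) = 1
    for n in range(1, m + 1):
        new = [0] * (n + 1)
        for k in range(1, n + 1):
            same = row[k] if k < len(row) else 0
            new[k] = k * same + row[k - 1]
        row = new
    return row[j]

def poisson_moment_formula(m, lam):
    """E(Y^m) from the Stirling-number expansion in Remark 2."""
    return sum(stirling2(m, j) * lam ** j for j in range(m + 1))

def poisson_moment_direct(m, lam, kmax=200):
    """E(Y^m) by summing k^m * P(Y = k) for k = 0..kmax (truncated series)."""
    pmf, total = math.exp(-lam), 0.0
    for k in range(kmax + 1):
        total += (k ** m) * pmf
        pmf *= lam / (k + 1)
    return total

for m in range(6):
    print(m, poisson_moment_formula(m, 2.5), poisson_moment_direct(m, 2.5))
```

For each order the two columns should agree up to floating-point error; the truncation point `kmax` is an arbitrary choice that is ample for moderate values of λ.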

Remark 3. Besides the $\beta_{m,j}$ in (4) for the Poisson distribution, we can also find the $\beta_{m,j}$ in (4) for the binomial and negative binomial distributions by a standard moment generating function technique and mathematical induction. Let $Y$ be a binomial random variable, $Y \sim \mathrm{Bino}(n, p)$ with $\lambda = E(Y)$; then the $m$th moment is given by

$$E(Y^m) = \sum_{j=0}^{\min(m,n)} \frac{S(m, j)\, n!}{(n - j)!\, n^j} \lambda^j.$$

Since the leading coefficient $\beta_{m,m}$ does not exceed one for any $m$, models satisfying (C1) or (C2) with binomial deviates have finite moments of any order. But for a negative binomial random variable, the property does not hold true. To be specific, suppose $Y \sim \mathrm{NB}(r, p)$ with known $r$, namely

$$P(Y = x) = \binom{x + r - 1}{r - 1} (1 - p)^x p^r, \qquad x \ge 0.$$


Then, the $m$th moment of $Y$ can be expressed as

$$E(Y^m) = \sum_{j=0}^{m} \frac{(r + j - 1)!}{(r - 1)!\, r^j} S(m, j) \lambda^j,$$

where $\lambda = E(Y)$. Since the leading coefficient goes to infinity as $m$ increases, the existence of finite moments of any order of model (1) with negative binomial deviates is not guaranteed. In fact, it can be shown that for the negative binomial INGARCH(p, q) process, $q \ne 0$, no matter what parameter values are assumed, there exist orders for which the moments are infinite; the proof is available from the authors upon request.

4.2. INGARCH(p, q) models with negative binomial deviates

Model (1) is called an INGARCH(p, q) process if

$$\lambda_t = \delta + \sum_{i=1}^{p} \alpha_i \lambda_{t-i} + \sum_{j=1}^{q} \beta_j Y_{t-j}, \tag{5}$$

where $\theta = (\delta, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q) \in \Theta \subset \mathbb{R}^{p+q+1}$, $\delta > 0$, $\alpha_i, \beta_j \ge 0$ for $i = 1, \ldots, p$ and $j = 1, \ldots, q$. Define $A_\theta(z) = 1 - \sum_{i=1}^{p} \alpha_i z^i$ and $B_\theta(z) = \sum_{j=1}^{q} \beta_j z^j$; by convention $A_\theta(z) = 1$ if $p = 0$ and $B_\theta(z) = 0$ if $q = 0$. Let $\theta_0$ denote the vector of true parameters, $\theta_0 = (\delta_0, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q})$. To ensure that the model possesses key stability properties and Theorem 1 works, we assume the following conditions:

(R1) $\sum_{i=1}^{p} \alpha_{0i} + \sum_{j=1}^{q} \beta_{0j} < 1$ and $\sum_{i=1}^{p} \alpha_i < 1$ for all $\theta \in \Theta$, and $\Theta$ is compact;

(R2) If $p > 0$, $A_{\theta_0}(z)$ and $B_{\theta_0}(z)$ have no common roots, at least one $\beta_{0j} \ne 0$ for $j = 1, \ldots, q$, and $\alpha_{0p} \ne 0$ if $\beta_{0q} = 0$.

Davis and Liu (2016) showed that under $\sum_{i=1}^{p} \alpha_{0i} + \sum_{j=1}^{q} \beta_{0j} < 1$, there exists a unique strictly stationary solution $\{Y_t, \lambda_t\}$ to (5) with $E(Y_t) < \infty$ and $E(\lambda_t) < \infty$ for the one-parameter exponential family. The ergodic property of $\{Y_t, \lambda_t\}$ can be established easily, because it complies with specification (C2). Since this class of models with Poisson deviates has been extensively studied in the literature, in the following we will focus on the negative binomial case.

The state variable assumes a vector representation: $\boldsymbol{\lambda}_t = \mathbf{c}_{\theta,t} + A_\theta \boldsymbol{\lambda}_{t-1}$ with $\boldsymbol{\lambda}_t = (\lambda_t(\theta), \ldots, \lambda_{t-p+1}(\theta))^T$, $\mathbf{c}_{\theta,t} = \left(\delta + \sum_{j=1}^{q} \beta_j Y_{t-j}, 0, \ldots, 0\right)^T$ and

$$A_\theta = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_p \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}.$$

The condition $\sum_{i=1}^{p} \alpha_i < 1$ imposed in (R1) indicates that the spectral radius of $A_\theta$ is less than 1, $\rho(A_\theta) < 1$. Moreover, the compactness of $\Theta$ entails

$$\sup_{\theta \in \Theta} \rho(A_\theta) < 1. \tag{6}$$
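The link between the sum of the $\alpha_i$ and the spectral radius of the companion matrix $A_\theta$ can be explored numerically. The following sketch is an illustration under stated assumptions, not the authors' procedure: it builds $A_\theta$ from hypothetical coefficients and evaluates $\|A_\theta^k\|_\infty^{1/k}$, which for any induced norm is an upper bound on $\rho(A_\theta)$ and approaches it as $k$ grows (Gelfand's formula). A value below 1 therefore certifies $\rho(A_\theta) < 1$.

```python
def companion(alphas):
    """Companion matrix with first row (alpha_1, ..., alpha_p) and a shifted identity below."""
    p = len(alphas)
    rows = [list(alphas)]
    for i in range(1, p):
        rows.append([1.0 if j == i - 1 else 0.0 for j in range(p)])
    return rows

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inf_norm(a):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

def spectral_radius_upper(a, k=100):
    """||A^k||_inf^(1/k): an upper bound on rho(A) that tends to rho(A) as k grows."""
    power = a
    for _ in range(k - 1):
        power = matmul(power, a)
    return inf_norm(power) ** (1.0 / k)

print(spectral_radius_upper(companion([0.3, 0.2, 0.1])))  # coefficients sum to 0.6
print(spectral_radius_upper(companion([0.7, 0.5])))       # coefficients sum to 1.2
```

The two coefficient vectors are arbitrary examples; the first satisfies the summability part of (R1) and the second violates it.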

Iterating the vector representation based on $\theta \in \Theta$ and using $E(Y_t) < \infty$, we can prove

$$\boldsymbol{\lambda}_t = \mathbf{c}_{\theta,t} + A_\theta \mathbf{c}_{\theta,t-1} + A_\theta^2 \mathbf{c}_{\theta,t-2} + \cdots + A_\theta^{t-1} \mathbf{c}_{\theta,1} + A_\theta^t \boldsymbol{\lambda}_0 = \sum_{k=0}^{\infty} A_\theta^k \mathbf{c}_{\theta,t-k}. \tag{7}$$

Thus, the function $f_\theta^\infty(\cdot)$ required in (A1) is simply the first component of the above vector function, which is well defined due to (6). Furthermore, by similar arguments to those in the proof of Theorem 7.2 in Francq and Zakoïan (2010), we can prove that (A1)–(A2) and (A6)–(A7) are satisfied. In particular, in light of (6) and (7), (A7) holds true for any arbitrarily chosen non-negative $\hat{H}_1 = \{\hat{Y}_0, \ldots, \hat{Y}_{1-q}, \ldots, \hat{\lambda}_0, \ldots, \hat{\lambda}_{1-p}, \ldots\}$.
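To make the role of (A7) concrete, the sketch below runs the INGARCH(1, 1) special case of recursion (5), $\lambda_t = \delta + \alpha \lambda_{t-1} + \beta Y_{t-1}$, on the same counts from two different sets of initial values. The data, parameter values and function name are hypothetical and chosen only for illustration. In this special case the gap between the two conditional mean paths shrinks by the factor $\alpha$ at every step, which is the geometric decay that (A7) requires.

```python
def conditional_means(y, delta, alpha, beta, lam0, y0):
    """Recursively compute lambda_1..lambda_n from chosen initial values (lam0, y0)."""
    lams = []
    lam_prev, y_prev = lam0, y0
    for obs in y:
        lam = delta + alpha * lam_prev + beta * y_prev
        lams.append(lam)
        lam_prev, y_prev = lam, obs
    return lams

# Hypothetical counts and parameter values, for illustration only.
y = [3, 0, 5, 2, 7, 1, 4, 4, 0, 6]
path_a = conditional_means(y, 1.0, 0.3, 0.4, lam0=0.5, y0=0)
path_b = conditional_means(y, 1.0, 0.3, 0.4, lam0=20.0, y0=10)

# The observed counts cancel in the difference, so gap_t = alpha**(t-1) * gap_1.
gaps = [abs(a - b) for a, b in zip(path_a, path_b)]
for t, gap in enumerate(gaps, start=1):
    print(t, gap)
```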

The conditions (A3) and (A5) are related to the nature of the distribution functions. According to Remark 1 in Davis and Liu (2016), one can prove that (A3) holds for the negative binomial distributions. Given (A1), a sufficient condition for (A5) to be true is that $B'(\cdot)$ is an increasing function in terms of $\lambda$, which the negative binomial distribution satisfies, since the negative binomial distribution has $B'(\eta) = \frac{r e^{\eta}}{(1 - e^{\eta})^2} = \lambda \frac{r + \lambda}{r}$.

In a recent paper, Ahmad and Francq (2016) studied the Poisson quasi-maximum likelihood estimation method (QMLE) for observation-driven time series models of counts. In addition to conditions (R1) and (R2), an assumption on the existence of higher moments, $E(Y_t^{1+\epsilon}) < \infty$ for some $\epsilon > 0$, is needed in order for the strong consistency of the Poisson QMLE to hold for INGARCH(p, q) models. As noted in Ahmad and Francq (2016), the conditions ensuring existence of higher order moments are complicated for negative binomial INGARCH(p, q) models with general order p and q. Actually, little is known about


the explicit expressions in terms of the true parameters for the existence of higher order moments when p ≥ 2 or q ≥ 2. In contrast, Theorem 1 in this paper imposes less stringent regularity conditions for INGARCH(p, q) models, and regularity conditions (R1) and (R2) guarantee the strong consistency of the CMLE. Interested readers are also referred to Christou and Fokianos (2014) for inference of negative binomial time series models based on QMLE.

To establish the asymptotic normality of $\hat{\theta}_n$ for the negative binomial INGARCH(p, q) models, another regularity condition needs to be imposed:

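(R3) For the strictly stationary and ergodic solution $Y_t$, the third moment is finite, $E(Y_t^3) < \infty$.

The first and second partial derivatives of $\lambda_t(\theta)$ with respect to $\theta$ assume similar representations as (7), and most importantly, there exist constants $\rho \in (0, 1)$ and $K > 0$ such that for all $i, j \in \{1, \ldots, d\}$,

$$\lambda_t(\theta) = \omega_0^*(\theta) + \sum_{\ell=1}^{\infty} \omega_\ell^*(\theta) Y_{t-\ell}, \quad \frac{\partial \lambda_t(\theta)}{\partial \theta_i} = \omega_0^i(\theta) + \sum_{\ell=1}^{\infty} \omega_\ell^i(\theta) Y_{t-\ell}, \quad \text{and} \quad \frac{\partial^2 \lambda_t(\theta)}{\partial \theta_i \partial \theta_j} = \omega_0^{i,j}(\theta) + \sum_{\ell=1}^{\infty} \omega_\ell^{i,j}(\theta) Y_{t-\ell}, \tag{8}$$

where $\sup_{\theta \in \Theta} |\omega_\ell^*(\theta)| \le K \rho^\ell$, $\sup_{\theta \in \Theta} |\omega_\ell^i(\theta)| \le K \rho^\ell$, and $\sup_{\theta \in \Theta} |\omega_\ell^{i,j}(\theta)| \le K \rho^\ell$. It also holds that, for all $i, j \in \{1, \ldots, d\}$,

$$\sup_{\theta \in \Theta} \left|\frac{\partial \lambda_t(\theta)}{\partial \theta_i} - \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta_i}\right| \xrightarrow{e.a.s.} 0 \quad \text{and} \quad \sup_{\theta \in \Theta} \left|\frac{\partial^2 \lambda_t(\theta)}{\partial \theta_i \partial \theta_j} - \frac{\partial^2 \hat{\lambda}_t(\theta)}{\partial \theta_i \partial \theta_j}\right| \xrightarrow{e.a.s.} 0,$$

based on any non-negative initial values, because (E) is satisfied (please refer to (7.41)–(7.44) of Francq and Zakoïan, 2010). In view of (8), Jensen's inequality and (R3) give

$$E \sup_{\theta \in \Theta} |\lambda_t(\theta)|^2 < \infty, \quad E \sup_{\theta \in \Theta} |\lambda_t(\theta)|^3 < \infty, \quad E \sup_{\theta \in \Theta} \left|\frac{\partial \lambda_t(\theta)}{\partial \theta_i}\right|^2 < \infty, \quad E \sup_{\theta \in \Theta} \left|\frac{\partial \lambda_t(\theta)}{\partial \theta_i}\right|^3 < \infty, \quad E \sup_{\theta \in \Theta} \left|\frac{\partial^2 \lambda_t(\theta)}{\partial \theta_i \partial \theta_j}\right|^2 < \infty. \tag{9}$$

Since $E\left[B'(\eta_t(\theta_0)) \left(\frac{\partial \eta_t(\theta_0)}{\partial \theta_i}\right)^2\right] \le \frac{1}{\sigma_L^2} E\left(\frac{\partial \lambda_t(\theta_0)}{\partial \theta_i}\right)^2$, the first part of (A8) is indicated by (9). The second part of (A8)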

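follows from (R1) and (R2) (please see the proof of Theorem 4.2 of Berkes et al., 2003).

After some algebra, we can show $B''(\eta_t(\theta))\, B'(\eta_t(\theta))^{-3} \le r\left(\lambda_L^{-3} + r \lambda_L^{-1} (r + \lambda_L)^{-3} + (r + \lambda_L)^{-3}\right)$ and

$$\frac{\partial^2 \ell_t(\theta)}{\partial \theta \partial \theta^T} = -(Y_t - \lambda_t(\theta)) \frac{B''(\eta_t(\theta))}{B'(\eta_t(\theta))^3} \frac{\partial \lambda_t(\theta)}{\partial \theta} \frac{\partial \lambda_t(\theta)}{\partial \theta^T} - \frac{1}{B'(\eta_t(\theta))} \frac{\partial \lambda_t(\theta)}{\partial \theta} \frac{\partial \lambda_t(\theta)}{\partial \theta^T} + (Y_t - \lambda_t(\theta)) \frac{1}{B'(\eta_t(\theta))} \frac{\partial^2 \lambda_t(\theta)}{\partial \theta \partial \theta^T}.$$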

Thus, the first part of (A9) follows from (A5) and (9). Also, as a trivial result, $\Omega = -E\left(\frac{\partial^2 \ell_t(\theta_0)}{\partial \theta \partial \theta^T}\right)$. The last part of (A9) is deduced by (R2) (please refer to the proof of Theorem 7.2 in Francq and Zakoïan, 2010).

Due to Lemma 2.1 of Straumann and Mikosch (2006), (A6), (A7), (8), and $B'(\eta_t(\theta)) = \lambda_t(\theta) \frac{r + \lambda_t(\theta)}{r}$, we can show $\sup_{\theta \in \Theta} \left|B'(\eta_t(\theta)) - B'(\hat{\eta}_t(\theta))\right| \xrightarrow{e.a.s.} 0$, and in view of (A5), we get

$$\sup_{\theta \in \Theta} \left|\frac{1}{B'(\eta_t(\theta))} - \frac{1}{B'(\hat{\eta}_t(\theta))}\right| \xrightarrow{e.a.s.} 0.$$

In the same way, we can show

$$\sup_{\theta \in \Theta} \left|\frac{\lambda_t(\theta)}{B'(\eta_t(\theta))} - \frac{\hat{\lambda}_t(\theta)}{B'(\hat{\eta}_t(\theta))}\right| \xrightarrow{e.a.s.} 0.$$

As discussed below (8), the last part of (A10) is also true. The following result is obtained.

Theorem 3. Assume an INGARCH(p, q) model, with negative binomial deviates of known $r$, satisfies (R1)–(R3), and the observed count series $\{Y_1, \ldots, Y_n\}$ is a sample path of the strictly stationary and ergodic solution. Then, if $\hat{\lambda}_t(\theta)$ updates according to (1) with arbitrarily chosen non-negative initial values $\{\hat{Y}_0, \ldots, \hat{Y}_{1-q}; \hat{\lambda}_0, \ldots, \hat{\lambda}_{1-p}\}$, $\hat{\theta}_n$ is asymptotically normal,

$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{D} N\left(0, \Omega^{-1}\right), \quad \text{as } n \to \infty,$$

where $\Omega = E\left[B'(\eta_t(\theta_0)) \frac{\partial \eta_t(\theta_0)}{\partial \theta} \frac{\partial \eta_t(\theta_0)}{\partial \theta^T}\right]$.
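The following Python sketch illustrates the setting of this section for a negative binomial INGARCH(1, 1) model; it is a minimal illustration under stated assumptions, not the authors' implementation. It simulates counts (for integer $r$, a negative binomial draw is generated as a sum of $r$ geometric variables), then evaluates the approximate log-likelihood $\hat{L}_n(\theta)$ of (3) using $\eta = \ln\frac{\lambda}{r + \lambda}$ and $-A(\eta) = r \ln(1 - e^{\eta})$. All function names, parameter values, the seed and the burn-in length are hypothetical choices.

```python
import math
import random

def nb_sample(rng, r, lam):
    """Negative binomial draw with mean lam and integer r, as a sum of r geometrics."""
    q = lam / (r + lam)  # equals 1 - p in the paper's parametrization
    return sum(int(math.log(1.0 - rng.random()) / math.log(q)) for _ in range(r))

def simulate(n, delta, alpha, beta, r, seed=0, burn_in=200):
    """Simulate an NB-INGARCH(1,1) path, discarding a burn-in period."""
    rng = random.Random(seed)
    lam = delta / (1.0 - alpha - beta)  # stationary mean as starting value
    y = nb_sample(rng, r, lam)
    out = []
    for t in range(n + burn_in):
        lam = delta + alpha * lam + beta * y
        y = nb_sample(rng, r, lam)
        if t >= burn_in:
            out.append(y)
    return out

def approx_loglik(y, delta, alpha, beta, r, lam0, y0):
    """Approximate log-likelihood (3) started from arbitrary initial values (lam0, y0)."""
    lam_prev, y_prev, total = lam0, y0, 0.0
    for obs in y:
        lam = delta + alpha * lam_prev + beta * y_prev
        eta = math.log(lam / (r + lam))
        total += eta * obs + r * math.log(1.0 - math.exp(eta))  # eta*Y - A(eta)
        lam_prev, y_prev = lam, obs
    return total

y = simulate(2000, 1.0, 0.3, 0.4, r=5, seed=1)
ll_a = approx_loglik(y, 1.0, 0.3, 0.4, 5, lam0=0.5, y0=0)
ll_b = approx_loglik(y, 1.0, 0.3, 0.4, 5, lam0=20.0, y0=10)
print(ll_a / len(y), ll_b / len(y))
```

The two printed averages use the same data and parameter value but different initial values; their closeness reflects the fact that the initialization error decays geometrically. Maximizing `approx_loglik` over the parameters with a numerical optimizer would give the CMLE; that step is omitted here.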


Appendix

Proof of Theorem 1. By virtue of (S), (A1), (A2), and (A3), the ergodic theorem for nonintegrable processes (Exercise 7.3, Francq and Zakoïan, 2010) yields

$$M_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \ell_t(\theta) \to M(\theta) = E\left(Y_1 \eta_1(\theta) - A(\eta_1(\theta))\right), \quad \text{almost surely as } n \to \infty. \tag{10}$$

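Next, we show that the effects of $\hat{\lambda}_t(\theta)$ on $\hat{\ell}_t(\theta)$ die down as $n$ increases, namely

$$\sup_{\theta \in \Theta} \left|\frac{1}{n} \sum_{t=1}^{n} \ell_t(\theta) - \frac{1}{n} \sum_{t=1}^{n} \hat{\ell}_t(\theta)\right| \to 0 \quad \text{almost surely as } n \to \infty, \tag{11}$$

which can also be written as $\lim_{n \to \infty} \sup_{\theta \in \Theta} \left|M_n(\theta) - \hat{M}_n(\theta)\right| = 0$ almost surely, with $M_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \ell_t(\theta)$ and $\hat{M}_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} \hat{\ell}_t(\theta)$.

Due to the fact that $A(\eta)$ is a strictly convex and monotonically increasing function, for any two points $u, v$ in the domain of $A(\cdot)$, it holds that

$$|A(u) - A(v)| \le \max\left\{A'(u), A'(v)\right\} |u - v| \le \left(A'(u) + \left|A'(u) - A'(v)\right|\right) |u - v|.$$

Applying the above inequality,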

 

 

 

  

 

 

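$$
\begin{aligned}
\sup_{\theta \in \Theta} \left|\ell_t(\theta) - \hat{\ell}_t(\theta)\right|
&\le \sup_{\theta \in \Theta} \Big\{ Y_t \left|\eta(\hat{\lambda}_t(\theta)) - \eta(\lambda_t(\theta))\right| + \left|A(\eta(\lambda_t(\theta))) - A(\eta(\hat{\lambda}_t(\theta)))\right| \Big\} \\
&\le \sup_{\theta \in \Theta} \Big\{ Y_t \left|\eta(\hat{\lambda}_t(\theta)) - \eta(\lambda_t(\theta))\right| + A'(\eta(\lambda_t(\theta))) \left|\eta(\hat{\lambda}_t(\theta)) - \eta(\lambda_t(\theta))\right| \\
&\qquad\quad + \left|A'(\eta(\lambda_t(\theta))) - A'(\eta(\hat{\lambda}_t(\theta)))\right| \left|\eta(\hat{\lambda}_t(\theta)) - \eta(\lambda_t(\theta))\right| \Big\} \\
&\le \sup_{\theta \in \Theta} \Big\{ Y_t\, \eta'(\lambda_t^{\dagger}) \left|\hat{\lambda}_t(\theta) - \lambda_t(\theta)\right| + \lambda_t(\theta)\, \eta'(\lambda_t^{\dagger}) \left|\hat{\lambda}_t(\theta) - \lambda_t(\theta)\right| \\
&\qquad\quad + \eta'(\lambda_t^{\dagger}) \left|\hat{\lambda}_t(\theta) - \lambda_t(\theta)\right| \left|\hat{\lambda}_t(\theta) - \lambda_t(\theta)\right| \Big\} \\
&\le \frac{1}{\sigma_L^2} \sup_{\theta \in \Theta} \Big( Y_t + \lambda_t(\theta) + \left|\hat{\lambda}_t(\theta) - \lambda_t(\theta)\right| \Big) \left|\hat{\lambda}_t(\theta) - \lambda_t(\theta)\right|,
\end{aligned}
$$

where the third inequality follows from the mean value theorem and (A7), and the last inequality follows from the formula for the derivative of an inverse function and (A5). Invoking (A7), it follows from Lemma 2.1 of Straumann and Mikosch (2006) that $Y_t \sup_{\theta \in \Theta} \left|\hat{\lambda}_t(\theta) - \lambda_t(\theta)\right| \to 0$ almost surely as $t \to \infty$. Applying the same method to the remaining terms, we obtain (11). As a result of (11), we also get $\hat{M}_n(\theta) \to M(\theta)$ almost surely as $n \to \infty$.

Under (A4), one can show $M(\theta) < M(\theta_0)$ for any $\theta \ne \theta_0$. Let $U_0$ be a local base of $\theta_0$ and $U \subset U_0$ be a neighborhood of $\theta_0$. Then, in view of (11),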

 

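$$
\begin{aligned}
\limsup_{n \to \infty} \sup_{\theta \in \Theta \setminus U} \hat{M}_n(\theta)
&\le \limsup_{n \to \infty} \sup_{\theta \in \Theta \setminus U} M_n(\theta) + \limsup_{n \to \infty} \sup_{\theta \in \Theta \setminus U} \left|\hat{M}_n(\theta) - M_n(\theta)\right| \\
&\le \sup_{\theta \in \Theta \setminus U} M(\theta) < M(\theta_0) = \lim_{n \to \infty} M_n(\theta_0) = \liminf_{n \to \infty} \hat{M}_n(\theta_0) \\
&\le \liminf_{n \to \infty} \hat{M}_n(\hat{\theta}_n), \quad \text{almost surely},
\end{aligned} \tag{12}
$$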

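where the second inequality holds by Lemma 1 of Davis and Liu (2016) and the strict inequality holds since $M(\theta)$ achieves its maximum value at $\theta_0$.

Using the same argument as in Davis and Liu (2016), the proof concludes by contradiction. Let $\omega \in \Omega$ be a case in which (12) is true and $\hat{\theta}_n \notin U$ infinitely often along an index set $\tilde{N}$; then

$$
\begin{aligned}
\liminf_{n \to \infty} \hat{M}_n(\hat{\theta}_n)
&\le \liminf_{n \to \infty} \left\{ M_n(\hat{\theta}_n) + \left|M_n(\hat{\theta}_n) - \hat{M}_n(\hat{\theta}_n)\right| \right\} \\
&\le \liminf_{n \to \infty} M_n(\hat{\theta}_n) + \limsup_{n \to \infty} \left|M_n(\hat{\theta}_n) - \hat{M}_n(\hat{\theta}_n)\right| \\
&\le \liminf_{n \to \infty,\, n \in \tilde{N}} M_n(\hat{\theta}_n) + \limsup_{n \to \infty} \left|M_n(\hat{\theta}_n) - \hat{M}_n(\hat{\theta}_n)\right| \\
&\le \limsup_{n \to \infty,\, n \in \tilde{N}} M_n(\hat{\theta}_n) \le \limsup_{n \to \infty,\, n \in \tilde{N}} \sup_{\theta \in \Theta \setminus U} M_n(\theta) \le \limsup_{n \to \infty} \sup_{\theta \in \Theta \setminus U} M_n(\theta) \\
&\le \limsup_{n \to \infty} \sup_{\theta \in \Theta \setminus U} \hat{M}_n(\theta) + \limsup_{n \to \infty} \sup_{\theta \in \Theta \setminus U} \left|M_n(\theta) - \hat{M}_n(\theta)\right| \\
&= \limsup_{n \to \infty} \sup_{\theta \in \Theta \setminus U} \hat{M}_n(\theta),
\end{aligned} \tag{13}
$$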

which is in contradiction with (12). Therefore, given any arbitrarily chosen $U \subset U_0$ and for all $n$ large enough, $\hat{\theta}_n \in U$ almost surely and the proof is complete. □

Proof of Theorem 2. According to Davis and Liu (2016), (A0)–(A8) lead to $\sqrt{n}\left(\tilde{\theta}_n - \theta_0\right) \xrightarrow{D} N\left(0, \Omega^{-1}\right)$. The asymptotic normality of $\hat{\theta}_n$ can be derived by following the same lines as in Lemma 7.3 and Lemma 7.4 of Straumann and Mikosch (2006). Let the vector norm $\|\cdot\|$ be the $L_1$ norm. All we need to do is to establish $\sup_{\theta \in \Theta} \left\|\ell_t'(\theta) - \hat{\ell}_t'(\theta)\right\| \xrightarrow{e.a.s.} 0$.



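After some algebra,

$$\sup_{\theta \in \Theta} \left\|\ell_t'(\theta) - \hat{\ell}_t'(\theta)\right\| \le \sup_{\theta \in \Theta} \left\|Y_t \frac{1}{B'(\eta_t(\theta))} \frac{\partial \lambda_t(\theta)}{\partial \theta} - Y_t \frac{1}{B'(\hat{\eta}_t(\theta))} \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta}\right\| \tag{14}$$

$$\qquad + \sup_{\theta \in \Theta} \left\|\frac{\lambda_t(\theta)}{B'(\eta_t(\theta))} \frac{\partial \lambda_t(\theta)}{\partial \theta} - \frac{\hat{\lambda}_t(\theta)}{B'(\hat{\eta}_t(\theta))} \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta}\right\|. \tag{15}$$

By Lemma 2.1 of Straumann and Mikosch (2006), (A6), (A9), and (A10), it can be shown that (14) $\xrightarrow{e.a.s.} 0$, since it is bounded by

$$
\begin{aligned}
&\sup_{\theta \in \Theta} \left\{ \left\|Y_t \frac{1}{B'(\eta_t(\theta))} \frac{\partial \lambda_t(\theta)}{\partial \theta} - Y_t \frac{1}{B'(\eta_t(\theta))} \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta}\right\| + \left\|Y_t \frac{1}{B'(\eta_t(\theta))} \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta} - Y_t \frac{1}{B'(\hat{\eta}_t(\theta))} \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta}\right\| \right\} \\
&\quad \le \sup_{\theta \in \Theta} \frac{Y_t}{\sigma_L^2} \left\|\frac{\partial \lambda_t(\theta)}{\partial \theta} - \frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta}\right\| + \sup_{\theta \in \Theta} Y_t \left\|\frac{\partial \hat{\lambda}_t(\theta)}{\partial \theta}\right\| \left|\frac{1}{B'(\eta_t(\theta))} - \frac{1}{B'(\hat{\eta}_t(\theta))}\right|.
\end{aligned}
$$

Similarly, one can also show that (15) $\xrightarrow{e.a.s.} 0$, given (A6), (A9), and (A10). Therefore,

$$n^{-1/2} \sup_{\theta \in \Theta} \sum_{t=1}^{n} \left\|\ell_t'(\theta) - \hat{\ell}_t'(\theta)\right\| \xrightarrow{a.s.} 0, \quad \text{as } n \to \infty. \tag{16}$$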





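Using the mean value theorem, $L_n'(\tilde{\theta}_n) - L_n'(\hat{\theta}_n) = L_n''(\zeta_n)\left(\tilde{\theta}_n - \hat{\theta}_n\right)$, where $\zeta_n$ lies between $\tilde{\theta}_n$ and $\hat{\theta}_n$, and by the fact that $L_n'(\tilde{\theta}_n) = 0$ and $\hat{L}_n'(\hat{\theta}_n) = 0$, we get $n^{-1/2}\left(\hat{L}_n'(\hat{\theta}_n) - L_n'(\hat{\theta}_n)\right) = n^{-1} L_n''(\zeta_n)\, n^{1/2}\left(\tilde{\theta}_n - \hat{\theta}_n\right)$. Assumptions (A8) and (A9) entail that $L_n''(\zeta_n)/n \xrightarrow{a.s.} E\left(\partial^2 \ell_1(\theta_0)/\partial \theta \partial \theta^T\right)$. In addition, (16) implies that $n^{-1/2}\left(\hat{L}_n'(\hat{\theta}_n) - L_n'(\hat{\theta}_n)\right) \xrightarrow{a.s.} 0$, and the desired result naturally follows. □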



Proof of Proposition 1. We give the proof for (C2) by following the outlines in Doukhan et al. (2013). The proof for (C1) can be derived similarly by the techniques used in Christou and Fokianos (2014) and the arguments in the following.

Let $G_\lambda(y)$ be the cumulative distribution function (cdf) of $p(y \mid \lambda)$ and its inverse $G_\lambda^{-1}(u) = \inf\{y \ge 0 : G_\lambda(y) \ge u\}$ for $u \in [0, 1]$. Then, $Y_t$ in model (1) is given by $Y_t = G_{\lambda_t}^{-1}(U_t)$ with $U_t$ being independent and uniformly distributed on $[0, 1]$. By Proposition 7 of Davis and Liu (2016),

$$E\left(\left|G_{\lambda_t}^{-1}(U_t) - G_{\lambda_t'}^{-1}(U_t)\right| \,\Big|\, \lambda_t, \lambda_t'\right) = \left|\lambda_t - \lambda_t'\right|. \tag{17}$$
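Identity (17) can be checked numerically in the Poisson special case. The sketch below is an illustration only, with hypothetical helper names: it uses the standard fact that, for two integer-valued distributions driven by the same uniform variable, $E\left|G_{\lambda}^{-1}(U) - G_{\lambda'}^{-1}(U)\right| = \sum_{k \ge 0} \left|G_{\lambda}(k) - G_{\lambda'}(k)\right|$, and evaluates the sum on a truncated range.

```python
import math

def poisson_cdf_values(lam, kmax):
    """Return [G(0), G(1), ..., G(kmax)] for a Poisson(lam) distribution."""
    pmf, acc, cdf = math.exp(-lam), 0.0, []
    for k in range(kmax + 1):
        acc += pmf
        cdf.append(acc)
        pmf *= lam / (k + 1)
    return cdf

def coupled_l1_distance(lam1, lam2, kmax=200):
    """E|G1^{-1}(U) - G2^{-1}(U)| computed as the sum over k of |G1(k) - G2(k)|."""
    c1 = poisson_cdf_values(lam1, kmax)
    c2 = poisson_cdf_values(lam2, kmax)
    return sum(abs(a - b) for a, b in zip(c1, c2))

print(coupled_l1_distance(2.0, 3.5))  # compare with |2.0 - 3.5|
```

Because the Poisson family is stochastically increasing in its mean, the coupled variables are ordered, so the expected absolute difference reduces to the difference of the means, in agreement with (17).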

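Let $X_t = (Y_t, \lambda_t) \in \mathbb{N}_0 \times (0, \infty)$ and $\mathbf{X}_t = (X_t, X_{t-1}, \ldots)$. Then, it follows that

$$X_t = \left(G_{\lambda_t}^{-1}(U_t),\ f_\theta(\mathbf{X}_{t-1})\right) \equiv F(\mathbf{X}_{t-1}, U_t).$$

Define the norm on $\mathbb{N}_0 \times (0, \infty)$ by $\|X\| = |Y| + \epsilon |\lambda|$ for some $\epsilon > 0$ such that $\epsilon \sum_{\ell=1}^{\infty} \alpha_\ell < 1$, and let $\Phi(x) = x$ for $x \in \mathbb{R}_+$. Then, it follows that $\Phi(\cdot)$ defines an Orlicz function, i.e., it is defined on $\mathbb{R}_+$, convex, increasing and satisfies $\Phi(0) = 0$.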


In addition, $\Phi$ satisfies $\Phi(xy) \le \Phi(x)\Phi(y)$ for $x, y \in \mathbb{R}_+$, which is required by Theorem 3.1 of Doukhan and Wintenberger (2008). Based on $\Phi(\cdot)$, the Orlicz norm of a random variable $X$ with values in $\mathbb{N}_0 \times (0, \infty)$, $\|X\|_\Phi = \inf\left\{u > 0 : E\,\Phi\left(\frac{\|X\|}{u}\right) \le 1\right\}$, is equal to $E\|X\|$. In view of (17) and given $\mathbf{X}_{t-1}$ and $\mathbf{X}_{t-1}'$,

$$
\begin{aligned}
\left\|F(\mathbf{X}_{t-1}, U_t) - F(\mathbf{X}_{t-1}', U_t)\right\|_\Phi
&= E\left|Y_t - Y_t'\right| + \epsilon \left|\lambda_t - \lambda_t'\right| = (1 + \epsilon) \left|f_\theta(\mathbf{X}_{t-1}) - f_\theta(\mathbf{X}_{t-1}')\right| \\
&\le (1 + \epsilon) \sum_{\ell=1}^{\infty} \alpha_\ell \left|Y_{t-\ell} - Y_{t-\ell}'\right| \le (1 + \epsilon) \sum_{\ell=1}^{\infty} \alpha_\ell \left\|X_{t-\ell} - X_{t-\ell}'\right\| \\
&= \sum_{\ell=1}^{\infty} \gamma_\ell \left\|X_{t-\ell} - X_{t-\ell}'\right\|,
\end{aligned}
$$

with $\gamma_\ell = (1 + \epsilon)\alpha_\ell$ and $\sum_{\ell=1}^{\infty} \gamma_\ell < 1$. Thus, equation (4) of Doukhan et al. (2013) is verified and (i) and (ii) directly follow from Theorem 3.1 of Doukhan and Wintenberger (2008). Following the same lines as in the proof of Proposition 2 of Davis and Liu (2016) or Proposition 2.1 of Doukhan and Kengne (2015), (iii) can be established by Cauchy sequence convergence in $L_1$ space. Due to the assumption $\beta_{m,m} \le 1$, the induction method used in the proof of Theorem 2.1 in Doukhan et al. (2012) carries over to the general case and (iv) is established. □

References

Ahmad, A., Francq, C., 2016. Poisson QMLE of count time series models. J. Time Series Anal. 37, 291–314.
Berkes, I., Horváth, L., Kokoszka, P., 2003. GARCH processes: structure and estimation. Bernoulli 9, 201–227.
Christou, V., Fokianos, K., 2014. Quasi-likelihood inference for negative binomial time series models. J. Time Series Anal. 35, 55–78.
Davis, R.A., Liu, H., 2016. Theory and inference for a class of nonlinear models with application to time series of counts. Statist. Sinica 26, 1673–1707.
Davis, R.A., Wu, R., 2009. A negative binomial model for time series of counts. Biometrika 96, 735–749.
Doukhan, P., Fokianos, K., Tjøstheim, D., 2012. On weak dependence conditions for Poisson autoregressions. Statist. Probab. Lett. 82, 942–948.
Doukhan, P., Fokianos, K., Tjøstheim, D., 2013. Correction to On weak dependence conditions for Poisson autoregressions. Statist. Probab. Lett. 83, 1926–1927.
Doukhan, P., Kengne, W., 2015. Inference and testing for structural change in general Poisson autoregressive models. Electron. J. Stat. 9, 1267–1314.
Doukhan, P., Louhichi, S., 1999. A new weak dependence condition and applications to moment inequalities. Stochastic Process. Appl. 84, 313–342.
Doukhan, P., Wintenberger, O., 2008. Weakly dependent chains with infinite memory. Stochastic Process. Appl. 118, 1997–2013.
Ferland, R., Latour, A., Oraichi, D., 2006. Integer-valued GARCH process. J. Time Series Anal. 27, 923–942.
Fokianos, K., Rahbek, A., Tjøstheim, D., 2009. Poisson autoregression. J. Amer. Statist. Assoc. 104, 1430–1439.
Francq, C., Zakoïan, J.M., 2010. GARCH Models: Structure, Statistical Inference and Financial Applications. Wiley, Chichester, UK.
Straumann, D., Mikosch, T., 2006. Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equations approach. Ann. Statist. 34, 2449–2495.
Zhu, F., 2011. A negative binomial integer-valued GARCH model. J. Time Series Anal. 32, 54–67.