The ZD-GARCH model: A new way to study heteroscedasticity

The ZD-GARCH model: A new way to study heteroscedasticity

Accepted Manuscript The ZD-GARCH model: A new way to study heteroscedasticity Dong Li, Xingfa Zhang, Ke Zhu, Shiqing Ling PII: DOI: Reference: S0304...

1MB Sizes 0 Downloads 67 Views

Accepted Manuscript The ZD-GARCH model: A new way to study heteroscedasticity Dong Li, Xingfa Zhang, Ke Zhu, Shiqing Ling

PII: DOI: Reference:

S0304-4076(17)30192-6 https://doi.org/10.1016/j.jeconom.2017.09.003 ECONOM 4429

To appear in:

Journal of Econometrics

Received date : 3 March 2016 Revised date : 8 September 2017 Accepted date : 23 September 2017 Please cite this article as: Li D., Zhang X., Zhu K., Ling S., The ZD-GARCH model: A new way to study heteroscedasticity. Journal of Econometrics (2017), https://doi.org/10.1016/j.jeconom.2017.09.003 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

The ZD-GARCH model: A new way to study heteroscedasticity B Y D ONG L I Center for Statistical Science and Department of Industrial Engineering, Tsinghua University [email protected] X INGFA Z HANG Department of Statistics, Guangzhou University [email protected] K E Z HU Department of Statistics and Actuarial Science, University of Hong Kong [email protected] AND

S HIQING L ING

Department of Mathematics, Hong Kong University of Science and Technology [email protected] A BSTRACT This paper proposes a first-order zero-drift GARCH (ZD-GARCH(1, 1)) model to study conditional heteroscedasticity and heteroscedasticity together. Unlike the classical GARCH model, the ZD-GARCH(1, 1) model is always non-stationary regardless of the sign of the Lyapunov exponent γ0 , but interestingly it is stable with its sample path oscillating randomly between zero and infinity over time when γ0 = 0. Furthermore, this paper studies the generalized quasi-maximum likelihood estimator (GQMLE) of the ZD-GARCH(1, 1) model, and establishes its strong consis-

2 tency and asymptotic normality. Based on the GQMLE, an estimator for γ0 , a t-test for stability, a unit root test for the absence of the drift term, and a portmanteau test for model checking are all constructed. Simulation studies are carried out to assess the finite sample performance of the proposed estimators and tests. Applications demonstrate that a stable ZD-GARCH(1, 1) model is more appropriate than a non-stationary GARCH(1, 1) model in fitting the KV-A stock returns in Francq and Zako¨ıan (2012). Some key words: Conditional heteroscedasticity; GARCH model; Generalized quasi-maximum likelihood estimator; Heteroscedasticity; Portmanteau test; Stability test; Top Lyapunov exponent; Zero-drift GARCH model.

1.

I NTRODUCTION

Heteroscedasticity is often observed in economic and financial time series data. Modeling heteroscedasticity accurately is important for making valid inference. In the last three decades, a lot of conditionally heteroscedastic models have been proposed since the seminar work of Engle (1982) and Bollerslev (1986). However, few attempts have been made in the literature to capture conditional heteroscedasticity and heteroscedasticity together parametrically. This paper reaches this goal by proposing a new first-order zero-drift generalized autoregressive conditional heteroscedasticity (ZD-GARCH(1, 1)) model:

yt = η t

√ 2 ht and ht = α0 yt−1 + β0 ht−1 ,

t = 1, 2, ...,

(1.1)

with initial values y0 ̸= 0 and h0 ≥ 0, where α0 > 0, β0 ≥ 0, {ηt } is a sequence of independent and identically distributed (i.i.d.) random variables, and ηt is independent of {yj , j < t}. If β0 ̸= 0, then the initial value h0 > 0 would suffice. Model (1.1) nests the widely used exponentially weighted moving average (EWMA) model in “RiskMetrics” (J.P. Morgan, 1997), and the EWMA model has been widely applied to calculate the daily volatility of many assets. Clearly, when Eηt2 < ∞, model (1.1) can capture the conditional heteroscedasticity of yt , since var(ηt )ht

3 designed as the conditional variance of yt changes over time. Moreover, let s2t = var(yt ). If Eηt = 0, we have s2t = [α0 var(ηt ) + β0 ]s2t−1 in model (1.1) so that

s2t = [α0 var(ηt ) + β0 ]t−1 s21 .

(1.2)

In this case, yt in model (1.1) is not only homoscedastic when α0 var(ηt ) + β0 = 1, but also heteroscedastic with an exponentially decayed (or explosive) variance when α0 var(ηt ) + β0 < 1 (or > 1). Obviously, the heteroscedastic structure of yt in (1.2) is different from the parametric ones assumed in Breusch and Pagan (1979) and White (1980) or the nonparametric ones studied in Dahlhaus (1997), Dahlhaus and Subba Rao (2006), Engle and Rangel (2008), Giraitis et al. (2014, 2017), and many others. Thus, when Eηt2 < ∞, model (1.1) provides a new way to study conditional heteroscedasticity and heteroscedasticity together. Needless to say, model (1.1) is motivated by the classical GARCH(1, 1) model:

yt = ηt



2 ht and ht = ω0 + α0 yt−1 + β0 ht−1 ,

t = 1, 2, ...,

(1.3)

where all notations are inherited from model (1.1) except for ω0 > 0. Model (1.3) introduced by Engle (1982), Bollerslev (1986) and Taylor (1986) has become the workhorse of financial applications. It can be used to describe the volatility dynamics of almost any financial return series; see comments in Engle (2004, p.408). Due to the importance of this model, numerous works were devoted to its probabilistic structure and statistical inference, see, e.g., Francq and Zako¨ıan (2010) for a comprehensive review. However, they all assume that ω0 is positive. The case with ω0 = 0, i.e., model (1.1), is a meaningful one but is hardly touched, except for Hafner and Preminger (2015) who studied an ARCH(1) model without intercept (i.e., model (1.1) with β0 = 0). We fill this gap from a theoretical viewpoint. Compared with Hafner and Preminger (2015), we develop a much more involved technique due to the existence of β0 .

4 Bougerol and Picard (1992a) showed that model (1.3) is stationary if and only if the top Lyapunov exponent γ0 < 0, where γ0 = E log(β0 + α0 ηt2 ).

(1.4)

If Eηt = 0 and Eηt2 < ∞, then s2t = ω0 (Eηt2 ) + [α0 (Eηt2 ) + β0 ]s2t−1 in model (1.3). When γ0 ≥ 0, we have α0 (Eηt2 ) + β0 > 1 and yt in model (1.3) is heteroscedastic with an exponentially explosive variance. However, when γ0 ≥ 0, no consistent estimator is available for ω0 as shown in Francq and Zako¨ıan (2012), and hence no prediction can be made in practice. Model (1.1) avoids this dilemma due to the absence of ω0 . Moreover, Section 2 below demonstrates that except for a different scale, the sample path of model (1.1) has a similar shape to that of model (1.3) when γ0 ≥ 0. In view of this, model (1.1) could be more convenient than model (1.3) when studying heteroscedasticity. This paper gives an omnifaceted investigation of model (1.1). First, we show that, after a suitable renormalization, the limit of the sample path of ht or |yt | converges weakly to a geometric Brownian motion regardless of the sign of γ0 . This result represents a sharp difference from those for model (1.3) in Li et al. (2014) and Li and Wu (2015). From this result, we find that |yt | diverges to infinity or converges to zero almost surely (a.s.) at an exponential rate when γ0 > 0 or γ0 < 0, respectively, while |yt | oscillates randomly between zero and infinity over time when γ0 = 0. Following the terminology in Hafner and Preminger (2015), we call model (1.1) stable if γ0 = 0 and unstable otherwise. Second, we study the generalized quasi-maximum likelihood estimator (GQMLE) of unknown parameter θ0 ≡ (α0 , β0 )′ . It is shown that the GQMLE is strongly consistent and asymptotically normal in a unified framework. Third, we study an estimator of γ0 , and propose a t-test to check the model stability (i.e., γ0 = 0). If model (1.1) is stable, a unit root test is further proposed to check the absence of ω0 . Fourth, we propose a portmanteau test for model checking. Simulation studies are carried out to assess the performance

5 of all proposed estimators and tests in the finite sample. Finally, applications demonstrate that a stable ZD-GARCH(1, 1) model is more appropriate than a non-stationary GARCH(1, 1) model in fitting the KV-A stock returns in Francq and Zako¨ıan (2012). The remainder of the paper is organized as follows. Section 2 investigates the limit of sample path of yt in model (1.1). Section 3 studies the GQMLE and its asymptotics. Section 4 presents the inference for γ0 and ω0 . Section 5 proposes a portmanteau test for model checking. Simulation results are reported in Section 6. Empirical examples are illustrated in Section 7. Concluding remarks and discussions are made in Section 8. All of proofs are relegated to the Appendix.

2.

S AMPLE PATH PROPERTIES

This section studies the sample path properties of renormalized ht and |yt | in model (1.1). 2 ), and hence From model (1.1), we have log ht = log ht−1 + log(β0 + α0 ηt−1

log ht =

t−1 ∑

log(β0 + α0 ηi2 ) + log h0 .

i=0

The theorem below then follows directly from Donsker’s Theorem in Billingsley (1999) (Theorem 8.2 on page 90). T HEOREM 2.1. Suppose that (i) {ηt } is a sequence of i.i.d. random variables with P (ηt = 0) = 0 and σγ20 ≡ var[log(β0 + α0 ηt2 )] ∈ (0, ∞); and (ii) h0 is a positive random variable and independent of {ηt : t ≥ 1}. Then, as n → ∞, √ 1/ n

h[ns]

√ =⇒ exp(σγ0 B(s)) in D[0, 1], exp(sγ0 n) where γ0 is defined in (1.4), ‘=⇒’ stands for weak convergence, B(s) is a standard Brownian motion on [0, 1], and D[0, 1] is the space of functions defined on [0, 1], which are right continuous and have left limits, endowed with the Skorokhod topology.

6 Furthermore, it follows that as n → ∞, √

|y[ns] |2/ n √ =⇒ exp(σγ0 B(s)) exp(sγ0 n)

in D[0, 1].

Remark 2.1. Similar to Li and Wu (2015), the condition that ηt is i.i.d. in Theorem 2.1 can be relaxed to the one that ηt is strictly stationary and ergodic with {log(β0 + α0 ηt2 )} satisfying a suitable invariance principle. Remark 2.2. For model (1.3), Li and Wu (2015) proved that as n → ∞, in D[0, 1],     ∞ if γ0 < 0,   √   1/ n |h[ns] | ) ( √ =⇒ exp σγ0 max B(τ ) if γ0 = 0,  exp(sγ0 n) 0≤τ ≤s       exp(σγ0 B(s)) if γ0 > 0.

(2.1)

This limiting result varies according to the value of γ0 , which is unlike the one in Theorem 2.1. Theorem 2.1 has two direct implications. First, it implies that yt in model (1.1) is always nonstationary. This is not the case for model (1.3), which is non-stationary if and only if γ0 ≥ 0 (see, e.g., Bougerol and Picard (1992a,b)). Second, it indicates that the sample path property of yt in model (1.1) depends on the sign of γ0 , and this is also the case for model (1.3) as shown in Nelson (1990), Francq and Zako¨ıan (2012), and Li et al. (2014). Precisely, we can see that |yt | in model (1.1) oscillates randomly between zero and infinity over time when γ0 = 0, while |yt | either converges to zero or diverges to infinity a.s. as t → ∞, according to the case that γ0 < 0 or γ0 > 0, respectively. In this sense, model (1.1) is stable if γ0 = 0, and unstable otherwise; see Hafner and Preminger (2015). To further illustrate it, we depict the top three sub-figures in Fig. 1, in which they are one sample path of {yt }200 t=1 from model (1.1) with ηt ∼ N (0, 1), β0 = 0.7, and α0 = 0.3 (i.e., γ0 < 0), 0.388 (i.e., γ0 = 0), or 0.5 (i.e., γ0 > 0), respectively. These three subfigures confirm the conclusion drawn from Theorem 2.1 above, and most importantly, they show that only the stable model (1.1) is useful in fitting real data sets. This will be further demonstrated

7 in our empirical studies in Section 7. The stable parameter region of model (1.1) is small for a given distribution of ηt . It is simply a curve in terms of (α0 , β0 ). This is similar to the IGARCH time series (i.e., α0 + β0 = 1 in model (1.3)) and the unit root time series (yt = ϕ0 yt−1 + εt with ϕ0 = 1). From a practical point of view, the parameter space is not a main concern. The importance is whether or not the pattern of the sample path of model (1.1) matches to that of a real time series. γ0<0

γ0=0

γ0>0

0.2

1

100

0.1

0.5

50

0

0

0

−0.1

−0.5

−50

−0.2

0

100 t

200

1

−1

0

100 t

200

5

100 t

200

0

100 t

200

0

100 t

200

100

0

0

0

−0.5

−100 0

100 t

200

−5

20

100

10

50

0

0

−10

−50

−20

0

200

0.5

−1

−100

0

100 t

200

−100

0

100 t

200

−200

5000

0

0

100 t

200

−5000

200 Fig. 1. One sample path {yt }200 t=1 based on the same innovations {ηt }t=1 , where the columns from left to right

correspond to the cases that γ0 < 0, γ0 = 0, and γ0 > 0, respectively, and the rows from top to bottom correspond to model (1.1), model (1.3) with ω0 = 0.001, and model (1.3) with ω0 = 1, respectively.

Finally, it is interesting to see the difference in sample path between model (1.1) and model (1.3). The remaining sub-figures in Fig. 1 plot the sample path of {yt }200 t=1 from model (1.3) with ω0 = 0.001 or 1 under the same setting as that for model (1.1). From these graphs, we find that apart from a larger scale, the sample path of yt from model (1.1) when γ0 ≥ 0 has a similar shape as the one from model (1.3) when γ0 ≥ 0, and the sample path of yt from model (1.1) with γ0 = 0 looks compatible with the one from model (1.3) when γ0 < 0. These findings may suggest that it is desirable to detect whether ω0 = 0 in model (1.3). However, this is not an easy

8 task since ω0 only reflects the scale of yt in model (1.3). Hafner and Preminger (2015) suggested a bootstrap-assisted likelihood ratio (LR) test for this purpose, but our simulation study (not reported here but is available from us) shows that this LR test does not have satisfactory power when γ0 ≥ 0. When γ0 = 0, we will propose a unit-root test to check whether ω0 = 0 in Section 4.

3.

T HE GQMLE

AND ITS ASYMPTOTICS

Let θ = (α, β)′ ∈ Θ be the unknown parameter of model (1.1) with the true parameter θ0 = (α0 , β0 )′ ∈ Θ, where Θ is the parametric space. Assume the data sample {y1 , ..., yn }1 is from model (1.1). Then, as in Francq and Zako¨ıan (2013a), the generalized quasi-maximum likelihood estimator (GQMLE) of θ0 is defined by n ∑

θbn,r = (b αn,r , βbn,r )′ = arg min θ∈Θ

where r ≥ 0,

ℓt,r (θ) =

    log {σtr (θ)} +

ℓt,r (θ),

t=1

|yt |r σtr (θ) ,

if r > 0,

   {log |yt | − log σt (θ)}2 , if r = 0,

and σt2 (θ) is recursively defined by

2 2 σt2 (θ) = αyt−1 + βσt−1 (θ),

t = 1, ..., n,

with the initial values y0 and σ02 (θ). Hereafter, we set y0 = y0∗ (a user-chosen non-zero constant) and σ02 (θ) = 0 without loss of generality. In applications, we can always choose y0∗ to be the first nonzero observation, and then do estimation for the remaining ones. The non-negative user-chosen number r involved in θbn,r indicates the used estimation method.

Particularly, when r = 2, θbn,r reduces to the Gaussian QMLE; and when r = 1, θbn,r reduces to the Laplacian QMLE. Simulation studies in Section 6 imply that we shall choose a small (or large) value of r when ηt is heavy-tailed (or light-tailed). 1

To make the notations log |yt | and 1/yt valid in the sequel, we assume that each yt is not zero for convenience. If some yt is zero in practice, we can simply let log |yt | = 1 and 1/yt = 1, and this will not affect our asymptotics below, since the probability that the event {yt = 0} happens is zero according to Assumption 3.1(i).

9 To obtain the asymptotic property of θbn,r , we give two assumptions below.

Assumption 3.1. (i) ηt is i.i.d. and ηt2 is non-degenerate with P (ηt = 0) = 0; (ii) E|ηt |r = 1 when r > 0, and E log |ηt | = 0 when r = 0. Assumption 3.2. The parametric space Θ is a compact subset of (0, ∞) × [0, ∞). Assumption 3.1(i) is a regular condition for ARCH-type models, and Assumption 3.1(ii) is the identification condition for θbn,r ; see, e.g., Francq and Zako¨ıan (2013a). Assumption 3.2 is commonly used in nonstationary ARCH-type models; see, e.g., Francq and Zako¨ıan (2012, 2013b). Let κr =

    4(E|ηt |2r − 1)/r2 ,    4E[(log |ηt |)2 ],

if r > 0, if r = 0.

The following two theorems state the consistency and asymptotic normality of θbn,r , respectively. T HEOREM 3.1. If Assumptions 3.1-3.2 hold, then θbn,r → θ0 a.s. as n → ∞.

T HEOREM 3.2. If Assumptions 3.1-3.2 hold, κr < ∞ and θ0 is an interior point of Θ, then √ d n(θbn,r − θ0 ) −→ N (0, κr I −1 ) as n → ∞,

d

where −→ stands for the convergence in distribution, and      I=  

1 α20

ν1 α0 β0 (1−ν1 )

(1+ν1 )ν2 ν1 α0 β0 (1−ν1 ) β02 (1−ν1 )(1−ν2 )

     

with νi = E

(

β0 β0 + α0 ηt2

)i

for i = 1, 2.

Remark 3.1. From two preceding theorems, we have a unified framework on the consistency and asymptotic normality of the GQMLE in model (1.1), regardless of the sign of γ0 . However, this is not the case for the Gaussian QMLE in model (1.3); see, e.g., Jensen and Rahbek (2004a,b) and Francq and Zako¨ıan (2012). Particularly, it is worth noting that when γ0 = 0, an additional

10 assumption on the distribution of ηt , which is not necessary for us, is needed for Francq and Zako¨ıan (2012) to establish the asymptotic normality of the Gaussian QMLE. However, this additional assumption is hard to be verified even for ηt ∼ N (0, 1). Remark 3.2. By Theorem 3.2, we can do statistical inference on θ0 (e.g., the Wald test for the linear constraint of θ0 ). To accomplish it, we need to estimate both κη and I. Denote the residual ηbt,r = yt /b σt,r , where σ bt,r (> 0) is recursively calculated by 2 2 2 σ bt,r =α bn,r yt−1 + βbn,r σ bt−1,r ,

t = 1, ..., n,

2 = 0. Then, we can consistently estimate κ and I by their sample counwith y0 = y0∗ and σ b0,r r

terparts based on the residuals {b ηt,r }. In Theorem 3.2, θ0 is required to be an interior point of

Θ. If θ0 can be on the boundary of Θ (e.g., β0 = 0), we need the condition of E(ηt−4 ) < ∞ for Lemma A.3 in the Appendix, so that under the conditions of Theorem 3.2 and the mild condition that Θ contains a hypercube, the similar argument as Francq and Zako¨ıan (2007) shows that √ ( where λΛ = Z1 −

d

n(θbn,r − θ0 ) −→ λΛ as n → ∞,

α0 ν1 β0 (1−ν1 ) Z2 I(Z2

)′ < 0), Z2 − Z2 I(Z2 < 0) and (Z1 , Z2 )′ ∼ N (0, κr I −1 ).

However, the condition that E(ηt−4 ) < ∞ fails even for ηt ∼ N (0, 1).

Remark 3.3. We should mention that θ0 depends on r due to the identifiability assumption E|ηt |r = 1. Hence, the matrix I still depends on r implicitly through θ0 , and the efficiency or variance of estimators with different choices of r cannot be compared by simply computing κr . Next, we show how to formally compare the efficiency of two GQMLEs with r = r1 and r2 , where r1 > 0 and r2 > 0. Suppose that θ0 is identified under the assumption E|ηt |r1 = 1. Then, by Theorem 3.2, we know that

√ b n(θn,r1 − θ0 ) → N (0, κr1 I −1 ) as n → ∞ for the first GQMLE

11 of θ0 . Rewrite model (1.1) as yt = ηt∗



2 h∗t and h∗t = α0∗ yt−1 + β0 h∗t−1 ,

where ηt∗ = ηt /zr2 , α0∗ = α0 zr22 , and zr2 = (E|ηt |r2 )1/r2 . In the preceding model, the re-scaled innovation ηt∗ satisfies the identification condition of the GQMLE θbn,r2 , hence θbn,r2 converges ∗ weakly to (α0∗ , β0 )′ instead of θ0 . Based on this fact, we shall treat θbn,r , Γθbn,r2 as the second 2

GQMLE of θ0 , where Γ = diag{1/zr22 , 1} is a diagonal matrix. By Theorem 3.2, it is not hard to see that

√ b∗ n(θn,r2 − θ0 ) → N (0, κ∗r2 I −1 ) as n → ∞, where κ∗r is defined in the same way as

∗ κr with ηt being replaced by ηt∗ . Hence, we can simply compare the efficiency of θbn,r1 and θbn,r 2

by looking at the following ratio:

κr1 E|ηt |2r1 − 1 r22 ∗ Eff(θbn,r1 , θbn,r ) = · = (E|ηt |r2 )2 . 2 κ∗r2 r12 E|ηt |2r2 − (E|ηt |r2 )2

∗ ) ≈ 0.876 and Eff(θ bn,2 , θb∗ ) ≈ 0.674, and For instance, when ηt ∼ N (0, 1), Eff(θbn,2 , θbn,1 n,0.5

these results are consistent to our simulation results; see Table 2 of Section 6. Clearly, similar ratios can be derived for r1 = 0 or r2 = 0, and the details are omitted here for saving the space. Remark 3.4. The key technique for the proofs of Theorems 3.1-3.2 is to show that normalized by ht , the non-stationary process σt2 (θ) and its first and second derivatives can be well approximated by some stationary processes; see Lemmas A.1-A.2 in the Appendix. When β ≡ 0 as considered by Hafner and Preminger (2015), this approximation is straightforward, while when β > 0, σt2 (θ) has the autoregressive structure, resulting in a nontrivial approximation in theory. The technique developed in this paper may be generalized to other nonstationary processes (driven by a finite number of parameters and an initial value) or exotic unit root type processes whose parameters are hidden within the model. For instance, it shall be applicable for the zero-drift GJR model with minor modifications.

12 Since Theorem 3.2 rules out the case that β0 = 0, we shall consider this special case independently. When β0 = 0, model (1.1) becomes yt = η t

√ 2 ht and ht = α0 yt−1 ,

(3.1)

t = 1, 2, ... .

This model is also called the ARCH(1) model without intercept in Hafner and Preminger (2015). Denote Θα ⊂ (0, ∞) be the parametric space of model (3.1). Then, the GQMLE of α0 in model (3.1) is

α en,r =

 n [ ∑  r 2   arg min 2 log{αyt−1 } + α∈Θα t=2 n [ ∑

   arg min

α∈Θα t=2

1 αr/2

|yt |r |yt−1 |r

]

, if r > 0,

] 2 ) 2, log(|yt |) − 21 log(αyt−1

if r = 0.

Without the compactness of Θα , the asymptotic normality of α en,r is given below: T HEOREM 3.3. If Assumption 3.1 holds and κr < ∞, then

√ d n(e αn,r − α0 ) −→ N (0, κr α02 ) as n → ∞. Remark 3.5. We prove Theorem 3.3 in a straightforward way, since an explicit expression of α en,r is available in this case. Our proof is different from the one in Hafner and Preminger (2015), who have obtained the same result for r = 2.

Remark 3.6. Besides the GQMLE, one could consider many other estimation methods for model (1.1); see, e.g., Peng and Yao (2003), Berkes and Horv´ath (2004), Fan et al. (2014), and Zhu and Li (2015a). Moreover, the condition that κr < ∞ is necessary for the asymptotic normality of the GQMLE, and one may be interested in studying the GQMLE when κr = ∞; see, e.g., Hall and Yao (2003). These are two promising directions for the future study. As shown in Remark 3.2, the Wald and LR tests are not suitable to detect whether β0 = 0 in model (1.1). One may try the score test as in Engle (1982) for this purpose. However, this is not suitable as well. To see it clearly, we consider the limiting distribution of the score under the constraint that β0 = 0, where Πn,r (θ) =

1 n

∑n

t=2 ∂ℓt,r (θ)/∂β

√ n Πn,r (θen,r )

and θen,r = (e αn,r , 0). A

13 direct calculation shows that √ n Πn,r (θen,r ) =

) n √ r ∑ 1 r/2 r/2 [ n(e αn,r − α0 )] r/2 2 e 2α0 n n,r t=2 ηt−1 α ( )[ ] n r/2 rα0 1 ∑ 1 − |ηt |r √ + for r > 0. 2 r/2 n t=2 ηt−1 2α0 α ˜ n,r (

Hence, the limiting distribution of

√ n Πn,r (θen,r ) exists only when E(ηt−4 ) < ∞, which fails

even for ηt ∼ N (0, 1). Similarly, the conclusion holds when r = 0. In Section 5, a portmanteau test is available to detect whether β0 = 0 in model (1.1).

4.

I NFERENCE FOR γ0

AND

ω0

Generally, γ0 plays a key role in determining the stationarity of nonlinear time series models. In model (1.1), γ0 plays an equally important role in determining its stability. Thus, it is necessary to do statistical inference for γ0 . From the definition of γ0 in (1.4), a natural plug-in estimator of γ0 is defined as n

γ bn,r =

1∑ 2 log(βbn,r + α bn,r ηbt,r ), n t=1

where ηbt,r is defined in Remark 3.2. Furthermore, we have the following theorem: T HEOREM 4.1. If the conditions in Theorem 3.2 are satisfied, then as n → ∞,

(i) γ bn,r → γ0 in probability; (ii)

√ d n(b γn,r − γ0 ) −→ N (0, σγ20 ),

where σγ20 is defined in Theorem 2.1. Moreover, if Assumption 3.1(i) holds and σγ20 ∈ (0, ∞), the same conclusion holds for model (3.1). Remark 4.1. Although γ bn,r depends on r (i.e., the estimation method), its asymptotic variance

is free of that. Intuitively, this suggests that the performance of γ bn,r and its related stable test

14 below is less affected by the estimation method. Simulation studies in Section 6 will confirm this statement. Since model (1.1) is stable if and only if γ0 = 0, it is interesting to consider hypotheses: H0 : γ 0 = 0

v.s.

(4.1)

H1 : γ0 ̸= 0.

From Theorem 4.1, we propose a t-type test statistic Tn,r to detect H0 in (4.1), where √ γ bn,r n σ bη,r

Tn,r = 2 = with σ bη,r

1 n

∑n

b

t=1 {log(βn,r

d

2 )}2 − γ 2 .2 Under H , it is not hard to see that T +α bn,r ηbt,r bn,r 0 n,r →

N (0, 1) as n → ∞. So, at the significance level α ∈ (0, 1), H0 in (4.1) is rejected if |Tn,r | > |Φ−1 (α/2)|, where Φ(·) is the c.d.f. of N (0, 1); otherwise, it is not rejected. Once H0 in (4.1) is not rejected, it is interesting to make a further test for hypotheses: H0 : ω0 = 0 and γ0 = 0

v.s.

(4.2)

H1 : ω0 > 0 or γ0 ̸= 0.

The study of hypotheses (4.2) is important for model selection, since the sample path of a stable model (1.1) in Fig. 1 looks compatible with the one of stationary or nonstationary model (1.3), and this may cause model mis-specification without examining H0 in (4.2). Note that under H0 in (4.2), zt , log yt2 − log ηt2 is a standard unit root process: (4.3)

zt = ϕ0 zt−1 + et ,

2 ). In this case, we can use the Dickey-Fuller (DF) test where ϕ0 = 1 and et = log(β0 + α0 ηt−1

in Dickey and Fuller (1979) to detect the null hypothesis of ϕ0 = 1, and the acceptance of this null hypothesis can give us some evidence to use the stable model (1.1). 2

For model (3.1), b γn,r =

2 n

2 = (log |yn | − log |y0 |) and b ση,r

4 n

∑n

t=1

{log |yt | − log |yt−1 |}2 −

4 n2

(log |yn | − log |y0 |)2 .

Interestingly, the preceding definitions of b γn,r and b ση,r are independent of the estimation method, and they have been used in Hafner and Preminger (2015).

15 2 , and then As zt in model (4.3) is unobservable, we shall replace it by zbt,r , log yt2 − log ηbt,r

define a DF type unit root test statistic as follows:

Un,r

where ϕbn,r

2 = with σe,r

1 n−1

∑n

ϕbn,r − 1 = se(ϕbn,r )

∑n bt,r zbt−1,r t=2 z = ∑ n 2 bt−1,r t=2 z

2 bt,r t=2 e

[

and

]2

(4.4)

,

se(ϕbn,r ) =



σ2 ∑n e,r 2 bt,r t=2 z

and ebt,r = zbt,r − ϕbn,r zbt−1,r . The following theorem gives the limit-

ing null distribution of Un,r .

T HEOREM 4.2. If a stable model (1.1) is correctly specified and the conditions in Theorem 3.2 are satisfied, then as n → ∞, d

Un,r −→

[∫ 1 0

]2 B(s)dB(s)

∫1 0

B2 (s)ds

.

The above theorem indicates that regardless of r, Un,r has the same limiting null distribution. At the significance level α ∈ (0, 1), H0 in (4.2) is rejected if Un,r > cvα , where cvα is the α upper percentile of the limiting distribution of Un,r ; otherwise, it is not rejected. By direct simulations, we can find that the 1%, 5% and 10% upper percentiles are 7.0798, 4.2772 and 3.0407 for Un,r . It is worth noting that although Un,r can only detect whether ω0 = 0 for the stable ZD-GARCH model, it shall be enough to meet the practical demand, since only the stable ZD-GARCH model has a desirable sample path to match the real data. It is interesting to see the power of Un,r under the following fixed alternative:

H1ST : ω0 > 0 and γ0 < 0.

16 Under H1ST , yt follows a stationary GARCH(1, 1) model. In this case, if θbn,r converges weakly to a quasi-point θ∗ , it is not hard to show that ϕbn,r

∗ ) E(zt∗ zt−1 → ϕ∗ , E[(zt∗ )2 ]

and

√ n[se(ϕbn,r )] →



(σe∗ )2 E[(zt∗ )2 ]

in probability, as n → ∞, where zt∗ = log[σt2 (θ∗ )] and (σe∗ )2 = (1 + ϕ2∗ )E[(zt∗ )2 ] − ∗ ]. Hence, as n → ∞, it follows that 2ϕ∗ E[zt∗ zt−1

Un,r (1 + ϕ2∗ )E[(zt∗ )2 ] − 2ϕ∗ E[(zt∗ )2 ] → c∗ , ∗ ] n (1 + ϕ2∗ )E[(zt∗ )2 ] − 2ϕ∗ E[zt∗ zt−1 in probability under H1ST , where the constant c∗ ∈ (0, 1). That is, Un,r can detect H1ST consistently (i.e., reject H0 with a power approaching to one). It is not clear the power behavior of Un,r under the alternative H1N S : ω0 > 0 and γ0 > 0. This needs a further study in the future.

5.

M ODEL DIAGNOSTIC CHECKING

This section proposes a portmanteau test to check the adequacy of model (1.1). We first define the lag-k autocorrelation function (ACF) of the sth power of the absolute residuals {|b ηt,r |s } as ρbr,s (k) =

∑n

ηt,r |s − b ar,s )(|b ηt−k,r |s t=k+1 (|b ∑n ηt,r |s − b ar,s )2 t=1 (|b

where r ≥ 0, s > 0, k is a positive integer, and b ar,s = Next, we introduce the following notations:

1 n

∑n

−b ar,s )

,

ηt,r |s . t=1 |b

[ ] ( ν k−1 |ηt |s − as ) as = E|ηt |s , bs = var(|ηt |s ), ps (k) = 0, 1 E , 1 − ν1 β0 + α0 ηt2  ( )    (p′s (1), p′s (2), · · · , p′s (m))′ 2 I −1 , if r > 0, r Vr,s = ( )    (p′s (1), p′s (2), · · · , p′s (m))′ 2I −1 , if r = 0,     (p′s (1), p′s (2), · · · , p′s (m))′ E[(|ηt |s − as )(|ηt |r − 1)], if r > 0, Wr,s =    (p′s (1), p′s (2), · · · , p′s (m))′ E[(|ηt |s − as ) log |ηt |], if r = 0.

17 Let m be a given positive integer. The following theorem is crucial to derive the limiting distribution of our portmanteau test. T HEOREM 5.1. Suppose that bs < ∞, br < ∞, and ∥Wr,s ∥ < ∞ for given r ≥ 0 and s > 0. If model (1.1) is correctly specified and the conditions in Theorem 3.2 hold, then

where

√ d n(b ρr,s (1), ..., ρbr,s (m))′ −→ N (0, Σr,s (m)) as n → ∞,

Σr,s (m) =





 b2s Im , Wr,s  1   (Im , −Vr,s )′ (I , −V ) m r,s   b2s ′ Wr,s br I

and Im is the m × m identity matrix. Moreover, if model (3.1) is correctly specified and Assumption 3.1(i) holds, then the same conclusion holds with Σr,s (m) = Im . b r,s (m) be the sample counterpart of Σr,s (m), based on the For model (1.1) with β0 > 0, let Σ

b r,s (m) = Im . Then, residuals {b ηt,r }; and for model (1.1) with β0 = 0 (i.e., model (3.1)), let Σ our portmanteau test is defined by Qr,s (m) := n(n + 2)

(

ρbr,s (1) ρbr,s (m) , ..., n−1 n−m d

)

b r,s (m)]−1 [Σ

(

ρbr,s (1) ρbr,s (m) , ..., n−1 n−m

)′

.

By Theorem 5.1, we have Qr,s (m) −→ χ2m as n → ∞. Thus, at the significance level α ∈ (0, 1), we conclude that model (1.1) is not adequate if Qr,s (m) > Ψ−1 m (1 − α), where Ψd (·) is the c.d.f. of χ2d ; otherwise, it is adequate. In applications, the choice of m depends on the frequency of the series, and one can often choose m to be O(log(n)), which delivers 6 or 12 for a moderate n (see Tsay (2008)); the choice of r and s depends on the thickness of ηt , and one may choose the small (or large) r and s when ηt is heavy-tailed (or light-tailed). It is worth noting that when r = s = 2, Qr,s (m) is defined in the same way as the portmanteau test in Li and Mak (1994). When r = 2 (or 1) and s = 1, Qr,s (m) is analogous to the portmanteau test Q2 (M ) in Li and Li (2005) (or Qr in Li and Li (2008)). Our portmanteau test Qr,s (m) is

18 the first formal diagnostic checking tool for nonstationary ARCH-type models in the literature. For more discussions on the diagnostic checking of stationary ARCH-type models, we refer to Li (2004), Escanciano (2007, 2008), Ling and Tong (2011), and Chen and Zhu (2015). Moreover, we shall highlight that the estimation effect does not affect the limiting distribution of Qr,s (m) for model (3.1), and this is different from most of portmanteau tests in times series analysis; see, e.g., Zhu and Li (2015b), Zhu (2016) and references therein for more discussions in this context. For model (1.1) with β0 > 0, the estimation effect is involved into the limiting distribution of Qr,s (m), but interestingly, if α0 /β0 ≈ 0 (as often observed in applications), it is b r,s (m) ≈ Im due to the fact that Vr,s ≈ 0, and hence the estimation effect not hard to see that Σ is negligible in this case.

6.

S IMULATION STUDIES

In this section, we first assess the finite-sample performance of θbn,r and γ bn,r . We generate

1000 replications from the following ZD-GARCH(1, 1) model: yt = ηt



ht ,

2 ht = α0 yt−1 + 0.9ht−1 ,

(6.1)

where ηt is taken as N (0, 1), the standardized Student’s t5 (st5 ) or the standardized Student’s t3 (st3 ) such that Eηt2 = 1. Here, we fix β0 = 0.9, and choose α0 as in Table 1, where the values of α0 correspond to the cases of γ0 > 0, γ0 = 0, and γ0 < 0, respectively. For the indicator r, we choose it to be 2, 1, 0.5, and 0. Since each GQMLE has a different identification condition, θbn,r has to be re-scaled for θ0 in model (6.1), and it is defined as θbn,r =

(

αn,r ,β (E|ηt |r )2/r n,r

)

for r > 0 and θbn,r =

(

αn,r ,β exp(2E log |ηt |) n,r

)

for r = 0,

where θn,r is the GQMLE calculated from the data sample, and the true values of (E|ηt |r )2/r and exp(2E log |ηt |) are used.

19 Table 1. The values of the pair (α0 , γ0 ) when β0 = 0.9. ηt ∼ st5

ηt ∼ N (0, 1)

ηt ∼ st3

α0

γ0

α0

γ0

α0

γ0

0.1

-0.0082

0.1

-0.0152

0.1

-0.0300

0.1096508

0.0000

0.1201453

0.0000

0.1508275

0.0000

0.2

0.0706

0.2

0.0548

0.2

0.0263

Table 2. Summary for θbn,r and γ bn,r when γ0 = 0. r=2

ηt

n

α bn,r

βbn,r

r=1

γ bn,r

α bn,r

βbn,r

r = 0.5 γ bn,r

α bn,r

βbn,r

r=0 γ bn,r

α bn,r

βbn,r

γ bn,r

N (0, 1) 500 Bias -0.0050 0.0045 -0.0001 -0.0049 0.0045 -0.0001 -0.0030 0.0035 -0.0001 -0.0021 0.0036 0.0000 SD 0.0247 0.0205 0.0060 0.0257 0.0214 0.0060 0.0290 0.0240 0.0060 0.0400 0.0329 0.0061 AD 0.0242 0.0196 0.0056 0.0260 0.0210 0.0056 0.0295 0.0237 0.0057 0.0385 0.0307 0.0057 1000 Bias -0.0029 0.0028 0.0003 -0.0027 0.0029 0.0003 -0.0010 0.0020 0.0003 -0.0007 0.0024 0.0003 SD 0.0184 0.0152 0.0042 0.0194 0.0160 0.0042 0.0217 0.0176 0.0042 0.0284 0.0228 0.0042 AD 0.0174 0.0141 0.0041 0.0187 0.0151 0.0041 0.0212 0.0170 0.0041 0.0278 0.0221 0.0041 st5

500 Bias -0.0056 0.0042 0.0004 -0.0054 0.0053 0.0004 -0.0030 0.0041 0.0004 -0.0011 0.0030 0.0004 SD 0.0440 0.0300 0.0078 0.0316 0.0228 0.0078 0.0322 0.0230 0.0078 0.0425 0.0299 0.0078 AD 0.0368 0.0253 0.0073 0.0297 0.0206 0.0073 0.0310 0.0213 0.0074 0.0385 0.0263 0.0075 1000 Bias -0.0032 0.0013 -0.0001 -0.0022 0.0021 -0.0001 -0.0004 0.0013 -0.0002 -0.0004 0.0015 -0.0001 SD 0.0297 0.0207 0.0052 0.0215 0.0153 0.0052 0.0223 0.0157 0.0052 0.0275 0.0193 0.0052 AD 0.0273 0.0190 0.0053 0.0215 0.0149 0.0053 0.0223 0.0153 0.0054 0.0275 0.0189 0.0054

st3

500 Bias 0.0162 -0.0020 0.0008 -0.0045 0.0044 0.0008 -0.0041 0.0037 0.0007 -0.0002 0.0033 0.0007 SD 0.1958 0.0543 0.0097 0.0472 0.0245 0.0097 0.0408 0.0217 0.0096 0.0491 0.0253 0.0096 AD 0.0975 0.0364 0.0096 0.0431 0.0219 0.0093 0.0386 0.0199 0.0094 0.0457 0.0230 0.0095 1000 Bias 0.0176 -0.0035 0.0003 -0.0029 0.0024 0.0003 -0.0026 0.0019 0.0002 0.0003 0.0019 0.0002 SD 0.2808 0.0430 0.0070 0.0324 0.0164 0.0070 0.0282 0.0147 0.0070 0.0340 0.0174 0.0070 AD 0.0860 0.0303 0.0070 0.0311 0.0159 0.0067 0.0276 0.0143 0.0068 0.0326 0.0165 0.0068 †The smallest value of AD for θbn,r is in boldface

‡The large discrepancy between underlined ADs and SDs is due to the invalidity of θbn,2 when ηt ∼ st3

Table 2 reports the empirical bias, empirical standard deviation (SD), and the average of the asymptotic standard deviations (AD) of θbn,r and γ bn,r for the case of γ0 = 0.3 The sample size 3

We shall mention that the numerical results for θbn,r and b γn,r (as well as Qr,s (6) below) for the cases of γ0 < 0 and γ0 > 0 are similar to those for the case of γ0 = 0, and they are available from us upon request but not reported here for saving the space.

20 n is chosen to be 500 and 1000. The ADs of θbn,r and γ bn,r are evaluated from the asymptotic covariances in Theorems 3.2 and 4.1, respectively. From this table, we find that (i) except θbn,2

in the case of ηt ∼ st3 , all GQMLEs have a small bias, and their SDs and ADs are close to each other; (ii) when ηt is light-tailed (i.e., ηt ∼ N (0, 1)), θbn,2 (or θbn,0 ) is the best (or worst) estimator

in terms of the minimized value of AD; (iii) when ηt is heavy-tailed (i.e., ηt ∼ st3 or st5 ), θbn,1 or θbn,0.5 has the smallest value of AD, while θbn,2 has a larger value of AD than θbn,0 if ηt ∼ st5 , and

it is even not applicable if ηt ∼ st3 ; (iv) the performance of γ bn,r seems to be unchanged for all choices of r, even when θbn,2 is not asymptotically normal in the case of ηt ∼ st3 .

Next, we examine the finite-sample performance of Tn,r . Table 3 reports the power of Tn,1 in

terms of different values of α0 with n = 500, 1000 and 2000, where the sizes of Tn,1 correspond to the choices of α0 with respective to γ0 = 0 in Table 1. Here, we shall mention that the power of Tn,r for other choices of r is almost the same as the one of Tn,1 , and hence it is not reported here for saving the space. From Table 3, we can see that the size of Tn,1 becomes precise when the value of n is large, and the power of Tn,1 increasing with n is satisfactory for all choices of distributions of ηt . Third, we assess the finite-sample performance of Un,r . We generate 1000 replications from the following GARCH(1, 1) model: yt = ηt



ht ,

2 ht = ω0 + α0 yt−1 + 0.9ht−1 ,

(6.2)

where ηt is chosen as in model (6.1), α0 is chosen with respect to γ0 < 0, γ0 = 0 or γ0 > 0, and ω0 = 0, 10−7 , 10−5 or 10−3 . For each replication, we use Un,r to detect whether ω0 = 0 and γ0 = 0, where r = 2, 1, 0.5, and 0. Table 4 reports the power of Un,r with n = 500 or 1000, and the sizes of Un,r correspond to the case that ω0 = 0 and γ0 = 0. From this table, we can find that the performance of Un,r is robust in terms of the choice of r and its size is close to the nominal level 5% especially for large n. Furthermore, Un,r is very powerful to reject the null hypothesis

21 Table 3. Power (×100) of Tn,1 at the significance level 5% α0 ηt

n

N (0, 1)

500

0.07 0.08

0.1

100 99.9 39.5

0.102 0.104 0.106 0.1096508 0.112 0.114 0.116 0.118 0.12 0.14 27.5

18.9

13.6

7.8

9.5

11.8

18.1

24.8

31.0 95.4

1000 100

100 60.5

43.8

25.7

14.9

6.5

8.2

16.5

25.1

35.4

51.7 99.8

2000 100

100 85.6

65.5

39.9

20.6

5.0

10.2

21.5

43.1

62.3

78.0 100

α0 ηt

n

0.06 0.07 0.08

0.09

0.10

0.11

0.1201453

0.13

0.14

0.15

0.16

0.17 0.18

st5

500

100 99.7 99.0

92.7

65.7

25.7

10.2

19.9

42.0

69.6

88.0

96.5 99.0

1000 100

100

100

99.2

85.8

35.1

6.8

24.2

65.3

94.1

99.5

99.9 100

2000 100

100

100

100

97.5

56.2

5.6

43.4

93.6

99.9

100

100

100

α0 ηt

n

0.09 0.10 0.11

0.12

0.13

0.14

0.1508275

0.16

0.17

0.18

0.19

0.20 0.21

st3

500

98.6 93.0 81.2

61.1

39.1

17.8

8.5

10.4

18.9

30.5

52.0

70.2 80.7

1000 99.9 99.3 96.1

79.0

51.9

20.4

7.8

10.6

29.4

59.4

83.3

93.2 99.0

2000 100

96.6

74.0

28.3

6.3

15.5

48.4

86.2

98.4

100

100 99.9

100

†The size of Tn,1 is in boldface

when γ0 ≤ 0 and its rejection rates are more than 90% even when ω0 = 10−3 in all cases. When γ0 > 0, its rejection rates are relatively low. This is reasonable since σt → ∞ when t → ∞, and the contribution of ω0 to σt is ignorable in this case. In summary, Un,r is a good tool to test the stable model (1.1) against model (1.3) with γ0 ≤ 0. In the end, we assess the finite-sample performance of Qr,s (m). We generate 1000 replications from the following higher-order ZD-GARCH model: yt = η t



ht ,

2 2 ht = α0 yt−1 + α1 yt−2 + 0.9ht−1 ,

(6.3)

where ηt and α0 are chosen as in model (6.1) for the case of γ0 = 0, and α1 = 0, 0.1, · · · , 0.7. For each replication, we use Qr,s (6) to detect whether model (6.1) is adequate to fit the generated data, where (r, s) = (2, 2), (1, 2), (1, 1) and (0.5, 0.5). Fig. 2 plots the power of Qr,s (6) with γ0 = 0 and n = 500 or 1000, and the sizes of Qr,s (6) correspond to the case that α1 = 0. From

22

Table 4. Power (×100) of Un,r at the significance level 5%

ηt

ω0

n

r=2

r=1

r = 0.5

r=0

0 10−7 10−5 10−3

0 10−7 10−5 10−3

0 10−7 10−5 10−3

0 10−7 10−5 10−3

α0 N (0, 1) 500

0.09

87.2 0.0

4.2 100 86.0 0.2

5.7 100 84.1 0.4

0.1096508 7.7 9.8 63.9 99.9 7.0 9.3 64.1 100

1000

8.1 100 83.9 1.1 15.4 100

8.4 10.0 66.4 100 10.4 13.0 63.5 100

0.13

74.3 81.9 96.8 57.9 70.9 79.7 96.6 72.1 72.8 80.9 97.4 80.7 69.5 79.7 98.1 86.8

0.09

98.6 0.0

0.2 100 98.2 0.0

0.5 100 98.8 0.0

0.7 100 98.5 0.2

4.3 100

0.1096508 6.0 9.0 70.7 96.6 6.0 7.4 70.9 98.1 6.9 8.2 68.9 99.1 4.5 8.8 69.2 99.2

st5

500

0.13

44.4 46.9 30.3 4.9 48.7 50.6 34.2 9.9 57.0 59.0 39.0 11.0 59.6 62.8 46.5 14.8

0.10

55.9 0.5

5.8 100 55.3 0.2

4.7 100 56.2 0.1

5.9 100 56.1 0.2

8.2 100

0.1201453 10.3 9.8 47.8 96.6 7.7 6.4 51.6 98.7 7.2 6.8 49.1 99.5 8.8 6.8 47.7 99.9

1000

0.14

40.8 51.4 86.8 64.1 41.4 50.8 87.5 75.3 37.6 47.8 89.9 83.1 44.1 48.9 89.8 85.9

0.10

76.8 0.0

0.9 100 80.3 0.0

0.4 100 80.2 0.0

0.5 100 78.0 0.0

1.2 100

0.1201453 7.5 6.8 55.3 90.1 6.3 5.8 54.1 94.9 5.6 5.9 50.5 95.6 6.4 6.1 49.5 97.3

st3

500

0.14

28.6 34.0 38.8 17.8 30.0 36.8 45.8 23.1 36.1 39.4 50.9 26.4 42.3 48.1 59.1 33.9

0.13

32.9 3.2 12.8 99.4 30.5 0.3

7.9 99.9 29.7 0.2

7.1 99.7 27.0 0.4

6.7 99.7

0.1508275 10.8 8.4 40.6 92.0 8.5 6.1 34.9 96.5 8.7 6.1 31.9 97.6 8.8 3.8 35.8 98.1

1000

0.17

24.1 27.6 65.8 71.5 18.7 23.1 64.8 81.3 18.8 20.0 65.0 85.6 18.7 21.7 65.1 90.7

0.13

39.8 0.6

7.8 98.7 41.5 0.2

3.0 99.3 42.4 0.2

1.7 99.8 43.0 0.1

3.3 99.9

0.1508275 8.1 6.0 43.3 81.2 7.4 5.8 36.4 88.3 5.7 4.0 33.6 91.9 6.4 4.7 32.9 92.2 0.17

15.6 19.3 44.1 35.7 16.1 17.3 53.1 44.1 15.5 21.6 54.2 49.1 17.0 20.9 58.4 56.9

†The size of Un,r is in boldface

this figure, we can see that (i) each Qr,s (6) has a precise size, although Qr,s (6) with s = 2 is not valid in the case of ηt ∼ st3 ; (ii) when ηt is light-tailed (e.g., ηt ∼ N (0, 1)), Qr,s (6) with all choices of (r, s) except that (r, s) = (0.5, 0.5) has almost the same power performance; (iii) when ηt is heavy-tailed (e.g., ηt ∼ st5 or st3 ), Qr,s (6) with small values of s (e.g., Q1,1 (6) and Q0.5,0.5 (6)) has a much better power performance than that with large values of s (e.g., Q2,2 (6) and Q1,2 (6)). In summary, Qr,s (m) has the ability to detect the mis-specification of model (1.1) in the higher-order term.

23 ηt ∼ N(0, 1), γ0 = 0, n = 500

1

0.5

ηt ∼ N(0, 1), γ0 = 0, n = 1000

1

0.5

0

0 0

0.2

0.4

0.6

α1 ηt ∼ st5 , γ0 = 0, n = 500

1

0

0.4

0.6

α1 ηt ∼ st5 , γ0 = 0, n = 1000

1

0.5

0.2

0.5

0

0 0

0.2

0.4

0.6

α1 ηt ∼ st3 , γ0 = 0, n = 500

1

0

0.4

0.6

α1 ηt ∼ st3 , γ0 = 0, n = 1000

1

0.5

0.2

0.5

0

0 0

0.2

0.4

0.6

0

α1

0.2

0.4

0.6

α1

Fig. 2. The power of Q2,2 (6) (circle line), Q1,2 (6) (plus line), Q1,1 (6) (cross line), and Q0.5,0.5 (6) (diamond line) for ηt ∼ N (0, 1) (top panels), st5 (middle panels), and st3 (bottom panels), where the data sample is generated from

model (6.3) with γ0 = 0 and the sample size n = 500 (left panels) or 1000 (right panels). Here, the solid line stands for the significance level α = 5%.

Moreover, we are interested in seeing whether Qr,s (m) can detect the mis-specification of model (1.1) in the drift term. We generate 1000 replications from model (6.2), where ηt and α0 are chosen as in model (6.1) for the case of γ0 = 0, and ω0 = 0, 0.02, · · · , 0.1. For each replication, we use Qr,s (6) to detect whether model (6.1) is adequate to fit the generated data, where the values of (r, s) are chosen as before. Fig. 3 plots the power of Qr,s (6) with γ0 = 0 and n = 500 or 1000, and the sizes of Qr,s (6) correspond to the case that ω0 = 0. From this figure, we can see that (i) each Qr,s (6) has a precise size, although Qr,s (6) with s = 2 is not valid in

24 ηt ∼ N(0, 1), γ0 = 0, n = 500

ηt ∼ N(0, 1), γ0 = 0, n = 1000

1

0.6 0.4

0.5

0.2 0

0 0

0.05

0.1

ω0 ηt ∼ st5 , γ0 = 0, n = 500

0.6

0

0.4

0.2

0.2

0

0.1

ω0 ηt ∼ st5 , γ0 = 0, n = 1000

0.6

0.4

0.05

0 0

0.05

0.1

ω0 ηt ∼ st3 , γ0 = 0, n = 500

0.2

0

0.1

ω0 ηt ∼ st3 , γ0 = 0, n = 1000

0.2

0.1

0.05

0.1

0

0 0

0.05

0.1

0

ω0

0.05

0.1

ω0

Fig. 3. The power of Q2,2 (6) (circle line), Q1,2 (6) (plus line), Q1,1 (6) (cross line), and Q0.5,0.5 (6) (diamond line) for ηt ∼ N (0, 1) (top panels), st5 (middle panels), and st3 (bottom panels), where the data sample is generated from

model (6.2) with γ0 = 0 and the sample size n = 500 (left panels) or 1000 (right panels). Here, the solid line stands for the significance level α = 5%.

the case of ηt ∼ st3 ; (ii) when ηt is light-tailed (e.g., ηt ∼ N (0, 1)), Qr,s (6) with large values of s (e.g., s = 2) has a much better performance of that with small values of s (e.g., s = 1 or 0.5), and for a fixed choice of s, a smaller value of r leads to a better power performance of Qr,s (6); (iii) when ηt is heavy-tailed (e.g., ηt ∼ st5 or st3 ), the power performance of Qr,s (6) for s = 2 or 1 becomes better when the value of r becomes smaller; (iv) Q0.5,0.5 (6) has the worst power performance in all examined cases; (v) the performance of each portmanteau test becomes worse when ηt becomes more heavy-tailed; (vi) the power of each portmanteau test may not increase

25 with the value of ω0 , and this is probably because the positive ω0 only reflects the scale of yt in model (6.2). In summary, it is reasonable to conclude that Qr,s (m) has a desirable power to detect the mis-specification in the drift term especially for light-tailed ηt . Overall, our proposed estimators (θbn,r and γ bn,r ) and tests (Tn,r , Un,r and Qr,s (m)) have a

good finite-sample performance in most of examined cases.

7.

A PPLICATIONS

This section restudies the daily stock data of Monarch Community Bancorp (NasdaqCM: MCBF), KV Pharmaceutical (NYSE: KV-A), Community Bankers Trust (AMEX: BTC), and China MediaExpress (NasdaqGS: CCME) in Francq and Zako¨ıan (2012). The log-return (×100) of each stock data is non-stationary according to their strict stationarity test, and it was fitted by a non-stationary model (1.3) with the Gaussian QMLE. Let {e ηt,2 }nt=1 be the residuals of each fitted model (1.3) in Francq and Zako¨ıan (2012). Fig. 4 plots the left-tail Hill’s estimators 100 {H1n (k)}100 ηt,2 }nt=1 , where the Hill’s estik=10 and right-tail Hill’s estimators {H2n (k)}k=10 of {e

mators H1n (k) and H2n (k) of any sequence {zt }nt=1 are defined, respectively, by [

k z(i) 1∑ log H1n (k) = k z(k+1) i=1

]−1

[

k z(n−i+1) 1∑ log and H2n (k) = k z(n−k) i=1

]−1

with {z(t) }nt=1 being the ascending order statistics of {zt }nt=1 , and k being a given positive integer. Since the estimated parameters in Francq and Zako¨ıan (2012) are

√ n-consistent, it is

expected that the Hill’s estimator is a consistent estimator of the tail index of ηt if its fourth moment is finite. As an informal tool, the Hill’s estimators in Fig. 4 seem to indicate that ηt in each fitted model (1.3) has an infinite fourth moment. To gain more evidence, we further use the test statistic Θnw in Trapani (2016) to check the following hypotheses: H0 : E|ηt |k = ∞ v.s. H1 : E|ηt |k < ∞,

(7.1)

26 where k = 2 or 4, and Θnw =



1

−1

{

}2 w 2 ∑ √ [I (exp(e µ∗∗ du k /2)ξj ≤ u) − 0.5] w i=1

with {ξj }w j=1 being an i.i.d. N (0, 1) sample of size w, and

(N )

where µϕ

µ e∗∗ k

[ ] (N ) k µ 1 µ ek = k , µ e1 µ(N ) k

is the ϕ-th absolute moment of N (0, 1), and µ eϕ =

1 n

∑n

ηt |ϕ . t=1 |e

As Trapani (2016),

we choose w = [n4/5 ], and the corresponding results of Θnw are reported in Table 5. From this table, we find that for each return, at the significance level 5%, the null hypothesis of E|ηt |2 = ∞ is rejected, while the null hypothesis of E|ηt |4 = ∞ is not rejected. This is consistent to our findings from the plot of Hill’s estimators in Fig. 4. Hence, the Gaussian QMLE in Francq and Zako¨ıan (2012) may not be an ideal choice for modeling these four stock returns.

MCBF

4

KV-A

3

3.5

2.5

3 2

2.5 2 10

40

70

100

1.5 10

40

k BTC

3.5

70

100

70

100

k CCME

4

3

3

2.5 2

2 1.5 10

40

70

100

1 10

k

40

k

100 Fig. 4. Left-tail Hill’s estimators {H1n (k)}100 k=10 (solid line) and right-tail Hill’s estimators {H2n (k)}k=10 (dashed

line) for {e ηt,2 }n t=1 .

27 Table 5. Testing results of Θnw based on {ηet,2 }nt=1 . MCBF

Θnw

KV-A

BTC

CCME

k=2

k=4

k=2

k=4

k=2

k=4

k=2

k=4

16.1022

0.2242

14.0442

0.1088

12.4332

0.0799

8.9521

0.5240

[0.0001]

[0.6358]

[0.0001]

[0.7415]

[0.0004]

[0.7801]

[0.0028]

[0.4692]

†The p-value of Θnw in brackets is calculated by P (χ21 > Θnw )

In this paper, we are interested in seeing whether model (1.1) is able to fit these stock returns adequately. Fig. 5 plots the p-values of Q1,1 (m) and Q0.1,1 (m) for m = 1, 2, · · · , 20. From this figure, we can see that model (1.1) is adequate to fit the KV-A return as indicated by both portmanteau tests, and so we can fit this stock return by

yt = η t



2 ht and ht = 0.0588yt−1 + 0.9081ht−1 ,

(0.0135)

(7.2)

(0.0167)

where model (7.2) is estimated by the GQMLE method with r = 1, and the standard deviations of this estimator θbn,1 are in open parentheses. Based on the residuals {b ηt,1 }nt=1 , we find that ηt

in model (7.2) has a finite second moment but an infinite fourth moment by looking at the Hill’s 100 estimators {H1n (k)}100 k=10 and {H2n (k)}k=10 , and calculating the values of Θnw (the details are

omitted for saving the space). Moreover, the value of the stable test statistic Tn,1 is 0.934 (with p-value=0.35), and so there is no statistical evidence against the hypothesis that model (7.2) is stable. Also, the value of the unit root test statistic Un,1 is 1.015 (with p-value=0.36), and hence a stable model (1.1) is more appropriate than model (1.3) to fit the KV-A return. Note that the estimate of α0 var(ηt ) + β0 is 1.096, and this implies that the KV-A return is heteroscedastic with an slightly exponentially explosive variance. For the remaining three stock returns, Fig. 5 shows that model (1.1) can not fit them adequately as indicated by Q0.1,1 (m), although each fitted model (1.1) is stable according to the stable test

28 MCBF

KV−A

1

1

0.8

0.8

0.6

0.6

0.4

0.4

0.2

0.2

0

1

3

5

7

9 11 13 15 17 19 21 m

0

1

3

5

7

BTC 0.8

9 11 13 15 17 19 21 m CCME

1 0.8

0.6

0.6 0.4 0.4 0.2 0

0.2 1

3

5

7

9 11 13 15 17 19 21 m

0

1

3

5

7

9 11 13 15 17 19 21 m

Fig. 5. The plot of p-values of Q1,1 (m) (circle line) and Q0.1,1 (m) (plus line) for each stock return, where the solid line stands for the significance level α = 5%.

statistic Tn,1 , and except the BTC return, a stable model (1.1) is more appropriate than model (1.3) to fit the MCBF and CCME returns in light of the unit root test statistic Un,1 .4

8.

C ONCLUDING REMARKS AND DISCUSSIONS

This paper introduced a ZD-GARCH(1, 1) model to study the conditional heteroscedasticity and heteroscedasticity together. Unlike the classical GARCH(1, 1) model, the ZD-GARCH(1, 1) model is always nonstationary. It is stable with its sample path oscillating randomly between zero and infinity over time when γ0 = 0. Moreover, this paper studied the GQMLE of the ZDGARCH(1, 1) model, and established its strong consistency and asymptotic normality, regardless of the sign of γ0 . Based on the GQMLE, an estimator for γ0 , a t-test for stability, a unit root test for the absence of the drift term, and a portmanteau test for model checking are all constructed. 4

For the MCBF, BTC and CCME stock returns, the p-values of the stable test statistic Tn,1 are 0.25, 0.34 and 0.39, respectively, and the p-values of the unit root test statistic Un,1 are 0.74, 0.01 and 0.37, respectively.

29 Simulation studies revealed that all proposed estimators and tests have a satisfactory finite sample performance. Applications demonstrate that a stable ZD-GARCH(1, 1) model is more appropriate than a non-stationary GARCH(1, 1) model in fitting the KV-A stock returns in Francq and Zako¨ıan (2012). The ZD-GARCH(1, 1) model is most likely stable in applications. This is not unexpected, since only the stable ZD-GARCH(1, 1) model has a desirable sample path which is close to that often observed in the real world. In theory, the idea of setting the drift term to zero can be easily applied to many other conditionally heteroscedastic models, and their probabilistic structures and statistical inference deserve exploration in the future. In practice, the statistical inference methods for the ZD-GARCH(1, 1) model could be useful for empirical analysis.

ACKNOWLEDGEMENTS The authors greatly appreciate the very constructive suggestions and comments of three anonymous referees, the Associate Editor, and the Co-Editor Y. Ait-Sahalia that greatly improve this article. The authors also thank Dr. Alice Cheung for her professional proof-editing help. Li’s research is supported by the Start-up Fund of Tsinghua University (No. 53330230117) and NSFC (No. 11401337 and 11571348). Zhang’s research is supported by NSFC (No. 11401123). Zhu’s research is supported by NSFC (No. 11571348, 11371354, 11690014, 11731015 and 71532013), Seed Fund for Basic Research (No. 201611159233), and Key Laboratory of RCSDS, Chinese Academy of Sciences. Ling’s research is supported by Hong Kong Research Grants Commission (Grants GRF 16500915, 16307516 and 16500117).

30 A PPENDIX : P ROOFS OF T HEOREMS Define five [0, ∞]-valued processes

vt (θ) =

∞ ∑ i=1

dα t (θ) =

∞ ∑ i=1

dβt (θ) = νtαβ (θ) =

i−1 2 ∏ ηt−i β 2 2 , β0 + α0 ηt−i j=1 β0 + α0 ηt−j

i−1 ∞ 2 ∏ ∑ (i − 1)αηt−i β 2 2 , β(β0 + α0 ηt−i ) j=1 β0 + α0 ηt−j i=2 ∞ ∑ i=2

νtββ (θ) =

i−1 2 ∏ αηt−i β 2 2 , β0 + α0 ηt−i j=1 β0 + α0 ηt−j

i−1 2 ∏ (i − 1)ηt−i β 2 2 , β(β0 + α0 ηt−i ) j=1 β0 + α0 ηt−j

∞ i−1 2 ∑ ∏ (i − 1)(i − 2)αηt−i i=3

with the convention

β 2 (β

∏j−1

k=1

0

+

2 ) α0 ηt−i

β 2 β + α0 ηt−j j=1 0

= 1 when j ≤ 1. Let Θ0 = {θ ∈ Θ : β < eγ0 } and Θ∗0 be any compact subset

¯ = sup{α|θ ∈ Θ∗0 }, and β¯ = sup{β|θ ∈ Θ∗0 }. of Θ0 . Denote α = inf{α|θ ∈ Θ∗0 }, β = inf{β|θ ∈ Θ∗0 }, α To facilitate the proofs, the following four lemmas are needed. Lemmas A.1 and A.2 show that normalized by ht , the nonstationary process σt2 (θ) and its first and second derivatives can be well approximated by some stationary processes. Both of them are critical to derive the asymptotics of θbn,r . Lemmas A.3 and A.4 are used to prove the asymptotic normality of θbn,r .

L EMMA A.1. Suppose that Assumptions 3.1(i) and 3.2 hold. Then, for any θ ∈ Θ0 , vt (θ) is stationary

and ergodic. Moreover, there exists a constant c0 > 0 such that, as t → ∞, 2 σt (θ) (i) e sup − vt (θ) → 0 a.s.; ht θ∈Θ∗ 0 ht 1 c0 t (ii) e sup 2 − → 0 a.s. vt (θ) θ∈Θ∗ σ (θ) c0 t

0

t

Finally, for any θ ∈ / Θ0 , it holds that σt2 (θ)/ht → ∞ a.s. as t → ∞. P ROOF. For any θ ∈ Θ0 , vt (θ) is finite (a.s.) by Assumption 3.1(i), Assumption 3.2 and the Cauchy root test (see Francq and Zako¨ıan (2010), page 25). Since it is a measurable function of {ηj : j < t}, vt (θ) is thus stationary and ergodic. Note that our used technique is close to the one for Lemma 1.1 in Brandt (1986).

31 2 For (i), from (1.1), it follows that ht−1 /ht = 1/(β0 + α0 ηt−1 ). Note that 2 2 (θ) σt2 (θ) ht−1 yt−1 ht−1 σt−1 =α +β ht ht ht−1 ht ht−1 2 2 αηt−1 σt−1 (θ) β = + 2 2 β0 + α0 ηt−1 β0 + α0 ηt−1 ht−1

=

t ∑ i=1

(A1)

i−1 2 ∏ αηt−i β 2 2 . β0 + α0 ηt−i j=1 β0 + α0 ηt−j

¯ Choose c0 = (γ0 − log β)/2. Then, c0 > 0 by Assumption 3.2, and e

c0 t

2 ∞ i−1 2 ∑ ∏ σt (θ) αηt−i β = ec0 t − v (θ) t 2 2 ht β + α0 ηt−i j=1 β0 + α0 ηt−j i=t+1 0 ≤ ≤

∞ ∑

i−1 2 ∏ αηt−i βec0 2 2 β + α0 ηt−i β + α0 ηt−j i=t+1 0 j=1 0 ∞ ∑

i−1 2 ∏ ¯ c0 α ¯ ηt−i βe 2 2 . β + α0 ηt−i β + α0 ηt−j j=1 0 i=t+1 0

Note that by the strong law of large numbers for stationary and ergodic sequences, i−1

1∑ log i j=1

(

¯ c0 βe 2 β0 + α0 ηt−j

)

→ log β¯ + c0 − γ0 = −c0

as i → ∞. By the Cauchy root test again, it follows that 2 σ (θ) ec0 t sup t − vt (θ) → 0 a.s. ht θ∈Θ∗ 0

as t → ∞. Thus, it entails that (i) holds.

For (ii), a simple calculation yields that 2 ht σt (θ) 1 1 sup 2 − ≤ sup − vt (θ) , 2 ∗ ∗ vt (θ) vt (θ)σt (θ)/ht θ∈Θ0 ht θ∈Θ0 σt (θ)

¯ Note that σ 2 (θ)/ht → vt (θ) a.s. as t → ∞ by (i) and where θ = (α, β) and θ¯ = (¯ α, β). t vt (θ) ≥

2 α ηt−1 α > 0 a.s. by Assumption 3.1(i). ≥ 2 β0 + α0 ηt−1 α0

By (i), it follows that (ii) holds. Finally, for any θ ∈ / Θ0 , by (A1), it follows that σt2 (θ)/ht → ∞ a.s. as t → ∞ by the Cauchy root test when β > eγ0 and by the Chung-Fuchs theorem when β = eγ0 . The proof is completed.



32 β L EMMA A.2. Suppose that Assumptions 3.1(i) and 3.2 hold. Then, for any θ ∈ Θ0 , dα t (θ), dt (θ),

νtαβ (θ) and νtββ (θ) are stationary and ergodic. Moreover, as t → ∞,

1 ∂σt2 (θ)

β α ′

(i) sup − (dt (θ), dt (θ)) → 0 a.s.; ∗ h ∂θ t θ∈Θ0

 

0 νtαβ (θ)

1 ∂ 2 σ 2 (θ)  

  t (ii) sup −  → 0 a.s. ′

h ∂θ∂θ   t θ∈Θ∗ 0

νtαβ (θ) νtββ (θ) P ROOF. Since σt2 (θ) = α

∑t

i=1

2 β i−1 yt−i , we have

t

∂σt2 (θ) ∑ i−1 2 = β yt−i , ∂α i=1 ∂ 2 σt2 (θ) = 0, ∂α2 and

t ∑ ∂σt2 (θ) 2 =α (i − 1)β i−2 yt−i , ∂β i=2 t

∂ 2 σt2 (θ) ∑ 2 = (i − 1)β i−2 yt−i , ∂α∂β i=2

t ∑ ∂σt2 (θ) 2 (i − 1)(i − 2)β i−3 yt−i . = α ∂β 2 i=3

Then, the conclusion follows from the similar argument as for Lemma A.1.



L EMMA A.3. Suppose that the conditions in Theorem 3.2 hold. Then, as n → ∞, n

1 ∑ ∂ℓt,r (θ0 ) d √ −→ N (0, dr I) , ∂θ n t=1

where dr = r4 κr /16 when r > 0, and dr = κr /4 when r = 0. P ROOF. When r > 0, by a direct calculation, we have

[ ( )r/2 ] n n h 1 ∑ ∂ℓt,r (θ0 ) r ∑ 1 ∂σt2 (θ0 ) t √ 1 − |ηt |r = √ ∂θ σt2 (θ0 ) n t=1 2 n t=1 σt2 (θ0 ) ∂θ n

r ∑ 1 ∂σt2 (θ0 ) = √ (1 − |ηt |r ) + op (1), 2 n t=1 ht ∂θ

where the last equality holds by Lemma A.1(ii), Cauchy root test, and the fact that vt (θ0 ) = 1 a.s. Similarly, when r = 0, we have √ ] [ n n 1 ∑ ∂ℓt,r (θ0 ) 1 ∑ 1 ∂σt2 (θ0 ) ht √ = −√ log |ηt | + log ∂θ σt2 (θ0 ) n t=1 n t=1 σt2 (θ0 ) ∂θ n

1 ∑ 1 ∂σt2 (θ0 ) = −√ log |ηt | + op (1). n t=1 ht ∂θ

β Note that when θ0 is an interior point of Θ, dα t (θ0 ) and dt (θ0 ) have moments of any order. Thus, by

Lemma A.2(i), the conclusion follows from the same argument as Lemma A.4 in Francq and Zako¨ıan (2012).



33 β ′ L EMMA A.4. Let dt (θ) = (dα t (θ), dt (θ)) . Suppose that Assumptions 3.1-3.2 hold. Then, as n → ∞, 2 n ∂ ℓt,r (θ) 1∑ sup − Σt,r (θ) → 0 a.s., n t=1 θ∈Θ∗0 ∂θ∂θ′

where

[

r |ηt |r Σt,r (θ) = 1 − r/2 2vt (θ) vt (θ) when r > 0, and

]

    

νtαβ (θ) νtββ (θ)



] √  Σt,r (θ) = log vt (θ) − log |ηt |   [

when r = 0.

νtαβ (θ)

0

0



[ ]  ( r r ) |ηt |r  1+ − 1 dt (θ)d′t (θ) + 2 2 vtr/2 (θ)  2vt (θ)

νtαβ (θ)

νtαβ (θ) νtββ (θ)



√ ] [  1 1  + dt (θ)d′t (θ)  + log |ηt | + log vt (θ) 2 

P ROOF. Note that when r > 0, [ ( )r/2 ] r h 1 ∂ 2 σt2 (θ) ∂ 2 ℓt,r (θ) t r = 1 − |η | t ∂θ∂θ′ 2 σt2 (θ) σt2 (θ) ∂θ∂θ′ [ ] ( )r/2 r ( r) ht 1 ∂σt2 (θ) ∂σt2 (θ) r + 1+ |ηt | −1 4 , 2 2 2 σt (θ) σt (θ) ∂θ ∂θ′ and when r = 0,  √  2 2 ∂ 2 ℓt,r (θ)  σt2 (θ)  1 ∂ σt (θ) = log − log |η | t ∂θ∂θ′ ht σt2 (θ) ∂θ∂θ′ √ [ ] ht 1 1 ∂σt2 (θ) ∂σt2 (θ) + log |ηt | + log + . 2 4 σt (θ) 2 σt (θ) ∂θ ∂θ′ Since the conclusion holds for the element-wise a.s. convergence, we only consider the convergence of ∂ 2 ℓt,2 (θ)/∂β 2 for simplicity. By a direct calculation, we have 2 n ∂ ℓt,2 (θ) 1∑ ββ − Σ sup (θ) t,2 2 n t=1 θ∈Θ∗0 ∂β n { } 1 ∂ 2 σ 2 (θ) { 2 } ββ 1∑ η ν (θ) h t t t t ≤ − 1 − sup 1 − ηt2 2 n t=1 θ∈Θ∗0 σt (θ) σt2 (θ) ∂β 2 vt (θ) vt (θ) n { } 1 [ ∂σ 2 (θ) ]2 { 2η 2 } [dβ (θ)]2 1∑ t t t 2 ht + −1 4 − −1 sup 2ηt 2 n t=1 θ∈Θ∗0 σt (θ) σt (θ) ∂β vt (θ) vt2 (θ) := I1,n + I2,n ,

34 where Σββ t,2 (θ) is the last entry of Σt,2 (θ). For I1,n , we have n h 1 ht 1 ∂ 2 σt2 (θ) 1∑ 2 t ηt sup 2 − n t=1 θ∈Θ∗0 σt (θ) vt (θ) σt2 (θ) ht ∂β 2 n η 2 ht 1 1 ∂ 2 σt2 (θ) 1∑ + sup 1 − t 2 − n t=1 θ∈Θ∗0 vt (θ) σt (θ) vt (θ) ht ∂β 2 n η 2 1 1 ∂ 2 σt2 (θ) 1∑ ββ sup 1 − t − ν (θ) + t 2 ∗ n t=1 θ∈Θ0 vt (θ) vt (θ) ht ∂β

I1,n ≤

:= I11,n + I12,n + I13,n .

Now, we deal with I13,n . By Lemma A.1(i) and Lemma A.2(ii), it follows that I13,n ≤

n 1 ∂ 2 σ 2 (θ) 1∑ 1 [ η2 ] ββ t 1+ t sup − ν (θ) → 0 a.s. t n t=1 vt (θ) vt (θ) θ∈Θ∗0 ht ∂β 2

as n → ∞. Similarly, by Lemma A.1(ii) and Lemma A.2, we can prove that I11,n → 0 and I12,n → 0 a.s. as n → ∞. Thus, it follows that I1,n → 0 a.s. as n → ∞. Using the same procedure, we can show that I2,n → 0 a.s. as n → ∞. Therefore, as n → ∞, 2 n ∂ ℓt,2 (θ) 1∑ ββ sup − Σt,2 (θ) → 0 a.s. 2 n t=1 θ∈Θ∗0 ∂β



and in turn the conclusion holds.

P ROOF OF T HEOREM 3.1. We first consider the case that r > 0. Note that θbn,r = arg minθ∈Θ Qn,r (θ),

where

with

and

{( [ } ] )r/2 n 2 1∑ h r (θ) σ t Qn,r (θ) = |ηt |r := On,r (θ) + Rn,r (θ) − 1 + log t n t=1 σt2 (θ) 2 ht [ {( } ] )r/2 n 1∑ 1 r On,r (θ) = |ηt |r − 1 + log vt (θ) n t=1 vt (θ) 2 [ {( ] )r/2 ( )r/2 } n 2 1∑ h 1 r σ (θ) t Rn,r (θ) = |ηt |r − + log t . n t=1 σt2 (θ) vt (θ) 2 ht vt (θ)

Lemma A.1 implies that if θ ∈ / Θ0 , Qn,r (θ) → ∞ a.s. as n → ∞. Thus, it is sufficient to consider the case θ ∈ Θ∗0 , where Θ∗0 is an arbitrary compact subset of Θ0 . By the strong law of large numbers for stationary and ergodic sequences, we have lim On,r (θ) = E

n→∞

{(

1 v1 (θ)

)r/2

r − 1 + log v1 (θ) 2

}

≥0

a.s.

with the equality holding if and only if v1 (θ) = 1 a.s. or equivalently, θ = θ0 by Lemma A.2 in Francq and Zako¨ıan (2012).

35 Furthermore, since vt (θ) > α/α0 > 0, Lemma A.1(i) and the mean value theorem entail that n 2 [ ] n 1 ∑ σt (θ) ht σt2 (θ) 1 1∑ lim sup log sup + sup − vt (θ) ≤ lim 2 n→∞ θ∈Θ∗ n n→∞ ∗ ∗ ht vt (θ) n t=1 θ∈Θ0 σt (θ) vt (θ) θ∈Θ0 ht 0 t=1 2 ] n [ σt (θ) 1∑ ht 1 = lim − sup − vt (θ) 2 n→∞ n σt (θ) vt (θ) θ∈Θ∗0 ht t=1 2 n σ (θ) 2∑ 1 + lim sup t − vt (θ) n→∞ n v (θ) θ∈Θ∗0 ht t=1 t = 0 a.s.

(A2)

Meanwhile, by Lemma A.1(i), the mean value theorem, and the fact that E|ηt |r < ∞, we can show that n {( )r/2 ( )r/2 } 1 ∑ ht 1 r − lim sup |ηt | 2 (θ) n→∞ θ∈Θ∗ n σ v (θ) t t 0 t=1 [ ( ) ( )r/2−1 ] n r/2−1 ht r ∑ ht 1 1 r + sup 2 − ≤ lim |ηt | sup n→∞ 2n σt2 (θ) vt (θ) σt (θ) vt (θ) θ∈Θ∗ θ∈Θ∗ 0 0 t=1 = 0 a.s.,

where the last equality holds as for (A2). Thus, it follows that lim sup |Rn,r (θ)| = 0

n→∞ θ∈Θ∗

a.s.

(A3)

0

Then, since Θ is compact, the proof in the case of r > 0 is completed by standard arguments. Next, we consider the case that r = 0. Note that θbn,0 = arg minθ∈Θ Qn,0 (θ), where ( ) ] n [ ( )2 1∑ ht 2 Qn,0 (θ) = log |ηt | log + 0.5 log(h /σ (θ)) := On,0 (θ) + Rn,0 (θ) t t n t=1 σt2 (θ)

with

] 1 ∑[ 2 − log |ηt | log vt (θ) + (0.5 log vt (θ)) n t=1 n

On,0 (θ) = and Rn,0 (θ) =

( ] ) n [ )2 ( 1∑ ht vt (θ) 2 2 log |ηt | log + 0.5 log(h /σ (θ)) − (0.5 log v (θ)) . t t t n t=1 σt2 (θ)

By the strong law of large numbers for stationary and ergodic sequences, we have [ ] 2 lim On,0 (θ) = E (0.5 log vt (θ)) ≥ 0 a.s.

n→∞

with the equality holding if and only if v1 (θ) = 1 a.s. or equivalently, θ = θ0 by Lemma A.2 in Francq and Zako¨ıan (2012). Meanwhile, as for (A3), it is not hard to show that lim sup |Rn,0 (θ)| = 0

n→∞ θ∈Θ∗ 0

a.s.

36 Then, since Θ is compact, the proof in the case of r = 0 is completed by standard arguments.



P ROOF OF T HEOREM 3.2. Define n

In,r =

1 ∑ ∂ℓt,r (θ0 ) n t=1 ∂θ∂θ′

n

and

∑ ∂ℓt,r (θ0 ) −1 1 √ Sn,r = −In,r . ∂θ n t=1

Since vt (θ0 ) = 1, EΣt (θ0 ) = (r2 /4)I when r > 0, and EΣt (θ0 ) = I/2 when r = 0. Then, it is not hard to see that by Lemmas A.3 and A.4, d

Sn,r −→ N (0, κr I −1 ) as n → ∞. By Taylor’s expansion, Theorem 3.1, and Lemma A.4, standard arguments entail that √

n(θbn,r − θ0 ) = Sn,r + op (1),

and hence the conclusion holds. This completes the proof.



P ROOF OF T HEOREM 3.3. A direct calculation shows that α en,r has the following explicit expression: n n yt 2 ∑ 1 ∑ yt2 r/2 for r = 0. for r > 0 and log α en,r = α en,r = log 2 n − 1 t=2 yt−1 n − 1 t=2 yt−1

From this, by Assumption 3.1, it is straightforward to see that without the compactness of Θα , as n → ∞, n ( ) √ 1 ∑ d r/2 r/2 r/2 n(e αn,r − α0 ) = √ (|ηt |r − 1)α0 + op (1) −→ N 0, (r2 /4)κr α0r for r > 0, n t=2 and

n √ 2 ∑ d n(log α en,r − log α0 ) = √ log |ηt | + op (1) −→ N (0, κr ) for r = 0. n t=2

By the delta method, it follows that as n → ∞,

( ) √ d n(e αn,r − α0 ) −→ N 0, κr α02 for r ≥ 0.

This completes the proof. P ROOF

OF



T HEOREM 4.1. For (i), the conclusion follows directly from Theorems 3.1-3.2 and the

Taylor expansion of log(x). For (ii), let γn (θ) = n−1 have

∑n

t=1

log(β + αηt2 (θ)) with ηt (θ) = yt /σt (θ). By Taylor’s expansion, we

γ bn,r := γn (θbn,r ) = γn (θ0 ) +

∗ ∂γn (θn,r ) (θbn,r − θ0 ), ′ ∂θ

∗ ∗ where θn,r satisfies ∥θn,r − θ0 ∥ ≤ ∥θbn,r − θ0 ∥. Using the expression

∂ηt2 (θ) h2 1 ∂σt2 (θ) = −ηt2 4 t ∂θ σt (θ) ht ∂θ

37 and the same argument as for Lemmas A.1-A.2, we can show that n ∂γ (θ) 1∑ n sup − Γt (θ) → 0 a.s. as n → ∞, ∗ ∂θ n θ∈Θ0 t=1

where

( Γt (θ) =

)′ )′ ηt2 vt (θ) αηt2 1 ( α β , − (θ) . d (θ), d t βvt (θ) + αηt2 βvt (θ) + αηt2 βvt (θ) + αηt2 vt (θ) t

Note that E∥Γt (θ0 )∥ ≤ 2(1 − ν1 )/α0 + 2ν1 /β0 < ∞, and EΓt (θ0 ) = 0 by the facts that Edα t (θ0 ) =

1/α0 and Edβt (θ0 ) = ν1 /{β0 (1 − ν1 )}, where ν1 ∈ (0, 1) is defined in Theorem 3.2. By Theorem 3.1 and the strong law of large numbers for stationary and ergodic sequences, it follows that ∗ ∂γn (θn,r ) = EΓt (θ0 ) + o(1) = o(1) a.s. ∂θ

Thus, since

√ b n(θn,r − θ0 ) = Op (1) by Theorem 3.2, we have √

n(b γn,r − γ0 ) =



n(γn (θ0 ) − γ0 ) + n

∗ ∂γn (θn,r )√ n(θbn,r − θ0 ) ′ ∂θ

1 ∑ =√ {log(β0 + α0 ηt2 ) − γ0 } + op (1) n t=1 d

−→ N (0, σγ20 ) as n → ∞, by the central limit theorem. This completes the proof of (ii). Moreover, for model (3.1), it is straightforward to see that √

n ] 1 ∑[ n(b γn,r − γ0 ) = √ log(α0 ηt2 ) − E log(α0 ηt2 ) . n t=1

Hence, parts (i) and (ii) hold by the central limit theorem. This completes all of the proofs. P ROOF

OF



T HEOREM 4.2. By Lemmas A.1-A.2, the mean value theorem, and a similar argument as

for Lemma A.4, it is straightforward to show that 2 log ηbt,r = log ηt2 + log(ht /σt2 (θbn,r ))

√ √ β ′ b −c0 t = log ηt2 − (dα / n) + op (1/ n), t (θ0 ), dt (θ0 )) (θn,r − θ0 ) + Op (e

where op (1) holds uniformly in t. Using the previous equation, we can show that under H0 in (4.2), ∑n ∑n n−1 t=2 zbt,r (b zt,r − zbt−1,r ) n−1 t=2 zt et b ∑n n(ϕn,r − 1) = = −2 ∑n + op (1) 2 2 n−2 t=2 zbt−1,r n t=2 zt−1 ∫1 B(s)dB(s) d −→ 0∫ 1 B2 (s)ds 0

38 and n[se(ϕbn,r )] =

d



n

σ2 ∑e,r n −2

−→ ( ∫1 0

2 bt,r t=2 z

1

B2 (s)ds

jointly. Hence, it follows that the conclusion holds.

=



)1/2 ,

n−2

σ2 ∑γn0

2 t=2 zt

+ op (1)



β ′ P ROOF OF T HEOREM 5.1. Recall that dt (θ0 ) = (dα t (θ0 ), dt (θ0 )) . By Taylor’s expansion and the sim-

ilar technique as for Lemma A.4, some calculations give us that } √ √ 1 { nb ρr,s (k) = Js (k) − E[d′t (θ0 )πs,t (k)][ n(θbn,r − θ0 )] + op (1) bs } √ 1 { = Js (k) − ps (k)[ n(θbn,r − θ0 )] + op (1), bs

(A4)

where πs,t (k) = |ηt−k |s − as and

n 1 ∑ Js (k) = √ (|ηt |s − as )(|ηt−k |s − as ). n t=k+1

Note that √

n(θbn,r − θ0 ) =

By (A4), it follows that

where qr,s

 

2 √ I −1 r n

∑n

t=1

dt (θ0 )(|ηt |r − 1) + op (1), if r > 0,

 √2 I −1 ∑n d (θ ) log |η | + o (1), t p t=1 t 0 n

√ n(b ρr,s (1), · · · , ρbr,s (m)) =

if r = 0.

n ∑ 1 √ (Im , −Vr,s )qr,s + op (1), bs n t=m+1

(  (|ηt |s − as )(|ηt−1 |s − as ), · · · , (|ηt |s − as )(|ηt−m |s − as ), d′ (θ0 )(|ηt |r − 1))′ , if r > 0, t = ( )′  (|η |s − a )(|η |s − a ), · · · , (|η |s − a )(|η s ′ if r = 0. t s t−1 s t s t−m | − as ), dt (θ0 ) log |ηt | ,

Particularly, for model (3.1), it is straightforward to see that √ n(b ρr,s (1), · · · , ρbr,s (m)) =

n ∑ 1 √ (Im , 0)qr,s + op (1). bs n t=m+1

Thus, the conclusion holds by the central limit theorem for martingale difference sequence.



R EFERENCES Berkes, I., Horv´ath, L., 2004. The efficiency of the estimators of the parameters in GARCH processes. Annals of Statistics 32, 633–655. Billingsley, P., 1999. Convergence of Probability Measures (2nd). Wiley. Bollerslev, T., 1986. Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics 31, 307–327.

39 Bougerol, P., Picard, N., 1992a. Stationarity of GARCH processes and of some nonnegative time series. Journal of Econometrics 52, 115–127. Bougerol, P., Picard, N., 1992b. Strict stationarity of generalized autoregressive processes. Annals of Probability 20, 1714–1730. Brandt, A., 1986. The stochastic equation Yn+1 = An Yn + Bn with stationary coefficients. Advances in Applied Probability 18, 211–220. Breusch, T. S., Pagan, A. R., 1979. A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287–1294. Chen, M., Zhu, K., 2015. Sign-based portmanteau test for ARCH-type models with heavy-tailed innovations. Journal of Econometrics 189, 313–320. Dahlhaus, R., 1997. Fitting time series models to nonstationary processes. Annals of Statistics 25, 1–37. Dahlhaus, R., Subba Rao, S., 2006. Statistical inference for time-varing ARCH processes. Annals of Statistics 34, 1075–1114. Dickey, D.A., Fuller, W.A., 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–431. Engle, R. F., 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1007. Engle, R. F., 2004. Risk and volatility: Econometric models and financial practice. American Economic Review 94, 405–420. Engle, R. F., Rangel, J. G., 2008. The spline GARCH model for unconditional volatility and its global macroeconomic causes. The Review of Financial Studies 21, 1187–1222. Escanciano, J. C., 2007. Model checks using residual marked emprirical processes. Statistica Sinica 17, 115–138. Escanciano, J. C., 2008. Joint and marginal specification tests for conditional mean and varnance models. Journal of Econometrics 143, 74–87. Fan, J., Qi, L., Xiu, D., 2014. Quasi-maximum likelihood estimation of GARCH models with heavy-tailed likelihoods (with discussion). Journal of Business & Economic Statistics 32, 178–205. Francq, C., Zako¨ıan, J.-M., 2007. Quasi-maximum likelihood estimation in GARCH processes when some coefficients are equal to zero. Stochastic Processes and their Applications 117, 1265–1284. Francq, C., Zako¨ıan, J.-M., 2010. GARCH Models: Structure, Statistical Inference and Financial Applications. Wiley. Francq, C., Zako¨ıan, J.-M., 2012. Strict stationarity testing and estimation of explosive and stationary generalized autoregressive conditional heteroscedasticity models. Econometrica 80, 821–861. Francq, C., Zako¨ıan, J.-M., 2013a. Optimal predictions of powers of conditionally heteroscedastic processes. Journal of the Royal Statistical Society: Series B 75, 345–367. Francq, C., Zako¨ıan, J.-M., 2013b. Inference in nonstationary asymmetric GARCH models. Annals of Statistics 41, 1970–1998. Giraitis, L., Kapetanios, G., Yates, Y., 2014. Inference on stochastic time-varying coefficient models. Journal of Econometrics 179, 46-65. Giraitis, L., Kapetanios, G., Yates, Y., 2017. Inference on multivariate heteroscedastic time-varying random coefficient models. Working paper.

40 Hafner, C. M., Preminger, A., 2015. An ARCH model without intercept. Economics Letters 129, 13–17. Hall, P., Yao, Q., 2003. Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica 71, 285–317. J.P. Morgan. 1997. RiskMetrics technical documents (4th ed.) New York. Jensen, S. T., Rahbek, A., 2004a. Asymptotic normality of the QMLE estimator of ARCH in the nonstationary case. Econometrica 72, 641–646. Jensen, S. T., Rahbek, A., 2004b. Asymptotic inference for nonstationary GARCH. Econometric Theory 20, 1203– 1226. Li, W. K., 2004. Diagnostic checks in time series. Chapman & Hall/CRC. Li, D., Li, M., Wu, W., 2014. On dynamics of volatilities in nonstationary GARCH models. Statistics & Probability Letters 94, 86–90. Li, D., Wu, W., 2015. Renorming volatilities in a family of GARCH models. Working paper. Li, G., Li, W. K., 2005. Diagnostic checking for time series models with conditional heteroscedasticity estimated by the least absolute deviation approach. Biometrika 92, 691–701. Li, G., Li, W. K., 2008. Least absolute deviation estimation for fractionally integrated autoregressive moving average time series models with conditional heteroscedasticity. Biometrika 95, 399–414. Li, W. K., Mak, T. K., 1994. On the squared residual autocorrelations in non-linear time series with conditional heteroscedasticity, Journal of Time Series Analysis 15, 627–636. Ling, S., Tong, H., 2011. Score based goodness-of-fit tests for time series. Statistica Sinica 21, 1807–1829. Nelson, D. B., 1990. Stationarity and persistence in the GARCH(1,1) model. Econometric Theory 6, 318–334. Peng, L., Yao, Q., 2003. Least absolute deviations estimation for ARCH and GARCH models. Biometrika 90, 967– 975. Taylor, S., 1986. Modelling Financial Time Series. Wiley. Trapani, L., 2016. Testing for (in)finite moments. Journal of Econometrics 191, 57–68. Tsay, R. S., 2008. Lecture notes for “Time Series Analysis for Forecasting and Model Building”. The University of Chicago. White, H., 1980. A heteroscedasticity-consistent covariance matrix estimator and a direct test for heteroscedasticity. Econometrica 48, 817–838. Zhu, K., 2016. Bootstrapping the portmanteau tests in weak auto-regressive moving average models. Journal of the Royal Statistical Society: Series B 78, 463-485. Zhu, K., Li, W. K., 2015a. A new Pearson-type QMLE for conditionally heteroscedastic models. Journal of Business & Economic Statistics 33, 552–565. Zhu, K., Li, W. K., 2015b. A bootstrapped spectral test for adequacy in weak ARMA models. Journal of Econometrics 187, 113–130.