Accepted Manuscript
Binary Time Series Models Driven by a Latent Process
Konstantinos Fokianos, Theodoros Moysiadis
PII: S2452-3062(17)30009-6
DOI: 10.1016/j.ecosta.2017.02.001
Reference: ECOSTA 44
To appear in: Econometrics and Statistics
Received date: 20 February 2016
Revised date: 19 October 2016
Accepted date: 7 February 2017
Please cite this article as: Konstantinos Fokianos, Theodoros Moysiadis, Binary Time Series Models Driven by a Latent Process, Econometrics and Statistics (2017), doi: 10.1016/j.ecosta.2017.02.001
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Binary Time Series Models Driven by a Latent Process
Konstantinos Fokianos, Theodoros Moysiadis
University of Cyprus, Department of Mathematics & Statistics, PO BOX 20537, Nicosia 1678, Cyprus
Abstract
The problem of ergodicity, stationarity and maximum likelihood estimation is studied for binary time series models that include a latent process. General models are considered, covered by different specifications of a link function. Maximum likelihood estimation is discussed and it is shown that the MLE satisfies standard asymptotic theory. The logistic and probit models, routinely employed for the analysis of binary time series data, are of special importance in this study. The results are applied to simulated and real data.
Keywords: autocorrelation, generalized linear models, logistic model, probit model, regression, weak dependence.
2010 MSC: 62M10, 62J12, 62F12, 62M20, 62M09
Preprint submitted to Journal of Econometrics and Statistics, February 20, 2017
1. Introduction
Figure 1 displays the trading activity of six thinly traded shares at the Johannesburg Stock Exchange from 5 October 1987 to 3 June 1991. These data are binary because, for each share, the presence (1) or absence (0) of a trade was recorded. We analyze these data further in Section 7, but we point out that modeling presence/absence is of interest for identifying trading patterns, at least in this particular application.
The goal of this article is to study properties of regression based models for the analysis of binary time series; see [1] for an early treatment. Regression modeling, in this context, has been studied by [2], [3], and [4], among others. Such data have become increasingly popular in various financial applications ([5], [6], [7], [8], [9], [10, 11, 12], [13] and [14]), but also in other scientific fields.
We deliver ergodicity and stationarity conditions for binary time series models, employing the concept of weak dependence (e.g. [15], [16]). Some previous work in this direction was reported by [17], but their point of view is different from the approach taken here.
The desired conditions are established by considering binary time series models which are driven by a latent process and are specified by means of a general link function. We are especially interested in the logistic and probit models; these are the most popular models for analyzing binary data ([18], [19]). It is easy to see that both of these models fall within the framework of generalized linear models (GLM) as developed by [20], and they can be applied in a straightforward way by using existing software tools. GLM are natural extensions of the ordinary AR models (see [21], for instance), in the sense that the response distribution (in the case of binary data, the Bernoulli distribution) belongs to the exponential family. Moreover, it is assumed that a "nice" function of the conditional mean of the response, called the link function, is linearly related to its lagged values and/or a latent process. For the logistic and probit models, the link function is the inverse cumulative distribution function (c.d.f.) of the logistic and normal distributions, respectively. However, the presentation is not restricted to those models; we outline a general theory which covers other link functions.
Regression models for binary time series have been discussed previously by [22, Ch.2] and the references therein. It was explicitly shown that the combination of likelihood inference and generalized linear models provides a systematic framework for the analysis of quantitative as well as qualitative time series data. Indeed, estimation, diagnostics, model assessment, and forecasting are implemented in a straightforward manner, while the computation is carried out by a number of existing software packages. Experience shows that both positive and negative association can be taken into account by a suitable parametrization of the model.
Our novel contribution consists of introducing a feedback process which enriches the dynamics of the observed process. This approach parallels GARCH modeling, whereby the volatility is regressed on its past values and the lagged squared returns. For binary time series, a GARCH type model is specified in terms of the success probabilities, which are regressed on their past values and lagged responses. In fact, the main contribution of this article is the generalization of the recent results obtained by [23] to a broader class of binary time series models; probit and logistic models are special cases in our framework.
In Section 2 we give several examples of the type of models we study. Section 3 introduces a general model (eq. (5)), which can be employed for the analysis of binary time series. This section contains results about stationarity and ergodicity of (5). Section 4 examines maximum likelihood estimation for the parameter vector. Section 5 reports a limited simulation study and Section 6 shows that inclusion of covariates in (5) can be easily accomplished within the GLM framework. This is an appealing feature of our approach. Section 7 illustrates an example of a real data analysis. The paper concludes with a discussion and an Appendix with some theoretical results.
2. Modeling of Binary Time Series
Let $\{Y_t\}_{t \in \mathbb{Z}}$ denote a binary time series. Consider an increasing sequence of $\sigma$-fields, say $\{\mathcal{F}_t\}_{t \geq 1}$, which will be specified in detail later. Denote the conditional probability of success given $\mathcal{F}_{t-1}$ by
\[ p_t = P(Y_t = 1 \mid \mathcal{F}_{t-1}) = E(Y_t \mid \mathcal{F}_{t-1}), \quad t \in \mathbb{Z}. \]
In this section we focus on developing and discussing autoregressive models for binary time series, which might include a feedback mechanism or a latent process, as we explain below. For instance, consider the following simple linear model
\[ p_t = d + a_1 p_{t-1} + b_1 Y_{t-1}, \quad t \in \mathbb{Z}, \tag{1} \]
where $d, a_1, b_1 \in \mathbb{R}$. The above model is quite analogous to the ordinary GARCH model (see [24]). Model (1) was studied by [25, 26] for modeling the joint distribution of discrete price changes and their duration. In this work, we are interested in the price change process only. Clearly, (1) is driven by a latent process. However, it also imposes restrictions on $a_1$ and $b_1$ because the transition probabilities $\{p_t\}$ have to belong to the interval $(0, 1)$. In addition, the linear model (1) introduces further constraints on the regression parameters when any covariates are included. The complexity of the problem increases considerably for a model which entertains more lagged regressors.
[22, Sec.2.1] show that the logistic model is a natural choice for analyzing binary time series. The logistic model is defined by
\[ \lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1}, \quad t \in \mathbb{Z}, \tag{2} \]
for some real unknown parameters $d, a_1, b_1$, where
\[ \lambda_t = \log\left(\frac{p_t}{1 - p_t}\right) \]
is the inverse logistic c.d.f. It is obvious that (2) falls within the framework of generalized linear models; in fact, the logistic model corresponds to the canonical link function model according to GLM terminology. Additionally, (2) does not impose complicated restrictions on the parameter space while it allows inclusion of covariates in a straightforward manner; see also [27] and [28, p.471].
By recalling that the logistic link corresponds to the canonical link of the Bernoulli distribution, we note that model (2) parallels the log-linear model proposed by [29] for count time series modeling. Model (2) has been studied in [23], where it was shown that the constraint $4|a_1| + |b_1| < 4$ is necessary for the process $(Y_t, \lambda_t)$ to be ergodic and stationary. In this case, the values of $a_1$ and $b_1$ are allowed to belong to a larger set of values compared to the typical stationarity region of ordinary ARMA models.
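As an illustration of how model (2) generates data, the following is a minimal Python sketch that simulates $(Y_t, \lambda_t)$ by iterating the recursion and drawing $Y_t$ as a Bernoulli variable with success probability $F_L(\lambda_t)$. The function names, the starting values, the burn-in length and the parameter values are assumptions made here for illustration; they are not taken from the authors' implementation.

```python
import math
import random


def logistic_cdf(x):
    """Numerically stable standard logistic c.d.f. F_L."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simulate_logistic_feedback(d, a1, b1, n, burn_in=500, seed=0):
    """Simulate (Y_t, lambda_t) from lambda_t = d + a1*lambda_{t-1} + b1*Y_{t-1}."""
    rng = random.Random(seed)
    lam_prev, y_prev = 0.0, 0  # arbitrary starting values (assumption)
    ys, lams = [], []
    for t in range(n + burn_in):
        lam = d + a1 * lam_prev + b1 * y_prev
        p = logistic_cdf(lam)                 # p_t = F_L(lambda_t)
        y = 1 if rng.random() < p else 0      # Y_t given the past is Bernoulli(p_t)
        if t >= burn_in:                      # keep only post burn-in values
            ys.append(y)
            lams.append(lam)
        lam_prev, y_prev = lam, y
    return ys, lams


# Illustrative parameter values; they satisfy 4|a1| + |b1| < 4.
ys, lams = simulate_logistic_feedback(d=0.5, a1=-0.3, b1=1.0, n=1000)
print(sum(ys) / len(ys))  # empirical frequency of ones
```

Because $|a_1| < 1$ in this example, the simulated $\lambda_t$ remains in a bounded interval, which is one simple way to sanity-check the output.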
The probit model is another useful tool for binary time series analysis. It is defined analogously to model (2) by
\[ p_t = \Phi(\pi_t), \quad \pi_t = d + a_1 \pi_{t-1} + b_1 Y_{t-1}, \tag{3} \]
where $\Phi(\cdot)$ denotes the c.d.f. of a standard normal random variable and $d, a_1, b_1$ are unknown real parameters. Models of this form have been considered by [30], [31], [32], [33, 13], and [34], among others. Models (2) and (3) are similar in the sense that they are defined by means of a latent process which is a monotone transformation of the success probability $p_t$. Indeed, model (3) is defined by means of the probit transformation, whereas model (2) can be rewritten as
\[ p_t = F_L(\lambda_t), \quad \lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1}, \]
with $F_L(\cdot)$ denoting the standard logistic c.d.f. The same argument shows that (1) is of identical form with (2) and (3) by considering the standard uniform c.d.f. Hence, it is quite natural to study models of the general form given by (5), as we discuss in Section 3. For both models (2) and (3), it is easy to incorporate covariates by adding an extra term of the form $\gamma' X_t$ on their right hand side. Here $\gamma$ is a vector of parameters and $X_t$ is a covariate process. Note that the logistic (respectively, probit) model allows this inclusion because $F_L^{-1}(\cdot)$ (respectively, $\Phi^{-1}(\cdot)$) takes values in $\mathbb{R}$. Model (1) poses more restrictions on the coefficients, though, because $\gamma' X_t \geq 0$ and $0 \leq p_t \leq 1$.
Binary time series models have also been studied by [17], where the following process is considered:
\[ Y_t = I\left( \sum_{j=1}^{p} \rho_j Y_{t-j} + \gamma' X_t + U_t > 0 \right). \tag{4} \]
In the above, $I(\cdot)$ is the indicator function and $\rho_i$, $i = 1, 2, \ldots, p$, are unknown parameters. In addition, $U_t$ is an error sequence such that the vector process $(X_t', U_t)$ is strictly stationary and strongly mixing. When the errors are normally distributed, $\gamma = 0$ and $p = 1$, then we obtain (3) but without the hidden process term. The approach taken here is based on modeling binary time series directly by employing GLM methodology. Hence, (4) can also be viewed as a motivation to consider the logistic (or probit) link function, among many other possible specifications, when $U_t$ is an i.i.d. sequence of logistic (or normally) distributed random variables. In addition, we study model (5), which includes a hidden process. [17] show near epoch dependence of the process defined by (4) and they study likelihood theory for the probit model employing smoothed maximum score estimation. We study maximum likelihood inference for (5), which allows for general specifications, and we establish asymptotic normality when the feedback term is included in the model; such a proof is missing from the literature to the best of our knowledge (cf. [10, p.43]).
3. Main Results
In this section we study conditions for ergodicity and stationarity of binary time series models specified by
\[ p_t = F(\kappa_t), \quad \kappa_t = g(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}, \kappa_{t-1}, \kappa_{t-2}, \ldots, \kappa_{t-q}; \theta), \quad t \in \mathbb{Z}, \tag{5} \]
where $\kappa_t$ denotes the feedback process, $Y_{t-i}$, $i = 1, 2, \ldots, p$, is the $i$-th lagged value of the response process, $\kappa_{t-j}$, $j = 1, 2, \ldots, q$, is the $j$-th lagged value of $\kappa_t$, $g$ is a real function (not necessarily linear) and $F$ is any continuous c.d.f. with $F' = f$. Then (5) with $p = q = 1$ yields model (2) (respectively, (3)) when $F$ is chosen to be the standard logistic c.d.f. $F_L$ (respectively, the standard normal c.d.f. $\Phi$) and $g(\cdot)$ is linear.
Model (5) implies the existence of the feedback mechanism, which determines the evolution of the observed process. When strong correlation exists in the data, we expect (5) to be more parsimonious than a model which does not include the components $\kappa_{t-j}$. A similar phenomenon is observed for count time series data; see [35, Sec.2]. Model (5) allows for a variety of non-linear models for the analysis of binary time series. The non-linear specification, introduced by the choice of $g(\cdot)$, is made possible by the fact that the inverse c.d.f. assumes values in $\mathbb{R}$.
For developing asymptotic likelihood theory, we need a weak law of large numbers for the Hessian matrix, a martingale CLT for the score function and existence of moments of the joint process $(Y_t, \kappa_t)$. The main tool to obtain the necessary conditions that ensure ergodicity and stationarity of the joint process $(Y_t, \kappa_t)$ is the notion of weak dependence, which was introduced by [36]; see also [15]. In fact, [16] prove the existence of a weakly dependent strictly stationary solution of the equation
\[ X_t = H(X_{t-1}, X_{t-2}, X_{t-3}, \ldots; \xi_t), \]
which is a chain with infinite memory, where $\{\xi_t\}$ is an innovation sequence. In our case, $X_t = (Y_t, \kappa_t)$. [37] assume a contraction type condition (see also [38], [39], [40]) to prove stationarity and ergodicity and to develop asymptotic theory for a Poisson based model for the analysis of count time series. This is the type of condition that we will assume in this work for the case of binary time series. When a count time series is under consideration, the innovation sequence $\{\xi_t\}$ appearing in the last display consists of independent unit (standardized) Poisson processes. For the models we consider in this contribution, the sequence $\{\xi_t\}$ consists of standard uniform random variables; see the proof of Theorem 1 in the Appendix.
Note that [17] have shown that the process $Y_t$, defined by (4), is near epoch dependent by assuming that $(X_t, U_t)$ is a strictly stationary and strongly mixing process and by imposing mild conditions on the regression coefficients. Their arguments show that (4) satisfies the property of fading memory; for a detailed discussion on the property of near epoch dependence see [41]. Our approach is quite different because we impose conditions on the function $g(\cdot)$ of (5).
In fact, after repeated substitution, the process $\kappa_t$ can be expressed as a function of the binary process $Y_s$ for $s < t$. Such a model is quite analogous to the model considered by [42], who gave an example of a process with discrete innovations which is not strongly mixing. The main difficulty in proving mixing of $(Y_t, \kappa_t)$ is that the process $Y_t$ assumes discrete values but the process $\kappa_t$ takes values in $\mathbb{R}$; see also [39, Sec.3], who discusses these issues for count time series models. The concept of weak dependence, as introduced by [36], bypasses the problem even when the innovations are discrete; see also [36, Sec. 3.4].
Theorem 1. Consider model (5) and assume that there exist $\beta_1, \ldots, \beta_p, \gamma_1, \ldots, \gamma_q \in \mathbb{R}^+$ such that, for any $X = (Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q)$ and $X' = (Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q')$ in $\{0, 1\}^p \times \mathbb{R}^q$,
\[ |g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q) - g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q')| \leq \sum_{i=1}^{q} \gamma_i |\kappa_i - \kappa_i'| + \sum_{i=1}^{p} \beta_i |Y_i - Y_i'|. \tag{6} \]
We further assume that $\sup_{x \in \mathbb{R}} |f(x)| \leq K$ (see also Assumption 4). If $K \geq 1$, assume that $(K + 1)\left(\sum_{i=1}^{q} \gamma_i + \sum_{i=1}^{p} \beta_i\right) < 1$. If $K < 1$, suppose that $\left(\sum_{i=1}^{q} \gamma_i\right)/(1 - K) + \sum_{i=1}^{p} \beta_i < 1$. Then there exists a unique causal solution $\{(Y_t, \kappa_t), t \in \mathbb{Z}\}$, which is stationary, ergodic and satisfies $E|\kappa_t|^s < \infty$, $\forall s \in \mathbb{N}$.
Remark 1. The constant $K$ determines whether the link function $F(\cdot)$ is a contraction. Hence, we consider separately the cases $K < 1$ and $K \geq 1$.
Corollary 1. Suppose that we consider model (5) with $F = F_L$, $\kappa_t = \lambda_t$ and $q = 1$. In this case $K = 1/4 < 1$. Then (6) is satisfied provided that $4\gamma_1/3 + \sum_{i=1}^{p} \beta_i < 1$. Hence there exists a unique causal solution $\{(Y_t, \lambda_t), t \in \mathbb{Z}\}$, which is stationary, ergodic and satisfies $E|\lambda_t|^s < \infty$, $\forall s \in \mathbb{N}$.
Corollary 2. Consider (5) with $F = \Phi$, $\kappa_t = \pi_t$ and $q = 1$. Then (6) is satisfied provided that $\gamma_1 \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} \beta_i < 1$, because $K = 1/\sqrt{2\pi} < 1$. Hence there exists a unique causal solution $\{(Y_t, \pi_t), t \in \mathbb{Z}\}$, which is stationary, ergodic and satisfies $E|\pi_t|^s < \infty$, $\forall s \in \mathbb{N}$.
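The two cases of Theorem 1 and the constants used in Corollaries 1 and 2 can be checked numerically. The sketch below is only an illustration: the function name `theorem1_condition` and the Lipschitz constants passed to it are hypothetical, while the values of $K$ for the logistic and probit links are those stated in the corollaries.

```python
import math


def theorem1_condition(K, gammas, betas):
    """Return True if the sufficient condition of Theorem 1 holds.

    K      : bound on the density f = F'
    gammas : Lipschitz constants for the lagged feedback terms
    betas  : Lipschitz constants for the lagged responses
    """
    sum_gamma, sum_beta = sum(gammas), sum(betas)
    if K >= 1:
        return (K + 1) * (sum_gamma + sum_beta) < 1
    return sum_gamma / (1 - K) + sum_beta < 1


K_LOGISTIC = 0.25                      # sup of the standard logistic density
K_PROBIT = 1 / math.sqrt(2 * math.pi)  # sup of the standard normal density

# Hypothetical constants, chosen only for illustration.
print(theorem1_condition(K_LOGISTIC, gammas=[0.2], betas=[0.4, 0.1]))
print(theorem1_condition(K_PROBIT, gammas=[0.2], betas=[0.4, 0.1]))
print(theorem1_condition(K_LOGISTIC, gammas=[0.5], betas=[0.5]))
```

For $K = 1/4$ the factor $1/(1 - K)$ equals $4/3$, which is the coefficient of $\gamma_1$ in Corollary 1; for $K = 1/\sqrt{2\pi}$ it equals $\sqrt{2\pi}/(\sqrt{2\pi} - 1)$, as in Corollary 2.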
We discuss some special cases of model (5) with $F = \Phi$ to bring across the essence of Theorem 1. In this case $\kappa_t = \pi_t$.
Example 1. Let the function $g(\cdot)$ be linear and $q = 1$. Then
\[ \pi_t = d + a_1 \pi_{t-1} + \sum_{i=1}^{p} b_i Y_{t-i}, \tag{7} \]
where $a_1, b_i$ $(i = 1, \ldots, p) \in \mathbb{R}$. For this particular case it is sufficient to assume that $\sqrt{2\pi}\,|a_1| + \sum_{i=1}^{p} |b_i| < \sqrt{2\pi}/2$, which changes the model parameter region. Note that in Corollary 2 the function $g(\cdot)$ is any function which satisfies (6), but for this example $g(\cdot)$ is chosen to be linear. If we drop the feedback term, we obtain
\[ \pi_t = d + \sum_{i=1}^{p} b_i Y_{t-i}, \tag{8} \]
and the required condition is $\sum_{i=1}^{p} |b_i| < \sqrt{2\pi}$.
A model of specific interest is (3). If the function $g(\cdot)$ in (3) is not linear, then we have to assume that there exist $\gamma_1, \beta_1 > 0$ such that
\[ |g(Y, \pi) - g(Y', \pi')| \leq \gamma_1 |\pi - \pi'| + \beta_1 |Y - Y'|, \quad \gamma_1 + \beta_1 < 1. \]
The condition $\gamma_1 + \beta_1 < 1$ is slightly better than the condition $\gamma_1 \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \beta_1 < 1$ obtained from Corollary 2 with $p = 1$. This is a consequence of verifying condition 3.1 of [16]. For $p = 1$, the proof can be slightly altered so that the model parameters satisfy a weaker condition. For the special case of model (3) and when $g(\cdot)$ is linear, it is enough to assume the weaker condition $\sqrt{2\pi}\,|a_1| + |b_1| < \sqrt{2\pi}$ and the result of Theorem 1 is still true.
Similar conditions are found for the logistic model (2) by replacing the constant $\sqrt{2\pi}$ with the constant 4. The necessary assumptions for different lags and link functions are displayed in Table 1.
Remark 2. Apart from probit and logit models, there is a wide range of link functions that we can apply for binary time series analysis. For instance, if $p_t = \exp(-\exp(-\kappa_t))$, then $\kappa_t = -\log(-\log(p_t))$. Therefore we obtain the log-log link function. Another possibility is to consider $p_t = 1 - \exp(-\exp(\kappa_t))$. Then $\kappa_t = \log(-\log(1 - p_t))$. This is the complementary log-log link function. For a detailed discussion about other link functions see [22, Sec.2.1.2].
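The two link functions of Remark 2 and their inverses can be written down directly. The following Python sketch (function names are chosen here for illustration) checks numerically that each link undoes its inverse on a few values of $\kappa_t$.

```python
import math


def loglog_inverse(kappa):
    """p = exp(-exp(-kappa)), the c.d.f. behind the log-log link."""
    return math.exp(-math.exp(-kappa))


def loglog_link(p):
    """kappa = -log(-log(p)), for 0 < p < 1."""
    return -math.log(-math.log(p))


def cloglog_inverse(kappa):
    """p = 1 - exp(-exp(kappa)), the c.d.f. behind the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(kappa))


def cloglog_link(p):
    """kappa = log(-log(1 - p)), for 0 < p < 1."""
    return math.log(-math.log(1.0 - p))


for kappa in (-2.0, 0.0, 1.5):
    p1, p2 = loglog_inverse(kappa), cloglog_inverse(kappa)
    # each printed pair should reproduce kappa up to rounding error
    print(kappa, loglog_link(p1), cloglog_link(p2))
```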
| Models | Order | $g(\cdot)$ | Conditions |
|---|---|---|---|
| (5), $F = \Phi$ | $p = q = 1$ | linear | $\sqrt{2\pi}\,\lvert a_1 \rvert + \lvert b_1 \rvert < \sqrt{2\pi}$ |
| | $p = q = 1$ | general | $\gamma_1 + \beta_1 < 1$ |
| | $p > 1$, $q = 1$ | linear | $\lvert a_1 \rvert \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} \lvert b_i \rvert < 1$ |
| | $p > 1$, $q = 1$ | general | $\gamma_1 \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} \beta_i < 1$ |
| (5), $F = F_L$ | $p = q = 1$ | linear | $4 \lvert a_1 \rvert + \lvert b_1 \rvert < 4$ |
| | $p = q = 1$ | general | $\gamma_1 + \beta_1 < 1$ |
| | $p > 1$, $q = 1$ | linear | $4 \lvert a_1 \rvert / 3 + \sum_{i=1}^{p} \lvert b_i \rvert < 1$ |
| | $p > 1$, $q = 1$ | general | $4 \gamma_1 / 3 + \sum_{i=1}^{p} \beta_i < 1$ |

Table 1: Sufficient conditions to establish ergodicity and stationarity for the joint process $(Y_t, \kappa_t)$ for some different classes of models. The notation $\gamma_1$ and $\beta_i$, $i = 1, 2, \ldots, p$, refers to the notation introduced in Theorem 1 and the notation $a_1$ and $b_i$, $i = 1, 2, \ldots, p$, refers to the special case of linear models.
4. Inference
Recall model (5). Assume that the function $g(\cdot)$ depends on some finite dimensional parameter $\theta$, with $s = \dim(\theta)$. For instance, when model (3) holds, then $F = \Phi$ and $\kappa_t(\theta) = \pi_t(\theta)$, that is, $p_t(\theta) = \Phi(\pi_t(\theta))$, and $\theta = (d, a_1, b_1)^T$ denotes the respective vector of unknown parameters. In what follows, we discuss the estimation of $\theta$ based on the conditional likelihood function for model (5). We set $\mathcal{F}_t = \sigma(\kappa_{1-q}, \ldots, \kappa_{-1}, \kappa_0, Y_s, s \leq t)$. The likelihood function is given by
\[ L_N(\theta) = \prod_{t=1}^{N} P(Y_t = y_t \mid \mathcal{F}_{t-1}) = \prod_{t=1}^{N} p_t^{Y_t}(\theta)\,(1 - p_t(\theta))^{1 - Y_t} = \prod_{t=1}^{N} F^{Y_t}(\kappa_t(\theta))\,(1 - F(\kappa_t(\theta)))^{1 - Y_t}. \tag{9} \]
The conditional log-likelihood is equal to
\[ l_N(\theta) = \sum_{t=1}^{N} l_t(\theta) = \sum_{t=1}^{N} \Big[ Y_t \log F(\kappa_t(\theta)) + (1 - Y_t) \log(1 - F(\kappa_t(\theta))) \Big]. \tag{10} \]
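To make (10) concrete, the sketch below evaluates the conditional log-likelihood for the linear case $\kappa_t = d + a_1 \kappa_{t-1} + b_1 Y_{t-1}$ with either the logistic or the probit c.d.f. It is a minimal illustration, not the authors' code: the function names, the starting values `kappa0` and `y0`, and the toy data are assumptions.

```python
import math


def logistic_cdf(x):
    """Numerically stable standard logistic c.d.f. F_L."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def normal_cdf(x):
    """Standard normal c.d.f. Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_loglik(theta, ys, cdf, kappa0=0.0, y0=0):
    """Conditional log-likelihood (10) for kappa_t = d + a1*kappa_{t-1} + b1*Y_{t-1}.

    kappa0 and y0 are fixed starting values (an assumption of this sketch).
    Extreme values of kappa_t could make p equal to 0 or 1 in floating point;
    no safeguard against that is included here.
    """
    d, a1, b1 = theta
    kappa_prev, y_prev = kappa0, y0
    total = 0.0
    for y in ys:
        kappa = d + a1 * kappa_prev + b1 * y_prev
        p = cdf(kappa)  # p_t = F(kappa_t)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        kappa_prev, y_prev = kappa, y
    return total


ys = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # toy data, for illustration only
print(conditional_loglik((0.5, -0.3, 1.0), ys, logistic_cdf))
print(conditional_loglik((0.5, -0.3, 1.0), ys, normal_cdf))
```

With $\theta = (0, 0, 0)$ every $p_t$ equals $1/2$ under both links, so the log-likelihood reduces to $N \log(1/2)$, which gives a quick check of the implementation.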
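In order to compute the conditional MLE $\hat{\theta}$ we maximize (9). We denote by $\hat{\theta} = \arg\max_{\theta \in \Theta} L_N(\theta)$, where $\Theta \subseteq \mathbb{R}^s$ denotes the parameter space. The score function is equal to
\[ S_N(\theta) = \sum_{t=1}^{N} \nabla l_t(\theta) = \sum_{t=1}^{N} \frac{\partial l_t(\theta)}{\partial \theta} = \sum_{t=1}^{N} \frac{f(\kappa_t(\theta))}{p_t(\theta)(1 - p_t(\theta))} \frac{\partial \kappa_t(\theta)}{\partial \theta} (Y_t - p_t(\theta)), \tag{11} \]
by recalling that $F' = f$. It is clear that, at the true value $\theta_0$, we obtain $E[Y_t - p_t(\theta_0) \mid \mathcal{F}_{t-1}] = 0$. This fact, combined with Assumption 4 (see below), shows that the score function is a square integrable martingale. In particular, for model (3) we obtain that
\[ \frac{\partial \kappa_t(\theta)}{\partial \theta} = \frac{\partial \pi_t(\theta)}{\partial \theta} = \left( \frac{\partial \pi_t(\theta)}{\partial d}, \frac{\partial \pi_t(\theta)}{\partial a_1}, \frac{\partial \pi_t(\theta)}{\partial b_1} \right)^T = \left( 1 + a_1 \frac{\partial \pi_{t-1}(\theta)}{\partial d},\ \pi_{t-1}(\theta) + a_1 \frac{\partial \pi_{t-1}(\theta)}{\partial a_1},\ Y_{t-1} + a_1 \frac{\partial \pi_{t-1}(\theta)}{\partial b_1} \right)^T. \]
Generally, for model (5) we have that
\[ \frac{\partial \kappa_t(\theta)}{\partial \theta} = \sum_{j=1}^{q} \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial \kappa_{t-j}(\theta)} \frac{\partial \kappa_{t-j}(\theta)}{\partial \theta} + \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial \theta}. \]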
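The recursion for $\partial \pi_t(\theta)/\partial \theta$ in model (3) can be coded alongside the score (11). The Python sketch below is an illustration under stated assumptions: the starting value $\pi_0$ is treated as fixed and independent of $\theta$ (so its derivatives are zero), and the function names and toy data are hypothetical. The score is compared with a central finite-difference approximation of the gradient of the log-likelihood.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_loglik_and_score(theta, ys, pi0=0.0, y0=0):
    """Log-likelihood (10) and score (11) for the probit model (3)."""
    d, a1, b1 = theta
    pi_prev, y_prev = pi0, y0
    dpi_prev = [0.0, 0.0, 0.0]  # derivatives of the fixed start pi_0 (assumption)
    loglik, score = 0.0, [0.0, 0.0, 0.0]
    for y in ys:
        pi = d + a1 * pi_prev + b1 * y_prev
        # recursion for the derivatives of pi_t w.r.t. (d, a1, b1)
        dpi = [1.0 + a1 * dpi_prev[0],
               pi_prev + a1 * dpi_prev[1],
               y_prev + a1 * dpi_prev[2]]
        p = normal_cdf(pi)
        loglik += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        weight = normal_pdf(pi) / (p * (1.0 - p)) * (y - p)
        score = [s + weight * g for s, g in zip(score, dpi)]
        pi_prev, y_prev, dpi_prev = pi, y, dpi
    return loglik, score


theta = (0.5, -0.3, 1.0)             # illustrative parameter values
ys = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # toy data, for illustration only
_, score = probit_loglik_and_score(theta, ys)
h = 1e-6
finite_diff = []
for j in range(3):
    up, dn = list(theta), list(theta)
    up[j] += h
    dn[j] -= h
    fd = (probit_loglik_and_score(up, ys)[0]
          - probit_loglik_and_score(dn, ys)[0]) / (2 * h)
    finite_diff.append(fd)
    print(j, score[j], fd)  # analytic score vs numerical gradient
```

Agreement between the two columns is a practical check that the derivative recursion has been implemented consistently with the log-likelihood.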
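The conditional information matrix is given by
\[ G_N(\theta) = \sum_{t=1}^{N} \mathrm{Cov}\left( \frac{f(\kappa_t(\theta))}{p_t(\theta)(1 - p_t(\theta))} \frac{\partial \kappa_t(\theta)}{\partial \theta} (Y_t - p_t(\theta)) \,\Big|\, \mathcal{F}_{t-1} \right) = \sum_{t=1}^{N} \frac{f(\kappa_t(\theta))^2}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T}, \]
since the conditional variance of $Y_t$ is $\mathrm{Var}(Y_t \mid \mathcal{F}_{t-1}) = p_t(\theta)(1 - p_t(\theta))$. We also define the matrix
\[ G(\theta) = E\left[ \frac{f(\kappa_t(\theta))^2}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T} \right], \tag{12} \]
where the expectation is taken with respect to the stationary distribution. The Hessian matrix is given by
\[ \begin{aligned} H_N(\theta) &= -\sum_{t=1}^{N} \frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta^T} \\ &= -\sum_{t=1}^{N} \frac{f(\kappa_t(\theta))}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \frac{\partial^2 \kappa_t(\theta)}{\partial \theta \partial \theta^T} (Y_t - p_t(\theta)) \\ &\quad - \sum_{t=1}^{N} \frac{\partial f(\kappa_t(\theta)) / \partial \kappa_t(\theta)}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T} (Y_t - p_t(\theta)) \\ &\quad + \sum_{t=1}^{N} \left( \frac{f(\kappa_t(\theta))}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \right)^2 \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T} (Y_t - p_t(\theta))^2. \end{aligned} \tag{13} \]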
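The three terms of (13) follow from the chain rule applied to (10). The following is a short sketch of that calculation; the shorthand $p_t = F(\kappa_t(\theta))$, $f_t = f(\kappa_t(\theta))$ and $f_t' = \partial f(\kappa_t(\theta))/\partial \kappa_t(\theta)$ is introduced here only to keep the display compact.

```latex
\begin{align*}
\frac{\partial l_t}{\partial \kappa_t}
  &= \frac{Y_t f_t}{p_t} - \frac{(1 - Y_t) f_t}{1 - p_t}
   = \frac{f_t (Y_t - p_t)}{p_t (1 - p_t)}, \\
\frac{\partial^2 l_t}{\partial \kappa_t^2}
  &= \frac{f_t' (Y_t - p_t)}{p_t (1 - p_t)}
   - \frac{f_t^2 \left[ p_t (1 - p_t) + (Y_t - p_t)(1 - 2 p_t) \right]}{p_t^2 (1 - p_t)^2}
   = \frac{f_t' (Y_t - p_t)}{p_t (1 - p_t)}
   - \frac{f_t^2 (Y_t - p_t)^2}{p_t^2 (1 - p_t)^2}, \\
\frac{\partial^2 l_t}{\partial \theta \partial \theta^T}
  &= \frac{\partial l_t}{\partial \kappa_t}
     \frac{\partial^2 \kappa_t}{\partial \theta \partial \theta^T}
   + \frac{\partial^2 l_t}{\partial \kappa_t^2}
     \frac{\partial \kappa_t}{\partial \theta}
     \frac{\partial \kappa_t}{\partial \theta^T}.
\end{align*}
```

The second equality in the middle line uses $Y_t^2 = Y_t$ for binary $Y_t$, so that $p_t(1 - p_t) + (Y_t - p_t)(1 - 2p_t) = (Y_t - p_t)^2$. Summing over $t$ and changing sign gives (13). At the true value, taking conditional expectations given $\mathcal{F}_{t-1}$ removes the two terms that are linear in $Y_t - p_t$ and turns $(Y_t - p_t)^2$ into $p_t(1 - p_t)$, so the conditional expectation of the summands of $H_N$ matches the summands of $G_N$.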
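Asymptotic normality of the conditional MLE $\hat{\theta}$ can be proved by imposing the following assumptions, which are quite analogous to those given by [43], [44], [45] and [23].
Assumption 1. The parameter $\theta$ belongs to a compact set $\Theta$ and the true value, $\theta_0$, belongs to the interior of $\Theta$.
Assumption 2. The components of $\partial g/\partial \theta$ are assumed to be linearly independent.
Assumption 3. The function $g(\cdot)$ is four times differentiable with respect to $\theta$ and $\kappa$. In addition, if $x^* = (\theta^T, \kappa^T)^T = (\theta_1, \ldots, \theta_s, \kappa_1, \ldots, \kappa_q)^T$, then
\[ \left| \frac{\partial g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q; \theta)}{\partial x_i^*} - \frac{\partial g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q'; \theta)}{\partial x_i^*} \right| \leq \sum_{l=1}^{p} b_{il} |Y_l - Y_l'| + \sum_{l=1}^{q} c_{il} |\kappa_l - \kappa_l'|, \quad i = 1, \ldots, s + q, \]
\[ \left| \frac{\partial^2 g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q; \theta)}{\partial x_i^* \partial x_j^*} - \frac{\partial^2 g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q'; \theta)}{\partial x_i^* \partial x_j^*} \right| \leq \sum_{l=1}^{p} b_{ijl} |Y_l - Y_l'| + \sum_{l=1}^{q} c_{ijl} |\kappa_l - \kappa_l'|, \quad i, j = 1, \ldots, s + q, \]
\[ \left| \frac{\partial^3 g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q; \theta)}{\partial x_i^* \partial x_j^* \partial x_k^*} - \frac{\partial^3 g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q'; \theta)}{\partial x_i^* \partial x_j^* \partial x_k^*} \right| \leq \sum_{l=1}^{p} b_{ijkl} |Y_l - Y_l'| + \sum_{l=1}^{q} c_{ijkl} |\kappa_l - \kappa_l'|, \quad i, j, k = 1, \ldots, s + q, \]
where $b_{il}, b_{ijl}, b_{ijkl}, c_{il}, c_{ijl}, c_{ijkl} \in \mathbb{R}^+$. We further assume that $\forall i, j, k \in \{1, \ldots, s + q\}$ the following hold:
\[ \sum_{i=1}^{s+q} \left( \sum_{l=1}^{p} b_{il} + c_{il} \right) < \infty, \quad \sum_{i,j=1}^{s+q} \left( \sum_{l=1}^{p} b_{ijl} + c_{ijl} \right) < \infty, \quad \sum_{i,j,k=1}^{s+q} \left( \sum_{l=1}^{p} b_{ijkl} + c_{ijkl} \right) < \infty, \]
\[ E\left| \frac{\partial g(0; \theta)}{\partial x_i^*} \right| < \infty, \quad E\left| \frac{\partial^2 g(0; \theta)}{\partial x_i^* \partial x_j^*} \right| < \infty, \quad E\left| \frac{\partial^3 g(0; \theta)}{\partial x_i^* \partial x_j^* \partial x_k^*} \right| < \infty. \]
Assumption 4. The function $F(x)$ is defined on $\mathbb{R}$, such that $F: \mathbb{R} \to [0, 1]$, is monotone and continuously differentiable with $F' = f$. In addition, there exists a finite, positive number $K$ such that $\sup_{x \in \mathbb{R}} |f(x)| \leq K$.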
Assumption 1 is standard in the literature. Assumption 2 guarantees that the matrix $G(\theta)$ defined by (12) is invertible. Assumption 3 makes possible the approximation of the derivatives of $g(\cdot)$ by linear functions of its arguments. These assumptions are commonly used in estimation of nonlinear time series models, and they are relatively mild. They are employed when the log-likelihood function is three times differentiable, as in the case we consider. Finally, Assumption 4 implies that the process $p_t(\theta)$ is bounded away from zero and one. This is required for defining properly the quantities (10)-(13). Obviously, such conditions are satisfied by the logistic model (2) and the probit model (3). The following theorem shows the asymptotic normality of the conditional MLE $\hat{\theta}$. Its proof is based on a standard extension of the arguments given by [23], among others; see Lemma A-1 of the Appendix. Additionally, the Appendix contains Lemma A-2, which shows that the maximizer $\hat{\theta}$ does not depend on the initial value of $\kappa_t$.
Theorem 2. For model (5) and under the assumptions of Theorem 1 and Assumptions 1-4, there exists an open neighborhood of the true value $\theta_0$ such that a unique conditional MLE exists with probability converging to one, as $N \to \infty$. Furthermore, $\hat{\theta}$ does not depend on the initial values of $\{\kappa_t\}$ and it is consistent and asymptotically normally distributed, i.e.
\[ \sqrt{N}(\hat{\theta} - \theta_0) \xrightarrow{D} N(0, G^{-1}(\theta_0)), \]
where $G$ is defined in (12). In addition, $G(\hat{\theta})$ is a consistent estimator of $G(\theta_0)$.
5. Simulations
We present a limited simulation study in this section. We consider $g(\cdot)$ to be linear and the link function is chosen to be either the standard logistic $F_L$ or the standard normal $\Phi$. We consider linear models of different order ($p = 1, 2, 3$ and $q = 1$) and different sample sizes ($N = 200, 1000, 1500, 2500$). All computations have been implemented in R ([46]). We perform 1000 simulations for every case, discarding the first 500 observations after generating data, to ensure that the stationarity region has been reached. The true parameters have to satisfy the condition $4|a_1|/3 + \sum_{i=1}^{p} |b_i| < 1$ when $F = F_L$ and $|a_1| \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} |b_i| < 1$ when $F = \Phi$ (see Table 1). To initiate the algorithm, we set $\kappa_0 = 1$ and we obtain initial values for the parameter vector $\theta$ by a simple generalized linear model fitting. The maximum likelihood estimators and their standard errors are displayed in Table 2. Corresponding QQ-plots and histograms of the simulated estimators are not displayed to save space, but the asserted approximation to the normal distribution is satisfactory in every case, especially as the sample size grows. Table 2 shows that the estimators approximate their true values quite satisfactorily in both cases considered (link function $F_L^{-1}$ or $\Phi^{-1}$), especially when $p = 1$.
An interesting scenario arises when the observed time series and the respective model are stationary but possess a "long memory" effect. Such a model, practically, would lead to persistent runs of ones and zeros in the observed binary process. For the linear logistic model (2), such a process can be generated by the sets of true values given in Table 3, for example. In the latter two cases (where the long memory effect is stronger), the conditions that the parameters have to satisfy according to Theorem 1 are violated. To carry out the estimation we employ unconstrained optimization. The obtained estimators are consistent, even for small sample sizes. These results indicate that the constraints imposed on the parameters are not very sharp when the linear model is considered and thus the stationarity region may be even larger.
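The statement that the last two parameter sets of Table 3 violate the sufficient condition can be verified by direct arithmetic. The sketch below assumes that the relevant condition is the one listed in Table 1 for the linear logistic model with $p = q = 1$, namely $4|a_1| + |b_1| < 4$; the function name is chosen here for illustration.

```python
def linear_logistic_condition(a1, b1):
    """Sufficient condition of Table 1 for model (2): 4|a1| + |b1| < 4."""
    return 4 * abs(a1) + abs(b1) < 4


# True values (d, a1, b1) of the three settings reported in Table 3.
table3_settings = [(-0.5, 0.5, 1.0), (0.5, -0.95, 2.5), (-1.0, -0.5, 4.0)]
checks = [linear_logistic_condition(a1, b1) for _, a1, b1 in table3_settings]
for setting, ok in zip(table3_settings, checks):
    print(setting, 4 * abs(setting[1]) + abs(setting[2]), ok)
```

The left-hand side of the condition equals 3, 6.3 and 6 for the three settings, so only the first one satisfies it.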
6. Covariates
One of the advantages of model (5) studied in this paper is the ease of introducing time dependent covariates. Suppose that $\{X_t\}$ is some covariate time series. Define the $\sigma$-field $\mathcal{F}_t = \sigma(\kappa_{1-q}, \ldots, \kappa_0, X_{s+1}, Y_s, s \leq t)$ and consider the logistic model (2), for instance. Then we obtain the model
\[ \lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1} + \gamma' X_t, \tag{14} \]
model
Par.
Par.
fitted
True
Estim.
Sample size N =200
N =1000
N =1500
N =2500
0.5
0.511
(0.431)
0.495
(0.205)
0.504
(0.166)
0.503
(0.126)
F = FL
-0.3
a ˆ1
-0.292
(0.334)
-0.295
(0.159)
-0.304
(0.130)
-0.301
(0.096)
p=1
1
ˆb1
0.998
(0.351)
0.997
(0.156)
1.004
(0.126)
1.000
(0.099)
(5)
0.5
dˆ
0.508
(0.350)
0.505
(0.166)
F =Φ
-0.3
a ˆ1
-0.294
(0.261)
-0.304
(0.130)
p=1
1
ˆb1
1.000
(0.268)
1.003
(0.125)
(5)
0.5
dˆ
0.402
(0.193)
0.484
(0.110)
F = FL
0.15
a ˆ1
0.061
(0.319)
0.096
(0.185)
p=2
-0.4
ˆb1
-0.249
(0.219)
-0.358
(0.112)
-0.35
ˆb2
-0.288
(0.235)
-0.351
(5)
0.5
dˆ
0.451
(0.135)
F =Φ
0.15
a ˆ1
0.078
p=2
-0.4
ˆb1
-0.327
-0.35
ˆb2
(5)
0.5
dˆ
F = FL
0.15
p=3
-0.4
0.515
(0.143)
0.506
(0.107)
-0.309
(0.109)
-0.306
(0.082)
0.997
(0.100)
1.001
(0.076)
0.490
(0.096)
0.500
(0.082)
0.111
(0.162)
0.114
(0.137)
-0.373
(0.094)
-0.384
(0.073)
-0.350
(0.114)
-0.358
(0.095)
(0.068)
0.504
(0.055)
0.504
(0.076)
(0.200)
0.088
(0.119)
0.092
(0.102)
0.107
(0.085)
(0.152)
-0.379
(0.075)
-0.388
(0.061)
-0.390
(0.049)
-0.323
(0.163)
-0.363
(0.090)
-0.366
(0.076)
-0.361
(0.063)
0.354
(0.208)
0.459
(0.125)
0.479
(0.112)
0.497
(0.100)
a ˆ1
0.036
(0.315)
0.085
(0.209)
0.088
(0.196)
0.095
(0.166)
ˆb1
-0.216
(0.204)
-0.350
(0.117)
-0.362
(0.096)
-0.378
(0.073)
-0.23
ˆb2
-0.147
(0.208)
-0.211
(0.134)
-0.222
(0.120)
-0.230
(0.098)
-0.15
ˆb3
-0.101
(0.214)
-0.132
(0.121)
-0.142
(0.102)
-0.151
(0.085)
(5)
0.5
dˆ
M
AN US
(0.136)
CR IP T
(5)
dˆ
0.440
F =Φ
0.15 -0.4 -0.23
0.507
(0.097)
0.512
(0.089)
0.519
(0.078)
0.048
(0.209)
0.073
(0.145)
0.078
(0.130)
0.083
(0.115)
ˆb1
-0.312
(0.161)
-0.375
(0.075)
-0.385
(0.063)
-0.393
(0.050)
ˆb2
-0.169
(0.164)
-0.215
(0.091)
-0.214
(0.083)
-0.217
(0.068)
ˆb3
-0.127
(0.161)
-0.155
(0.081)
-0.159
(0.070)
-0.160
(0.061)
PT
-0.15
(0.156)
a ˆ1
ED
p=3
0.509
Table 2: Maximum likelihood estimators and their standard errors (in parentheses) for model (5) with $g(\cdot)$ linear, $F = F_L$ or $F = \Phi$, $p = 1, 2, 3$, $q = 1$ and different sample sizes $N = 200, 1000, 1500, 2500$. Results are based on 1000 runs.
where γ is a vector of unknown parameters. If {Xt } is itself weakly dependent, then we can construct a two-dimensional process {κt , Xt+1 } and a corresponding
255
three dimensional with {Yt } included. If the transition mechanism of {Xt } does not depend on {κt , Yt }, it is simple to find conditions for weak dependence. The triangular structure when {Xt } is exogenous allows for separate conditions for {Xt }. The conditions for {κt , Yt } are exactly as before. Inference for model 15
ACCEPTED MANUSCRIPT
Par.
Par.
Sample size
True
Estim.
-0.5
dˆ
-0.516
(0.188)
-0.501
(0.109)
-0.506
(0.076)
-0.500
(0.060)
-0.502
(0.047)
0.5
a ˆ1
0.431
(0.209)
0.480
(0.114)
0.489
(0.080)
0.493
(0.064)
0.494
(0.049)
1
ˆb1
1.016
(0.297)
0.999
(0.188)
1.005
(0.134)
1.000
(0.106)
1.002
(0.084)
0.5
dˆ
0.476
(0.432)
0.489
(0.268)
0.493
-0.95
a ˆ1
-0.940
(0.079)
-0.948
(0.021)
-0.948
2.5
ˆb1
2.493
(0.495)
2.528
(0.298)
2.514
-1
dˆ
-0.990
(0.459)
-0.995
(0.271)
-0.997
-0.5
a ˆ1
-0.496
(0.136)
-0.503
(0.075)
-0.499
4
ˆb1
4.024
(0.652)
4.013
(0.392)
4.002
N =500
N =1000
N =1500
(0.174)
0.506
(0.149)
0.499
(0.112)
(0.014)
-0.949
(0.010)
-0.950
(0.008)
(0.204)
2.497
(0.170)
2.500
(0.127)
(0.193)
-0.997
(0.158)
-1.000
(0.127)
(0.055)
-0.502
(0.044)
-0.501
(0.033)
(0.268)
3.997
(0.226)
4.006
(0.180)
AN US
Table 3: Maximum likelihood estimators and their standard errors (in parentheses) for model (5) with g(·) linear, F = FL , p = 1, q = 1 and different sample sizes N = 200, 500, 1000, 1500, 2500. Results are based on 1000 runs.
(14) is developed by partial likelihood theory, see [22, Ch.1] among others. The 260
asymptotic theory can be developed as in Section 4.
To illustrate the performance of our modeling, we took $x_t$ to be an AR(1) process with $\phi = 0.95$ or $\phi = 0.5$. We generated data by the model

$$\lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1} + c_1 x_t. \tag{15}$$

The first case ($\phi = 0.95$) corresponds to a highly persistent covariate process. The maximum likelihood estimators and their respective standard errors are displayed in Table 4 for different sets of true values. The results indicate that the estimators approach the true values satisfactorily, especially for large sample sizes.
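The data-generating mechanism in (15) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic link is used (so that $\lambda_t$ is the log-odds), the AR(1) innovations are taken to be standard normal, the starting values are set to zero, and a burn-in period is discarded. The function name `simulate_model_15` and its arguments are hypothetical and are not taken from the paper.

```python
import math
import random


def logistic_cdf(z):
    """Numerically stable logistic distribution function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_model_15(n, d, a1, b1, c1, phi, seed=0, burn_in=200):
    """Simulate (Y_t, x_t) with lambda_t = d + a1*lambda_{t-1} + b1*Y_{t-1} + c1*x_t.

    Assumptions (not from the paper): logistic link, N(0, 1) innovations
    for the AR(1) covariate, zero starting values, burn-in discarded.
    """
    rng = random.Random(seed)
    lam, y, x = 0.0, 0, 0.0
    ys, xs = [], []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)      # AR(1) covariate x_t
        lam = d + a1 * lam + b1 * y + c1 * x   # right side uses lambda_{t-1}, Y_{t-1}
        p = logistic_cdf(lam)                  # success probability p_t
        y = 1 if rng.random() <= p else 0      # Y_t = 1(U_t <= p_t)
        if t >= burn_in:
            ys.append(y)
            xs.append(x)
    return ys, xs


# One path with the first set of true values and a persistent covariate.
ys, xs = simulate_model_15(1000, d=0.5, a1=0.3, b1=-1.0, c1=1.5, phi=0.95, seed=1)
print(sum(ys) / len(ys))  # empirical frequency of ones
```

Repeating the simulation over many seeds and fitting the model to each path is what a Monte Carlo study of this kind amounts to; the fitting step is not shown here.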
Table 4: Maximum likelihood estimators and their standard errors (in parentheses) for model (15) for different sample sizes N = 200, 500, 1000, 1500, 2500. Results are based on 1000 runs. The upper two panels correspond to $\phi = 0.95$ and the bottom two panels correspond to $\phi = 0.5$ ($x_t = \phi x_{t-1} + \varepsilon_t$).

Par.         True     N = 200           N = 500           N = 1000          N = 1500          N = 2500
$\hat d$      0.5      0.624 (0.843)     0.543 (0.267)     0.525 (0.179)     0.514 (0.147)     0.511 (0.116)
$\hat a_1$    0.3      0.306 (0.178)     0.304 (0.106)     0.303 (0.070)     0.303 (0.056)     0.300 (0.043)
$\hat b_1$   -1       -1.176 (0.805)    -1.089 (0.468)    -1.048 (0.315)    -1.037 (0.255)    -1.023 (0.204)
$\hat c_1$    1.5      1.720 (2.504)     1.550 (0.227)     1.528 (0.154)     1.516 (0.124)     1.511 (0.096)

$\hat d$      0.2      0.288 (0.482)     0.224 (0.275)     0.208 (0.191)     0.208 (0.156)     0.208 (0.127)
$\hat a_1$   -0.3     -0.274 (0.186)    -0.297 (0.114)    -0.294 (0.087)    -0.298 (0.070)    -0.303 (0.055)
$\hat b_1$    2.2      2.109 (0.433)     2.183 (0.292)     2.184 (0.211)     2.195 (0.172)     2.203 (0.134)
$\hat c_1$    0.5      0.514 (0.143)     0.507 (0.086)     0.591 (0.062)     0.502 (0.049)     0.503 (0.040)

$\hat d$      0.5      0.545 (0.273)     0.524 (0.161)     0.506 (0.114)     0.506 (0.096)     0.501 (0.069)
$\hat a_1$    0.3      0.312 (0.124)     0.306 (0.076)     0.301 (0.053)     0.300 (0.042)     0.300 (0.033)
$\hat b_1$   -1       -1.102 (0.475)    -1.049 (0.284)    -1.015 (0.196)    -1.013 (0.164)    -1.002 (0.119)
$\hat c_1$    1.5      1.561 (0.259)     1.523 (0.152)     1.508 (0.106)     1.510 (0.087)     1.502 (0.069)

$\hat d$      0.2      0.222 (0.421)     0.208 (0.253)     0.206 (0.172)     0.211 (0.151)     0.205 (0.111)
$\hat a_1$   -0.3     -0.277 (0.177)    -0.292 (0.109)    -0.299 (0.084)    -0.302 (0.069)    -0.301 (0.054)
$\hat b_1$    2.2      2.156 (0.408)     2.200 (0.271)     2.198 (0.194)     2.201 (0.169)     2.200 (0.128)
$\hat c_1$    0.5      0.508 (0.195)     0.503 (0.121)     0.502 (0.088)     0.503 (0.070)     0.502 (0.053)

7. Data Analysis

We compare the performance of various models on real data. We apply linear models whose parametrization is based on either the logit or the probit link function. The feedback term $\kappa_{t-1}$ may also be dropped, yielding the no-feedback model, that is, the classical autoregressive model for binary data. We consider $F = F_L$, $\kappa_t = \lambda_t$ and $F = \Phi$, $\kappa_t = \pi_t$. We apply these models to six binary time series reported by [47, Ch.13]. These time series represent six thinly traded shares at the Johannesburg Stock Exchange for the period from 5 October 1987 to 3 June 1991 (910 days). These data are binary because, for each share, the presence (1) or absence (0) of trading is recorded throughout the period of observation. Note that during this time period the market was dominated by investors who had certain preferences towards shares, so there existed periods of no trading activity. The approach taken here identifies possible trading behavior (see also [47, Ch.13]).

Three of the six shares are from the coal sector and three from the diamonds sector. The coal shares are Amcoal (1), Vierfontein (2) and Wankie (3), and the diamond shares are Anamint (4), Broadacres (5) and Carrigs (6). Figure 1 displays the presence or absence of trading for the six shares. Figure 2 displays the autocorrelation functions (a.c.f.) of those six binary time series. We note that the a.c.f. for the Wankie and Amcoal data resembles the a.c.f. of a white noise sequence.

[Figure 1: Trading shares: 1 = "Amcoal", 2 = "Vierfontein", 3 = "Wankie", 4 = "Anamint", 5 = "Broadacres" and 6 = "Carrigs". The vertical lines represent presence of trading. Horizontal axis: days; vertical axis: share number.]

Initially, we apply only models (2) and (3) to each time series in order to compare the logit and probit link based models. The corresponding estimators are reported in Table 5, with their standard errors given in parentheses.
[Figure 2: The autocorrelation functions of the six trading shares. Panels: Amcoal, Vierfontein, Wankie, Anamint, Broadacres, Carrigs. Horizontal axis: Lag (0 to 30); vertical axis: ACF (0.0 to 1.0).]
Table 5: Maximum likelihood estimators and their standard errors (in parentheses) for the six trading shares after fitting models (2) and (3).

Share          Model    $\hat d$            $\hat a_1$          $\hat b_1$
Coal Shares
Amcoal         (2)      -0.042 (0.022)      0.967 (0.019)       0.102 (0.049)
               (3)       0.0003 (0.007)     0.9995 (0.005)     -0.0012 (0.013)
Vierfontein    (2)      -0.146 (0.034)      0.909 (0.023)       0.294 (0.068)
               (3)      -0.091 (0.021)      0.906 (0.023)       0.184 (0.042)
Wankie         (2)      -0.707 (0.358)      0.629 (0.194)       0.363 (0.206)
               (3)      -0.443 (0.197)      0.607 (0.180)       0.207 (0.116)
Diamond Shares
Anamint        (2)      -0.292 (0.059)      0.675 (0.065)       0.726 (0.122)
               (3)      -0.180 (0.036)      0.677 (0.064)       0.446 (0.075)
Broadacres     (2)      -0.214 (0.048)      0.875 (0.031)       0.375 (0.084)
               (3)      -0.133 (0.029)      0.873 (0.030)       0.231 (0.051)
Carrigs        (2)      -0.364 (0.056)      0.783 (0.035)       0.742 (0.109)
               (3)      -0.218 (0.033)      0.788 (0.034)       0.443 (0.065)
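To make the fitting step concrete, the following Python sketch evaluates the conditional log-likelihood of a first-order feedback model under either link. It assumes, as the parameters $d$, $a_1$, $b_1$ of Table 5 suggest, that the latent recursion has the linear form $\kappa_t = d + a_1 \kappa_{t-1} + b_1 Y_{t-1}$ with $p_t = F(\kappa_t)$. The function names, the starting value `kappa0` and the clipping of $p_t$ are illustrative choices and not part of the paper.

```python
import math


def logistic_cdf(z):
    """Numerically stable logistic distribution function F_L."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probit_cdf(z):
    """Standard normal distribution function Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(y, d, a1, b1, link="logit", kappa0=0.0):
    """Conditional log-likelihood of kappa_t = d + a1*kappa_{t-1} + b1*Y_{t-1}.

    `kappa0` is an arbitrary starting value (assumption); the first
    observation is only used as a regressor.
    """
    cdf = logistic_cdf if link == "logit" else probit_cdf
    kappa = kappa0
    total = 0.0
    for t in range(1, len(y)):
        kappa = d + a1 * kappa + b1 * y[t - 1]
        p = min(max(cdf(kappa), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y[t] * math.log(p) + (1 - y[t]) * math.log(1.0 - p)
    return total


y = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
print(log_likelihood(y, d=-0.2, a1=0.6, b1=0.7, link="logit"))
print(log_likelihood(y, d=-0.2, a1=0.6, b1=0.7, link="probit"))
```

Maximizing this function over $(d, a_1, b_1)$ with any general-purpose optimizer would give estimates of the kind reported in Table 5; the optimizer itself is left out of the sketch.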
We observe from Table 5 that the feedback coefficient $\hat a_1$ is estimated similarly under the logistic and the probit model. However, there are notable differences in the estimation of the coefficients $b_1$ and $d$. Furthermore, the signs of the estimators agree across the two models for all parameters, with the exception of Amcoal. The standard errors of the parameter estimators obtained by fitting model (3) are smaller than those obtained by fitting model (2). This fact was also observed in the simulation study. We compare all models based on (5), using $F = F_L$ or $F = \Phi$, with or without feedback, for different values of the order $p$. The comparison is performed in terms of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The results are displayed in Table 6. We observe that the best model is (5) with $F = \Phi$ and $p = 1$ (i.e. model (3)), because it minimizes both criteria, with the exception of the Wankie share. Model (5) with $F = F_L$ and $p = 1$ (i.e. model (2)) is overall the second best model, with values of AIC and BIC very close to those obtained by model (3). Note that the values of AIC and BIC for the models without feedback and of the same order are approximately the same regardless of the link function. In the case of feedback models with $p$ greater than one, the probit link yields smaller values.
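For reference, the two criteria can be computed as in the short Python sketch below. It assumes the usual definitions, AIC $= -2 l + 2k$ and BIC $= -2 l + k \log N$, where $l$ is the maximized log-likelihood and $k$ the number of parameters; the paper does not spell these out, and the effective sample size $N = 910$ used in the example is an assumption based on the length of the series.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return -2.0 * loglik + k * math.log(n)


# The gap BIC - AIC depends only on k and n: k * (log n - 2).
loglik = -531.595  # illustrative value only, chosen so that AIC = 1069.19
gap = bic(loglik, k=3, n=910) - aic(loglik, k=3)
print(round(aic(loglik, 3), 2), round(gap, 2))
```

With $k = 3$ (the parameters $d$, $a_1$, $b_1$) and $N = 910$ the gap is about 14.44, which agrees with the difference between the pair 1069.19, 1083.63 listed in Table 6. Such a check helps to identify how many parameters a given entry of the table corresponds to.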
Table 6: Values of the AIC and BIC for the six trading shares obtained after fitting different models. Each entry reports AIC, BIC.

Coal shares

Model                      Amcoal               Vierfontein          Wankie
$F = F_L$
(5), p = 1                 1075.02, 1089.46     1169.55, 1183.99     767.42, 781.86
(5), p = 2                 1085.93, 1105.18     1218.72, 1237.97     769.40, 788.65
(5), p = 3                 1086.00, 1110.06     1210.73, 1234.79     770.72, 794.79
no feedback, p = 1         1085.10, 1094.73     1224.87, 1234.49     769.33, 778.96
no feedback, p = 2         1085.99, 1100.44     1218.44, 1232.88     766.72, 781.16
no feedback, p = 3         1085.55, 1104.80     1199.38, 1218.64     765.77, 785.02
no feedback, p = 4         1084.47, 1108.54     1187.65, 1211.72     767.17, 791.24
$F = \Phi$
(5), p = 1                 1069.19, 1083.63     1168.90, 1183.34     767.32, 781.76
(5), p = 2                 1085.94, 1105.19     1212.36, 1231.62     769.02, 788.27
(5), p = 3                 1085.47, 1109.54     1199.56, 1223.63     770.33, 794.39
no feedback, p = 1         1085.10, 1094.73     1224.87, 1234.49     769.33, 778.96
no feedback, p = 2         1086.02, 1100.46     1218.43, 1232.87     766.71, 781.15
no feedback, p = 3         1085.45, 1104.71     1199.31, 1218.57     765.76, 785.01
no feedback, p = 4         1084.37, 1108.44     1187.54, 1211.61     767.20, 791.27

Diamond shares

Model                      Anamint              Broadacres           Carrigs
$F = F_L$
(5), p = 1                 1147.94, 1162.38     1123.42, 1137.86     1072.43, 1086.87
(5), p = 2                 1166.72, 1185.97     1164.78, 1184.04     1153.61, 1172.86
(5), p = 3                 1168.48, 1192.55     1163.19, 1187.26     1151.81, 1175.88
no feedback, p = 1         1167.43, 1177.06     1173.53, 1183.15     1143.29, 1152.92
no feedback, p = 2         1157.29, 1171.73     1162.70, 1177.14     1118.93, 1133.37
no feedback, p = 3         1153.94, 1173.20     1153.71, 1172.96     1095.39, 1114.65
no feedback, p = 4         1150.01, 1174.08     1143.62, 1167.68     1090.04, 1114.11
$F = \Phi$
(5), p = 1                 1147.00, 1161.44     1121.91, 1136.35     1071.49, 1085.93
(5), p = 2                 1158.64, 1177.89     1158.72, 1177.97     1126.15, 1145.41
(5), p = 3                 1158.23, 1182.29     1154.58, 1178.65     1115.80, 1139.87
no feedback, p = 1         1167.43, 1177.06     1173.53, 1183.15     1143.29, 1152.92
no feedback, p = 2         1157.21, 1171.65     1162.68, 1177.12     1119.06, 1133.50
no feedback, p = 3         1153.60, 1172.86     1153.58, 1172.84     1095.65, 1114.90
no feedback, p = 4         1149.75, 1173.82     1143.55, 1167.62     1090.26, 1114.33
Acknowledgements

The authors thank the Editor, Associate Editor and two reviewers for several useful comments that improved the presentation considerably. This work has been carried out while the first author was visiting the Department of Mathematics and Statistics, University of Cyprus. He would like to thank all the members of the Department for their warm hospitality.
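Appendix

Proof of Theorem 1. The first step is to show that there exists a weakly dependent strictly stationary process $\{X_t = (Y_t, \kappa_t),\ t \in \mathbb{Z}\}$, which belongs to $L^1$. We need to verify condition 3.1 of [16]. Condition 3.2 in the same paper is assumed, while condition 3.3 trivially holds for this case. We can write $X_t$ as follows

$$
\begin{aligned}
X_t &= (Y_t, \kappa_t) = (1(U_t \le p_t), \kappa_t) = (1(U_t \le F(\kappa_t)), \kappa_t) \\
    &= H(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}, \kappa_{t-1}, \kappa_{t-2}, \ldots, \kappa_{t-q}; U_t) = H(x; U_t),
\end{aligned}
$$

where $\kappa_t = g(\kappa_{t-1}, \ldots, \kappa_{t-q}, Y_{t-1}, \ldots, Y_{t-p}) = g(x)$ and $U_t$ is a sequence of uniform random variables on $(0, 1)$. For a vector $x = (y, \kappa) \in \{0, 1\} \times \mathbb{R}$ and any $\varepsilon > 0$ we define the norm $\|x\|_\varepsilon = |y| + \varepsilon |\kappa|$. Then

$$
\begin{aligned}
E\|H(x; U_t) - H(x'; U_t)\|_\varepsilon
 &= E\big( |1(U_t \le F(g(x))) - 1(U_t \le F(g(x')))| + \varepsilon |g(x) - g(x')| \big) \\
 &= |F(g(x)) - F(g(x'))| + \varepsilon |g(x) - g(x')| \\
 &\le \sup_{\omega} |F'(\omega)| \cdot |g(x) - g(x')| + \varepsilon |g(x) - g(x')| \\
 &= (K + \varepsilon)\, |g(x) - g(x')| \\
 &\le (K + \varepsilon) \Big[ \sum_{i=1}^{p} \beta_i \, |Y_{t-i} - Y'_{t-i}| + \sum_{i=1}^{q} \gamma_i \, |\kappa_{t-i} - \kappa'_{t-i}| \Big] \\
 &\le (K + \varepsilon) \Big[ \sum_{i=1}^{\min(p,q)} (\beta_i + \gamma_i/\varepsilon)\, \|X_{t-i} - X'_{t-i}\|_\varepsilon
      + \sum_{i=\min(p,q)+1}^{\max(p,q)} \delta_i \, \|X_{t-i} - X'_{t-i}\|_\varepsilon \Big],
\end{aligned}
$$

where

$$
\delta_i = \begin{cases} \beta_i, & \text{if } p > q, \\ \gamma_i/\varepsilon, & \text{otherwise.} \end{cases}
$$

The last inequality uses $|Y_{t-i} - Y'_{t-i}| \le \|X_{t-i} - X'_{t-i}\|_\varepsilon$ and $|\kappa_{t-i} - \kappa'_{t-i}| \le \|X_{t-i} - X'_{t-i}\|_\varepsilon / \varepsilon$. Then the sum of the coefficients is

$$
a = (K + \varepsilon) \Big[ \sum_{i=1}^{\min(p,q)} (\beta_i + \gamma_i/\varepsilon) + \sum_{i=\min(p,q)+1}^{\max(p,q)} \delta_i \Big]
  = (K + \varepsilon) \Big( \sum_{i=1}^{p} \beta_i + \sum_{i=1}^{q} \gamma_i/\varepsilon \Big).
$$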
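We need to assume that $a < 1$ (see [16]). If $K < 1$, we choose $\varepsilon = \big[K \sum_{i=1}^{q} \gamma_i\big] / \big[(1 - K) \sum_{i=1}^{p} \beta_i\big]$. Then $a < 1$ yields $\big(\sum_{i=1}^{q} \gamma_i\big)/(1 - K) + \sum_{i=1}^{p} \beta_i < 1$ and condition 3.1 is satisfied. If $K \ge 1$, we choose $\varepsilon = \big(K \sum_{i=1}^{q} \gamma_i\big) / \sum_{i=1}^{p} \beta_i$. Then $a < 1$ yields $(K + 1)\big(\sum_{i=1}^{q} \gamma_i + \sum_{i=1}^{p} \beta_i\big) < 1$.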
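The two closed forms for $a$ can be checked numerically. The Python sketch below evaluates $a = (K + \varepsilon)\big(\sum_i \beta_i + \sum_i \gamma_i / \varepsilon\big)$ at the two choices of $\varepsilon$ given in the proof. The function names and the numerical values are illustrative; $K = 1/4$ is the maximum of the standard logistic density, while $K = 1.5$ is an arbitrary value used only to exercise the second case.

```python
def contraction_coefficient(K, eps, betas, gammas):
    """a = (K + eps) * (sum(beta_i) + sum(gamma_i) / eps)."""
    return (K + eps) * (sum(betas) + sum(gammas) / eps)


def epsilon_choice(K, betas, gammas):
    """The value of eps used in the proof for K < 1 and for K >= 1."""
    B, G = sum(betas), sum(gammas)
    if K < 1:
        return K * G / ((1.0 - K) * B)
    return K * G / B


# Case K < 1: a reduces to sum(gamma) / (1 - K) + sum(beta).
K, betas, gammas = 0.25, [0.2, 0.1], [0.3, 0.1]
a_small = contraction_coefficient(K, epsilon_choice(K, betas, gammas), betas, gammas)
print(a_small, sum(gammas) / (1.0 - K) + sum(betas))

# Case K >= 1: a reduces to (K + 1) * (sum(gamma) + sum(beta)).
K, betas, gammas = 1.5, [0.1], [0.1]
a_large = contraction_coefficient(K, epsilon_choice(K, betas, gammas), betas, gammas)
print(a_large, (K + 1.0) * (sum(gammas) + sum(betas)))
```

In both cases the two printed numbers agree up to rounding, and the parameter values are admissible whenever the resulting $a$ is below one.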
This concludes the first part of the proof. For the second part, since $|Y_t| \le 1$, we need to show that $E|\kappa_t|^s < \infty$. From (6), we have

$$
|g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}, \ldots, \kappa_{t-q}) - g(0)| \le \sum_{i=1}^{q} \gamma_i \, |\kappa_{t-i}| + \sum_{i=1}^{p} \beta_i \, |Y_{t-i}|.
$$

We have already proved that $E|\kappa_t| < \infty$ (since $(Y_t, \kappa_t) \in L^1$). We use induction. From the above relation we get that

$$
|\kappa_t| \le |g(0)| + \sum_{i=1}^{q} \gamma_i \, |\kappa_{t-i}| + \sum_{i=1}^{p} \beta_i.
$$

Let $c = |g(0)| + \sum_{i=1}^{p} \beta_i$. Then

$$
\begin{aligned}
|\kappa_t|^s &\le \Big( c + \sum_{i=1}^{q} \gamma_i \, |\kappa_{t-i}| \Big)^s
  = \sum_{n=0}^{s} \binom{s}{n} \Big[ \sum_{i=1}^{q} \gamma_i \, |\kappa_{t-i}| \Big]^n c^{s-n} \\
 &= \Big[ \sum_{i=1}^{q} \gamma_i \, |\kappa_{t-i}| \Big]^s + R_{s-1}
  \le \sum_{i=1}^{q} \gamma_i \, |\kappa_{t-i}|^s + R_{s-1},
\end{aligned}
$$

due to convexity, where $R_{s-1}$ is a polynomial of order $(s - 1)$. Hence,

$$
E|\kappa_t|^s \le \sum_{i=1}^{q} \gamma_i \, E|\kappa_{t-i}|^s + C,
$$

and the desired result follows under the condition $\sum_{i=1}^{q} \gamma_i < 1$, already implied in Theorem 1.
In the case $p = q = 1$, we could place a different condition. We have that

$$
E\|H(x; U_t) - H(x'; U_t)\|_\varepsilon \le (K + \varepsilon) \max(\beta_1, \gamma_1/\varepsilon)\, \|X_{t-1} - X'_{t-1}\|_\varepsilon
 = (\gamma_1 + \beta_1 K)\, \|X_{t-1} - X'_{t-1}\|_\varepsilon,
$$

where $\varepsilon = \gamma_1/\beta_1$. Hence the condition becomes $\gamma_1 + K\beta_1 < 1$. If $K < 1$, it is implied by $\gamma_1 + \beta_1 < 1$, as in Table 1.
Lemma A-1. Under the Assumptions of Theorem 1 and Assumptions 1-4, we have the following results, as $N \to \infty$.

(i) The score function defined in (11) satisfies

$$\frac{1}{\sqrt{N}} S_N(\theta_0) \xrightarrow{D} N(0, G(\theta_0)),$$

where $G(\theta)$ is a positive definite matrix, defined in (12).

(ii) The Hessian matrix defined in (13) satisfies

$$\frac{1}{N} H_N(\theta_0) \xrightarrow{p} G(\theta_0).$$

(iii) Within the neighborhood of the true value, $O(\theta_0) = \{\theta : \|\theta - \theta_0\| \le r/\sqrt{N}\}$, $r > 0$,

$$\max_{i,j,k} \sup_{\theta \in O(\theta_0)} \left| \frac{1}{N} \sum_{t=1}^{N} \frac{\partial^3 l_t(\theta)}{\partial\theta_i \partial\theta_j \partial\theta_k} \right| \le K_N,$$

such that $K_N \xrightarrow{p} K$, where $K$ is a constant.
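Proof of Lemma A-1. Consider model (5).

(i) In order to apply the CLT for martingales, we show that $\{\partial l_t(\theta)/\partial\theta\}_{t \in \mathbb{N}}$, where

$$\frac{\partial l_t(\theta)}{\partial\theta} = \frac{f(\kappa_t(\theta))}{p_t(\theta)(1 - p_t(\theta))} \, (Y_t - p_t(\theta)) \, \frac{\partial \kappa_t(\theta)}{\partial\theta},$$

is a sequence of square integrable martingale differences. At the true value $\theta = \theta_0$, we have $E(Y_t - p_t(\theta) \mid \mathcal{F}_{t-1}) = 0$. We need to show that $E|\partial l_t(\theta)/\partial\theta| < \infty$ or equivalently $E|\partial \kappa_t(\theta)/\partial\theta| < \infty$. We can write

$$
\begin{aligned}
\frac{\partial \kappa_t(\theta)}{\partial\theta_i}
 &= \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial \kappa_{t-1}(\theta)} \, \frac{\partial \kappa_{t-1}(\theta)}{\partial\theta_i} + \ldots \\
 &\quad + \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial \kappa_{t-q}(\theta)} \, \frac{\partial \kappa_{t-q}(\theta)}{\partial\theta_i} \\
 &\quad + \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial\theta_i}, \qquad i = 1, 2, \ldots, s.
\end{aligned}
\tag{A-1}
$$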
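Taking in (6) two arguments that coincide in every coordinate except the one corresponding to $\kappa_{t-i}$, that is, setting

$$
(Y'_{t-1}, \ldots, Y'_{t-p}, \kappa'_{t-1}, \ldots, \kappa'_{t-(i-1)}, \kappa'_{t-(i+1)}, \ldots, \kappa'_{t-q})
 = (Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}, \ldots, \kappa_{t-(i-1)}, \kappa_{t-(i+1)}, \ldots, \kappa_{t-q}),
$$

we find that

$$\left| \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial \kappa_{t-i}(\theta)} \right| \le \gamma_i.$$

Then, (A-1) gives

$$\left| \frac{\partial \kappa_t(\theta)}{\partial\theta_i} \right| \le \gamma_1 \left| \frac{\partial \kappa_{t-1}(\theta)}{\partial\theta_i} \right| + \ldots + \gamma_q \left| \frac{\partial \kappa_{t-q}(\theta)}{\partial\theta_i} \right| + |C_t|, \qquad i = 1, 2, \ldots, s,$$

where $|C_t|$ is bounded by Assumption 3. By repeated substitution, we derive that $E|\partial \kappa_t(\theta)/\partial\theta| < \infty$, because $\sum_{i=1}^{q} \gamma_i < 1$. Following [23], we prove the desired result. Parts (ii) and (iii) are also proved in the spirit of the proof of a similar result in [23].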
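The repeated-substitution argument can be illustrated numerically. The Python sketch below iterates the worst case of the recursion, $d_t = \sum_{i=1}^{q} \gamma_i d_{t-i} + C$, with a constant bound $C$ in place of $|C_t|$, and compares it with the fixed point $C / (1 - \sum_i \gamma_i)$. The function name and the values of $\gamma_i$ and $C$ are illustrative assumptions, not quantities from the paper.

```python
def iterate_bound(gammas, c, steps, start=0.0):
    """Iterate d_t = c + sum_i gammas[i] * d_{t-1-i}, starting from `start`."""
    q = len(gammas)
    history = [start] * q
    for _ in range(steps):
        # history[-1] is d_{t-1}, history[-2] is d_{t-2}, and so on.
        nxt = c + sum(g * history[-(i + 1)] for i, g in enumerate(gammas))
        history.append(nxt)
    return history[q:]


gammas, c = [0.4, 0.3], 1.0
values = iterate_bound(gammas, c, steps=200)
limit = c / (1.0 - sum(gammas))
print(values[-1], limit)  # the iterates stay below, and approach, the fixed point
```

When $\sum_i \gamma_i < 1$ the iterates remain bounded by the fixed point, which is the mechanism behind the finiteness of $E|\partial\kappa_t(\theta)/\partial\theta|$; if the sum were at least one, the same recursion would grow without bound.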
Lemma A-2. If Assumptions 1 and 4 hold and the conditions of Theorem 1 are satisfied, then for model (5) it holds that

$$\sup_{\theta \in \Theta} \left| \frac{1}{N} l_N(\theta) - \frac{1}{N} \tilde{l}_N(\theta) \right| \to 0, \quad \text{a.s., as } N \to \infty, \tag{A-2}$$

where $\tilde{l}_N(\theta)$ denotes (10) evaluated at some starting value $(\kappa_{1-q}, \ldots, \kappa_0)$.

Proof of Lemma A-2. We need to show that

$$\lim_{N \to \infty} \sup_{\theta \in \Theta} \frac{1}{N} \left| \sum_{t=1}^{N} l_t(\theta) - \sum_{t=1}^{N} \tilde{l}_t(\theta) \right| = 0, \quad \text{a.s.},$$

where $\tilde{l}_t(\theta)$ is the $t$th log-likelihood component obtained by setting the starting value to $(\kappa_{1-q}, \ldots, \kappa_0)$.
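We have that

$$
\begin{aligned}
\left| \sum_{t=1}^{N} l_t(\theta) - \sum_{t=1}^{N} \tilde{l}_t(\theta) \right|
 &= \left| \sum_{t=1}^{N} \big( l_t(\theta) - \tilde{l}_t(\theta) \big) \right|
  \le \sum_{t=1}^{N} |l_t(\theta) - \tilde{l}_t(\theta)| \\
 &= \sum_{t=1}^{N} \Big| Y_t \log F(\kappa_t(\theta)) + (1 - Y_t) \log(1 - F(\kappa_t(\theta))) \\
 &\qquad\quad - \big[ Y_t \log F(\tilde{\kappa}_t(\theta)) + (1 - Y_t) \log(1 - F(\tilde{\kappa}_t(\theta))) \big] \Big| \\
 &\le \sum_{t=1}^{N} \Big( \big| \log F(\kappa_t(\theta)) - \log F(\tilde{\kappa}_t(\theta)) \big|
   + \big| \log(1 - F(\kappa_t(\theta))) - \log(1 - F(\tilde{\kappa}_t(\theta))) \big| \Big) \\
 &= \sum_{t=1}^{N} \big( |A_{1t}| + |A_{2t}| \big),
\end{aligned}
\tag{A-3}
$$

since $Y_t \in \{0, 1\}$, where $A_{1t}$ and $A_{2t}$ denote the two differences of logarithms in the preceding line.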
But $p_t = F(\kappa_t(\theta))$ and therefore Assumption 4 shows that $p_t$ is bounded away from 0 and 1. Suppose that $p_t \in I \subset (0, 1)$. Then also $F(\kappa_t(\theta)) \in I$ and therefore $\kappa_t(\theta) \in \mathbb{R}$ is bounded away from $\pm\infty$. Recall that an everywhere differentiable function $h : \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous if and only if it has bounded first derivative. Hence, $F(\cdot)$ is Lipschitz continuous by Assumption 4. Then,

$$|A_{1t}| \le K_1 \, |F(\kappa_t(\theta)) - F(\tilde{\kappa}_t(\theta))| \le K_1 K \, |\kappa_t(\theta) - \tilde{\kappa}_t(\theta)|, \tag{A-4}$$

where the first inequality holds since the function $h(x) = \log(x) : I \to \mathbb{R}$ has bounded first derivative. The second inequality holds since $f(x)$ is assumed to be bounded above by a finite positive number $K$. Similarly, we find that

$$|A_{2t}| \le K_2 \, |F(\kappa_t(\theta)) - F(\tilde{\kappa}_t(\theta))| \le K_2 K \, |\kappa_t(\theta) - \tilde{\kappa}_t(\theta)|, \tag{A-5}$$

where the first inequality holds since the function $\log(1 - x) : I \to \mathbb{R}$ has bounded first derivative. Applying (6) recursively we have

$$|\kappa_t(\theta) - \tilde{\kappa}_t(\theta)| \le \sum_{j=1}^{q} \gamma_j \, |\kappa_{t-j}(\theta) - \tilde{\kappa}_{t-j}(\theta)| < \gamma^t \cdot K', \tag{A-6}$$

by repeated substitution, where $0 < \gamma < 1$ and $K'$ is a finite positive constant. Hence, from the compactness of $\Theta$ (see Assumption 1), we can write

$$\sup_{\theta \in \Theta} |\kappa_t(\theta) - \tilde{\kappa}_t(\theta)| \le \gamma^t \cdot K'.$$

From equations (A-3), (A-4), (A-5) and (A-6) we obtain that

$$\frac{1}{N} \, |l_N(\theta) - \tilde{l}_N(\theta)| \le (K_1 + K_2) \, \frac{K' K}{N} \sum_{t=1}^{N} \gamma^t.$$

Since $\sum_{t=1}^{N} \gamma^t \le \gamma/(1 - \gamma)$, the right-hand side tends to zero as $N \to \infty$. The rest of the proof is similar to [23].
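The geometric decay in (A-6) is easy to see in the linear first-order case. The Python sketch below assumes the recursion $\kappa_t = d + a_1 \kappa_{t-1} + b_1 Y_{t-1}$ (an illustrative special case, with $\gamma = |a_1|$) and compares two paths driven by the same data but started from different values. The function name `kappa_path` and all numerical values are hypothetical.

```python
def kappa_path(y, d, a1, b1, kappa0):
    """Latent path kappa_1, ..., kappa_{n-1} of the linear recursion from kappa0."""
    kappa = kappa0
    path = []
    for t in range(1, len(y)):
        kappa = d + a1 * kappa + b1 * y[t - 1]
        path.append(kappa)
    return path


y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
base = kappa_path(y, d=-0.2, a1=0.6, b1=0.9, kappa0=0.0)
shifted = kappa_path(y, d=-0.2, a1=0.6, b1=0.9, kappa0=5.0)

# The gap after t steps is |a1|**t times the initial gap of 5.
for t, (u, v) in enumerate(zip(base, shifted), start=1):
    print(t, abs(u - v), 5.0 * 0.6 ** t)
```

Because the data-driven terms cancel in the difference, only the starting value matters and its influence shrinks by the factor $|a_1|$ at every step. This is why the averaged log-likelihoods computed from different starting values become indistinguishable for large $N$.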
References

[1] B. Kedem, Binary Time Series, Marcel Dekker, New York, 1980.
[2] R. D. Stern, R. Coe, A model fitting analysis of daily rainfall data, Journal of the Royal Statistical Society. Series A (General) 147 (1984) 1–34.
[3] L. Fahrmeir, H. Kaufmann, Regression models for nonstationary categorical time series, Journal of Time Series Analysis 8 (1987) 147–160.
[4] E. V. Slud, B. Kedem, Partial likelihood analysis of logistic regression and autoregression, Statist. Sinica 4 (1994) 89–106.
[5] W. Breen, L. R. Glosten, R. Jagannathan, Economic significance of predictable variations in stock index returns, The Journal of Finance 44 (1989) 1177–1189.
[6] K. C. Butler, S. Malaikah, Efficiency and inefficiency in thinly traded stock markets: Kuwait and Saudi Arabia, Journal of Banking & Finance 16 (1992) 197–210.
[7] P. F. Christoffersen, F. X. Diebold, Financial asset returns, direction-of-change forecasting, and volatility dynamics, Management Science 52 (2006) 1273–1287. doi:10.1287/mnsc.1060.0520.
[8] P. Christoffersen, F. X. Diebold, R. S. Mariano, A. Tay, Y. K. Tse, Direction-of-change forecasts for Asian equity markets based on conditional variance, skewness and kurtosis dynamics: Evidence from Hong Kong and Singapore, Journal of Financial Forecasting 1 (2007) 1–22.
[9] R. Startz, Binomial autoregressive moving average models with an application to US recessions, Journal of Business & Economic Statistics 26 (2008) 1–8.
[10] H. Nyberg, Dynamic probit models and financial variables in recession forecasting, Journal of Forecasting 29 (2010) 215–230.
[11] H. Nyberg, Forecasting the direction of the US stock market with dynamic binary probit models, International Journal of Forecasting 27 (2011) 561–578.
[12] H. Nyberg, Predicting bear and bull stock markets with dynamic binary time series models, Journal of Banking & Finance 37 (2013) 3351–3363.
[13] H. Kauppi, Predicting the direction of the Fed's target rate, Journal of Forecasting 31 (2012) 47–67.
[14] R. Wu, Y. Cui, A parameter-driven logit regression model for binary time series, Journal of Time Series Analysis.
[15] J. Dedecker, P. Doukhan, G. Lang, J. R. León R., S. Louhichi, C. Prieur, Weak Dependence: With Examples and Applications, Vol. 190 of Lecture Notes in Statistics, Springer, New York, 2007.
[16] P. Doukhan, O. Wintenberger, Weakly dependent chains with infinite memory, Stochastic Process. Appl. 118 (2008) 1997–2013. doi:10.1016/j.spa.2007.12.004.
[17] R. M. de Jong, T. Woutersen, Dynamic time series binary choice, Econometric Theory 27 (2011) 673–702.
[18] D. R. Cox, E. J. Snell, Analysis of Binary Data, 2nd Edition, Vol. 32 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1989.
[19] A. Agresti, Categorical Data Analysis, 2nd Edition, Wiley, New York, 2002. doi:10.1002/0471249688.
[20] P. McCullagh, J. A. Nelder, Generalized Linear Models, 2nd Edition, Chapman & Hall, London, 1989.
[21] P. J. Brockwell, R. A. Davis, Time Series: Theory and Methods, 2nd Edition, Springer-Verlag, New York, 1991.
[22] B. Kedem, K. Fokianos, Regression Models for Time Series Analysis, Hoboken, NJ, 2002. doi:10.1002/0471266981.
[23] T. Moysiadis, K. Fokianos, On binary and categorical time series models with feedback, Journal of Multivariate Analysis 131 (2014) 209–228.
[24] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31 (1986) 307–327. doi:10.1016/0304-4076(86)90063-1.
[25] J. R. Russell, R. F. Engle, Econometric analysis of discrete-valued irregularly-spaced financial transactions data using a new autoregressive conditional multinomial model, SSRN eLibrary, doi:10.2139/ssrn.106528.
[26] J. R. Russell, R. F. Engle, A discrete-state continuous-time model of financial transactions prices and times, Journal of Business and Economic Statistics 23 (2005) 166–180. doi:10.1198/073500104000000541.
[27] T. H. Rydberg, N. Shephard, Dynamics of trade-by-trade price movements: decomposition and models, Journal of Financial Econometrics 1 (2003) 2–25.
[28] D. Tjøstheim, Rejoinder on: Some recent theory for autoregressive count time series, TEST 21 (2012) 469–476.
[29] K. Fokianos, D. Tjøstheim, Log-linear Poisson autoregression, J. Multivariate Anal. 102 (2011) 563–578. doi:10.1016/j.jmva.2010.11.002.
[30] D. R. Cox, Statistical analysis of time series: some recent developments, Scand. J. Statist. 8 (1981) 93–115.
[31] S. L. Zeger, B. Qaqish, Markov regression models for time series: a quasi-likelihood approach, Biometrics 44 (1988) 1019–1031. doi:10.2307/2531732.
[32] H. Kauppi, P. Saikkonen, Predicting US recessions with dynamic binary response models, The Review of Economics and Statistics 90 (2008) 777–791.
[33] H. Kauppi, Yield-curve based probit models for forecasting US recessions: stability and dynamics, Tech. rep., Aboa Centre for Economics (2008).
[34] H. Nyberg, Studies on binary time series models with applications to empirical macroeconomics and finance, Ph.D. thesis, University of Helsinki (2010).
[35] K. Fokianos, Count time series models, in: T. S. Rao, S. S. Rao, C. R. Rao (Eds.), Handbook of Statistics: Time Series Analysis–Methods and Applications, Vol. 30, Elsevier B. V., Amsterdam, 2012, pp. 315–347.
[36] P. Doukhan, S. Louhichi, A new weak dependence condition and applications to moment inequalities, Stochastic Process. Appl. 84 (1999) 313–342. doi:10.1016/S0304-4149(99)00055-1.
[37] P. Doukhan, K. Fokianos, D. Tjøstheim, On weak dependence conditions for Poisson autoregressions, Statist. Probab. Lett. 82 (2012) 942–948. doi:10.1016/j.spl.2012.01.015.
[38] K. Fokianos, A. Rahbek, D. Tjøstheim, Poisson autoregression, J. Amer. Statist. Assoc. 104 (2009) 1430–1439, with electronic supplementary materials available online. doi:10.1198/jasa.2009.tm08270.
[39] M. Neumann, Absolute regularity and ergodicity of Poisson count processes, Bernoulli 17 (2011) 1268–1284.
[40] K. Fokianos, D. Tjøstheim, Nonlinear Poisson autoregression, Ann. Inst. Statist. Math. 64 (2012) 1205–1225. doi:10.1007/s10463-012-0351-3.
[41] B. M. Pötscher, I. R. Prucha, Dynamic Nonlinear Econometric Models: Asymptotic Theory, Springer-Verlag, Berlin, 1997.
[42] D. W. K. Andrews, Nonstrong mixing autoregressive processes, J. Appl. Probab. 21 (1984) 930–934.
[43] I. Berkes, L. Horváth, P. Kokoszka, GARCH processes: structure and estimation, Bernoulli 9 (2003) 201–227. doi:10.3150/bj/1068128975.
[44] C. Francq, J.-M. Zakoian, GARCH Models: Structure, Statistical Inference and Financial Applications, John Wiley & Sons, UK, 2011.
[45] M. Meitz, P. Saikkonen, Ergodicity, mixing and existence of moments of a class of Markov models with applications to GARCH and ACD models, Econometric Theory 24 (2008) 1291–1320.
[46] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org (2013).
[47] W. Zucchini, I. L. MacDonald, Hidden Markov Models for Time Series: An Introduction Using R, CRC Press, Boca Raton, FL, 2009.