Binary time series models driven by a latent process


Accepted Manuscript

Binary Time Series Models Driven by a Latent Process

Konstantinos Fokianos, Theodoros Moysiadis

PII: S2452-3062(17)30009-6
DOI: 10.1016/j.ecosta.2017.02.001
Reference: ECOSTA 44

To appear in: Econometrics and Statistics

Received date: 20 February 2016
Revised date: 19 October 2016
Accepted date: 7 February 2017

Please cite this article as: Konstantinos Fokianos, Theodoros Moysiadis, Binary Time Series Models Driven by a Latent Process, Econometrics and Statistics (2017), doi: 10.1016/j.ecosta.2017.02.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Binary Time Series Models Driven by a Latent Process

Konstantinos Fokianos, Theodoros Moysiadis
University of Cyprus, Department of Mathematics & Statistics, PO BOX 20537, Nicosia 1678, Cyprus

Abstract

The problems of ergodicity, stationarity and maximum likelihood estimation are studied for binary time series models that include a latent process. General models are considered, covered by different specifications of a link function. Maximum likelihood estimation is discussed and it is shown that the MLE satisfies standard asymptotic theory. The logistic and probit models, routinely employed for the analysis of binary time series data, are of special importance in this study. The results are applied to simulated and real data.

Keywords: autocorrelation, generalized linear models, logistic model, probit model, regression, weak dependence.

2010 MSC: 62M10, 62J12, 62F12, 62M20, 62M09

Preprint submitted to Journal of Econometrics and Statistics, February 20, 2017

1. Introduction

Figure 1 displays the trading activity of six thinly traded shares at the Johannesburg Stock Exchange from 5 October 1987 to 3 June 1991. These data are binary because, for each share, the presence (1) or the absence (0) of trading was recorded. We analyze these data further in Section 7, but we point out that modeling presence/absence is of interest for identifying trading patterns, at least in this particular application.

The goal of this article is to study properties of regression-based models for the analysis of binary time series; see [1] for an early treatment. Regression modeling, in this context, has been studied by [2], [3], and [4], among others. Such data have become increasingly popular in various financial applications ([5], [6], [7], [8], [9], [10, 11, 12], [13] and [14]), but also in other scientific fields. We deliver ergodicity and stationarity conditions for binary time series models, employing the concept of weak dependence (e.g. [15], [16]). Some previous work in this direction was reported by [17], but their point of view differs from the approach taken here.

The desired conditions are established by considering binary time series models which are driven by a latent process and are specified by means of a general link function. We are especially interested in the logistic and probit models; these are the most popular models for analyzing binary data ([18], [19]). It is easy to see that both of these models fall within the framework of generalized linear models (GLM) as developed by [20], and they can be applied in a straightforward way by using existing software tools. GLM are natural extensions of the ordinary AR models (see [21], for instance), in the sense that the response distribution (in the case of binary data, the Bernoulli distribution) belongs to the exponential family. Moreover, it is assumed that a "nice" function of the conditional mean of the response, called the link function, is linearly related to its lagged values and/or a latent process. For the logistic and probit models, the link function is the inverse cumulative distribution function (c.d.f.) of the logistic and normal distributions, respectively. However, the presentation is not restricted to those models; we outline a general theory which covers other link functions.

Regression models for binary time series have been discussed previously by [22, Ch.2] and the references therein. It was explicitly shown that the combination of likelihood inference and generalized linear models provides a systematic framework for the analysis of quantitative as well as qualitative time series data. Indeed, estimation, diagnostics, model assessment, and forecasting are implemented in a straightforward manner, while the computation is carried out by a number of existing software packages. Experience shows that both positive and negative association can be taken into account by a suitable parametrization of the model.

Our novel contribution consists of introducing a feedback process which enriches the dynamics of the observed process. This approach parallels GARCH modeling, whereby the volatility is regressed on its past values and the lagged squared returns. For binary time series, a GARCH-type model is specified in terms of the success probabilities, which are regressed on their past values and lagged responses. In fact, the main contribution of this article is the generalization of the recent results obtained by [23] to a broader class of binary time series models; probit and logistic models are special cases in our framework.

In Section 2 we give several examples of the type of models we study. Section 3 introduces a general model (eq. (5)), which can be employed for the analysis of binary time series. This section contains results about stationarity and ergodicity of (5). Section 4 examines maximum likelihood estimation for the parameter vector. Section 5 reports a limited simulation study and Section 6 shows that inclusion of covariates in (5) can be easily accomplished within the GLM framework. This is an appealing feature of our approach. Section 7 illustrates an example of a real data analysis. The paper concludes with a discussion and an Appendix with some theoretical results.

2. Modeling of Binary Time Series

Let $\{Y_t\}_{t \in \mathbb{Z}}$ denote a binary time series. Consider an increasing sequence of σ-fields, say $\{\mathcal{F}_t\}_{t \ge 1}$, which will be specified in detail later. Denote the conditional probability of success given $\mathcal{F}_{t-1}$ by

$$p_t = P(Y_t = 1 \mid \mathcal{F}_{t-1}) = E(Y_t \mid \mathcal{F}_{t-1}), \quad t \in \mathbb{Z}.$$

In this section we focus on developing and discussing autoregressive models for binary time series, which might include a feedback mechanism or a latent process, as we explain below. For instance, consider the following simple linear model

$$p_t = d + a_1 p_{t-1} + b_1 Y_{t-1}, \quad t \in \mathbb{Z}, \tag{1}$$

where $d, a_1, b_1 \in \mathbb{R}$. The above model is quite analogous to the ordinary GARCH model (see [24]). Model (1) was studied by [25, 26] for modeling the joint distribution of discrete price changes and their duration. In this work, we are interested in the price change process only. Clearly (1) is driven by a latent process. However, it also imposes restrictions on $a_1$ and $b_1$ because the transition probabilities $\{p_t\}$ have to belong to the interval (0, 1). In addition, the linear model (1) introduces further constraints on the regression parameters when any covariates are included. The complexity of the problem increases considerably for a model which entertains more lagged regressors.

[22, Sec.2.1] show that the logistic model is a natural choice for analyzing binary time series. The logistic model is defined by

$$\lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1}, \quad t \in \mathbb{Z}, \tag{2}$$

for some real unknown parameters $d, a_1, b_1$, where

$$\lambda_t = \log\left(\frac{p_t}{1 - p_t}\right)$$

is the inverse logistic c.d.f. evaluated at $p_t$. It is obvious that (2) falls within the framework of generalized linear models; in fact, the logistic model corresponds to the canonical link function model in GLM terminology. Additionally, (2) does not impose complicated restrictions on the parameter space, while it allows inclusion of covariates in a straightforward manner; see also [27] and [28, p.471].

Recalling that the logistic link corresponds to the canonical link of the Bernoulli distribution, we note that model (2) parallels the log-linear model proposed by [29] for count time series modeling. Model (2) has been studied in [23], where it was shown that the constraint $4|a_1| + |b_1| < 4$ suffices for the process $(Y_t, \lambda_t)$ to be ergodic and stationary. In this case, the values of $a_1$ and $b_1$ are allowed to belong to a larger set of values compared to the typical stationarity region of ordinary ARMA models.
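The recursion in (2) can be simulated directly. The following is a minimal sketch, not the authors' implementation (their computations were done in R): it assumes the start values `lam0 = 0` and `y0 = 0`, draws $Y_t$ as the indicator that a standard uniform variable falls below $p_t$, and discards a burn-in of 500 observations, as in the simulation study of Section 5. The function name and its arguments are hypothetical.

```python
import math
import random


def simulate_logistic(n, d, a1, b1, lam0=0.0, y0=0, burn_in=500, seed=0):
    """Simulate n observations from model (2):
    lambda_t = d + a1 * lambda_{t-1} + b1 * Y_{t-1},  p_t = F_L(lambda_t).
    The start values lam0 and y0 are assumptions of this sketch."""
    rng = random.Random(seed)
    lam_prev, y_prev = lam0, y0
    ys, lams = [], []
    for t in range(burn_in + n):
        lam = d + a1 * lam_prev + b1 * y_prev
        p = 1.0 / (1.0 + math.exp(-lam))   # standard logistic c.d.f.
        y = 1 if rng.random() < p else 0   # Y_t = 1{U_t < p_t}, U_t uniform on (0, 1)
        if t >= burn_in:                   # keep only post-burn-in values
            ys.append(y)
            lams.append(lam)
        lam_prev, y_prev = lam, y
    return ys, lams


# Parameter values satisfying 4|a1| + |b1| < 4.
ys, lams = simulate_logistic(1000, d=0.5, a1=-0.3, b1=1.0)
print(sum(ys) / len(ys))  # empirical frequency of ones
```

Replacing the logistic c.d.f. by another continuous c.d.f. gives the corresponding simulator for other link functions.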

The probit model is another useful tool for binary time series analysis. It is defined analogously to model (2) by

$$p_t = \Phi(\pi_t), \quad \pi_t = d + a_1 \pi_{t-1} + b_1 Y_{t-1}, \tag{3}$$

where $\Phi(\cdot)$ denotes the c.d.f. of a standard normal random variable and $d, a_1, b_1$ are unknown real parameters. Models of this form have been considered by [30], [31], [32], [33, 13], and [34], among others. Models (2) and (3) are similar in the sense that they are defined by means of a latent process which is a monotone transformation of the success probability $p_t$. Indeed, model (3) is defined by means of the probit transformation, whereas model (2) can be rewritten as

$$p_t = F_L(\lambda_t), \quad \lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1},$$

with $F_L(\cdot)$ denoting the standard logistic c.d.f. The same argument shows that (1) is of identical form to (2) and (3) by considering the standard uniform c.d.f. Hence, it is quite natural to study models of the general form given by (5), as we discuss in Section 3. For both models (2) and (3), it is easy to incorporate covariates by adding an extra term of the form $\gamma' X_t$ on their right-hand side. Here $\gamma$ is a vector of parameters and $X_t$ is a covariate process. Note that the logistic (respectively, probit) model allows this inclusion because $F_L^{-1}(\cdot)$ (respectively, $\Phi^{-1}(\cdot)$) takes values in $\mathbb{R}$. Model (1), though, poses more restrictions on the coefficients because $\gamma' X_t \ge 0$ and $0 \le p_t \le 1$.

Binary time series models have also been studied by [17], where the following process is considered:

$$Y_t = I\Big( \sum_{j=1}^{p} \rho_j Y_{t-j} + \gamma' X_t + U_t > 0 \Big). \tag{4}$$

In the above, $I(\cdot)$ is the indicator function and $\rho_j$, $j = 1, 2, \ldots, p$, are unknown parameters. In addition, $U_t$ is an error sequence such that the vector process $(X_t', U_t)$ is strictly stationary and strongly mixing. When the errors are normally distributed, $\gamma = 0$ and $p = 1$, we obtain (3) but without the hidden process term. The approach taken here is based on modeling binary time series directly by employing GLM methodology. Hence, (4) can also be viewed as a motivation to consider the logistic (or probit) link functions, among many other possible specifications, when $U_t$ is an i.i.d. sequence of logistic (or normal) random variables. In addition, we study model (5), which includes a hidden process. [17] show near epoch dependence of the process defined by (4) and they study likelihood theory for the probit model employing smoothed maximum score estimation. We study maximum likelihood inference for (5), which allows for general specifications, and we establish asymptotic normality when the feedback term is included in the model; such a proof is missing from the literature to the best of our knowledge (cf. [10, p.43]).

3. Main Results

In this section we study conditions for ergodicity and stationarity of binary time series models specified by

$$p_t = F(\kappa_t), \quad \kappa_t = g(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}, \kappa_{t-1}, \kappa_{t-2}, \ldots, \kappa_{t-q}; \theta), \quad t \in \mathbb{Z}, \tag{5}$$

where $\kappa_t$ denotes the feedback process, $Y_{t-i}$, $i = 1, 2, \ldots, p$, is the i-th lagged value of the response process, $\kappa_{t-j}$, $j = 1, 2, \ldots, q$, is the j-th lagged value of $\kappa_t$, $g$ is a real function (not necessarily linear) and $F$ is any continuous c.d.f. with $F' = f$. Then (5) with $p = q = 1$ yields model (2) (respectively, (3)) when $F$ is chosen to be the standard logistic c.d.f. $F_L$ (respectively, the standard normal c.d.f. $\Phi$) and $g(\cdot)$ is linear.

Model (5) implies the existence of a feedback mechanism, which determines the evolution of the observed process. When strong correlation exists in the data, we expect (5) to be more parsimonious than a model which does not include the components $\kappa_{t-j}$. A similar phenomenon is observed for count time series data; see [35, Sec.2]. Model (5) allows for a variety of non-linear models for the analysis of binary time series. The non-linear specification, introduced by the choice of $g(\cdot)$, is made possible by the fact that the inverse c.d.f. assumes values in $\mathbb{R}$.

For developing asymptotic likelihood theory, we need a weak law of large numbers for the Hessian matrix, a martingale CLT for the score function and existence of moments of the joint process $(Y_t, \kappa_t)$. The main tool to obtain the necessary conditions that ensure ergodicity and stationarity of the joint process $(Y_t, \kappa_t)$ is the notion of weak dependence, which was introduced by [36]; see also [15]. In fact, [16] prove the existence of a weakly dependent strictly stationary solution of the equation

$$X_t = H(X_{t-1}, X_{t-2}, X_{t-3}, \ldots; \xi_t),$$

which is a chain with infinite memory, where $\{\xi_t\}$ is an innovation sequence. In our case, $X_t = (Y_t, \kappa_t)$. [37] assume a contraction-type condition (see also [38], [39], [40]) to prove stationarity and ergodicity, and to develop asymptotic theory, for a Poisson-based model for the analysis of count time series. This is the type of condition that we will assume in this work for the case of binary time series. When a count time series is under consideration, the innovation sequence $\{\xi_t\}$ appearing in the last display consists of independent unit (standardized) Poisson processes. For the models we consider in this contribution, the sequence $\{\xi_t\}$ consists of standard uniform random variables; see the proof of Theorem 1 in the Appendix.

Note that [17] have shown that the process $Y_t$, defined by (4), is near epoch dependent by assuming that $(X_t, U_t)$ is a strictly stationary and strongly mixing process and by imposing mild conditions on the regression coefficients. Their arguments show that (4) satisfies the property of fading memory; for a detailed discussion of the property of near epoch dependence see [41]. Our approach is quite different because we impose conditions on the function $g(\cdot)$ of (5).

In fact, after repeated substitution, the process $\kappa_t$ can be expressed as a function of the binary process $Y_s$ for $s < t$. Such a model is quite analogous to the model considered by [42], who gave an example of a process with discrete innovations which is not strongly mixing. The main difficulty in proving mixing of $(Y_t, \kappa_t)$ is that the process $Y_t$ assumes discrete values but the process $\kappa_t$ takes values in $\mathbb{R}$; see also [39, Sec.3], who discusses these issues for count time series models. The concept of weak dependence, as introduced by [36], bypasses the problem even when the innovations are discrete; see also [36, Sec. 3.4].

Theorem 1. Consider model (5) and assume that there exist $\beta_1, \ldots, \beta_p, \gamma_1, \ldots, \gamma_q \in \mathbb{R}_+$ such that, for any $X = (Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q)$ and $X' = (Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q')$ in $\{0, 1\}^p \times \mathbb{R}^q$,

$$|g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q) - g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q')| \le \sum_{i=1}^{q} \gamma_i \, |\kappa_i - \kappa_i'| + \sum_{i=1}^{p} \beta_i \, |Y_i - Y_i'|. \tag{6}$$

We further assume that $\sup_{x \in \mathbb{R}} |f(x)| \le K$ (see also Assumption 4). If $K \ge 1$, assume that $(K + 1)\big(\sum_{i=1}^{q} \gamma_i + \sum_{i=1}^{p} \beta_i\big) < 1$. If $K < 1$, suppose that $\big(\sum_{i=1}^{q} \gamma_i\big)/(1 - K) + \sum_{i=1}^{p} \beta_i < 1$. Then there exists a unique causal solution $\{(Y_t, \kappa_t), t \in \mathbb{Z}\}$, which is stationary, ergodic and satisfies $E|\kappa_t|^s < \infty$, $\forall s \in \mathbb{N}$.

Remark 1. The constant $K$ determines whether the link function $F(\cdot)$ is a contraction. Hence, we consider separately the cases $K < 1$ and $K \ge 1$.

Corollary 1. Suppose that we consider model (5) with $F = F_L$, $\kappa_t = \lambda_t$ and $q = 1$. In this case $K = 1/4 < 1$. Then (6) is satisfied provided that $4\gamma_1/3 + \sum_{i=1}^{p} \beta_i < 1$. Hence there exists a unique causal solution $\{(Y_t, \lambda_t), t \in \mathbb{Z}\}$, which is stationary, ergodic and satisfies $E|\lambda_t|^s < \infty$, $\forall s \in \mathbb{N}$.

Corollary 2. Consider (5) with $F = \Phi$, $\kappa_t = \pi_t$ and $q = 1$. Then (6) is satisfied provided that $\gamma_1 \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} \beta_i < 1$, because $K = 1/\sqrt{2\pi} < 1$. Hence there exists a unique causal solution $\{(Y_t, \pi_t), t \in \mathbb{Z}\}$, which is stationary, ergodic and satisfies $E|\pi_t|^s < \infty$, $\forall s \in \mathbb{N}$.
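The two cases of Theorem 1 are easy to check numerically for given Lipschitz constants. The sketch below is only an illustration: the function name `theorem1_condition` is hypothetical, and the constants passed in the example are illustrative values, not results from the paper. For a linear $g$, the constants are the absolute values of the coefficients.

```python
import math


def theorem1_condition(K, gammas, betas):
    """Evaluate the sufficient condition of Theorem 1.

    K      : bound on the density f of the link c.d.f. F
    gammas : Lipschitz constants for the lagged kappa terms
    betas  : Lipschitz constants for the lagged responses
    Returns (value, holds), where holds means value < 1.
    """
    sum_gamma, sum_beta = sum(gammas), sum(betas)
    if K >= 1:
        value = (K + 1.0) * (sum_gamma + sum_beta)
    else:
        value = sum_gamma / (1.0 - K) + sum_beta
    return value, value < 1.0


K_LOGISTIC = 0.25                          # maximum of the standard logistic density
K_PROBIT = 1.0 / math.sqrt(2.0 * math.pi)  # maximum of the standard normal density

# Linear g with |a1| = 0.15 and |b1|, |b2| = 0.4, 0.35 (illustrative values).
print(theorem1_condition(K_LOGISTIC, [0.15], [0.4, 0.35]))  # Corollary 1 form
print(theorem1_condition(K_PROBIT, [0.15], [0.4, 0.35]))    # Corollary 2 form
```

With $K = 1/4$ the second branch reduces to $4\gamma_1/3 + \sum_i \beta_i$, which is the expression in Corollary 1.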

We discuss some special cases of model (5) with $F = \Phi$ to bring across the essence of Theorem 1. In this case $\kappa_t = \pi_t$.

Example 1. Let the function $g(\cdot)$ be linear and $q = 1$. Then

$$\pi_t = d + a_1 \pi_{t-1} + \sum_{i=1}^{p} b_i Y_{t-i}, \tag{7}$$

where $a_1, b_i$ $(i = 1, \ldots, p) \in \mathbb{R}$. For this particular case it is sufficient to assume that $\sqrt{2\pi}\,|a_1| + \sum_{i=1}^{p} |b_i| < \sqrt{2\pi}/2$, which changes the model parameter region. Note that in Corollary 2, the function $g(\cdot)$ is any function which satisfies (6), but for this example $g(\cdot)$ is chosen to be linear. If we drop the feedback term, we obtain

$$\pi_t = d + \sum_{i=1}^{p} b_i Y_{t-i}, \tag{8}$$

and the required condition is $\sum_{i=1}^{p} |b_i| < \sqrt{2\pi}$.

A model of specific interest is (3). If the function $g(\cdot)$ in (3) is not linear, then we have to assume that there exist $\gamma_1, \beta_1 > 0$ such that $\gamma_1 + \beta_1 < 1$ and

$$|g(Y, \pi) - g(Y', \pi')| \le \gamma_1 \, |\pi - \pi'| + \beta_1 \, |Y - Y'|.$$

The condition $\gamma_1 + \beta_1 < 1$ is slightly better than the condition $\gamma_1 \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \beta_1 < 1$, obtained from Corollary 2 with $p = 1$. This is a consequence of verifying condition 3.1 of [16]. For $p = 1$, the proof can be slightly altered so that the model parameters satisfy a weaker condition. For the special case of model (3) and when $g(\cdot)$ is linear, it is enough to assume the weaker condition $\sqrt{2\pi}\,|a_1| + |b_1| < \sqrt{2\pi}$ and the result of Theorem 1 is still true.

Similar conditions are found for the logistic model (2) by replacing the constant $\sqrt{2\pi}$ with the constant 4. The necessary assumptions for different lags and link functions are displayed in Table 1.

Remark 2. Apart from probit and logit models there is a wide range of link functions that we can apply for binary time series analysis. For instance, if $p_t = \exp(-\exp(-\kappa_t))$, then $\kappa_t = -\log(-\log(p_t))$. Therefore we obtain the log-log link function. Another possibility is to consider $p_t = 1 - \exp(-\exp(\kappa_t))$. Then $\kappa_t = \log(-\log(1 - p_t))$. This is the complementary log-log link function. For a detailed discussion about other link functions see [22, Sec.2.1.2].
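The two link functions of Remark 2 and their inverses can be written down directly. This is a small sketch with hypothetical function names; it only restates the formulas of the remark and checks numerically that each link inverts its c.d.f.

```python
import math


def loglog_cdf(kappa):
    """p = exp(-exp(-kappa))."""
    return math.exp(-math.exp(-kappa))


def loglog_link(p):
    """kappa = -log(-log(p)), the log-log link."""
    return -math.log(-math.log(p))


def cloglog_cdf(kappa):
    """p = 1 - exp(-exp(kappa))."""
    return 1.0 - math.exp(-math.exp(kappa))


def cloglog_link(p):
    """kappa = log(-log(1 - p)), the complementary log-log link."""
    return math.log(-math.log(1.0 - p))


for kappa in (-2.0, 0.0, 1.5):
    # Each round trip should return kappa up to floating-point error.
    print(kappa, loglog_link(loglog_cdf(kappa)), cloglog_link(cloglog_cdf(kappa)))
```

The two c.d.f.s are mirror images of each other: `cloglog_cdf(k)` equals `1 - loglog_cdf(-k)`.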

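| Model | Order | g(·) | Condition |
|---|---|---|---|
| (5), $F = \Phi$ | p = q = 1 | linear | $\sqrt{2\pi}\,\lvert a_1\rvert + \lvert b_1\rvert < \sqrt{2\pi}$ |
| (5), $F = \Phi$ | p = q = 1 | general | $\gamma_1 + \beta_1 < 1$ |
| (5), $F = \Phi$ | p > 1, q = 1 | linear | $\lvert a_1\rvert \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} \lvert b_i\rvert < 1$ |
| (5), $F = \Phi$ | p > 1, q = 1 | general | $\gamma_1 \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} \beta_i < 1$ |
| (5), $F = F_L$ | p = q = 1 | linear | $4\lvert a_1\rvert + \lvert b_1\rvert < 4$ |
| (5), $F = F_L$ | p = q = 1 | general | $\gamma_1 + \beta_1 < 1$ |
| (5), $F = F_L$ | p > 1, q = 1 | linear | $4\lvert a_1\rvert/3 + \sum_{i=1}^{p} \lvert b_i\rvert < 1$ |
| (5), $F = F_L$ | p > 1, q = 1 | general | $4\gamma_1/3 + \sum_{i=1}^{p} \beta_i < 1$ |

Table 1: Sufficient conditions to establish ergodicity and stationarity for the joint process $(Y_t, \kappa_t)$ for some different classes of models. The notation $\gamma_1$ and $\beta_i$, $i = 1, 2, \ldots, p$, refers to the notation introduced in Theorem 1 and the notation $a_1$ and $b_i$, $i = 1, 2, \ldots, p$, refers to the special case of linear models.

4. Inference

Recall model (5). Assume that the function $g(\cdot)$ depends on some finite dimensional parameter $\theta$, with $s = \dim(\theta)$. For instance, when model (3) holds, then $F = \Phi$ and $\kappa_t(\theta) = \pi_t(\theta)$, that is, $p_t(\theta) = \Phi(\pi_t(\theta))$, and $\theta = (d, a_1, b_1)^T$ denotes the respective vector of unknown parameters. In what follows, we discuss the estimation of $\theta$, based on the conditional likelihood function for model (5). We set $\mathcal{F}_t = \sigma(\kappa_{1-q}, \ldots, \kappa_{-1}, \kappa_0, Y_s, s \le t)$. The likelihood function is given by

$$L_N(\theta) = \prod_{t=1}^{N} P(Y_t = y_t \mid \mathcal{F}_{t-1}) = \prod_{t=1}^{N} p_t^{Y_t}(\theta)\,(1 - p_t(\theta))^{1 - Y_t} = \prod_{t=1}^{N} F^{Y_t}(\kappa_t(\theta))\,(1 - F(\kappa_t(\theta)))^{1 - Y_t}. \tag{9}$$

The conditional log-likelihood is equal to

$$l_N(\theta) = \sum_{t=1}^{N} l_t(\theta) = \sum_{t=1}^{N} \Big[ Y_t \log F(\kappa_t(\theta)) + (1 - Y_t) \log\big(1 - F(\kappa_t(\theta))\big) \Big]. \tag{10}$$

In order to compute the conditional MLE $\hat{\theta}$ we maximize (9). We denote by $\hat{\theta} = \arg\max_{\theta \in \Theta} L_N(\theta)$, where $\Theta \subseteq \mathbb{R}^s$ denotes the parameter space. The score function is equal to

$$S_N(\theta) = \sum_{t=1}^{N} \nabla l_t(\theta) = \sum_{t=1}^{N} \frac{\partial l_t(\theta)}{\partial \theta} = \sum_{t=1}^{N} \frac{f(\kappa_t(\theta))}{p_t(\theta)(1 - p_t(\theta))} \frac{\partial \kappa_t(\theta)}{\partial \theta} \big(Y_t - p_t(\theta)\big), \tag{11}$$

by recalling that $F' = f$. It is clear that, at the true value $\theta_0$, we obtain $E[Y_t - p_t(\theta_0) \mid \mathcal{F}_{t-1}] = 0$. This fact, combined with Assumption 4 (see below), shows that the score function is a square integrable martingale. In particular, for model (3) we obtain that

$$\frac{\partial \kappa_t(\theta)}{\partial \theta} = \frac{\partial \pi_t(\theta)}{\partial \theta} = \left( \frac{\partial \pi_t(\theta)}{\partial d}, \frac{\partial \pi_t(\theta)}{\partial a_1}, \frac{\partial \pi_t(\theta)}{\partial b_1} \right)^T = \left( 1 + a_1 \frac{\partial \pi_{t-1}(\theta)}{\partial d},\; \pi_{t-1}(\theta) + a_1 \frac{\partial \pi_{t-1}(\theta)}{\partial a_1},\; Y_{t-1} + a_1 \frac{\partial \pi_{t-1}(\theta)}{\partial b_1} \right)^T.$$

Generally, for model (5) we have that

$$\frac{\partial \kappa_t(\theta)}{\partial \theta} = \sum_{j=1}^{q} \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial \kappa_{t-j}(\theta)} \frac{\partial \kappa_{t-j}(\theta)}{\partial \theta} + \frac{\partial g(Y_{t-1}, \ldots, Y_{t-p}, \kappa_{t-1}(\theta), \ldots, \kappa_{t-q}(\theta); \theta)}{\partial \theta}.$$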
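The log-likelihood (10), the score (11) and the derivative recursion can be evaluated in a single pass over the data. The sketch below does this for the linear logistic model (2) with $p = q = 1$, where the recursion for $\partial\lambda_t/\partial\theta$ has the same form as the one given for model (3). It is an illustration under stated assumptions, not the authors' code: the start values `lam0 = 0` and `y0 = 0` are assumed fixed (so their derivatives are zero), and the function name is hypothetical.

```python
import math
import random


def loglik_and_score(theta, ys, lam0=0.0, y0=0):
    """Conditional log-likelihood (10) and score (11) for model (2).
    theta = (d, a1, b1); lam0 and y0 are assumed, fixed start values."""
    d, a1, b1 = theta
    lam_prev, y_prev = lam0, y0
    dlam_prev = [0.0, 0.0, 0.0]  # d(lam0)/d(theta) = 0 because lam0 is fixed
    loglik, score = 0.0, [0.0, 0.0, 0.0]
    for y in ys:
        lam = d + a1 * lam_prev + b1 * y_prev
        # Recursion for d(lambda_t)/d(theta), components (d, a1, b1).
        dlam = [1.0 + a1 * dlam_prev[0],
                lam_prev + a1 * dlam_prev[1],
                y_prev + a1 * dlam_prev[2]]
        p = 1.0 / (1.0 + math.exp(-lam))
        loglik += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        # For the logistic c.d.f., f = p(1 - p), so the weight f / (p(1 - p)) is 1.
        for i in range(3):
            score[i] += dlam[i] * (y - p)
        lam_prev, y_prev, dlam_prev = lam, y, dlam
    return loglik, score


# Compare the analytic score with central finite differences of (10).
rng = random.Random(3)
ys = [1 if rng.random() < 0.6 else 0 for _ in range(50)]  # arbitrary binary data
theta = [0.5, -0.3, 1.0]
ll, sc = loglik_and_score(theta, ys)
h = 1e-6
for i in range(3):
    up, down = list(theta), list(theta)
    up[i] += h
    down[i] -= h
    fd = (loglik_and_score(up, ys)[0] - loglik_and_score(down, ys)[0]) / (2 * h)
    print(i, sc[i], fd)
```

For a link other than the logistic, the weight $f(\kappa_t)/\{p_t(1 - p_t)\}$ in (11) no longer cancels and has to be computed explicitly.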

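The conditional information matrix is given by

$$G_N(\theta) = \sum_{t=1}^{N} \mathrm{Cov}\left( \frac{f(\kappa_t(\theta))}{p_t(\theta)(1 - p_t(\theta))} \frac{\partial \kappa_t(\theta)}{\partial \theta} \big(Y_t - p_t(\theta)\big) \,\Big|\, \mathcal{F}_{t-1} \right) = \sum_{t=1}^{N} \frac{f(\kappa_t(\theta))^2}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T},$$

since the conditional variance of $Y_t$ is $\mathrm{Var}(Y_t \mid \mathcal{F}_{t-1}) = p_t(\theta)(1 - p_t(\theta))$. We also define the matrix

$$G(\theta) = E\left[ \frac{f(\kappa_t(\theta))^2}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T} \right], \tag{12}$$

where the expectation is taken with respect to the stationary distribution. The Hessian matrix is given by

$$
\begin{aligned}
H_N(\theta) = -\sum_{t=1}^{N} \frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta^T}
= {} & -\sum_{t=1}^{N} \frac{\partial^2 \kappa_t(\theta)}{\partial \theta \partial \theta^T} \frac{f(\kappa_t(\theta))}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \big(Y_t - p_t(\theta)\big) \\
& - \sum_{t=1}^{N} \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T} \frac{\partial f(\kappa_t(\theta))/\partial \kappa_t(\theta)}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \big(Y_t - p_t(\theta)\big) \\
& + \sum_{t=1}^{N} \frac{\partial \kappa_t(\theta)}{\partial \theta} \frac{\partial \kappa_t(\theta)}{\partial \theta^T} \left( \frac{f(\kappa_t(\theta))}{F(\kappa_t(\theta))(1 - F(\kappa_t(\theta)))} \right)^2 \big(Y_t - p_t(\theta)\big)^2.
\end{aligned}
\tag{13}
$$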
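The three terms of the Hessian (13) follow from the chain rule together with the fact that $Y_t$ is binary. The short derivation below is a reconstruction offered as a check, not a passage from the paper; it writes $F$, $f$ and $f'$ for the functions evaluated at $\kappa_t(\theta)$.

```latex
\begin{align*}
\frac{\partial l_t}{\partial \kappa_t}
  &= \frac{f\,(Y_t - F)}{F(1-F)}, \\
\frac{\partial^2 l_t}{\partial \kappa_t^2}
  &= \frac{f'\,(Y_t - F)}{F(1-F)}
     - \frac{f^2\,\bigl[F(1-F) + (Y_t - F)(1 - 2F)\bigr]}{\{F(1-F)\}^2}. \\
\intertext{Since $Y_t \in \{0,1\}$ gives $Y_t^2 = Y_t$, we have
$F(1-F) + (Y_t - F)(1-2F) = Y_t - 2 Y_t F + F^2 = (Y_t - F)^2$, hence}
\frac{\partial^2 l_t}{\partial \kappa_t^2}
  &= \frac{f'\,(Y_t - F)}{F(1-F)}
     - \left(\frac{f}{F(1-F)}\right)^{2} (Y_t - F)^2, \\
-\frac{\partial^2 l_t}{\partial\theta\,\partial\theta^T}
  &= -\frac{\partial l_t}{\partial \kappa_t}\,
      \frac{\partial^2 \kappa_t}{\partial\theta\,\partial\theta^T}
     -\frac{\partial^2 l_t}{\partial \kappa_t^2}\,
      \frac{\partial \kappa_t}{\partial\theta}\,
      \frac{\partial \kappa_t}{\partial\theta^T}.
\end{align*}
```

Substituting the first and third lines into the last one and summing over $t$ gives the three sums in (13), with $p_t(\theta) = F(\kappa_t(\theta))$.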

Asymptotic normality of the conditional MLE $\hat{\theta}$ can be proved by imposing the following assumptions, which are quite analogous to those given by [43], [44], [45] and [23].

Assumption 1. The parameter $\theta$ belongs to a compact set $\Theta$ and the true value, $\theta_0$, belongs to the interior of $\Theta$.

Assumption 2. The components of $\partial g/\partial \theta$ are assumed to be linearly independent.

Assumption 3. The function $g(\cdot)$ is four times differentiable with respect to $\theta$ and $\kappa$. In addition, if $x^* = (\theta^T, \kappa^T)^T = (\theta_1, \ldots, \theta_s, \kappa_1, \ldots, \kappa_q)^T$, then

$$\left| \frac{\partial g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q; \theta)}{\partial x_i^*} - \frac{\partial g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q'; \theta)}{\partial x_i^*} \right| \le \sum_{l=1}^{p} b_{il} |Y_l - Y_l'| + \sum_{l=1}^{q} c_{il} |\kappa_l - \kappa_l'|, \quad i = 1, \ldots, s + q,$$

$$\left| \frac{\partial^2 g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q; \theta)}{\partial x_i^* \partial x_j^*} - \frac{\partial^2 g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q'; \theta)}{\partial x_i^* \partial x_j^*} \right| \le \sum_{l=1}^{p} b_{ijl} |Y_l - Y_l'| + \sum_{l=1}^{q} c_{ijl} |\kappa_l - \kappa_l'|, \quad i, j = 1, \ldots, s + q,$$

$$\left| \frac{\partial^3 g(Y_1, \ldots, Y_p, \kappa_1, \ldots, \kappa_q; \theta)}{\partial x_i^* \partial x_j^* \partial x_k^*} - \frac{\partial^3 g(Y_1', \ldots, Y_p', \kappa_1', \ldots, \kappa_q'; \theta)}{\partial x_i^* \partial x_j^* \partial x_k^*} \right| \le \sum_{l=1}^{p} b_{ijkl} |Y_l - Y_l'| + \sum_{l=1}^{q} c_{ijkl} |\kappa_l - \kappa_l'|, \quad i, j, k = 1, \ldots, s + q,$$

where $b_{il}, b_{ijl}, b_{ijkl}, c_{il}, c_{ijl}, c_{ijkl} \in \mathbb{R}_+$. We further assume that, $\forall i, j, k \in \{1, \ldots, s + q\}$, the following hold:

$$\sum_{i=1}^{s+q} \Big( \sum_{l=1}^{p} b_{il} + c_{il} \Big) < \infty, \quad \sum_{i,j=1}^{s+q} \Big( \sum_{l=1}^{p} b_{ijl} + c_{ijl} \Big) < \infty, \quad \sum_{i,j,k=1}^{s+q} \Big( \sum_{l=1}^{p} b_{ijkl} + c_{ijkl} \Big) < \infty,$$

$$E\left| \frac{\partial g(0; \theta)}{\partial x_i^*} \right| < \infty, \quad E\left| \frac{\partial^2 g(0; \theta)}{\partial x_i^* \partial x_j^*} \right| < \infty, \quad E\left| \frac{\partial^3 g(0; \theta)}{\partial x_i^* \partial x_j^* \partial x_k^*} \right| < \infty.$$

Assumption 4. The function $F(x)$ is defined on $\mathbb{R}$, such that $F : \mathbb{R} \to [0, 1]$, is monotone and continuously differentiable with $F' = f$. In addition, there exists a finite, positive number $K$ such that $\sup_{x \in \mathbb{R}} |f(x)| \le K$.

Assumption 1 is standard in the literature. Assumption 2 guarantees that the matrix $G(\theta)$ defined by (12) is invertible. Assumption 3 makes possible the approximation of the derivatives of $g(\cdot)$ by linear functions of its arguments. These assumptions are commonly used in estimation of nonlinear time series models, and they are relatively mild. They are employed when the log-likelihood function is three times differentiable, as in the case we consider. Finally, Assumption 4 implies that the process $p_t(\theta)$ is bounded away from zero and one. This is required for properly defining the quantities (10)-(13). Obviously, such conditions are satisfied by the logistic model (2) and the probit model (3). The following theorem shows the asymptotic normality of the conditional MLE $\hat{\theta}$. Its proof is based on a standard extension of the arguments given by [23], among others; see Lemma A-1 of the Appendix. Additionally, the Appendix contains Lemma A-2, which shows that the maximizer $\hat{\theta}$ does not depend on the initial value of $\kappa_t$.

Theorem 2. For model (5) and under the assumptions of Theorem 1 and Assumptions 1-4, there exists an open neighborhood of the true value $\theta_0$, such that a unique conditional MLE exists with probability converging to one, as $N \to \infty$. Furthermore, $\hat{\theta}$ does not depend on the initial values of $\{\kappa_t\}$ and it is consistent and asymptotically normally distributed, i.e.

$$\sqrt{N}\,(\hat{\theta} - \theta_0) \xrightarrow{D} N\big(0, G^{-1}(\theta_0)\big),$$

where $G$ is defined in (12). In addition, $G(\hat{\theta})$ is a consistent estimator of $G(\theta_0)$.
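Theorem 2 suggests approximate standard errors: since $G_N(\theta)/N$ estimates $G(\theta)$, the variance of $\hat{\theta}$ is roughly the inverse of $G_N(\hat{\theta})$. The sketch below computes $G_N$ for the linear logistic model (2), where $f^2/\{F(1 - F)\} = p_t(1 - p_t)$, and inverts it with a small Gauss-Jordan routine. It is only an illustration under assumptions: the start values are fixed at zero, the data are simulated inline, the matrix is evaluated at the simulating parameter rather than at a fitted $\hat{\theta}$, and all function names are hypothetical.

```python
import math
import random


def info_matrix_logistic(theta, ys, lam0=0.0, y0=0):
    """Conditional information matrix G_N(theta) for model (2)."""
    d, a1, b1 = theta
    lam_prev, y_prev, dlam_prev = lam0, y0, [0.0, 0.0, 0.0]
    G = [[0.0] * 3 for _ in range(3)]
    for y in ys:
        lam = d + a1 * lam_prev + b1 * y_prev
        dlam = [1.0 + a1 * dlam_prev[0],
                lam_prev + a1 * dlam_prev[1],
                y_prev + a1 * dlam_prev[2]]
        p = 1.0 / (1.0 + math.exp(-lam))
        w = p * (1.0 - p)  # f^2 / (F(1 - F)) for the logistic c.d.f.
        for i in range(3):
            for j in range(3):
                G[i][j] += w * dlam[i] * dlam[j]
        lam_prev, y_prev, dlam_prev = lam, y, dlam
    return G


def invert(M):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def standard_errors(theta, ys):
    """Square roots of the diagonal of G_N(theta)^{-1}."""
    G_inv = invert(info_matrix_logistic(theta, ys))
    return [math.sqrt(G_inv[i][i]) for i in range(3)]


# Simulate N = 1000 observations from model (2) (no burn-in, for brevity).
rng = random.Random(1)
theta0 = (0.5, -0.3, 1.0)
ys, lam, y = [], 0.0, 0
for _ in range(1000):
    lam = theta0[0] + theta0[1] * lam + theta0[2] * y
    y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-lam)) else 0
    ys.append(y)
print(standard_errors(theta0, ys))  # approximate s.e. of (d, a1, b1)
```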

5. Simulations

We present a limited simulation study in this section. We consider $g(\cdot)$ to be linear and the link function is chosen to be either the standard logistic $F_L$ or the standard normal $\Phi$. We consider linear models of different order ($p = 1, 2, 3$ and $q = 1$) and different sample sizes ($N = 200, 1000, 1500, 2500$). All computations have been implemented in R ([46]). We perform 1000 simulations for every case, discarding the first 500 observations after generating data, to ensure that the stationarity region has been reached. The true parameters have to satisfy the condition $4|a_1|/3 + \sum_{i=1}^{p} |b_i| < 1$ when $F = F_L$ and $|a_1| \sqrt{2\pi}/(\sqrt{2\pi} - 1) + \sum_{i=1}^{p} |b_i| < 1$ when $F = \Phi$ (see Table 1). To initiate the algorithm, we set $\kappa_0 = 1$ and we obtain initial values for the parameter vector $\theta$ by a simple generalized linear model fitting. The maximum likelihood estimators and their standard errors are displayed in Table 2. Corresponding QQ-plots and histograms of the simulated estimators are not displayed to save space, but the asserted approximation to the normal distribution is satisfactory in every case, especially when the sample size grows larger. Table 2 shows that the estimators approximate their true values quite satisfactorily in both cases considered (link function $F_L^{-1}$ or $\Phi^{-1}$), especially when $p = 1$.

An interesting scenario arises when the observed time series and the respective model are stationary but possess a "long memory" effect. Such a model would, in practice, lead to persistent runs of ones and zeros in the observed binary process. For the linear logistic model (2), such a process can be generated by the sets of true values given in Table 3, for example. In the latter two cases (where the long memory effect is stronger), the conditions that the parameters have to satisfy according to Theorem 1 are violated. To carry out the estimation we employ unconstrained optimization. The obtained estimates are close to the true values, even for small sample sizes. These results indicate that the constraints imposed on the parameters are not very sharp when the linear model is considered and thus the stationarity region may be even larger.
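One rough way to look at the persistence of ones and zeros mentioned above is to compute run lengths in a simulated series. The sketch below simulates the linear logistic model (2) for the three parameter sets of Table 3 and reports the mean and maximum run length. It is an exploratory illustration only: the function names, the zero start values and the choice of summary statistics are assumptions, and no particular output is claimed.

```python
import math
import random


def simulate(n, d, a1, b1, burn_in=500, seed=0):
    """Simulate model (2) with assumed start values lambda_0 = 0, Y_0 = 0."""
    rng = random.Random(seed)
    lam, y, ys = 0.0, 0, []
    for t in range(burn_in + n):
        lam = d + a1 * lam + b1 * y
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-lam)) else 0
        if t >= burn_in:
            ys.append(y)
    return ys


def run_lengths(ys):
    """Lengths of maximal runs of identical values in a binary sequence."""
    runs, count = [], 1
    for prev, cur in zip(ys, ys[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    if ys:
        runs.append(count)  # close the final run
    return runs


# True parameter sets (d, a1, b1) taken from Table 3.
for d, a1, b1 in [(-0.5, 0.5, 1.0), (0.5, -0.95, 2.5), (-1.0, -0.5, 4.0)]:
    runs = run_lengths(simulate(1000, d, a1, b1, seed=1))
    print((d, a1, b1), sum(runs) / len(runs), max(runs))
```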

| Model fitted | Par. | True value | N = 200 | N = 1000 | N = 1500 | N = 2500 |
|---|---|---|---|---|---|---|
| (5), $F = F_L$, p = 1 | $\hat{d}$ | 0.5 | 0.511 (0.431) | 0.495 (0.205) | 0.504 (0.166) | 0.503 (0.126) |
| | $\hat{a}_1$ | -0.3 | -0.292 (0.334) | -0.295 (0.159) | -0.304 (0.130) | -0.301 (0.096) |
| | $\hat{b}_1$ | 1 | 0.998 (0.351) | 0.997 (0.156) | 1.004 (0.126) | 1.000 (0.099) |
| (5), $F = \Phi$, p = 1 | $\hat{d}$ | 0.5 | 0.508 (0.350) | 0.505 (0.166) | 0.515 (0.143) | 0.506 (0.107) |
| | $\hat{a}_1$ | -0.3 | -0.294 (0.261) | -0.304 (0.130) | -0.309 (0.109) | -0.306 (0.082) |
| | $\hat{b}_1$ | 1 | 1.000 (0.268) | 1.003 (0.125) | 0.997 (0.100) | 1.001 (0.076) |
| (5), $F = F_L$, p = 2 | $\hat{d}$ | 0.5 | 0.402 (0.193) | 0.484 (0.110) | 0.490 (0.096) | 0.500 (0.082) |
| | $\hat{a}_1$ | 0.15 | 0.061 (0.319) | 0.096 (0.185) | 0.111 (0.162) | 0.114 (0.137) |
| | $\hat{b}_1$ | -0.4 | -0.249 (0.219) | -0.358 (0.112) | -0.373 (0.094) | -0.384 (0.073) |
| | $\hat{b}_2$ | -0.35 | -0.288 (0.235) | -0.351 (0.136) | -0.350 (0.114) | -0.358 (0.095) |
| (5), $F = \Phi$, p = 2 | $\hat{d}$ | 0.5 | 0.451 (0.135) | 0.509 (0.068) | 0.504 (0.055) | 0.504 (0.076) |
| | $\hat{a}_1$ | 0.15 | 0.078 (0.200) | 0.088 (0.119) | 0.092 (0.102) | 0.107 (0.085) |
| | $\hat{b}_1$ | -0.4 | -0.327 (0.152) | -0.379 (0.075) | -0.388 (0.061) | -0.390 (0.049) |
| | $\hat{b}_2$ | -0.35 | -0.323 (0.163) | -0.363 (0.090) | -0.366 (0.076) | -0.361 (0.063) |
| (5), $F = F_L$, p = 3 | $\hat{d}$ | 0.5 | 0.354 (0.208) | 0.459 (0.125) | 0.479 (0.112) | 0.497 (0.100) |
| | $\hat{a}_1$ | 0.15 | 0.036 (0.315) | 0.085 (0.209) | 0.088 (0.196) | 0.095 (0.166) |
| | $\hat{b}_1$ | -0.4 | -0.216 (0.204) | -0.350 (0.117) | -0.362 (0.096) | -0.378 (0.073) |
| | $\hat{b}_2$ | -0.23 | -0.147 (0.208) | -0.211 (0.134) | -0.222 (0.120) | -0.230 (0.098) |
| | $\hat{b}_3$ | -0.15 | -0.101 (0.214) | -0.132 (0.121) | -0.142 (0.102) | -0.151 (0.085) |
| (5), $F = \Phi$, p = 3 | $\hat{d}$ | 0.5 | 0.440 (0.156) | 0.507 (0.097) | 0.512 (0.089) | 0.519 (0.078) |
| | $\hat{a}_1$ | 0.15 | 0.048 (0.209) | 0.073 (0.145) | 0.078 (0.130) | 0.083 (0.115) |
| | $\hat{b}_1$ | -0.4 | -0.312 (0.161) | -0.375 (0.075) | -0.385 (0.063) | -0.393 (0.050) |
| | $\hat{b}_2$ | -0.23 | -0.169 (0.164) | -0.215 (0.091) | -0.214 (0.083) | -0.217 (0.068) |
| | $\hat{b}_3$ | -0.15 | -0.127 (0.161) | -0.155 (0.081) | -0.159 (0.070) | -0.160 (0.061) |

Table 2: Maximum likelihood estimators and their standard errors (in parentheses) for model (5) with $g(\cdot)$ linear, $F = F_L$ or $F = \Phi$, $p = 1, 2, 3$, $q = 1$ and different sample sizes $N = 200, 1000, 1500, 2500$. Results are based on 1000 runs.

| Par. | True value | N = 200 | N = 500 | N = 1000 | N = 1500 | N = 2500 |
|---|---|---|---|---|---|---|
| $\hat{d}$ | -0.5 | -0.516 (0.188) | -0.501 (0.109) | -0.506 (0.076) | -0.500 (0.060) | -0.502 (0.047) |
| $\hat{a}_1$ | 0.5 | 0.431 (0.209) | 0.480 (0.114) | 0.489 (0.080) | 0.493 (0.064) | 0.494 (0.049) |
| $\hat{b}_1$ | 1 | 1.016 (0.297) | 0.999 (0.188) | 1.005 (0.134) | 1.000 (0.106) | 1.002 (0.084) |
| $\hat{d}$ | 0.5 | 0.476 (0.432) | 0.489 (0.268) | 0.493 (0.174) | 0.506 (0.149) | 0.499 (0.112) |
| $\hat{a}_1$ | -0.95 | -0.940 (0.079) | -0.948 (0.021) | -0.948 (0.014) | -0.949 (0.010) | -0.950 (0.008) |
| $\hat{b}_1$ | 2.5 | 2.493 (0.495) | 2.528 (0.298) | 2.514 (0.204) | 2.497 (0.170) | 2.500 (0.127) |
| $\hat{d}$ | -1 | -0.990 (0.459) | -0.995 (0.271) | -0.997 (0.193) | -0.997 (0.158) | -1.000 (0.127) |
| $\hat{a}_1$ | -0.5 | -0.496 (0.136) | -0.503 (0.075) | -0.499 (0.055) | -0.502 (0.044) | -0.501 (0.033) |
| $\hat{b}_1$ | 4 | 4.024 (0.652) | 4.013 (0.392) | 4.002 (0.268) | 3.997 (0.226) | 4.006 (0.180) |

Table 3: Maximum likelihood estimators and their standard errors (in parentheses) for model (5) with $g(\cdot)$ linear, $F = F_L$, $p = 1$, $q = 1$ and different sample sizes $N = 200, 500, 1000, 1500, 2500$. Results are based on 1000 runs.

6. Covariates

One of the advantages of model (5) studied in this paper is the ease of introducing time dependent covariates. Suppose that $\{X_t\}$ is some covariate time series. Define the σ-field $\mathcal{F}_t = \sigma(\kappa_{1-q}, \ldots, \kappa_0, X_{s+1}, Y_s, s \le t)$ and consider the logistic model (2), for instance. Then we obtain the model

$$\lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1} + \gamma' X_t, \tag{14}$$

where $\gamma$ is a vector of unknown parameters. If $\{X_t\}$ is itself weakly dependent, then we can construct a two-dimensional process $\{\kappa_t, X_{t+1}\}$ and a corresponding three-dimensional process with $\{Y_t\}$ included. If the transition mechanism of $\{X_t\}$ does not depend on $\{\kappa_t, Y_t\}$, it is simple to find conditions for weak dependence. The triangular structure when $\{X_t\}$ is exogenous allows for separate conditions for $\{X_t\}$. The conditions for $\{\kappa_t, Y_t\}$ are exactly as before. Inference for model (14) is developed by partial likelihood theory; see [22, Ch.1], among others. The asymptotic theory can be developed as in Section 4.

M

To illustrate the performance of our modeling, we considered $x_t$ to be an AR(1) model with $\phi = 0.95$ and $\phi = 0.5$. We generated data by the model

\[
\lambda_t = d + a_1 \lambda_{t-1} + b_1 Y_{t-1} + c_1 x_t. \tag{15}
\]

In the first case ($\phi = 0.95$), we investigate the scenario in which the covariate process is highly persistent. The maximum likelihood estimators and their respective standard errors are displayed in Table 4 for different sets of true values. The results indicate that the estimators approach the true values satisfactorily, especially for large sample sizes.
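The data-generating mechanism of model (15) can be sketched in Python as follows. This is only an illustration under stated assumptions: the logistic link $F = F_L$, standard normal AR(1) innovations, and zero starting values for $\lambda_t$, $Y_t$ and $x_t$ are choices made here, and the names `logistic` and `simulate_model_15` are hypothetical rather than the authors' implementation.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic cdf F_L(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_model_15(n, d, a1, b1, c1, phi, seed=0):
    """Simulate n values of (Y_t, lambda_t, x_t) from model (15).

    Assumptions (not taken from the paper): logistic link, standard normal
    AR(1) innovations, and zero starting values for lambda, Y and x.
    """
    rng = random.Random(seed)
    x = [0.0] * n
    lam = [0.0] * n
    y = [0] * n
    for t in range(1, n):
        # Covariate: x_t = phi * x_{t-1} + eps_t
        x[t] = phi * x[t - 1] + rng.gauss(0.0, 1.0)
        # Latent process: lambda_t = d + a1*lambda_{t-1} + b1*Y_{t-1} + c1*x_t
        lam[t] = d + a1 * lam[t - 1] + b1 * y[t - 1] + c1 * x[t]
        # Response: Y_t = 1(U_t <= F(lambda_t)) with U_t uniform on (0, 1)
        y[t] = 1 if rng.random() <= logistic(lam[t]) else 0
    return y, lam, x


# One of the parameter sets of Table 4, with phi = 0.5.
y, lam, x = simulate_model_15(n=500, d=0.2, a1=-0.3, b1=2.2, c1=0.5, phi=0.5)
```

Repeating such a simulation over many seeds and refitting the model each time would give Monte Carlo means and standard errors of the kind reported in Table 4.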

Table 4: Maximum likelihood estimators and their standard errors (in parentheses) for model (15) for different sample sizes $N = 200, 500, 1000, 1500, 2500$. Results are based on 1000 runs. The upper two panels correspond to $\phi = 0.95$ and the bottom two panels correspond to $\phi = 0.5$ ($x_t = \phi x_{t-1} + \epsilon_t$).

True     Estimator      N = 200           N = 500           N = 1000          N = 1500          N = 2500

$\phi = 0.95$
 0.5     $\hat{d}$       0.624 (0.843)     0.543 (0.267)     0.525 (0.179)     0.514 (0.147)     0.511 (0.116)
 0.3     $\hat{a}_1$     0.306 (0.178)     0.304 (0.106)     0.303 (0.070)     0.303 (0.056)     0.300 (0.043)
-1       $\hat{b}_1$    -1.176 (0.805)    -1.089 (0.468)    -1.048 (0.315)    -1.037 (0.255)    -1.023 (0.204)
 1.5     $\hat{c}_1$     1.720 (2.504)     1.550 (0.227)     1.528 (0.154)     1.516 (0.124)     1.511 (0.096)

 0.2     $\hat{d}$       0.288 (0.482)     0.224 (0.275)     0.208 (0.191)     0.208 (0.156)     0.208 (0.127)
-0.3     $\hat{a}_1$    -0.274 (0.186)    -0.297 (0.114)    -0.294 (0.087)    -0.298 (0.070)    -0.303 (0.055)
 2.2     $\hat{b}_1$     2.109 (0.433)     2.183 (0.292)     2.184 (0.211)     2.195 (0.172)     2.203 (0.134)
 0.5     $\hat{c}_1$     0.514 (0.143)     0.507 (0.086)     0.591 (0.062)     0.502 (0.049)     0.503 (0.040)

$\phi = 0.5$
 0.5     $\hat{d}$       0.545 (0.273)     0.524 (0.161)     0.506 (0.114)     0.506 (0.096)     0.501 (0.069)
 0.3     $\hat{a}_1$     0.312 (0.124)     0.306 (0.076)     0.301 (0.053)     0.300 (0.042)     0.300 (0.033)
-1       $\hat{b}_1$    -1.102 (0.475)    -1.049 (0.284)    -1.015 (0.196)    -1.013 (0.164)    -1.002 (0.119)
 1.5     $\hat{c}_1$     1.561 (0.259)     1.523 (0.152)     1.508 (0.106)     1.510 (0.087)     1.502 (0.069)

 0.2     $\hat{d}$       0.222 (0.421)     0.208 (0.253)     0.206 (0.172)     0.211 (0.151)     0.205 (0.111)
-0.3     $\hat{a}_1$    -0.277 (0.177)    -0.292 (0.109)    -0.299 (0.084)    -0.302 (0.069)    -0.301 (0.054)
 2.2     $\hat{b}_1$     2.156 (0.408)     2.200 (0.271)     2.198 (0.194)     2.201 (0.169)     2.200 (0.128)
 0.5     $\hat{c}_1$     0.508 (0.195)     0.503 (0.121)     0.502 (0.088)     0.503 (0.070)     0.502 (0.053)

7. Data Analysis

We compare the performance of various models on real data. We apply linear models whose parametrization is based on either the logit or the probit link function. The feedback term $\kappa_{t-1}$ may also be dropped, yielding the no-feedback model, that is, the classical autoregressive model for binary data. We consider $F = F_L$, $\kappa_t = \lambda_t$ and $F = \Phi$, $\kappa_t = \pi_t$. We apply these models to six binary time series reported by [47, Ch. 13]. These time series represent six thinly traded shares at the Johannesburg Stock Exchange for the period from 5 October 1987 to 3 June 1991 (910 days). These data are binary because, for each share, the presence (1) or absence (0) of trading is recorded throughout the period of observation. Note that during this time period the market was dominated by investors who had certain preferences towards shares, so there existed periods of no trading activity. The approach taken here identifies possible trading behavior (see also [47, Ch. 13]).

Figure 1: Trading shares: 1 = "Amcoal", 2 = "Vierfontein", 3 = "Wankie", 4 = "Anamint", 5 = "Broadacres" and 6 = "Carrigs". The vertical lines represent presence of trading. [Horizontal axis: days; vertical axis: share number.]

Three of the six shares are from the coal sector and three from the diamonds sector. The coal shares are Amcoal (1), Vierfontein (2) and Wankie (3), and the diamond shares are Anamint (4), Broadacres (5) and Carrigs (6). Figure 1 displays the presence or absence of trading for the six trading shares respectively. Figure 2 displays the autocorrelation functions (a.c.f.) of those six binary time series. We note that the a.c.f. for the Wankie and Amcoal data resembles the a.c.f. of a white noise sequence.
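The sample autocorrelation function examined in Figure 2 can be computed along the following lines. This is a minimal sketch: the function name `sample_acf`, the toy alternating series and the approximate white-noise band $\pm 1.96/\sqrt{N}$ are illustrative choices made here, not taken from the paper, and the share data themselves are not reproduced.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(e * e for e in dev) / n  # lag-0 sample autocovariance
    if c0 == 0.0:
        raise ValueError("constant series: autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * c0)
        for k in range(max_lag + 1)
    ]


# Toy binary series (hypothetical, not one of the six shares).
toy = [t % 2 for t in range(100)]
acf = sample_acf(toy, max_lag=5)

# Approximate 95% band for a white noise sequence of the same length.
band = 1.96 / math.sqrt(len(toy))
```

For a series whose autocorrelations at all positive lags stay inside the band, the a.c.f. is hard to distinguish from that of white noise, which is the pattern described above for Wankie and Amcoal.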

Initially, we apply only models (2) and (3) to each time series in order to compare the logit and probit link based models. The corresponding estimators are reported in Table 5, with their standard errors given in parentheses.

Figure 2: The autocorrelation functions of the six trading shares. [Panels: Amcoal, Vierfontein, Wankie, Anamint, Broadacres, Carrigs; horizontal axis: Lag (0 to 30); vertical axis: ACF (0.0 to 1.0).]

0.0003 (0.007) 0.967 (0.019) 0.9995 (0.005)

(3) (2) (3) 0.102 (0.049) -0.0012(0.013)

(2) (3)


0.184 (0.042)

0.294 (0.068) 0.446 (0.075)

0.726 (0.122)

0.677 (0.064)

0.675 (0.065)

-0.180 (0.036)

-0.292 (0.059)

Anamint


0.207 (0.116)

0.363 (0.206)

0.607 (0.180)

0.629 (0.194)

-0.443 (0.197)

-0.707 (0.358)

Wankie

0.231 (0.051)

0.375 (0.084)

0.873 (0.030)

0.875 (0.031)

-0.133 (0.029)

-0.214 (0.048)

Broadacres

Diamond Shares

0.443 (0.065)

0.742 (0.109)

0.788 (0.034)

0.783 (0.035)

-0.218 (0.033)

-0.364 (0.056)

Carrigs


Table 5: Maximum likelihood estimators and their standard errors (in parentheses) for the six trading shares after fitting models (2) and (3).

ˆb1

0.906 (0.023)

0.909 (0.023)

-0.091 (0.021)

-0.146 (0.034)

Vierfontein

Coal Shares


a ˆ1

-0.042 (0.022)

(2)

Amcoal

Model




MLE Estimators


We observe from Table 5 that the feedback coefficient $\hat{a}_1$ is estimated similarly after applying either the logistic or the probit model. However, there are notable differences in the estimation of the coefficients $b_1$ and $d$. Furthermore, the signs of the estimators are the same for all parameters in almost all cases, with the exception of Amcoal. The standard errors of the parameter estimators obtained by fitting model (3) are smaller than those obtained by fitting model (2). This fact was also observed in the simulation study.

We compare all models based on (5), using $F = F_L$ or $F = \Phi$, with or without feedback, for different values of the order $p$. The comparison is performed in terms of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The results are displayed in Table 6. We observe that the best model is (5) with $F = \Phi$ and $p = 1$ (i.e., model (3)), because it minimizes both criteria, with the exception of the Wankie share. Model (5) with $F = F_L$ and $p = 1$ (i.e., model (2)) is overall the second best model, with values of AIC and BIC very close to those obtained by model (3). Note that the values of AIC and BIC for the models without feedback and of the same order are approximately the same regardless of the link function. In the case of feedback models with $p$ greater than one, the probit link yields smaller values.
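The two criteria used in this comparison can be sketched as follows, using the standard definitions $\mathrm{AIC} = -2\,l_N + 2k$ and $\mathrm{BIC} = -2\,l_N + k \log N$, where $l_N$ is the maximized log-likelihood and $k$ the number of estimated parameters. The function names are hypothetical, and the assumption that the sample size entering the BIC is $N = 910$ is made here for illustration only.

```python
import math


def bernoulli_loglik(y, p):
    """Log-likelihood sum of Y_t*log(p_t) + (1 - Y_t)*log(1 - p_t)."""
    return sum(
        math.log(pt) if yt == 1 else math.log(1.0 - pt)
        for yt, pt in zip(y, p)
    )


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return -2.0 * loglik + k * math.log(n)


# Illustrative values: a log-likelihood of -540.55 with k = 2 and N = 910.
example_aic = aic(-540.55, k=2)
example_bic = bic(-540.55, k=2, n=910)
```

Under these assumptions the example gives an AIC of 1085.10 and a BIC of approximately 1094.73, a pair of values that also appears among the entries of Table 6. The gap between the two criteria is $k(\log N - 2)$, so the BIC penalizes each additional lag more heavily than the AIC at this sample size.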

1085.45,1104.71 1084.37,1108.44

p=3

p=4

no feedback

22 1085.55,1104.80 1084.47,1108.54

p=3

p=4

no feedback

BIC


1187.54,1211.61

1199.31,1218.57

1218.43,1232.87

1224.87,1234.49

1199.56,1223.63

1212.36,1231.62

1157.29,1171.73

1167.43,1177.06

1168.48,1192.55

1166.72,1185.97

1147.94,1162.38

767.20,791.27

765.76,785.01

766.71,781.15

769.33,778.96

770.33,794.39

769.02,788.27

767.32,781.76

767.17,791.24

765.77,785.02

BIC

Anamint AIC

1149.75,1173.82

1153.60,1172.86

1157.21,1171.65

1167.43,1177.06

BIC

1154.58,1178.65

1158.72,1177.97

1121.91,1136.35

1143.62,1167.68

1153.71,1172.96

1162.70,1177.14

1173.53,1183.15

1163.19,1187.26

1164.78,1184.04

1123.42,1137.86

AIC

Broadacres

1143.55,1167.62

1153.58,1172.84

1162.68,1177.12

1173.53,1183.15


1158.23,1182.29

1158.64,1177.89

1147.00,1161.44

1150.01,1174.08

1153.94,1173.20


1168.90,1183.34

1187.65,1211.72

1199.38,1218.64

1218.44,1232.88

766.72,781.16

769.33,778.96

1224.87,1234.49

770.72,794.79

769.40,788.65

767.42,781.86

AIC

1210.73,1234.79

1218.72,1237.97

1169.55,1183.99

BIC

Wankie

Shares

Table 6: Values of the AIC and BIC for the six trading shares obtained after fitting different models.

1086.02,1100.46

p=2

F =Φ

1085.10,1094.73

p=1

1085.47,1109.54

p=3

(5)

1085.94,1105.19

p=2

F =Φ

AIC

Vierfontein


1069.19,1083.63

p=1

(5)

1085.99,1100.44

p=2

F = FL

1085.10,1094.73

p=1

1086.00,1110.06

p=3

(5)

1085.93,1105.18

p=2

F = FL

1075.02,1089.46

p=1

BIC


AIC

Amcoal


(5)

Model

BIC

1090.26,1114.33

1095.65,1114.90

1119.06,1133.50

1143.29,1152.92

1115.80,1139.87

1126.15,1145.41

1071.49,1085.93

1090.04,1114.11

1095.39,1114.65

1118.93,1133.37

1143.29,1152.92

1151.81,1175.88

1153.61,1172.86

1072.43,1086.87

AIC

Carrigs

Acknowledgements

The authors thank the Editor, the Associate Editor and two reviewers for several useful comments that improved the presentation considerably. This work was carried out while the first author was visiting the Department of Mathematics and Statistics, University of Cyprus. He would like to thank all the members of the Department for their warm hospitality.

Appendix

Proof of Theorem 1. The first step is to show that there exists a weakly dependent strictly stationary process $\{X_t = (Y_t, \kappa_t),\ t \in \mathbb{Z}\}$ which belongs to $\mathbb{L}^1$. We need to verify condition 3.1 of [16]. Condition 3.2 in the same paper is assumed, while condition 3.3 trivially holds in this case. We can write $X_t$ as follows:

\[
\begin{aligned}
X_t &= (Y_t, \kappa_t) = (1(U_t \le p_t), \kappa_t) = (1(U_t \le F(\kappa_t)), \kappa_t) \\
&= H(Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}, \kappa_{t-1}, \kappa_{t-2}, \dots, \kappa_{t-q}; U_t) = H(x; U_t),
\end{aligned}
\]

where $\kappa_t = g(\kappa_{t-1}, \dots, \kappa_{t-q}, Y_{t-1}, \dots, Y_{t-p}) = g(x)$ and $U_t$ is a sequence of uniform random variables on $(0, 1)$. For a vector $x = (y, \kappa) \in \{0, 1\} \times \mathbb{R}$ we define the norm $\|x\|_\epsilon = |y| + \epsilon |\kappa|$, for any $\epsilon > 0$. Then, with $K = \sup_\omega |F'(\omega)|$,

\[
\begin{aligned}
E\|H(x; U_t) - H(x'; U_t)\|_\epsilon
&= E\big( |1(U_t \le F(g(x))) - 1(U_t \le F(g(x')))| + \epsilon\,|g(x) - g(x')| \big) \\
&= |F(g(x)) - F(g(x'))| + \epsilon\,|g(x) - g(x')| \\
&\le \sup_{\omega} |F'(\omega)| \cdot |g(x) - g(x')| + \epsilon\,|g(x) - g(x')| \\
&= (K + \epsilon)\,|g(x) - g(x')| \\
&\le (K + \epsilon) \Big[ \sum_{i=1}^{p} \beta_i\,|Y_{t-i} - Y'_{t-i}| + \sum_{i=1}^{q} \gamma_i\,|\kappa_{t-i} - \kappa'_{t-i}| \Big] \\
&\le (K + \epsilon) \Big[ \sum_{i=1}^{\min(p,q)} (\beta_i + \gamma_i/\epsilon)\,\|X_{t-i} - X'_{t-i}\|_\epsilon + \sum_{i=\min(p,q)+1}^{\max(p,q)} \delta_i\,\|X_{t-i} - X'_{t-i}\|_\epsilon \Big],
\end{aligned}
\]

where

\[
\delta_i =
\begin{cases}
\beta_i, & \text{if } p > q, \\
\gamma_i/\epsilon, & \text{otherwise.}
\end{cases}
\]

Then

\[
a = (K + \epsilon) \Big[ \sum_{i=1}^{\min(p,q)} (\beta_i + \gamma_i/\epsilon) + \sum_{i=\min(p,q)+1}^{\max(p,q)} \delta_i \Big]
= (K + \epsilon) \Big( \sum_{i=1}^{p} \beta_i + \sum_{i=1}^{q} \gamma_i/\epsilon \Big).
\]

We need to assume that $a < 1$ (see [16]). If $K < 1$, we choose $\epsilon = \big[ K \sum_{i=1}^{q} \gamma_i \big] / \big[ (1 - K) \sum_{i=1}^{p} \beta_i \big]$. Then $a < 1$ yields $\big( \sum_{i=1}^{q} \gamma_i \big)/(1 - K) + \sum_{i=1}^{p} \beta_i < 1$ and condition 3.1 is satisfied. If $K \ge 1$, we choose $\epsilon = \big( K \sum_{i=1}^{q} \gamma_i \big) / \sum_{i=1}^{p} \beta_i$. Then $a < 1$ yields $(K + 1)\big( \sum_{i=1}^{q} \gamma_i + \sum_{i=1}^{p} \beta_i \big) < 1$.

This concludes the first part of the proof. For the second part, since $|Y_t| \le 1$, we need to show that $E|\kappa_t|^s < \infty$. From (6), we have

\[
|g(Y_{t-1}, \dots, Y_{t-p}, \kappa_{t-1}, \dots, \kappa_{t-q}) - g(0)| \le \sum_{i=1}^{q} \gamma_i\,|\kappa_{t-i}| + \sum_{i=1}^{p} \beta_i\,|Y_{t-i}|.
\]

We have already proved that $E|\kappa_t| < \infty$ (since $(Y_t, \kappa_t) \in \mathbb{L}^1$). We use induction. From the above relation we get that

\[
|\kappa_t| \le |g(0)| + \sum_{i=1}^{q} \gamma_i\,|\kappa_{t-i}| + \sum_{i=1}^{p} \beta_i.
\]

Let $c = |g(0)| + \sum_{i=1}^{p} \beta_i$. Then

\[
\begin{aligned}
|\kappa_t|^s &\le \Big( c + \sum_{i=1}^{q} \gamma_i\,|\kappa_{t-i}| \Big)^s
= \sum_{n=0}^{s} \binom{s}{n} \Big[ \sum_{i=1}^{q} \gamma_i\,|\kappa_{t-i}| \Big]^n c^{s-n} \\
&= \Big[ \sum_{i=1}^{q} \gamma_i\,|\kappa_{t-i}| \Big]^s + R_{s-1}
\le \sum_{i=1}^{q} \gamma_i\,|\kappa_{t-i}|^s + R_{s-1},
\end{aligned}
\]

due to convexity, where $R_{s-1}$ is a polynomial of order $(s - 1)$. Hence,

\[
E|\kappa_t|^s \le \sum_{i=1}^{q} \gamma_i\,E|\kappa_{t-i}|^s + C,
\]

and the desired result follows under the condition $\sum_{i=1}^{q} \gamma_i < 1$, already implied in Theorem 1.

In the case $p = q = 1$, we could place a different condition. We have that

\[
E\|H(x; U_t) - H(x'; U_t)\|_\epsilon \le (K + \epsilon) \max(\beta_1, \gamma_1/\epsilon)\,\|X_{t-1} - X'_{t-1}\|_\epsilon = (\gamma_1 + \beta_1 K)\,\|X_{t-1} - X'_{t-1}\|_\epsilon,
\]

where $\epsilon = \gamma_1/\beta_1$. Hence the condition becomes $\gamma_1 + K \beta_1 < 1$. If $K < 1$, this holds whenever $\gamma_1 + \beta_1 < 1$, as in Table 1.
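The algebra behind the two choices of $\epsilon$ in the proof of Theorem 1 can be checked numerically with a short sketch. The function names and the coefficient values below are hypothetical; the values $K = 1/4$ (the maximum of the logistic density) and $K = 1.5$ are used only to exercise the two branches, and both coefficient sums are assumed to be strictly positive.

```python
def contraction_coefficient(K, betas, gammas, eps):
    """a = (K + eps) * (sum(beta_i) + sum(gamma_i) / eps)."""
    return (K + eps) * (sum(betas) + sum(gammas) / eps)


def choose_eps(K, betas, gammas):
    """The epsilon chosen in the proof, for K < 1 and for K >= 1."""
    B, G = sum(betas), sum(gammas)
    if K < 1:
        return K * G / ((1.0 - K) * B)
    return K * G / B


# Hypothetical coefficients: beta_i multiply lagged Y, gamma_i lagged kappa.
betas, gammas = [0.2, 0.1], [0.4]

# Case K < 1: a should equal sum(gamma)/(1 - K) + sum(beta).
K_small = 0.25
a_small = contraction_coefficient(K_small, betas, gammas,
                                  choose_eps(K_small, betas, gammas))

# Case K >= 1: a should equal (K + 1) * (sum(gamma) + sum(beta)).
K_large = 1.5
a_large = contraction_coefficient(K_large, betas, gammas,
                                  choose_eps(K_large, betas, gammas))
```

With these illustrative numbers, `a_small` is below one, so the contraction condition holds, whereas `a_large` exceeds one, showing how much more restrictive the condition becomes when $K \ge 1$.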

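Lemma A-1. Under the assumptions of Theorem 1 and Assumptions 1-4, we have the following results, as $N \to \infty$.

(i) The score function defined in (11) satisfies

\[
\frac{1}{\sqrt{N}}\,S_N(\theta_0) \xrightarrow{D} N(0, G(\theta_0)),
\]

where $G(\theta)$ is a positive definite matrix, defined in (12).

(ii) The Hessian matrix defined in (13) satisfies

\[
\frac{1}{N}\,H_N(\theta_0) \xrightarrow{p} G(\theta_0).
\]

(iii) Within the neighborhood of the true value, $O(\theta_0) = \{\theta : \|\theta - \theta_0\| \le r/\sqrt{N}\}$, $r > 0$,

\[
\max_{i,j,k}\ \sup_{\theta \in O(\theta_0)} \Big| \frac{1}{N} \sum_{t=1}^{N} \frac{\partial^3 l_t(\theta)}{\partial \theta_i\,\partial \theta_j\,\partial \theta_k} \Big| \le K_N,
\]

such that $K_N \xrightarrow{p} K$, where $K$ is a constant.

Proof of Lemma A-1. Consider model (5).

(i) In order to apply the CLT for martingales, we show that $\{\partial l_t(\theta)/\partial \theta\}_{t \in \mathbb{N}}$, where

\[
\frac{\partial l_t(\theta)}{\partial \theta} = (Y_t - p_t(\theta))\,\frac{f(\kappa_t(\theta))}{p_t(\theta)(1 - p_t(\theta))}\,\frac{\partial \kappa_t(\theta)}{\partial \theta},
\]

is a sequence of square integrable martingale differences. At the true value $\theta = \theta_0$, we have $E(Y_t - p_t(\theta) \mid \mathcal{F}_{t-1}) = 0$. We need to show that $E|\partial l_t(\theta)/\partial \theta| < \infty$ or, equivalently, $E|\partial \kappa_t(\theta)/\partial \theta| < \infty$. Writing $g_t(\theta) = g(Y_{t-1}, \dots, Y_{t-p}, \kappa_{t-1}(\theta), \dots, \kappa_{t-q}(\theta); \theta)$, we have

\[
\frac{\partial \kappa_t(\theta)}{\partial \theta_i}
= \frac{\partial g_t(\theta)}{\partial \kappa_{t-1}(\theta)}\,\frac{\partial \kappa_{t-1}(\theta)}{\partial \theta_i} + \dots
+ \frac{\partial g_t(\theta)}{\partial \kappa_{t-q}(\theta)}\,\frac{\partial \kappa_{t-q}(\theta)}{\partial \theta_i}
+ \frac{\partial g_t(\theta)}{\partial \theta_i}, \qquad i = 1, 2, \dots, s. \tag{A-1}
\]

Taking in (6) two argument vectors that differ only in the coordinate $\kappa_{t-i}$, that is, setting

\[
(Y'_{t-1}, \dots, Y'_{t-p}, \kappa'_{t-1}, \dots, \kappa'_{t-(i-1)}, \kappa'_{t-(i+1)}, \dots, \kappa'_{t-q})
= (Y_{t-1}, \dots, Y_{t-p}, \kappa_{t-1}, \dots, \kappa_{t-(i-1)}, \kappa_{t-(i+1)}, \dots, \kappa_{t-q}),
\]

we find that

\[
\Big| \frac{\partial g(Y_{t-1}, \dots, Y_{t-p}, \kappa_{t-1}(\theta), \dots, \kappa_{t-q}(\theta); \theta)}{\partial \kappa_{t-i}(\theta)} \Big| \le \gamma_i.
\]

Then, (A-1) gives

\[
\Big| \frac{\partial \kappa_t(\theta)}{\partial \theta_i} \Big| \le \gamma_1 \Big| \frac{\partial \kappa_{t-1}(\theta)}{\partial \theta_i} \Big| + \dots + \gamma_q \Big| \frac{\partial \kappa_{t-q}(\theta)}{\partial \theta_i} \Big| + C_t, \qquad i = 1, 2, \dots, s,
\]

where $|C_t|$ is bounded by Assumption 3. By repeated substitution, we derive that $E|\partial \kappa_t(\theta)/\partial \theta| < \infty$, because $\sum_{i=1}^{q} \gamma_i < 1$. Following [23], we prove the desired result.

(ii) and (iii) are also proved in the spirit of the proof of a similar result in [23].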
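The score expression and the derivative recursion (A-1) can be illustrated for the simplest linear case. The sketch below assumes $p = q = 1$, $\kappa_t = d + a_1 \kappa_{t-1} + b_1 Y_{t-1}$ and the logistic link, for which $f(\kappa_t) = p_t(1 - p_t)$ and the score contribution reduces to $(Y_t - p_t)\,\partial \kappa_t/\partial \theta$. The function names, the fixed starting value and the toy data are hypothetical.

```python
import math


def logistic(z):
    """Numerically stable logistic cdf."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def loglik_and_score(theta, y, kappa0=0.0):
    """Log-likelihood and score for kappa_t = d + a1*kappa_{t-1} + b1*Y_{t-1}."""
    d, a1, b1 = theta
    kappa = kappa0
    grad = [0.0, 0.0, 0.0]  # d kappa_t / d(d, a1, b1); zero at the fixed start
    loglik = 0.0
    score = [0.0, 0.0, 0.0]
    for t in range(1, len(y)):
        # Recursion (A-1) for linear g, using kappa_{t-1} before it is updated.
        grad = [a1 * grad[0] + 1.0,
                a1 * grad[1] + kappa,
                a1 * grad[2] + y[t - 1]]
        kappa = d + a1 * kappa + b1 * y[t - 1]
        p = logistic(kappa)
        loglik += math.log(p) if y[t] == 1 else math.log(1.0 - p)
        # Logistic link: f / (p(1 - p)) = 1, so the weight is Y_t - p_t.
        score = [s + (y[t] - p) * g for s, g in zip(score, grad)]
    return loglik, score


def numerical_score(theta, y, h=1e-6):
    """Central finite-difference approximation of the score."""
    out = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        out.append((loglik_and_score(up, y)[0] - loglik_and_score(dn, y)[0]) / (2 * h))
    return out


y_toy = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]
theta_toy = (0.2, -0.3, 0.8)
ll, analytic = loglik_and_score(theta_toy, y_toy)
numeric = numerical_score(theta_toy, y_toy)
```

Comparing `analytic` with `numeric` is a quick way to confirm that the recursion for $\partial \kappa_t/\partial \theta$ has been propagated correctly before the score is used in an optimizer.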

Lemma A-2. If Assumptions 1 and 4 hold and the conditions of Theorem 1 are satisfied, then for model (5) it holds that

\[
\sup_{\theta \in \Theta} \Big| \frac{1}{N}\,l_N(\theta) - \frac{1}{N}\,\tilde{l}_N(\theta) \Big| \to 0, \quad \text{a.s., as } N \to \infty, \tag{A-2}
\]

where $\tilde{l}_N(\theta)$ denotes (10) evaluated at some starting value $(\kappa_{1-q}, \dots, \kappa_0)$.

Proof of Lemma A-2. We need to show that

\[
\lim_{N \to \infty}\ \sup_{\theta \in \Theta} \frac{1}{N} \Big| \sum_{t=1}^{N} l_t(\theta) - \sum_{t=1}^{N} \tilde{l}_t(\theta) \Big| = 0, \quad \text{a.s.},
\]

where $\tilde{l}_t(\theta)$ is the $t$th log-likelihood component obtained by setting the starting value to $(\kappa_{1-q}, \dots, \kappa_0)$. We have that

\[
\begin{aligned}
\Big| \sum_{t=1}^{N} l_t(\theta) - \sum_{t=1}^{N} \tilde{l}_t(\theta) \Big|
&= \Big| \sum_{t=1}^{N} \big( l_t(\theta) - \tilde{l}_t(\theta) \big) \Big|
\le \sum_{t=1}^{N} |l_t(\theta) - \tilde{l}_t(\theta)| \\
&= \sum_{t=1}^{N} \Big| Y_t \log F(\kappa_t(\theta)) + (1 - Y_t) \log(1 - F(\kappa_t(\theta))) \\
&\qquad\quad - \big[ Y_t \log F(\tilde{\kappa}_t(\theta)) + (1 - Y_t) \log(1 - F(\tilde{\kappa}_t(\theta))) \big] \Big| \\
&\le \sum_{t=1}^{N} \Big( \big| \log F(\kappa_t(\theta)) - \log F(\tilde{\kappa}_t(\theta)) \big| + \big| \log(1 - F(\kappa_t(\theta))) - \log(1 - F(\tilde{\kappa}_t(\theta))) \big| \Big) \\
&= \sum_{t=1}^{N} \big( |A_{1t}| + |A_{2t}| \big),
\end{aligned} \tag{A-3}
\]

since $Y_t \in \{0, 1\}$.

But $p_t = F(\kappa_t(\theta))$ and therefore Assumption 4 shows that $p_t$ is bounded away from 0 and 1. Suppose that $p_t \in I \subset (0, 1)$. Then also $F(\kappa_t(\theta)) \in I$ and therefore $\kappa_t(\theta) \in \mathbb{R}$ is bounded away from $\pm\infty$. Recall that an everywhere differentiable function $h : \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous if and only if it has bounded first derivative. Hence, $F(\cdot)$ is Lipschitz continuous by Assumption 4. Then,

\[
|A_{1t}| \le K_1\,|F(\kappa_t(\theta)) - F(\tilde{\kappa}_t(\theta))| \le K_1 K\,|\kappa_t(\theta) - \tilde{\kappa}_t(\theta)|, \tag{A-4}
\]

where the first inequality holds since the function $h(x) = \log(x) : I \to \mathbb{R}$ has bounded first derivative. The second inequality holds since $f(x)$ is assumed to be bounded above by a finite positive number $K$. Similarly, we find that

\[
|A_{2t}| \le K_2\,|F(\kappa_t(\theta)) - F(\tilde{\kappa}_t(\theta))| \le K_2 K\,|\kappa_t(\theta) - \tilde{\kappa}_t(\theta)|, \tag{A-5}
\]

where the first inequality holds since the function $\log(1 - x) : I \to \mathbb{R}$ has bounded first derivative. Applying (6) recursively, we have

\[
|\kappa_t(\theta) - \tilde{\kappa}_t(\theta)| \le \sum_{j=1}^{q} \gamma_j\,|\kappa_{t-j}(\theta) - \tilde{\kappa}_{t-j}(\theta)| < \gamma^t K', \tag{A-6}
\]

by repeated substitution, where $0 < \gamma < 1$ and $K'$ is a finite positive constant. Hence, from the compactness of $\Theta$ (see Assumption 1), we can write

\[
\sup_{\theta \in \Theta} |\kappa_t(\theta) - \tilde{\kappa}_t(\theta)| \le \gamma^t K'.
\]

From equations (A-3), (A-4), (A-5) and (A-6) we obtain that

\[
\frac{1}{N}\,|l_N(\theta) - \tilde{l}_N(\theta)| \le (K_1 + K_2)\,\frac{K' K}{N} \sum_{t=1}^{N} \gamma^t.
\]

The rest of the proof is similar to [23].
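The geometric bound (A-6) can be visualized with a small numerical sketch. Under the assumption of a linear recursion with $p = q = 1$, $\kappa_t = d + a_1 \kappa_{t-1} + b_1 Y_{t-1}$, two paths driven by the same responses but started at different values differ by exactly $|a_1|^t$ times the initial gap. The function name, the parameter values and the toy responses are hypothetical.

```python
def kappa_path(d, a1, b1, y, kappa0):
    """Latent path kappa_0, ..., kappa_{n-1} for a given starting value."""
    kappa = [kappa0]
    for t in range(1, len(y)):
        kappa.append(d + a1 * kappa[-1] + b1 * y[t - 1])
    return kappa


y_obs = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
a1 = 0.5
start_gap = 5.0

path_a = kappa_path(0.2, a1, 1.0, y_obs, kappa0=0.0)
path_b = kappa_path(0.2, a1, 1.0, y_obs, kappa0=start_gap)

# Gap between the two paths at each time point.
gaps = [abs(u - v) for u, v in zip(path_a, path_b)]

# Average gap, the quantity that drives the bound on |l_N - l~_N| / N.
average_gap = sum(gaps) / len(gaps)
```

Because the gaps form a geometric sequence, their sum stays bounded as $N$ grows, so the average gap is of order $1/N$; this is the mechanism that makes the choice of starting value asymptotically irrelevant.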

References

[1] B. Kedem, Binary Time Series, Marcel Dekker, New York, 1980.
[2] R. D. Stern, R. Coe, A model fitting analysis of daily rainfall data, Journal of the Royal Statistical Society. Series A (General) 147 (1984) 1–34.
[3] L. Fahrmeir, H. Kaufmann, Regression models for nonstationary categorical time series, Journal of Time Series Analysis 8 (1987) 147–160.
[4] E. V. Slud, B. Kedem, Partial likelihood analysis of logistic regression and autoregression, Statist. Sinica 4 (1994) 89–106.
[5] W. Breen, L. R. Glosten, R. Jagannathan, Economic significance of predictable variations in stock index returns, The Journal of Finance 44 (1989) 1177–1189.
[6] K. C. Butler, S. Malaikah, Efficiency and inefficiency in thinly traded stock markets: Kuwait and Saudi Arabia, Journal of Banking & Finance 16 (1992) 197–210.
[7] P. F. Christoffersen, F. X. Diebold, Financial asset returns, direction-of-change forecasting, and volatility dynamics, Management Science 52 (2006) 1273–1287. doi:10.1287/mnsc.1060.0520.
[8] P. Christoffersen, F. X. Diebold, R. S. Mariano, A. Tay, Y. K. Tse, Direction-of-change forecasts for Asian equity markets based on conditional variance, skewness and kurtosis dynamics: Evidence from Hong Kong and Singapore, Journal of Financial Forecasting 1 (2007) 1–22.
[9] R. Startz, Binomial autoregressive moving average models with an application to US recessions, Journal of Business & Economic Statistics 26 (2008) 1–8.
[10] H. Nyberg, Dynamic probit models and financial variables in recession forecasting, Journal of Forecasting 29 (2010) 215–230.
[11] H. Nyberg, Forecasting the direction of the US stock market with dynamic binary probit models, International Journal of Forecasting 27 (2011) 561–578.
[12] H. Nyberg, Predicting bear and bull stock markets with dynamic binary time series models, Journal of Banking & Finance 37 (2013) 3351–3363.
[13] H. Kauppi, Predicting the direction of the Fed's target rate, Journal of Forecasting 31 (2012) 47–67.
[14] R. Wu, Y. Cui, A parameter-driven logit regression model for binary time series, Journal of Time Series Analysis.
[15] J. Dedecker, P. Doukhan, G. Lang, J. R. León R., S. Louhichi, C. Prieur, Weak Dependence: With Examples and Applications, Vol. 190 of Lecture Notes in Statistics, Springer, New York, 2007.
[16] P. Doukhan, O. Wintenberger, Weakly dependent chains with infinite memory, Stochastic Process. Appl. 118 (2008) 1997–2013. doi:10.1016/j.spa.2007.12.004.
[17] R. M. de Jong, T. Woutersen, Dynamic time series binary choice, Econometric Theory 27 (2011) 673–702.
[18] D. R. Cox, E. J. Snell, Analysis of Binary Data, 2nd Edition, Vol. 32 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1989.
[19] A. Agresti, Categorical Data Analysis, 2nd Edition, Wiley, New York, 2002. doi:10.1002/0471249688.
[20] P. McCullagh, J. A. Nelder, Generalized Linear Models, 2nd Edition, Chapman & Hall, London, 1989.
[21] P. J. Brockwell, R. A. Davis, Time Series: Theory and Methods, 2nd Edition, Springer-Verlag, New York, 1991.
[22] B. Kedem, K. Fokianos, Regression Models for Time Series Analysis, Hoboken, NJ, 2002. doi:10.1002/0471266981.
[23] T. Moysiadis, K. Fokianos, On binary and categorical time series models with feedback, Journal of Multivariate Analysis 131 (2014) 209–228.
[24] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31 (1986) 307–327. doi:10.1016/0304-4076(86)90063-1.
[25] J. R. Russell, R. F. Engle, Econometric analysis of discrete-valued irregularly-spaced financial transactions data using a new autoregressive conditional multinomial model, SSRN eLibrary. doi:10.2139/ssrn.106528.
[26] J. R. Russell, R. F. Engle, A discrete-state continuous-time model of financial transactions prices and times, Journal of Business and Economic Statistics 23 (2005) 166–180. doi:10.1198/073500104000000541.
[27] T. H. Rydberg, N. Shephard, Dynamics of trade-by-trade price movements: decomposition and models, Journal of Financial Econometrics 1 (2003) 2–25.
[28] D. Tjøstheim, Rejoinder on: Some recent theory for autoregressive count time series, TEST 21 (2012) 469–476.
[29] K. Fokianos, D. Tjøstheim, Log-linear Poisson autoregression, J. Multivariate Anal. 102 (2011) 563–578. doi:10.1016/j.jmva.2010.11.002.
[30] D. R. Cox, Statistical analysis of time series: some recent developments, Scand. J. Statist. 8 (1981) 93–115.
[31] S. L. Zeger, B. Qaqish, Markov regression models for time series: a quasi-likelihood approach, Biometrics 44 (1988) 1019–1031. doi:10.2307/2531732.
[32] H. Kauppi, P. Saikkonen, Predicting US recessions with dynamic binary response models, The Review of Economics and Statistics 90 (2008) 777–791.
[33] H. Kauppi, Yield-curve based probit models for forecasting US recessions: stability and dynamics, Tech. rep., Aboa Centre for Economics (2008).
[34] H. Nyberg, Studies on binary time series models with applications to empirical macroeconomics and finance, Ph.D. thesis, University of Helsinki (2010).
[35] K. Fokianos, Count time series models, in: T. S. Rao, S. S. Rao, C. R. Rao (Eds.), Handbook of Statistics: Time Series Analysis–Methods and Applications, Vol. 30, Elsevier B. V., Amsterdam, 2012, pp. 315–347.
[36] P. Doukhan, S. Louhichi, A new weak dependence condition and applications to moment inequalities, Stochastic Process. Appl. 84 (1999) 313–342. doi:10.1016/S0304-4149(99)00055-1.
[37] P. Doukhan, K. Fokianos, D. Tjøstheim, On weak dependence conditions for Poisson autoregressions, Statist. Probab. Lett. 82 (2012) 942–948. doi:10.1016/j.spl.2012.01.015.
[38] K. Fokianos, A. Rahbek, D. Tjøstheim, Poisson autoregression, J. Amer. Statist. Assoc. 104 (2009) 1430–1439, with electronic supplementary materials available online. doi:10.1198/jasa.2009.tm08270.
[39] M. Neumann, Absolute regularity and ergodicity of Poisson count processes, Bernoulli 17 (2011) 1268–1284.
[40] K. Fokianos, D. Tjøstheim, Nonlinear Poisson autoregression, Ann. Inst. Statist. Math. 64 (2012) 1205–1225. doi:10.1007/s10463-012-0351-3.
[41] B. M. Pötscher, I. R. Prucha, Dynamic Nonlinear Econometric Models: Asymptotic Theory, Springer-Verlag, Berlin, 1997.
[42] D. W. K. Andrews, Nonstrong mixing autoregressive processes, J. Appl. Probab. 21 (1984) 930–934.
[43] I. Berkes, L. Horváth, P. Kokoszka, GARCH processes: structure and estimation, Bernoulli 9 (2003) 201–227. doi:10.3150/bj/1068128975.
[44] C. Francq, J.-M. Zakoian, GARCH Models: Structure, Statistical Inference and Financial Applications, John Wiley & Sons, UK, 2011.
[45] M. Meitz, P. Saikkonen, Ergodicity, mixing and existence of moments of a class of Markov models with applications to GARCH and ACD models, Econometric Theory 24 (2008) 1291–1320.
[46] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org (2013).
[47] W. Zucchini, I. L. MacDonald, Hidden Markov Models for Time Series: An Introduction Using R, CRC Press, Boca Raton, FL, 2009.