Input–output properties of the Page–Hinkley detector

Systems & Control Letters 60 (2011) 486–491. doi:10.1016/j.sysconle.2011.04.004

László Gerencsér a,∗, Cecilia Prosdocimi b

a MTA SZTAKI, Hungarian Academy of Sciences, Budapest, Hungary
b Department of Economics and Business, LUISS Guido Carli, Rome, Italy

✩ This work was supported by the CNR–MTA Cooperation Agreement, and by the University of Padova.
∗ Corresponding author. Tel.: +36 1 279 6138; fax: +36 1 4667 503. E-mail addresses: [email protected], [email protected] (L. Gerencsér).

Article history: Received 28 April 2010; received in revised form 9 January 2011; accepted 4 April 2011; available online 8 May 2011.

Keywords: Page–Hinkley detector; L-mixing; Exponential inequalities; False alarm rate

Abstract. We consider the stochastic input–output properties of a simple non-linear dynamical system, the so-called Page–Hinkley detector, which plays a key role in change detection and also in queuing theory. We show that for L-mixing inputs with negative expectation the output process of this system is L-mixing. The result is applied to obtain an upper bound for the false alarm rate. The proof is then adapted to obtain a similar result for the case of random i.i.d. inputs. Possible extensions and open problems are given in the discussion. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

Detection of changes of statistical patterns is a fundamental problem in many applications; for a survey see [1,2]. A basic method for detecting temporal changes is the Cumulative Sum (CUSUM) test or Page–Hinkley detector, introduced by Page [3] and analyzed later, among others, by Hinkley [4] and Lorden [5]. The CUSUM test or Page–Hinkley detector is defined via a sequence of random variables (r.v.-s) (X_n), often called residuals in the engineering literature (e.g. likelihood ratios), such that

$$ E(X_n) < 0 \ \text{ for } n \le \tau^* - 1, \qquad E(X_n) > 0 \ \text{ for } n \ge \tau^*, $$

with τ∗ denoting the change point. To give an example, in the case of i.i.d. samples with densities f(x, θ_0) and f(x, θ_1) before and after the change point, we would set

$$ X_n = -\log f(x_n, \theta_0) + \log f(x_n, \theta_1), $$

where x_n is the n-th sample. Letting S_0 := 0 and $S_n := \sum_{k=1}^{n} X_k$, the CUSUM statistic or Page–Hinkley detector is defined for n ≥ 0 as

$$ g_n := S_n - \min_{0\le k\le n} S_k = \max_{0\le k\le n} (S_n - S_k). \qquad (1) $$

An alarm is given if g_n exceeds a pre-fixed threshold δ > 0. The moment of alarm is defined by

$$ \hat\tau = \hat\tau(\delta) = \inf\Big\{ n \ \Big|\ S_n - \min_{0\le k\le n} S_k > \delta \Big\}. \qquad (2) $$

The Page–Hinkley detector was first used for independent observations, but its range of applicability has been extended to dependent sequences. The applicability of the Page–Hinkley detector to ARMA systems with unknown dynamics before and after the change was demonstrated in [6], using heuristic arguments and simulations, and later adapted to Hidden Markov Models (HMM-s) in [7]. The Page–Hinkley detector for HMMs, with known dynamics before and after the change, was also considered in [8], but no detailed analysis of the proposed algorithm was given. For a special class of dependent sequences the Page–Hinkley detector was used in [9], again without a theoretical analysis. Change detection for general dependent sequences was first rigorously studied in [10] under the very weak condition that

$$ \lim_{N\to\infty} \frac{1}{N} \sum_{n=\tau}^{\tau+N} X_n = I > 0, $$

where the convergence is meant in probability, and where X_n is the conditional log-likelihood ratio (as in Eq. (9) in Section 3 below). A deep theoretical analysis of the expected delay with given Average Run Length for HMM-s is provided in [11,12].

The Page–Hinkley detector (g_n) can be equivalently defined via a non-linear dynamical system, with a_+ = max{0, a}, as follows:

$$ g_n = (g_{n-1} + X_n)_+ \quad\text{with } g_0 = 0. \qquad (3) $$
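Before turning to the stability of this system, note that the recursion (3) is a few lines of code. The sketch below is ours, not from the paper; the negative drift and the threshold are arbitrary illustrative choices. It implements the detector and the alarm time (2), and checks numerically that the recursive form (3) agrees with the max-formulation (1).

```python
import numpy as np

def page_hinkley(x, delta):
    """Page-Hinkley detector: recursion (3) with g_0 = 0.

    Returns the trajectory (g_1, ..., g_N) and the alarm time
    tau_hat(delta) of (2), or None if g_n never exceeds delta.
    """
    g, traj, alarm = 0.0, [], None
    for n, xn in enumerate(x, start=1):
        g = max(0.0, g + xn)              # g_n = (g_{n-1} + X_n)_+
        traj.append(g)
        if alarm is None and g > delta:   # first n with g_n > delta
            alarm = n
    return np.array(traj), alarm

rng = np.random.default_rng(0)
x = rng.normal(-0.5, 1.0, size=1000)      # no change: E(X_n) = -0.5 < 0

g, tau_hat = page_hinkley(x, delta=8.0)
print("alarm time:", tau_hat)

# Cross-check against the max-formulation (1): g_n = S_n - min_{0<=k<=n} S_k.
S = np.concatenate(([0.0], np.cumsum(x)))
assert np.allclose(g, S[1:] - np.minimum.accumulate(S)[1:])
```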


From a system-theoretic point of view this system is not stable in any sense. E.g., for a constant positive input, (g_n) becomes unbounded, and the effect of initial perturbations may not vanish. On the other hand, for an i.i.d. input sequence (X_n) with E(X_n) < 0, some stability of the output process (g_n) can be expected. The resulting non-linear stochastic system is a standard object in queuing theory (see [13] Chapter 1 and [14] Chapters 1.5 and 3.6), and in the theory of risk processes (see [15]). In this case the process (g_n) is clearly a homogeneous Markov chain, also called a one-sided random walk or Lindley process. A number of stability properties of (g_n) have been established in [14,16,17], as we will recall in Section 2.

The purpose of this paper is to extend these results, motivated by change detection for HMM-s, as described in Section 3. After giving a brief overview of the results for the i.i.d. case, we show that for L-mixing inputs with negative expectation, and under further technical conditions such as boundedness, the output process (g_n) of this system is L-mixing. (For the definition of L-mixing see the Appendix, and [18] for further details.) The result is applied to obtain an upper bound for the false alarm rate. The proof is adapted to obtain a similar result for the more standard case of random i.i.d. inputs with negative expectation and finite exponential moments of some positive order, reproducing some known tight bounds for the false alarm rate. Further possible extensions and open problems are formulated in the Discussion section.

The assumption that (X_n) is an i.i.d. sequence reflects the tacit assumption that actually there is no change at all, i.e. τ∗ = +∞. The Page–Hinkley detector can still be used to monitor the process, and we may occasionally get an alarm. Our results can be applied to give an upper bound for the almost sure false alarm rate as a function of the threshold δ, defined as

$$ \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} I_{\{g_n > \delta\}}. \qquad (4) $$

A key quantity in change detection is the Average Run Length (ARL), defined as $E_0[\hat\tau(\delta)]$ (see Chapter 6.2 in [19] or [20]). For the i.i.d. case, it is shown in [20] that $E[\hat\tau(\delta)]$, defined in terms of the stationary distribution of the Markov chain (g_n), is approximately reciprocal to the false alarm rate for large δ. In this case, the false alarm rate could be defined using the Law of Large Numbers for homogeneous Markov chains. However, for models with dependent and inhomogeneous input data, the false alarm rate seems not to be directly quantifiable as a pathwise characteristic.
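To close this introduction with a concrete instance (a standard computation added here for illustration; it is not spelled out in the paper): for i.i.d. Gaussian samples with a mean shift, f(x, θ_i) being the N(θ_i, σ²) density, the residual introduced above becomes

$$ X_n = \frac{(x_n-\theta_0)^2 - (x_n-\theta_1)^2}{2\sigma^2} = \frac{\theta_1-\theta_0}{\sigma^2}\Big( x_n - \frac{\theta_0+\theta_1}{2} \Big), $$

so that E(X_n) = −(θ_1−θ_0)²/(2σ²) < 0 before the change and E(X_n) = +(θ_1−θ_0)²/(2σ²) > 0 after it, matching the sign condition on the residuals stated above.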





2. The case of i.i.d. inputs

If the input process (X_n) is an i.i.d. sequence, then (g_n) is a homogeneous Markov chain. A number of results for this Markov chain are established in [16]. The existence of a unique invariant measure is proven under the hypothesis E(X_1) < 0 (see Proposition 8.5.1 and Theorem 10.0.1). Moreover, it is proven that (g_n) is V-geometrically mixing under the assumption E(exp c′X_1) < ∞ for some c′ > 0 (see Chapter 16.1 for details). It follows that the strong law of large numbers holds for (g_n), see Theorem 17.0.1.

An alternative approach to the analysis of (g_n) is given in [17]. It is noted there that the process (g_n) can be generated in a convenient way by repeated applications of random functions: letting f_X(g) := (g + X)_+ we have

$$ g_n = f_{X_n} \circ f_{X_{n-1}} \circ f_{X_{n-2}} \circ \cdots \circ f_{X_1}(g_0). \qquad (5) $$

Using this representation, the existence of an invariant measure is proven, for the case when E(X_1) < 0, via a backward iteration.

To formulate a useful addition we need the following notations:

$$ \mathcal{F}_n := \sigma(X_i \mid i \le n) \quad\text{and}\quad \mathcal{F}_n^+ := \sigma(X_i \mid i \ge n+1). $$

Thus F_n is the past, and F_n^+ is the future of (X_n) up to time n. Assume:

$$ \mu := \mu(c') = E(\exp c' X_1) < 1 \quad\text{for some } c' > 0. \qquad (6) $$

Assuming that E(X_1^+) > 0, let c∗ be defined by

$$ \mu(c^*) = E(\exp c^* X_1) = 1. \qquad (7) $$

Theorem 1. Let (X_n) be a sequence of i.i.d. r.v.-s such that (6) holds. Then (g_n), defined by Eq. (3), is L-mixing with respect to (w.r.t.) (F_n, F_n^+). In addition, for any c″ such that 0 < c″ < c′ < c∗, we have with µ = µ(c′)

$$ E\,\exp(c'' g_n) \le 1 + \frac{c''}{c' - c''}\,\frac{\mu}{1-\mu} =: C_{c'',c'}. \qquad (8) $$

An outline of the proof will be given in Section 5. The theorem holds true if the (X_n) are independent, not necessarily identically distributed, and

$$ \bar\mu(c') := \sup_n E(\exp c' X_n) < 1 \quad\text{for some } c' > 0. $$
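To see conditions (6) and (7) in action, here is a small numerical sketch (ours, not from the paper; the drift m and the variance are arbitrary illustrative choices). For X_1 ~ N(−m, σ²) the moment generating function is E exp(cX_1) = exp(−cm + c²σ²/2) in closed form, so µ(c∗) = 1 has the explicit positive root c∗ = 2m/σ²; the code recovers it by root finding and checks that µ(c′) < 1 for c′ < c∗.

```python
import math
from scipy.optimize import brentq

m, sigma = 0.5, 1.0                  # X_1 ~ N(-m, sigma^2), so E(X_1) = -m < 0

def mu(c):
    """mu(c) = E exp(c X_1), in closed form for the Gaussian case."""
    return math.exp(-c * m + 0.5 * (c * sigma) ** 2)

# c* of (7): positive root of mu(c) = 1; mu(c) - 1 is negative just above 0
# and positive for large c, so the bracket (1e-9, 10) isolates c*.
c_star = brentq(lambda c: mu(c) - 1.0, 1e-9, 10.0)
print(f"c* = {c_star:.4f}  (closed form 2m/sigma^2 = {2 * m / sigma**2:.4f})")

for frac in (0.25, 0.5, 0.9):
    cp = frac * c_star
    print(f"mu({cp:.3f}) = {mu(cp):.4f} < 1   # condition (6) holds for c' < c*")
```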

3. The case of L-mixing input

Consider now the case when the input (X_n) is L-mixing w.r.t. (F_n, F_n^+). This condition is motivated by change detection problems for HMM-s. In the case of an HMM with finite state space and continuous read-out, parametrized by θ_0 and θ_1 before and after the change, the residuals would be defined as

$$ X_n = -\log p(Y_n \mid Y_{n-1}, \ldots, Y_0, \theta_0) + \log p(Y_n \mid Y_{n-1}, \ldots, Y_0, \theta_1). \qquad (9) $$

(X_n) is L-mixing under certain technical conditions, see [21,22]. We need two additional technical assumptions, using the notations of the Appendix. The first one is fairly mild, requiring that

$$ \sum_{\tau=0}^{+\infty} \tau\,\gamma_q(\tau, X) < +\infty \quad\text{for all } 1 \le q < +\infty. \qquad (10) $$

The second assumption is much more restrictive, saying that

$$ M_\infty(X) < +\infty \quad\text{and}\quad \Gamma_\infty(X) < +\infty. \qquad (11) $$

This condition will be discussed in the Discussion section. We define a critical exponent in terms of M_∞(X) and Γ_∞(X) as follows:

$$ \beta^* := \varepsilon / \big( 4 M_\infty(X) \Gamma_\infty(X) \big). \qquad (12) $$

Then, for any β′ ≤ β∗ define

$$ \lambda = \lambda(\beta') := \exp\big( 4 M_\infty(X) \Gamma_\infty(X) (\beta')^2 - \beta' \varepsilon \big). \qquad (13) $$

Note that for the critical value β∗ we have λ(β∗) = 1, and for β′ < β∗ we have λ(β′) < 1. The main result of this section is then the following:

Theorem 2. Let (X_n) be an L-mixing process w.r.t. (F_n, F_n^+) such that (10) and (11) are satisfied, and

$$ E(X_n) \le -\varepsilon < 0 \quad\text{for all } n \ge 0. \qquad (14) $$

Let (g_n) be defined as in (3). Then (g_n) is L-mixing w.r.t. (F_n, F_n^+). In addition, for any β″, β′ such that 0 < β″ < β′ < β∗, we have with λ = λ(β′)

$$ E\,\exp(\beta'' g_n) \le 1 + \frac{\beta''}{\beta' - \beta''}\,\frac{\lambda}{1-\lambda} =: K_{\beta'',\beta'}. \qquad (15) $$
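To make (12), (13) and the bound (15) concrete, here is a minimal numerical sketch (ours; the values of M_∞(X), Γ_∞(X) and ε are hypothetical placeholders, since the paper treats these quantities abstractly):

```python
import math

M_inf, Gamma_inf, eps = 2.0, 1.5, 0.5      # hypothetical M_inf(X), Gamma_inf(X), epsilon

beta_star = eps / (4 * M_inf * Gamma_inf)  # critical exponent (12)

def lam(beta):
    """lambda(beta') of (13): equals 1 at beta*, and is < 1 for 0 < beta' < beta*."""
    return math.exp(4 * M_inf * Gamma_inf * beta ** 2 - beta * eps)

beta_p, beta_pp = 0.5 * beta_star, 0.25 * beta_star   # 0 < beta'' < beta' < beta*
K = 1 + (beta_pp / (beta_p - beta_pp)) * lam(beta_p) / (1 - lam(beta_p))
print(f"beta* = {beta_star:.4f}, lambda(beta') = {lam(beta_p):.4f}")
print(f"(15): E exp(beta'' g_n) <= {K:.2f}")
```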


Proof of Theorem 2. Use the following equivalent formulation for (g_n):

$$ g_n = \max_{1\le i\le n} (X_i + \cdots + X_n)_+, \qquad (16) $$

and define the auxiliary process

$$ g_{n,n-\tau}(X) := \max_{1\le i\le n-\tau} (X_i + \cdots + X_n)_+. \qquad (17) $$

Lemma 1. Let (X_n) and β″, β′ and λ be as in Theorem 2. Then

$$ E\,\exp\big(\beta'' g_{n,n-\tau}(X)\big) \le 1 + \frac{\beta''}{\beta' - \beta''}\,\frac{\lambda^{\tau+1}}{1-\lambda}. \qquad (18) $$

For the proof of Lemma 1 we need the following result:

Lemma 2. Let (X_n) and β′ and λ be as in Theorem 2. Then for any x ≥ 0

$$ P\big(g_{n,n-\tau}(X) > x\big) \le \frac{\lambda^{\tau+1}}{1-\lambda}\,\exp(-\beta' x). \qquad (19) $$

Proof of Lemma 2. We follow the arguments of the proof of Theorem 3.1 in [23]. First we estimate E exp(β′(X_i + ··· + X_n)), 1 ≤ i ≤ n − τ. Define

$$ D_k := X_k - E(X_k) $$

for all k ≥ 1. Obviously E(D_k) = 0 for all k, M_∞(D) ≤ 2M_∞(X), and Γ_∞(D) = Γ_∞(X). By the exponential inequality, given as Theorem 5.1 in [23], applied to the process (D_k)_{i≤k≤n} with weights f_k = β′, we obtain

$$ E\,\exp\Big( \beta' \sum_{k=i}^{n} D_k - 2 M_\infty(D)\Gamma_\infty(D)\,\beta'^2 (n-i+1) \Big) \le 1. $$

After rearrangement and multiplication by $\exp\big( \beta' \sum_{k=i}^{n} E(X_k) \big)$, we get

$$ E\,\exp\Big( \beta' \sum_{k=i}^{n} \big( D_k + E(X_k) \big) \Big) \le \exp\Big( \alpha \beta'^2 (n-i+1) + \beta' \sum_{k=i}^{n} E(X_k) \Big), $$

with α := 2M_∞(D)Γ_∞(D). Noting that D_k + E(X_k) = X_k, E(X_k) ≤ −ε, and α ≤ 4M_∞(X)Γ_∞(X), we conclude that

$$ E\,\exp\Big( \beta' \sum_{k=i}^{n} X_k \Big) \le \exp\Big( 4 M_\infty(X)\Gamma_\infty(X)\,\beta'^2 (n-i+1) - \beta' \varepsilon (n-i+1) \Big). $$

Take β′ < β∗. Recalling the definition of λ(β′) we get

$$ E\,\exp\Big( \beta' \sum_{k=i}^{n} X_k \Big) \le \exp\Big( 4 M_\infty(X)\Gamma_\infty(X)\,\beta'^2 - \beta' \varepsilon \Big)^{(n-i+1)} = \lambda(\beta')^{\,n-i+1}. $$

Now, for β′ < β∗, we have λ = λ(β′) < 1, and thus we obtain for x ≥ 0

$$ P\big(g_{n,n-\tau}(X) > x\big) \le \sum_{i=1}^{n-\tau} P\big( (X_i + \cdots + X_n)_+ > x \big) \le \sum_{i=1}^{n-\tau} E\,\exp\big( \beta'(X_i + \cdots + X_n) \big) / \exp(\beta' x) $$
$$ \le \sum_{i=1}^{n-\tau} \lambda^{n-i+1} / \exp(\beta' x) = \sum_{l=\tau+1}^{n} \lambda^{l} / \exp(\beta' x) \le \sum_{l=\tau+1}^{+\infty} \lambda^{l} / \exp(\beta' x) = \frac{\lambda^{\tau+1}}{1-\lambda}\,\exp(-\beta' x). \qquad \square $$

Proof of Lemma 1. We have

$$ E\,\exp\big(\beta'' g_{n,n-\tau}(X)\big) = \int_0^{+\infty} P\big( \exp(\beta'' g_{n,n-\tau}(X)) > x \big)\,dx. \qquad (20) $$

For x ≥ 1 we get by Lemma 2

$$ P\big( \exp(\beta'' g_{n,n-\tau}(X)) > x \big) \le \frac{\lambda^{\tau+1}}{1-\lambda}\,\exp\Big( -\frac{\beta'}{\beta''}\,\log x \Big) = \frac{\lambda^{\tau+1}}{1-\lambda}\, x^{-\beta'/\beta''}. \qquad (21) $$

For x < 1 we have P(exp(β″g_{n,n−τ}(X)) > x) = 1. Combining (20) and (21) we get

$$ E\,\exp\big(\beta'' g_{n,n-\tau}(X)\big) = 1 + \int_1^{+\infty} P\big( \exp(\beta'' g_{n,n-\tau}(X)) > x \big)\,dx \le 1 + \frac{\lambda^{\tau+1}}{1-\lambda} \int_1^{+\infty} x^{-\beta'/\beta''}\,dx \le 1 + \frac{\beta''}{\beta' - \beta''}\,\frac{\lambda^{\tau+1}}{1-\lambda}. \qquad \square $$

Corollary 1. Under the conditions and notations of Theorem 2 we have

$$ \| g_{n,n-\tau}(X) \|_p \le K_p\, \lambda^{(\tau+1)/p} \qquad (22) $$

for any integer p ≥ 1, where $K_p := \frac{1}{\beta''} \Big( \frac{\beta''}{\beta' - \beta''} \Big)^{1/p} \Big( \frac{p!}{1-\lambda} \Big)^{1/p}$.

The claim follows directly from Lemma 1 and the inequality

$$ \exp(\beta'' g_{n,n-\tau}) \ge 1 + (\beta'')^p\, \frac{(g_{n,n-\tau})^p}{p!}. \qquad (23) $$

We continue the proof of Theorem 2. The starting point is (16) and (17), with i replaced by k:

$$ g_n = \max_{1\le k\le n} (X_k + \cdots + X_n)_+, \qquad (24) $$

and

$$ g_{n,n-\tau}(X) := \max_{1\le k\le n-\tau} (X_k + \cdots + X_n)_+. \qquad (25) $$

Since X_n is F_n-adapted for any n ∈ N, it follows that (g_n) is F_n-adapted. To show that (g_n) is M-bounded note that g_n = g_{n,n}(X). For any fixed q, let p := ⌈q⌉ be the first integer greater than or equal to q. Then, by Corollary 1, we have ‖g_n‖_q ≤ ‖g_n‖_p ≤ K_p λ^{1/p}.

To show that (g_n) is L-mixing we make use of Lemma 4 in the Appendix. Let

$$ X^+_{k,n-\tau} := E\big( X_k \mid \mathcal{F}^+_{n-\tau} \big). $$

Since (X_n) is L-mixing, for k ≥ n − ⌈τ/2⌉ + 1, or k − (n−τ) ≥ τ − ⌈τ/2⌉ + 1, X^+_{k,n−τ} is a good approximation of X_k. A key step is to approximate g_n by

$$ g^{++}_{n,n-\tau} := \max_{n-\lceil \tau/2 \rceil + 1 \,\le\, k \,\le\, n} \big( X^+_{k,n-\tau} + \cdots + X^+_{n,n-\tau} \big)_+. \qquad (26) $$

Note that g^{++}_{n,n−τ} is F^+_{n−τ}-measurable, as required. For each τ, define

$$ \gamma^{++}_q(\tau) := \sup_{n\ge\tau} \| g_n - g^{++}_{n,n-\tau} \|_q \qquad (27) $$

and

$$ \Gamma^{++}_q(g) := \sum_{\tau=0}^{+\infty} \gamma^{++}_q(\tau). \qquad (28) $$

By Lemma 4 in the Appendix we have

$$ \Gamma_q(g) \le 2\,\Gamma^{++}_q(g). $$

To estimate g_n − g^{++}_{n,n−τ} we use an intermediate approximation of g_n:

$$ \bar g_{n,n-\tau} := \max_{n-\lceil \tau/2 \rceil + 1 \,\le\, k \,\le\, n} (X_k + \cdots + X_n)_+. \qquad (29) $$

Note that $\bar g_{n,n-\tau}$ is not necessarily F^+_{n−τ}-measurable. Write

$$ \| g_n - g^{++}_{n,n-\tau} \|_q \le \| g_n - \bar g_{n,n-\tau} \|_q + \| \bar g_{n,n-\tau} - g^{++}_{n,n-\tau} \|_q \qquad (30) $$

and set

$$ \bar\gamma_q(\tau) := \sup_{n\ge\tau} \| g_n - \bar g_{n,n-\tau} \|_q, \qquad \bar\Gamma_q(g) := \sum_{\tau=0}^{+\infty} \bar\gamma_q(\tau), $$
$$ \bar\gamma^{++}_q(\tau) := \sup_{n\ge\tau} \| \bar g_{n,n-\tau} - g^{++}_{n,n-\tau} \|_q, \qquad \bar\Gamma^{++}_q(g) := \sum_{\tau=0}^{+\infty} \bar\gamma^{++}_q(\tau). \qquad (31) $$

Taking sup_{n≥τ} in Eq. (30) and summing over τ we get

$$ \Gamma^{++}_q(g) \le \bar\Gamma_q(g) + \bar\Gamma^{++}_q(g). $$

To estimate ‖g_n − ḡ_{n,n−τ}‖_q we use the following inequality: let K be a finite set, and K = K_1 ∪ K_2, with K_1 ∩ K_2 = ∅. Then for any A_k ∈ R_+, k ∈ K,

$$ \max_{k\in K} A_k \le \max_{k\in K_1} A_k + \max_{k\in K_2} A_k. \qquad (32) $$

With K = {1, ..., n}, K_1 = {1, ..., n − ⌈τ/2⌉}, K_2 = {n − ⌈τ/2⌉ + 1, ..., n}:

$$ g_n - \bar g_{n,n-\tau} \le \max_{1\le k\le n-\lceil \tau/2 \rceil} (X_k + \cdots + X_n)_+ = g_{n,n-\lceil \tau/2 \rceil}(X). \qquad (33) $$

Now for any real q ≥ 1 let p := ⌈q⌉. Using Corollary 1 we finally get

$$ \bar\gamma_q(\tau) \le \sup_{n\ge\tau} \| g_n - \bar g_{n,n-\tau} \|_p \le \| g_{n,n-\lceil \tau/2 \rceil}(X) \|_p \le K_p\, \lambda^{(\lceil \tau/2 \rceil + 1)/p}. \qquad (34) $$

To estimate ‖ḡ_{n,n−τ} − g^{++}_{n,n−τ}‖_q we use the following simple inequality: let (a_n), (b_n), n ≥ 1, be sequences of real numbers, and for 1 ≤ m ≤ n set

$$ \hat a := \max_{m\le k\le n} (a_k + \cdots + a_n)_+ \quad\text{and}\quad \hat b := \max_{m\le k\le n} (b_k + \cdots + b_n)_+. $$

Then $|\hat a - \hat b| \le \sum_{k=m}^{n} |a_k - b_k|$. Applying this we get

$$ \| \bar g_{n,n-\tau} - g^{++}_{n,n-\tau} \|_q \le \sum_{k=n-\lceil \tau/2 \rceil + 1}^{n} \| X_k - X^+_{k,n-\tau} \|_q \le \sum_{j=\lfloor \tau/2 \rfloor + 1}^{\tau} \gamma_q(j, X). $$

It follows that, using condition (10),

$$ \bar\Gamma^{++}_q(g) = \sum_{\tau=0}^{+\infty} \bar\gamma^{++}_q(\tau) \le \sum_{\tau=0}^{+\infty} \sum_{j=\lfloor \tau/2 \rfloor + 1}^{\tau} \gamma_q(j, X) \le \sum_{\tau=0}^{+\infty} \tau\,\gamma_q(\tau, X) < +\infty. \qquad (35) $$

Combining (35), (34), (31) and (28), we conclude that Γ_q(g) < +∞, as stated. To conclude the proof of Theorem 2, we note once more that (15) follows from Lemma 1, recalling that g_n = g_{n,n}. □

4. False alarm rate

As a corollary to Theorems 1 and 2 we can get an upper bound for the a.s. false alarm rate defined as

$$ \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} I_{\{g_n > \delta\}}, \qquad (36) $$

with the tacit assumption that τ∗ = +∞. This is in fact the most important implication of the results of the previous sections. We prove the result for L-mixing inputs; the i.i.d. case will be briefly covered below.

Theorem 3. Let (X_n) and β∗ be as in Theorem 2, and let (g_n) be defined as in (1). Then for any δ > 0, and any 0 < β″ < β′ < β∗ we have

$$ \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} I_{\{g_n > \delta\}} \le K_{\beta'',\beta'}\,\exp(-\beta'' \delta), \qquad (37) $$

where K_{β″,β′} is defined in Theorem 2.

Proof. By Theorem 2 we have

$$ P(g_n > \delta) = P\big( \exp(\beta'' g_n) > \exp(\beta'' \delta) \big) \le K_{\beta'',\beta'} / \exp(\beta'' \delta) \qquad (38) $$

for all n ≥ 1. Let δ′ < δ and let f be a smooth Lipschitz-continuous function such that I_{g>δ} ≤ f(g) ≤ I_{g>δ′}. Transformations of L-mixing processes via real Lipschitz-continuous bounded functions are L-mixing, and by Theorem 2 (g_n) is L-mixing, thus (f(g_n)) is also L-mixing. Using the strong law of large numbers for L-mixing processes, we get, after centering,

$$ \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} I_{\{g_n > \delta\}} \le \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} f(g_n) = \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} E\,f(g_n) \le \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} E\,I_{\{g_n > \delta'\}}. \qquad (39) $$

Taking into account (38), and that δ′ is arbitrary, we get the claim. □
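The flavor of Theorem 3 (and of Theorem 4 below) is easy to check by simulation. The sketch below is ours; the i.i.d. Gaussian input and all numerical values are illustrative choices, so strictly speaking it is the setting of Theorem 4 rather than a genuinely L-mixing one. It estimates the empirical rate (36) over a grid of thresholds and prints it next to an exp(−c″δ)-type envelope.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x = rng.normal(-0.5, 1.0, size=N)    # i.i.d., E(X_n) = -0.5 < 0: no change, tau* = +inf

g = np.empty(N)                      # Page-Hinkley recursion (3) with g_0 = 0
acc = 0.0
for n in range(N):
    acc = max(0.0, acc + x[n])
    g[n] = acc

c_pp = 0.8                           # some c'' < c* (here c* = 2m/sigma^2 = 1)
for delta in (2.0, 4.0, 6.0, 8.0):
    rate = (g > delta).mean()        # empirical counterpart of (36)
    print(f"delta={delta}: rate={rate:.2e}, "
          f"envelope exp(-c''*delta)={np.exp(-c_pp * delta):.2e}")
```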

5. The i.i.d. case revisited

First we outline the proof of Theorem 1. Standard results of the theory of risk processes imply that for any c′ such that 0 < c′ < c∗, with c∗ as in Eq. (7), we have E(exp c′g_n) < ∞, see [24,15]. The argument below partially follows the line of proof of this known result. Trivially, for any n ∈ N, F_n and F_n^+ are independent, and (g_n) is F_n-adapted. Following the proof of Theorem 2 we define the auxiliary process

$$ g_{n,n-\tau}(X) := \max_{1\le k\le n-\tau} (X_k + \cdots + X_n)_+. \qquad (40) $$

The exponential moments of g_{n,n−τ}(X) can be bounded as in Lemma 1:

Lemma 3. Let (X_n) and c′ > c″ and µ be as in Theorem 1. Then

$$ E\,\exp\big(c'' g_{n,n-\tau}(X)\big) \le 1 + \frac{c''}{c' - c''}\,\frac{\mu^{\tau+1}}{1-\mu}. \qquad (41) $$


For the proof we use the following inequality (see Lemma 2):

$$ P\big(g_{n,n-\tau}(X) > x\big) \le \frac{\mu^{\tau+1}}{1-\mu}\,\exp(-c' x) \qquad (42) $$

for any x ≥ 0. In the proof the required exponential inequality reduces to

$$ E\,\exp\Big( c' \sum_{k=i}^{n} X_k \Big) = \prod_{k=i}^{n} E\big( \exp c' X_k \big) \le \mu^{n-i+1}. $$

The proof of Lemma 3 is obtained by mimicking the proof of Lemma 1. To show that (g_n) is L-mixing we use the F^+_{n−τ}-adapted approximation

$$ g^{++}_{n,n-\tau} = \bar g_{n,n-\tau} := \max_{n-\tau+1\le k\le n} (X_k + \cdots + X_n)_+, \qquad (43) $$

in analogy with (26) and (29). We get as in (33)

$$ g_n - \bar g_{n,n-\tau} \le \max_{1\le k\le n-\tau} (X_k + \cdots + X_n)_+ = g_{n,n-\tau}, \qquad (44) $$

and the proof is completed as in the L-mixing case. For the false alarm rate we get, as in the case of Theorem 3:

Theorem 4. Let (X_n) and c∗ be as in Theorem 1, and let (g_n) be defined as in (1). Then for any δ > 0, and any 0 < c″ < c′ < c∗ we have

$$ \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} I_{\{g_n > \delta\}} \le C_{c'',c'}\,\exp(-c'' \delta), \qquad (45) $$

where C_{c″,c′} is defined in Theorem 1.

As an example, let X_n = −log f(Y_n, θ_0) + log f(Y_n, θ_1). Then we have E(exp X_n) = 1, i.e. c∗ = 1. Thus the a.s. false alarm rate is less than C_{c″,c′} exp(−c″δ) for any c″ < 1, essentially reproducing the bound K exp(−δ) given in [20], Section 5.3.
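The claim that c∗ = 1 for likelihood-ratio residuals is a one-line computation (added here for completeness; it is the standard argument): with Y_n ~ f(·, θ_0) under the no-change hypothesis,

$$ E\big( e^{X_n} \big) = \int \frac{f(y, \theta_1)}{f(y, \theta_0)}\, f(y, \theta_0)\,dy = \int f(y, \theta_1)\,dy = 1, $$

so µ(1) = 1 and, by the definition (7), c∗ = 1.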

6. Discussion

The problem formulation of this paper was motivated by the problem of change detection of Hidden Markov Processes. It should be admitted, though, that the results of the paper are not directly applicable. Namely, the Page–Hinkley scores X_n defined under (9) as

$$ X_n = -\log p(Y_n \mid Y_{n-1}, \ldots, Y_0, \theta_0) + \log p(Y_n \mid Y_{n-1}, \ldots, Y_0, \theta_1) $$

may not be bounded, i.e. the condition M_∞(X) < +∞ may not be satisfied. An even more serious restriction is the condition Γ_∞(X) < +∞. These two conditions were crucial for the validity of an exponential inequality for partial sums of L-mixing processes, used in the proof of Lemma 1. A careful study of the proof of Lemma 1 shows that what we really need is the validity of

$$ E\,\exp\Big( \beta' \sum_{k=1}^{n} \big( X_k - E(X_k) \big) \Big) \le C \exp\big( c (\beta')^2 n \big), \qquad (46) $$

with some β′ > 0 and C, c > 0. A necessary condition for this is that

$$ J(\beta') := \limsup_n \frac{1}{n} \log E\,\exp\Big( \beta' \sum_{k=1}^{n} X_k \Big) < +\infty. \qquad (47) $$

J(β′) in Eq. (47) is a well-known quantity in risk-sensitive control. It is known that J(β′) < +∞ for some β′ > 0 if |X_k| ≤ X^*_k, where X^*_k = Z_k^T Z_k and Z_k is the output of a finite-dimensional, time-invariant, stable linear Gaussian system; see Appendix F in [25].

An important and relevant example is when the data Y_n are generated by a finite-dimensional, time-invariant, stable, linear Gaussian system. Then, writing it in innovation form, via Kalman filtering, the conditional density log p(Y_n | Y_{n−1}, ..., Y_0, θ), and hence the Page–Hinkley score X_n, does indeed satisfy the above majorization condition. We conclude that (47) is satisfied with some β′ > 0. A closer look at the functional J(β′), which can be computed explicitly, reveals that actually the stronger condition (46) also holds. We conclude that, adapting the arguments of Theorem 2, an almost sure upper bound for the false alarm rate can be given when trying to detect changes in the dynamics of a finite-dimensional, time-invariant, stable, linear Gaussian system.

It may be of interest to establish the stability properties of the Page–Hinkley detector for deterministic inputs, mimicking the i.i.d. case. We conclude this section by formulating the following problem: assume that (X_n) is a bounded deterministic sequence satisfying

$$ \limsup_{N\to+\infty} \frac{1}{N} \sum_{n=1}^{N} X_n < 0. \qquad (48) $$

Let (g_n) be the response of the Page–Hinkley detector driven by (X_n). Does it follow that

$$ \sup_{n\ge 0}\ g_n < +\infty\,? \qquad (49) $$

Appendix. L-mixing processes

We summarize a few definitions given in [18]. Let (Ω, F, P) be a probability space, and let (X_n) be a stochastic process on (Ω, F, P).

Definition 1. We say that (X_n) is M-bounded if for all 1 ≤ q < +∞

$$ M_q(X) := \sup_{n\ge 1} \| X_n \|_q < +\infty. $$

We can also define M_q(X) for q = +∞ as

$$ M_\infty(X) := \sup_{n\ge 1}\ \operatorname{ess\,sup} |X_n|. $$

Let (F_n)_{n≥1} be an increasing family of σ-fields and let (F_n^+)_{n≥1} be a decreasing family of σ-fields, F_n ⊆ F and F_n^+ ⊆ F for any n. Assume that F_n and F_n^+ are independent for all n. Let τ ≥ 0 be an integer, and let for 1 ≤ q < +∞

$$ \gamma_q(\tau, X) = \gamma_q(\tau) := \sup_{n\ge\tau} \| X_n - E(X_n \mid \mathcal{F}^+_{n-\tau}) \|_q, \qquad \Gamma_q(X) := \sum_{\tau=0}^{+\infty} \gamma_q(\tau). $$

We can also define

$$ \gamma_\infty(\tau, X) := \sup_{n\ge\tau}\ \operatorname{ess\,sup} \big| X_n - E(X_n \mid \mathcal{F}^+_{n-\tau}) \big|, \qquad \Gamma_\infty(X) := \sum_{\tau=0}^{+\infty} \gamma_\infty(\tau, X). $$

Definition 2. A process (X_n) is L-mixing w.r.t. (F_n, F_n^+) if X_n is F_n-measurable for all n ≥ 1, (X_n) is M-bounded, and Γ_q(X) < +∞ for all 1 ≤ q < +∞.

A prime example of an L-mixing process is the output process of a stable linear stochastic system driven by an M-bounded i.i.d. sequence. To estimate γ_q(τ, X) the following lemma is useful:

Lemma 4. Let F′ ⊂ F be two σ-algebras. Let ξ be an F-measurable r.v. Then for any 1 ≤ q < +∞ and any F′-measurable r.v. η we have

$$ \| \xi - E(\xi \mid \mathcal{F}') \|_q \le 2\, \| \xi - \eta \|_q. \qquad (A.1) $$

Centered L-mixing processes satisfy the strong law of large numbers, see Corollary 1.3 in [18].

References

[1] M. Basseville, I. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice-Hall, 1993.
[2] B. Brodsky, B. Darkhovsky, Nonparametric Methods in Change-Point Problems, Kluwer Academic Publishers, 1993.
[3] E. Page, Continuous inspection schemes, Biometrika 41 (1/2) (1954) 100–115.
[4] D. Hinkley, Inference about the change-point from cumulative sum tests, Biometrika 58 (3) (1971) 509–523.
[5] G. Lorden, Procedures for reacting to a change in distribution, The Annals of Mathematical Statistics 42 (6) (1971) 1897–1908.
[6] J. Baikovicius, L. Gerencsér, Change-point detection as model selection, Informatica 3 (1) (1992) 3–20.
[7] L. Gerencsér, G. Molnár-Sáska, Change detection of hidden Markov models, in: Proceedings of the 43rd IEEE Conference on Decision and Control, 2004, pp. 1754–1758.
[8] B. Chen, P. Willett, Quickest detection of hidden Markov models, in: Proc. 36th IEEE CDC, San Diego, CA, 1997, pp. 3984–3989.
[9] R. Jana, S. Dey, Change detection in teletraffic models, IEEE Transactions on Signal Processing 48 (3) (2000) 846–853.
[10] T.L. Lai, Information bounds and quick detection of parameter changes in stochastic systems, IEEE Transactions on Information Theory 44 (1998) 2917–2929.
[11] C. Fuh, SPRT and CUSUM in hidden Markov models, Annals of Statistics 31 (3) (2003) 942–977.


[12] C. Fuh, Asymptotic operating characteristics of an optimal change point detection in hidden Markov models, Annals of Statistics 32 (5) (2004) 2305–2339.
[13] L. Takács, Introduction to the Theory of Queues, Oxford University Press, New York, 1962.
[14] S. Asmussen, Applied Probability and Queues, Springer-Verlag, New York, 2003.
[15] H. Panjer, G. Willmot, Insurance Risk Models, Society of Actuaries, 1992.
[16] S. Meyn, R. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.
[17] P. Diaconis, D. Freedman, Iterated random functions, SIAM Review 41 (1) (1999) 45–76.
[18] L. Gerencsér, On a class of mixing processes, Stochastics 26 (1989) 165–191.
[19] H. Poor, O. Hadjiliadis, Quickest Detection, Cambridge University Press, 2009.
[20] M. Pollak, A.G. Tartakovsky, Asymptotic exponentiality of the distribution of first exit times for a class of Markov processes with applications to quickest change detection, Theory of Probability and its Applications 53 (3) (2009) 430–442.
[21] L. Gerencsér, G. Michaletzky, G. Molnár-Sáska, An improved bound for the exponential stability of predictive filters of hidden Markov models, Communications in Information and Systems 7 (2) (2007) 133–152 (special volume on Stochastic Control and Filtering, in honor of Tyrone Duncan on the occasion of his 65th birthday; guest eds.: A. Bensoussan, S. Mitter and B. Pasik-Duncan).
[22] L. Gerencsér, G. Molnár-Sáska, Identification of hidden Markov models — uniform LLN-s, in: Modeling, Estimation and Control: Festschrift in Honor of Giorgio Picci on the Occasion of his Sixty-Fifth Birthday, Lecture Notes in Control and Information Sciences, vol. 364, Springer, 2007, pp. 135–149.
[23] L. Gerencsér, Almost sure exponential stability of random linear differential equations, Stochastics and Stochastics Reports 36 (1991) 91–107.
[24] E. Sparre Andersen, On the collective theory of risk in the case of contagion between the claims, in: Transactions of the XVth International Congress of Actuaries, 1957, pp. 219–229.
[25] A. Stoorvogel, J. van Schuppen, System identification with information theoretic criteria, in: S. Bittanti, G. Picci (Eds.), Identification, Adaptation, Learning, Springer-Verlag, Berlin, 1996, pp. 289–338.