Stochastic Processes and their Applications 77 (1998) 69–81
Finite dimensional filters for nonlinear stochastic difference equations with multiplicative noises

Marco Ferrante a,∗,1, Paolo Vidoni b,2

a Dipartimento di Matematica Pura ed Appl., Università degli Studi di Padova, via Belzoni 7, I-35131 Padova, Italy
b Dipartimento di Scienze Statistiche, Università degli Studi di Udine, via Treppo 18, I-33100 Udine, Italy

Received 5 January 1998; accepted 24 April 1998
Abstract

We consider the filtering problem for partially observable stochastic processes {Xn, Yn}n∈N, solutions to systems of stochastic difference equations. In the first part of the paper we present a simple constructive method to obtain finite dimensional filters in discrete time. Then, applying some well-known results, mainly on the product of independent positive random variables, we present new finite dimensional filters and interpret some known results in a more general setting. © 1998 Elsevier Science B.V. All rights reserved.

AMS classification: 93E11; 60G35

Keywords: Stochastic filtering; Finite dimensional filters
1. Introduction

The filtering problem can be described as follows: given an unobservable stochastic process {Xt}t∈T, with T = N or T = [0, +∞), and a second stochastic process {Yt}t∈T, more or less related to {Xt}t∈T, it is desired to find the "best estimate" of the state Xt given the observations Y^t = {Ys, 0 ≤ s ≤ t}. The phrase "best estimate" is usually to be understood in the mathematical sense of evaluating the conditional expectation E[φ(Xt)|Y^t], for any suitable real function φ(·). In physical situations, the process {Xt}t∈T may be the state of a given dynamical system or, more generally, a signal available for measurement, and the process {Yt}t∈T may represent the actual physical measurements, which contain indirect information on the process {Xt}t∈T. In the case of applications in economics, with particular reference to stochastic volatility modelling (see e.g. Barndorff-Nielsen, 1997; Shephard, 1994,
∗ Corresponding author. Fax: 49 875 8596; e-mail: [email protected].
1 Partially supported by EU Grant No. ERBCHRXCT 940449.
2 Partially supported by Italian National Research Council Grant No. 96.01542.CT10.
0304-4149/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved
PII: S0304-4149(98)00037-4
1996), where usually T = N, the observable process {Yt}t∈T may describe a financial time series, while the random variables (r.v.'s in the sequel) Xt, t ∈ T, called stochastic volatilities in the mathematical finance terminology, are related to the variance, which is therefore stochastic, of the r.v.'s Yt, t ∈ T. In the statistical literature, these models are called state-space models.

The filtering problem is clearly completely solved once we are able to compute the conditional distribution of Xt given Y^t, and we term an exact filter system the algorithm determining this conditional distribution, for each t ∈ T. In general, the computation is complicated since, whenever T = N, the number of quantities involved increases with n, while, whenever T = [0, +∞), we have to solve an infinite-dimensional stochastic partial differential equation, namely the Duncan–Mortensen–Zakai equation. Thus, an exact filter system is usually not very useful in practice. However, if we are able to prove that there exists a finite dimensional process {θt}t∈T such that E[φ(Xt)|Y^t] = ψ(θt), the problem becomes much more tractable and of great interest for the possible applications. In this situation, we call {θt}t∈T a finite dimensional filter; the precise definition of a finite dimensional filter in discrete time is given at the end of this section. Although the existence of such a filter is a "rare" event, the few known examples of filters have been widely used in many applications and, for this reason, are usually of great interest.

The mathematical problem described above has the same formulation both with T = [0, +∞) and with T = N, but the techniques used to handle the first case, the so-called continuous-time filtering, and the second case, the so-called discrete-time filtering, are quite different. In the continuous-time case, the process {Xt, Yt}t∈T is usually defined as the solution to an Itô stochastic differential equation:

dXt = f(Xt) dt + g(Xt) dWt,
dYt = h(Xt) dt + σ(Xt) dVt,   (1.1)

where {Wt}t∈T and {Vt}t∈T are independent Wiener processes and f, g, h, σ suitable real functions. Besides the famous Kalman–Bucy filter (see e.g. Jazwinski, 1970), which holds for the linear model with g and σ constant, very few examples of finite dimensional filters are known (see e.g. Beneš, 1981). However, in the early 1980s many results were proved in order to characterize the existence of finite dimensional filters in terms of the finite dimensionality of a suitable Lie algebra generated by the coefficients of the stochastic differential equation (1.1) (for a rather comprehensive account of this topic see e.g. Hazewinkel and Willems, 1981). Usually, this approach does not have a discrete-time counterpart, except for very simple situations, where a certain similarity with the continuous-time case can be observed (see e.g. Levine and Pignie, 1986; Ferrante and Giummolè, 1995). For this reason, the problem of defining finite dimensional filters in discrete time is, in some sense, a more complicated problem, even if the required mathematical tools are usually elementary (for some recent examples of finite dimensional filters in discrete time see Shephard, 1994; Vidoni, 1998).

We shall consider the discrete-time filtering problem and our aim is twofold: first, by recalling a simple constructive approach used by Vidoni (1998), we shall provide a useful tool for searching new classes of finite dimensional filters. Then, we shall use this
method and some known results on the product of independent positive r.v.'s, to obtain some new finite dimensional filters for partially observable stochastic processes which are solutions to suitable systems of stochastic difference equations.

More precisely, given a probability space (Ω, F, P), we shall consider the filtering problem for partially observable stochastic processes (p.o.s.p.'s in the sequel) {Xn, Yn}n∈N which are unique solutions to systems of stochastic difference equations of the following type:

X0 given random variable,
X_{n+1}(ω) = f_{n+1}(Xn(ω), Y^n(ω), ω)   for n ≥ 0,   (1.2)
Yn(ω) = gn(Xn(ω), ω)   for n ≥ 0.
Throughout the paper we shall always assume the hypotheses:
(A1) for any n ≥ 0,
f_{n+1} : R^d × R^{(n+1)p} × Ω → R^d,   gn : R^d × Ω → R^p
are B(R^d) ⊗ B(R^{(n+1)p}) ⊗ F- and B(R^d) ⊗ F-measurable maps, respectively, where B(·) denotes the Borel field;
(A2) for any xn ∈ R^d and yi ∈ R^p, 0 ≤ i ≤ n, the random vectors {f_{n+1}(xn, y^n, ·)}n∈N, {gn(xn, ·)}n∈N and X0 are independent and absolutely continuous with respect to the Lebesgue or to a suitable counting dominating measure on R^d, R^p and R^d, respectively.

Here, Yn is conditionally independent, given Xn, of all the preceding states and observations; this assumption may be trivially relaxed by considering gn(·) as depending as well on y^{n−1}.

It is not difficult to see that the classical models studied in the literature can be expressed within this framework in a very simple way. For example, the linear system with additive Gaussian white noise, which admits as a f.d.f.s. the well-known Kalman–Bucy filter (see Jazwinski, 1970, Theorem 7.2, p. 201), can be obtained putting, for any n ≥ 1,

fn(x_{n−1}, y^{n−1}, ω) = Φn x_{n−1} + Γn Wn(ω),   gn(xn, ω) = Mn xn + Vn(ω)

with X0, {Wn}n∈N+ and {Vn}n∈N+ independent Gaussian random vectors, and Φn, Γn, Mn constant nonsingular matrices. The advantage of the present notation will be clear in Section 4.

To conclude this section, let us recall the precise definition of a finite dimensional filter system (f.d.f.s. in the sequel) for the discrete-time case (see e.g. Ferrante and Runggaldier, 1990):

Definition 1.1. Let {P(·; z) : z ∈ Z} be a parameterized set of probability measures on R^d with parameter set Z ⊆ R^k. A system of measurable functions cn : Z × R^p → Z is a f.d.f.s. for {Xn, Yn}n∈N if, for each n ∈ N and y^n = (y0, …, yn) ∈ R^{p(n+1)},

P_{n−1}(·|Y^{n−1} = y^{n−1}) = P(·; z) ⇒ Pn(·|Y^n = y^n) = P(·; cn(z, yn)),

where Pn(·|Y^n = y^n) denotes the conditional distribution of Xn given Y^n = y^n.
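To make Definition 1.1 concrete, the sketch below implements the map cn for a scalar analogue of the linear-Gaussian example above (the scalar parameters phi, gam and m_obs are illustrative choices, not taken from the paper): z = (mean, variance) parametrizes the Gaussian filtering distribution, and the Kalman–Bucy recursions realize the transition z → cn(z, yn).

```python
def kalman_step(z, y, phi=1.0, gam=1.0, m_obs=1.0):
    """One application of the map c_n of Definition 1.1 for the scalar
    linear-Gaussian model X_n = phi*X_{n-1} + gam*W_n, Y_n = m_obs*X_n + V_n,
    with W_n, V_n independent standard Gaussian r.v.'s."""
    mu, s2 = z
    # prediction step: affine transform of a Gaussian stays Gaussian
    mu_p, s2_p = phi * mu, phi**2 * s2 + gam**2
    # updating step: Bayes' rule with a Gaussian likelihood (Kalman gain k)
    k = s2_p * m_obs / (m_obs**2 * s2_p + 1.0)
    return (mu_p + k * (y - m_obs * mu_p), (1.0 - k * m_obs) * s2_p)

z = (0.0, 1.0)           # prior mean and variance of X_0
for y in [2.0, 1.0]:     # two illustrative observations
    z = kalman_step(z, y)
```

The finite dimensional parameter z is all that has to be stored: the full conditional distribution never needs to be represented explicitly.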
2. A constructive approach

The problem concerning the existence of a f.d.f.s. in discrete time is usually faced with the general aim of finding conditions assuring the existence of a f.d.f.s. for a given class of p.o.s.p.'s. In this respect there are some general results on necessary conditions (see e.g. Ferrante and Runggaldier, 1990), which state that, if a f.d.f.s. in discrete time exists, then the probability distributions of Yn given Xn, of Xn given Y^n and of X_{n+1} given Y^n are all of the exponential class. On the other hand, there are very few results on sufficient conditions (see e.g. Ferrante and Giummolè, 1995), since the existence of a f.d.f.s. for a given problem is, in general, an unusual situation. For this reason, in order to deduce new classes of f.d.f.s.'s, we adopt a constructive approach which, though not very general, enables, by means of an ad hoc procedure, the determination of a number of new results within the theory of stochastic filtering. This strategy is explicitly adopted by Vidoni (1998) and it is based on the following remarks.

Under assumptions (A1) and (A2), a p.o.s.p. {Xn, Yn}n∈N is characterized by the triple {p(x0), p(x_{n+1}|xn, y^n), p(yn|xn)}n∈N, where the enclosed functions are the densities associated to the probability distributions of X0, of X_{n+1} given Xn = xn and Y^n = y^n, and of Yn given Xn = xn, respectively. Since we are interested in the filtering problem, it may be useful to recall the relation which enables the computation, at each time n, of the filtering density, that is, the conditional density of Xn given Y^n = y^n. This relation consists of two steps. By Bayes' rule we have the updating step:

p(xn|y^n) = p(yn|xn) p(xn|y^{n−1}) / ∫_{R^d} p(yn|xn) p(xn|y^{n−1}) dxn   for n ≥ 0.   (2.1)

We call p(xn|y^{n−1}), namely the conditional density of Xn given Y^{n−1} = y^{n−1}, the prediction density at time n − 1 and we have

p(xn|y^{n−1}) = ∫_{R^d} p(xn|x_{n−1}, y^{n−1}) p(x_{n−1}|y^{n−1}) dx_{n−1}   for n ≥ 1,   (2.2)

(with the convention p(x0|y^{−1}) ≡ p(x0)), which defines the prediction step.

A f.d.f.s. exists whenever the filtering densities belong, at each time n, to a given family of probability density functions, parametrized by a finite dimensional parameter. The constructive strategy adopted in this paper consists in defining a p.o.s.p. in such a way that both the filtering and the prediction densities belong to the same finitely parametrized set of probability density functions D. That is, we require that the updating step (2.1) and the prediction step (2.2) correspond to functional operators T1 and T2 such that Ti(D) ⊆ D, for i = 1, 2. In this respect, we consider some well-known inferential techniques in Bayesian statistics, related to conjugate families of distributions, and some useful results concerning the algebra of r.v.'s. More precisely, our strategy is based on the following procedure:
1. Assume that the observation density p(yn|xn), n ≥ 0, belongs to an exponential family of distributions;
2. Choose the initial density p(x0) conjugate to the observation distribution at n = 0;
3. Define a suitable sequence of measurable functions f_{n+1}(xn, y^n, ω), n ∈ N, such that the prediction step maintains conjugacy.

With this constructive procedure, it is possible to define new examples of p.o.s.p.'s admitting f.d.f.s.'s, and some known results may be easily interpreted within this new perspective. As a simple example, the Kalman–Bucy filter can be viewed in this framework, since the Gaussian is the conjugate distribution associated to the mean value of a Gaussian observation distribution, and it is well known that the normal law is closed under sums and affine transformations. In this way, the updating and the prediction steps define a f.d.f.s. which updates the mean value and the variance matrix of the Gaussian filtering distributions, at each time n. Furthermore, in the following examples, the state-space models given in Shephard (1994), Smith and Miller (1986) and Vidoni (1998) are reinterpreted as p.o.s.p.'s admitting f.d.f.s.'s.

Example 2.1. Smith and Miller (1986) propose a class of non-Gaussian state-space models obtained as generalizations of a simple model based on the system of stochastic difference equations (1.2). They assume that X0 ∼ Ga(α0, β0), that is, X0 is distributed as a gamma r.v.,

fn(x_{n−1}, y^{n−1}, ω) = φn x_{n−1} Wn(ω),   gn(xn, ω) = (1/xn) Vn(ω)

for any n ≥ 1, {Wn}n∈N+ is a sequence of independent Be(γn α_{n−1}, (1 − γn) α_{n−1}) r.v.'s, where the notation Be(·, ·) indicates a beta distributed r.v. with given parameters, and {Vn}n∈N+ is a sequence of independent Ga(1, 1) r.v.'s (let us recall that a Ga(α, β) distribution is characterized by the probability density function

p(x; α, β) = (β^α / Γ(α)) exp{−βx} x^{α−1} 1_{(0,+∞)}(x),

where α, β are strictly positive real constants, while a Be(γ, δ) distribution is characterized by the probability density function

p(x; γ, δ) = (Γ(γ + δ) / (Γ(γ) Γ(δ))) x^{γ−1} (1 − x)^{δ−1} 1_{(0,1)}(x)

with γ, δ strictly positive real constants). Moreover, φn and γn are non-random quantities and α_{n−1} is the shape parameter of the gamma filtering distribution at time n − 1. This p.o.s.p. admits a f.d.f.s. since the filtering distribution, at each time n, is a Ga(αn, βn(y^n)), where αn = γn α_{n−1} + 1 and βn(y^n) = β_{n−1}(y^{n−1})(φn)^{−1} + yn, n ≥ 1.

Example 2.2. The basic model considered by Shephard (1994), which is labeled Gaussian local scale model, can be viewed as the solution to the system of stochastic difference equations (1.2), where the conditional distribution of X0 given Y0 = y0 is a Ga(α0, β0) and

fn(x_{n−1}, y^{n−1}, ω) = φn x_{n−1} Wn(ω),   gn(xn, ω) = (1/√xn) Vn(ω)

for any n ≥ 1. Here {Wn}n∈N+ is a sequence of independent Be(γ α_{n−1}, (1 − γ) α_{n−1}) r.v.'s and {Vn}n∈N+ is a sequence of independent standard Gaussian r.v.'s. Indeed, φn = exp{−E[log(Wn)]} and γ, such that 0 < γ ≤ 1, are non-random quantities; α_{n−1} is the shape parameter of the gamma filtering distribution at time n − 1. This p.o.s.p. admits a f.d.f.s. since the filtering distribution, at each time n, is a Ga(αn, βn(y^n)), where αn = γ α_{n−1} + 1/2 and βn(y^n) = β_{n−1}(y^{n−1})(φn)^{−1} + (1/2) yn², n ≥ 1.

In a recent paper, Vidoni (1998) defines, as a simple application of a more general procedure, two p.o.s.p.'s admitting a f.d.f.s., which constitute an alternative to that proposed by Shephard. The only difference concerns the prediction steps. In the first case, the prediction step simply involves the convolution property of the gamma distribution and it is based on the measurable functions fn(x_{n−1}, y^{n−1}, ω) = x_{n−1} + (β_{n−1}(y^{n−1}))^{−1} Wn(ω), n ≥ 1. Here {Wn}n∈N+ is a sequence of independent Ga(δ, 1) r.v.'s, where δ > 0, and β_{n−1}(y^{n−1}) is the scale parameter of the gamma filtering distribution at time n − 1. This p.o.s.p. admits a f.d.f.s. since the filtering distribution, at each time n, is a Ga(αn, βn(y^n)), where αn = α_{n−1} + δ + 1/2 and βn(y^n) = β_{n−1}(y^{n−1}) + (1/2) yn². In the second case, fn(x_{n−1}, y^{n−1}, ω) = [α_{n−1} x_{n−1} + {β_{n−1}(y^{n−1})}^{−1} α_{n−1} Wn(ω)] (α_{n−1} + δ)^{−1}, n ≥ 1, where {Wn}n∈N+ is defined as before and (α_{n−1}, β_{n−1}(y^{n−1})) are the shape and the scale parameters of the gamma filtering distribution at time n − 1. With these assumptions, the filtering distribution, at each time n, is a Ga(αn, βn(y^n)), where αn = α_{n−1} + δ + 1/2 and βn(y^n) = β_{n−1}(y^{n−1})(α_{n−1} + δ)(α_{n−1})^{−1} + (1/2) yn², n ≥ 1.
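The gamma recursions of Example 2.1 translate directly into code. The sketch below holds γn and φn constant across time, an illustrative simplification (the model allows them to vary with n), and returns the shape–rate pairs (αn, βn(y^n)) of the gamma filtering distributions.

```python
def smith_miller_filter(ys, alpha, beta, gamma=0.5, phi=2.0):
    """Gamma filter of Example 2.1 with constant gamma_n = gamma and
    phi_n = phi (illustrative choice). alpha, beta are the parameters of
    the Ga prior; ys are the observations."""
    out = []
    for y in ys:
        # prediction: x_{n-1} * W_n contracts the shape (Prop. 3.1, part 1),
        # and multiplying by the constant phi divides the rate
        alpha, beta = gamma * alpha, beta / phi
        # updating: given X_n = x, Y_n = V_n / x is exponential with rate x
        alpha, beta = alpha + 1.0, beta + y
        out.append((alpha, beta))
    return out

params = smith_miller_filter([1.0, 2.0], alpha=2.0, beta=1.0)
```

Only the two numbers (αn, βn) must be propagated, which is precisely what makes the filter finite dimensional.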
3. Product of independent random variables

In the present section we shall recall some results concerning the product of independent positive random variables (i.p.r.v.'s in the sequel); our aim is to find families of r.v.'s closed under product (for more details see e.g. Springer, 1979). It is well known that the family of Gaussian r.v.'s is closed under the sum, while for other classes this holds just under additional assumptions (e.g. the sum of two independent gamma r.v.'s Ga(a1, b1) and Ga(a2, b2) is a gamma r.v. itself iff b1 = b2). For the product of i.p.r.v.'s the following result, due to Jambunathan (1954) and whose proof is straightforward, is available:

Proposition 3.1. Given X1 ∼ Ga(a1, b1), X2 ∼ Be(ca2, (1 − c)a2), X3 ∼ Be(a3, b3) and X4 ∼ Be(a4, b4), with a1, a2, a3, a4, b1, b3, b4 strictly positive real constants and 0 < c ≤ 1, we have:
1. if X1 and X2 are independent and a1 = a2, then Y = X1 X2 ∼ Ga(ca1, b1);
2. if X3 and X4 are independent and a4 = a3 + b3, then Z = X3 X4 ∼ Be(a3, b3 + b4).

It seems difficult to find further results similar to Proposition 3.1, even if we consider the most common families of probability distributions. Nevertheless, statement 2 in Proposition 3.1 will allow us to construct two new f.d.f.s.'s in the next section, while statement 1 is the core of the results proved by Shephard (1994) and Smith and Miller (1986). We shall now prove an easy multidimensional extension of statement 2 of Proposition 3.1. Let us define the componentwise product on R^n × R^n
given by

(x1, x2, …, xn) • (y1, y2, …, yn) := (x1 y1, x2 y2, …, xn yn).

Proposition 3.2. Let X ∼ Di_n(c1, …, c_{n+1}), where Di_n(·) indicates an n-dimensional Dirichlet random vector with given parameters, and Yi ∼ Be(ai, bi), i = 1, …, n, be mutually independent. If ci = ai + bi, i = 1, …, n, then Z = X • (Y1, …, Yn) ∼ Di_n(a1, …, an, Σ_{i=1}^n bi + c_{n+1}).

Proof. Let us start by recalling the form of the density of a Di_n(c1, …, c_{n+1}) random vector, which is

p(x1, …, xn) = [Γ(Σ_{i=1}^{n+1} ci) / Π_{i=1}^{n+1} Γ(ci)] (1 − Σ_{i=1}^n xi)^{c_{n+1}−1} Π_{i=1}^n xi^{ci−1}

with ci > 0, i = 1, …, n + 1, and (x1, …, xn) ∈ Δ = {(x1, …, xn) ∈ (0, 1)^n : Σ_{i=1}^n xi < 1}. It is immediate to check that Z is an absolutely continuous random vector with density

h(z1, …, zn) = ∫_S [Γ(Σ_{i=1}^{n+1} ci) / Π_{i=1}^{n+1} Γ(ci)] (1 − Σ_{i=1}^n xi)^{c_{n+1}−1} Π_{i=1}^n xi^{ci−1}
  × Π_{i=1}^n [Γ(ai + bi) / (Γ(ai) Γ(bi))] (zi/xi)^{ai−1} (1 − zi/xi)^{bi−1} xi^{−1} dx1 … dxn,

where S = {x ∈ Δ : 0 < zi < xi, i = 1, …, n}. Making the following change of variables:

ti = (xi − zi) / (1 − Σ_{j=1}^n zj),   i = 1, …, n,   (3.1)

from the assumption that ci = ai + bi, i = 1, …, n, we get

h(z1, …, zn) = [Γ(Σ_{i=1}^{n+1} ci) / Π_{i=1}^{n+1} Γ(ci)] Π_{i=1}^n [Γ(ai + bi) / (Γ(ai) Γ(bi))]
  × Π_{i=1}^n zi^{ai−1} (1 − Σ_{i=1}^n zi)^{Σ_{i=1}^n bi + c_{n+1} − 1} ∫ (1 − Σ_{i=1}^n ti)^{c_{n+1}−1} Π_{i=1}^n ti^{bi−1} dt1 … dtn.

Since the integral gives the normalizing constant of a Dirichlet distribution, we obtain that Z ∼ Di_n(a1, …, an, Σ_{i=1}^n bi + c_{n+1}) and the result is proved.

Remark 3.1. Note that Proposition 3.2 is not extendable to the case of the componentwise product of two Dirichlet random vectors, since the change of variables (3.1) is no longer applicable.
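Statement 2 of Proposition 3.1 can also be checked numerically: the density of X3 X4 is h(z) = ∫_z^1 f_{X3}(x) f_{X4}(z/x) x^{−1} dx, which should coincide with the Be(a3, b3 + b4) density whenever a4 = a3 + b3. The sketch below uses illustrative parameter values and a simple midpoint-rule quadrature.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Be(a, b) random variable."""
    if not 0.0 < x < 1.0:
        return 0.0
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(logc + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))

def product_pdf(z, a3, b3, a4, b4, n=20000):
    """Density of X3*X4 at z, for independent X3 ~ Be(a3, b3) and
    X4 ~ Be(a4, b4), via midpoint-rule quadrature of the product formula."""
    step = (1.0 - z) / n
    return step * sum(
        beta_pdf(x, a3, b3) * beta_pdf(z / x, a4, b4) / x
        for x in (z + (i + 0.5) * step for i in range(n))
    )

a3, b3, b4 = 2.0, 3.0, 1.5
a4 = a3 + b3                       # the compatibility condition of statement 2
z = 0.4
exact = beta_pdf(z, a3, b3 + b4)   # the claimed law Be(a3, b3 + b4)
assert abs(product_pdf(z, a3, b3, a4, b4) - exact) < 1e-3
```

Dropping the condition a4 = a3 + b3 makes the two densities visibly disagree, which is why the condition reappears, in parameter-matched form, in every filter of Section 4.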
4. Some new finite dimensional filters in discrete time

In this section we shall apply the results on the product of i.p.r.v.'s and, using the strategy explained in the second section, derive some new f.d.f.s.'s. In the first three models, the product of r.v.'s appears in the state equation (that is, the first equation in system (1.2)). In the other two models, the product appears in the observation equation (that is, the second equation in system (1.2)), while the state equation presents a suitable nonlinear transformation.

4.1. Beta-binomial, beta-negative binomial and Dirichlet-multinomial models

Let us consider the p.o.s.p. {Xn, Yn}n∈N, solution to system (1.2), with

f_{n+1}(xn, y^n, ω) = xn W_{n+1}(ω),   gn(xn, ω) = Vn(ω, xn)   (4.1)

for any n ≥ 0, where {Wn}n∈N+ is a sequence of independent beta r.v.'s and {Vn(·, xn)}n∈N is a sequence of independent binomial r.v.'s.

Proposition 4.1. Assume that X0 ∼ Be(α, β), {Wn}n∈N+ is a sequence of independent Be(α_{n−1}(y^{n−1}) + β_{n−1}(y^{n−1}), γ) r.v.'s and {Vn(·, xn)}n∈N is a sequence of independent binomial r.v.'s with parameters (m, xn), with

αn(y^n) = α_{n−1}(y^{n−1}) + yn   for n ≥ 1,   α0(y0) = α + y0;
βn(y^n) = β_{n−1}(y^{n−1}) + γ + m − yn   for n ≥ 1,   β0(y0) = β + m − y0.

Then the process {Xn, Yn}n∈N, solution to Eqs. (1.2)–(4.1), admits a f.d.f.s., the filtering distribution at each time n being equal to that of a Be(αn(y^n), βn(y^n)) r.v.

Remark 4.1. Note that αn(y^n) + βn(y^n) = α + β + nγ + (n + 1)m and therefore the probability distribution of W_{n+1}, n ≥ 0, does not depend on y^n.

A further example of a p.o.s.p. admitting a f.d.f.s. is given by a system of stochastic difference equations which is formally equivalent to Eq. (4.1); more precisely, the initial r.v. is defined as in the previous case, while {Wn}n∈N+ is a sequence of independent Be(α_{n−1} − γ, γ) r.v.'s, with 0 < γ < r, and {Vn(·, xn)}n∈N is a sequence of independent negative binomial r.v.'s with parameters (r, xn). Here, α_{n−1} is a suitable parameter of the filtering distribution at time n − 1, which will be defined in the following proposition.

Proposition 4.2. Under these assumptions, the solution process of Eqs. (1.2)–(4.1) admits a f.d.f.s., the filtering distribution at time n being a Be(αn, βn(y^n)), where αn = α_{n−1} − γ + r, βn(y^n) = β_{n−1}(y^{n−1}) + γ + yn, n ≥ 1, with α0 = α + r, β0 = β + y0.

The simple proofs of Propositions 4.1 and 4.2 are omitted, since they are quite similar to that of the following Proposition 4.3.
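The recursions of Proposition 4.1 can be sketched as follows (Python; a, b, g stand for the paper's α, β, γ, and the numerical inputs are illustrative); the final assertion checks the deterministic sum identity of Remark 4.1.

```python
def beta_binomial_filter(ys, a, b, g, m):
    """f.d.f.s. of Proposition 4.1: ys are binomial(m, x_n) counts,
    (a, b) the Be prior parameters, g the second beta parameter of W_n."""
    alpha, beta = a + ys[0], b + m - ys[0]        # updating at n = 0
    out = [(alpha, beta)]
    for y in ys[1:]:
        beta += g                                  # prediction (Prop. 3.1, part 2)
        alpha, beta = alpha + y, beta + m - y      # updating with Bin(m, x_n)
        out.append((alpha, beta))
    return out

p = beta_binomial_filter([3, 2], a=1.0, b=1.0, g=2.0, m=5)
# Remark 4.1: alpha_n + beta_n = a + b + n*g + (n+1)*m, free of the data
assert all(al + be == 1.0 + 1.0 + n * 2.0 + (n + 1) * 5
           for n, (al, be) in enumerate(p))
```

The beta-negative binomial filter of Proposition 4.2 differs only in the two increments (r and γ + yn in place of yn and γ + m − yn).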
A multidimensional result, which represents in some sense an extension (see Remark 4.2 below) of the beta-binomial model (1.2)–(4.1), is provided by the p.o.s.p. {Xn, Yn}n∈N, solution to system (1.2), with

f_{n+1}(xn, y^n, ω) = xn • W_{n+1}(ω),   gn(xn, ω) = Vn(ω, xn)   (4.2)

for any n ≥ 0. Here, {Wn}n∈N+ is a sequence of independent d-dimensional random vectors, whose components Wn^i, 1 ≤ i ≤ d, are independent beta r.v.'s, while {Vn(·, xn)}n∈N is a sequence of independent multinomial random vectors with parameters (m, xn). The following result holds:

Proposition 4.3. Let X0 ∼ Di_d(α^1, …, α^d, α^{d+1}), {Wn}n∈N+ be a sequence of independent d-dimensional random vectors Wn = (Wn^1, …, Wn^d), with Wn^i ∼ Be(α^i_{n−1}(1 − η), α^i_{n−1} η), 0 < η < 1 and α^i_{n−1} defined below, i = 1, …, d, mutually independent, and {Vn(·, xn)}n∈N be a sequence of independent multinomial random vectors with parameters (m, xn). Then, the process {Xn, Yn}n∈N, solution to Eqs. (1.2)–(4.2), admits a f.d.f.s., the filtering distribution at time n being that of a Di_d(α^1_n, …, α^d_n, α^{d+1}_n) random vector, where

α^i_n = α^i_{n−1}(1 − η) + yn^i,   i = 1, …, d, ∀n ≥ 1;   α^i_0 = α^i + y0^i,   i = 1, …, d;
α^{d+1}_n = η Σ_{i=1}^d α^i_{n−1} + α^{d+1}_{n−1} + m − Σ_{i=1}^d yn^i,   ∀n ≥ 1;   α^{d+1}_0 = α^{d+1} + m − Σ_{i=1}^d y0^i.

Proof. Since X0 ∼ Di_d(α^1, …, α^d, α^{d+1}) and the distribution of Y0 given X0 = x0 is equal to that of a multinomial random vector with parameters (m, x0), it is immediate to prove that the filtering density at step 0 is equal to the density of a Di_d(α^1_0, …, α^d_0, α^{d+1}_0) random vector. Let us assume that the filtering density at step n − 1 corresponds to the density of a Di_d(α^1_{n−1}, …, α^d_{n−1}, α^{d+1}_{n−1}) random vector. Since the components of the random vector Wn = (Wn^1, …, Wn^d) are mutually independent and Wn^i ∼ Be(α^i_{n−1}(1 − η), α^i_{n−1} η), 0 < η < 1, i = 1, …, d, the state equation and Proposition 3.2 assure that the prediction density at step n corresponds to a Di_d(α^1_{n−1}(1 − η), …, α^d_{n−1}(1 − η), η Σ_{i=1}^d α^i_{n−1} + α^{d+1}_{n−1}). As already done for the filtering density at step 0, the updating step (2.1) easily completes the proof.

Remark 4.2. Although the Dirichlet-multinomial model generalizes the beta-binomial model in a multivariate setting, Proposition 4.3 is not a proper extension of Proposition 4.1. It is immediate to see that, for d = 1, the present filter does not coincide with that of Proposition 4.1 and it has to be considered as a further f.d.f.s. for the beta-binomial model. Furthermore, when d > 1, it is not possible to specify a filter analogous to that of Proposition 4.1, the roles of the Dirichlet r.v. and of the vector of beta r.v.'s in Eq. (4.2) being no longer symmetric.
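The recursions of Proposition 4.3 can be sketched as follows (Python; a collects the Dirichlet prior parameters (α^1, …, α^{d+1}), while η, m and the count vectors are illustrative choices).

```python
def dirichlet_multinomial_filter(ys, a, eta, m):
    """f.d.f.s. of Proposition 4.3: a = (a^1, ..., a^{d+1}) are the Dirichlet
    prior parameters, eta in (0, 1) the discount of the beta noises, ys a
    list of multinomial(m, x_n) count vectors of length d."""
    d = len(a) - 1
    al = [a[i] + ys[0][i] for i in range(d)]   # updating at n = 0
    last = a[d] + m - sum(ys[0])
    out = [al + [last]]
    for y in ys[1:]:
        # prediction (Prop. 3.2): a^i -> a^i (1 - eta),
        # the last parameter absorbs eta * sum(a^i)
        last = eta * sum(al) + last
        al = [ai * (1.0 - eta) for ai in al]
        # updating with the multinomial counts
        al = [ai + yi for ai, yi in zip(al, y)]
        last += m - sum(y)
        out.append(al + [last])
    return out

pars = dirichlet_multinomial_filter([[2, 1], [0, 3]], a=[1.0, 1.0, 1.0],
                                    eta=0.5, m=4)
```

Note that the prediction step leaves the total Dirichlet mass unchanged, as Proposition 3.2 predicts; only the updating step adds m to it at each time.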
Remark 4.3. The beta-binomial and the beta-negative binomial models may be used when we are interested in the number of successes in repeated independent Bernoulli trials or in the number of failures prior to the rth success, respectively. Here, the probability of success in each individual trial is assumed to be stochastic and it varies with n according to the particular latent process {Xn}n∈N which is considered. The Dirichlet-multinomial model can be employed in the more general case when each individual trial may give d + 1 different outcomes.

4.2. Pareto-uniform and bilateral Pareto-uniform models

Let us start by recalling some results on the Pareto and the bilateral Pareto r.v.'s, which are characterized, respectively, by the following densities:

p(x; a, b) = a b^a x^{−a−1},   x > b > 0, a > 0,
p(x, y; a, b, c) = a(a + 1)(c − b)^a (y − x)^{−a−2},   x < b < c < y, a, c > 0.   (4.3)

In the sequel we shall abbreviate these distributions as Pa(a, b) and Pa2(a, b, c). The following propositions, whose proofs are straightforward, hold:

Proposition 4.4. Given X1 ∼ Pa(a1, b1) and X2 ∼ Pa(a2, b2), if X1 and X2 are independent and b1 = b2, then Y = min(X1, X2) ∼ Pa(a1 + a2, b1).

Proposition 4.5. A random vector X = (X^1, X^2) is distributed as a Pa2(a, b, c) if and only if c − X^1 ∼ Pa(a, c − b) and the conditional density of X^2 − X^1 given X^1 = x^1 corresponds to the density of a Pa(a + 1, c − x^1).

A p.o.s.p. {Xn, Yn}n∈N admitting a f.d.f.s. may be defined as the solution to system (1.2) with a multiplicative observation equation

gn(xn, ω) = xn Vn(ω),   n ≥ 0,   (4.4)

and a nonlinear state equation

f_{n+1}(xn, y^n, ω) = min(xn, βn(y^n) W_{n+1}(ω)),   n ≥ 0,   (4.5)

where βn(y^n) is a suitable real function of the observations y^n. We shall use here the fact that the Pareto is the conjugate distribution associated to the parameter xn, the upper bound of the support of the uniform observation distribution. We prove the following proposition:

Proposition 4.6. Assuming that X0 ∼ Pa(α, β), {Wn}n∈N+ is a sequence of independent Pa(δ, 1) r.v.'s and {Vn}n∈N is a sequence of independent uniform r.v.'s on (0, 1), then the process {Xn, Yn}n∈N, solution to Eqs. (1.2)–(4.4) and (4.5), admits a f.d.f.s., the filtering distribution at time n being that of a Pa(αn, βn(y^n)) r.v., where

αn = α_{n−1} + δ + 1,   ∀n ≥ 1,   α0 = α + 1;
βn(y^n) = max(β_{n−1}(y^{n−1}), yn) = max(β, y0, …, yn),   ∀n ≥ 1,   β0 = max(β, y0).
Proof. Let us start by proving the result for the filtering distribution at time 0. Since X0 ∼ Pa(α, β) and the observation density is equal to p(y0|x0) = (1/x0) 1_{(y0,+∞)}(x0), from Eq. (2.1) we easily obtain that the filtering distribution at time 0 is that of a Pa(α + 1, max(β, y0)) r.v. Let us assume that the filtering density at time n − 1 is that of a Pa(α_{n−1}, β_{n−1}(y^{n−1})) r.v. Since β_{n−1}(y^{n−1}) Wn ∼ Pa(δ, β_{n−1}(y^{n−1})), from Eq. (2.2) and Proposition 4.4 we get that the density of Xn given Y^{n−1} = y^{n−1} is that of a Pa(α_{n−1} + δ, β_{n−1}(y^{n−1})) r.v. Moreover, from Eq. (2.1) it is immediate to prove that the filtering density at step n is that of a Pa(αn, βn(y^n)) r.v., where αn = α_{n−1} + δ + 1 and βn(y^n) = max(β_{n−1}(y^{n−1}), yn).

A suitable extension of the Pareto-uniform model involves a bidimensional unobservable process which is based on the bilateral Pareto distribution and provides a stochastic modelling for both the lower and the upper bounds of the support of the uniform observation distribution. This p.o.s.p. is a generalization of Eqs. (4.4) and (4.5) and it is defined, for any n ≥ 0, by

f^1_{n+1}(xn, y^n, ω) = max[xn^1, γn(y^n) − (γn(y^n) − βn(y^n)) W_{n+1}(ω)],
f^2_{n+1}(xn, y^n, ω) = f^1_{n+1}(xn, y^n, ω) + [(γn(y^n) − f^1_{n+1}(xn, y^n, ω)) / (γn(y^n) − xn^1)]
  × min[xn^2 − xn^1, (γn(y^n) − xn^1) W_{n+1}(ω)],   (4.6)
gn(xn, ω) = xn^1 + (xn^2 − xn^1) Vn(ω),

where xn = (xn^1, xn^2) and βn(y^n) and γn(y^n) are suitable real functions of the observations y^n. Here, the state equation describes the one-step ahead stochastic evolution of Xn^1 and Xn^2 − Xn^1, where Xn = (Xn^1, Xn^2) corresponds to the lower and upper bounds of the support of the uniform observation distribution at time n.

Proposition 4.7. Let X0 ∼ Pa2(α, β, γ), {Wn}n∈N+ be a sequence of independent Pa(δ, 1) r.v.'s and {Vn}n∈N be a sequence of independent uniform r.v.'s on (0, 1). Then, the p.o.s.p. defined by Eqs. (1.2)–(4.6) admits a f.d.f.s., the filtering distribution at each time n being equal to that of a Pa2(αn, βn(y^n), γn(y^n)) r.v., where

αn = α_{n−1} + δ + 1,   ∀n ≥ 1,   α0 = α + 1;
βn(y^n) = min(β_{n−1}(y^{n−1}), yn) = min(β, y0, …, yn),   ∀n ≥ 1,   β0 = min(β, y0);
γn(y^n) = max(γ_{n−1}(y^{n−1}), yn) = max(γ, y0, …, yn),   ∀n ≥ 1,   γ0 = max(γ, y0).

Proof. Let us start by noticing that the filtering density at step 0 corresponds to that of a Pa2(α0, β0, γ0) r.v., which easily follows from the updating step (2.1), since X0 ∼ Pa2(α, β, γ) and p(y0|x0^1, x0^2) = 1/(x0^2 − x0^1) 1_{(x0^1, x0^2)}(y0). Let us now assume that the filtering density at step n − 1 corresponds to the density of a Pa2(α_{n−1}, β_{n−1}(y^{n−1}), γ_{n−1}(y^{n−1})). By Proposition 4.5, we have that the conditional densities of (γ_{n−1}(y^{n−1}) − X^1_{n−1}) given Y^{n−1} = y^{n−1} and of (X^2_{n−1} − X^1_{n−1}) given X^1_{n−1} = x^1_{n−1}, Y^{n−1} = y^{n−1} are those of a Pa(α_{n−1}, γ_{n−1}(y^{n−1}) − β_{n−1}(y^{n−1})) and of a Pa(α_{n−1} + 1, γ_{n−1}(y^{n−1}) − x^1_{n−1}) r.v., respectively. Since (γ_{n−1}(y^{n−1}) − β_{n−1}(y^{n−1})) Wn ∼ Pa(δ, γ_{n−1}(y^{n−1}) − β_{n−1}(y^{n−1})), it is not difficult to prove that the conditional density of (γ_{n−1}(y^{n−1}) − X^1_n) given Y^{n−1} = y^{n−1} corresponds to the density of a Pa(α_{n−1} + δ, γ_{n−1}(y^{n−1}) − β_{n−1}(y^{n−1})) r.v. Moreover, since Proposition 4.4 assures that the conditional density of min(X^2_{n−1} − X^1_{n−1}, (γ_{n−1}(y^{n−1}) − X^1_{n−1}) Wn) given X^1_{n−1} = x^1_{n−1}, Y^{n−1} = y^{n−1} is equal to that of a Pa(α_{n−1} + δ + 1, γ_{n−1}(y^{n−1}) − x^1_{n−1}) r.v., we have that the conditional density of (X^2_n − X^1_n) given X^1_n = x^1_n, Y^{n−1} = y^{n−1} is that of a Pa(α_{n−1} + δ + 1, γ_{n−1}(y^{n−1}) − x^1_n) r.v. Thus, by Proposition 4.5, we obtain that the prediction distribution at step n − 1 is a Pa2(α_{n−1} + δ, β_{n−1}(y^{n−1}), γ_{n−1}(y^{n−1})). It is now easy to prove, by applying Eq. (2.1), that the filtering distribution at step n is a Pa2(αn, βn(y^n), γn(y^n)), where αn = α_{n−1} + δ + 1, βn(y^n) = min(β_{n−1}(y^{n−1}), yn) and γn(y^n) = max(γ_{n−1}(y^{n−1}), yn).
Remark 4.4. The Pareto-uniform and the bilateral Pareto-uniform models may be considered when the observations, at each time n, are distributed as a continuous uniform r.v. Here, the lower and the upper bounds of the support of the uniform observation distribution are stochastic; their development is described by a suitable latent process related, respectively, to the Pareto or to the bilateral Pareto distribution.
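As a final illustration, the f.d.f.s. of Proposition 4.6 reduces to a deterministic shape update plus a running maximum; a minimal sketch follows (Python; a, b, d stand for the paper's α, β, δ, and the numerical inputs are illustrative).

```python
def pareto_uniform_filter(ys, a, b, d):
    """f.d.f.s. of Proposition 4.6: the shape of the Pa filtering
    distribution grows deterministically, while the scale is the running
    maximum of the prior bound and the observations."""
    alpha, beta = a + 1.0, max(b, ys[0])   # updating at n = 0
    out = [(alpha, beta)]
    for y in ys[1:]:
        alpha += d + 1.0       # + d from the prediction step, + 1 from Bayes
        beta = max(beta, y)    # max(beta, y_0, ..., y_n)
        out.append((alpha, beta))
    return out

pars = pareto_uniform_filter([0.8, 1.4, 1.1], a=2.0, b=1.0, d=0.5)
```

The scale parameter depends on the data only through the current maximum, so observations below it (such as the last one above) leave it unchanged.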
Acknowledgements

We would like to thank Professor O.E. Barndorff-Nielsen and all the staff of the Department of Theoretical Statistics of the University of Aarhus, Denmark, for the warm hospitality during our stay in Aarhus, where we initiated this work.
References

Barndorff-Nielsen, O., 1997. Normal inverse Gaussian distributions and stochastic volatility modelling. Scand. J. Statist. 24, 1–13.
Beneš, V.E., 1981. Exact finite dimensional filters for certain diffusions with nonlinear drift. Stochastics 5, 65–92.
Ferrante, M., Runggaldier, W.J., 1990. On necessary conditions for the existence of finite-dimensional filters in discrete time. Systems Control Lett. 14, 63–69.
Ferrante, M., Giummolè, F., 1995. Finite dimensional filters for a discrete-time nonlinear system with generalized Gaussian white noise. Stochastics Stochastics Rep. 53, 195–211.
Hazewinkel, M., Willems, J.C. (Eds.), 1981. Stochastic Systems: The Mathematics of Filtering, Identification and Applications. Reidel, Dordrecht.
Jambunathan, M.V., 1954. Some properties of Beta and Gamma distributions. Ann. Math. Statist. 25, 401–405.
Jazwinski, A.H., 1970. Stochastic Processes and Filtering Theory. Academic Press, New York.
Levine, J., Pignie, G., 1986. Exact finite-dimensional filters for a class of nonlinear discrete-time systems. Stochastics 18, 97–132.
Shephard, N., 1994. Local scale models: state space alternative to integrated GARCH processes. J. Econometrics 60, 181–202.
Shephard, N., 1996. Statistical aspects of ARCH and stochastic volatility. In: Cox, D.R., Hinkley, D.V., Barndorff-Nielsen, O.E. (Eds.), Time Series Models: In Econometrics, Finance and Other Fields. Chapman & Hall, London.
Smith, R.L., Miller, J.E., 1986. A non-Gaussian state space model and application to prediction of records. J. Roy. Statist. Soc. Ser. B 48, 79–88.
Springer, M.D., 1979. The Algebra of Random Variables. Wiley, New York.
Vidoni, P., 1998. Exponential family state space models based on a conjugate latent process. J. Roy. Statist. Soc. Ser. B, to appear.