Journal of Statistical Planning and Inference 33 (1992) 121-129
North-Holland

New developments in inference for temporal stochastic processes

C.C. Heyde

Department of Statistics, Institute of Advanced Studies, Australian National University, Canberra, ACT, Australia

Received February 1988; accepted April 1990

Abstract: This paper is concerned with sketching recent progress in inference for stochastic processes through the use of quasi-score estimating functions and the quasi-likelihood estimators derived therefrom. This methodology offers the principal advantages of both least squares, which is founded on finite sample considerations, and maximum likelihood, whose justification is primarily asymptotic. The emphasis in the paper is in describing the types of semi-martingale models based on a trend, which is typically random, and a stochastic disturbance, for which a comprehensive theory is now available. Consistency and minimum size asymptotic confidence zone results for the quasi-likelihood estimator are also noted.

Key words and phrases: Estimation; quasi-likelihood; semi-martingale models.

Correspondence to: Prof. C.C. Heyde, Dept. of Statistics, Institute of Advanced Studies, Australian National University, GPO Box 4, Canberra, ACT 2601, Australia.
1. Introduction

A considerable unification of ideas has recently taken place in the general theory of inference for stochastic processes through the use of what are called quasi-score estimating functions. This framework incorporates essential ideas from the methods of least squares and maximum likelihood and we shall discuss the developments in some detail. However, our primary emphasis in the paper is in describing the types of models and the kinds of inferential results for which a comprehensive theory is now available.
2. Modelling

It should first be noted that general statistical theory is principally concerned with prototype models of systems (biological or otherwise). For specific applications
some modifications are often necessary. The general theory operates at the level of
what one might call strategic models. These are simple, mathematically tractable models constructed with the aim of identifying possible physical or biological principles. That is, to answer "could it happen?" rather than "will it happen?". Strategic models provide a conceptual framework for the discussion of broad classes of phenomena. They are a preliminary to the development of testable models which can be used on real data.
Now there are two prospective components that one should consider in a strategic model, namely (i) a trend, which may be deterministic, and (ii) a stochastic disturbance of random fluctuations, due to (iia) intrinsic stochasticity (variability conditional on fixed parameters) and possibly (iib) environmental stochasticity (variability consequent upon changing parameters). It should be remarked that trend quite often does not correspond to mean behaviour. To illustrate the concepts we take {Z_t, t = 0, 1, ...} as a Galton-Watson branching process with θ = E(Z_1 | Z_0 = 1) (> 1) as the mean of the offspring distribution. Then, it is possible to write Z_t in the form

    Z_t = θ Z_{t-1} + η_t,                                                (1)

where η_t = (Z_{t-1}^{(1)} - θ) + (Z_{t-1}^{(2)} - θ) + ... + (Z_{t-1}^{(Z_{t-1})} - θ) represents intrinsic stochasticity. Here Z_{t-1}^{(i)}, i = 1, 2, ..., Z_{t-1}, are independent and identically distributed, each with the distribution of Z_1 | Z_0 = 1. Equation (1) can be thought of as consisting of a trend term θ Z_{t-1} together with a stochastic disturbance η_t. Note that E(Z_t | Z_0 = 1) = θ^t but (Z_t | Z_0 = 1) ~ θ^t W a.s. as t → ∞, where W is random with EW = 1 and P(W > 0) > 0 provided E Z_1 log(1 + Z_1) < ∞ (e.g. Athreya and Ney (1972, p. 24)), so a deterministic description of trend is not possible.

If the Galton-Watson process is non-homogeneous and generation t reproduces with offspring mean θ_t, having mean θ, then the model could be written as

    Z_t = θ Z_{t-1} + ξ_t + ε_t,                                          (2)

where ξ_t = (θ_t - θ) Z_{t-1} and ε_t = Z_t - θ_t Z_{t-1} represent, respectively, environmental stochasticity and intrinsic stochasticity disturbances about the trend. Here, as is usually the case, it is possible to generate much more system variability from environmental stochasticity than from intrinsic stochasticity.

The theory that we shall discuss is most conveniently treated in a continuous time context. Note that this covers the case of discrete time, for any process {X_k, k = 0, 1, ...} can be set in continuous time by defining
    X(t) = X_k,    k ≤ t < k + 1.
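To fix ideas, the following short simulation (in Python; the Poisson offspring law, the fixed seed and the ten initial ancestors are convenient assumptions made here for illustration, not taken from the paper) generates a supercritical Galton-Watson path and splits each increment into the trend term θ Z_{t-1} and the martingale disturbance η_t of (1).

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 1.4                      # offspring mean (supercritical)
    T = 25                           # number of generations

    Z = [10]                         # several ancestors, to keep the path from dying out early
    for t in range(T):
        # each of the Z_{t-1} individuals reproduces independently
        Z.append(rng.poisson(theta, size=Z[-1]).sum())
    Z = np.array(Z, dtype=float)

    trend = theta * Z[:-1]           # trend term theta * Z_{t-1} in (1)
    eta = Z[1:] - trend              # martingale disturbance eta_t in (1)

    # last five generations: Z_t, its trend component, and the disturbance
    print(np.c_[Z[1:], trend, eta][-5:])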
The general principle for building strategic models can be thought of as follows. Let X(t) represent the vector of interest at time t, the data for analysis being of the form {X(s), 0 ≤ s ≤ T}. If

    E[dX(t) | F_t] = f_t(θ) dA_t,

where f_t(θ) is a predictable process and A_t is a monotone right continuous process, then integration gives the representation

    X(t) = ∫_0^t f_s(θ) dA_s + m_t(θ),                                    (3)
where {m_t(θ), F_t} is a martingale. Here the trend term is ∫_0^t f_s(θ) dA_s and the stochastic disturbance is m_t(θ). Inference for the (semimartingale) model (3) has been discussed by Hutton and Nelson (1986) and Godambe and Heyde (1987) for vector valued processes and a vector parameter and in the scalar case by Thavaneswaran and Thompson (1986) and Habib (1992). It covers a very wide variety of applications. In particular it should be noted that all processes observed in discrete time are representable in the form (3). Suppose that the observations are {Y_k, k = 0, 1, ..., T} and write X(t) = Σ_{k=0}^{t} Y_k for integer t. Then,

    X(t) = Σ_{k=0}^{t} E(Y_k | F_{k-1}) + Σ_{k=0}^{t} (Y_k - E(Y_k | F_{k-1}))          (4)

is of the requisite form, being a sum of predictable differences plus a martingale. Also covered are many Markov process models such as those discussed by Kurtz (1981) including epidemic, diffusion, chemical reaction and genetic models, in which case f_t(θ) is of the form Af(X(t)) for some operator A and A_t = t. Later, we shall assume that the quadratic characteristic ⟨m(θ)⟩_t is representable in the form
    ⟨m(θ)⟩_t = ∫_0^t a_s(θ) dA_s,                                         (5)

where a_s(θ) is a predictable matrix. Note that ⟨m(θ)⟩_t is the unique predictable increasing process such that m_t(θ) m_t'(θ) - ⟨m(θ)⟩_t is a martingale, the prime denoting transpose. A convenient sketch of these martingale concepts is available in Shiryaev (1981).
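To make the discrete-time representation (4) concrete, the short sketch below (Python; the first-order autoregression used as the data-generating model is only a convenient assumption, not something taken from the paper) splits observations Y_k into the predictable trend increments E(Y_k | F_{k-1}) and the martingale differences Y_k - E(Y_k | F_{k-1}), and accumulates the conditional variances, the discrete-time analogue of the quadratic characteristic (5).

    import numpy as np

    rng = np.random.default_rng(1)
    theta, sigma, T = 0.7, 1.0, 200

    # observations with E(Y_k | F_{k-1}) = theta * Y_{k-1} and Var(Y_k | F_{k-1}) = sigma^2
    Y = np.zeros(T + 1)
    for k in range(1, T + 1):
        Y[k] = theta * Y[k - 1] + sigma * rng.standard_normal()

    trend_increments = theta * Y[:-1]              # E(Y_k | F_{k-1})
    mart_differences = Y[1:] - trend_increments    # Y_k - E(Y_k | F_{k-1})

    # representation (4): X(t) = cumulative trend + martingale
    m = np.cumsum(mart_differences)
    X = np.cumsum(trend_increments) + m

    # discrete analogue of the quadratic characteristic (5): accumulated conditional variances
    quad_char = sigma**2 * np.arange(1, T + 1)
    print(X[-1], m[-1], quad_char[-1])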
3. Optimality of estimating functions

Optimality of estimating function procedures has recently been subjected to intensive study and we shall indicate the basic results that have been obtained and their relevance for the model (3). A more comprehensive discussion, but with rather different emphasis, is available in Godambe and Heyde (1987).

The general framework takes {X(s), 0 ≤ s ≤ T} as a sample from a process with values in r-dimensional Euclidean space whose distribution depends on a parameter θ from an open subset of p-dimensional Euclidean space. Let {F_t, t ≥ 0} be a standard filtration formed from the history of the process {X(t)} and write 𝒫 for the class of possible probability measures for {X(t)}. We shall focus attention on the class 𝒢 of square integrable martingale estimating functions

    {G_T = G({X(s), 0 ≤ s ≤ T}, θ)}
which are martingales for all members of 𝒫. Here G_T is a vector of dimension p. Estimators θ* are found by solving the estimating equation G_T(θ*) = 0 and it should be noted that all the standard methods of estimation, such as maximum likelihood, least squares, conditional least squares, minimum χ², etc., are estimating function methods of this type.

We shall suppose that the true probability measure for X(t) has density p_t(θ) and we write U_t(θ) = p_t^{-1}(θ) ṗ_t(θ) for the score function, which is presumed to exist. Here the dot refers to differentiation with respect to the components of θ, so that ṗ_t(θ) is the column vector with components ∂p_t(θ)/∂θ_i. We confine attention to the subclass 𝒢_1 ⊂ 𝒢 of martingale estimating functions for which EĠ_T = (E ∂G_{T,i}/∂θ_j) and E(G_T G_T') are nonsingular.

Now, the score function U_T ∈ 𝒢_1 and there is a quite extensive body of theory which advocates the use of the estimating equation U_T(θ) = 0, i.e. the method of maximum likelihood (e.g. Basawa and Prakasa Rao (1980, Chapters 7, 8), Hall and Heyde (1980, Chapter 6)). However, the true underlying distribution and hence the score function are not ordinarily known in practice. If U_T is unknown it may be argued that it is best to choose an estimating function G_T which has the minimum distance, in an appropriate sense, from U_T. This idea is formalized in the following criterion for optimality for fixed samples, denoted by O_F. We shall say that G_T* is O_F-optimal within 𝒢_2 ⊆ 𝒢_1 if, for some fixed matrix
function α depending on θ,

    E(αU_T - G_T)(αU_T - G_T)' - E(αU_T - G_T*)(αU_T - G_T*)'             (6)
is nonnegative definite for all G_T ∈ 𝒢_2, θ and elements of 𝒫. This is a condition of minimum dispersion distance, and an O_F-optimal estimating function is defined only up to a constant (matrix) multiplier.

Now the criterion (6) for O_F-optimality contains the score function U_T, which is in general unknown, but there are alternative criteria which are equivalent and do not involve U_T. The most useful of these is that

    (EĠ_T*)'(E G_T* G_T*')^{-1}(EĠ_T*) - (EĠ_T)'(E G_T G_T')^{-1}(EĠ_T)   (7)

is nonnegative definite for all G_T ∈ 𝒢_2, θ and elements of 𝒫. Details are given in
Godambe and Heyde (1987). In the case p = 1 of a one-dimensional parameter, (7) readily translates into the condition that G_T* has maximum correlation with the unknown score function. This follows because, under the regularity conditions which we are assuming, for p = 1,

    E(U_T G_T) = -EĠ_T.
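In scalar form the quantity appearing in (7) is (EĠ_T)²/E(G_T²), and the criterion says that the quasi-score makes this as large as possible. The Monte Carlo fragment below (a purely illustrative construction in Python, not taken from the paper) estimates this quantity for the score of an i.i.d. N(θ, 1) sample and for a deliberately suboptimal cubic estimating function, in line with the maximum-correlation interpretation just given.

    import numpy as np

    rng = np.random.default_rng(5)
    theta, n, reps = 0.0, 50, 20_000

    def info(G_dot_mean, G_sq_mean):
        # scalar form of criterion (7): (E dG/dtheta)^2 / E(G^2)
        return G_dot_mean**2 / G_sq_mean

    G1, G1dot, G2, G2dot = [], [], [], []
    for _ in range(reps):
        x = rng.normal(theta, 1.0, n)
        r = x - theta
        G1.append(r.sum())               # score for the N(theta, 1) model
        G1dot.append(-n)                 # its theta-derivative
        G2.append((r**3).sum())          # a competing (suboptimal) estimating function
        G2dot.append(-3 * (r**2).sum())  # its theta-derivative

    print("score:", info(np.mean(G1dot), np.mean(np.square(G1))))   # close to n = 50
    print("cubic:", info(np.mean(G2dot), np.mean(np.square(G2))))   # close to 0.6 n = 30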
Now O_F-optimality is by no means a uniquely desirable property and in particular it is most important that estimators should have good asymptotic properties. It turns out that (Godambe and Heyde (1987, Section 4)), modulo certain mild regularity conditions, G_T* ∈ 𝒢_2 provides asymptotic confidence intervals of minimum size for θ provided that

    G̅_T*' ⟨G*⟩_T^{-1} G̅_T* - G̅_T' ⟨G⟩_T^{-1} G̅_T                         (8)

is nonnegative definite for all (regular) G_T ∈ 𝒢_2, θ and elements of 𝒫. Here the bar denotes the predictable process defined by G̅_T = ∫_0^T E(dĠ_s | F_{s-}) and we note that

    EG̅_T = EĠ_T,    E⟨G⟩_T = E G_T G_T'.                                  (9)
If G_T* satisfies the criterion (8) we shall say that it is O_A-optimal, meaning optimal in the asymptotic sense. It should be remarked that if the score function U_T ∈ 𝒢_2, then U_T is O_A-optimal. That is, maximum likelihood possesses the O_A-optimality property. Now it is clear from the results (9) that (8) is a kind of stochastic version of (7). It is, furthermore, straightforward to use these criteria for various useful sets 𝒢_2 of estimating functions and it ordinarily happens that O_F-optimality and O_A-optimality occur together. When this happens the optimal G_T* is called a quasi-score estimating function and a solution of G_T*(θ) = 0 is called a quasi-likelihood estimator.

For example, if for the model (3) with (5) we set

    𝒢_2 = { G_T = ∫_0^T b_s(θ) dm_s(θ),  b_s predictable },

then the optimal solution in both the O_F and O_A senses (the quasi-score estimating function) can be taken as

    G_T*(θ) = ∫_0^T ḟ_s'(θ) a_s^+(θ) dm_s(θ),                             (10)
where the plus denotes the Moore-Penrose generalized inverse. This solution has been discussed in some detail in Godambe and Heyde (1987).

A particularly important special case of this last example is that of a process observed in discrete time. For the model (4), where

    𝒢_2 = { G_T = Σ_{k=1}^T b_k(θ) (Y_k - E(Y_k | F_{k-1})),  b_k is F_{k-1}-measurable },
and writing h_k = Y_k - E(Y_k | F_{k-1}), we can take

    G_T*(θ) = Σ_{k=1}^T (∂E(Y_k | F_{k-1})/∂θ)' (E(h_k h_k' | F_{k-1}))^+ h_k

as the quasi-likelihood estimating function.
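For instance, for the branching model (1) one has E(Z_t | F_{t-1}) = θ Z_{t-1} and, with σ² the offspring variance, E(h_t² | F_{t-1}) = σ² Z_{t-1}; the quasi-score is then a constant multiple of Σ_t (Z_t - θ Z_{t-1}), so that G_T*(θ) = 0 has the closed-form root θ̂ = Σ_t Z_t / Σ_t Z_{t-1}. The fragment below (Python, with a Poisson offspring law and ten ancestors assumed purely for the simulation) checks this numerically.

    import numpy as np

    rng = np.random.default_rng(6)
    theta, T = 1.3, 30
    Z = [10]                                     # ten ancestors
    for t in range(T):
        Z.append(rng.poisson(theta, size=Z[-1]).sum())
    Z = np.array(Z, dtype=float)

    def quasi_score(th):
        # Sum_t dE(Z_t | F_{t-1})/dtheta * (Var(Z_t | F_{t-1}))^{-1} * (Z_t - th * Z_{t-1})
        # = Sum_t Z_{t-1} * (sigma2 * Z_{t-1})^{-1} * (Z_t - th * Z_{t-1});
        # the Z_{t-1} factors and the constant sigma2 cancel.
        return np.sum(Z[1:] - th * Z[:-1])

    theta_hat = Z[1:].sum() / Z[:-1].sum()       # closed-form root of the quasi-score
    print(theta_hat, quasi_score(theta_hat))     # second value is zero up to rounding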
Note that, in particular, if all the terms E(h_k h_k' | F_{k-1}) are the same (e.g. if the {Y_k} are stationary) then G_T*(θ) = 0 gives the conditional least squares estimator, also obtained by minimizing with respect to θ

    Q_T(θ) = Σ_{k=0}^T h_k' h_k,

the dispersion composed of the one-step errors of best prediction (e.g. Hall and Heyde (1980, pp. 172-173)). Ordinary least squares corresponds to the case where each E(Y_k | F_{k-1}) is a.s. constant. It should be remarked that quasi-likelihood estimators are, under broad conditions, strongly consistent and asymptotically normally distributed. Sufficient conditions for these results are given in Hutton and Nelson (1986).

For a concrete example of the quasi-likelihood methodology, note that under certain circumstances the membrane potential V(t) across a neuron is well described by a stochastic differential equation

    dV(t) = (-θ V(t) + λ) dt + dM(t)                                      (11)
(e.g. Kallianpur (1983)) where M(t) is a martingale with discontinuous sample paths and a (centered) generalized Poisson distribution. Here ⟨M⟩_t = σ² t for some σ > 0. The model is of the form (3) with (5) and the use of (10) gives

    G_T* = ∫_0^T (-V(t), 1)' {dV(t) - (-θ V(t) + λ) dt}

as the quasi-likelihood estimating function for θ = (θ, λ)' on the basis of a single realization {V(s), 0 ≤ s ≤ T}. Setting G_T* = 0 gives the estimating equations

    ∫_0^T V(t) dV(t) = ∫_0^T (-θ̂ V(t) + λ̂) V(t) dt,

    V(T) - V(0) = ∫_0^T (-θ̂ V(t) + λ̂) dt,

and it should be noted that these do not involve detailed properties of the stochastic disturbance M(t), only a knowledge of ⟨M⟩_t. In particular, they remain the same if M(t) is replaced (as holds in a certain limiting sense; see Kallianpur (1983)) by σ² W(t), where W(t) is standard Brownian motion. In this latter case θ̂ and λ̂ are actually the respective maximum likelihood estimators.
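To indicate how these equations can be used in practice, the sketch below (Python; the Brownian version of the disturbance, the value of σ and a simple Euler discretization of the integrals are assumptions made here for illustration) simulates a path of dV = (-θV + λ) dt + σ dW and solves the two estimating equations, which are linear in (θ, λ), for the quasi-likelihood estimates.

    import numpy as np

    rng = np.random.default_rng(2)
    theta, lam, sigma = 2.0, 1.0, 0.5     # true parameter values
    T, n = 50.0, 200_000                  # time horizon and Euler grid size
    dt = T / n

    # Euler-Maruyama path of dV = (-theta*V + lam) dt + sigma dW
    V = np.empty(n + 1)
    V[0] = 0.0
    dW = np.sqrt(dt) * rng.standard_normal(n)
    for i in range(n):
        V[i + 1] = V[i] + (-theta * V[i] + lam) * dt + sigma * dW[i]

    # discretized versions of the integrals appearing in the estimating equations
    dV = np.diff(V)
    Vt = V[:-1]
    int_V_dV = np.sum(Vt * dV)
    int_V2_dt = np.sum(Vt**2) * dt
    int_V_dt = np.sum(Vt) * dt

    # the two estimating equations form a linear system in (theta, lambda)
    A = np.array([[-int_V2_dt, int_V_dt],
                  [-int_V_dt,  T]])
    b = np.array([int_V_dV, V[-1] - V[0]])
    theta_hat, lam_hat = np.linalg.solve(A, b)
    print(theta_hat, lam_hat)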
4. General formulation

It turns out that the availability of a semimartingale model of the type (3) is not essential for an optimal estimating function result analogous to (10). For data {X(s), 0 ≤ s ≤ T}, where the distribution of X(t) depends on θ, it is only necessary to find some naturally related martingale {h_t(θ), F_t} and then, among the set of estimating functions

    { G_T = ∫_0^T b_s(θ) dh_s(θ),  b_s predictable },

    G_T* = ∫_0^T (dh̄_s)' (d⟨h⟩_s)^+ dh_s                                  (12)
is a quasi-score estimating function, being both O_F- and O_A-optimal. The estimating function (12) can conveniently be interpreted as the derivative of an underlying quasi-likelihood whose maximum provides the quasi-likelihood estimator. Indeed, it is usually possible to obtain it as the true score function for members of a certain exponential family. The quasi-likelihood terminology and its classical application are due to Wedderburn (1974), who dealt with independent random variables Y_i, i = 1, 2, ..., n, with EY_i = μ_i(θ), var Y_i = V_i(θ), and introduced the quasi-score estimating equation

    Σ_{i=1}^n (μ̇_i(θ)/V_i(θ)) (Y_i - μ_i(θ)) = 0.                         (13)
Note that (12) reduces to (13) in the particular case where h_k(θ) = Σ_{i=1}^k (Y_i - μ_i(θ)). The quasi-likelihood estimator is an ordinary maximum likelihood estimator in an exponential family setting. This result is useful for diffusion and compound Poisson processes, for example. These are exponential family situations and the quasi-score function is much simpler to write down than the likelihood. A case in point is given by the version of (11) where the stochastic disturbance is σ² W(t). Of course the martingale {h_s(θ), F_s} on which the quasi-score estimating function (12) is based is not unique and competing quasi-score estimating functions based on other martingales may be available. These competitors can be compared by means of a martingale information criterion and combined into a new quasi-score estimating function if this is advantageous. Details are given in Heyde (1987).
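As a small numerical sketch of Wedderburn's quasi-score (13), the fragment below (Python; the log-linear mean μ_i(θ) = exp(θ x_i) and the Poisson-type variance assumption V_i(θ) = μ_i(θ) are chosen here purely for illustration) solves (13) for a scalar θ by Newton iteration; with these choices the quasi-score reduces to Σ_i x_i (Y_i - exp(θ x_i)).

    import numpy as np

    rng = np.random.default_rng(3)
    theta_true, n = 0.8, 500
    x = rng.uniform(-1, 1, n)
    Y = rng.poisson(np.exp(theta_true * x))     # data consistent with the assumed mean and variance

    def quasi_score(theta):
        mu = np.exp(theta * x)                  # mu_i(theta); here mu_i' = x_i * mu_i and V_i = mu_i
        return np.sum(x * (Y - mu))             # (13) after cancelling mu_i / V_i

    def quasi_score_deriv(theta):
        return -np.sum(x**2 * np.exp(theta * x))

    theta_hat = 0.0
    for _ in range(25):                         # Newton iteration on the estimating equation
        theta_hat -= quasi_score(theta_hat) / quasi_score_deriv(theta_hat)
    print(theta_hat)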
5. Scope of the methodology

The methodology described herein deals with the case of a finite number of parameters but not the estimation of functions (the infinite dimensional case). For example, in the case of counting processes the basic model is of the form

    X(t) = ∫_0^t λ(s) ds + M(t),
with intensity function λ(t), M(t) being a martingale. In some applications we may wish to deal with a linear model for λ(t) of the form

    λ(t) = Σ_{j=1}^p α_j(t) J_j(t),

where the α_j(t) are unknown functions to be estimated and the J_j(t) are known covariates. The sample would typically be of n copies of X, J_1, J_2, ..., J_p, observed over some time interval. This is the Aalen model and it is only amenable to direct treatment by the methods of this paper if the α_j's are constants. However, the problem can be treated by Grenander's method of sieves (Grenander (1981)). If the α_j's are regarded as if they are piecewise linear then the problem is reduced to one of the estimation of finitely many parameters. This can be done on a mesh of small size which is set to tend to zero as the sample size n increases.

The approach outlined above is tedious and inelegant but fortunately it does appear that much of the theory discussed in this paper will have direct extensions to the infinite dimensional case. Some preliminary results along these lines have been given in the thesis of Thavaneswaran (1985). A Bayesian approach to the infinite dimensional problem is also possible; see Thompson and Thavaneswaran (1992). Finally, it should be remarked that the available general theory does not directly address the matter of nuisance parameters and some interesting problems occur in this area.
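The sieve device can be sketched in the simplest special case of a single covariate J_1 ≡ 1, so that λ(t) = α_1(t). With α piecewise constant on a mesh of m bins (piecewise constant rather than piecewise linear is assumed here only to keep the sketch short), the natural estimate from n independent copies is the event count in each bin divided by n times the bin width. The fragment below (Python; the sinusoidal intensity and the Poisson-process copies are illustrative assumptions, not from the paper) simulates such data by thinning and computes the mesh estimate.

    import numpy as np

    rng = np.random.default_rng(4)
    n, T = 200, 2 * np.pi
    lam = lambda t: 2.0 + np.sin(t)             # true intensity
    lam_max = 3.0                               # bound used for thinning

    # simulate n independent copies of the counting process by thinning
    events = []
    for _ in range(n):
        t = rng.uniform(0, T, rng.poisson(lam_max * T))
        events.append(t[rng.uniform(0, lam_max, t.size) < lam(t)])
    all_events = np.concatenate(events)

    # sieve estimate: piecewise-constant intensity on a mesh of m bins
    m = 20
    edges = np.linspace(0, T, m + 1)
    counts, _ = np.histogram(all_events, bins=edges)
    alpha_hat = counts / (n * (T / m))          # events per copy per unit time in each bin
    print(np.round(alpha_hat, 2))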
References

Athreya, K.B. and P.E. Ney (1972). Branching Processes. Springer, New York.
Basawa, I.V. and B.L.S. Prakasa Rao (1980). Statistical Inference for Stochastic Processes. Academic Press, London.
Godambe, V.P. and C.C. Heyde (1987). Quasi-likelihood and optimal estimation. Internat. Statist. Rev. 55, 231-244.
Grenander, U. (1981). Abstract Inference. Wiley, New York.
Habib, M.K. (1992). Optimal estimation for semimartingale neuronal models. J. Statist. Plann. Inference 33, 143-156 (this issue).
Hall, P.G. and C.C. Heyde (1980). Martingale Limit Theory and its Application. Academic Press, New York.
Heyde, C.C. (1987). On combining quasi-likelihood estimating functions. Stochastic Process. Appl. 25, 281-287.
Hutton, J.E. and P.I. Nelson (1986). Quasi-likelihood estimation for semimartingales. Stochastic Process. Appl. 22, 245-257.
Kallianpur, G. (1983). On the diffusion approximation to a discontinuous model for a single neuron. In: P.K. Sen, Ed., Contributions to Statistics: Essays in Honor of Norman L. Johnson. North-Holland, Amsterdam, 247-258.
Kurtz, T.G. (1981). Approximation of Population Processes. CBMS-NSF Regional Conference Series in Applied Mathematics, No. 36. SIAM, Philadelphia, PA.
Shiryaev, A.N. (1981). Martingales: recent developments, results and applications. Internat. Statist. Rev. 49, 199-233.
Thavaneswaran, A. (1985). Unpublished Ph.D. thesis. University of Waterloo, Canada.
Thavaneswaran, A. and M.E. Thompson (1986). Optimal estimation for semimartingales. J. Appl. Probab. 23, 409-417.
Thompson, M.E. and A. Thavaneswaran (1992). On Bayesian non-parametric estimation for stochastic processes. J. Statist. Plann. Inference 33, 131-141 (this issue).
Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439-447.