New developments in inference for temporal stochastic processes

C.C. Heyde

Department of Statistics, Institute of Advanced Studies, Australian National University, Canberra, ACT, Australia

Journal of Statistical Planning and Inference 33 (1992) 121-129. North-Holland

Received February 1988; accepted April 1990

Abstract: This paper is concerned with sketching the use of quasi-score estimating functions and quasi-likelihood estimators derived therefrom. This methodology offers the principal advantages of both least squares, which is founded on finite sample considerations, and maximum likelihood, whose justification is primarily asymptotic. The emphasis in the paper is in describing recent progress in inference for stochastic processes through models based on a trend, which is typically random, and a stochastic disturbance, and the types of semi-martingale models for which a comprehensive theory is now available. Consistency and minimum size asymptotic confidence zone results for the quasi-likelihood estimator are also noted.

Key words and phrases: Estimation; quasi-likelihood; semi-martingale models.

1. Introduction

A considerable unification of ideas has recently taken place in the general theory of inference for stochastic processes through the use of what are called quasi-score estimating functions. This framework incorporates essential ideas from the methods of least squares and maximum likelihood and we shall discuss the developments in some detail. However, our primary emphasis in the paper is in describing the types of models and the kinds of inferential results for which a comprehensive theory is now available.

2. Modelling

It should first be noted that general statistical theory is principally concerned with prototype models of systems (biological or otherwise). For specific applications

Correspondence to: Prof. C.C. Heyde, Dept. of Statistics, Institute of Advanced Studies, Australian National University, GPO Box 4, Canberra, ACT 2601, Australia.

0378-3758/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved

some modifications are often necessary. The general theory operates at the level of what one might call strategic models. These are simple, mathematically tractable models constructed with the aim of identifying possible physical or biological principles. That is, to answer "could it happen?" rather than "will it happen?". Strategic models provide a conceptual framework for the discussion of broad classes of phenomena. They are a preliminary to the development of testable models which can be used on real data.

Now there are two prospective components that one should consider in a strategic model, namely (i) a trend, which may be deterministic, and (ii) a stochastic disturbance of random fluctuations, due to (iia) intrinsic stochasticity (variability conditional on fixed parameters) and possibly (iib) environmental stochasticity (variability consequent upon changing parameters). It should be remarked that trend quite often does not correspond to mean behaviour.

To illustrate the concepts we take $\{Z_t,\, t = 0, 1, \ldots\}$ as a Galton-Watson branching process with $\theta = E(Z_1 \mid Z_0 = 1)$ ($> 1$) as the mean of the offspring distribution. Then, it is possible to write $Z_t$ in the form

$$Z_t = \theta Z_{t-1} + \eta_t, \qquad (1)$$

where

$$\eta_t = (Z_{t-1}^{(1)} - \theta) + (Z_{t-1}^{(2)} - \theta) + \cdots + (Z_{t-1}^{(Z_{t-1})} - \theta)$$

represents intrinsic stochasticity. Here the $Z_{t-1}^{(i)}$, $i = 1, 2, \ldots, Z_{t-1}$, are independent and identically distributed, each with the distribution of $Z_1 \mid Z_0 = 1$. Equation (1) can be thought of as consisting of a trend term $\theta Z_{t-1}$ together with a stochastic disturbance $\eta_t$. Note that $E(Z_t \mid Z_0 = 1) = \theta^t$ but $(Z_t \mid Z_0 = 1) \sim \theta^t W$ a.s. as $t \to \infty$, where $W$ is random with $EW = 1$ and $P(W > 0) > 0$ provided $E Z_1 \log(1 + Z_1) < \infty$ (e.g. Athreya and Ney (1972, p. 24)), so a deterministic description of trend is not possible.

If the Galton-Watson process is non-homogeneous and generation $t$ reproduces with offspring mean $\theta_t$, having mean $\theta$, then the model could be written as

$$Z_t = \theta Z_{t-1} + \xi_t + \eta_t, \qquad (2)$$

where $\xi_t = (\theta_t - \theta) Z_{t-1}$ and $\eta_t$ represent, respectively, environmental stochasticity and intrinsic stochasticity disturbances about the trend. Here, as is usually the case, it is possible to generate much more system variability from environmental stochasticity than from intrinsic stochasticity.

The theory that we shall discuss is most conveniently treated in a continuous time context. Note that this covers the case of discrete time, for any process $\{X_k,\, k = 0, 1, \ldots\}$ can be set in continuous time by defining

$$X(t) = X_k, \qquad k \le t < k + 1.$$
The general principle for building strategic models can be thought of as follows. Let $X(t)$ represent the vector of interest at time $t$, the data for analysis being of the form $\{X(s),\, 0 \le s \le T\}$. If

$$E[dX(t) \mid \mathcal{F}_t] = f_t(\theta)\, dA_t,$$

where $f_t(\theta)$ is a predictable process and $A_t$ is a monotone right continuous process, then integration gives the representation

$$X(t) = \int_0^t f_s(\theta)\, dA_s + m_t(\theta), \qquad (3)$$

where $\{m_t(\theta), \mathcal{F}_t\}$ is a martingale. Here the trend term is $\int_0^t f_s(\theta)\, dA_s$ and the stochastic disturbance is $m_t(\theta)$.

Inference for the (semimartingale) model (3) has been discussed by Hutton and Nelson (1986) and Godambe and Heyde (1987) for vector valued processes and a vector parameter, and in the scalar case by Thavaneswaran and Thompson (1986) and Habib (1992). It covers a very wide variety of applications. In particular it should be noted that all processes observed in discrete time are representable in the form (3). Suppose that the observations are $\{Y_k,\, k = 0, 1, \ldots, T\}$ and write $X(t) = \sum_{k=0}^{t} Y_k$ for integer $t$. Then

$$X(t) = \sum_{k=0}^{t} E(Y_k \mid \mathcal{F}_{k-1}) + \sum_{k=0}^{t} \bigl(Y_k - E(Y_k \mid \mathcal{F}_{k-1})\bigr) \qquad (4)$$

is of the requisite form, being a sum of predictable differences plus a martingale. Also covered are many Markov process models such as those discussed by Kurtz (1981), including epidemic, diffusion, chemical reaction and genetic models, in which case $f_t(\theta)$ is of the form $Af(X(t))$ for some operator $A$ and $A_t = t$. Later, we shall assume that the quadratic characteristic $\langle m(\theta)\rangle_t$ is representable in the form

$$\langle m(\theta)\rangle_t = \int_0^t a_s(\theta)\, dA_s, \qquad (5)$$

where $a_t(\theta)$ is a predictable matrix. Note that $\langle m(\theta)\rangle_t$ is the unique predictable increasing process such that $m_t(\theta)\, m_t'(\theta) - \langle m(\theta)\rangle_t$ is a martingale, the prime denoting transpose. A convenient sketch of these martingale concepts is available in Shiryaev (1981).
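The discrete-time decomposition (4) and the accumulation of conditional variances in (5) can be illustrated numerically. Below, a simple autoregressive scheme stands in for the observations; the parameter values and names are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# An AR(1) scheme Y_k = theta * Y_{k-1} + eps_k stands in for the
# observations; theta and sigma are illustrative values.  Here the
# conditional mean E(Y_k | F_{k-1}) = theta * Y_{k-1} is known exactly.
theta, sigma, T = 0.7, 1.0, 10_000
y = np.zeros(T + 1)
for k in range(1, T + 1):
    y[k] = theta * y[k - 1] + rng.normal(0.0, sigma)

# Decomposition (4): predictable trend plus martingale part.
cond_mean = theta * y[:-1]            # E(Y_k | F_{k-1}), k = 1, ..., T
m_increments = y[1:] - cond_mean      # martingale differences
trend = np.cumsum(cond_mean)
martingale = np.cumsum(m_increments)
x = np.cumsum(y[1:])                  # the process X(t) itself

# X(t) is recovered exactly as trend + martingale:
assert np.allclose(x, trend + martingale)

# The quadratic characteristic (5) accumulates conditional variances;
# here a_k = sigma^2, so <m>_t = sigma^2 * t, and the realised squared
# martingale increments should average close to sigma^2:
print("mean squared martingale increment:", np.mean(m_increments ** 2))
```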

3. Optimality of estimating functions

Optimality of estimating function procedures has recently been subjected to intensive study and we shall indicate the basic results that have been obtained and their relevance for the model (3). A more comprehensive discussion, but with rather different emphasis, is available in Godambe and Heyde (1987).

The general framework takes $\{X(s),\, 0 \le s \le T\}$ as a sample from a process with values in $r$-dimensional Euclidean space whose distribution depends on a parameter $\theta$ from an open subset of $p$-dimensional Euclidean space. Let $\{\mathcal{F}_t,\, t \ge 0\}$ be a standard filtration formed from the history of the process $\{X(t)\}$ and write $\mathcal{P}$ for the class of possible probability measures for $\{X(t)\}$. We shall focus attention on the class $\mathcal{G}$ of square integrable martingale estimating functions

$$\{G_T = G(\{X(s),\, 0 \le s \le T\},\, \theta)\}$$

which are martingales for all members of $\mathcal{P}$. Here $G_T$ is a vector of dimension $p$. Estimators $\theta^*$ are found by solving the estimating equation $G_T(\theta^*) = 0$ and it should be noted that all the standard methods of estimation, such as maximum likelihood, least squares, conditional least squares, minimum $\chi^2$ etc., are estimating function methods of this type.

We shall suppose that the true probability measure for $X(t)$ has density $p_t(\theta)$ and we write $U_t(\theta) = p_t^{-1}(\theta)\, \dot p_t(\theta)$ for the score function, which is presumed to exist. Here the dot refers to differentiation with respect to the components of $\theta$, so that $\dot p_t(\theta)$ is the column vector with components $\partial p_t(\theta)/\partial \theta_i$. We confine attention to the subclass $\mathcal{G}_1 \subset \mathcal{G}$ of martingale estimating functions for which $E\dot G_T = (E\, \partial G_{T,i}/\partial \theta_j)$ and $E(G_T G_T')$ are nonsingular.

Now, the score function $U_T \in \mathcal{G}_1$ and there is a quite extensive body of theory which advocates the use of the estimating equation $U_T(\theta) = 0$, i.e. the method of maximum likelihood (e.g. Basawa and Prakasa Rao (1980, Chapters 7, 8), Hall and Heyde (1980, Chapter 6)). However, the true underlying distribution and hence the score function are not ordinarily known in practice. If $U_T$ is unknown it may be argued that it is best to choose an estimating function $G_T$ which has the minimum distance, in an appropriate sense, from $U_T$. This idea is formalized in the following criterion for optimality for fixed samples, denoted by $O_F$. We shall say that $G_T^*$ is $O_F$-optimal within $\mathcal{G}_2 \subset \mathcal{G}_1$ if, for some fixed matrix $\alpha$ depending on $\theta$,

$$E(\alpha U_T - G_T)(\alpha U_T - G_T)' - E(\alpha U_T - G_T^*)(\alpha U_T - G_T^*)' \qquad (6)$$

is nonnegative definite for all $G_T \in \mathcal{G}_2$, $\theta$ and elements of $\mathcal{P}$. This is a condition of minimum dispersion distance and an $O_F$-optimal estimating function is defined only up to a constant (matrix) multiplier.

Now the criterion (6) for $O_F$-optimality contains the score function $U_T$, which is in general unknown, but there are alternative criteria which are equivalent and do not involve $U_T$. The most useful of these is that

$$(E\dot G_T^*)'\, (E G_T^* G_T^{*\prime})^{-1}\, (E\dot G_T^*) - (E\dot G_T)'\, (E G_T G_T')^{-1}\, (E\dot G_T) \qquad (7)$$

is nonnegative definite for all $G_T \in \mathcal{G}_2$, $\theta$ and elements of $\mathcal{P}$. Details are given in Godambe and Heyde (1987). In the case $p = 1$ of a one-dimensional parameter, (7) readily translates into the condition that $G_T^*$ has maximum correlation with the unknown score function. This follows because, under the regularity conditions which we are assuming, for $p = 1$,

$$E(U_T G_T) = -E\dot G_T.$$

Now $O_F$-optimality

is by no means a uniquely desirable property and in particular it is most important that estimators should have good asymptotic properties. It turns out that (Godambe and Heyde (1987, Section 4)), modulo certain mild regularity conditions, $G_T^* \in \mathcal{G}_2$ provides asymptotic confidence intervals of minimum size for $\theta$ provided that

$$\bar G_T^{*\prime}\, \langle G^* \rangle_T^{-1}\, \bar G_T^* - \bar G_T'\, \langle G \rangle_T^{-1}\, \bar G_T \qquad (8)$$

is nonnegative definite for all (regular) $G_T \in \mathcal{G}_2$, $\theta$ and elements of $\mathcal{P}$. Here the bar denotes the predictable process defined by $\bar G_T = \int_0^T E(d\dot G_t \mid \mathcal{F}_{t-})$ and we note that

$$E\bar G_T = E\dot G_T, \qquad E\langle G\rangle_T = E G_T G_T'. \qquad (9)$$

If $G_T^*$ satisfies the criterion (8) we shall say that it is $O_A$-optimal, meaning optimal in the asymptotic sense. It should be remarked that if the score function $U_T \in \mathcal{G}_2$, then $U_T$ is $O_A$-optimal. That is, maximum likelihood possesses the $O_A$-optimality property.

Now it is clear from the results (9) that (8) is a kind of stochastic version of (7). It is, furthermore, straightforward to use these criteria for various useful sets $\mathcal{G}_2$ of estimating functions and it ordinarily happens that $O_F$-optimality and $O_A$-optimality occur together. When this happens the optimal $G_T^*$ is called a quasi-score estimating function and a solution of $G_T^*(\theta) = 0$ is called a quasi-likelihood estimator. For example, if for the model (3) with (5) we set

$$\mathcal{G}_2 = \left\{ G_T :\ G_T = \int_0^T b_s(\theta)\, dm_s(\theta),\ b_s\ \text{predictable} \right\},$$

then the optimal solution in both the $O_F$ and $O_A$ senses (the quasi-score estimating function) can be taken as

$$G_T^*(\theta) = \int_0^T \dot f_s'(\theta)\, a_s^+(\theta)\, dm_s(\theta), \qquad (10)$$

where the plus denotes the Moore-Penrose generalized inverse. This solution has been discussed in some detail in Godambe and Heyde (1987).

A particularly important special case of this last example is that of a process observed in discrete time. For the model (4), where

$$\mathcal{G}_2 = \left\{ G_T :\ G_T = \sum_{k=1}^{T} b_k(\theta)\bigl(Y_k - E(Y_k \mid \mathcal{F}_{k-1})\bigr),\ b_k\ \mathcal{F}_{k-1}\text{-measurable} \right\},$$

and writing $h_k = Y_k - E(Y_k \mid \mathcal{F}_{k-1})$, we can take

$$G_T^*(\theta) = \sum_{k=1}^{T} \bigl(\dot E(Y_k \mid \mathcal{F}_{k-1})\bigr)'\, \bigl(E(h_k h_k' \mid \mathcal{F}_{k-1})\bigr)^{+}\, h_k$$

as the quasi-likelihood estimating function. Note that, in particular, if all the terms $E(h_k h_k' \mid \mathcal{F}_{k-1})$ are the same (e.g. if the $\{Y_k\}$ are stationary) then $G_T^*(\theta) = 0$ gives the conditional least squares estimator, also obtained by minimizing with respect to $\theta$

$$Q_T(\theta) = \sum_{k=0}^{T} h_k' h_k,$$

the dispersion composed of the one-step errors of best prediction (e.g. Hall and Heyde (1980, pp. 172-173)). Ordinary least squares corresponds to the case where each $E(Y_k \mid \mathcal{F}_{k-1})$ is a.s. constant. It should be remarked that quasi-likelihood estimators are, under broad conditions, strongly consistent and asymptotically normally distributed. Sufficient conditions for these results are given in Hutton and Nelson (1986).

For a concrete example of the quasi-likelihood methodology, note that under certain circumstances the membrane potential $V(t)$ across a neuron is well described by a stochastic differential equation

$$dV(t) = (-\varrho V(t) + \lambda)\, dt + dM(t), \qquad (11)$$

(e.g. Kallianpur (1983)) where $M(t)$ is a martingale with discontinuous sample paths and a (centered) generalized Poisson distribution. Here $\langle M\rangle_t = \sigma^2 t$ for some $\sigma > 0$. The model is of the form (3) with (5) and the use of (10) gives

$$G_T^* = \int_0^T \binom{-V(t)}{1} \bigl\{dV(t) - (-\varrho V(t) + \lambda)\, dt\bigr\}$$

as the quasi-likelihood estimating function for $\theta = (\varrho, \lambda)'$ on the basis of a single realization $\{V(s),\, 0 \le s \le T\}$. Then, $G_T^*(\theta^*) = 0$ yields the estimating equations

$$\int_0^T V(t)\, dV(t) = \int_0^T (-\varrho^* V(t) + \lambda^*)\, V(t)\, dt,$$

$$V(T) - V(0) = \int_0^T (-\varrho^* V(t) + \lambda^*)\, dt,$$

and it should be noted that these do not involve detailed properties of the stochastic disturbance $M(t)$, only a knowledge of $\langle M\rangle_t$. In particular, they remain the same if $M(t)$ is replaced (as holds in a certain limiting sense; see Kallianpur (1983)) by $\sigma W(t)$ where $W(t)$ is standard Brownian motion. In this latter case $\varrho^*$ and $\lambda^*$ are actually the respective maximum likelihood estimators.
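As a numerical sketch of the neuron example, the two estimating equations can be solved on a simulated path of the Brownian-disturbance version of (11). The Euler discretisation, the parameter values and all names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler-Maruyama simulation of dV = (-rho*V + lam) dt + sigma dW,
# the Brownian-disturbance version of (11); rho, lam, sigma are
# illustrative values, not from the paper.
rho, lam, sigma = 2.0, 1.0, 0.5
T, n = 50.0, 200_000
dt = T / n
noise = rng.normal(0.0, np.sqrt(dt), size=n)
v = np.empty(n + 1)
v[0] = 0.0
for i in range(n):
    v[i + 1] = v[i] + (-rho * v[i] + lam) * dt + sigma * noise[i]

# Discretised versions of the two estimating equations:
#   int V dV    = -rho* int V^2 dt + lam* int V dt
#   V(T) - V(0) = -rho* int V dt   + lam* T
int_v_dv = np.sum(v[:-1] * np.diff(v))
int_v_dt = np.sum(v[:-1]) * dt
int_v2_dt = np.sum(v[:-1] ** 2) * dt

# Solve the resulting 2x2 linear system for (rho*, lam*):
A = np.array([[-int_v2_dt, int_v_dt],
              [-int_v_dt, T]])
b = np.array([int_v_dv, v[-1] - v[0]])
rho_hat, lam_hat = np.linalg.solve(A, b)
print("quasi-likelihood estimates (rho*, lam*):", rho_hat, lam_hat)
```

Because the equations involve only $\langle M\rangle_t$, the same computation applies unchanged whether the disturbance is the generalized Poisson martingale or $\sigma W(t)$.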


4. General formulation

It turns out that the availability of a semimartingale model of the type (3) is not essential for an optimal estimating function result analogous to (10). For data $\{X(s),\, 0 \le s \le T\}$, where the distribution of $X(t)$ depends on $\theta$, it is only necessary to find some naturally related martingale $\{h_s(\theta), \mathcal{F}_s\}$ and then, among the set of estimating functions

$$\mathcal{G}_2 = \left\{ G_T :\ G_T = \int_0^T b_s(\theta)\, dh_s(\theta),\ b_s\ \text{predictable} \right\},$$

$$G_T^* = \int_0^T (d\bar h_s)'\, (d\langle h\rangle_s)^{+}\, dh_s \qquad (12)$$

is a quasi-score estimating function, being both $O_F$- and $O_A$-optimal. The estimating function (12) can conveniently be interpreted as the derivative of an underlying quasi-likelihood whose maximum provides the quasi-likelihood estimator. Indeed, it is usually possible to obtain it as the true score function for members of a certain exponential family.

The quasi-likelihood terminology and its classical application are due to Wedderburn (1974), who dealt with independent random variables $Y_i$, $i = 1, 2, \ldots, n$, with $EY_i = \mu_i(\theta)$, $\mathrm{var}\, Y_i = V_i(\theta)$, and introduced the quasi-score estimating equation

$$\sum_{i=1}^{n} \bigl(\dot\mu_i(\theta)/V_i(\theta)\bigr)\bigl(Y_i - \mu_i(\theta)\bigr) = 0. \qquad (13)$$

Note that (12) reduces to (13) in the particular case where $h_k(\theta) = \sum_{i=1}^{k} (Y_i - \mu_i(\theta))$. The quasi-likelihood estimator is an ordinary maximum likelihood estimator in an exponential family setting. This result is useful for diffusion and compound Poisson processes, for example. These are exponential family situations and the quasi-score function is much simpler to write down than the likelihood. A case in point is given by the version of (11) where the stochastic disturbance is $\sigma W(t)$.

Of course the martingale $\{h_s(\theta), \mathcal{F}_s\}$ on which the quasi-score estimating function (12) is based is not unique and competing quasi-score estimating functions based on other martingales may be available. These competitors can be compared by means of a martingale information criterion and combined into a new quasi-score estimating function if this is advantageous. Details are given in Heyde (1987).
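Wedderburn's quasi-score equation (13) is straightforward to implement directly. The sketch below assumes a Poisson-type mean-variance relation $\mu_i(\theta) = \exp(\theta x_i)$ with $V_i = \mu_i$, an illustrative choice rather than anything prescribed here, and solves (13) by Newton iteration in the spirit of the Gauss-Newton connection noted by Wedderburn.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative setting: independent Y_i with only the mean mu_i(theta)
# and variance V_i(theta) specified.  Taking mu_i = exp(theta * x_i)
# and V_i = mu_i, the quasi-score (13) becomes
#   sum_i (x_i * mu_i / mu_i) (Y_i - mu_i) = sum_i x_i (Y_i - mu_i).
x = rng.uniform(0.0, 2.0, size=500)
theta_true = 0.8
y = rng.poisson(np.exp(theta_true * x))

theta = 0.0
for _ in range(50):
    mu = np.exp(theta * x)
    score = np.sum(x * (y - mu))       # quasi-score (13) at theta
    dscore = -np.sum(x ** 2 * mu)      # its derivative in theta
    step = score / dscore
    theta -= step                      # Newton update
    if abs(step) < 1e-10:
        break

print("quasi-likelihood estimate:", round(theta, 3))  # close to theta_true
```

Only the first two moments entered the computation, which is the point of the quasi-likelihood device: the full distribution of the $Y_i$ was never used.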

5. Scope of the methodology

The methodology described herein deals with the case of a finite number of parameters but not the estimation of functions (the infinite dimensional case). For example, in the case of counting processes the basic model is of the form

$$X(t) = \int_0^t \lambda(s)\, ds + M(t),$$

with intensity function $\lambda(t)$, $M(t)$ being a martingale. In some applications we may wish to deal with a linear model for $\lambda(t)$ of the form

$$\lambda(t) = \sum_{j=1}^{p} \alpha_j(t)\, J_j(t),$$

where the $\alpha_j(t)$ are unknown functions to be estimated and the $J_j(t)$ are known covariates. The sample would typically be of $n$ copies of $X, J_1, J_2, \ldots, J_p$, observed over some time interval. This is the Aalen model and it is only amenable to direct treatment by the methods of this paper if the $\alpha_j$'s are constants. However, the problem can be treated by Grenander's method of sieves (Grenander (1981)). If the $\alpha_j$'s are regarded as if they are piecewise linear, then the problem is reduced to one of the estimation of finitely many parameters. This can be done on a mesh of small size which is set to tend to zero as the sample size $n$ increases.

The approach outlined above is tedious and inelegant, but fortunately it does appear that much of the theory discussed in this paper will have direct extensions to the infinite dimensional case. Some preliminary results along these lines have been given in the thesis of Thavaneswaran (1985). A Bayesian approach to the infinite dimensional problem is also possible; see Thompson and Thavaneswaran (1992). Finally, it should be remarked that the available general theory does not directly address the matter of nuisance parameters and some interesting problems occur in this area.
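The sieve device can be sketched as follows: treating $\alpha(t)$ as constant on each cell of a mesh reduces the fit on each cell to a single scalar estimate. The single-covariate setting, the simulation scheme and all names below are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Sieve sketch for the Aalen model lambda(t) = alpha(t) * J(t) with one
# covariate: alpha is treated as constant on each of M mesh cells, so
# only M scalar parameters are estimated.
n, steps, T, M = 2000, 1000, 1.0, 10
dt = T / steps
t = np.linspace(0.0, T, steps, endpoint=False)
alpha = 1.0 + np.sin(2 * np.pi * t)            # true coefficient function
J = rng.uniform(0.5, 1.5, size=(n, steps))     # n copies of the covariate
dN = rng.poisson(alpha * J * dt)               # counting-process increments

# With alpha constant on a cell, the estimating equation
# sum(dN - alpha * J dt) = 0 over the cell gives a ratio estimator:
cells = np.array_split(np.arange(steps), M)
alpha_hat = np.array([dN[:, c].sum() / (J[:, c].sum() * dt) for c in cells])
alpha_bar = np.array([alpha[c].mean() for c in cells])

print("cell means of true alpha:", np.round(alpha_bar, 2))
print("sieve estimates:         ", np.round(alpha_hat, 2))
```

Refining the mesh as $n$ grows recovers the function $\alpha(t)$ pointwise, which is the sense in which the sieve reduces the infinite dimensional problem to finitely many parameters at each stage.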

References

Athreya, K.B. and P.E. Ney (1972). Branching Processes. Springer, New York.

Basawa, I.V. and B.L.S. Prakasa Rao (1980). Statistical Inference for Stochastic Processes. Academic Press, London.

Godambe, V.P. and C.C. Heyde (1987). Quasi-likelihood and optimal estimation. Internat. Statist. Rev. 55, 231-244.

Grenander, U. (1981). Abstract Inference. Wiley, New York.

Habib, M.K. (1992). Optimal estimation for semimartingale neuronal models. J. Statist. Plann. Inference 33, 143-156 (this issue).

Hall, P.G. and C.C. Heyde (1980). Martingale Limit Theory and its Application. Academic Press, New York.

Heyde, C.C. (1987). On combining quasi-likelihood estimating functions. Stochastic Process. Appl. 25, 281-287.

Hutton, J.E. and P.I. Nelson (1986). Quasi-likelihood estimation for semimartingales. Stochastic Process. Appl. 22, 245-251.

Kallianpur, G. (1983). On the diffusion approximation to a discontinuous model for a single neuron. In: P.K. Sen, Ed., Contributions to Statistics: Essays in Honor of Norman L. Johnson. North-Holland, Amsterdam, 247-258.

Kurtz, T.G. (1981). Approximation of Population Processes. CBMS-NSF Regional Conference Series in Applied Mathematics, No. 36. SIAM, Philadelphia, PA.

Shiryaev, A.N. (1981). Martingales: recent developments, results and applications. Internat. Statist. Rev. 49, 199-233.

Thavaneswaran, A. (1985). Unpublished Ph.D. thesis. University of Waterloo, Canada.

Thavaneswaran, A. and M.E. Thompson (1986). Optimal estimation for semimartingales. J. Appl. Probab. 23, 409-417.

Thompson, M.E. and A. Thavaneswaran (1992). On Bayesian non-parametric estimation for stochastic processes. J. Statist. Plann. Inference 33, 131-141 (this issue).

Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439-447.