Statistics & Probability Letters 2 (1984) 105-109 North-Holland
ON THE MARGINAL DISTRIBUTION OF A FIRST ORDER AUTOREGRESSIVE PROCESS
March 1984
Jeffrey D. HART
Institute of Statistics, Texas A&M University, College Station, Texas 77843, USA
Received January 1983
Revised June 1983
Abstract: The question of what marginal distributions are possible for a first order autoregressive process is addressed. Results concerning the possible multimodality of the marginal distribution are obtained.
Keywords: symmetry, multimodality, class L distributions, infinitely divisible distributions.
1. Introduction
A time series $\{X_t : |t| = 0, 1, 2, \ldots\}$ is said to be autoregressive of order p and moving average of order q, or ARMA(p, q), if it satisfies
$$\phi(B) X_t = \theta(B) Z_t, \qquad (1.1)$$
where $\phi(B)$ and $\theta(B)$ are backward shift operators of order p and q, respectively, and the series $\{Z_t : |t| = 0, 1, 2, \ldots\}$ is an i.i.d. sequence of random variables. Assuming that the operator $\phi$ is invertible, (1.1) is equivalent to the following infinite order moving average representation for $X_t$:
$$X_t = \phi^{-1}(B)\,\theta(B) Z_t = \sum_{j=0}^{\infty} \beta_j Z_{t-j}. \qquad (1.2)$$
In order to facilitate the problem of inferring the parameters associated with $\phi$ and $\theta$, it is often assumed that the common distribution of each of the noise terms, $Z_t$, is $N(0, \sigma^2)$. In a recent article Lusk and Wright (1982) discussed the implications of assuming that the distribution of $Z_t$ is other than normal. Specifically, they addressed the question of what sorts of probability distributions are possible for the output $X_t$ of the linear system in (1.2). The more concrete results of Lusk and Wright pertain to the first order moving average model, but they conjecture that, for p > 0, the distribution of $X_t$ tends to be symmetric and unimodal, since $X_t$ is the infinite sum of independent random variables. In the special case of a first order autoregressive process, the purpose of the current note is to (i) point out counterexamples to the conjecture that the distribution of $X_t$ is of necessity symmetric; (ii) show that any of a wide class of bimodal distributions is impossible for $X_t$ unless the series $\{X_t\}$ is sufficiently 'close' to white noise; (iii) demonstrate that $X_t$ can, in fact, have a multimodal distribution.
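As a minimal sketch of the representation (1.2) in the first order autoregressive case (weights $\beta_j = \rho^j$), the Python fragment below compares the recursion $X_t = \rho X_{t-1} + Z_t$, started at zero, with the corresponding truncated moving average, and reports the sample skewness of a simulated path. It is not part of the original note: the value rho = 0.6, the centred exponential noise, the seed, the burn-in length and all function names are illustrative assumptions.

```python
import random


def simulate_ar1(rho, noise):
    """Return the path of X_t = rho * X_{t-1} + Z_t, started at X_0 = 0."""
    path, x = [], 0.0
    for z in noise:
        x = rho * x + z
        path.append(x)
    return path


def moving_average_value(rho, noise):
    """Truncated form of (1.2) with beta_j = rho**j at the last time point."""
    # reversed(noise) yields Z_t, Z_{t-1}, ..., so index j is the lag.
    return sum(rho ** j * z for j, z in enumerate(reversed(noise)))


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed, for reproducibility only
    rho = 0.6                # hypothetical autoregressive parameter
    # Assumed noise model: exponential with mean 1, shifted to mean 0.
    noise = [rng.expovariate(1.0) - 1.0 for _ in range(20000)]
    path = simulate_ar1(rho, noise)
    print(abs(path[-1] - moving_average_value(rho, noise)))  # gap between the two forms
    print(sample_skewness(path[200:]))                       # after a short burn-in
```

With skewed noise of this kind one would expect the two forms to agree up to rounding error and the sample skewness to stay clearly positive, in line with point (i); the exact figures depend on the seed.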
2. The question of symmetry and unimodality of $X_t$'s distribution
The first order autoregressive process has the form
$$X_t = \rho X_{t-1} + Z_t, \quad |t| = 0, 1, \ldots, \qquad (2.1)$$
where $|\rho| < 1$. The representation (1.2) for $X_t$ holds with $\beta_j = \rho^j$, $j = 0, 1, 2, \ldots$. For this particular process (assuming that the $Z_t$ are i.i.d.) it is of interest to characterize the distributions $F_X$ which are possible for $X_t$. Before stating a theorem in this regard, it is worth pointing out that most results characterizing the distribution of an infinite sum
0167-7152/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)
Volume 2, Number 2
of independent random variables require that individual terms of the sum be asymptotically negligible relative to the sum itself. Since this requirement is not satisfied by the sum in (1.2), it is not at all clear that $X_t$ should obey a central limit type of theorem. One immediate example (see Chernick (1981)) illustrating that $X_t$ need not have a typical 'limiting' distribution is that of the first order autoregressive process in which each $X_t$ has a U(0, 1) distribution, a distribution which is not even infinitely divisible. In the following theorem an incomplete, but informative, characterization of the distribution of $X_t$ is given. The result in this theorem has been cited independently by Gaver and Lewis (1980) and McKenzie (1982), but is worth repeating here since it is due the status of a theorem and since it gives rise to Corollary 2.1, which is important in the context of multimodality.
Theorem 2.1. Let $\{X_t\}$ be a first order autoregressive process as in (2.1) with the $Z_t$ i.i.d. The distribution function F may exist as the distribution of $X_t$ for each $\rho \in (0, 1)$ if and only if F is in the class L of distributions, a subclass of the infinitely divisible distributions. (See, e.g., Feller (1971) for a definition of the class L.)
Proof. Let $\psi_X$ and $\psi_Z$ be the characteristic functions of $X_t$ and $Z_t$, respectively. Since $X_{t-1}$ and $Z_t$ are independent, it follows from (2.1) that
$$\psi_X(s) = \psi_X(\rho s)\,\psi_Z(s) \quad \text{for all } s \in (-\infty, \infty).$$
A given distribution F with characteristic function $\psi_F$ is thus a possible distribution of $X_t$ for each $\rho \in (0, 1)$ iff $\psi_F(s)/\psi_F(\rho s)$ is a characteristic function for each $\rho \in (0, 1)$. By the lemma on p. 588 of Feller (1971), though, the latter condition is necessary and sufficient for F to be in the class L, and the result follows.
Corollary 2.1. Under the conditions of Theorem 2.1, if F exists as the distribution of $X_t$ for each $\rho \in (0, 1)$, then F is unimodal.
Proof. The result is immediate from Theorem 2.1 since Yamazato (1978) has proven that each distribution in the class L is unimodal.
Theorem 2.1 provides many counterexamples to the conjecture that $X_t$'s distribution tends to be symmetric, as, for example, the class L contains the stable distributions (see Feller (1971)), many of which are asymmetric. Another example (see Gaver and Lewis (1980)) which shows that the distribution of $X_t$ need not be even approximately symmetric is the exponential distribution
$$F_X(x) = \begin{cases} 0, & x < 0, \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases} \qquad (\lambda > 0),$$
which is the distribution of $X_t$ whenever the distribution of each $Z_t$ is
$$F_Z(x) = \begin{cases} 0, & x < 0, \\ \rho + (1 - \rho)(1 - e^{-\lambda x}), & x \ge 0, \end{cases}$$
where $\rho \in (0, 1)$. Such a probability model for the input series $Z_t$ is not unreasonable since, as Robinson (1982) has pointed out, in practice time series realizations are sometimes mixtures of discrete and continuous random variables.
Conspicuously missing from Theorem 2.1 is any mention of the nature of $X_t$'s distribution for $\rho$ negative. Although it is not apparent if there is an analog to this theorem for $\rho \in (-1, 0)$, it is interesting to point out that any symmetric stable distribution may be the distribution of $X_t$ for each $\rho \in (-1, 1)$. This follows from the fact that any such distribution has a characteristic function of the form $\psi_\alpha(s) = e^{-\gamma |s|^\alpha}$ (where $0 < \alpha \le 2$ and $\gamma$ is a positive constant), and thus $\psi_\alpha(s)/\psi_\alpha(\rho s)$ is a characteristic function for each $\rho \in (-1, 1)$.
Having shown that the distribution of $X_t$ need not be symmetric, we may turn now to the question of unimodality. Although Corollary 2.1 does not imply that the distribution of $X_t$ must be unimodal, it does imply the following: for any multimodal distribution F there exists at least one positive value of $\rho$ such that
$$X_t = \sum_{j=0}^{\infty} \rho^j Z_{t-j}$$
does not have the distribution F. The following theorem goes much further than this by discounting, for almost all $\rho \in (-1, 1)$, a wide class of
mixture distributions as possibilities for the distribution of $X_t$.
Theorem 2.2. Let F be any infinitely divisible distribution with characteristic function $\psi_F$. If $\rho$ is any number in (-1, 1) except 0 or one of those in the set $\{1/k : |k| = 3, 5, 7, \ldots\}$, then the distribution function of
$$X_t = \sum_{j=0}^{\infty} \rho^j Z_{t-j}$$
(where the $Z_k$ are i.i.d.) is not of the form
$$F_{c_1,c_2}(x) = \tfrac{1}{2}\,[F(x - c_1) + F(x - c_2)], \quad c_1 \ne c_2.$$
Proof. The characteristic function associated with $F_{c_1,c_2}$ is
$$\psi_F(s; c_1, c_2) = \tfrac{1}{2}\,\psi_F(s)\,(e^{i c_1 s} + e^{i c_2 s}).$$
For $F_{c_1,c_2}$ to be the distribution of $X_t$ it is therefore necessary for
$$\frac{\psi_F(s; c_1, c_2)}{\psi_F(\rho s; c_1, c_2)} = \frac{\psi_F(s)}{\psi_F(\rho s)} \times \frac{\exp(i c_1 (1 - \rho) s)\,(1 + \exp(i(c_2 - c_1)s))}{1 + \exp(i(c_2 - c_1)\rho s)}$$
to be a characteristic function. Since F is infinitely divisible, $\psi_F$ never vanishes; and thus in order for the above ratio to be a characteristic function it is necessary that $(1 + e^{i(c_2 - c_1)s})$ vanishes whenever $(1 + e^{i(c_2 - c_1)\rho s})$ vanishes. Simple complex analysis shows that this can only happen when $\rho \in \{1/k : |k| = 3, 5, 7, \ldots\}$.
As a simple example of the result in Theorem 2.2 consider
$$f(x) = (2\sqrt{2\pi}\,\sigma)^{-1}\left(\exp(-(x - \mu_1)^2/2\sigma^2) + \exp(-(x - \mu_2)^2/2\sigma^2)\right),$$
a mixture of two normal densities which is bimodal for $|\mu_1 - \mu_2|/\sigma$ sufficiently large. For $\rho$ nonzero, Theorem 2.2 states that f cannot possibly be the distribution of $X_t$ unless $\rho = \pm\tfrac{1}{3}, \pm\tfrac{1}{5}, \ldots$. Although the class of bimodal distributions which were eliminated (for $|\rho| > \tfrac{1}{3}$) above is quite large, perhaps more than anything Theorem 2.2 is instructive of how other types of multimodality may be eliminated (for certain $\rho$ values) as candidates for $X_t$'s distribution. The proof of the theorem uses the fact that a necessary condition for $X_t$ to have the distribution F (for some $\rho$) is
$$\left|\frac{\psi_F(s)}{\psi_F(\rho s)}\right| \le 1 \quad \text{for all } s. \qquad (2.2)$$
Since in general $|\psi_F|$ is not monotone on $(0, \infty)$ for multimodal distributions, (2.2) will not in general be satisfied by such distributions for $|\rho|$ sufficiently close to 1.
3. Effect of a bimodal input distribution
Since one would suspect that a multimodal distribution for $X_t$ is most likely when the distribution of $Z_t$ is multimodal, we consider now the distribution of $X_t$ when the distribution of $Z_t$ is the mixture of two normals. More specifically, suppose that $Z_t$ has density
$$f_Z(x; \sigma, c) = (2\sqrt{2\pi}\,\sigma)^{-1}\left(\exp(-(x + c)^2/2\sigma^2) + \exp(-(x - c)^2/2\sigma^2)\right) \quad (c > 0), \qquad (3.1)$$
which has mean 0 and variance $\sigma^2 + c^2$. The characteristic function corresponding to this density is
$$\psi_Z(s) = e^{-s^2\sigma^2/2}\cos(cs),$$
and hence the characteristic function of $X_t$ is
$$\psi_X(s) = \prod_{j=0}^{\infty} e^{-s^2\rho^{2j}\sigma^2/2}\cos(c\rho^j s) = e^{-s^2\sigma^2/(2(1-\rho^2))}\cos(cs)\prod_{j=1}^{\infty}\cos(c\rho^j s).$$
(3.2)
(Since $\psi_X$ is invariant to a change of $\rho$'s sign, we assume that $\rho > 0$.) From (3.2) it follows that the distribution of $X_t$ is the same as the distribution of the sum of two independent random variables $Y_1$ and $Y_2$, say, where $Y_1$ has density $f_Z(\cdot\,; \sigma(1 - \rho^2)^{-1/2}, c)$ and $Y_2$ is the sum of independent random variables $Y_j^*$ (j = 1, 2, ...) having distributions
$$P(Y_j^* = c\rho^j) = P(Y_j^* = -c\rho^j) = \tfrac{1}{2}.$$
The density g of $Y_1 + Y_2$ is
$$g(x) = \int_{-c\rho/(1-\rho)}^{c\rho/(1-\rho)} f_Z(x - y; \sigma(1 - \rho^2)^{-1/2}, c)\, dF_{Y_2}(y),$$
where $F_{Y_2}$ is the c.d.f. of $Y_2$. Since the distribution of $Y_1 + Y_2$ is symmetric about 0, g has either a relative minimum or maximum at 0. If $g''(0) > 0$, then g has a relative minimum at 0, implying that g is multimodal. Since
$$g''(0) = \int_{-c\rho/(1-\rho)}^{c\rho/(1-\rho)} f_Z''(-y; \sigma(1 - \rho^2)^{-1/2}, c)\, dF_{Y_2}(y),$$
$g''(0) > 0$ if
$$f_Z''(-y; \sigma(1 - \rho^2)^{-1/2}, c) > 0 \quad \text{for } |y| < \frac{c\rho}{1 - \rho}. \qquad (3.3)$$
We now consider two cases.
(i) Fix $\rho$ at any value such that $0 < \rho < \tfrac{1}{2}$. Then clearly $\sigma$ can be chosen small enough so that (3.3) is satisfied. This implies the existence of a multimodal distribution for $X_t$ for any $\rho$ satisfying $|\rho| < \tfrac{1}{2}$.
(ii) Fix $\sigma$ at any value for which $f_Z(x; 2\sigma/\sqrt{3}, c)$ is bimodal. Then there exists a $\rho_0$ ($0 < \rho_0 < \tfrac{1}{2}$) such that (3.3) is satisfied for $0 \le \rho < \rho_0$. Therefore, if $Z_t$ has any bimodal density of the form in (3.1), then for $\rho$ sufficiently small $X_t$ has a multimodal distribution.
Although (3.2) cannot be simplified for each value of $\rho$, we have (as shown by Feller (1971, p. 593)), for $\rho = \pm\tfrac{1}{2}$,
$$\prod_{j=1}^{\infty}\cos(c\rho^j s) = \prod_{j=1}^{\infty}\cos(cs/2^j) = \frac{\sin(cs)}{cs},$$
which is the characteristic function of a random variable which is uniformly distributed on (-c, c). Using standard convolution results, it is now easy to verify that, for $\rho = \pm\tfrac{1}{2}$, $X_t$ has the unimodal density
$$f_X(x; \sigma, c) = \frac{1}{4c}\left(\Phi\!\left(\frac{\sqrt{3}\,(x + 2c)}{2\sigma}\right) - \Phi\!\left(\frac{\sqrt{3}\,(x - 2c)}{2\sigma}\right)\right),$$
where $\Phi$ is the standard normal c.d.f. For this particular input distribution, then, $|\rho| = \tfrac{1}{2}$ is apparently the cutoff point between unimodality of $X_t$'s distribution and the possibility of multimodality. For c = 1 and $\sigma = \tfrac{1}{2}$, $f_Z(\cdot\,; \sigma, c)$ is shown in Figure 1 and the corresponding $f_X(\cdot\,; \sigma, c)$ for $\rho = \pm\tfrac{1}{2}$ in Figure 2.
Fig. 1.
Fig. 2.
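The densities discussed in this section can be explored numerically. The following Python sketch, which is not part of the original note, evaluates the input density (3.1) and the closed-form output density for $\rho = \pm\tfrac{1}{2}$ on a grid and counts their local maxima. It also approximates the output density for a smaller $\rho$ by cutting the sum defining $Y_2$ off after a few terms. The grid, the truncation at 8 terms, the values $\sigma = 0.2$ and $\rho = 0.3$ in the last call, and all function names are illustrative assumptions.

```python
import math
from itertools import product


def f_z(x, sigma, c):
    """Density (3.1): equal mixture of N(-c, sigma^2) and N(c, sigma^2)."""
    scale = 1.0 / (2.0 * math.sqrt(2.0 * math.pi) * sigma)
    return scale * (math.exp(-(x + c) ** 2 / (2.0 * sigma ** 2))
                    + math.exp(-(x - c) ** 2 / (2.0 * sigma ** 2)))


def normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def f_x_half(x, sigma, c):
    """Closed-form density of X_t for rho = +/- 1/2."""
    a = math.sqrt(3.0) / (2.0 * sigma)
    return (normal_cdf(a * (x + 2.0 * c)) - normal_cdf(a * (x - 2.0 * c))) / (4.0 * c)


def f_x_truncated(xs, sigma, c, rho, terms=8):
    """Approximate density of X_t = Y1 + Y2 on the grid xs.

    Y1 has density f_z(.; sigma / sqrt(1 - rho^2), c); Y2 is cut off after
    `terms` two-point summands +/- c * rho^j (an approximation, not exact).
    """
    s1 = sigma / math.sqrt(1.0 - rho ** 2)
    offsets = [sum(sign * c * rho ** j for j, sign in enumerate(signs, start=1))
               for signs in product((-1.0, 1.0), repeat=terms)]
    return [sum(f_z(x - y, s1, c) for y in offsets) / len(offsets) for x in xs]


def count_modes(values):
    """Number of strict interior local maxima of a sampled curve."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])


if __name__ == "__main__":
    grid = [i * 0.01 for i in range(-400, 401)]
    print(count_modes([f_z(x, 0.5, 1.0) for x in grid]))       # input density of Figure 1
    print(count_modes([f_x_half(x, 0.5, 1.0) for x in grid]))  # output density of Figure 2
    print(count_modes(f_x_truncated(grid, 0.2, 1.0, 0.3)))     # hypothetical small-rho case
```

For c = 1 and $\sigma = \tfrac{1}{2}$ the first two counts should show two modes for the input density and a single mode for the output density at $\rho = \tfrac{1}{2}$. The last call is only a rough check of case (i): with the assumed small $\sigma$ and $\rho < \tfrac{1}{2}$, the approximated output density should show more than one mode.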
4. Concluding remarks
Although the dependence structure imposed by an ARMA process limits to some extent the distributions which $X_t$ may have, the results of this note indicate that an assumption of even approximate normality is not well founded. For a first order autoregressive process multimodality appears to be a phenomenon associated with small values of $|\rho|$, but the variety of unimodal distributions which is possible is quite large, including the symmetric stable distributions if $\rho \in (-1, 1)$ and the larger class L of distributions if $\rho \in (0, 1)$. This wealth of distributional types simply reconfirms the need for robust methods, such as those of Denby and Martin (1979), of estimating the parameters of ARMA models.
Acknowledgement
The author would like to thank Emanuel Parzen and T.E. Wehrly for their helpful comments.
References
Chernick, M.R. (1981), A limit theorem for the maximum of autoregressive processes with uniform marginal distributions, Ann. Probab. 9, 145-149.
Denby, L. and R.D. Martin (1979), Robust estimation of the first order autoregressive parameter, JASA 74, 140-146.
Feller, W. (1971), An Introduction to Probability Theory and its Applications, vol. II (Wiley, New York, 2nd ed.).
Gaver, D.P. and P.A.W. Lewis (1980), First-order autoregressive gamma sequences and point processes, Advances in Applied Probability 12, 727-745.
Lusk, E.J. and H. Wright (1982), Non-Gaussian series and series with non-zero means: Practical implications for time series analysis, Statist. Probab. Lett. 1, 2-6.
McKenzie, E. (1982), Product autoregression: A time series characterization of the gamma distribution, J. Applied Probab. 19, 463-468.
Robinson, P.M. (1982), Analysis of time series from mixed distributions, Ann. Statist. 10, 915-925.
Yamazato, M. (1978), Unimodality of infinitely divisible distribution functions of class L, Ann. Probab. 6, 523-531.