Statistics and Probability Letters 81 (2011) 1128–1135
Martingale limit theorems of divisible statistics in a multinomial scheme with mixed frequencies

Haizhen Wu*, Giorgi Kvizhinadze
School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, Wellington 6140, New Zealand

Article history: Received 10 March 2010; Received in revised form 2 March 2011; Accepted 3 March 2011; Available online 10 March 2011.

Keywords: Functional limit theorems; Divisible statistics; Multinomial scheme; Mixed frequencies

Abstract: The martingale approach to limit theorems for divisible statistics in non-classical multinomial schemes, established by Khmaladze in 1983, has proved powerful for models in which all frequencies are asymptotically Poissonian. We extend this approach to more general situations, which include both asymptotically Gaussian and asymptotically Poissonian frequencies, and establish functional limit theorems.
1. Introduction

For each $n$ consider the random vector $\{\nu_{ni}\}_{1\le i\le N}$ which follows a multinomial distribution with sample size $n$ and probabilities $\{p_{ni}\}_{1\le i\le N}$. Divisible statistics (also called separable statistics or decomposable statistics in some articles) can be defined as sums of functions $g_{ni}$ of the frequencies,
$$\sum_{i=1}^{N} g_{ni}(\nu_{ni}, np_{ni}).$$
If the expectations $np_{ni}$ of the frequencies $\nu_{ni}$ are bounded, the $\nu_{ni}$ are asymptotically Poisson random variables; we call these frequencies asymptotically Poissonian. In other situations, however, these expectations may diverge to infinity, and the normalized frequencies $Y_{ni} = (\nu_{ni} - np_{ni})/\sqrt{np_{ni}}$ converge in distribution to a Gaussian random variable; we call these frequencies asymptotically Gaussian, and we consider instead the divisible statistic as a sum of functions $h_{ni}$ with arguments $Y_{ni}$ and $np_{ni}$, i.e.
$$\sum_{i=1}^{N} h_{ni}(Y_{ni}, np_{ni}) = \sum_{i=1}^{N} g_{ni}(\nu_{ni}, np_{ni}).$$
Divisible statistics include many well-known and widely used statistics, such as Pearson's chi-square statistic with $h_{ni}(Y_{ni}, np_{ni}) = Y_{ni}^2$, the log-likelihood statistic with $g_{ni}(\nu_{ni}, np_{ni}) = 2\nu_{ni}\log(\nu_{ni}/(np_{ni}))$, and the so-called spectrum with $g_{ni}(\nu_{ni}, np_{ni}) = I\{\nu_{ni} \in A\}$. In classical models, the number of disjoint events $N$ and the probabilities $\{p_{ni}\}$ are usually fixed. As the sample size $n$ increases, all the frequencies are asymptotically Gaussian and the limit distribution of divisible statistics can be derived easily from the asymptotic normality of the vector of normalized frequencies $\{Y_{ni}\}$.
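As a numerical aside (not part of the paper's argument), the three statistics just listed are easy to compute from a single multinomial sample. The sketch below assumes a uniform scheme, uses the convention $0\log 0 = 0$, and takes $A = \{0\}$ for the spectrum; all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

n, N = 1000, 50
p = np.full(N, 1.0 / N)        # cell probabilities p_ni (uniform, assumed)
nu = rng.multinomial(n, p)     # frequencies nu_ni

# Normalized frequencies Y_ni = (nu_ni - n p_ni) / sqrt(n p_ni)
Y = (nu - n * p) / np.sqrt(n * p)

# Pearson's chi-square: h_ni(Y_ni, np_ni) = Y_ni^2
chi_square = np.sum(Y ** 2)

# Log-likelihood statistic: g_ni(nu, np) = 2 nu log(nu / (np)), with 0 log 0 = 0
pos = nu > 0
log_lik = 2.0 * np.sum(nu[pos] * np.log(nu[pos] / (n * p[pos])))

# "Spectrum": g_ni(nu, np) = I{nu in A}; here A = {0} counts empty cells
spectrum = int(np.sum(nu == 0))

print(chi_square, log_lik, spectrum)
```

All three are sums over cells of a function of $(\nu_{ni}, np_{ni})$, which is exactly the divisible structure above.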
* Corresponding author. Tel.: +64 4 4635233 8699. E-mail addresses: [email protected], [email protected] (H. Wu).

0167-7152/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2011.03.007
However, in models which we will call non-classical, the number of disjoint events $N$ increases and the probabilities $\{p_{ni}\}$ usually change as $n$ increases, so that the probabilities form a triangular array. Non-classical models occur in many real-world applications. For example, suppose we are interested in the connectivity of the autonomous systems (ASs) that compose the global Internet routing system. Most of them are connected to only a few other systems (CAIDA AS Relationships Dataset, 2009/12/15):

Number of connections:   1      2      3–10   >10
Number of ASs:           12184  12929  6679   1534

so the numbers of connections of these systems are more properly treated as asymptotically Poissonian, and the classical model is not feasible. Another important application area of non-classical models is statistical linguistics, where the frequencies of words in a text corpus are of great interest (see, e.g., Baayen (2001)), and many of them are asymptotically Poissonian.

In these non-classical models, divisible statistics usually diverge, but after suitable normalization convergence can be achieved. However, the limiting behavior depends heavily on the specification of the model, such as the probabilities $\{p_{ni}\}$ and the relation between $n$ and $N$. Important works in this stream include Ivchenko and Medvedev (1981), Medvedev (1977) and Morris (1975). Although the study of normalized divisible statistics does disclose some asymptotic properties of divisible statistics, the functional limiting behavior of the partial sum process
$$X_n(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} \left( g_{ni}(\nu_{ni}, np_{ni}) - E\left[ g_{ni}(\nu_{ni}, np_{ni}) \right] \right)$$
provides more fruitful information.
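As an illustration only, one trajectory of the partial sum process $X_n$ can be simulated. The sketch below assumes a uniform non-classical scheme with $n = N$ (so every expected frequency equals 1, the asymptotically Poissonian regime) and the divisible statistic $g_{ni}(\nu, np) = \nu^2$, whose marginal expectation is available in closed form; these choices are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Non-classical regime: n ~ N, so each expected frequency n*p_ni stays bounded
n = N = 2000
p = np.full(N, 1.0 / N)
nu = rng.multinomial(n, p)

# Divisible statistic g_ni(nu, np) = nu^2 (its centred partial sums form X_n)
g = nu.astype(float) ** 2
# Marginal expectation for nu_ni ~ Binomial(n, p_ni): n p (1 - p) + (n p)^2
Eg = n * p * (1 - p) + (n * p) ** 2

# Partial sum process X_n(t) evaluated on the grid t = i/N
X = np.cumsum(g - Eg) / np.sqrt(N)

print(X[-1])
```

Plotting `X` against the grid `i/N` over repeated samples gives a visual impression of the functional limiting behavior studied below.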
provides more fruitful information. In 1983, Khmaladze (1983) proposed a new powerful approach to study this sort of problem. It was demonstrated that, (n) (n) after introducing the filtration {Hi }0≤i≤N , with Hi = σ {νnk : k ≤ i}, the partial sum process Xn can be easily decomposed into a martingale and a compensator. It was shown in Khmaladze (1983) that for a large class of divisible statistics with
|gni (νni , npni )| < ceaνni , both martingale and compensator converge in distribution to Gaussian processes. These limit theorems used the conditions of n ∼ N and sup(Npni ) < ∞, which imply that all the frequencies are asymptotically Poissonian. However, this constraint needs to be relaxed in many practical situations. For example, in a BNC spoken English corpus which contains both demographic and context-governed spoken language text, Frequency Number of words
1–10 30387
11–100 6140
101–1000 3123
>1000 676
although the majority of the words (30 387 out of 40 326) have frequencies less than 10 and hence can be treated as asymptotically Poissonian, many of them (nearly 10%) have frequencies higher than 100. Therefore we have to consider models with a mixture of asymptotically Gaussian and asymptotically Poissonian frequencies. In this paper, we intend to relax the condition sup(Npni ) < ∞ such that the existence of asymptotically Gaussian frequencies will be allowed. We will show that for the class of divisible statistics with
|hni (Yni , npni )| < bea|Yni | for some a, b > 0,
(C1)
similar martingale limit theorems can be established.

2. Preliminaries and auxiliary lemmas

Before formally establishing the theorems, we review some properties of the frequencies $\{\nu_{ni}\}$ and the normalized frequencies $\{Y_{ni}\}$, and establish auxiliary lemmas. Marginally, $\nu_{ni}$ has a binomial distribution with sample size $n$ and probability $p_{ni}$. It is necessary to assume that
$$f(t) = \lim_{n\to\infty} Np_{n[Nt]} \quad \text{and} \quad \lambda(t) = \lim_{n\to\infty} np_{n[Nt]} \tag{C2}$$
exist. A crucial but not so obvious fact is the following.

Lemma 1. If
$$\inf_{n,i}(np_{ni}) \ge \delta > 0, \tag{C3}$$
then for any $a > 0$ there is $c_0(a,\delta) > 0$ such that for $c > c_0(a,\delta)$,
$$\sup_{n,i} E\left[ e^{a|Y_{ni}|} I\{|Y_{ni}| > c\} \right] \le 2(a+1)e^{-c}. \tag{1}$$
Proof. Applying the exponential inequality (see, e.g., Shorack and Wellner (1986)) to $Y_{ni}$, it can be shown that for $y > 0$,
$$1 - F_{Y_{ni}}(y) = P(Y_{ni} > y) \le \exp\left( -\frac{y^2}{2}\, \psi\!\left( \frac{y}{\sqrt{np_{ni}}} \right) \right)$$
with $\psi(\lambda) = (2/\lambda^2)\left[ (1+\lambda)\ln(1+\lambda) - \lambda \right]$. Since $\psi$ is decreasing and $np_{ni} > \delta$,
$$\psi\!\left( \frac{y}{\sqrt{np_{ni}}} \right) \ge \psi\!\left( \frac{y}{\sqrt{\delta}} \right) \quad \text{and} \quad P(Y_{ni} > y) \le e^{-\frac{y^2}{2}\psi\left( y/\sqrt{\delta} \right)}.$$
Then, since $\lambda\psi(\lambda)$ is increasing and $\psi(0) = 1$, letting $c_1(a,\delta)$ be the solution of $y\,\psi\!\left( y/\sqrt{\delta} \right)/2 = a+1$, we have $P(Y_{ni} > y) \le e^{-(a+1)y}$ for all $y > c_1(a,\delta)$. Hence,
$$\int_{y>c} e^{ay}\, dF_{Y_{ni}}(y) = e^{ac}\left[ 1 - F_{Y_{ni}}(c) \right] + a \int_c^{\infty} \left[ 1 - F_{Y_{ni}}(y) \right] e^{ay}\, dy \le (a+1)e^{-c} \tag{2}$$
for $c > c_1(a,\delta)$. On the other hand, $F_{Y_{ni}}(y) = 0$ for $y < -\sqrt{np_{ni}}$, and for $-\sqrt{np_{ni}} \le y < 0$,
$$F_{Y_{ni}}(y) = P(Y_{ni} < y) \le \exp\left( -\frac{y^2}{2}\, \psi\!\left( \frac{y}{\sqrt{np_{ni}}} \right) \right);$$
since $\psi(\lambda) \ge 1$ for $\lambda < 0$,
$$-\frac{y^2}{2}\, \psi\!\left( \frac{y}{\sqrt{np_{ni}}} \right) \le -\frac{y^2}{2} \quad \text{and} \quad P(Y_{ni} < y) \le e^{-y^2/2}.$$
Letting $c_2(a) = 2(a+1)$, we have $P(Y_{ni} < y) \le e^{(a+1)y}$ for all $y < -c_2(a)$, and for $c > c_2(a)$,
$$\int_{y<-c} e^{-ay}\, dF_{Y_{ni}}(y) = e^{ac} F_{Y_{ni}}(-c) + a \int_{-\infty}^{-c} F_{Y_{ni}}(y)\, e^{-ay}\, dy \le (a+1)e^{-c}. \tag{3}$$
Combining (2) and (3), for $c > c_0(a,\delta) = \max(c_1(a,\delta), c_2(a))$, we obtain (1). □
An immediate consequence of Lemma 1 is that $\{b e^{a|Y_{ni}|}\}_{n\ge 1,\, 1\le i\le N}$ is uniformly integrable and
$$\sup_{n,i} E\left[ b e^{a|Y_{ni}|} \right] \le M$$
for some $M$ which depends only on $a$, $b$ and $\delta$.
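Inequality (1) can be checked by a crude Monte Carlo experiment. The sketch below assumes particular values of $a$, $c$, $n$ and $p_{ni}$ (so that (C3) holds and $c$ is taken large enough to exceed $c_0(a,\delta)$); it is an illustration, not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(2)

a, c = 1.0, 8.0           # c = 8 is assumed to exceed c_0(a, delta)
n, p = 10000, 0.01        # np_ni = 100, so (C3) holds with delta <= 100
nu = rng.binomial(n, p, size=200000)
Y = (nu - n * p) / np.sqrt(n * p)

# Empirical E[ e^{a|Y|} I{|Y| > c} ] versus the bound 2(a+1) e^{-c}
lhs = np.mean(np.exp(a * np.abs(Y)) * (np.abs(Y) > c))
rhs = 2 * (a + 1) * np.exp(-c)
print(lhs, rhs)
```

With these values the tail event $|Y| > 8$ is extremely rare, so the empirical left side is essentially zero, comfortably below the bound.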
Now consider the conditional distribution of $\nu_{ni}$ given $\mathcal{H}_{i-1}^{(n)}$. It is again binomial, but with sample size $\tilde{n}_{ni} = n - \sum_{j=1}^{i-1} \nu_{nj}$ and probability $\tilde{p}_{ni} = p_{ni} / (1 - \sum_{j=1}^{i-1} p_{nj})$. If we let $F_n(t) = \sum_{i=1}^{[Nt]} p_{ni}$ be the distribution function defined by $\{p_{ni}\}$, which converges to a continuous $F(t)$, and $\hat{F}_n(t) = \frac{1}{n}\sum_{i=1}^{[Nt]} \nu_{ni}$ be the corresponding empirical distribution function, then $v_n(t) = \sqrt{n}\left(\hat{F}_n(t) - F_n(t)\right)$ converges in distribution to a Brownian bridge $v(t)$ with respect to time $F(t)$. Consider the ratios
$$r_{ni} = \frac{\tilde{n}_{ni}\tilde{p}_{ni}}{np_{ni}} = \frac{1 - \hat{F}_n\!\left(\frac{i-1}{N}\right)}{1 - F_n\!\left(\frac{i-1}{N}\right)};$$
then for any $T < 1$ such that
$$\liminf_n \left(1 - F_n(T)\right) > 0, \tag{4}$$
by the Kolmogorov–Smirnov theorem we can easily derive
$$\sup_{i \le NT} \sqrt{n}\,\left| r_{ni} - 1 \right| = \sup_{i \le NT} \frac{\left| v_n\!\left(\frac{i-1}{N}\right) \right|}{1 - F_n\!\left(\frac{i-1}{N}\right)} \le \frac{\sup_{i \le NT} \left| v_n\!\left(\frac{i-1}{N}\right) \right|}{\liminf_n \left(1 - F_n(T)\right)} = O_p(1). \tag{5}$$
Based on these properties, we can establish uniform integrability of $\{b e^{a|Y_{ni}|}\}_{1\le i\le N}$ in probability under the conditional measures.
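The ratios $r_{ni}$ and the bound (5) can also be examined numerically. The following sketch simulates one uniform scheme (arbitrary parameter choices, not from the paper) and evaluates $\sup_{i\le NT}\sqrt{n}\,|r_{ni}-1|$.

```python
import numpy as np

rng = np.random.default_rng(3)

n = N = 5000
p = np.full(N, 1.0 / N)
nu = rng.multinomial(n, p)

# F_n((i-1)/N) and empirical Fhat_n((i-1)/N) on the grid i = 1, ..., N
Fn = np.concatenate(([0.0], np.cumsum(p)[:-1]))
Fhat = np.concatenate(([0.0], np.cumsum(nu)[:-1])) / n

# r_ni = (1 - Fhat_n((i-1)/N)) / (1 - F_n((i-1)/N)), restricted to i <= NT
T = 0.9
idx = np.arange(N) < int(N * T)
r = (1 - Fhat[idx]) / (1 - Fn[idx])

# (5): sup_{i <= NT} sqrt(n) |r_ni - 1| is stochastically bounded
stat = np.sqrt(n) * np.max(np.abs(r - 1))
print(stat)
```

Repeating the simulation over many samples shows `stat` fluctuating around moderate values rather than growing with $n$, in line with the $O_p(1)$ statement in (5).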
Lemma 2. If condition (C3) and
$$\sup_i p_{ni} \to 0 \tag{C4}$$
are satisfied, then for any $0 < \lambda < 1$ and $\epsilon > 0$ there is an $n_\epsilon$ such that for all $n > n_\epsilon$ and sufficiently large $c$,
$$P\left( \sup_i E\left[ e^{a|Y_{ni}|} I\{|Y_{ni}| > c\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \le 2(\tilde{a}+1)\, e^{a\lambda - \tilde{c}} \right) > 1 - \epsilon \tag{6}$$
with $\tilde{a} = a\sqrt{1+\lambda}$, $\tilde{\delta} = (1-\lambda)\delta$ and $\tilde{c} = (c-\lambda)/\sqrt{1+\lambda}$. Therefore,
$$P\left( \sup_i E\left[ b e^{a|Y_{ni}|} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \le \tilde{M} \right) > 1 - \epsilon \tag{7}$$
for some $\tilde{M}$ which depends only on $a$, $b$, $\delta$ and $\lambda$.

Proof. Let $\tilde{Y}_{ni} = (\nu_{ni} - \tilde{n}_{ni}\tilde{p}_{ni}) / \sqrt{\tilde{n}_{ni}\tilde{p}_{ni}}$; then
$$Y_{ni} = \sqrt{r_{ni}}\, \tilde{Y}_{ni} + \sqrt{np_{ni}}\,(r_{ni} - 1).$$
Under condition (4), the Glivenko–Cantelli theorem implies that
$$\sup_{i \le NT} |r_{ni} - 1| \xrightarrow{\text{a.s.}} 0. \tag{8}$$
Under (C4) and by (5), we have
$$\sup_{i \le NT} \sqrt{np_{ni}}\,|r_{ni} - 1| \le \sup_{i \le NT} \sqrt{p_{ni}} \cdot \sup_{i \le NT} \sqrt{n}\,|r_{ni} - 1| = o_p(1). \tag{9}$$
For any $0 < \lambda < 1$ and $\epsilon > 0$, (8) and (9) imply there exists $n_\epsilon$ such that for all $n > n_\epsilon$,
$$P\left( \sup_{i \le NT} |r_{ni} - 1| < \lambda,\ \sup_{i \le NT} \sqrt{np_{ni}}\,|r_{ni} - 1| < \lambda \right) > 1 - \epsilon. \tag{10}$$
The inequalities $\sup_{i\le NT} |r_{ni} - 1| < \lambda$ and $\sup_{i\le NT} \sqrt{np_{ni}}\,|r_{ni} - 1| < \lambda$ imply
$$e^{a|Y_{ni}|} \le e^{a\left( \sqrt{1+\lambda}\,|\tilde{Y}_{ni}| + \lambda \right)} = e^{a\lambda}\, e^{a\sqrt{1+\lambda}\,|\tilde{Y}_{ni}|}, \qquad \inf_i \left(\tilde{n}_{ni}\tilde{p}_{ni}\right) = \inf_i \left(r_{ni}\, np_{ni}\right) \ge (1-\lambda)\delta = \tilde{\delta} > 0$$
and
$$I\{|Y_{ni}| > c\} \le I\left\{ \sqrt{1+\lambda}\,|\tilde{Y}_{ni}| + \lambda > c \right\} = I\{|\tilde{Y}_{ni}| > \tilde{c}\}.$$
Therefore,
$$\sup_i E\left[ e^{a|Y_{ni}|} I\{|Y_{ni}| > c\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \le e^{a\lambda} \sup_i E\left[ e^{\tilde{a}|\tilde{Y}_{ni}|} I\{|\tilde{Y}_{ni}| > \tilde{c}\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right].$$
By applying Lemma 1 (with $\tilde{a}$, $\tilde{\delta}$ and $\tilde{c}$ in place of $a$, $\delta$ and $c$), we get (6) and hence (7). □
Another crucial but not well-known fact is presented in the following lemma.

Lemma 3. If conditions (C3) and (C4) are satisfied, and if
$$|y_{ni}| = \left| \frac{k - np_{ni}}{\sqrt{np_{ni}}} \right| \le c_{ni} = \frac{1}{\sqrt{p_{ni}}}\, o(1)$$
with $o(1)$ independent of $i$, then for all $i \le NT$,
$$D_{ni}(y_{ni}) = \frac{B(k, \tilde{n}_{ni}, \tilde{p}_{ni})}{B(k, n, p_{ni})} - 1 = y_{ni}\sqrt{np_{ni}}\,(r_{ni} - 1) + p_{ni}\, O_p(1) + y_{ni}^2\, p_{ni}\, O_p(1) + o_p\!\left( \frac{1}{\sqrt{n}} \right) \tag{11}$$
with $O_p(1)$ and $o_p(1/\sqrt{n})$ both independent of $i$.
Proof. For the sake of simplicity, we use $p$, $\tilde{p}$, $\tilde{n}$, $y$, $c$ and $r$ in place of $p_{ni}$, $\tilde{p}_{ni}$, $\tilde{n}_{ni}$, $y_{ni}$, $c_{ni}$ and $r_{ni}$, respectively. It is easy to see that $y\sqrt{p/n} = o(1/\sqrt{n})$. Let $\tilde{y} = (k - \tilde{n}\tilde{p})/\sqrt{\tilde{n}\tilde{p}}$. Under (C3) and (C4), (5) implies
$$|\tilde{y} - y| = \left| \left( \frac{1}{\sqrt{r}} - 1 \right) y - \frac{1}{\sqrt{r}}\sqrt{np}\,(r-1) \right| \le \frac{1}{\sqrt{\delta}}\, o_p(1) + \sqrt{p}\, O_p(1) = o_p(1).$$
Under condition (4), $n/\tilde{n} \le 1/\left(1 - \hat{F}_n(T)\right) = O_p(1)$, hence
$$|\tilde{y}|\sqrt{\frac{\tilde{p}}{\tilde{n}}} \le \left( |y| + |\tilde{y} - y| \right) \sqrt{\frac{\tilde{p}}{\tilde{n}}} = o_p\!\left( \frac{1}{\sqrt{n}} \right).$$
Therefore we have
$$\frac{k}{n} = p + y\sqrt{\frac{p}{n}} = p + o\!\left( \frac{1}{\sqrt{n}} \right) \quad \text{and} \quad \frac{k}{\tilde{n}} = \tilde{p} + o_p\!\left( \frac{1}{\sqrt{n}} \right).$$
Applying Stirling's formula, we have
$$\frac{B(k, \tilde{n}, \tilde{p})}{B(k, n, p)} = \left( \frac{\tilde{n}\tilde{p}}{np} \right)^{k} \left( \frac{1 - \frac{k}{n}}{1 - p} \right)^{n-k} \left( \frac{1 - \frac{k}{\tilde{n}}}{1 - \tilde{p}} \right)^{-(\tilde{n}-k)} \left( \frac{1 - \frac{k}{n}}{1 - \frac{k}{\tilde{n}}} \right)^{1/2} \left( 1 + o\!\left( \frac{1}{\sqrt{n}} \right) \right) = e^{A} \left( 1 + o\!\left( \frac{1}{\sqrt{n}} \right) \right)$$
with
$$A = k \ln \frac{\tilde{n}\tilde{p}}{np} + (n-k) \ln \frac{1 - \frac{k}{n}}{1 - p} - (\tilde{n} - k) \ln \frac{1 - \frac{k}{\tilde{n}}}{1 - \tilde{p}} + \frac{1}{2} \ln \frac{1 - \frac{k}{n}}{1 - \frac{k}{\tilde{n}}}.$$
Let us consider each term. We have
$$k \ln \frac{\tilde{n}\tilde{p}}{np} = np(r-1) + y\sqrt{np}\,(r-1) - \frac{np(r-1)^2}{2} + o_p\!\left( \frac{1}{\sqrt{n}} \right), \tag{12}$$
$$(n-k) \ln \frac{1 - \frac{k}{n}}{1 - p} = -y\sqrt{np} + \frac{y^2 p}{2(1-p)^2} + o\!\left( \frac{1}{\sqrt{n}} \right), \tag{13}$$
$$(\tilde{n} - k) \ln \frac{1 - \frac{k}{\tilde{n}}}{1 - \tilde{p}} = -\tilde{y}\sqrt{\tilde{n}\tilde{p}} + \frac{\tilde{y}^2 \tilde{p}}{2(1-\tilde{p})^2} + o_p\!\left( \frac{1}{\sqrt{n}} \right), \tag{14}$$
and
$$\ln \frac{1 - \frac{k}{n}}{1 - \frac{k}{\tilde{n}}} = \ln\left( 1 + \frac{\frac{k}{\tilde{n}} - \frac{k}{n}}{1 - \frac{k}{\tilde{n}}} \right) = \tilde{p} - p + p\, o_p(1) + o_p\!\left( \frac{1}{\sqrt{n}} \right). \tag{15}$$
Since
$$\tilde{y}\sqrt{\tilde{n}\tilde{p}} - y\sqrt{np} = np - \tilde{n}\tilde{p} = -np(r-1),$$
(12)–(15) imply that
$$A = y\sqrt{np}\,(r-1) - \frac{np(r-1)^2}{2} + \frac{y^2 p}{2(1-p)^2} - \frac{\tilde{y}^2 \tilde{p}}{2(1-\tilde{p})^2} + \frac{\tilde{p} - p}{2} + p\, o_p(1) + o_p\!\left( \frac{1}{\sqrt{n}} \right) = y\sqrt{np}\,(r-1) + p\, O_p(1) + y^2 p\, O_p(1) + o_p\!\left( \frac{1}{\sqrt{n}} \right)$$
with $O_p(1)$ and $o_p(1/\sqrt{n})$ independent of $i$. Since $A \to 0$, $e^A - 1 = A + A^2/2 + o(A^2)$ implies (11). □
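The expansion (11) can be sanity-checked numerically for one cell. The sketch below fixes a hypothetical configuration (all numbers chosen purely for illustration, with $y_{ni} = 1$) and compares the exact ratio of binomial probabilities with the leading term $y_{ni}\sqrt{np_{ni}}\,(r_{ni}-1)$.

```python
import math

# Hypothetical cell: n = 10^6 trials, np_ni = 100, probability mass 0.2
# in earlier cells, 200500 observations already consumed by earlier cells.
n, p = 10**6, 1e-4
F = 0.2                              # F_n((i-1)/N)
nu_before = 200500                   # sum of earlier frequencies
n_t = n - nu_before                  # conditional sample size n~_ni
p_t = p / (1 - F)                    # conditional probability p~_ni
r = (n_t * p_t) / (n * p)            # ratio r_ni

def log_binom_pmf(k, n, p):
    """Log of the binomial probability B(k, n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

k = 110                              # so y = (k - np)/sqrt(np) = 1
y = (k - n * p) / math.sqrt(n * p)

D = math.exp(log_binom_pmf(k, n_t, p_t) - log_binom_pmf(k, n, p)) - 1
leading = y * math.sqrt(n * p) * (r - 1)
print(D, leading)
```

With these numbers the remainder terms $p\,O_p(1) + y^2 p\,O_p(1)$ are of order $10^{-4}$, so `D` and `leading` agree to roughly that accuracy.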
3. Functional limit theorems and proofs

Based on the auxiliary lemmas proved in the previous section, we can establish the functional limit theorems. Since asymptotically Gaussian frequencies exist in the models, the partial sum process becomes
$$X_n(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} \left( h_{ni}(Y_{ni}, np_{ni}) - E\left[ h_{ni}(Y_{ni}, np_{ni}) \right] \right).$$
Correspondingly, the martingale is
$$W_n(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} \left( h_{ni}(Y_{ni}, np_{ni}) - E\left[ h_{ni}(Y_{ni}, np_{ni}) \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \right),$$
and the compensator becomes
$$K_n(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} \left( E\left[ h_{ni}(Y_{ni}, np_{ni}) \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] - E\left[ h_{ni}(Y_{ni}, np_{ni}) \right] \right).$$
We first establish the limit theorem for the martingale part.

Theorem 1. Assume that conditions (C1)–(C4) are satisfied and that
$$h_{n[Nt]}\!\left( Y_{n[Nt]}, np_{n[Nt]} \right) \xrightarrow{d} \xi(t). \tag{16}$$
Let $w$ denote a standard Brownian motion on $\mathbb{R}^+$ and let $W(t) = w(\tau(t))$ with
$$\tau(t) = \int_0^t \sigma^2(s)\, ds,$$
where $\sigma^2(s)$ is the variance of $\xi(s)$. Then for $t < T$,
$$W_n \xrightarrow{d} W.$$
Proof. Let $\xi_{ni} = h_{ni}(Y_{ni}, np_{ni})$ and $\eta_{ni} = E[\xi_{ni} \,|\, \mathcal{H}_{i-1}^{(n)}]$; then $W_n$ is a martingale with differences $(\xi_{ni} - \eta_{ni})/\sqrt{N}$. According to Corollary 6 in Liptser and Shiryayev (1981), to prove the theorem it is necessary and sufficient that the conditions
$$\frac{1}{N} \sum_{i=1}^{N} E\left[ (\xi_{ni} - \eta_{ni})^2\, I\{(\xi_{ni} - \eta_{ni})^2 > \epsilon N\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \xrightarrow{P} 0 \quad \forall \epsilon > 0 \tag{$\alpha$}$$
and
$$\frac{1}{N} \sum_{i=1}^{[Nt]} E\left[ (\xi_{ni} - \eta_{ni})^2 \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \xrightarrow{P} \tau(t) \tag{$\beta$}$$
are satisfied. To verify ($\alpha$), it is sufficient to show that
$$\sup_i E\left[ (\xi_{ni} - \eta_{ni})^2\, I\{(\xi_{ni} - \eta_{ni})^2 > \epsilon N\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \xrightarrow{P} 0. \tag{17}$$
Recall that under condition (C1), $|\xi_{ni}| \le b e^{a|Y_{ni}|}$. By Lemma 2, for a fixed $0 < \lambda < 1$ and any $\epsilon > 0$, there exists $n_\epsilon$ such that for all $n > n_\epsilon$, (10) holds and
$$P\left( \sup_i |\eta_{ni}| \le \sup_i E\left[ |\xi_{ni}| \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \le \tilde{M} \right) > 1 - \epsilon \tag{18}$$
with some $\tilde{M} < \infty$ depending only on $a$, $b$, $\delta$, $\lambda$. Since $|\eta_{ni}| \le \tilde{M}$ implies
$$(\xi_{ni} - \eta_{ni})^2 \le 2\xi_{ni}^2 + 2\eta_{ni}^2 \le 2\xi_{ni}^2 + 2\tilde{M}^2 \le 2\left( b^2 + \tilde{M}^2 \right) e^{2a|Y_{ni}|}$$
and hence
$$I\{(\xi_{ni} - \eta_{ni})^2 > \epsilon N\} \le I\left\{ |Y_{ni}| > \frac{1}{2a} \ln \frac{\epsilon N}{2\left(b^2 + \tilde{M}^2\right)} = c \right\},$$
(6) implies that the left side of (17) is bounded by
$$\sup_i E\left[ 2\left( b^2 + \tilde{M}^2 \right) e^{2a|Y_{ni}|} I\{|Y_{ni}| > c\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \le 4\left( b^2 + \tilde{M}^2 \right) (2\tilde{a} + 1)\, e^{2a\lambda - \tilde{c}},$$
with $\tilde{a}$ and $\tilde{c}$ defined in Lemma 2. For sufficiently large $n$, $N$ and hence $c$ are large enough that the right side can be made arbitrarily small. Since $\epsilon$ is arbitrary, (17) holds and ($\alpha$) is satisfied.

For ($\beta$), consider the step functions
$$\varphi_n(t) = E\left[ \left( \xi_{n[Nt]} - \eta_{n[Nt]} \right)^2 \,\middle|\, \mathcal{H}_{[Nt]-1}^{(n)} \right];$$
then Lemma 2 and (16) imply $\varphi_n(t) \xrightarrow{P} \sigma^2(t)$ and
$$\varphi_n(t) \le E\left[ \xi_{n[Nt]}^2 \,\middle|\, \mathcal{H}_{[Nt]-1}^{(n)} \right] \le \sup_{t \le T} E\left[ \xi_{n[Nt]}^2 \,\middle|\, \mathcal{H}_{[Nt]-1}^{(n)} \right] < \infty$$
in probability. Obviously, $\sup_{t \le T} E[\xi_{n[Nt]}^2 \,|\, \mathcal{H}_{[Nt]-1}^{(n)}]$ is integrable with respect to $t \in [0, T]$. By the dominated convergence theorem, for $t \le T$,
$$\int_0^t \varphi_n(s)\, ds \xrightarrow{P} \tau(t).$$
Hence ($\beta$) holds, and the theorem is proved. □
Then the limit theorem for the compensator is established below.

Theorem 2. Let $k(s) = E[\xi(s) Y(s)]\sqrt{f(s)}$ and define the operator
$$\mathcal{K}x(t) = -\int_0^t k(s)\, x(s)\, ds.$$
If conditions (C1)–(C4) are satisfied,
$$\limsup_n \left(N p_{n[Nt]}\right) < C f(t) \quad \forall t < T \tag{C5}$$
for some $C < \infty$, and there is some $\alpha > 0$ such that
$$\left( \sup_i p_{ni} \right)^{-1/2+\alpha} - \frac{1}{2} \ln N \to \infty, \tag{C6}$$
then for $t < T$,
$$K_n \xrightarrow{d} K = \mathcal{K}\!\left( \frac{v}{1 - F} \right).$$

Proof. Instead of $K_n(t)$, we consider the truncated process
$$K_n^c(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} \left( E\left[ \xi_{ni}\, I\{|Y_{ni}| \le c_{ni}\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] - E\left[ \xi_{ni}\, I\{|Y_{ni}| \le c_{ni}\} \right] \right)$$
with $c_{ni} = p_{ni}^{-1/2+\alpha/2} = p_{ni}^{-1/2}\, o(1)$. It can be shown that the difference
$$K_n(t) - K_n^c(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} \left( E\left[ \xi_{ni}\, I\{|Y_{ni}| > c_{ni}\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] - E\left[ \xi_{ni}\, I\{|Y_{ni}| > c_{ni}\} \right] \right)$$
is asymptotically negligible. Condition (C6) implies $\sqrt{N} \exp(-\inf_i c_{ni}) \to 0$. By Lemma 1, we have
$$\left| \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} E\left[ \xi_{ni}\, I\{|Y_{ni}| > c_{ni}\} \right] \right| \le \sqrt{N} \sup_i E\left[ |\xi_{ni}|\, I\left\{ |Y_{ni}| > \inf_i c_{ni} \right\} \right] \le b\sqrt{N} \sup_i E\left[ e^{a|Y_{ni}|} I\left\{ |Y_{ni}| > \inf_i c_{ni} \right\} \right] \le 2b(a+1)\sqrt{N}\, e^{-\inf_i c_{ni}} \to 0.$$
For any fixed $0 < \lambda < 1$ and $\tilde{c}_{ni} = (c_{ni} - \lambda)/\sqrt{1+\lambda}$, $\sqrt{N} \exp(-\inf_i \tilde{c}_{ni}) \to 0$ as well. Similarly, by Lemma 2, we have
$$\left| \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} E\left[ \xi_{ni}\, I\{|Y_{ni}| > c_{ni}\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] \right| \le 2b(\tilde{a}+1)\sqrt{N}\, e^{a\lambda - \inf_i \tilde{c}_{ni}} \to 0$$
in probability. Since $\inf_i c_{ni} \ge \left(\sup_i p_{ni}\right)^{-1/2+\alpha/2} \to \infty$,
$$\sup_{t \le T} \left| K_n(t) - K_n^c(t) \right| \xrightarrow{p} 0. \tag{19}$$
The proof of weak convergence of $K_n^c(t)$ is based on the following equality:
$$E\left[ \xi_{ni}\, I\{|Y_{ni}| \le c_{ni}\} \,\middle|\, \mathcal{H}_{i-1}^{(n)} \right] - E\left[ \xi_{ni}\, I\{|Y_{ni}| \le c_{ni}\} \right] = E\left[ \xi_{ni}\, D_{ni}(Y_{ni})\, I\{|Y_{ni}| \le c_{ni}\} \right].$$
Since
$$\sup_{t \le T} \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} E\left[ \xi_{ni} Y_{ni}^2\, I\{|Y_{ni}| \le c_{ni}\} \right] p_{ni} \right| \xrightarrow{p} 0 \quad \text{and} \quad \sup_{t \le T} \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} p_{ni} \to 0,$$
Lemma 3 implies
$$\sup_{t \le T} \left| K_n^c(t) - \frac{1}{\sqrt{N}} \sum_{i=1}^{[Nt]} E\left[ \xi_{ni} Y_{ni}\, I\{|Y_{ni}| \le c_{ni}\} \right] \sqrt{np_{ni}}\,(r_{ni} - 1) \right| = o_p(1). \tag{20}$$
Let $k_n^c(t) = E\left[ \xi_{n[Nt]} Y_{n[Nt]}\, I\{|Y_{n[Nt]}| \le c_{n[Nt]}\} \right] \sqrt{N p_{n[Nt]}}$. Since $\sqrt{n}\,(r_{n[Nt]} - 1) = -v_n(t)/(1 - F_n(t))$, (19) and (20) imply
$$\sup_{t \le T} \left| K_n(t) + \int_0^t k_n^c(s)\, \frac{v_n(s)}{1 - F_n(s)}\, ds \right| = o_p(1).$$
Since $\sup_t \left| E\left[ \xi_{n[Nt]} Y_{n[Nt]}\, I\{|Y_{n[Nt]}| \le c_{n[Nt]}\} \right] \right|$ is bounded in probability, under condition (C5) $k_n^c(t)$ is integrable. Further, since $k_n^c(t) \to k(t)$,
$$\sup_{t \le T} \left| \int_0^t k_n^c(s)\, \frac{v_n(s)}{1 - F_n(s)}\, ds - \int_0^t k(s)\, \frac{v_n(s)}{1 - F_n(s)}\, ds \right| = o_p(1).$$
Since the operator $\mathcal{K}$ is continuous and, for $t < T$,
$$\frac{v_n(t)}{1 - F_n(t)} \xrightarrow{d} \frac{v(t)}{1 - F(t)},$$
the theorem is proved. □
Acknowledgements

The authors are grateful to the Editor, an anonymous referee and Dr. Ilze Ziedins for valuable comments that helped improve the paper, and especially to Prof. Estate Khmaladze for fruitful discussions. Haizhen Wu was partially supported by the Victoria Ph.D. Scholarship.

References

Baayen, R.H., 2001. Word Frequency Distributions. Kluwer Academic, Dordrecht, Boston.
Ivchenko, G.I., Medvedev, Y.I., 1981. Decomposable statistics and hypothesis testing for grouped data. Theory of Probability and its Applications 25, 540–551.
Khmaladze, E.V., 1983. Martingale limit theorems for divisible statistics. Theory of Probability and its Applications 28 (3), 530–548.
Liptser, R.S., Shiryayev, A.N., 1981. A functional central limit theorem for semimartingales. Theory of Probability and its Applications 25 (4), 667–688.
Medvedev, Y.I., 1977. Separable statistics in a polynomial scheme, I, II. Theory of Probability and its Applications 22, 1–15, 607–615.
Morris, C., 1975. Central limit theorems for multinomial sums. The Annals of Statistics 3 (1), 165–188.
Shorack, G.R., Wellner, J.A., 1986. Empirical Processes with Applications to Statistics. John Wiley and Sons Inc.