Statistical Methodology 32 (2016) 77–90
Homogeneity testing via weighted affinity in multiparameter exponential families

Alexander Katzur, Udo Kamps
Institute of Statistics, RWTH Aachen University, D-52056 Aachen, Germany
Article history: Received 25 November 2015; Received in revised form 7 April 2016; Accepted 8 April 2016; Available online 16 April 2016.

Abstract
Based on stochastically independent samples with underlying density functions from the same multiparameter exponential family, a weighted version of Matusita's affinity is applied as test statistic in a homogeneity test of identical densities as well as in a discrimination problem. Asymptotic distributions of the test statistics are stated, and the impact of weights on the deviation of actual and required type I error for finite sample sizes is examined in a simulation study.
Keywords: Homogeneity test; Discrimination; Exponential family; Affinity; Sequential order statistics; Non-central chi-squared distribution
1. Introduction

As a measure of similarity or dissimilarity of probability distributions, Matusita's affinity measure (cf., e.g., Matusita [13-15]) has been considered in the literature, such as in classification and discrimination problems for multivariate normal distributions. For two or more distributions on Ω ⊂ R^n with distribution functions F1, . . . , FJ and respective density functions f1, . . . , fJ with respect to (w.r.t.) a measure ν, their affinity ρJ is defined by
$$\rho_J(F_1,\dots,F_J) \;=\; \int_\Omega \prod_{j=1}^{J} \{f_j(x)\}^{1/J}\, d\nu(x). \qquad (1)$$
It is known that 0 ≤ ρJ (F1 , . . . , FJ ) ≤ 1, where the upper bound is attained iff all the distribution functions coincide (almost everywhere (a.e.) w.r.t. ν ). Matusita [13,14] derived a representation for
the affinity of two multivariate normal distributions and analyzed its distribution when replacing unknown parameters by arithmetic means and (or) the sample covariance matrix. Moreover, he obtained a representation for the affinity of J ≥ 2 multivariate normal distributions. He also suggested a homogeneity test on the basis of ρJ and applied it to the multivariate normal case. Toussaint [19] replaced the geometric mean of density functions in (1) by a weighted geometric mean with vector ω = (ω1, . . . , ωJ)′ of weights, ωj > 0, 1 ≤ j ≤ J, and $\sum_{j=1}^{J}\omega_j = 1$, i.e.,
$$\rho_J(F_1,\dots,F_J \mid \omega) \;=\; \int_\Omega \prod_{j=1}^{J} \{f_j(x)\}^{\omega_j}\, d\nu(x), \qquad (2)$$
taking values in the closed unit interval.

Concerning asymptotic distributions of respective test statistics, Garren [9] considered Matusita's affinity (1) in exponential families and derived its asymptotic distribution based on plug-in maximum likelihood estimators (MLEs), provided that the numbers of observations in the J samples grow equally fast. He then applied his results to propose a simple and a two-sided hypothesis test in the one-sample situation as well as a decision rule for discriminating between two unknown distributions based on three samples. Previously, in a very general setting, Zografos [20] considered parametric probability spaces and, under certain regularity conditions, obtained an asymptotic distribution for f-dissimilarities, which include the affinity measure, with plug-in MLEs, using asymptotic normality of the MLEs. f-dissimilarities have been used in several papers for testing statistical hypotheses (cf., e.g., Morales et al. [16]).

In the present paper, we aim at examining some tests for the parameters of exponential families by means of the weighted affinity (2), which, in this situation, allows for a closed form representation. This turns out to be helpful when dealing with small sample sizes. We present a homogeneity test, a simple and a two-sided test, as well as a discrimination approach for a two-class classification problem. For the distribution of the test statistic of the homogeneity test under the alternative, Zografos's results can be applied here, since our exponential family setting as well as the weighted affinity are contained in his general set-up. Under the null hypothesis of homogeneity, we find distributional convergence of the test statistic to a distribution-free expression that only depends on known constants and multivariate normal random vectors with zero mean and identity covariance matrix. Moreover, as a by-product, it turns out that Garren's results for the case F1 = · · · = FJ are actually distribution free, and this applies as well to the weighted affinity.

Simulations indicate that Garren's assumption of equally fast growing samples can lead to a major deviation of the actual type I error rate from the required type I error rate based on the asymptotic distribution of the test statistic, if the finite sample sizes are actually unequal (see Section 4). We provide two modifications to overcome this problem: weights ω1, . . . , ωJ are introduced, and a relaxed assumption on the growth behavior of the sample sizes is considered, i.e., the numbers of observations in different samples do not need to grow equally fast. The impact of these modifications on the actual type I error is illustrated in a simulation study.

The present work is structured as follows. In Section 2, exponential families are introduced and a closed form expression for the weighted affinity of several distributions from the same exponential family of distributions is provided. In Section 3, the hypothesis tests and their test statistics that are based on the affinity (2) are examined. The asymptotic distribution of the homogeneity test statistic under both hypotheses is derived in Section 3.1. Furthermore, if the alternative is replaced by a contiguous alternative hypothesis, an asymptotic distribution can be obtained by means of the non-central χ²-distribution. The asymptotic distributions of the test statistic of the simple and the two-sided test and of the statistic of the discrimination approach are provided in Sections 3.2-3.4, respectively.
In Section 4, the homogeneity test is analyzed by means of a simulation study. Special attention is paid to the impact of the modifications, i.e., the weights and the relaxed growth assumption, as well as to the use of the asymptotic distribution with small and moderate sample sizes.

2. Exponential families and weighted affinity

We will use the following notation for an exponential family.
A family P = {f(·; α) [ν] : α ∈ Θ}, Θ ⊆ R^r, of distributions is called an r-parametric exponential family if all these distributions have a density function of the form
$$f(x;\alpha) = \exp\{\alpha' T(x) - \kappa(\alpha)\}\, h(x)\, 1_\Omega(x)$$
with respect to a σ-finite measure ν, where α = (α1, . . . , αr)′ is the natural parameter, T(x) = (T1(x), . . . , Tr(x))′ is the natural statistic, κ and h are real-valued measurable functions, and h is independent of α; 1_Ω denotes the indicator function on Ω. The natural parameter space Θ comprises all α ∈ R^r for which
$$0 \;<\; \int_{\mathbb{R}^n} \exp\{\alpha' T(x)\}\, h(x)\, 1_\Omega(x)\, d\nu(x) \;<\; \infty.$$
The normalizing function κ is defined by
$$\kappa : \Theta \to \mathbb{R}, \qquad \alpha \mapsto \log \int_{\mathbb{R}^n} \exp\{\alpha' T(x)\}\, h(x)\, 1_\Omega(x)\, d\nu(x),$$
and it is infinitely often differentiable on the interior of Θ (cf. Lehmann and Casella [12], Theorem 5.8). If F1 , . . . , FJ are from the same exponential family P with parameters α1 , . . . , αJ , respectively, then it is easily seen that the weighted affinity is given by
$$\rho_J(F_1,\dots,F_J \mid \omega) \;=\; \exp\left\{ -\sum_{j=1}^{J} \omega_j\, \kappa(\alpha_j) + \kappa\!\left( \sum_{j=1}^{J} \omega_j\, \alpha_j \right) \right\} \qquad (3)$$
(see Garren [9] for Matusita's affinity). We consider the exponential family P to be regular, which means that the parameter space Θ is open and that the statistics T1, . . . , Tr are P-affine independent, i.e., for a ∈ R^r and a0 ∈ R it holds that
$$a' T(x) = a_0 \quad P\text{-a.e.} \;\Longrightarrow\; a = 0,\; a_0 = 0.$$
Example 1. Let, respectively, F1, . . . , FJ be the distribution functions of the r-dimensional normal distributions Nr(µ1, Σ), . . . , Nr(µJ, Σ), where µ1, . . . , µJ are unknown r-dimensional vectors and Σ is a known positive definite (r × r)-matrix. In this situation the density function of Fj, 1 ≤ j ≤ J, is given by
$$f_j(x) = \exp\left\{ \mu_j' \Sigma^{-1} x - \tfrac{1}{2}\, \mu_j' \Sigma^{-1} \mu_j \right\} (2\pi)^{-r/2}\, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, x' \Sigma^{-1} x \right\},$$
i.e., κ(µ) = ½ µ′Σ⁻¹µ. Then (3) is given by
$$\rho_J(F_1,\dots,F_J \mid \omega) = \exp\left\{ -\frac{1}{2} \sum_{j=1}^{J} \omega_j\, \mu_j' \Sigma^{-1} \mu_j + \frac{1}{2} \left( \sum_{j=1}^{J} \omega_j \mu_j \right)' \Sigma^{-1} \left( \sum_{j=1}^{J} \omega_j \mu_j \right) \right\}.$$
By setting all the weights equal to 1/J, we obtain the result derived by Matusita [15]. In the following, for some function ζ : R^n → R^m, we will use the notation
$$\nabla\zeta(x) = \left( \frac{\partial \zeta_k}{\partial x_i}(x) \right)_{1\le i\le n,\; 1\le k\le m} \qquad\text{and}\qquad \nabla^2\zeta(x) = \left( \frac{\partial^2 \zeta}{\partial x_i\, \partial x_k}(x) \right)_{1\le i,k\le n} \qquad (4)$$
to denote the Jacobian and the Hessian matrix (m = 1), respectively.
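To make the closed form (3) concrete in the setting of Example 1, the following short sketch evaluates the weighted affinity of J multivariate normal distributions with common known covariance matrix. It is not part of the original article; it assumes NumPy is available and uses hypothetical mean vectors and weights.

```python
import numpy as np

def weighted_affinity_normal(mus, Sigma, omega):
    """Weighted affinity (3) for N_r(mu_j, Sigma), j = 1, ..., J, with common known
    covariance Sigma and kappa(mu) = mu' Sigma^{-1} mu / 2 as in Example 1."""
    mus = np.asarray(mus, dtype=float)        # shape (J, r): row j is mu_j
    omega = np.asarray(omega, dtype=float)    # positive weights summing to one
    Sinv = np.linalg.inv(Sigma)
    kappa = 0.5 * np.einsum('ji,ik,jk->j', mus, Sinv, mus)   # kappa(mu_j) for each j
    mu_bar = omega @ mus                                     # sum_j omega_j mu_j
    return float(np.exp(-(omega @ kappa) + 0.5 * mu_bar @ Sinv @ mu_bar))

# hypothetical example: three bivariate normals; equal weights reproduce Matusita's affinity
mus = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]]
print(weighted_affinity_normal(mus, np.eye(2), [1/3, 1/3, 1/3]))
```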
3. Statistical tests

By means of the weighted affinity (3), a homogeneity test regarding J vectors of model parameters is examined, and a discrimination approach in Garren [9] is extended and analyzed.

3.1. Homogeneity test

The idea to test homogeneity based on the affinity of several distributions was introduced by Matusita [15] and studied for multivariate normal distributions. Here, the use of a weighted version is discussed in an exponential family setting. Throughout this subsection, the following sample situation is assumed:

Sampling situation 1. Let F1, . . . , FJ be respective distribution functions of densities f1, . . . , fJ from the same regular exponential family P. Let Fj be the distribution function with natural parameter αj = (αj1, . . . , αjr)′, 1 ≤ j ≤ J. We define α̂1, . . . , α̂J to be the MLEs of α1, . . . , αJ, where α̂j is based on observations of the Nj independent random variables Xj,1, . . . , Xj,Nj, 1 ≤ j ≤ J, and X1,1, . . . , X1,N1, X2,1, . . . , X2,N2, . . . , XJ,1, . . . , XJ,NJ are also independent and thus so are the MLEs. For a shorter notation we define F̂j to be Fα̂j, i.e., the distribution of the exponential family where the parameter is given by the MLE of αj. For results regarding the existence of MLEs in regular exponential families, we refer to Bickel and Doksum [4], Ch. 2.3.

In Sampling situation 1, we consider testing homogeneity of the natural parameters α1, . . . , αJ; i.e., null hypothesis H0 and alternative H1 are given by
$$H_0: \alpha_1 = \cdots = \alpha_J \qquad\text{and}\qquad H_1: \exists\, i \neq j,\; i, j \in \{1,\dots,J\}, \text{ with } \alpha_i \neq \alpha_j. \qquad (5)$$
The test statistic we use is the logarithm of the weighted affinity ρJ(F1, . . . , FJ|ω), where the unknown parameters are replaced by their MLEs:
$$\log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) = -\sum_{j=1}^{J} \omega_j\, \kappa(\hat\alpha_j) + \kappa\!\left( \sum_{j=1}^{J} \omega_j\, \hat\alpha_j \right). \qquad (6)$$
For the asymptotic distribution of (6) under H1, Theorem 3.1 in Zografos [20] can be utilized.

Theorem 1. Let Sampling situation 1 be given, and let N be such that N/Nj → cj ∈ (0, ∞) as N → ∞ for all j ∈ {1, . . . , J}. Then,
(i) under H1 (see (5)),
$$\sqrt{N}\, \log\frac{\rho_J(\hat F_1,\dots,\hat F_J \mid \omega)}{\rho_J(F_1,\dots,F_J \mid \omega)} \;\xrightarrow[N\to\infty]{d}\; \mathcal{N}_1\!\left( 0,\; \sum_{j=1}^{J} c_j\, V_j' \left[ \nabla^2\kappa(\alpha_j) \right]^{-1} V_j \right),$$
where $V_j = \omega_j \left( \nabla\kappa\!\left( \sum_{l=1}^{J} \omega_l\, \alpha_l \right) - \nabla\kappa(\alpha_j) \right)$.
(ii) Under H0 (see (5)),
$$-2N \log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) \;\xrightarrow[N\to\infty]{d}\; \sum_{j=1}^{J} \omega_j \left( \sqrt{c_j}\, Y_j - \sum_{i=1}^{J} \omega_i \sqrt{c_i}\, Y_i \right)' \left( \sqrt{c_j}\, Y_j - \sum_{i=1}^{J} \omega_i \sqrt{c_i}\, Y_i \right), \qquad (7)$$
where Y1, . . . , YJ are i.i.d. Nr{0, Ir} and Ir is the r-dimensional identity matrix.
Proof. (i) Follows from Corollary 3.1(a)(ii) in Zografos [20] by applying the ∆-method.
(ii) By a Taylor series expansion of log ρJ(F̂1, . . . , F̂J|ω) around α with α = α1 = · · · = αJ (see (6)), we obtain
$$\log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) = \frac{1}{2} \left( \sum_{j=1}^{J} \omega_j (\hat\alpha_j - \alpha) \right)' \nabla^2\kappa(\alpha) \left( \sum_{j=1}^{J} \omega_j (\hat\alpha_j - \alpha) \right) + R_0 - \frac{1}{2} \sum_{j=1}^{J} \omega_j (\hat\alpha_j - \alpha)'\, \nabla^2\kappa(\alpha)\, (\hat\alpha_j - \alpha) + \sum_{j=1}^{J} R_j, \qquad (8)$$
where Rj, 1 ≤ j ≤ J, and R0 are the remainder terms for which $N R_j \xrightarrow[N\to\infty]{P} 0$, 0 ≤ j ≤ J. Since the MLEs are independent and $\sqrt{N}\,(\hat\alpha_j - \alpha_j) \xrightarrow[N\to\infty]{d} \mathcal{N}_r\!\left( 0,\; c_j \left[ \nabla^2\kappa(\alpha_j) \right]^{-1} \right)$ (see Bickel and Doksum [4], Theorem 5.3.5), 1 ≤ j ≤ J, we obtain
$$-2N \log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) \;\xrightarrow[N\to\infty]{d}\; \sum_{j=1}^{J} \omega_j c_j\, Y_j' Y_j - \left( \sum_{j=1}^{J} \omega_j \sqrt{c_j}\, Y_j \right)' \left( \sum_{j=1}^{J} \omega_j \sqrt{c_j}\, Y_j \right) = \sum_{j=1}^{J} \omega_j \left( \sqrt{c_j}\, Y_j - \sum_{i=1}^{J} \omega_i \sqrt{c_i}\, Y_i \right)' \left( \sqrt{c_j}\, Y_j - \sum_{i=1}^{J} \omega_i \sqrt{c_i}\, Y_i \right),$$
where Y1, . . . , YJ are i.i.d. Nr{0, Ir}.
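In practice, a critical value for the homogeneity test can be obtained by simulating the parameter-free limit law in (7). The following Monte Carlo sketch is not part of the article; it assumes NumPy and user-chosen weights ω, constants cj and parameter dimension r. H0 is rejected if −2N log ρJ(F̂1, . . . , F̂J|ω) exceeds the simulated quantile.

```python
import numpy as np

def null_limit_quantile(omega, c, r, level=0.95, reps=100_000, rng=None):
    """Approximate quantile of the limit in (7):
    sum_j omega_j || sqrt(c_j) Y_j - sum_i omega_i sqrt(c_i) Y_i ||^2, Y_j i.i.d. N_r(0, I_r)."""
    rng = np.random.default_rng(rng)
    omega = np.asarray(omega, dtype=float)
    root_c = np.sqrt(np.asarray(c, dtype=float))
    Y = rng.standard_normal((reps, omega.size, r))
    scaled = root_c[None, :, None] * Y                      # sqrt(c_j) Y_j
    centre = np.einsum('j,mjk->mk', omega, scaled)          # sum_i omega_i sqrt(c_i) Y_i
    stat = np.einsum('j,mjk->m', omega, (scaled - centre[:, None, :]) ** 2)
    return float(np.quantile(stat, level))

# hypothetical setting: J = 3 samples of sizes 10, 50, 100, c_j = N / N_j, relative weights
Nj = np.array([10, 50, 100])
print(null_limit_quantile(omega=Nj / Nj.sum(), c=Nj.sum() / Nj, r=2, level=0.95))
```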
The benefit of introducing the weight vector ω is illustrated in Section 4 by means of a simulation study. In finite sample scenarios, its choice may have a strong influence on the actual type I error, depending on the sample sizes and the choice of the constants cj, 1 ≤ j ≤ J. In the following remark, we discuss existing results that are closely connected with the above theorem.

Remark 1. (i) Theorem 1 extends the corresponding result of Garren [9] for Matusita's affinity. It also introduces a relaxed assumption on the growth behavior of the sample sizes by implementing the constants cj, which are implicitly assumed to be equal to one in Garren's results. Moreover, the asymptotic distribution in Theorem 1(ii) turns out to be independent of the model parameters, and therefore the same holds for Garren's result.
(ii) The asymptotic distribution in Theorem 1(ii) may alternatively be stated as a weighted sum of independent χ²-distributed random variables. Following Zografos [20], Corollary 3.1(b), we find: If α1 = · · · = αJ, then
$$-2N\left( \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) - 1 \right) \;\xrightarrow[N\to\infty]{d}\; \sum_{j=1}^{J-1} \beta_j\, \chi^2_{r,j},$$
where the χ²_{r,j} are independent χ²-random variables with r degrees of freedom and the βj are the nonzero eigenvalues of the matrix M = (W − ωω′)C, with W = diag(ω) and C = diag(c1, . . . , cJ). Now, since W − ωω′ is singular, 0 is an eigenvalue of M. However, the other eigenvalues do not admit an explicit representation, in general. If ωi ci = ωj cj for some pair (i, j) with i ≠ j, then ωi ci is an eigenvalue of M; this can be seen by noticing that the eigenvalues of M coincide with those of W⁻¹MW = (IJ − 1 1′W)CW, where IJ denotes the identity matrix of dimension J, 1 = (1, . . . , 1)′ ∈ R^J, and (IJ − 1 1′W)CW − ωi ci IJ is a singular matrix. If, for one 1 ≤ i ≤ J, ωi ci ≠ ωj cj for all 1 ≤ j ≤ J, j ≠ i, then the latter matrix is of full rank, and thus ωi ci is not an eigenvalue of M. In fact, the multiplicity of the eigenvalue ωi ci is equal to |{1 ≤ j ≤ J : ωi ci = ωj cj}| − 1. Thus, if ωj = cj⁻¹, 1 ≤ j ≤ J, then the matrix M has the eigenvalue 1 with multiplicity J − 1 and the eigenvalue 0 once. Moreover, for eigenvalues λ of M that are unequal to any of the ωi ci, 1 ≤ i ≤ J, the matrix WC − λIJ is regular. Therefore, we find with A = λC⁻¹ − W:
$$|M - \lambda I_J| = (-1)^J\, |A + \omega\omega'|\, |C| = (-1)^J\, |A| \left( 1 + \omega' A^{-1} \omega \right) |C|,$$
which means that such an eigenvalue λ is a solution of
$$1 + \omega' A^{-1} \omega = 1 + \sum_{j=1}^{J} \frac{\omega_j^2\, c_j}{\lambda - \omega_j c_j} = 0.$$
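The eigenvalues βj in Remark 1(ii) are straightforward to compute numerically. The sketch below is not from the article; it assumes NumPy and illustrates, for hypothetical sample sizes, that the choice ωj = cj⁻¹ yields J − 1 eigenvalues equal to one, so that the limit law reduces to a χ²-distribution with (J − 1)r degrees of freedom (cf. Example 2).

```python
import numpy as np

def remark1_eigenvalues(omega, c, tol=1e-10):
    """Nonzero eigenvalues beta_j of M = (W - omega omega') C, W = diag(omega), C = diag(c)."""
    omega = np.asarray(omega, dtype=float)
    M = (np.diag(omega) - np.outer(omega, omega)) @ np.diag(np.asarray(c, dtype=float))
    eig = np.linalg.eigvals(M).real     # eigenvalues are real (M is similar to a symmetric matrix)
    return np.sort(eig[np.abs(eig) > tol])

Nj = np.array([10, 50, 100])
c = Nj.sum() / Nj                            # c_j = N / N_j
print(remark1_eigenvalues(Nj / Nj.sum(), c))       # relative weights: all nonzero betas equal 1
print(remark1_eigenvalues(np.full(3, 1 / 3), c))   # equal weights: unequal betas
```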
There is a scenario where (7) holds not only asymptotically, but also for finite sample sizes N1, . . . , NJ ∈ N:

Example 2. Matusita [15] suggested to deal with different sample sizes in such a way that F̂j is plugged into the affinity Nj times, 1 ≤ j ≤ J, i.e., he considered
$$\rho_N(\underbrace{\hat F_1,\dots,\hat F_1}_{N_1},\, \underbrace{\hat F_2,\dots,\hat F_2}_{N_2},\, \dots,\, \underbrace{\hat F_J,\dots,\hat F_J}_{N_J}) \;=\; \rho_J(\hat F_1,\dots,\hat F_J \mid \omega^*),$$
where N = N1 + · · · + NJ and ω* = (N1/N, . . . , NJ/N)′. He analyzed the multivariate normal case, where Fj is the distribution function of Nr(µj, Σj), 1 ≤ j ≤ J, and obtained for the case of known Σ = Σ1 = · · · = ΣJ and estimated µ1, . . . , µJ that
$$-2N \log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega^*) = N \sum_{j=1}^{J} \omega_j^*\, \hat\mu_j' \Sigma^{-1} \hat\mu_j - N \left( \sum_{j=1}^{J} \omega_j^* \hat\mu_j \right)' \Sigma^{-1} \left( \sum_{j=1}^{J} \omega_j^* \hat\mu_j \right),$$
where $\hat\mu_j = \frac{1}{N_j} \sum_{l=1}^{N_j} X_{j,l}$ is the MLE of µj, 1 ≤ j ≤ J (cf. (4)). Matusita mentioned that, under H0, this statistic has a χ²-distribution. Indeed, for a general ω, we obtain after some algebra that
$$-2N \log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) \;\overset{d}{=}\; \sum_{j=1}^{J-1} \beta_j\, \chi^2_{r,j} \;\overset{d}{\underset{\omega=\omega^*}{=}}\; \chi^2_{(J-1)r},$$
where the βj are the nonzero eigenvalues of the matrix (W − ωω′)C, with W = diag(ω), C = diag(N/N1, . . . , N/NJ), and the χ²_{r,j} are independent χ²-random variables with r degrees of freedom
(cf. Remark 1). In case of small sample sizes, it may be more appropriate to use the exact distribution of the expression in (6) instead of applying Theorem 1 (see Section 4). Example 3 shows another set-up.

Example 3. Sequential order statistics (SOSs) have been introduced in Kamps [11,10] to model the failure times of components as well as the system failure time of some k-out-of-n system, where component failures may affect remaining lifetimes (see also, e.g., Cramer and Kamps [7], Balakrishnan et al. [2]). Let the components' failure times be denoted by the r-dimensional vector x = (x1, . . . , xr)′ with r = n − k + 1. Following Bedbur et al. [3], the joint density function fα of SOSs X1*, . . . , Xr*, say, based on some baseline distribution function F with density function f, forms a regular r-parametric exponential family with
$$f_\alpha(x) = \exp\{\alpha' T(x) - \kappa(\alpha)\}\, h(x)\, 1_{\mathbb{R}^r_<}(x), \qquad \lambda^r\text{-a.e.},$$
where λ^r is the r-dimensional Lebesgue measure, Θ = R^r_+,
$$\kappa(\alpha) = -\sum_{i=1}^{r} \ln(\alpha_i), \qquad T_1(x_1,\dots,x_r) = n \ln(1 - F(x_1)),$$
$$T_i(x_1,\dots,x_r) = (n - i + 1) \ln\frac{1 - F(x_i)}{1 - F(x_{i-1})}, \quad 2 \le i \le r, \qquad h(x) = \frac{n!}{(n-r)!} \prod_{i=1}^{r} \frac{f(x_i)}{1 - F(x_i)},$$
$$\mathbb{R}^r_< = \{x = (x_1,\dots,x_r)' : F^{-1}(0) < x_1 < \cdots < x_r < F^{-1}(1)\}.$$
For representations of marginal distributions, we refer to Cramer and Kamps [8] and Cramer [6]. The model parameters α1, . . . , αr describe effects of previous failures in such a way that, successively, after the jth component failure, the underlying hazard rate of all remaining components is supposed to change from αj f/(1 − F) to αj+1 f/(1 − F), 1 ≤ j ≤ r − 1. The rth component failure coincides with the system failure.

In Sampling situation 1, we may be interested in testing whether we have identical underlying distribution functions F1, . . . , FJ in the J systems under test, which amounts to testing (5). This homogeneity test may be used to test the hypothesis that different k-out-of-n systems can be described by the same distributional structure. In terms of SOSs the test statistic (6) is given by
$$-2N \log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) = -2N \sum_{i=1}^{r} \left( \sum_{j=1}^{J} \omega_j \ln \hat\alpha_{ji} - \ln \sum_{j=1}^{J} \omega_j \hat\alpha_{ji} \right) = -2N \sum_{i=1}^{r} \ln \frac{\prod_{j=1}^{J} \hat\alpha_{ji}^{\,\omega_j}}{\sum_{j=1}^{J} \omega_j \hat\alpha_{ji}}.$$
Since the MLEs of the elements of the parameter vectors of SOSs are independent and, furthermore, inverse-gamma distributed, i.e., $\hat\alpha_{ji} \overset{\text{indep.}}{\sim} \mathrm{I}\Gamma(N_j, \alpha_{ji} N_j)$, 1 ≤ j ≤ J, 1 ≤ i ≤ r (see Cramer and Kamps [7], Theorem 4.2), and since $\hat\alpha_{ji}/\alpha_{ji} \sim \mathrm{I}\Gamma(N_j, N_j)$, the above statistic is, under H0, distribution free for any N1, . . . , NJ ∈ N.
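Because the statistic above is distribution free under H0 for all finite N1, . . . , NJ, an exact critical value can be simulated from the inverse-gamma representation of the MLEs. The following sketch is not part of the article; it assumes NumPy, uses the parameterization in which α̂ji/αji ∼ IΓ(Nj, Nj) is generated as the reciprocal of a Gamma variable with shape and rate Nj, and leaves the normalizing constant N as a user choice.

```python
import numpy as np

def sos_exact_null_quantile(Nj, r, omega, N, level=0.95, reps=200_000, rng=None):
    """Exact null quantile of -2N sum_i ln( prod_j V_ji^{omega_j} / sum_j omega_j V_ji ),
    where V_ji = alpha_hat_ji / alpha_ji ~ Inv-Gamma(N_j, N_j) independently."""
    rng = np.random.default_rng(rng)
    Nj = np.asarray(Nj, dtype=float)
    omega = np.asarray(omega, dtype=float)
    shape = Nj[None, :, None]                                # broadcast over (reps, J, r)
    V = 1.0 / rng.gamma(shape=shape, scale=1.0 / shape, size=(reps, Nj.size, r))
    num = np.einsum('j,mjr->mr', omega, np.log(V))           # sum_j omega_j ln V_ji
    den = np.log(np.einsum('j,mjr->mr', omega, V))           # ln sum_j omega_j V_ji
    stats = -2.0 * N * np.sum(num - den, axis=1)
    return float(np.quantile(stats, level))

# hypothetical test of J = 3 sequential 2-out-of-4 systems (r = 3) with relative weights
Nj = np.array([5, 10, 20])
print(sos_exact_null_quantile(Nj, r=3, omega=Nj / Nj.sum(), N=Nj.sum(), level=0.95))
```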
SOSs can also be interpreted as common order statistics from a sample of certain exchangeable random variables, which can be applied to describe the lifetimes of components in a coherent system; i.e., the dependence structure between the system components is induced by the joint distribution of SOSs (cf. Burkschat [5], Navarro and Burkschat [17]).

Following Pardo [18], p. 414, it is convenient to also look at a Pitman-type local analysis, in order not to obtain trivial asymptotic powers that are all equal to one. Therefore, we may modify the alternative hypothesis of (5) to the contiguous alternative hypotheses
$$H_{1,N}: \alpha_1 = \alpha + \frac{d_1}{\sqrt{N_1}},\; \dots,\; \alpha_J = \alpha + \frac{d_J}{\sqrt{N_J}},$$
for d1, . . . , dJ ∈ R^r and αi ≠ αj for at least two distinct indices. Then the asymptotic distribution under the alternative can be obtained by means of the non-central χ²-distribution.

Theorem 2. Let Sampling situation 1 be given, and let N be such that N/Nj → cj ∈ (0, ∞) as N → ∞ for all j ∈ {1, . . . , J}. Then, under H1,N, we find
$$-2N \log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) \;\xrightarrow[N\to\infty]{d}\; \sum_{k=1}^{J} \lambda_k\, \chi^2_{r,b_k},$$
where the χ²_{r,bk}, 1 ≤ k ≤ J, are independent, non-central χ²-distributed with r degrees of freedom and non-centrality parameter
$$b_k = \left\| \sum_{l=1}^{J} q_{l,k} \left[ \nabla^2\kappa(\alpha) \right]^{1/2} d_l \right\|_2^2,$$
the λk are the eigenvalues of the matrix C^{1/2}(W − ωω′)C^{1/2} (cf. Remark 1), and Q = (q_{l,k})_{1≤l,k≤J} is an orthogonal matrix such that Q′C^{1/2}(W − ωω′)C^{1/2}Q = diag(λ1, . . . , λJ).
Proof. First, we note that, for 1 ≤ j ≤ J, the following assertion holds:
$$\sqrt{N}\left( \hat\alpha_j - \alpha \right) = \sqrt{\frac{N}{N_j}}\, \sqrt{N_j}\left( \hat\alpha_j - \alpha - \frac{d_j}{\sqrt{N_j}} \right) + \sqrt{\frac{N}{N_j}}\, d_j \;\xrightarrow[N\to\infty]{d}\; \mathcal{N}_r\!\left( \sqrt{c_j}\, d_j,\; c_j \left[ \nabla^2\kappa(\alpha) \right]^{-1} \right). \qquad (9)$$
Then, by a Taylor series expansion around α, we obtain Eq. (8), and with (9) we deduce
$$-2N \log \rho_J(\hat F_1,\dots,\hat F_J \mid \omega) \;\xrightarrow[N\to\infty]{d}\; (Z_1',\dots,Z_J')\, B \begin{pmatrix} Z_1 \\ \vdots \\ Z_J \end{pmatrix},$$
where
$$B = \left( C^{1/2} \otimes I_r \right)\left( (W - \omega\omega') \otimes I_r \right)\left( C^{1/2} \otimes I_r \right) = \left( C^{1/2}(W - \omega\omega')C^{1/2} \right) \otimes I_r,$$
⊗ denotes the Kronecker product, $Z_j \sim \mathcal{N}_r\!\left( \left[ \nabla^2\kappa(\alpha) \right]^{1/2} d_j,\; I_r \right)$, 1 ≤ j ≤ J, and Z1, . . . , ZJ are independent. Moreover, with Q* = Q ⊗ Ir, we find
$$Q^{*\prime} B\, Q^{*} = \left( Q' C^{1/2}(W - \omega\omega')C^{1/2} Q \right) \otimes I_r = \mathrm{diag}(\lambda_1,\dots,\lambda_J) \otimes I_r$$
and
$$Q^{*\prime} \begin{pmatrix} Z_1 \\ \vdots \\ Z_J \end{pmatrix} \sim \mathcal{N}_{Jr}\!\left( \begin{pmatrix} \sum_{l=1}^{J} q_{l,1} \left[ \nabla^2\kappa(\alpha) \right]^{1/2} d_l \\ \vdots \\ \sum_{l=1}^{J} q_{l,J} \left[ \nabla^2\kappa(\alpha) \right]^{1/2} d_l \end{pmatrix},\; I_{Jr} \right).$$
Using Theorem 3.3.5 in Anderson [1] completes the proof.
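Theorem 2 can be used to approximate the power of the homogeneity test against local alternatives. The sketch below is not taken from the article; it assumes NumPy, a user-supplied Hessian ∇²κ(α) and local directions d1, . . . , dJ, and it approximates P(−2N log ρJ > crit) by simulating the weighted non-central χ² mixture, where crit would be a null critical value such as the one simulated after Theorem 1.

```python
import numpy as np

def approx_power_contiguous(omega, c, hess, d, crit, reps=200_000, rng=None):
    """Approximate limiting power under H_{1,N} via Theorem 2.
    hess: nabla^2 kappa(alpha), shape (r, r); d: local directions d_1, ..., d_J, shape (J, r)."""
    rng = np.random.default_rng(rng)
    omega = np.asarray(omega, dtype=float)
    d = np.asarray(d, dtype=float)
    r = d.shape[1]
    Chalf = np.diag(np.sqrt(np.asarray(c, dtype=float)))
    A = Chalf @ (np.diag(omega) - np.outer(omega, omega)) @ Chalf
    lam, Q = np.linalg.eigh(A)                       # Q' A Q = diag(lam)
    w, U = np.linalg.eigh(hess)                      # symmetric square root of the Hessian
    Hhalf = U @ np.diag(np.sqrt(w)) @ U.T
    means = Q.T @ (d @ Hhalf)                        # row k: sum_l q_{l,k} Hhalf d_l
    b = np.maximum(np.sum(means ** 2, axis=1), 1e-12)  # non-centrality parameters b_k
    chi = rng.noncentral_chisquare(df=r, nonc=b, size=(reps, b.size))
    return float(np.mean(chi @ lam > crit))

# hypothetical example: r = 2, J = 3, identity Hessian, one shifted local direction
omega = np.full(3, 1 / 3); c = np.full(3, 3.0)
d = np.array([[0.0, 0.0], [0.0, 0.0], [2.0, 1.0]])
print(approx_power_contiguous(omega, c, np.eye(2), d, crit=9.5))
```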
3.2. Simple hypotheses

The weighted affinity (2) may also be applied to generalize the results of Garren [9] for a test problem with simple hypotheses. In this subsection, we consider the following sampling situation:

Sampling situation 2. Let, as in Sampling situation 1, F, F1 and F2 be distribution functions with respective densities f, f1 and f2 from the same regular exponential family P with natural parameters α, α1 and α2, respectively, where α is unknown. Let, moreover, X = (X1, . . . , XN) be an i.i.d. sample from F, and let α̂ denote the MLE of α.
Based on X, we consider the simple hypothesis test
$$H_0: \alpha = \alpha_1 \quad\text{vs.}\quad H_1: \alpha = \alpha_2 \qquad (10)$$
based on the statistic
$$Q(\hat\alpha \mid \omega) = \log\frac{\rho_2(\hat F, F_2 \mid \omega)}{\rho_2(\hat F, F_1 \mid \omega)} = \omega_2\left[ -\kappa(\alpha_2) + \kappa(\alpha_1) \right] + \kappa(\omega_1\hat\alpha + \omega_2\alpha_2) - \kappa(\omega_1\hat\alpha + \omega_2\alpha_1),$$
where H0 is rejected if Q(·|ω) is too large.

The following theorem provides the asymptotic distribution of the statistic Q(α̂|ω) under H0 as well as under H1. It is an extension of Garren [9], Theorem 5.1, to the weighted affinity (2).

Theorem 3. Under Sampling situation 2, and for fixed α1, α2 ∈ Θ, α1 ≠ α2, we find:
(i) Q(x|ω) is strictly monotonically increasing in the direction of α2 − α1.
(ii)
$$\frac{1}{\omega_1}\sqrt{N}\left( Q(\hat\alpha \mid \omega) - Q(\alpha_1 \mid \omega) \right) \;\xrightarrow[N\to\infty]{d}\; \mathcal{N}_1\{0, D(\alpha_1 \mid \omega)\} \quad\text{under } H_0, \text{ and}$$
$$\frac{1}{\omega_1}\sqrt{N}\left( Q(\hat\alpha \mid \omega) - Q(\alpha_2 \mid \omega) \right) \;\xrightarrow[N\to\infty]{d}\; \mathcal{N}_1\{0, D(\alpha_2 \mid \omega)\} \quad\text{under } H_1,$$
where $D(\alpha \mid \omega) = \{b(\alpha)\}' \left[ \nabla^2\kappa(\alpha) \right]^{-1} \{b(\alpha)\}$ and $b(\alpha) = \nabla\kappa(\omega_2\alpha_1 + \omega_1\alpha) - \nabla\kappa(\omega_1\alpha + \omega_2\alpha_2)$.
Proof. (i) Noticing ∇Q(x|ω) = ω1∇κ(ω1x + ω2α2) − ω1∇κ(ω1x + ω2α1), a Taylor series expansion around ω1x + ω2α1 leads to
$$\frac{1}{\omega_1\omega_2}\, (\alpha_2 - \alpha_1)' \nabla Q(x \mid \omega) = (\alpha_2 - \alpha_1)' \nabla^2\kappa(b)\, (\alpha_2 - \alpha_1) = (\alpha_2 - \alpha_1)' \mathrm{Cov}_b[T(X)]\, (\alpha_2 - \alpha_1) > 0,$$
since the exponential family is regular (cf. Bickel and Doksum [4], Theorem 1.6.4). Herein, b is on the straight line connecting ω1x + ω2α1 and ω1x + ω2α2 and therefore an element of Θ.
(ii) By a Taylor series expansion of Q(α̂|ω) around α1, assuming H0 is true, we find
$$Q(\hat\alpha \mid \omega) = Q(\alpha_1 \mid \omega) + [\nabla Q(\alpha_1 \mid \omega)]'\, (\hat\alpha - \alpha_1) + R,$$
with R being a remainder term satisfying $\sqrt{N}\, R \xrightarrow[N\to\infty]{P} 0$. The rest of the proof uses similar arguments as in the proof of Theorem 1(ii).
Hence, for N large enough, a critical value for the hypothesis test (10) based on the statistic Q(α̂|ω) can be obtained, and the power of the test can be derived using part (ii) of the above theorem. Due to the monotonicity of the statistic, we find in the one-dimensional case (cf. Garren [9], Proposition 5.1):

Corollary 1. For a regular one-parametric exponential family with unknown parameter α, and given the MLE exists, the test statistic Q(α̂|ω) leads to the uniformly most powerful level-α test for (10).

3.3. Two-sided parameter test

Also in Sampling situation 2 (F2 is not needed here), a two-sided test for the unknown parameter α is near at hand (cf. Garren [9] for the common affinity). For some fixed α1 ∈ Θ, we consider testing
$$H_0: \alpha = \alpha_1 \quad\text{vs.}\quad H_1: \alpha \neq \alpha_1$$
by means of a single sample X and the test statistic
$$G(\hat\alpha \mid \omega) = \log \rho_2(\hat F, F_1 \mid \omega) = -\omega_1\kappa(\hat\alpha) - \omega_2\kappa(\alpha_1) + \kappa(\omega_1\hat\alpha + \omega_2\alpha_1).$$
We find the following result to construct an asymptotic test.
Theorem 4. Let Sampling situation 2 be given, and let a regular exponential family be given with statistic G(·|ω). Then:
(i) For a fixed value α1 ∈ Θ and x ∈ Θ, G(x|ω) is strictly monotonically decreasing in the direction x − α1.
(ii) If α = α1, then
$$-\frac{2N}{\omega_1\omega_2}\, G(\hat\alpha \mid \omega) \;\xrightarrow[N\to\infty]{d}\; \chi^2_r.$$
(iii) If α ≠ α1, then
$$\frac{\sqrt{N}}{\omega_1}\left( G(\hat\alpha \mid \omega) - \log \rho_2(F, F_1 \mid \omega) \right) \;\xrightarrow[N\to\infty]{d}\; \mathcal{N}_1\!\left( 0,\; b' \left[ \nabla^2\kappa(\alpha) \right]^{-1} b \right),$$
where b = −∇κ(α) + ∇κ(ω1α + ω2α1).

Proof. (i) With ∇G(x|ω) = −ω1∇κ(x) + ω1∇κ(ω1x + ω2α1) and by a Taylor series expansion of ∇κ(x) around ω1x + ω2α1, we obtain
$$\frac{1}{\omega_1\omega_2}\, (x - \alpha_1)' \nabla G(x \mid \omega) = -(x - \alpha_1)' \nabla^2\kappa(b)\, (x - \alpha_1) < 0,$$
where b lies on the straight line connecting x and ω1x + ω2α1.
(ii) and (iii) follow analogously to the proofs of Theorem 3(ii) and Theorem 1(ii), respectively, by a Taylor series expansion around the true underlying parameter and then using the asymptotic normality of the MLE.

Therefore, the test rejects H0 if the value of the test statistic is too small and, for N large enough, a critical value can be obtained using (ii) of the above theorem.
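As an illustration of how Theorem 4(ii) yields an asymptotic two-sided test, the following sketch is not from the article; it assumes NumPy and SciPy, takes the cumulant function κ as user input, and uses a hypothetical multivariate normal example with known identity covariance, for which the statistic reduces to N‖α̂‖² under H0: α = 0.

```python
import numpy as np
from scipy.stats import chi2

def two_sided_test(kappa, alpha_hat, alpha1, omega, N, level=0.05):
    """Reject H0: alpha = alpha1 if -2N/(omega1*omega2) * G(alpha_hat | omega) is too large,
    using the chi-square(r) limit of Theorem 4(ii)."""
    w1, w2 = omega
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    alpha1 = np.asarray(alpha1, dtype=float)
    G = -w1 * kappa(alpha_hat) - w2 * kappa(alpha1) + kappa(w1 * alpha_hat + w2 * alpha1)
    stat = -2.0 * N / (w1 * w2) * G
    return stat, bool(stat > chi2.ppf(1.0 - level, df=alpha_hat.size))

# hypothetical example: natural parameter mu of N_r(mu, I_r), kappa(mu) = ||mu||^2 / 2
kappa_normal = lambda a: 0.5 * float(np.dot(a, a))
x = np.random.default_rng(1).normal(loc=0.2, size=(200, 2))   # i.i.d. sample, N = 200
print(two_sided_test(kappa_normal, x.mean(axis=0), np.zeros(2), omega=(0.5, 0.5), N=200))
```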
3.4. Discriminant problem

In this section, we assume Sampling situation 1 with J = 3, i.e., we have three distributions F1, F2 and F3 with respective parameters α1, α2 and α3. Suppose furthermore that we want to decide whether α3 = α1 or α3 = α2 for unknown α1, α2 and α3. By means of the weighted affinity, a generalization of Garren's suggested decision rule is given by
$$\xi_\omega : \Theta^3 \longrightarrow \{F_1, F_2\}, \qquad (\hat\alpha_1, \hat\alpha_2, \hat\alpha_3) \mapsto \begin{cases} F_1, & \text{if } \rho_2(\hat F_1, \hat F_3 \mid \omega) \ge \rho_2(\hat F_2, \hat F_3 \mid \omega), \\ F_2, & \text{if } \rho_2(\hat F_2, \hat F_3 \mid \omega) > \rho_2(\hat F_1, \hat F_3 \mid \omega). \end{cases}$$
In order to provide an approximate formula for the probability of a false decision by means of this rule, we introduce an equivalent decision rule based on
$$U(\hat F_1, \hat F_2, \hat F_3 \mid \omega) := \log\frac{\rho_2(\hat F_2, \hat F_3 \mid \omega)}{\rho_2(\hat F_1, \hat F_3 \mid \omega)} = \kappa(\omega_1\hat\alpha_2 + \omega_2\hat\alpha_3) - \kappa(\omega_1\hat\alpha_1 + \omega_2\hat\alpha_3) - \omega_1\left( \kappa(\hat\alpha_2) - \kappa(\hat\alpha_1) \right),$$
and decide in favor of F1 if U(·|ω) ≤ 0, and in favor of F2 if U(·|ω) > 0. It is easy to show that the MLEs converge P-a.s. to the true parameters when the numbers of observations N1, N2 and N3 approach infinity (MLE representation, e.g., from Bickel and Doksum [4], Theorem 2.3.1, and the law of large numbers). Thus, we obtain
$$U(\hat F_1, \hat F_2, \hat F_3 \mid \omega) \;\xrightarrow[N\to\infty]{P\text{-a.s.}}\; \begin{cases} \log \rho_2(F_2, F_1 \mid \omega) < 0, & \text{if } F_3 = F_1, \\ -\log \rho_2(F_1, F_2 \mid \omega) > 0, & \text{if } F_3 = F_2, \end{cases}$$
i.e., the decision is asymptotically P-a.s. correct.
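A direct implementation of this decision rule only requires the cumulant function κ and the three MLEs. The short sketch below is not part of the article; it assumes NumPy and uses a hypothetical normal-means example with κ(µ) = ‖µ‖²/2.

```python
import numpy as np

def discriminate(kappa, a1_hat, a2_hat, a3_hat, omega):
    """Assign F3 to the class of F1 if U <= 0 and to the class of F2 otherwise, where
    U = kappa(w1*a2 + w2*a3) - kappa(w1*a1 + w2*a3) - w1*(kappa(a2) - kappa(a1))."""
    w1, w2 = omega
    U = (kappa(w1 * a2_hat + w2 * a3_hat) - kappa(w1 * a1_hat + w2 * a3_hat)
         - w1 * (kappa(a2_hat) - kappa(a1_hat)))
    return ("F1" if U <= 0 else "F2"), U

kappa = lambda a: 0.5 * float(np.dot(a, a))   # hypothetical: normal means, identity covariance
print(discriminate(kappa, np.array([0.0, 0.0]), np.array([1.0, 1.0]),
                   np.array([0.1, -0.1]), omega=(0.5, 0.5)))      # -> ("F1", -0.25)
```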
The following theorem extends a result of Garren [9] regarding the asymptotic normality of U(F̂1, F̂2, F̂3|ω).

Theorem 5. Let Sampling situation 1 be given, and let N be such that N/Nj → cj ∈ (0, ∞) as N → ∞ for all j ∈ {1, 2, 3}. Then we find the following asymptotic results:
(i) If F3 = F1, then
$$\sqrt{N}\left( U(\hat F_1, \hat F_2, \hat F_3 \mid \omega) - \log \rho_2(F_2, F_1 \mid \omega) \right) \;\xrightarrow[N\to\infty]{d}\; \mathcal{N}_1\!\left( 0,\; V^{(2)}(\alpha_1, \alpha_2 \mid \omega) \right).$$
(ii) If F3 = F2, then
$$\sqrt{N}\left( U(\hat F_1, \hat F_2, \hat F_3 \mid \omega) + \log \rho_2(F_1, F_2 \mid \omega) \right) \;\xrightarrow[N\to\infty]{d}\; \mathcal{N}_1\!\left( 0,\; V^{(1)}(\alpha_2, \alpha_1 \mid \omega) \right).$$
Herein,
$$V^{(i)}(x, y \mid \omega) = \omega_1^2\, c_i\, [b_1(y, x)]' \left[ \nabla^2\kappa(y) \right]^{-1} b_1(y, x) + \omega_2^2\, c_3\, [b_2(x, y)]' \left[ \nabla^2\kappa(x) \right]^{-1} b_2(x, y),$$
where b1(y, x) = −∇κ(y) + ∇κ(ω1y + ω2x) and b2(x, y) = −∇κ(x) + ∇κ(ω1y + ω2x).
Proof. (i) A Taylor series expansion of κ(ω1α̂2 + ω2α̂3), κ(ω1α̂1 + ω2α̂3), κ(α̂2) and κ(α̂1) around ω1α2 + ω2α1, ω1α1 + ω2α3 = α1, α2 and α1, respectively, and then using the asymptotic normality of the MLEs as before, leads to the assertions.
(ii) Part (ii) is along the same lines.

Following the results of the simulation study in Section 4 for the homogeneity test from Section 3.1, we expect that introducing the constants ci will lead to a better fit of the asymptotic result in finite sample scenarios as compared to Garren [9], Theorem 5.3, where the ci were chosen to be equal to one. Using the above theorem, the probability of a false decision is approximately given by
$$P(\text{false decision} \mid F_3 = F_1) = P\!\left( U(\hat F_1, \hat F_2, \hat F_3 \mid \omega) > 0 \mid F_3 = F_1 \right) \approx \Phi\!\left( \frac{\sqrt{N}\, \log \rho_2(F_2, F_1 \mid \omega)}{\sqrt{V^{(2)}(\alpha_1, \alpha_2 \mid \omega)}} \right).$$
If F3 = F2, just interchange the roles of F1, α1 and F2, α2 in the argument of Φ and change the superscript of V. Note that the above formula still depends on the unknown parameters. It is also possible to calculate the asymptotic distributions if either α1 or α2 is known, by just plugging the true parameter in for the respective MLE in the proof. If both α1 and α2 are known, then we are back in the simple hypothesis case with switched weights.

4. Simulation study

We examine the influence of the weights on the type I errors of (5) using the test statistic (6), when the numbers of observations from each of the distributions differ considerably. We apply the test statistic for the case of J = 3 and F1, F2, F3 being univariate normal distributions with equal mean µ and equal standard deviation σ. The actual values of µ and σ do not have to be fixed, because under H0 and for univariate normal distributions, the test statistic is distribution free for finite sample sizes. The critical value considered in the simulations is the 0.95-quantile of the asymptotic distribution under the null hypothesis (see (7)), and we reject H0 if the value of the statistic exceeds this quantile.

Figs. 1 and 2 show contour plots of the simulated type I errors (in percent), where the axes indicate the weights ω1 and ω2 and the crosses mark the points of equal and relative weights, i.e., the spots where ω1 = ω2 = 1/3 and ω1 = N1/(N1 + N2 + N3), ω2 = N2/(N1 + N2 + N3), respectively. Using these relative weights in the case of unequal sample sizes has already been proposed by Matusita [15] (cf. Example 2). The contour plots are based on the type I errors that are simulated for each point (ω1, ω2) lying on a grid over the unit square, where ω1, ω2 ∈ {1/30, 2/30, . . . , 28/30} such that ω3 = 1 − ω1 − ω2 > 0. In Fig. 1, the quantile and the type I errors are based on 1,000,000 repetitions each, and in Fig. 2 on 10,000,000 repetitions.

[Fig. 1. Simulated type I errors in percent for different weight vectors (ω1, ω2, 1 − ω1 − ω2), all cj = 1. Panels: (a) N1 = 10, N2 = 50, N3 = 100; (b) N1 = 100, N2 = 500, N3 = 1000.]

[Fig. 2. Simulated type I errors in percent for different weight vectors (ω1, ω2, 1 − ω1 − ω2), cj = N/Nj. Panels: (a) N1 = 10, N2 = 50, N3 = 100; (b) N1 = 100, N2 = 500, N3 = 1000.]

In the sampling situation considered by Garren [9], where all the Nj, 1 ≤ j ≤ J, are assumed to grow equally fast with N, i.e., cj = 1, 1 ≤ j ≤ J, the weights have a major influence on the type I errors, as can be seen in Fig. 1.
As the asymptotic result only yields a single value of N if all the Nj are equal, we chose N to be the geometric mean of the sample sizes. We find that the relative weights (left cross) produce a smaller type I error than equal weights (right cross). For N being the geometric mean of N1, N2, N3, we also see that the type I error for the relative weights is much closer to 0.05 than the type I error for equal weights, although, of course, this can be changed by a different choice of N. If all Nj = 100, then the type I error varies only slightly between 0.053 and 0.057.

Let now N = N1 + N2 + N3 and assume that cj = N/Nj, 1 ≤ j ≤ J, i.e., the proportions between the sample sizes remain the same as N approaches infinity. Then Fig. 2 shows that the weights become less influential when we include the cj's. We also note that, in contrast to Fig. 1, the type I errors do not tend to zero for certain choices of the weights. Fig. 2(a) shows that for small sample sizes choosing different weights still leads to different type I errors, but choosing the weights relative to the sample sizes, i.e., ωj = Nj/N (left cross, type I error 0.122), leads to type I errors further away from 0.05 than choosing them all equal (right cross, type I error 0.106). In Fig. 2(b) we see that this effect is preserved (relative weights 0.057, equal weights 0.055), but that the simulated type I errors only vary from 0.052 to 0.059. Hence, the influence of the weights on the type I error is already very small. Following Remark 1(ii), a natural choice is then the relative weights, because the asymptotic distribution then turns out to be χ²_{(J−1)r}. Summarizing, we can say that the inclusion of the parameters cj, 1 ≤ j ≤ J, improves the test asymptotics w.r.t. the type I error considerably.

It should be noted that, if the sample sizes are small but differ considerably in magnitude, then the exact distribution should be utilized rather than the asymptotic one (see Examples 2 and 3). In the above situation with J = 3 and normal distributions, Fig. 3 shows graphs of the asymptotic cdf of the test statistic under H0 as in (7) as well as exact cdfs for the choices (N1, N2, N3) = m · (2, 10, 20), m = 1, 2, 5. Even for m = 5, with sample sizes N1 = 10, N2 = 50 and N3 = 100, we observe a considerable deviation between the exact and the asymptotic cdf. The effect on the corresponding type I error is illustrated in Fig. 4, where the type I error obtained by using the asymptotic cdf is plotted against the factor m. It is obvious that, even for relatively large sample sizes (e.g., m = 20), the actual type I error exceeds the required error 0.05 noticeably.

[Fig. 3. Asymptotic cdf and exact cdfs for different sample sizes.]

[Fig. 4. Type I error based on asymptotic cdf.]

5. Discussion

Asymptotic distributions of test statistics based on a weighted affinity under the null hypothesis and under the alternative can be stated for a homogeneity test as well as for a discriminant problem, while assuming that all proportions of sample sizes converge to a finite positive number. For given sample sizes and their asymptotic proportions, simulated contour plots w.r.t. the weights in Matusita's affinity show their influence on the type I errors of an asymptotic homogeneity test when the sample sizes differ significantly in magnitude. When applying the affinity in a homogeneity test within a multiparameter exponential family without accounting for different sample sizes in the asymptotic distribution, i.e., when assuming that all sample sizes grow equally fast, the use of the weighted version is recommended in order to realize an actual type I error that is close to the required one in a given situation. When one accounts for different sample sizes, the use of the weighted version is also recommended in order to obtain a simple asymptotic distribution. Weights chosen relative to the sample sizes seem to be a reasonable choice in both cases. Related tests for simple hypotheses and two-sided tests can also be addressed. Due to problems in controlling type I errors in the statistical tests under consideration when using asymptotic distributions of test statistics, exact distributions should be used in small sample cases.
Acknowledgments

The authors would like to thank two referees and an associate editor for their careful reading and helpful comments.

References

[1] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, third ed., John Wiley and Sons, New Jersey, 2003.
[2] N. Balakrishnan, E. Beutner, U. Kamps, Order restricted inference for sequential k-out-of-n systems, J. Multivariate Anal. 99 (7) (2008) 1489–1502.
[3] S. Bedbur, E. Beutner, U. Kamps, Generalized order statistics: An exponential family in model parameters, Statistics 46 (2) (2012) 159–166.
[4] P.J. Bickel, K.A. Doksum, Mathematical Statistics: Basic Ideas and Selected Topics, Volume 1, second ed., Prentice Hall, New Jersey, 2001.
[5] M. Burkschat, Systems with failure-dependent lifetimes of components, J. Appl. Probab. 46 (2009) 1052–1072.
[6] E. Cramer, Hermite interpolation polynomials and distributions of ordered data, Stat. Methodol. 6 (2009) 337–343.
[7] E. Cramer, U. Kamps, Sequential k-out-of-n systems, in: N. Balakrishnan, C.R. Rao (Eds.), Advances in Reliability, in: Handbook of Statistics, vol. 20, Elsevier, Amsterdam, 2001, pp. 301–372.
[8] E. Cramer, U. Kamps, Marginal distributions of sequential and generalized order statistics, Metrika 58 (2003) 293–310.
[9] S.T. Garren, Asymptotic distribution of estimated affinity between multiparameter exponential families, Ann. Inst. Statist. Math. 52 (3) (2000) 426–437.
[10] U. Kamps, A concept of generalized order statistics, J. Statist. Plann. Inference 48 (1995) 1–23.
[11] U. Kamps, A Concept of Generalized Order Statistics, Teubner, Stuttgart, 1995.
[12] E. Lehmann, G. Casella, Theory of Point Estimation, second ed., Springer, New York, 1998.
[13] K. Matusita, A distance and related statistics in multivariate analysis, in: P.R. Krishnaiah (Ed.), Multivariate Analysis, Academic Press, New York, 1966, pp. 187–200.
[14] K. Matusita, Classification based on distance in multivariate Gaussian cases, in: Proc. Fifth Berkeley Symposium on Math. Statist. Prob., Univ. of California Press, Berkeley, 1967, pp. 299–304.
[15] K. Matusita, On the notion of affinity of several distributions and some of its applications, Ann. Inst. Statist. Math. 19 (1) (1967) 181–192.
[16] D. Morales, L. Pardo, K. Zografos, Informational distances and related statistics in mixed continuous and categorical variables, J. Statist. Plann. Inference 75 (1) (1998) 47–63.
[17] J. Navarro, M. Burkschat, Coherent systems based on sequential order statistics, Naval Res. Logist. 58 (2011) 123–135.
[18] L. Pardo, Statistical Inference Based on Divergence Measures, Chapman & Hall/CRC, Boca Raton, 2006.
[19] G.T. Toussaint, Some properties of Matusita's measure of affinity of several distributions, Ann. Inst. Statist. Math. 26 (1974) 389–394.
[20] K. Zografos, f-dissimilarity of several distributions in testing statistical hypotheses, Ann. Inst. Statist. Math. 50 (2) (1998) 295–310.