Journal of Statistical Planning and Inference 110 (2003) 133–146
www.elsevier.com/locate/jspi
Iterative weighted least-squares estimates in a heteroscedastic linear regression model

Kiyoshi Inoue
Department of Economics, Faculty of Economics, Fukushima University, Kanayagawa 1, Fukushima-shi, Fukushima 960-1296, Japan

Received 21 July 1998; received in revised form 20 March 2001; accepted 7 August 2001
Abstract

The aim of this study is to improve the efficiency of weighted least-squares estimates for a regression parameter. An iterative procedure, starting with an unbiased estimate other than the unweighted least-squares estimate, yields estimates which are asymptotically more efficient than the feasible generalized least-squares estimate when errors are spherically distributed. The result has an application in the improvement of the Graybill–Deal estimate of the common mean of several normal populations. © 2001 Elsevier Science B.V. All rights reserved.

MSC: 62J05; 62F12

Keywords: Heteroscedastic linear regression; Iterative procedure; Replication; Asymptotic variance; Common mean; Graybill–Deal estimate; Spherical distribution
1. Introduction

We consider a heteroscedastic linear regression model with replication:
$$ y_{ij} = x_{ij}^t \beta + \epsilon_{ij} \qquad (j = 1, 2, \ldots, n_i;\ i = 1, \ldots, k). $$
Here, the $x_{ij}$ are $p \times 1$ design vectors, $\beta$ is an unknown vector of interest, $\epsilon_{ij}$ is the $j$th component of an $n_i \times 1$ random vector $E_i$ $(j = 1, 2, \ldots, n_i)$, and the $y_{ij}$ are the responses. We assume that the $E_i$'s are mutually independent, and that each $E_i$ has mean zero and variance $\sigma_i^2 I_{n_i}$, where $\sigma_i^2$ is unknown $(i = 1, \ldots, k)$.

We know that the generalized least-squares estimate (GLS) of $\beta$ is BLUE, i.e. its variance attains the lower bound $(\sum_{i=1}^k \sigma_i^{-2} X_i^t X_i)^{-1}$, where $X_i^t = (x_{i1}, \ldots, x_{in_i})$. The estimate is not available, however, because its expression involves the inverses of the unknown error variances. Replacing $\sigma_1^2, \ldots, \sigma_k^2$ (or $\sigma_1^{-2}, \ldots, \sigma_k^{-2}$) with corresponding estimates, we may form an alternative estimate of $\beta$. A plausible choice of estimates
of error variances is given by
$$ \hat\sigma_i^2 = \| y_i - X_i (X_i^t X_i)^- X_i^t y_i \|^2 / (n_i - r_i), $$
where $\|\cdot\|$ denotes the Euclidean norm, $r_i$ the rank of $X_i$, and $y_i^t = (y_{i1}, \ldots, y_{in_i})$. The $\hat\sigma_i^2$ is unbiased, while its inverse generally gives a biased estimate of $\sigma_i^{-2}$ (e.g. $E(\hat\sigma_i^{-2}) = \sigma_i^{-2} (n_i - r_i)/(n_i - r_i - 2)$ when $E_i$ is normally distributed). If we intend to use a slightly generalized weight (e.g. a weight based on an unbiased estimate of $\sigma_i^{-2}$) rather than the inverse of the estimate of $\sigma_i^2$ when forming an estimate of $\beta$, then a possible choice of weighting matrix is $\hat\sigma_i^{-2}$ multiplied by a constant matrix (e.g. $\hat\sigma_i^{-2} (n_i - r_i - 2)/(n_i - r_i)\, I_{n_i}$). By introducing a symmetric and positive definite $n_i \times n_i$ matrix $W_i$, then giving the weight $\hat\sigma_i^{-2} W_i$ to the $i$th point, we can treat a class of choices including those stated above. This results in considering a weighted least-squares estimate (WLS) expressed by
$$ \hat\beta_w = \Bigl( \sum_{i=1}^k \hat\sigma_i^{-2} X_i^t W_i X_i \Bigr)^{-1} \sum_{i=1}^k \hat\sigma_i^{-2} X_i^t W_i y_i. $$
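For concreteness, here is a minimal numerical sketch of $\hat\beta_w$ (an illustration added to this edition, not part of the original paper; all function and variable names are invented):

import numpy as np

def wls_estimate(X_list, y_list, W_list=None):
    """WLS estimate beta_hat_w above.

    X_list : list of (n_i, p) design matrices X_i (requires n_i > rank(X_i))
    y_list : list of (n_i,) response vectors y_i
    W_list : optional list of (n_i, n_i) symmetric positive definite W_i;
             W_list=None uses W_i = I_{n_i}, i.e. the feasible GLS defined next.
    """
    p = X_list[0].shape[1]
    lhs, rhs = np.zeros((p, p)), np.zeros(p)
    for i, (Xi, yi) in enumerate(zip(X_list, y_list)):
        ni = Xi.shape[0]
        Wi = np.eye(ni) if W_list is None else W_list[i]
        ri = np.linalg.matrix_rank(Xi)
        # sigma_hat_i^2: squared norm of the within-group LS residual
        # y_i - X_i (X_i^t X_i)^- X_i^t y_i, divided by n_i - r_i
        resid = yi - Xi @ (np.linalg.pinv(Xi.T @ Xi) @ (Xi.T @ yi))
        s2 = resid @ resid / (ni - ri)
        lhs += (Xi.T @ Wi @ Xi) / s2
        rhs += (Xi.T @ Wi @ yi) / s2
    return np.linalg.solve(lhs, rhs)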
In particular, we shall call the $\hat\beta_w$ corresponding to $W_i = I_{n_i}$ $(i = 1, \ldots, k)$ the feasible GLS estimate (FGLS), and denote it by $\hat\beta_{FG}$.

Since the FGLS appears plausible, we might expect it to be accurate and probably most efficient in the class of $\hat\beta_w$-type estimates. In fact, it is difficult to obtain the finite sample behaviour of $\hat\beta_w$ beyond the fact that it is unbiased, and consequently, as yet, little is known of the accuracy of $\hat\beta_{FG}$. The large sample behaviour of the FGLS under the normality assumption is, however, comparatively easy to derive. We can observe that $\hat\beta_{FG}$ is asymptotically efficient when $n_i - r_i \to +\infty$ $(i = 1, \ldots, k)$ using the consistency of $\hat\sigma_i^2$. An asymptotic theory in a case where $k$ is large but $n_i$ is small has been investigated by Carroll and Cline (1988), who pointed out that $\hat\beta_{FG}$ has the asymptotic variance
$$ V(\hat\beta_{FG}) \doteq \frac{m-3}{m-5} \Bigl( \sum_{i=1}^k \sigma_i^{-2} X_i^t X_i \Bigr)^{-1} \qquad (1.1) $$
when $n_i = m\ (\ge 6)$ and $r_i = 1$ $(i = 1, \ldots, k)$, where the relation "$\doteq$" between two matrices $A_k$ and $B_k$ denotes that $\lim_{k \to +\infty} (\lambda^t A_k \lambda / \lambda^t B_k \lambda) = 1$ for every non-zero vector $\lambda$. Relation (1.1) indicates that there is room for asymptotic improvement of $\hat\beta_{FG}$ when the numbers of replicates at all points $i = 1, \ldots, k$ are small: with $m = 7$ replicates, for instance, the factor $(m-3)/(m-5)$ equals $2$, so the FGLS is asymptotically twice as variable as the (infeasible) GLS.

Fuller and Rao (1978) employed the ordinary least-squares estimate (OLS) $\hat\beta_L$ as an initial estimate of $\beta$, then regarded the average of squared residuals from the estimated responses $x_{ij}^t \hat\beta_L$ $(j = 1, 2, \ldots, n_i)$ as an estimate of $\sigma_i^2$. From delicate calculations they obtained the asymptotic variance of the resulting WLS. Shao (1989) relaxed the assumptions of Fuller and Rao (1978), namely normality of the error vectors and convergence assumptions on matrices related to the design matrix: the normality was replaced by symmetry about the origin together with several moment conditions, and the convergence by an assumption of divergence at a moderate rate.
However, comparison between the $\hat\beta_{FG}$ considered here and a WLS derived from the OLS is intractable because the form of the asymptotic variance of the WLS differs from that of $\hat\beta_{FG}$ (see Theorem 3.1 in Shao, 1989 or the Theorem in Fuller and Rao, 1978), and accordingly it is hard to see whether $\hat\beta_{FG}$ is improved by the WLS based on $\hat\beta_L$. Inoue (1999) employed $\hat\beta_{FG}$ as an initial estimate of $\beta$, and showed that the resulting WLS (say $\hat\beta(\hat\beta_{FG})$) is asymptotically normal with variance
$$ V(\hat\beta(\hat\beta_{FG})) \doteq \Bigl[ \frac{m-3}{m-5} - \frac{6(m+2)}{m^2 (m-5)} \Bigr] \Bigl( \sum_{i=1}^k \sigma_i^{-2} X_i^t X_i \Bigr)^{-1}, \qquad (1.2) $$
though the relation was verified only for the "common mean case" $x_{ij}^t \beta \equiv \mu$ $(j = 1, 2, \ldots, n_i;\ i = 1, \ldots, k)$. The same property as (1.2) appears to hold even if the common mean restriction and the restriction $n_i = m$ $(i = 1, \ldots, k)$ are removed, and the normality assumption is relaxed. Furthermore, if the procedure of forming a WLS after estimating the error variances through the current regression fit is repeated several times, then the resulting estimate of $\beta$ seems to be more efficient than $\hat\beta_{FG}$.

Various iterative WLS procedures started from the OLS as initial estimate have been suggested so far. Among them, Chen and Shao (1993) and Hooper (1993) seem important. From a Bayesian point of view, Hooper (1993) derived the asymptotic distribution of an iterative WLS estimate and obtained an asymptotically optimal weight function, which is easily interpreted under an inverse gamma model for the error variance. Chen and Shao (1993) gave a detailed discussion of the OLS-started iterative WLS estimate. They derived a general formula for the asymptotic distribution of the iterative WLS under several regularity conditions, and proposed an adaptive procedure ensuring that the iterative process stops after a finite number of iterations.

When we derive the asymptotic properties of an iterative WLS, adopting the OLS as the initial estimate is convenient, principally because it requires a smaller number of replicates $n_i$ than $\hat\beta_w$-type estimates when ensuring the existence of moments. Nevertheless, the OLS seems somewhat unsuited to iterative "weighted" least-squares estimation. When faced with significant heteroscedasticity in the error distributions, we would prefer a WLS to the "unweighted" least-squares estimate as the initial estimate. Under severe heteroscedasticity, the OLS is highly inefficient and hence yields poor weighting matrices in the first step of the iterative procedure. It thus seems natural that we should try out a WLS as the initial estimate.

In this paper, we derive large sample properties of an iterative WLS starting with some $\hat\beta_w$-type estimate. In Section 2, we obtain a result analogous to (1.1) or Theorem 2.1 in Inoue (1999). A lower bound for the asymptotic variance of $\hat\beta_w$-type estimates is found there. In Section 3, by slightly modifying results in Chen and Shao (1993), and making use of the Neyman–Scott-type estimate, we show that the improvement of $\hat\beta_w$-type estimates is accomplished by a sequence of iterative WLS when the errors are spherically distributed. In particular cases, the least number of iterations needed for the improvement turns out to be one.
2. An asymptotic lower bound for $V(\hat\beta_w)$

Throughout the paper, we assume the boundedness of the number of replicates $n_i$, the error variance $\sigma_i^2$, and the weighting matrix $W_i$. Here, the boundedness of $W_i$ is interpreted as $d_1 I_{n_i} \le W_i \le d_2 I_{n_i}$, where $d_1$ and $d_2$ are positive constants independent of $i$. Let a positive constant with subscript "U" ("L") denote the least upper (the greatest lower) bound for a sequence of bounded positive numbers (e.g. $n_U = \max_{k \in N} \max_{1 \le i \le k} n_i$). As for the design vectors, throughout the paper we assume that the Euclidean norm of $x_{ij}$ is bounded upwards, and that the minimum eigenvalue of $k^{-1} \sum_{i=1}^k X_i^t X_i$ is bounded downwards by some positive constant. It is noted that the assumption about the norm of $x_{ij}$ implies the upper boundedness of the maximum eigenvalue of $k^{-1} \sum_{i=1}^k X_i^t X_i$. As for the moment conditions on the $E_i$ $(i = 1, \ldots, k)$, we use the following:
$$ \sup_{i \in N} E(|\hat\sigma_i^2|^{-\delta}) < +\infty \quad \text{for some constant } \delta\ (> 1), \qquad (2.1) $$
$$ \sup_{i \in N} E(\|\tilde E_i\|^\delta) < +\infty \quad \text{for some constant } \delta\ (> 2), \qquad (2.2) $$
$$ E(\tilde E_i) = 0 \quad (i = 1, \ldots, k), \qquad (2.3) $$
$$ E(e_i e_i^t) \ge d \cdot I_{n_i} \quad \text{for some positive constant } d\ (i = 1, \ldots, k), \qquad (2.4) $$
$$ E(e_i) = 0 \quad (i = 1, \ldots, k), \qquad (2.5) $$
where $\tilde E_i = E_i / \hat\sigma_i^2$ and $e_i = n_i E_i / \|E_i\|^2$ $(i = 1, \ldots, k)$. We can derive the convergence of random matrices and vectors from these conditions.

As a convention, using $n_i \times n_i$ matrices $A_i$, let $[A_i]$ denote the $\sum_{i=1}^k n_i \times \sum_{i=1}^k n_i$ block diagonal matrix with $i$th block element $A_i$. To simplify the description of calculations, let "$d$" denote a generic positive constant properly chosen in accordance with each situation. To derive the asymptotic properties of $\hat\beta_w$-type estimates, we use auxiliary lemmas.

Lemma 2.1. Under (2.1),
$$ (X^t W \hat V^{-1} X)^{-1} = (X^t W B X)^{-1} + o_p(k^{-1}), \qquad (2.6) $$
where $X^t = (X_1^t, \ldots, X_k^t)$, $W = [W_i]$, $\hat V^{-1} = [\hat\sigma_i^{-2} I_{n_i}]$ and $B = E(\hat V^{-1})$.

Proof. Since $|E(\hat\sigma_i^{-2})|^\delta \le E(|\hat\sigma_i^2|^{-\delta})$ for $\delta\ (> 1)$, we get
$$ \sup_{i \in N} E\{ |\hat\sigma_i^{-2} - E(\hat\sigma_i^{-2})|^\delta \} < +\infty. \qquad (2.7) $$
Hence, noting that the $(s,t)$-element of $X^t W \hat V^{-1} X - X^t W B X$ is written as
$$ \sum_{i=1}^k \{ \hat\sigma_i^{-2} - E(\hat\sigma_i^{-2}) \} \sum_{j=1}^{n_i} \sum_{l=1}^{n_i} w_{ijl} x_{ijs} x_{ilt}, $$
where $x_{ijs}$ denotes the $s$th component of $x_{ij}$ and $w_{ijl}$ the $(j,l)$-element of $W_i$, using (2.7) we obtain
$$ \sup_{i \in N} E \Bigl| \{ \hat\sigma_i^{-2} - E(\hat\sigma_i^{-2}) \} \sum_{j=1}^{n_i} \sum_{l=1}^{n_i} w_{ijl} x_{ijs} x_{ilt} \Bigr|^\delta < +\infty, $$
which ensures the relation
$$ X^t W \hat V^{-1} X = X^t W B X + o_p(k) \qquad (2.8) $$
(see e.g. Corollary (ii) in Chung, 1974, p. 125). Relation (2.8), together with the boundedness assumptions, implies (2.6).

Lemma 2.2. Under (2.1)–(2.4),
$$ \lambda^t V_w^{-1/2} (X^t W B X)^{-1} (X^t W \hat V^{-1} E) \xrightarrow{d} N(0, 1) \quad (k \to +\infty) \qquad (2.9) $$
for every $p \times 1$ unit vector $\lambda$, where $E = (E_1^t, \ldots, E_k^t)^t$ and $V_w^{-1/2}$ denotes the inverse of the square root of $V_w = (X^t W B X)^{-1} (X^t W C W X) (X^t W B X)^{-1}$ with $C = E(\hat V^{-1} E E^t \hat V^{-1})$.

Proof. Putting $\lambda_k^t = \lambda^t V_w^{-1/2} (X^t W B X)^{-1}$, we shall apply the Lyapunov central limit theorem to $\{ \hat\sigma_i^{-2} \lambda_k^t X_i^t W_i E_i \}_{i=1}^k$. Firstly, using (2.4) and the boundedness assumptions, we show that
$$ (X^t W C W X)^{-1} = O(k^{-1}). \qquad (2.10) $$
Define $P_i = I_{n_i} - X_i (X_i^t X_i)^- X_i^t$. Since the rank of $P_i$ is $n_i - r_i$ (see e.g. Basilevsky, 1983, p. 282, Theorem 6.10), there exist an orthogonal matrix $Q_i$ and a diagonal matrix $\Lambda_i = \mathrm{diag}(I_{n_i - r_i}, O_{r_i})$ such that $P_i = Q_i^t \Lambda_i Q_i$. Putting $f_i = (f_{i1}, \ldots, f_{in_i})^t = Q_i E_i$ $(i = 1, \ldots, k)$, we have the relation
$$ \hat\sigma_i^2 = \sum_{j=1}^{n_i - r_i} f_{ij}^2 / (n_i - r_i) \le \|f_i\|^2 / (n_i - r_i) = \|E_i\|^2 / (n_i - r_i) $$
and hence,
$$ E(E_i E_i^t / \hat\sigma_i^4) \ge (n_i - r_i)^2 E(E_i E_i^t / \|E_i\|^4) \quad (i = 1, \ldots, k). $$
Thus, we obtain the relation $X^t W C W X \ge d \cdot k I_p$, which implies (2.10).
Secondly, we verify the moment condition. By (2.1), (2.10) and the boundedness assumptions, we have
$$ |\hat\sigma_i^{-2} \lambda_k^t X_i^t W_i E_i|^\delta \le \|\lambda_k\|^\delta \cdot \|X_i^t W_i\|_E^\delta \cdot \|\tilde E_i\|^\delta \le d \cdot k^{-\delta/2} \cdot \|\tilde E_i\|^\delta $$
for some constant $\delta\ (> 2)$, where $\|\cdot\|_E$ denotes the Euclidean matrix norm. From this and (2.2), we obtain
$$ \sum_{i=1}^k E(|\hat\sigma_i^{-2} \lambda_k^t X_i^t W_i E_i|^\delta) \le O(k^{1 - \delta/2}). \qquad (2.11) $$
Finally, using (2.3) we have
$$ V(\lambda_k^t X^t W \hat V^{-1} E) = \lambda_k^t X^t W C W X \lambda_k = 1. \qquad (2.12) $$
Relation (2.9) is ensured by (2.11) and (2.12).

Now we state a result.

Theorem 2.3. Under (2.1)–(2.4), $V_w^{-1/2} (\hat\beta_w - \beta)$ is asymptotically normal with mean zero and variance $I_p$ (as $k \to +\infty$), where $V_w$ is defined in Lemma 2.2. The minimum of $V_w$, denoted by $V_*$, is attained when $W = W_* = C^{-1} B$:
$$ V_* = (X^t C^{-1} B^2 X)^{-1}. $$

The $\hat\beta_w$ with $W = W_*$, denoted by $\hat\beta_w^*$, is asymptotically optimal in the class of $\hat\beta_w$-type estimates in the sense that $\inf_{\lambda \ne 0} \inf_{k \in N} (\lambda^t V_w \lambda / \lambda^t V_* \lambda)$ is greater than or equal to one. This does not necessarily imply an absolute improvement of the feasible GLS by $\hat\beta_w^*$, because, for example, if $n_i \ge r_i + 5$ $(i = 1, \ldots, k)$ and the $E_i$'s are spherically distributed with density functions, then we note that
$$ W_* = \Bigl[ P_i + \frac{n_i - r_i - 4}{n_i - r_i} (I_{n_i} - P_i) \Bigr] \qquad (2.13) $$
and
$$ V_* = \Bigl\{ X^t \Bigl[ \frac{n_i - r_i - 4}{n_i - r_i - 2} (n_i - 2) E(\|E_i\|^{-2}) I_{n_i} \Bigr] X \Bigr\}^{-1} \qquad (2.14) $$
(for techniques of calculation based on the properties of spherical distributions, see e.g. Fang et al., 1989); hence, we have the equivalence between the asymptotic variances $V(\hat\beta_{FG})$ and $V(\hat\beta_w^*)$ provided $n_1 - r_1 = \cdots = n_k - r_k$.
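As a check (a remark added to this edition, not in the original): for normal errors $E_i \sim N(0, \sigma_i^2 I_{n_i})$ we have $\|E_i\|^2 / \sigma_i^2 \sim \chi^2_{n_i}$ and $E(1/\chi^2_n) = 1/(n-2)$, so that $E(\|E_i\|^{-2}) = 1 / \{\sigma_i^2 (n_i - 2)\}$ and (2.14) becomes
$$ V_* = \Bigl\{ \sum_{i=1}^k \frac{n_i - r_i - 4}{n_i - r_i - 2} \, \sigma_i^{-2} X_i^t X_i \Bigr\}^{-1}; $$
with $n_i = m$ and $r_i = 1$ this is $\frac{m-3}{m-5} (\sum_{i=1}^k \sigma_i^{-2} X_i^t X_i)^{-1}$, exactly the Carroll–Cline variance (1.1) of the FGLS, illustrating the equivalence just claimed.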
Now we prove Theorem 2.3.
Proof of Theorem 2.3. Let $\lambda$ be an arbitrary $p \times 1$ unit vector. Using (2.10), (2.1), Lemmas 2.1 and 2.2, we see the relation
$$ \lambda^t V_w^{-1/2} (\hat\beta_w - \beta) - \lambda^t V_w^{-1/2} (X^t W B X)^{-1} (X^t W \hat V^{-1} E) $$
$$ = \lambda^t V_w^{-1/2} \{ (X^t W \hat V^{-1} X)^{-1} - (X^t W B X)^{-1} \} (X^t W \hat V^{-1} E) = o_p(1). $$
From this, Lemma 2.2 and the Slutsky theorem we obtain
$$ \lambda^t V_w^{-1/2} (\hat\beta_w - \beta) \xrightarrow{d} N(0, 1) \quad (k \to +\infty), $$
which, together with the Cramér–Wold device, implies that the $p \times 1$ vector $V_w^{-1/2} (\hat\beta_w - \beta)$ is asymptotically normal with mean zero and variance $I_p$ (as $k \to +\infty$).

The rest to be verified is the positive semi-definiteness of $V_w - V_*$. Let $R$ be a square root matrix of $C^{-1}$. Put $Z_1 = R^{-1} B^{-1} W B X$ and $Z_2 = R B X$, and let $I$ denote the $\sum_{i=1}^k n_i \times \sum_{i=1}^k n_i$ identity matrix. Since $I - Z_1 (Z_1^t Z_1)^{-1} Z_1^t$ is idempotent and $V_*^{-1} - V_w^{-1} = Z_2^t \{ I - Z_1 (Z_1^t Z_1)^{-1} Z_1^t \} Z_2$, we have $V_*^{-1} \ge V_w^{-1}$, which implies the relation $V_w \ge V_*$.

3. Improvement of $\hat\beta_w^*$ by an iterative procedure

Here we consider an initial estimate $\hat\beta^{(0)}$ of $\beta$ such that $\hat\beta^{(0)} = \beta + O_p(k^{-1/2})$, and use it to calculate a weight for the $i$th point, denoted by $\hat\sigma_i^{-2}(\hat\beta^{(0)}) W_i$, where $W_i$ is the same as in the preceding section and $\hat\sigma_i^2(\hat\beta^{(0)})$ denotes the average of squared residuals from the estimated responses
$x_{ij}^t \hat\beta^{(0)}$ $(j = 1, 2, \ldots, n_i)$:
$$ \hat\sigma_i^2(\hat\beta^{(0)}) = \| y_i - X_i \hat\beta^{(0)} \|^2 / n_i \quad (i = 1, \ldots, k). $$
We may update $\hat\beta^{(0)}$ using $\hat\sigma_1^2(\hat\beta^{(0)}), \ldots, \hat\sigma_k^2(\hat\beta^{(0)})$ and then each $\hat\sigma_i^2(\hat\beta^{(0)})$ using the updated estimate. Thus, we have an iterative WLS sequence $\{\hat\beta_w^{(c)}\}_{c \ge 0}$ recursively determined by
$$ \hat\beta_w^{(c+1)} = \Bigl( \sum_{i=1}^k \hat\sigma_i^{-2}(\hat\beta_w^{(c)}) X_i^t W_i X_i \Bigr)^{-1} \sum_{i=1}^k \hat\sigma_i^{-2}(\hat\beta_w^{(c)}) X_i^t W_i y_i, \qquad (3.1) $$
where $\hat\beta_w^{(0)}$ is regarded as $\hat\beta^{(0)}$. Since the relation between $\hat\beta_w^{(c)}$ and $\hat\beta_w^{(c+1)}$ is rewritten as
$$ \sum_{i=1}^k \frac{n_i X_i^t W_i (y_i - X_i \hat\beta_w^{(c+1)})}{\| y_i - X_i \hat\beta_w^{(c)} \|^2} = 0, $$
we can view $\{\hat\beta_w^{(c)}\}_{c \ge 0}$ as approximate solutions to the estimating equation
$$ \sum_{i=1}^k \frac{n_i X_i^t W_i (y_i - X_i \tilde\beta)}{\| y_i - X_i \tilde\beta \|^2} = 0. \qquad (3.2) $$
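A minimal sketch of iteration (3.1), added here for illustration (not from the paper; names are invented): beta0 would be, e.g., the $\hat\beta_w^*$-type estimate of Section 2, and for the results below one would take $W_i = ((n_i - 2)/n_i) I_{n_i}$ as in (3.3).

import numpy as np

def iterative_wls(X_list, y_list, beta0, W_list, c_max):
    """Iterate (3.1) starting from an initial estimate beta0 = beta_hat^{(0)}."""
    p = len(beta0)
    beta = np.asarray(beta0, dtype=float)
    for _ in range(c_max):
        lhs, rhs = np.zeros((p, p)), np.zeros(p)
        for Xi, yi, Wi in zip(X_list, y_list, W_list):
            ni = Xi.shape[0]
            # sigma_hat_i^2(beta^{(c)}): average squared residual at the current fit
            s2 = np.sum((yi - Xi @ beta) ** 2) / ni
            lhs += (Xi.T @ Wi @ Xi) / s2
            rhs += (Xi.T @ Wi @ yi) / s2
        beta = np.linalg.solve(lhs, rhs)  # beta^{(c+1)} of (3.1)
    return beta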
Eq. (3.2) reduces to the likelihood equation when the $E_i$'s are normally distributed and $W_i = I_{n_i}$ $(i = 1, \ldots, k)$. Neyman and Scott (1948) dealt with the common mean case under the normality assumption and proved under mild conditions that the solution $\tilde\beta$ corresponding to
$$ W = \Bigl[ \frac{n_i - 2}{n_i} I_{n_i} \Bigr] \qquad (3.3) $$
is asymptotically more efficient than the maximum likelihood estimate unless $n_1 = \cdots = n_k$. Even if the normality assumption and the restriction $x_{ij}^t \beta \equiv \mu$ are removed, we can derive an analogous result. Through local linearization of the left-hand side of (3.2) about the true regression vector $\beta$, we obtain the asymptotic variance of $\tilde\beta$:
$$ V(\tilde\beta) \doteq (X^t W \tilde B X)^{-1} (X^t W \tilde C W X) (X^t W \tilde B X)^{-1} \qquad (3.4) $$
(see, e.g. Carroll and Ruppert, 1988, Chapter 7), where $\tilde B = [n_i E(\|E_i\|^{-2}) I_{n_i} - (2/n_i) E(e_i e_i^t)]$ and $\tilde C = [E(e_i e_i^t)]$. By matrix algebra we find that the right-hand side of (3.4) is bounded downwards by
$$ V_{NS} = \{ X^t (\tilde C)^{-1} (\tilde B)^2 X \}^{-1}, $$
which is attained when the weighting matrix $W$ is chosen as
$$ W_{NS} = (\tilde C)^{-1} \tilde B = \Bigl[ n_i E(\|E_i\|^{-2}) \{ E(e_i e_i^t) \}^{-1} - \frac{2}{n_i} I_{n_i} \Bigr]. $$
If the condition
$$ E(e_i e_i^t) = n_i E(\|E_i\|^{-2}) I_{n_i} \quad (i = 1, \ldots, k) \qquad (3.5) $$
is additionally satisfied, then $W_{NS}$ reduces to (3.3) and hence,
$$ V_{NS} = \Bigl\{ X^t \Bigl[ \frac{(n_i - 2)^2}{n_i} E(\|E_i\|^{-2}) I_{n_i} \Bigr] X \Bigr\}^{-1}. \qquad (3.6) $$
Comparing (3.6) with (2.14), we can see the superiority of the Neyman–Scott-type estimate over $\hat\beta_w$-type estimates. This indicates the possibility that $\hat\beta_w^*$ is improved by $\hat\beta_w^{(c)}$ with sufficiently large $c$ when the $E_i$'s are spherically distributed.
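To make the comparison concrete (a worked remark added here, not in the original): under (3.5) the $i$th scalar factor in the bracket of (3.6) exceeds the corresponding factor in (2.14) by
$$ \frac{(n_i - 2)^2 / n_i}{\dfrac{n_i - r_i - 4}{n_i - r_i - 2} (n_i - 2)} = \frac{(n_i - 2)(n_i - r_i - 2)}{n_i (n_i - r_i - 4)} = 1 + \frac{2 r_i + 4}{n_i (n_i - r_i - 4)} > 1, $$
so $V_{NS} \le V_*$ with a strict per-point gap; e.g. $n_i = 7$, $r_i = 1$ gives the factor $1 + 6/14 \approx 1.43$. The same quantity reappears as the lower bound in (3.12) below.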
Modifying Theorems 1 and 2 in Chen and Shao (1993), we shall obtain the large sample properties of $\hat\beta_w^{(c)}$ with $\hat\beta^{(0)} = \hat\beta_w^*$. To begin with, we state a result analogous to their Theorem 1.

Theorem 3.1. Let a $p \times \sum_{i=1}^k n_i$ matrix $A_k$ (also $B_k$) be zero, or satisfy the following conditions: $d_1 \cdot k I_p \le A_k A_k^t \le d_2 \cdot k I_p$ for some positive constants $d_1$ and $d_2$; the norms of the column vectors of $A_k$ are bounded upwards. Also, let $\hat\beta_w^{(1)}$ be determined by (3.1) with $\hat\beta_w^{(0)}$ satisfying
$$ \hat\beta_w^{(0)} - \beta = G_k^{-1} (A_k e + B_k \tilde E) + o_p(k^{-1/2}), $$
where $e = (e_1^t, \ldots, e_k^t)^t$, $\tilde E = (\tilde E_1^t, \ldots, \tilde E_k^t)^t$ and $G_k = X^t [n_i E(\|E_i\|^{-2}) W_i] X$. Then under (2.1)–(2.5),
$$ \hat\beta_w^{(1)} - \beta = G_k^{-1} (\tilde A_k e + \tilde B_k \tilde E) + o_p(k^{-1/2}), $$
where $\tilde A_k = X^t W + 2 H_k G_k^{-1} A_k$, $\tilde B_k = 2 H_k G_k^{-1} B_k$ and $H_k = X^t [n_i^{-1} W_i E(e_i e_i^t)] X$.
Proof. Since the weighting matrices are assumed bounded (upwards and downwards), the proof of the present theorem is essentially the same as that of Theorem 1 in Chen and Shao (1993), provided that the relation $\hat\beta_w^{(0)} - \beta = O_p(k^{-1/2})$ is ensured. In a similar way as in Lemma 2.2, however, the relations $G_k^{-1} A_k e = O_p(k^{-1/2})$ and $G_k^{-1} B_k \tilde E = O_p(k^{-1/2})$ are verified by the present assumptions, and hence the boundedness of $k^{1/2} (\hat\beta_w^{(0)} - \beta)$ in probability.

Let $A_k^{(0)}$ (also $B_k^{(0)}$) be a $p \times \sum_{i=1}^k n_i$ matrix satisfying the same conditions as $A_k$ in Theorem 3.1. Define
$$ A_k^{(c)} = \sum_{s=0}^{c-1} (2 H_k G_k^{-1})^s \cdot X^t W + (2 H_k G_k^{-1})^c \cdot A_k^{(0)} $$
$$ \phantom{A_k^{(c)}} = \{ I - (2 H_k G_k^{-1})^c \} \cdot G_k \cdot (X^t W \tilde B X)^{-1} \cdot X^t W + (2 H_k G_k^{-1})^c \cdot A_k^{(0)} $$
and
$$ B_k^{(c)} = (2 H_k G_k^{-1})^c \cdot B_k^{(0)}. $$
Then the repeated application of Theorem 3.1, supposing that $A_k^{(m)} (A_k^{(m)})^t$ (also $B_k^{(m)} (B_k^{(m)})^t$) satisfies the same condition as $A_k A_k^t$ in the theorem $(m = 1, 2, \ldots, c-1)$, yields the relation
$$ \hat\beta_w^{(c)} - \beta = G_k^{-1} (A_k^{(c)} e + B_k^{(c)} \tilde E) + o_p(k^{-1/2}). $$
This, together with $\Sigma_k^{(c)} = V(A_k^{(c)} e + B_k^{(c)} \tilde E) \ge d \cdot k I_p$, implies the asymptotic normality of $\hat\beta_w^{(c)}$ starting with the $\hat\beta_w$-type estimate, which is analogous to Theorem 2 in Chen and Shao (1993).

Theorem 3.2. Assume that (2.1)–(2.5) are satisfied, and that $\{A_k^{(m)}\}_{m=0}^{c-1}$, $\{B_k^{(m)}\}_{m=0}^{c-1}$ and $\Sigma_k^{(c)}$ satisfy the conditions above. Then $(G_k^{-1} \Sigma_k^{(c)} G_k^{-1})^{-1/2} (\hat\beta_w^{(c)} - \beta)$ is asymptotically normal with mean zero and variance $I_p$ (as $k \to +\infty$).
Putting, for example, $\hat\beta_w^{(0)} = \hat\beta_w^*$ in the theorem above, we have the expression
$$ G_k^{-1} \Sigma_k^{(c)} G_k^{-1} = \{ I - (L_k^t)^c \} (X^t W \tilde B X)^{-1} (X^t W \tilde C W X) (X^t W \tilde B X)^{-1} (I - L_k^c) $$
$$ \quad + \{ I - (L_k^t)^c \} (X^t W \tilde B X)^{-1} X^t W \, E(e \tilde E^t) \, W_* X \, V_* \, L_k^c $$
$$ \quad + (L_k^t)^c \, V_* \, X^t W_* \, E(\tilde E e^t) \, W X \, (X^t W \tilde B X)^{-1} (I - L_k^c) + (L_k^t)^c \, V_* \, L_k^c, \qquad (3.7) $$
where $L_k = 2 H_k G_k^{-1}$. By straightforward matrix algebra we find that the Euclidean matrix norm of $L_k^c$ is bounded:
$$ \sup_{k \in N} \| L_k^c \|_E \le d \cdot (2/n_L)^c, \qquad (3.8) $$
from which we observe that $\hat\beta_w^{(c)}$ becomes equivalent to $\tilde\beta$ (defined in (3.2)) as $c$ increases; with $n_L = 6$, for instance, the bound already decays like $(1/3)^c$.

Henceforth, we assume that $n_i \ge r_i + 5$ $(i = 1, \ldots, k)$, that the error vectors are spherically distributed with density functions, and that $\sup_{i \in N} E(\|E_i\|^{-\delta}) < +\infty$ for some constant $\delta\ (> 2)$. In this case, conditions (2.1)–(2.5) and (3.5) are fulfilled, and $G_k^{-1} \Sigma_k^{(c)} G_k^{-1}$ is clearly expressed. We note here that the types of spherical distributions need not be specified; furthermore, they are allowed to be non-identical.
Theorem 3.3. Assume that the spherically distributed errors $E_i$ $(i = 1, \ldots, k)$ satisfy the conditions above. Let $\hat\beta_w^{(0)} = \hat\beta_w^*$ and $W = W_{NS}$ for every $\hat\beta_w^{(c)}$ $(c \ge 1)$. Then $(D_k^{(c)})^{-1/2} (\hat\beta_w^{(c)} - \beta)$ is asymptotically normal with mean zero and variance $I_p$ (as $k \to +\infty$), where
$$ D_k^{(c)} = V_{NS} + (L_k^t)^c (V_* - V_{NS}) L_k^c \qquad (3.9) $$
and $L_k = \{ X^t [(2/n_i)(n_i - 2) E(\|E_i\|^{-2}) I_{n_i}] X \} \{ X^t [(n_i - 2) E(\|E_i\|^{-2}) I_{n_i}] X \}^{-1}$.

There are two reasons as to why we have focused on the case $W = W_{NS}$ in the theorem above. Firstly, the leading term of (3.7), $(X^t W \tilde B X)^{-1} (X^t W \tilde C W X) (X^t W \tilde B X)^{-1}$, is minimized at $W = W_{NS}$, so that the choice $W = W_{NS}$ is justified when we allow a large iteration number $c$. Secondly, the choice results in the simple asymptotic variance (3.9), which is easy to compare with the asymptotic variance of the initial estimate $\hat\beta_w^*$.

Now we prove the theorem.

Proof of Theorem 3.3. It suffices to verify that $G_k^{-1} \cdot V(A_k^{(c)} e + B_k^{(c)} \tilde E) \cdot G_k^{-1}$, started with $A_k^{(0)} = 0$ and $B_k^{(0)} = G_k \cdot V_* \cdot X^t W_*$, is expressed by (3.9). Firstly, we show that
$$ E(\tilde E_i e_i^t) = n_i E(\|E_i\|^{-2}) \Bigl[ P_i + \frac{n_i - r_i}{n_i - r_i - 2} (I_{n_i} - P_i) \Bigr] \quad (i = 1, \ldots, k), \qquad (3.10) $$
where $P_i$ is the symmetric and idempotent matrix defined in the proof of Lemma 2.2. Let $f_i$ and $Q_i$ be the same as in the proof of Lemma 2.2. Partitioning $f_i$ into two
vectors $(f_i^{(1)})^t = (f_{i1}, \ldots, f_{i, n_i - r_i})$ and $(f_i^{(2)})^t = (f_{i, n_i - r_i + 1}, \ldots, f_{i n_i})$, we have the expressions $\hat\sigma_i^2 = \|f_i^{(1)}\|^2 / (n_i - r_i)$ and $\hat\sigma_i^2(\beta) = \|f_i\|^2 / n_i$. Since $f_i$ is spherically distributed, we have
$$ E\Bigl( \frac{f_i^{(2)} (f_i^{(1)})^t}{\|f_i\|^2 \|f_i^{(1)}\|^2} \Bigr) = 0, \qquad E\Bigl( \frac{f_i^{(1)} (f_i^{(1)})^t}{\|f_i\|^2 \|f_i^{(1)}\|^2} \Bigr) = \frac{1}{n_i - r_i} E(\|E_i\|^{-2}) I_{n_i - r_i} $$
and
$$ E\Bigl( \frac{f_i^{(2)} (f_i^{(2)})^t}{\|f_i\|^2 \|f_i^{(1)}\|^2} \Bigr) = \frac{1}{r_i} E\Bigl( \frac{\|f_i^{(2)}\|^2}{\|f_i\|^2 \|f_i^{(1)}\|^2} \Bigr) I_{r_i} = \frac{1}{r_i} \{ E(\|f_i^{(1)}\|^{-2}) - E(\|E_i\|^{-2}) \} I_{r_i} = \frac{1}{n_i - r_i - 2} E(\|E_i\|^{-2}) I_{r_i}; $$
hence
$$ E(\tilde E_i e_i^t) = n_i (n_i - r_i) \cdot Q_i^t \cdot E\Bigl( \frac{f_i f_i^t}{\|f_i\|^2 \|f_i^{(1)}\|^2} \Bigr) \cdot Q_i = n_i E(\|E_i\|^{-2}) \Bigl[ P_i + \frac{n_i - r_i}{n_i - r_i - 2} (I_{n_i} - P_i) \Bigr] \quad (i = 1, \ldots, k). $$

Secondly, we show that
$$ X^t W_{NS} \cdot E(e \tilde E^t) \cdot W_* X = V_*^{-1}. \qquad (3.11) $$
Substitution of (2.13), (3.3) and (3.10) yields
$$ X^t W_{NS} \cdot E(e \tilde E^t) \cdot W_* X = X^t \Bigl[ \frac{n_i - 2}{n_i} I_{n_i} \Bigr] \Bigl[ n_i E(\|E_i\|^{-2}) \Bigl( P_i + \frac{n_i - r_i}{n_i - r_i - 2} (I_{n_i} - P_i) \Bigr) \Bigr] \Bigl[ P_i + \frac{n_i - r_i - 4}{n_i - r_i} (I_{n_i} - P_i) \Bigr] X $$
$$ = X^t \Bigl[ (n_i - 2) E(\|E_i\|^{-2}) \Bigl( P_i + \frac{n_i - r_i - 4}{n_i - r_i - 2} (I_{n_i} - P_i) \Bigr) \Bigr] X = X^t \Bigl[ \frac{n_i - r_i - 4}{n_i - r_i - 2} (n_i - 2) E(\|E_i\|^{-2}) I_{n_i} \Bigr] X, $$
where it is noted that $X_i^t P_i = 0$ $(i = 1, \ldots, k)$.
Now we calculate $G_k^{-1} \cdot V(A_k^{(c)} e + B_k^{(c)} \tilde E) \cdot G_k^{-1}$ from (3.7) and (3.11). Since $A_k^{(c)} = (I - L_k^c) \cdot G_k \cdot V_{NS} \cdot X^t W_{NS}$ and $B_k^{(c)} = L_k^c \cdot G_k \cdot V_* \cdot X^t W_*$, we have
$$ G_k^{-1} \Sigma_k^{(c)} G_k^{-1} = \{ I - (L_k^t)^c \} \cdot V_{NS} \cdot (I - L_k^c) + \{ I - (L_k^t)^c \} \cdot V_{NS} \cdot V_*^{-1} \cdot V_* \cdot L_k^c $$
$$ \quad + (L_k^t)^c \cdot V_* \cdot V_*^{-1} \cdot V_{NS} \cdot (I - L_k^c) + (L_k^t)^c \cdot V_* \cdot L_k^c = D_k^{(c)}. $$

Though it is required that we should specify the least number of iterations yielding a WLS superior to the initial estimate $\hat\beta_w^*$, in general we have solely the following result.

Theorem 3.4. Under the conditions in Theorem 3.3, there exists an integer $c_0$ such that
$$ \inf_{c \ge c_0} \inf_{\lambda \ne 0} \liminf_{k \to +\infty} \frac{\lambda^t V(\hat\beta_w^*) \lambda}{\lambda^t V(\hat\beta_w^{(c)}) \lambda} > 1. $$

Proof. Firstly, we show that
$$ \inf_{\lambda \ne 0} \inf_{k \in N} \frac{\lambda^t V_* \lambda}{\lambda^t V_{NS} \lambda} \ge 1 + \inf_{i \in N} \frac{2 r_i + 4}{n_i (n_i - r_i - 4)}. \qquad (3.12) $$
It suffices to verify that the eigenvalues of $V_{NS}^{-1} V_*$ are bounded downwards by the right-hand side of (3.12) (see e.g. Magnus and Neudecker, 1988, p. 236, miscellaneous exercise 5). Since the equation $|V_{NS}^{-1} V_* - \mu I_p| = 0$ is equivalent to $|V_{NS}^{-1} - \mu V_*^{-1}| = 0$, we note that any such $\mu$ must be greater than one under the positive definiteness of $V_{NS}^{-1} - V_*^{-1}$. The equation also requires that $\mu \ge \inf_{i \in N} \{ (n_i - 2)(n_i - r_i - 2) / n_i (n_i - r_i - 4) \}$, because
$$ V_{NS}^{-1} - \mu V_*^{-1} = X^t \Bigl[ \Bigl( \frac{(n_i - 2)(n_i - r_i - 2)}{n_i (n_i - r_i - 4)} - \mu \Bigr) \frac{(n_i - r_i - 4)(n_i - 2) E(\|E_i\|^{-2})}{n_i - r_i - 2} I_{n_i} \Bigr] X $$
is positive definite if $\mu$ is less than $\inf_{i \in N} \{ (n_i - 2)(n_i - r_i - 2) / n_i (n_i - r_i - 4) \}$.

Secondly, we evaluate the quadratic forms $\lambda^t (L_k^t)^c (V_* - V_{NS}) L_k^c \lambda$ and $\lambda^t V_{NS} \lambda$. Since $V_* - V_{NS} \le d_1 \cdot (X^t X)^{-1}$ and $V_{NS} \ge d_2 \cdot (X^t X)^{-1}$ for some positive constants $d_1$ and $d_2$, we have the following:
$$ \lambda^t (L_k^t)^c (V_* - V_{NS}) L_k^c \lambda \le (L_k^c \lambda)^t \cdot d_1 (X^t X)^{-1} \cdot (L_k^c \lambda) \le d \cdot k^{-1} \cdot \| L_k^c \|_E^2 \cdot \|\lambda\|^2, \qquad (3.13) $$
$$ \lambda^t V_{NS} \lambda \ge \lambda^t \cdot d_2 (X^t X)^{-1} \cdot \lambda \ge d \cdot k^{-1} \cdot \|\lambda\|^2. \qquad (3.14) $$
Finally, by Theorems 2.3 and 3.3, (3.8), (3.13) and (3.14), we note that
$$ \liminf_{k \to +\infty} \frac{\lambda^t V(\hat\beta_w^*) \lambda}{\lambda^t V(\hat\beta_w^{(c)}) \lambda} \ge \liminf_{k \to +\infty} \frac{\lambda^t V_* \lambda}{\lambda^t D_k^{(c)} \lambda} \ge \liminf_{k \to +\infty} \frac{\lambda^t V_* \lambda}{\lambda^t V_{NS} \lambda + d \cdot k^{-1} \cdot \| L_k^c \|_E^2 \cdot \|\lambda\|^2} \ge \frac{\inf_{k \in N} (\lambda^t V_* \lambda / \lambda^t V_{NS} \lambda)}{1 + d \cdot (2/n_L)^{2c}}, $$
which implies the relation
$$ \inf_{\lambda \ne 0} \liminf_{k \to +\infty} \frac{\lambda^t V(\hat\beta_w^*) \lambda}{\lambda^t V(\hat\beta_w^{(c)}) \lambda} \ge \frac{\inf_{\lambda \ne 0} \inf_{k \in N} (\lambda^t V_* \lambda / \lambda^t V_{NS} \lambda)}{1 + d \cdot (2/n_L)^{2c}}. \qquad (3.15) $$
Using (3.12) we observe that there exists an integer $c_0$ such that the right-hand side of (3.15) is greater than one for $c \ge c_0$.

In particular cases, we can verify that $c_0 = 1$.

Example 1 (equi-replicated case). Let $n_i = m\ (\ge r_U + 5)$ $(i = 1, \ldots, k)$. Using the relations
$$ L_k = \frac{2}{m} I_p, \qquad V_{NS} = \frac{m}{(m-2)^2} \{ X^t [E(\|E_i\|^{-2}) I_m] X \}^{-1} $$
and
$$ V_* \ge \frac{m - r_L - 2}{(m - r_L - 4)(m - 2)} \{ X^t [E(\|E_i\|^{-2}) I_m] X \}^{-1}, $$
we have
$$ \inf_{\lambda \ne 0} \liminf_{k \to +\infty} \frac{\lambda^t V(\hat\beta_w^*) \lambda}{\lambda^t V(\hat\beta_w^{(c)}) \lambda} = 1 + \frac{1 - (2/m)^{2c}}{\{ \inf_{\lambda \ne 0} \inf_{k \in N} (\lambda^t V_* \lambda / \lambda^t V_{NS} \lambda) - 1 \}^{-1} + (2/m)^{2c}} \ge 1 + \frac{1 - (2/m)^{2c}}{m (m - r_L - 4)/(2 r_L + 4) + (2/m)^{2c}} $$
for every $c \ge 1$. The $\hat\beta_w$-type estimates (including the feasible GLS) are improved significantly when $m$ is small or the greatest lower bound for the ranks of the $X_i$ is large.
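As a numeric illustration (added here, not in the paper; function names are invented), the guaranteed asymptotic improvement factors of Example 1 and of Example 2 below can be tabulated directly from the closing bounds:

def example1_bound(m, r_L, c):
    """Example 1 lower bound on V(beta*_w)/V(beta^(c)_w); needs m >= r_L + 5."""
    q = (2.0 / m) ** (2 * c)
    return 1.0 + (1.0 - q) / (m * (m - r_L - 4) / (2.0 * r_L + 4.0) + q)

def example2_bound(n_L, n_U, c):
    """Example 2 lower bound on V(Graybill-Deal)/V(beta^(c)_w); needs n_L >= 6."""
    q = (2.0 / n_L) ** (2 * c)
    return 1.0 + (1.0 - q) / (n_U * (n_U - 5) / 6.0 + q)

# With m = 7, r_L = 1 and a single iteration (c = 1), the guaranteed
# asymptotic gain over the initial estimate is already about 38%:
print(example1_bound(7, 1, 1))   # ~1.380
print(example2_bound(6, 10, 1))  # ~1.105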
Example 2 (common mean case). Let $x_{ij}^t \beta \equiv \mu$ with $p = 1$ and $x_{ij1} = 1$, and let $n_i \ge 6$ $(i = 1, \ldots, k)$. In this case, the feasible GLS is sometimes called the Graybill–Deal estimate, whose finite sample behaviour (such as inadmissibility in a class of unbiased estimates) holds our interest. Using the relations
$$ L_k = \Bigl( \sum_{i=1}^k 2 (n_i - 2) E(\|E_i\|^{-2}) \Bigr) \Bigl( \sum_{i=1}^k n_i (n_i - 2) E(\|E_i\|^{-2}) \Bigr)^{-1}, $$
$$ V_{NS} = \Bigl( \sum_{i=1}^k (n_i - 2)^2 E(\|E_i\|^{-2}) \Bigr)^{-1} $$
and
$$ V_* = \Bigl( \sum_{i=1}^k \frac{n_i - 5}{n_i - 3} \, n_i (n_i - 2) E(\|E_i\|^{-2}) \Bigr)^{-1}, $$
we have
$$ \liminf_{k \to +\infty} \frac{V(\hat\beta_{FG})}{V(\hat\beta_w^{(c)})} \ge \liminf_{k \to +\infty} \frac{V(\hat\beta_w^*)}{V(\hat\beta_w^{(c)})} \ge 1 + \frac{1 - (2/n_L)^{2c}}{\{ \inf_{k \in N} (V_*/V_{NS}) - 1 \}^{-1} + (2/n_L)^{2c}} \ge 1 + \frac{1 - (2/n_L)^{2c}}{n_U (n_U - 5)/6 + (2/n_L)^{2c}} $$
for every $c \ge 1$. The Graybill–Deal estimate is improved significantly when the least upper bound for the numbers of replicates $n_i$ is small.

Acknowledgements

The author would like to thank an Associate Editor and two referees for helpful comments and useful suggestions.

References

Basilevsky, A., 1983. Applied Matrix Algebra in the Statistical Sciences. North-Holland, Amsterdam.
Carroll, R.J., Cline, D.B.H., 1988. An asymptotic theory for weighted least-squares with weights estimated by replication. Biometrika 75, 35–43.
Carroll, R.J., Ruppert, D., 1988. Transformation and Weighting in Regression. Chapman and Hall, New York.
Chen, J., Shao, J., 1993. Iterative weighted least squares estimators. Ann. Statist. 21, 1071–1092.
Chung, K.L., 1974. A Course in Probability Theory, 2nd Edition. Academic Press, New York.
Fang, K.-T., Kotz, S., Ng, K.W., 1989. Symmetric Multivariate and Related Distributions. Chapman and Hall, New York.
Fuller, W.A., Rao, J.N.K., 1978. Estimation for a linear regression model with unknown diagonal covariance matrix. Ann. Statist. 6, 1149–1158.
Hooper, P.M., 1993. Iterative weighted least squares estimation in heteroscedastic linear models. J. Amer. Statist. Assoc. 88, 179–184.
Inoue, K., 1999. Asymptotic improvement of the Graybill–Deal estimator. Comm. Statist. Theory Methods 28 (2), 388–407.
Magnus, J.R., Neudecker, H., 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester.
Neyman, J., Scott, E.L., 1948. Consistent estimates based on partially consistent observations. Econometrica 16, 1–32.
Shao, J., 1989. Asymptotic distribution of the weighted least squares estimator. Ann. Inst. Statist. Math. 41, 365–382.