An asymptotic approximation for EPMC in linear discriminant analysis based on monotone missing data


Journal of Statistical Planning and Inference 142 (2012) 110–125

Nobumichi Shutoh

Department of Mathematical Information Science, Graduate School of Science, Tokyo University of Science, 1-3, Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan

Article history: Received 5 October 2010; received in revised form 30 June 2011; accepted 5 July 2011; available online 22 July 2011.

Abstract

In this paper, we propose an asymptotic approximation for the expected probabilities of misclassification (EPMC) in the linear discriminant function on the basis of k-step monotone missing training data for general k. We derive certain relations of the statistics in order to obtain the approximation. Finally, we perform Monte Carlo simulation to evaluate the accuracy of our result and to compare it with existing approximations. © 2011 Elsevier B.V. All rights reserved.

Keywords: Linear discriminant analysis; Probabilities of misclassification; Asymptotic approximation; Monotone missing data

1. Introduction

In discriminant analysis, asymptotic approximations for the expected probabilities of misclassification (EPMC) play an important role in the investigation of discrimination errors because the discriminant functions have complex distributional expressions. In particular, for two groups $\Pi^{(g)}: N_p(\mu^{(g)}, \Sigma)$ $(g = 1, 2)$, several authors have discussed asymptotic approximations for the EPMC in the linear discriminant function based on the data set shown in the shaded portion of Fig. 1, i.e., $p$-dimensional sample vectors $x_j^{(g)}$ for $j = 1, \ldots, N_1^{(g)}$ and $g = 1, 2$:

$$W_1 = (\bar{x}^{(1)} - \bar{x}^{(2)})' S^{-1} \left[x - \tfrac{1}{2}(\bar{x}^{(1)} + \bar{x}^{(2)})\right],$$

where the sample mean vector based on the sample vectors $x_j^{(g)}$ is denoted by $\bar{x}^{(g)}$, the pooled sample covariance matrix based on $x_j^{(g)}$ is denoted by $S$, and the vector arising from $\Pi^{(1)}$ or $\Pi^{(2)}$ is denoted by $x$. On the basis of the cut-off point $c$, which depends on the prior probabilities of drawing an observation vector from $\Pi^{(g)}$ and on the cost of discrimination, the vector $x$ is assigned to $\Pi^{(1)}$ if $W_1 > c$; otherwise, it is assigned to $\Pi^{(2)}$. If the cut-off point is set as $c = 0$, the EPMC can be written as

$$e_1(2|1) = \Pr(W_1 \le 0 \mid x \in \Pi^{(1)}), \qquad e_1(1|2) = \Pr(W_1 > 0 \mid x \in \Pi^{(2)}).$$
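The definition of $e_1(2|1)$ can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the simulation code used in the paper: the function names `solve` and `simulate_epmc`, the replication count, and the seed are hypothetical. It assumes $\Sigma = I_p$, $\mu^{(1)} = 0$, and $\mu^{(2)} = (\Delta, 0, \ldots, 0)'$, which loses no generality here because $W_1$ is invariant under nonsingular affine transformations of the data.

```python
import random


def solve(a, b):
    """Solve the linear system a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def simulate_epmc(p, n1, n2, delta, reps=2000, seed=0):
    """Monte Carlo estimate of e_1(2|1) = Pr(W_1 <= 0 | x in Pi^(1)) for complete data."""
    rng = random.Random(seed)
    mu1 = [0.0] * p
    mu2 = [delta] + [0.0] * (p - 1)  # with Sigma = I, the Mahalanobis distance is delta
    errors = 0
    for _ in range(reps):
        x1 = [[rng.gauss(m, 1.0) for m in mu1] for _ in range(n1)]
        x2 = [[rng.gauss(m, 1.0) for m in mu2] for _ in range(n2)]
        m1 = [sum(row[c] for row in x1) / n1 for c in range(p)]
        m2 = [sum(row[c] for row in x2) / n2 for c in range(p)]
        # Pooled sample covariance matrix S with n1 + n2 - 2 degrees of freedom.
        s = [[0.0] * p for _ in range(p)]
        for rows, mean in ((x1, m1), (x2, m2)):
            for row in rows:
                dev = [row[c] - mean[c] for c in range(p)]
                for i in range(p):
                    for j in range(p):
                        s[i][j] += dev[i] * dev[j]
        s = [[value / (n1 + n2 - 2) for value in row] for row in s]
        diff = [m1[c] - m2[c] for c in range(p)]
        coef = solve(s, diff)  # S^{-1} (xbar1 - xbar2)
        x = [rng.gauss(m, 1.0) for m in mu1]  # new observation from Pi^(1)
        w1 = sum(coef[c] * (x[c] - 0.5 * (m1[c] + m2[c])) for c in range(p))
        errors += w1 <= 0  # misclassified into Pi^(2)
    return errors / reps
```

For example, `simulate_epmc(3, 10, 10, 1.0)` estimates $e_1(2|1)$ for $p = 3$ and ten observations per group; the value of $\Delta$ is an illustrative choice, not a setting taken from the paper.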

Corresponding author. Tel.: +81 3 3260 4271; fax: +81 3 3260 4293.

E-mail address: [email protected]

1 Research Fellow of the Japan Society for the Promotion of Science.

0378-3758/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2011.07.001

[Figure 1: schematic of the data layout for group $g$. Labels: cumulative sample sizes $N_{[q]}^{(g)} = N_1^{(g)} + \cdots + N_q^{(g)}$ $(q = 1, \ldots, k)$; cumulative dimensions $p_{[q]} = p_1 + \cdots + p_q$, with $p \equiv p_{[k]} = p_1 + \cdots + p_k$.]

Fig. 1. k-step monotone missing data. (The shaded portion denotes complete data.)
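The layout in Fig. 1 can be made concrete with a short sketch. The function name `monotone_pattern` and its arguments are hypothetical; the sketch only assumes the structure described in this paper, namely that the $q$th step of $N_q^{(g)}$ observations in a group records the first $p_{[k-q+1]}$ coordinates.

```python
def monotone_pattern(block_dims, step_sizes):
    """Return, per observation of one group, the number of leading coordinates observed.

    block_dims = [p_1, ..., p_k]; step_sizes = [N_1, ..., N_k] for that group.
    Observations in step q (1-based) record the first p_[k-q+1] coordinates.
    """
    k = len(block_dims)
    if len(step_sizes) != k:
        raise ValueError("block_dims and step_sizes must have the same length k")
    cumulative_dims = []  # cumulative_dims[t - 1] = p_[t] = p_1 + ... + p_t
    total = 0
    for dim in block_dims:
        total += dim
        cumulative_dims.append(total)
    observed = []
    for q, size in enumerate(step_sizes, start=1):
        observed.extend([cumulative_dims[k - q]] * size)
    return observed


def to_mask(observed, p):
    """Turn the counts into a boolean matrix: True where a coordinate is observed."""
    return [[col < count for col in range(p)] for count in observed]
```

With three blocks of dimension one and step sizes 2, 1, and 3, `monotone_pattern([1, 1, 1], [2, 1, 3])` gives `[3, 3, 2, 1, 1, 1]`: two complete observations, one with the last coordinate missing, and three with only the first coordinate.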

Although it is difficult to handle the exact distribution of $W_1$ under $x \in \Pi^{(g)}$, the linear discriminant function is asymptotically normal. Using this property, Okamoto (1963) derived an asymptotic expansion for the distribution of $W_1$ up to the second order. Okamoto's (1963) result can be regarded as an approximation for the probabilities of misclassification for a fixed $p$ and large sample sizes. On the other hand, Fujikoshi and Seo (1998) derived another asymptotic approximation for the EPMC, which is useful in a high-dimensional framework. In addition, Lachenbruch's (1968) approximation is suitable for both large-sample and high-dimensional frameworks. Wakaki (1994) investigated the linear discriminant function in elliptical populations. It is important to note that these results have been derived using only the shaded portion in Fig. 1.

In the last decade, these results have been extended to the case of a linear discriminant function with 2-step monotone missing training data. The authors assumed that the data set illustrated in Fig. 1 for $k = 2$ is observed, i.e., $p$-dimensional sample vectors $x_j^{(g)}$ $(j = 1, \ldots, N_1^{(g)},\ g = 1, 2)$ and $p_1$-dimensional sample vectors $x_{1j}^{(g)}$ $(j = N_1^{(g)} + 1, \ldots, N_1^{(g)} + N_2^{(g)},\ g = 1, 2)$, where $p > p_1$. In this case, Shutoh (2011) extended the result derived by Okamoto (1963) up to the first order. Kanda and Fujikoshi (2004) and Shutoh et al. (2011) considered a result similar to that of Lachenbruch (1968). Batsidis et al. (2006) obtained asymptotic expressions for the distribution functions of the probabilities of misclassification. Batsidis and Zografos (2006) investigated linear discrimination in elliptical distributions.

In this paper, we primarily deal with Lachenbruch's (1968) approximation. We provide an asymptotic approximation under k-step monotone missing training data, shown in Fig. 1, for general $k$.
It is important to note that this result generalizes the results derived by Lachenbruch (1968) and Shutoh et al. (2011).

The remainder of this paper is organized as follows. In Section 2, we review Lachenbruch's (1968) approximation and state the main result derived in this study. In Section 3, we present the estimates and their relations. In Section 4, we derive the results needed for the approximation, and in Section 5, we evaluate our results using Monte Carlo simulation for selected parameters. Finally, in Section 6, we conclude this paper and discuss the scope for future related studies. The required lemmas and the main proofs are provided in Appendices A and B, respectively.

2. Lachenbruch's approximation

2.1. Lachenbruch's approximation for complete data

In this subsection, we review the approximation derived by Lachenbruch (1968). First, we define the following statistics:

$$Z_1 = V_1^{-1/2} (\bar{x}^{(1)} - \bar{x}^{(2)})' S^{-1} (x - \mu^{(1)}),$$
$$U_1 = (\bar{x}^{(1)} - \bar{x}^{(2)})' S^{-1} (\bar{x}^{(1)} - \mu^{(1)}) - \tfrac{1}{2} D_1^2,$$
$$D_1^2 = (\bar{x}^{(1)} - \bar{x}^{(2)})' S^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)}),$$
$$V_1 = (\bar{x}^{(1)} - \bar{x}^{(2)})' S^{-1} \Sigma S^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)}).$$

Then, we can write $W_1 = V_1^{1/2} Z_1 - U_1$. The marginal distribution of $Z_1$ is the standard normal distribution, since the conditional distribution of $Z_1$ given $(\bar{x}^{(1)}, \bar{x}^{(2)}, S)$ is standard normal and does not depend on the given statistics. On this basis, Lachenbruch (1968) proposed the asymptotic approximation stated in Theorem 1. Several asymptotic approximations have been considered under a large-sample framework

$$\mathrm{C1}: \ N_1^{(g)} \to \infty \ (g = 1, 2), \quad N_1^{(2)}/N_1^{(1)} \to \text{positive constant}$$

and a high-dimensional framework

$$\mathrm{C2}: \ N_1^{(g)} \to \infty \ (g = 1, 2), \quad \frac{p}{n_1} \to r_1 \in (0, 1), \quad \Delta^2 = O(1),$$


where $n_1 = N_1^{(1)} + N_1^{(2)} - 2$ and $\Delta^2 = (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)})$. Under the second framework C2, we have $p \to \infty$ and $m_1 \equiv n_1 - p = n_1\{1 - (p/n_1)\} \to \infty$. Lachenbruch (1968) proposed an approximation only under C1; Fujikoshi and Seo (1998) later justified his approximation under C2. In addition, Okamoto (1963) proposed another approximation under C1, whereas Fujikoshi and Seo (1998) proposed an approximation under the high-dimensional framework C2. Several useful reviews on studies based on asymptotic theory under C1 and C2 have been provided by Fujikoshi et al. (2010).

Theorem 1 (Lachenbruch, 1968). Under the asymptotic frameworks C1 and C2, an asymptotic approximation for $e_1(2|1)$ can be obtained as

$$e_1(2|1) \simeq \Phi\left(E(U_1)\{E(V_1)\}^{-1/2}\right),$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution,

$$E(U_1) = -\frac{n_1}{2(m_1 - 1)} u_1, \qquad u_1 = \Delta^2 + \frac{p(N_1^{(1)} - N_1^{(2)})}{N_1^{(1)} N_1^{(2)}},$$

$$E(V_1) = \frac{n_1^2 (n_1 - 1)}{m_1 (m_1 - 1)(m_1 - 3)} v_1, \qquad v_1 = \Delta^2 + \frac{p(N_1^{(1)} + N_1^{(2)})}{N_1^{(1)} N_1^{(2)}},$$

for $m_1 - 3 > 0$. Similarly, an asymptotic approximation for $e_1(1|2)$ can be obtained by interchanging $N_1^{(1)}$ and $N_1^{(2)}$ in $E(U_1)$.

2.2. Lachenbruch's approximation for k-step monotone missing data

We consider the asymptotic approximation stated in Section 2.1 in the case of k-step monotone missing training data. From Fig. 1, a k-step monotone missing data set consists of the following sample vectors from $\Pi^{(g)}$:

$$
\begin{pmatrix} x_{11}^{(g)} \\ x_{21}^{(g)} \\ \vdots \\ x_{k-1,1}^{(g)} \\ x_{k1}^{(g)} \end{pmatrix}, \ldots,
\begin{pmatrix} x_{1N_1^{(g)}}^{(g)} \\ x_{2N_1^{(g)}}^{(g)} \\ \vdots \\ x_{k-1,N_1^{(g)}}^{(g)} \\ x_{kN_1^{(g)}}^{(g)} \end{pmatrix},
\begin{pmatrix} x_{1,N_1^{(g)}+1}^{(g)} \\ x_{2,N_1^{(g)}+1}^{(g)} \\ \vdots \\ x_{k-1,N_1^{(g)}+1}^{(g)} \end{pmatrix}, \ldots,
\begin{pmatrix} x_{1,N_{[2]}^{(g)}}^{(g)} \\ x_{2,N_{[2]}^{(g)}}^{(g)} \\ \vdots \\ x_{k-1,N_{[2]}^{(g)}}^{(g)} \end{pmatrix}, \ldots,
\begin{pmatrix} x_{1,N_{[k-2]}^{(g)}+1}^{(g)} \\ x_{2,N_{[k-2]}^{(g)}+1}^{(g)} \end{pmatrix}, \ldots,
\begin{pmatrix} x_{1,N_{[k-1]}^{(g)}}^{(g)} \\ x_{2,N_{[k-1]}^{(g)}}^{(g)} \end{pmatrix},
x_{1,N_{[k-1]}^{(g)}+1}^{(g)}, \ldots, x_{1,N_{[k]}^{(g)}}^{(g)},
\tag{1}
$$

where $p \equiv p_1 + \cdots + p_k$, $N_{[q]}^{(g)} = N_1^{(g)} + \cdots + N_q^{(g)}$, and $x_{k-q+1,j}^{(g)}$ denotes a $p_{k-q+1}$-dimensional sample vector from $\Pi^{(g)}$ for $q = 1, \ldots, k$, $j = 1, \ldots, N_{[q]}^{(g)}$, and $g = 1, 2$. Summarizing the sample vectors listed in (1), we assume that

$$x_{(k-q+1)j}^{(g)} \sim N_{p_{[k-q+1]}}\left(\mu_{(k-q+1)}^{(g)}, \Sigma_{(k-q+1)}\right) \quad (q = 1, \ldots, k,\ j = N_{[q-1]}^{(g)} + 1, \ldots, N_{[q]}^{(g)})$$

can be observed, where $p_{[k-q+1]} = \sum_{i=1}^{k-q+1} p_i$,

$$
x_{(k-q+1)j}^{(g)} = \begin{pmatrix} x_{1j}^{(g)} \\ \vdots \\ x_{k-q+1,j}^{(g)} \end{pmatrix}, \quad
\mu_{(k-q+1)}^{(g)} = \begin{pmatrix} \mu_1^{(g)} \\ \vdots \\ \mu_{k-q+1}^{(g)} \end{pmatrix}, \quad
\Sigma_{(k-q+1)} = \begin{pmatrix} \Sigma_{11} & \cdots & \Sigma_{1,k-q+1} \\ \vdots & \ddots & \vdots \\ \Sigma_{k-q+1,1} & \cdots & \Sigma_{k-q+1,k-q+1} \end{pmatrix},
$$

$\mu_a^{(g)}$ is a $p_a$-dimensional partitioned vector of $\mu^{(g)}$, and $\Sigma_{ab}$ is a $p_a \times p_b$ partitioned matrix of $\Sigma$ for $a = 1, \ldots, k$ and $b = 1, \ldots, k$. In other words, we have the decomposition of $\mu^{(g)}$ and $\Sigma$ as

$$
\mu^{(g)} = \begin{pmatrix} \mu_1^{(g)} \\ \vdots \\ \mu_k^{(g)} \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \cdots & \Sigma_{1k} \\ \vdots & \ddots & \vdots \\ \Sigma_{k1} & \cdots & \Sigma_{kk} \end{pmatrix}.
$$

It is important to note that $x_j^{(g)} \equiv x_{(k)j}^{(g)}$, $p \equiv p_{[k]}$, and $N_{[0]}^{(g)} \equiv 0$. We discuss the linear discriminant function

$$W_k = (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})' \hat{\Sigma}^{-1} \left[x - \tfrac{1}{2}(\hat{\mu}^{(1)} + \hat{\mu}^{(2)})\right],$$

where $\hat{\mu}^{(g)}$ and $\hat{\Sigma}$ are the estimates for $\ell = k$, obtained in Section 3. Thus, the probabilities of misclassification are expressed as

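$$e_k(2|1) = \Pr(W_k \le 0 \mid x \in \Pi^{(1)}), \qquad e_k(1|2) = \Pr(W_k > 0 \mid x \in \Pi^{(2)}).$$

Then, we can consider an approximation for $e_k(2|1)$ as

$$e_k(2|1) \simeq \Phi\left(E(U_k)\{E(V_k)\}^{-1/2}\right),$$

where, with the estimates $\hat{\mu}^{(g)}$ and $\hat{\Sigma}$ for $\ell = k$ obtained in Section 3,

$$U_k = (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})' \hat{\Sigma}^{-1} (\hat{\mu}^{(1)} - \mu^{(1)}) - \tfrac{1}{2} D_k^2,$$
$$D_k^2 = (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})' \hat{\Sigma}^{-1} (\hat{\mu}^{(1)} - \hat{\mu}^{(2)}),$$
$$V_k = (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})' \hat{\Sigma}^{-1} \Sigma \hat{\Sigma}^{-1} (\hat{\mu}^{(1)} - \hat{\mu}^{(2)}),$$

under a large-sample framework for k-step monotone missing data,

$$\mathrm{C3}: \ N_q^{(g)} \to \infty \ (g = 1, 2),$$

and a high-dimensional framework for k-step monotone missing data,

$$\mathrm{C4}: \ N_{[q]}^{(g)} \to \infty \ (g = 1, 2), \quad \frac{p_{[k-q+1]}}{n_{[q]}} \to r_{q,k} \in (0, 1), \quad \Delta^2 = O(1),$$

where $n_{[q]} = \sum_{i=1}^{q} n_i$ and $n_q = N_q^{(1)} + N_q^{(2)} - 2$ for $q = 1, \ldots, k$. Under the fourth framework C4, we have $p_{[k-q+1]} \to \infty$ and $m_{[q,k-q+1]} \equiv n_{[q]} - p_{[k-q+1]} = n_{[q]}\{1 - (p_{[k-q+1]}/n_{[q]})\} \to \infty$. The main result of this study is derived in Sections 3 and 4. In particular, the expectations are presented in Proposition 4.

Theorem 2. Under the asymptotic frameworks C3 and C4, an asymptotic approximation for $e_k(2|1)$ can be obtained as

$$e_k(2|1) \simeq \Phi\left(E(U_k)\{E(V_k)\}^{-1/2}\right),$$

where

$$E(U_k) = E(U_1) - \sum_{\ell=2}^{k} \left\{ \frac{n_{[\ell]}}{2(m_{[\ell,k-\ell+1]} - 1)} u_{\ell,k} - \frac{n_{[\ell-1]}}{2(m_{[\ell-1,k-\ell+1]} - 1)} u_{\ell,k}^{*} \right\},$$

$$
\begin{aligned}
E(V_k) = E(V_1) &+ \sum_{\ell=2}^{k} \left\{ \frac{n_{[\ell]}^2 (n_{[\ell]} - 1)}{m_{[\ell,k-\ell+1]} (m_{[\ell,k-\ell+1]} - 1)(m_{[\ell,k-\ell+1]} - 3)} v_{\ell,k} - \frac{n_{[\ell-1]}^2 (n_{[\ell-1]} - 1)}{m_{[\ell-1,k-\ell+1]} (m_{[\ell-1,k-\ell+1]} - 1)(m_{[\ell-1,k-\ell+1]} - 3)} v_{\ell,k}^{*} \right\} \\
&+ 2 \sum_{\ell=2}^{k} \sum_{i=k-\ell+2}^{k} \frac{n_{[k-i+1]}\, p_i}{(m_{[k-i+1,i-1]} - 1)(m_{[k-i+1,i]} - 1)} \\
&\qquad \times \left\{ \frac{n_{[\ell]} (n_{[\ell]} - 1)}{m_{[\ell,k-\ell+1]} (m_{[\ell,k-\ell+1]} - 3)} v_{\ell,k} - \frac{n_{[\ell-1]} (n_{[\ell-1]} - 1)}{m_{[\ell-1,k-\ell+1]} (m_{[\ell-1,k-\ell+1]} - 3)} v_{\ell,k}^{*} \right\},
\end{aligned}
$$

where $m_{[s,t]} = n_{[s]} - p_{[t]}$,

$$u_{\ell,k} = \delta_{k-\ell+1}^2 + \frac{p_{[k-\ell+1]} (N_{[\ell]}^{(1)} - N_{[\ell]}^{(2)})}{N_{[\ell]}^{(1)} N_{[\ell]}^{(2)}}, \qquad
v_{\ell,k} = \delta_{k-\ell+1}^2 + \frac{p_{[k-\ell+1]} (N_{[\ell]}^{(1)} + N_{[\ell]}^{(2)})}{N_{[\ell]}^{(1)} N_{[\ell]}^{(2)}},$$

$$u_{\ell,k}^{*} = \delta_{k-\ell+1}^2 + \frac{p_{[k-\ell+1]} (N_{[\ell-1]}^{(1)} - N_{[\ell-1]}^{(2)})}{N_{[\ell-1]}^{(1)} N_{[\ell-1]}^{(2)}}, \qquad
v_{\ell,k}^{*} = \delta_{k-\ell+1}^2 + \frac{p_{[k-\ell+1]} (N_{[\ell-1]}^{(1)} + N_{[\ell-1]}^{(2)})}{N_{[\ell-1]}^{(1)} N_{[\ell-1]}^{(2)}},$$

$$\delta_{k-\ell+1}^2 = \delta_{(k-\ell+1)}' \Sigma_{(k-\ell+1)}^{-1} \delta_{(k-\ell+1)}, \qquad \delta_{(k-\ell+1)} = \mu_{(k-\ell+1)}^{(1)} - \mu_{(k-\ell+1)}^{(2)},$$

for $m_1 - 3 > 0$. Similarly, the approximation for $e_k(1|2)$ can be obtained by interchanging $N_i^{(1)}$ and $N_i^{(2)}$ for $i = 1, \ldots, k$ in $E(U_k)$.

Corollary 3. For $k = 2$, Theorem 2 coincides with the result derived by Shutoh et al. (2011).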
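The formulas of Theorems 1 and 2 can be evaluated directly once the population quantities are given. The sketch below is one possible transcription, offered under assumptions rather than as the author's code: the function name `epmc_approximation` and its argument names are hypothetical, the squared distances $\delta_1^2, \ldots, \delta_k^2$ (with $\delta_k^2 = \Delta^2$) are supplied by the caller, and the expressions follow $E(U_k)$ and $E(V_k)$ as stated in this section, using $u_{\ell,k}^{*}$ and $v_{\ell,k}^{*}$ as the same expressions with $N_{[\ell-1]}^{(g)}$ in place of $N_{[\ell]}^{(g)}$. For $k = 1$ it reduces to Theorem 1.

```python
from itertools import accumulate
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def epmc_approximation(block_dims, sizes1, sizes2, delta2):
    """Approximate e_k(2|1) by Phi(E(U_k) / sqrt(E(V_k))).

    block_dims = [p_1, ..., p_k]; sizes1, sizes2 = [N_1^(g), ..., N_k^(g)] for g = 1, 2;
    delta2[t - 1] = squared Mahalanobis distance on the first p_[t] coordinates.
    """
    k = len(block_dims)
    p = [0] + list(accumulate(block_dims))   # p[t] = p_[t]
    big1 = [0] + list(accumulate(sizes1))    # big1[q] = N_[q]^(1)
    big2 = [0] + list(accumulate(sizes2))    # big2[q] = N_[q]^(2)
    n = [0] + list(accumulate(x + y - 2 for x, y in zip(sizes1, sizes2)))  # n[q] = n_[q]

    def m(s, t):
        return n[s] - p[t]                   # m_[s,t] = n_[s] - p_[t]

    def u(s, t):
        return delta2[t - 1] + p[t] * (big1[s] - big2[s]) / (big1[s] * big2[s])

    def v(s, t):
        return delta2[t - 1] + p[t] * (big1[s] + big2[s]) / (big1[s] * big2[s])

    def a(s, t):
        return n[s] / (2 * (m(s, t) - 1))

    def b(s, t):
        return n[s] ** 2 * (n[s] - 1) / (m(s, t) * (m(s, t) - 1) * (m(s, t) - 3))

    def c(s, t):
        return n[s] * (n[s] - 1) / (m(s, t) * (m(s, t) - 3))

    # Complete-data part (Theorem 1).
    e_u = -a(1, k) * u(1, k)
    e_v = b(1, k) * v(1, k)
    # Corrections for steps 2, ..., k (Theorem 2); the starred terms use index l - 1.
    for l in range(2, k + 1):
        t = k - l + 1
        e_u += -a(l, t) * u(l, t) + a(l - 1, t) * u(l - 1, t)
        e_v += b(l, t) * v(l, t) - b(l - 1, t) * v(l - 1, t)
        weight = sum(
            n[k - i + 1] * block_dims[i - 1]
            / ((m(k - i + 1, i - 1) - 1) * (m(k - i + 1, i) - 1))
            for i in range(t + 1, k + 1)
        )
        e_v += 2 * weight * (c(l, t) * v(l, t) - c(l - 1, t) * v(l - 1, t))
    return normal_cdf(e_u / sqrt(e_v))
```

As a usage example with illustrative (not published) parameter values, `epmc_approximation([3], [10], [10], [1.0])` evaluates Theorem 1 for $p = 3$, and `epmc_approximation([2, 1], [10, 10], [10, 10], [0.8, 1.0])` evaluates the 2-step case in which ten additional observations per group record only the first two coordinates.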

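3. Estimates for k-step monotone missing data

We consider the estimates of $\mu^{(g)}$ and $\Sigma$ on the basis of $N_q^{(g)}$ $(q = 1, \ldots, \ell)$ sample vectors, and the same on the basis of $N_q^{(g)}$ $(q = 1, \ldots, \ell - 1)$ sample vectors, for $\ell = 2, \ldots, k$. As in the case of Kanda and Fujikoshi (1998), we can obtain the estimates of $\mu^{(g)}$ and $\Psi$, where

$$
\mu^{(g)} = \begin{pmatrix} \mu_{(k-\ell+1)}^{(g)} \\ \mu_{k-\ell+2}^{(g)} \\ \vdots \\ \mu_k^{(g)} \end{pmatrix}, \quad
\Psi = \begin{pmatrix} \Psi_{11} & \cdots & \Psi_{1k} \\ \vdots & \ddots & \vdots \\ \Psi_{k1} & \cdots & \Psi_{kk} \end{pmatrix}, \quad
\Psi_{(k-\ell+1)} = \begin{pmatrix} \Psi_{11} & \cdots & \Psi_{1,k-\ell+1} \\ \vdots & \ddots & \vdots \\ \Psi_{k-\ell+1,1} & \cdots & \Psi_{k-\ell+1,k-\ell+1} \end{pmatrix} = \Sigma_{(k-\ell+1)},
$$

$$\Psi_{ii} = \Sigma_{ii} - \Sigma_{i(i-1)} \Sigma_{(i-1)}^{-1} \Sigma_{(i-1)i} \equiv \Sigma_{ii \cdot (1 \ldots i-1)}, \quad
\Psi_{i(i-1)} = \Sigma_{i(i-1)} \Sigma_{(i-1)}^{-1}, \quad
\Sigma_{i(i-1)} = \Sigma_{(i-1)i}' = (\Sigma_{i1} \ \ldots \ \Sigma_{i,i-1}),$$

for fixed $\ell$ and $i = k - \ell + 2, \ldots, k$. On the basis of the sample vectors, we can obtain the maximum likelihood estimates (MLEs) of $\{\mu_{(k-\ell+1)}^{(g)}, \mu_i^{(g)}, \Psi_{(k-\ell+1)}, \Psi_{ii}, \Psi_{i(i-1)}\}$ up to the $\ell$th step. Instead of the MLEs, we use simpler estimates of $\{\mu_{(k-\ell+1)}^{(g)}, \mu_i^{(g)}, \Psi_{(k-\ell+1)}, \Psi_{ii}, \Psi_{i(i-1)}\}$:

$$\hat{\mu}_{(k-\ell+1)}^{(g)} = \bar{x}_{(k-\ell+1)}^{[g,\ell]}, \quad
\hat{\mu}_i^{(g)} = \bar{x}_i^{[g,k-i+1]} - \hat{\Psi}_{i(i-1)} \left(\bar{x}_{(i-1)}^{[g,k-i+1]} - \hat{\mu}_{(i-1)}^{(g)}\right), \quad
\hat{\mu}_{(i-1)}^{(g)\prime} = \left(\hat{\mu}_1^{(g)\prime}, \ldots, \hat{\mu}_{i-1}^{(g)\prime}\right),$$

$$\hat{\Psi}_{(k-\ell+1)} = \frac{1}{n_{[\ell]}} G_{(k-\ell+1)}^{[\ell]}, \quad
\hat{\Psi}_{ii} = \frac{1}{n_{[k-i+1]}} G_{ii \cdot (1 \ldots i-1)}^{[k-i+1]}, \quad
\hat{\Psi}_{i(i-1)} = G_{i(i-1)}^{[k-i+1]} \left\{G_{(i-1)}^{[k-i+1]}\right\}^{-1},$$

$$G_{ii \cdot (1 \ldots i-1)}^{[k-i+1]} = G_{ii}^{[k-i+1]} - G_{i(i-1)}^{[k-i+1]} \left\{G_{(i-1)}^{[k-i+1]}\right\}^{-1} G_{(i-1)i}^{[k-i+1]}.$$

Furthermore, on the basis of the sample vectors, similar estimates up to the $(\ell - 1)$th step can be obtained as

$$\tilde{\mu}_{(k-\ell+1)}^{(g)} = \bar{x}_{(k-\ell+1)}^{[g,\ell-1]}, \quad
\tilde{\mu}_i^{(g)} = \bar{x}_i^{[g,k-i+1]} - \tilde{\Psi}_{i(i-1)} \left(\bar{x}_{(i-1)}^{[g,k-i+1]} - \tilde{\mu}_{(i-1)}^{(g)}\right), \quad
\tilde{\mu}_{(i-1)}^{(g)\prime} = \left(\tilde{\mu}_1^{(g)\prime}, \ldots, \tilde{\mu}_{i-1}^{(g)\prime}\right),$$

$$\tilde{\Psi}_{(k-\ell+1)} = \frac{1}{n_{[\ell-1]}} G_{(k-\ell+1)}^{[\ell-1]}, \quad
\tilde{\Psi}_{ii} = \hat{\Psi}_{ii}, \quad
\tilde{\Psi}_{i(i-1)} = \hat{\Psi}_{i(i-1)},$$

where the $p_{[k-q+1]}$-dimensional sample mean vector based on the data set illustrated in Fig. 2(a) for $q = 1, \ldots, \ell$ and $s = 1, \ldots, q$ is given by

$$\bar{x}_{(k-q+1)}^{(g,s)} = \frac{1}{N_s^{(g)}} \sum_{j = N_{[s-1]}^{(g)} + 1}^{N_{[s]}^{(g)}} x_{(k-q+1)j}^{(g)},$$

the $p_{[k-q+1]}$-dimensional sample mean vector based on the data set illustrated in Fig. 2(b) for $q = 1, \ldots, \ell$ and $s = 1, \ldots, q$ is given by

$$\bar{x}_{(k-q+1)}^{[g,s]} = \frac{1}{N_{[s]}^{(g)}} \sum_{j=1}^{N_{[s]}^{(g)}} x_{(k-q+1)j}^{(g)},$$

and the $p_{[i]}$-dimensional sample mean vector based on the data set illustrated in Fig. 2(c) and (d) for $i = k - \ell + 2, \ldots, k$ is given by

$$
\begin{pmatrix} \bar{x}_{(i-1)}^{[g,k-i+1]} \\ \bar{x}_i^{[g,k-i+1]} \end{pmatrix}
= \frac{1}{N_{[k-i+1]}^{(g)}} \sum_{j=1}^{N_{[k-i+1]}^{(g)}} x_{(i)j}^{(g)}
= \frac{1}{N_{[k-i+1]}^{(g)}} \sum_{j=1}^{N_{[k-i+1]}^{(g)}} \begin{pmatrix} x_{1j}^{(g)} \\ \vdots \\ x_{i-1,j}^{(g)} \\ x_{ij}^{(g)} \end{pmatrix},
$$

where $\bar{x}_{(i-1)}^{[g,k-i+1]}$ is a $p_{[i-1]}$-dimensional partitioned vector and $\bar{x}_i^{[g,k-i+1]}$ is a $p_i$-dimensional partitioned vector. Moreover,

$$G^{(q)} = n_q S^{(q)}, \qquad
S^{(q)} = \frac{1}{n_q} \sum_{g=1}^{2} \sum_{j = N_{[q-1]}^{(g)} + 1}^{N_{[q]}^{(g)}} \left(x_{(k-q+1)j}^{(g)} - \bar{x}_{(k-q+1)}^{(g,q)}\right) \left(x_{(k-q+1)j}^{(g)} - \bar{x}_{(k-q+1)}^{(g,q)}\right)',$$

$$
G_{(k-\ell+1)}^{[\ell]} = \begin{pmatrix} G_{11}^{[\ell]} & \cdots & G_{1,k-\ell+1}^{[\ell]} \\ \vdots & \ddots & \vdots \\ G_{k-\ell+1,1}^{[\ell]} & \cdots & G_{k-\ell+1,k-\ell+1}^{[\ell]} \end{pmatrix}, \quad
G_{(k-\ell+1)}^{[\ell-1]} = \begin{pmatrix} G_{11}^{[\ell-1]} & \cdots & G_{1,k-\ell+1}^{[\ell-1]} \\ \vdots & \ddots & \vdots \\ G_{k-\ell+1,1}^{[\ell-1]} & \cdots & G_{k-\ell+1,k-\ell+1}^{[\ell-1]} \end{pmatrix},
$$

$$
G_{(i-1)}^{[k-i+1]} = \begin{pmatrix} G_{11}^{[k-i+1]} & \cdots & G_{1,i-1}^{[k-i+1]} \\ \vdots & \ddots & \vdots \\ G_{i-1,1}^{[k-i+1]} & \cdots & G_{i-1,i-1}^{[k-i+1]} \end{pmatrix},
$$

$$G_{ab}^{[q]} = G_{ab}^{(1)} + \cdots + G_{ab}^{(q)}, \qquad
G_{i(i-1)}^{[k-i+1]} = \left\{G_{(i-1)i}^{[k-i+1]}\right\}' = \left(G_{i1}^{[k-i+1]} \ \ldots \ G_{i,i-1}^{[k-i+1]}\right),$$

and $G_{ab}^{(q)}$ is the $p_a \times p_b$ partitioned matrix of $G^{(q)}$ for $q = 1, \ldots, \ell$, $a = 1, \ldots, k - q + 1$, and $b = 1, \ldots, k - q + 1$.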

[Figure 2: four schematic panels (a)-(d) of the data layout, with sample-size labels $N_1^{(g)}, N_2^{(g)}, \ldots, N_q^{(g)}$ or $N_{k-i+1}^{(g)}$ and dimension labels $p_{[k-q+1]}$, $p_{[i-1]}$, and $p_i$.]

Fig. 2. Data set that constructs the sample vectors. (a) part that constructs $\bar{x}_{(k-q+1)}^{(g,s)}$ for $q = 1, \ldots, \ell$ and $s = 1, \ldots, q$; (b) part that constructs $\bar{x}_{(k-q+1)}^{[g,s]}$ for $q = 1, \ldots, \ell$ and $s = 1, \ldots, q$; (c) part that constructs $\bar{x}_{(i-1)}^{[g,k-i+1]}$ for $i = k - \ell + 2, \ldots, k$ $(k - i + 1 = 1, \ldots, \ell - 1)$; and (d) part that constructs $\bar{x}_i^{[g,k-i+1]}$ for $i = k - \ell + 2, \ldots, k$ $(k - i + 1 = 1, \ldots, \ell - 1)$.

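In other words, on the basis of the sample vectors, the estimates of $\mu^{(g)}$ and $\Sigma$ up to the $\ell$th step can be obtained as

$$
\hat{\mu}^{(g)} = \begin{pmatrix} \hat{\mu}_{(k-\ell+1)}^{(g)} \\ \hat{\mu}_{k-\ell+2}^{(g)} \\ \vdots \\ \hat{\mu}_k^{(g)} \end{pmatrix}, \quad
\hat{\Sigma} = \begin{pmatrix} \hat{\Sigma}_{11} & \cdots & \hat{\Sigma}_{1k} \\ \vdots & \ddots & \vdots \\ \hat{\Sigma}_{k1} & \cdots & \hat{\Sigma}_{kk} \end{pmatrix}, \quad
\hat{\Sigma}_{(k-\ell+1)} = \begin{pmatrix} \hat{\Sigma}_{11} & \cdots & \hat{\Sigma}_{1,k-\ell+1} \\ \vdots & \ddots & \vdots \\ \hat{\Sigma}_{k-\ell+1,1} & \cdots & \hat{\Sigma}_{k-\ell+1,k-\ell+1} \end{pmatrix} = \hat{\Psi}_{(k-\ell+1)},
$$

$$\hat{\Sigma}_{ii} = \hat{\Psi}_{ii} + \hat{\Psi}_{i(i-1)} \hat{\Sigma}_{(i-1)} \hat{\Psi}_{(i-1)i}, \qquad
\hat{\Sigma}_{i(i-1)} = \hat{\Sigma}_{(i-1)i}' = \hat{\Psi}_{i(i-1)} \hat{\Sigma}_{(i-1)},$$

for $i = k - \ell + 2, \ldots, k$. Similarly, $\tilde{\mu}^{(g)}$ and $\tilde{\Sigma}$ up to the $(\ell - 1)$th step can be obtained because the following relations hold:

$$\Sigma_{(k-\ell+1)} = \Psi_{(k-\ell+1)}, \qquad
\Sigma_{ii} = \Psi_{ii} + \Psi_{i(i-1)} \Sigma_{(i-1)} \Psi_{(i-1)i}, \qquad
\Sigma_{i(i-1)} = \Sigma_{(i-1)i}' = \Psi_{i(i-1)} \Sigma_{(i-1)}.$$

Kanda and Fujikoshi (1998) obtained MLEs based on k-step monotone missing data for one group. The same for two groups can be obtained by substituting $\ell = k$,

$$n_{[\ell]} = N_{[\ell]}^{(1)} + N_{[\ell]}^{(2)}, \qquad n_{[k-i+1]} = N_{[k-i+1]}^{(1)} + N_{[k-i+1]}^{(2)},$$

$$G^{(q)} = n_q S^{(q)} + \sum_{g=1}^{2} \frac{N_q^{(g)} N_{[q-1]}^{(g)}}{N_{[q]}^{(g)}} \left(\bar{x}_{(k-q+1)}^{(g,q)} - \bar{x}_{(k-q+1)}^{[g,q-1]}\right) \left(\bar{x}_{(k-q+1)}^{(g,q)} - \bar{x}_{(k-q+1)}^{[g,q-1]}\right)'$$

in $\hat{\mu}^{(g)}$ and $\hat{\Sigma}$ for $q = 2, \ldots, k$.

4. Desired results for deriving the approximation

The main objective of this study is to obtain the approximation for the EPMC. Hence, we derive the relations between

$$U_\ell = (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})' \hat{\Sigma}^{-1} (\hat{\mu}^{(1)} - \mu^{(1)}) - \tfrac{1}{2} D_\ell^2,$$
$$D_\ell^2 = (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})' \hat{\Sigma}^{-1} (\hat{\mu}^{(1)} - \hat{\mu}^{(2)}),$$
$$V_\ell = (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})' \hat{\Sigma}^{-1} \Sigma \hat{\Sigma}^{-1} (\hat{\mu}^{(1)} - \hat{\mu}^{(2)})$$

and

$$U_{\ell-1} = (\tilde{\mu}^{(1)} - \tilde{\mu}^{(2)})' \tilde{\Sigma}^{-1} (\tilde{\mu}^{(1)} - \mu^{(1)}) - \tfrac{1}{2} D_{\ell-1}^2,$$
$$D_{\ell-1}^2 = (\tilde{\mu}^{(1)} - \tilde{\mu}^{(2)})' \tilde{\Sigma}^{-1} (\tilde{\mu}^{(1)} - \tilde{\mu}^{(2)}),$$
$$V_{\ell-1} = (\tilde{\mu}^{(1)} - \tilde{\mu}^{(2)})' \tilde{\Sigma}^{-1} \Sigma \tilde{\Sigma}^{-1} (\tilde{\mu}^{(1)} - \tilde{\mu}^{(2)})$$

for fixed $\ell$ $(\ell = 2, \ldots, k)$. We define

$$
\hat{C}_{\ell,k} = \begin{pmatrix}
I_{p_{[k-\ell+1]}} & & & 0 \\
-\hat{\Psi}_{k-\ell+2(k-\ell+1)} & I_{p_{k-\ell+2}} & & \\
\vdots & & \ddots & \\
-\hat{\Psi}_{k(k-\ell+1)} & \cdots & -\hat{\Psi}_{k,k-1} & I_{p_k}
\end{pmatrix}
$$

and

$$
\hat{H}_{\ell,k} = \begin{pmatrix}
\hat{\Psi}_{(k-\ell+1)}^{-1} & & & 0 \\
& \hat{\Psi}_{k-\ell+2,k-\ell+2}^{-1} & & \\
& & \ddots & \\
0 & & & \hat{\Psi}_{kk}^{-1}
\end{pmatrix}.
$$

Then, the following holds: $\hat{\Sigma} = \hat{C}_{\ell,k}^{-1} \hat{H}_{\ell,k}^{-1} (\hat{C}_{\ell,k}^{-1})'$, and, by the result stated in Corollary 8,

$$
\hat{C}_{\ell,k} (\hat{\mu}^{(1)} - \hat{\mu}^{(2)}) = \begin{pmatrix} d_{(k-\ell+1)}^{[\ell]} \\ \hat{a}_{k-\ell+2,k} \\ \vdots \\ \hat{a}_{k,k} \end{pmatrix}, \qquad
\hat{C}_{\ell,k} (\hat{\mu}^{(1)} - \mu^{(1)}) = \begin{pmatrix} e_{(k-\ell+1)}^{[\ell]} \\ \hat{b}_{k-\ell+2,k} \\ \vdots \\ \hat{b}_{k,k} \end{pmatrix},
$$

where

$$d_{(k-\ell+1)}^{[\ell]} = \bar{x}_{(k-\ell+1)}^{[1,\ell]} - \bar{x}_{(k-\ell+1)}^{[2,\ell]}, \qquad
e_{(k-\ell+1)}^{[\ell]} = \bar{x}_{(k-\ell+1)}^{[1,\ell]} - \mu_{(k-\ell+1)}^{(1)},$$

$$\hat{a}_{i,k} = \left(\bar{x}_i^{[1,k-i+1]} - \bar{x}_i^{[2,k-i+1]}\right) - \hat{\Psi}_{i(i-1)} \left(\bar{x}_{(i-1)}^{[1,k-i+1]} - \bar{x}_{(i-1)}^{[2,k-i+1]}\right),$$

$$\hat{b}_{i,k} = \left(\bar{x}_i^{[1,k-i+1]} - \mu_i^{(1)}\right) - \hat{\Psi}_{i(i-1)} \left(\bar{x}_{(i-1)}^{[1,k-i+1]} - \mu_{(i-1)}^{(1)}\right),$$

for $i = k - \ell + 2, \ldots, k$. Thus, we can obtain $U_\ell$ and $V_\ell$ as

$$
U_\ell = d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} e_{(k-\ell+1)}^{[\ell]} - \frac{1}{2} d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} d_{(k-\ell+1)}^{[\ell]}
+ \sum_{i=k-\ell+2}^{k} \left( \hat{a}_{i,k}' \hat{\Psi}_{ii}^{-1} \hat{b}_{i,k} - \frac{1}{2} \hat{a}_{i,k}' \hat{\Psi}_{ii}^{-1} \hat{a}_{i,k} \right)
\tag{2}
$$

and

$$
\begin{aligned}
V_\ell = {} & d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} \Sigma_{(k-\ell+1)} \hat{\Psi}_{(k-\ell+1)}^{-1} d_{(k-\ell+1)}^{[\ell]} \\
& + 2 \sum_{i=k-\ell+2}^{k} d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} \left( \Sigma_{(k-\ell+1)i} - \Sigma_{(k-\ell+1)(i-1)} \hat{\Psi}_{(i-1)i} \right) \hat{\Psi}_{ii}^{-1} \hat{a}_{i,k} \\
& + \sum_{i=k-\ell+2}^{k} \sum_{j=k-\ell+2}^{k} \hat{a}_{i,k}' \hat{\Psi}_{ii}^{-1} \left( \hat{\Omega}_{ij} - \hat{\Omega}_{i(j-1)} \hat{\Psi}_{(j-1)j} \right) \hat{\Psi}_{jj}^{-1} \hat{a}_{j,k},
\end{aligned}
\tag{3}
$$

where

$$\hat{\Omega}_{i(j)} = \Sigma_{i(j)} - \hat{\Psi}_{i(i-1)} \Sigma_{(i-1)(j)}, \qquad \hat{\Omega}_{ij} = \Sigma_{ij} - \hat{\Psi}_{i(i-1)} \Sigma_{(i-1)j}.$$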

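Similarly, $U_{\ell-1}$ and $V_{\ell-1}$ can be obtained. Using (2), (3), $\hat{\Psi}_{ii} = \tilde{\Psi}_{ii}$, $\hat{\Psi}_{i(i-1)} = \tilde{\Psi}_{i(i-1)}$, and the result stated in Corollary 8, we can obtain the following relations:

$$
\begin{aligned}
U_\ell - U_{\ell-1} = {} & d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} e_{(k-\ell+1)}^{[\ell]} - d_{(k-\ell+1)}^{[\ell-1]\prime} \tilde{\Psi}_{(k-\ell+1)}^{-1} e_{(k-\ell+1)}^{[\ell-1]} \\
& - \frac{1}{2} d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} d_{(k-\ell+1)}^{[\ell]} + \frac{1}{2} d_{(k-\ell+1)}^{[\ell-1]\prime} \tilde{\Psi}_{(k-\ell+1)}^{-1} d_{(k-\ell+1)}^{[\ell-1]},
\end{aligned}
\tag{4}
$$

$$
\begin{aligned}
V_\ell - V_{\ell-1} = {} & d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} \Sigma_{(k-\ell+1)} \hat{\Psi}_{(k-\ell+1)}^{-1} d_{(k-\ell+1)}^{[\ell]} - d_{(k-\ell+1)}^{[\ell-1]\prime} \tilde{\Psi}_{(k-\ell+1)}^{-1} \Sigma_{(k-\ell+1)} \tilde{\Psi}_{(k-\ell+1)}^{-1} d_{(k-\ell+1)}^{[\ell-1]} \\
& + 2 \sum_{i=k-\ell+2}^{k} \left\{ d_{(k-\ell+1)}^{[\ell]\prime} \hat{\Psi}_{(k-\ell+1)}^{-1} \left( \Sigma_{(k-\ell+1)i} - \Sigma_{(k-\ell+1)(i-1)} \hat{\Psi}_{(i-1)i} \right) \hat{\Psi}_{ii}^{-1} \hat{a}_{i,k} \right\} \\
& - 2 \sum_{i=k-\ell+2}^{k} \left\{ d_{(k-\ell+1)}^{[\ell-1]\prime} \tilde{\Psi}_{(k-\ell+1)}^{-1} \left( \Sigma_{(k-\ell+1)i} - \Sigma_{(k-\ell+1)(i-1)} \tilde{\Psi}_{(i-1)i} \right) \tilde{\Psi}_{ii}^{-1} \tilde{a}_{i,k} \right\},
\end{aligned}
\tag{5}
$$

where $\tilde{a}_{i,k} = \hat{a}_{i,k}$ and $d_{(k-\ell+1)}^{[\ell-1]} = \bar{x}_{(k-\ell+1)}^{[1,\ell-1]} - \bar{x}_{(k-\ell+1)}^{[2,\ell-1]}$ for $i = k - \ell + 2, \ldots, k$ and $\ell = 2, \ldots, k$. Taking the expectations of (4) and (5), we can obtain the following relations, whose proofs are provided in Appendix B.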

Proposition 4. Suppose that $m_1 - 3 > 0$. Then, the following expectations can be obtained:

$$E(U_\ell) = E(U_{\ell-1}) - \frac{n_{[\ell]}}{2(m_{[\ell,k-\ell+1]} - 1)} u_{\ell,k} + \frac{n_{[\ell-1]}}{2(m_{[\ell-1,k-\ell+1]} - 1)} u_{\ell,k}^{*},$$

$$
\begin{aligned}
E(V_\ell) = E(V_{\ell-1}) &+ \frac{n_{[\ell]}^2 (n_{[\ell]} - 1)}{m_{[\ell,k-\ell+1]} (m_{[\ell,k-\ell+1]} - 1)(m_{[\ell,k-\ell+1]} - 3)} v_{\ell,k} - \frac{n_{[\ell-1]}^2 (n_{[\ell-1]} - 1)}{m_{[\ell-1,k-\ell+1]} (m_{[\ell-1,k-\ell+1]} - 1)(m_{[\ell-1,k-\ell+1]} - 3)} v_{\ell,k}^{*} \\
&+ 2 \sum_{i=k-\ell+2}^{k} \frac{n_{[k-i+1]}\, p_i}{(m_{[k-i+1,i-1]} - 1)(m_{[k-i+1,i]} - 1)} \left\{ \frac{n_{[\ell]} (n_{[\ell]} - 1)}{m_{[\ell,k-\ell+1]} (m_{[\ell,k-\ell+1]} - 3)} v_{\ell,k} - \frac{n_{[\ell-1]} (n_{[\ell-1]} - 1)}{m_{[\ell-1,k-\ell+1]} (m_{[\ell-1,k-\ell+1]} - 3)} v_{\ell,k}^{*} \right\},
\end{aligned}
$$

for $\ell = 2, \ldots, k$.

The result stated in Theorem 2 depends on unknown parameters. For practical application of Theorem 2, we require estimates of the Mahalanobis squared distances. We propose the following estimators of the Mahalanobis squared distances in order to compute the asymptotic approximation from the data set. The proof of the following theorem is provided in Appendix B.

Theorem 5. The unbiased estimators of $\delta_i^2$ $(i = 1, \ldots, k)$, denoted by $\hat{\delta}_i^2$, can be obtained as

$$\hat{\delta}_1^2 = \frac{n_{[k]} - p_1 - 1}{n_{[k]}} D_{k,1}^2 - \frac{p_1 (N_{[k]}^{(1)} + N_{[k]}^{(2)})}{N_{[k]}^{(1)} N_{[k]}^{(2)}},$$

$$
\hat{\delta}_2^2 = \frac{m_{[k-1,2]} - 1}{n_{[k-1]}} \left[ D_{k,2}^2 - \frac{n_{[k]}}{n_{[k]} - p_1 - 1} \left\{ \hat{\delta}_1^2 + \frac{p_1 (N_{[k]}^{(1)} + N_{[k]}^{(2)})}{N_{[k]}^{(1)} N_{[k]}^{(2)}} \right\} \right]
+ \frac{m_{[k-1,2]} - 1}{m_{[k-1,1]} - 1} \hat{\delta}_1^2
- \frac{p_2 (n_{[k-1]} - 1)}{m_{[k-1,1]} - 1} \cdot \frac{N_{[k-1]}^{(1)} + N_{[k-1]}^{(2)}}{N_{[k-1]}^{(1)} N_{[k-1]}^{(2)}},
$$

$$
\begin{aligned}
\hat{\delta}_i^2 = {} & \frac{m_{[k-i+1,i]} - 1}{n_{[k-i+1]}} \Bigg[ D_{k,i}^2 - \frac{n_{[k]}}{n_{[k]} - p_1 - 1} \left\{ \hat{\delta}_1^2 + \frac{p_1 (N_{[k]}^{(1)} + N_{[k]}^{(2)})}{N_{[k]}^{(1)} N_{[k]}^{(2)}} \right\} \\
& \quad - \sum_{s=2}^{i-1} \frac{n_{[k-s+1]}}{m_{[k-s+1,s]} - 1} \left\{ \hat{\delta}_s^2 - \frac{m_{[k-s+1,s]} - 1}{m_{[k-s+1,s-1]} - 1} \hat{\delta}_{s-1}^2 + \frac{p_s (n_{[k-s+1]} - 1)}{m_{[k-s+1,s-1]} - 1} \cdot \frac{N_{[k-s+1]}^{(1)} + N_{[k-s+1]}^{(2)}}{N_{[k-s+1]}^{(1)} N_{[k-s+1]}^{(2)}} \right\} \Bigg] \\
& + \frac{m_{[k-i+1,i]} - 1}{m_{[k-i+1,i-1]} - 1} \hat{\delta}_{i-1}^2 - \frac{p_i (n_{[k-i+1]} - 1)}{m_{[k-i+1,i-1]} - 1} \cdot \frac{N_{[k-i+1]}^{(1)} + N_{[k-i+1]}^{(2)}}{N_{[k-i+1]}^{(1)} N_{[k-i+1]}^{(2)}} \quad (i \ge 3),
\end{aligned}
$$

where

$$
D_{k,i}^2 = \begin{pmatrix} \hat{\mu}_1^{(1)} - \hat{\mu}_1^{(2)} \\ \vdots \\ \hat{\mu}_i^{(1)} - \hat{\mu}_i^{(2)} \end{pmatrix}'
\begin{pmatrix} \hat{\Sigma}_{11} & \cdots & \hat{\Sigma}_{1i} \\ \vdots & \ddots & \vdots \\ \hat{\Sigma}_{i1} & \cdots & \hat{\Sigma}_{ii} \end{pmatrix}^{-1}
\begin{pmatrix} \hat{\mu}_1^{(1)} - \hat{\mu}_1^{(2)} \\ \vdots \\ \hat{\mu}_i^{(1)} - \hat{\mu}_i^{(2)} \end{pmatrix},
$$

and $\hat{\mu}_i^{(g)}$ and $\hat{\Sigma}_{ab}$ are the estimates for $\ell = k$, obtained in Section 3.

5. Simulation studies

We conduct simulation studies in order to numerically evaluate our results. We compare the approximations proposed in this paper with those of Lachenbruch (1968) and Shutoh et al. (2011) in terms of accuracy. Although these approximations can be applied to data sets with unequal sample sizes, we set $M_\ell \equiv N_\ell^{(1)} = N_\ell^{(2)}$ $(\ell = 1, \ldots, k)$ for convenience. In particular, we are interested in the expectation and variance of the approximation. The approximation is regarded as accurate when its variance is small and when the difference between its expectation and the EPMC is small. We perform simulation studies under $x \in \Pi^{(1)}$. Further, all the simulation studies are conducted using the following sample sizes: $M_1 = 10, 15, 20, 25, 30, 35, 40, 45, 50$ and $M_{(i)} = 10, 15, 20, 25, 30, 35, 40, 45, 50$, where $M_1 = N_1^{(1)} = N_1^{(2)}$ and $M_{(i)} = N_1^{(1)} = N_1^{(2)} = \cdots = N_i^{(1)} = N_i^{(2)}$ for $i = 2, \ldots, k$. The approximations based on k-step monotone missing training data are denoted by $\hat{e}_k$, and they are compared with results based on complete training data (denoted by $\hat{e}_1$) and results based on 2-step monotone missing training data (denoted by $\hat{e}_2$).

The dimensionalities are set as follows. Tables 1–4 list the performance parameters of the approximations under a small dimensionality, i.e., $p = 3$. We compare the case of complete data, the case of 2-step monotone missing data when $p_1 = 2$, $p_2 = 1$, and the case of 3-step monotone missing data when $p_1 = p_2 = p_3 = 1$. Tables 5–8 list the performance parameters under a large dimensionality, i.e., $p = 7$. We compare the case of complete data, the case of 2-step monotone missing data when $p_1 = 6$, $p_2 = 1$, and the case of 7-step monotone missing data when $p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1$.
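The quantities studied in the simulations, namely the expectation and variance of a plug-in approximation, can be sketched in the simplest setting. The code below is an illustration under stated assumptions, not the design used for Tables 1–8: it takes complete data ($k = 1$) with a single variable ($p = 1$) so that no matrix algebra is needed, uses the $k = 1$ form of $\hat{\delta}_1^2$ from Theorem 5, and plugs it into the formulas of Theorem 1. The names `plug_in` and `simulate`, the value of $\Delta$, the seed, and the replication count are hypothetical.

```python
import random
from math import erf, sqrt
from statistics import fmean, pvariance


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def plug_in(x1, x2):
    """Return (unbiased estimate of Delta^2, plug-in approximation) for p = 1, k = 1."""
    p = 1
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = fmean(x1), fmean(x2)
    n = n1 + n2 - 2
    s2 = (sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)) / n
    d2 = (mean1 - mean2) ** 2 / s2  # sample Mahalanobis squared distance D^2
    delta2_hat = (n - p - 1) / n * d2 - p * (n1 + n2) / (n1 * n2)
    m = n - p
    u = delta2_hat + p * (n1 - n2) / (n1 * n2)
    v = delta2_hat + p * (n1 + n2) / (n1 * n2)  # equals (n - p - 1) / n * D^2 > 0
    e_u = -n / (2 * (m - 1)) * u
    e_v = n ** 2 * (n - 1) / (m * (m - 1) * (m - 3)) * v
    return delta2_hat, normal_cdf(e_u / sqrt(e_v))


def simulate(group_size, delta, reps=20000, seed=0):
    """Monte Carlo mean of the distance estimate, and mean and variance of the approximation."""
    rng = random.Random(seed)
    distances, approximations = [], []
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        x2 = [rng.gauss(delta, 1.0) for _ in range(group_size)]
        delta2_hat, e_hat = plug_in(x1, x2)
        distances.append(delta2_hat)
        approximations.append(e_hat)
    return fmean(distances), fmean(approximations), pvariance(approximations)
```

Calling `simulate(10, 1.0)` returns the Monte Carlo mean of $\hat{\delta}_1^2$, which should be close to $\Delta^2 = 1$ if the estimator is unbiased, together with the mean and variance of the plug-in approximation.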
All the simulation studies indicate that the approximations proposed in this paper for k-step monotone missing training data exhibit a smaller difference between the expectations of the approximations and the EPMC (Tables 3 and 7). Further, the variances of $\hat{e}_k$ are smaller than those obtained by Lachenbruch (1968) and Shutoh et al. (2011) (Tables 4 and 8). Finally, we can observe that the linear discriminant function constructed using k-step monotone missing training data has lower probabilities of misclassification (Tables 1 and 5).

Table 1. EPMC in W1, W2, and W3 computed using Monte Carlo simulation when p = 3.

M1    e1(2|1)    M(2)    e2(2|1)    M(3)    e3(2|1)
10    0.3461     10      0.3329     10      0.3314
15    0.3319     15      0.3226     15      0.3216
20    0.3230     20      0.3162     20      0.3151
25    0.3192     25      0.3142     25      0.3133
30    0.3160     30      0.3115     30      0.3110
35    0.3129     35      0.3092     35      0.3088
40    0.3119     40      0.3087     40      0.3084
45    0.3104     45      0.3075     45      0.3073
50    0.3098     50      0.3072     50      0.3070

Table 2. Expected values of asymptotic approximations computed using Monte Carlo simulation when p = 3.

M1    E(ê1)      M(2)    E(ê2)      M(3)    E(ê3)
10    0.3962     10      0.3662     10      0.3593
15    0.3644     15      0.3418     15      0.3366
20    0.3479     20      0.3304     20      0.3264
25    0.3380     25      0.3239     25      0.3207
30    0.3315     30      0.3197     30      0.3169
35    0.3269     35      0.3167     35      0.3144
40    0.3234     40      0.3145     40      0.3125
45    0.3207     45      0.3128     45      0.3110
50    0.3186     50      0.3115     50      0.3098

Table 3. Differences between the expected values of the approximations and the EPMC computed using Monte Carlo simulation when p = 3.

M1    E(ê1) − e1(2|1)    M(2)    E(ê2) − e2(2|1)    M(3)    E(ê3) − e3(2|1)
10    0.0501             10      0.0334             10      0.0279
15    0.0325             15      0.0192             15      0.0150
20    0.0249             20      0.0142             20      0.0113
25    0.0188             25      0.0098             25      0.0074
30    0.0154             30      0.0082             30      0.0059
35    0.0140             35      0.0075             35      0.0056
40    0.0115             40      0.0058             40      0.0041
45    0.0103             45      0.0053             45      0.0037
50    0.0087             50      0.0043             50      0.0028

Table 4. Variances of approximations computed using Monte Carlo simulation when p = 3 (all entries × 10⁻³).

M1    Var(ê1)    M(2)    Var(ê2)    M(3)    Var(ê3)
10    13.837     10      6.616      10      4.655
15    8.153      15      3.683      15      2.552
20    5.455      20      2.480      20      1.705
25    4.022      25      1.857      25      1.273
30    3.146      30      1.470      30      1.006
35    2.580      35      1.218      35      0.832
40    2.189      40      1.039      40      0.708
45    1.901      45      0.907      45      0.616
50    1.667      50      0.801      50      0.543

Table 5. EPMC in W1, W2, and W7 computed using Monte Carlo simulation when p = 7.

M1    e1(2|1)    M(2)    e2(2|1)    M(7)    e7(2|1)
10    0.3908     10      0.3683     10      0.3582
15    0.3662     15      0.3464     15      0.3368
20    0.3534     20      0.3362     20      0.3273
25    0.3446     25      0.3291     25      0.3219
30    0.3381     30      0.3241     30      0.3175
35    0.3326     35      0.3201     35      0.3143
40    0.3289     40      0.3175     40      0.3125
45    0.3268     45      0.3164     45      0.3117
50    0.3245     50      0.3145     50      0.3105

Table 6. Expected values of asymptotic approximations computed using Monte Carlo simulation when p = 7.

M1    E(ê1)      M(2)    E(ê2)      M(7)    E(ê7)
10    0.4383     10      0.4078     10      0.3963
15    0.3984     15      0.3675     15      0.3521
20    0.3769     20      0.3498     20      0.3358
25    0.3632     25      0.3395     25      0.3273
30    0.3535     30      0.3328     30      0.3220
35    0.3465     35      0.3281     35      0.3185
40    0.3410     40      0.3245     40      0.3159
45    0.3368     45      0.3217     45      0.3140
50    0.3333     50      0.3196     50      0.3124

Table 7. Differences between the expected values of the approximations and the EPMC computed using Monte Carlo simulation when p = 7.

M1    E(ê1) − e1(2|1)    M(2)    E(ê2) − e2(2|1)    M(7)    E(ê7) − e7(2|1)
10    0.0475             10      0.0396             10      0.0381
15    0.0322             15      0.0211             15      0.0153
20    0.0234             20      0.0136             20      0.0085
25    0.0185             25      0.0104             25      0.0054
30    0.0154             30      0.0087             30      0.0046
35    0.0140             35      0.0080             35      0.0042
40    0.0121             40      0.0070             40      0.0034
45    0.0100             45      0.0054             45      0.0023
50    0.0088             50      0.0051             50      0.0019

Table 8. Variances of approximations computed using Monte Carlo simulation when p = 7 (all entries × 10⁻³).

M1    Var(ê1)    M(2)    Var(ê2)    M(7)    Var(ê7)
10    13.298     10      7.206      10      3.502
15    8.956      15      4.302      15      1.658
20    6.350      20      2.907      20      1.028
25    4.785      25      2.146      25      0.724
30    3.781      30      1.687      30      0.552
35    3.099      35      1.388      35      0.445
40    2.608      40      1.170      40      0.369
45    2.243      45      1.010      45      0.317
50    1.963      50      0.888      50      0.275

6. Conclusion

We derived an asymptotic approximation based on k-step monotone missing training data. Theorem 2 gives a generalization of the results derived by Lachenbruch (1968) and Shutoh et al. (2011). Furthermore, the required estimates, i.e., the unbiased estimates of the Mahalanobis squared distances, could be derived using Theorem 5. Finally, we performed simulation studies to evaluate our result and to compare it with the results derived by Lachenbruch (1968) and Shutoh et al. (2011). In the settings considered, these studies show that our approximation is more accurate than the existing approximations.

We may consider the error bound for the proposed approximation, as described by Fujikoshi et al. (2010). Using the concept presented in Section 4, we can also obtain the asymptotic approximations derived by Okamoto (1963) and Shutoh (2011) for k-step monotone missing training data. In the future, it may be possible to obtain more accurate approximations than the approximation proposed in this paper, under small dimensionality.

Acknowledgments

The author is greatly indebted to Professor T. Seo and Professor Y. Fujikoshi for their insightful comments on this study. The author would also like to thank Mr. K. Kurihara for carefully checking the main theorem in this paper. Finally, the author would like to extend his sincere gratitude to the referee, who gave invaluable comments and suggestions that have greatly enhanced this paper. This study was supported by Grant-in-Aid for JSPS Fellows (23·6926).

Appendix A. Lemmas

The distributions of the estimates are obtained in the following lemma. Essentially, Lemma 6 has been obtained by Kanda and Fujikoshi (1998).

Lemma 6. For $i = k - \ell + 2, \ldots, k$ and $n_1 > p$, the estimates for the $\ell$-step partial data set satisfy the following properties.

(i)

½g,ki þ 1 b b ðgÞ C ,l iði1Þ and fx ði1Þ ðiÞ g are independent,

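(ii) $\hat{\Psi}_{ii}$ and $\{\hat{\Psi}_{(i-1)}, \hat{\Psi}_{i(i-1)}\}$ are independent,

(iii) in particular, $\hat{\Psi}_{(k-\ell+1)}, \hat{\Psi}_{k-\ell+2,k-\ell+2}, \ldots, \hat{\Psi}_{kk}$ are mutually independent,

(iv) $G_{(k-\ell+1)}^{[\ell]} \sim W_{p_{[k-\ell+1]}}(n_{[\ell]}, \Sigma_{(k-\ell+1)})$,

(v) $G_{ii \cdot (1 \ldots i-1)}^{[k-i+1]} \sim W_{p_i}(n_{[k-i+1]} - p_{[i-1]}, \Sigma_{ii \cdot (1 \ldots i-1)})$,

(vi) $d_{(k-\ell+1)}^{[\ell]} \sim N_{p_{[k-\ell+1]}}\left(\delta_{(k-\ell+1)}, \{(N_{[\ell]}^{(1)} + N_{[\ell]}^{(2)})/(N_{[\ell]}^{(1)} N_{[\ell]}^{(2)})\} \Sigma_{(k-\ell+1)}\right)$,

(vii) $e_{(k-\ell+1)}^{[\ell]} \sim N_{p_{[k-\ell+1]}}\left(0, (1/N_{[\ell]}^{(1)}) \Sigma_{(k-\ell+1)}\right)$.

Moreover, on the basis of the sample vectors, the estimates up to the $(\ell - 1)$th step have similar properties. Next, we state the results for the inverse of the partitioned matrix.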


Lemma 7. Suppose that O is a nonsingular matrix that is decomposed as !

O11 O21



O12 : O22

Then, the inverse matrix can be obtained as 1

O

¼

1 1 1 O1 11 þ O11 O12 O221 O21 O11

1 O1 11 O12 O221

1 O1 221 O21 O11

O1 221

! ,

where O221 ¼ O22 O21 O1 11 O12 . From Lemma 7, it follows that Corollary 8 also holds. Corollary 8. For i ¼ k‘ þ 2, . . . ,k, it holds that

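$$\Sigma_{(k-\ell+1)i} = \Sigma_{(k-\ell+1)(i-1)}\,\Sigma_{(i-1)}^{-1}\,\Sigma_{(i-1)i},$$

where

$$\Sigma_{(k-\ell+1)i} = \begin{pmatrix} \Sigma_{1i} \\ \vdots \\ \Sigma_{k-\ell+1,i} \end{pmatrix}, \qquad \Sigma_{(k-\ell+1)(i-1)} = \begin{pmatrix} \Sigma_{11} & \cdots & \Sigma_{1,i-1} \\ \vdots & \ddots & \vdots \\ \Sigma_{k-\ell+1,1} & \cdots & \Sigma_{k-\ell+1,i-1} \end{pmatrix},$$

$$\Sigma_{(i-1)} = \begin{pmatrix} \Sigma_{11} & \cdots & \Sigma_{1,i-1} \\ \vdots & \ddots & \vdots \\ \Sigma_{i-1,1} & \cdots & \Sigma_{i-1,i-1} \end{pmatrix}, \qquad \Sigma_{(i-1)i} = \begin{pmatrix} \Sigma_{1i} \\ \vdots \\ \Sigma_{i-1,i} \end{pmatrix}.$$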

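Proof. For $i = k-\ell+2$, Corollary 8 clearly holds. We primarily consider the case of $i = k-\ell+3, \ldots, k$. Using the expressions

$$\Sigma_{(k-\ell+1)(i-1)} = \bigl(\Sigma_{(k-\ell+1)} \;\; \Sigma_{(k-\ell+1)k-\ell+2} \;\; \cdots \;\; \Sigma_{(k-\ell+1)i-1}\bigr),$$

$$\Sigma_{(i-1)} = \begin{pmatrix} \Sigma_{(k-\ell+1)} & \Sigma_{(k-\ell+1)k-\ell+2} & \cdots & \Sigma_{(k-\ell+1)i-1} \\ \Sigma_{k-\ell+2(k-\ell+1)} & \Sigma_{k-\ell+2,k-\ell+2} & \cdots & \Sigma_{k-\ell+2,i-1} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{i-1(k-\ell+1)} & \Sigma_{i-1,k-\ell+2} & \cdots & \Sigma_{i-1,i-1} \end{pmatrix},$$

$$\Sigma_{(i-1)i}' = \bigl(\Sigma_{(k-\ell+1)i}' \;\; \Sigma_{k-\ell+2,i}' \;\; \cdots \;\; \Sigma_{i-1,i}'\bigr)$$

and applying $\Sigma_{(i-1)}$ to the result stated in Lemma 7 with

$$\Omega_{11} = \Sigma_{(k-\ell+1)}, \qquad \Omega_{12} = \Omega_{21}' = \bigl(\Sigma_{(k-\ell+1)k-\ell+2} \;\; \cdots \;\; \Sigma_{(k-\ell+1)i-1}\bigr),$$

$$\Omega_{22} = \begin{pmatrix} \Sigma_{k-\ell+2,k-\ell+2} & \cdots & \Sigma_{k-\ell+2,i-1} \\ \vdots & \ddots & \vdots \\ \Sigma_{i-1,k-\ell+2} & \cdots & \Sigma_{i-1,i-1} \end{pmatrix},$$

it follows, from the right-hand side, that Corollary 8 holds. □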
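Lemma 7 and Corollary 8 are easy to sanity-check numerically. The following is a minimal sketch, not part of the original argument: the function name `block_inverse`, the 6 × 6 test matrix, and the block sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def block_inverse(omega, d1):
    """Invert a partitioned matrix via Lemma 7, using the Schur complement Omega_{22.1}."""
    o11, o12 = omega[:d1, :d1], omega[:d1, d1:]
    o21, o22 = omega[d1:, :d1], omega[d1:, d1:]
    o11_inv = np.linalg.inv(o11)
    schur_inv = np.linalg.inv(o22 - o21 @ o11_inv @ o12)  # inverse of Omega_{22.1}
    top_left = o11_inv + o11_inv @ o12 @ schur_inv @ o21 @ o11_inv
    top_right = -o11_inv @ o12 @ schur_inv
    bottom_left = -schur_inv @ o21 @ o11_inv
    return np.block([[top_left, top_right], [bottom_left, schur_inv]])


rng = np.random.default_rng(0)
b = rng.standard_normal((6, 6))
sigma = b @ b.T + 6.0 * np.eye(6)  # arbitrary positive definite test matrix

# Lemma 7: the block formula reproduces the ordinary inverse.
lemma7_ok = np.allclose(block_inverse(sigma, 2), np.linalg.inv(sigma))

# Corollary 8 with illustrative sizes: the first 2 coordinates play the role of
# the blocks (k-l+1), the first 4 the blocks (i-1), and the last 2 the block i.
a, m = 2, 4
lhs = sigma[:a, m:]
rhs = sigma[:a, :m] @ np.linalg.inv(sigma[:m, :m]) @ sigma[:m, m:]
corollary8_ok = np.allclose(lhs, rhs)
```

The Corollary 8 check works because the first rows of $\Sigma_{(i-1)}$ multiplied by $\Sigma_{(i-1)}^{-1}$ give the leading rows of an identity matrix, which then simply select the corresponding rows of $\Sigma_{(i-1)i}$.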

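Furthermore, we state some results on the expectations of the random vectors and matrices.

Lemma 9. Suppose that $y \sim N_d(\gamma, \Omega)$, $G \sim W_d(f, \Omega)$ and $f-d-3 > 0$. Consider the decomposition

$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}, \quad \gamma = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix}, \quad \Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix},$$

where $y_a$ is a $d_a$-dimensional partitioned random vector of $y$, $G_{ab}$ is a $d_a \times d_b$ partitioned matrix of $G$, $\gamma_a$ is a $d_a$-dimensional partitioned vector of $\gamma$, $\Omega_{ab}$ is a $d_a \times d_b$ partitioned matrix of $\Omega$, and $d = d_1 + d_2$. Then, it holds that

$$E(y'Ay) = \operatorname{tr} \Omega A' + \gamma' A \gamma,$$

$$y_2 \mid y_1 \sim N_{d_2}\bigl(\gamma_2 + \Omega_{21}\Omega_{11}^{-1}(y_1 - \gamma_1), \Omega_{22\cdot 1}\bigr),$$

$$E(G^{-1}) = \{f-d-1\}^{-1}\Omega^{-1},$$

$$E(G^{-1}AG^{-1}) = \{(f-d)(f-d-1)(f-d-3)\}^{-1}\{(f-d-2)\Omega^{-1}A\Omega^{-1} + \Omega^{-1}A'\Omega^{-1} + \operatorname{tr} A\Omega^{-1} \cdot \Omega^{-1}\},$$

$$E(G_{11}^{-1}G_{12} \mid G_{11}) = \Omega_{11}^{-1}\Omega_{12},$$

$$E(G_{11}^{-1}G_{12}A_2G_{21}G_{11}^{-1} \mid G_{11}) = \operatorname{tr} A_2\Omega_{22\cdot 1} \cdot G_{11}^{-1} + \Omega_{11}^{-1}\Omega_{12}A_2\Omega_{21}\Omega_{11}^{-1},$$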

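where $A$ is a $d \times d$ constant matrix and $A_2$ is a $d_2 \times d_2$ constant matrix.

Finally, we present the desired lemma derived by Shutoh et al. (2011).

Lemma 10 (Shutoh et al., 2011). Suppose that $G_1 \sim W_d(f_1, \Omega)$, $G_2 \sim W_d(f_2, \Omega)$, $f_1-d-1 > 0$, $f-d-3 > 0$, and they are mutually independent, where $f = f_1 + f_2$. Then, it holds that

$$E\bigl(\operatorname{tr} C_1G_1^{-1}C_2(G_1+G_2)^{-1}\bigr) = \{(f-d)(f-d-3)(f_1-d-1)\}^{-1}\bigl[(f-d-2)\operatorname{tr} C_1\Omega^{-1}C_2\Omega^{-1} + \operatorname{tr} C_1\Omega^{-1}C_2'\Omega^{-1} + \operatorname{tr} C_1\Omega^{-1} \cdot \operatorname{tr} C_2\Omega^{-1}\bigr],$$

where $C_1$ and $C_2$ are $d \times d$ constant matrices.

Appendix B. Proofs

B.1. Proof of Proposition 4

First, we consider the statistic $U_\ell - U_{\ell-1}$. From the results stated in Lemmas 6 and 9, it follows that

$$E\bigl(\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\bigr) = (m_{[\ell],k-\ell+1}-1)^{-1}\Sigma_{(k-\ell+1)}^{-1}, \quad E\bigl(d^{[\ell]\prime}_{(k-\ell+1)}\Sigma_{(k-\ell+1)}^{-1}e^{[\ell]}_{(k-\ell+1)}\bigr) = \frac{p_{[k-\ell+1]}}{N^{(1)}_{[\ell]}}, \quad E\bigl(d^{[\ell]\prime}_{(k-\ell+1)}\Sigma_{(k-\ell+1)}^{-1}d^{[\ell]}_{(k-\ell+1)}\bigr) = v_{\ell,k}, \tag{B.1}$$

$$E\bigl(\{G^{[\ell-1]}_{(k-\ell+1)}\}^{-1}\bigr) = (m_{[\ell-1],k-\ell+1}-1)^{-1}\Sigma_{(k-\ell+1)}^{-1}, \quad E\bigl(d^{[\ell-1]\prime}_{(k-\ell+1)}\Sigma_{(k-\ell+1)}^{-1}e^{[\ell-1]}_{(k-\ell+1)}\bigr) = \frac{p_{[k-\ell+1]}}{N^{(1)}_{[\ell-1]}}, \quad E\bigl(d^{[\ell-1]\prime}_{(k-\ell+1)}\Sigma_{(k-\ell+1)}^{-1}d^{[\ell-1]}_{(k-\ell+1)}\bigr) = v^{*}_{\ell,k} \tag{B.2}$$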

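hold. Therefore, the expectation of

$$d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}e^{[\ell]}_{(k-\ell+1)} - \frac{1}{2}\,d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}d^{[\ell]}_{(k-\ell+1)}$$

equals

$$-\frac{n_{[\ell]}}{2(m_{[\ell],k-\ell+1}-1)}\,u_{\ell,k}. \tag{B.3}$$

Similarly, the expectation of

$$-d^{[\ell-1]\prime}_{(k-\ell+1)}\tilde{\Psi}_{(k-\ell+1)}^{-1}e^{[\ell-1]}_{(k-\ell+1)} + \frac{1}{2}\,d^{[\ell-1]\prime}_{(k-\ell+1)}\tilde{\Psi}_{(k-\ell+1)}^{-1}d^{[\ell-1]}_{(k-\ell+1)}$$

equals

$$\frac{n_{[\ell-1]}}{2(m_{[\ell-1],k-\ell+1}-1)}\,u^{*}_{\ell,k}. \tag{B.4}$$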
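The step from the moments in (B.1) to (B.3) can be spelled out as follows. This is only a sketch under two assumptions that the appendix suggests but does not state here: that the covariance estimate for the first $k-\ell+1$ blocks (written $\hat{\Psi}_{(k-\ell+1)}$ in this sketch) equals $G^{[\ell]}_{(k-\ell+1)}/n_{[\ell]}$, which is what the factor $n_{[\ell]}$ in (B.3) indicates, and that $G^{[\ell]}_{(k-\ell+1)}$ is independent of $(d^{[\ell]}_{(k-\ell+1)}, e^{[\ell]}_{(k-\ell+1)})$. Subscripts $(k-\ell+1)$ are suppressed, and $m$, $n$, $p$, $N^{(1)}$, $v$ abbreviate $m_{[\ell],k-\ell+1}$, $n_{[\ell]}$, $p_{[k-\ell+1]}$, $N^{(1)}_{[\ell]}$, $v_{\ell,k}$.

```latex
\begin{align*}
E\Bigl(d'\hat{\Psi}^{-1}e - \tfrac{1}{2}\,d'\hat{\Psi}^{-1}d\Bigr)
  &= n\Bigl\{E\bigl(d'\,E(G^{-1})\,e\bigr) - \tfrac{1}{2}\,E\bigl(d'\,E(G^{-1})\,d\bigr)\Bigr\} \\
  &= \frac{n}{m-1}\Bigl\{E(d'\Sigma^{-1}e) - \tfrac{1}{2}\,E(d'\Sigma^{-1}d)\Bigr\} \\
  &= \frac{n}{m-1}\Bigl\{\frac{p}{N^{(1)}} - \frac{v}{2}\Bigr\}
   = -\frac{n}{2(m-1)}\Bigl\{v - \frac{2p}{N^{(1)}}\Bigr\}.
\end{align*}
```

Under these assumptions the last expression agrees with (B.3) when $u_{\ell,k} = v_{\ell,k} - 2p_{[k-\ell+1]}/N^{(1)}_{[\ell]}$. The same steps with $\ell$ replaced by $\ell-1$ and the signs reversed give (B.4).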

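Thus, the expectation of $U_\ell - U_{\ell-1}$ can be obtained from (B.3) and (B.4).

Next, we consider the statistic $V_\ell - V_{\ell-1}$. From the results stated in Lemmas 6 and 9, it follows that

$$E\bigl[\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)}\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\bigr] = \frac{n_{[\ell]}-1}{m_{[\ell],k-\ell+1}(m_{[\ell],k-\ell+1}-1)(m_{[\ell],k-\ell+1}-3)}\,\Sigma_{(k-\ell+1)}^{-1}, \tag{B.5}$$

$$E\bigl[\{G^{[\ell-1]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)}\{G^{[\ell-1]}_{(k-\ell+1)}\}^{-1}\bigr] = \frac{n_{[\ell-1]}-1}{m_{[\ell-1],k-\ell+1}(m_{[\ell-1],k-\ell+1}-1)(m_{[\ell-1],k-\ell+1}-3)}\,\Sigma_{(k-\ell+1)}^{-1} \tag{B.6}$$

also hold. Using (B.1), (B.2), (B.5), and (B.6), we can obtain

$$E\bigl(d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}\Sigma_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}d^{[\ell]}_{(k-\ell+1)}\bigr) = \frac{n_{[\ell]}^{2}(n_{[\ell]}-1)}{m_{[\ell],k-\ell+1}(m_{[\ell],k-\ell+1}-1)(m_{[\ell],k-\ell+1}-3)}\,v_{\ell,k}, \tag{B.7}$$

$$E\bigl(d^{[\ell-1]\prime}_{(k-\ell+1)}\tilde{\Psi}_{(k-\ell+1)}^{-1}\Sigma_{(k-\ell+1)}\tilde{\Psi}_{(k-\ell+1)}^{-1}d^{[\ell-1]}_{(k-\ell+1)}\bigr) = \frac{n_{[\ell-1]}^{2}(n_{[\ell-1]}-1)}{m_{[\ell-1],k-\ell+1}(m_{[\ell-1],k-\ell+1}-1)(m_{[\ell-1],k-\ell+1}-3)}\,v^{*}_{\ell,k}. \tag{B.8}$$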
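The two inverse-Wishart moments of Lemma 9 that drive (B.1) and (B.5) can be checked by simulation. The sketch below is illustrative only: the dimension `d`, degrees of freedom `f`, the matrix `omega`, the helper `sample_wishart`, and the replication count are hypothetical choices, and the Wishart draws are built directly from Gaussian rows rather than taken from any routine used in the paper.

```python
import numpy as np


def sample_wishart(rng, f, omega, reps):
    """Draw `reps` matrices G ~ W_d(f, omega), each as X'X with f rows X_j ~ N_d(0, omega)."""
    chol = np.linalg.cholesky(omega)
    z = rng.standard_normal((reps, f, omega.shape[0]))
    x = z @ chol.T                      # every row now has covariance omega
    return np.swapaxes(x, 1, 2) @ x     # batch of X'X matrices


rng = np.random.default_rng(1)
d, f, reps = 3, 20, 20000
omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
omega_inv = np.linalg.inv(omega)
g_inv = np.linalg.inv(sample_wishart(rng, f, omega, reps))

# E(G^{-1}) = Omega^{-1} / (f - d - 1), the form used in (B.1).
mc_first = g_inv.mean(axis=0)
exact_first = omega_inv / (f - d - 1)
rel_err_first = np.linalg.norm(mc_first - exact_first) / np.linalg.norm(exact_first)

# E(G^{-1} Omega G^{-1}) = (f - 1) / {(f-d)(f-d-1)(f-d-3)} * Omega^{-1}, the form used in (B.5).
mc_second = (g_inv @ omega @ g_inv).mean(axis=0)
exact_second = (f - 1) / ((f - d) * (f - d - 1) * (f - d - 3)) * omega_inv
rel_err_second = np.linalg.norm(mc_second - exact_second) / np.linalg.norm(exact_second)
```

The second closed form is Lemma 9 with $A = \Omega$, where the bracket $(f-d-2) + 1 + d$ collapses to $f-1$. With $f = n_{[\ell]}$ and $d = p_{[k-\ell+1]}$ it gives the constant in (B.5).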


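Further, we consider

\begin{align}
& d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}\bigl(\Sigma_{(k-\ell+1)i} - \Sigma_{(k-\ell+1)(i-1)}\hat{\Psi}_{(i-1)i}\bigr)\hat{\Psi}_{ii}^{-1}\hat{\alpha}_{i,k} \notag \\
&\quad = d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}\Sigma_{(k-\ell+1)i}\hat{\Psi}_{ii}^{-1}d^{[k-i+1]}_{i} \tag{B.9} \\
&\qquad - d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}\Sigma_{(k-\ell+1)(i-1)}\hat{\Psi}_{(i-1)i}\hat{\Psi}_{ii}^{-1}d^{[k-i+1]}_{i} \tag{B.10} \\
&\qquad + d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}\Sigma_{(k-\ell+1)(i-1)}\hat{\Psi}_{(i-1)i}\hat{\Psi}_{ii}^{-1}\hat{\Psi}_{i(i-1)}d^{[k-i+1]}_{(i-1)} \tag{B.11} \\
&\qquad - d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}\Sigma_{(k-\ell+1)i}\hat{\Psi}_{ii}^{-1}\hat{\Psi}_{i(i-1)}d^{[k-i+1]}_{(i-1)}. \tag{B.12}
\end{align}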

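First, from Corollary 8 and the result derived from Lemmas 6 and 9,

$$E\bigl(\{G^{[k-i+1]}_{(i-1)}\}^{-1}G^{[k-i+1]}_{(i-1)i} \mid G^{[k-i+1]}_{(i-1)}\bigr) = \Sigma_{(i-1)}^{-1}\Sigma_{(i-1)i}, \tag{B.13}$$

it follows that the conditional expectation of the sum of (B.9) and (B.10) given

$$\bigl(d^{[\ell]}_{(k-\ell+1)},\, d^{[k-i+1]}_{i},\, G^{[\ell]}_{(k-\ell+1)},\, G^{[k-i+1]}_{ii\cdot(1 \ldots i-1)}\bigr)$$

equals 0. Next, the conditional expectation of the sum of (B.11) and (B.12) given

$$\bigl(d^{[\ell]}_{(k-\ell+1)},\, d^{[k-i+1]}_{(i-1)},\, G^{[\ell]}_{(k-\ell+1)},\, G^{[k-i+1]}_{(i-1)},\, G^{[k-i+1]}_{(i-1)i}\bigr)$$

can be obtained:

$$\frac{n_{[\ell]}n_{[k-i+1]}}{m_{[k-i+1],i}-1}\,d^{[\ell]\prime}_{(k-\ell+1)}\Bigl\{\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)(i-1)}\{G^{[k-i+1]}_{(i-1)}\}^{-1}G^{[k-i+1]}_{(i-1)i}\Sigma_{ii\cdot(1 \ldots i-1)}^{-1} - \{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)i}\Sigma_{ii\cdot(1 \ldots i-1)}^{-1}\Bigr\}G^{[k-i+1]}_{i(i-1)}\{G^{[k-i+1]}_{(i-1)}\}^{-1}d^{[k-i+1]}_{(i-1)},$$

since we can obtain

$$E\bigl(\{G^{[k-i+1]}_{ii\cdot(1 \ldots i-1)}\}^{-1}\bigr) = (m_{[k-i+1],i}-1)^{-1}\Sigma_{ii\cdot(1 \ldots i-1)}^{-1} \tag{B.14}$$

using Lemmas 6 and 9. Moreover, from Lemmas 6 and 9, it follows that

$$E\bigl(\{G^{[k-i+1]}_{(i-1)}\}^{-1}G^{[k-i+1]}_{(i-1)i}\Sigma_{ii\cdot(1 \ldots i-1)}^{-1}G^{[k-i+1]}_{i(i-1)}\{G^{[k-i+1]}_{(i-1)}\}^{-1} \mid G^{[k-i+1]}_{(i-1)}\bigr) = p_i\{G^{[k-i+1]}_{(i-1)}\}^{-1} + \Sigma_{(i-1)}^{-1}\Sigma_{(i-1)i}\Sigma_{ii\cdot(1 \ldots i-1)}^{-1}\Sigma_{i(i-1)}\Sigma_{(i-1)}^{-1} \tag{B.15}$$

holds. Therefore, the conditional expectation of the sum of (B.11) and (B.12) given

$$\bigl(d^{[\ell]}_{(k-\ell+1)},\, d^{[k-i+1]}_{(i-1)},\, G^{[\ell]}_{(k-\ell+1)},\, G^{[k-i+1]}_{(i-1)}\bigr)$$

can be obtained using (B.13), (B.14), (B.15), and Corollary 8:

$$\frac{n_{[\ell]}n_{[k-i+1]}p_i}{m_{[k-i+1],i}-1}\,d^{[\ell]\prime}_{(k-\ell+1)}\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)(i-1)}\{G^{[k-i+1]}_{(i-1)}\}^{-1}d^{[k-i+1]}_{(i-1)}.$$

If $i > k-\ell+2$, then we consider

$$\{G^{[k-i+1]}_{(i-1)}\}^{-1} = \begin{pmatrix} G^{[k-i+1]}_{(k-\ell+1)} & G^{[k-i+1]}_{(k-\ell+1)(k-\ell+2 \ldots i-1)} \\ G^{[k-i+1]}_{(k-\ell+2 \ldots i-1)(k-\ell+1)} & G^{[k-i+1]}_{(k-\ell+2 \ldots i-1)} \end{pmatrix}^{-1}, \tag{B.16}$$

where

$$G^{[k-i+1]}_{(k-\ell+1)} = \begin{pmatrix} G^{[k-i+1]}_{11} & \cdots & G^{[k-i+1]}_{1,k-\ell+1} \\ \vdots & \ddots & \vdots \\ G^{[k-i+1]}_{k-\ell+1,1} & \cdots & G^{[k-i+1]}_{k-\ell+1,k-\ell+1} \end{pmatrix},$$

$$G^{[k-i+1]}_{(k-\ell+1)(k-\ell+2 \ldots i-1)} = \bigl\{G^{[k-i+1]}_{(k-\ell+2 \ldots i-1)(k-\ell+1)}\bigr\}' = \begin{pmatrix} G^{[k-i+1]}_{1,k-\ell+2} & \cdots & G^{[k-i+1]}_{1,i-1} \\ \vdots & \ddots & \vdots \\ G^{[k-i+1]}_{k-\ell+1,k-\ell+2} & \cdots & G^{[k-i+1]}_{k-\ell+1,i-1} \end{pmatrix},$$

$$G^{[k-i+1]}_{(k-\ell+2 \ldots i-1)} = \begin{pmatrix} G^{[k-i+1]}_{k-\ell+2,k-\ell+2} & \cdots & G^{[k-i+1]}_{k-\ell+2,i-1} \\ \vdots & \ddots & \vdots \\ G^{[k-i+1]}_{i-1,k-\ell+2} & \cdots & G^{[k-i+1]}_{i-1,i-1} \end{pmatrix}.$$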

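Applying (B.16) to Lemma 7 and using Lemmas 6 and 9, we can obtain the conditional expectation of the sum of (B.11) and (B.12) given

$$\bigl(d^{[\ell]}_{(k-\ell+1)},\, d^{[k-i+1]}_{(k-\ell+1)},\, G^{[\ell]}_{(k-\ell+1)},\, G^{[k-i+1]}_{(k-\ell+1)}\bigr)$$

as

$$\frac{n_{[\ell]}n_{[k-i+1]}\,p_i\,(m_{[k-i+1],k-\ell+1}-1)}{(m_{[k-i+1],i}-1)(m_{[k-i+1],i-1}-1)}\,d^{[\ell]\prime}_{(k-\ell+1)}\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)}\{G^{[k-i+1]}_{(k-\ell+1)}\}^{-1}d^{[k-i+1]}_{(k-\ell+1)}.$$

This coincides with the result for $i = k-\ell+2$. It is important to note that

$$E\bigl(d^{[k-i+1]}_{(k-\ell+2 \ldots i-1)} \mid d^{[k-i+1]}_{(k-\ell+1)}\bigr) = \delta_{(k-\ell+2 \ldots i-1)} + \Sigma_{(k-\ell+2 \ldots i-1)(k-\ell+1)}\Sigma_{(k-\ell+1)}^{-1}\bigl(d^{[k-i+1]}_{(k-\ell+1)} - \delta_{(k-\ell+1)}\bigr),$$

$$E\bigl(d^{[\ell]\prime}_{(k-\ell+1)}A\,d^{[k-i+1]}_{(k-\ell+1)}\bigr) = \delta_{(k-\ell+1)}'A\,\delta_{(k-\ell+1)} + \frac{N^{(1)}_{[\ell]} + N^{(2)}_{[\ell]}}{N^{(1)}_{[\ell]}N^{(2)}_{[\ell]}}\operatorname{tr}\Sigma_{(k-\ell+1)}A', \tag{B.17}$$

where $A$ is a $p_{[k-\ell+1]} \times p_{[k-\ell+1]}$ constant matrix,

$$d^{[k-i+1]}_{(k-\ell+2 \ldots i-1)} = \begin{pmatrix} d^{[k-i+1]}_{k-\ell+2} \\ \vdots \\ d^{[k-i+1]}_{i-1} \end{pmatrix}, \qquad \delta_{(k-\ell+2 \ldots i-1)} = \begin{pmatrix} \delta_{k-\ell+2} \\ \vdots \\ \delta_{i-1} \end{pmatrix}, \qquad \Sigma_{(k-\ell+2 \ldots i-1)(k-\ell+1)} = \begin{pmatrix} \Sigma_{k-\ell+2(k-\ell+1)} \\ \vdots \\ \Sigma_{i-1(k-\ell+1)} \end{pmatrix}.$$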

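Furthermore, from Lemmas 6 and 10, it follows that

$$E\bigl(\delta_{(k-\ell+1)}'\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)}\{G^{[k-i+1]}_{(k-\ell+1)}\}^{-1}\delta_{(k-\ell+1)}\bigr) = \frac{n_{[\ell]}-1}{m_{[\ell],k-\ell+1}(m_{[k-i+1],k-\ell+1}-1)(m_{[\ell],k-\ell+1}-3)}\,\delta_{k-\ell+1}^{2}, \tag{B.18}$$

$$E\bigl(\operatorname{tr}\Sigma_{(k-\ell+1)}\{G^{[\ell]}_{(k-\ell+1)}\}^{-1}\Sigma_{(k-\ell+1)}\{G^{[k-i+1]}_{(k-\ell+1)}\}^{-1}\bigr) = \frac{p_{[k-\ell+1]}(n_{[\ell]}-1)}{m_{[\ell],k-\ell+1}(m_{[k-i+1],k-\ell+1}-1)(m_{[\ell],k-\ell+1}-3)} \tag{B.19}$$

hold. Combining (B.17)–(B.19), for $i = k-\ell+2, \ldots, k$, we can obtain the expectation of the sum of (B.11) and (B.12), i.e., the expectation of the sum of (B.9)–(B.12):

$$E\bigl(d^{[\ell]\prime}_{(k-\ell+1)}\hat{\Psi}_{(k-\ell+1)}^{-1}(\Sigma_{(k-\ell+1)i} - \Sigma_{(k-\ell+1)(i-1)}\hat{\Psi}_{(i-1)i})\hat{\Psi}_{ii}^{-1}\hat{\alpha}_{i,k}\bigr) = \frac{p_i\,n_{[\ell]}\,n_{[k-i+1]}(n_{[\ell]}-1)}{m_{[\ell],k-\ell+1}(m_{[\ell],k-\ell+1}-3)(m_{[k-i+1],i-1}-1)(m_{[k-i+1],i}-1)}\,v_{\ell,k}. \tag{B.20}$$

Similarly, we can obtain the expectation:

$$E\bigl(d^{[\ell-1]\prime}_{(k-\ell+1)}\tilde{\Psi}_{(k-\ell+1)}^{-1}(\Sigma_{(k-\ell+1)i} - \Sigma_{(k-\ell+1)(i-1)}\tilde{\Psi}_{(i-1)i})\tilde{\Psi}_{ii}^{-1}\tilde{\alpha}_{i,k}\bigr) = \frac{p_i\,n_{[\ell-1]}\,n_{[k-i+1]}(n_{[\ell-1]}-1)}{m_{[\ell-1],k-\ell+1}(m_{[\ell-1],k-\ell+1}-3)(m_{[k-i+1],i-1}-1)(m_{[k-i+1],i}-1)}\,v^{*}_{\ell,k}. \tag{B.21}$$

Thus, the proof has been completed using (B.3), (B.4), (B.7), (B.8), (B.20) and (B.21). □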
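Lemma 10, which supplies the mixed moments (B.18) and (B.19), can also be checked by Monte Carlo. The following is a minimal sketch with hypothetical inputs: the sizes `d`, `f1`, `f2`, the matrices `omega`, `c1`, `c2`, and the helper `wishart_batch` are illustrative and do not come from the paper. The constant matrices are deliberately non-symmetric so that the transpose in the second trace term matters.

```python
import numpy as np


def wishart_batch(rng, f, chol, reps):
    """Draw `reps` Wishart matrices W_d(f, chol @ chol.T) as X'X with Gaussian rows."""
    z = rng.standard_normal((reps, f, chol.shape[0]))
    x = z @ chol.T
    return np.swapaxes(x, 1, 2) @ x


rng = np.random.default_rng(2)
d, f1, f2, reps = 2, 12, 10, 40000
f = f1 + f2
omega = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
chol = np.linalg.cholesky(omega)
c1 = np.array([[1.0, 0.2],
               [0.0, 0.5]])
c2 = np.array([[0.7, 0.0],
               [0.3, 1.2]])

g1 = wishart_batch(rng, f1, chol, reps)
g2 = wishart_batch(rng, f2, chol, reps)   # independent of g1

# Monte Carlo estimate of E tr{C1 G1^{-1} C2 (G1 + G2)^{-1}}.
prod = c1 @ np.linalg.inv(g1) @ c2 @ np.linalg.inv(g1 + g2)
mc = np.trace(prod, axis1=1, axis2=2).mean()

# Closed form stated in Lemma 10.
oi = np.linalg.inv(omega)
bracket = ((f - d - 2) * np.trace(c1 @ oi @ c2 @ oi)
           + np.trace(c1 @ oi @ c2.T @ oi)
           + np.trace(c1 @ oi) * np.trace(c2 @ oi))
exact = bracket / ((f - d) * (f - d - 3) * (f1 - d - 1))
rel_err = abs(mc - exact) / abs(exact)
```

In the proof, $G_1$ corresponds to the matrix built from the smaller sample, $G^{[k-i+1]}_{(k-\ell+1)}$, and $G_1 + G_2$ to $G^{[\ell]}_{(k-\ell+1)}$, so the factor $f_1 - d - 1$ becomes $m_{[k-i+1],k-\ell+1} - 1$.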

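B.2. Proof of Theorem 5

We consider the statistic $D^{2}_{k,i}$. For $i = 1, \ldots, k$, it is important to note that $\hat{\Sigma}_{(i)}^{-1} = \hat{\Psi}_{i,k}'\hat{H}_{i,k}\hat{\Psi}_{i,k}$ and that $\hat{\Psi}_{i,k}(\hat{\mu}^{(1)}_{(i)} - \hat{\mu}^{(2)}_{(i)})$ equals

$$\begin{pmatrix} d^{[k]}_{1} \\ d^{[k-1]}_{2} - \hat{\Psi}_{21}d^{[k-1]}_{1} \\ \vdots \\ d^{[k-i+1]}_{i} - \hat{\Psi}_{i(i-1)}d^{[k-i+1]}_{(i-1)} \end{pmatrix},$$

where

$$\hat{\mu}^{(1)}_{(i)} - \hat{\mu}^{(2)}_{(i)} = \begin{pmatrix} \hat{\mu}^{(1)}_{1} - \hat{\mu}^{(2)}_{1} \\ \vdots \\ \hat{\mu}^{(1)}_{i} - \hat{\mu}^{(2)}_{i} \end{pmatrix}, \qquad \hat{\Sigma}_{(i)} = \begin{pmatrix} \hat{\Sigma}_{11} & \cdots & \hat{\Sigma}_{1i} \\ \vdots & \ddots & \vdots \\ \hat{\Sigma}_{i1} & \cdots & \hat{\Sigma}_{ii} \end{pmatrix},$$

$$d^{[j]}_{i} = \bar{x}^{[1,j]}_{i} - \bar{x}^{[2,j]}_{i} \quad (i = 1, \ldots, k,\; j = 1, \ldots, k-i+1), \qquad d^{[k-i+1]\prime}_{(i-1)} = \bigl(d^{[k-i+1]\prime}_{1}, \ldots, d^{[k-i+1]\prime}_{i-1}\bigr).$$

Therefore, it holds that

$$D^{2}_{k,1} = d^{[k]\prime}_{1}\hat{\Psi}_{11}^{-1}d^{[k]}_{1}$$

and

$$D^{2}_{k,i} = d^{[k]\prime}_{1}\hat{\Psi}_{11}^{-1}d^{[k]}_{1} + \sum_{s=2}^{i}\bigl(d^{[k-s+1]}_{s} - \hat{\Psi}_{s(s-1)}d^{[k-s+1]}_{(s-1)}\bigr)'\hat{\Psi}_{ss}^{-1}\bigl(d^{[k-s+1]}_{s} - \hat{\Psi}_{s(s-1)}d^{[k-s+1]}_{(s-1)}\bigr)$$

for $i = 2, \ldots, k$. For $i = 1$, using Lemmas 6 and 9, we can obtain

$$E(D^{2}_{k,1}) = \frac{n_{[k]}}{n_{[k]}-p_1-1}\left\{\delta_{1}^{2} + \frac{p_1(N^{(1)}_{[k]} + N^{(2)}_{[k]})}{N^{(1)}_{[k]}N^{(2)}_{[k]}}\right\}. \tag{B.22}$$

For $i = 2, \ldots, k$ and $s = 2, \ldots, i$, it is important to note that

$$E\bigl(d^{[k-s+1]}_{s} \mid d^{[k-s+1]}_{(s-1)}\bigr) = \delta_{s} + \Sigma_{s(s-1)}\Sigma_{(s-1)}^{-1}\bigl(d^{[k-s+1]}_{(s-1)} - \delta_{(s-1)}\bigr).$$

Thus, from (B.13), (B.14), (B.15), and (B.22), it can be shown that

$$E(D^{2}_{k,i}) = \frac{n_{[k]}}{n_{[k]}-p_1-1}\left\{\delta_{1}^{2} + \frac{p_1(N^{(1)}_{[k]} + N^{(2)}_{[k]})}{N^{(1)}_{[k]}N^{(2)}_{[k]}}\right\} + \sum_{s=2}^{i}\frac{n_{[k-s+1]}}{m_{[k-s+1],s}-1}\left\{\delta_{s}^{2} - \frac{m_{[k-s+1],s}-1}{m_{[k-s+1],s-1}-1}\,\delta_{s-1}^{2} + \frac{p_s(n_{[k-s+1]}-1)}{m_{[k-s+1],s-1}-1} \cdot \frac{N^{(1)}_{[k-s+1]} + N^{(2)}_{[k-s+1]}}{N^{(1)}_{[k-s+1]}N^{(2)}_{[k-s+1]}}\right\}. \tag{B.23}$$

Thus, from (B.22) and (B.23), it directly follows that Theorem 5 holds. □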
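The closed forms (B.22) and (B.23) are straightforward to evaluate. The sketch below is a hypothetical helper rather than anything defined in the paper. It assumes, since these quantities are defined outside this appendix, that $n_{[j]} = N^{(1)}_{[j]} + N^{(2)}_{[j]} - 2$, that $m_{[j],s} = n_{[j]} - (p_1 + \cdots + p_s)$, and that $\delta_s^2$ denotes the squared Mahalanobis distance based on the first $s$ blocks. The name `expected_d2` and its argument layout are illustrative.

```python
import numpy as np


def expected_d2(i, p, n1, n2, delta2):
    """Evaluate E(D^2_{k,i}) from (B.22)-(B.23).

    p[s-1]      : dimension p_s of block s (k = len(p) blocks)
    n1[j-1]     : cumulative sample size N^{(1)}_{[j]} of group 1
    n2[j-1]     : cumulative sample size N^{(2)}_{[j]} of group 2
    delta2[s-1] : delta_s^2
    Assumes n_{[j]} = N1 + N2 - 2 and m_{[j],s} = n_{[j]} - (p_1 + ... + p_s).
    """
    k = len(p)
    cum_p = np.cumsum(p)

    def n(j):
        return n1[j - 1] + n2[j - 1] - 2

    def m(j, s):
        return n(j) - cum_p[s - 1]

    def c(j):
        return (n1[j - 1] + n2[j - 1]) / (n1[j - 1] * n2[j - 1])

    # (B.22): contribution of the first block, which uses all k steps of data.
    total = n(k) / (n(k) - p[0] - 1) * (delta2[0] + p[0] * c(k))
    # (B.23): block s is observed only in the first k-s+1 steps.
    for s in range(2, i + 1):
        j = k - s + 1
        inner = (delta2[s - 1]
                 - (m(j, s) - 1) / (m(j, s - 1) - 1) * delta2[s - 2]
                 + p[s - 1] * (n(j) - 1) / (m(j, s - 1) - 1) * c(j))
        total += n(j) / (m(j, s) - 1) * inner
    return float(total)


# Illustrative check: when no observations are missing, (B.23) should reduce to
# the complete-data form n / (n - p - 1) * {delta^2 + p (N1 + N2) / (N1 N2)}.
complete_case = expected_d2(2, [2, 3], [20, 20], [15, 15], [1.0, 2.5])
complete_reference = 33 / (33 - 5 - 1) * (2.5 + 5 * 35 / 300)
```

The complete-data reduction is a useful consistency check on the reconstruction: with equal cumulative sample sizes at every step, the $\delta_{s-1}^2$ terms telescope and the trace terms combine into a single term in $p_1 + \cdots + p_i$.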

References

Batsidis, A., Zografos, K., 2006. Discrimination of observations into one of two elliptic populations based on monotone training samples. Metrika 64, 221–241.

Batsidis, A., Zografos, K., Loukas, S., 2006. Errors in discrimination with monotone missing data from multivariate normal populations. Comput. Statist. Data Anal. 50, 2600–2634.

Fujikoshi, Y., Seo, T., 1998. Asymptotic approximations for EPMC's of the linear and quadratic discriminant functions when the sample sizes and the dimension are large. Random Oper. Stochastic Equations 6, 269–280.

Fujikoshi, Y., Ulyanov, V.V., Shimizu, R., 2010. Multivariate Statistics: High-Dimensional and Large-Sample Approximations. John Wiley & Sons, Inc., Hoboken, New Jersey, pp. 467–479.

Kanda, T., Fujikoshi, Y., 1998. Some basic properties of the MLE's for a multivariate normal distribution with monotone missing data. Amer. J. Math. Management Sci. 18, 161–190.

Kanda, T., Fujikoshi, Y., 2004. Linear discriminant function and probabilities of misclassification with monotone missing data. Proc. 8th China–Japan Statist. Sympos., pp. 142–143.

Lachenbruch, P.A., 1968. On expected probabilities of misclassification in discriminant analysis, necessary sample size, and a relation with the multiple correlation coefficient. Biometrics 24, 823–834.

Okamoto, M., 1963. An asymptotic expansion for the distribution of the linear discriminant function. Ann. Math. Statist. 34, 1286–1301.

Shutoh, N., 2011. Asymptotic expansions relating to discrimination based on two-step monotone missing samples. J. Statist. Plann. Inference 141, 1297–1306.

Shutoh, N., Hyodo, M., Seo, T., 2011. An asymptotic approximation for EPMC in linear discriminant analysis based on two-step monotone missing samples. J. Multivariate Anal. 102, 252–263.

Wakaki, H., 1994. Discriminant analysis under elliptical populations. Hiroshima Math. J. 24, 257–298.