Smoothed jackknife empirical likelihood inference for ROC curves with missing data

Journal of Multivariate Analysis 140 (2015) 123–138
http://dx.doi.org/10.1016/j.jmva.2015.05.002

Hanfang Yang (a), Yichuan Zhao (b,*)

(a) School of Statistics, Renmin University of China, Beijing, 100872, China
(b) Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30303, United States

* Corresponding author. E-mail address: [email protected] (Y. Zhao).

Article history: Received 12 July 2013; available online 14 May 2015.
AMS subject classifications: 62G20, 62G10, 62H20.
Keywords: Jackknife; Smoothed empirical likelihood; Missing data.

Abstract

In this paper, we apply the smoothed jackknife empirical likelihood (JEL) method to construct confidence intervals for the receiver operating characteristic (ROC) curve with missing data. After hot deck imputation, we generate a pseudo-jackknife sample to develop the jackknife empirical likelihood. Compared with the traditional empirical likelihood method, the smoothed JEL has a great advantage in saving computational cost. Under mild conditions, the smoothed jackknife empirical likelihood ratio converges to a scaled chi-square distribution. Furthermore, simulation studies in terms of coverage probability and average length of confidence intervals demonstrate that the proposed method performs well for small sample sizes. A real data set is used to illustrate the proposed JEL method.

1. Introduction

The ROC curve has received considerable attention over the past decades and has been widely used in epidemiology, medical research, signal detection, diagnostic medicine and material testing. In medical studies, the sensitivity or true positive rate (TPR) of a diagnostic test is the proportion of positive tests among the diseased patients. A plot of sensitivity (TPR) against 1 − specificity defines the ROC curve, which is a graphical summary of the discriminatory accuracy of a diagnostic test. Current methods for ROC curve analysis focus on complete data, including Swets and Pickett [26], Tosteson and Begg [27], Hsieh and Turnbull [9], Zou et al. [33], Lloyd [14], Pepe [20], Metz et al. [16] and Lloyd and Yong [15], among others. Moreover, Pepe [21] provided an excellent summary of recent research on ROC curves. Claeskens et al. [4] developed smoothed empirical likelihood confidence intervals for the continuous-scale ROC curve with complete data. More recently, Pesarin and Salmaso [22] discussed the empirical conditional ROC curve for two-sample test statistics in the permutation context, including the related computational algorithm.

Empirical likelihood (EL) is a nonparametric method for statistical inference, which employs the maximum likelihood method conditionally on the observed data set. Owen [18,19] introduced the EL method to construct confidence regions for the mean vector. Related studies include Bartlett-correctability [5], general estimating equations [23], the general plug-in EL [8] and so on. For ROC curves, a common EL strategy is to transform nonlinear constraints into linear constraints by introducing link variables, as in [4,3], etc. More recently, the jackknife empirical likelihood method, based on a jackknife pseudo-sample, has become attractive.






Jing et al. [10] proposed the jackknife empirical likelihood method for a U-statistic. Gong et al. [7] developed the smoothed jackknife empirical likelihood (JEL) method for the continuous-scale ROC curve, and Yang and Zhao [32] proposed JEL inference for the difference of two correlated ROC curves. All of the above studies established Wilks' theorem for the EL or JEL method with complete data.

Missing data frequently arise in medical studies and other scientific experiments, since some of the responses may not be obtained for many reasons. An imputation-based procedure is one common way to deal with the missing data problem: using imputation, the information in the recorded variables of the incomplete cases can be retained for statistical inference about the parameters of interest. For example, Wang and Rao [30,31] addressed missing-response problems with empirical likelihood methods and demonstrated efficiency gains compared with complete-case analysis. In this paper, we assume that the data are missing completely at random (MCAR), meaning that the missingness is not associated with the values of observed or unobserved variables [12,24]. We consider the hot deck imputation procedure, in which missing data are randomly substituted by values drawn from the observed responses; it is used by the Current Population Survey (CPS) and the third National Health and Nutrition Examination Survey [2]. Essentially, hot deck imputation resamples each missing observation from the observed data. The hot deck estimator, which is less sensitive to model misspecification, is unbiased under the MCAR assumption (see [12]). Missing data problems were also investigated with empirical likelihood and hot deck imputation by Qin and Zhang [25] and Qin and Qian [24], among others. In particular, Liu and Zhao [13] applied hot deck imputation to improve the coverage accuracy of semi-empirical likelihood inference for the ROC curve, compared with confidence intervals obtained by deleting missing values, when the sample size is very small and the missing rate is very high. To the best of our knowledge, no paper has addressed the construction of confidence intervals for the continuous-scale ROC curve with data missing completely at random by jackknife EL methods. In this paper, we apply the smoothed jackknife EL to construct confidence intervals for the ROC curve with missing data, which avoids adding extra constraints.

The paper is organized as follows. The main procedure for the jackknife empirical likelihood ratio is proposed in Section 2, including the construction of the smoothed empirical likelihood and the asymptotic results for the jackknife empirical likelihood ratio. In Section 3, we conduct simulation studies to evaluate the smoothed jackknife empirical likelihood confidence intervals for continuous-scale ROC curves in small and moderate samples, in terms of coverage probability and average length of confidence intervals, and we illustrate our approach with a real data example. A brief discussion is given in Section 4. All proofs are collected in the Appendix.

2. Inference procedure

2.1. Missing data and hot deck imputation

Consider i.i.d. random samples $x_i$, $i = 1, \ldots, m$, from a distribution $F$, and independent missing indicators $\delta_{x_i}$, $i = 1, \ldots, m$, following a Bernoulli distribution with response rate $P_1$, so that $P(\delta_{x_i}) = P(\delta_{x_i} \mid x_i)$ and $P_1 = P(\delta_{x_i} = 1)$. Similarly, the random samples $y_j$, $j = 1, \ldots, n$, come from a distribution $G$, with missing indicators $\delta_{y_j}$, $j = 1, \ldots, n$, following a Bernoulli distribution with response rate $P_2 = P(\delta_{y_j} = 1)$, and $P(\delta_{y_j}) = P(\delta_{y_j} \mid y_j)$. It is quite common that $P_1 \ne P_2$ in practice. Combining $x_i$ with $\delta_{x_i}$, we define $x_{i,m} = x_i \delta_{x_i}$, $i = 1, \ldots, m$, as data missing completely at random; likewise $y_{j,m} = y_j \delta_{y_j}$, $j = 1, \ldots, n$. Denote the observed sets by $X_{\mathrm{obs}} = \{x_i : \delta_{x_i} = 1,\ i = 1, \ldots, m\}$ and $Y_{\mathrm{obs}} = \{y_j : \delta_{y_j} = 1,\ j = 1, \ldots, n\}$. We then adopt the hot deck imputation procedure, replacing each missing value with a value drawn from the observed set $X_{\mathrm{obs}}$ or $Y_{\mathrm{obs}}$. Denote $r_1 = \sum_{i=1}^m \delta_{x_i}$, $r_2 = \sum_{j=1}^n \delta_{y_j}$, $m_1 = m - r_1$ and $m_2 = n - r_2$. Let $S_{rx} = \{i : \delta_{x_i} = 1\}$, $S_{mx} = \{i : \delta_{x_i} = 0\}$, $S_{ry} = \{j : \delta_{y_j} = 1\}$ and $S_{my} = \{j : \delta_{y_j} = 0\}$. The $x_i^*$ are generated from the discrete uniform distribution on the observed data set $X_{\mathrm{obs}}$, and the $y_j^*$ from the discrete uniform distribution on $Y_{\mathrm{obs}}$. Finally, we obtain the imputed data
\[
x_{I,i} = x_{i,m} + x_i^*(1 - \delta_{x_i}), \quad i = 1, \ldots, m, \qquad
y_{I,j} = y_{j,m} + y_j^*(1 - \delta_{y_j}), \quad j = 1, \ldots, n.
\]
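As a concrete illustration of this step, the following Python sketch (our own illustration, not the authors' MATLAB code; the function name `hot_deck_impute` is hypothetical) draws each missing value uniformly, with replacement, from the observed values of the same sample:

```python
import numpy as np

def hot_deck_impute(x, delta, rng):
    """Hot deck imputation under MCAR: entries with delta == 0 are
    replaced by uniform draws from the observed values (delta == 1),
    i.e. x_I = x*delta + x_star*(1 - delta)."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    observed = x[delta == 1]                       # X_obs, of size r_1
    x_star = rng.choice(observed, size=int(np.sum(delta == 0)), replace=True)
    x_imputed = x.copy()
    x_imputed[delta == 0] = x_star                 # fill the m_1 missing slots
    return x_imputed

rng = np.random.default_rng(0)
m, P1 = 100, 0.8
x = rng.normal(0.0, 2.0, size=m)       # e.g. F = N(0, 4), as in Section 3
delta_x = rng.binomial(1, P1, size=m)  # MCAR response indicators
x_I = hot_deck_impute(x, delta_x, rng)
```

The same function applies verbatim to the y-sample with its own response rate $P_2$.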









2.2. Smoothed empirical likelihood ratio

Let $F$ and $G$ be the distribution functions of the diseased and non-diseased populations, respectively. The ROC curve can be written as $ROC(p) = 1 - F(G^{-1}(1-p))$, where $0 < p < 1$ and $G^{-1}$ denotes the quantile function of $G$, $G^{-1}(p) = \inf\{x \in \mathbb{R} : p \le G(x)\}$. Denote $F_m(x) = (1/m)\sum_{i=1}^m I(x_{I,i} \le x)$ and $G_n(y) = (1/n)\sum_{j=1}^n I(y_{I,j} \le y)$, and let $P_1 > 0$ and $P_2 > 0$. We have the following theorem about the weak convergence of $F_m(x)$ and $G_n(y)$.

Theorem 2.1. We have
\[
\sqrt{m}\,\{F_m(x) - F(x)\} \Rightarrow \sqrt{1 - P_1 + P_1^{-1}}\; B(F(x)), \qquad
\sqrt{n}\,\{G_n(x) - G(x)\} \Rightarrow \sqrt{1 - P_2 + P_2^{-1}}\; B(G(x)),
\]
where $B(\cdot)$ is the Brownian bridge on $[0, 1]$.
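To gauge the effect of imputation on the limiting behavior, note that the variance inflation factor $1 - P_1 + P_1^{-1}$ exceeds 1 whenever $P_1 < 1$ and reduces to 1 for complete data. For instance, at response rate $P_1 = 0.8$,
\[
1 - P_1 + P_1^{-1} = 1 - 0.8 + 1.25 = 1.45,
\]
so the asymptotic variance of $F_m(x)$ is 45% larger than it would be without missing data.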


Let
\[
K(p) = \int_{u \le p} w(u)\, du,
\]
where $w$ is a smooth symmetric kernel function with support $[-1, 1]$. Define the smooth estimator of $ROC(p)$ as
\[
\hat R_{m,n}(p) = 1 - \frac{1}{m} \sum_{j=1}^{m} K\!\left( \frac{1 - p - G_n(x_{I,j})}{h} \right),
\]
where $h = h(n) > 0$ is a bandwidth. Define
\[
\hat R_{m,n,i}(p) = 1 - \frac{1}{m-1} \sum_{1 \le j \le m,\, j \ne i} K\!\left( \frac{1 - p - G_n(x_{I,j})}{h} \right), \qquad 1 \le i \le m,
\]
\[
\hat R_{m,n,i}(p) = 1 - \frac{1}{m} \sum_{j=1}^{m} K\!\left( \frac{1 - p - G_{n,m-i}(x_{I,j})}{h} \right), \qquad m+1 \le i \le m+n,
\]
where
\[
G_{n,-k}(y) = \frac{1}{n-1} \sum_{1 \le i \le n,\, i \ne k} I(y_{I,i} \le y), \qquad k = 1, \ldots, n.
\]
The jackknife pseudo-sample is defined as
\[
\hat V_i(p) = (m+n)\hat R_{m,n}(p) - (m+n-1)\hat R_{m,n,i}(p), \qquad i = 1, \ldots, m+n. \tag{2.1}
\]
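To make the construction concrete, here is a Python sketch (ours; it assumes the integrated Epanechnikov kernel that Section 3 adopts, and the helper names are hypothetical) computing $\hat R_{m,n}(p)$ and the jackknife pseudo-values of (2.1):

```python
import numpy as np

def K_smooth(t):
    """Integrated Epanechnikov kernel: K(t) = int_{u <= t} w(u) du,
    with w(u) = 0.75*(1 - u^2) on [-1, 1]."""
    t = np.clip(t, -1.0, 1.0)
    return 0.75 * (t - t**3 / 3.0) + 0.5

def jackknife_pseudo_values(x_I, y_I, p, h):
    """Return (R_hat, V) where V holds the m + n pseudo-values of (2.1)."""
    m, n = len(x_I), len(y_I)
    Gn = np.mean(y_I[None, :] <= x_I[:, None], axis=1)   # G_n(x_{I,j})
    Kj = K_smooth((1.0 - p - Gn) / h)
    R_hat = 1.0 - Kj.mean()
    V = np.empty(m + n)
    # Deleting x_{I,i} just drops one term from the average.
    R_del_x = 1.0 - (Kj.sum() - Kj) / (m - 1.0)
    V[:m] = (m + n) * R_hat - (m + n - 1) * R_del_x
    # Deleting y_{I,k} changes G_n itself, so recompute G_{n,-k}.
    for k in range(n):
        G_del = (Gn * n - (y_I[k] <= x_I)) / (n - 1.0)
        R_del_y = 1.0 - K_smooth((1.0 - p - G_del) / h).mean()
        V[m + k] = (m + n) * R_hat - (m + n - 1) * R_del_y
    return R_hat, V
```

Deleting an x-observation only removes one term from the average, while deleting a y-observation perturbs $G_n$ itself; this asymmetry is exactly what the two cases of $\hat R_{m,n,i}(p)$ encode.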

The empirical likelihood ratio at $\tilde R$, based on the $\hat V_i(p)$, is
\[
L(\tilde R, p) = \frac{\sup\left\{ \prod_{i=1}^{m+n} p_i \;:\; \sum_{i=1}^{m+n} p_i = 1,\ \sum_{i=1}^{m+n} p_i \hat V_i(p) = \tilde R,\ p_i > 0,\ i = 1, \ldots, m+n \right\}}{\sup\left\{ \prod_{i=1}^{m+n} p_i \;:\; \sum_{i=1}^{m+n} p_i = 1,\ p_i > 0,\ i = 1, \ldots, m+n \right\}}.
\]
By the Lagrange multiplier method, we have
\[
l(\tilde R, p) = -2 \log L(\tilde R, p) = 2 \sum_{i=1}^{m+n} \log\{1 + \lambda(\hat V_i(p) - \tilde R)\}, \tag{2.2}
\]
where $\lambda$ satisfies the equation
\[
\sum_{i=1}^{m+n} \frac{\hat V_i(p) - \tilde R}{1 + \lambda\{\hat V_i(p) - \tilde R\}} = 0.
\]
Define
\[
v_{m,n}^2(p) = \frac{1}{m+n} \sum_{i=1}^{m+n} \left( \hat V_i(p) - \frac{1}{m+n} \sum_{i=1}^{m+n} \hat V_i(p) \right)^{\!2}. \tag{2.3}
\]
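Since the Lagrange equation above is strictly decreasing in $\lambda$ on the region where all weights stay positive, it can be solved by a safeguarded Newton iteration. The sketch below is our own illustration (not the authors' implementation); `V` would be the pseudo-value vector of (2.1), and $\tilde R$ must lie strictly inside the range of the $\hat V_i(p)$:

```python
import numpy as np

def log_el_ratio(V, R_tilde, tol=1e-10, max_iter=100):
    """Evaluate l(R_tilde, p) = 2 * sum log(1 + lam*(V_i - R_tilde)), where
    lam solves sum (V_i - R_tilde)/(1 + lam*(V_i - R_tilde)) = 0.
    Assumes R_tilde lies strictly inside the range of V."""
    z = V - R_tilde
    lam = 0.0
    for _ in range(max_iter):
        denom = 1.0 + lam * z
        if np.any(denom <= 0):           # keep all implied weights positive
            lam /= 2.0
            continue
        g = np.sum(z / denom)            # estimating equation in lam
        dg = -np.sum(z**2 / denom**2)    # its derivative (always negative)
        step = g / dg
        lam -= step
        if abs(step) < tol:
            break
    return 2.0 * np.sum(np.log(1.0 + lam * z))
```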

We will develop the asymptotic properties of the empirical variance $v_{m,n}^2(p)$ and of the empirical likelihood ratio statistic at the true value $R(p)$ of the ROC curve at the point $p$, based on $x_{I,i}$, $i = 1, \ldots, m$, and $y_{I,j}$, $j = 1, \ldots, n$. These results are used to construct an asymptotic confidence interval for $R(p)$. We impose the following additional conditions.

A.1. $F(x)$ and $G(y)$ are continuous functions;
A.2. $\sup_x |f(x)| < \infty$ and $\sup_y |g(y)| < \infty$, where $f(x) = dF(x)/dx$ and $g(y) = dG(y)/dy$;
A.3. $m/n \to r$ as $m + n \to \infty$, and $0 < P_1 \le 1$, $0 < P_2 \le 1$;
A.4. $w'(u)$ is bounded by $M < \infty$ for $u \in (-1, 1)$;
A.5. $p \in (a, b)$ for some sub-interval $(a, b) \subset (0, 1)$.

Theorem 2.2. Under Assumptions A.1–A.5, assume that $h = h(n) \to 0$, $nh^2/\log n \to \infty$ and $nh^4 \to 0$ as $n \to \infty$. Then, for $p \in (a, b)$,
\[
v_{m,n}^2(p) \xrightarrow{P} \sigma_2^2(p),
\]
where
\[
\sigma_2^2(p) = \left( 1 + \frac{1}{r} \right) R(p)\{1 - R(p)\} + (1 + r)\, R'^2(p)\, p(1 - p).
\]


Table 1
Coverage probability of 95% confidence intervals for ROC(p).

m    n    p    P1   P2   JEL (A)  SEL (A)  JEL (B)  SEL (B)  JEL (C)  SEL (C)
50   50   0.4  0.8  0.8  0.941    0.923    0.952    0.951    0.933    0.951
80   80   0.4  0.8  0.8  0.961    0.935    0.943    0.944    0.940    0.940
100  100  0.4  0.8  0.8  0.948    0.955    0.958    0.951    0.942    0.958
50   50   0.5  0.8  0.8  0.936    0.950    0.944    0.945    0.935    0.940
80   80   0.5  0.8  0.8  0.945    0.958    0.962    0.934    0.941    0.950
100  100  0.5  0.8  0.8  0.943    0.949    0.956    0.930    0.939    0.956
50   50   0.4  0.9  0.9  0.943    0.947    0.954    0.948    0.939    0.947
80   80   0.4  0.9  0.9  0.939    0.958    0.948    0.946    0.942    0.948
100  100  0.4  0.9  0.9  0.951    0.955    0.955    0.943    0.945    0.948
50   50   0.5  0.9  0.9  0.941    0.958    0.946    0.939    0.943    0.946
80   80   0.5  0.9  0.9  0.943    0.957    0.948    0.947    0.947    0.950
100  100  0.5  0.9  0.9  0.936    0.958    0.940    0.937    0.947    0.958

Theorem 2.3. Under the conditions of Theorem 2.2, for $p \in (a, b)$, we have
\[
l(R(p), p) \xrightarrow{D} a(p)\, \chi_1^2,
\]
where $R(p)$ is the true ROC curve at $p$,
\[
a(p) = \frac{\sigma_1^2(p)}{\sigma_2^2(p)}, \qquad
\sigma_1^2(p) = (1 - P_1 + P_1^{-1})\left( 1 + \frac{1}{r} \right) R(p)\{1 - R(p)\} + (1 - P_2 + P_2^{-1})(1 + r)\, R'^2(p)\, p(1 - p).
\]

Remark. In our setting, the limiting distribution is a scaled chi-squared distribution because of the missing mechanism. When the response rates satisfy $P_1 = P_2 = 1$, the limiting distribution is the standard chi-squared distribution. We may use a consistent estimator $\hat a(p)$ of $a(p)$ to construct confidence intervals for $R(p)$ at a given $p$. Thus, the asymptotic $100(1-\alpha)\%$ smoothed jackknife EL confidence interval for $R(p)$ is given by
\[
I(p) = \left\{ \tilde R : l(\tilde R, p) \le \hat a(p)\, \chi_1^2(\alpha) \right\},
\]
where $\chi_1^2(\alpha)$ is the upper $\alpha$-quantile of $\chi_1^2$.

3. Numerical studies and real data analysis

In this section, we conduct simulation studies to compare the performance of the jackknife empirical likelihood (JEL) method and the smoothed empirical likelihood (SEL) method for the ROC curve proposed by An [1], in terms of coverage accuracy and average length of confidence intervals, under various distributions, response rates and sample sizes. MATLAB was used for the numerical studies in this section, and the M-file code is available from the authors upon request. In the simulation studies, the distributions of the diseased population (X) and the non-diseased population (Y) are denoted by F(x) and G(y). We consider three scenarios: (A) F ~ N(0, 4), G ~ N(1, 4); (B) F ~ Exp(1.5), G ~ N(1.5, 1); and (C) F ~ Exp(0.8), G ~ Exp(0.8). Random samples x and y are drawn independently from the populations X and Y. The response rates for x and y are chosen as (P1, P2) = (0.8, 0.8) or (0.9, 0.9); we apply independent Bernoulli indicators with the given response rates to the random samples x and y to create MCAR samples. The sample sizes for x and y are (m, n) = (50, 50), (80, 80) and (100, 100). For each response rate and sample size, we generate 1000 independent random samples with missing data. Without loss of generality, we use both methods to construct confidence intervals for the ROC curve at p = 0.4 and 0.5. The nominal level of the confidence intervals is 1 − α = 0.95. The Epanechnikov kernel

\[
w(u) =
\begin{cases}
\dfrac{3}{4}(1 - u^2), & |u| \le 1, \\
0, & \text{otherwise},
\end{cases}
\]

is used for both the JEL and SEL methods, and the smoothing parameter is chosen as $h = c\, n^{-1/3}$ for both methods. Here c is determined by the cross-validation method of [3], minimizing the mean squared error of the estimating equation in the JEL or SEL method, respectively. The simulation results for coverage probability are reported in Table 1. From Table 1, after selecting the optimal bandwidths for both methods, we find that the JEL method performs similarly to the SEL method in terms of coverage probability. Next, we investigate the average lengths of the confidence intervals for the ROC curve using the JEL and SEL methods, under the same simulation settings as before. To obtain the average length, we applied the bisection method to find the endpoints of each confidence interval.
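As an illustration of how the interval endpoints can be located (a sketch under our own conventions, not the authors' MATLAB implementation; `jackknife_pseudo_values` and `log_el_ratio` are the hypothetical helpers sketched in Section 2, and the finite-difference estimate of R'(p) is one simple choice among several):

```python
import numpy as np
from scipy.stats import chi2

def jel_interval(x_I, y_I, p, h, P1, P2, alpha=0.05, tol=1e-6):
    """Smoothed JEL interval I(p) = {R : l(R, p) <= a_hat * chi2_1(alpha)}.
    P1, P2 may be replaced by the observed response fractions r1/m, r2/n."""
    m, n = len(x_I), len(y_I)
    r = m / n
    R_hat, V = jackknife_pseudo_values(x_I, y_I, p, h)
    # Plug-in estimate of a(p) = sigma_1^2(p)/sigma_2^2(p); R'(p) is
    # approximated here by a finite difference of the smoothed estimator.
    eps = 0.01
    R_eps, _ = jackknife_pseudo_values(x_I, y_I, p + eps, h)
    dR = (R_eps - R_hat) / eps
    base = (1 + 1 / r) * R_hat * (1 - R_hat)
    slope = (1 + r) * dR**2 * p * (1 - p)
    a_hat = ((1 - P1 + 1 / P1) * base + (1 - P2 + 1 / P2) * slope) / (base + slope)
    cutoff = a_hat * chi2.ppf(1 - alpha, df=1)

    def boundary(outside, inside):
        # Bisect between a point outside the interval and one inside it.
        for _ in range(200):
            mid = 0.5 * (outside + inside)
            if log_el_ratio(V, mid) <= cutoff:
                inside = mid
            else:
                outside = mid
            if abs(inside - outside) < tol:
                break
        return 0.5 * (outside + inside)

    return boundary(1e-6, R_hat), boundary(1.0 - 1e-6, R_hat)
```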


Table 2
Average length of 95% confidence intervals for ROC(p).

m    n    p    P1   P2   JEL (A)  SEL (A)  JEL (B)  SEL (B)  JEL (C)  SEL (C)
50   50   0.4  0.8  0.8  0.320    0.281    0.313    0.289    0.389    0.332
80   80   0.4  0.8  0.8  0.251    0.257    0.252    0.253    0.312    0.274
100  100  0.4  0.8  0.8  0.231    0.232    0.226    0.241    0.264    0.265
50   50   0.5  0.8  0.8  0.363    0.349    0.344    0.329    0.368    0.325
80   80   0.5  0.8  0.8  0.293    0.294    0.273    0.280    0.322    0.305
100  100  0.5  0.8  0.8  0.261    0.267    0.244    0.266    0.289    0.298
50   50   0.4  0.9  0.9  0.293    0.277    0.294    0.279    0.357    0.295
80   80   0.4  0.9  0.9  0.233    0.234    0.231    0.242    0.277    0.283
100  100  0.4  0.9  0.9  0.211    0.215    0.205    0.226    0.258    0.259
50   50   0.5  0.9  0.9  0.330    0.319    0.312    0.314    0.354    0.318
80   80   0.5  0.9  0.9  0.268    0.271    0.248    0.272    0.294    0.299
100  100  0.5  0.9  0.9  0.237    0.257    0.222    0.256    0.265    0.295

Fig. 1. 95% point-wise jackknife empirical likelihood confidence intervals for ROC curves under missing completely at random, where "JEL Upper" and "JEL Lower" stand for the upper and lower bounds of the jackknife empirical likelihood confidence interval, "SEE" stands for the smoothed empirical estimator, and "True" stands for the true value of the ROC curve.

This process does not involve high computational cost, because the jackknife method simplifies the constraint equations significantly; this is one of the main advantages of the smoothed jackknife EL method. In summary, Table 2 shows that the JEL method yields shorter average confidence interval lengths than the SEL method most of the time, except in the m = n = 50 case. In addition, we study the empirical likelihood confidence intervals for the ROC curve at different specificities using simulated data. Scenario (D), F ~ N(1, 0.25) and G ~ N(0, 1), is employed in this case. We choose two sample sizes, (50, 50) and (300, 300), with response rates (1, 1) and (0.8, 0.9), and employ the same approach to select the optimal bandwidths. We select 100 evenly spaced points on the ROC curve and obtain the JEL confidence interval at each point. As Fig. 1 shows, it is


Fig. 2. 95% point-wise jackknife empirical likelihood confidence intervals for the ROC curve from the Pancreatic Cancer Serum Biomarkers data under missing completely at random, where "JELCI" stands for the jackknife empirical likelihood confidence interval based on the given variable, "smoothed estimator" stands for the smoothed empirical estimator, "Combined" stands for the sum of CA-125 and CA-19-9, and "Optimal Combined" stands for the optimally weighted average of CA-125 and CA-19-9.

clear that the confidence intervals for the ROC curve lie above the diagonal line, which indicates that the two distributions can be clearly distinguished by the ROC curve. In addition, the confidence intervals are shorter when the sample sizes are larger.

Moreover, we use a real example to illustrate the proposed method. The Pancreatic Cancer Serum Biomarkers data contain two continuous biomarkers, CA-125 (V1) and CA-19-9 (V2), which are used to identify pancreatic cancer. We construct jackknife empirical likelihood confidence intervals for the ROC curve with 20% of the data missing completely at random (MCAR), for each of the two biomarkers CA-125 (V1) and CA-19-9 (V2). The MCAR mechanism is implemented by choosing the missing observations at random from the real data. The smoothing parameter c is selected by the cross-validation method. When two or more biomarkers are used to distinguish diseased from non-diseased observations in real data analysis, they may be combined via several combining functions, because the dependence between the markers is unknown. For simplicity, we apply two methods to combine the biomarkers: the sum, and the weighted average of the two biomarkers that maximizes the area under the ROC curve (AUC), as a referee suggested. We calculate the AUC of the weighted average of CA-125 (V1) and CA-19-9 (V2); only one weight in [0, 1] needs to be determined, by a traversal (grid) search, and the optimal weight is the one attaining the maximum AUC. We then apply the JEL method to these combined markers with data missing completely at random. Fig. 2 shows the point-wise confidence intervals for the ROC curve based on CA-125 (V1), CA-19-9 (V2), the sum, and the optimally weighted average of the two biomarkers. The confidence intervals for the ROC curves based on the two biomarkers and their combinations lie above the diagonal line. Thus, using the jackknife empirical likelihood method, we are able to measure, evaluate and compare the discriminative ability of the biomarkers: when the data are missing completely at random, the diseased observations can be distinguished from the non-diseased observations using these biomarkers.
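For the combined-marker step, a simple way to carry out the traversal search is a grid over [0, 1] with the empirical AUC computed as a Mann–Whitney statistic; the Python sketch below is our own illustration of that idea (the paper's computations were done in MATLAB):

```python
import numpy as np

def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of P(diseased score > healthy score),
    counting ties as 1/2."""
    d = np.asarray(diseased)[:, None]
    h = np.asarray(healthy)[None, :]
    return np.mean((d > h) + 0.5 * (d == h))

def optimal_weight(v1_d, v2_d, v1_h, v2_h, n_grid=101):
    """Grid search for the weight w in [0, 1] maximizing the AUC of
    the combined marker w*V1 + (1 - w)*V2."""
    weights = np.linspace(0.0, 1.0, n_grid)
    aucs = [empirical_auc(w * v1_d + (1 - w) * v2_d,
                          w * v1_h + (1 - w) * v2_h) for w in weights]
    best = int(np.argmax(aucs))
    return weights[best], aucs[best]
```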


4. Discussion

In this paper, we apply the jackknife empirical likelihood method to construct confidence intervals for the continuous-scale ROC curve with missing data. The theoretical results provide the asymptotic properties, including the asymptotic variance and the limiting distribution of the empirical likelihood ratio statistic. The simulation results demonstrate that the coverage probability of the EL confidence intervals is close to the nominal level for various response rates, data distributions and locations on the ROC curve. Compared with the traditional SEL method, the JEL method has advantages such as lower computational cost, a similar coverage probability, and a narrower average length in most settings except for the very small sample size (m = n = 50). We used MATLAB for the numerical studies and the real data analysis, and the code is available upon request.

There are other topics that should be investigated in the future. For instance, we wish to extend the point-wise confidence intervals to simultaneous confidence bands for the ROC curve with data missing completely at random, which is an open problem. It would be very interesting to extend the theory in this paper to multivariate outcome variables, especially when the missing probabilities may differ across outcomes. Furthermore, combined with the jackknife empirical likelihood method, imputation methods could be applied to solve other missing data problems. Moreover, we would like to develop the smoothed jackknife empirical likelihood method for ROC curves with other kinds of incomplete data, such as right censored data and current status data. It is also very challenging to investigate JEL methods for the difference of two correlated ROC curves with missing data.

Acknowledgments

Hanfang Yang's research is supported by the Fundamental Research Funds for the Central Universities (the Research Funds of Renmin University of China) (13XNF059). Yichuan Zhao is supported by the National Security Agency Grant (H98230-12-1-0209), the National Science Foundation Grant (DMS-1406163) and the Research Initiation Grant, Georgia State University. The authors would like to thank the two referees and the AE for their useful suggestions and comments, which led to a significant improvement of the current version.

Appendix. Proofs of theorems

Proof of Theorem 2.1. Denote the $\sigma$-algebras $\mathcal{B}_{r_1} = \sigma(x_i, \delta_{x_i}, i \in S_{rx})$ and $\mathcal{B}_m = \sigma(x_i, \delta_{x_i}, i = 1, \ldots, m)$. Because the $x_i^*$ depend only on $\mathcal{B}_{r_1}$, from [25] we have
\[
E(I(x_i^* \le x) \mid \mathcal{B}_{r_1}) = E(I(x_i^* \le x) \mid \mathcal{B}_m) = \frac{1}{r_1} \sum_{i \in S_{rx}} I(x_i \le x)
\]
and
\[
\sqrt{m}\{F_m(x) - F(x)\} = \frac{\sqrt{m}}{\sqrt{r_1}} \cdot \frac{1}{\sqrt{r_1}} \sum_{i \in S_{rx}} \{I(x_i \le x) - F(x)\}
+ \frac{\sqrt{m_1}}{\sqrt{m}} \cdot \frac{1}{\sqrt{m_1}} \sum_{i \in S_{mx}} \left\{ I(x_i^* \le x) - E(I(x_i^* \le x) \mid \mathcal{B}_{r_1}) \right\}.
\]
Denote the first term by $V_m(x)$ and the second by $U_m(x)$. The response rate $P_1 > 0$ ensures that $r_1 \to \infty$ and $m_1 \to \infty$ as $m \to \infty$. Define the empirical distribution $F_{r_1}(x) = (1/r_1)\sum_{i \in S_{rx}} I(x_i \le x)$ of the observed $x$-values; the $x_i^*$, $i \in S_{mx}$, have distribution function $F_{r_1}(x)$. Denote $F_{r_1,m_1}(x) = (1/m_1)\sum_{i \in S_{mx}} I(x_i^* \le x)$, the empirical distribution of $x_1^*, \ldots, x_{m_1}^*$. By Theorem 3.6.13 in [29], $\sqrt{m_1}\{F_{r_1,m_1}(x) - F_{r_1}(x)\} \Rightarrow B(F(x))$, where $B(\cdot)$ is the Brownian bridge on $[0,1]$. Hence $E(U_m(s)U_m(t)) \xrightarrow{P} (1 - P_1)\{F(\min(s,t)) - F(s)F(t)\}$ and $E(U_m(x) \mid \mathcal{B}_{r_1}) = 0$. By Donsker's theorem and the multivariate central limit theorem (Theorem 19.3 of van der Vaart [28]), $\sqrt{r_1}\{F_{r_1}(x) - F(x)\} \Rightarrow B(F(x))$, and $B(F(x))$ is tight; also $E(V_m(s)V_m(t)) \xrightarrow{P} P_1^{-1}\{F(\min(s,t)) - F(s)F(t)\}$. Then, we consider
\[
(V_m(x), U_m(x)) = \left( \frac{\sqrt{m}}{\sqrt{r_1}} \sqrt{r_1}\{F_{r_1}(x) - F(x)\},\ \frac{\sqrt{m_1}}{\sqrt{m}} \sqrt{m_1}\{F_{r_1,m_1}(x) - F_{r_1}(x)\} \right).
\]
The Brownian bridge $B(F(x))$ is tight, and $V_m(x)$ and $U_m(x)$ marginally converge to Brownian bridges, namely $V_m(x) \Rightarrow \sqrt{P_1^{-1}}\, B(F(x))$ and $U_m(x) \Rightarrow \sqrt{1 - P_1}\, B(F(x))$. From Eq. (3.2) in [6], $U_m(x)$ also converges weakly in probability, in the sense defined by Kosorok [11], given $\mathcal{B}_{r_1}$. By p. 180 in [29] and Theorem 2.2 in [11],
\[
(V_m(x), U_m(x)) \Rightarrow \left( \sqrt{P_1^{-1}}\, \tilde B_1(F(x)),\ \sqrt{1 - P_1}\, \tilde B_2(F(x)) \right),
\]
where $\tilde B_1(F(x))$ and $\tilde B_2(F(x))$ are independent copies of $B(F(x))$: the sequence converges jointly in distribution to two independent Brownian bridges. This implies that
\[
W_1(x) = \sqrt{m}\{F_m(x) - F(x)\} = V_m(x) + U_m(x) \Rightarrow \sqrt{1 - P_1 + P_1^{-1}}\, B(F(x)).
\]
Similarly, we have
\[
W_2(y) = \sqrt{n}\{G_n(y) - G(y)\} \Rightarrow \sqrt{1 - P_2 + P_2^{-1}}\, B(G(y)). \tag{A.1}
\]
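The conclusion of Theorem 2.1 can be checked by simulation; the following sketch (ours, reusing the hypothetical `hot_deck_impute` from Section 2.1, with F taken as N(0, 1)) compares the Monte Carlo variance of $\sqrt{m}\{F_m(x_0) - F(x_0)\}$ at a fixed point with the predicted $(1 - P_1 + P_1^{-1})F(x_0)\{1 - F(x_0)\}$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, P1, x0, reps = 200, 0.8, 0.0, 5000
Fx0 = norm.cdf(x0)                             # F = N(0, 1), so F(x0) = 0.5
vals = np.empty(reps)
for b in range(reps):
    x = rng.normal(size=m)
    delta = rng.binomial(1, P1, size=m)
    x_I = hot_deck_impute(x, delta, rng)
    vals[b] = np.sqrt(m) * (np.mean(x_I <= x0) - Fx0)
print(np.var(vals))                            # Monte Carlo variance
print((1 - P1 + 1 / P1) * Fx0 * (1 - Fx0))     # predicted: 1.45 * 0.25 = 0.3625
```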


Lemma A.1. Under the conditions of Theorem 2.2, as $n \to \infty$ we have
\[
\sqrt{m+n}\, \left\{ \hat R_{m,n}(p) - R(p) \right\} \xrightarrow{D} N(0, \sigma_1^2(p)),
\]
where $\sigma_1^2(p)$ is defined in Theorem 2.3 and $R(p)$ is the true ROC curve at the point $p \in (a, b)$.

Proof. We first consider the uniform convergence of the empirical distribution functions after hot deck imputation. Mojirsheibani [17] derived the Glivenko–Cantelli theorem with completely randomly missing data:
\[
\sup_{x \in \mathbb{R}} |F_m(x) - F(x)| \to 0 \quad \text{a.s.} \qquad \text{and} \qquad \sup_{y \in \mathbb{R}} |G_n(y) - G(y)| \to 0 \quad \text{a.s.} \tag{A.2}
\]
Since $R''$ is continuous at $p \in (a,b)$, $R'$ and $R''$ are bounded in $(a,b)$. Integrating by parts and substituting $u = \{1 - p - G(x)\}/h$, we write
\begin{align*}
1 - \frac{1}{m}\sum_{j=1}^m K\!\left(\frac{1-p-G(x_{I,j})}{h}\right) - R(p)
&= F(G^{-1}(1-p)) - \int_{-\infty}^{\infty} K\!\left(\frac{1-p-G(x)}{h}\right) dF_m(x) \\
&= F(G^{-1}(1-p)) - K\!\left(\frac{-p}{h}\right) - \int_{-1}^{1} F_m\{G^{-1}(1-p-uh)\}\, w(u)\, du.
\end{align*}
Because $-p/h$ is beyond the support of the kernel as $h \to 0$, $K(-p/h) = 0$ when $p \in (a, b)$. Hence, using $\int_{-1}^1 w(u)\,du = 1$, the above equals
\begin{align*}
& F\{G^{-1}(1-p)\} - F_m\{G^{-1}(1-p)\} - \int_{-1}^{1} \left[ F\{G^{-1}(1-p-uh)\} - F\{G^{-1}(1-p)\} \right] w(u)\, du \\
&\quad - \int_{-1}^{1} \left( \left[ F_m\{G^{-1}(1-p-uh)\} - F\{G^{-1}(1-p-uh)\} \right] - \left[ F_m\{G^{-1}(1-p)\} - F\{G^{-1}(1-p)\} \right] \right) w(u)\, du.
\end{align*}
By a Taylor expansion and the symmetry of $w$,
\[
\int_{-1}^{1} \left[ F\{G^{-1}(1-p-uh)\} - F\{G^{-1}(1-p)\} \right] w(u)\, du
= -\int_{-1}^{1} R'(p)\, uh\, w(u)\, du - \frac{1}{2}\int_{-1}^{1} R''(p^*)(uh)^2 w(u)\, du = O(h^2), \tag{A.3}
\]
where $p^*$ is between $p$ and $p + uh$. Because $p \in (a,b)$, we have $F_m\{G^{-1}(1-p-uh)\} - F\{G^{-1}(1-p-uh)\} - m^{-1/2}\sqrt{1-P_1+P_1^{-1}}\, B[F\{G^{-1}(1-p-uh)\}] = o_p(m^{-1/2})$ for any $u \in [-1, 1]$. Using the conditions on $h$ and the continuity of $B(F(x))$,
\[
\int_{-1}^{1} \left( \left[ F_m\{G^{-1}(1-p-uh)\} - F\{G^{-1}(1-p-uh)\} \right] - \left[ F_m\{G^{-1}(1-p)\} - F\{G^{-1}(1-p)\} \right] \right) w(u)\, du = o_p(m^{-1/2}). \tag{A.4}
\]
Combining (A.1)–(A.4), we have
\begin{align}
\sqrt{m}\left\{ 1 - \frac{1}{m}\sum_{j=1}^m K\!\left(\frac{1-p-G(x_{I,j})}{h}\right) - R(p) \right\}
&= \sqrt{m}\left[ F\{G^{-1}(1-p)\} - F_m\{G^{-1}(1-p)\} \right] + o_p(1) + O(m^{1/2}h^2) \notag \\
&\xrightarrow{D} N\!\left(0,\ (1-P_1+P_1^{-1})\, R(p)\{1-R(p)\}\right), \tag{A.5}
\end{align}
since $F\{G^{-1}(1-p)\}\left[ 1 - F\{G^{-1}(1-p)\} \right] = R(p)\{1-R(p)\}$ and $m^{1/2}h^2 \to 0$.

Next, write
\[
\frac{\sqrt{n}}{m} \sum_{j=1}^m \left[ K\!\left(\frac{1-p-G_n(x_{I,j})}{h}\right) - K\!\left(\frac{1-p-G(x_{I,j})}{h}\right) \right]
= \int_{-\infty}^{\infty} K\!\left(\frac{1-p-G_n(x)}{h}\right) d\sqrt{n}F_m(x) - \int_{-\infty}^{\infty} K\!\left(\frac{1-p-G(x)}{h}\right) d\sqrt{n}F_m(x). \tag{A.6}
\]
Notice that, with $W_1(x) = \sqrt{m}\{F_m(x) - F(x)\}$, integration by parts gives
\begin{align*}
&\int_{-\infty}^{\infty} K\!\left(\frac{1-p-G_n(x)}{h}\right) dW_1(x) - \int_{-\infty}^{\infty} K\!\left(\frac{1-p-G(x)}{h}\right) dW_1(x) \\
&\qquad = \int_{-1}^{1} W_1\{G_n^{-1}(1-p-hu)\}\, w(u)\, du - \int_{-1}^{1} W_1\{G^{-1}(1-p-hu)\}\, w(u)\, du \\
&\qquad = \sqrt{1-P_1+P_1^{-1}} \int_{-1}^{1} \left( B[F\{G_n^{-1}(1-p-hu)\}] - B[F\{G^{-1}(1-p-hu)\}] \right) w(u)\, du + o_p(1) = o_p(1),
\end{align*}
because of (A.2), the continuity of $B(F(x))$ and the proof on p. 1525 of Gong et al. [7]. Thus we can rewrite Eq. (A.6), by a Taylor expansion of $K$, as
\begin{align}
&\int_{-\infty}^{\infty} K\!\left(\frac{1-p-G_n(x)}{h}\right) d\sqrt{n}F_m(x) - \int_{-\infty}^{\infty} K\!\left(\frac{1-p-G(x)}{h}\right) d\sqrt{n}F_m(x) \notag \\
&\qquad = \sqrt{n}\int_{-\infty}^{\infty} \frac{G(x)-G_n(x)}{h}\, w\!\left(\frac{1-p-G(x)}{h}\right) dF(x)
+ \sqrt{n}\int_{-\infty}^{\infty} \frac{\{G(x)-G_n(x)\}^2}{2h^2}\, w'\!\left(\frac{1-p-G(x)+\xi_x}{h}\right) dF(x) + o_p(1), \tag{A.7}
\end{align}
where $\xi_x$ lies between $0$ and $G(x) - G_n(x)$. For the second term, since $B(\cdot)$ and $R'$ are continuous and $W_2$ is uniformly bounded in probability,
\begin{align}
&\sqrt{n}\int_{-\infty}^{\infty} \frac{\{G(x)-G_n(x)\}^2}{2h^2}\, w'\!\left(\frac{1-p-G(x)+\xi_x}{h}\right) dF(x)
= \frac{1}{2\sqrt{n}\,h^2} \int_{-\infty}^{\infty} \{W_2(x)\}^2\, w'\!\left(\frac{1-p-G(x)+\xi_x}{h}\right) dF(x) \notag \\
&\qquad = \frac{1}{2\sqrt{n}\,h} \int_{-1}^{1} \left[ W_2\{G^{-1}(1-p-uh+\xi_x)\} \right]^2 w'(u)\, R'(p+uh+\xi_x)\, du \notag \\
&\qquad = \frac{1-P_2+P_2^{-1}}{2\sqrt{n}\,h} \int_{-1}^{1} \{B(1-p)\}^2\, w'(u)\, R'(p)\, du + o_p(1) = o_p(1), \tag{A.8}
\end{align}
using the boundedness of $w'$ (Assumption A.4) and $\sqrt{n}h \to \infty$. For the first term,
\begin{align}
&\sqrt{n}\int_{-\infty}^{\infty} \frac{G(x)-G_n(x)}{h}\, w\!\left(\frac{1-p-G(x)}{h}\right) dF(x)
= \int_{-1}^{1} W_2\{G^{-1}(1-p-hu)\}\, w(u)\, h^{-1}\, dF\{G^{-1}(1-p-hu)\} \notag \\
&\qquad = \sqrt{1-P_2+P_2^{-1}} \int_{-1}^{1} B[G\{G^{-1}(1-p-hu)\}]\, w(u)\, R'(p+uh)\, du + o_p(1) \notag \\
&\qquad = \sqrt{1-P_2+P_2^{-1}}\; R'(p)\, B(1-p) \int_{-1}^{1} w(u)\, du + o_p(1)
\xrightarrow{D} N\!\left(0,\ (1-P_2+P_2^{-1})\, p(1-p)\, R'^2(p)\right). \tag{A.9}
\end{align}
Then, we have
\begin{align*}
\sqrt{m+n}\{\hat R_{m,n}(p) - R(p)\}
&= \frac{\sqrt{m+n}}{\sqrt{n}} \cdot \frac{\sqrt{n}}{m} \sum_{j=1}^m \left[ K\!\left(\frac{1-p-G(x_{I,j})}{h}\right) - K\!\left(\frac{1-p-G_n(x_{I,j})}{h}\right) \right] \\
&\quad + \frac{\sqrt{m+n}}{\sqrt{m}} \cdot \sqrt{m}\left\{ 1 - \frac{1}{m}\sum_{j=1}^m K\!\left(\frac{1-p-G(x_{I,j})}{h}\right) - R(p) \right\}.
\end{align*}
Combining (A.5)–(A.9) and the independence of the first and second terms, we obtain the conclusion:
\[
\sqrt{m+n}\{\hat R_{m,n}(p) - R(p)\} \xrightarrow{D} N(0, \sigma_1^2(p)). \qquad \square
\]

Lemma A.2. Under the conditions of Theorem 2.2, for $p \in (a,b)$, as $m+n \to \infty$ we have
\[
\sqrt{m+n}\left\{ \frac{1}{m+n}\sum_{i=1}^{m+n} \hat V_i(p) - R(p) \right\} \xrightarrow{D} N(0, \sigma_1^2(p)),
\]
where $\sigma_1^2(p)$ is defined in Theorem 2.3.

Proof. From the definition of $\hat V_i(p)$, writing $K_j = K(\{1-p-G_n(x_{I,j})\}/h)$, we have for $1 \le i \le m$
\[
\hat V_i(p) = 1 + \frac{n}{(m-1)m} \sum_{j=1}^m K_j - \frac{m+n-1}{m-1}\, K_i,
\]
while for $m+1 \le i \le m+n$
\[
\hat V_i(p) = 1 - \frac{1}{m}\sum_{j=1}^m K_j + \frac{m+n-1}{m} \sum_{j=1}^m \left[ K\!\left(\frac{1-p-G_{n,m-i}(x_{I,j})}{h}\right) - K_j \right].
\]
Summing over $i$ (the leave-one-$x$-out contributions telescope), we obtain
\[
\frac{1}{m+n}\sum_{i=1}^{m+n} \hat V_i(p) = \hat R_{m,n}(p) + \frac{m+n-1}{(m+n)m} \sum_{i=1}^n \sum_{j=1}^m \left[ K\!\left(\frac{1-p-G_{n,-i}(x_{I,j})}{h}\right) - K_j \right]. \tag{A.10}
\]
By a second-order Taylor expansion of $K$,
\begin{align}
&\sum_{i=1}^n \sum_{j=1}^m \left[ K\!\left(\frac{1-p-G_{n,-i}(x_{I,j})}{h}\right) - K_j \right] \notag \\
&\quad = \sum_{j=1}^m \left( \sum_{i=1}^n \frac{G_{n,-i}(x_{I,j}) - G_n(x_{I,j})}{h} \right) w\!\left(\frac{1-p-G_n(x_{I,j})}{h}\right)
+ \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^m \left( \frac{G_{n,-i}(x_{I,j}) - G_n(x_{I,j})}{h} \right)^{\!2} w'\!\left(\frac{1-p-\xi_{n,i,j}}{h}\right), \tag{A.11}
\end{align}
where $\xi_{n,i,j}$ is between $G_n(x_{I,j})$ and $G_{n,-i}(x_{I,j})$. Because
\[
G_n(x_{I,j}) - G_{n,-i}(x_{I,j})
= \frac{1}{n}\sum_{k=1}^n I(y_{I,k} \le x_{I,j}) - \frac{1}{n-1}\sum_{k \ne i} I(y_{I,k} \le x_{I,j})
= \frac{1}{n-1}\left\{ G_n(x_{I,j}) - I(y_{I,i} \le x_{I,j}) \right\} = O_p\!\left(\frac{1}{n-1}\right), \tag{A.12}
\]
we have $\sum_{i=1}^n \{G_{n,-i}(x_{I,j}) - G_n(x_{I,j})\} = 0$, so the first-order term in (A.11) vanishes and
\[
\sum_{i=1}^n \sum_{j=1}^m \left[ K\!\left(\frac{1-p-G_{n,-i}(x_{I,j})}{h}\right) - K_j \right] = O_p\!\left(\frac{mn}{(n-1)^2 h}\right). \tag{A.13}
\]
Combining (A.10), (A.13) and Lemma A.1, we have
\[
\sqrt{m+n}\left\{ \frac{1}{m+n}\sum_{i=1}^{m+n} \hat V_i(p) - R(p) \right\}
= \sqrt{m+n}\left\{ \hat R_{m,n}(p) - R(p) \right\} + O_p\!\left( \frac{\sqrt{m+n}\; n}{(n-1)^2 h} \right)
\xrightarrow{D} N(0, \sigma_1^2(p)). \qquad \square
\]

Lemma A.3. Under the conditions of Theorem 2.2, for $p \in (a,b)$, as $m+n \to \infty$ we have
\[
\frac{1}{m+n}\sum_{i=1}^{m+n} \{\hat V_i(p) - R(p)\}^2 \xrightarrow{P} \sigma_2^2(p),
\]
where $\sigma_2^2(p)$ is defined in Theorem 2.2.

Proof. For $1 \le i \le m$, from the expression for $\hat V_i(p)$ in the proof of Lemma A.2,
\[
\hat V_i^2(p) = \left( 1 - \frac{m+n-1}{m-1}K_i \right)^{\!2} + \frac{2n}{(m-1)m}\left( 1 - \frac{m+n-1}{m-1}K_i \right)\sum_{j=1}^m K_j + \left( \frac{n}{(m-1)m} \right)^{\!2} \left( \sum_{j=1}^m K_j \right)^{\!2},
\]
which implies
\begin{align}
\sum_{i=1}^m \hat V_i^2(p) &= m + \frac{(m+n-1)^2}{(m-1)^2} \sum_{i=1}^m K_i^2 - \frac{2(m+n-1)}{m-1} \sum_{i=1}^m K_i + \frac{n^2}{(m-1)^2 m} \left( \sum_{j=1}^m K_j \right)^{\!2} \notag \\
&\quad + \frac{2n}{(m-1)m} \left( m - \frac{m+n-1}{m-1} \sum_{i=1}^m K_i \right) \sum_{j=1}^m K_j. \tag{A.14}
\end{align}
Since $K^2$ is also a distribution function, from [7] and (A.5) we have $(1/m)\sum_{i=1}^m K_i^2 \xrightarrow{P} F\{G^{-1}(1-p)\} = 1 - R(p)$, and likewise $(1/m)\sum_{i=1}^m K_i \xrightarrow{P} 1 - R(p)$. Hence, by (A.14), Lemma A.1 and $m/n \to r$,
\[
\frac{1}{m+n}\sum_{i=1}^m \hat V_i^2(p) \xrightarrow{P}
\frac{r}{r+1} + \left( \frac{r+1}{r} - 2 + \frac{2}{r+1} \right)\{1-R(p)\} + \left( \frac{1}{r(r+1)} - \frac{2}{r} \right)\{1-R(p)\}^2
= \frac{r+1}{r} R(p) - \frac{2r+1}{r(r+1)} R^2(p). \tag{A.15}
\]
Next, for $m+1 \le i \le m+n$, write $\Delta_{ij} = K(\{1-p-G_{n,m-i}(x_{I,j})\}/h) - K_j$, so that
\[
\hat V_i^2(p) = \left( 1 - \frac{1}{m}\sum_{j=1}^m K_j \right)^{\!2}
+ 2\left( 1 - \frac{1}{m}\sum_{j=1}^m K_j \right) \frac{m+n-1}{m} \sum_{j=1}^m \Delta_{ij}
+ \left( \frac{m+n-1}{m} \sum_{j=1}^m \Delta_{ij} \right)^{\!2}
= \mathrm{I}_i(p) + \mathrm{II}_i(p) + \mathrm{III}_i(p). \tag{A.16}
\]
By (A.13), we have
\[
\frac{1}{m+n}\sum_{i=m+1}^{m+n} \mathrm{II}_i(p) = O_p\!\left((nh)^{-1}\right). \tag{A.17}
\]
For the quadratic term, by (A.12), the identity $\sum_{i=1}^n \{G_n(x_1) - I(y_{I,i}\le x_1)\}\{G_n(x_2) - I(y_{I,i}\le x_2)\} = n\{G_n(x_1 \wedge x_2) - G_n(x_1)G_n(x_2)\}$, the change of variables used in Lemma A.1, and the proof of Lemma A.1 of Gong et al. [7],
\begin{align}
\frac{1}{m+n}\sum_{i=m+1}^{m+n} \mathrm{III}_i(p)
&= \frac{(m+n-1)^2 n}{(m+n)(n-1)^2 h^2} \int\!\!\int \left[ G_n(x_1 \wedge x_2) - G_n(x_1)G_n(x_2) \right]
w\!\left(\frac{1-p-G_n(x_1)}{h}\right) w\!\left(\frac{1-p-G_n(x_2)}{h}\right) dF_m(x_1)\, dF_m(x_2) + o_p(1) \notag \\
&= \frac{(m+n)n}{(n-1)^2} \int_{-1}^{1}\!\!\int_{-1}^{1} \left[ (1-p-hu_1) \wedge (1-p-hu_2) - (1-p-hu_1)(1-p-hu_2) \right] \notag \\
&\qquad\qquad \times w(u_1)\, w(u_2)\, R'(p+hu_1)\, R'(p+hu_2)\, du_1\, du_2 + o_p(1)
\xrightarrow{P} (1+r)\, p(1-p)\, R'^2(p), \tag{A.18}
\end{align}
using $(m+n)n/(n-1)^2 \to 1+r$. Since $(1/(m+n))\sum_{i=m+1}^{m+n} \mathrm{I}_i(p) = \{n/(m+n)\}\, \hat R_{m,n}^2(p) \xrightarrow{P} R^2(p)/(1+r)$, combining (A.16)–(A.18) and Lemma A.1 gives
\[
\frac{1}{m+n}\sum_{i=m+1}^{m+n} \hat V_i^2(p) \xrightarrow{P} \frac{1}{1+r} R^2(p) + (r+1)\, p(1-p)\, R'^2(p). \tag{A.19}
\]
Hence, it follows from (A.15), (A.19) and Lemma A.2 that
\begin{align*}
\frac{1}{m+n}\sum_{i=1}^{m+n} \{\hat V_i(p) - R(p)\}^2
&= \frac{1}{m+n}\sum_{i=1}^{m+n} \hat V_i^2(p) + R^2(p) - \frac{2R(p)}{m+n}\sum_{i=1}^{m+n} \hat V_i(p) \\
&\xrightarrow{P} \frac{r+1}{r} R(p) - \frac{2r+1}{r(r+1)} R^2(p) + \frac{1}{1+r} R^2(p) + (r+1)p(1-p)R'^2(p) - 2R^2(p) + R^2(p) \\
&= \left( 1 + \frac{1}{r} \right) R(p)\{1 - R(p)\} + (r+1)\, p(1-p)\, R'^2(p) = \sigma_2^2(p). \qquad \square
\end{align*}

Proof of Theorem 2.2. From Lemma A.2, we have
\begin{align*}
v_{m,n}^2(p)
&= \frac{1}{m+n}\sum_{i=1}^{m+n} \left( \hat V_i(p) - \frac{1}{m+n}\sum_{i=1}^{m+n} \hat V_i(p) \right)^{\!2}
= \frac{1}{m+n}\sum_{i=1}^{m+n} \{\hat V_i(p) - R(p)\}^2 - \left( \frac{1}{m+n}\sum_{i=1}^{m+n} \{\hat V_i(p) - R(p)\} \right)^{\!2} \\
&= \frac{1}{m+n}\sum_{i=1}^{m+n} \{\hat V_i(p) - R(p)\}^2 + o_p(1),
\end{align*}
since the centered mean is $O_p((m+n)^{-1/2})$ by Lemma A.2. Then, applying Lemma A.3, we obtain Theorem 2.2. $\square$



Proof of Theorem 2.3. Throughout let θ = R(p). Recall 1/(m + n) Following similar steps as Gong et al. [7], we have

m+n i=1

Vˆ i (p)−θ 1+λ(Vˆ i (p)−θ )

= 0. Define γi = λ{Vˆ i (p) − θ }.

|λ| = Op ((m + n)−1/2 ),

(A.20)

max |γi | = op (1).

(A.21)

and 1≤i≤m+n

Using (A.20) and (A.21), one has that 0=

1

 Vˆ i (p) − θ

m+n

m + n i=1

1 + γi

=

1

m+n



m + n i =1

{Vˆ i (p) − θ } − Sm+n λ + Op ((m + n)−1 ),

which implies that

λ = Sm−+1 n

1

m+n



m + n i=1

{Vˆ i (p) − θ} + Op ((m + n)−1 ),

(A.22)

138

H. Yang, Y. Zhao / Journal of Multivariate Analysis 140 (2015) 123–138

where Sm+n = 1/(m + n)

m+n i=1

{Vˆ i (p) − θ }2 . Combining Taylor expansion, (A.20), (A.22), Lemmas A.1 and A.2, it is clear that

m+n

l(R(p), p) = 2



log(1 + γi )

i=1







m+n

= 1 m+n

m+n

1



m +n m+n



{Vˆ i (p) − θ}

i=1

2 + op (1)

{Vˆ i (p) − θ }2

i =1

σ 2 (p)χ 2 D −→ 1 2 1 .  σ2 (p) References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33]

[1] Y. An, Smoothed empirical likelihood inference for ROC curves with missing data, Open J. Stat. 2 (2012) 21–27.
[2] R. Andridge, R. Little, A review of hot deck imputation for survey non-response, Internat. Statist. Rev. 78 (2010) 40–64.
[3] J. Chen, L. Peng, Y. Zhao, Empirical likelihood based confidence intervals for copulas, J. Multivariate Anal. 100 (2009) 137–151.
[4] G. Claeskens, B.Y. Jing, L. Peng, W. Zhou, Empirical likelihood confidence regions for comparison distributions and ROC curves, Canad. J. Statist. 31 (2003) 173–190.
[5] T. DiCiccio, P. Hall, Empirical likelihood is Bartlett-correctable, Ann. Statist. 19 (1991) 1053–1061.
[6] E. Giné, J. Zinn, Bootstrapping general empirical measures, Ann. Probab. 18 (1990) 851–869.
[7] Y. Gong, L. Peng, Y. Qi, Smoothed jackknife empirical likelihood method for ROC curve, J. Multivariate Anal. 101 (2010) 1520–1531.
[8] N.L. Hjort, I.W. McKeague, I. Van Keilegom, Extending the scope of empirical likelihood, Ann. Statist. 37 (2009) 1079–1111.
[9] F. Hsieh, B.W. Turnbull, Non-parametric and semi-parametric estimation of the receiver operating characteristic curve, Ann. Statist. 24 (1996) 25–40.
[10] B. Jing, J. Yuan, W. Zhou, Jackknife empirical likelihood, J. Amer. Statist. Assoc. 104 (2009) 1224–1232.
[11] M. Kosorok, Bootstrapping the Grenander estimator, Inst. Math. Stat. 1 (2008) 282–292.
[12] R. Little, D. Rubin, Statistical Analysis with Missing Data, John Wiley, New York, 2002.
[13] X. Liu, Y. Zhao, Semi-empirical likelihood inference for the ROC curve with missing data, J. Statist. Plann. Inference 142 (2012) 3123–3133.
[14] C.J. Lloyd, The use of smoothed ROC curves to summarize and compare diagnostic systems, J. Amer. Statist. Assoc. 93 (1998) 1356–1364.
[15] C.J. Lloyd, Z. Yong, Kernel estimators for the ROC curve are better than empirical, Statist. Probab. Lett. 44 (1999) 221–228.
[16] C.E. Metz, B.A. Herman, J.H. Shen, Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data, Stat. Med. 17 (1998) 1033–1053.
[17] M. Mojirsheibani, The Glivenko–Cantelli theorem based on data with randomly imputed missing values, Statist. Probab. Lett. 55 (2001) 385–396.
[18] A. Owen, Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75 (1988) 237–249.
[19] A. Owen, Empirical likelihood and confidence regions, Ann. Statist. 18 (1990) 90–120.
[20] M.S. Pepe, A regression modeling framework for ROC curves in medical diagnostic testing, Biometrika 84 (1997) 595–608.
[21] M.S. Pepe, The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford University Press, 2003.
[22] F. Pesarin, L. Salmaso, Permutation Tests for Complex Data: Theory, Applications and Software, John Wiley and Sons, Chichester, UK, 2010.
[23] J. Qin, J. Lawless, Empirical likelihood and general estimating equations, Ann. Statist. 22 (1994) 300–325.
[24] Y. Qin, Y. Qian, Empirical likelihood confidence intervals for the differences of quantiles with missing data, Acta Math. Appl. Sin. 25 (2009) 105–116.
[25] Y. Qin, S. Zhang, Empirical likelihood confidence intervals for differences between two datasets with missing data, Pattern Recognit. Lett. 29 (2008) 803–812.
[26] I.A. Swets, R.M. Pickett, Evaluation of Diagnostic Systems: Methods from Signal Detection Theory, Academic Press, New York, 1982.
[27] A.N. Tosteson, C.B. Begg, A general regression methodology for ROC curve estimation, Med. Decis. Making 8 (1988) 205–215.
[28] A.W. van der Vaart, Asymptotic Statistics, Cambridge University Press, 1998.
[29] A.W. van der Vaart, J. Wellner, Weak Convergence and Empirical Processes: with Applications to Statistics, Springer Series in Statistics, 1996.
[30] Q.H. Wang, J.N.K. Rao, Empirical likelihood for linear regression models under imputation for missing responses, Canad. J. Statist. 29 (2001) 597–608.
[31] Q.H. Wang, J.N.K. Rao, Empirical likelihood-based inference in linear models with missing data, Scand. J. Stat. 29 (2002) 563–576.
[32] H. Yang, Y. Zhao, Jackknife empirical likelihood for the difference of ROC curves, J. Multivariate Anal. 115 (2013) 270–284.
[33] K.H. Zou, W.J. Hall, D.E. Shapiro, Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests, Stat. Med. 16 (1997) 2143–2156.