The effect of verification bias on the comparison of predictive values of two binary diagnostic tests


Journal of Statistical Planning and Inference 138 (2008) 950 – 963 www.elsevier.com/locate/jspi

J.A. Roldán Nofuentes∗, J.D. Luna del Castillo

Biostatistics, School of Medicine, University of Granada, 18071, Spain

Received 19 December 2006; received in revised form 22 January 2007; accepted 2 March 2007
Available online 24 May 2007

Abstract

The comparison of the accuracy of two binary diagnostic tests has traditionally required knowledge of the disease status in all of the patients in the sample via the application of a gold standard. In practice, the gold standard is not always applied to all patients in a sample, and the problem of partial verification of the disease arises. The accuracy of a binary diagnostic test can be measured in terms of positive and negative predictive values, which represent the accuracy of a diagnostic test when it is applied to a cohort of patients. In this paper, we deduce the maximum likelihood estimators of predictive values (PVs) of two binary diagnostic tests, and the hypothesis tests to compare these measures when, in the presence of partial disease verification, the verification process only depends on the results of the two diagnostic tests. The effect of verification bias on the naïve estimators of PVs of two diagnostic tests is studied, and simulation experiments are performed in order to investigate the small sample behaviour of hypothesis tests. The hypothesis tests which we have deduced can be applied when all of the patients are verified with the gold standard. The results obtained have been applied to the diagnosis of coronary stenosis.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Binary diagnostic test; Partial verification; Positive and negative predictive values; Verification bias

1. Introduction

A diagnostic method is a test that is applied to a patient in order to obtain a provisional diagnosis regarding the presence or absence of a disease. If the test outcome is binary, the accuracy of the test is measured in terms of sensitivity and specificity. Sensitivity is the probability of a positive test result given that the patient is diseased, and specificity is the probability of a negative test result given that the patient is non-diseased. The diagnostic value of a test can also be described by the positive and negative predictive values (PVs) (Zhou et al., 2002). The positive predictive value (PPV) is the probability of a patient being diseased given a positive test result, and the negative predictive value (NPV) is the probability of a patient being non-diseased given a negative test result. PVs depend not only on the sensitivity and specificity of the diagnostic test in diseased and non-diseased patients, but also on the disease prevalence. While sensitivity and specificity quantify how well the test reflects true disease status, the PVs quantify the clinical value of the diagnostic test, since the patient and clinician are most interested in how likely it is that disease is present given the test result.

∗ Corresponding author.

E-mail addresses: [email protected] (J.A. Roldán Nofuentes), [email protected] (J.D. Luna del Castillo). 0378-3758/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2007.03.054


In order to evaluate the accuracy of a diagnostic test we need an unbiased estimator of that accuracy, for which we need to know the true disease status (present or absent) of each patient through the application of a gold standard, e.g. a biopsy or a clinical assessment. Sometimes the gold standard is not applied to all of the patients in the sample, which can cause the problem known as verification bias (Begg and Greenes, 1983). When the sensitivity and specificity of a binary diagnostic test are estimated using only the patients whose disease status has been verified with the gold standard, the estimators obtained, called naïve estimators, are affected by a bias called verification bias or “work-up” bias. Verification bias is associated with studies of the effectiveness of diagnostic tests that are restricted to patients with a disease status verified with the gold standard. The size of the bias depends on the association between the selection for disease verification through the gold standard and the diagnostic test outcome. Begg and Greenes (1983) proposed a method, based on conditional independence in the process of verification (the selection for verification depends only on the result of the diagnostic test and not on the disease status), to correct verification bias in the estimation of accuracy in binary diagnostic tests. Zhou (1993) deduced the expressions of the maximum likelihood estimators (MLEs) of sensitivity and specificity of a binary diagnostic test in the presence of partial verification of the disease, showing that the MLEs coincide, under conditional independence in the process of verification, with the estimators given by Begg and Greenes. Zhou (1994) studied the effect of verification bias on the estimation of positive and negative PVs of a binary diagnostic test, demonstrating that, under the same assumption, the MLEs of PVs coincide with their respective naïve estimators.
Kosinski and Barnhart (2003a) have proposed a global sensitivity analysis of sensitivity and specificity of a binary diagnostic test in the presence of verification bias, and Kosinski and Barnhart (2003b) have proposed a model for assessing a binary diagnostic test in the presence of verification bias when the process of verification also depends on the disease status. In clinical practice, one of the main problems when studying diagnostic methods is the comparison between the accuracy of two diagnostic tests when both are applied to the same sample of patients. The comparison of PVs in paired designs is not a well developed area of statistics. Bennett (1972) has proposed a method to compare PVs of two diagnostic tests, when both tests are applied to the same sample of patients, that was based on a multinomial maximum likelihood goodness-of-fit statistic. However, Bennett’s method does not really compare PVs (see Leisenring et al., 2000). Leisenring et al. (2000) have proposed a method based on a marginal regression model to compare the PVs of two diagnostic tests in paired designs. Zhou (1998) deduced the hypothesis tests in order to compare the sensitivity and specificity of two binary diagnostic tests in the presence of verification bias. Roldán Nofuentes and Luna del Castillo (2005, 2006) have deduced the hypothesis tests in order to compare different parameters (likelihood ratios, risk of error, and kappa coefficient of risk of error) of two binary diagnostic tests in the presence of partial verification. Therefore, it is interesting to deduce the hypothesis tests in order to compare the PVs of two binary diagnostic tests in paired designs when not all the patients are verified with the gold standard. 
The objective of this study is to deduce hypothesis tests in order to compare the PVs of two diagnostic tests, when both diagnostic tests are applied to the same sample of patients in the presence of partial verification of the disease, when the disease verification process only depends on the results of the diagnostic tests. Furthermore, we will study the effect that verification bias has on the naïve estimators of the PVs with the aim of being able to determine the condition in which the PVs can be compared, in the presence of verification bias, using the results of Leisenring et al. (2000). In Section 2, we study the PVs of a binary diagnostic test and we also deduce their MLEs, asymptotic variances and hypothesis tests to compare PVs of two binary diagnostic tests in the presence of partial verification of the disease, when the disease verification process only depends on the results of the diagnostic tests. In Section 3, we deduce the MLEs of the PVs and the corresponding tests of hypothesis in the presence of covariates. In Section 4, we study the effect of verification bias on the naïve estimators of PVs of two diagnostic tests. In Section 5, simulation experiments are performed in order to investigate the small sample behaviour of each one of the hypothesis tests deduced in Section 2, and in order to investigate the effect of verification bias on the naïve estimators of the PVs. In Section 6, the results are applied to the diagnosis of coronary stenosis, and in Section 7, we discuss our findings.

2. Maximum likelihood estimators

Let $T_1$ and $T_2$ be two binary random variables representing the outcomes of diagnostic tests 1 and 2, respectively, so that $T_k = 1$ if the test outcome is positive, indicating the provisional presence of the disease, and $T_k = 0$ if the result is negative, indicating the provisional absence of the disease. Let $D$ be the binary random variable which models the result of the gold standard or disease status, so that $D = 1$ if the patient is diseased and $D = 0$ if the patient is non-diseased. And let $V$ be the binary random variable which models the disease verification status of each patient, so that $V = 1$ if the patient has been verified through the gold standard and $V = 0$ if the patient has not been verified. Let $Se_k = P(T_k = 1 \mid D = 1)$ and $Sp_k = P(T_k = 0 \mid D = 0)$ be the sensitivity and specificity of each diagnostic test, respectively, and $p = P(D = 1)$ the disease prevalence. The PPV and NPV of each diagnostic test are

$$\mathrm{PPV}_k = P(D = 1 \mid T_k = 1) \quad\text{and}\quad \mathrm{NPV}_k = P(D = 0 \mid T_k = 0).$$

Applying Bayes' theorem, PVs are written in terms of sensitivity, specificity, and disease prevalence as

$$\mathrm{PPV}_k = \frac{p\,Se_k}{p\,Se_k + (1 - p)(1 - Sp_k)} \quad\text{and}\quad \mathrm{NPV}_k = \frac{(1 - p)\,Sp_k}{(1 - p)\,Sp_k + p\,(1 - Se_k)}.$$
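As a minimal sketch of the Bayes' theorem expressions above, the following Python function evaluates the PPV and NPV from a sensitivity, a specificity and a prevalence. The function name `predictive_values` is illustrative and not from the paper; the accuracy values are those used later in the simulation study.

```python
def predictive_values(se, sp, p):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    ppv = p * se / (p * se + (1 - p) * (1 - sp))
    npv = (1 - p) * sp / ((1 - p) * sp + p * (1 - se))
    return ppv, npv


# Accuracy settings taken from the simulation study (Se1, Sp1) and (Se2, Sp2).
for p in (0.10, 0.50):
    ppv1, _ = predictive_values(0.80, 0.70, p)
    ppv2, _ = predictive_values(0.90, 0.85, p)
    print(f"p={p:.2f}  PPV1={ppv1:.2f}  PPV2={ppv2:.2f}")
```

With these inputs the PPVs round to 0.23 and 0.40 at a prevalence of 10%, and to 0.73 and 0.86 at 50%, so the gap between the two PPVs narrows as the prevalence grows.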

The first objective is to obtain the MLEs of PVs. If the probability of selecting a patient to verify the disease status depends only on the results of the two tests, i.e.

$$P(V \mid T_1, T_2, D) = P(V \mid T_1, T_2), \tag{1}$$

this estimation problem may be solved. Assumption (1) is equivalent to supposing that the mechanism for missing data is MAR (Rubin, 1976), and so inference may be carried out using the method of maximum likelihood. The application of both diagnostic tests to the same random sample of $n$ patients gives us Table 1 (joint distribution of the two tests). Let $\phi_{ij}$ and $\eta_{ij}$ be the probabilities defined as $\phi_{ij} = P(D = 1 \mid T_1 = i, T_2 = j)$ and $\eta_{ij} = P(T_1 = i, T_2 = j)$ for $i, j = 0, 1$, where $\eta_{11} = 1 - \eta_{00} - \eta_{10} - \eta_{01}$.

Table 1
Cross-classification of test results by verification status and disease status

Joint distribution of the two tests
                       T1 = 1                T1 = 0
                  T2 = 1    T2 = 0      T2 = 1    T2 = 0
V = 1   D = 1     s11       s10         s01       s00
        D = 0     r11       r10         r01       r00
V = 0             u11       u10         u01       u00
Total             n11       n10         n01       n00

Marginal distribution of test 1
                  T1 = 1         T1 = 0
V = 1   D = 1     s11 + s10      s01 + s00
        D = 0     r11 + r10      r01 + r00
V = 0             u11 + u10      u01 + u00
Total             n11 + n10      n01 + n00

Marginal distribution of test 2
                  T2 = 1         T2 = 0
V = 1   D = 1     s11 + s01      s10 + s00
        D = 0     r11 + r01      r10 + r00
V = 0             u11 + u01      u10 + u00
Total             n11 + n01      n10 + n00

In terms of the probabilities $\phi_{ij}$ and $\eta_{ij}$, the PVs of test 1 are written as

$$\mathrm{PPV}_1 = \frac{\sum_{j=0}^{1} \phi_{1j}\,\eta_{1j}}{\sum_{j=0}^{1} \eta_{1j}} \quad\text{and}\quad \mathrm{NPV}_1 = \frac{\sum_{j=0}^{1} (1 - \phi_{0j})\,\eta_{0j}}{\sum_{j=0}^{1} \eta_{0j}}. \tag{2}$$

Similarly, the PVs of diagnostic test 2 are

$$\mathrm{PPV}_2 = \frac{\sum_{i=0}^{1} \phi_{i1}\,\eta_{i1}}{\sum_{i=0}^{1} \eta_{i1}} \quad\text{and}\quad \mathrm{NPV}_2 = \frac{\sum_{i=0}^{1} (1 - \phi_{i0})\,\eta_{i0}}{\sum_{i=0}^{1} \eta_{i0}}. \tag{3}$$

The expressions given in Eqs. (2) and (3) are equivalent to those deduced by Pepe (2003). Zhou (1998) deduced the MLEs of the parameters $\phi_{ij}$ and $\eta_{ij}$, $i, j = 0, 1$, together with their asymptotic covariance matrices. The MLEs are given by the expressions

$$\hat{\phi}_{ij} = \frac{s_{ij}}{s_{ij} + r_{ij}} \quad\text{and}\quad \hat{\eta}_{ij} = \frac{n_{ij}}{n}. \tag{4}$$
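Plugging the estimators of Eq. (4) into Eqs. (2) and (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function name `mle_predictive_values` and the dictionary layout keyed by `(i, j)` are hypothetical, and the counts are those of the "two or more risk factors" stratum of the coronary stenosis data (Table 4), used here on their own purely as an example.

```python
CELLS = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (T1, T2) result patterns


def mle_predictive_values(s, r, u):
    """MLEs of the PVs of both tests from counts keyed by (i, j).

    s: verified and diseased, r: verified and non-diseased, u: not verified.
    Requires s[c] + r[c] > 0 in every cell.
    """
    n = {c: s[c] + r[c] + u[c] for c in CELLS}
    total = sum(n.values())
    phi = {c: s[c] / (s[c] + r[c]) for c in CELLS}  # Eq. (4), P(D=1 | T1, T2)
    eta = {c: n[c] / total for c in CELLS}          # Eq. (4), P(T1, T2)

    def pv(cells, positive):
        num = sum((phi[c] if positive else 1 - phi[c]) * eta[c] for c in cells)
        return num / sum(eta[c] for c in cells)

    return {
        "PPV1": pv([(1, 1), (1, 0)], True),
        "NPV1": pv([(0, 1), (0, 0)], False),
        "PPV2": pv([(1, 1), (0, 1)], True),
        "NPV2": pv([(1, 0), (0, 0)], False),
    }


# Counts from the "two or more risk factors" stratum of Table 4.
s = {(1, 1): 224, (1, 0): 1, (0, 1): 18, (0, 0): 1}
r = {(1, 1): 38, (1, 0): 32, (0, 1): 6, (0, 0): 35}
u = {(1, 1): 31, (1, 0): 24, (0, 1): 21, (0, 0): 219}
print(mle_predictive_values(s, r, u))
```

Unverified patients enter only through the cell totals, which is how the estimator uses the whole sample rather than the verified subset.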

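Substituting Eq. (4) in (2) and (3), by Zehna's theorem (Zehna, 1966), the MLEs of PVs are

$$\widehat{\mathrm{PPV}}_1 = \frac{n_{11} s_{11} m_{10} + n_{10} s_{10} m_{11}}{(n_{11} + n_{10})\, m_{11} m_{10}} \quad\text{and}\quad \widehat{\mathrm{NPV}}_1 = \frac{n_{01} r_{01} m_{00} + n_{00} r_{00} m_{01}}{(n_{01} + n_{00})\, m_{01} m_{00}}$$

for test 1, and

$$\widehat{\mathrm{PPV}}_2 = \frac{n_{11} s_{11} m_{01} + n_{01} s_{01} m_{11}}{(n_{11} + n_{01})\, m_{11} m_{01}} \quad\text{and}\quad \widehat{\mathrm{NPV}}_2 = \frac{n_{10} r_{10} m_{00} + n_{00} r_{00} m_{10}}{(n_{10} + n_{00})\, m_{10} m_{00}}$$

for test 2, where $m_{ij} = s_{ij} + r_{ij}$. As PVs are functions of $\phi_{ij}$ and $\eta_{ij}$, applying the delta method (Agresti, 1990) the $(r, s)$th element of the asymptotic variance–covariance matrix of PVs is

$$\Sigma_{rs} = \sum_{i,j=0}^{1} \frac{\phi_{ij}^2 (1 - \phi_{ij})^2}{s_{ij}(1 - \phi_{ij})^2 + r_{ij}\phi_{ij}^2}\, \frac{\partial \mathrm{PV}_r}{\partial \phi_{ij}} \frac{\partial \mathrm{PV}_s}{\partial \phi_{ij}} + \sum_{\substack{i,j=0 \\ (i,j) \neq (1,1)}}^{1} \frac{\eta_{ij}^2}{n_{ij}}\, \frac{\partial \mathrm{PV}_r}{\partial \eta_{ij}} \frac{\partial \mathrm{PV}_s}{\partial \eta_{ij}} - \frac{1}{n} \Biggl( \sum_{\substack{i,j=0 \\ (i,j) \neq (1,1)}}^{1} \eta_{ij} \frac{\partial \mathrm{PV}_r}{\partial \eta_{ij}} \Biggr) \Biggl( \sum_{\substack{i,j=0 \\ (i,j) \neq (1,1)}}^{1} \eta_{ij} \frac{\partial \mathrm{PV}_s}{\partial \eta_{ij}} \Biggr), \tag{5}$$

where PV is the PPV or the NPV (for a proof, see Appendix A). Through Slutsky's theorem, the statistic for $H_0: \mathrm{PV}_1 = \mathrm{PV}_2$ vs $H_1: \mathrm{PV}_1 \neq \mathrm{PV}_2$ is

$$\frac{\widehat{\mathrm{PV}}_1 - \widehat{\mathrm{PV}}_2}{\sqrt{\widehat{\mathrm{Var}}(\widehat{\mathrm{PV}}_1) + \widehat{\mathrm{Var}}(\widehat{\mathrm{PV}}_2) - 2\,\widehat{\mathrm{Cov}}(\widehat{\mathrm{PV}}_1, \widehat{\mathrm{PV}}_2)}} \xrightarrow[n \to \infty]{} N(0, 1).$$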
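A delta-method test of this kind, for the equality of the two PPVs, can be sketched as follows. The sketch is an assumption-laden illustration, not the authors' implementation: it works with all four cell probabilities of the test results and their multinomial covariance instead of eliminating one cell, which should give the same asymptotic variance; it uses the standard plug-in variance $\hat\phi(1-\hat\phi)/m$ for each conditional disease probability; and the function name `wald_test_ppv` is hypothetical. The example counts are again the "two or more risk factors" stratum of Table 4.

```python
import math

CELLS = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (T1, T2) result patterns


def wald_test_ppv(s, r, u):
    """Return (PPV1, PPV2, z, two-sided p-value) for H0: PPV1 = PPV2."""
    n = {c: s[c] + r[c] + u[c] for c in CELLS}
    total = sum(n.values())
    phi = {c: s[c] / (s[c] + r[c]) for c in CELLS}  # P(D=1 | T1, T2)
    eta = {c: n[c] / total for c in CELLS}          # P(T1, T2)

    def ppv_and_gradient(pos):
        denom = sum(eta[c] for c in pos)
        ppv = sum(phi[c] * eta[c] for c in pos) / denom
        d_phi = {c: eta[c] / denom if c in pos else 0.0 for c in CELLS}
        d_eta = {c: (phi[c] - ppv) / denom if c in pos else 0.0 for c in CELLS}
        return ppv, d_phi, d_eta

    ppv1, dphi1, deta1 = ppv_and_gradient([(1, 1), (1, 0)])
    ppv2, dphi2, deta2 = ppv_and_gradient([(1, 1), (0, 1)])

    # Gradient of the difference PPV1 - PPV2.
    g_phi = {c: dphi1[c] - dphi2[c] for c in CELLS}
    g_eta = {c: deta1[c] - deta2[c] for c in CELLS}

    # Var(phi_hat) = phi (1 - phi) / m = s r / m^3, independent across cells.
    var = sum(s[c] * r[c] / (s[c] + r[c]) ** 3 * g_phi[c] ** 2 for c in CELLS)
    # Multinomial covariance of eta_hat: (diag(eta) - eta eta') / n.
    var += sum(eta[c] * g_eta[c] ** 2 for c in CELLS) / total
    var -= sum(eta[c] * g_eta[c] for c in CELLS) ** 2 / total

    z = (ppv1 - ppv2) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return ppv1, ppv2, z, p_value


s = {(1, 1): 224, (1, 0): 1, (0, 1): 18, (0, 0): 1}
r = {(1, 1): 38, (1, 0): 32, (0, 1): 6, (0, 0): 35}
u = {(1, 1): 31, (1, 0): 24, (0, 1): 21, (0, 0): 219}
print(wald_test_ppv(s, r, u))
```

The variance of the difference is computed directly from the gradient of $\mathrm{PPV}_1 - \mathrm{PPV}_2$, which is algebraically the same as adding the two variances and subtracting twice the covariance. A cell with no verified patients would make the estimate undefined, which matches the restriction imposed on the simulated samples in Section 5.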

3. MLEs with covariates

Zhou (1998) also deduced the MLEs of sensitivity and specificity of each diagnostic test and the corresponding hypothesis tests when, in the presence of partial disease verification, a vector of discrete covariates $X$ is observed in all of the patients. If in this situation we use the approach proposed in Section 2, the number of free parameters can increase in an uncontrollable manner with an increase in the number of covariates due to the use of unrestricted multinomial distributions (Zhou, 1998). Assumption (1) is not valid in this case. If the probability of selecting a patient to verify the disease status depends only on the results of the two tests and the vector of covariates $X$, and not on the true disease status, i.e. $P(V \mid D, T_1, T_2, X) = P(V \mid T_1, T_2, X)$, this estimation problem may be solved. We suppose that the vector of discrete covariates $X$ has a number $I$ of different covariate patterns, where $x_i = (x_{i0}, x_{i1}, \ldots, x_{ik})$ is the $i$th covariate pattern of $k$ covariates, and $x_{i0} = 1$, $i = 1, \ldots, I$. Let

$$\phi_{ijl} = P(D = 1 \mid T_1 = j, T_2 = l, X = x_i) = \frac{\exp(\beta_0 + \beta_1 j + \beta_2 l + \beta_3^{\top} x_i)}{1 + \exp(\beta_0 + \beta_1 j + \beta_2 l + \beta_3^{\top} x_i)},$$

$$\eta_{ijl} = P(T_1 = j, T_2 = l \mid X = x_i) = \frac{\exp(\gamma_{0jl} + \gamma_{1jl}^{\top} x_i)}{\sum_{h_1, h_2 = 0}^{1} \exp(\gamma_{0h_1h_2} + \gamma_{1h_1h_2}^{\top} x_i)}, \qquad \pi_i = P(X = x_i) \tag{6}$$

for $j, l = 0, 1$, $i = 1, \ldots, I$, and where $\gamma_{011} = \gamma_{111} = 0$ and $\beta_3$ is a $(k + 1) \times 1$ vector. The parameter $\phi_{ijl}$ is modelled through logistic regression and $\eta_{ijl}$ through a multinomial logit model. Using this parameterization the PVs of test 1 are written as

$$\mathrm{PPV}_1 = \frac{\sum_{i=1}^{I} \sum_{l=0}^{1} \phi_{i1l}\,\eta_{i1l}\,\pi_i}{1 - \sum_{i=1}^{I} \sum_{l=0}^{1} \eta_{i0l}\,\pi_i} \quad\text{and}\quad \mathrm{NPV}_1 = \frac{\sum_{i=1}^{I} \sum_{l=0}^{1} (1 - \phi_{i0l})\,\eta_{i0l}\,\pi_i}{\sum_{i=1}^{I} \sum_{l=0}^{1} \eta_{i0l}\,\pi_i} \tag{7}$$

and of test 2:

$$\mathrm{PPV}_2 = \frac{\sum_{i=1}^{I} \sum_{j=0}^{1} \phi_{ij1}\,\eta_{ij1}\,\pi_i}{1 - \sum_{i=1}^{I} \sum_{j=0}^{1} \eta_{ij0}\,\pi_i} \quad\text{and}\quad \mathrm{NPV}_2 = \frac{\sum_{i=1}^{I} \sum_{j=0}^{1} (1 - \phi_{ij0})\,\eta_{ij0}\,\pi_i}{\sum_{i=1}^{I} \sum_{j=0}^{1} \eta_{ij0}\,\pi_i}. \tag{8}$$
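The way the covariate patterns are averaged in Eq. (7) can be sketched as below. This is a hedged illustration only: the function name `pv_test1_with_covariates` and the input layout are hypothetical, the probabilities are made-up numbers rather than fitted values, and the fitting of the logistic and multinomial logit models is not shown.

```python
def pv_test1_with_covariates(phi, eta, pi):
    """PPV and NPV of test 1 as in Eq. (7).

    phi[i][(j, l)] = P(D=1 | T1=j, T2=l, X=x_i)
    eta[i][(j, l)] = P(T1=j, T2=l | X=x_i)
    pi[i]          = P(X=x_i)
    """
    patterns = range(len(pi))
    ppv_num = sum(phi[i][(1, l)] * eta[i][(1, l)] * pi[i]
                  for i in patterns for l in (0, 1))
    prob_t1_negative = sum(eta[i][(0, l)] * pi[i]
                           for i in patterns for l in (0, 1))
    npv_num = sum((1 - phi[i][(0, l)]) * eta[i][(0, l)] * pi[i]
                  for i in patterns for l in (0, 1))
    return ppv_num / (1 - prob_t1_negative), npv_num / prob_t1_negative


# Hypothetical probabilities for two covariate patterns (illustration only).
phi = [
    {(1, 1): 0.85, (1, 0): 0.05, (0, 1): 0.60, (0, 0): 0.02},
    {(1, 1): 0.30, (1, 0): 0.02, (0, 1): 0.10, (0, 0): 0.01},
]
eta = [
    {(1, 1): 0.40, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.40},
    {(1, 1): 0.15, (1, 0): 0.15, (0, 1): 0.05, (0, 0): 0.65},
]
pi = [0.4, 0.6]
print(pv_test1_with_covariates(phi, eta, pi))
```

With a single covariate pattern the expression reduces to Eq. (2), which gives a quick consistency check on any implementation.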

The MLEs of $\phi_{ijl}$ and $\eta_{ijl}$ are obtained using statistical software and the MLE of $\pi_i$ is $\hat{\pi}_i = n_i / n$ (Zhou, 1998), where $n_i$ is the total number of patients with $X = x_i$ and $n$ is the total number of patients. Substituting the MLEs of $\phi_{ijl}$, $\eta_{ijl}$, and $\pi_i$ in Eqs. (7) and (8) we obtain, by Zehna's theorem, the MLEs of PVs in the presence of covariates. As PVs are functions of $\beta$, $\gamma$, and $\pi$, applying the delta method (Agresti, 1990) the covariance matrix of PVs is

$$\frac{\partial(\mathrm{PV}_1, \mathrm{PV}_2)}{\partial \beta}\, I_{\beta}^{-1} \left( \frac{\partial(\mathrm{PV}_1, \mathrm{PV}_2)}{\partial \beta} \right)^{\top} + \frac{\partial(\mathrm{PV}_1, \mathrm{PV}_2)}{\partial \gamma}\, I_{\gamma}^{-1} \left( \frac{\partial(\mathrm{PV}_1, \mathrm{PV}_2)}{\partial \gamma} \right)^{\top} + \frac{\partial(\mathrm{PV}_1, \mathrm{PV}_2)}{\partial \pi}\, I_{\pi}^{-1} \left( \frac{\partial(\mathrm{PV}_1, \mathrm{PV}_2)}{\partial \pi} \right)^{\top}, \tag{9}$$

evaluated at $\beta = \hat{\beta}$, $\gamma = \hat{\gamma}$ and $\pi = \hat{\pi}$, and where PV is the PPV or the NPV. In Eq. (9), the Fisher information matrices $I_{\beta}$ and $I_{\gamma}$ are computed using statistical software (for example, SPSS or Stata), and the Fisher information matrix $I_{\pi}$ is (Zhou, 1998)

$$I_{\pi} = \mathrm{diag}\left( \frac{n_1}{\pi_1^2}, \ldots, \frac{n_{I-1}}{\pi_{I-1}^2} \right) + \frac{n_I}{\pi_I^2}\, (1, \ldots, 1)^{\top} (1, \ldots, 1).$$

Finally, the statistic for $H_0: \mathrm{PV}_1 = \mathrm{PV}_2$ vs $H_1: \mathrm{PV}_1 \neq \mathrm{PV}_2$ is similar to the statistic obtained in the previous section.

4. Naïve estimators of PVs

Zhou (1994) has studied the effect of verification bias on the PVs of a binary diagnostic test, demonstrating that, when the disease verification process only depends on the result of the diagnostic test, the MLEs of PVs coincide with their respective naïve estimators. When two binary diagnostic tests are compared, using only the patients with disease status verified, the naïve estimators of PVs of test 1 are

$$\widehat{\mathrm{PPV}}_{1n} = \frac{s_{11} + s_{10}}{m_{11} + m_{10}} \quad\text{and}\quad \widehat{\mathrm{NPV}}_{1n} = \frac{r_{01} + r_{00}}{m_{01} + m_{00}}$$

and of test 2:

$$\widehat{\mathrm{PPV}}_{2n} = \frac{s_{11} + s_{01}}{m_{11} + m_{01}} \quad\text{and}\quad \widehat{\mathrm{NPV}}_{2n} = \frac{r_{10} + r_{00}}{m_{10} + m_{00}},$$

where $m_{ij} = s_{ij} + r_{ij}$. Under the MAR assumption, the probability of selecting a patient for disease verification is $\lambda_{ij} = P(V = 1 \mid T_1 = i, T_2 = j)$, $i, j = 0, 1$. It is easy to demonstrate that the MLE of $\lambda_{ij}$ is

$$\hat{\lambda}_{ij} = \frac{s_{ij} + r_{ij}}{n_{ij}}. \tag{10}$$
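The relationship between the naïve estimator, the MLE and the estimated verification probabilities of Eq. (10) can be illustrated numerically. The sketch below covers only the PPV of test 1; the function name `naive_and_mle_ppv1` and all counts are hypothetical and chosen for easy arithmetic.

```python
def naive_and_mle_ppv1(s, r, u):
    """Return (lambda_hat, naive PPV1, MLE PPV1) from the T1 = 1 cells.

    s, r, u are dicts keyed by (1, 1) and (1, 0): verified diseased,
    verified non-diseased and unverified counts.
    """
    cells = [(1, 1), (1, 0)]
    m = {c: s[c] + r[c] for c in cells}        # verified patients per cell
    n = {c: m[c] + u[c] for c in cells}        # all patients per cell
    lam = {c: m[c] / n[c] for c in cells}      # Eq. (10)
    naive = sum(s[c] for c in cells) / sum(m[c] for c in cells)
    mle = sum(n[c] * s[c] / m[c] for c in cells) / sum(n[c] for c in cells)
    return lam, naive, mle


# Hypothetical counts with unequal verification rates (0.5 and 0.2).
s = {(1, 1): 40, (1, 0): 5}
r = {(1, 1): 10, (1, 0): 15}
print(naive_and_mle_ppv1(s, r, {(1, 1): 50, (1, 0): 80}))

# Same verified counts, but equal verification rates (0.5 in both cells).
print(naive_and_mle_ppv1(s, r, {(1, 1): 50, (1, 0): 20}))
```

In the first call the naïve estimate is 45/70 while the MLE is 0.525; in the second, where both cells are verified at the same rate, the two estimates agree.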

The results of Zhou (1994) are not valid in the situation studied in our article, since, in general, $P(V = 1 \mid T_1 = i) \neq \lambda_{i1} + \lambda_{i0}$ and $P(V = 1 \mid T_2 = j) \neq \lambda_{1j} + \lambda_{0j}$. In Table 1, we also show the marginal distributions of each diagnostic test. From the marginal distribution of test 1 and subject to the MAR assumption ($P(V = 1 \mid D = 1, T_1 = i) = P(V = 1 \mid T_1 = i)$), the MLEs of the verification probabilities are $\hat{\lambda}_1 = (s_{11} + s_{10} + r_{11} + r_{10})/(n_{11} + n_{10})$ and $\hat{\lambda}_0 = (s_{01} + s_{00} + r_{01} + r_{00})/(n_{01} + n_{00})$. It is easy to demonstrate that, in general, $\hat{\lambda}_1 \neq \hat{\lambda}_{11} + \hat{\lambda}_{10}$ and $\hat{\lambda}_0 \neq \hat{\lambda}_{01} + \hat{\lambda}_{00}$, with $\hat{\lambda}_{ij}$ given by Eq. (10). Similar results are obtained from the marginal distribution of test 2. Therefore, if in the presence of partial disease verification and subject to the MAR assumption two binary tests are applied to all of the patients in the same random sample, the evaluation of each diagnostic test cannot be carried out from its marginal distribution, i.e. each diagnostic test cannot be evaluated from the corresponding 3 × 2 table. The following proposition gives the conditions under which the MLEs of PVs coincide with the naïve estimators of PVs.

Proposition. Under the MAR assumption, when $\hat{\lambda}_{11} = \hat{\lambda}_{10}$ ($\hat{\lambda}_{11} = \hat{\lambda}_{01}$) the MLE of the PPV of test 1 (test 2) coincides with its naïve estimator, and when $\hat{\lambda}_{00} = \hat{\lambda}_{01}$ ($\hat{\lambda}_{00} = \hat{\lambda}_{10}$) the MLE of the NPV of test 1 (test 2) coincides with its naïve estimator.

For a proof, see Appendix B. If $\hat{\lambda}_{ij} = \hat{\lambda}$ for all $i, j$, the MLEs of PVs coincide with the naïve estimators, and the PVs can be compared by applying the method proposed by Leisenring et al. (2000). Although this result is interesting, it represents a special case that is very unlikely to occur, since in practice all the probabilities $\lambda_{ij}$ are different. When $\lambda_{ij} = 1$, $i, j = 0, 1$, which corresponds to the situation in which all of the patients are verified and therefore there is no verification bias, the evaluation and comparison of the PVs of the diagnostic tests can be carried out using our method or using the method of Leisenring et al.

5. Simulation study

To investigate the power and type I error of the hypothesis tests obtained in Section 2 a Monte Carlo study was carried out. This study consisted of the generation of 2000 random samples of sizes 100, 200, 300, 400, 500, and 1000 from multinomial distributions, whose probabilities have been computed under the MAR assumption and considering that both diagnostic tests are conditionally dependent on the disease (Vacek, 1985), i.e.

$$P(T_1 = i, T_2 = j \mid D = k) = P(T_1 = i \mid D = k) \times P(T_2 = j \mid D = k) + \delta_{ij}\,\varepsilon_k, \tag{11}$$

where $\delta_{ij} = 1$ when $i = j$ and $\delta_{ij} = -1$ when $i \neq j$, and $\varepsilon_k$ represents the conditional dependence between both diagnostic tests ($\varepsilon_k > 0$): $\varepsilon_1$ is the covariance when $D = 1$ and $\varepsilon_0$ is the covariance when $D = 0$. It is verified (Vacek, 1985) that $\varepsilon_k \leq \vartheta_1 (1 - \vartheta_2)$ when $\vartheta_2 > \vartheta_1$ and $\varepsilon_k \leq \vartheta_2 (1 - \vartheta_1)$ when $\vartheta_1 > \vartheta_2$, where $\vartheta$ is the sensitivity or specificity. If $\varepsilon_1 = \varepsilon_0 = 0$, assumption (11) is equivalent to supposing that both diagnostic tests are conditionally independent given the disease. In the simulation experiments the probabilities of selecting a patient for the verification of the disease status have been defined as $\lambda_{11} = P(V = 1 \mid T_1 = 1, T_2 = 1)$, $\lambda_{10} = P(V = 1 \mid T_1 = 1, T_2 = 0)$, $\lambda_{01} = P(V = 1 \mid T_1 = 0, T_2 = 1)$, and $\lambda_{00} = P(V = 1 \mid T_1 = 0, T_2 = 0)$. Each multinomial distribution has been characterized by a sensitivity and a specificity value for each test, a prevalence of the disease, and the verification probabilities. The simulation experiments have been designed so that in none of the generated samples are the frequencies $s_{ij}$ and/or $r_{ij}$ equal to zero, since if this occurs we cannot estimate the asymptotic variance–covariance matrix. Both hypothesis tests deduced in Section 2 have been carried out for every sample, with type I error $\alpha = 5\%$.
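The multinomial probabilities behind such a design can be sketched as follows: the joint distribution of the two tests given the disease follows Eq. (11), and under MAR each cell is split into verified and unverified parts by its verification probability. The function name `cell_probabilities` and the dictionary layout are hypothetical; this is an illustration of the setup, not the authors' simulation code.

```python
def cell_probabilities(se1, sp1, se2, sp2, p, eps1, eps0, lam):
    """Map (i, j) -> (P(V=1, D=1), P(V=1, D=0), P(V=0)) for T1 = i, T2 = j.

    lam[(i, j)] is the verification probability P(V=1 | T1=i, T2=j).
    """
    probs = {}
    for i in (0, 1):
        for j in (0, 1):
            delta = 1 if i == j else -1
            # P(T1=i | D=1) and P(T2=j | D=1)
            a1 = se1 if i == 1 else 1 - se1
            a2 = se2 if j == 1 else 1 - se2
            # P(T1=i | D=0) and P(T2=j | D=0)
            b1 = 1 - sp1 if i == 1 else sp1
            b2 = 1 - sp2 if j == 1 else sp2
            joint_d1 = p * (a1 * a2 + delta * eps1)        # Eq. (11), D = 1
            joint_d0 = (1 - p) * (b1 * b2 + delta * eps0)  # Eq. (11), D = 0
            v = lam[(i, j)]
            probs[(i, j)] = (v * joint_d1, v * joint_d0,
                             (1 - v) * (joint_d1 + joint_d0))
    return probs


# Verification scheme (a) with conditional independence and 10% prevalence.
lam_a = {(1, 1): 0.75, (1, 0): 0.40, (0, 1): 0.40, (0, 0): 0.10}
probs = cell_probabilities(0.80, 0.70, 0.90, 0.85, 0.10, 0.0, 0.0, lam_a)
expected_verified = sum(d1 + d0 for d1, d0, _ in probs.values())
print(round(expected_verified, 4))
```

For this configuration the expected proportion of verified patients is about 27.8%. A sample of size n could then be drawn from the twelve probabilities with any multinomial sampler, for instance NumPy's `Generator.multinomial`.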

To study the power of the two tests of hypothesis we have taken as the values of accuracy for each diagnostic test ($Se_1 = 0.80$, $Sp_1 = 0.70$, $Se_2 = 0.90$, $Sp_2 = 0.85$), ($Se_1 = 0.85$, $Sp_1 = 0.75$, $Se_2 = 0.95$, $Sp_2 = 0.90$), ($Se_1 = 0.85$, $Sp_1 = 0.75$, $Se_2 = 0.90$, $Sp_2 = 0.85$), and ($Se_1 = 0.80$, $Sp_1 = 0.70$, $Se_2 = 0.95$, $Sp_2 = 0.90$), values which appear frequently in clinical practice. As values of disease prevalence we have taken 10%, 20%, 30%, 40%, and 50%; and as verification probabilities we have taken: (a) ($\lambda_{11} = 0.75$, $\lambda_{10} = \lambda_{01} = 0.40$, $\lambda_{00} = 0.10$), (b) ($\lambda_{11} = 0.95$, $\lambda_{10} = \lambda_{01} = 0.60$, $\lambda_{00} = 0.30$), and (c) ($\lambda_{11} = \lambda_{10} = \lambda_{01} = \lambda_{00} = 1$). For the sample sizes studied, case (a) corresponds to the situation in which the verification probabilities are low, case (b) corresponds to the situation in which the verification probabilities are high, and case (c) corresponds to the situation where all patients are verified, and therefore there is no verification bias. In cases (a) and (b) we have applied the method of Section 2 to compare the PVs, and in case (c) the comparison between the PVs of the diagnostic tests has been made using the method proposed in Section 2 and the method of Leisenring et al. (2000).

Table 2 shows some results of the hypothesis test of equality of the PPVs for ($Se_1 = 0.80$, $Sp_1 = 0.70$, $Se_2 = 0.90$, $Sp_2 = 0.85$) and different values of $\varepsilon_1$ and $\varepsilon_0$. These results show that the power of the hypothesis test increases with the verification probabilities and with the conditional dependence between both diagnostic tests, and decreases as the prevalence of the disease increases. The higher the verification probabilities are, the lower the verification bias is; the proportion of verified patients is higher and so the power of the hypothesis test increases. With respect to conditional dependence, an increase in $\varepsilon_1$ and/or $\varepsilon_0$ means, under the same verification and prevalence conditions, an increase in $\sum_{i,j,k} P(T_1 = i, T_2 = j, V = 1, D = k)$, increasing the proportion of verified patients, and so increasing the power. With respect to the prevalence effect, an increase in prevalence means a decrease in the difference between the PPVs and so a reduction in the power of the hypothesis test. From the analysis of the results we also obtain that the assumption of conditional independence ($\varepsilon_1 = \varepsilon_0 = 0$) does not mean, in relation to conditional dependence ($\varepsilon_1 > 0$ and/or $\varepsilon_0 > 0$), a significant loss in hypothesis test power, particularly for samples of at least 500 patients. Therefore, conditional dependence does not have a great effect on the power when the sample is large. Partial verification does not mean, in relation to total verification, an important loss in hypothesis test power for samples of at least 500 patients. Regarding the size of the samples, in general terms, with samples of 500 patients the test of hypothesis has a power higher than 90%. Similar conclusions are obtained for the rest of the sensitivities and specificities. When all of the patients are verified ($\lambda_{ij} = 1$, $i, j = 0, 1$), with samples of 200 patients the power of the hypothesis test of comparison of the PPVs proposed in Section 2 is, except when $\varepsilon_1 = \varepsilon_0 = 0$, greater than 90%. Moreover, the power of the hypothesis test proposed in Section 2 and the power of the hypothesis test of comparison of the PPVs of Leisenring et al. (2000) are very similar, and both powers are practically identical with samples of 200 or more.

In order to study the type I error of the hypothesis test, we have taken as the values of accuracy of the two diagnostic tests ($Se_1 = 0.80$, $Sp_1 = 0.70$, $Se_2 = 0.80$, $Sp_2 = 0.70$) and ($Se_1 = 0.90$, $Sp_1 = 0.80$, $Se_2 = 0.90$, $Sp_2 = 0.80$), values which appear frequently in clinical practice. The prevalence and verification probability values are the same as those for the power study. Some of the results are shown in Table 3: the type I error increases as the prevalence and the verification probabilities increase, and it decreases as the conditional dependence between both diagnostic tests increases. In general, when the conditional dependence between both diagnostic tests is high, type I errors do not usually exceed the nominal error $\alpha = 5\%$. When all of the patients are verified ($\lambda_{ij} = 1$, $i, j = 0, 1$), for a prevalence of 10% and 20% the type I error is normally not greater than the nominal error, except when $\varepsilon_1 = \varepsilon_0 = 0$, so that the hypothesis test has a similar behaviour to that of an exact test. For a prevalence of between 30% and 50% the type I error fluctuates around the nominal error. Similar behaviour can be observed in the type I error of the hypothesis test of equality of the PPVs of Leisenring et al. (2000). Therefore, the hypothesis test is conservative, as is expected because of small sample sizes, and its behaviour is similar to an exact hypothesis test. Similar conclusions are obtained for ($Se_1 = 0.90$, $Sp_1 = 0.80$, $Se_2 = 0.90$, $Sp_2 = 0.80$). Similar conclusions are obtained for the hypothesis test of the equality of the NPVs, although the power of the hypothesis test increases as the prevalence increases.

In order to study the effect that verification probabilities have on the naïve estimators of the PVs, and therefore to be able to determine when the results of Leisenring et al. (2000) can be applied if the conditions of the proposition given in Section 4 are not satisfied, we have carried out other simulation experiments in which the samples have been generated in a similar way to the previous experiments. For each sample size, we calculated the mean squared errors of the MLEs and of the naïve estimators of the PVs. From the results obtained we can conclude that the mean squared errors of the naïve estimators of the PPVs are greater than those of the MLEs, and these errors decrease with an increase in verification probabilities. Similar conclusions are obtained for the NPVs. From the results of these simulation experiments, we can conclude that, when the disease verification process only depends on the results of the diagnostic tests, the evaluation and comparison of the PVs of the two diagnostic tests cannot be carried out only considering those patients verified with the gold standard. Therefore, when the conditions of the proposition given in Section 4 are not satisfied, the comparison of the PVs of the diagnostic tests must be carried out applying our method and not the method of Leisenring et al. (2000).

Table 2
Power of the hypothesis test of equality of the PPVs

n (% verified)    ε1 = 0      ε1 = 0.035  ε1 = 0.035  ε1 = 0.07   ε1 = 0.07
                  ε0 = 0      ε0 = 0.05   ε0 = 0.10   ε0 = 0.05   ε0 = 0.10

Se1 = 0.80, Sp1 = 0.70, Se2 = 0.90, Sp2 = 0.85, p = 10%, PPV1 = 0.23, PPV2 = 0.40

λ11 = 0.75, λ10 = λ01 = 0.40, λ00 = 0.10
100 (30.5%)       0.1410      0.1430      0.0960      0.1540      0.1200
200 (29.1%)       0.4780      0.6190      0.6465      0.6430      0.7060
300 (28.7%)       0.7645      0.8605      0.9355      0.9260      0.9650
400 (28.5%)       0.8980      0.9740      0.9830      0.9895      0.9970
500 (28.4%)       0.9630      0.9915      0.9980      0.9970      1
1000 (28.2%)      1           1           1           1           1

λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.30
100 (49.6%)       0.2655      0.3315      0.3125      0.3495      0.3255
200 (48.7%)       0.7145      0.8285      0.9025      0.8880      0.9460
300 (48.5%)       0.8970      0.9675      0.9915      0.9865      0.9985
400 (48.3%)       0.9685      0.9910      0.9990      1           1
500 (48.3%)       0.9935      1           1           1           1
1000 (48.2%)      1           1           1           1           1

λ11 = λ10 = λ01 = λ00 = 1
100 (100%)        0.4840      0.6140      0.6800      0.6415      0.7170
200 (100%)        0.8820      0.9625      0.9860      0.9760      0.9985
300 (100%)        0.9790      0.9960      1           0.9990      1
400 (100%)        0.9940      1           1           1           1
500 (100%)        1           1           1           1           1
1000 (100%)       1           1           1           1           1

Se1 = 0.80, Sp1 = 0.70, Se2 = 0.90, Sp2 = 0.85, p = 50%, PPV1 = 0.73, PPV2 = 0.86

λ11 = 0.75, λ10 = λ01 = 0.40, λ00 = 0.10
100 (46.0%)       0.2255      0.2095      0.1640      0.1900      0.1535
200 (44.9%)       0.5505      0.6610      0.7200      0.6590      0.7660
300 (44.7%)       0.7490      0.8695      0.9495      0.8940      0.9765
400 (44.6%)       0.8545      0.9480      0.9920      0.9720      0.9975
500 (44.5%)       0.9085      0.9815      0.9985      0.9890      1
1000 (44.5%)      0.9960      0.9995      1           1           1

λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.30
100 (65.1%)       0.3655      0.4270      0.4070      0.4230      0.4150
200 (64.6%)       0.6915      0.8345      0.9450      0.8590      0.9595
300 (64.5%)       0.8580      0.9535      0.9960      0.9630      0.9985
400 (64.5%)       0.9345      0.9920      1           0.9930      1
500 (64.5%)       0.9735      0.9975      1           1           1
1000 (64.4%)      0.9995      1           1           1           1

λ11 = λ10 = λ01 = λ00 = 1
100 (100%)        0.5470      0.6855      0.8240      0.6660      0.8140
200 (100%)        0.8355      0.9445      0.9975      0.9405      0.9980
300 (100%)        0.9455      0.9875      0.9990      0.9955      1
400 (100%)        0.9850      0.9980      1           0.9985      1
500 (100%)        0.9970      0.9995      1           1           1
1000 (100%)       1           1           1           1           1

In parentheses we show the average percentages of the patients verified.

Table 3
Type I error of the hypothesis test of equality of the PPVs

n (% verified)    ε1 = 0      ε1 = 0.075  ε1 = 0.075  ε1 = 0.15   ε1 = 0.15
                  ε0 = 0      ε0 = 0.10   ε0 = 0.20   ε0 = 0.10   ε0 = 0.20

Se1 = 0.80, Sp1 = 0.70, Se2 = 0.80, Sp2 = 0.70, p = 10%, PPV1 = 0.23, PPV2 = 0.23

λ11 = 0.75, λ10 = λ01 = 0.40, λ00 = 0.10
100 (34.6%)       0.0035      0.0015      0.0005      0           0
200 (33.2%)       0.0100      0.0015      0           0.0010      0
300 (32.9%)       0.0150      0.0065      0.0020      0.0025      0
400 (32.7%)       0.0230      0.0100      0.0065      0.0005      0
500 (32.6%)       0.0275      0.0150      0.0080      0.0040      0.0001
1000 (32.4%)      0.0360      0.0420      0.0435      0.0065      0.0005

λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.30
100 (53.8%)       0.0045      0.0070      0           0.0005      0
200 (52.9%)       0.0135      0.0075      0.0020      0.0040      0.0005
300 (52.6%)       0.0260      0.0105      0.0060      0.0025      0.0001
400 (52.5%)       0.0325      0.0265      0.0135      0.0055      0.0005
500 (52.4%)       0.0525      0.0275      0.0225      0.0080      0.0005
1000 (52.4%)      0.0400      0.0415      0.0505      0.0215      0.0035

λ11 = λ10 = λ01 = λ00 = 1
100 (100%)        0.0140      0.0080      0           0.0055      0
200 (100%)        0.0400      0.0220      0.0035      0.0095      0
300 (100%)        0.0425      0.0275      0.0130      0.0145      0
400 (100%)        0.0525      0.0360      0.0295      0.0195      0
500 (100%)        0.0455      0.0355      0.0325      0.0295      0.0015
1000 (100%)       0.0455      0.0510      0.0490      0.0410      0.0060

Se1 = 0.80, Sp1 = 0.70, Se2 = 0.80, Sp2 = 0.70, p = 50%, PPV1 = 0.73, PPV2 = 0.73

λ11 = 0.75, λ10 = λ01 = 0.40, λ00 = 0.10
100 (47.1%)       0.0505      0.0150      0.0005      0.0120      0
200 (46.0%)       0.0585      0.0595      0.0010      0.0245      0
300 (45.7%)       0.0620      0.0530      0.0060      0.0315      0
400 (45.5%)       0.0540      0.0570      0.0055      0.0375      0
500 (45.5%)       0.0540      0.0500      0.0055      0.0300      0.0010
1000 (45.4%)      0.0595      0.0580      0.0195      0.0460      0.0185

λ11 = 0.95, λ10 = λ01 = 0.60, λ00 = 0.30
100 (66.1%)       0.0665      0.0285      0           0.0180      0
200 (65.6%)       0.0545      0.0610      0.0025      0.0435      0
300 (65.5%)       0.0580      0.0595      0.0075      0.0420      0
400 (65.4%)       0.0475      0.0555      0.0105      0.0455      0.0030
500 (65.4%)       0.0545      0.0515      0.0180      0.0380      0.0055
1000 (65.3%)      0.0440      0.0595      0.0345      0.0405      0.0260

λ11 = λ10 = λ01 = λ00 = 1
100 (100%)        0.0595      0.0500      0.0015      0.0470      0
200 (100%)        0.0540      0.0515      0.0080      0.0475      0
300 (100%)        0.0465      0.0560      0.0255      0.0525      0.0020
400 (100%)        0.0500      0.0540      0.0290      0.0495      0.0080
500 (100%)        0.0575      0.0510      0.0250      0.0500      0.0130
1000 (100%)       0.0545      0.0510      0.0540      0.0515      0.0480

In parentheses we show the average percentages of the patients verified.

6. Application to the diagnosis of coronary stenosis

Table 4
Data from the coronary stenosis study

                       T1 = 1              T1 = 0
                  T2 = 1   T2 = 0     T2 = 1   T2 = 0
Two or more risk factors
  V = 1, D = 1      224        1         18        1
  V = 1, D = 0       38       32          6       35
  V = 0              31       24         21      219
  Total             293       57         45      255
One risk factor or none
  V = 1, D = 1       37        0          0        0
  V = 1, D = 0       92       79         12       57
  V = 0              16       70         15      522
  Total             145      149         27      579

Coronary stenosis is a coronary artery disease consisting of the narrowing or obstruction of the heart’s aortic valve. This disease can be caused by different disorders (rheumatic fever, valve calcification, valve diseases, congenital anomalies, …) and its symptoms are also varied (weakness, palpitations, chest pain, coughing, …). The prevalence is
higher in men than in women, and can exceed 50% in people aged over 50. The diagnosis can be made using dynamic transthoracic echocardiography with effort or dynamic transthoracic echocardiography with dobutamine. As with any coronary disease, the risk factors for stenosis are arterial hypertension, hypercholesterolemia, habitual smoking, diabetes, and a family history of coronary heart disease.

Table 4 shows the data obtained by applying the two diagnostic tests to a sample of 1550 men, using coronary angiography as the gold standard, where T1 represents the outcome of the echocardiography with effort and T2 the outcome of the echocardiography with dobutamine. Because angiography can cause adverse reactions in patients (arrhythmia, embolism, infections, blood thrombus, apoplexies, infarctions, …), not all patients are verified. The data in Table 4 were obtained in two phases. In the first phase, the two diagnostic tests were applied to all of the patients; in the second phase, the gold standard was applied only to a subgroup of patients, depending on the results of the two diagnostic tests and the risk factors. The study therefore corresponds to a two-phase design in which the selection of a patient for verification of his or her disease status with angiography depended only on the results of both diagnostic tests and the number of risk factors (≤ 1 or ≥ 2). In this way, a patient with two or more risk factors has a greater probability of being verified than a patient with one risk factor or none. Therefore, the MAR assumption mentioned in the previous sections is satisfied in our study. Of the 1550 patients, 650 have two or more risk factors and 900 have one risk factor or none, so the number of risk factors was treated as a binary covariate.

6.1. Analysis of stenosis data with patients with two or more risk factors

Using the patients with two or more risk factors, the estimated verification probabilities are $\hat{\lambda}_{11} = 0.89$, $\hat{\lambda}_{10} = 0.58$, $\hat{\lambda}_{01} = 0.53$, and $\hat{\lambda}_{00} = 0.14$, and the MLEs of the PPVs are $\widehat{PPV}_1 = 0.7207$ and $\widehat{PPV}_2 = 0.8410$. The estimated variances and covariance are $\widehat{Var}(\widehat{PPV}_1) = 0.00062$, $\widehat{Var}(\widehat{PPV}_2) = 0.00050$, and $\widehat{Cov}(\widehat{PPV}_1, \widehat{PPV}_2) = 0.00035$. The value of the test statistic is 5.87 (p-value < $10^{-7}$) and the 95% confidence interval for $PPV_1 - PPV_2$ is (−0.1605, −0.0801), so the hypothesis of equality of the PPVs of the two diagnostic tests is rejected. Therefore, in patients with two or more risk factors a positive outcome of the echocardiography with dobutamine is more indicative of the presence of coronary stenosis than a positive outcome of the echocardiography with effort.

The MLEs of the NPVs are $\widehat{NPV}_1 = 0.8639$ and $\widehat{NPV}_2 = 0.9718$, and their estimated variances and covariance are $\widehat{Var}(\widehat{NPV}_1) = 0.00094$, $\widehat{Var}(\widehat{NPV}_2) = 0.00053$, and $\widehat{Cov}(\widehat{NPV}_1, \widehat{NPV}_2) = 0.00052$. The value of the test statistic is 5.21 (p-value < $10^{-6}$) and the 95% confidence interval for $NPV_1 - NPV_2$ is (−0.1485, −0.0673), so the NPV of the echocardiography with dobutamine is higher than that of the echocardiography with effort. Thus, in patients with two or more risk factors a negative outcome of the echocardiography with dobutamine is more indicative of the absence of coronary stenosis than a negative outcome of the echocardiography with effort.
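The point estimates above can be checked directly from the Table 4 counts. The following is a minimal sketch, not the authors' code: the names `s`, `r`, `u`, `corrected_pv`, `naive_pv` and `wald` are illustrative, and the test statistic is assumed to be the usual Wald form, the difference divided by the square root of Var1 + Var2 − 2 Cov, which agrees numerically with the values reported in the text. Weighting each cell estimate s/(s + r) by the full cell count n is algebraically the same as dividing s by the estimated verification probability.

```python
import math

# Counts from Table 4, stratum "two or more risk factors"; keys are (T1, T2).
s = {(1, 1): 224, (1, 0): 1, (0, 1): 18, (0, 0): 1}     # verified, D = 1
r = {(1, 1): 38, (1, 0): 32, (0, 1): 6, (0, 0): 35}     # verified, D = 0
u = {(1, 1): 31, (1, 0): 24, (0, 1): 21, (0, 0): 219}   # not verified

cells = list(s)
n = {c: s[c] + r[c] + u[c] for c in cells}        # patients per test pattern
lam = {c: (s[c] + r[c]) / n[c] for c in cells}    # verification probabilities
phi = {c: s[c] / (s[c] + r[c]) for c in cells}    # P(D = 1 | T1, T2)


def split(test):
    """Cells where the given test (1 or 2) is positive, and where it is negative."""
    idx = test - 1
    pos = [c for c in cells if c[idx] == 1]
    neg = [c for c in cells if c[idx] == 0]
    return pos, neg


def corrected_pv(test):
    """(PPV, NPV) using all patients: each phi_ij is weighted by n_ij."""
    pos, neg = split(test)
    ppv = sum(phi[c] * n[c] for c in pos) / sum(n[c] for c in pos)
    npv = sum((1 - phi[c]) * n[c] for c in neg) / sum(n[c] for c in neg)
    return ppv, npv


def naive_pv(test):
    """(PPV, NPV) using verified patients only."""
    pos, neg = split(test)
    ppv = sum(s[c] for c in pos) / sum(s[c] + r[c] for c in pos)
    npv = sum(r[c] for c in neg) / sum(s[c] + r[c] for c in neg)
    return ppv, npv


def wald(est1, est2, var1, var2, cov):
    """Assumed Wald statistic and 95% interval for est1 - est2."""
    se = math.sqrt(var1 + var2 - 2 * cov)
    diff = est1 - est2
    return abs(diff) / se, (diff - 1.96 * se, diff + 1.96 * se)


ppv1, npv1 = corrected_pv(1)
ppv2, npv2 = corrected_pv(2)
# Variances and covariance are taken from the text, not recomputed here.
z_ppv, ci_ppv = wald(ppv1, ppv2, 0.00062, 0.00050, 0.00035)
print(round(ppv1, 4), round(ppv2, 4), round(npv1, 4), round(npv2, 4))
print(round(z_ppv, 2), ci_ppv)
```

Because the variances are entered to the precision printed in the text, the interval limits agree with the reported ones only up to rounding.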


Using only the patients whose disease status was verified and applying the method proposed by Leisenring et al. (2000), the same conclusions are obtained: a positive outcome of the echocardiography with dobutamine is more indicative of the presence of coronary stenosis than a positive outcome of the echocardiography with effort ($\chi^2 = 23.66$, p-value < $10^{-5}$), and a negative outcome of the echocardiography with dobutamine is more indicative of the absence of disease than a negative outcome of the echocardiography with effort ($\chi^2 = 21.32$, p-value < $10^{-5}$). The naïve estimators of the PPVs of the echocardiography with effort and the echocardiography with dobutamine are 0.7627 and 0.8462, respectively, and the naïve estimators of the NPVs are 0.6833 and 0.9710, respectively. Therefore, although in this example the method of Leisenring et al. (2000) and our method lead to the same conclusions, the naïve estimators overestimate the PPV of the echocardiography with effort and seriously underestimate its NPV. The naïve estimators of the PVs of the echocardiography with effort are more affected by verification bias than those of the echocardiography with dobutamine. Therefore, the evaluation and comparison of the PVs of the two diagnostic tests cannot be carried out using the results of Leisenring et al. (2000), and it is necessary to use the method that we proposed in Section 2.

6.2. Analysis with covariates

The presence of two or more risk factors is a determining factor for the presence of coronary disease, so the probability of selecting a patient for verification of his or her disease status depends on the results of the diagnostic tests and on the presence of two or more risk factors. In patients with one risk factor or none the verification probabilities are $\hat{\lambda}_{11} = 0.89$, $\hat{\lambda}_{10} = 0.53$, $\hat{\lambda}_{01} = 0.44$, and $\hat{\lambda}_{00} = 0.10$. For the ith patient, the covariate $x_i = 1$ if the patient has at least two risk factors and $x_i = 0$ if the patient has one risk factor or none. Using SPSS 12.0 software, the MLEs of $\phi_{ijl}$ and $\eta_{ijl}$ are

$$\hat{\phi}_{ijl} = \frac{\exp(-6.996 + 0.950j + 5.067l + 2.795x_i)}{1 + \exp(-6.996 + 0.950j + 5.067l + 2.795x_i)}, \quad j, l = 0, 1,$$

and

$$\hat{\eta}_{i00} = \exp(1.385 - 1.523x_i)/k, \quad \hat{\eta}_{i01} = \exp(-1.681 - 0.193x_i)/k, \quad \hat{\eta}_{i10} = \exp(0.027 - 1.664x_i)/k, \quad \hat{\eta}_{i11} = 1/k,$$

with

$$k = 1 + \exp(1.385 - 1.523x_i) + \exp(-1.681 - 0.193x_i) + \exp(0.027 - 1.664x_i).$$

The MLEs of the PPVs are $\widehat{PPV}_1 = 0.4567$ and $\widehat{PPV}_2 = 0.6406$, and the estimated variances and covariance are $\widehat{Var}(\widehat{PPV}_1) = 0.000408$, $\widehat{Var}(\widehat{PPV}_2) = 0.000510$, and $\widehat{Cov}(\widehat{PPV}_1, \widehat{PPV}_2) = 0.000319$. The value of the test statistic is 11.01 (p-value < $10^{-8}$), and the 95% confidence interval for $PPV_1 - PPV_2$ is (−0.2166, −0.1512). So, a positive outcome of the echocardiography with dobutamine is more indicative of the presence of coronary stenosis than a positive outcome of the echocardiography with effort.
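The reported overall PPVs can be recovered, to rounding, from the fitted coefficients. The sketch below rests on several assumptions that the text here does not state explicitly: that the logistic model gives P(D = 1 | T1 = j, T2 = l, x), that the multinomial logit gives P(T1 = j, T2 = l | x) with (1, 1) as the reference category with probability 1/k, and that the overall PVs are obtained by averaging the fitted probabilities over the 1550 patients. The function names are illustrative.

```python
import math


def phi(j, l, x):
    """Assumed P(D = 1 | T1 = j, T2 = l, x) from the reported logistic fit."""
    z = -6.996 + 0.950 * j + 5.067 * l + 2.795 * x
    return 1.0 / (1.0 + math.exp(-z))


def eta(x):
    """Assumed P(T1 = j, T2 = l | x); (1, 1) is taken as the reference cell."""
    w = {
        (0, 0): math.exp(1.385 - 1.523 * x),
        (0, 1): math.exp(-1.681 - 0.193 * x),
        (1, 0): math.exp(0.027 - 1.664 * x),
        (1, 1): 1.0,
    }
    k = sum(w.values())
    return {cell: value / k for cell, value in w.items()}


# Number of patients at each covariate value (Section 6).
n_x = {1: 650, 0: 900}


def overall_ppv(test):
    """PPV of test 1 or 2, averaging fitted probabilities over all patients."""
    idx = test - 1
    num = den = 0.0
    for x, count in n_x.items():
        for cell, p_cell in eta(x).items():
            if cell[idx] == 1:          # the chosen test is positive
                j, l = cell
                num += count * phi(j, l, x) * p_cell
                den += count * p_cell
    return num / den


print(round(overall_ppv(1), 4), round(overall_ppv(2), 4))
```

With the coefficients rounded to three decimals as printed, the fitted cell probabilities also match the observed proportions n_ij/n of Table 4 closely in both strata, which supports the reading of the reference category used here.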

The MLEs of the NPVs are $\widehat{NPV}_1 = 0.9566$ and $\widehat{NPV}_2 = 0.9935$. The estimated variances and covariance are $\widehat{Var}(\widehat{NPV}_1) = 0.000084$, $\widehat{Var}(\widehat{NPV}_2) = 0.000022$, and $\widehat{Cov}(\widehat{NPV}_1, \widehat{NPV}_2) = 0.000021$. The value of the test statistic is 4.61 (p-value < $10^{-3}$), and the 95% confidence interval for $NPV_1 - NPV_2$ is (−0.0526, −0.0212). Therefore, a negative outcome of the echocardiography with dobutamine is more indicative of the absence of coronary stenosis than a negative outcome of the echocardiography with effort.

7. Discussion

The comparison of the PVs of two binary diagnostic tests is not a well-investigated area of medical statistics. In this study, we have deduced the MLEs and variances of the positive and negative PVs of two binary diagnostic tests in paired designs when not all the patients are verified with the gold standard, and we have obtained the hypothesis tests to compare these parameters when both diagnostic tests are applied to the same random sample. The MLEs and variances of the PVs and the hypothesis tests are also obtained when discrete covariates have been observed in all patients. The hypothesis tests obtained depend on the MAR assumption for the verification mechanism, that is, the disease verification process depends only on the results of the diagnostic tests (and on the covariates, if discrete covariates have been observed in all patients). Though this assumption can be restrictive, it is reasonable when the verification process does not depend on unobservable variables which are related to the disease.

Simulation experiments have been carried out in order to investigate the small sample behaviour of the hypothesis tests. The simulation experiments are based on the MAR assumption and on the assumption that the two diagnostic tests are conditionally dependent given the disease status. The results of the simulation experiments show the effect of conditional dependence, the verification probabilities and the disease prevalence on the power and type I error of the hypothesis tests and, in general terms, samples of 500 patients are needed for the power of the hypothesis tests to exceed 90%.

The method which we propose to compare the PVs (without covariates) cannot be applied when any frequency $s_{ij}$ and/or $r_{ij}$ is zero, since in this situation it is not possible to estimate the variance–covariance matrix of the PVs. One possible solution to this problem could be to add the value 0.5 to all of the frequencies of the table, as is done in the analysis of 2 × 2 tables when any frequency is zero. This solution would require simulation experiments in order to study the effect that it could have on the estimators, the power and the type I error of the hypothesis tests. In contrast, in the presence of covariates, the method proposed in Section 3 can be applied when any frequency $s_{ij}$ and/or $r_{ij}$ is zero, as happens in the example studied.

The hypothesis tests which we have deduced can be applied when all of the patients are verified, in which case $\lambda_{ij} = 1$ and the frequencies $u_{ij} = 0$. In this situation, in which verification bias is not present, the behaviour of our hypothesis tests in terms of power and type I error is very similar (and in some cases identical) to that of the hypothesis tests for comparing PVs of Leisenring et al. (2000). The reason for this similarity is that the method proposed by Leisenring et al. (2000) is equivalent to a hypothesis test of $H_0: PV_1 - PV_2 = 0$ when both diagnostic tests are applied to the same sample of patients.

The method which we propose in this study is based on the MAR assumption. If this assumption is not reasonable, the verification process also depends on unobservable factors related to the disease and, therefore, the estimators of the PVs obtained by applying our method would be biased. Until now, it has not been possible to evaluate the magnitude of this bias, since there is no known statistical method to compare the PVs of two binary tests when the verification process is not MAR. A possible solution to this problem is the application of multiple imputation methods. In the light of the work of Harel and Zhou (2006), research is needed into methods which allow the PVs of two binary tests to be compared when the verification process is not missing at random.

Vacek (1985) showed the importance of conditional dependence for the estimation of the sensitivity and specificity (and therefore of the PVs) of a binary diagnostic test. The method proposed by Zhou (1998), which we have used in this study, makes it possible to compare the sensitivities and specificities of two binary diagnostic tests in the presence of partial verification, but it does not allow us to estimate or make inferences about the conditional dependence between the two diagnostic tests. Therefore it is necessary to investigate new methods to compare the sensitivities and specificities (or PVs) and to estimate the conditional dependence between the two diagnostic tests.
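The continuity correction mentioned above for zero frequencies can be sketched as follows, using the stratum of Table 4 with one risk factor or none, where three of the s frequencies are zero. This is only an illustration under stated assumptions: the helper names are hypothetical, the correction is applied to the s, r and u frequencies alike (the text says "all of the frequencies" without being more specific), and its effect on the estimators and tests has not been studied, as the authors note. The variance used is the delta-method variance of s/(s + r), which is zero whenever s or r is zero.

```python
# Table 4 counts, stratum "one risk factor or none"; keys are (T1, T2).
s = {(1, 1): 37, (1, 0): 0, (0, 1): 0, (0, 0): 0}       # verified, D = 1
r = {(1, 1): 92, (1, 0): 79, (0, 1): 12, (0, 0): 57}    # verified, D = 0
u = {(1, 1): 16, (1, 0): 70, (0, 1): 15, (0, 0): 522}   # not verified


def needs_correction(s, r):
    """True when some s_ij or r_ij is zero."""
    return any(v == 0 for v in s.values()) or any(v == 0 for v in r.values())


def add_half(*tables):
    """Return copies of the tables with 0.5 added to every frequency."""
    return tuple({cell: v + 0.5 for cell, v in t.items()} for t in tables)


def var_phi(s, r):
    """Delta-method variance of s_ij / (s_ij + r_ij) for every cell."""
    out = {}
    for cell in s:
        m = s[cell] + r[cell]
        p = s[cell] / m
        out[cell] = p * (1 - p) / m
    return out


before = var_phi(s, r)
if needs_correction(s, r):
    s_c, r_c, u_c = add_half(s, r, u)
else:
    s_c, r_c, u_c = s, r, u
after = var_phi(s_c, r_c)
print(before[(1, 0)], after[(1, 0)] > 0)
```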

Acknowledgements

This research was supported by the Ministerio de Ciencia y Tecnología, Spain, Grant number BFM2003-08950. We thank the editor and referees for their helpful comments that improved the quality of the paper.

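Appendix A

Applying the delta method (Agresti, 1990), the asymptotic variance–covariance matrix of $\widehat{PV}_1$ and $\widehat{PV}_2$ is

$$\frac{\partial(PV_1, PV_2)}{\partial\phi}\, I_{\phi}^{-1} \left(\frac{\partial(PV_1, PV_2)}{\partial\phi}\right)^{T} + \frac{\partial(PV_1, PV_2)}{\partial\eta}\, I_{\eta}^{-1} \left(\frac{\partial(PV_1, PV_2)}{\partial\eta}\right)^{T},$$

where $I_{\phi}^{-1}$ and $I_{\eta}^{-1}$ are the inverses of the Fisher information matrices of $\phi$ and $\eta$, given by the equations (Zhou, 1998)

$$I_{\phi}^{-1} = \mathrm{diag}\left\{\frac{\phi_{ij}^{2}(1 - \phi_{ij})^{2}}{s_{ij}(1 - \phi_{ij})^{2} + r_{ij}\phi_{ij}^{2}}\right\}, \quad i, j = 0, 1,$$

and

$$I_{\eta}^{-1} = \mathrm{diag}\left\{\frac{\eta_{00}^{2}}{n_{00}}, \frac{\eta_{01}^{2}}{n_{01}}, \frac{\eta_{10}^{2}}{n_{10}}\right\} - \left(\sum_{i,j=0}^{1} \frac{\eta_{ij}^{2}}{n_{ij}}\right)^{-1} \left(\frac{\eta_{00}^{2}}{n_{00}}, \frac{\eta_{01}^{2}}{n_{01}}, \frac{\eta_{10}^{2}}{n_{10}}\right)^{T} \left(\frac{\eta_{00}^{2}}{n_{00}}, \frac{\eta_{01}^{2}}{n_{01}}, \frac{\eta_{10}^{2}}{n_{10}}\right).$$

By performing the matrix operations we obtain expression (5), where the partial derivatives of $PV_k$ with respect to $\phi = (\phi_{00}, \phi_{01}, \phi_{10}, \phi_{11})$ and $\eta = (\eta_{00}, \eta_{01}, \eta_{10})$ are

$$\frac{\partial PPV_1}{\partial\phi_{0j}} = 0, \quad \frac{\partial PPV_1}{\partial\phi_{1j}} = \frac{\eta_{1j}}{1 - \eta_{00} - \eta_{01}}, \quad \frac{\partial PPV_1}{\partial\eta_{10}} = \frac{\phi_{10} - \phi_{11}}{1 - \eta_{00} - \eta_{01}},$$

$$\frac{\partial PPV_1}{\partial\eta_{0j}} = \frac{\phi_{10}\eta_{10} + \phi_{11}(1 - \eta_{00} - \eta_{01} - \eta_{10})}{(1 - \eta_{00} - \eta_{01})^{2}} - \frac{\phi_{11}}{1 - \eta_{00} - \eta_{01}}, \quad j = 0, 1,$$

$$\frac{\partial PPV_2}{\partial\phi_{i0}} = 0, \quad \frac{\partial PPV_2}{\partial\phi_{i1}} = \frac{\eta_{i1}}{1 - \eta_{00} - \eta_{10}}, \quad \frac{\partial PPV_2}{\partial\eta_{01}} = \frac{\phi_{01} - \phi_{11}}{1 - \eta_{00} - \eta_{10}},$$

$$\frac{\partial PPV_2}{\partial\eta_{i0}} = \frac{\phi_{01}\eta_{01} + \phi_{11}(1 - \eta_{00} - \eta_{01} - \eta_{10})}{(1 - \eta_{00} - \eta_{10})^{2}} - \frac{\phi_{11}}{1 - \eta_{00} - \eta_{10}}, \quad i = 0, 1,$$

$$\frac{\partial NPV_1}{\partial\phi_{0j}} = -\frac{\eta_{0j}}{\eta_{00} + \eta_{01}}, \quad \frac{\partial NPV_1}{\partial\phi_{1j}} = 0, \quad \frac{\partial NPV_1}{\partial\eta_{10}} = 0,$$

$$\frac{\partial NPV_1}{\partial\eta_{0j}} = \frac{1 - \phi_{0j}}{\eta_{00} + \eta_{01}} - \frac{(1 - \phi_{00})\eta_{00} + (1 - \phi_{01})\eta_{01}}{(\eta_{00} + \eta_{01})^{2}}, \quad j = 0, 1,$$

$$\frac{\partial NPV_2}{\partial\phi_{i0}} = -\frac{\eta_{i0}}{\eta_{00} + \eta_{10}}, \quad \frac{\partial NPV_2}{\partial\phi_{i1}} = 0, \quad \frac{\partial NPV_2}{\partial\eta_{01}} = 0,$$

$$\frac{\partial NPV_2}{\partial\eta_{i0}} = \frac{1 - \phi_{i0}}{\eta_{00} + \eta_{10}} - \frac{(1 - \phi_{00})\eta_{00} + (1 - \phi_{10})\eta_{10}}{(\eta_{00} + \eta_{10})^{2}}, \quad i = 0, 1.$$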
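The delta-method calculation of this appendix can be sketched numerically for the two PPVs using the Table 4 counts for patients with two or more risk factors. The code below is an illustrative sketch, not the authors' implementation: it writes φ for P(D = 1 | T1, T2) and η for P(T1, T2), which is a notational assumption, and the helper names (`quad`, `covariance`) are hypothetical. Under these assumptions the result agrees with the variances and covariance quoted in Section 6.1 to the precision printed there.

```python
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]           # (T1, T2); (1, 1) is not free
s = {(0, 0): 1, (0, 1): 18, (1, 0): 1, (1, 1): 224}
r = {(0, 0): 35, (0, 1): 6, (1, 0): 32, (1, 1): 38}
u = {(0, 0): 219, (0, 1): 21, (1, 0): 24, (1, 1): 31}

n = {c: s[c] + r[c] + u[c] for c in cells}
total_n = sum(n.values())
phi = {c: s[c] / (s[c] + r[c]) for c in cells}
eta = {c: n[c] / total_n for c in cells}

# Diagonal of the inverse information matrix of phi.
var_phi = {c: phi[c] ** 2 * (1 - phi[c]) ** 2
           / (s[c] * (1 - phi[c]) ** 2 + r[c] * phi[c] ** 2) for c in cells}

# Inverse information matrix of the free parameters (eta00, eta01, eta10).
free = cells[:3]
w = [eta[c] ** 2 / n[c] for c in free]
w_sum = sum(eta[c] ** 2 / n[c] for c in cells)
inv_i_eta = [[(w[a] if a == b else 0.0) - w[a] * w[b] / w_sum
              for b in range(3)] for a in range(3)]

d1 = 1 - eta[(0, 0)] - eta[(0, 1)]                 # P(T1 = 1)
d2 = 1 - eta[(0, 0)] - eta[(1, 0)]                 # P(T2 = 1)
ppv1 = (phi[(1, 0)] * eta[(1, 0)] + phi[(1, 1)] * eta[(1, 1)]) / d1
ppv2 = (phi[(0, 1)] * eta[(0, 1)] + phi[(1, 1)] * eta[(1, 1)]) / d2

# Gradients with respect to phi (ordered as in `cells`).
g_phi = {
    1: [0.0, 0.0, eta[(1, 0)] / d1, eta[(1, 1)] / d1],
    2: [0.0, eta[(0, 1)] / d2, 0.0, eta[(1, 1)] / d2],
}
# Gradients with respect to (eta00, eta01, eta10).
a1 = ppv1 / d1 - phi[(1, 1)] / d1                  # d PPV1 / d eta0j
a2 = ppv2 / d2 - phi[(1, 1)] / d2                  # d PPV2 / d eta_i0
g_eta = {
    1: [a1, a1, (phi[(1, 0)] - phi[(1, 1)]) / d1],
    2: [a2, (phi[(0, 1)] - phi[(1, 1)]) / d2, a2],
}


def quad(g, matrix, h):
    """Bilinear form g' M h for small lists."""
    return sum(g[a] * matrix[a][b] * h[b]
               for a in range(len(g)) for b in range(len(h)))


def covariance(k, m):
    """Delta-method covariance of the PPV estimators of tests k and m."""
    part_phi = sum(g_phi[k][i] * g_phi[m][i] * var_phi[c]
                   for i, c in enumerate(cells))
    return part_phi + quad(g_eta[k], inv_i_eta, g_eta[m])


print(covariance(1, 1), covariance(2, 2), covariance(1, 2))
```

The NPVs can be handled in the same way by swapping in the corresponding gradients listed above.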

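Appendix B

The MLE of the PPV of test 1 is written in terms of $\hat{\lambda}_{ij}$ as

$$\widehat{PPV}_1 = \frac{s_{11}\hat{\lambda}_{10} + s_{10}\hat{\lambda}_{11}}{(n_{10} + n_{11})\,\hat{\lambda}_{11}\hat{\lambda}_{10}}. \qquad (12)$$

Let $\lambda_1 = P(V = 1 \mid T_1 = 1)$. If $P(V = 1 \mid T_1 = 1, T_2 = 1) = P(V = 1 \mid T_1 = 1, T_2 = 0)$, then $\lambda_{11} = \lambda_{10} = \lambda_1$. The MLE of $\lambda_1$ is

$$\hat{\lambda}_1 = \frac{s_{11} + s_{10} + r_{11} + r_{10}}{n_{11} + n_{10}}. \qquad (13)$$

Substituting Eq. (13) in (12), we obtain the expression of the naïve estimator of the PPV of test 1. The result is demonstrated similarly for the rest of the PVs.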
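The substitution can be written out explicitly. Writing $\hat{\lambda}$ for the estimated verification probabilities (the symbol is a notational assumption here), setting $\hat{\lambda}_{11} = \hat{\lambda}_{10} = \hat{\lambda}_1$ in Eq. (12) and then inserting Eq. (13) gives the proportion of diseased patients among the verified patients with a positive result in test 1:

```latex
\widehat{PPV}_1
  = \frac{s_{11}\hat{\lambda}_{1} + s_{10}\hat{\lambda}_{1}}
         {(n_{10} + n_{11})\,\hat{\lambda}_{1}^{2}}
  = \frac{s_{11} + s_{10}}{(n_{10} + n_{11})\,\hat{\lambda}_{1}}
  = \frac{s_{11} + s_{10}}{s_{11} + s_{10} + r_{11} + r_{10}} .
```

With the counts of Table 4 for patients with two or more risk factors, the last expression is $(224 + 1)/(224 + 1 + 38 + 32) = 225/295 \approx 0.7627$, the naïve estimate quoted in Section 6.1.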


References

Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.
Begg, C.B., Greenes, R.A., 1983. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 39, 207–215.
Bennett, B.M., 1972. On comparison of sensitivity, specificity and predictive value of a number of diagnostic procedures. Biometrics 28, 793–800.
Harel, O., Zhou, X.H., 2006. Multiple imputation for the comparison of two screening tests in two-phase Alzheimer studies. Statist. Med., doi: 10.1002/sim.2715.
Kosinski, A.S., Barnhart, H.X., 2003a. A global sensitivity analysis of performance of a medical diagnostic test when verification bias is present. Statist. Med. 22, 2711–2721.
Kosinski, A.S., Barnhart, H.X., 2003b. Accounting for nonignorable verification bias in assessment of diagnostic test. Biometrics 59, 163–171.
Leisenring, W., Alonzo, T., Pepe, M.S., 2000. Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics 56, 345–351.
Pepe, M.S., 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, Oxford.
Roldán Nofuentes, J.A., Luna del Castillo, J.D., 2005. Comparing the likelihood ratios of two binary diagnostic tests in the presence of partial verification. Biometrical J. 47, 442–457.
Roldán Nofuentes, J.A., Luna del Castillo, J.D., 2006. Comparing two binary diagnostic tests in the presence of verification bias. Comput. Statist. Data Anal. 50, 1551–1564.
Rubin, D.B., 1976. Inference and missing data. Biometrika 63, 581–592.
Vacek, P.M., 1985. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics 41, 959–968.
Zehna, P.W., 1966. Invariance of maximum likelihood estimation. Ann. Math. Statist. 37, 744.
Zhou, X.H., 1993. Maximum likelihood estimators of sensitivity and specificity corrected for verification bias. Comm. Statist. Theory Methods 22, 3177–3198.
Zhou, X.H., 1994. Effect of verification bias on positive and negative predictive values. Statist. Med. 13, 1737–1745.
Zhou, X.H., 1998. Comparing accuracies of two screening tests in a two-phase study for dementia. Appl. Statist. 47, 135–147.
Zhou, X.H., Obuchowski, N.A., McClish, D.K., 2002. Statistical Methods in Diagnostic Medicine. Wiley, New York.