J Chron Dis Vol. 34, pp. 599 to 610, 1981. Printed in Great Britain. All rights reserved.
Copyright © 1981 Pergamon Press Ltd
METHODOLOGY FOR THE ASSESSMENT OF NEW DICHOTOMOUS DIAGNOSTIC TESTS

MAURICE STAQUET,* MARCEL ROZENCWEIG,† YOUNG JACK LEE‡ and FRANCO M. MUGGIA†

(Revised version received 8 December 1980)
Abstract: The correct evaluation of a new diagnostic test requires knowledge of the true status of the patient. In practice, such a favorable situation is rarely encountered and the status is most usually determined through one or several imperfect reference tests. In this case, a bias is introduced in assessing new diagnostic procedures, and prevalence becomes a variable of prominent importance when calculating predictive values as well as sensitivity and specificity. Prevalence also affects the statistical parameters of the new tests in two-stage designs, and corrective procedures are necessary in these instances. This paper focuses on the adjustments to be made to allow a correct assessment of new diagnostic procedures in specific situations.
1. INTRODUCTION

ASSESSMENT of a new diagnostic procedure requires its evaluation in two groups of subjects: those with the disease under investigation and those without evidence of the illness. Problems encountered in this type of study have been delineated in previous publications [1-7]. Several areas have been identified which require extensive consideration when designing clinical studies or interpreting clinical results, i.e. the effect of the prevalence of the disease on the predictive values of the diagnostic test, the definition and the selection of the control group, the statistical independence of the new diagnostic technique relative to the procedures used to establish the true status of the patients, the oversimplification imposed by the binary nature of the outcome, and the selection of a critical cut-off point to distinguish normal from abnormal findings when results are distributed within a continuous range of values.

The knowledge of the true status of the subjects submitted to the new test is of crucial importance for the evaluation of a new diagnostic procedure. Many situations, however, do occur in clinical medicine where the true condition of the patient under scrutiny cannot be determined with certainty because of the imperfect nature of the diagnostic procedures. For instance, the absence of hepatic metastases can never be definitively proved, even at autopsy. The inaccurate determination of the true status of the subjects will result in a misclassification of the patients into diseased and non-diseased groups which, in turn, will bias the estimation of the statistical parameters used to characterize the new test.

It will be shown in this paper that the magnitude and the direction of the bias can be recognized and corrected in some situations and that the influence of the prevalence of the disease is even more important than previously reported. The paper will also focus on adequate sampling procedures and on the corrections that are necessary to validate retrospective evaluations.
*EORTC Data Center, Institut Jules Bordet, 1000 Brussels, Belgium.
†Division of Cancer Treatment, National Cancer Institute, Bethesda, Maryland, U.S.A.
‡On detail from the University of Maryland, College Park, Maryland, U.S.A.
TABLE 1. CHARACTERIZATION OF A DIAGNOSTIC TEST (N) ACCORDING TO THE TRUE STATUS OF THE PATIENTS

                              True status
                           +            -
Test (N)    +             TP           FP           TP + FP
            -             FN           TN           FN + TN
                       TP + FN      FP + TN         TP + FN + FP + TN

Sensitivity (S) = TP / (TP + FN)          1 - S = FN / (TP + FN)
Specificity (SP) = TN / (FP + TN)         1 - SP = FP / (FP + TN)
Pred. value (PV+) = TP / (TP + FP)
Pred. value (PV-) = TN / (FN + TN)
Prevalence = (TP + FN) / (TP + FN + FP + TN)
2. THEORETICAL CONSIDERATIONS
The current methodology for evaluating new dichotomous diagnostic tests is based on statistical indexes introduced by Yerushalmy [8], Thorner [9] and Vecchio [10]. A diagnostic test can yield four outcomes relative to the true status of the patient. These are true positive (TP) or true negative (TN) findings when the diagnosis is correct and false positive (FP) or false negative (FN) findings when the diagnosis is erroneous. Various arrangements of the data may be used to characterize the test (Table 1).

Sensitivity (S) represents the percentage of patients correctly classified in the diseased category and specificity (SP) is the percentage of patients correctly classified in the non-diseased group. Sensitivity and specificity are not affected by the prevalence (prev) of the disease in the population. In this paper, 'specificity' is always used according to this definition. The predictive value of a positive test (PV+) is defined as the ratio of the number of true positives to the total number of positive tests. The predictive value of a negative test (PV-) is the ratio of the number of true negatives to the total number of negative tests. PV+ increases and PV- decreases when the prevalence of the disease increases in a selected population. These indicators are also dependent upon the sensitivity and the specificity of the test. Tables giving predictive values for a wide range of prevalences, sensitivities and specificities have been computed by Galen and Gambino [11].

A prime requirement for a test is that it yields a higher percentage of positive findings among truly diseased than among truly non-diseased patients. This requirement may be expressed by

S > 1 - SP,   or   S + SP > 1,   or   PV+ > prev,   or   PV- > 1 - prev,

or

S > (TP + FP) / (TP + FN + FP + TN).
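Worked through in code, the indices of Table 1 and the requirement above amount to a few lines. The following is a minimal Python sketch; the function name and the illustrative counts are ours, not the authors'.

def table1_indices(tp, fp, fn, tn):
    """Indices of Table 1, computed from a 2 x 2 table with known true status."""
    s = tp / (tp + fn)                        # sensitivity
    sp = tn / (fp + tn)                       # specificity
    pv_pos = tp / (tp + fp)                   # predictive value of a positive test
    pv_neg = tn / (fn + tn)                   # predictive value of a negative test
    prev = (tp + fn) / (tp + fp + fn + tn)    # prevalence
    return s, sp, pv_pos, pv_neg, prev

# Hypothetical counts, for illustration only.
s, sp, pv_pos, pv_neg, prev = table1_indices(tp=80, fp=30, fn=20, tn=70)
assert s + sp > 1                             # prime requirement: S + SP > 1
assert pv_pos > prev and pv_neg > 1 - prev    # the equivalent forms given above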
The statistical notion of specificity, which deals with negative tests among subjects without the disorder, should be distinguished from the concept of clinical specificity taken as an indicator of the target disease. Otherwise, the test could merely be picking up patient characteristics that are associated with the disease in question yet shared with many other patient groups, such as age, debility and others. The clinical specificity of a new diagnostic technique can be investigated by selecting a broad range of patients with and without the disease [1, 3]. The choice of the subjects to include in the samples will depend both on the nature of the diagnostic test and on the medical spectrum [1] of the disease. For pathognomonic results, i.e. a diagnostic finding uniquely associated with a disease, the clinical specificity is maximal and the statistical specificity is equal to 100% (excluding administrative errors).

Fig. 1. Evaluation of a new diagnostic test (N) by a reference test (R): Design 1.

3. DESIGN OF CLINICAL INVESTIGATION
In contrast to population screening, the assessment of a diagnostic test in a clinical setting is usually performed in a population preselected by previous diagnostic manoeuvres. In this situation, the prevalence of the disease and the statistical parameters of the new test are relevant only for the clinical entity defined by the selection procedure.

Design 1 (Fig. 1) involves a random selection of a group of patients from the population of interest and the performance of both the new test (N) and the reference test (R) in all the cases of the sample. In this situation the prevalence of the disease can be meaningfully estimated from the results of test (R) and the appropriate statistical parameters can be computed.

In design 2 (Fig. 2) the true diagnosis is established first on a sample of the population of interest; samples are then taken again from the diseased and non-diseased subjects and submitted to the new test. In design 3 (Fig. 3) the results of the new test are already obtained when the reference tests are performed. The true status is determined by means of reference test (R) in samples of patients with positive and negative findings.

Designs 2 and 3 are both two-stage sampling designs in which the size of the two groups formed at the second stage is decided by the investigators. In design 2, it can be seen that the prevalence rate depends on the ratio of the sizes of the two groups and is therefore at the discretion of the investigators. Likewise, in design 3 the prevalence rate of the disease is influenced by this ratio. Correction procedures are then necessary; they are outlined in the following sections. In all cases, random sampling and statistical independence between the new test (N) and the reference test (R) are required.
4. EVALUATION OF THE DATA
Fig. 2. Evaluation of a new diagnostic test (N) by a reference test (R): Design 2.

1. Design 1

(a) Reference test with known sensitivity and specificity. In this type of study, patients undergo an imperfect test called the reference test (test R), with known sensitivity (S_R) and specificity (SP_R), as well as a new test (test N) for which the sensitivity (S_N) and specificity (SP_N) must be determined. This situation is illustrated in Table 2 with a numerical example, where the four cells of the table are also represented by the letters a, b, c, d. Suppose that 100 truly diseased and 100 truly non-diseased patients are available for the study of a disease with a prevalence of 0.5. The disease status is determined by an imperfect reference test (R) with a sensitivity of 0.9 and a specificity of 0.6. Of the truly diseased patients, 90 (0.90 x 100) will be classified as positive by the reference test and the other 10 as negative. In the truly non-diseased group, 60 patients (0.60 x 100) will be negative to the reference test (R) and the remaining 40 will be positive.
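The construction of Table 2 from these assumed parameters can be reproduced mechanically; the same products are written out as equations (1)-(4) further on. A minimal sketch follows (the variable names are ours; statistical independence of the two tests is assumed, as in the text).

# Cross-classification of X diseased and Y non-diseased patients by an imperfect
# reference test R and a new test N, assuming the two tests are independent.
X, Y = 100, 100                 # truly diseased / truly non-diseased
S_R, SP_R = 0.9, 0.6            # reference test (R)
S_N, SP_N = 0.8, 0.7            # new test (N)

a = X * S_R * S_N             + Y * (1 - SP_R) * (1 - SP_N)   # R+, N+ : 72 + 12 = 84
b = X * S_R * (1 - S_N)       + Y * (1 - SP_R) * SP_N         # R+, N- : 18 + 28 = 46
c = X * (1 - S_R) * S_N       + Y * SP_R * (1 - SP_N)         # R-, N+ :  8 + 18 = 26
d = X * (1 - S_R) * (1 - S_N) + Y * SP_R * SP_N               # R-, N- :  2 + 42 = 44

assert round(a + b + c + d) == X + Y == 200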
I’ i
I-------
-1
Papulation ‘\
‘._
__A
\ 1
/
l-
T Sample
Fig. 3. Evaluation of a new diagnostic test (N) by a reference test (RtDesign
3.
603
TABLE 2. ASSESSMENT OF A NEW DIAGNOSTIC TEST (N) BY A REFERENCE TEST (R) WITH KNOWN SENSITIVITY AND SPECIFICITY

                                      New test (N)
                                   +                  -
Reference test (R)  +      72 + 12 = 84 (a)    18 + 28 = 46 (b)    90 + 40 = 130
                    -       8 + 18 = 26 (c)     2 + 42 = 44 (d)    10 + 60 =  70
                                  110                  90                 200

S_R and SP_R are equal to 0.9 and 0.6.
S_N = 0.80   SP_N = 0.70   PV+_N = 0.73   PV-_N = 0.78   Prev. = 0.5
When the new test (N) is applied to the 130 (90 + 40) patients with a positive reference test (R), the results will vary according to the true status and to the sensitivity and specificity of test N. Let us suppose that these figures are 0.8 and 0.7 respectively. Of the 90 truly diseased patients, 72 (90 x 0.8) will be positive to the new test and the other 18 will be negative. The 40 truly non-diseased patients, wrongly classified as positive by the reference test, will be divided according to the specificity of test (N) into 28 patients negative to test (N) (40 x 0.7) and 12 positive patients. Thus, the number of patients who are positive to both tests [(a) in Table 2] represents the number of true positive patients to test (R) who are also positive to test (N) plus the number of false positive patients to test (R) who are also falsely positive to test (N). This can be represented algebraically as follows:

a = X·S_R·S_N + Y(1 - SP_R)(1 - SP_N)    (1)

where X and Y are the numbers of patients with a truly positive and a truly negative status, respectively. The same type of reasoning applied to the other cells of the table gives

b = X·S_R(1 - S_N) + Y·SP_N(1 - SP_R)    (2)
c = X·S_N(1 - S_R) + Y·SP_R(1 - SP_N)    (3)
d = X(1 - S_N)(1 - S_R) + Y·SP_R·SP_N    (4)

Since

a + b + c + d = N    (5)
X + Y = N    (6)
a + b = prev·S_R·N + (1 - prev)(1 - SP_R)·N    (7)

these equations may be solved as

S_N = [(a + c)SP_R - c] / [N(SP_R - 1) + (a + b)]    (8)
SP_N = [(b + d)S_R - b] / [N·S_R - (a + b)]    (9)
PV+_N = S_N·X / (a + c) = S_N(a + b - N + SP_R·N) / [(a + c)(S_R + SP_R - 1)]    (10)
PV-_N = SP_N·Y / (b + d) = SP_N(N·S_R - a - b) / [(S_R + SP_R - 1)(b + d)]    (11)
Prev = [a + b + (SP_R - 1)N] / [(S_R + SP_R - 1)N]    (12)
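Equations (8)-(12) can be checked against the numerical example of Table 2. A minimal sketch follows; the function name is ours.

def correct_for_imperfect_reference(a, b, c, d, S_R, SP_R):
    """Parameters of the new test N from the observed 2 x 2 table against an
    imperfect reference test R of known sensitivity S_R and specificity SP_R."""
    N = a + b + c + d
    S_N = ((a + c) * SP_R - c) / (N * (SP_R - 1) + (a + b))      # eq. (8)
    SP_N = ((b + d) * S_R - b) / (N * S_R - (a + b))             # eq. (9)
    X = (a + b + (SP_R - 1) * N) / (S_R + SP_R - 1)              # truly diseased
    Y = N - X                                                    # truly non-diseased
    pv_pos = S_N * X / (a + c)                                   # eq. (10)
    pv_neg = SP_N * Y / (b + d)                                  # eq. (11)
    prev = X / N                                                 # eq. (12)
    return S_N, SP_N, pv_pos, pv_neg, prev

# Table 2: a = 84, b = 46, c = 26, d = 44 with S_R = 0.9 and SP_R = 0.6 gives
# S_N = 0.80, SP_N = 0.70, PV+ = 0.73, PV- = 0.78 and a prevalence of 0.5.
print(correct_for_imperfect_reference(84, 46, 26, 44, S_R=0.9, SP_R=0.6))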
TABLE 3. ASSESSMENT OF A NEW DIAGNOSTIC TEST (N) BY A REFERENCE TEST (R), BOTH TESTS HAVING A 100% SPECIFICITY

                                      New test (N)
                                   +                  -
Reference test (R)  +       72 + 0 = 72 (a)     18 + 0 = 18 (b)     90 + 0 =  90
                    -        8 + 0 =  8 (c)    2 + 100 = 102 (d)   10 + 100 = 110
                                   80                 120                 200

S_R = 0.9   S_N = 0.8   PV+_R = 1   PV+_N = 1   PV-_R = 0.91   PV-_N = 0.83   Prev. = 0.5
Hence, the use of an imperfect reference test, but one with known sensitivity and specificity, allows a complete characterization of a new diagnostic test. Considering the reference test as perfect would introduce an obvious bias. Under this assumption (see the example in Table 2), the sensitivity of the new test would be wrongly estimated as 0.65 (84/130) instead of 0.80, the specificity as 0.63 (44/70) instead of 0.70, the PV+ as 0.76 (84/110) instead of 0.73 and the PV- as 0.49 (44/90) instead of 0.78.

(b) Both new and reference tests have a specificity of 100%. When the two diagnostic procedures yield pathognomonic findings and therefore no false positive results, an exact computation of their sensitivities and predictive values can be made. Table 3 gives a hypothetical example supposing the sensitivities of (R) and (N) known (0.9 and 0.8 respectively). A perfect sampling of 200 patients is performed at random in a population with a prevalence of disease of 0.5. Among the 100 truly diseased patients, 90 will be positive to (R) (100 x 0.9) and, among these, 72 (90 x 0.8) will also be positive to (N). Of the 10 truly diseased patients falsely classified as negative by (R), 8 will be detected as positive by (N) (sensitivity = 0.8). The 100 truly non-diseased patients will all be negative to (R) and (N). The parameters of the two tests can be computed as follows, using equations (8)-(12):

S_N = a / (a + b)    (13)
S_R = a / (a + c)    (14)
PV+ = 1 for both tests    (15)
PV-_R = Y / (c + d) = [N·S_R - (a + b)] / [S_R(c + d)]    (16)
PV-_N = Y / (b + d) = [N·S_R - (a + b)] / [S_R(b + d)]    (17)
Prev = (a + b)(a + c) / (a·N)    (18)
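A minimal sketch of equations (13)-(18) applied to the Table 3 example follows; the function name is ours.

def both_tests_pathognomonic(a, b, c, d):
    """Parameters when both tests have 100% specificity, so that every positive
    finding is a true positive (a = R+ N+, b = R+ N-, c = R- N+, d = R- N-)."""
    N = a + b + c + d
    S_N = a / (a + b)                         # eq. (13)
    S_R = a / (a + c)                         # eq. (14)
    # PV+ = 1 for both tests, eq. (15)
    Y = (N * S_R - (a + b)) / S_R             # truly non-diseased subjects
    pv_neg_R = Y / (c + d)                    # eq. (16)
    pv_neg_N = Y / (b + d)                    # eq. (17)
    prev = (a + b) * (a + c) / (a * N)        # eq. (18)
    return S_N, S_R, pv_neg_R, pv_neg_N, prev

# Table 3: a = 72, b = 18, c = 8, d = 102 gives S_N = 0.8, S_R = 0.9,
# PV-_R = 0.91, PV-_N = 0.83 and a prevalence of 0.5.
print(both_tests_pathognomonic(72, 18, 8, 102))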
TABLE 4. THEORETICAL PARAMETERS OF A NEW TEST (N) VERSUS A REFERENCE TEST (R) WITH 100% SPECIFICITY WHEN TRUE STATUS IS KNOWN

                                        True status
                                      +           -
Test (R)+     Test (N)   +           N1           0             N1
                         -           N2           0             N2
Test (R)-     Test (N)   +           N3           N5            N3 + N5
                         -           N4           N6            N4 + N6
                              N1 + N2 + N3 + N4   N5 + N6       N

N_i = number of subjects.

Theoretical S_N = (N1 + N3) / (N1 + N2 + N3 + N4)
Theoretical SP_N = N6 / (N5 + N6)
Theoretical PV+_N = (N1 + N3) / (N1 + N3 + N5)
Theoretical PV-_N = N6 / (N2 + N4 + N6)
Prevalence = (N1 + N2 + N3 + N4) / N
Such a method could be used for comparing invasive procedures, which are generally pathognomonic. For instance, Grossman et al. [12] made an assessment of the value of percutaneous biopsy in the diagnosis of liver metastases by performing liver biopsies just prior to postmortem examination. Using the published data [12], it could be found that 39 patients out of 67 had liver metastases at autopsy, of which 27 were biopsy positive (histology or cytology), yielding a sensitivity of 27/39 = 0.69 for the percutaneous procedure. Four patients out of the 28 patients without liver metastases at autopsy were biopsy positive (cytology), giving a sensitivity of 24/28 = 0.86 for postmortem examination. The main obstacle in this case is the bias in the selection of the sample, since autopsy cases are not representative of a population of patients with and without hepatic metastases.

(c) Reference test (R) with unknown sensitivity and a specificity of 100%. In this case, the reference test yields pathognomonic findings and therefore no false positive results. Thus, for instance, Castagna et al. [13] undertook a study of 109 patients with extrahepatic carcinoma to examine the reliability of liver scans and liver function tests in detecting metastases. Laparotomy or autopsy was utilized to determine the true status of each patient. The fact, however, that the sensitivity of laparotomy or autopsy is most likely inferior to 100% introduces an error in the classification of the patients when the results of the reference test (R) are assimilated to the true status.

This can be seen by considering Tables 4 and 5. Table 4 displays the theoretical distribution of N patients according to the outcome of the two tests (R) and (N) applied together, supposing that the true status of the patients is known. The true parameters of the new test can easily be calculated in this theoretical situation. In practice, the true status is unknown and the N patients are distributed as in Table 5, where only the sums N3 + N5 and N4 + N6 are known, but not each element of these sums. Considering the values observed with test (R) as values obtained with the true status introduces the following errors: specificity, positive predictive value and prevalence are underestimated, and the negative predictive value is overestimated. Only the sensitivity of the new test is correctly calculated.
TABLE 5. OBSERVED PARAMETERS OF A NEW TEST VERSUS A REFERENCE TEST WITH A SPECIFICITY OF 100%. SENSITIVITY OF THE REFERENCE TEST IS UNKNOWN

N_i = number of subjects. (The layout is that of Table 4, but only the sums N3 + N5 and N4 + N6 can be observed, not their components.)

Observed S_N = N1 / (N1 + N2)
Observed SP_N = (N4 + N6) / (N3 + N4 + N5 + N6)
Observed PV+_N = N1 / (N1 + N3 + N5)
Observed PV-_N = (N4 + N6) / (N2 + N4 + N6)
Observed prevalence = (N1 + N2) / N
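To make the contrast between Tables 4 and 5 concrete, both sets of formulas can be evaluated on hypothetical counts. A sketch follows; the N_i values are illustrative only and correspond to S_R = 0.9, S_N = 0.8, SP_N = 0.7 and a prevalence of 0.5.

# Hypothetical counts for Tables 4 and 5 (SP_R = 1, so no non-diseased subject is R+).
N1, N2 = 72, 18      # diseased, R+: N+ and N-
N3, N4 = 8, 2        # diseased, R-: N+ and N-
N5, N6 = 30, 70      # non-diseased (all R-): N+ and N-
N = N1 + N2 + N3 + N4 + N5 + N6

theoretical = {                                   # Table 4 (true status known)
    "S_N": (N1 + N3) / (N1 + N2 + N3 + N4),
    "SP_N": N6 / (N5 + N6),
    "PV+": (N1 + N3) / (N1 + N3 + N5),
    "PV-": N6 / (N2 + N4 + N6),
    "prev": (N1 + N2 + N3 + N4) / N,
}
observed = {                                      # Table 5 (test R taken as truth)
    "S_N": N1 / (N1 + N2),
    "SP_N": (N4 + N6) / (N3 + N4 + N5 + N6),
    "PV+": N1 / (N1 + N3 + N5),
    "PV-": (N4 + N6) / (N2 + N4 + N6),
    "prev": (N1 + N2) / N,
}
# S_N is 0.80 in both; SP_N, PV+ and the prevalence come out too low and PV-
# too high when R is treated as the true status, as stated in the text.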
The degree of error in these determinations depends upon the sensitivities of test R and test N and upon the prevalence of the disease (Table 6). In this setting, the true value of the specificity of the new test is confined between 100% and a lower bound which is the observed specificity, d/(c + d). These limits allow the computation of a minimum and a maximum value for the prevalence of the disease. They can be found using the following formula:
Prev = (a + b)(N·SP_N - b - d) / {N[SP_N(a + b) - b]}    (19)

which is derived from formulae (9) and (12). A practical application of these observations may be found in the work of Bleiberg et al. [14]. These investigators studied the value of liver function tests and liver scan (the 'new tests') in the detection of liver metastases among cancer patients, using peritoneoscopy-biopsy as a reference test. The three procedures were performed in 158 patients (Tables 7 and 8). Since the presence of cancerous cells in the biopsy specimen is pathognomonic, the specificity of this test is taken as unity (excluding administrative errors). In this case, as indicated previously, the sensitivities of the new tests can be calculated exactly, without bias. They are 0.92 (33/36) for liver function tests and 0.72 (26/36) for liver scan, using the authors' criteria for positivity and negativity.
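A sketch of this computation on the Table 7 data follows; the function name is ours.

def prevalence_given_specificity(a, b, c, d, SP_N):
    """Formula (19): prevalence implied by an assumed specificity SP_N of the
    new test when the reference test is pathognomonic (SP_R = 1)."""
    N = a + b + c + d
    return (a + b) * (N * SP_N - b - d) / (N * (SP_N * (a + b) - b))

# Table 7 (liver function tests vs peritoneoscopy-biopsy): a=33, b=3, c=65, d=57.
a, b, c, d = 33, 3, 65, 57
min_prev = prevalence_given_specificity(a, b, c, d, d / (c + d))  # observed SP_N -> 0.23
max_prev = prevalence_given_specificity(a, b, c, d, 1.0)          # SP_N = 1      -> 0.68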
TABLE 6. RELATIVE BIAS* ON THE PARAMETERS OF THE NEW TEST (N) ACCORDING TO THE SENSITIVITY OF TEST R AND TEST N AND ACCORDING TO THE PREVALENCE OF THE DISEASE

[Columns: Prev.; Rel. bias* in S_N; Rel. bias* in SP_N; Rel. bias* in PV+_N; Rel. bias* in PV-_N. The numerical body of the table is not legible in this copy; the relative bias in S_N is 'None', consistent with the text.]

*Relative bias = percentage of error on exact value.
TABLE 7. LIVER FUNCTION TESTS VERSUS PERITONEOSCOPY-BIOPSY

                                        Peritoneoscopy-biopsy (R)
                                           +            -
Liver function tests (N)    +            33 (a)       65 (c)         98
                            -             3 (b)       57 (d)         60
                                           36           122          158

S_N = 33/36 = 0.92
Obs. SP_N = 57/122 = 0.47
Min. prev. = 36/158 = 0.23   Max. prev. = 0.68
TABLE 8. LIVER SCANNING VERSUS PERITONEOSCOPY-BIOPSY

                                        Peritoneoscopy-biopsy (R)
                                           +            -
Liver scan (N)              +            26 (a)       63 (c)         89
                            -            10 (b)       59 (d)         69
                                           36           122          158

S_N = 26/36 = 0.72
Obs. SP_N = 59/122 = 0.48
Min. prev. = 36/158 = 0.23   Max. prev. = 0.78
Fig. 4. True specificity of liver scan and liver chemistries according to the prevalence of hepatic metastasis.
TABLE 9. UNCORRECTED ASSESSMENT OF A NEW TEST (N) VERSUS A REFERENCE TEST (R) USING DESIGN 3

                                Reference test (R)
                               +               -
New test (N)    +             18          6 + 8 = 14         32
                -              3          1 + 28 = 29        32
                              21              43             64
Specificities, however, are not estimated exactly, and the observed values (0.47 and 0.48 for liver function tests and liver scan respectively) are underestimates. Since in this design the specificities are a function of the prevalence of the disease, a curve can be drawn which permits comparison of the specificities at the various possible prevalences (Fig. 4). It can be seen that the true specificity of the liver function tests is generally superior to that of the liver scan.

2. Designs 2 and 3

In these two-stage designs, patients are selected for the second step (test N in design 2, test R in design 3) on the basis of the results of the initial sampling. The prevalence of the disease is thus no longer identical at the two stages of the investigation, introducing a bias that affects not only the predictive values of the new test, but also its sensitivity and specificity.

Suppose that a sample of 200 individuals is submitted to a new diagnostic test according to design 3 and that 120 are positive and 80 negative to this procedure. The sensitivity and specificity of this new test, as well as the prevalence of the disease, are unknown, but assume these values to be 0.9, 0.7 and 0.5 respectively. It can then be calculated that the exact PV+ = 0.75 (90/120) and PV- = 0.875 (70/80). Suppose now that two random samples of 32 patients each, among the 120 positive and the 80 negative, are selected for a reference test with a known sensitivity of 0.75 and a specificity of 1 (Table 9). Using the exact predictive values, it is possible to compute the number of truly diseased patients among the 32 individuals selected from those positive to the new test (0.75 x 32 = 24), as well as the number of truly non-diseased among the 32 individuals negative to the new test (0.875 x 32 = 28). It can be seen in Table 9 that, among the 24 truly diseased patients positive to the new test, 18 will be positive with the reference test (sensitivity = 0.75). Since the reference test is pathognomonic (specificity = 1), all 8 truly non-diseased patients who were falsely positive to the new test will be negative to the reference test. Likewise, 75% of the 4 patients who are truly diseased but falsely labelled as negative by the new test will be positive with the reference test.

At this point some corrections must be made, because the relative size of the two groups chosen by the investigator influences the computation of the parameters of the new test. In fact, the ratio between the sizes of the two randomly selected groups must be the same as the ratio between the numbers of positive and negative patients according to the new test. In this case, this ratio is 120/80 = 1.5. Multiplying each figure in the first row of Table 9 by 1.5 gives the appropriate data to compute correct values of the sensitivity, specificity and predictive values of the new test, as well as the exact prevalence (Table 10). This can be verified by using equations (8)-(12) given in the previous section.
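A sketch of this correction follows; the variable names are ours and the numbers reproduce Tables 9 and 10.

S_R, SP_R = 0.75, 1.0                 # reference test used at the second stage
ratio = 120 / 80                      # first-stage ratio of N+ to N- patients

# Uncorrected Table 9: the N+ row is rescaled by the ratio, the N- row is kept.
a, c = 18 * ratio, 14 * ratio         # corrected R+ and R- counts in the N+ row -> 27, 21
b, d = 3, 29                          # R+ and R- counts in the N- row
N = a + b + c + d                     # 80

S_N = ((a + c) * SP_R - c) / (N * (SP_R - 1) + (a + b))    # eq. (8)  -> 0.9
SP_N = ((b + d) * S_R - b) / (N * S_R - (a + b))           # eq. (9)  -> 0.7
X = (a + b + (SP_R - 1) * N) / (S_R + SP_R - 1)            # truly diseased -> 40
pv_pos = S_N * X / (a + c)                                 # eq. (10) -> 0.75
pv_neg = SP_N * (N - X) / (b + d)                          # eq. (11) -> 0.875
prev = X / N                                               # eq. (12) -> 0.5 (Table 10)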
5. DISCUSSION
Sensitivity and specificity are reliable indicators of the potential usefulness of a diagnostic test. However, their absolute values vary with the criteria of positivity and negativity of the procedure, the type of population investigated and the definition of the disease.
TABLE 10. CORRECTED ASSESSMENT OF A NEW TEST (N) VERSUS A REFERENCE TEST (R) USING DESIGN 3

                                Reference test (R)
                               +               -
New test (N)    +             27              21             48
                -              3              29             32
                              30              50             80

S_N = 0.9   SP_N = 0.7   PV+_N = 0.75   PV-_N = 0.875   Prev. = 0.5
For instance, in the case of liver metastases, the minimum level of pathological disorder could be, for liver biopsy, the presence of a single metastatic cell, whereas for liver scanning only lesions of more than 1.5 cm in size would be considered.

Sensitivity and specificity are not influenced by the prevalence of the disease when the true status of the subjects in the samples is known without error. In practice, this favorable situation is rarely encountered and the status is most usually estimated by means of an imperfect reference test. It follows that some patients are not correctly classified in their disease category and a bias is introduced in the evaluation of the new test as well as of the prevalence of the disease [15-22].

A frequent occurrence in the clinic is the use of invasive tests to determine the true status of the subjects. In this case, the reference test usually yields pathognomonic findings (its specificity is then unity) and it is possible to determine correctly the sensitivity of the new test. If the sensitivity of the reference test is less than one, the other parameters of the new test are biased: specificity and PV+ are underestimated whereas PV- is overestimated. In this case prevalence becomes a variable of prominent importance because it influences the amount of bias introduced in the evaluation of three of the four statistical parameters of the new test. Moreover, when the patients are selected for the tests on the basis of history and clinical examination, a high prevalence is observed, provoking a further increase in the relative bias of PV- and of the specificity (Table 6).

When autopsy data are used to establish the true status, an obvious selection bias is introduced in the assessment of any new test. Interpretation of these data is further obscured by possible variations in the disease course during the time interval between the performance of the test and the postmortem examination.

Another important point is the necessity of using appropriate random sampling in the design of trials evaluating new tests in order to validate any generalization of the results. The avoidance of a selection bias is especially difficult when invasive techniques are used to determine the true status of the subjects. Such a procedure is indeed ethically restricted to a group of patients with certain clinical characteristics, which could differ greatly from the population for which the new test is intended. In addition, when two-stage sampling is being used, a common occurrence in retrospective studies, care must be taken to adjust for prevalence before computing the usual statistical parameters.

REFERENCES
1. Feinstein A: On the sensitivity, specificity and discrimination of diagnostic tests. Clin Pharmacol Ther 17: 104-116, 1975
2. Feinstein A: The haze of Bayes, the aerial palaces of decision analysis and the computerized Ouija board. Clin Pharmacol Ther 21: 482-496, 1977
3. Ransohoff DF, Feinstein AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 299: 926-930, 1978
4. Lusted LB: Decision-making studies in patient management. N Engl J Med 284: 416-424, 1971
5. Chiang CL, Hodges JC, Yerushalmy J: Statistical problems in medical diagnoses. Proc 3rd Berkeley Symposium on Math Stat and Prob IV, 121, 1956
6. Sunderman FW, Van Soestbergen AA: Laboratory suggestions: probability computation for clinical interpretations of screening tests. Am J Clin Pathol 55: 105-111, 1971
7. Staquet M, Rozencweig M et al: Biases in the assessment of new diagnostic tests. Proc 15th Annual Meeting of ASCO, New Orleans, 20: 320, 1979
8. Yerushalmy J: Statistical problems in assessing methods of medical diagnosis with special reference to X-ray techniques. Public Health Rep 62: 1432-1449, 1947
9. Thorner RM, Remein QR: Principles and procedures in the evaluation of screening for disease. Public Health Monogr 67, 1961
10. Vecchio TJ: Predictive value of a single diagnostic test in unselected populations. N Engl J Med 274: 1171-1173, 1966
11. Galen RS, Gambino SR: Beyond Normality: The Predictive Value and Efficiency of Medical Diagnoses. New York: Wiley, 1975
12. Grossman E, Goldstein MJ et al: Cytological examination as an adjunct to liver biopsy in the diagnosis of hepatic metastases. Gastroenterology 62: 56-60, 1972
13. Castagna J, Benfield JR et al: The reliability of liver scans and function tests in detecting metastases. Surg Gynecol Obstet 143: 463-466, 1972
14. Bleiberg H, Rozencweig M et al: Peritoneoscopy as a diagnostic supplement to liver function tests and liver scan in patients with carcinoma. Surg Gynecol Obstet 145: 821-825, 1977
15. Bross I: Misclassification in 2 x 2 tables. Biometrics 9: 478-486, 1954
16. Diamond EL, Lilienfeld AM: Effects of errors in classification and diagnosis in various types of epidemiological studies. Am J Public Health 52: 1137-1144, 1962
17. Newell DJ: Errors in the interpretation of errors in epidemiology. Am J Public Health 52: 1925-1928, 1962
18. Diamond EL, Lilienfeld AM: Misclassification errors in 2 x 2 tables with one margin fixed: some further comments. Am J Public Health 52: 2106-2110, 1962
19. Keys A, Kihlberg JK: Effect of misclassification on estimated relative prevalence of a characteristic. Am J Public Health 53: 1656-1660, 1963
20. Gullen WH, Bearman JE, Johnson EA: Effects of misclassification in epidemiologic studies. Public Health Rep 83: 914-918, 1968
21. Greenberg RA, Jekel JF: Some problems in the determination of the false positive and false negative rates of tuberculin tests. Am Rev Respir Dis 100: 645-650, 1969
22. Hui SL, Walter SD: Estimating the error rates of diagnostic tests. Biometrics 36: 167-171, 1980