Journal of Clinical Epidemiology 56 (2003) 956–962
Optimal choice of a cut point for a quantitative diagnostic test performed for research purposes
Laurence S. Magder a,b,*, Alan D. Fix a
a Department of Epidemiology and Preventive Medicine, University of Maryland, 660 West Redwood Street, Baltimore, MD 21201, USA
b Department of Mathematics and Statistics, University of Maryland, Baltimore, MD, USA
Accepted 11 May 2003
Abstract
Often, in epidemiologic research, classification of study participants with respect to the presence of a dichotomous condition (e.g., infection) is based on whether a quantitative measurement exceeds a specified cut point. The choice of a cut point involves a tradeoff between sensitivity and specificity. When the classification is to be made for the purpose of estimating risk ratios (RRs) or odds ratios (ORs), it might be argued that the best choice of cut point is one that maximizes the precision of estimates of the RRs or ORs. In this article, two different approaches for estimating RRs and ORs are discussed. For each approach, formulae are derived that give the mean squared error of the RR and OR estimates, for any choice of cut point. Based on these formulae, a cut point can be chosen that minimizes the mean squared error of the estimate of interest. © 2003 Elsevier Inc. All rights reserved.
Keywords: Epidemiologic methods; Sensitivity and specificity; Diagnostic tests; Misclassification; Odds ratio; Risk ratio; Study design
1. Introduction
Often, in epidemiologic research designed to estimate the association between risk factors and a disease, the classification of a person with respect to the presence or absence of disease is based on whether some quantitative measurement exceeds a specified cut point. This classification is generally not perfectly accurate, and the choice of the cut point involves a tradeoff between sensitivity and specificity. Decreasing the value of the cut point results in higher sensitivity, but lower specificity of the diagnostic test. This article concerns the question of how to choose a cut point in that context. For example, there is current interest in estimating the degree of association between various risk factors (e.g., sexual activity) and seropositivity for Herpes Simplex Virus 8 (HSV8). The degree of association can be quantified by risk ratios (RRs) or odds ratios (ORs). The classification of study participants with respect to HSV8 seropositivity is based on whether the value of the optical density of a serologic assay exceeds a specified cut point. Unfortunately, no matter what cut point is chosen, the assay does not result in perfectly accurate classifications of HSV8. What would be the optimal cut point in this context?
* Corresponding author. Tel.: 410-706-3253; fax: 410-706-8013. E-mail address: [email protected] (L.S. Magder).
0895-4356/03/$ – see front matter © 2003 Elsevier Inc. All rights reserved. doi: 10.1016/S0895-4356(03)00153-7
Methods for choosing a cut point have been developed for the situation in which the test is to be used for diagnosis in a clinical setting where treatment decisions will be based on the test results [1]. In this context, the choice of cut point should be based on the consequences of treating those who do not have the condition, the consequences of failing to treat those who do have the condition, and the prevalence of the condition in question [1]. This approach can be implemented by plotting a receiver operating characteristic (ROC) curve and finding the point on the curve at which the slope equals $[C/B]\,[(1-p_D)/p_D]$, where $C$ is the net cost of treating someone without the condition, $B$ is the net benefit of treating someone with the condition, and $p_D$ is the patient’s pretest probability of having the disease [1]. However, when the test is used for research purposes and treatment decisions are not based on the test results, the choice of the cut point should be based on scientific considerations. Specifically, if the test is to be used in epidemiologic research to estimate RRs and ORs, it might be argued that a cut point should be chosen that results in the most precise RR and OR estimates. In this article, we consider two general approaches for estimating RRs and ORs, and for each approach we provide formulae that can be used to calculate the precision of the estimates for any choice of cut point. Based on these formulae, cut points can be chosen that result in the best estimates. Throughout, it is assumed that the sensitivity and specificity of the diagnostic test are known at each possible
cut point. However, the results have implications for choosing cut points when there is uncertainty about the sensitivity and specificity of the test at each cut point.
2. Statement of the problem
For linguistic simplicity, we will use the word “disease” to refer to the health condition of interest, and “exposure” to refer to a dichotomous risk factor. Let Z stand for the quantitative variable used to classify people with respect to the presence or absence of disease. The usefulness of Z for classifying people depends on the degree to which the distribution of Z among those with disease differs from the distribution of Z among those without disease. Fig. 1 illustrates hypothetical distributions of Z among the diseased and the nondiseased. In this illustration, the two distributions are both normal with standard deviation equal to 1, but with means that differ by three standard deviations. The horizontal axis is labeled with respect to the distance from the mean of Z in the nondiseased. To classify people as diseased or nondiseased based on an observed value of Z, a cut point is chosen. If the value of Z for a person exceeds the cut point, then the person is classified as diseased (test result positive). Otherwise, the person is classified as nondiseased (test result negative). The two shaded areas in Fig. 1 illustrate the sensitivity (shaded area on the right) and specificity (shaded area on the left) that would result if a cut point of 2.0 were chosen.
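The sensitivity and specificity implied by the Fig. 1 illustration can be computed directly from the normal distribution function. The following is a minimal sketch under the stated assumptions of Fig. 1 (unit standard deviations, means three standard deviations apart); the function name `sens_spec` and its parameters are illustrative and do not come from the article.

```python
from statistics import NormalDist

def sens_spec(cut, mean_diseased=3.0, mean_nondiseased=0.0, sd=1.0):
    """Sensitivity and specificity when Z > cut is classified as diseased."""
    diseased = NormalDist(mean_diseased, sd)
    nondiseased = NormalDist(mean_nondiseased, sd)
    sens = 1.0 - diseased.cdf(cut)   # diseased subjects above the cut point
    spec = nondiseased.cdf(cut)      # nondiseased subjects at or below the cut point
    return sens, spec

sens, spec = sens_spec(2.0)
print(f"sens={sens:.3f}, spec={spec:.3f}")  # sens=0.841, spec=0.977
```

Lowering `cut` in this sketch raises `sens` and lowers `spec`, which is the tradeoff described above.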
How should we choose the value of the cut point when the classification is being made strictly for research purposes? We would argue that a cut point should be chosen that maximizes the precision of estimates of scientific interest. Here, we consider the cases in which the goal is to estimate RRs or ORs. More precisely, let $p_{D|E}$ and $p_{D|\bar{E}}$ stand for the probability of disease among the exposed and unexposed, respectively. In these terms, $RR = p_{D|E} / p_{D|\bar{E}}$ and $OR = [p_{D|E}/(1-p_{D|E})] / [p_{D|\bar{E}}/(1-p_{D|\bar{E}})]$. To quantify the precision of an estimate, it is common to use the mean squared error (MSE), the average squared distance between an estimate and the true value [2]. The MSE is equal to the variance of the estimate plus the square of the bias of the estimate. To determine the precision of estimates of the RR and the OR it is common and convenient to work on the log scale. Thus, the precision of an estimate of RR, say $\widehat{RR}$, can be quantified using
$$\text{MSE} = E\left[(\log\widehat{RR} - \log RR)^2\right] = \text{Var}(\log\widehat{RR}) + \left(\text{bias}(\log\widehat{RR})\right)^2.$$
The analogous expression is used for the OR. Given an estimation approach, the problem reduces to finding the cut point that minimizes the MSE.
Fig. 1. Illustration of hypothetical distributions of Z among the diseased (right curve) and undiseased (left curve). The area in the shaded region to the right of 2.0 represents the sensitivity that would result if 2.0 was chosen as the cut point. The area in the shaded region to the left of 2.0 represents the specificity that would result.
3. Choosing a cut point when the RR or OR will be estimated in a standard manner
We assume independent observations are available from n study subjects. For each subject, information regarding the presence or absence of the exposure and the value of Z is known. Given such data, there are several approaches to estimating the RR and OR. The standard approach would be to choose a cut point, classify the subjects with respect to disease based on this cut point, and estimate the RR or OR as if these classifications represented the true disease status of each person. To describe this approach more precisely, notation is provided in Table 1. This represents a hypothetical two-by-two table that can be constructed once a cut point for the quantitative measurement Z is chosen. Using the notation in Table 1, let $\hat{p}_{T+|E} = a/n_E$ and $\hat{p}_{T+|\bar{E}} = c/n_{\bar{E}}$ denote the standard estimates of the probability of testing positive for the disease given exposure and nonexposure, respectively. The standard approach to estimating the RR and the OR in this setting is to use
$$\widehat{RR}_{\text{standard}} = \hat{p}_{T+|E} / \hat{p}_{T+|\bar{E}}$$
and
$$\widehat{OR}_{\text{standard}} = \frac{\hat{p}_{T+|E}/(1-\hat{p}_{T+|E})}{\hat{p}_{T+|\bar{E}}/(1-\hat{p}_{T+|\bar{E}})}.$$
Appendix A contains formulae for the asymptotic variance and bias of these estimates derived using the “delta method” [3]. No assumptions were needed to derive these formulae other than the availability of independent observations and knowledge of the sensitivity and specificity of the test. In particular, the formulae do not depend on the normality of Z. As can be seen, the variance and bias depend on: (1) the sample size among exposed and unexposed, (2) the true values of $p_{D|E}$ and $p_{D|\bar{E}}$, and (3) the sensitivity and specificity of the diagnostic test. Given these values, we can calculate the MSE of the estimates. Now we assume that the sensitivity and specificity are known for each possible cut point under consideration. Therefore, given sample sizes and values of $p_{D|E}$ and $p_{D|\bar{E}}$, we can calculate the MSE of these estimates at each possible cut point under consideration and then choose the cut point that leads to the lowest MSE. In actuality, we will not know the true values of $p_{D|E}$ and $p_{D|\bar{E}}$, but substituting our best guesses for them will result in the best guess regarding the optimal cut point.
To illustrate the results of applying these formulae, we consider the special case in which the distribution of Z among both the diseased and nondiseased is normal with the same variance but different means. Fig. 2a–d shows the MSE of the standard estimates calculated at a range of cut points under various scenarios. Fig. 2b and d are based on two distributions whose means are separated by three standard deviations, as illustrated in Fig. 1. The horizontal axis consists of an interval of possible cut points, labeled by their distance (in standard deviations) from the mean of the distribution of Z among the nondiseased, as in Fig. 1. The sensitivity and specificity corresponding to each possible cut point (calculated based on the normality assumptions) are given below the horizontal axis. The calculations are based on a sample size of 200 per group. It can be seen that in these scenarios, the optimum cut point occurs in a place of high specificity and moderate sensitivity. The importance of high specificity in these scenarios is tied to the fact that the probability of disease in each group is relatively low. Given a low probability of disease and imperfect specificity, the number of true positives might be relatively low compared to the number of false positives. This will lead to relatively greater bias and variance.
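The search for a cut point that minimizes the MSE of the standard RR estimate can be sketched as a simple grid search. The example below assumes the normal model of Fig. 1 (means three standard deviations apart), a disease probability of 0.1 in the unexposed, a true RR of 2, and 200 subjects per group, and it uses appendix expressions (1) to (4). The function names, the grid of candidate cut points, and the grid spacing are illustrative choices rather than part of the article.

```python
import math
from statistics import NormalDist

def mse_log_rr_standard(sens, spec, p_exp, p_unexp, n_exp, n_unexp):
    """Asymptotic MSE of log(RR_standard): variance plus squared bias."""
    # Probability of testing positive in each group, appendix (1) and (2).
    pt_exp = p_exp * sens + (1 - p_exp) * (1 - spec)
    pt_unexp = p_unexp * sens + (1 - p_unexp) * (1 - spec)
    # Bias on the log scale, appendix (3).
    bias = math.log(pt_exp / pt_unexp) - math.log(p_exp / p_unexp)
    # Delta-method variance, appendix (4).
    var = (1 - pt_exp) / (n_exp * pt_exp) + (1 - pt_unexp) / (n_unexp * pt_unexp)
    return var + bias ** 2

def best_cut_point(separation, p_exp, p_unexp, n_per_group, cuts):
    """Return the candidate cut point with the smallest asymptotic MSE."""
    std_normal = NormalDist()

    def mse_at(cut):
        spec = std_normal.cdf(cut)                   # nondiseased centered at 0
        sens = 1 - std_normal.cdf(cut - separation)  # diseased centered at `separation`
        return mse_log_rr_standard(sens, spec, p_exp, p_unexp,
                                   n_per_group, n_per_group)

    return min(cuts, key=mse_at)

# Hypothetical grid: cut points from 0 to 4 standard deviations in steps of 0.01.
cuts = [i / 100 for i in range(0, 401)]
best = best_cut_point(3.0, p_exp=0.2, p_unexp=0.1, n_per_group=200, cuts=cuts)
best_spec = NormalDist().cdf(best)
best_sens = 1 - NormalDist().cdf(best - 3.0)
print(best, round(best_sens, 3), round(best_spec, 3))
```

In this scenario the selected cut point has higher specificity than sensitivity, in line with the pattern described for Fig. 2. Changing the assumed disease probabilities moves the optimum, so the result is only as good as the guesses supplied for them.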
Table 1
Notation for cell counts in two-by-two tables
Classification of study participants based on the diagnostic test:

| | Classified as diseased | Classified as nondiseased | Total |
|---|---|---|---|
| Exposed | a | b | $n_E$ |
| Unexposed | c | d | $n_{\bar{E}}$ |

4. Choosing a cut point when the RR or OR will be estimated using an approach that adjusts for imperfect sensitivity and specificity of the diagnostic test
Given data such as that in Table 1 and known values of sensitivity and specificity, it is possible to estimate the RR and OR using a method that adjusts for the imperfect sensitivity and specificity of the test [4,5]. The rationale for this approach is as follows: Let $p_{T+|E}$ and $p_{T+|\bar{E}}$ stand for the probability of testing positive in the exposed and unexposed, respectively, based on a given cut point. There are two possible ways in which a positive test result could occur: (1) the person really has the disease and the test is correctly positive, and (2) the person does not have the disease, but the test is incorrectly positive. The probability of testing positive is the sum of the probabilities of these two possibilities. Therefore,
$$p_{T+|E} = (p_{D|E})(\text{sens}) + (1-p_{D|E})(1-\text{spec})$$
where “sens” and “spec” are short for the sensitivity and specificity of the diagnostic test. Solving this equation for $p_{D|E}$ results in:
$$p_{D|E} = \frac{p_{T+|E} - (1-\text{spec})}{\text{sens} - (1-\text{spec})}.$$
Therefore, an unbiased estimate of $p_{D|E}$ is:
$$\hat{p}_{D|E} = \frac{\hat{p}_{T+|E} - (1-\text{spec})}{\text{sens} - (1-\text{spec})}.$$
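As a small worked illustration of the adjusted estimate of the disease probability, the sketch below applies the formula above to hypothetical counts. The function name `adjusted_prevalence` and the numbers (40 positives among 200 subjects, sensitivity 0.85, specificity 0.90) are invented for illustration only.

```python
def adjusted_prevalence(n_positive, n_total, sens, spec):
    """Estimate the true disease probability from the observed positive fraction."""
    p_test = n_positive / n_total                      # observed probability of testing positive
    return (p_test - (1 - spec)) / (sens - (1 - spec))

# Hypothetical data: 40 of 200 subjects test positive.
p_hat = adjusted_prevalence(40, 200, sens=0.85, spec=0.90)
print(round(p_hat, 4))  # 0.1333
```

Here the observed positive fraction of 0.20 overstates the estimated disease probability of about 0.13, because roughly half of the positives are expected to be false positives at this specificity.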
Fig. 2. Asymptotic mean squared error of $\log \widehat{RR}_{\text{standard}}$ (plots a and b), and $\log \widehat{OR}_{\text{standard}}$ (plots c and d) at a range of cut points, under various scenarios. All plots assume 200 subjects per group. Plots (a) and (c) are based on the assumption that the distributions of the quantitative assessment have the same standard deviation, but differ in their means by two standard deviations. Plots (b) and (d) assume they differ by three standard deviations. The probability of disease in the unexposed was assumed to equal 0.1 (thick lines), or 0.3 (thin lines). The probability of disease in the exposed was set so that the RR = 2 (plots a and b) or the OR = 2 (plots c and d). The numbers for the cut points on the horizontal axes refer to distances from the mean in the undiseased.
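Using the same approach to derive $\hat{p}_{D|\bar{E}}$ results in the following adjusted estimates for the RR and OR:
$$\widehat{RR}_{\text{adjusted}} = \frac{\hat{p}_{D|E}}{\hat{p}_{D|\bar{E}}} = \frac{\hat{p}_{T+|E} - (1-\text{spec})}{\hat{p}_{T+|\bar{E}} - (1-\text{spec})}$$
and
$$\widehat{OR}_{\text{adjusted}} = \frac{\hat{p}_{D|E}/(1-\hat{p}_{D|E})}{\hat{p}_{D|\bar{E}}/(1-\hat{p}_{D|\bar{E}})} = \frac{\hat{p}_{T+|E} - (1-\text{spec})}{\text{sens} - \hat{p}_{T+|E}} \div \frac{\hat{p}_{T+|\bar{E}} - (1-\text{spec})}{\text{sens} - \hat{p}_{T+|\bar{E}}}.$$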
For some data sets, these formulae can result in negative values for the estimate. In those cases, the parameter should be estimated with 0 or infinity depending on the situation. For example, if the denominator of the adjusted RR estimate, $\hat{p}_{T+|\bar{E}} - (1-\text{spec})$, is less than 0, then there are fewer subjects testing positive than would be expected if all the unexposed subjects were truly nondiseased. In this case, the data are most consistent with no probability of disease in the unexposed, and the appropriate estimate of the RR is infinity.
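The boundary handling described above can be sketched as follows, using the cell counts a and c from Table 1. Note that the sensitivity cancels in the adjusted RR, so only the specificity is needed. The function name `adjusted_rr` and the example counts are hypothetical, and returning NaN when both numerator and denominator are non-positive is an assumption of this sketch, since the article does not address that case.

```python
import math

def adjusted_rr(a, n_exp, c, n_unexp, spec):
    """Adjusted RR estimate, truncated to 0 or infinity at the boundaries."""
    num = a / n_exp - (1 - spec)     # excess positives among the exposed
    den = c / n_unexp - (1 - spec)   # excess positives among the unexposed
    if num <= 0 and den <= 0:
        return math.nan              # assumption: case not covered by the article
    if den <= 0:
        return math.inf              # data consistent with no disease in the unexposed
    if num <= 0:
        return 0.0                   # data consistent with no disease in the exposed
    return num / den

# Hypothetical counts with 200 subjects per group and specificity 0.90.
rr_regular = adjusted_rr(50, 200, 30, 200, spec=0.90)   # approximately 3.0
rr_infinite = adjusted_rr(50, 200, 15, 200, spec=0.90)  # unexposed positives below the false positive rate
rr_zero = adjusted_rr(10, 200, 30, 200, spec=0.90)      # exposed positives below the false positive rate
print(rr_regular, rr_infinite, rr_zero)
```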
Note that, if the specificity is 1.0, the adjusted estimate of the RR is equivalent to the standard estimate. This reflects the fact that when specificity is perfect, the standard estimate of the RR is asymptotically unbiased. The asymptotic variances of these estimates are given by equations (7) and (8) in the appendix. These variances depend on the sensitivity and specificity of the test, the values of $p_{D|E}$ and $p_{D|\bar{E}}$, and the sample size in each group. Again, the validity of these formulae does not depend on assumptions about the normality of Z. Therefore, for given values of $p_{D|E}$ and $p_{D|\bar{E}}$ and sample size, a cut point can be chosen which results in the lowest variance. Because these estimates are asymptotically unbiased, their asymptotic MSE is equivalent to their asymptotic variance. Fig. 3a–d shows the asymptotic MSE of the adjusted estimates calculated at a range of cut points under the same scenarios used for Fig. 2. Again, it can be seen that the optimal cut points occur for high values of specificity. Interestingly, despite the fact that these estimates are unbiased, the MSE of the adjusted estimate exceeds the MSE of the standard estimate for many cut points.
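Choosing the cut point that minimizes the asymptotic variance of the adjusted RR estimate can be sketched in the same way as for the standard estimate, using appendix expression (7). The example assumes the normal model with means three standard deviations apart, a disease probability of 0.1 in the unexposed, a true RR of 2, and 200 subjects per group. The function names and the grid of candidate cut points are illustrative, not part of the article.

```python
from statistics import NormalDist

def var_log_rr_adjusted(sens, spec, p_exp, p_unexp, n_exp, n_unexp):
    """Asymptotic variance of log(RR_adjusted), appendix expression (7)."""
    pt_exp = p_exp * sens + (1 - p_exp) * (1 - spec)
    pt_unexp = p_unexp * sens + (1 - p_unexp) * (1 - spec)
    term_exp = pt_exp * (1 - pt_exp) / ((pt_exp + spec - 1) ** 2 * n_exp)
    term_unexp = pt_unexp * (1 - pt_unexp) / ((pt_unexp + spec - 1) ** 2 * n_unexp)
    return term_exp + term_unexp

def variance_at_cut(cut, separation=3.0, p_exp=0.2, p_unexp=0.1, n=200):
    """Variance at one cut point under the assumed normal model."""
    std_normal = NormalDist()
    spec = std_normal.cdf(cut)
    sens = 1 - std_normal.cdf(cut - separation)
    return var_log_rr_adjusted(sens, spec, p_exp, p_unexp, n, n)

# Hypothetical grid: cut points from 0 to 4 standard deviations in steps of 0.01.
cuts = [i / 100 for i in range(0, 401)]
best_adjusted_cut = min(cuts, key=variance_at_cut)
print(best_adjusted_cut, round(variance_at_cut(best_adjusted_cut), 4))
```

Because the adjusted estimates are asymptotically unbiased, this variance is also their asymptotic MSE, so the same search serves both purposes.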
Fig. 3. Asymptotic mean squared error of $\log \widehat{RR}_{\text{adj}}$ (plots a and b), and $\log \widehat{OR}_{\text{adj}}$ (plots c and d) at a range of cut points, under various scenarios. All plots assume 200 subjects per group. Plots (a) and (c) are based on the assumption that the distributions of the quantitative assessment have the same standard deviation, but differ in their means by two standard deviations. Plots (b) and (d) assume they differ by three standard deviations. The probability of disease in the unexposed was assumed to equal 0.1 (thick lines), or 0.3 (thin lines). The probability of disease in the exposed was set so that the RR = 2 (plots a and b) or the OR = 2 (plots c and d). The numbers for the cut points on the horizontal axes refer to distances from the mean in the undiseased.
5. Example
As mentioned in the introduction, there is currently epidemiologic interest in identifying risk factors for HSV8, a recently discovered virus associated with Kaposi’s sarcoma. Unfortunately, serologic assays to identify infection are thought to have imperfect sensitivity and specificity. Engels et al. [6] evaluated the sensitivity and specificity of several assays at different cut points. Table 2 shows the sensitivity and specificity at three optical density cut points for one of the enzyme-linked immunoassays designed to measure antibodies to the lytic phase glycoprotein K8.1. Table 2 also shows the MSE of estimates of the risk ratio under different scenarios, calculated using the formulae in the appendices of this article. It can be seen that if the standard estimate is used and the prevalence in the unexposed is 5%, using a cut point of 1.5 leads to a far more precise estimate than using a cut point of 0.8 (MSE = 0.16 compared to MSE = 0.33). A similar advantage of the larger cut point is seen when the prevalence is 20% and when the adjusted estimate is used.

Table 2
Sensitivity and specificity of the K8.1 assay for detecting HSV8 at various cut points, and the resulting mean square errors of risk ratio estimates

| Optical density cut point | 0.80 | 1.00 | 1.50 |
|---|---|---|---|
| Sensitivity (a) | 90% | 85% | 78% |
| Specificity (a) | 83% | 90% | 98% |
| MSE (b) of log standard estimate of the risk ratio | | | |
| Assuming prevalence in unexposed = 5% | 0.33 | 0.26 | 0.16 |
| Assuming prevalence in unexposed = 20% | 0.114 | 0.072 | 0.038 |
| MSE (b) of log adjusted estimate of the risk ratio | | | |
| Assuming prevalence in unexposed = 5% | 0.79 | 0.55 | 0.26 |
| Assuming prevalence in unexposed = 20% | 0.065 | 0.055 | 0.042 |

(a) From Engels et al. [6].
(b) Assuming 200 patients per group and a true risk ratio of 2.0.

6. Further comments
In practice, it will be impossible to determine the MSE of the estimates at various cut points with certainty. For one thing, there will generally be uncertainty regarding the sensitivity and specificity of the diagnostic test at various cut points. For another thing, the formulae given in the Appendix provide only approximate MSEs, the quality of which depends on the sample size. However, decisions often have to be made in the presence of uncertainty, and the formulae and the graphs in this article can still provide guidance in choosing cut points. It is clear, for example, that for relatively rare outcomes, good precision in estimation requires high specificity, but is somewhat robust to departures from high sensitivity. This would suggest that one should choose relatively high cut points in this context. Interestingly, we found that in certain settings, the MSE of the standard estimates is lower than the MSE of the adjusted estimates. This occurs because the standard estimates are biased towards 1.0, reducing the probability of getting extremely large or small estimates. In fact, it can be shown that the variances of the standard estimates are always lower than the variances of the adjusted estimates. Thus, for small sample sizes (where the MSE is predominantly determined by the variance), the MSE will be lower for the standard estimates than for the adjusted estimates. However, the lower MSE of the standard method does not mean that it is preferable to the adjusted estimates in these settings. It can be argued based on the likelihood principle [7] that if the sensitivity and specificity are known, then the adjusted estimate of association is a more accurate representation of the information in the data with respect to the true value of the association. For example, consider a data set in which the observed number of positive tests in the exposed group is less than would be expected even if the true disease risk in the exposed was 0, given the imperfect specificity of the test. With such data it is arguable that the data are most consistent with a RR of 0.
This is what the adjusted estimate would be, whereas the standard estimate would not equal 0. Thus, if the goal of the analysis is to report what the data say regarding the value of the association, it is best to use the adjusted estimate. There is a third approach to estimating RRs and ORs in this context that obviates the need to choose a specific cut point. In brief, using methods described in Magder and Hughes [5], the exact value of Z can be used in risk assessment using a probabilistic approach. Thus for example, those with a very high value of Z might be classified as having a higher probability of disease than those with a borderline value of Z. These probabilities can be incorporated into an algorithm to compute maximum likelihood estimates of risk ratios or odds ratios. To use this approach the sensitivity and specificity of the assay must be known (or assumed) for multiple cut points. Many studies seek estimates of the association of exposure and disease, while controlling for potential confounders. The adjusted method described above can be extended to this context. A SAS macro is available on the Internet that extends logistic regression to adjust for imperfect sensitivity and specificity of a diagnostic test [8].
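To make the comparison between the standard and adjusted RR estimates concrete, the following sketch evaluates appendix expressions (1) to (4) and (7) at the three K8.1 cut points reported in Table 2, assuming a prevalence of 5% in the unexposed, a true risk ratio of 2.0, and 200 subjects per group. The function name `log_rr_precision` and the dictionary layout are illustrative. The output is not an exact reproduction of Table 2: because the published sensitivities and specificities are rounded, some computed values may differ slightly from the tabulated ones.

```python
import math

def log_rr_precision(sens, spec, p_exp, p_unexp, n):
    """Return (MSE of log RR_standard, MSE of log RR_adjusted) for n subjects per group."""
    pt_e = p_exp * sens + (1 - p_exp) * (1 - spec)        # appendix (1)
    pt_u = p_unexp * sens + (1 - p_unexp) * (1 - spec)    # appendix (2)
    bias = math.log(pt_e / pt_u) - math.log(p_exp / p_unexp)   # appendix (3)
    var_std = (1 - pt_e) / (n * pt_e) + (1 - pt_u) / (n * pt_u)  # appendix (4)
    var_adj = (pt_e * (1 - pt_e) / ((pt_e + spec - 1) ** 2 * n)
               + pt_u * (1 - pt_u) / ((pt_u + spec - 1) ** 2 * n))  # appendix (7)
    # The adjusted estimate is asymptotically unbiased, so its MSE is its variance.
    return var_std + bias ** 2, var_adj

# Sensitivity and specificity at each optical density cut point, from Table 2.
k81 = {0.80: (0.90, 0.83), 1.00: (0.85, 0.90), 1.50: (0.78, 0.98)}
results = {cut: log_rr_precision(sens, spec, p_exp=0.10, p_unexp=0.05, n=200)
           for cut, (sens, spec) in k81.items()}
for cut, (mse_std, mse_adj) in results.items():
    print(f"cut={cut:.2f}  standard={mse_std:.3f}  adjusted={mse_adj:.3f}")
```

Under these assumptions both estimates become more precise as the cut point rises from 0.80 to 1.50, and the standard estimate has the smaller MSE at each cut point, which matches the qualitative pattern discussed in this section.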
The methods described in this article are meant to be used when disease status is a true dichotomy (e.g., infected/uninfected). This should be distinguished from the situation when disease status is a matter of degree, and Z is a measure of the degree of disease (e.g., when the disease is obesity and Z is body-mass index). In the latter case, the use of a cut point is mainly for the purpose of providing simpler summaries of the data, and the notions of sensitivity and specificity do not apply. Considerations for choosing a cut point in that setting are discussed by Ragland [9]. The methods described in this article make the implicit assumption that the sensitivity and specificity of the diagnostic test are the same in both study groups. Although generally reasonable, this assumption can be relaxed by using different values for sensitivity and specificity in the terms of the formulae that relate to each study group.
Acknowledgments This work was supported by research grant R0-1 AR 43727 of the National Institutes of Health.
Appendix
Formulae for the bias and variance of the standard estimates based on a given cut point.
Let $p_{D|E}$ and $p_{D|\bar{E}}$ stand for the probability of disease in the exposed and unexposed, respectively. Similarly, let $p_{T+|E}$ and $p_{T+|\bar{E}}$ stand for the probability of testing positive in the exposed and unexposed, respectively, based on a given cut point, T. Then,
$$p_{T+|E} = p_{D|E}\,\text{sens} + (1-p_{D|E})(1-\text{spec}) \tag{1}$$
and
$$p_{T+|\bar{E}} = p_{D|\bar{E}}\,\text{sens} + (1-p_{D|\bar{E}})(1-\text{spec}) \tag{2}$$
where “sens” and “spec” refer to the sensitivity and specificity of the diagnostic test based on the chosen cut point. The standard estimate for the RR, $\hat{p}_{T+|E}/\hat{p}_{T+|\bar{E}}$, is an unbiased estimate of $p_{T+|E}/p_{T+|\bar{E}}$. Therefore,
$$\text{bias}(\log\widehat{RR}_{\text{standard}}) = \log\left(\frac{p_{T+|E}}{p_{T+|\bar{E}}}\right) - \log\left(\frac{p_{D|E}}{p_{D|\bar{E}}}\right) \tag{3}$$
Expression (3) can be rewritten in terms of sens, spec, $p_{D|E}$ and $p_{D|\bar{E}}$ by substituting for $p_{T+|E}$ and $p_{T+|\bar{E}}$ based on expressions (1) and (2). Also, using the “delta” method [3],
$$\text{var}(\log\widehat{RR}_{\text{standard}}) \approx \frac{1-p_{T+|E}}{n_E\,p_{T+|E}} + \frac{1-p_{T+|\bar{E}}}{n_{\bar{E}}\,p_{T+|\bar{E}}} \tag{4}$$
where $n_E$ and $n_{\bar{E}}$ are the number of study subjects in the exposed and unexposed groups, respectively. Again, expression (4) can be written in terms of sens, spec, $p_{D|E}$ and $p_{D|\bar{E}}$ by substituting from expressions (1) and (2). Using analogous reasoning,
$$\text{bias}(\log\widehat{OR}_{\text{standard}}) = \log\left(\frac{p_{T+|E}/(1-p_{T+|E})}{p_{T+|\bar{E}}/(1-p_{T+|\bar{E}})}\right) - \log\left(\frac{p_{D|E}/(1-p_{D|E})}{p_{D|\bar{E}}/(1-p_{D|\bar{E}})}\right) \tag{5}$$
and
$$\text{var}(\log\widehat{OR}_{\text{standard}}) = \frac{1}{n_E\,p_{T+|E}} + \frac{1}{n_E(1-p_{T+|E})} + \frac{1}{n_{\bar{E}}\,p_{T+|\bar{E}}} + \frac{1}{n_{\bar{E}}(1-p_{T+|\bar{E}})} \tag{6}$$
Formulae for the variance of the adjusted estimates based on a given cut point.
Using the delta method, the asymptotic variances of the adjusted estimates can be derived. These are as follows:
$$\text{var}(\log\widehat{RR}_{\text{adj}}) = \frac{p_{T+|E}(1-p_{T+|E})}{(p_{T+|E} + \text{spec} - 1)^2\,n_E} + \frac{p_{T+|\bar{E}}(1-p_{T+|\bar{E}})}{(p_{T+|\bar{E}} + \text{spec} - 1)^2\,n_{\bar{E}}} \tag{7}$$
and
$$\text{var}(\log\widehat{OR}_{\text{adj}}) = \left(\frac{\text{sens} + \text{spec} - 1}{(p_{T+|E} + \text{spec} - 1)(\text{sens} - p_{T+|E})}\right)^2 \frac{p_{T+|E}(1-p_{T+|E})}{n_E} + \left(\frac{\text{sens} + \text{spec} - 1}{(p_{T+|\bar{E}} + \text{spec} - 1)(\text{sens} - p_{T+|\bar{E}})}\right)^2 \frac{p_{T+|\bar{E}}(1-p_{T+|\bar{E}})}{n_{\bar{E}}} \tag{8}$$
References
[1] Sox HC Jr, Blatt MA, Higgins MC, Marton KI. Medical decision making. Boston, MA: Butterworth-Heinemann; 1988.
[2] Bickel PJ, Doksum KA. Mathematical statistics: basic ideas and selected topics. Oakland, CA: Holden-Day Inc.; 1977.
[3] Bishop YMM, Fienberg SE, Holland PW. Discrete multivariate analysis: theory and practice. Massachusetts, MA: MIT Press; 1975.
[4] Copeland KT, Checkoway H, McMichael AJ, Holbrook RH. Bias due to misclassification in the estimation of relative risk. Am J Epidemiol 1977;105:488–95.
[5] Magder LS, Hughes JP. Logistic regression when the outcome is measured with uncertainty. Am J Epidemiol 1997;146:195–203.
[6] Engels EA, Whitby D, Goebel PB, Stossel A, Waters D, Pintus A, Contu L, Bigger RJ, Goedert JJ. Identifying human herpesvirus 8 infection: performance characteristics of serologic assays. J Acquir Immune Defic Syndr 2000;23:346–54.
[7] Royall RM. Statistical evidence. A likelihood paradigm. London: Chapman & Hall; 1997.
[8] Web site. http://medschool.umaryland.edu/departments/Epidemiology/software.html
[9] Ragland DR. Dichotomizing continuous outcome variables: Dependence of the magnitude of association and statistical power on the cutpoint. Epidemiology 1992;3:434–40.