Nonparametric Multivariate Inference on Shift Parameters John E. Kolassa, PhD1, Yodit Seifu, PhD Rationale and Objectives: Consider a study evaluating the prognostic value of prostate-specific antigen (PSA), in the presence of Gleason score, in differentiating between early and advanced prostate cancer. This data set features subjects divided into two groups (early versus advanced cancer), with one manifest variable (PSA), one covariate (Gleason score), and one stratification variable (hospital, taking three values). We present a nonparametric method for estimating a shift in median PSA score between the two groups, after adjusting for differences in Gleason score and stratifying on hospital. This method can also be extended to cases involving multivariate manifest variable. Materials and Methods: Our method uses estimating equations derived from an existing rank-based estimator of the area under the receiver operating characteristic curve (AUC). This existing AUC estimator is adjusted for stratification and for covariates. We use the adjusted AUC estimator to construct a family of tests by shifting manifest variables in one of the treatment groups by an arbitrary constant. The null hypothesis for these tests is that the AUC is half. We report the set of shift values for which the null hypothesis is not rejected as the resulting confidence region. Results: Simulated data show performance consistent with the distributional approximations used by the proposed methodology. This methodology is applied to two examples. In the first example, the mean difference in PSA levels between advanced and nonadvanced prostate cancer patients is estimated, controlling for Gleason score. In the second example, to assess the degree to which age and baseline tumor size are prognostic factors for breast cancer recurrence, differences in age and tumor size between subjects with recurrent and nonrecurrent breast cancer, stratified on Tamoxifen treatment and adjusting for tumor grade, are estimated. Conclusions: The proposed methodology provides a nonparametric method for a statistic measuring adjusted AUC to be used to estimate shift between two manifest variables. Key Words: ROC curve; nonparametric analysis; area under curve; shift parameter. ªAUR, 2013
S
erum prostate-specific antigen (PSA) and biopsy Gleason score are prognostic factors frequently used for patients screened for prostate cancer (1). Prognostic factors are important in clinical practice because they ‘‘have a direct impact on therapeutic planning’’ (2). Consider a study evaluating the prognostic value of prostatespecific antigen (PSA), in the presence of Gleason score, in differentiating between early and advanced prostate cancer (1). This data set features subjects divided into two groups (early versus advanced cancer), with one manifest variable (PSA), one covariate (Gleason score), and one stratification variable (hospital, taking three values). These data were analyzed using Bayesian regression techniques (3,4); readers with a radiological or medical physics background might be particularly interested in these references. Our aim is to learn about the difference between median values for PSA for
Acad Radiol 2013; 20:883–888 1 This material is based on work supported by the National Science Foundation (grant DMS 0906569). From the Department of Statistics and Biostatistics, 504 Hill Center, Busch Campus, 110 Frelinghuysen Road, Piscataway, NJ 08854 (J.E.K.); and Novartis Pharmaceuticals Corporation, East Hanover, NJ (Y.S.). Received January 9, 2013; accepted March 25, 2013. Address correspondence to: J.E.K. e-mail:
[email protected] ªAUR, 2013 http://dx.doi.org/10.1016/j.acra.2013.03.006
subjects in the two groups. We refer to such differences in location as a shift. This problem has been well studied in the case when the separate location estimates are approximately normally distributed. Our method applies in more general situations, in which normality assumptions are not needed. We use estimating equation techniques in conjunction with the statistic of Kawaguchi et al (5) to provide an estimate of the shift in the manifest variable coinciding with group assignment. We generate a confidence interval for the shift in PSA scores between the two outcome groups, adjusting for Gleason score. Our method generalizes to the case of multiple manifest variables, and provides a confidence region for the vector of shift parameters for all of these manifest variables. In the case with a single manifest variable and without our covariate adjustment, this model has been previously used (6). In data sets featuring multiple ordinal variables and a dichotomous variable representing group, one might construct the receiver operating characteristic (ROC) curve. Estimates of the areas under the ROC curve (AUCs) that distinguishes between groups 2 and 1 based on each manifest variable measure the degree of association between grouping and that particular manifest variable. The AUC is the same as the probability estimated by the Mann-Whitney statistic in the univariate, unstratified, and unadjusted case (7). These probabilities are the same as those estimated previously (5) incorporating a randomization-based adjustment for covariates. 883
KOLASSA AND SEIFU
The earlier method (5) allows for manifest variables that are strictly categorical. We do not make use of this generalization, and we investigate a shift parameter, which is not justified for purely categorical manifest variables. The simplest case of the previous method (5) is equivalent to the Mann-Whitney statistic (8), which represents an AUC estimated without parametric assumptions (7); other authors survey this area (9,10). Estimating equation techniques, in one dimension, without stratification, and with no covariates, have been used in the radiology literature (11). Alternative parametric estimation procedures also exist (12). The previous method (5) presumes that group membership is allocated randomly by the experimenter. In the example of D’Amico et al (1), both group assignment (advanced versus nonadvanced) and the ordinal variable (in this case, PSA, a continuous variable for which a shift effect is sensible) are dependent variables. We argue that approximate normality is consistent with such a case as well, after matching balance the covariate (in this case, Gleason score). Our simulations are consistent with this.
MATERIALS AND METHODS Consider a clinical trial in which subjects are randomized to a standard or proposed treatment and in which one or more manifest variables are observed for each subject. Suppose that this randomization occurs within strata. Within a stratum, and for a specific manifest variable, then under the null hypothesis that the observations in group 1 have the same distribution as those in group 2, all rearrangements of ranks of the values of this manifest variable in this stratum are equally likely. When there is one manifest variable and one stratum, the standard statistic of Mann and Whitney (8) may be used to test this null hypothesis. When there are multiple strata, but one manifest variable, the extension of van Elteren (13) or the alternatives of Mehrotra et al (14) may be used to test this null hypothesis. The results of van Elteren (13) were extended by Kawaguchi et al (5) to simultaneously address multiple manifest variables. The statistic of Kawaguchi et al (5) is asymptotically multivariate normal, and, furthermore, after appropriate standardization, each component of this statistic estimates the AUC for the corresponding manifest variable. The technique of Kawaguchi et al (5) further extends that of van Elteren (13) to provide an adjusted statistic that makes use of information in covariates. The method of Kawaguchi et al (5) also allows for manifest variables to be missing at random; missing data are featured in the prostate data set, although the missing at random hypothesis is not testable. The permutation distribution of ranks due to random assignment to groups ensures the asymptotic normality of the AUC estimate used by Kawaguchi et al (5), regardless of whether there exist covariates also related to the manifest variable. If the manifest variables are related to covariates, then for any given stratum, the joint distribution of the difference in 884
Academic Radiology, Vol 20, No 7, July 2013
Figure 1. Gleason score versus prostate-specific antigen (PSA) by stage of prostate cancer (advanced/nonadvanced).
covariate mean for group 2 minus the covariate mean for group 1, and the sum of ranks for groups 1 and 2, is asymptotically multivariate normal. Furthermore, Kawaguchi et al (5) present an approximation to the variance matrix of the resulting random vector. The objective of this study is to perform inference on the degree to which the manifest variables in the treatment group are shifted relative to the control group. To this end, we introduce one shift parameter per manifest variable, and we produce a confidence region for these parameters jointly. Let D ¼ ðD1 ; .; Dk Þ represent a vector of potential shifts of the manifest variables for the treatment group relative to the control group. In our first example, this represents the degree to which PSA is elevated in advanced prostate cancer patients, compared to nonadvanced patients. The appendix extends the statistic of Kawaguchi et al (5), used to test for equality of the distribution of manifest variables in the presence of covariates and stratification, to the problem of producing confidence regions for shift parameters. We first consider data from a retrospective cohort study by the Radiologic Diagnostic Oncology Group (RDOG) of 1872 clinically localized prostate cancer patients treated between 1989 and 1997 with local therapy (radical prostatectomy, implant, or conformal external beam radiation therapy) (1). Our new confidence interval construction is applied to the prostate data, displayed in Figure 1, and the construction is illustrated in Figure 2. Figure 2 will be described in more detail in the Results section. In the case of multiple manifest variables, the confidence region is presented as Equation (1) in the Appendix. Competing methods include logistic regression and discriminant analysis. Logistic regression reverses the role of manifest and explanatory variables and does not result in a numerical assessment of the shift between the variables considered in this report as manifest variables. Discriminant analysis, including nonparametric rank-based analyses (16) most parallel
Academic Radiology, Vol 20, No 7, July 2013 NONPARAMETRIC MULTIVARIATE INFERENCE ON SHIFT PARAMETERS
Figure 2. Geometric construction of confidence interval for shift in prostate-specific antigen (PSA): WðdÞ vs. D.
to the method of the current report, is primarily concerned with classification of new observations and not so much with describing differences in location for current observations. To assess adequacy of the normal approximation to the distribution of the area estimate, we simulated the case with one manifest variable, with two groups of 10 subjects per group, in each of two strata. One covariate for each subject was independently drawn from a standard normal distribution. One manifest variable was constructed for each subject, from a regression parameter multiplied by the covariate, plus an additional error independently drawn from a standard normal distribution. In cases representing an alternative hypothesis, a shift was added to the data from group 2. This process was repeated 100 times, and a normal quantile plot was constructed for the standardized AUC estimate. Our methods are applied to the prostate cancer data. Use of the multivariate normal approximation to the sampling distribution of the statistic of Kawaguchi et al (5) requires that covariate values be randomly distributed between the two groups compared by use of this statistic, causing the distribution of covariates in both groups to be the same. Straightforward calculation shows that if covariates in the two groups have different expectations, the statistic is biased. To obtain equal covariate expectations from the data of D’Amico et al (1), we matched to obtain an equal distribution of Gleason scores within the advanced/nonadvanced groups. In this data set, Gleason scores are integers between 2 and 9. For each integer between 2 and 9, observations in both groups with this Gleason score are counted. All of the observations in the smaller group are retained, and the first observations from the larger group are retained, to obtain an equal number of subjects with the Gleason score in each group. For example, among subjects with Gleason score 4, subjects 35, 57, 71, and 106 are advanced, and subjects 85, 111, and 127 are nonadvanced. Fewer subjects with Gleason score 4 are nonadvanced, and all are retained. Among advanced subjects with Gleason score 4, subjects 35, 57, and 71 are retained, and
106 is deleted. This selection was performed using the ordering of observations in the original data set. This data selection process is justified by previous literature using this data set, which treats the experimental subjects as independent and independent of subject ordering; hence, a subset of the data, selected using data order, may be taken without biasing the results. This matching procedure reduces the sample size from 180 to 101. Of interest is a confidence interval for the shift in PSA between advanced and nonadvanced subjects, stratified on hospital and adjusting for Gleason score. As a second example, we consider the data set collected by Sotiriou et al (17), involving 189 subjects with breast cancer, and including information on tumor grade, initial tumor size, subject age, whether the subject was treated with tamoxifen, and whether cancer recurred. This data set was obtained through Madhavan et al (18). We aim to estimate the shift in age and baseline tumor size between recurrent and nonrecurrent cancer patients, stratified on tamoxifen treatment and adjusting for tumor grade. Furthermore, we treat both of these manifest variables jointly, to account for dependence between age and baseline tumor size. These new methods were implemented in the R statistical language (19). Code is available from the corresponding author. RESULTS We performed a simulation to assess whether the normal approximation to the distribution of the estimated AUCs is adequate. Our simulations demonstrated approximate normality for our test statistic under the null model and demonstrated departures from normality under alternatives, leading us to expect efficient coverage from our confidence intervals. Methods of the previous section will be applied first to the prostate data of D’Amico et al (1), described in the introductory section, using PSA as the manifest variable and Gleason score as the explanatory variable, stratifying on hospital, and taking disease status (advanced versus nonadvanced) to determine group. Mathematical details are deferred to the Appendix. Figure 1 shows the relationship among PSA, Gleason score, and advanced status. Figure 2 shows a geometric construction for the point estimate and confidence interval for D, the amount by which the PSA for the advanced group are elevated over those in the nonadvanced group, controlling for Gleason score. The staggered collection of solid horizontal line segments represents the standardized statistic as a function of D; this statistic is given as W in the Appendix. Horizontal portions represent the function over ranges of D between PSA differences between advanced and nonadvanced subjects. Dots represent the function at D equal to a difference between a PSA value for an advanced subject and a PSA value for a nonadvanced subject. Horizontal dashed lines, drawn at 1:96, indicate standard normal approximation critical values for the statistic. Null hypotheses represented by statistic values inside these bounds 885
KOLASSA AND SEIFU
are not rejected. The 95% confidence interval is that set of D for which the test statistic, using data shifted by D, does not reject the null hypothesis. Vertical dotted lines, drawn where the horizontal dashed lines intersect the graph of the test statistic, bound the set of D for which the null hypothesis that the true populations are offset by D is not rejected, and hence represent end points of the confidence interval. The resulting confidence interval is ð0:301; 4:301Þ, indicating that, with 95% confidence, the PSA levels in advanced cancer patients are elevated between 0.301 and 4.301 units above those of nonadvanced patients. Recall that a 95% confidence interval is a recipe for producing an interval that will cover a true parameter with probability of 0.95, regardless of the value of the parameter. Another horizontal dashed line indicates the value 0, and another vertical dotted line shows the value of D where the test statistic crosses 0, and hence indicates the b The point estimate, in this example, is point estimate D. 1.901, indicating that a typical advanced patient has a PSA value 1.901 higher than that of a typical nonadvanced patient. As a second example, we reanalyze a subset of data condensed from five separate data sets by Sotiriou et al (17) of breast cancer patients collected to investigate the association between tumor grade and gene expression. We estimate the shift in age and baseline tumor size between recurrent and nonrecurrent cancer patients, stratified on tamoxifen treatment and adjusting for tumor grade. Hence, we apply Equation (1) with age and tumor size in place of PSA, differentiating between recurrent and nonrecurrent cancer, stratifying on tamoxifen treatment, and adjusting for tumor grade, treated as a categorical variable with categories 1, 2, 3, and missing. Figure 3 shows age and tumor size by recurrence status. Figure 4 shows the confidence region for shift in age and tumor size resulting from Equation (1). Table 1 gives univariate confidence intervals for shifts in age and tumor size separately, as done with the first example. Figure 4 shows, for example, that the a shift in age of 1.5 and a shift in tumor size of 0.35 cm is implausible, because they fall outside the 95% confidence region as shown in Figure 4, even though separately each value falls inside the 95% univariate intervals as shown in Table 1. Thus, simultaneous treatment of the manifest variables yields inference more precise than that produced treating the manifest variables separately. This effect becomes more pronounced when one corrects the univariate results for multiple comparisons; in this case, using, for example, the Bonferroni adjustment, the 95% confidence region of Figure 4 should be compared to the intersection of the separate 97.5% intervals of Table 1. DISCUSSION We have presented a method for constructing nonparametric confidence regions for shift parameters representing the relative locations of continuous variables arising from two groups of patients, in the presence of stratification and adjusting for a covariate. This method involves the use of a nonparametric 886
Academic Radiology, Vol 20, No 7, July 2013
Figure 3. Age and tumor size by stage of breast cancer (recurrent/ nonrecurrent).
Figure 4. Joint confidence region sift in age and tumor size adjusting for tumor grade.
estimate of the AUC and is applied to two examples. The first example involves estimating a shift in PSA score between subjects with advanced and nonadvanced prostate cancer, controlling for Gleason score and stratifying on hospital. This shift parameter represents the difference between the median manifest in the second group minus the median manifest in the first group. In the prostate example, we show that the PSA levels in the advanced group are approximately 2 units higher than they are in the nonadvanced group. The second example involves estimating difference in age and tumor size between women with recurrent and nonrecurrent breast cancer and controlling for tamoxifen treatment through stratification. As noted by Samuelson et al (20), the AUC, and hence our assessment of shift, depends on influences like the makeup of the study population and changes in use of
Academic Radiology, Vol 20, No 7, July 2013 NONPARAMETRIC MULTIVARIATE INFERENCE ON SHIFT PARAMETERS
TABLE 1. Univariate Confidence Intervals for Shift in Age and Tumor Size between Recurrent and Non-recurrent Subjects Level
Variable
0.900 0.900 0.950 0.950 0.975 0.975
Age Size Age Size Age Size
Lower Bound
Upper Bound
2.990 0.400 3.990 0.301 4.000 0.301
4.000 0.899 4.990 0.900 5.000 0.900
instrumentation and, hence, is dependent on the protocol used to generate and collect the data.
ACKNOWLEDGMENTS The authors thank Professor Charles E. Metz of the University of Chicago for his pioneering work in ROC analysis; this issue is dedicated to him. REFERENCES 1. D’Amico AV, Whittington R, Malkowicz SB, et al. Biochemical outcome after radical prostatectomy, external beam radiation therapy, or interstitial radiation therapy for clinically localized prostate cancer. JAMA 1998; 280: 969–974. 2. Zou KH, Warfield SK, Fielding JR, et al. Statistical validation based on parametric receiver operating characteristic analysis of continuous classification data. Acad Radiol 2003; 10:1359–1368. 3. O’Malley A, Zou KH, Fielding JR, et al. Bayesian regression methodology for estimating a receiver operating characteristic curve with two radiologic applications: prostate biopsy and spiral ct of ureteral stones. Acad Radiol 2001; 8:713–725.
4. O’Malley AJ, Zou KH. Bayesian multivariate hierarchical transformation models for ROC analysis. Stat Med 2006; 25:459–479. 5. Kawaguchi A, Koch GG, Wang X. Stratified multivariate Mann Whitney estimators for the comparison of two treatments with randomization based covariance adjustment. Stat Biopharmaceut Res 2011; 3: 217–231. 6. Zou KH, Carlsson MO, Yu C-R. Comparison of adjustment methods for stratified two-sample tests in the context of ROC analysis. Biometr J 2012; 54:249–263. 7. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol 1975; 12:387–415. 8. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 1947; 18: 50–60. 9. Hanley J, McNeil B. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36. 10. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978; 8: 283–298. 11. Schiemann M, Killmann R, Kleen M, et al. Vascular guide wire navigation with a magnetic guidance system: experimental results in a phantom. Radiology 2004; 232:475–481. 12. Metz C, Herman B, Shen J-H. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 1998; 17:1033–1053. 13. van Elteren PH. On the combination of independent two sample tests of Wilcoxon. Bull Int Stat Inst 1947; 37:351–361. 14. Mehrotra DV, Lu X, Li X. Rank-based analyses of stratified experiments: alternatives to the van Elteren test. Am Stat 2010; 64:121–130. 15. Jureckova J. Asymptotic linearity of a rank statistic in regression parameter. Ann Math Stat 1969; 40:1889–1900. 16. Randles RH, Broffitt JD, Ramberg JS, et al. Discriminant analysis based on ranks. J Am Stat Assoc 1978; 73:379–384. 17. Sotiriou C, Wirapati P, Loi S, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 2006; 98:262–272. 18. Madhavan S, Gusev Y, Harris M, et al. G-DOC: A systems medicine platform for personalized oncology. Neoplasia 2011; 13:771–783. 19. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. URL: http://www .R-project.org/; 2012. ISBN 3-900051-07-0. 20. Samuelson F, Gallas BD, Myers KJ, et al. The importance of ROC data. Acad Radiol 2011; 18:257–258.
887
KOLASSA AND SEIFU
APPENDIX et Wk ðDk Þ represent the stratified AUC estimate calculated from the manifest variable k, with values for group 1 unchanged, and values for group 2 replaced by their values with the associated entry in D subtracted off. Let WðDÞ represent the collection of these AUC estimates for the various manifest variables. These statistics, in the case with no b shift, are given by Kawaguchi et al (5). Furthermore, let S represent the estimate of the variance-covariance matrix of the components of WðDÞ, also given by Kawaguchi et al (5). In the two examples that we consider, we apply WðDÞ in cases in which group is assigned systematically rather than randomly; in contrast, Kawaguchi et al (5) assume random group assignment. This extension is justified when covariates are evenly split between the two groups, because of the asymptotic linearity of the Wilcoxon statistic in regression parameters (15). In the case of one manifest variable, the statistic WðDÞ, and its fitted variance, are scalar, and may be written as W ðDÞ and b respectively. Since W ðDÞ is a linear combination of S, stratum-specific proportions of pairs of manifest variables,
L
888
Academic Radiology, Vol 20, No 7, July 2013
one from group 1 and one from group 2, such that the group 1 observation exceeds the group 2 observation minus D, then W ðDÞ is nonincreasing in D. A point estimate of the shift may b and b such that W ðDÞ#1 for D$ D be constructed as any D 2 1 b and the confidence region (1) is an interW ðDÞ$ for D# D, 2 pffiffiffiffi 1 b val with lower end points DL satisfying W ðDÞ# þ 1:96 S 2 p ffiffiffiffi 1 b for D#DL . The upper for D$DL and W ðDÞ$ þ 1:96 S 2 end point DU is similar, with 1:96 replacing 1.96. This construction presupposes that W ðDÞ is approximately normally distributed. In the presence of covariates, presume that WðDÞ has been adjusted for these covariates, as recommended by Kawaguchi et al (5). Then o n b 1 ðWðDÞ 1=2Þ#c2 DðWðDÞ 1=2Þu S r;a
(1)
represents a confidence region for the offsets. Here c2r;a represents the 1 a quantile for the c2r distribution, and 1 is a vector with r components whose entries are all 1.