Computational Statistics and Data Analysis 69 (2014) 220–227
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
Bootstrap corrections of treatment effect estimates following selection Gerd K. Rosenkranz ∗ Novartis Pharma AG, CH-4002 Basel, Switzerland
article
info
Article history: Received 19 August 2012 Received in revised form 2 August 2013 Accepted 14 August 2013 Available online 20 August 2013 Keywords: Estimation of maximum effect Estimation after selection Treatment selection Subgroup selection Bootstrap bias correction
abstract Bias of treatment effect estimators can occur when the maximum effect of several treatments is to be determined or the effect of the selected treatment or subgroup has to be estimated. Since those estimates may contribute to the decision as to whether to continue a drug development program, to select a specific dose or a specific subgroup of patients, methods should be applied that ensure a realistic rather than an overoptimistic estimator of a treatment effect following selection. Selection bias is well studied for normally distributed variables and to a lesser extent for other types of distributions. However, many methods developed for bias correction apply primarily to specific distributions. Since there is always uncertainty about the underlying distribution of data, a more generally applicable method is of interest. The bootstrap has been developed among others to estimate the bias under fairly general distributional assumptions. The potential of the bootstrap in reducing estimator bias after selection is investigated. © 2013 Elsevier B.V. All rights reserved.
1. Introduction Assume that pharmaceutical company A is aiming at licensing a new compound from company B. As it turns out, company B offers two compounds in the indication of interest, both in the same stage of development and with a similar safety profile. The efficacy data look promising. As the first company being interested, A is granted the opportunity to pick one of the two candidates. How shall the company representatives proceed? Eventually company A decides to select the best compound in terms of the observed treatment effect. After resuming development with the selected compound, they find that the effect observed in new trials is somewhat lower than what they understood from company B’s data. Has company A been tricked by B with providing overoptimistic information? The issue of selection arises often in drug development. Two other relevant selection problems in pharmaceutical medicine are of interest. One is to find the optimal dose of a drug, the other constitutes in identifying subgroups of patients that respond better to certain therapies than the population at large. The risk of overestimating effects after selection is present in these situations as well. In fact, additional issues are to be considered. When two treatments are practically equal, it may not matter which one to pick. However, in case two doses of the same treatment are having similar efficacy, the lower one would likely be preferred to prevent unnecessary exposure. When looking for subgroups with a better treatment effect, equal efficacy across subgroups should not result in favoring a subgroup but in considering the full population. Situations like the ones described have been discussed in the literature and methods to minimize the amount of overestimation of parameters following a selection process have been developed. The majority of papers are primarily concerned with normally distributed variables. Cohen and Sackrowitz (1989) (quoting unpublished work by Putter and Rubinstein, 1968) state that an unbiased estimator for the selected mean of two normal populations does not exist. A formal proof and an
∗
Tel.: +41 61 324 3315. E-mail address:
[email protected].
0167-9473/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.csda.2013.08.010
G.K. Rosenkranz / Computational Statistics and Data Analysis 69 (2014) 220–227
221
extension of the result to a broader class of distributions is provided in Vellaisamy (2009). A plethora of alternative estimators that are less biased than the maximum of the arithmetic means have been presented in the literature, e.g., Blumenthal and Cohen (1968), Dahiya (1974), Hsieh (1981), Venter (1988) and Stallard et al. (2008). While unbiased estimates of means after selection do not exist for normally distributed variables such estimates exist for other distributions. Vellaisamy and Sharma (1988) derive an unbiased estimator of the mean of the selected population of gamma distributed variables with common and known shape parameter. We will review some of these approaches and compliment them with a proposal that can be applied broadly without too many distributional assumptions by utilizing the bootstrap (Efron and Tibshirani, 1993; Hesterberg, 2011). The next section provides some notation and definitions. Section 3 derives the bootstrap approximation of the bias of the estimators of the maximum effect, the effect of the selected treatment, and the effect of the selected subgroup. Numerical results will be obtained and compared with the bias of alternative estimators from the literature. 2. Notation We consider the situation where two treatments are to be compared in terms of their efficacy. To this end, data Xij (i = 1, 2; j = 1, . . . , ni ) are obtained from patients under treatment 1 and 2, respectively. We assume that Xij are independently distributed with cumulative distribution functions Fi and thatthe treatment effect can be assessed in terms of parameters θi = t (Fi ). Typical examples of such parameters are the mean xdF (x) and the median F −1 (0.5). Estimators for θi can be found from the plug-in principle. Let Fˆi denote the empirical distribution function obtained from the data vector xi = (xi1 , . . . , xini ): Fˆi (x) = #{xij ≤ x}/ni . Let θˆi = t (Fˆi ) be the plug-in estimator of θi . (Note that we reserve the notation θˆ for plug-in estimators throughout the paper.) Since the Xij are independently identically distributed for i = 1, 2, it follows from Glivenko’s theorem (1933) that Fˆi → Fi under the sup-norm for ni → ∞. If t (F ) is a continuous function of F , θˆi is a consistent estimator of θi and consequently asymptotically unbiased. A quantity of interest in the context of selecting a treatment is the maximum effect of the two treatments, θm = max(θ1 , θ2 ). From the plug-in principle it follows that θˆm = max(θˆ1 , θˆ2 ) is a consistent estimator of θm , however, it is obviously not unbiased. The maximum possible effect θm should be distinguished from the effect θs of the treatment that has been selected as the maximum using a selection rule based on the data. Let u : Rn1 × Rn2 → {0, 1} such that treatment 1 is selected if u(X1 , X2 ) = 1 and treatment 2 otherwise. Then
θs = uθ1 + (1 − u)θ2 . While the parameter θm is considered fixed but unknown, θs is random and depends on the selection. The bias of an estimator θ˜s of θs is defined by BF1 ,F2 (θ˜s , θs ) = EF1 ,F2 [θ˜s − θs ] to account for the random nature of θs . Note that we explicitly mention the (parameters of) the distribution functions under which the expectation is calculated. In addition to considering the bias of an estimator θ˜s of θs it is also instructive to consider the bias of θ˜s given that one of the treatments has been selected: BF1 ,F2 (θ˜s |θi ) = Eθ1 ,θ2 [θ˜s − θi ]. The conditional bias provides an idea of the size of the bias under correct or incorrect selection. The unconditional bias is then BF1 ,F2 (θ˜s , θs ) = Pr[u = 1]BF1 ,F2 (θ˜s |θ1 ) + Pr[u = 0]BF1 ,F2 (θ˜s |θ2 ). To provide an example we assume that Fi are the cumulative distribution functions of normally distributed random variables with mean θi and variance σ 2 . Then θi = t (Fi ) = xdFi (x) and the plug-in estimators are θˆi = t (Fˆi ) = X¯ i =
(1/ni )
j
Xij and θˆm = X¯ m = max(X¯ 1 , X¯ 2 ). Let δ = |θ1 − θ2 | and τ 2 = σ 2 (1/n1 + 1/n2 ). Blumenthal and Cohen (1968) show
that the bias of X¯ m as an estimator of θm is given by Bθ1 ,θ2 (X¯ m , θm ) = Eθ1 ,θ2 [X¯ m − θm ] = τ [φ(δ/τ ) − δ Φ {−δ/(τ
√
2)}]
(1)
where φ and Φ denote the probability density function and cumulative distribution function of a standard normal variate, respectively. Let u = 1 if X¯ 1 > X¯ 2 and zero otherwise. Then X¯ m is also a consistent estimator of θs with bias given by Bθ1 ,θ2 (X¯ m , θs ) = Eθ1 ,θ2 [X¯ m − θs ] = τ φ(δ/τ )
√ (Venter, 1988). B(X¯ m , θm ) is smaller than B(X¯ m , θs ), and both attain a maximum of τ / 2π for θ1 = θ2 .
(2)
222
G.K. Rosenkranz / Computational Statistics and Data Analysis 69 (2014) 220–227
A bias correction of an estimator can be obtained by reducing the estimate by its bias. Since the bias is generally depending on the true parameter values θi , a bias correction has to rely on an estimator of the bias. For the examples above such an estimator can be obtained by replacing δ and τ in Eqs. (1), (2) by their estimators (Dahiya, 1974; Hsieh, 1981):
√ ˆ τˆ ) − δˆ Φ {−δ/( ˆ τˆ 2)}], θ˜mc = X¯ m − τˆ [φ(δ/
(3)
ˆ τˆ ). θ˜sc = X¯ m − τˆ φ(δ/
(4)
However, this bias correction has been derived under the assumption of normally distributed data what can be relevant as we will show in an example. A performance measure of an estimator θ˜ of θ that accounts for both variability and accuracy (i.e., bias) is the mean square error (MSE) MSE(θ˜ ) = E [(θ˜ − θ )2 ] = Var[θ˜ ] + {E [θ˜ − θ]}2 . Since reduction of bias may be accompanied by an increase of variance, we provide also the MSEs of the bias-adjusted estimators for comparison with the original biased estimator. 3. Bootstrap estimator of bias In the following sections we apply the bootstrap to reducing the bias of the estimator of the maximum effect, the estimator of the effect of the selected treatment and the selected subgroup. 3.1. Bias of the estimator of the maximum effect Estimating the bias of the estimator of the maximum effect allows to describe the basic ideas behind bias correction by the bootstrap. For any estimator θ˜ = h(X1 , X2 ) of θ = t (F1 , F2 ) the bias is given by BF1 ,F2 (θ˜ , θ ) = EF1 ,F2 [h(X1 , X2 )] − t (F1 , F2 ).
(5)
The bootstrap estimate of the bias is obtained by substituting Fˆi for Fi in (5), BFˆ ,Fˆ (θ˜ , θ ) = EFˆ ,Fˆ [h(X∗1 , X∗2 )] − t (Fˆ1 , Fˆ2 ), 1 2 1 2
(6)
where Xij ∼ Fˆi . In other words, the bootstrap estimate of the bias is the plug-in estimator of the bias. For general Fi , BFˆ ,Fˆ 1 2 ∗
cannot be obtained in closed form and has to be approximated by Monte Carlo simulation in most practical situations. To ∗ do so we draw B samples X∗ib = (Xi1b , . . . , Xin∗ i b ) from Fˆi , i = 1, 2 and approximate the bootstrap expectation in (6) by B
1 h(X∗1b , X∗2b ) EFˆ ,Fˆ [h(X∗1 , X∗2 )] ≃ h¯ ∗ = 1 2 B b=1 and estimate the bias of h(X1 , X2 ) with
(7)
BFˆ ,Fˆ (θ˜ , θ ) ≃ h¯ ∗ − t (Fˆ1 , Fˆ2 ). 1 2 A bias-reduced estimate h¯ ∗c of θ is then given by h¯ ∗c = θ˜ − BFˆ ,Fˆ (θ˜ , θ ) ≃ h(X1 , X2 ) + t (Fˆ1 , Fˆ2 ) − h¯ ∗ . 1 2 The bias reduced estimator of the plug-in estimator h(X1 , X2 ) = t (Fˆ1 , Fˆ2 ) is given by h¯ ∗c = 2t (Fˆ1 , Fˆ2 )− h¯ ∗ . The considerations above apply in a straightforward manner to the estimation of the maximum effect θm = max(θ1 , θ2 ). The plug-in estimator is θˆm = max{t (Fˆ1 ), t (Fˆ2 )} and the bias-reduced estimator is h¯ ∗c = 2 max{t (Fˆ1 ), t (Fˆ2 )} −
B 1
B b =1
∗ ∗ max{t (Fˆ1b ), t (Fˆ2b )},
(8)
where Fˆib∗ denotes the empirical distribution function of a bootstrap sample X∗ib of Xi . To obtain an idea of how well the bootstrap reduces the bias we conducted a small simulation assuming normally distributed data with means θ1 = 1, θ2 = 0, 0.5, 1, σ = 1 and n1 = n2 = 10. We run 10 000 simulations for each parameter combination with B = 200 bootstrap replicates per simulation The results are compared with the adjusted estimator (3) and shown in Table 1. As expected, the bias is small and the bias corrected estimator of the maximum effect does not differ much from the uncorrected estimator when the means are different. When the means are identical, the bias is substantial and is only partially reduced by either method. Hence it is fair to say that the bootstrap compares well with a method that should be optimal for normally distributed variables. In practice, data are hardly ever known to be normally distributed. Therefore it is of interest to investigate how the bootstrap correction and the correction method based on the assumption of normally distributed data compare when the latter assumption is violated. While both methods produced similar results for uniform and exponential distributions (not shown), a striking difference occurred for log-normally distributed variables. The results for log(Xij ) ∼ N (θi , 1) resulting
G.K. Rosenkranz / Computational Statistics and Data Analysis 69 (2014) 220–227
223
Table 1 Means and mean square errors of the estimate of the maximum effect, the adjusted estimator (3) and the bootstrap estimator (8) from 10 000 simulations of normally distributed variables with ni = 10, θ1 = 1, σ = 1, and B = 200 bootstrap samples per simulation.
θ2
X¯ m
θ˜mc
h¯ ∗c
0.0 0.5 1.0
0.998 (0.097) 1.025 (0.087) 1.173 (0.098)
0.987 (0.111) 0.967 (0.112) 1.081 (0.092)
0.986 (0.106) 0.982 (0.104) 1.105 (0.094)
Table 2 Means and mean square errors of the estimate of the maximum effect, the adjusted estimator (3) and the bootstrap estimator (8) from 10 000 simulations of log-normally distributed variables with ni = 10, σ = 1, expectations ξi = exp(θi + 0.5), where θ1 = 1, and B = 200 bootstrap samples per simulation.
θ2
ξ2
X¯ m
θ˜mc
h¯ ∗c
0.0 0.5 1.0
1.649 2.718 4.482
4.482 (3.253) 4.675 (3.359) 5.417 (4.548)
4.979 (13.08) 5.201 (15.16) 6.065 (18.54)
4.436 (3.349) 4.523 (3.452) 5.143 (4.154)
in ξi = E [Xij ] = exp(θi + 0.5) are shown in Table 2. The bootstrap manages to reduce the bias in this situation while the method devised for normally distributed variables is increasing the bias and consequently the mean square error. 3.2. Bias of the effect estimator of the selected treatment So far we looked at the problem of estimating the maximum effect of two treatments. More often one wants to select the treatment that achieves the maximum effect and to estimate its effect. To apply the bootstrap to this situation define u : Rn1 × Rn2 → {0, 1} such that treatment 1 is selected if u(X1 , X2 ) = 1. For any estimator θ˜s = h(X1 , X2 ) of θs the bias is given by BF1 ,F2 (θ˜s , θs ) = EF1 ,F2 [h(X1 , X2 ) − t (F1 )u(X1 , X2 ) − {1 − u(X1 , X2 )}t (F2 )]
= EF1 ,F2 [h(X1 , X2 )] + [t (F2 ) − t (F1 )]EF1 ,F2 [u(X1 , X2 )] − t (F2 ). The bootstrap estimator of the bias is obtained from this equation by substituting Fˆi for Fi : BFˆ ,Fˆ (θ˜s , θs ) = EFˆ ,Fˆ [h(X∗1 , X∗2 )] + [t (Fˆ2 ) − t (Fˆ1 )]EFˆ ,Fˆ [u(X∗1 , X∗2 )] − t (Fˆ2 ). 1 2 1 2 1 2
(9)
The expectations in (9) can be approximated as described in the previous section. Let h¯ ∗ be defined as in (7) and u¯ ∗ = B 1 ∗ ∗ b=1 u(X1b , X2b ) then B BFˆ ,Fˆ (θ˜s , θs ) ≃ h¯ ∗ + [t (Fˆ2 ) − t (Fˆ1 )]¯u∗ − t (Fˆ2 ). 1 2 Instead of estimating the unconditional bias B(θ˜s , θs ) it may be also instructive to consider the bias of θ˜s given θi has been selected. With u1 = u and u2 = 1 − u, the conditional bias of θ˜s given treatment i has been selected is BF1 ,F2 (θ˜s |θi ) =
=
EF1 ,F2 [{h(X1 , X2 ) − t (Fi )}ui (X1 , X2 )] EF1 ,F2 [ui (X1 , X2 )] EF1 ,F2 [h(X1 , X2 )ui (X1 , X2 )] EF1 ,F2 [ui (X1 , X2 )]
− t (Fi ).
To apply the bootstrap to estimating this conditional bias substitute the empirical distribution functions in the previous equations: EFˆ ,Fˆ [h(X∗1 , X∗2 )ui (X∗1 , X∗2 )] 1 2
− t (Fˆi ). EFˆ ,Fˆ [ui (X∗1 , X∗2 )] 1 2 Eventually the Monte Carlo bootstrap estimates are BFˆ ,Fˆ (θ˜s |θi ) = 1 2
BFˆ ,Fˆ (θ˜s |θi ) ≃ h¯ ∗i − t (Fˆi ) 1 2 with B
¯∗
hi =
h(X∗1b , X∗2b )ui (X∗1b , X∗2b )
b =1 B b =1
. ui (X1b , X2b ) ∗
∗
224
G.K. Rosenkranz / Computational Statistics and Data Analysis 69 (2014) 220–227 Table 3 Means and mean square errors of the estimates of the effect of the selected treatment, the adjusted estimator (4) and the bootstrap estimator from 10 000 simulations of normally distributed variables with ni = 10, θ1 = 1, σ = 1, and B = 200 bootstrap samples per simulation. The last column contains Eθ1 ,θ2 [θs ] from Eq. (10).
θ2
θˆm
θ˜sc
h¯ ∗c
Eθ1 ,θ2 [θs ]
0.0 0.5 1.0
1.001 (0.098) 1.033 (0.092) 1.178 (0.100)
0.964 (0.119) 0.942 (0.106) 1.055 (0.088)
0.969 (0.118) 0.950 (0.105) 1.064 (0.089)
0.9873 0.9341 1.0000
Table 4 Means of the estimates of the effect of the selected treatment, the bootstrap-corrected and the unbiased estimators (11) from 10 000 simulations of exponentially distributed variables with ni = 10, θ1 = 1, and B = 200 bootstrap samples per simulation.
θ2
θˆm
h¯ ∗c
θˇs
0.5 1.0
1.006 1.172
0.964 1.065
0.965 0.996
The adjusted estimate of θ˜s is then h¯ ∗ic = θ˜s + t (Fˆi ) − h¯ ∗i . Again we conduct a small simulation study to investigate the bias of the plug-in estimator θˆm in the context of treatment selection. As in the previous section let Xij be independent and identically distributed normal variables with mean θi and variance σ 2 = 1. Let u(x1 , x2 ) = I (¯x1 > x¯ 2 ). Hence
θ1 − θ2 ¯ ¯ E [u(X1 , X2 )] = Pr[X2 − X1 < 0] = Φ √ 1/n1 + 1/n2 and therefore
θ1 − θ2 E [θs ] = θ2 − (θ2 − θ1 )Φ √ . 1/n1 + 1/n2
(10)
Obviously, E [θs ] ≤ θm where equality only holds for θ1 = θ2 . A reason for this is that also the treatment with the smaller effect has a positive chance to be chosen. The results for θˆm as an estimator of θs are displayed in Table 3. Since there is no unbiased estimator of the mean of the selected treatment for normally distributed data, it may not be surprising that the bootstrap can only partially correct for bias. It is therefore worthwhile to examine the bias reduction potential of the bootstrap for distributions where unbiased estimators after selection exist. Let Xij be exponentially distributed with hazard λi or mean θi = 1/λi . Then X¯i is Gamma(ni λi , ni ) distributed. For n1 = n2 = n, the unbiased estimator of θs is given by (Vellaisamy and Sharma, 1988),
¯ n X ˇθs = X¯ (2) 1 − (1) X¯ (2)
(11)
where X¯ (1) ≤ X¯ (2) is the order statistics of X¯ 1 , X¯ 2 . The results of the simulations are shown in Table 4. The bootstrap correction works well for θ2 < θ1 and leaves still some bias for θ1 = θ2 . It can be more intuitive to consider the bias of the estimator conditional on the selected treatment. This approach particularly delivers information about the bias in case of a correct or an incorrect selection. In reality we would of course hardly ever know whether the decision was correct or not. Table 5 contains the results of the simulation. If treatment 1 is correctly selected for θ2 = 0, 0.5, the bias correction works quite well. For the incorrect selection of treatment 2 there is a bias reduction as well, however, the remaining bias will still be substantial. In case of equal treatment effects, either treatment would be a correct selection, and the overestimation of the effect would be the largest for any correct selection presented in the table. The bias correction cuts the bias by approximately 50%. 3.3. Bias of the effect estimator of the selected subgroup Would subgroup selection just consist of deciding in favor of a subgroup or its complement in a larger population, subgroup selection would not differ from treatment selection. However, there is always the alternative of considering the full population instead of one or the other subgroup, particularly when the treatment effect does not differ among subgroups. A simple selection rule like picking the (sub)group with maximum effect will always bring forward a subgroup but never the full population. Therefore it is more appropriate to select a subgroup only if there is a relevant difference between the two subgroups.
G.K. Rosenkranz / Computational Statistics and Data Analysis 69 (2014) 220–227
225
Table 5 Means and mean square errors of the estimates of the selected treatment conditional on selection from 10 000 simulations of normally distributed variables with ni = 10, θ1 = 1, σ = 1, and B = 200 bootstrap samples per simulation.
θ2
Selection
%/100
θ˜s
h¯ ∗ic
0.0
1 2
0.988 0.012
1.008 (0.096) 0.600 (0.397)
0.987 (0.107) 0.472 (0.265)
0.5
1 2
0.866 0.134
1.060 (0.087) 0.870 (0.196)
1.011 (0.101) 0.763 (0.140)
1.0
1 2
0.498 0.502
1.179 (0.100) 1.179 (0.102)
1.094 (0.093) 1.092 (0.095)
Let Fi be the distribution of data from subgroup i = 1, 2. With n = n1 + n2 the distribution of data from the full population is given by F3 = (n1 /n)F1 + (n2 /n)F2 with corresponding treatment effect θ3 = t (F3 ). We define functions ui : Rn1 × Rn2 → {0, 1} with u1 + u2 ≤ 1 such that subgroup i < 3 is selected if ui (X1 , X2 ) = 1, and the full population is selected if u3 = 1 − u1 − u2 = 1. Let θ˜s = h(X1 , X2 ) be an estimator of the effect θs in the selected subgroup. The unconditional bias is then given by BF1 ,F2 (θ˜s , θs ) = EF1 ,F2 [h(X1 , X2 )] − t (F3 ) − [t (F1 ) − t (F3 )]EF1 ,F2 [u1 (X1 , X2 )] − [t (F2 ) − t (F3 )]EF1 ,F2 [u2 (X1 , X2 )] from which the bootstrap estimator of the bias can be derived as in previous sections. The conditional bias of θ˜s given subgroup i or the full population has been selected is BF1 ,F2 (θ˜s |θi ) =
EF1 ,F2 [h(X1 , X2 )ui (X1 , X2 )] EF1 ,F2 [ui (X1 , X2 )]
− t (Fi ),
i = 1, 2, 3
with corresponding bootstrap approximation BFˆ ,Fˆ (θ˜s |θi ) ≃ h¯ ∗i − t (Fˆi ). 1 2 As an example we consider normally distributed random variables Xij , 1 ≤ j ≤ ni , with means θi and common variance σ 2 = 1. Then θ3 = (n1 θ1 + n2 θ2 )/n and θˆi = X¯ i for i < 3 and θˆ3 = X¯ 3 = (n1 X¯ 1 + n2 X¯ 2 )/n. We consider the selection functions u1 (X1 , X2 ) = I (X¯ 1 − X¯ 2 > d),
u2 (X1 , X2 ) = I (X¯ 2 − X¯ 1 > d)
for some d ≥ 0. (Note that not both u’s can be one at the same time.) Then the probability to select treatment 1 or 2 is p1 = E [u1 (X1 , X2 )] = 1 − Φ p2 = E [u2 (X1 , X2 )] = 1 − Φ
d − (θ1 − θ2 )
√
1/n1 + 1/n2
d − (θ2 − θ1 )
√
1/n1 + 1/n2
, ,
respectively, and hence E [θs ] = θ3 + (θ1 − θ3 )p1 + (θ2 − θ3 )p2 . Obviously p3 = 1 − p1 − p2 is the probability to select the full population. From the definitions of pi it follows for min(n1 , n2 ) → ∞ p1 → 1,
p2 → 0,
p1 = 1/2,
p3 → 0,
p2 → 0,
p1 → 0,
p2 → 0,
p1 → 0,
p2 = 1/2,
p1 → 0,
p2 → 1,
if θ1 − θ2 > d,
p3 → 1/2, p3 → 1,
p3 → 1/2, p3 → 0,
if θ1 − θ2 = d,
if |θ1 − θ2 | < d, if θ2 − θ1 = d,
if θ2 − θ1 > d.
The threshold d ≥ 0 has to be selected on relevance grounds with the results above in mind. Note that for θ1 = θ2 the probability to select the full population is p3 = 2Φ
√
d 1/n1 + 1/n2
− 1.
The results of the simulations are contained in Table 6. The smallest conditional bias occurs when the full population is selected rightly or wrongly. The bias correction works reasonably when the first subgroup is selected correctly. A substantial bias remains when a subgroup is selected despite θ1 = θ2 and of course when the second subgroup is selected incorrectly
226
G.K. Rosenkranz / Computational Statistics and Data Analysis 69 (2014) 220–227 Table 6 Means and mean square errors of the effect estimates of the selected subgroup or the full population conditional on selection from 10 000 simulations of normally distributed variables with ni = 10, θ1 = 1, σ = 1, d = 0.1 (upper half) and d = 0.5 (lower half) and B = 200 bootstrap samples per simulation.
θ2
Selection
%/100
θ˜s
h¯ ∗ic
0.0
1 2 3
0.979 0.006 0.015
1.013 (0.093) 0.652 (0.475) 0.508 (0.054)
0.990 (0.106) 0.528 (0.335) 0.508 (0.054)
0.5
1 2 3
0.817 0.090 0.093
1.073 (0.086) 0.906 (0.224) 0.757 (0.046)
1.015 (0.099) 0.786 (0.153) 0.757 (0.048)
1.0
1 2 3
0.412 0.412 0.176
1.211 (0.111) 1.207 (0.107) 0.998 (0.050)
1.120 (0.096) 1.113 (0.093) 0.998 (0.052)
0.0
1 2 3
0.866 0.001 0.133
1.059 (0.086) 0.785 (0.637) 0.501 (0.049)
1.009 (0.101) 0.651 (0.461) 0.502 (0.051)
0.5
1 2 3
0.499 0.012 0.489
1.177 (0.099) 1.087 (0.394) 0.750 (0.049)
1.093 (0.093) 0.965 (0.273) 0.750 (0.050)
1.0
1 2 3
0.129 0.142 0.729
1.363 (0.190) 1.368 (0.197) 1.002 (0.051)
1.254 (0.134) 1.261 (0.141) 1.002 (0.052)
in which case the MSE increases substantially. The threshold d = 0.1 is ideal to select the correct subgroup if there is a difference in the effect, however, in case of equal effects, the selection of the full population would be missed with a probability of more than 0.8. As expected, a threshold of d = 0.5 leads to a much higher probability to select the full population when the treatments are identical but selects either the first group or both with probability of about 50%. When the standard deviation is not known like for the simulations performed above, it may be advantageous to use d times an estimator of the standard error of (X¯ 1 − X¯ 2 ) as a selection criterion (see Hsieh, 1981). 4. Discussion Estimation after selection is prone to bias (Bauer et al., 2010). For data that are normally distributed it can be shown that no unbiased estimator of the selected treatment exists, while unbiased estimators exist for the gamma distribution family with known and equal shape parameter and equal sample sizes. Estimation methods have been developed for the normal case that reduce the bias after selection. However these methods mainly apply for such data and are not easily generalizable to other distributions. As shown in the previous sections, bias reduction for general distributions or under uncertainty about the underlying distribution can be obtained through the use of the bootstrap. This approach corrects well for populations that differ. However it is not able to completely eliminate bias, not even in cases where this should be theoretically possible, when the populations to be selected are equal or close. One may object that the number of bootstrap simulations should be higher than 200 for the estimation of bias as pointed out by Efron and Tibshirani (1993) and even more so by Hesterberg (2011). We conducted simulations with B = 10 000 and the results did not change. One may therefore conjecture that the bootstrap cannot fully correct in certain situations, at least not for the fairly small sample sizes chosen for the simulations. In defense of this choice it should be noted that often decisions on treatment or subgroup selection for further development are to be taken based on sample sizes well below those typical for phase III studies. Nevertheless, the bootstrap correction competes well with the methods specifically designed for normal variables and works reasonably well in the exponential case. The advantage of being applicable under a broad spectrum of distributions may compensate for not correcting as well as a method designed for a specific situation. The approach described in this note can be generalized to the situation that two treatments are to be compared to a control. In this situation, one would re-sample from each of the three groups and compare the differences between either treatment and placebo. It is also possible to apply the bootstrap to estimates obtained from regression models like analysis of covariance and to survival data with censoring where typically pairs of data are observed representing the survival time and a censoring indicator. Eventually it should be mentioned that reduction of bias in estimation is often paid for by an increase of the standard error of the estimate, i.e., by a decrease of efficiency. This can also be the case for estimators of the selected treatment or the selected subgroup. In our simulations the MSE did sometimes increase or decrease depending on the situation. For the purpose of deciding about the next steps in drug development, the main reason for bias correction after selection is to obtain estimates that do not create the illusion of large effects which are mainly triggered by variability and not be a truly large difference. Such effects would not be reproducible in further experiments.
G.K. Rosenkranz / Computational Statistics and Data Analysis 69 (2014) 220–227
227
References Bauer, P., König, F., Brannath, W., Posch, M., 2010. Selection and Bias—two hostile brothers. Statistics in Medicine 29, 1–13. Blumenthal, S., Cohen, A., 1968. Estimation of the larger of two normal means. Journal of the American Statistical Association 63, 861–876. Cohen, A., Sackrowitz, H., 1989. Two-stage conditionally unbiased estimators of the selected normal means. Statistics and Probability Letters 8, 273–278. Dahiya, R.C., 1974. Estimation of the mean of the selected population. Journal of the American Statistical Association 69, 226–230. Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, London. Glivenko, V.I., 1933. Sulla determinazione empirica delle leggi di probabilità. Giornale dell’Istituto Italiano degli Attuari 4, 92–99. Hesterberg, T., 2011. Bootstrap. Wiley Interdisciplinary Reviews: Computational Statistics 3, 497–526. Hsieh, H., 1981. On estimating the mean of the selected population with unknown variance. Communications in Statistics—Theory and Methods 10, 1869–1878. Putter, J., Rubinstein, D., 1968. On estimating the mean of a selected population. Technical Report, No. 165. Department of Statistics, University of Wisconsin. Stallard, N., Todd, S., Whitehead, J., 2008. Estimation following selection of the largest of two normal means. Journal of Statistical Planning and Inference 138, 1629–1638. Vellaisamy, P., 2009. A note on unbiased estimation following selection. Statistical Methodology 6, 389–396. Vellaisamy, P., Sharma, D., 1988. Estimation of the mean of the selected gamma population. Communications in Statistics—Theory and Methods 17, 2797–2817. Venter, J.H., 1988. Estimation of the mean of the selected population. Communications in Statistics—Theory and Methods 17, 791–805.