Journal of Statistical Planning and Inference 149 (2014) 152–161
Contents lists available at ScienceDirect
Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi
Covariate-adjusted response-adaptive designs for generalized linear models Siu Hung Cheung a,n, Li-Xin Zhang b, Feifang Hu c, Wai Sum Chan d a
Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong, China Department of Mathematics, Zhejiang University, Hangzhou, China Department of Statistics, George Washington University, Washington, DC, USA d Department of Finance, The Chinese University of Hong Kong, Shatin, Hong Kong, China b c
a r t i c l e in f o
abstract
Article history: Received 5 August 2013 Received in revised form 22 December 2013 Accepted 14 February 2014 Available online 24 February 2014
Response-adaptive designs have been shown to be useful in reducing the expected number of patients receiving inferior treatments in clinical trials. Zhang et al. (2007) developed a framework for covariate-adjusted response-adaptive designs that can be applied to the class of generalized linear models, providing treatment allocation strategies and estimation methods. However, their results are based on a full model in which all treatment-by-covariate interactions are present. Without relevant distribution theorems on the estimation of parameters in a reduced model, the testing of hypotheses regarding main effects, covariate effects, or their intersections is impossible with their framework. In this paper, we address this deficiency and develop the necessary theoretical properties to conduct hypothesis testing. The theorems that we develop are applicable to generalized linear models. To assist with the comprehension of our proposed framework, we apply it to the logistic regression model for illustrative purposes. We also discuss a procedure for producing asymptotic expected failure rates and treatment proportions, an area neglected in previous covariate-adjusted response-adaptive design research. A simulation study is also presented to reveal the operational characteristics of the framework, including the treatment allocations, failure rates, and test power for various covariate-adjusted response-adaptive designs. & 2014 Elsevier B.V. All rights reserved.
Keywords: Asymptotic normality Clinical trials Logistic regression Personalized medicine Power Total failures
1. Introduction Patients accrual is sequential in clinical studies, with a randomized treatment allocation scheme often adopted to reduce treatment assignment bias (Rosenberger and Lachin, 2002). Response-adaptive designs offer randomized treatment allocation schemes that assign different treatments to incoming patients based on the observed responses of previous patients. The main benefit of such designs is that they reduce the number of patients receiving inferior treatments while maintaining a high level of test power for statistical inferences about treatment efficacy (Rosenberger, 1996). The presence of covariates in the evaluation of treatment efficacy is common in clinical trials (Hu and Rosenberger, 2006; Bonetti and Gelber, 2004; Taves, 2010). Patient information such as sex, age, biomarkers and medical history can serve as informative covariates that should be incorporated into treatment assignment schemes and the subsequent data analysis
n
Corresponding author. E-mail addresses:
[email protected] (S.H. Cheung),
[email protected] (L.-X. Zhang),
[email protected] (F. Hu),
[email protected] (W.S. Chan).
http://dx.doi.org/10.1016/j.jspi.2014.02.006 0378-3758 & 2014 Elsevier B.V. All rights reserved.
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
153
process. Covariates are playing an increasingly important role in clinical trials as scientists have identified many new biomarkers that may be linked with certain diseases in the past few decades. Identifying biomarkers that seem to be linked with a given disease is just the beginning of the arduous process of developing personalized medicine. We need both new designs and new statistical inference tools that allow biomarkers to be used in clinical trials to assist in patient selection and the significance testing of biomarkers (Hu, 2012). Covariate-adjusted response-adaptive (CARA) designs (Hu and Rosenberger, 2006) are expected to become useful techniques for application in the field of personalized medicine. Consider a clinical trial with K treatments. Let X m ¼ ðX m;1 ; …; X m;K Þ represent the assignment of a treatment to the m-th subject. If the m-th subject receives treatment k, then all of the elements in X m are 0 except for the k-th component X m;k , which is 1. Furthermore, Nm;k represents the number of subjects that receive treatment k in the first m treatment assignments. Let N m ¼ ðN m;1 ; …; Nm;K Þ, which denotes the assignment history thus far. Note that N m ¼ ∑m i ¼ 1Xi. Let the response of the m-th subject to treatment k be Y m;k , which is observed only when X m;k ¼ 1. Denote Y m ¼ ðY m;1 ; …; Y m;K Þ. If covariate information is available, then let ξm be the covariate of the m-th subject. The definition of the general CARA design is then ψ m;k ¼ EðX m;k jX j ; Y j ; ξj ; j ¼ 1; …; m 1; ξm Þ; k ¼ 1; …; K, the conditional probabilities of assigning treatments 1; …; K to the m-th patient, conditioning on the entire trial history, including the information on all previous m 1 assignments, responses, and covariate vectors, in addition to the current patient's covariate vector. Early development of adaptive designs with covariates focused mainly on procedures to alleviate the problem of the severe imbalance in the covariate profiles of patients in different treatment groups. Rosenberger and Sverdlov (2008) called these methods marginal procedures, as their intention is to achieve balance at each level of a given covariate (see, for example, Atkinson, 1982, 1999, 2002). The inclusion of covariates in response-adaptive designs is relatively recent (Rosenberger and Sverdlov, 2008). Rosenberger et al. (2001b) considered a CARA design for binary responses that uses a logistic regression model, and their simulation study indicated that the incorporation of covariates significantly reduces the percentage of treatment failures. However, they did not provide theoretical justifications or asymptotic properties. Zhang et al. (2007) developed a very general framework for investigating the asymptotic properties of CARA designs. Their study covers a wide spectrum of general statistical models, including generalized linear models as special cases. Based on these asymptotic results, it is possible to compute the asymptotic means and variances of both the overall allocation proportion and the allocation proportion with a given covariate. The CARA design applied to generalized linear models by Zhang et al. (2007) is as follows. With respect to treatment k, the expected value of the response Y m;k , denoted by μk , is a function of the linear combination of the covariate effects 1
d1
hk ðμk Þ ¼ αk þ ∑ βkj ξj ;
ð1Þ
j¼1
where hk is a link function, αk stands for the treatment effect, and βkj is the effect due to the j-th covariate in treatment k ¼ 1; 2; …; K. Here, we assume that the total number of unknown parameters is d for each treatment, and thus the number of covariates is d 1. For the j-th covariate, unequal values of βkj , k ¼ 1; …; K, imply the presence of an interaction effect between the j-th covariate and the treatments. Model (1) assumes the presence of treatment-by-covariate interactions for all covariates. As the responses are independent, the parameters can be estimated separately for each treatment, and the related asymptotic results can be obtained as stated in Zhang et al. (2007). However, these results cannot be applied to more restricted models in which some of the treatment-by-covariate interactions are excluded. There are three crucial reasons to consider such restricted models. First, it is often very important to test equal treatment effects or covariate effects in clinical trials. In these cases, the model under the null hypothesis is the restricted model, and related asymptotic theories are required to construct the testing procedure. For example, it is particularly important to detect the interaction between treatments and biomarkers (Royston and Sauerbrei, 2008; Hu, 2012) in personalized medicine. Hence, this paper extends the applicability of the work of Zhang et al. (2007) beyond estimation and the discussion of treatment allocation schemes to the hypothesis testing of treatment efficacy. Second, some treatment-by-covariate interactions may be excluded because of non-significant statistics in testing or the existence of prior information related to the interaction between the treatment and a particular covariate. In such cases, by the principle of parsimony, a simpler model is usually preferred, and the parameters associated with those covariates will be equal across all treatments. Using covariate j as an example, we have β1j ¼ β2j ¼ ⋯ ¼ βKj ð ¼ βj ; sayÞ. With this model, the estimator of βj must be derived from the responses of all treatments, and it is thus more complicated than those given in Zhang et al. (2007). Finally, if some covariates are prognostic, their effects are shared across treatment groups. Hence, estimation of the model coefficients corresponding to those covariates should be based on pooled data across treatment groups to borrow cross-treatment strength. Such pooling should lead to the more efficient estimation of covariate effects than estimating these effects separately for each treatment group. As a result, this approach potentially enables the attainment of robust maximum likelihood estimates (MLE) at earlier stages of the trial, thereby improving overall design performance. Models in which some of the treatment-by-covariate interactions are excluded are frequently encountered in practice. For example, in modeling epileptic data in clinical trials, Fotouhi (2008) discussed a generalized linear model that
154
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
contained the treatment (progabide and placebo) and three other covariates (age, two-week seizure count, and baseline [seizure count measured over the eight-week period before the start of treatment]). Baseline was the only covariate considered to have an interaction with the treatment. Another example is the large-scale Stroke Prevention in Atrial Fibrillation (SPAF) study, which covered an extensive period with several phases and thousands of participants, and offers an illustration of modeling responses with the possible absence of some of the treatment-by-covariate interactions (see, for example, Hart et al., 2003; SPAF Investigators, 1990). This extended clinical study lasted for more than 10 years, and its early results were used to reshape the modeling processes by discarding some of the insignificant treatment-by-covariate interactions. The objectives of the research reported herein can be summarized as follows. (a) To derive asymptotic theorems of CARA designs for generalized linear models that allow the absence of some of the treatment-by-covariate interactions and address the testing of differences in treatment effects. (b) To develop theorems that provide the means to evaluate asymptotic allocation proportions and failure rates using various CARA designs. This information is valuable because an estimate of the proportion of patients receiving a certain treatment and the overall probability of success can be obtained before the onset of a clinical trial if previous clinical research provides a rough estimated value of treatment efficacy. (c) To use the logistic regression model, a popular and important member of the class of generalized linear models, to provide a clear exposition of the mechanisms involved in applying the derived theorems. (c) To conduct a simulation study to comprehend the operating characteristics of the proposed CARA designs for finite samples, including the measures of the degree of balance (in terms of allocation proportions), efficiency (in terms of test power), and ethics (in terms of treatment failure rate). The remainder of the paper is organized as follows. The description of the allocation scheme for CARA designs is given in Section 2, and Section 3 provides the related asymptotic results. The theorem development in Section 3 is very general, and is valid for generalized linear models. In Section 4, we discuss the logistic regression model, and apply the theorems developed in Section 3 to obtain specific asymptotic results for various treatment allocation schemes. Section 5 presents simulation results of the finite sample performance of the various allocation schemes, and Section 6 provides concluding remarks. 2. The allocation scheme of CARA design Given the covariate ξ, let the response Yk corresponding to treatment k have a distribution in the exponential family. Collect the parameters on the right-hand side of (1) as ðαk ; βk1 ; βÞ, k ¼ 1; 2; …; K. Here, the first entry of ðαk ; βk1 ; βÞ represents the treatment effect, and the remaining elements represent the effects due to the covariates. Correspondingly, the first element of the covariate vector ξ ¼ ð1; ξ1 ; …; ξd 1 Þ is 1. The covariates are decomposed into two groups. The first group of d1 parameters αk and βk1 allows for the presence of treatment–covariate interactions, and the second group β comprises covariates that are assumed to have no interactions with the treatments. The collection of all of the free parameters to be estimated is γ ¼ ðα1 ; β11 ; …; αK ; βK1 ; βÞ. The model considered in Zhang et al. (2007) is a special case of d1 ¼ d. After observation of the response of the m-th patient, let bm;K ; β bm Þ be the MLE of γ. ðb b γ m ¼ ðb α m;1 ; b β m;1 ; …; α bm;K ; β α m;k ; b β m;k1 ; b β m Þ is then the estimate of ðαk ; βk1 ; βÞ. We now describe the allocation scheme of a CARA design. Suppose that a patient is ready for randomization, and the corresponding covariate ξ ¼ x is measured. If the distribution parameter ðαk ; βk1 ; βÞ of each treatment k were known, that is, if the effect of each treatment were known for a given covariate x, then one would simply define the treatment allocation probability of this patient as a function of the values of parameters β, αk ; βk1 , k ¼ 1; …; K, and covariate x, that is, P k ¼ π k ðγ; xÞ;
k ¼ 1; …; K:
As each patient must receive one of the treatments, the allocation function πð; Þ ¼ ðπ 1 ð; Þ; …; π K ð; ÞÞ must satisfy π 1 þ ⋯ þ π K 1. This function could then be derived from certain optimization criteria. A general discussion of the choices of πðÞ is given in Zhang et al. (2007) and Hu and Rosenberger (2006). Several choices of the allocation function for logistic regression models are given in Section 4. In practice, the parameters αk ; βk1 ; β are unknown and must be estimated, and thus the allocation scheme of the CARA design can be described using the following three-step procedure. (a) (Initial Step) To obtain reliable estimates in the early stage, assign m0 subjects to each treatment using restricted randomization. (b) (Estimation Step) At stage m þ 1, assume that m (m ZKm0 ) subjects have been assigned to treatments. Their responses fY j ; j ¼ 1; …; mg and the corresponding covariates fξj ; j ¼ 1; …; mg are observed. Here, we use the vector Y j ¼ ðY j;1 ; …; Y j;K Þ to denote the outcome for patient j, where Y j;k is the outcome for patient j on treatment k. In practice, only one element Y j;k with X j;k ¼ 1 of vector Y j is observed. Using all observations up to patient m, fY j;k with X j;k ¼ 1; ξj : k ¼ 1; …; K; j ¼ 1; …; mg, bK1 ; βÞ. b ðb bk1 ; βÞ b is then the MLE of the unknown parameter γ ¼ ðα1 ; β11 ; …; αK ; βK1 ; βÞ can be derived as b γ m ¼ ðb α1; b β 11 ; …; α bK ; β αk; β the estimate of ðαk ; βk1 ; βÞ. When m is small, it is possible that the MLE has no solution. To avoid such an occurrence, we use
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
155
the Bayesian method and replace the MLE by the Bayesian maximum posterior estimate (MAPE), although we still denote it by b γ m . Note that all of the estimators are computed on the basis of the data of the m patients seen thus far. (c) (Allocation Step) The ðm þ1Þ-th subject with covariate ξm þ 1 is assigned to treatment k with a probability of P k ¼ π k ðb γ m ; ξm þ 1 Þ, k ¼ 1; …; K. Thus, γ m ; ξm þ 1 Þ; P k ¼ PðX m þ 1;k ¼ 1jF m ; ξm þ 1 Þ ¼ π k ðb
k ¼ 1; …; K;
ð2Þ
where F m ¼ sðX 1 ; …; X m ; Y 1 ; …; Y m ; ξ1 ; …; ξm Þ is the sigma field of the history and π k ð; Þ, k ¼ 1; …; K, are the treatment allocation functions. Once the response Y m þ 1 of patient m þ 1 is available, we repeat the estimation step and wait for patient m þ 2. The estimation and allocation steps are then repeated continuously until the end of the clinical trial. Note that the estimator b γ m is a function of the responses Y j , covariates ξj , and assignments X j , j ¼ 1; …; m for all previous m patients. However, the allocation probabilities (2) also depend on the covariate ξm þ 1 of the incoming patient m þ 1. 3. Asymptotic properties This section discusses the generalized linear model. For a given covariate ξ, the response Yk of a trial of treatment k is assumed to have a distribution in the exponential family, which means that the conditional distribution takes the form f k ðyk jξ; αk ; βk1 ; βÞ ¼ expfðyk μk ak ðμk ÞÞ=ϕk þbk ðyk ; ϕk Þg
ð3Þ T
T
with a link function μk ¼ hk ðξðαk ; βk1 ; βÞ Þ, where αk ; βk1 ; β, k ¼ 1; …; K, are coefficients. Hereafter, we define ab as the dot product a b of the row vectors a and b. We assume that the scale parameter ϕk is fixed. Further, we assume the functions 0 ak ðÞ and hk ðÞ to be twice differentiable with a″k ðuÞ 4 0 and hk ðuÞ a 0 for all u. For the observations up to stage m, the likelihood function for parameter γ ¼ ðα1 ; β11 ; …; αK ; βK1 ; βÞ is m
K
~ ¼ ∏ ∏ ½f ðY j;k jξ ; αk ; β ; βÞX j;k : LðγÞ j k1 k j¼1k¼1
~ bK1 ; βÞ b of γ is that which maximizes LðγÞ bK ; β α1; b β 11 ; …; α over γ. In the beginning of the adaptive design, the The MLE b γ m ¼ ðb sample size m is small. The observed data are insufficient to derive the MLE, and thus we usually need to use the Bayesian method. Let WðγÞ be a prior distribution of parameter γ. Instead of the MLE, we consider the MAPE by maximizing the weighted likelihood function m
K
~ ¼ WðγÞ ∏ ∏ ½f ðY j;k jξ ; αk ; β ; βÞX j;k : WðγÞLðγÞ j k1 k j¼1k¼1
In practice, the prior distribution (or weight function) WðγÞ is chosen to be the density function of a multivariate normal distribution. Because the MLE and MAPE are equivalent for a large sample size, we still denote the MAPE by b γ m. The following theorem provides the asymptotic properties of both the MAPE (or MLE) and the sample allocation proportions. Detailed proofs are given in the appendix. Theorem 3.1. Suppose that WðγÞ has a continuous derivative. In addition, assume that the MAPE b γ m exists and is unique. Under Conditions A and (A.13) in Zhang et al. (2007), for k ¼1,…,K, we have PðX n;k ¼ 1Þ-vk ; and Nn v ¼ O n
PðX n;k ¼ 1jF n 1 ; ξn ¼ xÞ-π k ðγ; xÞ
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi! log log n a:s:; n
a:s:;
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi! log log n b γn γ ¼ O ; n
where vk ¼ E½π k ðγ; ξÞ, v ¼ ðv1 ; …; vK Þ. Further, pffiffiffi pffiffiffi D D nðN n =n vÞ-Nð0; ΣÞ and nðb γ n γÞ-Nð0; I γ 1 Þ: Remark 3.1. The asymptotic variance–covariance matrices Σ and I γ 1 in (6) are defined as follows. 1. First, write 0
1d1
B0 B d1 D¼B B ⋮ @ 0d1
0d1
⋯
0d1
1d1
⋯
0d1
⋮
⋮
⋮
0d1
⋯
1d1
1d d 1
1
1d d 1 C C C: ⋮ C A 1d d 1
Then, D is a dK ðd1 K þd d1 Þ matrix with a full column rank.
ð4Þ
ð5Þ
ð6Þ
156
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
2. Write g k ðγn Þ ¼ E½π k ðγ n ; ξÞ, gðγn Þ ¼ ðg 1 ðγn Þ; …; g K ðγ n ÞÞ as the functions of γ n , and let ∂gðγÞ=∂γ ¼ ð∂g k ðγÞ=∂θj : j ¼ 1; …; d1 K þ d d1 ; k ¼ 1; …; KÞ. 3. Denote the conditional Fisher information matrix for a given ξ by # " ∂2 log f k ðY k jξ; αk ; βk1 ; βÞ 1 ″ 0 I k ðαk ; βk1 ; β ξÞ ¼ Eαk ;βk1 ;β a μ ½h ðξðαk ; βk1 ; βÞT Þ2 ξT ξ ξ ¼ ϕk k k k ∂ðαk ; βk1 ; βÞ2 and let I k ¼ E½π k ðγ; ξÞI k ðαk ; βk1 ; βjξÞ, k ¼ 1; …; K. 4. Define I γ and Σ by I γ ¼ DT diagðI 1 ; …; I K ÞD and Σ ¼ Σ1 þ 2Σ2 , where Σ1 ¼ diagðvÞ vT v, Σ2 ¼ ð∂g=∂γÞT I γ 1 ∂g=∂γ. Remark 3.2. Suppose that WðγÞ is a continuous function of γ with WðγÞ-0 as J γ J -0; then, the MAPE b γ m exists. Further, when log WðγÞ is a strictly concave function of γ and yk hk ðxÞ ak ðhk ðxÞÞ is a concave function of x (for example, when hðxÞ ¼ x) ~ for each yk and k, then log½WðγÞLðγÞ is a strictly concave function of γ, and thus the MAPE b γ m is unique. As discussed in Zhang et al. (2007), it is important to note that the asymptotic variability for both the allocation proportions and the estimators is different from that in designs that do not utilize the current covariate of the incoming subject. We now present the asymptotic result of the allocation proportions for a given covariate (for a discrete ξÞ. Theorem 3.2. Let Nn;kjx ¼ ∑nm ¼ 1 X m;k Ifξm ¼ xg be the number of subjects with covariate x that are randomized to treatment k, k ¼ 1; …; K in the n trials, and N n ðxÞ ¼ ∑nm ¼ 1 Ifξm ¼ xg be the total number of subjects with covariate x. Write N njx ¼ ðNn;1jx ; …; Nn;Kjx Þ. Given a covariate x, suppose that Pðξ ¼ xÞ 40. Under the conditions in Theorem 3.1, we have Nn;kjx =Nn ðxÞ-π k ðγ; xÞ
a:s: k ¼ 1; …; K
ð7Þ
and pffiffiffiffiffiffiffiffiffiffiffiffi D Nn ðxÞðN njx =Nn ðxÞ πðγ; xÞÞ-Nð0; Σjx Þ;
ð8Þ
where ∂πðγ; xÞ T 1 ∂πðγ; xÞ Σjx ¼ diagðπðγ; xÞÞ πðγ; xÞT πðγ; xÞ þ 2 Pðξ ¼ xÞ; Iγ ∂γ ∂γ where I γ is defined as in Remark 3.1. The results in Theorems 3.1 and 3.2 are very general. We now consider two cases with dichotomous (for example, success or failure) responses. The logistic regression model and probit regression model are commonly used to model dichotomous responses. Let pk ¼ pk ðαk ; βk1 ; βjξÞ be the conditional probability of success for treatment k for a given covariate ξ, and qk ¼ 1 pk be the failure rate. Here, αk is the treatment effect. Example 3.1 (Logistic regression). For the logistic regression, pk satisfies logitðpk Þ ¼ ξðαk ; βk1 ; βÞT ; i.e., pk ¼ 1=ð1 þ e gk Þ with g k ¼ ξðαk ; βk1 ; βÞT . The weighted likelihood function is m
K
1 Y j;k X j;k
~ ¼ WðγÞ ∏ ∏ ½p j;k q WðγÞLðγÞ k;j k;j j¼1k¼1
Y
;
ð9Þ
where pk;j ¼ pk ðαk ; βk1 ; βjξj Þ, qk;j ¼ 1 pk;j . This model satisfies (3) with ak ðuÞ ¼ logðeu þ 1Þ, hk ðxÞ ¼ x and ϕk 1. The conditional Fisher information matrix for a given ξ is I k ðαk ; βk1 ; βjξÞ ¼ pk qk ξT ξ:
Example 3.2 (Probit regression). For the probit regression, pk satisfies inv Φðpk Þ ¼ ξðαk ; βk1 ; βÞT ; i.e., pk ¼ Φðg k Þ with g k ¼ ξðαk ; βk1 ; βÞT . Here, ΦðÞ is the standard normal distribution. The weighted likelihood function is as in (9). This model satisfies (3) with ak ðuÞ ¼ logðeu þ1Þ, hk ðxÞ ¼ log ΦðxÞ logð1 ΦðxÞÞ and ϕk 1. The conditional Fisher
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
157
information matrix for a given ξ is ½ϕðξðαk ; βk1 ; βÞT Þ2 T ξ ξ I k ðαk ; βk1 ; βξÞ ¼ pk qk
1 2 with ϕðxÞ ¼ pffiffiffiffiffiffi e x =2 : 2π
For the dichotomous response case, the allocation function πðÞ ¼ ðπ 1 ðÞ; …; π K ðÞÞ is usually chosen as a function of p1,…,pK. For further details, refer to Hu and Rosenberger (2006) and Rosenberger and Sverdlov (2008). According to Theorem 3.1, Nn;k -E½π k ðγ; ξÞ n
a:s:; k ¼ 1; …; K:
Given the value of γ, allocation function π k , and the distribution of covariate ξ, the limit allocation proportions can be computed according to the formula on the right-hand side of the foregoing equation. Further, the asymptotic variabilities of the sample allocation proportions and the estimator b γ of the parameter can also be computed following the steps in Remark 3.1. However, because the asymptotic variabilities depend on the allocation functions π k , conditional Fisher information matrices I k ðαk ; βk1 ; βjξÞ, and the partial derivatives of E½π k ðγ; ξÞ, k ¼ 1; …; K, they are not simple closed-form expressions. For the dichotomy response case, the failure rate is a good measure of ethics. The intention of most adaptive designs is to minimize the number of patients receiving the inferior treatment. The theorem that enables us to evaluate the asymptotic failure rate is outlined as follows. Theorem 3.3. Assume that there are n patients in the entire clinical trial, and let Sn be the total number of failures. The failure rate is then K Sn 1 n ∑ ∑ 1 Y m;k X m;k ; ¼ nm¼1k¼1 n
and with a probability of 1 K Sn - ∑ E qk αk ; βk1 ; β ξÞπ k ðγ; ξÞ n k¼1
as n-1:
ð10Þ
This limit is the average failure rate. γ m 1 ; ξm Þ, we have Proof. Write θk ¼ ðαk ; βk1 ; βÞ. From the fact that E½1 Y m;k jF m 1 ; ξm ¼ qk ðθk jξm Þ and E½X m;k jF m 1 ; ξm ¼ π k ðb K Sn 1 n ∑ ∑ 1 Y m;k X m;k E 1 Y m;k X m;k jF m 1 ¼ nm¼1k¼1 n K 1 n ∑ ∑ Eξ qk θk ξÞπ k b þ γ m 1; ξ ; nm¼1k¼1
where Eξ ðÞ denotes the expectation with respect to ξ. The first term converges to 0 almost surely because the summation over m is a summation of bounded martingale differences. For the second term, the application of (4) gives Eξ ½qk ðθk jξÞπ k ðb γ m 1 ; ξÞ-E½qk ðθk jξÞπ k ðγ; ξÞ
a:s:;
and thus the second term converges almost surely to ∑Kk ¼ 1 E½qk ðθk jξÞπ k ðγ; ξÞ. Given the value of γ, allocation function π k , and the distribution of covariate ξ, the average failure rate can be computed according to the formula on the right-hand side of (10). According to (10), if the allocation function π k ðÞ is chosen as a decreasing function of success probability pk, the CARA design can reduce the failure rate. □ 4. An illustrative example We now present an illustrative example using logistic regression to explain the mechanisms for applying the theorems in Section 3. Let the responses be dichotomous (for example, success or failure) and, for explanatory purposes, consider the specific case (for which the general case can be easily obtained) in which there are two treatments and the logistic regression model considered in Rosenberger and Sverdlov (2008) is applied. We thus have logitðpk Þ ¼ αk þ βk1 ξ1 þ βk2 ξ2 þβk3 ξ3 ;
ð11Þ
where pk ¼ pk ðαk ; βk jξÞ is the conditional probability of the success of treatment k, and αk is the treatment effect, k ¼1,2. The components ξ1 ; ξ2 ; ξ3 of the covariate vector ξ ¼ ð1; ξ1 ; ξ2 ; ξ3 Þ, which represent sex, age (year), and cholesterol level (mg/dl), are assumed to be independently distributed as Bernouli ð1=2Þ, discrete uniform ½30; 75, and normal with a mean of 200 and a standard deviation of 20, respectively. Departing slightly from the model given in Rosenberger and Sverdlov (2008), we assume here that only sex has an interaction with the treatment, and hence βk2 ¼ β2 and βk3 ¼ β3 , k¼1,2. The parameter vector is reduced to γ ¼ ðα1 ; β11 ; α2 ; β21 ; β2 ; β3 Þ with six components that require estimation. The parameters for this example are selected for comparability with those given in Rosenberger and Sverdlov (2008). Here, let β2 ¼ 0:04 and β3 ¼ 0:012, indicating that the probabilities of success decrease with age and cholesterol level.
158
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
Table 1 Parameter values and expected probabilities of success for logistic regression model (11). Parameter Model A
αk βk1 E½pk jmale E½pk jfemale E½pk
Model B
Model C
Trt 1
Trt 2
Trt 1
Trt 2
Trt 1
4.600 0.700 0.68 0.52 0.60
4.600 0.700 0.68 0.52 0.60
5.670 0.700 0.85 0.75 0.80
4.600 0.700 0.68 0.52 0.60
4.600 0.700 0.68 0.52 0.60
Trt 2 6.340 2.500 0.35 0.85 0.60
Note: trt, treatment.
For the treatment effects αk ; k ¼ 1; 2, and the covariate sex βk1 , k ¼1,2, we select three sets of parameter values, as shown in Table 1. In Model A, the chosen parameters are identical for both treatments. In Model B, sex has no interaction with treatment, but the treatment effects differ. In the last model, the sex–treatment interaction is considerable. Both the expectation of the unconditional probabilities of success E½pk , k¼1,2, and the conditional probabilities of success, E½pk jmale and E½pk jfemale, for a given sex are also reported in Table 1. Suppose that after m patients have been treated and their responses observed, the MAPE (or MLE) of γ, b γ m , can be computed by fitting the logistic regression model logit½PðY j ¼ 1jξj Þ ¼ α2 þ β21 ξj;1 þ β2 ξj;2 þβ3 ξj;3 þ ðα1 α2 ÞX j;1 þðβ11 β21 Þξj;1 nX j;1 ;
j ¼ 1; …; m;
where Y j ¼ X j;1 Y j;1 þ ð1 X j;1 ÞY j;2 is the observed response. For the prior distribution, we simply assume that all parameters are independent and have the standard normal prior distribution, i.e., WðγÞ ¼ ð1=ð2πÞ3 Þ expf 1=2ðα21 þα22 þβ211 þ β221 þ β22 þ β23 Þg. After the estimator b γ is obtained, the ðm þ 1Þ-th subject with covariate ξm þ 1 is then allocated to treatment 1 with a probability of πðb γ m ; ξm þ 1 Þ, and to treatment 2 with a probability of 1 πðb γ m ; ξm þ 1 Þ. We explore the three choices of π suggested in Rosenberger and Sverdlov (2008), that is,
1. Rosenberger et al.'s (2001b) target π1 ¼
p1 =q1 ; p1 =q1 þp2 =q2
2. the allocation in Rosenberger et al. (2001a), pffiffiffiffiffi p π 2 ¼ pffiffiffiffiffi 1pffiffiffiffiffi ; p1 þ p2 3. the covariate-adjusted version of optimal allocation (Rosenberger and Sverdlov, 2008) pffiffiffiffiffi q p2 π 3 ¼ pffiffiffiffiffi2 pffiffiffiffiffi : q1 p1 þ q2 p2
Here, logitðpk Þ ¼ αk þβk1 ξ1 þ β2 ξ2 þ β3 ξ3 and qk ¼ 1 pk , k¼1,2. We refer to these treatment allocation schemes as CARA1, CARA2, and CARA3, respectively. As reported by Rosenberger et al. (2001b), the CARA1 scheme yields similar power to that of equal allocation, and reduces the proportion of treatment failure by several percentage points. The CARA2 scheme is a covariate-adjusted version of the optimal allocation rule given in Rosenberger et al. (2001a), the original aim of which was to lower the expected number of treatment failures with a fixed asymptotic variance of allocation. The CARA3 scheme is similar to the CARA2 scheme but, instead of fixing the asymptotic variance of the allocation, the variance of the log-odds ratio of the treatment effect is fixed. The complete randomized design (CRD) is also included. Using Theorem 3.1, the asymptotic allocation proportions are computed for treatment 1, and tabulated in Table 2. Also, the average failure rates (FR) for the four treatment allocation schemes are evaluated according to the formula ∑2k ¼ 1 E½qk ðαk ; βk1 ; β2 ; β3 jξÞπ k ðγ; ξÞ, as on the right-hand side of (10), as shown in Table 2. In Model A, the parameters are identical for both treatments, and thus equal asymptotic failure rates are expected for all of the allocation methods. In the other two models, the asymptotic failure rates for the CARA designs are lower than those for the CRD, with CARA1 superior in terms of the probability of success. In the next section, the additional criterion, test power, is used to compare the different treatment allocation schemes, and a simulation study is conducted to supplement the asymptotic findings.
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
159
Table 2 Asymptotic allocation proportions (Trt 1) and failure rates for logistic regression model (11). Procedure
Model A
CRD CARA1 CARA2 CARA3
B
C
N 1 ðnÞ n
FR
N 1 ðnÞ n
FR
N 1 ðnÞ n
FR
0.50 0.50 0.50 0.50
0.40 0.40 0.40 0.40
0.50 0.74 0.54 0.64
0.30 0.25 0.29 0.27
0.50 0.48 0.51 0.44
0.40 0.29 0.37 0.35
5. Simulation study Consider testing the hypothesis of a treatment effect for male and female patients H 0 : α1 ¼ α2
and
β11 ¼ β21 :
The test used in this simulation is the log-odds ratio test in Rosenberger and Sverdlov (2008). The measure is log ORðξÞ ¼ log
p1 ðθ1 jξÞ=q1 ðθ1 jξÞ ¼ α1 α2 þ β11 β12 ξ1 : p2 ðθ1 jξÞ=q2 ðθ1 jξÞ
The likelihood ratio test is used to test H0. The test statistic is λ¼
maxfLðθÞ : α1 ¼ α2 and β11 ¼ β21 g ; maxfLðθÞ : all θg
where m
K
LðθÞ ¼ ∏ ∏ ½f k ðY j;k jξj ; θk ÞX j;k : j¼1k¼1
Based on the asymptotic normality result of the MLE, the asymptotic distribution of the likelihood ratio test statistic can be derived with standard arguments. Theorem 5.1. Suppose that K¼2 and the conditions in Theorem 3.1 are satisfied. Under H0, we have D
2 log λ-χ 2ð2Þ :
The sample size n is chosen to ensure that the power is roughly 0.8 for Models B and C. The simulation results are obtained based on 10 000 replications. The results for the simulated allocation proportions, failure rates, and power of the asymptotic likelihood ratio test are given in Table 3. The main findings can be summarized as follows. (a) For the treatment proportions and failure rates, the simulated results agree very closely with the asymptotic results in Table 2, which provides a strong justification for using the asymptotic properties derived in Section 3 to evaluate the different CARA designs. This result renders the computation of failure rate and treatment allocation estimates much easier in research that employs logistic models. (b) The superiority of the CARA designs over the CRD is substantiated by the simulation results. The CARA designs all have a test power comparable to that of the CRD, but with significant ethical savings in terms of failure probability. (c) Although the test power of CARA2 is relatively higher than that of CARA1 and CARA3, its failure rate is also higher on average. Thus, if treatment failure is of prime concern, then CARA2 is not the preferred option, and CARA1 seems to be the best choice. However, the test power of CARA3 compared with that of CARA1 is slightly higher for Model B, as also indicated by the simulation results in Rosenberger and Sverdlov (2008). It is thus to be considered as a potential candidate even though the differences are quite small. (d) For all of the designs, the level of significance is controlled at the nominal level (0.05) when the null hypothesis is true in Model A.
6. Discussion and conclusion Adaptive designs have been verified as a superior alternative to complete randomized treatment allocation schemes in sending more patients to better treatments in clinical trials. The ethical savings can be further improved by using covariate information. A general framework encompassing CARA designs is given in Zhang et al. (2007). However, their framework
160
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
Table 3 Simulated proportion, failure rate, and power for logistic regression model (11). Model
Procedure
N 1 ðnÞ (SD) n
FR (SD)
Power
Model A (n¼200)
CRD CARA1 CARA2 CARA3
0.50 0.50 0.50 0.50
(0.033) (0.104) (0.040) (0.049)
0.40 0.40 0.40 0.40
(0.035) (0.034) (0.035) (0.035)
0.05 0.05 0.05 0.05
Model B (n¼200)
CRD CARA1 CARA2 CARA3
0.50 0.72 0.54 0.63
(0.033) (0.089) (0.039) (0.060)
0.30 0.25 0.29 0.28
(0.032) (0.035) (0.031) (0.034)
0.79 0.78 0.84 0.80
Model C (n¼100)
CRD CARA1 CARA2 CARA3
0.50 (0.045) 0.49 (0.094) 0.51 (0.054) 0.46 (0.062)
0.40 0.32 0.38 0.37
(0.049) (0.050) (0.047) (0.054)
0.85 0.85 0.90 0.85
Note: SD, simulated standard deviation.
does not cover models in which some of the treatment-by-covariate interactions are absent, which means that important inferences concerning the testing of main and covariate effects cannot be made. The primary contribution of this paper is to address this deficiency in Zhang et al. (2007) and to provide the required theoretical theorems. The second important contribution of this paper is the generation of useful asymptotic properties related to treatment proportions and failure rates for generalized linear models (logistic regression models, in particular). Most previous studies report treatment proportions and failure rates using simulation results, with no systematic development of related theorems before this paper and the work of Zhang et al. (2007). Finally, our simulation study confirms the applicability of CARA designs and provides insights into the specific CARA design choice. The advantage of using CARA designs is that they can reduce the expected failure rate which can be reduced while maintaining an adequate test power. As Hu (2012) points out, CARA designs play an important role in clinical trials of personalized medicine because of their efficient and ethical properties. The results in this paper provide the theoretical foundation for the hypothesis testing of main effects, covariate effects, or their interactions in clinical trials using CARA designs. Such a foundation ensures the applicability of CARA designs in clinical trials, particularly in clinical trials of personalized medicine (Bonetti and Gelber, 2004; Royston and Sauerbrei, 2008; Hu, 2012). Although the focus of the last part of this paper is on logistic regression, the spectrum of the asymptotic results presented in Section 3 is much wider, and readers will find the paper useful for other generalized linear models.
Acknowledgments The authors are grateful to the referee and the associate editor for providing their helpful comments, which have led to the significant improvement of this paper. Cheung acknowledges support from the Hong Kong RGC General Research Fund, Grant no. CUHK400612. Zhang's research is supported by grants from the National Natural Science Foundation of China (No. 11225104), the Natural Science Foundation of Zhejiang Province (No. R6100119) and the Fundamental Research Funds for the Central Universities.
Appendix A. Technical proofs The proofs of Theorems 3.1 and 3.2 are given as follows. Recall that γ ¼ ðα1 ; β11 ; …; αK ; βK1 ; βÞ. Let the matrix D be defined as in Remark 3.1. Denote θk ¼ ðαk ; βk1 ; βÞ, θ ¼ ðθ1 ; …; θK Þ, bm;k ¼ ðb bm ¼ ðb bm;K Þ. Then θ ¼ γDT and b θ α m;k ; b β m;k1 ; b β m Þ and θ θ m;1 ; …; θ θm ¼ b γ m DT . For the observations X 1 ; …; X m , Y 1 ; …; Y m , ξ1 ; …; ξm up to stage m, the likelihood function for θ is m
K
K
m
K
LðθÞ ¼ ∏ ∏ ½f k ðY j;k jξj ; θk ÞX j;k ¼ ∏ ∏ ½f k ðY j;k jξj ; θk ÞX j;k ≔ ∏ Lk ðθk Þ; j¼1k¼1
k¼1j¼1
with log Lk ðθk Þ p ∑m j ¼ 1 X j;k fY j;k ak ðμj;k Þg; K
~ ¼ LðγDT Þ ¼ ∏ Lk ððαk ; β ; βÞÞ: LðγÞ k1 k¼1
μj;k ¼ hk ðθTk ξj Þ;
k¼1
k ¼ 1; 2; …; K. The likelihood function for γ is
S.H. Cheung et al. / Journal of Statistical Planning and Inference 149 (2014) 152–161
161
~ The MAPE b γ m of γ is that which maximizes WðγÞLðγÞ over γ and is a solution of the equation ~ γ Þ ∂ log Wðb ∂ log Lðb γ mÞ m þ ¼ 0: ∂γ ∂γ Applying Taylor's theorem yields 8 2 3 9 γ þ tðbγ m γÞ Z 1 2 = ~ ~ ~ <∂2 log LðγÞ ∂ log LðγÞ ∂ log LðyÞ γ mÞ 4 5 dt þ ∂ log Wðb þ b γm γ ¼ 0; þ 2 2 : ; ∂γ ∂γ ∂γ ∂y 0 γ where f ðxÞjba ¼ f ðbÞ f ðaÞ. It is easy to see that ~ ∂ log LðγÞ ∂ log LðθÞ ¼ D; ∂γ ∂θ
~ ∂2 log LðγÞ ∂2 log LðθÞ ¼ DT D: ∂γ 2 ∂θ2
Note that ∂ log LðθÞ ¼ ∂θ
m
∑ X j;1
j¼1
m ∂ log f 1 ðY j;1 jξ; θ1 Þ ∂ log f K ðY j;K jξ; θK Þ ; …; ∑ X j;K ∂θ1 ∂θK j¼1
!
and 1 ∂2 log LðθÞ - diagðI 1 ; …; I K Þ m ∂θ2
a:s:
Following a similar argument to (A.15) in Zhang et al. (2007), we have ~ 1 ∂ log LðγÞ 1 b I γ 1 ð1 þ oð1ÞÞ þO γ m γ ¼ m ∂γ m
1 ∂ log LðθÞ 1 DI γ ð1 þ oð1ÞÞ þ o m 1=2 ¼ a:s: m ∂θ and
1 ∂ log LðθÞ 1 T b DI γ D ð1 þoð1ÞÞ þo m 1=2 θm θ ¼ m ∂θ
a:s:
The rest of the proof is similar to that of Theorems 2.1 and 2.2 in Zhang et al. (2007), and is thus omitted here. Remark A.1. In general, if θ ¼ θðγÞ is a twice differentiable function of γ ¼ ðγ 1 ; …; γ b Þ (b r dK), then Theorems 3.1 and 3.3 remain true with D ¼ ∂θðγÞ=∂γ if rankðDÞ ¼ b. References Atkinson, A.C., 1982. Optimal biased coin designs for sequential clinical trials with prognostic factors. Biometrika 69, 61–67. Atkinson, A.C., 1999. Optimal biased-coin designs for sequential treatment allocation with covariate information. Statist. Med. 18, 1741–1752. Atkinson, A.C., 2002. The comparison of designs for sequential clinical trials with covariate information. J. Roy. Statist. Soc. A 165, 349–373. Bonetti, M., Gelber, R.D., 2004. Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics 5, 465–481. Fotouhi, A.R., 2008. Modelling overdispersion in longitudinal count data in clinical trials with application to epileptic data. Contemp. Clinical Trials 29, 547–554. Hart, M.D., Halperin, J.L., Pearce, L., Anderson, D.C., Kronmal, R.A., McBride, R., Nasco, E., Sherman, D., Talber, R.L., Marler, J.R., 2003. Lessons from the stroke prevention in atrial fibrillation trials. Ann. Internal Med. 138, 831–838. Hu, F., 2012. Statistical issues in trial design and personalized medicine. Clin. Invest. 2, 121–124. Hu, F., Rosenberger, W.F., 2006. The Theory of Response-Adaptive Randomization. Wiley, New York. Rosenberger, W.F., 1996. New directions in adaptive designs. Statist. Sci. 11, 137–149. Rosenberger, W.F., Lachin, J.M., 2002. Randomization in Clinical Trials: Theory and Practice. Wiley, New York. Rosenberger, W.F., Sverdlov, O., 2008. Handling covariates in the design of clinical trials. Statist. Sci. 23, 404–419. Rosenberger, W.F., Stallard, N., Ivanova, A., Harper, C., Ricks, M., 2001a. Optimal adaptive designs for binary response trials. Biometrics 57, 909–913. Rosenberger, W.F., Vidyashankar, A.N., Agarwal, D.K., 2001b. Covariate-adjusted response-adaptive designs for binary response. J. Biopharm. Statist. 11, 227–236. Royston, P., Sauerbrei, W., 2008. Interactions between treatment and continuous covariates: a step toward individualizing therapy. J. Clinical Oncol. 26, 1397–1399. SPAF Investigators, 1990. Design of multicenter randomized trial for the Stroke Prevention in Atrial Fibrillation study. Stroke 21, 538–545. Taves, D.R., 2010. The use of minimization in clinical trials. Contemp. Clinical Trials 31, 180–184. Zhang, L.X., Hu, F., Cheung, S.H., Chan, W.S., 2007. Asymptotic properties of covariate-adjusted response-adaptive designs. Ann. Statist. 35, 1166–1182.