Statistics and Probability Letters 82 (2012) 865–870
Contents lists available at SciVerse ScienceDirect
Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro
A note on the construction of locally D- and DS -optimal designs for the binary logistic model with several explanatory variables M. Gaëtan Kabera a,∗ , Linda M. Haines b , Principal Ndlovu c a
Biostatistics Unit, Medical Research Council, Overport 4067 Durban, South Africa
b
Department of Statistical Sciences, University of Cape Town, Rondebosch 7700, South Africa
c
Department of Statistics, University of South Africa, Pretoria 0001, South Africa
article
info
Article history: Received 21 July 2011 Received in revised form 24 January 2012 Accepted 25 January 2012 Available online 7 February 2012 Keywords: D-optimality DS -optimality k-variable binary logistic model Standardized variance function
abstract An explicit formulation of D- and DS -optimal designs for the binary logistic regression model in several variables, without interaction between the variables, is presented. The proof of the optimality of the designs is ‘‘traditional’’ in the sense that it invokes the Equivalence Theorem, and builds on the earlier work of Sitter and Torsney (1995b) and Torsney and Gunduz (2001) and complements that given by Yang et al. (2011) © 2012 Elsevier B.V. All rights reserved.
1. Introduction There is some current interest in the problem of constructing D-optimal designs for binary regression models in several explanatory variables, a problem which Dror and Steinberg (2006) describe as being far from trivial. Studies for the twovariable case include those by Atkinson and Haines (1996), Haines et al. (2007), Jia and Myers (2001) and Sitter and Torsney (1995a). For the more general k-variable model with k ≥ 2, Sitter and Torsney (1995b) and Torsney and Gunduz (2001) adopt a geometric approach and provide a partial formulation of the requisite D-optimal designs, while Woods et al. (2006) and Dror and Steinberg (2006) address issues relating to the numerical construction of these designs. Most recently Yang et al. (2011) have introduced a particularly elegant and powerful approach to obtaining explicit formulations of the D-, Aand E-optimal designs for the multi-variable logit and probit regression models, which holds for certain parameterizations and which is based on the Loewner ordering of the attendant information matrices. In this note a proof of the D- and DS -optimality of specified designs for the classical logistic regression model in several variables is presented. The proof is ‘‘traditional’’ in the sense that it invokes the Equivalence Theorem, and builds on the earlier work of Sitter and Torsney (1995b) and Torsney and Gunduz (2001) and is more direct than that given by Yang et al. (2011). The model setting is introduced in Section 2. Section 3 contains the main theorem, a corollary relating to DS optimality and some remarks on the results. Conclusions and pointers for future research are given in Section 4. 2. Preliminaries Consider a binary response variable Y which follows a Bernoulli distribution with probability of success p for the setting of k explanatory variables represented by the vector x = (x1 , x2 , . . . , xk )T . The two possible observations of Y at x can be
∗ Correspondence to: Department of Statistics, University of Johannesburg, PO Box 524, Auckland Park 2006, Johannesburg, South Africa. Tel.: +27 11 559 2478; fax: +27 11 559 2499. E-mail addresses:
[email protected],
[email protected] (M.G. Kabera),
[email protected] (L.M. Haines),
[email protected] (P. Ndlovu). 0167-7152/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2012.01.023
866
M.G. Kabera et al. / Statistics and Probability Letters 82 (2012) 865–870
coded as 1 for a success or positive response and 0 for a failure or negative response. If the probability of success p at x is described by the binary logistic model, then p = E [Y = 1|x] =
1 1 + exp(− xT β)
or equivalently u = logit(p) = ln
p
= xT β,
1−p
where β = (β0 , β1 , β2 , . . . , βk )T is the vector of model parameters and x = [1, xT ]T = (1, x1 , x2 , . . . , xk )T (see Dobson and Barnett, 2008). The Fisher information matrix for β for a single observation x = (x1 , x2 , . . . , xk )T is then given by eu
M (x; β) = p(1 − p) x xT =
(1 +
1 T 1x . x
)
eu 2
Now consider an approximate design ξx which puts weights λi on the distinct points xi = (xi1 , xi2 , . . . , xik )T for i = 1, 2, . . . , r with r ≥ k + 1, expressed as
ξx =
x1
λ1
x2
λ2 r
where 0 < λi < 1,
... ...
i=1
xr
,
λr
λi = 1. Then the information matrix for this design is given by
r
M (ξx ; β) =
λi M (xi ; β)
i =1
and depends on the value of the parameters β. In the present study, locally D-optimal designs, that is designs which maximize the determinant of the information matrix, namely |M (ξx ; β)|, over all possible designs for a best guess of β, are sought (Chernoff, 1953). 3. D-optimal design for the k-variable binary logistic model It is convenient, following Ford et al. (1992) and Torsney and Gunduz (2001), to invoke the canonical transformation of design points x to points u = (u1 , . . . , uk ) according to the relation 1
0
b 1 20 u= = . u .
β1
β2
b21
.
.. .
b22
bk0
bk1
β0
0
.. .
bk2
... ... ... .. .
...
0
βk
b2k 1 = B x, . x
..
bkk
where B is a nonsingular matrix of order k + 1 with elements corresponding to constant coefficients. Note that u1 , . . . , uk are linear functions of x1 , . . . , xk and thus define k independent hyperplanes in Rk . Then, for a single observation x and attendant u, M (u) = BM (x; β)BT =
eu1
1 T 1u
(1 + eu1 )2 u
and it follows immediately from the invariance of D-optimality under linear transformation of the explanatory variables that the problem of finding designs which maximize |M (ξx ; β)| is equivalent to that of finding designs of the form
ξu =
u1
λ1
u2
λ2
... ...
ur
λr
which maximize |M (ξu )| where M (ξu ) = i=1 λi M (ui ). Furthermore it is well-established that D-optimal designs for the binary logistic model in k explanatory variables can only be constructed provided lower and upper bounds are placed on the k − 1 linearly independent hyperplanes u2 , . . . , uk in Rk and, again since D-optimality is invariant under linear transformation, these constraints can be taken without loss of generality to be of the form −bj ≤ uj ≤ bj , j = 2, . . . , k. The main result can now be formulated and proved as follows.
r
Theorem. Consider the k-variable binary logistic model with u1 = β0 + β1 x1 + · · · + βk xk unrestricted over the real line and with the restrictions −bj ≤ uj ≤ bj placed on uj for j = 2, . . . , k, where u1 , u2 , . . . , uk are linearly independent hyperplanes in Rk . Then the design ξu∗ which puts equal weight on the support points located at the 2k k-tuples (±u∗k , ±b2 , . . . , ±bk ) where u∗k is the unique positive solution to the equation fk (u) = (k + 1)u + 2 − ((k + 1)u − 2)eu = 0 is D-optimal on the (u1 , u2 , . . . , uk )-design space (−∞, ∞) × [−b2 , b2 ] · · · × [−bk , bk ].
M.G. Kabera et al. / Statistics and Probability Letters 82 (2012) 865–870
867
Proof. It is straightforward to show that the design which puts equal weights on the 2k points (±u, ±b2 , . . . , ±bk ) with u ≥ 0 has information matrix
1 0 0
eu
M (ξu ) =
(1 + eu )2 .. .
0 u2 0
0 0 b22
0
0
.. .
0
... ... ... .. .
.. .
...
0 0 0
.. .
b2k
with determinant D=
e(k+1)u
(1 + eu )2(k+1)
u2 b22 · · · b2k
and hence that dD du
e(k+1)u
=
(1 +
u[(k + 1)u + 2 − ((k + 1)u − 2)eu ] b22 · · · b2k .
)
eu 2k+3
Thus = 0 for u = 0 and for u satisfying the equation fk (u) = (k + 1)u + 2 − ((k + 1)u − 2)eu = 0. Now fk (0) = 4, ′ fk (0) = 2 > 0, fk (u) → −∞ as u → ∞ and fk′′ (u) = −eu (2k + (k + 1)u) < 0, so fk (u) is concave for all k = 1, 2, . . .. dD du
2
b2 ···b2
It thus follows that fk (u) has exactly one positive root, written as u∗k > 0. Furthermore dduD2 = 222k+1k > 0 at u = 0, so D is a minimum at u = 0 and, since D → 0 as u → ∞, D is necessarily a maximum at u∗k and, by symmetry, at −u∗k . Thus the design ξu∗ , as specified in the theorem, is a candidate D-optimal design. A specific condition on ku∗k 2 is required later in the proof but is conveniently demonstrated here. Now fk (1.6) ≈ 5.5812 − 6.3249 k < 0 and thus u∗k < 1.6 for all k = 1, 2, . . .. Also, since fk (u∗k ) = 0, it follows that
∗2
∗
kuk = uk
∗
2(euk + 1) ∗
(euk − 1)
− uk . ∗
Now consider the function t ( u) = u
2(eu + 1)
(eu − 1)
for all u > 0
−u
which is equal to ku∗k 2 at u∗k for k = 1, 2, . . .. Then t (0) → 4 as u ↓ 0 and it can be shown, by some straightforward but tedious algebra, that t (u) is a concave decreasing function on u ∈ [0, ∞). Thus t (u) ≤ 4 for all u ≥ 0. Furthermore t (u) = 2 when u ≈ 1.7126 > 1.6, so t (u) ≥ 2 for u < 1.6. Thus the double inequality 2 ≤ ku∗k 2 ≤ 4 holds for all k = 1, 2, . . .. It now remains to prove that the design ξu∗ is D-optimal over the set of all possible designs. Consider invoking the Equivalence Theorem for D-optimality (Atkinson et al., 2007) and in particular demonstrating that the directional derivative function, φ(u, ξu∗ ), for the design ξu∗ in the direction of any k-tuple u = (u1 , u2 , . . . , uk ) ∈ (−∞, ∞) × [−b2 , b2 ] × · · · × [−bk , bk ] is such that
φ(u, ξu∗ ) = (k + 1) − tr[M −1 (ξu∗ )M (u)] ≥ 0 or, equivalently, that the standardized variance function d(u, ξu∗ ) = tr[M −1 (ξu∗ )M (u)] is less than or equal to k + 1, with equality holding at the support points of ξu∗ . Now
∗
d(u, ξu ) = ∗
eu1 (1 + euk )2
u21
u22
u2k
1 + ∗2 + 2 · · · + 2 b2 bk uk
∗
euk (1 + eu1 )2
for all points (u1 , u2 , . . . , uk ) in the design space. Clearly d(u, ξu∗ ) is a maximum over uj ∈ [−bj , bj ] when uj = ±bj for j = 2, . . . , k. Thus it remains to show that the condition ∗
gk (u1 ) =
eu1 (1 + euk )2 ∗
euk (1 + eu1 )2
u21
k + ∗2 uk
≤k+1
holds for all u1 ∈ (−∞, ∞) and for all k = 1, 2, . . .. Now gk (u1 ) is an even function in u1 , so any results for u1 > 0 also hold for u1 < 0. Also gk (u1 ) = k + 1 at u1 = ±u∗k , as required. The derivative of gk (u1 ) with respect to u1 is given by ∗
gk (u1 ) = ′
2
2
eu1 (1 + euk )2 [(ku∗k + u21 + 2u1 ) − (ku∗k + u21 − 2u1 )eu1 ] ∗
2
euk u∗k (1 + eu1 )3
868
M.G. Kabera et al. / Statistics and Probability Letters 82 (2012) 865–870
a
b
Fig. 1. Plots of (a) gk (u1 ) and (b) hk (u1 ) with k = 2.
which has the same roots and sign as the function 2
2
hk (u1 ) = (ku∗k + u21 + 2u1 ) − (ku∗k + u21 − 2u1 )eu1 . Clearly gk′ (0) = 0 = hk (0) and gk′ (u∗k ) = 0 = hk (u∗k ) and thus u1 = 0 and u1 = ±u∗k are stationary points of the function gk (u1 ) and solutions for u1 to hk (u1 ) = 0. In addition, since ku∗k 2 ≤ 4, it follows that ∗
gk′′ (0) =
2
(1 + euk )2 (4 − ku∗k ) 2
∗
8u∗k euk
>0
and thus that gk (u1 ) has a minimum at u1 = 0. Proving that the global maximum of gk (u1 ) occurs at u1 = u∗k for u1 ≥ 0 can thus be achieved by showing that hk (u1 ), which has the same sign as the derivative gk′ (u1 ), is greater than 0 for u1 ∈ [0, u∗k ) and less than 0 for u1 ∈ (u∗k , ∞). Now hk (0) = h(u∗k ) = 0 and hk (u1 ) → −∞ as u1 → ∞. Furthermore h′k (0) = 4 − ku∗k 2 > 0, so clearly hk (u1 ) has at least one maximum on the interval [0, u∗k ]. Consider writing the derivative of hk (u1 ) as h′k (u1 ) = s1 (u1 ) − s2 (u1 ) where 2
2
s1 (u1 ) = 2(1 + u1 ) and s2 (u1 ) = (ku∗k + u21 − 2)eu1 . Then, since ku∗k 2 < 4, it follows that s1 (0) = 2 > s2 (0) = ku∗k − 2. Clearly, s1 (u1 ) is a linearly increasing function on [0, ∞) since s′1 (u1 ) = 2 > 0. Furthermore, s1 (u1 ) < s2 (u1 ) as u1 → ∞ 2
since limu1 →∞ [s2 (u1 ) − s1 (u1 )] = limu1 →∞ u21 eu1 = ∞. In addition, s′2 (u1 ) = (ku∗k + u21 + 2u1 − 2)eu1 ≥ (u21 + 2u1 )eu1 ≥ 0 ∗2
∗2
since kuk ≥ 2, and s′′2 (u1 ) = (kuk + u21 + 4u1 )eu1 > 0. This means that s2 (u1 ) is an exponentially increasing convex function on [0, ∞). Thus, s1 (u1 ) and s2 (u1 ) intersect exactly once on [0, ∞), indicating that h′k (u1 ) has one root and thus that hk (u1 ) has one stationary point which is necessarily a maximum lying between 0 and u∗k . It now follows that hk (u1 ) and hence the slope gk′ (u1 ) change sign from positive to negative as u1 > 0 increases exactly once at the point u∗k , as required. This result is illustrated in the plots of gk (u1 ) and hk (u1 ) against u1 for k = 2 shown in Fig. 1(a) and (b) respectively. Consider now the matrix B partitioned as
1 b
0T B22
with the information matrices M (ξx ; β) and M (ξu ) conformably
partitioned. Then the scalar (1, 1)-elements of the information matrices, namely m11 (ξx ; β) and m11 (ξu ), are equal and the DS -optimality criterion for the precise estimation of the parameters (β1 , . . . , βk ), that is the set of parameters excluding the intercept term β0 , can be written as 1 |M (ξu )| |M (ξx ; β)| = 2 × . m11 (ξx ; β) |B | m11 (ξu ) Thus the DS -optimal design in terms of the variables x is equivalent to that in terms of the transformed variables u and the following corollary can now be introduced. Corollary. The DS -optimal design for the precise estimation of the parameter set β = (β1 , . . . , βk ) in the k-variable logistic regression model puts equal weight on the support points (±u∗k−1 , ±b2 , . . . , ±bk ). Proof. The proof follows immediately from that of the theorem. Specifically consider a design ξu which puts equal weights on the points (±u, ±b2 , . . . , ±bk ). Then the DS -optimality criterion for ξu is given by
|M (ξu )| euk = u2 b22 · · · b2k m11 (ξu ) (1 + eu )2k
M.G. Kabera et al. / Statistics and Probability Letters 82 (2012) 865–870
869
and is maximized for the design ξu = ξu∗ with u = u∗k−1 . Furthermore the DS -optimal analogue of the variance function can be expressed as u∗ k−1 2
eu1 (1 + e
dS (u, ξu∗ ) =
u∗ e k−1
)
u21
(u∗k−1 )2
(1 + eu1 )2
+
u22 b22
··· +
u2k
b2k
and satisfies the inequalities dS (u, ξu∗ ) ≤
eu1 (1 + e u∗ e k−1
u∗ k−1 2
)
(1 + eu1 )2
u21
k−1+
(u∗k−1 )2
≤ k.
Thus the generalized Equivalence Theorem for DS -optimality holds (Atkinson et al., 2007) and ξu∗ is DS -optimal.
A series of remarks aimed at putting the nature of the proof of the theorem given here into context are now presented. Remark 1. The constraints placed on the variables uj , j = 2, . . . , k, are not practically useful. However constraints on the original explanatory variables x2 , . . . , xk can be readily introduced into the theorem as follows. Suppose that the constraints aj ≤ xj ≤ cj where aj and cj are constants are placed at the xj , j = 2, . . . , k, and suppose also that these constraints are not necessarily symmetric. Then the variables uj = xj −
−
cj −aj 2
≤ uj ≤ 1
cj − a j 2
β0 c +a 2 2 − 2 c3 + a3 − 2 .. . ck + ak − 2
cj + a j 2
can be introduced such that the symmetric constraints
apply, that is the matrix B is taken to be
β1
β2
β3
0
... ...
0
1
0
...
0
0
1
...
.. .
.. .
.. .
.. .
0
0
0
...
0
0
0
βk 0 0, 1
and thus the results of the theorem hold. Remark 2. Yang et al. (2011) provide an elegant, powerful and very different approachto deriving the locally D-optimal βk−1 β0 ∗ design ξu specified in the above theorem which is based on the reparameterization θ = β , . . . , β , βk and on a series k
k
of lemmas and theorems developed by invoking the Loewner ordering of the information matrices for θ . The D-optimal design for the parameterization β then follows directly from that for θ on observing that the D-criterion is invariant under linear transformation of the explanatory variables. In contrast, the proof presented in this paper is based on the celebrated Equivalence Theorem of Kiefer and Wolfowitz (1960) and builds on the earlier results of Sitter and Torsney (1995a,b) and Torsney and Gunduz (2001) relating to the broad form of the locally D-optimal designs for binary regression in two or more variables. In essence the proof in this paper brings closure to some of the latter results. Note however that the approach of Yang et al. (2011) also yields D-optimal designs for the parameters β in the probit case whereas the methodology presented here cannot be invoked since the probit function is not analytically tractable. Remark 3. The approach used in the proof given in this study represents an immediate extension to that given in Kabera (2009) for k = 1 and k = 2. In fact Sebastiani and Settimi (1997) note that a proof for k = 1 with u1 = β0 +β1 x unconstrained is given in the Ph.D. Thesis of White (1975) but this proof is not detailed and would seem to rely on numerical calculations. It is also worth noting that Sebastiani and Settimi (1997) provide explicit forms for D-optimal designs for binary logistic regression in one variable with the logit u1 bounded either above or below or both using proofs that are similar in spirit to the proof developed here. Remark 4. A small but interesting result emanates from the proof of the theorem and in particular from the form of fk (u). ∗
∗
(k+1)u∗ +2
Specifically it follows from the fact that fk (u∗k ) = ((k + 1)u∗k + 2) − ((k + 1)u∗k − 2)euk = 0 that euk = (k+1)u∗k −2 and hence, k by substitution, that ∗ ∗ fk+1 (u∗k ) = ((k + 2)u∗k + 2) − ((k + 2)u∗k − 2)euk = u∗k (1 − euk ) = −
4u∗k
(k + 1)u∗k − 2
.
Furthermore, since fk (u∗k ) = 0 and fk k+2 1 = 4 > 0, u∗k > k+2 1 . Thus fk+1 (u∗k ) < 0 and, since fk+1 (u∗k+1 ) = 0, u∗k+1 < u∗k for k = 1, 2, . . .. Note also that u∗k → 0 as k → ∞.
870
M.G. Kabera et al. / Statistics and Probability Letters 82 (2012) 865–870
4. Conclusions This paper provides an explicit solution to the problem of constructing D- and DS -optimal designs for the parameters
β0 , β1 , . . . , βk of the k-variable binary logistic regression model subject to restrictions on a specified set of k − 1 linearly independent hyperplanes and, more specifically, on the explanatory variables x2 , . . . , xk in the design space Rk . The resultant designs comprise 2k equally weighted points of the form (±u∗k , ±b2 , . . . , ±bk ) and (±u∗k−1 , ±b2 , . . . , ±bk ) respectively. The analytic proof presented here is based on the Equivalence Theorem and some straightforward algebra and complements and is more direct than that given by Yang et al. (2011). A number of open problems relating to the construction of optimal designs for binary regression models in several variables emerge immediately from this and other studies. These relate in particular to A- and E-optimal designs, to link functions other than the logit and probit, such as the complementary log–log and the double exponential, to restrictions placed on the hyperplane u1 = β0 + β1 x1 + · · · + βk xk and to models which include interactions in the explanatory variables. Some progress on the topic of imposing restrictions on u1 has been made by Haines et al. (2007) and further work is reported in Kabera (2009), but much remains to be done. Acknowledgments The authors would like to thank the Medical Research Council, the University of KwaZulu-Natal, the University of Cape Town, the University of South Africa, the National Research Foundation and the University of Johannesburg for financial support. The authors are also very grateful to two anonymous referees for their valuable suggestions. References Atkinson, A.C., Donev, A.N., Tobias, R.D., 2007. Optimum Experimental Designs, With SAS. Oxford University Press, Oxford. Atkinson, A.C., Haines, L.M., 1996. Designs for nonlinear and generalized linear models. In: Ghosh, S., Rao, C.R. (Eds.), Handbook of Statistics. Vol. 13. B.V. Elsevier Science, Amsterdam, pp. 437–475. Chernoff, H., 1953. Locally optimal designs for estimating parameters. Annals of Mathematical Statistics 24, 586–602. Dobson, J.A., Barnett, A.G., 2008. An Introduction to Generalised Linear Models, Third ed. Chapman and Hall, Boca Raton, Florida. Dror, H.A., Steinberg, D.M., 2006. Robust experimental design for multivariate generalized linear models. Technometrics 48, 520–529. Ford, I., Torsney, B., Wu, C.F., 1992. The use of a canonical form in the construction of locally optimal designs for non-linear problems. Journal of the Royal Statistical Society, Series B 54, 569–583. Haines, L.M., Kabera, M.G., Ndlovu, P., O’Brien, T.E., 2007. D-optimal designs for logistic regression in two variables. In: Lopez-Fidalgo, J., Rodrigez-Diaz, J.M., Torsney, B. (Eds.), MODA8-Advances in Model-Oriented Designs and Analysis. Physica-Verlag, Heidelberg, pp. 91–98. Jia, Y., Myers, R., 2001. Optimal experimental designs for two-variable logistic models. Technical Report, Department of Statistics, Virginia Polytechnic Institute and State University Blacksburg, Virginia, USA. Kabera, M.G., 2009. D-optimal designs for drug synergy. Ph.D. Thesis, School of Statistics and Actuarial Science, University of KwaZulu-Natal. Kiefer, J., Wolfowitz, J., 1960. The equivalence of two extremum problems. Canadian Journal of Mathematics 12, 363–366. Sebastiani, P., Settimi, R., 1997. A note on d-optimal designs for a logistic regression model. Journal of Statistical Planning and Inference 59, 359–368. Sitter, R.R., Torsney, B., 1995a. Optimal design for binary response experiments with two design variables. Statistica Sinica 5, 405–419. Sitter, R.R., Torsney, B., 1995b. D-optimal designs for generalized linear models. In: Kitsos, C.P., Müller, W.G. (Eds.), Proceedings of MODA 4. Physica Verlag, Heidelberg, pp. 87–102. Torsney, B., Gunduz, N., 2001. On optimal designs for high dimensional binary regression models. In: Atkinson, A.C., Bogacka, B., Zhigljavsky, A. (Eds.), Optimal Design 2000. Kluwer, Dordrecht. White, L.W., 1975. The optimal design of experiments for estimation in nonlinear model. Ph.D. Thesis, University of London. Woods, D.C., Lewis, S.M., Eccleston, J.A., Russel, K.G., 2006. Designs for generalized linear models with several variables and model uncertainty. Technometrics 48 (2), 284–292. Yang, M., Zhang, B., Huang, S., 2011. Optimal designs for generalized linear models with multiple design variables. Statistical Sinica 21, 1415–1430.