Journal Pre-proof Optimal designs for binary response models with multiple nonnegative variables Shih-Hao Huang, Mong-Na Lo Huang, Cheng-Wei Lin
PII: S0378-3758(19)30081-3
DOI: https://doi.org/10.1016/j.jspi.2019.09.006
Reference: JSPI 5763
To appear in:
Journal of Statistical Planning and Inference
Received date : 22 May 2019 Accepted date : 14 September 2019 Please cite this article as: S.-H. Huang, M.-N. Lo Huang and C.-W. Lin, Optimal designs for binary response models with multiple nonnegative variables. Journal of Statistical Planning and Inference (2019), doi: https://doi.org/10.1016/j.jspi.2019.09.006. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Elsevier B.V. All rights reserved.
Optimal designs for binary response models with multiple nonnegative variables
Shih-Hao Huang^a, Mong-Na Lo Huang^{b,1}, Cheng-Wei Lin^b
^a Department of Mathematics, National Central University, Taiwan.
^b Department of Applied Mathematics, National Sun Yat-sen University, Taiwan.
Abstract
In this work we investigate locally optimal designs for binary response models with multiple nonnegative explanatory variables. We characterize an essentially complete class with respect to the Schur ordering, in which a scaled φp-optimal design exists for any given p ∈ [−∞, 1]. In particular, we explicitly identify D-optimal designs for the logit, probit, double exponential, and double reciprocal models within the class.
Keywords: D-optimality; Essentially complete class; Scaled φp-optimality; Schur ordering.
^1 Corresponding author. E-mail address: [email protected]
Preprint submitted to Journal of Statistical Planning and Inference, September 24, 2019

1. Introduction
Binary response experiments are frequently performed in scientific studies. The most commonly used models for describing the relationship between a binary response and explanatory covariates are generalized linear models, such as the logit, probit, double exponential, and double reciprocal models. The optimal design problems for binary response models with a single covariate have been well investigated; see, for example, Ford et al. (1992), Sitter and Wu (1993), and Biedermann et al. (2006). However, only a limited literature considers the design problems for binary models with multiple covariates from a theoretical point of view. The model that we are interested in is a generalized linear model for binary response experiments with multiple covariates,
\[ \mathrm{Prob}(Y = 1) = F(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k) = F(\beta_0 + x^\top \beta). \tag{1} \]
Here $Y$ is the response observed with the covariates at levels $x = (x_1, \ldots, x_k)^\top$, $\theta^\top = (\beta_0, \beta^\top) = (\beta_0, \beta_1, \ldots, \beta_k)$ is the vector of unknown parameters with $\beta_1, \ldots, \beta_k > 0$, and the link function $F$ is a cumulative distribution function.

Sitter and Torsney (1995) investigated optimal designs for two covariates by the geometric approach (Elfving, 1952). They indicated that if the design space is unbounded, then in a multi-covariate case the D-criterion can be made arbitrarily large by the choice of design, which is quite different from the one-covariate case. Therefore, they adopted the design space $[-1, 1] \times \mathbb{R}$, where $\mathbb{R} = (-\infty, \infty)$. By using the complete class approach, Yang et al. (2011) extended the work of Sitter and Torsney from two covariates to a given finite number of covariates with design space $\prod_{j=1}^{k-1} [U_j, V_j] \times \mathbb{R}$. On the other hand, Haines et al. (2007) and later Kabera et al. (2015) also investigated optimal designs for two covariates but, in contrast, adopted the design space $\mathbb{R}^2_+ = [0, \infty)^2$ and used the general equivalence theorem (Kiefer, 1974). This motivates us to extend their work to the more general case of multiple nonnegative covariates, and to provide theoretical justification for why the optimal designs on $\mathbb{R}^k_+$ have their specific form. We note that neither our design space $\mathbb{R}^k_+$ nor the design space in Yang et al. (2011) is a special case of the other, and so the structure of our optimal designs, which have $k + 1$ or $2k$ support points, is different from theirs.

This paper is organized as follows. In Section 2 we provide some background on designs and notation. In Section 3 we characterize an essentially complete class consisting of permutation-invariant designs supported on the coordinate axes. Consequently, seeking optimal designs within the class is no longer a $k$-dimensional but a one-dimensional optimization problem. We then obtain D-optimal designs for the logit, probit, double exponential, and double reciprocal models within the class in Section 4. Section 5 concludes and discusses future work. All proofs are presented in the Appendix.

2. Preliminaries
Following Haines et al. (2007), we assume that $\beta_1, \ldots, \beta_k > 0$ and rewrite model (1) in terms of the scaled covariates $z_i = \beta_i x_i$ as
\[ \mathrm{Prob}(Y = 1) = F(\beta_0 + z_1 + \cdots + z_k) = F(\beta_0 + z^\top \mathbf{1}), \tag{2} \]
where $z = (z_1, \ldots, z_k)^\top \in \mathbb{R}^k_+$ and $\mathbf{1} = (1, \ldots, 1)^\top$. The link functions of the most popular binary models, namely the logit, probit, double exponential, and double reciprocal models, are respectively
\[ F_L(c) = \frac{e^c}{1 + e^c}; \qquad F_P(c) = \int_{-\infty}^{c} \frac{1}{\sqrt{2\pi}} e^{-s^2/2}\, ds; \qquad F_{DE}(c) = \int_{-\infty}^{c} \frac{1}{2} e^{-|s|}\, ds; \qquad F_{DR}(c) = \int_{-\infty}^{c} \frac{1}{2(1 + |s|)^2}\, ds. \]
An approximate design with scaled covariates is a discrete probability measure on $\mathbb{R}^k_+$, denoted by
\[ \xi = \{(z_i, w_i)\}_{i=1}^{n} = \sum_{i=1}^{n} w_i A_{z_i}. \]
Here $z_1, z_2, \ldots, z_n \in \mathbb{R}^k_+$ are distinct support points; the weights $w_1, w_2, \ldots, w_n$ are interpreted as the proportions of observations taken at $z_1, z_2, \ldots, z_n$, and $(w_1, w_2, \ldots, w_n)$ belongs to the $(n-1)$-dimensional simplex $S^{n-1} = \{(w_1, w_2, \ldots, w_n) : w_i \ge 0, \sum_{i=1}^{n} w_i = 1\}$; $A_z = \{(z, 1)\}$ denotes the single-point design at $z$. The information matrix of $\xi$ in terms of the scaled covariates is
\[ M(\xi) = \sum_{i=1}^{n} w_i M(A_{z_i}) = \sum_{i=1}^{n} w_i \lambda(\beta_0 + z_i^\top \mathbf{1}) \begin{pmatrix} 1 & z_i^\top \\ z_i & z_i z_i^\top \end{pmatrix}, \tag{3} \]
where $\lambda(c) = F'(c)^2 / \{F(c)(1 - F(c))\}$ and $F'(c) = dF(c)/dc$. The original information matrix for the parameter vector $\theta$ is
\[ M_\theta(\xi) = B^{-1} M(\xi) B^{-1}, \tag{4} \]
where $B$ is the diagonal matrix with diagonal entries $(1, \beta^\top)$. Moreover, we say that $\xi$ is valid, meaning that $\theta$ is estimable under $\xi$, if $M(\xi)$ is invertible.
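As a rough illustration of the weight function $\lambda(c)$ in equation (3), the following sketch evaluates it for the four links using closed forms of $F$ and $F'$ obtained from the integrals above. The function and dictionary names (`lam`, `LINKS`, and so on) are hypothetical and chosen only for this example.

```python
import math

# Closed-form CDFs F and densities F' of the four links (names are illustrative).
def F_logit(c):
    return 1.0 / (1.0 + math.exp(-c))

def f_logit(c):
    p = F_logit(c)
    return p * (1.0 - p)

def F_probit(c):
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def f_probit(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def F_dexp(c):  # double exponential (Laplace) CDF
    return 0.5 * math.exp(c) if c < 0 else 1.0 - 0.5 * math.exp(-c)

def f_dexp(c):
    return 0.5 * math.exp(-abs(c))

def F_drec(c):  # double reciprocal CDF, integral of 1 / (2 (1 + |s|)^2)
    return 0.5 / (1.0 - c) if c < 0 else 1.0 - 0.5 / (1.0 + c)

def f_drec(c):
    return 0.5 / (1.0 + abs(c)) ** 2

LINKS = {
    "logit": (F_logit, f_logit),
    "probit": (F_probit, f_probit),
    "dexp": (F_dexp, f_dexp),
    "drec": (F_drec, f_drec),
}

def lam(link, c):
    """lambda(c) = F'(c)^2 / {F(c) (1 - F(c))} for the named link."""
    F, f = LINKS[link]
    p = F(c)
    return f(c) ** 2 / (p * (1.0 - p))

for name in LINKS:
    print(name, lam(name, 0.0), lam(name, 1.3))
```

All four weight functions are symmetric about zero, which is the property $\lambda(-h) = \lambda(h)$ used later in the proofs.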
The optimality criteria of interest in this work measure a design's ability to estimate the parameters on the basis of its information matrix. An important family of such optimality criteria is given by the (scaled) $\phi_p$-functions:
\[ \phi_p(M(\xi)) = \begin{cases} \left( \dfrac{1}{k+1} \mathrm{Tr}\, M(\xi)^p \right)^{1/p}, & p \in (-\infty, 1] \setminus \{0\}, \\ |M(\xi)|^{1/(k+1)}, & p = 0, \\ \nu_{\min}(M(\xi)), & p = -\infty, \end{cases} \tag{5} \]
where $\nu_{\min}(M)$ is the minimum eigenvalue of $M$. Let $\Xi$ be the set of all valid designs with finitely many support points. A design is called $\phi_p$-optimal if it maximizes $\phi_p(M(\cdot))$ over $\Xi$. The popular D-, A-, and E-criteria correspond to $p = 0$, $-1$, and $-\infty$, respectively (Pukelsheim, 2006, Chapter 6). To deal with all $\phi_p$-criteria at the same time, we introduce two orderings of designs.
(i) A design $\xi_2$ is said to be at least as informative as $\xi_1$ under the Loewner ordering ($\xi_1 \le_L \xi_2$) if
\[ M(\xi_2) - M(\xi_1) \succeq 0 \quad \text{(it is nonnegative definite).} \tag{6} \]
(ii) A design $\xi_2$ is said to be at least as informative as $\xi_1$ under the Schur ordering ($\xi_1 \le_S \xi_2$) if
\[ M(\xi_2) - \sum_{i=1}^{m} \alpha_i U_i M(\xi_1) U_i^\top \succeq 0 \tag{7} \]
for some orthogonal matrices $U_1, U_2, \ldots, U_m$ and $(\alpha_1, \alpha_2, \ldots, \alpha_m) \in S^{m-1}$ (Cheng, 1995; Harman, 2008). It is clear that condition (6) implies condition (7), and (see, e.g., Marshall & Olkin, 1979, page 56, Theorem A.8) condition (7) implies the condition
\[ \phi_p(M(\xi_1)) \le \phi_p(M(\xi_2)) \quad \text{for all } p \in [-\infty, 1]. \]
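A small worked example may help to separate the two orderings. The matrices $M_1$ and $M_2$ below are illustrative $2 \times 2$ matrices (so $k + 1 = 2$) standing in for $M(\xi_1)$ and $M(\xi_2)$; they are assumptions made for this sketch and are not derived from any particular design in the paper.

```latex
\[
M_1 = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
M_2 = \tfrac12 M_1 + \tfrac12\, U M_1 U^\top
    = \begin{pmatrix} 5/2 & 0 \\ 0 & 5/2 \end{pmatrix}.
\]
% Condition (7) holds with m = 2, U_1 = I, U_2 = U, alpha = (1/2, 1/2),
% because M_2 - (1/2) M_1 - (1/2) U M_1 U^T = 0, which is nonnegative definite.
% Condition (6) fails, because M_2 - M_1 = diag(-3/2, 3/2) is indefinite.
\[
\begin{array}{c|cccc}
 & p = 1 & p = 0 & p = -1 & p = -\infty \\ \hline
\phi_p(M_1) & 5/2 & \sqrt{4 \cdot 1} = 2 & \bigl(\tfrac12(\tfrac14 + 1)\bigr)^{-1} = 8/5 & 1 \\
\phi_p(M_2) & 5/2 & 5/2 & 5/2 & 5/2
\end{array}
\]
```

In this example the two matrices are comparable under the Schur ordering but not under the Loewner ordering, and $\phi_p(M_1) \le \phi_p(M_2)$ holds for each listed $p$, as the implication from condition (7) predicts.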
In addition, the Loewner ordering has the strong property that, for arbitrary $\xi_1, \xi_2, \xi_3 \in \Xi$ and $\alpha \in [0, 1]$, $\xi_1 \le_L \xi_2$ implies $\alpha \xi_1 + (1 - \alpha)\xi_3 \le_L \alpha \xi_2 + (1 - \alpha)\xi_3$.

We call a class of designs $\mathcal{C} \subseteq \Xi$ an essentially complete class with respect to the Schur ordering if, for an arbitrary design $\xi_1 \notin \mathcal{C}$, there exists a design $\xi_2 \in \mathcal{C}$ such that $\xi_1 \le_S \xi_2$. Therefore, for an arbitrary $p$, if a $\phi_p$-optimal design exists, then there exists one in $\mathcal{C}$. If we can obtain an essentially complete class consisting of designs with a simple structure, then we can identify $\phi_p$-optimal designs more easily. Of course, the set of all valid designs $\Xi$ is essentially complete. For a given essentially complete class $\mathcal{C}$, we obtain a smaller essentially complete class by the following rules:
(i) if $\xi_1 \in \mathcal{C}$ and $\xi_1 \le_S \xi_2$ for some $\xi_2$, then we drop $\xi_1$;
(ii) if $\xi_1 \le_L \xi_2$ for some $\xi_2$ and $\xi_3 = w\xi_1 + (1 - w)\xi_4 \in \mathcal{C}$ for some $w \in (0, 1]$, then we drop $\xi_3$.

3. Essentially complete classes
To begin with, we construct an essentially complete class consisting of designs with a simple structure; in the next section we then identify the corresponding optimal designs for given values of the unknown parameters. Let $L_i$ be the nonnegative part of the $i$-th axis of $\mathbb{R}^k_+$, and let $L = \cup_{i=1}^{k} L_i$. Let $\mathcal{C}_1$ be the class consisting of all designs supported on $L$. We first show that $\mathcal{C}_1$ is essentially complete. For a single-point design $A_z$, we define the corresponding boundary design $B_z \in \mathcal{C}_1$ as
\[ B_z = \begin{cases} A_0, & z = 0, \\ \sum_{j=1}^{k} \alpha_j A_{t e_j}, & \text{otherwise}, \end{cases} \]
where $0 = (0, \ldots, 0)^\top$, $t = z^\top \mathbf{1}$, $\alpha_j = z_j / t$, and the $e_j$'s are the columns of the identity matrix $I = [e_1, \ldots, e_k]$. The following lemma shows that for a design $\xi = \sum_{i=1}^{n} w_i A_{z_i}$, there exists a design $\bar\xi = \sum_{i=1}^{n} w_i B_{z_i} \in \mathcal{C}_1$ which is at least as informative as $\xi$ under the Loewner ordering.

Lemma 3.1.
(a) Let $z_1, z_2 \in \mathbb{R}^k_+$ with $z_1 \ne z_2$ and $z_1^\top \mathbf{1} = z_2^\top \mathbf{1}$. Let $z = w z_1 + (1 - w) z_2$ for some $w \in (0, 1)$. Then $A_z \le_L w A_{z_1} + (1 - w) A_{z_2}$, and $M(A_z) \ne M(w A_{z_1} + (1 - w) A_{z_2})$.
(b) $A_z \le_L B_z$. In addition, $M(A_z) = M(B_z)$ if and only if $A_z = B_z$.
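Lemma 3.1(a) can be checked numerically for a particular pair of points. The sketch below assumes the logit link, $k = 2$, and arbitrarily chosen values of $\beta_0$, $w$, $z_1$, and $z_2$; the helper names `info_point` and `info_design` are hypothetical. It compares the difference of the two information matrices with the rank-one form $\lambda(\beta_0 + t)\, w(1 - w)(z_1 - z_2)(z_1 - z_2)^\top$ in the lower-right block that appears in the proof in Appendix A.1.

```python
import math

def lam_logit(c):
    # For the logit link, lambda(c) = F(c) (1 - F(c)).
    p = 1.0 / (1.0 + math.exp(-c))
    return p * (1.0 - p)

def info_point(z, beta0, lam=lam_logit):
    """M(A_z) from equation (3): lam(beta0 + sum z) * (1, z)(1, z)^T."""
    v = [1.0] + list(z)
    s = lam(beta0 + sum(z))
    return [[s * a * b for b in v] for a in v]

def info_design(design, beta0, lam=lam_logit):
    """M(xi) for a design given as a list of (support point, weight) pairs."""
    dim = len(design[0][0]) + 1
    M = [[0.0] * dim for _ in range(dim)]
    for z, w in design:
        Mz = info_point(z, beta0, lam)
        for r in range(dim):
            for c in range(dim):
                M[r][c] += w * Mz[r][c]
    return M

# Illustrative values: two points on the same hyperplane z1 + z2 = 2.
beta0, w = -1.0, 0.3
z1, z2 = (2.0, 0.0), (0.5, 1.5)
z = tuple(w * a + (1.0 - w) * b for a, b in zip(z1, z2))

M_mix = info_design([(z1, w), (z2, 1.0 - w)], beta0)
M_mid = info_design([(z, 1.0)], beta0)
diff = [[M_mix[r][c] - M_mid[r][c] for c in range(3)] for r in range(3)]

# Closed form from the proof: zero first row and column, rank-one lower block.
d = [a - b for a, b in zip(z1, z2)]
s = lam_logit(beta0 + 2.0) * w * (1.0 - w)
expected = [[0.0] * 3] + [[0.0] + [s * d[r] * d[c] for c in range(2)] for r in range(2)]
print(diff)
print(expected)
```

The difference is nonnegative definite and nonzero, so the two-point mixture on the hyperplane is strictly more informative than the single interior point.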
Lemma 3.1(a) shows the necessity of adopting a design space having bounded quantile hyperplanes, that is, one in which each $q$th quantile hyperplane $\{z : \beta_0 + z^\top \mathbf{1} = F^{-1}(q)\}$ is bounded. Indeed, when $k \ge 2$ and the quantile hyperplanes within the design space are unbounded, for an arbitrary design $A_z$ there always exist $z_1$ and $z_2$ in the design space satisfying the conditions of Lemma 3.1(a), and thus a dominating, more extreme design exists. Adopting a design space having bounded quantile hyperplanes, such as our design space $\mathbb{R}^k_+$ or the design space $\prod_{j=1}^{k-1} [U_j, V_j] \times \mathbb{R}$ considered in Yang et al. (2011), avoids this situation. In contrast, in the one-covariate case each quantile hyperplane degenerates to a single point and therefore no boundary condition is needed.

By Lemma 3.1(b), the class $\mathcal{C}_1$, obtained by dropping all designs in $\Xi$ with at least one support point not on $L$, is essentially complete. Borrowing the idea of permutation invariance introduced in Cheng (1995), we obtain an even smaller essentially complete class $\mathcal{C}_2$ consisting of all symmetric designs in $\mathcal{C}_1$, as follows. Let $\mathcal{P}$ be the set of all $k \times k$ permutation matrices. Note that each design $\bar\xi = \sum_{i=1}^{n'} w_i B_{z_i} \in \mathcal{C}_1$ can be represented as $\bar\xi = \sum_{i=1}^{n} w_i A_{t_i e_{(i)}}$ for some $n \ge n'$, where $(w_1, \ldots, w_n) \in S^{n-1}$, $t_i \ge 0$, and $e_{(i)}$ is one of $\{e_1, \ldots, e_k\}$. For such $\bar\xi \in \mathcal{C}_1$, we define the corresponding symmetric design $\tilde\xi$ as
\[ \tilde\xi = \sum_{i=1}^{n} \sum_{P \in \mathcal{P}} \frac{w_i}{k!} A_{t_i P e_{(i)}} = \sum_{i=1}^{n} w_i D_{t_i}, \]
where
\[ D_t = \begin{cases} A_0, & t = 0, \\ \sum_{j=1}^{k} \frac{1}{k} A_{t e_j}, & t > 0. \end{cases} \]
Such a $\tilde\xi$, as well as the design $D_t$, still belongs to $\mathcal{C}_1$. The information matrix of the design $D_t$ is
\[ M(D_t) = \frac{\lambda(\beta_0 + t)}{k} \begin{pmatrix} k & t \mathbf{1}^\top \\ t \mathbf{1} & t^2 I \end{pmatrix}. \tag{8} \]
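The symmetrization step and equation (8) can be illustrated numerically. The sketch below assumes the logit link and arbitrary illustrative values $k = 3$, $t = 1.7$, $\beta_0 = -0.5$; the function names are hypothetical. It averages $M(A_{t P e_1})$ over all $k!$ coordinate permutations and compares the result with the closed form (8).

```python
import math
from itertools import permutations

def lam_logit(c):
    p = 1.0 / (1.0 + math.exp(-c))
    return p * (1.0 - p)

def info_point(z, beta0):
    """M(A_z) from equation (3) under the logit link."""
    v = [1.0] + list(z)
    s = lam_logit(beta0 + sum(z))
    return [[s * a * b for b in v] for a in v]

def info_symmetrized(t, k, beta0, axis=0):
    """Average of M(A_{t P e_axis}) over all k! permutation matrices P."""
    base = [0.0] * k
    base[axis] = t
    perms = list(permutations(range(k)))
    M = [[0.0] * (k + 1) for _ in range(k + 1)]
    for perm in perms:
        z = [base[j] for j in perm]  # permuted copy of the axis point
        Mz = info_point(z, beta0)
        for r in range(k + 1):
            for c in range(k + 1):
                M[r][c] += Mz[r][c] / len(perms)
    return M

def info_Dt(t, k, beta0):
    """Closed form (8): (lambda(beta0 + t) / k) * [[k, t 1^T], [t 1, t^2 I]]."""
    s = lam_logit(beta0 + t) / k
    M = [[0.0] * (k + 1) for _ in range(k + 1)]
    M[0][0] = s * k
    for j in range(1, k + 1):
        M[0][j] = M[j][0] = s * t
        M[j][j] = s * t * t
    return M

M_avg = info_symmetrized(1.7, 3, -0.5)
M_closed = info_Dt(1.7, 3, -0.5)
max_gap = max(abs(M_avg[r][c] - M_closed[r][c]) for r in range(4) for c in range(4))
print(max_gap)
```

Each axis direction receives the same share $1/k$ of the weight after averaging, which is why the lower-right block becomes a multiple of the identity.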
Similar to the result in Cheng (1995), the following lemma shows that, because of the permutation-invariance property, asymmetric designs do not perform better than the corresponding symmetric ones. Therefore, after dropping all asymmetric designs in $\mathcal{C}_1$, the class
\[ \mathcal{C}_2 = \left\{ \sum_{i=1}^{n} w_i D_{t_i} : n \in \mathbb{N},\ (w_1, \ldots, w_n) \in S^{n-1},\ t_i \ge 0 \right\} \subset \mathcal{C}_1 \]
is still essentially complete.

Lemma 3.2. For arbitrary $\bar\xi = \sum_{i=1}^{n} w_i A_{t_i e_{(i)}} \in \mathcal{C}_1$, we have that $\bar\xi \le_S \tilde\xi = \sum_{i=1}^{n} w_i D_{t_i} \in \mathcal{C}_2$.
Note that searching for an optimal design among the designs in $\mathcal{C}_2$ is a one-dimensional optimal design problem. That is, each design $\tilde\xi = \sum_{i=1}^{n} w_i D_{t_i}$ can be represented as a pseudo design $\eta = \{(h_i, w_i)\}_{i=1}^{n}$, where $h_i = t_i + \beta_0 \in [\beta_0, \infty)$, and the information matrix of $\eta$ is
\[ M(\tilde\xi) = M_{\mathrm{pseudo}}(\eta) = \sum_{i=1}^{n} \frac{w_i}{k} \begin{pmatrix} k \Psi_0(h_i) & \{\Psi_1(h_i) - \beta_0 \Psi_0(h_i)\} \mathbf{1}^\top \\ \{\Psi_1(h_i) - \beta_0 \Psi_0(h_i)\} \mathbf{1} & \{\Psi_2(h_i) - 2\beta_0 \Psi_1(h_i) + \beta_0^2 \Psi_0(h_i)\} I \end{pmatrix}, \tag{9} \]
where $\Psi_\ell(h) = h^\ell \lambda(h)$ for $\ell = 0, 1, 2$. Therefore, the abundant techniques for design problems with a one-dimensional design space can be applied, such as Tchebycheff systems (Karlin and Studden, 1966) and the de la Garza phenomenon (Yang, 2010).

4. The D-optimal designs for commonly used links
Note that Lemma 3.2 holds for arbitrary link functions. In this section, we only consider D-optimal designs for the logit, probit, double exponential, and double reciprocal models. For a model with another link function, such as the log-log or complementary log-log link, a D-optimal design can also be obtained numerically within the class $\mathcal{C}_2$. We adopt the results for one-dimensional optimal designs in Yang and Stufken (2009). The following lemma, which is an extension of their Theorems 2 and 3, provides a unified structure of the optimal designs for the logit, probit, double exponential, and double reciprocal models with an arbitrary number of covariates.
Lemma 4.1. For model (2) with $F = F_L$, $F_P$, $F_{DE}$, or $F_{DR}$, and for a pseudo design $\eta = \{(h_i, w_i)\}_{i=1}^{n}$ with $n \ge n^*$ and $h_i \in [\beta_0, \infty)$, there exists a pseudo design $\eta^* = \{(h_i^*, w_i^*)\}_{i=1}^{n^*}$ with $h_1^* > \cdots > h_{n^*}^* \ge \beta_0$ and $(w_1^*, \ldots, w_{n^*}^*) \in S^{n^* - 1}$, such that $\eta \le_L \eta^*$, where
(a) if $\beta_0 \ge 0$, then $n^* = 2$ and $h_1^* > h_2^* = \beta_0$;
(b) if $\beta_0 < 0$ and $F = F_L$ or $F_P$, then $n^* = 2$ and $h_1^* > 0 > h_2^* = \max(-h_1^*, \beta_0)$;
(c) if $\beta_0 < 0$ and $F = F_{DE}$ or $F_{DR}$, then $n^* = 3$ and $h_1^* > h_2^* = 0 > h_3^* = \max(-h_1^*, \beta_0)$.
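Because Lemma 4.1(a) reduces the search to two-point pseudo designs with lower point $\beta_0$ when $\beta_0 \ge 0$, a crude numerical search is already feasible. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses the logit link, $k = 2$, $\beta_0 = 0$, a plain grid over the upper point $h$ and the weight on $\beta_0$, and the identity $\log|M_{\mathrm{pseudo}}(\eta)| = -k \log k + (k-1)\log a_2 + \log(a_0 a_2 - a_1^2)$, which follows from the block structure of equation (9) with $a_\ell = \sum_i w_i (h_i - \beta_0)^\ell \lambda(h_i)$. All function names are hypothetical, and a different weight function could be passed through the `lam` argument.

```python
import math

def lam_logit(c):
    p = 1.0 / (1.0 + math.exp(-c))
    return p * (1.0 - p)

def log_det_pseudo(points, k, beta0, lam=lam_logit):
    """log|M_pseudo(eta)| for eta = [(h, w), ...], from the block form of (9)."""
    a0 = sum(w * lam(h) for h, w in points)
    a1 = sum(w * (h - beta0) * lam(h) for h, w in points)
    a2 = sum(w * (h - beta0) ** 2 * lam(h) for h, w in points)
    schur = a0 * a2 - a1 * a1
    if a2 <= 0.0 or schur <= 0.0:
        return -math.inf  # singular information matrix
    return -k * math.log(k) + (k - 1) * math.log(a2) + math.log(schur)

def grid_search_two_point(k, beta0, h_max=8.0, step=0.01):
    """Grid search over designs {(h, 1 - w0), (beta0, w0)} with h > beta0."""
    best = (-math.inf, None, None)
    n_h = int(round((h_max - beta0) / step))
    for i in range(1, n_h + 1):
        h = beta0 + i * step
        for j in range(1, 100):
            w0 = j / 100.0  # weight placed on the lower point beta0
            val = log_det_pseudo([(h, 1.0 - w0), (beta0, w0)], k, beta0)
            if val > best[0]:
                best = (val, h, w0)
    return best

best_val, best_h, best_w0 = grid_search_two_point(k=2, beta0=0.0)
print(best_h, best_w0)
```

For these inputs the grid optimum should sit near the weight $1/(k+1)$ on $\beta_0$ and near the root of $2 + \dot\lambda(h)(h - \beta_0) = 0$, which is the structure stated in Theorem 4.2(a).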
Lemma 4.1 works on the scaled covariates. In general, a $\phi_p$-optimal design for the scaled covariates is not $\phi_p$-optimal for the original covariates, except for $p = 0$, that is, D-optimality. Lemma 4.1(a,b) shows that D-optimal designs for the logit and probit models can be taken to be two-point pseudo designs whose pseudo design points are either symmetric about zero or have the lower boundary $\beta_0$ as the smaller point, which explains the structure of the D-optimal designs provided in Haines et al. (2007) for $F = F_L$ and $k = 2$. We further characterize a D-optimal design for the logit and probit models with an arbitrary $k$.
Theorem 4.2. For model (2) with $F = F_L$ or $F_P$, let the threshold $\beta_0^* < 0$ be the negative root of
\[ 1 + \dot\lambda(\beta_0)\beta_0 = 0, \tag{10} \]
where $\dot\lambda(c) = d \log(\lambda(c))/dc$. The design $\xi^* = (1 - w^*) D_{t_1^*} + w^* D_{t_2^*}$ of the following structure is D-optimal.
(a) When $\beta_0 \ge \beta_0^*$, we have $(t_1^*, t_2^*, w^*) = (h_a - \beta_0, 0, \frac{1}{k+1})$, where $h_a > \max(0, \beta_0)$ is a root of
\[ 2 + \dot\lambda(h)(h - \beta_0) = 0. \tag{11} \]
(b) When $\beta_0 < \beta_0^*$, we have $(t_1^*, t_2^*, w^*) = (h_b - \beta_0, -h_b - \beta_0, w_b)$, where $h_b \in (0, -\beta_0]$ is a root of
\[ h^2 \{2k - 1 + k \dot\lambda(h) h\} + \beta_0^2 \{1 + k \dot\lambda(h) h\} + \gamma(h) \{1 + \dot\lambda(h) h\} = 0, \tag{12} \]
\[ \gamma(h) = \sqrt{(2k\beta_0 h)^2 + (h^2 - \beta_0^2)^2}, \]
and
\[ w_b = \frac{2k\beta_0 h_b - (h_b - \beta_0)^2 + \gamma(h_b)}{4(k + 1)\beta_0 h_b}. \tag{13} \]
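Theorem 4.2 translates directly into a small root-finding routine. The following sketch covers the logit link only, for which $\lambda(c) = F_L(c)(1 - F_L(c))$ and hence $\dot\lambda(c) = 1 - 2F_L(c) = -\tanh(c/2)$. The function names and the bisection brackets are assumptions made for this illustration; the bracket for equation (12) relies on the left-hand side being positive near $h = 0$ and negative at $h = -\beta_0$ when $\beta_0 < \beta_0^*$.

```python
import math

def lam_dot(c):
    """d log(lambda)/dc for the logit link."""
    return -math.tanh(c / 2.0)

def bisect(g, lo, hi, tol=1e-12):
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0.0) == (g_lo > 0.0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def threshold():
    """Negative root of equation (10): 1 + lam_dot(b) * b = 0."""
    return bisect(lambda b: 1.0 + lam_dot(b) * b, -5.0, -0.1)

def gamma(h, k, beta0):
    return math.sqrt((2 * k * beta0 * h) ** 2 + (h * h - beta0 * beta0) ** 2)

def eq12(h, k, beta0):
    ld = lam_dot(h) * h
    return (h * h * (2 * k - 1 + k * ld) + beta0 * beta0 * (1 + k * ld)
            + gamma(h, k, beta0) * (1 + ld))

def d_optimal_logit(k, beta0):
    """Return (t1, t2, w) with xi* = (1 - w) D_{t1} + w D_{t2}, following Theorem 4.2."""
    if beta0 >= threshold():
        lo = max(0.0, beta0)
        # Equation (11): 2 + lam_dot(h) (h - beta0) = 0 with h > max(0, beta0).
        h_a = bisect(lambda h: 2.0 + lam_dot(h) * (h - beta0), lo + 1e-9, lo + 50.0)
        return (h_a - beta0, 0.0, 1.0 / (k + 1))
    # Equation (12) on (0, -beta0], then the weight from equation (13).
    h_b = bisect(lambda h: eq12(h, k, beta0), 1e-9, -beta0)
    w_b = ((2 * k * beta0 * h_b - (h_b - beta0) ** 2 + gamma(h_b, k, beta0))
           / (4 * (k + 1) * beta0 * h_b))
    return (h_b - beta0, -h_b - beta0, w_b)

print(threshold())
print(d_optimal_logit(1, -3.0))
print(d_optimal_logit(2, 0.0))
```

For $k = 1$ the routine returns the equal-weight symmetric design, in line with the single-covariate results cited in the text.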
Figure 1: Locally D-optimal designs for the logit model under several $k$ and $\beta_0$, where $\xi^* = (1 - w^*) D_{t_1^*} + w^* D_{t_2^*}$: $k = 1$ (solid); $k = 2$ (dashed); $k = 4$ (dotted); $k = 9$ (dot-dashed). Panel (a): values of $t_1^*$ (upper) and $t_2^*$ (lower) against $\beta_0$; panel (b): weight $w^*$ against $\beta_0$; the threshold $\beta_0^*$ is marked in both panels.

Under the logit model, the threshold $\beta_0^* = -1.5434$ is independent of the dimension $k$, and we obtain locally D-optimal designs under some selected $k$ and $\beta_0$, as shown in Figure 1. Some properties observed from Theorem 4.2 and Figure 1 are listed below:

(i) When $k = 1$, equation (12) reduces to $1 + \dot\lambda(h) h = 0$ and equation (13) reduces to $w_b = 1/2$, which agrees with the results in Sitter and Wu (1993). When $k = 2$, the theorem agrees with the results in Section 3 of Haines et al. (2007).

(ii) Note that $\beta_0 \ge \beta_0^*$ if and only if the response probability at $0$, $F_L(\beta_0)$, is at least $F_L(\beta_0^*) = 0.1760$. In this situation, by Theorem 4.2(a) there is a locally D-optimal design with $k + 1$ support points, where $t_2^* = 0$, and $t_1^* = h_a - \beta_0$ depends only on the intercept $\beta_0$ and not on the dimension $k$.

(iii) When $\beta_0 < \beta_0^*$, by Theorem 4.2(b) there is a locally D-optimal design with $2k$ support points. These $2k$ support points are the intersection points of the $k$ axes with the two symmetric quantile hyperplanes with response probabilities $F_L(-h_b)$ and $F_L(h_b)$, respectively. For a fixed $k$, $h_b$ decreases to the root of $\dot\lambda(h) h + \frac{2}{1 + k} = 0$ as $\beta_0$ decreases to $-\infty$; for a fixed $\beta_0 < \beta_0^*$, $h_b$ decreases to the root of $(h - \beta_0)\dot\lambda(h) + 2 = 0$ as $k$ increases.
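The threshold in item (ii) can be reproduced by solving equation (10) directly. The sketch below does this for the two links covered by Theorem 4.2, using $\dot\lambda(c) = -\tanh(c/2)$ for the logit link and $\dot\lambda(c) = -2c - \varphi(c)/\Phi(c) + \varphi(c)/(1 - \Phi(c))$ for the probit link, where $\varphi$ and $\Phi$ are the standard normal density and distribution function. These closed forms follow from $\lambda = F'^2/\{F(1 - F)\}$; the function names and the bisection bracket $[-5, -0.1]$ are assumptions made for this illustration.

```python
import math

def Phi(c):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def phi(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def lam_dot_logit(c):
    return -math.tanh(c / 2.0)

def lam_dot_probit(c):
    return -2.0 * c - phi(c) / Phi(c) + phi(c) / (1.0 - Phi(c))

def negative_root_eq10(lam_dot, lo=-5.0, hi=-0.1, tol=1e-12):
    """Bisection for 1 + lam_dot(b) * b = 0 on an assumed bracket [lo, hi]."""
    g = lambda b: 1.0 + lam_dot(b) * b
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0.0) == (g_lo > 0.0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b_logit = negative_root_eq10(lam_dot_logit)
b_probit = negative_root_eq10(lam_dot_probit)
p_logit = 1.0 / (1.0 + math.exp(-b_logit))   # F_L at the logit threshold
p_probit = Phi(b_probit)                      # F_P at the probit threshold
print(b_logit, p_logit)
print(b_probit, p_probit)
```

The printed values can be compared with the thresholds and response probabilities reported in the text for the two models.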
The D-optimal designs for the probit model have similar properties and patterns (see Figure 2), with threshold $\beta_0^* = -1.1381$ and $F_P(\beta_0^*) = 0.1275$.

Figure 2: Locally D-optimal designs for the probit model under several $k$ and $\beta_0$, where $\xi^* = (1 - w^*) D_{t_1^*} + w^* D_{t_2^*}$: $k = 1$ (solid); $k = 2$ (dashed); $k = 4$ (dotted); $k = 9$ (dot-dashed). Panel (a): values of $t_1^*$ (upper) and $t_2^*$ (lower) against $\beta_0$; panel (b): weight $w^*$ against $\beta_0$; the threshold $\beta_0^*$ is marked in both panels.

The D-optimal designs for the double exponential and double reciprocal models are more complicated. For their characterization, we introduce the following notation. Let
\[ \psi = \psi(t_1, t_2, t_3; w_1, w_2, w_3) = (k - 1) \log(\psi_2) + \log(\psi_2 \psi_0 - \psi_1^2), \]
for $t_1 > t_2 > t_3 \ge \beta_0$ and $(w_1, w_2, w_3) \in S^2$, where $\psi_\ell = \psi_\ell(t_1, t_2, t_3; w_1, w_2, w_3) = \sum_{j=1}^{3} w_j \Psi_\ell(t_j)$ for $\ell = 0, 1, 2$. We have the following theorem.

Theorem 4.3. For model (2) with $F = F_{DE}$ or $F_{DR}$, the design $\xi^*$ of the following structure is D-optimal.
(a) When $\beta_0 \ge 0$, $\xi^* = \frac{k}{k+1} D_{(h_a - \beta_0)} + \frac{1}{k+1} D_0$, where $h_a > \beta_0$ is a root of equation (11).
(b) When $\beta_0 < 0$, we let $(h_1; w_1', w_2', w_3')$ and $(h_2; w_1'', w_2'', w_3'')$ satisfy
\[ (h_1; w_1', w_2', w_3') = \arg\max_{h > 0,\ (w_1, w_2, w_3) \in S^2} \psi(h - \beta_0, -\beta_0, -h - \beta_0; w_1, w_2, w_3) \]
and
\[ (h_2; w_1'', w_2'', w_3'') = \arg\max_{h > 0,\ (w_1, w_2, w_3) \in S^2} \psi(h - \beta_0, -\beta_0, 0; w_1, w_2, w_3). \]
The design $\xi^* = \sum_{i=1}^{3} w_i^* D_{t_i^*}$, where
\[ (t_1^*, t_2^*, t_3^*; w_1^*, w_2^*, w_3^*) = (h_1 - \beta_0, -\beta_0, -h_1 - \beta_0; w_1', w_2', w_3') \quad \text{if } h_1 < -\beta_0, \text{ and} \tag{14} \]
\[ (t_1^*, t_2^*, t_3^*; w_1^*, w_2^*, w_3^*) = (h_2 - \beta_0, -\beta_0, 0; w_1'', w_2'', w_3'') \quad \text{otherwise.} \tag{15} \]
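Part (a) of Theorem 4.3 only requires a root of equation (11). The sketch below handles the case $\beta_0 \ge 0$ for both links, using closed forms valid for $h > 0$ that follow from the link definitions: $\lambda(h) = 1/(2e^h - 1)$ for the double exponential model and $\lambda(h) = 1/\{(1 + h)^2 (2h + 1)\}$ for the double reciprocal model. The function names and the bracket-expansion strategy are assumptions made for this illustration.

```python
import math

def lam_dot_dexp(h):
    """d log(lambda)/dh for the double exponential link, valid for h >= 0."""
    return -2.0 * math.exp(h) / (2.0 * math.exp(h) - 1.0)

def lam_dot_drec(h):
    """d log(lambda)/dh for the double reciprocal link, valid for h >= 0."""
    return -2.0 / (1.0 + h) - 2.0 / (2.0 * h + 1.0)

def solve_h_a(lam_dot, beta0, tol=1e-12):
    """Root h_a > beta0 of equation (11), 2 + lam_dot(h) (h - beta0) = 0, for beta0 >= 0."""
    g = lambda h: 2.0 + lam_dot(h) * (h - beta0)
    lo, hi = beta0, beta0 + 1.0   # g(beta0) = 2 > 0
    while g(hi) > 0.0:            # expand until the sign changes
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def design_part_a(lam_dot, k, beta0):
    """Return (t1, weight on D_{t1}, weight on D_0) as in Theorem 4.3(a)."""
    h_a = solve_h_a(lam_dot, beta0)
    return (h_a - beta0, k / (k + 1.0), 1.0 / (k + 1.0))

h_dexp = solve_h_a(lam_dot_dexp, 0.0)
h_drec = solve_h_a(lam_dot_drec, 0.0)
print(h_dexp, h_drec)
print(design_part_a(lam_dot_drec, 2, 0.5))
```

With $\beta_0 = 0$, equation (11) simplifies to $e^h (2 - h) = 1$ for the double exponential model and to $h^2 - h - 1 = 0$ for the double reciprocal model, which gives a convenient check on the numerical roots.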
We use Theorem 4.3 to obtain D-optimal designs for the double exponential and double reciprocal models under some selected $k$ and $\beta_0$; see Figures 3 and 4, respectively. From these figures and Theorem 4.3, we observe the following:

(i) When $\beta_0$ is very negative, the D-optimal design has $3k$ support points ($k$ points for each $t_i^*$), and the corresponding weights are ordered as $w_2^* > w_1^* > w_3^*$. As $\beta_0$ increases, $w_2^*$ increases and the other two weights decrease.

(ii) When $\beta_0$ exceeds $\beta_a$, $D_{t_3^*}$ vanishes.

(iii) When $\beta_0$ exceeds $\beta_b$, the weight $w_1^*$ becomes larger than $w_2^*$.

(iv) When $\beta_0$ exceeds $\beta_c = 0$, the obtained D-optimal design $\xi^*$ is minimally supported on $k + 1$ support points.
5. Conclusion and future works
In this work, with the relatively large design space $\mathbb{R}^k_+$, we have provided an essentially complete class of designs for generalized linear models with binary responses, as well as locally D-optimal designs for four commonly used link functions. In practice, most design spaces are bounded, such as those in factorial or mixture experiments. Our results are useful in such cases: when the optimal designs obtained here fall into the design space of interest, they are also optimal on that design space. On the other hand, even if the optimal design obtained does not fall into the design space under study, we may scale it in a certain way to a feasible one and use it as an initial design for a numerical search. Moreover, it may provide a way of evaluating the efficiency of numerically obtained optimal designs, acting as a benchmark for comparison. These issues will be investigated further in the future.
Acknowledgments
The authors were supported by the Ministry of Science and Technology, Taiwan, with grants MOST 107-2118-M-008-005-MY2 and MOST 103-2118-M-110-002-MY2.
Figure 3: Locally D-optimal designs for the double exponential model under $k = 2$ or $4$ and several $\beta_0$, where $\xi^* = \sum_{i=1}^{2} w_i^* D_{t_i^*}$ or $\sum_{i=1}^{3} w_i^* D_{t_i^*}$: the solid, dashed, and dotted curves correspond to the largest to the smallest $t_i^*$. Thresholds $\beta_a$, $\beta_b$, and $\beta_c$ indicate when $D_{t_3^*}$ vanishes, when $w_1^* = w_2^* = 0.5$, and when $\xi^*$ is minimally supported, respectively. Panels: (a) the values of $t_i^*$ under $k = 2$; (b) the values of $w_i^*$ under $k = 2$; (c) the values of $t_i^*$ under $k = 4$; (d) the values of $w_i^*$ under $k = 4$; the horizontal axis is $\beta_0$ in each panel.

Figure 4: Locally D-optimal designs for the double reciprocal model under $k = 2$ or $4$ and several $\beta_0$, where $\xi^* = \sum_{i=1}^{2} w_i^* D_{t_i^*}$ or $\sum_{i=1}^{3} w_i^* D_{t_i^*}$: the solid, dashed, and dotted curves correspond to the largest to the smallest $t_i^*$. Thresholds $\beta_a$, $\beta_b$, and $\beta_c$ indicate when $D_{t_3^*}$ vanishes, when $w_1^* = w_2^* = 0.5$, and when $\xi^*$ is minimally supported, respectively. Panels: (a) the values of $t_i^*$ for $k = 2$; (b) the values of $w_i^*$ for $k = 2$; (c) the values of $t_i^*$ for $k = 4$; (d) the values of $w_i^*$ for $k = 4$; the horizontal axis is $\beta_0$ in each panel.

Appendix

A.1. Proof of Lemma 3.1

(a) It is sufficient to show that $M(w A_{z_1} + (1 - w) A_{z_2}) - M(A_z)$ is nonnegative definite and is not a zero matrix. Let $t = z_1^\top \mathbf{1} = z_2^\top \mathbf{1} = z^\top \mathbf{1}$. By equation (3), we have that
\[ \begin{aligned} M(w A_{z_1} + (1 - w) A_{z_2}) - M(A_z) &= \lambda(\beta_0 + t) \left\{ w \begin{pmatrix} 1 & z_1^\top \\ z_1 & z_1 z_1^\top \end{pmatrix} + (1 - w) \begin{pmatrix} 1 & z_2^\top \\ z_2 & z_2 z_2^\top \end{pmatrix} - \begin{pmatrix} 1 & z^\top \\ z & z z^\top \end{pmatrix} \right\} \\ &= \lambda(\beta_0 + t) \begin{pmatrix} 0 & 0^\top \\ 0 & w(1 - w)(z_1 - z_2)(z_1 - z_2)^\top \end{pmatrix}, \end{aligned} \]
which is nonnegative definite and is not a zero matrix, because $\lambda(\beta_0 + t) > 0$, $w \in (0, 1)$, and $z_1 \ne z_2$.
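(b) Since $z^\top \mathbf{1} = (t e_1)^\top \mathbf{1} = \cdots = (t e_k)^\top \mathbf{1}$, the result $A_z \le_L B_z$ is a simple extension of (a) and its proof is therefore omitted. We only need to show that, when $z \ne 0$, $M(A_z) = M(B_z)$ implies $A_z = B_z$. By a calculation similar to that in (a), $M(A_z) = M(B_z)$ implies that $z z^\top = t^2 \sum_{j=1}^{k} \alpha_j e_j e_j^\top$, and as a result $z z^\top$ is a diagonal matrix. Therefore, $z$ has at most one nonzero entry and hence $A_z = B_z$.

A.2. Proof of Lemma 3.2

Let
\[ \mathcal{P}_+ = \left\{ P_+ = \begin{pmatrix} 1 & 0^\top \\ 0 & P \end{pmatrix} : P \in \mathcal{P} \right\}. \]
Note that $\mathcal{P}_+$ has $k!$ elements and each $P_+ \in \mathcal{P}_+$ is orthogonal. Therefore,
\[ \begin{aligned} M(\tilde\xi) - \sum_{P_+ \in \mathcal{P}_+} \frac{1}{k!} P_+ M(\bar\xi) P_+^\top &= \sum_{i=1}^{n} w_i M(D_{t_i}) - \sum_{P_+ \in \mathcal{P}_+} \frac{1}{k!} P_+ \left( \sum_{i=1}^{n} w_i M(A_{t_i e_{(i)}}) \right) P_+^\top \\ &= \sum_{i=1}^{n} w_i \lambda(\beta_0 + t_i) \left\{ \begin{pmatrix} 1 & \frac{t_i}{k} \mathbf{1}^\top \\ \frac{t_i}{k} \mathbf{1} & \frac{t_i^2}{k} I \end{pmatrix} - \sum_{P_+ \in \mathcal{P}_+} \frac{1}{k!} P_+ \begin{pmatrix} 1 & t_i e_{(i)}^\top \\ t_i e_{(i)} & t_i^2 e_{(i)} e_{(i)}^\top \end{pmatrix} P_+^\top \right\} \\ &= \sum_{i=1}^{n} w_i \lambda(\beta_0 + t_i)\, O_{(k+1) \times (k+1)} \\ &= O_{(k+1) \times (k+1)}, \end{aligned} \]
where $O_{(k+1) \times (k+1)}$ is the $(k + 1) \times (k + 1)$ zero matrix and is nonnegative definite. Therefore, $\bar\xi \le_S \tilde\xi$.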
A.3. Proof of Lemma 4.1

By the proofs of Theorems 2 and 3 in Yang and Stufken (2009), there exist corresponding $h_1^*, \ldots, h_{n^*}^*$ and $w_1^*, \ldots, w_{n^*}^*$ which satisfy the statements of this lemma such that
\[ \sum_{i=1}^{n} w_i \Psi_0(h_i) = \sum_{i=1}^{n^*} w_i^* \Psi_0(h_i^*), \qquad \sum_{i=1}^{n} w_i \Psi_1(h_i) = \sum_{i=1}^{n^*} w_i^* \Psi_1(h_i^*), \qquad \text{and} \qquad \sum_{i=1}^{n} w_i \Psi_2(h_i) < \sum_{i=1}^{n^*} w_i^* \Psi_2(h_i^*). \]
Therefore, by equation (9), we have that $M_{\mathrm{pseudo}}(\eta^*) - M_{\mathrm{pseudo}}(\eta)$ is nonnegative definite and thus $\eta \le_L \eta^*$.
A.4. Proof of Theorem 4.2
It is sufficient to show that the corresponding pseudo design is D-optimal. By Lemma 4.1, if a two-point pseudo design is D-optimal, the two points must be symmetric when $\beta_0$ is small enough, and otherwise the smaller one must be $\beta_0$. Therefore, the threshold $\beta_0^* < 0$ exists. As a result, we prove this theorem in the following steps: (i) when $\beta_0$ is small enough, the pseudo design $\eta_b = \{(h_b, 1 - w_b), (-h_b, w_b)\}$ is D-optimal; (ii) otherwise, the pseudo design $\eta_a = \{(h_a, \frac{k}{k+1}), (\beta_0, \frac{1}{k+1})\}$ is D-optimal; (iii) we characterize the threshold $\beta_0^*$.
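(i) Let $(t_1^*, t_2^*, w^*) = (h_b - \beta_0, -h_b - \beta_0, w_b)$. By equation (9) and $\lambda(-h_b) = \lambda(h_b)$, the determinant of $M_{\mathrm{pseudo}}(\eta_b)$ is
\[ \begin{aligned} |M_{\mathrm{pseudo}}(\eta_b)| &= \left( \frac{\lambda(h_b)}{k} \right)^{k+1} \begin{vmatrix} k & ((1 - w_b) t_1^* + w_b t_2^*) \mathbf{1}^\top \\ ((1 - w_b) t_1^* + w_b t_2^*) \mathbf{1} & ((1 - w_b) t_1^{*2} + w_b t_2^{*2}) I \end{vmatrix} \\ &= k^{-k} \lambda(h_b)^{k+1} \left( (1 - w_b) t_1^{*2} + w_b t_2^{*2} \right)^{k} \left( 1 - \frac{((1 - w_b) t_1^* + w_b t_2^*)^2}{(1 - w_b) t_1^{*2} + w_b t_2^{*2}} \right) \\ &= k^{-k} \lambda(h_b)^{k+1} \left( \beta_0^2 + h_b^2 - 2 h_b \beta_0 + 4 w_b h_b \beta_0 \right)^{k-1} \left( 4 w_b (1 - w_b) h_b^2 \right). \end{aligned} \]
By solving $\partial \log |M_{\mathrm{pseudo}}(\eta_b)| / \partial w_b = 0$, we have
\[ w_b = \frac{2k h_b \beta_0 - (h_b - \beta_0)^2 \pm \gamma(h_b)}{4(k + 1) h_b \beta_0}. \]
Since $h_b \beta_0 < 0$, we have $\gamma(h_b)^2 = \{2k h_b \beta_0 - (h_b - \beta_0)^2\}^2 + 4(k + 1) h_b \beta_0 (h_b - \beta_0)^2 < \{2k h_b \beta_0 - (h_b - \beta_0)^2\}^2$, so both roots are positive; the root with the minus sign is at least $1$, and hence the optimal weight $w_b$ must be as in equation (13). Finally, by solving $\partial \log |M_{\mathrm{pseudo}}(\eta_b)| / \partial h_b = 0$ together with (13), we obtain equation (12).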
(ii) For the pseudo design $\eta_a = \{(h_a, 1 - w_a), (\beta_0, w_a)\}$, the corresponding design $\xi_a = (1 - w_a) D_{h_a - \beta_0} + w_a D_0$ has exactly $k + 1$ support points and is D-optimal for a model with $k + 1$ unknown parameters. Hence, the weight on each support point must be $\frac{1}{k+1}$; see, for example, Section 8.12 of Pukelsheim (2006). This implies that $w_a = \frac{1}{k+1}$. For the characterization of $h_a$, we have, by equation (9), that
\[ |M_{\mathrm{pseudo}}(\eta_a)| \propto \begin{vmatrix} \lambda(\beta_0) + k \lambda(h_a) & (h_a - \beta_0) \lambda(h_a) \mathbf{1}^\top \\ (h_a - \beta_0) \lambda(h_a) \mathbf{1} & (h_a - \beta_0)^2 \lambda(h_a) I \end{vmatrix}. \]
By solving $\partial \log |M_{\mathrm{pseudo}}(\eta_a)| / \partial h_a = 0$, we have that $h_a$ satisfies equation (11).
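The last step of (ii) can be written out explicitly. The following derivation is a sketch that uses only the block determinant identity $\left|\begin{smallmatrix} a & b \mathbf{1}^\top \\ b \mathbf{1} & c I \end{smallmatrix}\right| = c^k (a - k b^2 / c)$ for $c > 0$ and the definition $\dot\lambda(c) = d \log \lambda(c)/dc$; the shorthand $a$, $b$, $c$ is introduced here for illustration only.

```latex
% Entries of the determinant in (ii), with t = h_a - \beta_0 > 0:
%   a = \lambda(\beta_0) + k\lambda(h_a), \quad b = t\lambda(h_a), \quad c = t^2\lambda(h_a).
\[
\begin{vmatrix} a & b\mathbf{1}^\top \\ b\mathbf{1} & cI \end{vmatrix}
  = c^{k}\Bigl(a - \frac{k b^{2}}{c}\Bigr)
  = \bigl\{(h_a-\beta_0)^{2}\lambda(h_a)\bigr\}^{k}
    \bigl\{\lambda(\beta_0) + k\lambda(h_a) - k\lambda(h_a)\bigr\}
  = \lambda(\beta_0)\bigl\{(h_a-\beta_0)^{2}\lambda(h_a)\bigr\}^{k}.
\]
\[
\frac{\partial}{\partial h_a}\log\lvert M_{\mathrm{pseudo}}(\eta_a)\rvert
  = k\Bigl\{\frac{2}{h_a-\beta_0} + \dot\lambda(h_a)\Bigr\} = 0
\quad\Longleftrightarrow\quad
2 + \dot\lambda(h_a)(h_a-\beta_0) = 0 .
\]
```

The factor $\lambda(\beta_0)$ does not involve $h_a$, which is why the stationarity condition, and hence $t_1^* = h_a - \beta_0$, does not depend on $k$.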
(iii) By (i) and (ii), when $\beta_0 = \beta_0^*$ the two pseudo designs coincide, so that $(h_a, \beta_0) = (h_b, -h_b)$ and hence $h_a = h_b = -\beta_0$. Therefore, by simplifying equation (11) with $h = -\beta_0$ and using $\dot\lambda(c) = -\dot\lambda(-c)$ when $F = F_L$ or $F_P$, we obtain that $\beta_0^*$ satisfies equation (10).

A.5. Proof of Theorem 4.3

By applying Lemma 4.1(a), the proof of (a) is similar to the proof in Appendix A.4(ii) and is therefore omitted. To prove (b), note that when $\beta_0 < 0$, Lemma 4.1(c) indicates that there exists a D-optimal design $\xi^* = \sum_{i=1}^{3} w_i^* D_{t_i^*}$, where $(w_1^*, w_2^*, w_3^*) \in S^2$, $t_i^* = h_i^* - \beta_0$, and $h_1^* > h_2^* = 0 > h_3^* = \max(-h_1^*, \beta_0)$. In addition, for a design $\xi = \sum_{i=1}^{3} w_i^* D_{t_i^*}$, the log determinant of its information matrix equals $\psi(t_1^*, t_2^*, t_3^*; w_1^*, w_2^*, w_3^*)$ up to an additive constant. This means that $(t_1^*, t_2^*, t_3^*; w_1^*, w_2^*, w_3^*)$ satisfies either equation (14) or (15), and it is the former case if $-h_1 - \beta_0 > 0$.

References
Biedermann, S., Dette, H., and Zhu, W. (2006). Optimal designs for dose–response models with restricted design spaces. Journal of the American Statistical Association, 101, 747–759.

Cheng, C.-S. (1995). Complete class results for the moment matrices of designs over permutation-invariant sets. Annals of Statistics, 23, 41–54.

Elfving, G. (1952). Optimum allocation in linear regression theory. The Annals of Mathematical Statistics, 23, 255–262.

Ford, I., Torsney, B., and Wu, C. F. J. (1992). The use of a canonical form in the construction of locally optimal designs for non-linear problems. Journal of the Royal Statistical Society. Series B, 54, 569–583.

Haines, L. M., Kabera, G., Ndlovu, P., and O'Brien, T. E. (2007). D-optimal designs for logistic regression in two variables. In López-Fidalgo, J., Rodriguez-Diaz, J. M., and Torsney, B. (eds.), mODa 8 - Advances in Model-Oriented Design and Analysis, 91–98. Springer, New York.

Harman, R. (2008). Equivalence theorem for Schur optimality of experimental designs. Journal of Statistical Planning and Inference, 138, 1201–1209.

Kabera, G. M., Haines, L. M., and Ndlovu, P. (2015). The analytic construction of D-optimal designs for the two-variable binary logistic regression model without interaction. Statistics, 49, 1169–1186.

Karlin, S., and Studden, W. J. (1966). Tchebycheff Systems with Applications in Analysis and Statistics. Wiley, New York.

Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Annals of Statistics, 2, 849–879.

Marshall, A. W., and Olkin, I. (2014). Inequalities: Theory of majorization and its applications. Academic Press, New York.

Pukelsheim, F. (2006). Optimal design of experiments. SIAM, Philadelphia.

Sitter, R. R., and Torsney, B. (1995). Optimal designs for binary response experiments with two design variables. Statistica Sinica, 5, 405–419.

Sitter, R. R., and Wu, C. F. J. (1993). Optimal designs for binary response experiments: Fieller, D, and A criteria. Scandinavian Journal of Statistics, 20, 329–341.

Yang, M. (2010). On the de la Garza phenomenon. The Annals of Statistics, 38, 2499–2524.

Yang, M., and Stufken, J. (2009). Support points of locally optimal designs for nonlinear models with two parameters. The Annals of Statistics, 37, 518–541.

Yang, M., Zhang, B., and Huang, S. (2011). Optimal designs for generalized linear models with multiple design variables. Statistica Sinica, 21, 1415–1430.