Journal of Statistical Planning and Inference 134 (2005) 288 – 301 www.elsevier.com/locate/jspi
Optimal designs for choice experiments with asymmetric attributes Leonie Burgess,∗ , Deborah J. Street Department of Mathematical Sciences, University of Technology Sydney, P.O. Box 123, Broadway, NSW 2007, Australia Received 19 July 2003; accepted 31 May 2004 Available online 23 July 2004
Abstract In this paper we establish the form of the optimal design for choice experiments in which attributes need not have the same number of levels for testing main effects only, when there are k attributes, and all choice sets are of size m. We give a construction for optimal and near-optimal designs with small numbers of choice sets. We derive the general form of the determinant of the information matrix for estimating main effects and two-factor interactions and derive the optimal designs for this situation in some special cases. © 2004 Elsevier B.V. All rights reserved. MSC: primary 62J15; secondary 62K05 Keywords: Paired comparisons; Multiple comparisons; Bradley–Terry model; Multinomial logit model; Fractional factorial designs; Orthogonal main effect plans
∗ Corresponding author. Tel.: +61-02-9514-2248. E-mail address: [email protected] (L. Burgess).
0378-3758/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2004.03.021

1. Introduction

Discrete choice experiments are widely used in various areas including marketing, transport, environmental resource economics and public welfare analysis; see Louviere et al. (2000) for an introduction to the area. A choice experiment consists of a number of choice sets, each containing several options (alternatives, profiles or treatment combinations). Respondents are shown each choice set in turn and are asked which option they prefer in each of the choice sets presented. Each
option in a choice set is described by a set of attributes (factors), each with some number of levels. We assume that there are no repeated options in a choice set and no choice set is repeated in an experiment. In general we will describe the products to be compared by k attributes, where we assume that the qth attribute has $\ell_q$ levels, represented by $0, 1, \ldots, \ell_q - 1$, and that all the choice sets in an experiment have m options. We will insist that respondents choose one of the options in each choice set (termed forced choice in the literature). This is the appropriate setting when you must commute to work or you must choose where to stay on an overnight business trip.

We begin with a small example. Suppose that we are interested in the effect of three attributes on how people travel to work. Suppose that the first attribute is mode of transport with four levels: car, bus, train and bicycle. Suppose that the second attribute is total time for the journey with three levels: 25, 35 and 45 min. Suppose that the third attribute is cost per journey with levels $5 and $3. Using these attributes and levels we can describe 24 journeys (profiles or treatment combinations). One such journey is (car, 45 min, $5). We use these profiles to construct choice sets with, say, three options each and ask each respondent to choose one of the three options. The first choice set might be (car, 45 min, $5), (bus, 35 min, $3) and (bicycle, 25 min, $3). Each respondent is shown all of the choice sets in the experiment, one after the other, and for each choice set they are asked to choose one of the options presented. Provided that the choice experiment has been correctly designed, these responses can be used to estimate the effect of each of the attributes on travel to work options and to estimate the effect of the interactions of any two of the attributes on the choice of method of travelling to work.
Results in the literature to date have focussed on choice experiments in which all attributes have two levels. For the estimation of main effects optimal designs for m = 2 can be constructed by using a resolution 3 fractional factorial design and constructing pairs by taking the foldover of each profile, so 010 is paired with 101, for example. Details may be found in various publications including van Berkum (1987, 1989), Street and Burgess (2004) and Grasshoff et al. (2003, 2004). When m > 2 results on the optimal choice experiments are given in Burgess and Street (2003a). For the estimation of main effects and two-factor interactions from a choice experiment with m = 2, one needs to start with a design of resolution 5 and obtain suitable pairs. Details are given in Street and Burgess (2004). For larger choice sets constructions for optimal and near-optimal designs are given in Burgess and Street (2003a). As in the earlier work we use a multinomial logit (MNL) model throughout.

The aim of this paper is to give optimal and near-optimal choice experiments for the estimation of main effects for any values of k and m and any combination of factor levels. For the estimation of main effects and two-factor interactions we have established the form of the determinant of the information matrix in general and have found the optimal designs for particular values of m and k, specifically m = 2 and k = 3, 4, 5.

2. Difference vectors

Burgess and Street (2003a) have shown that the optimality of designs with two level attributes depends on the number of attribute level differences between all pairwise
comparisons of the options in a choice set. In a choice set of size m there are $\binom{m}{2}$ pairs of profiles in the choice set and the pairwise differences between the attributes in a choice set are recorded in a difference vector. We now define the difference vector for a choice set with asymmetric attributes with levels $\ell_1, \ell_2, \ldots, \ell_k$. Again we are interested in the number of attributes with equal levels and the number with different levels in the choice set, as this is linked to how efficiently main effects and interaction effects can be estimated. For example, for options 2401 and 1403 in a choice set, the levels of attributes 1 and 4 are different while the levels of attributes 2 and 3 are the same. Each entry in the difference vector is a binary k-tuple, which indicates whether the levels of the attributes are the same or different.

For example, suppose m = 3 and k = 2 with levels $\ell_1 = 2$ and $\ell_2 = 3$. We consider all the pairwise comparisons of treatment combinations in the choice set (00,10,12). The first and second treatment combinations have the first attribute different and the second attribute the same, so that part of the difference vector is $d_1 = 10$. The first and third treatment combinations have both attributes different, so that part of the difference vector is $d_2 = 11$. When comparing the second and third treatment combinations, only the second attribute is different, so the final part of the difference vector is $d_3 = 01$. Thus the difference vector for choice set (00,10,12) is $v = (d_1, d_2, d_3) = (10, 11, 01)$.

In general, let $v = (d_1, d_2, \ldots, d_{m(m-1)/2})$ be a difference vector where $d_i = i_1 i_2 \ldots i_k$ for $i = 1, 2, \ldots, \binom{m}{2}$. Now $i_q = 1$ if the levels of attribute q are different in the ith pairwise comparison of two treatment combinations in the choice set, and $i_q = 0$ otherwise.
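The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `difference_vector` and the encoding of each option as a string of single-digit levels are assumptions made here, not notation from the paper. The entries are sorted so that choice sets with the same pairwise differences give the same tuple.

```python
from itertools import combinations


def difference_vector(choice_set):
    """Return the difference vector of a choice set as a sorted tuple.

    Each option is a string of attribute levels, e.g. "12" for levels (1, 2).
    Assumes every attribute has at most 10 levels (one character per level).
    """
    entries = []
    for a, b in combinations(choice_set, 2):
        # "1" where the two options differ on an attribute, "0" where they agree
        entries.append("".join("1" if x != y else "0" for x, y in zip(a, b)))
    return tuple(sorted(entries))


print(difference_vector(["00", "10", "12"]))  # ('01', '10', '11')
print(difference_vector(["2401", "1403"]))    # ('1001',)
```

The second call reproduces the earlier remark that options 2401 and 1403 differ on attributes 1 and 4 only.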
Since we can write the treatment combinations in the choice set in any order, the order of the comparison of pairs of treatment combinations is not important, so we assume that any difference vector has $d_1 \le d_2 \le \cdots \le d_{m(m-1)/2}$. In the previous example choice sets (00,10,12) and (01,02,11) have ordered difference vectors (10,11,01) and (01,10,11), respectively. These difference vectors are considered to be the same and are both written as (01,10,11). However, the entries 01 and 10 in the difference vector denote which factor is different and are not considered to be the same. We now establish the upper bound for the sum of the differences in a difference vector.

Theorem 1. For a particular difference vector v, for a given m, the least upper bound for the sum of the differences for a particular factor q is
\[
S_q = \begin{cases}
(m^2 - 1)/4, & \ell_q = 2,\ m \text{ odd}, \\
m^2/4, & \ell_q = 2,\ m \text{ even}, \\
(m^2 - (\ell_q x^2 + 2xy + y))/2, & 2 < \ell_q \le m, \\
m(m-1)/2, & \ell_q \ge m,
\end{cases}
\]
where positive integers x and y satisfy the equation $m = \ell_q x + y$ for $0 \le y < \ell_q$.

Proof. The upper bound for the sum of the differences for two level attributes has been established in Burgess and Street (2003a). When $\ell_q \ge m$, there are enough levels of the factor so that the level can change in each treatment in the choice set. Therefore, in this case, the maximum sum of the differences is $\binom{m}{2}$. When $2 < \ell_q \le m$, we write the treatment combinations in the choice set as the rows of an $m \times k$ array. Suppose in column q of this array, $p_1$ of the entries are 0, $p_2$ of the entries are 1, and finally, $p_{\ell_q}$ of the entries are $\ell_q - 1$, where $\sum_{i=1}^{\ell_q} p_i = m$. By looking at the pairwise
differences between the entries in column q, the contribution to the sum of the differences $S_q$ is
\[
\sum_{i=1}^{\ell_q - 1} \sum_{j=i+1}^{\ell_q} p_i p_j = \frac{1}{2} \sum_{i=1}^{\ell_q} p_i (m - p_i) = \frac{1}{2} m^2 - \frac{1}{2} \sum_{i=1}^{\ell_q} p_i^2 .
\]
We wish to maximise $(m^2 - \sum_{i=1}^{\ell_q} p_i^2)/2$ subject to $\sum_{i=1}^{\ell_q} p_i = m$ and we do this by minimising $\sum_{i=1}^{\ell_q} p_i^2$. Now the global minimum has $p_i = m/\ell_q$. Suppose $m = \ell_q x + y$. Then $b_1$ $p_i$'s are equal to $\lfloor m/\ell_q \rfloor = x$ and $b_2$ $p_i$'s are equal to $\lfloor m/\ell_q \rfloor + 1$, where $b_1 + b_2 = \ell_q$. Thus $\ell_q x + y = m = b_1 x + b_2 (x + 1)$, so $b_2 = y$ and $b_1 = \ell_q - y$. So the minimum $\sum_{i=1}^{\ell_q} p_i^2$ is $\sum_{i=1}^{\ell_q} p_i^2 = b_1 x^2 + b_2 (x + 1)^2 = \ell_q x^2 + 2xy + y$. Hence for column (or attribute) q, the maximum contribution to $S_q$ is $(m^2 - (\ell_q x^2 + 2xy + y))/2$, where $m = \ell_q x + y$.

Example 1. Let m = 3 and k = 2 with $\ell_1 = 2$ and $\ell_2 = 3$. The possible difference vectors are (01, 01, 01) with $S_1 = 0$ and $S_2 = 3$, (01, 10, 11) with $S_1 = 2$ and $S_2 = 2$, and (01, 11, 11) with $S_1 = 2$ and $S_2 = 3$. Using Theorem 1, the upper bound for the attribute with two levels is $S_1 = (m^2 - 1)/4 = 2$ and for the attribute with three levels $S_2 = m(m - 1)/2 = 3$.

Example 2. Now consider m = 4 and k = 2 with $\ell_1 = 2$ and $\ell_2 = 3$. The possible difference vectors are (01, 01, 01, 10, 11, 11) with $S_1 = 3$ and $S_2 = 5$, (01, 01, 10, 10, 11, 11) with $S_1 = 4$ and $S_2 = 4$, and (01, 01, 10, 11, 11, 11) with $S_1 = 4$ and $S_2 = 5$. Using Theorem 1, the upper bound for the attribute with two levels is $S_1 = m^2/4 = 4$. Now $2 < \ell_2 \le m$ so we solve $4 = 3x + y$ for x and y. Thus x = y = 1 and the upper bound for the attribute with 3 levels is $S_2 = (m^2 - (\ell_q x^2 + 2xy + y))/2 = 5$.

For particular values of m and k, there are usually several difference vectors. We denote these by $v_j$ and adjoin a subscript j to the previous definitions of the difference vector entries. Thus $v_j = (d_{1j}, d_{2j}, \ldots, d_{m(m-1)/2,j})$ where $d_{ij} = i_1 i_2 \cdots i_k$ and $i_q = 1$ or 0 as before. In this paper, we restrict the set of possible choice experiments so that any experiment contains all the choice sets with a particular difference vector equally often. As in Burgess and Street (2003a), to simplify the calculations we require the scalar $c_{v_j}$ and the indicator variable $a_{v_j}$.
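The bound of Theorem 1, as used in Examples 1 and 2, can be checked numerically. The sketch below is an illustration only; the function names `s_bound` and `s_brute_force` are hypothetical. It uses the fact that the third case of the theorem also covers two-level attributes, since with two levels and m = 2x + y the expression reduces to (m² − 1)/4 for odd m and m²/4 for even m.

```python
from itertools import combinations_with_replacement


def s_bound(m, levels):
    """Least upper bound S_q of Theorem 1 for one attribute with `levels` levels."""
    if levels >= m:
        return m * (m - 1) // 2
    x, y = divmod(m, levels)  # m = levels * x + y with 0 <= y < levels
    return (m * m - (levels * x * x + 2 * x * y + y)) // 2


def s_brute_force(m, levels):
    """Largest number of differing pairs over all possible columns of m entries."""
    best = 0
    for column in combinations_with_replacement(range(levels), m):
        differing = sum(
            1
            for i in range(m)
            for j in range(i + 1, m)
            if column[i] != column[j]
        )
        best = max(best, differing)
    return best


print(s_bound(3, 2), s_bound(3, 3))  # Example 1: 2 3
print(s_bound(4, 2), s_bound(4, 3))  # Example 2: 4 5
```

Comparing `s_bound` with `s_brute_force` for small m and small numbers of levels gives a quick check that the closed form is attained.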
Now $c_{v_j}$ is defined to be the number of choice sets containing the treatment $00 \ldots 0$ with the difference vector $v_j$. The indicator variable $a_{v_j} = 0$ if no choice sets have the difference vector $v_j$ and $a_{v_j} = 1/$(the total number of choice sets in the experiment) if there are choice sets with the difference vector $v_j$. At least one of the $a_{v_j}$ values must be non-zero and a particular choice set may only appear once, or not at all. We define $x_{v_j;d}$ to be the number of times the difference d appears in the difference vector $v_j$.

Example 3. Let m = 3, k = 2 with $\ell_1 = 2$ and $\ell_2 = 3$. Using all treatment combinations from the complete 2 × 3 factorial, there are $\binom{6}{3} = 20$ distinct choice sets of size 3, two with difference vector $v_1 = (01, 01, 01)$, 12 with difference vector $v_2 = (01, 10, 11)$ and 6 with difference vector $v_3 = (01, 11, 11)$. Consider the choice sets containing the
treatment 00: choice set (00,01,02) with difference vector $v_1$, choice sets (00,01,10), (00,01,11), (00,02,10), (00,02,12), (00,10,11) and (00,10,12) with difference vector $v_2$, and choice sets (00,01,12), (00,02,11) and (00,11,12) with difference vector $v_3$. Thus we have $c_{v_1} = 1$, $c_{v_2} = 6$ and $c_{v_3} = 3$. For $v_1$ the only possible value of d is 01, which appears 3 times, so $x_{v_1;01} = 3$. Similarly, $x_{v_2;01} = x_{v_2;10} = x_{v_2;11} = x_{v_3;01} = 1$ and $x_{v_3;11} = 2$.
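The counts in Example 3 can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming treatments are encoded as strings of single-digit levels; the variable names (`n_sets`, `c`, `x`) are chosen here to mirror the quantities in the text and are not part of the paper.

```python
from collections import Counter
from itertools import combinations, product

levels = (2, 3)  # number of levels of each attribute
m = 3            # choice set size

# Treatments of the complete factorial in standard order: 00, 01, 02, 10, 11, 12
treatments = ["".join(map(str, t)) for t in product(*(range(l) for l in levels))]


def difference_vector(choice_set):
    """Sorted tuple of binary difference entries for all pairs in the choice set."""
    return tuple(sorted(
        "".join("1" if a != b else "0" for a, b in zip(s, t))
        for s, t in combinations(choice_set, 2)
    ))


choice_sets = list(combinations(treatments, m))

# Number of choice sets with each difference vector
n_sets = Counter(difference_vector(s) for s in choice_sets)
# c_v: number of choice sets containing treatment 00 with each difference vector
c = Counter(difference_vector(s) for s in choice_sets if "00" in s)
# x_{v;d}: number of times the difference d appears in v
x = {v: Counter(v) for v in n_sets}

for v in sorted(n_sets):
    print(v, n_sets[v], c[v], dict(x[v]))
```

For this example the enumeration finds 20 choice sets in total, split 2, 12 and 6 across the three difference vectors, matching the text.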
3. The model and the information matrix

Consider an experiment in which there are N choice sets of m treatment combinations, of which $n_{i_1,i_2,\ldots,i_m}$ compare the specific treatment combinations $T_{i_1}, T_{i_2}, \ldots, T_{i_m}$, where $n_{i_1,i_2,\ldots,i_m} = 1$ if $(T_{i_1}, T_{i_2}, \ldots, T_{i_m})$ is a choice set and is 0 otherwise. Then
\[
N = \sum_{i_1 < i_2 < \cdots < i_m} n_{i_1,i_2,\ldots,i_m} .
\]
In choice experiments we define the parameters $\pi = (\pi_1, \pi_2, \ldots, \pi_L)$ associated with $L = \prod_{q=1}^{k} \ell_q$ treatment combinations $T_1, T_2, \ldots, T_L$ and in this paper we consider the MNL model (see, for example, Louviere et al., 2000). In the MNL model, given a choice set which contains m distinct treatment combinations $T_{i_1}, T_{i_2}, \ldots, T_{i_m}$, the probability that treatment $T_{i_1}$ is preferred to the other m − 1 treatment combinations in the choice set is
\[
P(T_{i_1} > T_{i_2}, \ldots, T_{i_m}) = \frac{\pi_{i_1}}{\sum_{j=1}^{m} \pi_{i_j}}
\]
for $i_j = 1, 2, \ldots, L$. Choices made in one choice set do not affect choices made in any other choice set. If m = 2 this is just the Bradley–Terry model (see Bradley and Terry, 1952). This is the model considered in Street et al. (2001).

We let $\Lambda_k(\pi)$ be the matrix of second derivatives of the likelihood function, where $\pi = (\pi_1, \pi_2, \ldots, \pi_L)$ and we assume that $\pi_1 = \pi_2 = \cdots = \pi_L = \pi_0$, say, (that is, all treatments are equally attractive, the usual null hypothesis). For choice sets of size m for k binary attributes, Burgess and Street (2003a) have provided a general form of $\Lambda_k(\pi)$ and the information matrix $C_k = B_k \Lambda_k B_k' / 2^k$, where $B_k$ is the $(2^k - 1) \times 2^k$ matrix of contrasts associated with a $2^k$ factorial design.

For attributes with two levels, the general form of $\Lambda_k(\pi)$ is a linear combination of the identity matrix and some $D_{k,i}$ matrices, where $D_{k,i}$ is a (0, 1) matrix of order $2^k$ with a 1 in position (x, y) if treatment combinations x and y have i attributes with different levels. However, the D matrices for attributes with more than two levels need to indicate which attributes are different, not just that i attributes are different. We define $D_d$ to be a (0, 1) matrix of order L with rows and columns labelled by the treatment combinations, and there is a 1 in position (x, y) if the attribute differences between treatment combinations x and y are represented by the binary k-tuple d. In general,
\[
D_d = M_{i_1} \otimes M_{i_2} \otimes \cdots \otimes M_{i_k}
\]
where
\[
M_{i_q} = \begin{cases} I_{\ell_q}, & i_q = 0, \\ J_{\ell_q} - I_{\ell_q}, & i_q = 1, \end{cases}
\]
and $J_{\ell_q}$ is a matrix of ones of order $\ell_q$. The general form of the $\Lambda_k$ matrix for any choice set size for asymmetric attributes is given by
\[
\Lambda_k = \frac{m-1}{m^2}\, z I_L - \frac{1}{m^2} \sum_{d} \frac{y_d D_d}{\prod_{j=1}^{k} (\ell_j - 1)^{i_j}},
\]
where
\[
y_d = \frac{2}{m} \sum_{j} c_{v_j} x_{v_j;d}\, a_{v_j}
\quad\text{and}\quad
z = \sum_{j} c_{v_j} a_{v_j} = \frac{1}{m-1} \sum_{d} y_d .
\]
The summations over j and d are over all possible difference vectors $v_j$ and all distinct difference vector entries d, respectively. The $y_d$ values represent the linear combination of the $c_{v_j} x_{v_j;d} a_{v_j}$ values for those difference vectors $v_j$ in which the difference vector entry d appears.

Example 4. Let m = 3, k = 2 with $\ell_1 = 2$ and $\ell_2 = 3$. From Example 3 the distinct $d_i$ entries are 01, 10 and 11. Since the difference 01 appears in $v_1$ three times and once each in $v_2$ and $v_3$, $y_{01}$ will contain $a_{v_1}$ three times and both $a_{v_2}$ and $a_{v_3}$ once. Similarly, for $y_{10}$ the difference 10 appears only in $v_2$. Finally, the difference 11 appears once in $v_2$ and twice in $v_3$. Thus we have
\[
\begin{aligned}
y_{01} &= 2(3c_{v_1}a_{v_1} + c_{v_2}a_{v_2} + c_{v_3}a_{v_3})/3 = 2a_{v_1} + 4a_{v_2} + 2a_{v_3}, \\
y_{10} &= 2c_{v_2}a_{v_2}/3 = 4a_{v_2}, \\
y_{11} &= 2(c_{v_2}a_{v_2} + 2c_{v_3}a_{v_3})/3 = 4a_{v_2} + 4a_{v_3}, \\
z &= a_{v_1} + 6a_{v_2} + 3a_{v_3}.
\end{aligned}
\]
Then
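\[
\Lambda_k = \frac{2}{9}\, z I_6 - \frac{1}{9} \left( \frac{y_{01}}{2} D_{01} + y_{10} D_{10} + \frac{y_{11}}{2} D_{11} \right),
\]
where
\[
D_{01} = \begin{pmatrix} J_3 - I_3 & 0 \\ 0 & J_3 - I_3 \end{pmatrix}, \quad
D_{10} = \begin{pmatrix} 0 & I_3 \\ I_3 & 0 \end{pmatrix}
\quad\text{and}\quad
D_{11} = \begin{pmatrix} 0 & J_3 - I_3 \\ J_3 - I_3 & 0 \end{pmatrix},
\]
using the standard treatment order.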
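The matrix of Example 4 can be assembled directly from Kronecker products. The sketch below is an illustration under stated assumptions: it uses exact fractions and plain nested lists, the helper names (`kron`, `m_matrix`, `d_matrix`, `lambda_matrix`) are hypothetical, and the design chosen (only choice sets with difference vector (01, 11, 11), so that the corresponding a value is 1/6) is one possible input.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[p * q for p in row_a for q in row_b] for row_a in a for row_b in b]


def m_matrix(levels, bit):
    """Identity (bit = 0) or J - I (bit = 1) of order `levels`."""
    return [[int((r == c) != bool(bit)) for c in range(levels)] for r in range(levels)]


def d_matrix(levels, d):
    """D_d as the Kronecker product of the per-attribute M matrices."""
    out = [[1]]
    for l, bit in zip(levels, d):
        out = kron(out, m_matrix(l, int(bit)))
    return out


def lambda_matrix(levels, m, y):
    """Assemble the matrix from y, a dict mapping each difference d to y_d (Fractions)."""
    size = 1
    for l in levels:
        size *= l
    z = sum(y.values()) / (m - 1)
    lam = [[Fraction(m - 1, m * m) * z * (r == c) for c in range(size)]
           for r in range(size)]
    for d, y_d in y.items():
        scale = 1  # product of (l_j - 1) over the attributes that differ in d
        for l, bit in zip(levels, d):
            if bit == "1":
                scale *= l - 1
        dd = d_matrix(levels, d)
        for r in range(size):
            for c in range(size):
                lam[r][c] -= Fraction(1, m * m) * y_d * dd[r][c] / scale
    return lam


# Design containing only the choice sets with difference vector (01, 11, 11)
a1, a2, a3 = Fraction(0), Fraction(0), Fraction(1, 6)
y = {"01": 2 * a1 + 4 * a2 + 2 * a3, "10": 4 * a2, "11": 4 * a2 + 4 * a3}
lam = lambda_matrix((2, 3), 3, y)
print(lam[0])  # first row, treatments in the order 00, 01, 02, 10, 11, 12
```

In this sketch every row of the resulting matrix sums to zero, which is a convenient sanity check on the coefficients.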
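4. Optimal designs for main effects only

We now evaluate the $\sum_{q=1}^{k} (\ell_q - 1) \times \sum_{q=1}^{k} (\ell_q - 1)$ principal minor, $C_M$, of the information matrix $C_k = B \Lambda_k B'$, associated with the main effects so that we can determine the form of the D-optimal designs. We let $B_{\ell_q}$ be the normalised contrast matrix for main effects for a factor with $\ell_q$ levels. Then the normalised contrast matrix for main effects for an $\ell_1 \times \ell_2 \times \cdots \times \ell_k$ factorial is
\[
B_M = \begin{pmatrix}
B_{\ell_1} \otimes \frac{1}{\sqrt{\ell_2}} j_{\ell_2} \otimes \cdots \otimes \frac{1}{\sqrt{\ell_k}} j_{\ell_k} \\
\frac{1}{\sqrt{\ell_1}} j_{\ell_1} \otimes B_{\ell_2} \otimes \cdots \otimes \frac{1}{\sqrt{\ell_k}} j_{\ell_k} \\
\vdots \\
\frac{1}{\sqrt{\ell_1}} j_{\ell_1} \otimes \frac{1}{\sqrt{\ell_2}} j_{\ell_2} \otimes \cdots \otimes B_{\ell_k}
\end{pmatrix},
\]
where $j_{\ell_q}$ is a $1 \times \ell_q$ vector of ones. Note that $B_{\ell_q}(J_{\ell_q} - I_{\ell_q})B_{\ell_q}' = -I_{\ell_q - 1}$, $j_{\ell_q}(J_{\ell_q} - I_{\ell_q})/\sqrt{\ell_q} = (\ell_q - 1) j_{\ell_q}/\sqrt{\ell_q}$ and $j_{\ell_q} B_{\ell_q}' = 0$. Thus
\[
B_M D_d B_M' = \begin{pmatrix}
B_{\ell_1} M_{d_1} \otimes \frac{1}{\sqrt{\ell_2}} j_{\ell_2} M_{d_2} \otimes \cdots \otimes \frac{1}{\sqrt{\ell_k}} j_{\ell_k} M_{d_k} \\
\frac{1}{\sqrt{\ell_1}} j_{\ell_1} M_{d_1} \otimes B_{\ell_2} M_{d_2} \otimes \cdots \otimes \frac{1}{\sqrt{\ell_k}} j_{\ell_k} M_{d_k} \\
\vdots \\
\frac{1}{\sqrt{\ell_1}} j_{\ell_1} M_{d_1} \otimes \frac{1}{\sqrt{\ell_2}} j_{\ell_2} M_{d_2} \otimes \cdots \otimes B_{\ell_k} M_{d_k}
\end{pmatrix} B_M' .
\]
As the off-diagonal positions always have a $j_{\ell_q} B_{\ell_q}'$ term in the Kronecker product, multiplied by a constant, all off-diagonal terms are zero. Since $M_{d_q}$ is either $I_{\ell_q}$ or $J_{\ell_q} - I_{\ell_q}$, the qth block diagonal term of $B_M D_d B_M'$ is
\[
\prod_{j=1,\, j \ne q}^{k} (\ell_j - 1)^{i_j}\, (-1)^{i_q} I_{\ell_q - 1} .
\]
Using this result we can show that the qth block diagonal of $C_M$ is
\[
\frac{1}{m^2} \left[ (m-1) z I_{\ell_q - 1} - \sum_{d} y_d \frac{\prod_{j=1,\, j \ne q}^{k} (\ell_j - 1)^{i_j} (-1)^{i_q}}{\prod_{j=1}^{k} (\ell_j - 1)^{i_j}}\, I_{\ell_q - 1} \right]
= \frac{1}{m^2} \sum_{d} y_d \left( 1 - \frac{(-1)^{i_q}}{(\ell_q - 1)^{i_q}} \right) I_{\ell_q - 1} .
\]
Thus the determinant of $C_M$ is
\[
\det(C_M) = \prod_{q=1}^{k} \left[ \frac{1}{m^2} \sum_{d} y_d \left( 1 - \frac{1}{(1 - \ell_q)^{i_q}} \right) \right]^{\ell_q - 1} .
\]
To find the D-optimal design, we must maximise $\det(C_M)$ subject to the constraint $\left( \prod_{q=1}^{k} \ell_q \right) z/m = 1$.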
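The following theorem establishes that the D-optimal design, for estimating main effects only, is one which consists of choice sets in which the sum of the differences attains the maximum value given in Theorem 1.

Theorem 2. The D-optimal design for testing main effects only, when all other effects are assumed to be zero, is given by choice sets in which, for each $v_j$ present, the bound of Theorem 1 is satisfied, and there is at least one $v_j$ with a non-zero $a_{v_j}$; that is, the choice set is non-empty.

Proof. Substituting in $\det(C_M)$ for $y_d$ gives
\[
\det(C_M) = \prod_{q=1}^{k} \left[ \frac{2}{m^3} \sum_{j} \sum_{d} c_{v_j} x_{v_j;d}\, a_{v_j} \left( 1 - \frac{1}{(1 - \ell_q)^{i_q}} \right) \right]^{\ell_q - 1}
= \prod_{q=1}^{k} \left[ \frac{2}{m^3} \frac{\ell_q}{\ell_q - 1} \sum_{j} c_{v_j} a_{v_j} \sum_{d,\, i_q = 1} x_{v_j;d} \right]^{\ell_q - 1} .
\]
Now $\sum_{d,\, i_q = 1} x_{v_j;d} = \sum_{i=1}^{m(m-1)/2} d_{ij}$ for attribute q, so we have
\[
\det(C_M) = \prod_{q=1}^{k} \left[ \frac{2}{m^3} \frac{\ell_q}{\ell_q - 1} \sum_{j} c_{v_j} a_{v_j} \sum_{i=1}^{m(m-1)/2} d_{ij} \right]^{\ell_q - 1} .
\]
Thus $\det(C_M)$ is maximised when $\sum_{i=1}^{m(m-1)/2} d_{ij}$ is maximised for each attribute. Thus, using Theorem 1, the maximum $\det(C_M)$ is given by
\[
\det(C_M) = \prod_{q=1}^{k} \left[ \frac{2}{m^3} \frac{\ell_q}{\ell_q - 1} \sum_{j} c_{v_j} a_{v_j} S_q \right]^{\ell_q - 1}
= \prod_{q=1}^{k} \left[ \frac{2 S_q}{m^2 (\ell_q - 1) \prod_{i=1,\, i \ne q}^{k} \ell_i} \right]^{\ell_q - 1}
\]
since $a_{v_j} = \left( L \sum_{j} c_{v_j} / m \right)^{-1}$ for those $v_j$ in the design.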
Example 5. Recall that for m = 3 and k = 2 with $\ell_1 = 2$ and $\ell_2 = 3$, there are three difference vectors $v_1 = (01, 01, 01)$, $v_2 = (01, 10, 11)$ and $v_3 = (01, 11, 11)$. Theorem 2 states that the D-optimal design has choice sets in which the entries in the difference vectors sum to $S_1 = (m^2 - 1)/4 = 2$ for attribute 1, and $S_2 = m(m - 1)/2 = 3$ for attribute 2. Only difference vector $v_3$ satisfies $S_1 = 2$ and $S_2 = 3$ so the D-optimal design consists of the six triples, each with difference vector $v_3$: (00,01,12), (00,02,11), (00,11,12), (01,02,10), (01,10,12) and (02,10,11) where $a_{v_1} = a_{v_2} = 0$ and $a_{v_3} = 1/6$. In this case $\det(C_M) = (4/27)(1/6)^2$.
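The value in Example 5 can be checked against the product form of det(C_M) from the proof of Theorem 2. The sketch below is illustrative only: the function name `det_cm` and the representation of a design as a list of (c, a, v) triples, one per difference vector present, are assumptions introduced here.

```python
from fractions import Fraction


def det_cm(levels, m, design):
    """det(C_M) for a design given as a list of (c_v, a_v, v) triples.

    v is a tuple of binary strings (the difference vector), c_v the number of
    choice sets containing 00...0 with that vector, and a_v = 1/N as a Fraction.
    """
    det = Fraction(1)
    for q, l in enumerate(levels):
        # sum over difference vectors of c_v * a_v * (sum of differences for attribute q)
        total = sum(c * a * sum(int(d[q]) for d in v) for c, a, v in design)
        det *= (Fraction(2, m ** 3) * Fraction(l, l - 1) * total) ** (l - 1)
    return det


v1_only = [(1, Fraction(1, 2), ("01", "01", "01"))]
v2_only = [(6, Fraction(1, 12), ("01", "10", "11"))]
v3_only = [(3, Fraction(1, 6), ("01", "11", "11"))]

for name, design in [("v1", v1_only), ("v2", v2_only), ("v3", v3_only)]:
    print(name, det_cm((2, 3), 3, design))
```

Under these assumptions the design with only $v_3$ gives $(4/27)(1/6)^2 = 1/243$, the design with only $v_2$ gives a smaller value, and the design with only $v_1$ gives zero because the first attribute never varies within a choice set.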
Any two designs can be compared by calculating the efficiency of one design relative to the other, preferably the optimal design if it is known. In general the D-efficiency of any design is given by
\[
\left( \frac{\det(C)}{\det(C_{\text{optimal}})} \right)^{1/w}
\]
where w is the number of parameters that are to be estimated in the model. For designs that estimate main effects, $w = \sum_{q=1}^{k} (\ell_q - 1)$. For designs that estimate both main effects and two factor interactions, $w = \sum_{q=1}^{k} (\ell_q - 1) + \sum_{q_1=1}^{k-1} \sum_{q_2=q_1+1}^{k} (\ell_{q_1} - 1)(\ell_{q_2} - 1)$.

5. Construction of designs for main effects only

In Theorem 3 of Burgess and Street (2003a) a construction for small optimal designs for two-level attributes is given, for choice sets of size m. A fraction of a factorial design is required as a starting design, then the choice sets are formed by adding one or more sets of generators. For an optimal design for testing main effects only, the generators must have a difference vector in which the sum of the differences is the maximum. Using the same method we now give a construction for optimal choice sets of size m for the estimation of main effects only for asymmetric attributes with two or more levels, using the complete factorial as the starting design. In order to do this we need to consider the magnitude of the differences between the attributes in a choice set rather than just whether the attributes are different, as we have done in the previous sections.

In a choice set of size m there will be m(m − 1) differences in the levels of the attributes between pairs of treatment combinations. Let $e = (e_1 e_2 \ldots e_k)$ represent the differences between one pair of treatment combinations. The differences are calculated componentwise modulo $\ell_q$ for attribute q. For example, suppose m = 3 and k = 2 with $\ell_1 = 2$ and $\ell_2 = 3$. For the choice set (00,10,12) the m(m − 1) = 6 differences are 00 − 10 ≡ 10, 10 − 00 ≡ 10, 00 − 12 ≡ 11, 12 − 00 ≡ 12, 10 − 12 ≡ 01 and 12 − 10 ≡ 02.
For the first attribute the difference 0 appears twice and 1 appears 4 times, and for the second attribute the differences 0, 1 and 2 all appear twice. In the following theorem we give a construction for optimal choice sets for estimating main effects only.

Theorem 3. Let F be the complete factorial for k attributes where the qth attribute has $\ell_q$ levels. Suppose that we choose a set of m generators $G = \{g_1 = 0, g_2, \ldots, g_m\}$ such that $g_i \ne g_j$ for $i \ne j$. Suppose that $g_i = (g_{i1}, g_{i2}, \ldots, g_{ik})$ for $i = 1, \ldots, m$ and suppose that the multiset of differences for attribute q, $\{\pm(g_{i_1 q} - g_{i_2 q}) \mid 1 \le i_1, i_2 \le m,\ i_1 \ne i_2\}$, contains each non-zero difference modulo $\ell_q$ equally often. Then the choice sets given by the rows of $F + g_1, F + g_2, \ldots, F + g_m$, for one or more sets of generators G, are optimal for the estimation of main effects only, provided that there are as few zero differences as possible in each choice set.

Proof. Let $P_{\ell_q,e_q}$ be an $\ell_q \times \ell_q$ (0,1) matrix where there is a 1 in position $(t_1, t_2)$ if $t_2 - t_1 = e_q$. Thus, $P_{\ell_q,0} = M_0$ and $\sum_{e_q=1}^{\ell_q - 1} P_{\ell_q,e_q} = M_1$ where $M_0$ and $M_1$ are defined in
Section 3. Now $P_{\ell_1,e_1} \otimes P_{\ell_2,e_2} \otimes \cdots \otimes P_{\ell_k,e_k}$ indicates those combinations $t_1$ and $t_2$ with $t_2 - t_1 = (e_1 e_2 \cdots e_k)$, where all differences are calculated component-wise modulo $\ell_q$. We let $\alpha_e$ be the total number of times that $e = (e_1 e_2 \cdots e_k)$ appears as a difference between elements in the one or more generators G. Thus we know that
\[
\sum_{e_1} \cdots \sum_{e_{i-1}} \sum_{e_{i+1}} \cdots \sum_{e_k} \alpha_e = \alpha_i,
\]
say, independent of $e_i$, for each $e_i \ne 0$. We also know that
\[
\Lambda_k = \frac{1}{m^2 N} \left[ \sum_{e_1} \cdots \sum_{e_k} \alpha_e \left( P_{\ell_1,0} \otimes P_{\ell_2,0} \otimes \cdots \otimes P_{\ell_k,0} \right) - \sum_{e_1} \cdots \sum_{e_k} \alpha_e \left( P_{\ell_1,e_1} \otimes P_{\ell_2,e_2} \otimes \cdots \otimes P_{\ell_k,e_k} \right) \right]
\]
where N is the number of choice sets in the experiment. Consider
\[
\sum_{e_1} \cdots \sum_{e_k} \alpha_e = (\ell_i - 1)\alpha_i + \sum_{e_1} \cdots \sum_{e_{i-1}} \sum_{e_{i+1}} \cdots \sum_{e_k} \alpha_{e_1 \ldots e_{i-1} 0 e_{i+1} \ldots e_k} = (\ell_i - 1)\alpha_i + \alpha_{i0} = \alpha,
\]
say, for $i = 1, \ldots, k$. Note that $(\ell_1 - 1)\alpha_1 + \alpha_{10} = (\ell_2 - 1)\alpha_2 + \alpha_{20} = \cdots = (\ell_k - 1)\alpha_k + \alpha_{k0} = \alpha$.

Using the contrast matrix $B_M$ from Section 4, it can easily be shown that $C = B_M \Lambda_k B_M'$ is a block diagonal matrix where the qth block diagonal for attribute q is given by $\ell_q \alpha_q I_{\ell_q - 1}/(m^2 N)$. Now we wish to have as few zero differences as possible. Recall that $m = \ell_q x + y$ where $0 \le y < \ell_q$. Then we need to have y entries which are repeated x + 1 times each and $\ell_q - y$ entries that are repeated x times each. This gives $(x + 1)xy + x(x - 1)(\ell_q - y)$ differences that are zero, so the number of non-zero differences is
\[
m(m - 1) - (x + 1)xy - x(x - 1)(\ell_q - y) = 2S_q,
\]
where $S_q$ is defined in Theorem 1. So considering all the choice sets, the total number of differences for attribute q is $2S_q N$. We also know that each nonzero level of attribute q appears $\alpha_q$ times as a difference between elements of the sets of generators. So the total number of non-zero differences for attribute q is $(\ell_q - 1)\alpha_q L$. Equating gives $2S_q N = (\ell_q - 1)\alpha_q L$. So the coefficient of the qth block diagonal matrix for the designs constructed this way is
\[
\frac{\ell_q \alpha_q}{m^2 N} = \frac{2 S_q}{m^2 (\ell_q - 1) \prod_{i \ne q} \ell_i},
\]
showing the designs are optimal by Theorem 2.
How do we go about using this construction? There are probably many ways of doing this but here is one technique which works. We begin by calculating the values of x and y (where $m = \ell_q x + y$) so we know that we have y values (between 0 and $\ell_q - 1$) that are repeated x + 1 times each and $(\ell_q - y)$ entries which are repeated x times each. We then partition the values between 0 and $\ell_q - 1$ into two sets, one containing y entries and the other containing the remaining $(\ell_q - y)$ entries. There are $\binom{\ell_q}{y}$ ways to do this. For each
partition we calculate the differences that arise from a vector with m entries in which the entries in the set with y entries are each repeated x + 1 times and the entries in the other set are each repeated x times. All such differences have as few 0 differences as possible in the m(m − 1) differences. Next we partition the vectors into sets based on the number of times each non-zero entry modulo $\ell_q$ appears as a difference. We then choose how many vectors to take from each partition so that we have all non-zero differences appearing equally often over the set of vectors chosen. If there are several attributes, perhaps with different numbers of levels, then we must choose the same number of vectors for each attribute. Once we have the vectors for each attribute then we can calculate the entries for the generators by choosing one entry from each vector for each generator in such a way that no generator is repeated. An example should make this clearer.

Example 6. Suppose that k = 2, $\ell_1 = 2$ and $\ell_2 = 3$ and that m = 3. Since 3 = 2 × 1 + 1, we have two partitions for the first attribute, where the entries in the first set are repeated twice and those in the second set are repeated once. Hence the vectors for the first attribute are (0,0,1) and (0,1,1). Each of these vectors has the difference 0 appearing twice and the difference 1 appearing 4 times and so, based on the differences, there is only one partition of the vectors. Since 3 = 3 × 1 we have one partition, {0,1,2}, and hence one vector (0,1,2) for the second attribute. Each non-zero difference appears three times in this vector. To get equal replication of the non-zero differences, we need only choose one vector for each attribute. Hence we calculate our generators by choosing the first position for each of the three generators from (0,0,1) and the second position from (0,1,2). We have always assumed, without loss of generality, that the first generator is {0,0}.
So the other two generators are {0,1} and {1,2} or {0,2} and {1,1} and the two sets of generators are (00, 01, 12) and (00, 02, 11). Either of these sets of generators gives 6 optimal triples for estimating main effects: (00,01,12), (00,02,11), (00,11,12), (01,02,10), (01,10,12) and (02,10,11).
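The construction of Theorem 3, applied to Example 6, can be sketched as follows. This is a minimal illustration; the function name `choice_sets_from_generators` and the encoding of generators as tuples of integers are assumptions made here. Each generator is added to every row of the complete factorial, componentwise modulo the number of levels, and each resulting choice set is stored with its options sorted so that duplicates collapse.

```python
from itertools import product


def choice_sets_from_generators(levels, generators):
    """Choice sets given by the rows of F + g_1, ..., F + g_m.

    levels: number of levels of each attribute (each assumed to be at most 10).
    generators: list of tuples of integers, one tuple per generator.
    """
    sets = set()
    for t in product(*(range(l) for l in levels)):
        options = [
            "".join(str((ti + gi) % l) for ti, gi, l in zip(t, g, levels))
            for g in generators
        ]
        sets.add(tuple(sorted(options)))
    return sorted(sets)


first = choice_sets_from_generators((2, 3), [(0, 0), (0, 1), (1, 2)])
second = choice_sets_from_generators((2, 3), [(0, 0), (0, 2), (1, 1)])
for triple in first:
    print(triple)
print(first == second)  # True
```

Both sets of generators lead to the same six triples, in agreement with the example.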
6. Designs for main effects and two-factor interactions

To find the D-optimal designs for estimating main effects and two-factor interactions for k attributes with any number of levels and for any choice set size m, we now evaluate the $w \times w$ principal minor, $C_{MF}$, of $C_k = B \Lambda_k B'$ where
\[
w = \sum_{q=1}^{k} (\ell_q - 1) + \sum_{q_1=1}^{k-1} \sum_{q_2=q_1+1}^{k} (\ell_{q_1} - 1)(\ell_{q_2} - 1) .
\]
As in Section 4 we let BM be the rows of B that correspond to main effects, and we now let BF be the rows of B that correspond to the two-factor interactions. The contrast matrix associated with main effects and two-factor interactions is denoted by BMF and is the
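concatenation of $B_M$ and $B_F$, where $B_F$ is defined as
\[
B_F = \begin{pmatrix}
B_{\ell_1} \otimes B_{\ell_2} \otimes \frac{1}{\sqrt{\ell_3}} j_{\ell_3} \otimes \cdots \otimes \frac{1}{\sqrt{\ell_k}} j_{\ell_k} \\
B_{\ell_1} \otimes \frac{1}{\sqrt{\ell_2}} j_{\ell_2} \otimes B_{\ell_3} \otimes \cdots \otimes \frac{1}{\sqrt{\ell_k}} j_{\ell_k} \\
\vdots \\
\frac{1}{\sqrt{\ell_1}} j_{\ell_1} \otimes \cdots \otimes \frac{1}{\sqrt{\ell_{k-2}}} j_{\ell_{k-2}} \otimes B_{\ell_{k-1}} \otimes B_{\ell_k}
\end{pmatrix} .
\]
Using the results in Section 4, the block diagonal term of $C_{MF}$ for main effects for factor q is given by
\[
\frac{1}{m^2} \sum_{d} y_d \left( 1 - \frac{1}{(1 - \ell_q)^{i_q}} \right) I_{\ell_q - 1} .
\]
Similarly, it can be shown that the block diagonal term of $C_{MF}$ for the interaction of factors $q_1$ and $q_2$ is
\[
\frac{1}{m^2} \sum_{d} y_d \left( 1 - \frac{1}{(1 - \ell_{q_1})^{i_{q_1}} (1 - \ell_{q_2})^{i_{q_2}}} \right) I_{(\ell_{q_1} - 1)(\ell_{q_2} - 1)} .
\]
Thus the determinant of $C_{MF}$ is given by
\[
\det(C_{MF}) = \prod_{q=1}^{k} \left[ \frac{1}{m^2} \sum_{d} y_d \left( 1 - \frac{1}{(1 - \ell_q)^{i_q}} \right) \right]^{\ell_q - 1}
\times \prod_{q_1=1}^{k-1} \prod_{q_2=q_1+1}^{k} \left[ \frac{1}{m^2} \sum_{d} y_d \left( 1 - \frac{1}{(1 - \ell_{q_1})^{i_{q_1}} (1 - \ell_{q_2})^{i_{q_2}}} \right) \right]^{(\ell_{q_1} - 1)(\ell_{q_2} - 1)} .
\]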
We have been unable to get any general results, true for all m, for the form of the designs that are optimal using this formula. When all attributes have two levels and m = 2, optimal designs have been given by van Berkum (1987, 1989) and Street and Burgess (2004); for larger values of m, optimal designs have been given by Burgess and Street (2003a). When m = 2 and all attributes have the same number of levels, Grasshoff et al. (2003) give the optimal pairs to choose from all possible pairs from a complete factorial.

For the asymmetric situation, we can get specific results for fixed m and k. For instance, when m = 2 and k = 2, there are three difference vectors possible and so only seven possible choice experiments to consider. We can explicitly evaluate the determinants for each of these cases. The choice experiments and the corresponding determinants are given in Table 1. It is now possible to investigate the relative magnitudes of these determinants. In all cases the largest determinant is that obtained from the choice experiment containing all the pairs, since we could think of this as a main effects only model with k = 1 and $(\ell_1 - 1)(\ell_2 - 1)$ levels (so of course $\ell_1 = \ell_2 = 2$ is excluded). But this design has a determinant that is only about 5% larger than that from the choice experiment with only the pairs with difference vector (11) when $\ell_1 = \ell_2 = 20$ but it is 965 times larger than the determinant for the experiment with only the pairs with difference vectors (01) and (10) when $\ell_1 = \ell_2 = 20$.
Table 1
Values of $\det(C_{MF})$ for m = 2 and k = 2

All pairs with difference vector    $\det(C_{MF})$
(01)    0
(10)    0
(11)    $\dfrac{(\ell_1-1)^{\ell_2} (\ell_2-1)^{\ell_1} (\ell_1(\ell_2-1)-\ell_2)^{(\ell_1-1)(\ell_2-1)}}{((\ell_1-1)(\ell_2-1))^{\ell_1 \ell_2}}$
(01), (10)    $\dfrac{(\ell_1+\ell_2)^{(\ell_1-1)(\ell_2-1)}}{(\ell_1+\ell_2-2)^{\ell_1 \ell_2 - 1}}$
(01), (11)    $\dfrac{1}{\ell_1^{(\ell_1-1)} (\ell_2-1)^{(\ell_2-1)}}$
(10), (11)    $\dfrac{1}{\ell_2^{(\ell_2-1)} (\ell_1-1)^{(\ell_1-1)}}$
(01), (10), (11)    $\dfrac{(\ell_1 \ell_2)^{\ell_1 \ell_2}}{\ell_1^{\ell_1} \ell_2^{\ell_2} (\ell_1 \ell_2 - 1)^{\ell_1 \ell_2 - 1}}$
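The comparison of designs for m = 2 can be explored numerically by evaluating the expression for det(C_MF) from this section on a log scale. The sketch below makes several assumptions that are not from the paper: the function name `log_det_cmf_pairs`, the use of floating point logarithms (the determinants themselves are far too small to represent directly when the attributes have 20 levels), and the restriction to designs that contain all pairs with each listed difference. Only ratios of determinants are reported, and these are unaffected by any constant factor common to all designs.

```python
from itertools import combinations
from math import exp, inf, log


def log_det_cmf_pairs(levels, design):
    """log det(C_MF) for m = 2 when the design has all pairs with the listed differences.

    levels: number of levels of each attribute.
    design: list of binary strings such as ["01", "11"].
    """
    k, m = len(levels), 2
    n_treatments = 1
    for l in levels:
        n_treatments *= l

    def c(d):
        # number of pairs containing 00...0 whose difference is d
        out = 1
        for l, bit in zip(levels, d):
            if bit == "1":
                out *= l - 1
        return out

    n_sets = sum(c(d) for d in design) * n_treatments / m
    # y_d = (2/m) * c * x * a with x = 1 (one entry per pair) and a = 1/N
    y = {d: (2 / m) * c(d) / n_sets for d in design}

    def block(attrs):
        # coefficient of the identity block for a main effect or an interaction
        total = 0.0
        for d, y_d in y.items():
            denom = 1
            for q in attrs:
                if d[q] == "1":
                    denom *= 1 - levels[q]
            total += y_d * (1 - 1 / denom)
        return total / m ** 2

    log_det = 0.0
    effects = [(q,) for q in range(k)] + list(combinations(range(k), 2))
    for attrs in effects:
        size = 1
        for q in attrs:
            size *= levels[q] - 1
        value = block(attrs)
        if value <= 0:
            return -inf  # some effect cannot be estimated: determinant is zero
        log_det += size * log(value)
    return log_det


levels = (20, 20)
all_pairs = log_det_cmf_pairs(levels, ["01", "10", "11"])
ratio_11 = exp(all_pairs - log_det_cmf_pairs(levels, ["11"]))
ratio_01_10 = exp(all_pairs - log_det_cmf_pairs(levels, ["01", "10"]))
print(ratio_11, ratio_01_10)
```

With both attributes at 20 levels, the two printed ratios can be compared with the figures of about 5% and 965 times quoted in the text.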
Indeed as $\ell_1$ and $\ell_2$ tend to infinity the limit of the ratio of the determinant of the design with all pairs to that of the design with pairs with difference (11) only is 1.

Consider now the situation when k = 3 and m = 2. Here there are 7 difference vectors possible and so 127 possible choice experiments to consider. For $\ell_1 = 2$, $\ell_2 = 2, 3$ and $\ell_3 \le 8$ the optimal design has all the pairs with difference vectors (011), (101) and (110). For $\ell_1 = 2$, $4 \le \ell_2 \le 8$, $\ell_2 \le \ell_3 \le 8$ the optimal design has all the pairs with difference vectors (101) and (110). When $\ell_1 = 3$ the optimal design has all the pairs with difference vectors (011), (101) and (110) when $\ell_2 = 3, 4, 5$ and $\ell_3 = 3, 4, 5, 6$ but it has all the pairs with difference vectors (101) and (110) when $5 \le \ell_2 \le 8$ and $\ell_2 \le \ell_3 \le 8$ (except $\ell_2 = 5$, $\ell_3 = 5, 6$). If we assume, without loss of generality, that $\ell_1 \le \ell_2 \le \ell_3$, then for fixed $\ell_1$ all three difference vectors of weight 2 give the optimal design when $\ell_2$ and $\ell_3$ are “close enough” to $\ell_1$ and as $\ell_2$ and $\ell_3$ get larger it is sufficient to have only those pairs with difference vectors (101) and (110).

When m = 2 and k = 4 we have considered all cases with $2 \le \ell_1 \le \ell_2 \le \ell_3 \le \ell_4 \le 8$. There are 8 different sets of optimal difference vectors. The most common has three difference vectors of weight 3, (1011), (1101) and (1110) and the second most common has two difference vectors of weight 3, (1101) and (1110). Details may be found in Burgess and Street (2003b). These designs all have a large number of choice sets and we can use the method in Section 5 to obtain near-optimal designs with a smaller number of choice sets.

Example 7. Suppose m = 2, k = 4 with $\ell_1 = 2$, $\ell_2 = 3$, $\ell_3 = 6$ and $\ell_4 = 6$. By investigating all possible designs, we found that the 8100 pairs with difference vectors (1101) and (1110) is the optimal design. By starting with the complete factorial and adding generators (1101) and (1110) we obtain a design in 432 pairs that is 93.6% efficient.
Results for m = 3 and k = 2 for $2 \le \ell_1 \le \ell_2 \le 8$, and also for m = 4 and k = 2 for some values of $\ell_1$ and $\ell_2$, may be found in Burgess and Street (2003b).
7. Concluding comments
If all the attributes used in describing each option in each choice set are allowed to vary, there is evidence that respondents will not perform as consistently as they might (see, for instance, Swait and Adamowicz (1996), Severin (2000) and Maddala et al. (2003)). Thus it is of interest to be able to construct the best possible designs when at most s of the attributes can be different between any pair of options in a choice set. For m = 2 Grasshoff et al. (2003, 2004) have derived optimal designs for the symmetric case ($\ell_1 = \ell_2 = \cdots = \ell_k$) in which a predetermined number of attributes in a choice set are different. The theory and construction method presented in this paper can easily be used to get bounds on det(C) for the asymmetric case for any value of m. The choice of generators for the construction of a design would then be governed by the number of attributes to be different in each choice set.
Acknowledgements
This research was supported by the Australian Research Council with grant A79906045. We thank the referees for their constructive comments on an earlier version.
References
van Berkum, E.E.M., 1987. Optimal paired comparison designs for factorial and quadratic models. J. Statist. Plann. Inference 15, 265–278.
van Berkum, E.E.M., 1989. Reduction of the number of pairs in paired comparison designs and exact designs for quadratic models. Comput. Statist. Data Anal. 8, 93–107.
Bradley, R.A., Terry, M.E., 1952. Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39, 324–345.
Burgess, L., Street, D.J., 2003a. Optimal designs for $2^k$ choice experiments. Comm. Statist. Theory Methods 32, 2185–2206.
Burgess, L., Street, D.J., 2003b. Optimal designs for asymmetric choice experiments. Research Report, Department of Mathematical Sciences, University of Technology, Sydney.
Grasshoff, U., Grossmann, H., Holling, H., Schwabe, R., 2003. Optimal paired comparison designs for first-order interactions. Statistics 37, 373–386.
Grasshoff, U., Grossmann, H., Holling, H., Schwabe, R., 2004. Optimal designs for main effects in linear paired comparison models. J. Statist. Plann. Inference, in press.
Louviere, J.J., Hensher, D.A., Swait, J.D., 2000. Stated Choice Methods: Analysis and Application. Cambridge University Press, Cambridge.
Maddala, T., Phillips, K.A., Johnson, F.R., 2003. An experiment on simplifying conjoint analysis designs for measuring preferences. Health Econom. 12, 1035–1047.
Severin, V., 2000. Comparing statistical efficiency and respondent efficiency in choice experiments. Ph.D. Thesis, University of Sydney.
Street, D.J., Burgess, L., 2004. Optimal and near-optimal pairs for the estimation of effects in 2-level choice experiments. J. Statist. Plann. Inference 118, 185–199.
Street, D.J., Bunch, D.S., Moore, B., 2001. Optimal designs for $2^k$ paired comparison experiments. Comm. Statist. Theory Methods 30, 2149–2171.
Swait, J., Adamowicz, W., 1996. The Effect of Choice Environment and Task Demands on Consumer Behaviour: Discriminating between Contribution and Confusion. University of Florida, Gainesville.