Computational Statistics & Data Analysis 14 (1992) 427-436 North-Holland
Discrimination among rival models

K. Felsenstein

Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien, A-1040 Wien, Austria

Received March 1988
Revised December 1990

Abstract: The discrimination between rival models for a controlled experiment is treated from a Bayesian point of view. The Bayes-risk for 0-K0, K1 loss serves as criterion for an optimal design. Results for densities with special structure are given. The exponential life time distribution and the linear regression model are investigated in detail, with numerical examples included.

Keywords: Bayes test; Optimal design; Bayes risk; Controlled experiment; Discrimination between models.
1. Introduction

The first task in any statistical analysis is the choice of the model. A multitude of methods has been developed for discriminating between rival models. We start from a controlled experiment and compare two models. The aim of the study is a powerful test for the discrimination and therefore an optimal design for that purpose. A decision theoretic approach requires a loss function L(·) evaluating a test procedure and a design (or controlled factor) d. Throughout the paper we assume 0-K0, K1 loss (in fact a 0-1 loss, which makes no real difference), representing the loss coming from a wrong decision. Two prior probabilities π0 and π1 = 1 − π0 are assigned to the two models M0 and M1; each design d determines the distribution in M0 and M1, respectively. The optimal test procedure is the Bayes-test, determining the model M0 if

π0 K0 f0(x|d) > π1 K1 f1(x|d),   (1)
where fz(·|d) are the densities with respect to a measure μ of the model z = 0

Correspondence to: K. Felsenstein, Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien, Wiedner Hauptstrasse 8-10, A-1040 Wien, Austria.

© 1992 - Elsevier Science Publishers B.V. All rights reserved
or 1 for fixed design d. This choice criterion yields the Bayes-risk r(d), that is, the weighted probability of an incorrect decision, i.e.

r(d) = K0 π0 P(X ∈ Aᶜ | M0, d) + K1 π1 P(X ∈ A | M1, d),

where the set A consists of all points fulfilling (1). We call a design d* optimal if it minimizes the Bayes-risk. Since we suppose that densities exist, the Bayes-risk for 0-1 loss reads

r(d) = ∫ min(π0 f0(x|d), π1 f1(x|d)) dμ(x).

To avoid trivial cases let the prior probabilities π0, π1 be different from 0 or 1. In general the optimal design depends upon the prior belief π0, π1.
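For two completely specified models the risk above can be evaluated numerically. The following sketch (the normal location models and the grid are illustrative choices, not taken from the paper) approximates r(d) = ∫ min(π0 f0, π1 f1) dx by simple quadrature:

```python
import numpy as np

def bayes_risk(pi0, f0, f1, grid):
    # r(d) = \int min(pi0*f0(x), pi1*f1(x)) dx, approximated by a Riemann sum
    dx = grid[1] - grid[0]
    return np.sum(np.minimum(pi0 * f0(grid), (1 - pi0) * f1(grid))) * dx

def normal_pdf(mu):
    return lambda x: np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

grid = np.linspace(-10.0, 10.0, 20001)
r_far = bayes_risk(0.3, normal_pdf(0.0), normal_pdf(4.0), grid)   # well-separated models
r_same = bayes_risk(0.3, normal_pdf(0.0), normal_pdf(0.0), grid)  # identical models
print(r_far, r_same)
```

For identical models the risk equals min(π0, π1), while separated models drive it towards 0; a good design pushes the two densities apart.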
2. Controlled parameters

The densities f0(·), f1(·) now belong to a parametrized class and at least some of the parameters are functions of a controlled variable. The Bayes-test should clarify which function describes the influence of the controlled variable correctly. First we deal with unknown parameters which are independent of the adjustable variable d and determine the densities. Let θ0 and θ1 be the nuisance parameters of the two models. We therefore denote the densities f0(·|θ0, d) and f1(·|θ1, d). The dimensions of the parameters θ0 and θ1 need not be the same, so the specification of two quite different prior densities is required. We introduce an additional parameter z ∈ {0, 1} indicating the 'true' model. The combination of z and θz is treated as the model parameter τ = (z, θz) with prior density

π(τ) = π(θz|z) πz.
For fixed (and hence omitted) d the observations X lead to the posterior distribution of τ, which can be written as

π(τ|X) = π(θz|X, z) π(z|X).

Here π(θz|X, z) stands for the posterior density of θz, and the posterior probability of the model z is

π(z|X) = πz mz(X) / (π0 m0(X) + π1 m1(X)).
mz(X) is the marginal density of X in the state z, i.e.

mz(x) = ∫ f(x|θz, z) dπ(θz).

The decision consists of both choices, but only the optimal selection of z lies in our interest. Therefore the loss stays the 0-1 loss: 1 if z is decided incorrectly, 0 otherwise. The Bayes-test according to that 0-1 loss accepts the hypothesis z = 0 if

π0 m0(X) > π1 m1(X).
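As a sketch of this marginal-density test, assume both models are normal location families with conjugate normal priors on the nuisance parameter, so the marginals mz are available in closed form (all numbers below are made up for illustration):

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def marginal(x, mu, tau2):
    # X | theta ~ N(theta, 1), theta ~ N(mu, tau2)  =>  m(x) is N(mu, 1 + tau2)
    return normal_pdf(x, mu, 1.0 + tau2)

def bayes_test(x, pi0, mu0, tau2_0, mu1, tau2_1):
    # accept model 0 iff pi0*m0(x) > pi1*m1(x)
    m0, m1 = marginal(x, mu0, tau2_0), marginal(x, mu1, tau2_1)
    return 0 if pi0 * m0 > (1.0 - pi0) * m1 else 1

print(bayes_test(0.1, 0.5, 0.0, 0.5, 3.0, 0.5))  # observation near mu0
print(bayes_test(2.9, 0.5, 0.0, 0.5, 3.0, 0.5))  # observation near mu1
```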
The likelihood ratio is replaced by the ratio of the marginal densities. The principle of the best test is the same as in the case where all parameters are known. However, the shape of f0(·) or m0(·) depends upon the design d, and for the moment we assume that both fz(·), z = 0, 1, belong to the same parametrized class such that different designs give different parameters. Let us assume that our class of distributions constitutes a 1-dimensional location family of continuous distributions. The two models differ only through different acceleration models θ0(d) and θ1(d) for the location parameter θ. The test should distinguish the suitable acceleration models. Therefore we write

fz(x|d) = f(x − θz(d)),   z = 0, 1.
In general the optimal design which minimizes the Bayes-risk of the discrimination test depends upon the prior and upon the density f(·), but under special assumptions about f(·) we derive conditions which make the optimal design independent of the concrete distribution and the prior.

Proposition 1. If the density f: R → R+ is increasing on R− and decreasing on R+, then the design d* is optimal iff the distance |θ1(d) − θ0(d)| is maximal at d*.

Proof. The Bayes-risk

r(d) = π1 − π0 ∫_{−∞}^{∞} max[0, αf(x − θ1(d)) − f(x − θ0(d))] dx,   (2)
where α = π1/π0 is without loss of generality less than 1. From (2) it can be seen that r(·) is a function of θ1 − θ0 only,

r(d) = π1 − π0 R(θ1 − θ0),   (3)

with

R(θ) := ∫_{−∞}^{∞} max[0, αf(x − θ) − f(x)] dx.
We assume that θ1(d) > θ0(d) for fixed d and θ > 0 in (2) and (3), respectively; otherwise we replace f(x) by f(−x). It suffices to show that R(θ) is not decreasing in θ.
If x < 0 then max[0, αf(x − θ) − f(x)] = 0 because of the monotony of f(·) on R−, so

R(θ) = ∫_0^∞ max[0, αf(x − θ) − f(x)] dx.

For h > 0 we get

R(θ + h) ≥ ∫_0^∞ max[0, αf(x − θ) − f(x + h)] dx.

The monotony of f(·) for nonnegative reals gives

R(θ + h) ≥ ∫_0^∞ max[0, αf(x − θ) − f(x)] dx = R(θ),

and d cannot be optimal if the distance of the acceleration models is not as large as possible. □
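A quick numerical check of Proposition 1 (with a Laplace density standing in for f, and α and the grid chosen arbitrarily): R(θ) is non-decreasing in the distance θ = θ1 − θ0, so the risk π1 − π0 R(θ1 − θ0) falls as the models separate.

```python
import numpy as np

def R(theta, alpha, grid):
    # R(theta) = \int max[0, alpha*f(x - theta) - f(x)] dx for a Laplace f
    f = lambda x: 0.5 * np.exp(-np.abs(x))
    dx = grid[1] - grid[0]
    return np.sum(np.maximum(0.0, alpha * f(grid - theta) - f(grid))) * dx

grid = np.linspace(-30.0, 30.0, 60001)
deltas = [0.0, 0.5, 1.0, 2.0, 4.0]
Rs = [R(t, 0.8, grid) for t in deltas]
print(Rs)   # non-decreasing in the distance theta
```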
For arbitrary location families this assertion might fail, but it can be substituted by similar results for other classes of densities, for example for periodic f(·). The main limitation of this result is its restriction to one observation. This can be removed if a 1-dimensional sufficient statistic for the parameter exists and belongs to a location family. Since all posterior probabilities depend only on that sufficient statistic, so does the risk. In many applications a sufficient statistic can be obtained which belongs to a scale parameter family. Again we restrict ourselves to a 1-dimensional density f(·). Since the parameter is now a scale parameter we write

fz(x|d) = θz(d) f(θz(d) x),   z = 0, 1,   (4)

with positive functions θ0(·), θ1(·). In order to derive a criterion for a design to be optimal, we define the two 'parts' g1(x) = f(x) and g2(x) = f(−x), both on R+. For each θ and α we define the sets

G_θ^i = {x ≥ 0 | αθ gi(θx) > gi(x)},   i = 1, 2,
which are assumed to be intervals for our result. Our main application in Section 3 are exponential models. Therefore we especially consider Gamma densities γ(x|n, θ). In this case g2 = 0 and the upper limit of the interval G_θ^1 is

s1(θ) = (ln α + n ln θ)/(θ − 1).

Proposition 2. If the density f(·) in (4) fulfills the following conditions (i)-(iii):
(i) G_θ^i = R+ or G_θ^i = [0, si(θ)) for i = 1, 2;
(ii) si(θ) is finite for θ in an interval (ci, ∞) with ci < ∞;
(iii) si(θ) is differentiable on (ci, ∞);
and if (iv) or (v) holds:
(iv) si(θ) is the unique solution of αθ gi(θx) = gi(x) > 0;
(v) si(θ) is decreasing;
then a design that maximizes |ln θ1(d) − ln θ0(d)| is optimal.
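As a numerical check of the Gamma case (shape n = 3 and α = 0.9 are arbitrary choices), the sketch below verifies the interval bound s(θ) = (ln α + n ln θ)/(θ − 1) and the monotonicity of R(θ) on which the proof relies:

```python
import math
import numpy as np

def g(x, n):
    # Gamma(n) shape density on R+
    return x ** (n - 1) * np.exp(-x) / math.gamma(n)

def s(theta, alpha, n):
    # upper end of {x >= 0 : alpha*theta*g(theta*x) > g(x)}
    return (math.log(alpha) + n * math.log(theta)) / (theta - 1.0)

n, alpha = 3, 0.9
x = np.linspace(1e-6, 40.0, 400001)
for theta in (1.5, 2.0, 3.0):
    inside = alpha * theta * g(theta * x, n) > g(x, n)
    boundary = x[inside].max()            # empirical upper end of the set
    assert abs(boundary - s(theta, alpha, n)) < 1e-3

def R(theta):
    dx = x[1] - x[0]
    return np.sum(np.maximum(0.0, alpha * theta * g(theta * x, n) - g(x, n))) * dx

vals = [R(t) for t in (1.0, 1.5, 2.0, 3.0, 5.0)]
print(vals)   # non-decreasing in the quotient theta
```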
Proof. The Bayes-risk is

r(d) = π1 − π0 R(θ1(d)/θ0(d))

and therefore a function of the quotient θ1(d)/θ0(d) only. We assume θ1 > θ0 for fixed d and define

R(θ) := ∫_{−∞}^{∞} max[0, αθ f(θx) − f(x)] dx

for θ ≥ 1. Since R(θ) = R1(θ) + R2(θ), where

Ri(θ) := ∫_0^∞ max[0, αθ gi(θx) − gi(x)] dx,

the assertion is proved if Ri(θ), i = 1, 2, are both increasing in θ. If α ≥ 1 then only G_θ^i = R+ and if α < 1 then only G_θ^i = [0, si(θ)) is possible for θ ≥ 1. Differentiation gives
Ri′(θ) = α gi(θ si(θ)) si(θ) + [αθ gi(θ si(θ)) − gi(si(θ))] si′(θ).

Since αθ gi(θ si(θ)) ≤ gi(si(θ)), we have by virtue of the assumptions on si(θ)

Ri′(θ) ≥ 0,   i = 1, 2.   □
3. Exponential life times

As a principal application of the last section we consider the discrimination of acceleration models concerning life data coming from an exponential distribution with failure rate θ. Instead of design we call the (positive) controllable variable stress S, under which the observations Xi, i = 1, ..., n, of life time are taken. Since for fixed stress ΣXi is a sufficient statistic which belongs to a scale parameter family, an application of Proposition 2 yields:

Independent of the prior probabilities for the two models and the number of observations, the optimal stress S* maximizes the symmetric quotient Q(S) = max(θ0/θ1, θ1/θ0) of the two completely specified acceleration models θ0(S) and θ1(S).
In the following examples some standard acceleration models (see Mann et al., 1974) of the exponential model are compared to each other and the corresponding risks are plotted. We examine for 1-dimensional stress the Eyring model

θ(S) = S exp(a + b/S),   a, b ∈ R,

for both 1- and 2-dimensional stress the generalized Arrhenius model

θ(S1, ..., Sn) = exp(a0 + a1/S1 + ... + an/Sn),   a0, ..., an ∈ R,

and for 2-dimensional stress a modified version of the Eyring model

θ(S1, S2) = S1 S2 exp(a + b/S1 + c/S2),   a, b, c ∈ R.
Example 1. We discriminate between

θ0(S) = exp(2.5 − 1.0/S) ≡ Arrhenius(2.5, −1.0)

and

θ1(S) = S exp(1.5 − 0.5/S) ≡ Eyring(1.5, −0.5).
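The optimal stress for these two models can be found by a direct grid search over the symmetric quotient (a sketch; the stress range [0.2, 2.0] is an assumption):

```python
import numpy as np

theta0 = lambda S: np.exp(2.5 - 1.0 / S)        # Arrhenius(2.5, -1.0)
theta1 = lambda S: S * np.exp(1.5 - 0.5 / S)    # Eyring(1.5, -0.5)

S = np.linspace(0.2, 2.0, 18001)
Q = np.maximum(theta0(S) / theta1(S), theta1(S) / theta0(S))
S_star = S[np.argmax(Q)]
print(S_star, Q.max())
```

Here the quotient θ0/θ1 = exp(1.0 − 0.5/S)/S has its maximum at S = 0.5, so the grid search returns S* = 0.5.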
For increasing sample size n the Bayes-risk decreases for all stress levels and the curve becomes flatter. From the curve of the Bayes-risk r1(S) for sample size 1 (see the lower left part of Figure 1) it can be seen that there are regions where observations (for fixed n) cannot improve the Bayes-risk. Since in applications observations have to be paid for, designs from those regions are not expected to gain information. The numerical results of a simulated sample are contained in the following output.

BAY-DISC output for simulated example-1
Rival Models:  A - theta = Arrhenius(2.5,-1.0)
               B - theta = Eyring(1.5,-0.5)
prior distribution:   pi0 = 0.2,  pi1 = 0.8
loss:                 K0 = 1.0,   K1 = 1.0
dimension stress: 1,  stress range:  low = 0.2,  high = 2.0
simulated sample:
  1.4108857139945030e-002   1.7312778532505035e-001   2.5609150528907776e-001
  2.6993909478187561e-001   4.3356005102396011e-002
posterior distribution:  PI0 = 0.995508,  PI1 = 0.004492
Bayes' decision: A;  expected loss: 4.4916311998574173e-003
Optimal stress:  5.0000000754544538e-001  with risk:  5.8260018357255743e-002

For purposes of illustration we also compare models with two-dimensional stress.

Example 2. We discriminate between

θ0(S1, S2) = exp(5.0 − 1.0/S1 − 1.0/S2) ≡ Arrhenius(5.0, −1.0, −1.0)
Fig. 1. Curves for Eyring/Arrhenius acceleration models.
Fig. 2. Symmetric quotient and Bayes-risk for Example 2.
K. Fekenstein / Discrimination among rival models
434
and

θ1(S1, S2) = S1 S2 exp(1.5 − 0.2/S1 − 0.2/S2) ≡ modified Eyring(1.5, −0.2, −0.2).
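The same grid-search sketch extends to two-dimensional stress (the bounds [0.2, 3.0] are an assumption):

```python
import numpy as np

def theta0(S1, S2):
    return np.exp(5.0 - 1.0 / S1 - 1.0 / S2)             # Arrhenius(5.0,-1.0,-1.0)

def theta1(S1, S2):
    return S1 * S2 * np.exp(1.5 - 0.2 / S1 - 0.2 / S2)   # mod. Eyring(1.5,-0.2,-0.2)

s = np.linspace(0.2, 3.0, 1401)
S1, S2 = np.meshgrid(s, s, indexing="ij")
ratio = theta0(S1, S2) / theta1(S1, S2)
Q = np.maximum(ratio, 1.0 / ratio)
i, j = np.unravel_index(np.argmax(Q), Q.shape)
print(s[i], s[j])   # optimal stress pair
```

Setting the partial derivatives of ln(θ0/θ1) = 3.5 − 0.8/S1 − 0.8/S2 − ln S1 − ln S2 to zero gives S1 = S2 = 0.8, which the grid search reproduces.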
The following program output contains the numerical results.

BAY-DISC output for simulated example-2
Rival Models:  A - theta = Arrhenius(5.0,-1.0,-1.0)
               B - theta = modified Eyring(1.5,-0.2,-0.2)
prior distribution:   pi0 = 0.2,  pi1 = 0.8
loss:                 K0 = 1.0,   K1 = 1.0
dimension stress: 2,  stress range:  low = 0.2,  high = 3.0
simulated sample:
  4.5719128102064133e-003
posterior distribution:  PI0 = 0.625334,  PI1 = 0.374666
Optimal stress (minimal risk):  (7.9999999672092925e-001, 8.0000001267161380e-001)
  yields risk of:  4.3408176291659517e-001
Bayes' decision: A;  expected loss: 3.7466610896833847e-001.
In Figure 2 we illustrate the relationship between the symmetric quotient and the Bayes-risk for two stress dimensions; the plane of the symmetric quotient is plotted together with the Bayes-risk.
4. Linear regression

Assume that our observations Xi, i = 1, ..., n, fit a linear regression model Xi = η(di)′θ + εi, with η: D → R^v, normally distributed errors εi and the design points di taken from the experimental region D. Again we consider different regression models and write

X = Fz θz + ε,   z = 0, 1,
where θz ∈ R^{vz}, the design matrix Fz ∈ R^{n×vz} and ε ~ N(0, σ²In). The known variance σ² of the errors should be equal in both models. First we consider the case of known parameters θ0 and θ1. Then the search for an optimal design d* = (d1*, ..., dn*) is the consequence of the next result.

Proposition 3. If the densities of the observations X ∈ R^n for the design d are of the form

f(x|d) = f(‖x − θz(d)‖),

where θz(d) ∈ R^n and the function f(·) is decreasing on R+, the design d* is optimal if it maximizes the norm ‖θ0(d) − θ1(d)‖.
Proof. Since the Bayes-risk is

r(d) = π1 − π0 ∫ max[0, αf(‖x − θ1(d) + θ0(d)‖) − f(‖x‖)] dx

for α = π1/π0 ≤ 1, we consider

R(θ) = ∫ max[0, αf(‖x − θ‖) − f(‖x‖)] dx
and
Aθ := {x ∈ R^n | αf(‖x − θ‖) > f(‖x‖)}.

For a fixed λ > 0 and x ∈ Aθ we have

αf(‖y − θ(1 + λ)‖) ≥ f(‖x‖)

for y = x + θλ. By virtue of the monotony of f(·),

f(‖x‖) ≥ f(‖y‖),

and y ∈ A_{θ(1+λ)}. The Lebesgue measure of A_{θ(1+λ)} is larger than that of Aθ, and since

αf(‖y − θ(1 + λ)‖) − f(‖y‖) ≥ αf(‖x − θ‖) − f(‖x‖),
we conclude R(θ(1 + λ)) ≥ R(θ). If ‖θ‖ = ‖θ̃‖ there exists an orthogonal matrix M with M′θ = θ̃. A simple substitution gives R(θ) = R(θ̃), and hence R(θ) is maximal if ‖θ‖ is maximal. □

Surprisingly, if fz(x|d) = f(x − θz(d)) and f(·) is a density decreasing in any direction from the origin, this result does not hold in general. We return to the regression model and obtain an optimal design d* if the norm

‖F0(d)θ0 − F1(d)θ1‖
is maximal for d* in the experimental region D. Often the knowledge of the regression parameters can be doubted. In the situation of unknown regression parameters two priors have to be specified, and we choose the conjugate normal distribution. The prior of θz is therefore a vz-dimensional normal N(μz, Λz) with mean μz ∈ R^{vz} and covariance matrix Λz ∈ R^{vz×vz}. The observations X lead to normal posterior distributions of θ0 and θ1 with covariance matrix

Λz* = σ²(Fz′Fz + σ²Λz⁻¹)⁻¹

and mean vector

μz* = (1/σ²) Λz*(Fz′X + σ²Λz⁻¹μz).

Following Section 2, the discrimination test is carried out with the marginal distribution of X. Again, an n-dimensional normal distribution with mean

Fz μz   (5)
and covariance matrix

σ²In + Fz Λz Fz′   (6)

for z = 0, 1
results. The value of interest is the Bayes-risk r(d), which reads

r(d) = ∫ min(π0 m0(x), π1 m1(x)) dx,   (7)

where mz(·), z = 0, 1, denote the n-dimensional normal densities with the parameters given in (5) and (6). In order to reduce the effort for the optimization in (7), continuous designs might be introduced. These are distributions on D, or more generally a randomized solution of (7) is achieved. Finally we give a numerical example of two rival regression models.
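The computations behind (5)-(7) can be sketched numerically: a conjugate update for one model and a Monte-Carlo estimate of the risk (7) for two stand-in marginal normals. All matrices and parameters below are made up for illustration; they are not those of the example that follows.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- posterior update for one model, with made-up F, mu, Lambda, sigma^2 ---
n, v = 6, 2
F = rng.normal(size=(n, v))                   # design matrix F_z
mu, L = np.array([1.0, -0.5]), np.eye(v)      # prior N(mu_z, Lambda_z)
s2 = 0.25                                     # known error variance sigma^2
X = F @ mu + rng.normal(scale=np.sqrt(s2), size=n)

Li = np.linalg.inv(L)
L_post = s2 * np.linalg.inv(F.T @ F + s2 * Li)           # Lambda_z^*
m_post = (1.0 / s2) * L_post @ (F.T @ X + s2 * Li @ mu)  # mu_z^*

# marginal of X under model z:  N(F mu, s2*I + F Lambda F')  -- (5), (6)
marg_mean, marg_cov = F @ mu, s2 * np.eye(n) + F @ L @ F.T

# --- Monte-Carlo estimate of r = \int min(pi0*m0, pi1*m1) dx  -- (7) ---
# with two stand-in spherical marginal normals m0, m1; sampling from the
# equal-weight mixture gives an importance-sampling estimate.
def mvn_pdf(x, mean, var):
    d = x.shape[1]
    q = np.sum((x - mean) ** 2, axis=1) / var
    return np.exp(-0.5 * q) / (2.0 * np.pi * var) ** (d / 2.0)

pi0, d, N = 0.5, 3, 200_000
mu0, v0 = np.zeros(d), 1.0
mu1, v1 = 2.0 * np.ones(d), 1.5
z = rng.random(N) < 0.5
x = np.where(z[:, None],
             mu0 + np.sqrt(v0) * rng.normal(size=(N, d)),
             mu1 + np.sqrt(v1) * rng.normal(size=(N, d)))
p0, p1 = mvn_pdf(x, mu0, v0), mvn_pdf(x, mu1, v1)
r_hat = np.mean(np.minimum(pi0 * p0, (1 - pi0) * p1) / (0.5 * p0 + 0.5 * p1))
print(m_post, r_hat)
```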
Example 3. As rival models we compare polynomial and exponential regression.

BAY-DISC output for simulated example-3
...
K1 = 1.0
Bayes' decision: A;  expected loss: 8.8030754454892765e-002.
References

Atkinson, A. and V. Fedorov, The design of experiments for discriminating between two rival models, Biometrika, 62 (1) (1975a) 57-70.
Atkinson, A. and V. Fedorov, Optimal design: Experiments for discriminating between several models, Biometrika, 62 (2) (1975b) 289-303.
Felsenstein, K., Some characteristics of Bayesian designs, in: Viertl, ed., Probability and Bayesian Statistics (Plenum, New York, 1987).
Hedayat, A., M. Jacroux and D. Majumdar, Optimal designs for comparing test treatments with controls, Statistical Science, 3 (4) (1988) 462-491.
Mann, N., R. Schafer and N. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data (Wiley, New York, 1974).