Journal of Statistical Planning and Inference 12 (1985) 359-368, North-Holland
DETERMINING DEFECTIVES IN A LINEAR ORDER

M. AIGNER and M. SCHUGHART

II. Math. Institut, Freie Universität Berlin, Arnimallee 3, D-1000 Berlin 33, West Germany

Received 8 November 1984
Recommended by G.O.H. Katona
Abstract: Consider a linearly ordered set $L$ of cardinality $n$ of which exactly $d$ elements are defective. We want to determine the subset $D$ of defective elements by a series of group tests. The possible test sets are all lower and upper intervals, and after every test $A$ we receive as answer $|A \cap D|$, i.e. the exact number of defective elements in $A$. In this paper the minimum number $D(d,n)$ of group tests necessary to specify $D$ is determined for all $d$ and $n$.
AMS Subject Classification: 94A50, 68E05.
Key words: Search theory; Group tests; Linearly ordered sets.
1. Introduction

Consider a set $L$ of $n$ elements of which exactly $d$ are 'defective'. A much studied problem calls for the determination of the subset $D$ of defective elements by a series of group tests. For a general survey of the subject see Katona (1973). Two natural test families have been investigated. First, a test may consist in testing a subset $A \subseteq L$ with the two outcomes 'yes' if $A \cap D \neq \emptyset$ and 'no' if $A \cap D = \emptyset$, i.e. we receive as answer whether the set under consideration contains at least one defective element (= the set is 'contaminated') or none (= the set is 'pure'). For $d = 1$ the worst-case cost is trivially $\lceil \log_2 n \rceil$, but already for $d = 2$ a complete answer is not known, the best bounds being due to Chang, Hwang and Lin (1982); see also Chang and Hwang (1981), Sobel (1968) and Tošić (1980). The problem is wide open for $d \ge 3$. Secondly, a test may consist in testing a subset $A \subseteq L$ with the outcome $|A \cap D|$, the exact number of defective elements in $A$. This version seems to be considerably harder; in this connection see Aigner (1986) and Lindström (1975).

In this paper we consider a natural 'alphabetical' variant of the problem. Assume that $L = \{l_1, l_2, \dots, l_n\}$ is linearly ordered, where the first element is smallest and the $n$-th element is largest. We allow as test sets all lower intervals $L_i^- = \{l_1 < l_2 < \dots < l_i\}$ and all upper intervals $L_j^+ = \{l_j < \dots < l_n\}$
M. Aigner, M. Schughart / Determining defectives
$D(d,n)$ is a monotone function of $n$, we present the formula for $D(d,n)$ in terms of the threshold function $f_d(m)$, where $f_d(m)$ is defined as the largest integer such that $D(d, f_d(m)) \le m$. The main theorem in Section 3 states
$$f_d(kd+i-1) = (d+i)\,2^{k-1} \qquad \text{for } k \ge 2,\ d \ge 1,\ 0 \le i \le d-1. \tag{1}$$
2. Preliminary results

We use the notation of Section 1: $L = \{l_1 < l_2 < \dots < l_n\}$ is a linear order and $D$ is the set of defective elements, with $|D| = d$. An element not in $D$ is called pure. We make a few simple observations. Since $|L_i^- \cap D| = j \iff |L_{i+1}^+ \cap D| = d - j$, we may confine our tests to lower intervals. Next, since $|L_i^- \cap D| = j \iff |L_i^- \cap (L - D)| = i - j$, the worst-case cost for $D$ is the same as for the complement $L - D$, whence
$$D(d,n) = D(n-d,n). \tag{2}$$
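Both reductions rest on simple counting identities: an upper-interval answer equals $d$ minus a lower-interval answer, and the answer for $L - D$ equals $i$ minus the answer for $D$. The following Python sketch is only an illustration (positions are encoded as the integers $1, \dots, n$, and the helpers `lower`, `upper` and `check_reductions` are hypothetical names); it checks both identities exhaustively for small chains.

```python
from itertools import combinations

def lower(S, i):
    """|L_i^- ∩ S|: number of elements of S among the positions 1, ..., i."""
    return sum(1 for x in S if x <= i)

def upper(S, j):
    """|L_j^+ ∩ S|: number of elements of S among the positions j, ..., n."""
    return sum(1 for x in S if x >= j)

def check_reductions(n, d):
    """Brute-force check of both reductions for every d-subset D of {1, ..., n}."""
    positions = range(1, n + 1)
    for D in combinations(positions, d):
        complement = [x for x in positions if x not in D]
        for i in range(n + 1):
            # An upper-interval answer is determined by a lower-interval answer.
            assert upper(D, i + 1) == d - lower(D, i)
            # Answers for L - D are determined by answers for D, which gives (2).
            assert lower(complement, i) == i - lower(D, i)
    return True

print(all(check_reductions(n, d) for n in range(1, 8) for d in range(n + 1)))
```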
We may thus assume $n \ge 2d$. Our next observation is that
$$D(d,n) \le D(d,n+1) \qquad \text{for all } d \text{ and } n. \tag{3}$$
To see this, assume the largest element of $\{l_1 < l_2 < \dots < l_{n+1}\}$ to be pure. Then any optimal strategy for the $(d,n+1)$-problem yields a successful strategy for the $(d,n)$-problem, which proves the inequality. Our final observation is
$$D(1,n) = \lceil \log_2 n \rceil, \tag{4}$$
since we may apply the usual halving procedure to the chain $L$. The following is the basic lemma for the recursive procedure used in the proof of the main theorem.

Lemma. For all $d, n \in \mathbb{N}$,
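$$D(d,n) = 1 + \min_{\lceil n/2 \rceil \le l \le n-1} \; \max_{0 \le j \le d} \bigl( D(d-j,l) + D(j,n-l) \bigr), \tag{5}$$
where only those $j$ with $d - j \le l$ and $j \le n - l$ (the possible answers) are taken into account.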
Proof. Suppose $L_l^-$ is the first test set in a strategy for the $(d,n)$-problem. If $|L_l^- \cap D| = d-j$, then we know that $d-j$ defective elements are in $L_l^-$ and $j$ defective elements are in $L_{l+1}^+$. The cost of the remaining problem is $D(d-j,l) + D(j,n-l)$, since any subsequent test can only render information on one of the two sets $L_l^-$ or $L_{l+1}^+$. Indeed, if we use the set $L_k^-$ with $k < l$, then the information on $L_{l+1}^+$ remains the same, and if we use $L_k^-$ with $k > l$, then no new information is gained on the set $L_l^-$. Since we are interested in the worst-case cost, we have to assume the maximum of these subsequent costs $D(d-j,l) + D(j,n-l)$. An optimal strategy will then choose as first test set that $L_l^-$ for which this maximum is minimal. Since the sum
$D(d-j,l) + D(j,n-l)$ is symmetric in the sense that replacing $l$ by $n-l$ and $j$ by $d-j$ leaves it unchanged, it is clear that we may restrict ourselves to $L_l^-$ with $l \ge \lceil n/2 \rceil$. □
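The recursion (5) can be evaluated directly, and the proof of the Lemma also describes an adaptive search: test a minimizing $L_l^-$, then solve the two parts independently. The Python sketch below is one possible rendering under these assumptions, not the authors' procedure. The names `D`, `split_cost`, `search` and `worst_case_tests` are hypothetical; for simplicity all split points $1 \le l \le n-1$ are scanned (by the symmetry just noted, restricting to $l \ge \lceil n/2 \rceil$ gives the same value); and the number of defectives to the left of the current subchain is carried along, since it is already known from earlier answers.

```python
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def D(d, n):
    """Worst-case number of lower-interval tests for d defectives in an n-chain, via (5)."""
    if d == 0 or d == n:
        return 0  # the defective set is already determined
    # Scan every split point l; by symmetry l >= ceil(n/2) would suffice.
    return 1 + min(split_cost(d, n, l) for l in range(1, n))

def split_cost(d, n, l):
    """Max of D(a, l) + D(d - a, n - l) over the feasible answers a = |L_l^- ∩ D|."""
    return max(D(a, l) + D(d - a, n - l)
               for a in range(max(0, d - (n - l)), min(d, l) + 1))

def search(lo, hi, d, before, count, tests):
    """Find the d defectives among the positions lo..hi (1-based, inclusive).

    `before` is the known number of defectives in positions < lo,
    `count(i)` answers the test L_i^-, and tests[0] counts the tests used.
    """
    n = hi - lo + 1
    if d == 0:
        return []
    if d == n:
        return list(range(lo, hi + 1))
    l = min(range(1, n), key=lambda s: split_cost(d, n, s))  # a minimizing split in (5)
    tests[0] += 1
    a = count(lo + l - 1) - before  # defectives among the positions lo..lo+l-1
    return (search(lo, lo + l - 1, a, before, count, tests)
            + search(lo + l, hi, d - a, before + a, count, tests))

def worst_case_tests(n, d):
    """Run `search` on every d-subset of {1, ..., n}; return the most tests used."""
    worst = 0
    for S in combinations(range(1, n + 1), d):
        tests = [0]
        found = search(1, n, d, 0, lambda i: sum(x <= i for x in S), tests)
        assert found == list(S)
        worst = max(worst, tests[0])
    return worst

hidden = {2, 5, 6}
tests = [0]
print(search(1, 8, 3, 0, lambda i: sum(x <= i for x in hidden), tests), tests[0])
print(worst_case_tests(8, 3), D(3, 8))
```

Since the defectives in the two parts of the chain can be placed independently, the worst case of this search should coincide with $D(d,n)$ for every small case.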
Corollary. The sequence $\{D(0,n), D(1,n), D(2,n), \dots, D(n,n)\}$ is unimodal for every $n$, i.e.,
$$D(d,n) \le D(d+1,n) \qquad \text{for } d+1 \le \lceil n/2 \rceil, \tag{6}$$
$$D(d,n) \ge D(d+1,n) \qquad \text{for } d \ge \lfloor n/2 \rfloor. \tag{6'}$$
Proof. By (2), it suffices to show (6). We use induction on $n$. For $n = 1$ there is nothing to prove. Let $L$ be an $n$-chain containing $d+1$ defective elements. We declare one of these $d+1$ elements to be pure. To determine the remaining $d$ defectives we use as first test set an $L_l^-$ which is optimal for the $(d+1,n)$-problem. Using the same argument as in the previous lemma, we have
$$D(d,n) \le 1 + \max\{A_j : 0 \le j \le d\} \qquad \text{with } A_j = D(d-j,l) + D(j,n-l).$$
The $(d+1)$-st defective element may be in $L_l^-$ or in $L_{l+1}^+$. By the choice of $L_l^-$ and (5) we see
$$D(d+1,n) = 1 + \max\{B_j, C_j : 0 \le j \le d\}$$
with $B_j = D(d-j,l) + D(j+1,n-l)$ and $C_j = D(d-j+1,l) + D(j,n-l)$. If $j+1 \le \lceil (n-l)/2 \rceil$, then by the induction hypothesis
$$D(j,n-l) \le D(j+1,n-l), \qquad \text{i.e. } A_j \le B_j.$$
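If, on the other hand, $j+1 > \lceil (n-l)/2 \rceil$, then $j \ge \lceil (n-l)/2 \rceil$, and since $d+1 \le \lceil n/2 \rceil \le \lceil (n-l)/2 \rceil + \lceil l/2 \rceil$ we get
$$d-j+1 \le \lceil l/2 \rceil,$$
and hence by induction
$$D(d-j,l) \le D(d-j+1,l), \qquad \text{i.e. } A_j \le C_j.$$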
In either case we have $A_j \le \max\{B_j, C_j\}$ for every $j$, and thus (6). □
The unimodality property allows us to sharpen (3) to
$$D(d,n) \le D(d,n+1) \le D(d,n) + 1 \qquad \text{for } n \ge 2d-1. \tag{7}$$
Indeed, if we choose $L_1^- = \{l_1\}$ as first test set for the $(d,n+1)$-problem, then depending on whether $l_1$ is defective or pure we need $D(d-1,n)$ or $D(d,n)$ more tests, i.e. at most $D(d,n)$ more tests by (6), which applies because $d \le \lceil n/2 \rceil$ for $n \ge 2d-1$.
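Assuming the recursion (5), the unimodality (6), (6') and the sharpened monotonicity (7) can be checked numerically for small chains. The sketch below is a finite sanity check, not a proof, and the helper names are hypothetical. It recomputes $D(d,n)$ from (5) and asserts each inequality under its stated side condition, writing $\lceil n/2 \rceil$ as `(n + 1) // 2` and $\lfloor n/2 \rfloor$ as `n // 2`.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def D(d, n):
    """D(d, n) via the recursion (5), scanning all split points 1 <= l <= n - 1."""
    if d == 0 or d == n:
        return 0
    return 1 + min(
        max(D(a, l) + D(d - a, n - l)
            for a in range(max(0, d - (n - l)), min(d, l) + 1))
        for l in range(1, n))

def check_unimodality_and_7(N):
    """Assert (6), (6') and (7) for all chains of length n <= N."""
    for n in range(1, N + 1):
        for d in range(n):
            if d + 1 <= (n + 1) // 2:   # (6): d + 1 <= ceil(n/2)
                assert D(d, n) <= D(d + 1, n)
            if d >= n // 2:             # (6'): d >= floor(n/2)
                assert D(d, n) >= D(d + 1, n)
        for d in range(n + 1):
            if n >= 2 * d - 1:          # (7)
                assert D(d, n) <= D(d, n + 1) <= D(d, n) + 1
    return True

print(check_unimodality_and_7(16))
```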
3. The theorem

Consider the threshold function $f_d(m)$ as introduced in Section 1. By our assumption $n \ge 2d$ we may confine ourselves to values $f_d(m) \ge 2d$, and hence to $k \ge 2$ in (1).

Theorem. Let $f_d(m)$ be the largest integer with $D(d, f_d(m)) \le m$, $m \ge 2d-1$. Then
$$f_d(kd+i-1) = (d+i)\,2^{k-1} \qquad \text{for } k \ge 2,\ d \ge 1,\ 0 \le i \le d-1. \tag{1}$$
Proof. To prove (1) we have to show
$$D(d,(d+i)2^{k-1}) \le kd+i-1 \qquad (k \ge 2,\ d \ge 1,\ 0 \le i \le d-1), \tag{8}$$
$$D(d,2d) \ge 2d-1 \qquad (d \ge 1), \tag{9}$$
and
$$D(d,(d+i)2^{k-1}+1) \ge kd+i \qquad (k \ge 2,\ d \ge 1,\ 0 \le i \le d-1). \tag{10}$$
In the proof of (8) and (10) we use induction on $k$ and $d+i$, in this order.
A. Proof of (8). We are actually going to prove (8) for all $i \ge 0$. Let us verify the induction start $k = 2$ first:
$$D(d,2(d+i)) \le 2d+i-1 \qquad \text{for all } d \ge 1,\ i \ge 0. \tag{11}$$
The proof of (11) is by induction on $d+i$. For $d+i = 1$, i.e. $d = 1$, $i = 0$, we have the trivial inequality $D(1,2) \le 1$. Now assume (11) to be correct for all $d'+i' < d+i$. We prove (11) when $d+i$ is even; the case when $d+i$ is odd can be settled by an analogous argument. We choose $L_{d+i}^-$ as our first test set, whence by (5)
$$D(d,2(d+i)) \le 1 + \max_{0 \le j \le d} \bigl( D(d-j,d+i) + D(j,d+i) \bigr). \tag{12}$$
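If $i \ge d$, then $d-j \le (d+i)/2$ and $j \le (d+i)/2$ for all $0 \le j \le d$, and thus by induction
$$D(d-j,d+i) = D\Bigl(d-j,\; 2\Bigl((d-j) + \Bigl(\tfrac{d+i}{2} - (d-j)\Bigr)\Bigr)\Bigr) \le 2(d-j) + \tfrac{d+i}{2} - (d-j) - 1 = \tfrac{d+i}{2} + d - j - 1, \tag{13}$$
$$D(j,d+i) = D\Bigl(j,\; 2\Bigl(j + \Bigl(\tfrac{d+i}{2} - j\Bigr)\Bigr)\Bigr) \le 2j + \tfrac{d+i}{2} - j - 1 = \tfrac{d+i}{2} + j - 1. \tag{14}$$
Addition of (13) and (14) for all $j$ together with (12) then yields the desired inequality (11). Now suppose $i < d$. We know from (6) and (2) that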
$$D(d-j,d+i) \le D\bigl(\tfrac{d+i}{2},\, d+i\bigr) \qquad \text{and} \qquad D(j,d+i) \le D\bigl(\tfrac{d-i}{2},\, d+i\bigr)$$
for $0 \le j \le (d-i)/2$.
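Since the terms in (12) are symmetric under $j \mapsto d-j$, the maximum in (12) can thus be restricted to $(d-i)/2 \le j \le \lfloor d/2 \rfloor$; writing $j = (d-i)/2 + l$, (12) simplifies to
$$D(d,2(d+i)) \le 1 + \max_{0 \le l \le (d+i)/2 - \lceil d/2 \rceil} \Bigl( D\bigl(\tfrac{d+i}{2} - l,\, d+i\bigr) + D\bigl(\tfrac{d-i}{2} + l,\, d+i\bigr) \Bigr). \tag{15}$$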
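Considering one such term $D\bigl(\tfrac{d+i}{2} - l, d+i\bigr) + D\bigl(\tfrac{d-i}{2} + l, d+i\bigr)$, we have $l \le i$ (as $l \le (d+i)/2 - \lceil d/2 \rceil \le i/2$), and hence by induction on $d+i$
$$D\bigl(\tfrac{d+i}{2} - l,\, d+i\bigr) = D\Bigl(\tfrac{d+i}{2} - l,\; 2\Bigl(\bigl(\tfrac{d+i}{2} - l\bigr) + l\Bigr)\Bigr) \le d+i-l-1, \tag{16}$$
$$D\bigl(\tfrac{d-i}{2} + l,\, d+i\bigr) = D\Bigl(\tfrac{d-i}{2} + l,\; 2\Bigl(\bigl(\tfrac{d-i}{2} + l\bigr) + (i-l)\Bigr)\Bigr) \le d+l-1. \tag{17}$$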
Addition of (16) and (17) for all $l$ together with (15) yields again (11).

Now suppose (8) is correct for all $k' < k$. We are actually going to prove (8) for all $i \ge 0$, as in the induction start $k = 2$. For given $k$ we use again induction on $d+i$. For $d+i = 1$ we have $D(1,2^{k-1}) = \lceil \log_2 2^{k-1} \rceil = k-1$ by (4). As in the proof of (11) we confine ourselves to the case when $d+i$ is even. We choose $L^-_{(d+i)2^{k-2}}$ as our first test set. Then by (5)
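$$D(d,(d+i)2^{k-1}) \le 1 + \max_{0 \le j \le \lfloor d/2 \rfloor} \bigl( D(d-j,(d+i)2^{k-2}) + D(j,(d+i)2^{k-2}) \bigr), \tag{18}$$
where, by the symmetry $j \mapsto d-j$, it suffices to let $j$ run up to $\lfloor d/2 \rfloor$. By induction on $d+i$ we have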
$$D(j,(d+i)2^{k-2}) = D\bigl(j, \tfrac{d+i}{2}\,2^{k-1}\bigr) = D\Bigl(j,\; \Bigl(j + \bigl(\tfrac{d+i}{2} - j\bigr)\Bigr)2^{k-1}\Bigr) \le kj + \tfrac{d+i}{2} - j - 1. \tag{19}$$
If $j \le (d-i)/2$, then by induction on $k$,
$$D(d-j,(d+i)2^{k-2}) = D\bigl(d-j,\; (d-j + (i+j))2^{k-2}\bigr) \le (k-1)(d-j) + i + j - 1. \tag{20}$$
Adding (19) and (20) we obtain
$$D(d-j,(d+i)2^{k-2}) + D(j,(d+i)2^{k-2}) \le kd - \tfrac{d}{2} + \tfrac{3i}{2} + j - 2 \le kd + i - 2,$$
where the last step uses $j \le (d-i)/2$.
If, finally, $j > (d-i)/2$, then by induction on $d+i$,
$$D(d-j,(d+i)2^{k-2}) = D\Bigl(d-j,\; \Bigl(d-j + \bigl(j - \tfrac{d-i}{2}\bigr)\Bigr)2^{k-1}\Bigr) \le k(d-j) + j - \tfrac{d-i}{2} - 1. \tag{20'}$$
Addition of (19) and (20') gives again
$$D(d-j,(d+i)2^{k-2}) + D(j,(d+i)2^{k-2}) \le kd + i - 2,$$
whence we obtain in all cases
$$D(d,(d+i)2^{k-1}) \le kd + i - 1,$$
which was to be proved.

B. Proof of (9). We use induction on $d$. For $d = 1$ we have $D(1,2) = 1$. For $d > 1$, (5) implies
$$D(d,2d) = 1 + \min_{0 \le l \le d-1} \; \max_{0 \le j \le d} \bigl( D(d-j,d+l) + D(j,d-l) \bigr). \tag{21}$$
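To prove (9) we have to find for every $l$ a term $D(d-j,d+l) + D(j,d-l)$ which is at least $2d-2$. Consider $D(\lfloor (d+l)/2 \rfloor, d+l) + D(\lceil (d-l)/2 \rceil, d-l)$ for $0 \le l \le d-1$. If $d+l$ is even, then induction implies
$$D\bigl(\tfrac{d+l}{2},\, d+l\bigr) + D\bigl(\tfrac{d-l}{2},\, d-l\bigr) \ge (d+l-1) + (d-l-1) = 2d-2.$$
If $d+l$ is odd, then (2), (7) and induction imply again
$$D\bigl(\lfloor\tfrac{d+l}{2}\rfloor,\, d+l\bigr) + D\bigl(\lceil\tfrac{d-l}{2}\rceil,\, d-l\bigr) \ge D\bigl(\tfrac{d+l+1}{2},\, d+l+1\bigr) + D\bigl(\tfrac{d-l+1}{2},\, d-l+1\bigr) - 2 \ge (d+l) + (d-l) - 2 = 2d-2.$$

C. Proof of (10). As before, we verify the induction start $k = 2$ first: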
$$D(d,2(d+i)+1) \ge 2d+i \qquad \text{for all } d \ge 1,\ 0 \le i \le d. \tag{22}$$
The case when $i = 0$ is settled by (2), (7) and (9). Indeed,
$$D(d,2d+1) = D(d+1,2d+1) \ge D(d+1,2d+2) - 1 \ge 2d+1-1 = 2d. \tag{23}$$
The proof of (22) is by induction on $d+i$. For $d+i = 1$, i.e. $d = 1$, $i = 0$, the inequality $D(1,3) \ge 2$ is clear. Suppose (22) holds for all $d', i'$ with $d'+i' < d+i$; by (23) we may assume $i > 0$. By (5),
$$D(d,2(d+i)+1) = 1 + \min_{1 \le l \le d+i} \; \max_{0 \le j \le d} \bigl( D(d-j,d+i+l) + D(j,d+i-l+1) \bigr). \tag{24}$$
In order to prove (22) we must find for every $l$ a term $D(d-j,d+i+l) + D(j,d+i-l+1)$ which is at least $2d+i-1$.

Case (i): $3i + l \ge d + 5$. Consider $j = \lceil (d+i-l-1)/4 \rceil$; then $0 \le j \le d$, and by (3)
$$D(d-j,d+i+l) + D(j,d+i-l+1) \ge D\Bigl(d-j,\; 2\Bigl((d-j) + \Bigl(j - \Bigl\lfloor \tfrac{d-i-l}{2} \Bigr\rfloor - 1\Bigr)\Bigr) + 1\Bigr) + D\Bigl(j,\; 2\Bigl(j + \Bigl(\Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor - j\Bigr)\Bigr) + 1\Bigr). \tag{25}$$
Let us verify the induction hypothesis for the summands in (25). For the first summand we have
$$j - \Bigl\lfloor \tfrac{d-i-l}{2} \Bigr\rfloor - 1 \ge \tfrac{d+i-l-1}{4} - \tfrac{d-i-l}{2} - 1 = \tfrac{3i+l-d-5}{4} \ge 0.$$
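Furthermore,
$$(d-j) + \Bigl(j - \Bigl\lfloor \tfrac{d-i-l}{2} \Bigr\rfloor - 1\Bigr) = d - \Bigl\lfloor \tfrac{d-i-l}{2} \Bigr\rfloor - 1 < d+i,$$
since $l \le d+i$. Similarly, for the second summand we see
$$0 \le \Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor - j \le j \qquad \text{and} \qquad j + \Bigl(\Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor - j\Bigr) = \Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor < d+i.$$
By induction we infer that the right-hand side of (25) is at least
$$\Bigl(2d - j - \Bigl\lfloor \tfrac{d-i-l}{2} \Bigr\rfloor - 1\Bigr) + \Bigl(j + \Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor\Bigr) = 2d+i-1.$$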
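Case (ii): $3i + l < d + 5$. Consider $j = \lfloor (d+i-l)/2 \rfloor$; then $0 \le j \le d$, $d - j = \lceil (d-i+l)/2 \rceil$, and by (3)
$$D(d-j,d+i+l) + D(j,d+i-l+1) \ge D\Bigl(\Bigl\lceil \tfrac{d-i+l}{2} \Bigr\rceil,\; 2\Bigl(\Bigl\lceil \tfrac{d-i+l}{2} \Bigr\rceil + (i-1)\Bigr) + 1\Bigr) + D\Bigl(\Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor,\; 2\Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor + 1\Bigr). \tag{26}$$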
The induction hypotheses for the two summands in (26) are again easily verified; hence
$$D(d-j,d+i+l) + D(j,d+i-l+1) \ge \Bigl(2\Bigl\lceil \tfrac{d-i+l}{2} \Bigr\rceil + i - 1\Bigr) + 2\Bigl\lfloor \tfrac{d+i-l}{2} \Bigr\rfloor = 2d+i-1.$$
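The induction start $k = 2$ thus rests on the inequalities (9), (11) and (22). As a sanity check (not a substitute for the proof), they can be tested by evaluating the recursion (5) directly. The Python sketch below does this for small $d$ only; the helper names are hypothetical, and only the finite range shown is covered.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def D(d, n):
    """D(d, n) from the recursion (5); only feasible answers a = |L_l^- ∩ D| occur."""
    if d == 0 or d == n:
        return 0
    return 1 + min(
        max(D(a, l) + D(d - a, n - l)
            for a in range(max(0, d - (n - l)), min(d, l) + 1))
        for l in range(1, n))

def check_base_case(max_d):
    """Assert (9), (11) and (22) for 1 <= d <= max_d."""
    for d in range(1, max_d + 1):
        assert D(d, 2 * d) >= 2 * d - 1                  # (9)
        for i in range(0, 2 * d + 1):                    # (11) is claimed for all i >= 0
            assert D(d, 2 * (d + i)) <= 2 * d + i - 1
        for i in range(0, d + 1):                        # (22): 0 <= i <= d
            assert D(d, 2 * (d + i) + 1) >= 2 * d + i
    return True

print(check_base_case(5))
```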
Now we come to the proof of (10). As before, we actually verify (10) for the range $0 \le i \le d$. We assume (10) is correct for $k' < k$. For $k$ and $d+i = 1$, (10) is trivially true by (4). For $i = 0$ the assertion (10) is also true. Indeed, by induction on $k$ we have
$$D(d,\, d\,2^{k-1}+1) = D(d,\,(d+d)2^{k-2}+1) \ge (k-1)d + d = kd.$$
Now we use induction on $d+i$, where we may assume $i > 0$. The proof is exactly as for the case $k = 2$. By (5), we have
$$D(d,(d+i)2^{k-1}+1) = 1 + \min_{1 \le l \le (d+i)2^{k-2}} \; \max_{0 \le j \le d} \bigl( D(d-j,(d+i)2^{k-2}+l) + D(j,(d+i)2^{k-2}-l+1) \bigr). \tag{27}$$
For given $l$ we have to find $j$ with
$$D(d-j,(d+i)2^{k-2}+l) + D(j,(d+i)2^{k-2}-l+1) \ge kd+i-1. \tag{28}$$
We just list the two cases.
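Case (i): $3i + (l-2)/2^{k-2} > d$. We take $j = \lceil (d+i)/4 - (l-1)/2^{k} \rceil - 1$ and, using (3), reduce (28) to
$$D\Bigl(d-j,\; \Bigl(d-j + \Bigl(\Bigl\lfloor \tfrac{d+i}{2} + \tfrac{l-1}{2^{k-1}} \Bigr\rfloor - d + j\Bigr)\Bigr)2^{k-1} + 1\Bigr) + D\Bigl(j,\; \Bigl(j + \Bigl(\Bigl\lfloor \tfrac{d+i}{2} - \tfrac{l}{2^{k-1}} \Bigr\rfloor - j\Bigr)\Bigr)2^{k-1} + 1\Bigr) \ge kd+i-1. \tag{29}$$
The induction hypotheses for both summands are easily verified, whence the left-hand side of (28) is at least
$$k(d-j) + \Bigl\lfloor \tfrac{d+i}{2} + \tfrac{l-1}{2^{k-1}} \Bigr\rfloor - d + j + kj + \Bigl\lfloor \tfrac{d+i}{2} - \tfrac{l}{2^{k-1}} \Bigr\rfloor - j \ge kd + i - 2 + \frac{1}{2^{k-1}}$$
(both floored quantities are multiples of $2^{1-k}$, so each floor loses at most $1 - 2^{1-k}$), and thus, being an integer, at least $kd+i-1$.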
Case (ii): $3i + (l-2)/2^{k-2} \le d$. We take $j = \lceil (d-i)/2 - (l-1)/2^{k-1} \rceil$. The proof is then as in case (i). □
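Statement (1) can also be spot-checked by computing the threshold $f_d(m)$ directly: by (3), $f_d(m)$ is found by increasing $n$ until $D(d,n+1)$ exceeds $m$. The Python sketch below evaluates $D$ from the recursion (5). It is a finite check for $d \le 3$ and $k \le 4$ only, and the helper names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def D(d, n):
    """D(d, n) from the recursion (5), scanning all split points 1 <= l <= n - 1."""
    if d == 0 or d == n:
        return 0
    return 1 + min(
        max(D(a, l) + D(d - a, n - l)
            for a in range(max(0, d - (n - l)), min(d, l) + 1))
        for l in range(1, n))

def threshold(d, m):
    """Largest n with D(d, n) <= m; the upward scan relies on the monotonicity (3)."""
    n = d  # D(d, d) = 0 <= m
    while D(d, n + 1) <= m:
        n += 1
    return n

def check_theorem(max_d, max_k):
    """Compare threshold(d, kd + i - 1) with (d + i) * 2**(k - 1), as claimed in (1)."""
    for d in range(1, max_d + 1):
        for k in range(2, max_k + 1):
            for i in range(d):
                assert threshold(d, k * d + i - 1) == (d + i) * 2 ** (k - 1), (d, k, i)
    return True

print(check_theorem(3, 4))
```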
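Examples. We close with some examples following immediately from (1):
$$D(d,2d) = 2d-1, \tag{30}$$
$$D(d,2d+1) = 2d, \tag{31}$$
$$D(d,2^n) = \bigl(n - \lceil \log_2 d \rceil\bigr)d + 2^{\lceil \log_2 d \rceil} - 1 \qquad \text{for } n > \lceil \log_2 d \rceil, \tag{32}$$
$$D(2,2^n) = 2n-1 \quad (n \ge 2), \qquad D(3,2^n) = 3n-3 \quad (n \ge 3). \tag{33}$$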
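Since $D(d,\cdot)$ is nondecreasing, (1) determines $D(d,n)$ for $n \ge 2d$ as the least $m \ge 2d-1$ with $f_d(m) \ge n$, and (2) covers $n < 2d$. The Python sketch below inverts (1) in this way; it simply assumes (1) and (2), and the helper names `f`, `D_closed` and `ceil_log2` are hypothetical. It can be used to check the examples (30) through (33) over a range of $d$ and $n$.

```python
def f(d, m):
    """Threshold f_d(m) from (1); assumes d >= 1 and m >= 2d - 1."""
    k, i = divmod(m + 1, d)  # m = kd + i - 1 with k >= 2 and 0 <= i <= d - 1
    return (d + i) * 2 ** (k - 1)

def D_closed(d, n):
    """D(d, n) read off from (1), after reducing to n >= 2d with (2)."""
    d = min(d, n - d)         # symmetry (2)
    if d == 0:
        return 0
    m = 2 * d - 1             # for n >= 2d, D(d, n) >= D(d, 2d) = 2d - 1
    while f(d, m) < n:        # D(d, n) <= m exactly when n <= f_d(m)
        m += 1
    return m

def ceil_log2(x):
    """ceil(log2 x) for integers x >= 1."""
    return (x - 1).bit_length()

for d in range(1, 8):
    t = ceil_log2(d)
    print(d, D_closed(d, 2 * d), D_closed(d, 2 * d + 1),
          [D_closed(d, 2 ** n) for n in range(t + 1, t + 5)])
```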
Table 1 gives the smallest values of $D(d,n)$.

Table 1
Values of $D(d,n)$

 n \ d    1   2   3   4   5   6
   2      1
   3      2   2
   4      2   3   2
   5      3   4   4   3
   6      3   4   5   4   3
   7      3   5   6   6   5   3
   8      3   5   6   7   6   5
   9      4   6   7   8   8   7
  10      4   6   7   8   9   8
  11      4   6   8   9  10  10
  12      4   6   8   9  10  11
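Table 1 can be regenerated from (1) and (2). The Python sketch below is an illustration under that assumption: the helper names are hypothetical, and `TABLE_1` is a transcription of the printed table, listing $D(d,n)$ for $1 \le d \le \min(6, n-1)$. It rebuilds each row and prints it in the same layout.

```python
def f(d, m):
    """Threshold f_d(m) from (1); assumes d >= 1 and m >= 2d - 1."""
    k, i = divmod(m + 1, d)
    return (d + i) * 2 ** (k - 1)

def D_closed(d, n):
    """D(d, n) from (1), using (2) to reduce to n >= 2d."""
    d = min(d, n - d)
    if d == 0:
        return 0
    m = 2 * d - 1
    while f(d, m) < n:
        m += 1
    return m

# Table 1 as printed: row n lists D(d, n) for d = 1, ..., min(6, n - 1).
TABLE_1 = {
    2: [1],
    3: [2, 2],
    4: [2, 3, 2],
    5: [3, 4, 4, 3],
    6: [3, 4, 5, 4, 3],
    7: [3, 5, 6, 6, 5, 3],
    8: [3, 5, 6, 7, 6, 5],
    9: [4, 6, 7, 8, 8, 7],
    10: [4, 6, 7, 8, 9, 8],
    11: [4, 6, 8, 9, 10, 10],
    12: [4, 6, 8, 9, 10, 11],
}

def rebuild_table():
    """Recompute every row of Table 1 from the closed form."""
    return {n: [D_closed(d, n) for d in range(1, min(6, n - 1) + 1)]
            for n in range(2, 13)}

for n, row in rebuild_table().items():
    print(f"{n:>2} | " + " ".join(f"{v:>3}" for v in row))
```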
References

M. Aigner (1986). Search problems on graphs. Discrete Appl. Math., to appear.
G.J. Chang and F.K. Hwang (1981). A group testing problem on two disjoint sets. SIAM J. Algebraic Discrete Methods 2, 35-38.
G.J. Chang, F.K. Hwang and S. Lin (1982). Group testing with two defectives. Discrete Appl. Math. 4, 97-102.
G. Katona (1973). Combinatorial search problems. In: Srivastava et al., Eds., A Survey of Combinatorial Theory. North-Holland, Amsterdam, 285-308.
B. Lindström (1975). Determining subsets by unramified experiments. In: Srivastava, Ed., A Survey of Statistical Designs and Linear Models. North-Holland, Amsterdam, 407-418.
M. Sobel (1968). Binomial and hypergeometric group-testing. Studia Sci. Math. Hung. 3, 19-42.
R. Tošić (1980). An optimal search procedure. J. Statist. Plann. Inference 4, 169-171.