Journal of Statistical Planning and Inference 12 (1985) 385-394 North-Holland
OPTIMAL PARTITIONS OF FINITE POPULATIONS FOR DORFMAN-TYPE GROUP TESTING
C. Zachary GILSTEIN* Bell Communications Research, Red Bank, NJ 07701, USA Received 17 February 1984; revised manuscript received 25 March 1985 Recommended by S. Panchapakesan
Abstract: In some group testing models, a group of units may be tested simultaneously to determine that either all units are satisfactory or that at least one unit in the group is defective. In this article, a simple method is given for determining an optimal partition of a finite population into groups for doing group testing by using Dorfman-type procedures for both the usual binomial model and the modified binomial model of Pfeifer and Enis (1978). It is shown that an optimal partition can be determined by evaluating the expected number of tests for at most two partitions. The given method of determining an optimal partition greatly improves on the method given by Pfeifer and Enis (1978) and proves the optimality of the method suggested by Lee and Sobel (1972) for the usual binomial model.

AMS Subject Classification: 62L99, 62P10.

Key words: Binomial model; Modified binomial model.
1. Introduction

In binomial-type group testing, the object is to classify each member of a population into one of two categories, say satisfactory or defective. The basic feature of this type of group testing is that a group of units may be tested simultaneously to determine that either all units are satisfactory or at least one unit in the group is defective. The state of each unit is satisfactory or defective with known probabilities $q$ and $1 - q$, respectively, and the states of the units are assumed to be independent. Since units are tested in groups, it may be possible to test groups so that the expected number of tests required to classify every member of a population of size $n$ is less than $n$.

A group testing procedure specifies the sizes of groups to be tested and the method of testing a group once it is found to have at least one defective unit. The intent is to determine procedures which lead to a small expected number of tests. Here we consider Dorfman-type procedures for the usual binomial model (B model) and the modified binomial model (M model) of Pfeifer and Enis (1978). Dorfman-type procedures specify, for those groups known to have at least one defective unit, that each unit is tested individually until all units in the group have been identified. Though Dorfman-type group testing does not generally give the minimum expected number of tests, it has the advantages of being a simple procedure, never requiring more than two tests on any member of the population, and being a substantial improvement over testing every member of the population individually when defectives are rare. More efficient procedures are described in the references listed below, but these are more complicated, may require several tests on a member of the population, and yield only a small improvement over Dorfman-type procedures.

In the B model, the only information given about a unit when it is tested is that it is either satisfactory or defective. The B model has been widely discussed by Dorfman (1943), Sterret (1957), Sobel and Groll (1959), and Sobel (1960, 1968). In the M model, satisfactory units have a known (fixed) response $\underline{a}$ on a given test, while defective units have responses in excess of $\underline{a}$ but otherwise unknown.

In the case of an infinite population, the optimal group size can be determined (cf. Pfeifer and Enis, 1978) and groups of this size can be formed to test the population. However, for a finite population, the optimal infinite population group size may not divide evenly into the finite population size, so it is not clear what size groups to test. The purpose of this paper is to determine partitions of finite populations into groups such that the expected number of tests for Dorfman-type procedures in either the B or M model is minimized.

* This research was completed while the author was a graduate student in the Department of Mathematics, University of California, Los Angeles, CA 90024.
0378-3758/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)
The method for determining the optimal group sizes given here substantially improves on the method offered by Pfeifer and Enis (1978) for the M and B models and proves the optimality of the method offered by Lee and Sobel (1972) for the B model. If the expected number of tests for every possible partition of a population of size $n$ had to be evaluated, then for large $n$ about $\exp(\pi\sqrt{2n/3})/(4n\sqrt{3})$ partitions would have to be considered (see Abramowitz and Stegun (1972), p. 825). Using the principle of optimality of dynamic programming, Pfeifer and Enis (1978) developed an algorithm which involves evaluating the expected number of tests for only about $\tfrac{1}{2}n^2$ partitions of the population. The algorithm is valid for both the M and B models.

Lee and Sobel (1972) gave a method of finding a partition of a finite population for the B model. They restricted attention to just two partitions but did not show that one of these partitions was optimal. They did offer some justification for the restriction to two partitions for the case of $q$ close to 1. Below we give three theorems which demonstrate that in both the M and B models an optimal partition can be determined by evaluating the expected number of tests for at most two partitions, for any size population and for all $q \in (0, 1)$. We also give a simple method for determining the two partitions to compare. For the B model, the two partitions are the same as the two suggested by Lee and Sobel (1972).
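To give a sense of the scale involved, the following Python sketch counts the partitions of $n$ exactly and prints the count next to the asymptotic figure and $\tfrac{1}{2}n^2$. It assumes that the asymptotic count quoted from Abramowitz and Stegun is the Hardy-Ramanujan formula $\exp(\pi\sqrt{2n/3})/(4n\sqrt{3})$; the function names are illustrative and do not come from the paper.

```python
import math

def count_partitions(n: int) -> int:
    """Exact number of partitions of n into positive parts, order ignored."""
    ways = [1] + [0] * n          # ways[t] = partitions of t using the parts seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

def asymptotic_partitions(n: int) -> float:
    """Assumed Hardy-Ramanujan approximation exp(pi*sqrt(2n/3)) / (4n*sqrt(3))."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))

# Columns: n, exact count, asymptotic estimate, n^2 / 2
for n in (10, 50, 100):
    print(n, count_partitions(n), f"{asymptotic_partitions(n):.3e}", n * n // 2)
```

Even for $n = 100$ the exact count is in the hundreds of millions, whereas $\tfrac{1}{2}n^2$ is 5000.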
2. Optimal partitions

Let $Y_x$ be the random variable representing the number of tests required to identify all defectives in a group of size $x$ using a Dorfman-type procedure. We are interested in the expected value of $Y_x$ for each model. Clearly the expected value depends on $q$. Denote this expected value by $f_M(x, q)$ or $f_B(x, q)$ for the M and B models, respectively.

Under Dorfman-type testing in the B model, if a group of size $x$ is defective (i.e., at least one of the units is defective), then every unit is tested, unless the first $x - 1$ units are all found to be satisfactory, in which case it can be inferred that the last unit is defective. In the M model, however, if a defective group of size $x$ has a response $t > x\underline{a}$, then units need only be tested (1) until $t - \sum_{i=1}^{k} r_i = (x - k)\underline{a}$ for some $k$, where $r_i$ is the response of the $i$th unit, in which case the remaining units are known to be satisfactory, or (2) until $k = x - 1$, in which case the state of the last unit can be inferred. Since, under the assumptions of the M model, the response of a defective unit is known to be in excess of $\underline{a}$, it is often possible to completely identify all defective units in a defective group by testing only a small number of units individually. Pfeifer and Enis (1978) showed that

$$f_M(x, q) = x - \frac{q^2}{1 - q}\left(1 - q^{x-1}\right), \qquad x \ge 1. \tag{1}$$

The analogous expression for the B model is

$$f_B(x, q) = 1 + x - q^{x-1} - (x - 1)q^{x}, \qquad x > 1, \tag{2}$$
as derived by Sobel and Groll (1959). To determine the optimal size groups in which to divide a population of size $n$, we want to choose a partition of $n$, that is, a set $\{b_1, \ldots, b_m\}$ of integers such that $b_i \ge 1$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} b_i = n$, to minimize

$$\sum_{i=1}^{m} f(b_i, q), \tag{3}$$

where $f$ is either $f_B(x, q)$ or $f_M(x, q)$ depending on the model used. If (3) is minimized by, say, $\{b_1^*, \ldots, b_m^*\}$, then the optimal partition of a population of size $n$ is into groups of sizes $b_1^*, \ldots, b_m^*$. We begin by showing that in either the M or the B model there is an optimal partition whose groups are all the same size or differ in size by at most one.
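Equations (1) and (2) and the objective (3) translate directly into code. The sketch below is a plain Python transcription; the names `f_M`, `f_B` and `expected_tests` are chosen here for illustration only.

```python
def f_M(x: int, q: float) -> float:
    """Equation (1): expected number of tests for a group of size x, M model."""
    return x - q**2 / (1.0 - q) * (1.0 - q ** (x - 1))

def f_B(x: int, q: float) -> float:
    """Equation (2): expected number of tests for a group of size x, B model.
    At x = 1 the expression reduces to 1, i.e. a single unit test."""
    return 1.0 + x - q ** (x - 1) - (x - 1) * q**x

def expected_tests(partition, f, q: float) -> float:
    """Objective (3): sum of f(b_i, q) over the group sizes b_i."""
    return sum(f(b, q) for b in partition)

q = 0.986
for sizes in [(22,), (11, 11), (7, 7, 8), (1,) * 22]:
    label = sizes if len(sizes) < 5 else "22 singletons"
    print(label, round(expected_tests(sizes, f_B, q), 3))
```

For a population of 22 units with $q = 0.986$ in the B model, the partition into groups of sizes 7, 7 and 8 gives a smaller value of (3) than two groups of 11.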
Theorem 1. Let group testing be done under the M model for a finite population of size $n$. Then for all $q \in (0, 1)$, there exists an optimal partition containing groups whose sizes differ by at most 1.

Proof. First note that since there are a finite number of partitions of $n$, an optimal partition exists. We have

$$f_M(x, q) = x - \frac{q^2}{1 - q}\left(1 - q^{x-1}\right).$$

We will treat $f_M(x, q)$ as a continuous function of $x$ for all $x \in [1, \infty)$. Differentiating twice with respect to $x$, we obtain

$$f_M''(x, q) = \frac{q^{x+1}(\ln q)^2}{1 - q}.$$
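The differentiation can be written out in full. This is only a worked restatement of the step above, starting from equation (1) and using $q^2 \cdot q^{x-1} = q^{x+1}$:

```latex
\begin{align*}
f_M(x,q)   &= x - \frac{q^{2}}{1-q} + \frac{q^{x+1}}{1-q}, \\
f_M'(x,q)  &= 1 + \frac{q^{x+1}\ln q}{1-q}, \\
f_M''(x,q) &= \frac{q^{x+1}(\ln q)^{2}}{1-q}.
\end{align*}
```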
Note that $f_M''$ is positive for all $x$ and for all $q \in (0, 1)$. Therefore, $f_M$ is a convex function. This implies that any partition of $n$ which contains group sizes differing by more than one, say $b_i - b_j \ge 2$, can be improved by replacing these groups by groups of sizes $b_i - 1$ and $b_j + 1$, since strict convexity gives $f_M(b_i - 1, q) + f_M(b_j + 1, q) < f_M(b_i, q) + f_M(b_j, q)$. []

The proof in the B model is more involved since the function $f_B(x, q)$ is not everywhere convex in $x$.

Theorem 2. Let group testing be done under the B model for a finite population of size $n$. Then for all $q \in (0, 1)$, there exists an optimal partition containing groups whose sizes differ by at most 1.

The proof of Theorem 2 is substantially more technical than that of Theorem 1 and is therefore given in the Appendix.

The above theorems imply that for both the M and B models there is an optimal partition of a population of size $n$ whose groups are all the same size or have sizes $k$ and $k + 1$ for some $k$. Knowing that an optimal partition may be taken to have groups with sizes differing by at most one, we may derive a simple technique for finding the optimal partition of a finite population for any $q$. Let $a$ be an integer such that
$$\frac{f_i(a, q)}{a} = \min_{x \in \{1, 2, \ldots\}} \frac{f_i(x, q)}{x},$$
where $i$ stands for either B or M. The value of $a$ is the optimal group size for an infinite population since it minimizes the expected number of tests per unit. The values of $a$ for $q$ ranging from 0 to 1 have been given in Pfeifer and Enis (1978) and are also given in Tables 1 and 2. The following theorem shows how we can use the value of $a$ to restrict attention to just two partitions of a finite population to determine an optimal partition.

Theorem 3. Let the size of the population be $n$. Let $[n/a] = s$, where $[t]$ denotes the greatest integer $\le t$ and $a$ is the infinite population optimal group size.
(i) If $s = n/a$, then the optimal partition is $s$ groups of size $a$.
(ii) If $s < n/a$, then the optimal partition is $s$ groups of sizes $k$ and $k + 1$ (for some $k$) or $s + 1$ groups of sizes $l$ and $l + 1$ (for some $l$).
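The procedure implied by Theorem 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: the infinite population size $a$ is found by a direct search over $x = 1, \ldots,$ `max_size` (an arbitrary bound introduced here), a population smaller than $a$ is treated as a single group, and all function names are hypothetical.

```python
def f_M(x, q):
    """Equation (1), M model."""
    return x - q**2 / (1.0 - q) * (1.0 - q ** (x - 1))

def f_B(x, q):
    """Equation (2), B model."""
    return 1.0 + x - q ** (x - 1) - (x - 1) * q**x

def infinite_population_size(f, q, max_size=500):
    """Integer a minimizing f(x, q)/x, searched over 1..max_size (assumed bound)."""
    return min(range(1, max_size + 1), key=lambda x: f(x, q) / x)

def balanced(n, m):
    """Partition of n into m groups whose sizes differ by at most one."""
    k, extra = divmod(n, m)
    return [k] * (m - extra) + [k + 1] * extra

def optimal_partition(n, q, f, max_size=500):
    a = infinite_population_size(f, q, max_size)
    s = n // a
    if s == 0:                       # fewer than a units: only s + 1 = 1 group is possible
        return [n]
    if s * a == n:                   # Theorem 3(i)
        return [a] * s
    candidates = [balanced(n, s), balanced(n, s + 1)]   # Theorem 3(ii)
    return min(candidates, key=lambda p: sum(f(b, q) for b in p))

print(optimal_partition(22, 0.986, f_B))
print(optimal_partition(47, 0.953, f_M))
```

Only two evaluations of the objective (3) are needed once $a$ is known, which is the point of the theorem.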
Proof. As discussed in the Appendix, $f_B(x, q)/x$ has a unique minimum for $x \in [1, x_0]$. Pfeifer and Enis (1978) showed also that $f_M(x, q)/x$ has a unique minimum. Below $f(x, q)$ stands for either $f_B(x, q)$ or $f_M(x, q)$. The case $s = n/a$ is trivial and is left to the reader. Now assume $s \ne n/a$. Note that $n/s > a$. Therefore, a partition of $n$ into $s$ groups of sizes not differing by more than one will yield, say, $u_1$ groups of size $a + t$ and $u_2$ groups of size $a + t + 1$, for some $t \ge 0$, where $u_1(a + t) + u_2(a + t + 1) = n$ and $u_1 + u_2 = s$. We now show that a partition of $n$ into fewer groups leads to a greater expected number of tests. A partition into fewer groups leads to, say, $v_1$ groups of size $a + j$ and $v_2$ groups of size $a + j + 1$, for some $j > t$, where $v_1(a + j) + v_2(a + j + 1) = n$ and $v_1 + v_2 < s$. Now, since all group sizes are greater than or equal to $a$,

$$\frac{f(a+t, q)}{a+t} < \frac{f(a+t+1, q)}{a+t+1} \le \frac{f(a+j, q)}{a+j} < \frac{f(a+j+1, q)}{a+j+1}.$$

So

$$\begin{aligned}
u_1 f(a+t, q) + u_2 f(a+t+1, q) &< \left[u_2 + \frac{u_1(a+t)}{a+t+1}\right] f(a+t+1, q) \\
&= \frac{n}{a+t+1}\, f(a+t+1, q) \\
&\le \frac{n}{a+j}\, f(a+j, q) \\
&< v_1 f(a+j, q) + v_2 f(a+j+1, q).
\end{aligned}$$

Therefore, a partition into $s$ groups is a better partition than any partition into fewer groups. Next note that n/(s+ 1)
Example. Take $q = 0.986$, $n = 22$. We find from Table 1 that $a$ equals 9 for this value of $q$ in the B model. Since $[22/9] = 2$, we consider division into 2 or 3 groups. This leads to partitions $\{11, 11\}$ or $\{7, 7, 8\}$. Equation (2) shows

$$2 f_B(7, 0.986) + f_B(8, 0.986) = 5.131 < 5.136 = 2 f_B(11, 0.986).$$

So the best partition is $\{7, 7, 8\}$.

Corollary. Let $a$ be as in the above theorem and assume $n \ge a^2 - a$. With $n = s \cdot a + r$, where $0 < r \le a - 1$, the optimal partition of $n$ is either
(i) $s - r$ groups of size $a$ and $r$ groups of size $a + 1$, or
(ii) $s - a + r + 1$ groups of size $a$ and $a - r$ groups of size $a - 1$.
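As a quick consistency check (added here, not part of the original argument), both candidate partitions account for all $n = s \cdot a + r$ units, and they contain $s$ and $s + 1$ groups, respectively:

```latex
\begin{align*}
(s-r)\,a + r\,(a+1)        &= sa + r = n, & (s-r) + r         &= s,   \\
(s-a+r+1)\,a + (a-r)(a-1)  &= sa + r = n, & (s-a+r+1) + (a-r) &= s+1.
\end{align*}
```

For instance, $n = 46$ and $a = 7$ give $s = 6$ and $r = 4$, so (i) is 2 groups of 7 and 4 groups of 8, while (ii) is 4 groups of 7 and 3 groups of 6.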
Proof. By the remark above it is sufficient to show that (i) $a \le n/s < a + 1$, and (ii) $a - 1 \le n/(s + 1) < a$. Since $n \ge a^2 - a$, it follows that $s \ge a - 1$. Therefore

$$\frac{n}{s} = a + \frac{r}{s} \le a + \frac{r}{a - 1} \le a + 1.$$

The final inequality holds with equality only if $s = a - 1$ and $r = a - 1$, in which case (i) does give the partition into $s$ groups. Also

$$\frac{n}{s+1} = a \cdot \frac{s}{s+1} + \frac{r}{s+1} \ge a \cdot \frac{a-1}{a} + \frac{r}{s+1} > a - 1. \qquad []$$
The above corollary indicates how large $n$ must be before we may restrict attention to group sizes within one unit of the infinite population optimal group size. It is still necessary to compare the expected number of tests for a partition into groups of sizes $a$ and $a + 1$ with the expected number of tests for a partition into groups of sizes $a$ and $a - 1$.

Tables 1 and 2 may be used to determine the optimal partition for the B and M models, respectively, when $n$ is sufficiently large. Suppose the value of $q$ of interest is $q^*$. To use the tables, first find the value of $a$ which is the infinite population optimal group size. If $n$ is finite but exceeds $a^2 - a$, then find the remainder $r$ when $n$ is divided by $a$. If $q^*$ is greater than the entry in the $a$th column and $r$th row, then use groups of sizes $a$ and $a + 1$; otherwise use groups of sizes $a$ and $a - 1$. For instance, if $n = 46$, $q^* = 0.953$ and the M model is being used, then the optimal partition is 2 groups of 7 and 4 groups of 8, while if $n = 47$ the optimal partition is 5 groups of 7 and 2 groups of 6.
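The worked example can also be checked without the tables by evaluating equation (1) for the two candidate partitions of the Corollary. The Python sketch below does this; $a = 7$ is the value implied by the example, and the helper names are illustrative rather than taken from the paper.

```python
def f_M(x, q):
    """Equation (1), M model."""
    return x - q**2 / (1.0 - q) * (1.0 - q ** (x - 1))

def corollary_candidates(n, a):
    """Partitions (i) and (ii) of the Corollary; needs a >= 2 and n >= a*a - a."""
    if a < 2 or n < a * a - a:
        raise ValueError("Corollary needs a >= 2 and n >= a^2 - a")
    s, r = divmod(n, a)
    up = [a] * (s - r) + [a + 1] * r                    # (i): sizes a and a + 1
    down = [a - 1] * (a - r) + [a] * (s - a + r + 1)    # (ii): sizes a - 1 and a
    return up, down

def choose(n, a, q, f):
    """Return whichever candidate has the smaller expected number of tests."""
    return min(corollary_candidates(n, a), key=lambda p: sum(f(b, q) for b in p))

for n in (46, 47):
    print(n, choose(n, 7, 0.953, f_M))
```

The direct comparison selects the same partitions as the table lookup described above: sizes 7 and 8 for $n = 46$, and sizes 6 and 7 for $n = 47$.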
3. Summary

We have presented a simple method of determining an optimal partition of a finite population for doing group testing by using Dorfman-type procedures for both the usual binomial model and the modified binomial model of Pfeifer and Enis (1978). The technique involves calculating the expected number of tests by using equations (1) or (2) for just two partitions which may easily be determined by applying Theorem 3. Tables 1 and 2 can also be used to determine quickly the optimal partition if the population size is moderately large ($n \ge a^2 - a$). Group testing has many applications in blood testing and quality assurance and usually requires a number of tests much smaller than the size of the population.
Table 1
Binomial model: Cutoff points for finite population partitions

r     a=2     a=3     a=4     a=5     a=6     a=7     a=8     a=9    a=10
0  0.6180  0.7974  0.8971  0.9399  0.9609  0.9727  0.9798  0.9845  0.9877
1  0.7071  0.8413  0.9116  0.9455  0.9635  0.9740  0.9805  0.9849  0.9880
2          0.8740  0.9232  0.9503  0.9657  0.9751  0.9812  0.9854  0.9883
3                  0.9325  0.9544  0.9677  0.9762  0.9819  0.9857  0.9885
4                          0.9579  0.9695  0.9772  0.9825  0.9861  0.9888
5                                  0.9712  0.9782  0.9830  0.9865  0.9890
6                                          0.9790  0.9835  0.9868  0.9892
7                                                  0.9840  0.9871  0.9895
8                                                          0.9874  0.9897
9                                                                  0.9899
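The B-model cutoffs can be recomputed from equation (2). The sketch below rests on an assumption that the paper does not state explicitly: each entry is taken to be the value of $q$ at which partitions (i) and (ii) of the Corollary have equal expected numbers of tests. Under that reading the number of groups $s$ cancels, so the difference depends only on $a$, $r$ and $q$. The search strategy (walking down from $q$ near 1, then bisecting) and all names are choices made here.

```python
def f_B(x, q):
    """Equation (2), B model."""
    return 1.0 + x - q ** (x - 1) - (x - 1) * q**x

def gap(a, r, q, f=f_B):
    """Expected tests of Corollary partition (i) minus partition (ii).
    (i) has s - r groups of a and r of a + 1; (ii) has s - a + r + 1 groups
    of a and a - r of a - 1, so the coefficient of f(a, q) is a - 2r - 1."""
    return (a - 2 * r - 1) * f(a, q) + r * f(a + 1, q) - (a - r) * f(a - 1, q)

def cutoff(a, r, f=f_B, step=1e-3, tol=1e-10):
    """Largest q below 1 at which the two candidates tie (assumed meaning of an entry)."""
    hi = 1.0 - step
    lo = hi - step
    while lo > 0 and gap(a, r, lo, f) < 0:   # near q = 1 the gap is negative: (i) is better
        hi, lo = lo, lo - step
    while hi - lo > tol:                     # bisect the bracketing interval [lo, hi]
        mid = 0.5 * (lo + hi)
        if gap(a, r, mid, f) < 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for a, r in [(2, 0), (2, 1), (3, 0), (3, 1)]:
    print(a, r, round(cutoff(a, r), 4))
```

For $(a, r) = (2, 0)$ and $(2, 1)$ the tie points have the closed forms $(\sqrt{5} - 1)/2$ and $1/\sqrt{2}$, which match the first two entries of the $a = 2$ column to four decimals.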
Table 2
Modified model: Cutoff points for finite population partitions

r     a=2     a=3     a=4     a=5     a=6     a=7     a=8     a=9    a=10
0     0.0     0.5  0.7676  0.8689  0.9164  0.9421  0.9577  0.9677  0.9745
1     0.0  0.6180  0.8019  0.8815  0.9219  0.9449  0.9592  0.9686  0.9751
2          0.7071  0.8295  0.8913  0.9269  0.9475  0.9607  0.9695  0.9797
3                  0.8514  0.9015  0.9313  0.9499  0.9620  0.9703  0.9762
4                          0.9095  0.9353  0.9520  0.9633  0.9711  0.9767
5                                  0.9389  0.9541  0.9645  0.9719  0.9772
6                                          0.9559  0.9656  0.9726  0.9777
7                                                  0.9667  0.9733  0.9782
8                                                          0.9739  0.9786
9                                                                  0.9790
Appendix: Proof of Theorem 2
We first state a lemma which will be used in the proof of the theorem. The proof of the lemma is trivial and is therefore omitted.

Lemma 1. Let $u$ and $v$ be the sizes of two groups satisfying

$$\frac{f_B(u, q)}{u} < \frac{f_B(u + v, q)}{u + v} \quad\text{and}\quad \frac{f_B(v, q)}{v} < \frac{f_B(u + v, q)}{u + v}.$$

Then $f_B(u, q) + f_B(v, q) < f_B(u + v, q)$.

Proof of Theorem 2. We have

$$f_B(x, q) = 1 + x - q^{x-1} - (x - 1)q^{x}$$

and

$$f_B''(x, q) = -q^{x-1}(\ln q)^2 - 2q^{x}\ln q - (x - 1)q^{x}(\ln q)^2.$$

Let $x_0$ be the root of $f_B''(x, q) = 0$, that is,

$$x_0 = \frac{q\ln q - 2q - \ln q}{q\ln q}. \tag{A1}$$
Then for $q > 0.422$, $x_0$ is greater than 1, and for 1 l for all $x > x_1$. It follows that if a partition of $n$ has any group size greater than $x_1$, then this partition can be improved by doing unit testing on these groups. So we need only consider partitions with all group sizes less than $x_1$. If $x_0 > x_1$, then we are done. Substituting $x_0$ into $g_B(x, q) \ge 1$ implies that for all $q < 0.74$, $x_0 > x_1$. Thus we may restrict attention below to $q \in [0.74, 1)$. We now want to show that if we have any group size $b$ such that $x_0 \le b < x_1$, then we can improve the partition by dividing $b$ into two smaller groups. If we can find a group of size $a$ so that
$$g_B(a, q) < g_B(b, q) \quad\text{and}\quad g_B(b - a, q) < g_B(b, q),$$
then by Lemma 1, a partition with two groups of sizes $a$ and $b - a$ is an improvement over one group of size $b$. Clearly, if we can find such an $a$, then we can improve all partitions by reducing group sizes until all are less than $x_0$. It is sufficient to show that there exists an $a$ such that both $a$ and $b - a$ are greater than $x_m$, because $g_B(x, q)$ is increasing for $x_m < x < x_1$. Furthermore, since x0 < bXm so that $x_0 - a > x_m$. We will first take care of the case $q \in [0.74, 0.79]$ and then consider $q \in (0.79, 1)$. For $q \in [0.74, 0.79]$ (see Dorfman (1943)),

$$g_B(2, q) \le g_B(x, q), \qquad x = 1, 2, \ldots.$$
For $q$ in this interval, we may take $a = 2$. Since for $q \in [0.74, 0.79]$ the minimum over the set of integers of $g_B(x, q)$ is obtained at $x = 2$, $x_m$ is less than 3. Also $x_0$ is an increasing function of $q$ and at $q = 0.74$, $x_0$ equals 6.29. Therefore, we have that $x_0 - 2 > 3 > x_m$ for all $q \in [0.74, 0.79]$. For $q \in (0.79, 1)$ we will show that $a$ can always be taken to be the integer in the interval $[x_m, x_m + 1)$. Thus it is sufficient to show that

$$x_0 > 2x_m + 1. \tag{A2}$$
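Differentiating $g_B(x, q)$ yields

$$g_B'(x, q) = x^{-2}\left\{\left[(1 - x\ln q)(1 - q + xq) - xq\right]q^{x-1} - 1\right\}. \tag{A3}$$

If $g_B'(\tfrac{1}{2}(x_0 - 1), q) > 0$ for all $q \in (0.79, 1)$, then (A2) holds. Note that

$$\begin{aligned}
x^2 g_B'(x, q) + 1 &= \left[(1 - x\ln q)(1 - q + xq) - xq\right]q^{x-1} \\
&= \left[1 - q - x^2 q\ln q + xq\ln q - x\ln q\right]q^{x-1} \\
&> \left[-x^2 q\ln q + xq\ln q - x\ln q\right]q^{x-1}.
\end{aligned} \tag{A4}$$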
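At $x = \tfrac{1}{2}(x_0 - 1)$ the right-hand side of (A4) becomes

$$\left[-\left(\frac{x_0 - 1}{2}\right)^{2} q\ln q + \left(\frac{x_0 - 1}{2}\right) q\ln q - \left(\frac{x_0 - 1}{2}\right)\ln q\right] q^{(x_0 - 3)/2}. \tag{A5}$$

To show that $g_B'(\tfrac{1}{2}(x_0 - 1), q) > 0$ for all $q \in (0.79, 1)$, it is sufficient to show that the expression in (A5) is greater than one for all $q \in (0.79, 1)$. Rewrite equation (A5) as

$$\begin{aligned}
&\left[\tfrac{1}{2}x_0\left(1 - \tfrac{1}{2}x_0\right)q\ln q - \tfrac{1}{2}x_0\ln q + \left(\tfrac{1}{2}x_0 - \tfrac{3}{4}\right)q\ln q + \tfrac{1}{2}\ln q\right] q^{(x_0 - 3)/2} \\
&\qquad = \left[\tfrac{1}{2}x_0 q\left(\left(1 - \tfrac{1}{2}x_0\right)\ln q - \frac{\ln q}{q}\right) + \left(\tfrac{1}{2}x_0 - \tfrac{3}{4}\right)q\ln q + \tfrac{1}{2}\ln q\right] q^{(x_0 - 3)/2}.
\end{aligned} \tag{A6}$$

Using the value of $x_0$, we may simplify (A6) to

$$\left[\tfrac{1}{2}x_0 q\left(1 + \tfrac{1}{2}\left(1 - \frac{1}{q}\right)\ln q\right) - q - \tfrac{1}{4}q\ln q\right] q^{(x_0 - 3)/2}. \tag{A7}$$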
Clearly the expression in (A7) is greater than

$$\left[\tfrac{1}{2}x_0 - 1\right] q^{x_0/2}. \tag{A8}$$

Note that $\tfrac{1}{2}x_0\ln q = -1 + \tfrac{1}{2}(1 - 1/q)\ln q > -1$, so $q^{x_0/2} > e^{-1}$. Using this, we get that (A8) is greater than

$$\left[\tfrac{1}{2}x_0 - 1\right] e^{-1}. \tag{A9}$$
Since $x_0$ is an increasing function of $q$ for all $q \in (0, 1)$, it suffices to check (A9) for $q = 0.79$. For $q = 0.79$, $x_0 = 8.219$ and $[\tfrac{1}{2}x_0 - 1]e^{-1} = 1.14 > 1$. Therefore, for all $q \in (0, 1)$, we may restrict attention to partitions all of whose group sizes are less than $x_0$, and in this region the convexity of $f_B(x, q)$ implies the result of the theorem. []
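The numerical values quoted in this proof are easy to reproduce. The Python sketch below evaluates (A1) and (A9) and spot-checks the sign of (A3) at $x = \tfrac{1}{2}(x_0 - 1)$ on a grid of $q$ values. It assumes that $g_B(x, q)$ denotes $f_B(x, q)/x$, as the form of (A3) suggests; the function names and the grid are choices made here, and a grid check is an illustration rather than a proof.

```python
import math

def x0(q):
    """Equation (A1): root in x of f_B''(x, q) = 0."""
    lq = math.log(q)
    return (q * lq - 2.0 * q - lq) / (q * lq)

def g_B_prime(x, q):
    """Equation (A3), with g_B(x, q) taken to be f_B(x, q)/x (an assumption)."""
    lq = math.log(q)
    bracket = (1.0 - x * lq) * (1.0 - q + x * q) - x * q
    return (bracket * q ** (x - 1) - 1.0) / x**2

def bound_A9(q):
    """Expression (A9): (x0/2 - 1) / e."""
    return (0.5 * x0(q) - 1.0) / math.e

# Values of x0 at q = 0.74 and q = 0.79, and the bound (A9) at q = 0.79.
print(round(x0(0.74), 2), round(x0(0.79), 3), round(bound_A9(0.79), 2))

# Spot check of g_B'((x0 - 1)/2, q) > 0 on a grid inside (0.79, 1).
grid = [0.80 + 0.01 * i for i in range(20)]          # 0.80, 0.81, ..., 0.99
print(all(g_B_prime(0.5 * (x0(q) - 1.0), q) > 0 for q in grid))
```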
Acknowledgment I would like to thank Professor Thomas S. Ferguson for a critical reading of this paper.
References

Abramowitz, M. and I.A. Stegun (1972). Handbook of Mathematical Functions. Dover, New York, p. 825.
Dorfman, R. (1943). The detection of defective members of large populations. Ann. Math. Statist. 14, 436-440.
Lee, J.-K. and M. Sobel (1972). Dorfman and R1-type procedures for a generalized group-testing problem. Math. Biosci. 15, 317-340.
Pfeifer, C.G. and P. Enis (1978). Dorfman-type group testing for a modified binomial model. J. Amer. Statist. Assoc. 73, 588-592.
Sobel, M. (1960). Group testing to classify efficiently all units in a binomial sample. In: R.E. Machol, Ed., Information and Decision Processes. McGraw-Hill, New York, pp. 127-161.
Sobel, M. (1968). Optimal group testing. In: A. Renyi, Ed., Proceedings of the Colloquium on Information Theory, Vol. II. Janos Bolyai Mathematical Society, Budapest, pp. 411-488.
Sobel, M. and P. Groll (1959). Group testing to eliminate efficiently all defectives in a binomial sample. Bell System Tech. J. 38, 1179-1252.
Sterret, A. (1957). On the detection of defective members of large populations. Ann. Math. Statist. 28, 1033-1036.
Ungar, P. (1960). The cutoff point for group testing. Comm. Pure Appl. Math. 13, 49-54.