Preventive Veterinary Medicine, 15 (1993) 159-167
Elsevier Science Publishers B.V., Amsterdam
Bayesian approach to calculate sample size for the examination of animals for the presence of residues
K. Fuchs (a), J. Gölles (a) and J. Köfer (b)
(a) Institut für Angewandte Statistik und Systemanalyse, Joanneum Research, Steyrergasse 25a, 8010 Graz, Austria
(b) Steirischer Schweinegesundheitsdienst, Zimmerplatzgasse 15, 8010 Graz, Austria
(Accepted 22 July 1992)
ABSTRACT
Fuchs, K., Gölles, J. and Köfer, J., 1993. Bayesian approach to calculate sample size for the examination of animals for the presence of residues. Prev. Vet. Med., 15: 159-167.
The Council of the European Economic Communities has issued a directive (86/469/EEC) for the examination of animals and fresh meat for the presence of residues of several hormones, drugs and environmental pollutants. The purpose of this paper is to compare the method used by the EEC with a Bayesian approach (which is based on prior information) to calculate the necessary sample size, using an example of residues of drugs in pork in the Austrian state of Styria. The Bayesian approach requires smaller sample sizes for all levels of inspection.
INTRODUCTION
The presence of residues (such as hormones, pesticides, heavy metals or drugs) in food of animal origin has long been a subject of public discussion in many countries. Since the early 1980s different monitoring systems have been developed in order to determine the true amount of these residues and to draw conclusions as to the possible contaminants. Once it is decided to analyse the animals for the presence of contaminants, one must examine either the whole lot or a sample of it. The first strategy would destroy the lot. Therefore, the question arises: what should be the size of the sample so that reliable and statistically significant conclusions can be drawn from the analytical results? Answers to that question involve statistical concepts of probability and sampling. Briefly, a sampling plan is a procedure for withdrawing a sample, carrying out analysis of appropriate sample units, and making appropriate decisions based on an agreed criterion.
In this regard, the Council of the European Economic Communities has issued a directive (86/469/EEC) for the examination of animals and fresh meat for the presence of residues of several hormones, drugs or environmental pollutants (such as organochlorine pesticides or heavy metals). The purpose of this paper is to compare the method used by the EEC (in this paper termed the 'classical approach') with the Bayesian approach, where the sample size is calculated using prior information.
Correspondence to: K. Fuchs, Institut für Angewandte Statistik und Systemanalyse, Joanneum Research, Steyrergasse 25a, 8010 Graz, Austria.
© 1993 Elsevier Science Publishers B.V. All rights reserved 0167-5877/93/$06.00
THE CLASSICAL APPROACH
Let us suppose that $n$ different animals are to be examined for the presence of residues, and that for each animal the probability of a positive outcome is $p$. The number of tests which yield a positive result may then be represented by a random variable $X$ that is binomially distributed with parameters $n$ and $p$, i.e. $X \sim B(n,p)$. The probability of getting $c$ positive results out of a total of $n$ is then given by:
$$P(X = c) = \binom{n}{c}\, p^{c} (1-p)^{n-c}. \tag{1}$$
Accordingly, the probability that the number of contaminated animals does not exceed c is given by:
$$P(X \le c) = \sum_{i=0}^{c} \binom{n}{i}\, p^{i} (1-p)^{n-i}. \tag{2}$$
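Equations (1) and (2) can be evaluated directly. The short Python sketch below is an illustration added for the reader, not part of the original analysis; the function names `binom_pmf` and `binom_cdf` and the example values are assumptions, and only the standard library is used.

```python
from math import comb


def binom_pmf(c, n, p):
    """Eqn. (1): probability of exactly c positive results among n animals."""
    return comb(n, c) * p**c * (1.0 - p) ** (n - c)


def binom_cdf(c, n, p):
    """Eqn. (2): probability that at most c of the n animals test positive."""
    return sum(binom_pmf(i, n, p) for i in range(c + 1))


if __name__ == "__main__":
    # Illustrative values only: 300 animals, true proportion of 1%.
    n, p = 300, 0.01
    print(f"P(X = 0)  = {binom_pmf(0, n, p):.4f}")
    print(f"P(X <= 2) = {binom_cdf(2, n, p):.4f}")
```

Read as a function of $p$ for fixed $n$ and $c$, `binom_cdf` is the probability of finding at most $c$ contaminated animals, which is the quantity the classical approach works with.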
If $n$ is large enough (at least $np(1-p) > 9$) (Hartung, 1987, p. 203), one can calculate a confidence interval for $p$ (the proportion of contaminated animals) by approximating the binomial distribution by the normal distribution. If $n$ is too small, it is useful to consider the following. Supposing $X \sim B(n,p)$ and $F$ is a random variable following an F-distribution with $f_1 = 2(c+1)$ and $f_2 = 2(n-c)$ degrees of freedom, it holds that:
$$P(X \le c) = P\left(F \ge \frac{n-c}{c+1} \cdot \frac{p}{1-p}\right) \tag{3}$$
(see Johnson and Kotz, 1970, p. 58f, and Hartung, 1987, p. 204). Setting $F = \frac{n-c}{c+1} \cdot \frac{p}{1-p}$, the $1-\alpha$ confidence interval for $p$ is given by:
$$\underbrace{\frac{(c+1)\,F_{\alpha/2}(f_1,f_2)}{(n-c) + (c+1)\,F_{\alpha/2}(f_1,f_2)}}_{p_l} \;<\; p \;<\; \underbrace{\frac{(c+1)\,F_{1-\alpha/2}(f_1,f_2)}{(n-c) + (c+1)\,F_{1-\alpha/2}(f_1,f_2)}}_{p_u} \tag{4}$$
where $p_l$, $p_u$ denote the lower and the upper limit of the proportion $p$, respectively. $p_l$, $p_u$ are known as Pearson-Clopper values (Kendall and Stuart, 1967, p. 118).
Directive 86/469/EEC of the Council of the European Communities gives the following inspection guidelines for food control.
(a) Normal control: the food control should be carried out in such a way that, with a probability of 95% (i.e. $1-\alpha = 0.95$), one can guarantee that if there is no contaminated animal in the sample (i.e. $c = 0$), the proportion of contaminated animals in the whole population is less than 1% ($p_u < 1\%$).
(b) Reduced control: in a reduced control, the above-mentioned procedure should guarantee that less than 2% of the whole population is contaminated ($p_u < 2\%$).
(c) Tightened control: with a probability of 99.9% ($1-\alpha = 0.999$), tightened control should guarantee that if there are no contaminated animals in the sample, the proportion of contaminated animals in the whole population is less than 1% ($p_u < 1\%$).
For example, if $c = 0$, $1-\alpha = 0.95$ (inspection guideline for normal control) and $n = 365$, then $f_1 = 2$ and $f_2 = 730$. Thus $F_{1-\alpha/2}(f_1,f_2) = F_{0.975}(2,730) \approx 3.708$. The upper limit of the 95% confidence interval for $p$ is $3.708/(365 + 3.708) \approx 0.01006$, which is very close to the 0.01 (1%) specified in the directive. So the required sample size for normal control is 365. Similarly, the required sample size for reduced control is 181 and that for tightened control is 753. The EEC suggests, probably for organisational simplicity, that 300, 150 and 700 animals be sampled under the normal, reduced and tightened control guidelines, respectively.
THE BAYESIAN APPROACH
The main idea behind the Bayesian approach for calculating the sample size is to use prior information (obtained either from previous investigations or from other sources) about the parameter $p$ (i.e. the proportion of contaminated animals). According to Bayes' theorem, the posterior probability density is given by
$$g(p \mid n,c) = \frac{f(c \mid n,p)\, g(p)}{\int_0^1 f(c \mid n,p)\, g(p)\, dp} \tag{5}$$
where the prior information about $p$ is specified in the form of the density function $g(p)$, and $f(c \mid n,p)$ is the likelihood of the binomial model $B(n,p)$ (for details see Press, 1989). Now we wish to explore how $g(p \mid n,c)$ varies for different choices of $g(p)$. Instead of using a bespoke mathematical representation of $g(p)$, which requires both theoretical and practical knowledge, it is convenient to use a family of probability densities which, by varying a small number of parameters of a mathematical function, can be made to generate a range of 'shapes' of prior information that adequately represent many actual forms of information occurring in practice. In the binomial case we require a family of functions defined on the interval $0 < p < 1$ and specified in terms of a few parameters which can be varied to provide a number of flexible forms. Such a family of densities is the beta family, defined (for $a > 0$, $b > 0$) as
$$g(p) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, p^{a-1} (1-p)^{b-1} = \mathrm{Beta}(a,b) \tag{6}$$
where $\Gamma(\cdot)$ is the gamma function (see Hahn and Shapiro, 1967, p. 83). Since $g(p)$ is a probability density, we have $\int_0^1 g(p)\,dp = 1$ and so
$$\int_0^1 p^{a-1} (1-p)^{b-1}\, dp = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)} \tag{7}$$
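As a numerical illustration of eqns. (6) and (7), the following Python sketch evaluates the beta density and checks that it integrates to one. It is an added example, not the authors' code: the names `beta_pdf` and `integrate_midpoint`, the midpoint rule and the parameter pairs are all assumptions chosen for illustration. The density is computed on the log scale with `lgamma`, since $\Gamma(a+b)$ overflows floating point for large $b$.

```python
from math import exp, lgamma, log


def beta_pdf(p, a, b):
    """Eqn. (6): density of Beta(a, b) at 0 < p < 1, computed on the log scale."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(p) + (b - 1.0) * log(1.0 - p))


def integrate_midpoint(f, lo, hi, steps=100_000):
    """Plain midpoint rule; it never evaluates f at the end points 0 and 1."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


if __name__ == "__main__":
    # Illustrative parameter pairs only.
    for a, b in [(1, 50), (1, 100), (2, 200)]:
        area = integrate_midpoint(lambda p: beta_pdf(p, a, b), 0.0, 1.0)
        print(f"Beta({a},{b}): integral of g(p) over (0,1) = {area:.6f}")
```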
Figure 1 shows a number of density functions of the beta distribution for different values of $a$ and $b$. This helps to show how $a$ and $b$ influence the shape of the density. For a particular choice of $a$, $b$ we get
$$g(p \mid n,c) = \frac{p^{a+c-1} (1-p)^{b+n-c-1}}{\int_0^1 p^{a+c-1} (1-p)^{b+n-c-1}\, dp} = \frac{\Gamma(a+b+n)}{\Gamma(a+c)\,\Gamma(b+n-c)}\, p^{a+c-1} (1-p)^{b+n-c-1} \tag{8}$$
using eqn. (7) in order to evaluate the integral. Now we have an interesting result. The posterior density $g(p \mid n,c)$ is also of the beta form, but with parameters $(a+c)$ and $(b+n-c)$ in place of the prior parameters $a$, $b$. This provides a relatively simple rule for updating information in the case of a binomial probability model $B(n,p)$ and prior information represented by a beta probability density $g(p)$.
[Figure 1: beta densities g(p) plotted against the proportion of contaminated animals (0-20%); labelled curves include Beta(1,50), Beta(1,100), Beta(2,200) and Beta(5,200).]
Fig. 1. Density function of the beta distribution for different parameters a,b.
Suppose we know $g(p) = \mathrm{Beta}(a,b)$ and $F$ is an F-distributed random variable with $f_1 = 2(a+c)$ and $f_2 = 2(b+n-c)$ degrees of freedom (note: $a+c$ and $b+n-c$ are the parameters of the posterior beta probability density). Similarly to eqn. (3), we obtain $F = \frac{b+n-c}{a+c} \cdot \frac{p}{1-p}$, and so, similarly to eqn. (4), a $1-\alpha$ confidence interval for $p$ is given by (see Stange, 1977, p. 177):
$$\underbrace{\frac{(a+c)\,F_{\alpha/2}(f_1,f_2)}{(b+n-c) + (a+c)\,F_{\alpha/2}(f_1,f_2)}}_{p_l} \;<\; p \;<\; \underbrace{\frac{(a+c)\,F_{1-\alpha/2}(f_1,f_2)}{(b+n-c) + (a+c)\,F_{1-\alpha/2}(f_1,f_2)}}_{p_u} \tag{9}$$
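The upper limits in eqns. (4) and (9) can be evaluated without F tables. The Python sketch below is an added illustration under stated assumptions, not the authors' procedure: it replaces the F quantile by a bisection on the binomial sum of eqn. (2), using the fact that $p_u$ in eqn. (4) is the proportion at which $P(X \le c)$ falls to $\alpha/2$. For the Bayesian limit it assumes integer prior parameters $a$ and $b$, in which case the posterior quantile coincides with the classical limit for $a+c-1$ positives among $a+b+n-1$ animals. The function names are hypothetical; the example values are taken from the worked examples in the text.

```python
from math import comb


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p), as in eqn. (2)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(c + 1))


def classical_upper_limit(n, c, alpha):
    """Upper limit p_u of eqn. (4), found by bisection instead of F tables."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binom_cdf(c, n, mid) > alpha / 2.0:
            lo = mid  # c or fewer positives still too likely: p_u lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bayes_upper_limit(n, c, alpha, a, b):
    """Upper limit p_u of eqn. (9) for a Beta(a, b) prior with INTEGER a, b.

    Assumption: for integer parameters, the 1 - alpha/2 quantile of
    Beta(a + c, b + n - c) equals the classical limit for a + c - 1
    positives among a + b + n - 1 animals.
    """
    return classical_upper_limit(a + b + n - 1, a + c - 1, alpha)


if __name__ == "__main__":
    alpha = 0.05
    p_classical = classical_upper_limit(365, 0, alpha)
    p_bayes = bayes_upper_limit(265, 0, alpha, a=1, b=100)
    print(f"classical,   n = 365, c = 0: p_u = {p_classical:.5f}")
    print(f"Beta(1,100), n = 265, c = 0: p_u = {p_bayes:.5f}")
```

With $a = 1$ the Bayesian call reduces to the classical one with $n + b$ animals, so the two lines print the same limit.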
To apply eqn. (9) it is necessary to specify the parameters $a$ and $b$ of the beta distribution in such a way that the density curve reflects the distribution of the parameter $p$ under study. An extensive discussion of the various ways to solve this problem can be found in Hampton et al. (1973). One can apply, for example, the technique of judgemental curve fitting, where the prior information is modelled with the help of characteristic points (e.g. mean, median, variance). The mean $E(p)$ and the variance $V(p)$ of the beta distribution are given by
$$E(p) = \frac{a}{a+b} \quad \text{and} \quad V(p) = \frac{ab}{(a+b)^2 (a+b+1)} \tag{10}$$
which characterise the density function of this distribution completely (Press, 1989, p. 41). Table 1 shows, for different values of $E(p)$ and $S(p)$ (where $S(p) = \sqrt{V(p)}$), the corresponding parameters $a$ and $b$.
As mentioned above, our main goal is to calculate the sample size $n$ needed to draw reliable and statistically significant conclusions from the analytical results. After determining the values of $\alpha$ and $p_u$, we can apply eqn. (9) to calculate the necessary sample size iteratively. Comparing eqns. (4) and (9), we get in particular $n_{\mathrm{classical}} = n_{\mathrm{Bayes}} + b$ for $a = 1$; this means that with a Beta(1,b) prior probability density, one can save $b$ units using the Bayesian approach in comparison with the classical one.
Further, we have to guarantee that the true proportion $p$ of contaminated animals lies below an upper threshold $p_u$. We take a sample of $n$ animals and find $c$ contaminated ones. Now we can decide with the help of an operating characteristic (OC) curve whether or not $p$ lies below $p_u$ (Schilling, 1982). Figure 2 shows the OC curves for the following situation: number of animals examined $n = 50$, number of contaminated animals $c = 0$, prior density function $g(p) = \mathrm{Beta}(1,50)$.
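The two computations described above, the moments of eqn. (10) and the iterative search for $n$, can be sketched in Python as follows. This is an added illustration, not the authors' program. It assumes the special case $c = 0$ and $a = 1$, for which eqns. (4) and (9) reduce to the closed form $p_u = 1 - (\alpha/2)^{1/(n+b)}$; the function names are hypothetical. Because the search applies the threshold strictly, it returns a slightly larger $n$ than the figures quoted in the worked example of the classical approach, which accept an upper limit very close to the threshold.

```python
from math import sqrt


def beta_mean_sd(a, b):
    """Eqn. (10): mean E(p) and standard deviation S(p) of Beta(a, b)."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(variance)


def upper_limit_c0(n, alpha, b=0):
    """Upper limit p_u for c = 0 with a Beta(1, b) prior.

    Assumption: closed form of eqn. (9) for a = 1, c = 0; setting b = 0
    reproduces the classical limit of eqn. (4) for c = 0.
    """
    return 1.0 - (alpha / 2.0) ** (1.0 / (n + b))


def smallest_n(p_max, alpha, b=0):
    """Increase n step by step until the upper limit no longer exceeds p_max."""
    n = 1
    while upper_limit_c0(n, alpha, b) > p_max:
        n += 1
    return n


if __name__ == "__main__":
    # Model 8 of Table 1: a = 1, b = 100.
    mean, sd = beta_mean_sd(1, 100)
    print(f"Beta(1,100): E(p) = {mean:.2%}, S(p) = {sd:.2%}")

    n_classical = smallest_n(0.01, 0.05)
    n_bayes = smallest_n(0.01, 0.05, b=100)
    # The difference equals b = 100, i.e. n_classical = n_Bayes + b.
    print(n_classical, n_bayes, n_classical - n_bayes)
```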
Looking at the OC curves, we see that if the true proportion $p$ is 2% we wrongly accept the hypothesis that $p$ is less than 2% (i.e. $p_u$) in only five cases out of 100 (i.e. the β-risk equals 5%). In contrast, if we use the same β-risk in the classical case, the hypothesis states that $p$ is less than 6.5%, which would not satisfy the requirements of the EEC. Alternatively, if the true proportion is 2% we would wrongly accept the hypothesis that $p$ is less than 2% in 39 cases out of 100 (i.e. the β-risk equals 39%) in the classical case. In order to have the same precision as the Bayesian approach, the classical method would require $n = 150$ and $c = 0$. This shows that, for the same sample size, the precision in estimating $p$ by the Bayesian method is higher than that by the classical method.

TABLE 1
Different values for E(p) and S(p) and the corresponding parameters a and b

Model   E(p)   S(p)     a     b
1       2.0%   2.0%     1    49
2       2.0%   1.5%     2    98
3       2.0%   1.0%     4   196
4       2.0%   0.5%    15   745
5       1.5%   1.5%     1    66
6       1.5%   1.0%     2   131
7       1.5%   0.5%     9   591
8       1.0%   1.0%     1   100
9       1.0%   0.5%     4   396
10      0.5%   0.5%     1   200

[Figure 2: OC curves plotted against the proportion of contaminated animals (%), 0-12%.]
Fig. 2. OC curves using the classical and the Bayesian approach for n = 50 and c = 0.

EXAMINATION OF PIGS FOR THE PRESENCE OF DRUGS IN STYRIA (AUSTRIA)
In Styria (one of the nine states of Austria) more than 1.2 million pigs are slaughtered per year. The first step toward the installation of the Styrian monitoring system was a survey conducted in 1985 to determine the amount of heavy metals in pigs in Styria (see Köfer et al., 1985). The results of this survey led to the current monitoring system established in 1988 (Gölles et al., 1989), which was also based on Aigner (1987) and on Köfer and Aigner (1988) and was modified later by Fuchs et al. (1990). The results from these studies showed that $p$ (the proportion of animals contaminated with drug residues) followed a beta distribution with shape parameter $a = 1$ (see left side of Fig. 1). Moreover, the results showed that $p$ is expected to lie between 0.5% and 1.5% (i.e. $E(p)$ between 0.5% and 1.5%). Equation (10) and Table 1, as well as a comparison of Fig. 1 with the empirical data of residues in pigs in Styria, imply a value for $b$ of 100. That means that a Beta(1,100) prior density function for $p$ is the best one on which to model the proportion of contaminated pigs in Styria.
After determining the values $\alpha = 5\%$ and $p_u = 1\%$ (inspection guidelines of the directive), we can apply eqn. (9) to calculate the necessary sample size iteratively. For example, if $n = 265$ and $c = 0$, then $f_1 = 2$ and $f_2 = 730$ (for $a = 1$ and $b = 100$). Thus $F_{1-\alpha/2}(f_1,f_2) = F_{0.975}(2,730) \approx 3.708$. The upper limit of the 95% confidence interval for $p$ is $3.708/(365 + 3.708) \approx 0.01006$, which is very close to the 0.01 (1%) (i.e. $p_u$) specified in the directive. So the required sample size for normal control is 265. Similarly, the required sample size for reduced control is 81 and for tightened control is 653. Figure 3 shows the OC curves for these different inspection situations.
[Figure 3: OC curves plotted against the proportion of contaminated animals (%), 0-4%.]
Fig. 3. OC curves for the three inspection situations.
In the case of normal control the corresponding OC curve shows, for example, that if the true proportion $p$ is 1% we wrongly accept the hypothesis that $p$ is less than 1% (i.e. $p_u$) in only five cases out of 100 (i.e. the β-risk equals 5%). In the case of reduced control the corresponding OC curve shows, for example, that the β-risk of accepting $p$ less than 2% equals 5% if the true proportion of contaminated animals is 2%. In the case of tightened control the corresponding OC curve shows, for example, that the β-risk of accepting $p$ less than 1% equals 0.1% if the true proportion of contaminated animals is 1%.
DISCUSSION
This paper shows that the Bayesian approach, by using prior information, can considerably reduce the sample size for the examination of animals for the presence of residues. This consequently leads to a reduction of costs without any deterioration in the accuracy of the estimation. With a Beta(1,100) density as the best model of the prior information on residues in pigs in Styria, this holds for all levels of inspection. It is important to note that this reduction in the sample size applies to only one residue type. As the EC directive covers a number of hormones and drug residues as well as different animal types (cows, calves and pigs), the reduction in sample size would apply to each residue. In the framework of the Austrian investigation programme for residues, a sampling plan based on the Bayesian principle reduced the total sample size by about 2000 units, a saving of about $125 000.
REFERENCES
Aigner, C., 1987. Stichprobenmodelle zur Feststellung von Arzneimittelrückständen in Schweine- und Rindfleisch. Diplomarbeit am Institut für Statistik der Technischen Universität Graz, pp. 27-43.
Council of the European Economic Communities, 1986. Directive 86/469/EEC for the examination of animals and fresh meat for the presence of residues, pp. 1-10.
Fuchs, K., Gölles, J. and Köfer, J., 1990. Neue Ansätze zur Objektivierung der Richtlinie 86/469/EEC. Proceeding of 31. Arbeitstagung des Arbeitsgebietes Lebensmittelhygiene der DVG, pp. 91-92.
Gölles, J., Fuchs, K. and Köfer, J., 1989. Monitoringsysteme zur Überwachung von Nahrungsmitteln tierischer Herkunft in Österreich. Beiträge Lebensmittelangelegenheiten-Veterinärwesen-Strahlenschutz des Bundeskanzleramtes 4/89, pp. 7-113.
Hahn, G.J. and Shapiro, S.S., 1967. Statistical Models in Engineering. John Wiley, London, p. 83.
Hampton, J.M., Moore, P.G. and Thomas, H., 1973. Subjective probability and its measurement. J. R. Stat. Soc., A 136: 21-35.
Hartung, J., 1987. Statistik. R. Oldenbourg Verlag, München Wien. 6. Auflage, pp. 203-204.
Johnson, N.L. and Kotz, S., 1970. Discrete Distributions. Houghton Mifflin Company, New York, pp. 58-59.
Kendall, M.G. and Stuart, A., 1967. The Advanced Theory of Statistics. Vol. 2. Charles Griffin, London, p. 118.
Köfer, J. and Aigner, C., 1988. Stichprobenmodelle für ein Monitoring der Schadstoffbelastung in Nahrungsmitteln tierischer Herkunft. Tagungsband des Symposiums für Umweltstatistik in Graz, pp. 101-113.
Köfer, J., Lichtenegger, F., Schindler, E. and Gölles, J., 1985. Untersuchungen über den Blei-, Cadmium- und Quecksilbergehalt im Schweinefleisch in der Steiermark. Wien. Tieraerztl. Monatschr., 73: 197-202.
Press, S.J., 1989. Bayesian Statistics: Principles, Models and Applications. John Wiley, New York, p. 41.
Schilling, E.G., 1982. Acceptance Sampling in Quality Control. Marcel Dekker, New York, pp. 26-30.
Stange, K., 1977. Bayes-Verfahren. Springer Verlag, Berlin, p. 177.