Nuclear Instruments and Methods in Physics Research A 462 (2001) 561–567
The signal estimator limit setting method
Shan Jin¹, Peter McNamara*
University of Wisconsin-Madison, USA
Received 3 July 2000; accepted 22 September 2000
Abstract

A new technique is presented for determining confidence limits in searches with low statistics. Recognizing that the physicist wishes to set limits on the signal hypothesis, the technique defaults to a confidence limit on signal only when zero events are observed. In order to accomplish this, the method overcovers, but much less so than methods using conditioning techniques. As a result, this technique gives more powerful frequentist confidence limits than conditioning techniques while still avoiding some unpleasant features of classical confidence limits. © 2001 Elsevier Science B.V. All rights reserved.

PACS: 02.50.Cw; 13.85.Rm
Keywords: Limit setting; Small signals
*Corresponding author. Tel.: +44-22-767-7364; fax: +4422-782-8370. E-mail addresses: [email protected] (S. Jin), [email protected] (P. McNamara).
¹ Present address: CERN/EP Division, 1211 Geneva 23, Switzerland.
0168-9002/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0168-9002(00)01312-7

1. Introduction

After an experimenter has performed a search, the experimenter would like to make a statement about the result of the search. The traditional frequentist method is to report a classical confidence interval. Although fine on average, classical confidence levels return unphysical results in some well-known circumstances. Feldman and Cousins [1] address this problem and potential problems which can arise in the confidence-level construction if the experimenter chooses, based on the data, whether to report a one- or two-sided interval. Although the method of Feldman and Cousins removes the problem of unphysical results from the classical confidence interval treatment, the method yields results which are still troublesome when zero candidates are observed. In this case, it is possible to take advantage of the knowledge that zero signal events were produced to set a limit on the signal which is independent of the background expectation. Dissatisfaction with the results from the Feldman and Cousins method, especially when zero events are observed, has led to several new unified techniques which make observations of zero events less dependent on the background hypothesis [2].

This paper describes an ad hoc method of determining frequentist upper bounds on a small signal hypothesis in the presence of background.
These upper bounds have the desirable property that when zero counts are observed, the bound on the signal hypothesis is independent of the background, and is given by $\alpha = \exp(-\mu)$, where $1 - \alpha$ is the confidence level and $\mu$ is the expected number of signal events. In order to obtain this property, the method overcovers more than is required by the discreteness of the possible observations, but its overcoverage is much less than that coming from conditioning techniques, and therefore it produces more powerful limits.

This method does not attempt to produce a unified treatment of confidence bounds and confidence intervals. In searches for new particles, a standard "threshold" for discovery, equivalent to a five standard deviation fluctuation of the background, has been established by the community [3], while lower confidences of 90–95% are more standard in reporting negative results. In this regard, the unified prescriptions are too strict, mandating the exclusion of the no-signal hypothesis at the lower confidence level, which has no significance to the community. From this perspective, the unified approach is not attractive.

The danger of undercoverage resulting from the flip-flopping procedure described by Feldman and Cousins [1] results from a decision-making process on the part of the experimenter. It can be addressed by simply resolving, before the experiment, to quote an upper limit on the signal hypothesis regardless of the outcome of the experiment, rather than flip-flopping. This is not meant to prevent an experimenter from also making other discovery-related statements, but neglecting to produce the upper bound as a result of the data observation is exactly the problem Feldman and Cousins address. The flip-flopping problem can therefore be neglected, and it is assumed that experimenters will not waver in their resolve to quote an upper limit along with the rest of their results.
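As a worked instance of the zero-count bound quoted at the start of this passage, the relation between $\alpha$ and $\mu$ can be inverted to give the upper limit directly. The symbol $\mu_{\mathrm{up}}$ is introduced here only for illustration, and the numbers are plain arithmetic rather than values read from the paper's figures:

$$
\mu_{\mathrm{up}} = -\ln\alpha, \qquad
\alpha = 0.05 \;\Rightarrow\; \mu_{\mathrm{up}} = -\ln 0.05 \approx 3.00, \qquad
\alpha = 0.10 \;\Rightarrow\; \mu_{\mathrm{up}} = -\ln 0.10 \approx 2.30 .
$$

At 95% confidence level, an observation of zero events therefore excludes signal expectations above about three events, whatever the background expectation.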
2. Construction of the signal estimator

The signal estimator confidence level is constructed by determining the classical confidence level and extending it with an additional term which gives the signal estimator the properties described in the previous section. The justification for this additional term can be understood best if one divides an observation into the components originating from signal and those originating from background. Given an observation, an experimenter would like to be able to know which events originated from signal processes, and which originated from background processes. Knowing which events come from signal processes, the experimenter could then discard the background events. An upper bound on the signal hypothesis could then be constructed as

$$
\mathrm{CL} = 1 - \alpha = 1 - P_\mu\bigl(n \le n_{\mathrm{obs}} - (n_b)_{\mathrm{obs}}\bigr), \tag{1}
$$
where $\alpha$ is the confidence coefficient, $n_{\mathrm{obs}}$ is the number of events observed, $(n_b)_{\mathrm{obs}}$ is the number of background events observed, and $P_\mu\bigl(n \le n_{\mathrm{obs}} - (n_b)_{\mathrm{obs}}\bigr)$ is the probability of observing no more than the observed number of signal events, given a signal expectation $\mu$.

In reality, one does not know which events in an observation come from the signal and which come from the background. One must therefore construct a confidence level which incorporates one's limited knowledge of the background. Given a known background expectation $b$, one can compute the probability of observing a specific number of background events and make use of the relation

$$
n_{\mathrm{obs}} = (n_s)_{\mathrm{obs}} + (n_b)_{\mathrm{obs}} \tag{2}
$$
to combine signal confidence levels. In this construction, the classical confidence level is the sum over background hypotheses less than or equal to the observation

$$
1 - \mathrm{CL}_{\mathrm{classical}}
= \sum_{i_b=0}^{n_{\mathrm{obs}}} P_b(n = i_b)\, P_\mu(n \le n_{\mathrm{obs}} - i_b)
= \sum_{i=0}^{n_{\mathrm{obs}}} P_{\mu+b}(n = i), \tag{3}
$$
while the conditional confidence level [4] is the average over background hypotheses less than or equal to the observation

$$
1 - \mathrm{CL}_{\mathrm{conditional}}
= \frac{\sum_{i_b=0}^{n_{\mathrm{obs}}} P_b(n = i_b)\, P_\mu(n \le n_{\mathrm{obs}} - i_b)}
       {\sum_{i_b=0}^{n_{\mathrm{obs}}} P_b(n = i_b)}. \tag{4}
$$

The signal-estimator method extends this weighted sum over all background event hypotheses, independent of the observation. In performing this sum, the method considers $(n_s)_{\mathrm{obs}}$ to be the physical quantity of interest, bounding it so that background hypotheses greater than the observation contribute meaningfully to the sum. Using the formula

$$
(n_s)_{\mathrm{obs}} =
\begin{cases}
n_{\mathrm{obs}} - (n_b)_{\mathrm{obs}} & \text{if } n_{\mathrm{obs}} - (n_b)_{\mathrm{obs}} \ge 0, \\
0 & \text{if } n_{\mathrm{obs}} - (n_b)_{\mathrm{obs}} < 0,
\end{cases} \tag{5}
$$

the signal-estimator confidence level would be computed as

$$
\begin{aligned}
1 - \mathrm{CL}_{\mathrm{SE}}
&= \sum_{i_b=0}^{n_{\mathrm{obs}}} P_b(n = i_b)\, P_\mu(n \le n_{\mathrm{obs}} - i_b)
 + \sum_{i_b=n_{\mathrm{obs}}+1}^{\infty} P_b(n = i_b)\, P_\mu(n = 0) \\
&= \sum_{i=0}^{n_{\mathrm{obs}}} P_{\mu+b}(n = i)
 + \Bigl(1 - \sum_{i_b=0}^{n_{\mathrm{obs}}} P_b(n = i_b)\Bigr)\exp(-\mu).
\end{aligned} \tag{6}
$$

This construction, it will be shown, does not undercover and has the desirable property that the confidence level is independent of the background for an observation of zero events. Further, it overcovers less than the conditional confidence level and therefore allows more powerful exclusions.

3. Properties of the signal estimator

It is trivial to determine the signal-estimator confidence in the case of an observation of zero events. When zero events are observed ($n_{\mathrm{obs}} = 0$), the confidence level is

$$
1 - \mathrm{CL}_{\mathrm{SE}}
= \sum_{i_b=0}^{n_{\mathrm{obs}}} P_b(n = i_b)\, P_\mu(n \le 0)
 + \sum_{i_b=n_{\mathrm{obs}}+1}^{\infty} P_b(n = i_b)\, P_\mu(n \le 0)
= P_\mu(n \le 0) = e^{-\mu}. \tag{7}
$$
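The three counting-experiment constructions in Eqs. (3), (4) and (6) can be sketched numerically as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, only the Python standard library is used, and the infinite sum in the first line of Eq. (6) is truncated after `n_tail` terms, which is an approximation introduced here.

```python
import math


def poisson_pmf(k, lam):
    """P_lam(n = k) for a Poisson distribution with mean lam."""
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_cdf(k, lam):
    """P_lam(n <= k); an empty sum (k < 0) gives 0."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def one_minus_cl_classical(n_obs, mu, b):
    """Eq. (3): sum_{i=0}^{n_obs} P_{mu+b}(n = i)."""
    return poisson_cdf(n_obs, mu + b)


def one_minus_cl_conditional(n_obs, mu, b):
    """Eq. (4): Eq. (3) divided by P_b(n <= n_obs)."""
    return poisson_cdf(n_obs, mu + b) / poisson_cdf(n_obs, b)


def one_minus_cl_se(n_obs, mu, b):
    """Eq. (6), second line: Eq. (3) plus (1 - P_b(n <= n_obs)) * exp(-mu)."""
    return poisson_cdf(n_obs, mu + b) + (1.0 - poisson_cdf(n_obs, b)) * math.exp(-mu)


def one_minus_cl_se_sum(n_obs, mu, b, n_tail=200):
    """Eq. (6), first line: explicit sums over the background count i_b.

    The infinite tail is truncated after n_tail terms (an approximation).
    """
    head = sum(poisson_pmf(ib, b) * poisson_cdf(n_obs - ib, mu)
               for ib in range(n_obs + 1))
    tail = sum(poisson_pmf(ib, b)
               for ib in range(n_obs + 1, n_obs + 1 + n_tail))
    return head + tail * poisson_pmf(0, mu)


if __name__ == "__main__":
    mu, b = 2.0, 3.0  # illustrative signal and background expectations
    print("n_obs  classical  signal-estimator  conditional")
    for n_obs in range(5):
        print(n_obs,
              round(one_minus_cl_classical(n_obs, mu, b), 4),
              round(one_minus_cl_se(n_obs, mu, b), 4),
              round(one_minus_cl_conditional(n_obs, mu, b), 4))
```

For `n_obs = 0` both `one_minus_cl_se` and `one_minus_cl_conditional` reduce to `exp(-mu)` for any `b`, while the classical value still depends on `b`.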
The zero-event result of Eq. (7) is independent of the number of background events expected, and is equivalent to the confidence level obtained by a Bayesian when zero events are observed.

From a frequentist standpoint, coverage is the most important property of a technique for determining confidence levels. The classical confidence level is designed to overcover only by the amount mandated by the discrete nature of Poisson statistics. By imposing the condition that the confidence level is $1 - e^{-\mu}$ when zero events are observed, the signal-estimator method will cover differently from the classical confidence level. Therefore, it is important to show that the method does not undercover.

To demonstrate that the method covers, it is necessary to show that, given a confidence level $1 - \alpha$, the fraction of experiments in which $\mu$ is excluded is not greater than $\alpha$ for any signal hypothesis $\mu$. This can be written as

$$
F_{\mu+b}\bigl(1 - \mathrm{CL}_{\mathrm{SE}} \le \alpha\bigr) \le \alpha. \tag{8}
$$

Expanding the requirement on $1 - \mathrm{CL}_{\mathrm{SE}}$ gives

$$
\sum_{i=0}^{n_{\mathrm{obs}}} P_{\mu+b}(n = i)
+ \sum_{i_b=n_{\mathrm{obs}}+1}^{\infty} P_b(n = i_b)\, P_\mu(n \le 0) \le \alpha, \tag{9}
$$

which is satisfied by some set of values of $n_{\mathrm{obs}}$. Clearly, $F_{\mu+b}(1 - \mathrm{CL}_{\mathrm{SE}} \le \alpha)$ is zero if the set is empty, and Eq. (8) is satisfied. If the set is non-empty, it is of the form $[0, n_{\max}(\mu, \alpha)]$, since the left-hand side of Eq. (9) does not decrease as $n_{\mathrm{obs}}$ increases (each unit increase adds $P_{\mu+b}(n = n_{\mathrm{obs}}+1) - P_b(n = n_{\mathrm{obs}}+1)\,e^{-\mu} \ge 0$), and

$$
F_{\mu+b}\bigl(1 - \mathrm{CL}_{\mathrm{SE}} \le \alpha\bigr) = \sum_{i=0}^{n_{\max}(\mu,\alpha)} P_{\mu+b}(n = i), \tag{10}
$$

which, because $n_{\max}(\mu, \alpha)$ satisfies Eq. (9), implies

$$
F_{\mu+b}\bigl(1 - \mathrm{CL}_{\mathrm{SE}} \le \alpha\bigr)
\le \sum_{i=0}^{n_{\max}(\mu,\alpha)} P_{\mu+b}(n = i)
+ \sum_{i=n_{\max}(\mu,\alpha)+1}^{\infty} P_b(n = i)\, P_\mu(n \le 0) \le \alpha, \tag{11}
$$
proving that this method does not undercover.

Each of these three methods of confidence-level calculation overcovers to some extent. The classical confidence level, which overcovers only to the extent required by the discreteness of the available observations, has the unattractive property that it can exclude signal with expectation $\mu = 0$. The conditional and signal-estimator confidence levels overcover to a greater extent. A frequentist would like to choose the technique which gives the least overcoverage, and is, therefore, the most powerful.

Fig. 1 shows the 95% confidence-level upper bounds on the mean signal expectation for the three methods with three expected background events. When zero events are observed, the signal-estimator and conditional upper bounds are the same. For observations of one or more events, however, the signal-estimator method gives stricter upper bounds on the signal than the conditional method, indicating that it is a more powerful method of setting limits.

Fig. 1. A comparison of three 95% confidence-level upper limits for a background expectation of three events. The solid line is the signal-estimator upper limit, the dashed line is the conditional upper limit, and the dotted line is the classical upper limit. For an observation of zero events, the classical method would exclude all signal hypotheses, while the signal-estimator and conditional methods would exclude signal hypotheses greater than three events. For observations of one or more events, the signal-estimator method always outperforms the conditional method.

Fig. 2 shows the 95% confidence-level upper bounds resulting from the signal-estimator method when the background expectation is varied. This figure illustrates some interesting features of the signal-estimator method. First, one can see clearly that when zero events are observed, the upper limit on the signal rate is independent of the number of expected background events. For larger numbers of events observed, the upper limit is very similar to the upper limit obtained by the classical method, except when a significant deficit is observed. In the case of a significant deficit, the signal-estimator upper limit again converges to the limit obtained from zero observed candidates. This is less conservative than the result obtained by conditioning methods, which deviate more from the classical confidence level, and which do not converge as quickly to the zero-event limit in the presence of a deficit.

Fig. 2. Upper bound of a 95% CL confidence limit for an unknown Poisson signal mean $\mu$ in the presence of expected Poisson background with known mean $b$.

Although it is difficult to prove analytically that the signal-estimator method overcovers less than conditional methods, it can be shown for several cases of interest. Fig. 3 shows the overcoverage for a 95% confidence-level limit for the three upper limit methods discussed above for three different Poisson backgrounds. One first sees that the overcoverage imposed by the discrete nature of the inputs is large, and that the classical method overcovers significantly for most signals. When no background is expected, the three methods are equivalent, and overcover in the entire range from 0 to 5%. When three background events are expected, the conditioning and signal-estimator methods always overcover more than the classical method, especially for signal expectations below 3, where the conditioning and signal-estimator methods overcover by 5%. This trend continues when a large number of background events is expected. The other trend to take note of is that the overcoverage of the conditioning method is always greater than the overcoverage of the signal-estimator method.

Fig. 3. A comparison of overcoverage for various 95% CL upper limits vs the Poisson signal mean $\mu$. The solid line is the overcoverage of the signal-estimator method, the dashed line is the overcoverage of the conditional method, and the dotted line is the overcoverage of the classical method. (a) $b = 0$, (b) $b = 3$ and (c) $b = 10$.

Fig. 4. A comparison of mean 95% CL upper limits vs the background expectation $b$ when no signal is expected. The solid line is the mean upper limit on the signal rate $\mu$ for the signal-estimator method, the dashed line is the mean upper limit on $\mu$ for the conditional method, and the dotted line is the mean upper limit on $\mu$ for the classical method.
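The coverage statement of Eq. (8) and the overcoverage comparison of Fig. 3 can be checked numerically for chosen values of $\mu$ and $b$. The sketch below is an illustration under stated assumptions, not the procedure used to produce the figure: the function names are hypothetical, the sum over $n_{\mathrm{obs}}$ is truncated at `n_max`, and overcoverage is taken to mean $\alpha$ minus the fraction of experiments that exclude the true $\mu$.

```python
import math


def poisson_pmf(k, lam):
    """P_lam(n = k) for a Poisson distribution with mean lam."""
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_cdf(k, lam):
    """P_lam(n <= k)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def one_minus_cl(method, n_obs, mu, b):
    """1 - CL for the classical, conditional and signal-estimator methods."""
    p_sb = poisson_cdf(n_obs, mu + b)   # Eq. (3)
    p_b = poisson_cdf(n_obs, b)
    if method == "classical":
        return p_sb
    if method == "conditional":
        return p_sb / p_b               # Eq. (4)
    if method == "signal_estimator":
        return p_sb + (1.0 - p_b) * math.exp(-mu)  # Eq. (6)
    raise ValueError("unknown method: " + method)


def exclusion_fraction(method, mu, b, alpha=0.05, n_max=100):
    """F_{mu+b}(1 - CL <= alpha): probability, when the true mean is mu + b,
    of observing an n_obs for which the hypothesis mu is excluded."""
    return sum(poisson_pmf(n, mu + b)
               for n in range(n_max + 1)
               if one_minus_cl(method, n, mu, b) <= alpha)


def overcoverage(method, mu, b, alpha=0.05):
    """alpha minus the exclusion fraction; zero would mean exact coverage."""
    return alpha - exclusion_fraction(method, mu, b, alpha)


if __name__ == "__main__":
    for mu in (1.0, 3.0, 5.0, 8.0):
        print(mu, [round(overcoverage(m, mu, 3.0), 4)
                   for m in ("classical", "signal_estimator", "conditional")])
```

With `b = 3` and `mu = 2`, for example, no observation excludes the signal with the signal-estimator or conditional methods, so both overcover by the full 5%, in line with the discussion of Fig. 3.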
Another way of comparing the methods is to quote the mean excluded signal rate as a function of the background level. This gives a measure of the expected "power" of the methods. All else being equal, the method with the greatest "power" would be the best choice. Fig. 4 shows the average signals excluded in experiments with background expectation $b$. For $b = 0$, one always observes zero events, and therefore all methods give a 95% CL upper limit at $\mu = 3$. With larger amounts of background expected, the mean signal rate excluded rises for all methods. The rate at which the signal-estimator mean limit rises suggests that it is more powerful than the conditional method, but less powerful than the classical confidence level.
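The upper limits of Fig. 1 and the mean limits of Fig. 4 can be approximated with a simple bisection, since each $1 - \mathrm{CL}$ decreases as $\mu$ increases. This is a sketch under stated assumptions: the function names, the bisection bracket `mu_hi`, the tolerance and the truncation `n_max` are choices made here for illustration, and the classical limit is reported as 0 whenever $\mu = 0$ is already excluded.

```python
import math


def poisson_pmf(k, lam):
    """P_lam(n = k) for a Poisson distribution with mean lam."""
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_cdf(k, lam):
    """P_lam(n <= k)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def one_minus_cl(method, n_obs, mu, b):
    """1 - CL for the classical, conditional and signal-estimator methods."""
    p_sb = poisson_cdf(n_obs, mu + b)   # Eq. (3)
    p_b = poisson_cdf(n_obs, b)
    if method == "classical":
        return p_sb
    if method == "conditional":
        return p_sb / p_b               # Eq. (4)
    if method == "signal_estimator":
        return p_sb + (1.0 - p_b) * math.exp(-mu)  # Eq. (6)
    raise ValueError("unknown method: " + method)


def upper_limit(method, n_obs, b, alpha=0.05, mu_hi=100.0, tol=1e-9):
    """Smallest mu with 1 - CL <= alpha, found by bisection on [0, mu_hi].

    Returns 0 if mu = 0 is already excluded (possible only for the
    classical method when a deficit is observed).
    """
    if one_minus_cl(method, n_obs, 0.0, b) <= alpha:
        return 0.0
    lo, hi = 0.0, mu_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if one_minus_cl(method, n_obs, mid, b) <= alpha:
            hi = mid
        else:
            lo = mid
    return hi


def mean_upper_limit(method, b, alpha=0.05, n_max=60):
    """Average limit over background-only experiments: sum_n P_b(n) * limit(n)."""
    total = 0.0
    for n in range(n_max + 1):
        weight = poisson_pmf(n, b)
        if weight > 0.0:
            total += weight * upper_limit(method, n, b, alpha)
    return total


if __name__ == "__main__":
    for method in ("classical", "signal_estimator", "conditional"):
        print(method, [round(upper_limit(method, n, 3.0), 3) for n in range(6)],
              round(mean_upper_limit(method, 3.0), 3))
```

For `b = 0` the only possible observation is zero events, so all three mean limits equal `-log(0.05)`, consistent with the value $\mu = 3$ quoted above.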
4. Generalization

This technique was developed especially for Higgs searches at LEP, as a more powerful frequentist alternative to the conditioning techniques employed elsewhere [5]. In these searches, a sample of events is obtained from the data, each with some set of discriminating variables which can be combined analytically into a test-statistic, such as the likelihood ratio, which discriminates experiments with a specific signal hypothesis from those with no signal. The frequentist framework is
then applied to this test-statistic to determine confidence levels. When extended to this more general framework, if the test-statistic has the property that the removal of any candidate event from an experiment must decrease the test-statistic for that experiment, and that for any finite set of observed events the test-statistic is a finite real number, then all of the conclusions about the signal-estimator method derived in the previous sections hold true.

The construction of the signal-estimator confidence level proceeds similarly to that described above. The constraints put on the test-statistic require that there exists a test-statistic value $\epsilon_{\min}$ for an observation of exactly zero events. All observations of a greater number of events must give test-statistic values larger than $\epsilon_{\min}$. Any test-statistic value with a finite probability can be attained by some set of possible observations. For each observation in the set, each candidate may originate from either signal or background. Rather than deriving a general formula explicitly as above, the formula will simply be generalized in a manner similar to that used to generalize the conditional confidence levels for the number-counting case to the generalized conditional method described in Ref. [5]. The generalized form of the signal-estimator confidence level can be written as

$$
\begin{aligned}
1 - \mathrm{CL}_{\mathrm{SE}}
&= P_{s+b}(\epsilon \le \epsilon_{\mathrm{obs}}) + P_b(\epsilon > \epsilon_{\mathrm{obs}})\, P_s(\epsilon \le \epsilon_{\min}) \\
&= P_{s+b}(\epsilon \le \epsilon_{\mathrm{obs}}) + \bigl(1 - P_b(\epsilon \le \epsilon_{\mathrm{obs}})\bigr)\exp(-s),
\end{aligned} \tag{12}
$$

where $P_b$ and $P_{s+b}$ represent the probability given the background or signal and background rates and the corresponding discriminating variable distributions. For an observation of zero candidates, the observed estimator value will be $\epsilon_{\min}$. Therefore, $1 - \mathrm{CL}_{\mathrm{SE}}$ can be seen to be $\exp(-s)$, showing that the confidence level when zero events are observed is always the same, regardless of the form of the test-statistic.

The proof that the method does not undercover can be straightforwardly generalized from the number-counting case. Given a signal hypothesis $s$, either no observation will exclude $s$, or a set of estimator values $[\epsilon_{\min}, \epsilon_{\max}(s, \alpha)]$ will exclude the hypothesis. If no observation excludes $s$, $s$ is overcovered. If a set of observations excludes $s$, then

$$
F_{s+b}\bigl(\epsilon \le \epsilon_{\max}(s, \alpha)\bigr)
\le P_{s+b}\bigl(\epsilon \le \epsilon_{\max}(s, \alpha)\bigr)
+ \Bigl(1 - P_b\bigl(\epsilon \le \epsilon_{\max}(s, \alpha)\bigr)\Bigr)\exp(-s) \le \alpha. \tag{13}
$$

Thus, the signal-estimator method never undercovers.

That the signal-estimator method has less overcoverage than the generalized conditional method is again difficult to prove analytically, but it can be tested easily for specific examples. Confidence levels for each individual hypothesis are computationally intensive in this generalized framework, so it is simpler to fix the hypotheses and examine their confidence levels rather than to fix a confidence level and fit the hypothesis. To illustrate the point, choose a discriminant with signal and background p.d.f.s as shown in Fig. 5. Fix the number of background events expected to 5, and vary the number of signal events expected. One can now define the likelihood ratio as described in Ref. [5], and compute confidence levels. A measure of the power of a method is the mean confidence level computed with that method for background experiments only. More powerful methods will have lower values of this quantity, reflecting the fact that they are more likely to exclude the given signal.

Fig. 6 shows the relative powers of the three methods discussed in this paper. For low numbers of events, one can see that both the signal-estimator method and the conditional method perform less well than the classical method. When more than about four signal events are expected, however, the signal-estimator method performs quite similarly to the classical method, while the conditional method remains significantly less powerful for signals with at least 10 expected events. Therefore, one can conclude that, at least in cases such as these, the signal-estimator method is more powerful than the conditional method.

Fig. 5. Probability distribution functions for the discriminant used in a search. The solid histogram is the p.d.f. for candidates arising from the signal process, and the dashed histogram is the p.d.f. for candidates arising from the background process.

Fig. 6. A comparison of mean confidence levels vs. the signal expectation $s$ with background expectation fixed to five events. Each candidate has an associated discriminant value as described in the text. The solid line is the mean signal-estimator confidence level for background hypotheses, the dashed line is the mean conditional confidence level for background hypotheses, and the dotted line is the mean classical confidence level for background hypotheses.

5. Conclusions

This paper presents a method for determining exclusion confidence levels on signal hypotheses in the presence of background. The method covers more than the classical one-sided confidence bound, but has the desirable property that when zero counts are observed, the confidence level is independent of the background hypothesis. The method overcovers less than the conditional confidence level, and allows one to set more powerful exclusion limits. The method can be applied simply, and has been generalized for use with test-statistics such as the likelihood ratio.
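As an illustration of the generalized expression of Section 4, Eq. (12), the sketch below evaluates $1 - \mathrm{CL}_{\mathrm{SE}}$ exactly for a toy discriminant with only two bins. Everything specific in it is an assumption made for the example and is not taken from the paper or from Ref. [5]: the bin fractions, the function names, the truncation `n_max`, and the choice of test-statistic, here the sum over candidates of $\ln(1 + s_j/b_j)$ for the bin $j$ each candidate falls in. That choice satisfies the two conditions stated in Section 4, since every candidate adds a positive finite weight and zero candidates give $\epsilon_{\min} = 0$.

```python
import math

# Hypothetical two-bin discriminant: fraction of signal and of background
# candidates expected in each bin (illustrative numbers only).
SIG_FRAC = (0.8, 0.2)
BKG_FRAC = (0.3, 0.7)


def poisson_pmf(k, lam):
    """P_lam(n = k) for a Poisson distribution with mean lam."""
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def candidate_weights(s, b, sig_frac=SIG_FRAC, bkg_frac=BKG_FRAC):
    """Per-bin weight ln(1 + s_j / b_j); positive for s > 0 and b > 0."""
    return tuple(math.log(1.0 + s * fs / (b * fb))
                 for fs, fb in zip(sig_frac, bkg_frac))


def statistic_value(counts, weights):
    """Toy test-statistic: sum of candidate weights (zero for no candidates)."""
    return sum(n * w for n, w in zip(counts, weights))


def prob_stat_le(eps_obs, rates, weights, n_max=60):
    """P(eps <= eps_obs) by exact enumeration of independent Poisson counts
    in the two bins, truncated at n_max candidates per bin."""
    total = 0.0
    for n1 in range(n_max + 1):
        p1 = poisson_pmf(n1, rates[0])
        for n2 in range(n_max + 1):
            # Small tolerance guards against floating-point ties.
            if statistic_value((n1, n2), weights) <= eps_obs + 1e-12:
                total += p1 * poisson_pmf(n2, rates[1])
    return total


def one_minus_cl_se_general(counts, s, b, sig_frac=SIG_FRAC, bkg_frac=BKG_FRAC):
    """Eq. (12): P_{s+b}(eps <= eps_obs) + (1 - P_b(eps <= eps_obs)) * exp(-s)."""
    weights = candidate_weights(s, b, sig_frac, bkg_frac)
    eps_obs = statistic_value(counts, weights)
    rates_b = tuple(b * fb for fb in bkg_frac)
    rates_sb = tuple(s * fs + b * fb for fs, fb in zip(sig_frac, bkg_frac))
    p_sb = prob_stat_le(eps_obs, rates_sb, weights)
    p_b = prob_stat_le(eps_obs, rates_b, weights)
    return p_sb + (1.0 - p_b) * math.exp(-s)


if __name__ == "__main__":
    s, b = 2.0, 5.0
    for counts in ((0, 0), (0, 2), (1, 2), (3, 1)):
        print(counts, round(one_minus_cl_se_general(counts, s, b), 4))
```

With no candidates observed the function returns `exp(-s)` whatever the bin fractions, and when signal and background share the same bin fractions the statistic depends only on the total count, so the result coincides with the counting formula of Eq. (6).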
References
[1] G. Feldman, R. Cousins, Phys. Rev. D 57 (1998) 3873.
[2] C. Giunti, Phys. Rev. D 59 (1999) 053001; S. Ciampolillo, Il Nuovo Cimento A 111 (1998) 1415; B. Roe, M. Woodroofe, Phys. Rev. D 60 (1999) 053009; M. Mandelkern, J. Schultz, hep-ex/9910041; G. Punzi, hep-ex/9912048.
[3] Matts Roos' comments in "Panel Discussion" at the CERN Workshop on "Confidence Limits", http://www.cern.ch/CERN/Divisions/EP/Events/CLW/QA/PS/clwdiscuss.ps
[4] O. Helene, Nucl. Instr. and Meth. 212 (1983) 319.
[5] A. Read, DELPHI Note 97-158 PHYS 737.