Pattern Recognition, Vol. 11, pp. 353-360. Pergamon Press Ltd, 1979. Printed in Great Britain. © Pattern Recognition Society.
SOME ASPECTS OF ERROR BOUNDS IN FEATURE SELECTION

D. E. BOEKEE and J. C. A. VAN DER LUBBE

Information Theory Group, Department of Electrical Engineering, Delft University of Technology, 4 Mekelweg, Delft, The Netherlands
(Received 3 May 1978; in revised form 6 February 1979)

Abstract: In this paper we discuss various bounds on the Bayesian probability of error, which are used for feature selection, and are based on distance measures and information measures. We show that they are basically of two types. One type can be related to the f-divergence, the other can be related to information measures. This also clarifies some properties of these measures for the two-class problem and for the multi-class problem. We give some general bounds on the Bayesian probability of error and discuss various aspects of the different approaches.

Key words: Information measures; Error bounds; Distance measures; Feature evaluation; f-divergence.
1. INTRODUCTION

In recent years many authors(1-10) have paid attention to the problem of bounding the Bayesian probability of error. This has been of particular interest in the field of pattern recognition related to feature selection. The problem can be stated as follows for a multi-class pattern recognition problem. Suppose we have m pattern classes C_1, C_2, ..., C_m with a priori probabilities P_i = P(C_i). Let the feature x have a class-conditional density function p(x/C_i). Then the a posteriori probability for class C_i given the feature value x is P(C_i/x). If the decision in the classifier is made according to the Bayes method, we select the class for which P_i p(x/C_i), or equivalently P(C_i/x), is maximal. This decision rule leads to a probability of error which is given by

P_e = 1 - \int \max_i [P_i \, p(x/C_i)] \, dx = 1 - E_x[\max_i P(C_i/x)],

where E_x is the expectation with respect to x.
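As an illustration, the Bayes rule and the error integral above can be approximated numerically on a grid. In the following Python sketch the one-dimensional feature space, the Gaussian class-conditional densities, the priors and the grid are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def gauss(x, mu, sigma):
    """Gaussian density, used here as an illustrative class-conditional p(x/C_i)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative two-class problem on a one-dimensional feature x (assumed values).
priors = np.array([0.4, 0.6])                      # P_i = P(C_i)
means, sigmas = np.array([0.0, 2.0]), np.array([1.0, 1.0])

x = np.linspace(-8.0, 10.0, 20001)                 # grid over the feature space
dx = x[1] - x[0]

# P_i * p(x/C_i) for each class; the Bayes rule picks the maximising class.
joint = np.array([p * gauss(x, m, s) for p, m, s in zip(priors, means, sigmas)])
bayes_decision = joint.argmax(axis=0)              # class index chosen at each x

# P_e = 1 - \int max_i [P_i p(x/C_i)] dx, approximated by a Riemann sum.
bayes_error = 1.0 - np.sum(joint.max(axis=0)) * dx
print(f"Estimated Bayesian probability of error: {bayes_error:.4f}")
```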
Several probabilistic distance measures and information measures have been used to obtain bounds on the Bayesian probability of error, like Kolmogorov's variational distance, the Bhattacharyya distance, the J-divergence and the (generalized) Bayesian distance, as well as many others. More details about these measures can be found in (1)-(10), where upper and lower bounds to the probability of error are also given. Much effort has been given to generalize these measures from two pattern classes with equal a priori probabilities to two pattern classes with non-equal a priori probabilities, and the extension to the multi-class case of bounds on the probability of error has also received much attention.
In this paper we discuss some aspects of these distance measures, both for the two-class problem and for the multi-class problem. We show that many of the well-known distance measures belong to two general classes of distance measures. Firstly we shall study a group of distance measures which can be related to the f-divergence. Secondly we shall study a general class of distance measures, called the general mean distance, and relate it to a class of information measures.

2. THE f-DIVERGENCE

In this section we shall study the behaviour of distance measures as measures of divergence between two hypotheses. This is the simple case of two pattern classes. A very general measure of divergence, given by Csiszár(11), is defined as follows.
Definition 1. Let f(u) be a convex function which satisfies

f(0) = \lim_{u \to 0} f(u)   (1a)
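The definition continues beyond condition (1a). Assuming the standard Csiszár form C_f(p_1, p_2) = \int p_2(x) f(p_1(x)/p_2(x)) dx, the following Python sketch illustrates how particular choices of f(u) recover distance measures mentioned above; the Gaussian densities and the specific choices of f are illustrative assumptions rather than the paper's own examples.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def f_divergence(p1, p2, f, dx):
    """Approximate C_f(p1, p2) = \int p2(x) f(p1(x)/p2(x)) dx on a grid.
    Assumes the standard Csiszar form of the f-divergence."""
    ratio = np.divide(p1, p2, out=np.zeros_like(p1), where=p2 > 0)
    return np.sum(p2 * f(ratio)) * dx

x = np.linspace(-8.0, 10.0, 20001)
dx = x[1] - x[0]
p1 = gauss(x, 0.0, 1.0)    # p(x/C_1), illustrative choice
p2 = gauss(x, 2.0, 1.0)    # p(x/C_2), illustrative choice

# f(u) = |u - 1| gives \int |p1 - p2| dx, Kolmogorov's variational distance
# (up to the normalisation convention used).
variational = f_divergence(p1, p2, lambda u: np.abs(u - 1.0), dx)

# f(u) = -sqrt(u) gives minus the Bhattacharyya coefficient rho = \int sqrt(p1 p2) dx;
# the Bhattacharyya distance is -ln(rho).
rho = -f_divergence(p1, p2, lambda u: -np.sqrt(u), dx)
bhattacharyya = -np.log(rho)

print(f"Variational distance: {variational:.4f}")
print(f"Bhattacharyya distance: {bhattacharyya:.4f}")
```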