Pattern Recognition Letters 18 (1997) 705-709

Minimum error thresholding: A note

Fan Jiulun a,*, Xie Weixin b

a Department of Basic Courses, Xi'an Institute of Posts and Telecommunications, Xi'an, 710061, PR China
b Department of Electronic Engineering, Xidian University, Xi'an, 710071, PR China

Received 6 December 1996; revised 26 June 1997
Abstract

We explain the minimum error thresholding criterion originated by Kittler and Illingworth using relative entropy. © 1997 Elsevier Science B.V.

Keywords: Minimum error thresholding; Relative entropy
1. Introduction
Image thresholding based on gray level histogram information is a simple and important technique for segmentation, the purpose of which is to identify the regions of image objects correctly. At present there exist many threshold selection methods based on the histogram of the image. Minimum error thresholding, originated by Kittler and Illingworth (1986), is one popular method. In minimum error thresholding, appropriate thresholds are selected by the minimum error criterion. This criterion is designed to minimize the classification error probability under the condition that the histogram is governed by a mixture of Gaussian densities. Ye and Danielsson (1988) provided another derivation of the criterion using the correlation. The methods used in Kittler and Illingworth (1986) and Ye and Danielsson (1988), though reasonable, lacked a strict mathematical explanation. Morii (1991) expressed the minimum error criterion from the viewpoint of Shannon entropy; this explanation is strict and has no relation to the threshold selection. Kurita et al. (1992) showed that minimum error thresholding, assuming normal distributions with different variances, is equivalent to maximizing the likelihood of the joint distribution in the population mixture model. In this note, using relative entropy, we provide a strict derivation of the criterion.

The histogram h(g), g = 0, 1, ..., T, can be viewed as an estimate of the probability density function p(g), g = 0, 1, ..., T, of the mixture population comprising gray levels of object and background pixels. In the
* Corresponding author.

0167-8655/97/$12.00 © 1997 Elsevier Science B.V. All rights reserved.
PII S0167-8655(97)00059-7
following we shall assume that each of the two components p(g|i), i = 0, 1, of the mixture is normally distributed with mean μ_i, standard deviation σ_i and a priori probability P_i, i.e.,

$$p(g) = \sum_{i=0}^{1} P_i\, p(g|i), \qquad (1)$$

where

$$p(g|i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left(-\frac{(g-\mu_i)^2}{2\sigma_i^2}\right). \qquad (2)$$
Minimum error thresholding is based on the idea of arbitrarily dividing the histogram into two parts, modeling each part with a normal distribution, and comparing the model with the histogram. Suppose that we threshold the gray level data at some arbitrary level t and model each of the two resulting pixel populations by a normal density h(g|i,t) with parameters μ_i(t) and σ_i(t) and a priori probability P_i(t) given, respectively, as
$$P_i(t) = \sum_{g=a}^{b} h(g), \qquad (3)$$

$$\mu_i(t) = \frac{\sum_{g=a}^{b} h(g)\, g}{P_i(t)} \qquad (4)$$

and

$$\sigma_i^2(t) = \frac{\sum_{g=a}^{b} h(g)\,(g-\mu_i(t))^2}{P_i(t)}, \qquad (5)$$

where

$$a = \begin{cases} 0, & i = 0, \\ t+1, & i = 1 \end{cases} \qquad (6)$$

and

$$b = \begin{cases} t, & i = 0, \\ T, & i = 1. \end{cases} \qquad (7)$$
For a threshold t ∈ {0, 1, ..., T}, Kittler and Illingworth (1986) derive the minimum error criterion function:

$$J(t) = 1 + 2\left[P_0(t)\ln\sigma_0(t) + P_1(t)\ln\sigma_1(t)\right] - 2\left[P_0(t)\ln P_0(t) + P_1(t)\ln P_1(t)\right]. \qquad (8)$$

As the threshold, we select t = t* minimizing J(t).
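As an illustration, the criterion of Eq. (8) can be minimized by exhaustive search over all candidate thresholds. The following Python sketch (function name and the handling of degenerate thresholds are our own, not from the paper) computes P_i(t), μ_i(t) and σ_i²(t) per Eqs. (3)-(7) and returns the minimizing t*; thresholds that leave one class empty or with zero variance, where J(t) is undefined, are skipped.

```python
import math

def min_error_threshold(h):
    """Exhaustive search for the t* minimizing J(t) of Eq. (8).

    h: normalized histogram, h[g] = relative frequency of gray level g
    (sum(h) == 1). Returns the minimizing threshold t*, or None if no
    admissible threshold exists.
    """
    T = len(h) - 1
    best_t, best_J = None, float("inf")
    for t in range(T):  # class 0 occupies [0, t], class 1 occupies [t+1, T]
        P0 = sum(h[: t + 1])                                   # Eq. (3)
        P1 = 1.0 - P0
        if P0 <= 0.0 or P1 <= 0.0:
            continue  # one class is empty: J(t) is undefined
        mu0 = sum(g * h[g] for g in range(t + 1)) / P0         # Eq. (4)
        mu1 = sum(g * h[g] for g in range(t + 1, T + 1)) / P1
        v0 = sum(h[g] * (g - mu0) ** 2 for g in range(t + 1)) / P0          # Eq. (5)
        v1 = sum(h[g] * (g - mu1) ** 2 for g in range(t + 1, T + 1)) / P1
        if v0 <= 0.0 or v1 <= 0.0:
            continue  # zero variance: ln sigma diverges
        # Eq. (8), using 2 * P * ln(sigma) = P * ln(variance)
        J = (1.0 + P0 * math.log(v0) + P1 * math.log(v1)
             - 2.0 * (P0 * math.log(P0) + P1 * math.log(P1)))
        if J < best_J:
            best_t, best_J = t, J
    return best_t
```

On a synthetic bimodal histogram with two well-separated modes of equal weight and spread, the selected t* falls between the modes, as the minimum error idea predicts.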
2. Relative entropy expression of minimum error criterion

Let p = {p_0, p_1, ..., p_T} and p' = {p'_0, p'_1, ..., p'_T} be two probability distributions defined on the same set. The relative entropy between p and p' (or equivalently, the entropy of p relative to p') is defined by

$$L(p; p') = \sum_{j=0}^{T} p_j \ln \frac{p_j}{p'_j}. \qquad (9)$$
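In code, Eq. (9) is a one-line sum. The sketch below (our own notation, not from the paper) adopts the usual convention that terms with p_j = 0 contribute zero, and can be used to illustrate the properties discussed in the text.

```python
import math

def relative_entropy(p, q):
    """L(p; q) of Eq. (9): sum_j p_j * ln(p_j / q_j).

    Terms with p_j = 0 are taken as 0 (the limit x * ln(x) -> 0 as x -> 0);
    q_j must be positive wherever p_j > 0.
    """
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)
```

For example, with p = {0.5, 0.5} and p' = {0.9, 0.1}, L(p; p') ≈ 0.51 while L(p'; p) ≈ 0.37, showing that the measure is nonnegative but not symmetric.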
The definition given by Eq. (9) was first introduced by Kullback (1968) as a distance measure between two probability distributions. It is well known that L(p; p') is nonnegative and additive, but not symmetric (Johnson, 1979). The relative entropy basically provides a criterion to measure the discrepancy between two probability distributions p and p': the smaller the relative entropy, the less the discrepancy. It is therefore natural to use relative entropy as a measure of the difference between the histogram h(g) and the mixture probability density function p(g). Many researchers have used relative entropy in image threshold selection (Li and Lee, 1993; Chang et al., 1994; Brink and Pendock, 1996; Pal, 1996). In the following we provide a derivation of J(t) using relative entropy. For an image, we have a histogram h(g) and an estimate of the probability density function p(g) = h(g|0,t)P_0(t) + h(g|1,t)P_1(t). If the mixture of normal distributions model is used to match the histogram, one goodness criterion for the chosen t would be the relative entropy, i.e.,
$$R(t) = \sum_{g=0}^{T} h(g) \ln \frac{h(g)}{p(g)}. \qquad (10)$$
The minimum of this function should correspond to the maximum match between the model and the histogram. Substituting Eqs. (1)-(7) into R(t) (see Appendix A), we have

$$R(t) = \sum_{g=0}^{T} h(g)\ln h(g) + \ln\sqrt{2\pi} + \frac{1}{2} + P_0(t)\ln\sigma_0(t) + P_1(t)\ln\sigma_1(t) - P_0(t)\ln P_0(t) - P_1(t)\ln P_1(t),$$

i.e.,

$$2R(t) = 2\sum_{g=0}^{T} h(g)\ln h(g) + 2\ln\sqrt{2\pi} + 1 + 2P_0(t)\ln\sigma_0(t) + 2P_1(t)\ln\sigma_1(t) - 2P_0(t)\ln P_0(t) - 2P_1(t)\ln P_1(t).$$

Because R(t) ≥ 0 and $2\sum_{g=0}^{T} h(g)\ln h(g) + 2\ln\sqrt{2\pi}$ is constant (independent of t), minimizing R(t) is equivalent to minimizing the following function:

$$J(t) = 1 + 2\left[P_0(t)\ln\sigma_0(t) + P_1(t)\ln\sigma_1(t)\right] - 2\left[P_0(t)\ln P_0(t) + P_1(t)\ln P_1(t)\right].$$

The above result gives minimum error thresholding a clear mathematical meaning.
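The constant offset between 2R(t) and J(t) can also be checked numerically. The sketch below (helper names are our own) evaluates R(t) with p(g) replaced on each side of t by the fitted one-sided model h(g|i,t)P_i(t), following the split used in Appendix A, and confirms that 2R(t) − J(t) equals the t-independent constant 2 Σ h(g) ln h(g) + ln 2π.

```python
import math

def class_params(h, lo, hi):
    """A priori probability, mean and variance of the class occupying
    gray levels [lo, hi] (Eqs. (3)-(5))."""
    P = sum(h[lo: hi + 1])
    mu = sum(g * h[g] for g in range(lo, hi + 1)) / P
    var = sum(h[g] * (g - mu) ** 2 for g in range(lo, hi + 1)) / P
    return P, mu, var

def two_R(h, t):
    """2 * R(t), with p(g) split at t into the one-sided normal models
    h(g|i,t) * P_i(t), as in the first step of Appendix A."""
    T = len(h) - 1
    R = 0.0
    for lo, hi in ((0, t), (t + 1, T)):
        P, mu, var = class_params(h, lo, hi)
        for g in range(lo, hi + 1):
            if h[g] > 0:
                model = P * math.exp(-(g - mu) ** 2 / (2 * var)) \
                        / math.sqrt(2 * math.pi * var)
                R += h[g] * math.log(h[g] / model)
    return 2.0 * R

def J(h, t):
    """The criterion of Eq. (8), using 2 * P * ln(sigma) = P * ln(variance)."""
    T = len(h) - 1
    out = 1.0
    for lo, hi in ((0, t), (t + 1, T)):
        P, mu, var = class_params(h, lo, hi)
        out += P * math.log(var) - 2.0 * P * math.log(P)
    return out
```

Since the offset does not depend on t, both functions attain their minimum at the same threshold, which is the content of the equivalence above.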
Appendix A. The derivation of J(t) from R(t)

$$R(t) = \sum_{g=0}^{T} h(g)\ln\frac{h(g)}{p(g)} = \sum_{g=0}^{t} h(g)\ln\frac{h(g)}{h(g|0,t)P_0(t)} + \sum_{g=t+1}^{T} h(g)\ln\frac{h(g)}{h(g|1,t)P_1(t)}$$

$$= \sum_{g=0}^{T} h(g)\ln h(g) - \sum_{g=0}^{t} h(g)\ln h(g|0,t) - \sum_{g=t+1}^{T} h(g)\ln h(g|1,t) - \sum_{g=0}^{t} h(g)\ln P_0(t) - \sum_{g=t+1}^{T} h(g)\ln P_1(t)$$

$$= \sum_{g=0}^{T} h(g)\ln h(g) + \sum_{g=0}^{t} h(g)\left[\ln\!\left(\sqrt{2\pi}\,\sigma_0(t)\right) + \frac{(g-\mu_0(t))^2}{2\sigma_0^2(t)}\right] + \sum_{g=t+1}^{T} h(g)\left[\ln\!\left(\sqrt{2\pi}\,\sigma_1(t)\right) + \frac{(g-\mu_1(t))^2}{2\sigma_1^2(t)}\right] - P_0(t)\ln P_0(t) - P_1(t)\ln P_1(t)$$

$$= \sum_{g=0}^{T} h(g)\ln h(g) + \ln\sqrt{2\pi} + P_0(t)\ln\sigma_0(t) + P_1(t)\ln\sigma_1(t) + \frac{1}{2}\left[\sum_{g=0}^{t} h(g)\frac{(g-\mu_0(t))^2}{\sigma_0^2(t)} + \sum_{g=t+1}^{T} h(g)\frac{(g-\mu_1(t))^2}{\sigma_1^2(t)}\right] - P_0(t)\ln P_0(t) - P_1(t)\ln P_1(t).$$

Using Eq. (5), we have

$$\sum_{g=0}^{t} h(g)\frac{(g-\mu_0(t))^2}{\sigma_0^2(t)} + \sum_{g=t+1}^{T} h(g)\frac{(g-\mu_1(t))^2}{\sigma_1^2(t)} = P_0(t) + P_1(t) = 1.$$

Therefore,

$$R(t) = \sum_{g=0}^{T} h(g)\ln h(g) + \ln\sqrt{2\pi} + \frac{1}{2} + P_0(t)\ln\sigma_0(t) + P_1(t)\ln\sigma_1(t) - P_0(t)\ln P_0(t) - P_1(t)\ln P_1(t),$$

i.e.,

$$2R(t) = 2\sum_{g=0}^{T} h(g)\ln h(g) + 2\ln\sqrt{2\pi} + 1 + 2P_0(t)\ln\sigma_0(t) + 2P_1(t)\ln\sigma_1(t) - 2P_0(t)\ln P_0(t) - 2P_1(t)\ln P_1(t).$$

Because R(t) ≥ 0 and $2\sum_{g=0}^{T} h(g)\ln h(g) + 2\ln\sqrt{2\pi}$ is constant, minimizing R(t) is equivalent to minimizing the following function:

$$J(t) = 1 + 2\left[P_0(t)\ln\sigma_0(t) + P_1(t)\ln\sigma_1(t)\right] - 2\left[P_0(t)\ln P_0(t) + P_1(t)\ln P_1(t)\right].$$
References

Brink, A.D., Pendock, N.E., 1996. Minimum cross-entropy threshold selection. Pattern Recognition 29, 179-188.
Chang, C.I., Chen, K., Wang, J., Althouse, M.L.G., 1994. A relative entropy-based approach to image thresholding. Pattern Recognition 27, 1275-1289.
Johnson, R.W., 1979. Axiomatic characterization of the directed divergences and their linear combinations. IEEE Trans. Inform. Theory 25, 709-716.
Kittler, J., Illingworth, J., 1986. Minimum error thresholding. Pattern Recognition 19, 41-47.
Kullback, S., 1968. Information Theory and Statistics. Dover, New York.
Kurita, T., Otsu, N., Abdelmalek, N., 1992. Maximum likelihood thresholding based on population mixture models. Pattern Recognition 25, 1231-1240.
Li, C.H., Lee, C.K., 1993. Minimum cross entropy thresholding. Pattern Recognition 26, 617-625.
Morii, F., 1991. A note on minimum error thresholding. Pattern Recognition Lett. 12, 349-351.
Pal, N.R., 1996. On minimum cross-entropy thresholding. Pattern Recognition 29, 575-580.
Ye, Q., Danielsson, P., 1988. On minimum error thresholding and its implementations. Pattern Recognition Lett. 7, 201-206.