Features selection and ‘possibility theory’


Pattern Recognition, Vol. 19, No. 1, pp. 63-72, 1986. Printed in Great Britain.

0031-3203/86 $3.00 + .00 Pergamon Press Ltd. 1986 Pattern Recognition Society

V. DI GESÙ
Istituto di Matematica, Università di Palermo, I-90123 Palermo, Italy

and

M. C. MACCARONE
Istituto di Fisica Cosmica ed Applicazioni dell'Informatica del C.N.R., Palermo, Italy

(Received 4 January 1985; in final form 6 June 1985; received for publication 5 July 1985)

Abstract: A method is presented that uses both 'cluster analysis' and 'possibility theory' to select the most significant variables in multidimensional data. The method is applied to a set of biomedical data from electron spin resonance spectroscopy measurements on patients with a brain injury.

Clustering algorithms    Data analysis    Multivariate analysis    Fuzzy sets    Pattern recognition

1. INTRODUCTION

In data analysis, given a set of data-points in a multidimensional space, the need to choose the relevant features often arises. This problem is also referred to as feature selection (F.S.). In statistical pattern recognition, (1) cluster analysis (2) and signal processing, (3) F.S. reduces the information redundancy, allowing a simpler description of the system under consideration. F.S. also reduces the total computational time whenever the analysis involves a combinatorial handling of the features, as in multiregression problems and factorial analysis. (4, 5) In the literature there are many examples where the F.S. problem is handled by using linear statistical methods, such as variance and principal component analysis, (6) or the $\chi^2$-test. Some efforts have also been made to select features in the case of non-linear models. (7) Nevertheless, unfavourable conditions arise whenever the sample of points is so small, or the number of features is so large, that the 'law of large numbers' is not applicable and the Gaussian distribution is a weak hypothesis. Examples of such unfavourable conditions occur in gamma-ray astronomy (8) and in biomedicine. (9)

Figure 1 shows a typical 'sky map', as detected by the COS-B satellite. The picture looks like a sparse graph and it is hard to define the cluster models. Whenever such situations appear, a non-probabilistic approach seems to be helpful, as R. Yager underlined (10) in introducing possibility theory. In the present paper an F.S. method is proposed, based on both statistical and possibilistic tools. The general idea is that, in order to reduce the dimensionality of the feature space, it seems useful to introduce morphological elements of the data distribution.

Section 2 gives a theoretical background of possibility theory. In Section 3 the F.S. method is described. Section 4 shows a detailed application of the F.S. to biomedical data. Details of the implementation are given in Section 5. Section 6 is dedicated to concluding remarks.

Fig. 1. An example of a gamma-ray photon picture from the COS-B satellite.

2. FUZZY SETS AND POSSIBILITY

Many papers have been written to introduce the concept of fuzzy-sets: Zadeh proposed a formal definition of this concept (11) and studied several applications; (12) Yager gave the first idea about the definition of a possibility function for cases in which probability becomes hard to define or is an inconsistent measure; (13) De Luca and Termini have developed the mathematical background to handle several classes of functionals, such as the fuzzy-entropy, useful in decision theory and pattern recognition. (14, 15) In this framework we give only a short sketch of the theory, useful for the definition of the F.S. method.

Definition 1. Given a universal class $X$, let $f_X$ be a function

$f_X : X \to [0, 1]$;

$f_X$ is called the fuzzy-set of $X$. The well-known characteristic function is an example of such functions and it defines the classical sets of $X$. The classical sets are therefore included in the fuzzy-sets of $X$. The power set of $X$ is also included in the power fuzzy-set, which has the power of the continuum.

Definition 2. Given a fuzzy-set $f_X$ of $X$ and $\theta \in R^+$, the $\theta$-image of $f_X$ is:

$X_\theta^f = \{x \in X : f_X(x) \ge \theta\}$.

Definition 3. Given two fuzzy-sets $f_X$, $g_X$ of $X$, the $\theta$-evaluation of $g_X$ with respect to $f_X$ is the restriction of $f_X$ to $X_\theta^g$:

$f_{X,\theta}(x) = f_X(x), \quad \forall x \in X_\theta^g$.

In the case in which $g$ is a characteristic function of some set $A \subseteq X$, the 1-evaluation function is a restriction of $f_X$ on $A$ and the intuitive meaning is that, for each $x \in A$, $f_{X,1}(x)$ is the membership degree of $x$ to $A$. In concrete applications $f_A$ is built by combining some features of $x$. The function $f_A$ is also called the 'possibility function' (P.F.) and may be used in decision and classification theory. In the present paper we will show an application in which the use of P.F. to classify shapes may be useful in F.S. problems. Several properties characterize the P.F.; in particular we will use the following:

(a) P.F. is monotone, in the sense that it increases with the 'membership' of $x$ to $A$ (e.g. the Minkowsky measure is monotone);
(b) $f_{A \cup B}(x) = \max\{f_A(x), f_B(x)\} \quad \forall A, B \subseteq X$;
(c) $f_{A \cap B}(x) = \min\{f_A(x), f_B(x)\} \quad \forall A, B \subseteq X$.

Fig. 2. The theoretical mean number of clusters $N_c(\phi)$ versus the threshold $\phi$ (in units of free mean path) for the uniformity test in a 2-dimensional space. The symbols × are the experimental values $N^*(\phi)$.

3. FEATURES SELECTION

Here we give a detailed illustration of the procedure used to perform the F.S. The input data are the set $X$ of $N$ samples in the $n$-dimensional normed feature-space. In the first step, $K_1 \le n$ 1-dimensional subspaces, $S^{(1)}$, are selected by applying the statistical uniformity test (U.T.) described later. In the second step, $K_2$ 2-dimensional subspaces $S^{(2)}$ are selected, using the same U.T., among the total of $\binom{n-K_1}{2}$ subspace combinations. The last step performs a further selection, based on the definition of suitable P.F., of the remaining $\binom{n-K_1}{2} - K_2$ 2-dimensional subspaces which were not selected by the previous U.T.

3.1. The features selection based on the uniformity test

The first two steps of the features selection are based on the U.T. introduced by Di Gesù and Sacco; (16) in this framework we give a brief description of it. In Di Gesù and Sacco (16) the authors give the theoretical estimation of the expected number of clusters $N_c(\phi)$ found by cutting the minimum spanning tree of a set $X$ at a given threshold $\phi$. The method is robust and its validity has been tested under conditions of low statistics. Whenever the data points are uniformly Poisson distributed, the relation between $N_c(\phi)$ and $N$ may be written

$N_c(\phi) = 1 + (N - 1) \cdot \exp\{-\lambda \cdot V_d(\phi)\}, \qquad (1)$

where $\lambda$ is the point density and $V_d(\phi)$ is the hypervolume around a point $x \in X$ (Fig. 2 shows the case $d = 2$). Given a sample of experimental $N^*(\phi_i)$ computed for several threshold values ($i = 1, 2, \ldots, t$), the probability to be consistent with the theoretical expected number $N_c(\phi)$ is given by

$Q_1 = 1$,
$Q_i = P_i \cdot Q_{i-1} \cdot (1 - \ln(P_{i-1})) \quad \text{for } i > 1$,

where $P_i$ is computed by means of

$P(N_c, \phi_i) = \binom{N-1}{N_c-1} \exp(-\lambda V_d(\phi_i))^{N_c-1} \times (1 - \exp(-\lambda V_d(\phi_i)))^{N-N_c}. \qquad (2)$
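Relations (1) and (2) lend themselves to a short numerical sketch. The Python fragment below is an illustration under stated assumptions, not the authors' FORTRAN implementation: the function names are hypothetical, the hypervolume $V_d(\phi)$ is passed in as a plain number, and the experimental count $N^*(\phi)$ is obtained by linking every pair of points closer than $\phi$, which yields the same groups as cutting the minimum spanning tree at $\phi$.

```python
import math
from itertools import combinations


def count_clusters(points, phi):
    """Experimental N*(phi): groups left when only links of length <= phi are kept.

    Equivalent to cutting the minimum spanning tree at threshold phi
    (single-linkage property); implemented here with a small union-find.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= phi:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def expected_clusters(n, lam, volume):
    """Relation (1): N_c = 1 + (N - 1) * exp(-lambda * V_d)."""
    return 1.0 + (n - 1) * math.exp(-lam * volume)


def cluster_probability(nc, n, lam, volume):
    """Relation (2): binomial probability of finding nc clusters among n points."""
    p = math.exp(-lam * volume)
    return math.comb(n - 1, nc - 1) * p ** (nc - 1) * (1.0 - p) ** (n - nc)
```

Summing `cluster_probability` over `nc = 1, ..., n` gives 1, and its mean reproduces `expected_clusters`, which is a quick consistency check between (1) and (2).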

Expression (2) is the probability of having $N_c$ clusters at the threshold $\phi_i$. The probability of having $N^*(\phi_i)$ different from the expected $N_c(\phi_i)$ is given by computing

$P_i = \sum_{N_c = 1}^{N^*} P(N_c, \phi_i) \quad \text{for } N^* < N_c(\phi_i)$,

$P_i = \sum_{N_c = N^*}^{N} P(N_c, \phi_i) \quad \text{for } N^* > N_c(\phi_i)$.

In the tests the analysis is performed for $d = 1$ and $d = 2$; in these cases $V_d(\phi_i)$ in relation (1) assumes the values $\phi_i$ and $\pi\phi_i^2$, respectively. The features selection test will choose those subspaces such that the probability $Q$ of being consistent with the expected number of clusters is less than a chosen probability threshold (e.g. $Q < 10^{-2}$).

3.2. The features selection based on the possibility function

The selection of the subspaces is based on the shape classification of the clusters found by computing the minimum spanning forest (M.S.F.) of the subspaces of order $d$. In our case we consider $d = 2$. The heuristic argument is that the presence in the subspace of only a circular cluster is related to the uniformity of the sample in the subspace. The chosen universal class $X$ is the set of the shapes of the clusters found by the M.S.F. algorithm. Each point of $X$ may be characterized by $r$ parameters $A = (m_1, m_2, \ldots, m_r)$. In our case $m_i \in [0, 1]$ are the axial moments of second order, computed with respect to the $r$ axes crossing the barycenter of a cluster $A$ at given directions $\alpha_i$, and normalized with respect to the maximum moment. In application, the moments and the directions are of the form

$m_i = \sum_j (x_j - y_j \tan\alpha_i)^2 / (1 - \tan^2\alpha_i)$,

$\alpha_i = i \cdot (360/r), \quad i = 1, 2, \ldots, r$.

Two fuzzy-sets have been considered to perform the shape classification: the first is denoted by $T$ and includes thinned clusters, the second one is denoted by $C$ and includes circular clusters. Figure 3 shows examples of 'ideal elements' belonging to $T$ and $C$, respectively; an element $y$ is called 'ideal' with respect to a fuzzy-set $X$ iff $f_X(y) = 1$. Given an element $A \in X$, the value of $f_T(A)$ ($f_C(A)$) defines its membership degree to $T$ ($C$). On the basis of the possibility function properties shown before, the P.F. chosen to classify $T$ against $C$ is

$f_{T \cup C}(A) = \max\{f_T(A), f_C(A)\}$

such that

$f_{T \cup C}(A) = f_T(A)$ if $A \in T$, and $f_{T \cup C}(A) = f_C(A)$ if $A \in C$.

Fig. 3. Ideal elements for thinned and circular cluster classes.

3.3. Possibility functions

The selection power of several 'possibility functions' has been tested on bidimensional elliptical clusters with increasing eccentricity value, generated via Monte Carlo simulation. In particular, two 'possibility functions' are used:

normalized fuzzy entropy

$f_{E_i} = 1 - \frac{1}{r} \sum_{j=1}^{r} \left(-\mu_{ij} \ln \mu_{ij} - (1 - \mu_{ij}) \ln(1 - \mu_{ij})\right) \in [0, 1]$;

Minkowsky metrics

$f_{M_i} = 1 - \frac{1}{r} \left(\sum_{j} (m_{i,j} - m_j)^h\right)^{1/h} \in [0, 1]$;

where

$\mu_{ij} = |m_{i,j} - m_j| < 1$,
$i = 1, \ldots,$ number of prototypes,
$r$ = number of rotation axes,
$m_j$ = $j$th axial moment of the shape,
$m_{i,j}$ = $j$th axial moment of the $i$th prototype,
$h$ = dimensionality ($h = 2$).

Fig. 4. Membership degree to circular and thinned prototypes for the Minkowsky metrics (a) and normalized fuzzy entropy (b) of clusters generated via Monte Carlo simulation at increasing eccentricity.

Fig. 5. Example of selection with P.F. for simulated clusters. Spatial distribution and normalized axial quadratic moments for eccentricity = 0.0 (a) and for eccentricity = 0.86 (b).

Figure 4 shows the fuzzy membership degree at the two prototypes vs the increasing eccentricity of simulated shapes. The bold lines refer to the degree of belonging to a circular prototype, the dashed lines to the elliptical one. The classification in Fig. 4a is performed using Minkowsky metrics; Fig. 4b refers to normalized fuzzy entropy (N.F.E.) metrics. Note that N.F.E. has a better discriminant power. Moreover, in Fig. 5 we show two clusters with eccentricity 0.0 (circular) and eccentricity

0.86 (thinned); the spatial distribution of the clusters and their normalized axial moments are shown. The two clusters shown in Fig. 5 are very spread out, therefore chance fluctuations may introduce some variability in the distribution of the axial moments of the circular one. The results of shape classification are given in Fig. 6. It should be noted that both functions are able to discriminate the shapes, in accordance with the membership degree as in Fig. 4.

Fig. 6. Results of the classification of the clusters in Fig. 5 using the Minkowsky ($f_M$) and normalized fuzzy entropy ($f_E$) possibility functions.
First cluster: $f_M$ circular 0.97, thinned 0.95; $f_E$ circular 0.68, thinned 0.54; shape = circular.
Second cluster: $f_M$ circular 0.95, thinned 0.98; $f_E$ circular 0.54, thinned 0.67; shape = thinned, Numrot = 1, slope = 0 ± 18 degrees.
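The two possibility functions of Section 3.3 and the max rule of Section 3.2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the prototype moment vectors are hypothetical, the moments are taken as already normalized to [0, 1], and terms with $\mu = 0$ or $\mu = 1$ are given zero entropy (the limit of $x \ln x$ as $x \to 0$).

```python
import math


def fuzzy_entropy_pf(shape_moments, prototype_moments):
    """Normalized fuzzy entropy P.F.: 1 - (1/r) * sum_j S(mu_j), mu_j = |m_ij - m_j|."""
    r = len(shape_moments)
    total = 0.0
    for m, p in zip(shape_moments, prototype_moments):
        mu = abs(p - m)
        if 0.0 < mu < 1.0:  # endpoints contribute zero entropy
            total += -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)
    return 1.0 - total / r


def minkowsky_pf(shape_moments, prototype_moments, h=2):
    """Minkowsky P.F.: 1 - (1/r) * (sum_j |m_ij - m_j|^h)^(1/h)."""
    r = len(shape_moments)
    dist = sum(abs(p - m) ** h
               for m, p in zip(shape_moments, prototype_moments)) ** (1.0 / h)
    return 1.0 - dist / r


def classify(shape_moments, prototypes, pf):
    """Apply the max rule: the class with the largest membership degree wins."""
    scores = {name: pf(shape_moments, proto) for name, proto in prototypes.items()}
    return max(scores, key=scores.get), scores


# Hypothetical prototypes along r = 10 directions (illustrative values only).
PROTOTYPES = {
    "circular": [1.0] * 10,
    "thinned": [1.0, 0.8, 0.4, 0.1, 0.2, 0.6, 0.9, 0.7, 0.3, 0.1],
}
```

A shape whose moments coincide with a prototype gets membership degree 1 for that class under both functions, so `classify(PROTOTYPES["thinned"], PROTOTYPES, fuzzy_entropy_pf)` returns the label `"thinned"`.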

4. APPLICATION AND RESULTS

The method has been applied to a set of biomedical data from electron spin resonance spectroscopy measurements on patients with a brain injury. (9) The sample size is $N = 46$ and each event is a collection of chemical, physical and biological parameters, the dependence of which is far from linear. The total number of parameters is $n = 17$, and they are not always defined. Table 1 shows the complete set of measured values. Before performing the features selection, a normalization of each parameter to its mean value is performed.

4.1. Selection with U.T.

The selection of the $K_1$ 1-dimensional subspaces that do not get through the U.T. is made by setting the significance probability level to $Q = 10^{-2}$. The U.T. applied to the data gives $K_1 = 8$ (see Table 2). Figure 7 shows an example of the U.T. used for the parameters BB and TF. In the second step, applied to the $(n - K_1) = 9$ remaining components, the 2-dimensional U.T. is applied. As we pointed out before, the feature values are not always defined, therefore the number of sample points whose parameters are always defined depends on the subspace considered. Obvious reasons suggest that we select, for further computation, only those subspaces with a number of points greater than 4. By setting the confidence level $Q = 10^{-2}$ we obtain $K_2 = 9$ 2-dimensional subspaces which do not pass the U.T. (see Table 3). The $\binom{n-K_1}{2} - K_2 = 25$ remaining subspaces in $S^{(2)}$ are then analysed by P.F. to find possible dependences between the parameters by considering their shapes.

Table 2. Results of application of the 1-dimensional U.T. to the data in Table 1

Coordinate    Points N    Probability Q
PO2           30          8.4 × 10^-2
PCO2*         30          5.4 × 10^-3
PH*           30          0.0
BE            30          3.6 × 10^-2
BB            29          0.23
HCO3          30          7.9 × 10^-2
TCO2          26          5.9 × 10^-2
GR*           23          3.7 × 10^-4
HB            24          1.3 × 10^-2
HT            15          9.5 × 10^-2
TFFE*         42          4.3 × 10^-4
CPCU*         44          9.2 × 10^-3
FE*           43          5.2 × 10^-3
TF*           43          2.0 × 10^-8
CU            36          0.11
CP*           44          4.7 × 10^-4
FR            30          6.5 × 10^-2

* Failed U.T. (Q < 10^-2).
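The first selection step can be replayed on the Table 2 probabilities. The sketch below is illustrative only: the function name is hypothetical, and the Q values are typed in as read from Table 2. It splits the 17 coordinates at the threshold $Q < 10^{-2}$ and counts the 2-dimensional subspaces that the remaining coordinates can form.

```python
from math import comb

# Probabilities Q of the 1-dimensional U.T., as read from Table 2.
Q_VALUES = {
    "PO2": 8.4e-2, "PCO2": 5.4e-3, "PH": 0.0, "BE": 3.6e-2, "BB": 0.23,
    "HCO3": 7.9e-2, "TCO2": 5.9e-2, "GR": 3.7e-4, "HB": 1.3e-2, "HT": 9.5e-2,
    "TFFE": 4.3e-4, "CPCU": 9.2e-3, "FE": 5.2e-3, "TF": 2.0e-8, "CU": 0.11,
    "CP": 4.7e-4, "FR": 6.5e-2,
}


def split_by_uniformity(q_values, q_threshold=1e-2):
    """Return (failed, remaining): coordinates with Q below the threshold fail the U.T."""
    failed = [name for name, q in q_values.items() if q < q_threshold]
    remaining = [name for name in q_values if name not in failed]
    return failed, remaining


failed, remaining = split_by_uniformity(Q_VALUES)
n_couples = comb(len(remaining), 2)  # 2-dimensional subspaces among the remaining coordinates
```

With these values the split gives the $K_1 = 8$ coordinates marked with an asterisk and leaves 9 coordinates, which form the 36 couples listed in Table 3.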

Table 3. Results of the 2-dimensional U.T.

Coordinate couples    Points N    Probability Q
PO2-BE*               29          9.8 × 10^-3
PO2-BB                28          2.4 × 10^-2
PO2-HCO3              29          1.6 × 10^-2
PO2-TCO2              25          3.2 × 10^-2
PO2-HB                20          2.6 × 10^-2
PO2-HT                14          9.3 × 10^-2
PO2-CU                22          0.12
PO2-FR*               17          3.9 × 10^-3
BE-BB*                29          1.0 × 10^-5
BE-HCO3*              30          3.7 × 10^-4
BE-TCO2               26          1.0 × 10^-2
BE-HB                 20          4.2 × 10^-2
BE-HT                 15          7.6 × 10^-2
BE-CU                 22          2.9 × 10^-2
BE-FR                 16          2.6 × 10^-2
BB-HCO3*              29          9.3 × 10^-3
BB-TCO2               25          1.5 × 10^-2
BB-HB                 20          7.9 × 10^-2
BB-HT                 15          7.4 × 10^-2
BB-CU                 22          3.8 × 10^-2
BB-FR                 15          3.2 × 10^-2
HCO3-TCO2*            26          1.0 × 10^-9
HCO3-HB               20          8.7 × 10^-2
HCO3-HT               15          0.12
HCO3-CU               22          9.7 × 10^-2
HCO3-FR*              16          6.9 × 10^-3
TCO2-HB               16          0.11
TCO2-HT               11          7.4 × 10^-2
TCO2-CU               19          9.9 × 10^-2
TCO2-FR               16          3.4 × 10^-2
HB-HT*                15          7.7 × 10^-3
HB-CU                 20          6.7 × 10^-2
HB-FR                 11          9.1 × 10^-2
HT-CU*                12          5.2 × 10^-3
HT-FR                 3           not considerable
CU-FR                 22          4.9 × 10^-2

* Failed U.T. (Q < 10^-2).

Fig. 7. Use of the uniformity test for 1-dimensional space; number of clusters versus threshold $\phi$ (in units of free mean path). • indicates the experimental number of clusters $N^*(\phi)$. The BB variable in (a) passes the U.T. (Q = 0.23), but the TF variable in (b) fails it (Q = 2 × 10^-8).

4.2. Selection with P.F.

The variability of the data suggests that we define the moments of the 'ideal elements' or prototypes of $T$ ($C$) by means of a training set instead of the theoretical one. In the application the total number of moments is 10. Each prototype derives from the mean of 40-sample clusters generated via Monte Carlo simulation performed in accordance with the mean density of the real data (in two dimensions) and with eccentricity 0 for the circular and 1 for the thinned one. The distributions of the axial moments for each prototype are shown in Fig. 8. The prototypes may be generated via preclassified data (training set) if the sample has enough data. The $f_C$ and $f_T$ possibility functions have been applied in order to analyse the remaining 25 couples of variables. For each couple, the features selection test is applied at the threshold $\phi$ which allows the distribution (2) of $N_c$ to be considered close to the normal one. For $d = 2$ this is the threshold value which maximizes the variance of (2), expressed as

$\mathrm{var}[N_c] = (N - 1) \exp(-\pi\phi^2\lambda) \left(1 - \exp(-\pi\phi^2\lambda)\right)$.

Fig. 8. Circular and thinned prototypes for the selection test based on the P.F. (a) Spatial distribution. (b) Normalized axial quadratic moments along 10 directions.

Fig. 9. Results of the classification with P.F. on the couples of variables PO2-HCO3 (a), BE-TCO2 (b) and CU-HB (c), normalized to their mean value.
(a) HCO3 vs. PO2: $f_M$ circular 0.97, thinned 0.82; $f_E$ circular 0.75, thinned 0.65; shape = circular.
(b) TCO2 vs. BE: $f_M$ circular 0.86, thinned 0.96; $f_E$ circular 0.61, thinned 0.67; shape = thinned, Numrot = 8, slope = 54 ± 18 degrees.
(c) CU vs. HB: $f_M$ circular 0.88, thinned 0.94; $f_E$ circular 0.55, thinned 0.68; shape = thinned, Numrot = 4, slope = 126 ± 18 degrees.
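The threshold choice for $d = 2$ can be worked out from the variance expression alone. Writing $p = \exp(-\pi\phi^2\lambda)$, the variance is $(N - 1)\,p\,(1 - p)$, which peaks at $p = 1/2$, i.e. at $\pi\phi^2\lambda = \ln 2$. The sketch below is a derivation under that expression, not a formula quoted from the paper, and its function names are hypothetical.

```python
import math


def variance_nc(n, lam, phi):
    """var[N_c] = (N - 1) * p * (1 - p), with p = exp(-pi * phi^2 * lambda)."""
    p = math.exp(-math.pi * phi ** 2 * lam)
    return (n - 1) * p * (1.0 - p)


def max_variance_threshold(lam):
    """Threshold at which p = 1/2, so that p * (1 - p) is largest (d = 2)."""
    return math.sqrt(math.log(2.0) / (math.pi * lam))
```

At this threshold the variance equals $(N - 1)/4$ whatever the point density, and by relation (1) the expected number of clusters is $1 + (N - 1)/2$.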

The significant clusters are defined in accordance with the theoretical U.T. law, i.e. clusters with a number of points

$N_{pt} \ge E[N/N_c] + k \cdot \sqrt{\mathrm{var}[N/N_c]}$,

where $N$ is the number of data points for that couple, $N_c$ is the number of clusters at the threshold $\phi$ and the approximated values of $E[N/N_c]$ and of $\mathrm{var}[N/N_c]$ are

$E[N/N_c] \cong [\ldots]$,

$\mathrm{var}[N/N_c] \cong N^2 \, \mathrm{var}[N_c] / (N_c)^4$.

Figure 9 shows the results of selection with P.F. on the couples PO2-HCO3, BE-TCO2 and HB-CU. The set PO2-HCO3 is classified as of circular shape and this indicates the lack of significance of this couple of variables in the pathology under consideration. It follows that the couple PO2-HCO3 is not selected as a relevant feature. The couples BE-TCO2 and HB-CU are instead both classified as elliptical shapes, i.e. the variables are correlated and so it is possible to define the regression line by considering the axial rotation for which we obtain the maximum membership degree. The range of the slope of the regression line is

SLOPE° = 180° − PASROT° · (NUMROT − 1) ± 180°/NSTEPS

where PASROT is the rotation step in degrees (in our case NSTEPS = 10) and NUMROT is the number of rotations performed to reach the maximum membership degree. In our example

BE-TCO2: SLOPE = 54° ± 18°
HB-CU: SLOPE = 126° ± 18°,

and the two couples are selected as relevant features. Table 4 compares these results with those of the partial correlation method (9) (column 2); columns 3 and 4 refer to F.S. performed by the U.T. and the P.F., respectively. The first 9 rows concern analysis on single parameters, while the last three are for couples of parameters. The value 'YES' means that the subspace considered is selected, i.e. the single parameter or the couple of parameters must be considered as relevant features, otherwise the value is 'NO'. The symbol '--' under column 2 indicates that the method was not applied. The symbol '--' under column 4 indicates that the P.F. is not applicable (one parameter) or that the couple of parameters has already been rejected by the U.T. Note that the subspace TCO2-CU, not selected by the U.T., has been retrieved by the test based on the P.F. Figure 10 shows evidence of the non-linear structure inherent in the data.

Fig. 10. Results of the P.F. showing the non-linear dependence between two variables (CU vs TCO2). Barycenter of the cluster: TCO2 = 0, CU = 0. $f_M$ circular 0.95, thinned 0.98; $f_E$ circular 0.54, thinned 0.67; shape = thinned, Numrot = 8, slope = 36 ± 18 degrees.
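The slope rule of Section 4.2 is easy to check against the two quoted examples. The helper below is a hypothetical sketch: it assumes that the rotation step is PASROT = 180°/NSTEPS (18° for NSTEPS = 10, which matches the quoted ± 18°) and reduces the centre of the range modulo 180°, since a line at 180° is the same as a line at 0°.

```python
def slope_range(numrot, nsteps=10):
    """Return (centre, half_width) of the regression-line slope, in degrees.

    Assumption: PASROT = 180 / NSTEPS; the centre is reduced modulo 180.
    """
    pasrot = 180.0 / nsteps
    centre = (180.0 - pasrot * (numrot - 1)) % 180.0
    half_width = 180.0 / nsteps
    return centre, half_width
```

For NUMROT = 8 and NUMROT = 4 this gives 54° ± 18° and 126° ± 18°, the values quoted for BE-TCO2 and HB-CU.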

5. IMPLEMENTATION

The fuzzy analysis method has been implemented on a Digital VAX 11/750 in FORTRAN 77 language with graphic and pictorial facilities, such as the printer plotter, the pictorial display VS 11 (512 × 512 × 16) and the graphic display VT125, that allow the user to choose the display modality (plot, hardcopy, print). The mean CPU time for the whole fuzzy analysis is

$T_{CPU} = T_{U.T.} + T_{P.F.}$,

Table 4. Comparison with the results derived from other methods (9)

Coordinates    Partial correlation method    U.T.    P.F.

TFFE CPCU2 CP PH HCO3 BE HB TF FE CPCU-CP CPCU-CU TCO2-CU

NO NO NO NO YES YES YES YES YES NO YES --

NO YES NO NO YES YES YES NO NO NO YES YES

----

•- Q = 9.2 x 10-3

--

------YES NO

~ finherent variableness -~Lin the control group

$T_{U.T.} = O(N_D \cdot \log(N_D))$ for each parameter or couple of parameters,

$T_{P.F.} = O(N_D + N_c + N_{PR})$ for each couple of parameters,

where

$N_D$ = number of data for each parameter or couple of parameters,
$N_{PR}$ = number of prototypes,
$N_c$ = number of clusters in two dimensions for each couple of parameters.

For example, for $N_c = 1$, $N_D = 29$, $N_{PR} = 2$, the CPU time is 1.5 s.

The method has been included in a wider interactive system to analyse such data; the user may select, via a request menu on the display terminal, the following variables: the coordinates of the $d$-subspace ($d \ge 1$); confidence level and threshold value $\phi$ to perform the test based on the U.T.; kind and prototypes classes; possibility functions (normalized fuzzy entropy, Minkowsky metrics ...).

6. FINAL REMARKS

In the paper a new method to select significant variables in multidimensional space is shown. The main idea is to combine probabilistic and possibilistic approaches in the selection steps. As a matter of fact, complementary information resulted from both. We point out that the selection based on the 'possibility functions' takes into account the morphological nature of the sample points. Several factors affect the feature selection test, such as the dispersion of the prototypes (fuzzy-sets), the chosen features (in our case the normalized axial quadratic moments) and the behaviour of the possibility functions; for example, the fuzzy entropy measure shows a better discriminant power with respect to Minkowsky metrics (see Fig. 4).

SUMMARY

The paper considers a new features selection approach, based on possibility theory, suitable in those conditions in which classical statistical methods seem to be powerless. Section 1 gives the main motivations and the general ideas which are developed in the paper. Sections 2 and 3 give the elements of fuzzy sets and possibility theory used and describe the methods, respectively. Applications on real biomedical data and information on implementation of the method are given in Sections 4 and 5.

REFERENCES

1. V. Di Gesù, B. Sacco and G. Tobia, A clustering method applied to the analysis of sky maps in gamma-ray astronomy, Memorie Soc. astr. ital. 517-528 (1980).
2. A. Hardy and J. P. Rasson, A statistical approach to cluster analysis, Dept. Mathematics, University of Namur, Belgium, Report 82/7 bis (1982).
3. S. K. Pal and R. A. King, Image enhancement using smoothing with fuzzy sets, IEEE Trans. Syst. Man Cybernet. SMC-11 (1981).
4. J. H. Friedman and J. W. Tukey, A projection pursuit algorithm for exploratory data analysis, IEEE Trans. Comput. C-23, 881-890 (1974).
5. J. H. Friedman, S. Steppel and J. W. Tukey, A nonparametric procedure for comparing multivariate point sets, Stanford Linear Accelerator Center, Computer Group, Technical Memo No. 153 (1973).
6. D. F. Morrison, Multivariate Statistical Methods. McGraw-Hill (1967).
7. W. J. Borucki, D. H. Card and G. C. Lyle, A method of using cluster analysis to study statistical dependence in multivariate data, IEEE Trans. Comput. C-24, 1181-1191 (1975).
8. B. N. Swanenburg, K. Bennett, G. F. Bignami, R. Buccheri, P. Caraveo, W. Hermsen, G. Kanbach, G. G. Lichti, J. L. Masnou, H. A. Mayer-Hasselwander, J. A. Paul, B. Sacco, L. Scarsi and R. D. Wills, Second COS-B catalog of high energy gamma-ray sources, Astrophys. J. 243, 69-73 (1981).
9. M. Brai, M. Lupoli, G. Alia, Melchiorre Brai, P. Salerno and P. Vanadia, Uso delle tecniche di risonanza di spin elettronica (ESR) per lo studio dei livelli sierici di Fe3+ e Cu2+ in pazienti traumatizzati cranici, Acta anaesth. Padova 32, 843-855 (1981).
10. R. R. Yager, Fuzzy Sets, P. P. Wang and S. K. Chang, eds, pp. 171-194, Plenum Press, New York (1980).
11. L. A. Zadeh, Fuzzy sets, Inf. Control 8, 338-353 (1965).
12. L. A. Zadeh, K. S. Fu, K. Tanaka and M. Shimura, eds, Fuzzy Sets and Their Applications to Cognitive and Decision Processes. Academic Press, New York (1975).
13. R. R. Yager, A foundation for a theory of possibility, J. Cybernet. 10, 177-204 (1980).
14. A. De Luca and S. Termini, A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory, Inf. Control 20, 301-312 (1972).
15. A. De Luca and S. Termini, Entropy of L-fuzzy sets, Inf. Control 24, 55-73 (1974).
16. V. Di Gesù and B. Sacco, Some statistical properties of the minimum spanning forest, Pattern Recognition 16, 525-531 (1983).

About the Author: VITO DI GESÙ is an Associate Professor at Palermo University. His main interests are in image processing and data analysis techniques.

About the Author: MARIA C. MACCARONE is a staff member of the Istituto di Fisica Cosmica ed Informatica del C.N.R. in Palermo. Her main interests are in pattern recognition and in data analysis techniques.