APPROXIMATIONS FOR THE PROBABILITY OF MISCLASSIFICATION

S. G. WHEELER and D. S. INGRAM
Federal Systems Division, International Business Machines Corporation, Houston, Texas 77058, U.S.A.

Pattern Recognition, Vol. 8, pp. 119-126. Pergamon Press, 1976. Printed in Great Britain.

(Received 6 May 1974 and in revised form 26 March 1975)

Abstract: An algorithm which estimates the probability of classifying a multivariate normal observation into either of two classes is developed and evaluated. As part of the evaluation the algorithm is used to compare and evaluate maximum likelihood, Euclidean distance, linear discriminant and other classification rules.

Key words: Classification, Pattern recognition, Probability of misclassification, Bayes decision rule, Similarity functions.

1. INTRODUCTION

Many remote sensing applications require an observation to be classified to a specific accuracy. This requirement is important because it can be the criterion in determining the feasibility of achieving an application objective. A typical application would involve processing the data from a geographic area to estimate the acreage of an agricultural crop such that the error in the estimate is less than a specified value. For most remote sensing applications the primary data source is multispectral scanner data. As a vehicle moves along a trajectory, a sensor sweeps across the earth's surface and records the average value of the intensity of the received electromagnetic radiation in each of n wavelength intervals (channels); hence each observation is an n x 1 vector. Multispectral scanners have been mounted on the Earth Resources Technology Satellite (ERTS), Skylab and NASA aircraft.

Several types of algorithms have been developed to classify remotely sensed data. These algorithms range in completeness and complexity from the Bayes decision rule to a Euclidean distance measure. The motivation for using the Bayes rule is to obtain the best answer, while the rationale for the Euclidean distance measure is to decrease the computational time required to process the data. The accuracy of the classification algorithm is a function of the spectral similarity function (such as the Bayes rule or the Euclidean distance), the number and ranges of the channels of data, and the statistics of the classes being compared.

The objective of this paper is to demonstrate the development and performance of an algorithm which accurately predicts the probability of error as a function of the spectral similarity function, the number and location of the channels, and the statistics of the classes being compared. The effectiveness of the algorithm is demonstrated in a controlled experiment by generating simulated data whose mean vector and covariance matrix are known. The statistics of the simulated data were determined from data collected by aircraft sensors over an agricultural area designated as the C1 flight line. The predicted probability of error compared very favorably with the probability of error determined from the controlled experiment. It is demonstrated that the probability of error model can be used to evaluate feature selection techniques, determine the feasibility of achieving an application objective, and establish the conditions necessary to meet an application objective.

2. MATHEMATICAL ANALYSIS

In this section the pattern classification procedure is reviewed. It is demonstrated that the most widely used spectral similarity functions are approximations to the Bayes classifier, and the statistical implications of the various approximations are interpreted. The probability of error model is derived by developing an expression for the probability distribution function of the difference between two quadratic forms, ΔQ. The distribution function is based on the cumulants of ΔQ.

2.1 Classification procedure

Let x ~ N(μ, Σ); then the Bayes classifier maximizes the probability that x originated in the assigned class. If the observation is an element of the ith class it is assumed that x ~ N(μ_i, Σ_i). The a priori frequencies of occurrence are designated as p_1, p_2, ..., p_k, where p_i is equal to or proportional to the probability of observing the ith class. Given an observation x, the probability that x ~ N(μ_i, Σ_i) is calculated by

$$P(i \mid x) = \frac{p_i L_i(x)}{\sum_{j=1}^{k} p_j L_j(x)}, \qquad i = 1, 2, \ldots, k, \tag{1}$$


where

$$L_i(x) = (2\pi)^{-n/2}\,|\Sigma_i|^{-1/2} \exp\{-0.5\,(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\}, \tag{2}$$

and μ_i is the mean and Σ_i the covariance matrix of the ith class. L_i(x) is the likelihood function of x for the ith class. The Bayes rule assigns x to the class for which P(i|x) is maximized; this is equivalent to minimizing

$$S_i = -2 \log p_i + \log|\Sigma_i| + (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i). \tag{3}$$
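A minimal sketch of the rule defined by equations (1)-(3), with hypothetical class statistics: it evaluates S_i for each class and assigns the observation to the class with the smallest value, which is equivalent to maximizing P(i|x). The function names and the example numbers are ours, not part of the original paper.

```python
import numpy as np

def bayes_discriminant(x, mean, cov, prior):
    """S_i of equation (3): -2 log p_i + log|Sigma_i| + Mahalanobis term."""
    diff = x - mean
    # Solve Sigma_i z = diff rather than forming an explicit inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    return -2.0 * np.log(prior) + np.log(np.linalg.det(cov)) + maha

def classify(x, means, covs, priors):
    """Assign x to the class that minimizes S_i."""
    scores = [bayes_discriminant(x, m, c, p) for m, c, p in zip(means, covs, priors)]
    return int(np.argmin(scores))

# Hypothetical two-class, two-channel example.
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
priors = [0.5, 0.5]
print(classify(np.array([1.5, 0.8]), means, covs, priors))
```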

In order to evaluate equation (3) there are (n^2 + 3n)/2 multiplications and (n^2 + 3n)/2 additions for each class for each observation. The amount of processing time required has provided the motivation for simplifying the computational form of equation (3). The four most commonly used approximations involve approximating the covariance matrix as diagonal, Σ = Σ_d; as a constant times the identity matrix, Σ = σ²I; as the identity matrix, Σ = I; and as the average of the covariance matrices of the classes being compared, Σ = Σ̄. Table 1 illustrates the various similarity functions, the statistical implications, and the number of operations required per observation per class.

The effect of the approximations associated with each similarity function can be illustrated easily for a two-dimensional observation vector. Let the coordinate axes of the two-dimensional vector x be designated as x_1 and x_2. The locus of points for which the quadratic form in equation (3) is a constant represents an equiprobable contour. The principal axes of the equiprobable contour are in general not aligned with the measurement coordinates, and the center of the ellipse is at (μ_1, μ_2). Figure 1 illustrates the equiprobable contour for S_1. Figure 2 illustrates the effect on the equiprobable contour of assuming a diagonal covariance matrix; Figs. 3-5 correspond to S_3, S_4 and S_5. In each of the four cases where an approximate similarity function is compared with S_1, the dotted area represents the area common to the approximation and S_1. The area which satisfies the equiprobability contour for the approximation but not for the actual similarity function is accentuated by straight line segments; hence, one can see the degree of approximation. The approximations decrease the number of operations required, but the effect on the accuracy must be quantified.

2.2 General discussion of the approximations

Generalized likelihood, or quadratic form, classification rules were introduced and defined in the previous sections. In this and the following section we discuss techniques for approximating the error resulting from use of these rules. We will restrict attention to classification into one of two classes; however, this model has recently been extended to several classes. The extension is being tested and will be reported at a later date. Our work, although it is approximate, generalizes earlier work on the probability of misclassification, e.g. by Fukunaga and Krile,(4) because a family of classification rules is being modelled and, more importantly, the data to be classified are not required to be taken from either of the training populations. The latter is important if it is desired to study the effects of deviations from assumptions.

We assume that an observation vector x is taken from a p-variate Gaussian population with mean vector μ and covariance matrix Σ. It is to be classified, or assigned, into the population associated with the smaller value of

$$L_i(x) = C_i + (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) = C_i + Q_i(x), \qquad i = 1, 2. \tag{4}$$

Table 1. Similarity functions and statistical implications (operation counts are per observation per class)

Σ = Σ_i     S_1 = log|Σ_i| + (x - μ)^T Σ_i^{-1} (x - μ) - 2 log p
            Implication: the data set is accurately described.
            Multiplications: (n^2 + 3n)/2; additions: (n^2 + 3n)/2.

Σ = Σ_d     S_2 = log|Σ_d| + (x - μ)^T Σ_d^{-1} (x - μ)
            Implication: the principal axes of the equiprobable contour are assumed to be parallel with the measurement coordinates.
            Multiplications: 3n; additions: 2n.

Σ = σ²I     S_3 = log σ² + (x - μ)^T (x - μ)/σ²
            Implication: the probability of occurrence is independent of the angle between the observation vector and the measurement coordinates.
            Multiplications: 2n + 1; additions: 2n.

Σ = I       S_4 = (x - μ)^T (x - μ)
            Implication: the hyperellipsoid is approximated by a hypersphere with a radius of 1.
            Multiplications: 2n; additions: 2n.

Σ = Σ̄       S_5 = -2 x^T Σ̄^{-1} μ + μ^T Σ̄^{-1} μ - 2 log p
            Implication: the hyperellipsoid is approximated by an average hyperellipsoid.
            Multiplications: n; additions: n - 1.
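To make Table 1 concrete: for n = 12 channels, S_1 requires (n^2 + 3n)/2 = 90 multiplications per class per observation, against 36 for S_2, 25 for S_3, 24 for S_4 and on the order of n for S_5. The sketch below (ours, not part of the paper) shows one way the five similarity functions could be evaluated from the same class statistics; how σ² and Σ_d are derived from Σ here is an illustrative assumption.

```python
import numpy as np

def similarity_functions(x, mu, cov, cov_avg, prior):
    """Evaluate S1-S5 of Table 1 for one observation and one class.

    cov_avg is the average of the class covariance matrices (for S5);
    the diagonal and sigma^2*I forms are derived from cov here for brevity.
    """
    d = x - mu
    s1 = np.log(np.linalg.det(cov)) + d @ np.linalg.solve(cov, d) - 2 * np.log(prior)

    diag = np.diag(cov)                   # Sigma_d: keep only the variances
    s2 = np.sum(np.log(diag)) + np.sum(d * d / diag)

    sigma2 = diag.mean()                  # sigma^2 * I: one pooled variance (illustrative choice)
    s3 = np.log(sigma2) + d @ d / sigma2

    s4 = d @ d                            # identity covariance: Euclidean distance

    w = np.linalg.solve(cov_avg, mu)      # precomputable weight vector Sigma_bar^{-1} mu
    s5 = -2 * x @ w + mu @ w - 2 * np.log(prior)
    return s1, s2, s3, s4, s5

# Hypothetical example values.
x = np.array([1.0, 0.5])
print(similarity_functions(x, np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]]), np.eye(2), 0.5))
```

Because Σ̄^{-1}μ can be precomputed once per class, S_5 costs only a dot product per observation, which is the source of the small operation counts in the last row of Table 1.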

Fig. 1. Equiprobable contour for S_1 (axes x_1, x_2).

Fig. 3. Equiprobable contours for S_1 and S_3: region accepted by both S_1 and S_3, and region accepted by S_3 but rejected by S_1.

Here, as discussed in earlier sections, the μ_i and Σ_i are chosen to reflect the class parameters as well as the type of similarity function to be used in classification. The number C_i equals log|Σ_i| plus a term reflecting any prior knowledge about the relative frequencies of the two classes. The classification rule, then, is to assign x to the second population in case

$$Q_2(x) + C_2 \le Q_1(x) + C_1, \quad \text{i.e. if} \quad Q_2(x) - Q_1(x) \le C_1 - C_2. \tag{5}$$

The probability of this event occurring can be written as

$$P[\,Q_2(x) - Q_1(x) \le C_1 - C_2\,], \tag{6}$$

which is seen to be the probability distribution function of Q_2(x) - Q_1(x) evaluated at C_1 - C_2. Although it is possible to write down expressions for this probability density function, they are generally very complex infinite series with very slow convergence patterns. Our approach has been to find the cumulants of the distribution of Q_2(x) - Q_1(x) and use these to approximate the probability distribution. Cumulants, which we denote K_1, K_2, ..., are functions of the moments of a random variable; they are catalogued in, e.g., Kendall and Stuart.(6) For convenience we note that the first cumulant, K_1, is the mean and the second, K_2, is the variance. There are a number of possible methods of using cumulants to estimate distribution functions. We have looked at four of these, namely (1) a normal approximation using K_1 and K_2, (2) an Edgeworth expansion in the normal distribution function and its derivatives, (3) a Pearson curve approximation, and (4) use of the Johnson series of probability curves. All of these techniques are discussed in Refs. 2 and 6.

Fig. 2. Equiprobable contours for S_1 and S_2: region accepted by both S_1 and S_2, and region accepted by S_2 but rejected by S_1.

Fig. 4. Equiprobable contours for S_1 and S_4: region accepted by both S_1 and S_4, and region accepted by S_4 but rejected by S_1.

Fig. 5. Equiprobable contours for S_1 and S_5: region accepted by both S_1 and S_5, and region accepted by S_5 but rejected by S_1.

We have also used Craig's(1) exposition on the Pearson curves. The normal approximation is the simplest but, except in special cases, the least accurate. The Edgeworth expansion, if carried to an infinite number of terms, is exact, but it can converge slowly. We have found that sometimes a small number of terms, say 12, gives good results and sometimes poor results, but there is no way to decide when each happens. We settled, then, on a combination of the Pearson and the Johnson systems. Both of these approximating formulae have yielded very good approximations to the error probabilities.

The Pearson system of frequency curves is a family of probability distribution functions which are determined by their first four cumulants. Many of the well known distributions of statistics are members of the family. Details of the procedure used to fit cumulants to the Pearson curves are given in Craig(1) and Elderton and Johnson.(2) Briefly, the values of skewness, B_1 = K_3^2/K_2^3, and kurtosis, B_2 = 3 + K_4/K_2^2, are used to determine the Pearson curve type. All of the curve types have different forms, but each has four or fewer parameters which are also calculated from the coefficients of skewness and kurtosis. The Appendix gives an example of fitting the Pearson curves to a Standard Data set from Fukunaga.(3)

All of the Pearson curves except the one designated type 4 can be transformed into a normal, an incomplete gamma or an incomplete beta type distribution. Good approximations are available for integrals of these distributions, so, except for type 4, numerical integration is not required. In order to develop a system of approximations which did not require numerical approximations, the Pearson type 4 curve was replaced by the Johnson SU curve. The Johnson system is a different family of probability distributions which arise from considering transformations of the normal distribution. Because these distributions can be transformed to the normal distribution it is easy to calculate their probability integrals. Like the Pearson curves, the members of

the Johnson curves are determined by the coefficients of skewness and kurtosis. However, except for the Johnson SU curve, the parameters are difficult to calculate from the moments and cumulants. Luckily, the SU curve is similar to the Pearson type 4 and can replace that curve. Details of fitting this curve can be found in Refs. 2 and 6.

The combined system in which the Pearson and Johnson curves are used in approximating the probabilities of misclassification was programmed and tested. Results of our study of the approximations are given in Section 3. An example in which this approximation is used to estimate the probability of misclassification for a Standard Data set is given in the Appendix.

2.3 The cumulants

The approximations to the distribution function of ΔQ = Q_2(x) - Q_1(x) depend on the cumulants of this random variable. In this section we develop expressions for these cumulants. The derivation begins by showing that the difference between two quadratic forms can be written as a constant plus a third quadratic form. This new quadratic form is then diagonalized by a linear transformation which forms it into a weighted sum of squares of independent normal variates. The cumulants of such a weighted sum of squares are known, so they can be written down and expressed as functions of the population parameters (μ, Σ), (μ_1, Σ_1) and (μ_2, Σ_2). The development is similar to that of the characteristic function in Fukunaga, so only the important steps will be given.

The difference between the quadratic forms is

$$\Delta Q(x) = (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) - (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1), \tag{7}$$

which can be written as

$$\Delta Q(x) = (x - A^{-1}F)^T A (x - A^{-1}F) - F^T A^{-1} F + \mu_2^T \Sigma_2^{-1} \mu_2 - \mu_1^T \Sigma_1^{-1} \mu_1, \tag{8}$$

where A = Σ_2^{-1} - Σ_1^{-1} and F = Σ_2^{-1} μ_2 - Σ_1^{-1} μ_1. For convenience we have assumed that A has full rank; the final results do not, however, depend on this assumption and they remain valid even when the assumption is violated. Now making the transformation y = R^{-1}(x - μ), where R is the matrix which transforms Σ to the identity matrix and A to a diagonal matrix D, the difference between the quadratic forms reduces to

$$\Delta Q(x) = G + \sum_{j=1}^{p} d_j (y_j + m_j)^2, \tag{9}$$


where G is a constant, p is the rank of Σ or A, d_j is the jth diagonal element of D, m_j is the jth element of the vector R^{-1}M, and y_j is the jth element of the vector y. Jensen and Solomon(5) provide the cumulants of the weighted sum of squares

$$\sum_{j=1}^{p} d_j (y_j + m_j)^2 \tag{10}$$

as

$$K_s = 2^{s-1}(s-1)!\,\theta_s, \qquad \theta_s = \sum_{j=1}^{p} d_j^{\,s}\,(1 + s\,m_j^2), \qquad s = 1, 2, 3, \ldots \tag{11}$$

Using the definitions of D and M, the cumulants can be written in terms of the original population parameters as

$$K_1(\Delta Q) = \operatorname{tr}(A\Sigma) + (\mu_2 - \mu)^T \Sigma_2^{-1} (\mu_2 - \mu) - (\mu_1 - \mu)^T \Sigma_1^{-1} (\mu_1 - \mu), \tag{12}$$

$$K_s(\Delta Q) = 2^{s-1}(s-1)!\,\big[\operatorname{tr}[(A\Sigma)^s] + s\,(\Sigma_2^{-1}\delta_2 - \Sigma_1^{-1}\delta_1)^T (\Sigma A)^{s-2}\,\Sigma\,(\Sigma_2^{-1}\delta_2 - \Sigma_1^{-1}\delta_1)\big], \qquad s = 2, 3, \ldots, \tag{13}$$

where δ_i = μ - μ_i. We note that these cumulants do not contain terms in A^{-1}, and it can be shown that existence of this inverse is not necessary for the cumulants. A particular situation where this is important is Σ_1 = Σ_2, in which case only K_1(ΔQ) and K_2(ΔQ) are non-zero; in fact the distribution of ΔQ in this case is normal with mean equal to K_1(ΔQ) and variance equal to K_2(ΔQ). Other special cases follow from considering equalities among (μ, Σ), (μ_1, Σ_1) and (μ_2, Σ_2).
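A sketch of how equations (12) and (13) might be implemented; the function name and the choice of returning the first four cumulants are ours, and the formulas follow the reconstruction above.

```python
import numpy as np
from math import factorial

def delta_q_cumulants(mu, sigma, mu1, sigma1, mu2, sigma2, s_max=4):
    """Cumulants K_1..K_{s_max} of Delta Q per equations (12) and (13).

    (mu, sigma): population actually generating the observations;
    (mu1, sigma1), (mu2, sigma2): class statistics used in the quadratic forms.
    """
    inv1 = np.linalg.inv(sigma1)
    inv2 = np.linalg.inv(sigma2)
    A = inv2 - inv1
    d1 = mu - mu1                       # delta_1
    d2 = mu - mu2                       # delta_2
    v = inv2 @ d2 - inv1 @ d1           # vector appearing in equation (13)
    AS = A @ sigma
    SA = sigma @ A

    k = [0.0] * (s_max + 1)
    k[1] = np.trace(AS) + d2 @ inv2 @ d2 - d1 @ inv1 @ d1            # equation (12)
    for s in range(2, s_max + 1):
        term = np.trace(np.linalg.matrix_power(AS, s)) \
             + s * v @ np.linalg.matrix_power(SA, s - 2) @ sigma @ v
        k[s] = 2 ** (s - 1) * factorial(s - 1) * term                 # equation (13)
    return k[1:]
```

From K_1, ..., K_4 the skewness B_1 = K_3^2/K_2^3 and kurtosis B_2 = 3 + K_4/K_2^2 follow, and these determine the Pearson curve type used to approximate the distribution function of ΔQ.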

3. NUMERICAL RESULTS

Numerical experiments were conducted using simulated data to evaluate the effectiveness of the probability of error model. The probability of error is a function of the similarity function, the number and location of the channels, and the statistics of the classes being compared. The statistics of the simulated data used in the experiment correspond to the classes designated as red clover 1 (RC1) and red clover 2 (RC2) as determined from the C1 flight line. The mean vectors and covariance matrices are listed in Table 2. Based on these data, two sets of graphs were generated: Fig. 6 illustrates the predicted P(e) vs the number of channels for each similarity function, while Fig. 7 plots the predicted P(e) vs multiples of Σ_1 and Σ_2 for the four-channel case. P(e) denotes the probability of classifying a true RC1 observation into RC2. The results for S_3 and S_4 are so close that only the values for S_3 are plotted. The particular channels used in Fig. 6 were chosen as the channels that produced the best results for S_1; they are:

Number of channels    Channels used
        1             10
        2             10, 12
        3             9, 10, 12
        4             6, 9, 10, 12
        5             1, 6, 9, 10, 12
        6             1, 6, 8, 9, 10, 12
       12             all

Table 2. Mean vectors and covariance matrices of RC1 and RC2 (12-channel mean vectors and 12 x 12 covariance matrices for each class, estimated from the C1 flight line).
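To illustrate how predicted P(e) curves such as those in Fig. 6 can be generated, the sketch below uses the simplest of the approximations of Section 2.2, a normal approximation built from K_1 and K_2 of ΔQ, in place of the Pearson/Johnson fit actually used in the paper; the class statistics and channel subsets are hypothetical stand-ins for the RC1/RC2 values of Table 2.

```python
import numpy as np
from scipy.stats import norm

def predicted_error_normal(mu1, cov1, mu2, cov2):
    """P(classify a class-1 observation into class 2) under a normal
    approximation to Delta Q = Q2 - Q1, using only K1 and K2 (equal priors)."""
    inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    A = inv2 - inv1
    d2 = mu1 - mu2                                # observation population is class 1
    k1 = np.trace(A @ cov1) + d2 @ inv2 @ d2      # equation (12) with (mu, Sigma) = (mu1, Sigma1)
    v = inv2 @ d2
    k2 = 2 * (np.trace(np.linalg.matrix_power(A @ cov1, 2)) + 2 * v @ cov1 @ v)  # equation (13), s = 2
    threshold = np.log(np.linalg.det(cov1)) - np.log(np.linalg.det(cov2))        # C1 - C2, equal priors
    return norm.cdf((threshold - k1) / np.sqrt(k2))

def channel_subset(mu, cov, channels):
    """Restrict class statistics to a subset of channels (0-based indices)."""
    return mu[channels], cov[np.ix_(channels, channels)]

# Hypothetical 4-channel statistics standing in for RC1 and RC2.
rng = np.random.default_rng(0)
mu1, mu2 = rng.normal(size=4), rng.normal(size=4) + 0.8
f1, f2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
cov1, cov2 = f1 @ f1.T + np.eye(4), f2 @ f2.T + np.eye(4)
for chans in ([0], [0, 1], [0, 1, 2], [0, 1, 2, 3]):
    m1, c1 = channel_subset(mu1, cov1, chans)
    m2, c2 = channel_subset(mu2, cov2, chans)
    print(len(chans), round(float(predicted_error_normal(m1, c1, m2, c2)), 4))
```

Looping over candidate channel subsets in this way is also how the model supports feature selection, as noted in the Introduction and Conclusions.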


Fig. 6. Predicted P(e) vs number of channels for each similarity function.

Fig. 7. Predicted P(e) vs multiples of Σ_1 and Σ_2 (four-channel case).

Some of the results illustrated in Fig. 6 are somewhat surprising, for example: the sharp decrease in P(e) at 4 channels for S_1 and S_5; the small decrease in P(e) associated with increasing the number of channels from 4 to 12; and the trends indicated for S_2 and S_3 for more than 3 channels. The advantage of Fig. 7 is that it can be used to determine the effect of changing the covariance matrix.

The numerical experiments conducted to test the accuracy of the probability of error model consisted of generating 1000 samples for each case and counting the number of samples that were classified as RC1 and RC2. Table 3 illustrates the results of the experiment using simulated data. The Predicted value is obtained from equation (13), while the values corresponding to Experiment were obtained as described in the preceding paragraph. The standard deviation associated with the experiment is σ_E, defined by σ_E² = P(e)[1 - P(e)]/N, where N is the number of samples used. The difference between the experimentally determined value and the predicted value of P(e) is given by Δ. In almost every case the value of Δ is less than one standard deviation.
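The validation experiment described above can be sketched as follows (our reconstruction, with hypothetical class statistics standing in for RC1 and RC2): draw N = 1000 observations from class 1, classify each with a chosen similarity rule, and report the empirical error rate together with σ_E.

```python
import numpy as np

def empirical_error(mu1, cov1, mu2, cov2, n_samples=1000, seed=0):
    """Empirical P(e): fraction of class-1 samples assigned to class 2 by the
    full quadratic (S1-type) rule with equal priors, plus sigma_E with
    sigma_E^2 = P(e)[1 - P(e)]/N."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu1, cov1, size=n_samples)

    def s(xs, mu, cov):
        d = xs - mu
        maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
        return np.log(np.linalg.det(cov)) + maha

    p_hat = np.mean(s(x, mu2, cov2) < s(x, mu1, cov1))
    sigma_e = np.sqrt(p_hat * (1 - p_hat) / n_samples)
    return p_hat, sigma_e

# Hypothetical two-class statistics (stand-ins for RC1 and RC2).
mu1, mu2 = np.zeros(4), np.full(4, 0.8)
cov1, cov2 = np.eye(4), 1.5 * np.eye(4)
p_hat, sigma_e = empirical_error(mu1, cov1, mu2, cov2)
print(f"experimental P(e) = {p_hat:.3f} +/- {sigma_e:.3f}")
```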

Figure 7 illustrates the effect of changing the size of the covariance matrix. The accuracy of the model was tested for multiples of Σ_1 and Σ_2 corresponding to 0.5, 1 and 4. The results are illustrated in Table 4. In all but three cases the value of Δ is less than one σ_E, and in two cases Σ_1 is equal to Σ_2, so the predicted P(e) is known to be exact.

4. CONCLUSIONS

An algorithm has been developed which predicts the probability of misclassification between two classes. The algorithm models the effect of the various similarity functions and of the channels and statistics of the classes being compared. The numerical experiments that have been conducted have shown the algorithm to predict P(e) accurately. This algorithm is particularly beneficial because it can be used to determine the conditions required to classify an observation within a specified accuracy level.

SUMMARY

This paper outlines the development and evaluation of an algorithm which can be used to estimate the error rates resulting from classifying multivariate normal observations into one of two classes. The rules for which probability estimates can be produced include Bayes rules, linear discriminant rules, weighted and unweighted Euclidean distance rules, and others. The estimator was developed by deriving the cumulants of the statistics used in classification and then approximating their distributions by using the Pearson and Johnson systems of frequency curves. The estimator was tested and evaluated by use of a large number of simulation experiments and was found to give very good results. The paper closes with a discussion of the results of the experiments and the meanings of some of the more common classification rules.

Table 3. Accuracy of predicted P(e) as a function of the number of channels and the similarity function

                          S1       S2       S3       S4       S5
Channel 10
  Predicted             0.197    0.197    0.197    0.210    0.210
  Experiment            0.203    0.199    0.203    0.199    0.203
  σ_E                   0.013    0.013    0.013    0.013    0.013
  Δ = Exp - Pred        0.006    0.002    0.006   -0.011   -0.007

Channels 6, 9, 10, 12
  Predicted             0.089    0.222    0.206    0.211    0.099
  Experiment            0.093    0.215    0.196    0.189    0.100
  σ_E                   0.009    0.013    0.013    0.012    0.009
  Δ = Exp - Pred        0.004   -0.007   -0.010   -0.022    0.001

All 12 channels
  Predicted             0.066    0.287    0.239    0.240    0.081
  Experiment            0.056    0.280    0.245    0.242    0.072
  σ_E                   0.007    0.014    0.014    0.014    0.008
  Δ = Exp - Pred       -0.010   -0.007    0.006    0.002   -0.009

Table 4. Accuracy of predicted P(e) as a function of multiples of Σ_1 and Σ_2 and the similarity function

                          S1       S2       S3       S4       S5
Multiple = 0.5
  Predicted             0.033    0.121    0.128    0.117    0.034
  Experiment            0.035    0.131    0.132    0.122    0.042
  σ_E                   0.006    0.011    0.011    0.011    0.006
  Δ = Exp - Pred        0.002    0.010    0.004    0.005    0.008

Multiple = 1
  Predicted             0.089    0.222    0.206    0.211    0.099
  Experiment            0.093    0.215    0.196    0.189    0.100
  σ_E                   0.009    0.013    0.013    0.012    0.009
  Δ = Exp - Pred        0.004   -0.007   -0.010   -0.022    0.001

Multiple = 4
  Predicted             0.249    0.404    0.344    0.366    0.260
  Experiment            0.245    0.404    0.360    0.355    0.280
  σ_E                   0.014    0.016    0.015    0.015    0.014
  Δ = Exp - Pred       -0.004    0.000    0.016   -0.011    0.020

REFERENCES

1. C. C. Craig, A new exposition and chart for the Pearson system of frequency curves, Ann. Math. Statist. 7, 16 (1936).
2. W. P. Elderton and N. L. Johnson, Systems of Frequency Curves. Cambridge University Press, Cambridge (1969).
3. K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, New York (1972).
4. K. Fukunaga and T. F. Krile, Calculation of Bayes recognition error for two multivariate distributions, IEEE Trans. Comput. C-18, 220 (1969).
5. J. R. Jensen and H. Solomon, A Gaussian approximation to the distribution of a definite quadratic form, J. Amer. Statist. Ass. 67, 898 (1972).
6. M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 1. Hafner, New York (1969).

APPENDIX: EXAMPLE OF FITTING THE PEARSON CURVE

This appendix gives an example of approximating the probability of misclassification for Standard Data 1 and 2 from Fukunaga.(3) The two mean vectors and variance-covariance matrices are given in Table 5. The quantity to be calculated is the probability that an element from standard population 1 will be classified into population 2.

Table 5. Means and covariances of the standard data 1 and 2 from Fukunaga:
M_1 = [7.825, 6.750, 5.835, 8.525, 6.615, 7.065, 7.865, 4.435]^T,
M_2 = [5.760, 5.715, 5.705, 4.150, 6.225, 6.960, 6.750, 3.910]^T,
together with the corresponding 8 x 8 covariance matrices Σ_1 and Σ_2.

Equations (12) and (13) give the first four cumulants as K_1 = 23.2219, K_2 = 383.4878, K_3 = 12088.9879 and K_4 = 596610.8508. The coefficients of skewness and kurtosis are B_1 = 2.5913 and B_2 = 7.05684. Also required are the values

δ = (2B_2 - 3B_1 - 6)/(B_2 + 3) = 0.03377

and D = 2.31660. According to the criteria given in Craig, δ and D both greater than zero specify that the Pearson type 6 curve, with density function

f(t) = C(t - r_1)^{m_1} (t - r_2)^{m_2},   t ≥ r_1,

is appropriate. The parameters r_1, r_2, m_1 and m_2 are calculated from the cumulants; for this curve type m_2 is always less than zero. The probability of error is

P(e) = ∫ f(t) dt

taken over the misclassification region, and by transformation it is seen that the probability of error is given by

P(e) = 1 - B[(r_1 - r_2)/(y - r_2)],

where y = (log|Σ_1| - log|Σ_2| - K_1)/√K_2 and B(z) is the incomplete Beta integral with parameters m_1 + 1 and -(m_1 + m_2 + 1). Using a standard approximation to this integral, the probability of error is given, in percent, as P(e) = 0.049%. This compares favorably with the exact value of P(e) = 1.6% given in Fukunaga.

Note that evaluating the exact value requires a numerical integration and evaluation of the characteristic roots of a matrix. The computation here requires only routine matrix manipulations and available integration routines.
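Once the type 6 parameters r_1, r_2, m_1, m_2 and the standardized threshold y are known, the final step reduces to a single incomplete beta evaluation. The sketch below is ours: it uses SciPy's regularized incomplete beta in place of the "standard approximation" mentioned in the text, and the numbers in the example call are hypothetical.

```python
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)

def pearson_type6_error(r1, r2, m1, m2, y):
    """P(e) = 1 - B[(r1 - r2)/(y - r2)] with beta parameters m1 + 1 and
    -(m1 + m2 + 1), following the appendix; m2 < 0 for type 6 curves."""
    a = m1 + 1.0
    b = -(m1 + m2 + 1.0)
    z = (r1 - r2) / (y - r2)
    return 1.0 - betainc(a, b, z)

# Hypothetical parameter values, for illustration only.
print(pearson_type6_error(r1=1.3, r2=-64.0, m1=1.3, m2=-63.0, y=2.0))
```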

About the Author--STANLEY G. WHEELER received the B.S. degree from Bucknell University in 1962 and the Master of Statistics degree from the University of Florida in 1966. He did postgraduate studies in mathematical statistics at The George Washington University. After receiving his degree in 1966, Mr. Wheeler worked as a biostatistical analyst with the Health Sciences Computer Center of the University of Maryland and the Administrative Research Staff of the U.S. Veterans Administration. He joined the IBM Federal Systems Division (FSD) in 1968. He has worked as a statistical analyst on a number of different projects, including simulation studies of large communications networks and analysis of large area seismic arrays. In 1973 he joined the FSD Houston Operations, where his interests have centered on studies of pattern recognition, primarily as applied to remote sensing of the earth's resources and environment.

About the Author--DOUGLAS S. INGRAM received a B.A. degree in physics and an M.A. degree in mathematics from Pepperdine University in 1965 and 1967 respectively. In 1970 he received a Ph.D. in aerospace engineering from the University of Texas. From 1965 to 1971, Dr. Ingram worked for TRW Systems on problems in celestial mechanics and orbit determination for project Apollo. During 1971 he was involved in data processing and analysis in support of petroleum identification for Schlumberger Well Services. In 1972, Dr. Ingram joined the Federal Systems Division of IBM, where he has been developing techniques to obtain information from remotely sensed data. Currently he is manager of the Earth Resources Analysis Department. The responsibilities of this department include sensor modeling and analysis, developing data analysis techniques, and developing data analysis systems to solve users' problems.