On the robustness of the equal-mean discrimination rule with uniform covariance structure against serially correlated training data

On the robustness of the equal-mean discrimination rule with uniform covariance structure against serially correlated training data

003L 32038853.00+ .tit) Pergamon Press pie Pattern Recogmtaon Soctcty Patro'~ R¢cottntt~on. Vol. 21. No, ~ pp. 189 194. 1988. Pnnted in Great Brllmn...

370KB Sizes 2 Downloads 13 Views

003L 32038853.00+ .tit) Pergamon Press pie Pattern Recogmtaon Soctcty

Patro'~ R¢cottntt~on. Vol. 21. No, ~ pp. 189 194. 1988.

Pnnted in Great Brllmn.

ON THE ROBUSTNESS OF THE EQUAL-MEAN DISCRIMINATION RULE WITH UNIFORM COVARIANCE STRUCTURE AGAINST SERIALLY CORRELATED TRAINING DATA DEAN M. YOUNG* and DANNY W. TURNER Baylor University, Waco, TX 76798 and VIRGIL R. MARCO Oklahoma State University. OK 74078. U.S.A. (Receh'ed 14 July 1987)

Abstract - The effect of correlated training data on the error rates for the sample linear di~riminant function has been studied by Basu and Odcll."b McLachlan. q:~Tubbs. ~ and Lawoko and McLachlan/~ *~This paper investigates the effect of serially correlated data on the expected error rate of the equal-mean classilicr with uniform covariance structure. Misclassitication probabilities Stationary time series Serial correlation Spectral density function

I. I N T R O I ) U C T I O N

Research el1"orts for the statistical discrimination problem, which was initially posed by Fisher, ~7~have been concentrated mainly on the two-population case for multivariate normal populations with a c o m m o n covariance matrix and unequal mean vectors. It was only relatively recently that O k a m o t o ~KI first studied the discrimination problem with respect to unequal covariance matrices for two multivariate normal populations with c o m m o n mean vectors. Geisser, ~ Geisser and Desu, "°'tal Desu and Geisser, "2~ Enis and Geisser "~ and Lee 1~4t all have investigated the equal-mean discrimination problem via Bayesian approaches. In particular, Bartlett and Pleasd ~5~have investigated this problem for the case when Et and E2 are known to have uniform covariance structures of the form

sample linear discrimimmt function. In particular, Basu and Odell ~u and McLachlan c:~ have studied the effect ofequicorrelated training data on the error rates. Similarly, Tubbs (31 and Lawoko and McLachlan ~4 ~ have studied the effect of serially correlated training data on the error rates. In this paper we consider the effect of serially correlated training data on the sample equal-mean classifier assuming the uniform covariance structure. The main tool in the analysis is an asymptotic expansion for the expected error rate of the equalmean discriminant function with uniform covariance structure derived by Marco et al. c1~1 For the sake of completeness we shall give a brief overview of the equal-mean discrimination problem for the two-population case.

2. BAYES D I S C R I M I N A T I O N F O R V A R I A N C E ,MATRICES

~:, = 'r~ E(! - P)b + p j . r ] ~ = a~r(l

-

v)b

+ vJJ']

Asymptotic expansions

(l)

Let i denote an individual belonging to one of two distinct populations. Assume that each member of the union of the two populations possesses a finite set of c o m m o n characteristics or features which we denote by F = U'h f2 . . . . . f~)' whose observed values are denoted by x = (xl, x2 . . . . . xp)' such that xj is the observed value of the c h a r a c t e r i s t i c f . j = 1, 2. . . . . p. If we assume that the characteristics F = (fl, fz . . . . . fp)' are selected a priori, the discriminant problem can be summarized as follows.

whereJ' = (I,I ..... l ) i s a ! x p vector ofones, anda~, a.~, and p are known with a~ > alz. An excellent development of this discrimination model may be found in Ksbirsagar"~j with the assumption a~ = 1. Several authors have investigated the effect of correlated training data on the error rates of the "To whom correspondence should be sent. 189

190

D.M. YOUNG.D. W. TURNERand V. R. MARCO

Let Fit and I1. denote two distinct populations whose known multivariate probability density functions of the p-dimensional observation random vector X are denoted by pdx) and p.,(x), respectively. Let xt and ",: be the known a-priori probabilities that an individual I is selected from the population l-It or 1-I.,, respectively. Let C(i j) be the cost of misclassifying an individual from population l'Ij into population I-I, where

fl z

i,j=l,2.

(2)

Then, given the p x I observation vector x on the characteristics of an individual l selected at random from the union of the populations lit and FI:, the problem is to formulate a decision rule R which minimizes the expected cost of misclassification for assigning the individual I to one of the populations Fi. i = 1, 2. The Bayes solution to this problem is well known and is given in Anderson. "9~ if we make the usual assumption that p~(x) is a multivariate normal density function of the form:



(6)

and Z~(t')dt' = I - H L o - ~ _ ! j

P(2 2) =

(7)

cl

where o-2k

C(i[J){ >0=0 i=j'i#J"



L - ~ " - 1" J

k = ~.and 0"" -- 1

ct = ~ , c 2

a 2 -- I

k = p(Ino"L

and where Z~(') is the probability density function and H is the cumulative distribution function of a ;~2 random variable with p degrees of freedom. Thus. for ~zC(!!2) = ~ttC(21 I) with o'~, o::. and p known, the probability of misclassification is given by

= 1/2

1-fl

where a z =

7"'~i

J +

La 2-

IJ.J

(a~/a~).

p, (x) = (2~)-P': I X~I - ':

x exp[-!(x-p)'X,-'(x-p)],

i = 1,2,

(3)

then the form of the Bayes discrimination function for equal-mean discrimination is given by the following: Allocate the individual / which generates observation x to FI., if Q = (x - It)' (Zt-1 - Z2-t)(x - I t ) > k

(4)

where

llowever, in actuality, the parameters o~. o~, and p are usually unknown and thus the discriminant fupction Q must be estimated from independently sampled training data from [l t and H:. The sample estimative discriminant function ~ is formed by replacing the unknown population parameters in the expression for Q at (5) with their respective estimators. To estimate the parameters o~ and try, we may apply the unbiased estimators

k = 2" l n :1t "2C~()! + In Ix21' "~tC(21 i) 15hi '

pnf t-tk-t

otherwise, allocate the individual 1 which generates observation x to 1-It. Since the set of X satisfying (2 = k has Lebesgue measure zero, the Bayes discrimination is determined uniquely with probability one. Without loss of generality, we shall assume ~uis the p-dimensional zero vector. Furthermore, by standardizing using the standard deviation of Fit, we may take X, = [(1 --

m ( J ' X J ~3 = rp + t~p - l ) p ] ~ and apply the method of moments to obtain

p)lp + pJJ']

Z2 = o"2 [(1 --

where X0k is the kth dimension of the jth sample observation from the training sample X , , X~z..... X,,,, i = 1, 2. Similarly, to estimate the parameter p when the parameters try, i = I, 2, and p are unknown, we observe that

1

p)lp + pJJ']

13 = 2 p ( p

where az= a~/tr~. If Y~t and 1~2 have the uniform covariance structures as just defined, then the Bayes discrimination function (2 may be reexpressed as o 2-

1

1) [W* + W., - 2p],

(10)

where nt

w,=

a'(t o__)(2=:,

-

1, , , j ~,(J'X¢):,i .

1, 2,

z2

_

1 +(p--

I)p

where : t = x'x, z2 = (J'x) 2, a 2 = o~o~, p and J are defined as in (I), and x is now standardized as described above, if-,, C(112) = "t C(211), then using results from Kshirsagar, ~'~j we find the associated conditional probabilities of correct classification for the equal-mean, uniform-covariance-matrix case to be as follows:

and Xt ,, Xd 2..... X,,, is a random sample from H , i = !, 2. Throughout the remainder of this paper we shall assume equal training-sample sizes, i.e. nl -- n, = n. The expected error rate of (~, denoted by E[R(~t, ~)]. is approximated by the asymptotic expansion derived by Marco et al."s) and is given below in Theorem I. We shall need the following notation for the statement of the theorem. For i = 1, 2, let

Error rates of the sample equal-mean discriminant function Con" (X,,. X,0 = 4b~*, for t = j - k

q-q,

~,,. = .

191

(j#k=

4:<,"

and c" where o.2 = ( ~ j ~ ) and 0k = 0{ l/n~). r" - e--~ o~ = ~

Theorem 1: The expectation of R(#.,, 8;) over the joint

where ff~ < m. Using Theorem 4.2.1 of Fuller (Ref. (20). p. 138), we transform the correlated training vectors X,, into a set of "nearly" independent observations Z~ where, for sufficiently large training sample sizes n,, i = 1, 2,

I

sampling distribution o f ~ and #~ can be expanded in the form

1,2 . . . . n,) (12)

Z~ ~ N(O, d,,E,),

(13)

and, asymptotically, Corr (Z,,, Ze) = O, (k # j),

Er.R(~, ~ ) ] = 2

n(0.~. 0.~) + ! Y~ e(o~

,q)-" e£ n

-

o:

+

(1 i)

i=l

where

E(~ - at): = ( . a" , , n" , p ) [ l (~{R = - i { ( 2 @ ~ )

where X, is given in (1). The d, in (13) are defined as follows. Let f,(') denote the spectral density function for population UI,. Then, d, = 2nf(O),

1)] o'~.

+p"(p--

and, for odd n,,

[all(a) - art (~)]

+ (r,',j~) [~z t/ (,~) - ,~" tt (c')]l,

d,.,, = d,.:, + I = 2nf(2nr/n,),

and

r = I. 2..... ! (n, -- I). ?~ R

=

-

t {o~ [,::

It

(,l)

<~: it ff)]}.

-

(14)

When n, is even, r ranges from I to (In, - I) in (14) and

Also, note that

d,,, = 2nf(n).

?ll(ti) = /~a: -- Ina a -- I) h(u}, (r/z -- l) 2

?,I1((~)

= p(a'

-

I - a:lna:)

o: (a: -

To evaluate the effect of serially correlated training observations on the expected error rate of the equal-mean discrimination function with uniform covariance structure, we mr,st first calculate the variance of #~', i --- 1, 2, where

h(v),

1)2

ph(u)

¢72/!(d ) =

2(a: -

1) "l

a: Ina" {[(p - 2)(a"

- -

:7:

I)

It.l 1-i

- pa" Ina"'] [0.z _ ina~ _ 1]" + 2(0": - 1)(1 - a 4 + 20." In0.") lna"}, ~:"ll(t~) =

is an estimator of 0.~* using correlated training data where ~ * = a~/g, and g, is defined in (15). First note that Z~jl ~ d,i ~ Z" (I). Then, it follows that

ph(v)

2a'(a z - 1)' Ina" {[(p -2)(0.2 - I)

=

k-lJ-I

f fi

- plno2 ] [a: - I - a" Ina2] ' + 2['20.'*1no2 - ( a 2 - 1)(30"-'- 1)['(a2 - 1)Ina"]]}, h(-) is the density function of a chi-square random variable with p degrees of freedom, paZlna: t/ = ~

,

a"-I

k-lj=l

= ~0.~ where

pine2 V ---- ~

= (l/n,) fi d,r

,

a:-I

and t~ and 0 are obtained by replacing a" with 82 in expressions u and v where #2 = (dffd~).

(15)

I"1

The variance of ~" is obtained as follows. Var(~') = E[~" - ~ d ] :

3. T I l E ASYMPTOTIC EFFECT OF SERIAl, CORRELATION ON TIlE EXPECTED ERROR RATE

Following the approach of Tubbs °~ and Lawoko and McLachlan, ~4~we assume that the training vectors X,r j = I, 2 . . . . . n,, belonging to I-I,, i = I, 2, have a correlation structure of a discrete linear process which has an absolutely summable covariance function. We further assume that

=

f

eiz ,, - ,,,,

~-ij-i

t-lm-lj.I ll#.m

E(zD~ - d,, <,-.)(ZT,. - d,, ,q)

= ~, E[(IlpnD ~ fi (X~,, - =~):] k-I/~l

P

P

.

4

2

4

2

a-'

=

4

2

2

70

172

n = 35

q~t 0 0.1 0.5 0 0.1 0.5 0 0.1 0.5 0 0.1 0.5

0 0.1 0.5 0 0.1 0.5 0 0.1 0.5 0 0.1 0.5

0.0012 0.0031 0.0282 0.0010 0.0026 0.0241 0.0018 0.0046 0.0431 0.0012 0.0030 0.0289

-0.1 ASMDIFF

0.0024 0.0061 0.0513 0.0020 0.0052 0.0445 0.0036 0.0092 0.0783 0.0024 0.0060 0.0541

-0.I ASMDIFF

#.,

0.05

-0.0036 0.00OI 0.0451 -0.0030 0.0001 0.0391 - 0.0054 0.0001 0.0684 - 0.0035 0.0001 0.0470

-0.0448 -0.0413 0.0020 -0.0370 -0.0341 0.0020 - 0.0656 -0.0607 0.0010 - 0.0405 -0.0376 0.0009

0.0007 0.0025 0.0276 0.0006 0.0021 0.0236 0.0010 0.0038 0.0422 0.0007 0.0025 0.0283

-0.0018 0.0000 0.0251 -0.0015 0.0000 0.0214 -0.0028 0.0000 0.0382 -0.0018 0.0000 0.0255

-0.0258 -0.0240 0.0005 -0.0214 -0.0199 0.0005 -0.0384 -0.0357 0.0002 -0.0241 - 0.0225 0.0002

-0.05 0.1 0.5 ASMDIFF ASMDIFF ASMDIFF

-

0.0013 0.0050 0.0502 0.0011 0.0043 0.04 35 0.0020 0.0076 0.0776 0.0013 0.0049 0.0528

-0.05 0.1 0.5 ASMDIFF ASMDIFF ASMDIFF

~b.,

-0.05

0.0012 0.0031 0.0282 0.0010 0.0026 0.0241 0.0018 0.0046 0.0431 0.0012 0.0030 0.0289

-0.1 ASMDIFF

0.0024 0.0061 0.0513 0.0020 0.0052 0.0445 0.0036 0.0092 0.0783 0.0024 0.0060 0.0541

-0.1 ASMDIFF

-0.0036 O.01)OI 0.0451 -0.0030 0.0001 0.0391 -0.0054 0.0001 0.0684 -0.0035 0.13001 0.0469

-0.0448 -0.04t3 0.0020 -0.0370 - 0.0341 0.0019 -0.0656 -0.0607 0.0010 -0.0405 -0.0376 0.0009

0.0007 0.0025 0.0276 0.0006 0.0021 0.0236 0.0010 0.0038 0.0422 0.0007 0.0025 0.0283

-0.0018 0.0000 0.0251 -0.0015 0.0000 0.0214 - 0.0028 0.0000 0.0382 -0.0018 0.0000 0.0255

-0.0258 -0.0240 0.0005 -0.0214 - 0.0199 0.0005 - 0.0384 -0.0358 0.0002 -0.0241 -0.0225 0.0002

-0.05 0.1 0.5 ASMDIFF ASMDIFF ASMDIFF

~b,

0

P

0.0013 0.0050 0.0502 0.0011 0.0043 0.0435 0.0020 0.0076 0.0765 0.0013 0.0049 0.0528

-0.05 0.1 0.5 ASMDIFF ASMDIFF ASMDIFF

q~:

0

P

0.0012 0.0031 0.0282 0.0010 0.0026 0.0241 0.0018 0.0046 0.0431 0.0012 0.0030 0.0290

-0.1 ASMDIFF

0.0024 0.0061 0.0515 0.0020 0.0052 0.0447 0.0036 0.0092 0.0786 0.0024 0.0060 0.0544

-0.1 ASMDIFF

q~,

0.3

- 0.0036 0.01)01 0.0453 - 0.1X)30 0.001) 1 0.0392 -0.0054 0.0001 0.0686 -0.0035 0.0001 0.0473

- 0.0448 - I).0412 0.0022 -0.0370 - 0.034 I 0.01)2 I -0.0665 -0.0606 0.01) 13 -0.0405 -0.0376 0.0012

0.0007 0.0025 0.0277 0.0006 0.0021 0.0237 0.0010 0.0038 0.0423 0.0007 0.iX)25 0.0284

-0.0018 0.0000 0.0251 -0.0015 0.0000 0.0215 -0.0028 0.0003 0.0383 -0.0018 0.0000 0.0256

-0.0258 - 0.0240 0.01106 -0.0214 - 0.0199 0.0005 -0.{)384 -0.0357 0.0003 -0.0241 - 0.0225 0.001)3

-0.05 0.1 0.5 ASMDII:F ASMDIFF ASMDII:F

~b 2

0.3

0.01) 13 0.(X)50 0.0504 0.01) 1 I 0.01143 0.0437 0.0020 0.0076 0.0768 0.0013 0.0050 0.0531

-0.05 0.1 0.5 ASMDIFF ASMDIFF ASMDIFF

Table 1. The asymptotic change in the expected error rate ( A S M D I F F = e - e~) for values ofn t = n, = n, o 2, $1, q~z' and p

O

~r

<

,--1 t"7

,.< o

Error rates of the sample equal-mean discriminant function where a~, = (I/n,) ~ d~j, and we have used the fact that I-1

Z,j ~ ~,.d/~,, X, r F r o m Theorem 2 in Marco et al) t~, it follows that Vat ( ~ ' ) = 2a7' a'~ [I + p"(p -np

.. I)],i = 1, ")

(16)

Let e0 and e~ denote the asymptotic expected error rate when using independent and serially correlated training vectors, respectively. The error rate e0 is given by the first two terms of (11). The error rate e~. is obtained by applying Theorem 1 with a~, ~ , a z, u. and v replaced by 0.~*, ~ * , o':*. u*, r*, respectively. where u* = p0.:* Incr:* a : * -- I

t;*=

tr z* -- I' and

,7, 0.~

,Z,

Thus, if the conditions listed in (12) and { 13) hold, then the asymptotic difference in error rate between the independence case and the serially corrchtted case may be expressed by e,

-

e,,

=

r(0.~*, 0.;'*

) -

R(0.i,' tr',)'

+ ~ (,7,c,*

- c,) + o :

(t 7)

i-I

where C, = (0-~/n, p) [1 + (p - I) p:] t3~R and C,* = (a~/n, p) [1 + (p - I) ;)"] ?~R* and t'~R* is givcn in (1 l) with a~ and tr~ replaced with ~ * and 0-~*, respectively. 4.

always increases the expected error rate for the linear discriminant function. However, Table 1 shows that the magnitude of increase in the expected error rate can be much greater than the magnitude of a decrease in the expected error rate, at least for the configurations considered in the table. SUMMARY

Using an asymptotic expansion, we have investigated the effect of serially correlated training data on the error rate of the sample equal-mean classifier assuming the uniform covariance structure. For the parameter configurations considered, the expected error rate may increase or decrease depending on the type of serial correlation encountered in the training data. REFERENCES

pln°2*

0.~*

193

A NUMERICAl. EVAI,UATION OF T I l E ASYMPTOTIC EFFECT OF SERIAl,IX CORRELATED T R A I N I N G OBSERVATIONS ON T I l E EXPECTED ERROR RATE

In this section, we evaluate the asymptotic change in the expected error rate for autocorrelated training data from independent training data. We shall apply an autocorrelation structure similar to that used by L a w o k o a n d McLachlan. ~a~Let ~b,, = qbI where I~b, i < 1 for i = I, 2. The asymptotic change in the expected error rate ( A S M D I F F ) is given in Table I for various values ofp, p, n = nl = nz, o~ (a~ = 1), q~z, and ~b:. From Table I it is clear that the expected error rate may increase or decrease, depending upon the situation encountered. This result is at least slightly surprising in view of the results of Lawoko and McLachlan "~, who have shown that serially correlated training data from first-order autoregressive processes

I. J. P. Basu and P. L. Odell, Effects ofintraclass correlation among training samples on the misclassitication probabilities of Bayes' procedure, Pattern RecoNnition 6, 13 16 {1974). 2. G. J. McLachlan, Further results on the effect of intraclass correlation among training samples in discriminant analysis. Pattern Reco~lnition 8, 273-275 (1976). 3. J. D. Tubbs. Effect of autocorrclatcd training samples on Bayes' probability of misclassification. I'attern Rcc,flnition 12, 351-354 (1980). 4. C. R. O. Lawoko and G. J. McLachl:m, Some asymptotic results on the effect of autocorrelation on the error rates of the sample linear diseriminant function, Pattern Recotlnition 16, 119-121 (1982). 5. C. R. O. Lawoko and G. J. McLachlan, Discrimi,lation with autocorrelated observations, Pattern Recofwitum 18. 145-149 (1985). 6. C. R. O. Lawoko :rod G. J. McLachlan, Asymptotic error rates of the W and Z statistics when the training observations are dependent, Pattern Recoftnition 19, 467-471 (1986). 7. R. A. Fisher, The use of multiple measurement in taxonomic problems, Attn. Euflen. 7, 179-188 11936). 8. M. Okamoto, Discrimination for variance matrices, Osaka Math. J. 13, 1-39 0961). 9. S. Geisser, Posterior odds for multivariate normal classification, JI R. statist. Soc. 26B, 69-76 (1964). 10. S. Geisser and M. M. Desu, Bayesian zero-mean uniform discrimination, Research Report No. 10, pp. 1-19. Department of Statistics, State University of New York at Buffalo (1967). I1. S. Geisser and M. M. Desu, Predictive zero-mean uniform discrimination, Biometrika 55, 519-524 (1968). 12. M. M. Desu and S. Geisser, Methods and applications of equal-mean discrimination, Discriminant Analysis and Applications, T. Cacoullos, ed., pp. 139-159. Academic Press, New York (1973}. 13. P. Enis and S. Geisser, Sample discriminants which minimize posterior squared error loss. S. Afr. Statist. J. 4, 85-93 (1970). 14. J.C. Lee. A note on equal-mean discrimination, Communs Statist. 4A (3), 251-254 (1975). 15. M. S. Bartlett and N. W. Please, Discrimination in the case of zero mean differences, Biometrika 50. 17--21 (1963). 16. A. M. Kshirsagar, Multirariate Analysis. Marcel Dekker, New York (1972). 17. C. R. O. Lawoko and G. J. McLachlan, Discrimination with autocorrelated observations, Pattern Recognition 18, 119-121 (1985).

194

D.M. YounG. D. W. Tt:R.~rER and V. R. MAKco

18. V. R. Marco, D. M. Young and D. W. Turner, Asymptotic expansions and estimation of the expected error rate for equal-mean discrimination with uniform covariance structure, Biometrical J. (1987).

19. T. W. Anderson. .4n Introduction to Multit'uriate Statistical Analy.~is. John Wiley. New York 11984L 20. W.A. Fuller. Introduction to Statistical Time Series. John Wiley, New York 11976).

About the Author--VmGIL R. MARCO received the Ph.D. degree in mathematical sciences from the University of Texas at Dallas in 1981. Dr Marco joined the faculty of the Mathematics Department at Baylor University in 1982 where he remained for three years. At present. Dr Marco is an assistant professor in the Department of Statistics at Oklahoma State University. His research interests include discriminant analysis, low-dimensional representation of high-dimensional data. and multivariate graphics.

About the Author--DEAN M. YOUNGreceived the B.S. degree from Texas Tech University in 1970, two M.S. degrees from Baylor University in 1975. and the Ph.D. degree in the Mathematical Sciences from the University of Texas at Dallas in 1980. Dr Young has been a member of the Information Systems department in the Hankamer School of Business at Baylor University since 1980. He is currently an Associate Professor. His research interests include discrimination and classification, dimension reduction, statistical graphics, and the theory and applications of variance bounds and he has authored several papers on each of these topics in refereed journals. Dr Young is a member ofthe American Statistical Association and the Institute of Mathematical Statistics.

About the Author--D^NNv W. TURNrK rcccivcd the B.S. dcgrcc in mathematics from Clcmson Univcrsity in [969. I.Ic rcmaincd at Clcmson and complctcd thc Ph.D. degree in mathematics in 1973. Dr Turncr joincd the mathcmatics dcpartmcnt at Baylor University in 1973. At prescnt hc is a Professor of Mathcmatics and Director of Mathematics Graduate Studies at Baylor. tic has had industrial cxpcricncc with Exxon and Texas Instrumcnts. ltis rcsearch intcrests include statistical graphics, applied probability, and classification and hc has bccn an author of numcrous papcrs in these areas. Dr Turner is a member of thc Mathematical Association of America and thc Amcrican Statistical Association. to which hc is Baylor's Institutional Representative.