Pattern Recognition Letters 15 (1994) 409-418
The classification capability of a dynamic threshold neural network

Cheng-Chin Chiang, Hsin-Chia Fu *

Department of Computer Science and Information Engineering, National Chiao-Tung University, Hsinchu, Taiwan 300, ROC

* Corresponding author. Email: [email protected]
Received 12 November 1992
Abstract
This paper proposes a new type of neural network called the Dynamic Threshold Neural Network (DTNN). Through theoretical analysis, we prove that the classification capability of a DTNN can be twice that of a conventional sigmoidal multilayer neural network. In other words, to successfully learn an arbitrarily given training set, a DTNN may need as little as half the number of free parameters required by a sigmoidal multilayer neural network.
1. Introduction
Recently, a number of researchers (Sontag, 1990; Baum and Haussler, 1989; Huang and Huang, 1991; Nilsson, 1965) have studied the recognition capability of multilayer neural networks. In general, the main results obtained in these studies are derivations of lower or upper bounds on the number of hidden neurons required to learn the recognition of a given training set S containing a fixed number of patterns. For example, it has been proved (Huang and Huang, 1991; Nilsson, 1965) that a committee machine requires at most k − 1 hidden neurons to dichotomize an arbitrary dichotomy defined on any training set with k patterns. Sontag (1990) also proved that if direct input-to-output connections or continuous sigmoid activation functions are used, then a network requires at most k hidden neurons to dichotomize an arbitrary dichotomy defined on any training set containing 2k patterns. Chiang and Fu (1992) proposed a new activation function called the Quadratic Sigmoidal Function (QSF) for multilayer neural networks to approximate continuous-valued functions. A QSF in R^n is defined as
QSF:   f(net_i, θ_i) = 1 / (1 + exp(net_i² − θ_i²)),   (1)
where net_i = w_i · x = w_{i,0} + Σ_{j=1}^{n} w_{i,j} x_j. The two vectors w_i = (w_{i,0}, ..., w_{i,n}) and x = (1, x_1, ..., x_n) are the weight vector and the input vector, respectively. The parameter θ_i is called the threshold because it controls the distance between the two state transition boundaries of the neuron. In comparison with conventional sigmoidal multilayer neural networks, we obtained satisfactory results with our QSF networks, such as faster learning, smaller network size, and better generalization capability. Fig. 1 shows a graphical demonstration of a QSF in R². Note that, by Eq. (1), the threshold θ_i is independent of the input x; we therefore call it a "static threshold", because the two state transition boundaries are fixed for each neuron during the retrieving phase. Thus, hereafter, we will refer to a QSF neuron as a Static Threshold Quadratic Sigmoidal neuron.
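As a concrete illustration (ours, not part of the original paper), the following NumPy sketch simply evaluates Eq. (1); the function name qsf_neuron and the sample values are assumptions made for the example.

    import numpy as np

    def qsf_neuron(x, w, theta):
        # Static Threshold Quadratic Sigmoidal neuron, Eq. (1):
        # w = (w_0, w_1, ..., w_n) with bias w_0, x is the raw input vector,
        # theta is the scalar (static) threshold.
        net = w[0] + np.dot(w[1:], x)                  # net_i = w_i . (1, x_1, ..., x_n)
        return 1.0 / (1.0 + np.exp(net**2 - theta**2))

    # The output is close to 1 between the two parallel boundaries net = -theta and net = +theta.
    x = np.array([0.3, -0.2])
    print(qsf_neuron(x, w=np.array([0.1, 1.0, 1.0]), theta=0.5))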
Fig. 1. Graphical demonstration of a QSF in R².

As shown in Fig. 1, each QSF defines two parallel linear state transition boundaries in the input space. Theoretically, a nonlinear state transition boundary should have better partitioning capability than a linear one. Therefore, in this paper we extend the QSF to another, more generalized activation function called the Extended QSF, which is defined as
Extended QSF:   f(net_i, Θ_i) = 1 / (1 + exp(net_i² − (g(Θ_i, x))²)),   (2)
where net_i = w_i · x = w_{i,0} + Σ_{j=1}^{n} w_{i,j} x_j, Θ_i = (θ_{i,0}, θ_{i,1}, ..., θ_{i,n}), and g(Θ_i, x) = θ_{i,0} + Σ_{j=1}^{n} θ_{i,j} x_j is called the thresholding function. According to Eq. (2), a Dynamic Threshold Quadratic Sigmoidal neuron i contains n weights (w_{i,j}, 1 ≤ j ≤ n), a bias w_{i,0}, and n + 1 threshold parameters (θ_{i,j}, 0 ≤ j ≤ n).
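A corresponding sketch for the dynamic threshold case may help; it is ours, not the authors', and simply evaluates Eq. (2). The parameter choice in the example reproduces panel (a) of Fig. 2, i.e. net = 2x + y and g = x + y.

    import numpy as np

    def extended_qsf_neuron(x, w, theta):
        # Dynamic Threshold Quadratic Sigmoidal neuron, Eq. (2):
        # w = (w_0, ..., w_n) are the weights (bias w_0),
        # theta = (theta_0, ..., theta_n) are the threshold parameters.
        net = w[0] + np.dot(w[1:], x)              # net_i
        g = theta[0] + np.dot(theta[1:], x)        # thresholding function g(Theta_i, x)
        return 1.0 / (1.0 + np.exp(net**2 - g**2))

    # Fig. 2(a): (1 + exp((2x+y)^2 - (x+y)^2))^-1  <=>  w = (0, 2, 1), theta = (0, 1, 1)
    print(extended_qsf_neuron(np.array([0.5, -0.5]),
                              w=np.array([0.0, 2.0, 1.0]),
                              theta=np.array([0.0, 1.0, 1.0])))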
Fig. 2. Graphical demonstrations of the Extended QSF in R²: (a) (1 + exp((2x+y)² − (x+y)²))⁻¹; (b) (1 + exp((x−y)² − (x+y)²))⁻¹; (c) (1 + exp((2.5)² − x²))⁻¹.
As shown in Fig. 2, the Extended QSF can form various types of state transition boundaries under different parameter settings. This property enables the Dynamic Threshold Quadratic Sigmoidal neuron to have a more powerful classification capability. By incorporating both Dynamic and Static Threshold Quadratic Sigmoidal neurons, we can design a more powerful multilayer neural network, called the Dynamic Threshold Neural Network, together with its learning algorithm. The proposed network architecture of a DTNN is shown in Fig. 3. In this paper, we study the classification capability of DTNNs which contain one Dynamic Threshold Quadratic Sigmoidal neuron in the output layer and several Static Threshold Quadratic Sigmoidal neurons in one hidden layer. We will prove that a single-hidden-layer DTNN can be twice as effective as a single-hidden-layer sigmoidal neural network in classification capability. In other words, to successfully learn a given training set, a DTNN may need as little as half the number of free parameters required by a sigmoidal neural network.
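The following forward-pass sketch reflects our own reading of the architecture just described (one hidden layer of Static Threshold Quadratic Sigmoidal neurons feeding a single Dynamic Threshold Quadratic Sigmoidal output neuron, with no direct input-to-output connection); the function names and parameter shapes are assumptions, not the authors' notation.

    import numpy as np

    def qsf(net, theta):                                   # Eq. (1), elementwise
        return 1.0 / (1.0 + np.exp(net**2 - theta**2))

    def extended_qsf(net, g):                              # Eq. (2)
        return 1.0 / (1.0 + np.exp(net**2 - g**2))

    def dtnn_forward(x, W_h, theta_h, w_o, Theta_o):
        # W_h:     (h, n+1) hidden weights, column 0 holds the biases
        # theta_h: (h,)     static thresholds of the hidden neurons
        # w_o:     (h+1,)   output weights, w_o[0] is the bias
        # Theta_o: (h+1,)   dynamic threshold parameters of the output neuron
        x1 = np.concatenate(([1.0], x))                    # prepend the constant input 1
        h = qsf(W_h @ x1, theta_h)                         # static threshold hidden layer
        y = np.concatenate(([1.0], h))                     # output neuron sees (1, h_1, ..., h_h)
        return extended_qsf(w_o @ y, Theta_o @ y)          # dynamic threshold output neuron

    rng = np.random.default_rng(0)
    print(dtnn_forward(rng.normal(size=3), rng.normal(size=(4, 4)),
                       rng.normal(size=4), rng.normal(size=5), rng.normal(size=5)))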
2. Classification capability of single-hidden-layer DTNNs

Before presenting our study on the capability of DTNNs, we have to introduce the Quadratic Heaviside function
Quadratic Heaviside:

H_q(w_i · x, θ_i) = 0, if θ_i² − (w_i · x)² < 0;
                    1, if θ_i² − (w_i · x)² ≥ 0.   (3)
The Quadratic Heaviside function is an extension of the conventional Heaviside function. We call a neuron which uses the Quadratic Heaviside activation function a Static Threshold Quadratic Heaviside neuron. Let H(x) denote the conventional Heaviside function, i.e., H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. Then
H_q(w_i · x, θ_i) = H(q(w_i · x, θ_i)),   (4)
Fig. 3. Network architecture of Dynamic Threshold Neural Networks.
where q(w_i · x, θ_i) = θ_i² − (w_i · x)². By Eq. (3), it is clear that the output of a Quadratic Heaviside function remains unchanged if we scale up (or scale down) the weight vector w_i and the threshold θ_i by a nonzero factor k. Thus, the following lemma can easily be derived.
Lemma 1. For any constant k ≠ 0,

H_q(w_i · x, θ_i) = H_q(k w_i · x, k θ_i)   for all x ∈ R^n.

Let σ(x) be the sigmoid function, i.e., σ(x) = (1 + exp(−x))⁻¹. The following lemma will also be very useful in later theorem proofs regarding the capability of our DTNNs.

Lemma 2. For a given error tolerance ε > 0,

|σ(x) − H(x)| ≤ ε   for |x| ≥ |log((1 − ε)/ε)|.
Proof. For x < 0, H(x) = 0, so σ(x) must be less than or equal to ε, i.e., (1 + exp(−x))⁻¹ ≤ ε. It is easy to derive that this holds whenever x ≤ −log((1 − ε)/ε). On the other hand, for x ≥ 0, H(x) = 1, so σ(x) must be larger than or equal to 1 − ε, i.e., (1 + exp(−x))⁻¹ ≥ 1 − ε, which holds whenever x ≥ log((1 − ε)/ε). Therefore, we conclude that if |x| ≥ |log((1 − ε)/ε)|, then |σ(x) − H(x)| ≤ ε.  □
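A quick numerical check of this bound (ours; the tolerance value is an arbitrary example) can be written as follows.

    import numpy as np

    eps = 0.01
    bound = abs(np.log((1.0 - eps) / eps))                 # |log((1 - eps)/eps)|, about 4.595
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    heaviside = lambda x: 1.0 if x >= 0 else 0.0

    # For |x| >= bound, the sigmoid is within eps of the Heaviside function (Lemma 2).
    for x in (-5.0 * bound, -bound, bound, 5.0 * bound):
        assert abs(sigmoid(x) - heaviside(x)) <= eps + 1e-12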
In the same way that the QSF was extended to the Extended QSF, the Quadratic Heaviside function can be extended to the Extended Quadratic Heaviside Function (EQHF):

Ω(w_i · x, Θ_i) = 0, if (g(Θ_i, x))² − (w_i · x)² < 0;
                  1, if (g(Θ_i, x))² − (w_i · x)² ≥ 0,   (5)
where g(Θ_i, x) = θ_{i,0} + Σ_{j=1}^{n} θ_{i,j} x_j. We also call a neuron which uses the EQHF as its activation function a Dynamic Threshold Quadratic Heaviside neuron.

In the following, we will first study the capability of highway-linked feedforward (direct input-to-output connections are included) single-hidden-layer networks (see Fig. 4(a)) which contain Static Threshold Quadratic Heaviside neurons in one hidden layer and one Dynamic Threshold Quadratic Heaviside neuron in the output layer. Then we will extend our results to feedforward (no direct input-to-output connections) networks (see Fig. 4(b)) which contain Static Threshold Quadratic Sigmoidal neurons in one hidden layer and one Dynamic Threshold Quadratic Sigmoidal neuron in the output layer.

Suppose that a training set S consists of distinct vectors x_1, ..., x_p, where x_i ∈ R^n. Since the set

A = R^n − ⋃_{i≠j, 1≤i,j≤p} {s | s · (x_i − x_j) = 0, s ∈ R^n}

is not empty, we can always find a projection vector v in A such that the new training set
S_v = {y_i | y_i = v · x_i, 1 ≤ i ≤ p} contains only distinct elements. Assume that there exists a network containing h neurons in its first hidden layer that can dichotomize a dichotomy which is induced from S onto S_v. Let the weights of these h hidden neurons be w_1, w_2, ..., w_h (w_i ∈ R). Then it is obvious that the network can be transformed to dichotomize the original dichotomy on S by replacing the weights of these h hidden neurons with w_1 v, w_2 v, ..., w_h v. Without loss of generality, we can sort the elements in the new training set S_v and reindex them such that y_1 < y_2 < ... < y_p. A dichotomy {S⁺, S⁻} defined on S induces the dichotomy {S_v⁺, S_v⁻} on S_v, where

S_v⁻ = {y_i | y_i = v · x_i, x_i ∈ S⁻},

and S_v⁺ is defined analogously.
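The projection step can be sketched as follows (our code; the retry loop and the random choice of v are implementation conveniences). A randomly drawn v avoids the finitely many hyperplanes excluded from A with probability 1, and distinctness of the projected values is verified explicitly.

    import numpy as np

    def find_projection(X, tries=100, seed=0):
        # X: (p, n) array of distinct training vectors x_1, ..., x_p.
        # Returns v in A and the projected training set y_i = v . x_i.
        rng = np.random.default_rng(seed)
        for _ in range(tries):
            v = rng.normal(size=X.shape[1])
            y = X @ v
            if len(np.unique(y)) == len(y):        # all projections distinct  =>  v is in A
                return v, y
        raise RuntimeError("no suitable projection vector found")

    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
    v, y = find_projection(X)
    print(v, np.sort(y))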
Fig. 4. (a) A highway-linked feedforward single-hidden-layer network composed of Static Threshold Quadratic Heaviside neurons and Dynamic Threshold Quadratic Heaviside neurons; (b) a feedforward single-hidden-layer network composed of Static Threshold Quadratic Sigmoidal neurons and Dynamic Threshold Quadratic Sigmoidal neurons.

We shall assume that y_1 is in S_v⁺, since we can always find a vector v for this purpose. Now, we can prove the following theorem.

Theorem 1. Given a training set S = {y_1, y_2, ..., y_{4k+1} | y_i ∈ R, 1 ≤ i ≤ 4k + 1}, a highway-linked feedforward network containing at most k Quadratic Heaviside neurons in one hidden layer and one Dynamic Threshold Quadratic Heaviside neuron in the output layer can dichotomize an arbitrary dichotomy defined on S.
Proof. Let us use the notation "I_i < I_j" for intervals to mean that x < y for all x ∈ I_i and all y ∈ I_j. We also use the notation "x < I" to denote that x < y for all y ∈ I. Since the y_i's have been sorted in ascending order, we can find 4k + 1 disjoint closed subintervals I_i (1 ≤ i ≤ 4k + 1) such that y_i ∈ I_i and I_1 < I_2 < ... < I_{4k+1}. If the dichotomy is

S⁺ = {y_{2i+1} | 0 ≤ i ≤ 2k}   and   S⁻ = {y_{2i} | 1 ≤ i ≤ 2k},

then it would be the worst case for the network to dichotomize. Let I⁻ = ⋃_{i=1}^{2k} I_{2i} and I⁺ = ⋃_{i=0}^{2k} I_{2i+1}. Thus, if we can construct a network with the stated network architecture such that the network outputs "1" for x ∈ I⁺ and "0" for x ∈ I⁻, then the proof is complete. Let β_i, γ_i, γ_i′, and β_i′ (1 ≤ i ≤ k) be boundary points chosen such that

I_{4i−3} < β_i < I_{4i−2} < γ_i < I_{4i−1} < γ_i′ < I_{4i} < β_i′ < I_{4i+1}   for 1 ≤ i ≤ k,   (6)

and, in addition, choose two further boundary points β_0′ < I_1 and I_{4k+1} < β_{k+1}.
Let w_i = (w_{i,0}, w_{i,1}) denote the bias and weight of the ith Quadratic Heaviside hidden neuron, and let θ_i^(1) denote the threshold of the ith Quadratic Heaviside hidden neuron. Also let u = (u_0, u_1, ..., u_k) and Θ = (θ_0^(2), θ_1^(2), ..., θ_k^(2)) denote the hidden-to-output connection weight vector and the threshold vector of the Dynamic Threshold Quadratic Heaviside neuron, respectively. In addition, v (∈ R) is used to denote the direct input-to-output connection weight. Thus, the output of this network can be formulated as
O = f(u_0 + v · x + Σ_{i=1}^{k} u_i · h(w_{i,0} + w_{i,1} x, θ_i^(1)), Θ),   (7)

where x ∈ R is the input, f(·) denotes the Extended Quadratic Heaviside function (see Eq. (5)), and h(·) denotes the Quadratic Heaviside function (see Eq. (3)).

Fig. 5. Graphical demonstration of Theorem 1 (the target output, the individual hidden unit outputs, and the network output over the intervals I_1, ..., I_{4k+1}).

Now, let us set the parameters of this network as follows:
u_0 = 0,   v = 1,
u_i = −½(γ_i + γ_i′)   for 1 ≤ i ≤ k,
w_{i,0} = −½(β_i + β_i′),   w_{i,1} = 1,   θ_i^(1) = ½(β_i′ − β_i)   for 1 ≤ i ≤ k,
θ_0^(2) = max{|β_0′|, |β_{k+1}|},
θ_i^(2) = ½(γ_i′ − γ_i) − θ_0^(2)   for i ≠ 0.
Based on these settings, for a given input x in I_{4i−2}, I_{4i−1}, or I_{4i}, we have x ∈ (β_i, β_i′). By Eq. (3), it is easy to
prove that only the ith hidden neuron will output "1" and all other hidden neurons will output "0" (see Fig. 5). The value g(Θ, y) of the output neuron is then derived as ½(γ_i′ − γ_i). According to Eqs. (7) and (5), the output of the network becomes

O = f(x + u_i, Θ) = 1, if γ_i ≤ x ≤ γ_i′;
                    0, if β_i < x < γ_i or γ_i′ < x < β_i′.
On the other hand, for an input x not covered by any hidden neuron (all hidden outputs are "0"), the output becomes

O = f(x, Θ) = 1, if −θ_0^(2) ≤ x ≤ θ_0^(2) and x ∈ I_{4i+1} for some 0 ≤ i ≤ k;
              0, otherwise.

Therefore, for all x in the intervals I_{4i+1} (0 ≤ i ≤ k) the network outputs "1", while for all x ∈ I⁻ it outputs "0"; that is, the constructed network realizes the worst-case dichotomy, which completes the proof.  □
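The construction above can be played back in code. The sketch below is ours, not the authors': it fixes one convenient choice of the boundary points in Eq. (6) (interval midpoints) and of θ_0^(2), builds the k Quadratic Heaviside hidden neurons and the Dynamic Threshold Quadratic Heaviside output neuron with the stated settings, and checks the worst-case alternating dichotomy on a small example.

    import numpy as np

    def theorem1_network(y, k):
        # y: sorted training points y_1 < ... < y_{4k+1} (NumPy array of length 4k+1).
        # Boundary points of Eq. (6) chosen as midpoints between consecutive points.
        mid = lambda a, b: 0.5 * (a + b)
        beta, beta_p, gamma, gamma_p = [], [], [], []
        for i in range(k):                           # hidden neuron i+1 (0-based loop)
            b = 4 * i                                # 0-based index of y_{4i+1}
            beta.append(mid(y[b], y[b + 1]))         # between y_{4i+1} and y_{4i+2}
            gamma.append(mid(y[b + 1], y[b + 2]))
            gamma_p.append(mid(y[b + 2], y[b + 3]))
            beta_p.append(mid(y[b + 3], y[b + 4]))   # between y_{4i+4} and y_{4i+5}
        theta0 = max(abs(y[0]), abs(y[-1])) + 1.0    # one valid choice for theta_0^(2)
        hq = lambda net, th: 1.0 if th**2 - net**2 >= 0 else 0.0      # Eq. (3)

        def classify(x):
            # hidden neuron i fires exactly for beta_i <= x <= beta_i'
            h = [hq(x - 0.5 * (beta[i] + beta_p[i]), 0.5 * (beta_p[i] - beta[i]))
                 for i in range(k)]
            net = x + sum(-0.5 * (gamma[i] + gamma_p[i]) * h[i] for i in range(k))   # u_0 = 0, v = 1
            g = theta0 + sum((0.5 * (gamma_p[i] - gamma[i]) - theta0) * h[i] for i in range(k))
            return 1 if g**2 - net**2 >= 0 else 0    # Dynamic Threshold Quadratic Heaviside output
        return classify

    k = 3
    y = np.arange(4 * k + 1, dtype=float)                        # 13 sorted points
    labels = [1 if j % 2 == 0 else 0 for j in range(len(y))]     # worst case: y_1, y_3, ... in S+
    clf = theorem1_network(y, k)
    assert [clf(float(p)) for p in y] == labels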
In order to carry the construction of Theorem 1 over to Quadratic Sigmoidal neurons, we need the following lemma, which states that a Quadratic Heaviside function can be approximated arbitrarily closely by a scaled Quadratic Sigmoid function.

Lemma 3. Let Φ(w · y, θ) denote a Quadratic Sigmoid function, where w = (w_0, w_1) and y = (1, x). Given a compact set C ⊆ R − {(θ − w_0)/w_1, (−θ − w_0)/w_1} and an error tolerance ε > 0, then

|Φ(λ w · y, λθ) − H_q(w · y, θ)| ≤ ε   for all x ∈ C,

provided that λ ≥ √(|log((1 − ε)/ε)| / m),
where m = min({|θ² − (w_0 + w_1 x)²| | x ∈ C}).

Proof. The Quadratic Sigmoid function Φ(w · y, θ) can be regarded as a variant of the sigmoid function, i.e., Φ(w · y, θ) = σ(θ² − (w_0 + w_1 x)²), where σ(x) denotes the conventional sigmoid function. Thus, by Lemma 2, if |(λθ)² − (λw_0 + λw_1 x)²| ≥ |log((1 − ε)/ε)|, then

|σ((λθ)² − (λw_0 + λw_1 x)²) − H((λθ)² − (λw_0 + λw_1 x)²)| ≤ ε.
Let m = min({|θ² − (w_0 + w_1 x)²| | x ∈ C}). Since x cannot be (θ − w_0)/w_1 or (−θ − w_0)/w_1, we have m > 0. Therefore, the above condition can be rewritten as

|Φ(λ w · y, λθ) − H_q(λ w · y, λθ)| ≤ ε   for all x ∈ C,   if λ ≥ √(|log((1 − ε)/ε)| / m).

By Lemma 1, we conclude
|Φ(λ w · y, λθ) − H_q(w · y, θ)| ≤ ε,   if λ ≥ √(|log((1 − ε)/ε)| / m)   for all x ∈ C,

where m = min({|θ² − (w_0 + w_1 x)²| | x ∈ C}).  □
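A numerical illustration of Lemma 3 (ours; the weights, threshold, and grid points are arbitrary choices that avoid the two excluded transition points) is given below.

    import numpy as np

    def quad_sigmoid(x, w0, w1, theta):                    # Phi(w.y, theta) = sigma(theta^2 - net^2)
        return 1.0 / (1.0 + np.exp((w0 + w1 * x)**2 - theta**2))

    def quad_heaviside(x, w0, w1, theta):                  # H_q(w.y, theta), Eq. (3)
        return (theta**2 - (w0 + w1 * x)**2 >= 0).astype(float)

    w0, w1, theta, eps = -1.0, 1.0, 0.5, 1e-3
    # the transition points (theta - w0)/w1 = 1.5 and (-theta - w0)/w1 = 0.5 are excluded from the grid
    xs = np.array([0.0, 0.25, 0.75, 1.25, 1.75, 3.0])
    m = np.min(np.abs(theta**2 - (w0 + w1 * xs)**2))
    lam = np.sqrt(abs(np.log((1 - eps) / eps)) / m)        # scaling factor required by Lemma 3
    err = np.abs(quad_sigmoid(xs, lam * w0, lam * w1, lam * theta)
                 - quad_heaviside(xs, w0, w1, theta))
    assert np.all(err <= eps + 1e-12)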
Similarly, the following corollary tells us that each Extended Quadratic Heaviside function can be approximated by an Extended Quadratic Sigmoid function with an arbitrarily small error tolerance.

Corollary 1. Let Ψ(w · y, Θ) and Ω(w · y, Θ) denote an Extended Quadratic Sigmoid function and an Extended Quadratic Heaviside function, respectively, where w = (w_0, w_1, ..., w_n), Θ = (θ_0, θ_1, ..., θ_n), and y = (1, y_1, y_2, ..., y_n). Given an error tolerance ε > 0 and a compact set C on which (θ_0 + Σ_{i=1}^{n} θ_i y_i)² ≠ (w_0 + Σ_{i=1}^{n} w_i y_i)², then

|Ψ(λ w · y, λΘ) − Ω(w · y, Θ)| ≤ ε,   if λ ≥ √(|log((1 − ε)/ε)| / m)   for all y ∈ C,

where

m = min({|(θ_0 + Σ_{i=1}^{n} θ_i y_i)² − (w_0 + Σ_{i=1}^{n} w_i y_i)²| | (y_1, y_2, ..., y_n) ∈ C}).
In the following lemma, we prove that a Quadratic Sigmoid function can also be used to approximate the linear function f(x) = x.

Lemma 4. Let Φ(w · y, θ) denote a Quadratic Sigmoid function, where w = (w_0, w_1) and y = (1, x). Suppose that, for some weight vector w_c = (c, 0) and threshold θ_0,

∂Φ(w · y, θ)/∂net |_{net=c, θ=θ_0} = μ ≠ 0,

where net = w · y = w_0 + w_1 x. Let C ⊆ R be a compact domain. Then there exists a weight vector w_λ = (c − λ⁻¹c, λ⁻¹) such that

lim_{λ→∞} (λ/μ)[Φ(w_λ · y, θ_0) − Φ(w_c · y, θ_0)] + c = x   for all x ∈ C.
Proof. For convenience, let f(net, θ) denote the Quadratic Sigmoid function, where net = w · y = w_0 + w_1 x. Thus Φ(w_c · y, θ_0) = f(c, θ_0). Since ∂f(net, θ)/∂net |_{net=c, θ=θ_0} = μ ≠ 0, we have

lim_{λ→∞} [f(c + λ⁻¹(x − c), θ_0) − f(c, θ_0)] / (λ⁻¹(x − c)) = μ.
Rearranging the terms in the above equation, we obtain

lim_{λ→∞} (λ/μ)[f(c + λ⁻¹(x − c), θ_0) − f(c, θ_0)] + c = x.

In other words, there exists a weight vector w_λ = (c − λ⁻¹c, λ⁻¹) such that

lim_{λ→∞} (λ/μ)[Φ(w_λ · y, θ_0) − Φ(w_c · y, θ_0)] + c = x   for all x ∈ C.  □
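The limit in Lemma 4 is easy to check numerically; the sketch below (ours) estimates μ by a finite difference and shows the approximation error shrinking as λ grows.

    import numpy as np

    def quad_sigmoid(net, theta):                  # Phi written as a function of net and theta
        return 1.0 / (1.0 + np.exp(net**2 - theta**2))

    c, theta0 = 1.0, 2.0
    d = 1e-6                                       # finite-difference step for estimating mu
    mu = (quad_sigmoid(c + d, theta0) - quad_sigmoid(c - d, theta0)) / (2 * d)

    xs = np.linspace(-2.0, 3.0, 11)
    for lam in (1e2, 1e4):
        # w_lambda = (c - c/lam, 1/lam), so its net at input x is c + (x - c)/lam
        approx = (lam / mu) * (quad_sigmoid(c + (xs - c) / lam, theta0)
                               - quad_sigmoid(c, theta0)) + c
        print(lam, np.max(np.abs(approx - xs)))    # error decreases as lam grows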
In the proof of Lemma 4, μ, c, and Φ(w_c · y, θ_0) are all constants (independent of x); thus this lemma says that we can use one Quadratic Sigmoid function (Φ(w_λ · y, θ_0)) to approximate the linear function f(x) = x. With the above auxiliary lemmas, the following theorem can be proved.

Theorem 2. Given a training set S = {y_1, y_2, ..., y_{4k+1} | y_i ∈ R, 1 ≤ i ≤ 4k + 1}, a single-hidden-layer DTNN containing at most k + 1 Static Threshold Quadratic Sigmoidal hidden neurons and one Dynamic Threshold Quadratic Sigmoidal output neuron can dichotomize an arbitrary dichotomy defined on S.

Proof. Consider each Quadratic Heaviside hidden neuron i of the network constructed in Theorem 1. Based on the parameter settings in the proof of Theorem 1, we derive

(θ_i^(1) − w_{i,0}) / w_{i,1} = β_i′,   (−θ_i^(1) − w_{i,0}) / w_{i,1} = β_i.
We have assumed in Eq. (6) that both β_i and β_i′ are not in any interval I_j for 1 ≤ j ≤ 4k + 1; then, by Lemma 3, each Quadratic Heaviside term h(w_{i,0} + w_{i,1}x, θ_i^(1)) in Eq. (7) can be replaced by a Static Threshold Quadratic Sigmoidal neuron with activation function Φ(λ w · y, λθ) if λ is large enough. Let h_i denote the output of the ith neuron. In the proof of Theorem 1, we have seen that for any input in ⋃_{i=1}^{k} {I_{4i−2} ∪ I_{4i−1} ∪ I_{4i}}, one and only one hidden neuron will output "1". Thus, based on the parameter settings in the proof of Theorem 1, the term (g(Θ, y))² − (w · y)² for the Dynamic Threshold Quadratic Heaviside output neuron is equal to (θ_i^(2) + θ_0^(2))² − (x + u_i)² (i ≠ 0), where y = (1, x, h_1, h_2, ..., h_k) and w = (u_0, v, u_1, u_2, ..., u_k). In the settings of the proof of Theorem 1, θ_i^(2) = ½(γ_i′ − γ_i) − θ_0^(2) and u_i = −½(γ_i′ + γ_i). In addition, we have also assumed that γ_i and γ_i′ are not in any interval I_j for 1 ≤ j ≤ 4k + 1. On the other hand, for any input in an interval I_{4i+1} (0 ≤ i ≤ k), all hidden neurons output "0", so the term becomes

(g(Θ, y))² − (w · y)² = (θ_0^(2))² − x²,

where θ_0^(2) = max{|β_0′|, |β_{k+1}|}. Since both β_0′ and β_{k+1} are not in any interval I_j for 1 ≤ j ≤ 4k + 1, the quantity (g(Θ, y))² − (w · y)² is bounded away from zero at every training point. Hence, by Corollary 1, the Dynamic Threshold Quadratic Heaviside output neuron can be replaced by a Dynamic Threshold Quadratic Sigmoidal neuron within any prescribed error tolerance, provided that the scaling factor λ is large enough. Finally, by Lemma 4, the direct input-to-output connection v · x of the highway-linked network can be approximated by one additional Static Threshold Quadratic Sigmoidal hidden neuron. Therefore, at most k + 1 Static Threshold Quadratic
Sigmoidal hidden neurons are needed.  □

Let us compare Dynamic Threshold Neural Networks with conventional sigmoidal networks in terms of the number of free parameters. Suppose that the input dimension is n. Given a training set with 4k + 1 training patterns, the Dynamic Threshold Quadratic Sigmoidal network requires at most (k + 1)(n + 2) + 2(k + 1) free parameters. However, the sigmoidal network requires at most 2k(n + 1) + (2k + 1) free parameters. Thus, for problems with large input dimensions (n is large), the upper bound on the number of free parameters required for Dynamic Threshold Neural Networks is only half of the number required for sigmoidal networks. For a training set with a large number of patterns (k is large), the ratio between the upper bounds on the number of free parameters required for Dynamic Threshold Neural Networks and sigmoidal networks is (n + 4)/(2n + 4). Thus, Dynamic Threshold Neural Networks reduce the number of free parameters by a factor of between 6/5 (for n = 1) and 2 (for n → ∞).
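The two parameter counts quoted above can be tabulated directly (our sketch; both counts are taken from the text, not re-derived here).

    def dtnn_params(k, n):
        # (k+1) static QSF hidden neurons with n weights + bias + threshold each,
        # plus 2(k+1) parameters for the dynamic threshold output neuron
        return (k + 1) * (n + 2) + 2 * (k + 1)

    def sigmoidal_params(k, n):
        # 2k sigmoidal hidden neurons with n weights + bias each, plus 2k + 1 output parameters
        return 2 * k * (n + 1) + (2 * k + 1)

    k = 1000                                           # a training set with 4k + 1 patterns
    for n in (1, 10, 100):
        print(n, dtnn_params(k, n) / sigmoidal_params(k, n))   # approaches (n + 4)/(2n + 4)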
3. Concluding remarks and future work

In this paper, a new type of neural network, called the Dynamic Threshold Neural Network, with more powerful classification capability is proposed. By using a Dynamic Threshold Quadratic Sigmoidal neuron in the output layer and k + 1 Static Threshold Quadratic Sigmoidal neurons in the hidden layer, a single-hidden-layer Dynamic Threshold Neural Network can be constructed to dichotomize an arbitrary dichotomy defined on any training set containing at most 4k + 1 training patterns. Thus, in comparison with conventional sigmoidal multilayer neural networks, we claim that Dynamic Threshold Neural Networks improve the recognition capability of single-hidden-layer neural networks by a factor of 2. Based on the gradient descent method, it is very easy to develop a learning algorithm for the DTNN in a way similar to the backpropagation learning algorithm of Rumelhart et al. (1986) for conventional sigmoidal networks. In the future, research into the following two topics concerning DTNNs is suggested:
• the capabilities of more complicated architectures, such as non-feedforward networks, networks with more layers, or networks with Dynamic Threshold Quadratic Sigmoidal neurons in hidden layers;
• the design of efficient learning algorithms and practical applications for DTNNs.
References

E.B. Baum and D. Haussler (1989). What size net gives valid generalization? Neural Comput. 1, 151-160.
C.C. Chiang and H.C. Fu (1992). A variant of second-order multilayer perceptron and its application to function approximations. In: Proc. IJCNN '92, Baltimore, MD, III:887-III:892.
S.C. Huang and Y.F. Huang (1991). Bounds on the number of hidden neurons in multilayer perceptrons. IEEE Trans. Neural Networks 2 (1), 47-55.
N.J. Nilsson (1965). Learning Machines: Foundations of Trainable Pattern-Classifying Systems. McGraw-Hill, New York.
D.E. Rumelhart, J.L. McClelland and the PDP Research Group (1986). Parallel Distributed Processing (PDP): Explorations in the Microstructure of Cognition (Vol. 1). MIT Press, Cambridge, MA.
E.D. Sontag (1990). On the recognition capabilities of feedforward nets. Tech. Report SYCON 90-03, SYCON-Rutgers Center for Systems and Control, Department of Mathematics, Rutgers University, New Brunswick, NJ.