Neurocomputing 34 (2000) 169–193
On quantization error of self-organizing map network

Yi Sun*
Department of Electrical Engineering, City College of City University of New York, New York, NY 10031, USA

Received 17 August 1999; accepted 12 April 2000
Abstract

In this paper, we analyze how the neighborhood size and the number of weights in the self-organizing map (SOM) affect the quantization error. A sequence of i.i.d. one-dimensional random variables with uniform distribution is considered as the input of the SOM. First, we obtain the linear equation that an equilibrium state of the SOM satisfies for any neighborhood size and number of weights. We then show that the SOM converges to the unique minimum point of the quantization error if and only if the neighborhood size is one, the smallest. If the neighborhood size increases with the number of weights at the same ratio, the asymptotic quantization error does not converge to zero and the asymptotic distribution of weights differs from the distribution of the input samples. This suggests that, in order to achieve a small quantization error and a good approximation of the input distribution, a small neighborhood size must be used. Weight distributions obtained by numerical evaluation confirm the result. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Self-organizing map; Quantization error; Convergence; Minimum point
1. Introduction

Kohonen's self-organizing map (SOM) [11] is one of the most important learning algorithms in neural networks. The SOM has two notable properties. One is that order appears in the mapping space, usually a one- or two-dimensional space used for visualization. The other is that a finite number of weights approximates the probability distribution of the input samples. Both properties are stated as propositions in [14], and a large number of experiments verify them.
* Corresponding author. Tel.: +1-212-650-6621; fax: +1-212-650-8249. E-mail address: [email protected] (Yi Sun).
Previous theoretical analyses of the SOM focused on characterizations of convergence, the appearance of order in the mapping space, the asymptotic weight distribution as the number of weights approaches infinity, and the learning rate [1–3,5,6,12,16,17,20–22,24,25]. Lo et al. [17] proved that for a one-dimensional random variable with uniform distribution and neighborhood size two, the order of the mapping appears and the weights converge to the unique equilibrium state with probability one. The equilibrium state is the unique solution of a linear equation that depends on the number of weights. For the one-dimensional case with neighborhood size two, Bouton and Pagès [1] proved almost-sure ordering and convergence for extended, nonuniformly distributed stimuli. They further presented a study of convergence in distribution [2]. Yin and Allinson [25] proved the mean-square convergence of the SOM in a space of any dimension by exploiting the central limit theorem. Erwin et al. [6] showed convergence and ordering for various neighborhood functions. Budinich and Taylor [3] obtained conditions for convergence to ordered states. Regarding weight density, Ritter [20] showed that in the one-dimensional case the asymptotic weight density, as the number of weights approaches infinity, is proportional to a power of the density function of the input samples, where the exponent depends on the neighborhood size.

An important application of the SOM algorithm is vector quantization, where a set of input data points is approximated by a few weights. The mean square error (MSE) [7,15] is usually used to measure the quantization error: the smaller the quantization error, the better the quantization. The number of weights and the neighborhood size determine the performance of the SOM, and therefore determine the quantization error. Intuitively, the quantization error decreases with an increasing number of weights. However, it is unknown how the neighborhood size affects the quantization error. It is desirable to establish a relationship between the quantization error on the one hand and the number of weights and the neighborhood size on the other.

As outlined in [23], in this paper we present such a study. A sequence of i.i.d. one-dimensional random variables with uniform distribution is considered as the input of the SOM. The general linear equation that an equilibrium state satisfies for any neighborhood size is presented. The properties of this linear equation are analyzed, and an upper and a lower bound of the quantization error are then obtained. It is shown that the quantization error grows with the third power of the neighborhood size. If the neighborhood size increases with the number of weights at the same rate, the asymptotic quantization error does not converge to zero and the asymptotic distribution of weights differs from the (uniform) distribution of the input samples. This strongly suggests using a small neighborhood size in order to reduce the quantization error and obtain a good approximation of the input distribution. Numerical evaluations of the weight distribution are presented to verify the analytical results.
2. Equilibrium state of weights and minimum point of quantization error

Let {x(n)} be a sequence of i.i.d. random variables with uniform probability distribution on [a, b]. The probability density function is f(x) = 1/(b − a) if x ∈ [a, b]
and f (x)"0 if x,[a, b]. Consider the discrete-time SOM that has k weights with +x(n), of its input. In one dimensional case, the updating equation of weights has the following form:
m (n)#a(n)[x(n)!m (n)], i3N (n), G J m (n#1)" G i"1,2, k, G m (n), i,N (n), G J
(1)
where "x(n)!m (n)""min "x(n)!m (n)" and the neighborhood of weight l is J XHXI H de"ned as N (n)"N "+ j"" j!l "(c,. J J
(2)
Here {α(n)} is a learning-rate sequence, c denotes the neighborhood size, and k denotes the number of weights. In general, the neighborhood has a bell shape and changes with time. In this paper, as in (2), we consider only a flat shape and a fixed size.
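As an illustration of the update rule (1)–(2), the following Python sketch runs the one-dimensional SOM with a flat neighborhood of fixed size c on i.i.d. uniform inputs. It is only a minimal sketch: the learning-rate schedule α(n) = 1/(n + 2), the random initialization, and the function name `som_1d` are illustrative choices of this sketch, not prescribed by the paper; they merely satisfy the assumptions stated in Section 2.1 below.

```python
import numpy as np

def som_1d(k=16, c=2, a=0.0, b=1.0, n_steps=200_000, seed=0):
    """Sketch of the 1-D SOM update (1)-(2) with a flat neighborhood of size c.

    The schedule alpha(n) = 1/(n+2) is one choice satisfying conditions
    (i)-(iii) of Section 2.1; it is not the only possible one.
    """
    rng = np.random.default_rng(seed)
    m = np.sort(rng.uniform(a, b, size=k))        # ordered initial weights in [a, b]
    idx = np.arange(k)
    for n in range(n_steps):
        x = rng.uniform(a, b)                     # i.i.d. uniform input sample
        l = np.argmin(np.abs(x - m))              # winning weight index
        neighborhood = np.abs(idx - l) < c        # flat neighborhood N_l = {j : |j - l| < c}
        alpha = 1.0 / (n + 2)                     # positive, divergent sum, convergent sum of squares
        m[neighborhood] += alpha * (x - m[neighborhood])
    return m

if __name__ == "__main__":
    print(som_1d(k=8, c=1))
```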
@min +(x!m ), XGXI G e"E min +(x!m ), " dx, (3) G b!a ? XGXI where m is the ith weight in the equilibrium state. To express e as a function of c and k, G the weights of equilibrium state must be expressed by any c and k. 2.1. Equilibrium state Assume that the learning rate sequence +a(n), satis"es conditions: (i) a(n)'0, ∀n; (ii) a(n)"R; and (iii) a(n)(R. Assume further that at initial the weights LJ L are ordered in the form: a4m (0)(m (0)(2(m (0)4b. The order will remain I at any future update [17]. All the assumptions are the same as those made in [17] except that in this paper the neighborhood size c is not limited to be equal to two. Lo et al. proved in [17] (also see [1] by Bouton and Page`s) that when c"2, under the above assumptions the weights converge with probability one to the unique equilibrium state, which is the solution of the following equation:
$$
\begin{bmatrix}
4 & -1 & -1 & 0 & \cdots & \cdots & 0\\
0 & 4 & -1 & -1 & 0 & \cdots & 0\\
-1 & -1 & 4 & -1 & -1 & \ddots & \vdots\\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
0 & \cdots & -1 & -1 & 4 & -1 & -1\\
0 & \cdots & 0 & -1 & -1 & 4 & 0\\
0 & \cdots & \cdots & 0 & -1 & -1 & 4
\end{bmatrix}
\begin{bmatrix} m_1\\ m_2\\ m_3\\ \vdots\\ m_{k-2}\\ m_{k-1}\\ m_k \end{bmatrix}
=
\begin{bmatrix} 2a\\ 2a\\ 0\\ \vdots\\ 0\\ 2b\\ 2b \end{bmatrix}. \tag{4}
$$
The result is extendible to any neighborhood size, as given by the following proposition.

Proposition 1. Under the above assumptions, evolving with Eq. (1), the weights converge with probability one to the unique equilibrium state, which is the solution of the equation

$$
A\,\mathbf{m} = \mathbf{h}, \tag{5}
$$

where m_i (the ith element of m) is the ith weight, and the vector h ∈ R^k and the matrix A ∈ R^{k×k} are defined as follows.

(i) If k > 2c − 1,

$$
\mathbf{h} = (2a, \dots, 2a, 0, \dots, 0, 2b, \dots, 2b)^{\mathrm T}, \tag{6}
$$

where the first c elements of h are equal to 2a and the last c elements are equal to 2b. For c = 1 the nonzero elements of A in (5) are

$$
\begin{aligned}
& a_{11} = 3, \quad a_{12} = -1, \\
& a_{i,i-1} = -1, \quad a_{ii} = 2, \quad a_{i,i+1} = -1, \qquad i = 2, \dots, k-1, \\
& a_{k,k-1} = -1, \quad a_{kk} = 3;
\end{aligned} \tag{7}
$$

and for c > 1 the nonzero elements of A are

$$
\begin{aligned}
& a_{ii} = 4, \quad a_{i,i+c-1} = -1, \quad a_{i,i+c} = -1, && i = 1, \dots, c, \\
& a_{i,i-c} = -1, \quad a_{i,i-c+1} = -1, \quad a_{ii} = 4, \quad a_{i,i+c-1} = -1, \quad a_{i,i+c} = -1, && i = c+1, \dots, k-c, \\
& a_{i,i-c} = -1, \quad a_{i,i-c+1} = -1, \quad a_{ii} = 4, && i = k-c+1, \dots, k,
\end{aligned} \tag{8}
$$

and all other elements of A for c ≥ 1 are zeros.

(ii) If k ≤ 2c − 1 and c ≠ k,

$$
\mathbf{h} = [2a, \dots, 2a, 2(a+b), \dots, 2(a+b), 2b, \dots, 2b]^{\mathrm T},
$$

in which the first k − c elements are equal to 2a, the middle 2c − k elements are equal to 2(a + b), and the last k − c elements are equal to 2b; the nonzero elements of A are

$$
\begin{aligned}
& a_{ii} = 4, \quad a_{i,i+c-1} = -1, \quad a_{i,i+c} = -1, && i = 1, \dots, k-c, \\
& a_{ii} = 4, && i = k-c+1, \dots, c, \\
& a_{i,i-c} = -1, \quad a_{i,i-c+1} = -1, \quad a_{ii} = 4, && i = c+1, \dots, k,
\end{aligned}
$$

and all other elements of A are zeros. If c = k, then h = [2(a+b), …, 2(a+b)]^T and A = diag(4, …, 4).

Proposition 1 includes the result given by Lo et al. [17] as an instance for c = 2. To prove Proposition 1, the method summarized in Appendix B of [8] (which was originally proposed in [18,9]) is applied. A proof is presented in the appendix.
Proposition 1 shows that the neighborhood size and the number of weights determine the distribution of weights in an equilibrium state, and therefore determine the quantization error of the SOM. Analyzing the solution of (5) and the corresponding quantization error with respect to c and k can lead to an understanding of some dynamic properties of the SOM. In what follows, we analyze the quantization error based on an analysis of the solution of (5). It is clear that if k ≤ 2c − 1, the middle 2c − k weights in an equilibrium state are all located at (a + b)/2. In the following discussion we exclude this trivial case and assume k > 2c − 1.

First, the following lemma is proved in the appendix.

Lemma 1. A is positive definite.

In the particular case c = 2, Lemma 1 is proved in [17,1] and [12]; Lemma 1, however, covers all neighborhood sizes. The following corollary follows from Lemma 1, because if A is positive definite, then A is nonsingular, and therefore the solution of (5) is unique.

Corollary 1. The solution of Eq. (5) is unique.

2.2. Minimum point of quantization error

Following the well-known Lloyd–Max optimal quantization conditions [7], we obtain immediately that if the support [a, b] is partitioned into k equal intervals and the weights are located at the means of these intervals, respectively, the quantization error achieves its unique minimum. We use a k-dimensional vector m* to denote this minimum point,

$$
\mathbf{m}^* = \Big[\, a + \frac{b-a}{2k},\; a + \frac{3(b-a)}{2k},\; \dots,\; a + \frac{(2k-1)(b-a)}{2k} \,\Big]^{\mathrm T}. \tag{9}
$$

The corresponding minimum quantization error is

$$
e_{\min} = \frac{(b-a)^2}{12k^2}. \tag{10}
$$
This minimum quantization error is the lower bound for any quantization algorithm.
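For reference, the minimum point (9) and the minimum error (10) are easy to check numerically. The sketch below evaluates the quantization error of any sorted weight vector in closed form (the expression used appears later as Eq. (11) in Section 3.1); the helper names `m_star` and `quantization_error` are assumptions of this sketch, not notation from the paper.

```python
import numpy as np

def m_star(k, a=0.0, b=1.0):
    """Minimum point of the quantization error, Eq. (9): centroids of k equal subintervals."""
    return a + (2 * np.arange(1, k + 1) - 1) * (b - a) / (2 * k)

def quantization_error(m, a=0.0, b=1.0):
    """Quantization error of sorted weights in [a, b] under uniform input (closed form of Eq. (11))."""
    m = np.sort(np.asarray(m, dtype=float))
    gaps = np.diff(m)
    return (4 * (m[0] - a) ** 3 + np.sum(gaps ** 3) + 4 * (b - m[-1]) ** 3) / (12 * (b - a))

if __name__ == "__main__":
    k, a, b = 16, 0.0, 1.0
    print(quantization_error(m_star(k, a, b), a, b))   # equals (b - a)**2 / (12 * k**2), Eq. (10)
```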
3. Properties of equilibrium state and quantization error

In what follows, we analyze the condition under which the SOM converges to the minimum point, present several useful properties of the equilibrium state, and then analyze how k and c affect the quantization error.

3.1. Properties of equilibrium state

Without solving Eq. (5) directly, we can obtain the following properties of its solution.
Lemma 2. If c"1, the equilibrium state of weights in the SOM is located at the minimum point of the quantization error, that is m"mH. Proof. For m"[m , m ,2, m ]23RI. Let s "a, s "b, and s "(m #m ), I I G G G> i"1, 2,2, k!1. e can be expressed as
I 1 I QG (x!m ) G dx" e" [(s !m )!(s !m )] G G G\ G b!a 3(b!a) G\ G Q G
1 I\ " 4(m !a)# (m !m )#4(b!m ) . G> G I 12(b!a) G
(11)
Let *E/*m "0, i"1, 2,2, k, which yields G 4(m !a)!(m !m )"0, (m !m )!(m !m )"0, i"2,2, k!1, G G\ G> G (m !m )!4(b!m )"0. I I\ I
(12)
It follows from a4m (m (,2, (m 4b that the unique minimum point of I e satis"es 3m !m "2a, !m #2m !m "0, i"2,2, k!1, G\ G G> !m #3m "2b, I\ I
(13)
which is the special case of Eq. (5) for c"1. 䊐 Lemma 2 shows that when neighborhood size is the smallest, the case when only the winning weight is contained in weight neighborhood and is adjusted at each updating step, weights converge to the unique minimum point with probability one. In order to obtain other properties, we de"ne a subset X in space RI. De5nition 1. Given the above matrix A, de"ne a subset X in RI as X"+x"x3RI,(Ax) "0, j"c#1,2, k!c,, where (Ax)j denotes the jth element of H k-dimensional vector Ax. The following lemma describes a property of any vector in subset X. Lemma 3. For ∀c51, the following inequalities hold: ""Ax"" 4""x"" ""Ax"" ,
∀x3X,
"x ". where ""x"" denotes the R-norm of x, ""x"" "max XGXI G
(14)
Proof. For any x ∈ X, let |(Ax)_j| = ‖Ax‖∞ = max_{1≤i≤k} |(Ax)_i| and |x_r| = ‖x‖∞ = max_{1≤i≤k} |x_i|. For c > 1 we have

$$
\begin{aligned}
|(A\mathbf{x})_i| &= |4x_i - x_{i+c-1} - x_{i+c}|, && i = 1, \dots, c, \\
|(A\mathbf{x})_i| &= |-x_{i-c} - x_{i-c+1} + 4x_i - x_{i+c-1} - x_{i+c}|, && i = c+1, \dots, k-c, \\
|(A\mathbf{x})_i| &= |-x_{i-c} - x_{i-c+1} + 4x_i|, && i = k-c+1, \dots, k.
\end{aligned} \tag{15}
$$

(i) If j ∈ {c+1, …, k−c}, then by the definition of X, |(Ax)_i| ≤ |(Ax)_j| = 0 for i = 1, …, k, that is, Ax = 0. Since A is nonsingular by Lemma 1, x must be the zero vector, and therefore (14) is true.

(ii) Assume j ∈ {1, …, c, k−c+1, …, k}.

(a) If r ∈ {1, …, c}, then ‖Ax‖∞ = |(Ax)_j| ≥ |(Ax)_r| = |4x_r − x_{r+c−1} − x_{r+c}| ≥ 2|x_r| = 2‖x‖∞. Meanwhile, ‖Ax‖∞ = |(Ax)_j| = |4x_j − x_{j+c−1} − x_{j+c}| ≤ 6|x_r| = 6‖x‖∞ if j ∈ {1, …, c}, or ‖Ax‖∞ = |(Ax)_j| = |−x_{j−c} − x_{j−c+1} + 4x_j| ≤ 6|x_r| = 6‖x‖∞ if j ∈ {k−c+1, …, k}.

(b) If r ∈ {k−c+1, …, k}, then ‖Ax‖∞ = |(Ax)_j| ≥ |(Ax)_r| = |−x_{r−c} − x_{r−c+1} + 4x_r| ≥ 2|x_r| = 2‖x‖∞. Meanwhile, ‖Ax‖∞ = |(Ax)_j| ≤ 6|x_r| = 6‖x‖∞ for the same reason as in (a).

(c) If r ∈ {c+1, …, k−c}, then by the definition of X,

$$
-x_{r-c} - x_{r-c+1} + 4x_r - x_{r+c-1} - x_{r+c} = 0. \tag{16}
$$

Since x_r ≠ 0 (otherwise x = 0 and (14) is true), x_{r−c}/x_r + x_{r−c+1}/x_r + x_{r+c−1}/x_r + x_{r+c}/x_r = 4. Since each of the four ratios lies between −1 and 1, we must have x_{r−c} = x_{r−c+1} = x_{r+c−1} = x_{r+c} = x_r. For the same reason, if r−c ∈ {c+1, …, k−c}, then x_{r−2c} = x_{r−2c+1} = x_{r−1} = x_r = x_{r−c}; if r−c+1 ∈ {c+1, …, k−c}, then x_{r−2c+1} = x_{r−2c+2} = x_r = x_{r+1} = x_{r−c+1}; if r+c−1 ∈ {c+1, …, k−c}, then x_{r−1} = x_r = x_{r+2c−2} = x_{r+2c−1} = x_{r+c−1}; and if r+c ∈ {c+1, …, k−c}, then x_r = x_{r+1} = x_{r+2c−1} = x_{r+2c} = x_{r+c}. By repeating this procedure, we obtain x_i = x_r for all i ∈ {1, 2, …, k}, and therefore ‖Ax‖∞ = |(Ax)_j| = |4x_j − x_{j+c−1} − x_{j+c}| = 2|x_r| = 2‖x‖∞ if j ∈ {1, …, c}, or ‖Ax‖∞ = |(Ax)_j| = |−x_{j−c} − x_{j−c+1} + 4x_j| = 2|x_r| = 2‖x‖∞ if j ∈ {k−c+1, …, k}.

Combining (i) and (ii), (1/6)‖Ax‖∞ ≤ ‖x‖∞ ≤ (1/2)‖Ax‖∞ for all x ∈ X. Similarly, it can be proved that (1/6)‖Ax‖∞ ≤ ‖x‖∞ ≤ (1/2)‖Ax‖∞ for all x ∈ X when c = 1. Hence, (14) is true for c ≥ 1. □

We obtain the following lemma in terms of Lemma 3.

Lemma 4. For c ≥ 1, m − m* is a vector in X and satisfies

$$
A(\mathbf{m} - \mathbf{m}^*) = \mathbf{p}, \tag{17}
$$

where p = (2(b−a)/k)·[(c−1), (c−2), …, 1, 0, …, 0, −1, …, −(c−2), −(c−1)]^T, and J(m − m*) = −(m − m*), where J denotes the k × k reversal (exchange) matrix, and

$$
\|A(\mathbf{m} - \mathbf{m}^*)\|_\infty = \frac{2(b-a)(c-1)}{k}. \tag{18}
$$
Proof. For c'1, if j"c#1,2, k!c, by noticing (5), (8) and (9) we have (Am) "h "0, H H
(19)
and (AmH) "a mH #a mH #a mH#a mH #a mH H HH\A H\A HH\A> H\A> HH H HH>A\ H>A\ HH>A H>A
[2( j!c)!1](b!a) [2( j!c#1)!1](b!a) ! a# "! a# 2k 2k
(2j!1)(b!a) #4 a# 2k
Yi Sun / Neurocomputing 34 (2000) 169}193
177
[2( j#c!1)!1](b!a) [2( j#c)!1](b!a) ! a# ! a# 2k 2k
(20)
"0.
So (A(m!mH)) "0 for j"c#1,2, k!c and m!mH3X according to the de"niH tion of X. If c"1, then m"mH due to Lemma 2, so A(m!mH)"0, and therefore m!mH3X. Hence, m!mH3X for c51. For c'1, if j"1,2, c, (AmH) "a mH#a mH #a mH H HH H HH>A\ H>A\ HH>A H>A (2j!1)(b!a) [2( j#c!1)!1](b!a) "4 a# ! a# 2k 2k
[2( j#c)!1](b!a) ! a# 2k
2(b!a)( j!c) "2a# k and so 2(b!a)( j!c) (A(m!mH)) "(Am) !(AmH) "2a!(AmH) "! . H H H H k
(21)
Similarly, if j"k!c#1,2, k, (AmH) "a mH #a mH #a mH HH H H HH\A H\A HH\A> H\A> [2( j!c)!1](b!a) [2( j!c#1)!1](b!a) "! a# ! a# 2k 2k
#4
a#(2j!1)(b!a) 2k
2(b!a)( j#c!k!1) "2b# k and so 2(b!a)( j#c!k!1) . (A(m!mH)) "(Am) !(AmH) "2b!(AmH) "! H H H H k (22) If c"1, p"0, which corresponds to m"mH. In summary, A(m!mH)"p for c51. Since p"!pH and A is equal to its upper-down and left}right #ips, A(m!mH)%"!p"!A(m!mH). Because A is nonsingular, it is necessary that m!mH"!(m!mH)%.
178
Yi Sun / Neurocomputing 34 (2000) 169}193
Obviously, 2(b!a)(c!1) ""A(m!mH)"" ""p """p "" . I k This completes the proof. 䊐 By applying Lemmas 3 and 4, we obtain the following theorem. Theorem 1. The weights in the equilibrium state satisfy (b!a)(c!1) (b!a)(c!1) 4""m!mH"" 4 . 3k k
(23)
Corollary 2. If and only if c"1, updating Eq. (1) converges to the minimum point of quantization error with probability one. For proof of Corollary 2, Lemma 2 gives the su$ciency and Theorem 1 ensures the necessity. The minimum point mH of e changes with the number of weights. An equilibrium state of weights (i.e., the solution of (5)) changes with the number of weights and neighborhood size. Theorem 1 expresses how the distance between the equilibrium state and the minimum point changes with the number of weights and neighborhood size. It is clear that if c is "xed, lim ""m!mH"" "0 at the converI gence rate O(1/k). The weights converge to the minimum point of quantization error when the number of weights approaches in"nity. However, if k"(c!1)/k is "xed and k and c!1 increase at the same rate, m does not converge to mH because of ""m!mH"" 5(b!a)k/3. Fig. 1 shows numerical evaluation of ""m!mH"" and its lower and upper bounds given by (23) with k"255, a"0 and b"1. Theorem 1 describes the activities of all k weights when the number of weights increases and c is "xed. In [17], Lo et al. analyzed in the case c"2 the convergence property of the two weights m and m near the boundaries of support [a, b] as I k increases. They concluded that m and m converge to a and b, respectively as k goes I to in"nity. However, they did not answer at what convergence rate m and m converge to a and b, respectively. We give the following lemma that not only I answers this question for all c51, but also shows how the neighborhood size in#uences m and m . I Lemma 5. Weights m and m in the equilibrium state satisfy I (b!a)(31c#21) (b!a)(2c!1) 4m !a"b!m 4 . I 2k 112k Proof. We "rst prove the second inequality. Due to (9) and (23), m !a"m !mH#mH!a
(24)
Fig. 1. Distance ""m!mH"" and its lower and upper bounds with k"255, a"0, and b"1.
Proof. We first prove the second inequality. By (9) and (23),

$$
m_1 - a = m_1 - m^*_1 + m^*_1 - a = m_1 - m^*_1 + \frac{b-a}{2k} \le \frac{(b-a)(c-1)}{k} + \frac{b-a}{2k} = \frac{(b-a)(2c-1)}{2k}
$$

and

$$
b - m_k = b - m^*_k + m^*_k - m_k = \frac{b-a}{2k} + m^*_k - m_k \le \frac{b-a}{2k} + \frac{(b-a)(c-1)}{k} = \frac{(b-a)(2c-1)}{2k}.
$$

In terms of Lemma 4, m_1 − m*_1 = m*_k − m_k, and so m_1 − a = b − m_k.

Now we prove the first inequality of (24). Denote Δm_i = m_i − m*_i for i = 1, …, k. For c = 1 the inequality is true because of (9). For c > 1, by means of Lemma 4,

$$
\Delta m_1 = \frac{(b-a)(c-1)}{2k} + \tfrac14(\Delta m_c + \Delta m_{c+1}), \quad
\Delta m_c = \tfrac14(\Delta m_{2c-1} + \Delta m_{2c}), \quad
\Delta m_{c+1} = \tfrac14(\Delta m_1 + \Delta m_2 + \Delta m_{2c} + \Delta m_{2c+1});
$$

hence, by means of Theorem 1,

$$
\Delta m_1 \ge \frac{(b-a)(c-1)}{2k} + \tfrac{1}{16}(\Delta m_1 + \Delta m_2) - \tfrac14 \|\mathbf{m} - \mathbf{m}^*\|_\infty, \tag{25}
$$

and we have

$$
\Delta m_2 = \frac{(b-a)(c-2)}{2k} + \tfrac14(\Delta m_{c+1} + \Delta m_{c+2})
\ge \frac{(b-a)(c-2)}{2k} + \tfrac{1}{16}(\Delta m_1 + \Delta m_2) - \tfrac38 \|\mathbf{m} - \mathbf{m}^*\|_\infty. \tag{26}
$$

Adding (26) to (25), we obtain

$$
\Delta m_1 + \Delta m_2 \ge \frac{4(b-a)(2c-3)}{7k} - \tfrac57 \|\mathbf{m} - \mathbf{m}^*\|_\infty. \tag{27}
$$

Replacing Δm_1 + Δm_2 in (25) by (27) and using (23),

$$
\Delta m_1 \ge \frac{(b-a)(16c-17)}{28k} - \tfrac{33}{112}\|\mathbf{m} - \mathbf{m}^*\|_\infty \ge \frac{(b-a)(31c-35)}{112k}. \tag{28}
$$

Hence,

$$
m_1 - a = \Delta m_1 + \frac{b-a}{2k} \ge \frac{(b-a)(31c+21)}{112k}. \qquad \Box
$$
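A quick numerical check of Theorem 1 and Lemma 5 can be made by solving (5) and comparing ‖m − m*‖∞ and m_1 − a against the bounds (23) and (24). The sketch below assumes the `equilibrium` and `m_star` helpers from the earlier sketches are available, here collected in a hypothetical local module `som_eq`.

```python
import numpy as np
from som_eq import equilibrium, m_star   # hypothetical module holding the earlier sketches

k, a, b = 255, 0.0, 1.0
for c in (1, 2, 4, 8, 16, 32, 64):
    m = equilibrium(k, c, a, b)
    dist = np.max(np.abs(m - m_star(k, a, b)))
    ok23 = (b - a) * (c - 1) / (3 * k) <= dist <= (b - a) * (c - 1) / k + 1e-12            # Theorem 1
    ok24 = ((b - a) * (31 * c + 21) / (112 * k)
            <= m[0] - a <= (b - a) * (2 * c - 1) / (2 * k) + 1e-12)                        # Lemma 5
    print(c, ok23, ok24, f"{dist:.4e}")
```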
According to Lemma 5, m_1 and m_k converge to a and b, respectively, at the rate O(1/k) as k approaches infinity. Note that m_1 and m_k do not converge to a and b, respectively, if k and c increase at the same rate.

3.2. Properties of quantization error

Based on the above analysis, we now derive an upper and a lower bound of the quantization error. An upper bound is given below.

Theorem 2. In an equilibrium state, the quantization error is upper bounded by

$$
e \le e_{\min} + \varepsilon(c), \tag{29}
$$

where

$$
\varepsilon(c) = \frac{(b-a)^2 (c-1)}{6k^2}\,\big[\,4(c-1)^2 + 6(c-1) + 3\,\big], \tag{30}
$$

and the equality holds if and only if c = 1.
The proof of Theorem 2 is presented in the appendix. The upper bound in Theorem 2 is loose; however, it is tight enough to observe the behavior of the quantization error as k approaches infinity.

Corollary 3. If c is fixed, lim_{k→∞} e = 0 at the order O(1/k²) of convergence.

We obtain a lower bound of the quantization error as follows.

Theorem 3. In the equilibrium state, the quantization error is lower bounded by

$$
e > \frac{(b-a)^2 (k-1)^2}{12k^4} + \frac{(b-a)^2}{2}\Big(\frac{31c+21}{112k}\Big)^3. \tag{31}
$$
Proof. In the following proof, we denote d"(b!a)/k and *m "m !mH for G G G i"1,2, k. We note that due to the convexity of function f (x)"x, I>a x5( I>a x ) where 04a 41 and I>a "1. Let a "(m !a)/(b!a), G G G G G G G G G a "(m !m )/(b!a) for i"2,2, k, and a "(b!m )/(b!a), and G G G\ I> I x "m !a, x "m !m for i"2,2, k and x "b!m . In terms of (11), the G G G\ I> I following terms in e satisfy I\ (m !a)# (m !m )#(b!m ) G> G I G (m !a) I\ (m !m ) (b!m ) G> G # I 5(b!a) # b!a b!a b!a G 1 I\ " (m !mH#mH!a)# (m !mH #mH !mH#mH!m ) G> G> G> G G G b!a G
#(b!mH#mH!m )) I I I 1 d I\ d # (*m #d!*m )# !*m " *m # G> G 2 I b!a 2 G 1 d d I\ " #2*m (*m )# # ((*m )#d#(*m ) 2 G> G b!a 2 G d d #2*m d!2*m *m !2d*m )# #(*m )!2 *m . (32) G> G> G G I I 2 2
Since I\(2*m d!2d*m )"2*m d!2d*m , (32) becomes G G> G I I\ (m !a)# (m !m )#(b!m ) G> G I G 1 d I\ d 5 *m ! # ((*m !*m )#d)# #*m 2 G> G I 2 b!a G
182
Yi Sun / Neurocomputing 34 (2000) 169}193
(k!1) (b!a)(k!1) ' d" . b!a k
(33)
Hence, it follows from (33) and Lemma 5 that I\ 1 e" [4(m !a)# (m !m )#4(b!m )] G> G I 12(b!a) G (b!a)(k!1) (m !a)#(b!m ) I ' # 12k 4(b!a)
(b!a)(k!1) (b!a) 31c#21 5 # . 12k 2 112k
䊐
(34)
By means of Theorems 2 and 3, the following corollary is obtained.

Corollary 4. If the neighborhood size and the number of weights approach infinity with c = O(k^α), 0 ≤ α ≤ 1, then

$$
\lim_{k\to\infty} e \;\begin{cases} = 0, & \alpha < 2/3, \\[4pt] \displaystyle \ge \frac{(b-a)^2}{2}\Big(\frac{31\kappa}{112}\Big)^3, & \alpha = 1,\ \kappa = c/k. \end{cases} \tag{35}
$$
The upper and lower bounds of the quantization error decrease with an increasing number of weights and a decreasing neighborhood size. However, if the number of weights and the neighborhood size increase at the same rate, the bounds (in particular the lower bound) of the quantization error do not converge to 0. Meanwhile, the following corollary follows from Theorems 2 and 3.

Corollary 5. If k is fixed, the quantization error increases with increasing c at the order O(c³).

Shown in Fig. 2, with k = 255, a = 0 and b = 1, are the minimum quantization error given by (10), the quantization error e = (b − a)²/12 of a single weight (which is achieved by k weights in the case c = k, when all k weights are located at (a + b)/2 due to Proposition 1), the quantization error of the SOM for c = 1, …, 127, and its upper and lower bounds given by (29) and (31), respectively. The quantization error and its upper and lower bounds show the same slope for large c, which implies that the bounds give a good estimate of the exponent of c in e. Theorems 2 and 3 strongly suggest that in order to achieve a small quantization error in the SOM, the neighborhood size must be small.
Fig. 2. Quantization errors with k"255, a"0, and b"1. The slops of bounds implies that the quantization error e increases with power of neighborhood size.
3.3. Asymptotic distribution of weights in the equilibrium state

As the number of weights goes to infinity, the weights in the equilibrium state follow some distribution. It is interesting to see under what condition the weights have the same distribution as the input samples. The weights are said to have the same distribution as the input samples if

$$
\lim_{k\to\infty} e = \lim_{k\to\infty} E\Big[\min_{1\le i\le k}\{(x - m_i)^2\}\Big] = 0. \tag{36}
$$

Corollary 4 implies the following proposition.

Proposition 2. If the neighborhood size increases with c = O(k^α) for α < 2/3, the asymptotic distribution of the weights in the equilibrium state is uniform (the same as the distribution of the input samples). If c and k increase at the same rate, the distribution is nonuniform.

3.4. Pattern of weight distribution in the equilibrium state

In Section 3.2 we showed that the quantization error grows with the third power of the neighborhood size. In this subsection, via numerical solution, we demonstrate the visual pattern of the weight distribution with respect to the neighborhood size. Given the number of weights k, the neighborhood size c, and the support [a, b], we can numerically solve Eq. (5) and obtain the equilibrium state m to which the SOM converges. By changing the ratio of neighborhood size to number of weights, the pattern of the weight distribution can be shown. Fig. 3 shows an example of such a pattern for neighborhood size c = 1, …, k with number of weights k = 32 and support [a, b] = [0, 1]. Note that since Eq. (5) is linear, using the support [0, 1] in Fig. 3 does not affect the pattern.
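A Fig. 3-style sweep can be produced with the same solver. The sketch below prints the equilibrium weights for k = 32 and increasing c; it covers only the case k > 2c − 1 handled by the `equilibrium` sketch above (again assumed to live in the hypothetical module `som_eq`).

```python
import numpy as np
from som_eq import equilibrium   # hypothetical module with the earlier sketch

k, a, b = 32, 0.0, 1.0
for c in range(1, (k + 1) // 2):        # stay within the case k > 2c - 1 of Proposition 1
    m = np.round(equilibrium(k, c, a, b), 3)
    print(f"c={c:2d}", m)               # for c > 1 the weights visibly cluster away from uniform spacing
```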
Fig. 3. Weight distribution in the equilibrium state. The circles on one line denote the placement of a weight under different neighborhood sizes c = 1, 2, …, k, with k = 32, a = 0, and b = 1.
From Fig. 3 we can see that when c > 1, the weights are not uniformly distributed. In particular, when the ratio of neighborhood size to number of weights is larger than or equal to ten percent, clusters appear and severely disturb the weight distribution, which becomes very different from a uniform distribution. We examined many different numbers of weights; the weight distribution shows a pattern similar to that in Fig. 3. This confirms the above analytical result that when the neighborhood size is relatively large, the distribution of weights differs from the uniform distribution of the input samples. We note that for the uniform distribution of input samples, the power law [20] implies that the asymptotic distribution of weights in the equilibrium state is also uniform as k, or both k and c, approach infinity. However, our analytical and numerical results show a counterexample when both k and c increase at the same rate.
4. Conclusions

In this paper, we analyze the quantization error of the SOM with respect to the neighborhood size and the number of weights. A sequence of i.i.d. one-dimensional random variables with uniform distribution is considered as the input of the SOM. First, we present the general linear equation that the weights in an equilibrium state satisfy for any neighborhood size and number of weights. Based on this equation, the properties of the equilibrium state and of the quantization error are analyzed. It is shown that the SOM achieves the minimum quantization error if and only if the neighborhood size is one, the smallest. If the neighborhood size is larger than one, the equilibrium state is displaced from the minimum point. In terms of an upper and a lower bound of the quantization error, the quantization error grows with the third power of the
neighborhood size. If the neighborhood size and the number of weights increase at the same rate, the quantization error does not converge to its minimum, and the asymptotic distribution of weights is not the same (uniform) distribution as that of the input samples. The weight distributions obtained by numerical evaluation confirm the analytical results. In conclusion, in order to achieve a small quantization error and a good approximation of the input distribution, a small neighborhood size must be used.
5. For further reading

The following references are also of interest to the reader: [4,10,13,19,24,26].
Acknowledgements

The author would like to thank the reviewers for their careful review of this paper and constructive comments.
Appendix

Proof of Proposition 1. To prove Proposition 1, the method summarized in Appendix B of [8] is used. For k > 2c − 1, the required six conditions are checked as follows.

(1) As assumed, the learning-rate sequence {α(n)} satisfies three conditions: it is (i) positive, (ii) of infinite sum, and (iii) of finite sum of squares.

(2) Without loss of generality, we assume that α(n) ≤ 1. At any time n, assume a ≤ m_i(n) ≤ b for every i = 1, …, k. If x(n) ≥ m_i(n),

$$
m_i(n+1) = m_i(n) + \alpha(n)\,(x(n) - m_i(n)) \ge m_i(n) \ge a \tag{A.1}
$$

and

$$
m_i(n+1) = (1 - \alpha(n))\,(m_i(n) - x(n)) + x(n) \le x(n) \le b. \tag{A.2}
$$

If x(n) < m_i(n), we have

$$
m_i(n+1) = m_i(n) + \alpha(n)\,(x(n) - m_i(n)) < m_i(n) \le b \tag{A.3}
$$

and

$$
m_i(n+1) = (1 - \alpha(n))\,(m_i(n) - x(n)) + x(n) > x(n) \ge a. \tag{A.4}
$$

Since P(a ≤ x(0) ≤ b) = 1 and a ≤ m_i(0) ≤ b, by inequalities (A.1)–(A.4), P(a ≤ m_i(1) ≤ b) = 1. Assume a ≤ m_i(n) ≤ b at time n > 1; since P(a ≤ x(n) ≤ b) = 1, by (A.1)–(A.4), P(a ≤ m_i(n+1) ≤ b) = 1. Hence, P(a ≤ m_i(n) ≤ b) = 1 for every n ≥ 1, and the weights are bounded with probability one.
(3) For m(n)"[m (n), m (n),2, m (n)]23RI, according to (1) the updating function I of the ith weight is u (m(n), x(n))"x(n)!m (n), for i"1,2, k. (A.5) G G This updating function is continuously di!erentiable with respect to m and x, and its G derivatives are bounded in time. (4) In what follows, we obtain the statistical expectation of the updating function for c"1 and c'1, respectively. Let s (n)"a, s (n)"b, and s (n)"[m (n)#m (n)], I G G G> i"1,2, k!1. The assumption a4m (0)(m (0)(2(m (0)4b ensures that I a4m (n)(m (n)(2(m (n)4b for any n51 [17]: I (i) For c"1, let < (n)"[s (n), s (n)] for i"1,2, k. If x(n)3< (n) , the ith weight G G\ G G will be adjusted at time n#1. The nonzero part of conditioned probability density function of X"x(n) given event A "+x(n)3< (n), for i"1,2, k is G G P(X3dx"A ) P(X3dx5A ) P(X3dx) f (x(n)) G " G " f G (x(n)"A )" " . G 6 dx P(A )dx P(A )dx P(A ) G G G Since P(A )"[s (n)!s (n)]/(b!a), G G G\ 1 , x(n)3< (n), G f G (x(n)"A )" sG (n)!sG\ (n) i"1,2, k. G 6 0, elsewhere,
(A.6)
(A.7)
The statistical expectation of updating equation over X is s (n)#s (n) G !m (n) u (m(n))"E[X!m (n)"A ]"E[X"A ]!m (n)" G\ G G G G G G 2
1 (!3m (n)#m (n)#2a), i"1, 4 1 " (m (n)!2m (n)#m (n)), i"2,2, k!1, G G> 4 G\ 1 (m (n)!3m (n)#2b), I 4 I\
(A.8)
i"k.
In terms of (5), (6) and (7), (A.8) can be written in matrix form as u (m(n))"! (Am(n)!h), (A.9) where the ith element of u (m(n)) is u (m(n)). G (ii) For c'1, let < (n)"[s (n), s (n)] for i"1,2, c, G G>A\ < (n)"[s (n), s (n)] for i"c#1,2, k!c, and < (n)"[s (n), s (n)] for G G\A G>A\ G G\A I i"k!c#1,2, k. If x(n)3< (n), the ith weight will be adjusted at time n#1. The G nonzero part of conditioned probability density function of X"x(n) given event A "+x(n)3< (n), for i"1,2, k is G G P(X3dx"A ) P(X3dx5A ) P(Xdx) f (x(n)) G " G " f G (x(n)"A )" " . (A.10) 6 G dx P(A ) dx P(A ) dx P(A ) G G G
Yi Sun / Neurocomputing 34 (2000) 169}193
We have
187
1 (s (n)!s (n)), i"1,2, c, b!a G>A\ 1 P(A )" (s (n)!s (n)), i"c#1,2, k!c, G G\A b!a G>A\
which yields
1 (s (n)!s (n)), G\A b!a I
(A.11)
i"k!c#1,2, k,
1 , i"1,2, c, s (n)!s (n) G>A\ 1 f G (x(n)"A )" 6 G , i"c#1,2, k!c, (A.12) s (n)!s (n) G>A\ G\A 1 , i"k!c#1,2, k s (n)!s (n) I G\A for x(n)3< (n), and f G (x(n)"A )"0, elsewhere. The statistical expectation of the G 6 G updating equation over X is then u (m(n))"E[X!m (n)"A ]"E[X"A ]!m (n) G G G G G
1 (s (n)#s (n))!m (n), i"1,2, c, G 2 G>A\ 1 " (s (n)#s (n))!m (n), i"c#1,2, k!c, G\A G 2 G>A\
1 (s (n)#s (n))!m (n), G\A G 2 I
i"k!c#1,2, k,
1 (!4m (n)#m (n)#m (n)#2a), i"1,2, c, G G>A\ G>A 2 1 (n)!4m (n)#m (n)#m (n)), i"c#1,2, k!c, " (m (n)#m G\A> G G>A\ G>A 2 G\A 1 (m (n)#m (n)!4m (n)#2b), G\A> G 2 G\A
i"k!c#1,2, k. (A.13)
In terms of (5), (6) and (8), (A.13) can be written in matrix form as u (m(n))"!(Am(n)!h). (A.14) Regarding to (A.9) and (A.14), it is shown in condition (5) that the following limit exists: u (m)" lim u (m(n)), for c51. L
(A.15)
(5) For any c51, we de"ne a continuous time di!erential equation as dm(t) "!j(Am(t)!h), dt
(A.16)
where j" if c"1 and j" if c'1, and a4m (0)(m (0)(2(m (0)4b. In I (A.16), the nonzero elements of A's for c"1 and for c'1 are given by (7) and (8), respectively, and h is given by (6). We now assert that the di!erential equation (A.16) has a Liapunov function de"ned by (A.17)
<(t)"""Am(t)!h"". To validate this assertion, we show that <(t) satis"es two conditions: (i) d<(t)/dt(0 for all t, (ii) <(t) has a unique minimum point that is the solution of Eq. (5). In terms of (A.16), we obtain d<(t) dm(t) "( <(t))2 "(2A2Am(t)!2A2h)2[!j(Am(t)!h)] dt dt "!2j(Am(t)!h)2A(Am(t)!h)(0, ifAm(t)!hO0,
(A.18)
where we note that A is positive de"nite due to Lemma 1 to be seen later. From the de"nition of <(t) and the property that A is positive de"nite, it is clear that <(t)50 where <(t)"0 if and only if Am(t)"h. <(t) has a unique minimum point that is the solution of Eq. (5). This unique point can be reached when dm(t)/dt"0 as tPR, which implies that the ordinary di!erential equation (A.16) has a locally asymptotically stable solution, the solution of Eq. (5). (6) Let m denote the solution of the di!erential equation (A.16) with a basin of attraction X(m ). m is the unique attractor and is globally attractive in whole RI, i.e. X(m )"RI. Therefore, for updating Eq. (1), we can de"ne a compact subset of X(m ) as ""[a, b]I.
(A.19)
If we choose a(n)41 for all n and a4m (0)(m (0)(2(m (0)4b, according to I the result veri"ed in condition (2), m(n)3" with probability one for all n. Based on these six conditions, we may state that with updating Eq. (1), lim m(n)"m , with probability one. L In the same method, we can prove Proposition 1 for k42c!1.
(A.20) 䊐
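To illustrate the convergence claimed in (A.20), one can compare a long stochastic run of the update (1) with the solution of (5). The sketch below reuses the `som_1d` and `equilibrium` sketches from the main text (hypothetical module `som_eq`); the step count and seed are arbitrary choices, and the gap should only shrink gradually as the run is lengthened.

```python
import numpy as np
from som_eq import som_1d, equilibrium   # hypothetical module with the earlier sketches

k, c = 16, 3
m_sim = som_1d(k=k, c=c, n_steps=500_000, seed=1)   # stochastic SOM run, Eq. (1)
m_eq = equilibrium(k, c)                            # solution of Am = h, Eq. (5)
print(np.max(np.abs(m_sim - m_eq)))                 # gap should be small and decrease with longer runs
```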
Proof of Lemma 1. For c"1, we have I\ x2Ax"3x !x x # (!x x #2x!x x )!x x #3x G G\ G G G> I I\ I G "3x !x x #(!x x #2x !x x )#(!x x #2x !x x ) #(!x x #2x !x x )#2
Yi Sun / Neurocomputing 34 (2000) 169}193
189
#(!x x #2x !x x )#(!x x #2x !x x ) I\ I\ I\ I\ I\ I\ I\ I\ I\ I !x x #3x I I\ I "2x #(x !x )#(x !x )#(x !x )#2 (A.21) #(x !x )#(x !x )#2x'0,∀x3RI, xO0. I\ I\ I\ I I For c'1, let A"B#C, where B and C are lower triangle and upper triangle matrices, respectively, where the nonzero elements of B are de"ned as b "2, i"1,2, c, GG (A.22) b "!1, b "!1, b "2, i"c#1,2, k GG\A GG\A> GG and the other elements of B are zero; the nonzero elements of C are de"ned as c "2, c "!1, c "!1, i"1,2, k!c, GG GG>A\ GG>A c "2, i"k!c#1,2, k GG and the other elements of C are zero. For ∀x3RI, xO0, we have A I x2Bx"2 x# (!x x !x x #2x) G G G\A G G\A> G G GA> A I\A "2 x# (!x x !x x #2x ), G G G>A G> G>A G>A G G I\A I x2Cx" (2x!x x !x x )#2 x G G G>A\ G G>A G G GI\A> and then x2Ax"x2(B#C)x"x2Bx#x2Cx A I\A "2 x# (2x!x x !2x x !x x G G G G>A\ G G>A G> G>A G G I #2x )#2 x G>A G GI\A> A "2 x#(2x !x x !2x x !x x #2x ) G A >A >A >A G #(2x !x x !2x x !x x #2x ) >A >A >A >A #(2x !x x !2x x !x x #2x ) >A >A >A >A #(2x !x x !2x x !x x #2x )#2 >A >A >A >A
(A.23)
(A.24)
(A.25)
190
Yi Sun / Neurocomputing 34 (2000) 169}193
#(2x !x x !2x x !x x #2x ) I\A\ I\A\ I\ I\A\ I\ I\A I\ I\ I #(2x !x x !2x x !x x #2x)#2 x I\A I\A I\ I\A I I\A> I I G GI\A> A "2 x#x !x x #(x !x )#(x !x ) G A >A >A G #(x !x )#(x !x )#(x !x )#2 >A >A >A #(x !x )#(x !x )#(x !x ) I\A\ I\ I\A I\ I\A I I !x x #x#2 x I\A> I I G GI\A> A\ 1 3 1 "2 x# x # x# (x !x )#(x !x ) G A >A 2 2 A 2 G #(x !x )#(x !x )#(x !x ) >A >A >A #(x !x )#2#(x !x )#(x !x ) >A I\A\ I\ I\A I\ 1 !x ) #(x !x )# (x I I\A I 2 I\A> 3 1 I # x # x#2 x'0. G 2 I\A> 2 I GI\A> Hence, A is positive de"nite for c51. 䊐
(A.26)
Proof of Theorem 2. In the following proof, we denote d"(b!a)/k and g"(b!a)(c!1)/k. Eq. (11) yields 1 e" [4(m !mH#mH!a) 12(b!a) I\ # (m !mH #mH !mH#mH!m ) G> G> G> G G G G #4(b!mH#mH!m )] I I I 1 " 4(m !mH)#12(m !mH)(mH!a) 12(b!a) #12(m !mH)(mH!a)#4(mH!a) I\ # [(m !mH )#(mH !mH)#(mH!m ) G> G> G> G G G G #3(m !mH )(mH !mH)#3(m !mH )(mH!m ) G> G> G> G G> G> G G #3(mH !mH)(mH!m ) G> G G G
Yi Sun / Neurocomputing 34 (2000) 169}193
#3(m !mH )(mH !mH)#3(m !mH )(mH!m ) G> G> G> G G> G> G G #3(mH !mH)(mH!m ) G> G G G #6(m !mH )(mH !mH)(mH!m )] G> G> G> G G G #4(b!mH)#12(b!mH)(mH!m )#12(b!mH)(mH!m ) I I I I I I I #4(mH!m ). I I Note that after replacing m in (11) by mH, G G 1 I\ E " [4(mH!a)# (mH !mH)#4(b!mH)],
12(b!a) G> G I G due to (23) "m !mH"4g, i"1,2, k G G and from (9) d mH!a"b!mH" I 2
191
(A.27)
(A.28)
(A.29)
(A.30)
and mH !mH"d, i"1,2, k!1. G> G These imply 1 e4e # [4g#12g(d/2)#12g(d/2)
12(b!a) I\ # (g#g#3gd#3gg#3dg#3gd#3gg#3dg#6gdg) G #12(d/2)g#12(d/2)g#4g] 1 "e # [4g#6gd#3gd
12(b!a) I\ # (8g#12gd#6gd)#3gd#6gd#4g] G kg "e # (4g#6gd#3d)
6(b!a) (b!a)(c!1) "e # [4(c!1)#6(c!1)#3]
6k "e #e(c).
(A.31)
192
Yi Sun / Neurocomputing 34 (2000) 169}193
For c"1, (11) yields e"e because m "mH, i"1,2, k, due to Lemma 2. Since
G G the minimum point of e is unique, Corollary 2 guarantees that the equality of (29) holds if and only if c"1. 䊐
References

[1] C. Bouton, G. Pagès, Self-organization and a.s. convergence of the one-dimensional Kohonen algorithm with non-uniformly distributed stimuli, Stochastic Process. Appl. 47 (1993) 249–274.
[2] C. Bouton, G. Pagès, Convergence in distribution of the one-dimensional Kohonen algorithm when the stimuli are non uniform, Adv. Appl. Probab. 26 (1994) 80–103.
[3] M. Budinich, J.G. Taylor, On the ordering conditions for self-organizing maps, Neural Comput. 7 (1995) 284–289.
[4] H.F. Chen, Recursive Estimation and Control for Stochastic Systems, Wiley, New York, 1985.
[5] M. Cottrell, J.C. Fort, A stochastic model of retinotopy: a self organizing process, Biol. Cybernet. 53 (1986) 405–411.
[6] E. Erwin, K. Obermayer, K. Schulten, Self-organizing maps: stationary states, metastability, and convergence rate, Biol. Cybernet. 67 (1992) 35–45.
[7] A. Gersho, R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992.
[8] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[9] H.J. Kushner, D.S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer, New York, 1978.
[10] T. Kohonen, Automatic formation of topological maps of patterns in a self-organizing system, Proceedings of the Second Scandinavian Conference on Image Analysis, Espoo, Finland, 1981, pp. 214–220.
[11] T. Kohonen, Self-organized formation of topologically correct feature maps, Biol. Cybernet. 43 (1982) 59–69.
[12] T. Kohonen, Analysis of a simple self-organizing process, Biol. Cybernet. 44 (1982) 135–140.
[13] T. Kohonen, Clustering, taxonomy, and topological maps of patterns, Proceedings of the Sixth International Conference on Pattern Recognition, Munich, Germany, 1982, pp. 1148–1151.
[14] T. Kohonen, Self-Organization and Associative Memory, Springer Series in Information Sciences, Springer, New York, 1988.
[15] T. Kohonen, The self-organizing map, Proc. IEEE 78 (9) (1990) 1464–1480.
[16] Z.-P. Lo, B. Bavarian, On the rate of convergence in topology preserving neural networks, Biol. Cybernet. 65 (1991) 55–63.
[17] Z.-P. Lo, Y. Yu, B. Bavarian, Analysis of the convergence properties of topology preserving neural networks, IEEE Trans. Neural Networks 4 (March 1993) 207–220.
[18] L. Ljung, Analysis of recursive stochastic algorithms, IEEE Trans. Automat. Control AC-22 (1977) 551–575.
[19] J. Makhoul, S. Roucos, H. Gish, Vector quantization in speech coding, Proc. IEEE 73 (1985) 1551–1588.
[20] H. Ritter, Asymptotic level density for a class of vector quantization processes, IEEE Trans. Neural Networks 2 (1) (January 1991) 173–175.
[21] H. Ritter, K. Schulten, On the stationary state of Kohonen's self-organizing sensory mapping, Biol. Cybernet. 54 (1986) 99–106.
[22] H. Ritter, K. Schulten, Convergence properties of Kohonen's topology conserving maps: fluctuations, stability and dimension selection, Biol. Cybernet. 60 (1988) 59–71.
[23] Y. Sun, On the reconstruction error of the Kohonen self-organizing mapping algorithm, Proceedings of the IEEE International Conference on Neural Networks, ICNN'96, Washington, DC, June 3–6, 1996, pp. 190–195.
[24] V. Tolat, An analysis of Kohonen's self-organizing maps using a system of energy functions, Biol. Cybernet. 64 (1991) 55–63.
[25] H. Yin, N.M. Allinson, On the distribution and convergence of feature space in self-organizing maps, Neural Comput. 7 (1995) 1178–1187.
[26] P.L. Zador, Asymptotic quantization error of continuous signals and the quantization dimension, IEEE Trans. Inform. Theory IT-28 (March 1982) 139–149.
Yi Sun received the B.S. and M.S. degrees in electrical engineering from the Shanghai Jiao Tong University, Shanghai, China, in 1982 and 1985, respectively, and the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, MN, in 1997. From 1985 to 1993, he was a lecturer at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University. In the summer of 1993, he was a visiting scientist at the Department of Mechanical Engineering, Concordia University, Montreal, Canada. From March to September 1997, he was a Post-doctoral Research Fellow at the Radiology Department, University of Utah, Salt Lake City, UT, where he studied MRI imaging. In the period of October 1997 to August 1998, as a Post-doctoral Research Associate, he worked on wireless communications in the Department of Electrical and Systems Engineering at the University of Connecticut, Storrs, CT. Since September 1998, Dr. Sun has been an Assistant Professor in the Department of Electrical Engineering at the City College of City University of New York. Dr. Sun's research interests are in the areas of wireless communications (with focus on CDMA multiuser detection, slotted CDMA networks, channel equalization and sequence detection, and multicarrier systems), image and signal processing, medical imaging, and neural networks.