Neural Networks 21 (2008) 406–413 www.elsevier.com/locate/neunet
2008 Special Issue
Two k-winners-take-all networks with discontinuous activation functions

Qingshan Liu*, Jun Wang

Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

Received 26 July 2007; received in revised form 29 October 2007; accepted 13 December 2007

* Corresponding author. E-mail addresses: [email protected] (Q. Liu), [email protected] (J. Wang).
An abbreviated version of some portions of this article appeared in Gu and Wang (2007) as part of the IJCNN 2007 Conference Proceedings, published under IEEE copyright.
doi:10.1016/j.neunet.2007.12.044
Abstract

This paper presents two k-winners-take-all (k-WTA) networks with discontinuous activation functions. The k-WTA operation is first converted equivalently into linear and quadratic programming problems. Then two k-winners-take-all networks are designed based on the linear and quadratic programming formulations. The networks are theoretically guaranteed to be capable of performing the k-WTA operation in real time. Simulation results show the effectiveness and performance of the networks.
© 2007 Elsevier Ltd. All rights reserved.
Keywords: Recurrent neural network; Discontinuous activation function; Linear programming; Quadratic programming; Global convergence; Differential inclusion; Lyapunov stability
1. Introduction

The winner-takes-all (WTA) operation is the selection of the maximum from a collection of input signals. WTA networks have been widely used in various applications, such as associative memories (Hertz, Krogh, & Palmer, 1991), signal processing (Andreou et al., 1991), cooperative models of binocular stereo (Marr & Poggio, 1976), Fukushima's neocognitron for feature extraction (Yuille & Geiger, 2003), etc. The k-winners-take-all (k-WTA) operation, which selects the k largest inputs out of n inputs (1 ≤ k ≤ n), is a generalization of the WTA operation. The k-WTA operation has important applications in machine learning, such as k-neighborhood classification, k-means clustering, etc. When the number of inputs is large and/or the selection process has to operate in real time, parallel algorithms and hardware implementations are desirable. For these reasons, there have been many attempts to design very large scale integrated (VLSI) circuits to perform k-WTA operations. In the literature, many WTA and k-WTA networks have been proposed (e.g., Calvert and Marinov (2000); Jayadeva and Rahman (2004); Liu and Wang (2006); Maass (2000); Marinov and Calvert (2003); Marinov and Hopfield (2005); Sekerkiran and Cilingiroglu (1999); Urahama and Nagao (1995); Wang (1994); Wolfe et al. (1991); Yen, Guo, and Chen (1998) and Yuille and Geiger (2003)).

Generally, the k-WTA operation can be defined as the following function:

    x_i = f(u_i) = { 1, if u_i ∈ {k largest elements of u};  0, otherwise },                  (1)

where u = (u_1, u_2, . . . , u_n)^T is the input vector and x = (x_1, x_2, . . . , x_n)^T is the output vector. The solution to the k-WTA problem can be determined by solving the following linear integer programming problem:

    minimize   −∑_{i=1}^{n} u_i x_i,
    subject to ∑_{i=1}^{n} x_i = k,                                                           (2)
               x_i ∈ {0, 1},   i = 1, 2, . . . , n.
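As a point of reference (not part of the original paper), the selection in (1) can be computed directly by sorting; the short Python/NumPy sketch below makes the input-output behaviour of (1), and hence the target of the integer program (2), concrete. The function and variable names are illustrative only.

```python
import numpy as np

def kwta_by_sorting(u, k):
    """Reference k-WTA operation of Eq. (1): x_i = 1 iff u_i is among the k largest inputs."""
    u = np.asarray(u, dtype=float)
    x = np.zeros_like(u)
    winners = np.argsort(u)[-k:]   # indices of the k largest entries (ties resolved arbitrarily)
    x[winners] = 1.0
    return x

# Example: u = (1, 2, ..., 5) with k = 3 selects the three largest inputs.
print(kwta_by_sorting([1, 2, 3, 4, 5], 3))   # -> [0. 0. 1. 1. 1.]
```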
According to Gu and Wang (2007), if the kth and (k + 1)th largest elements of u (denoted as ū_k and ū_{k+1}, respectively) are different, problem (2) is equivalent to the following linear programming (LP) problem:

    minimize   −u^T x,
    subject to ∑_{i=1}^{n} x_i = k,                                                           (3)
               0 ≤ x_i ≤ 1,   i = 1, 2, . . . , n,
or the following quadratic programming (QP) problem:

    minimize   (α/2) x^T x − u^T x,
    subject to ∑_{i=1}^{n} x_i = k,                                                           (4)
               0 ≤ x_i ≤ 1,   i = 1, 2, . . . , n,

where α ≤ ū_k − ū_{k+1} is a positive constant.
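On small instances, the equivalence between (2) and the relaxation (3) can also be checked numerically. The sketch below is an illustration only (it relies on SciPy's linprog, which is not a tool used in the paper); when the kth and (k + 1)th largest inputs differ, the solver returns the 0-1 k-WTA vector.

```python
import numpy as np
from scipy.optimize import linprog

def kwta_by_lp(u, k):
    """Solve the LP relaxation (3): minimize -u^T x s.t. sum(x) = k, 0 <= x <= 1."""
    n = len(u)
    res = linprog(c=-np.asarray(u, dtype=float),
                  A_eq=np.ones((1, n)), b_eq=[k],
                  bounds=[(0.0, 1.0)] * n,
                  method="highs")
    return res.x

u = [3.0, -1.0, 7.0, 2.0, 5.0]
print(np.round(kwta_by_lp(u, k=2)))   # expect 1s at the positions of 7 and 5
```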
In the literature, several effective recurrent neural networks for solving linear and quadratic programming problems have been proposed. Xia and Wang (1995) proposed the primal-dual network for solving linear programming problems. For the k-WTA operation, the network architecture needs n + 1 neurons and 2n + 2 connections, and its dynamic equations based on (3) can be written as:

    ε dx/dt = −(e^T x − k)e + r_o(ye − u),
    ε dy/dt = −r_o[e^T f(x + ye + u) − k],                                                    (5)

where x ∈ R^n, y ∈ R, e = (1, 1, . . . , 1)^T ∈ R^n, r_o = ‖f(x + ye + u) − x‖₂², ε > 0 is a scaling parameter, f(x) = (f(x_1), . . . , f(x_n))^T and

    f(x_i) = { 0, if x_i < 0;  x_i, if 0 ≤ x_i ≤ 1;  1, if x_i > 1 }.                         (6)

The primal-dual network for solving quadratic programming problems has been proposed by Xia (1996). For the k-WTA operation, the network architecture needs 3n + 1 neurons and 6n + 2 connections, and its dynamic equations based on (4) can be written as:

    dx/dt = −(1 + α)(x − (x + ve + w − αx + u)⁺) − (e^T x − k)e − x − y + e,
    dy/dt = −y + (y + w)⁺ − x − y + e,
    dv/dt = −e^T(x − (x + ve + w − αx + u)⁺) + e^T x − k,                                      (7)
    dw/dt = −x + (x + ve + w − αx + u)⁺ − y + (y + w)⁺ + x + y − e,

where x, y, w ∈ R^n, v ∈ R, e = (1, 1, . . . , 1)^T ∈ R^n, x⁺ = (x_1⁺, . . . , x_n⁺)^T, and x_i⁺ = max{0, x_i} (i = 1, . . . , n).
The projection network proposed by Xia and Wang (2004) can also be utilized for the k-WTA operation, and needs n + 1 neurons and 2n + 2 connections. The dynamic equations of the k-WTA network (Gu & Wang, 2007) based on (3) can be written as:

    dx/dt = −x + f(x + η(u + ye)),
    dy/dt = −e^T x + k,                                                                       (8)

where x ∈ R^n, y ∈ R, η is a positive constant, and f is defined in (6). The dynamic equations of the k-WTA network based on (4) can be written as:

    dx/dt = −x + f(x − η(αx − u − ye)),
    dy/dt = −e^T x + k,                                                                       (9)

where x, y, η and f are defined as in (8). The simplified dual network proposed by Liu and Wang (2006) for the k-WTA operation needs n neurons and 3n connections, which is the smallest number of neurons among the above models; its dynamic equations based on (4) can be written as:

    dy/dt = −My + f((M − I)y − s) − s,
    x = My + s,                                                                               (10)

where x, y ∈ R^n, M = 2(I − ee^T/n)/α, s = Mu + ke/n, I is an identity matrix, and f is defined in (6).

In this paper, we are concerned with constructing two new k-WTA networks based on the linear and quadratic programming formulations; each has n neurons and fewer connections than the existing models.

2. Preliminaries

In this section, some definitions concerning matrix norms and non-smooth analysis are presented (e.g., see Clarke (1983); Filippov (1988) and Horn and Johnson (1985)). For x ∈ R^n, the vector norms ‖·‖_p are defined as

    ‖x‖₁ = ∑_{i=1}^{n} |x_i|,   ‖x‖₂ = (∑_{i=1}^{n} x_i²)^{1/2},   ‖x‖_∞ = max_{1≤i≤n} |x_i|.

The induced matrix norms ‖·‖_p are given by

    ‖A‖₁ = max_j ∑_{i=1}^{n} |a_ij|,   ‖A‖₂ = (λ_max(A^T A))^{1/2},   ‖A‖_∞ = max_i ∑_{j=1}^{n} |a_ij|,

where A = (a_ij)_{n×n} and λ_max(·) denotes the maximum eigenvalue of the corresponding matrix.
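As a quick illustration of these definitions (not part of the original text), the induced norms can be cross-checked against numpy.linalg.norm, which implements the same three formulas for p = 1, 2, ∞.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

col_sum = np.abs(A).sum(axis=0).max()                 # ||A||_1: maximum absolute column sum
spec    = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())  # ||A||_2: sqrt of largest eigenvalue of A^T A
row_sum = np.abs(A).sum(axis=1).max()                 # ||A||_inf: maximum absolute row sum

assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(spec,    np.linalg.norm(A, 2))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))
```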
Definition 1 (Filippov (1988)). Suppose E ⊂ R^n. F : x ↦ F(x) is called a set-valued map from E to R^n if, to each point x of the set E, there corresponds a non-empty closed set F(x) ⊂ R^n.
Definition 2. A function f : R^n → R is said to be Lipschitz near x ∈ R^n if there exist ε, δ > 0 such that, for any x^(i) ∈ R^n satisfying ‖x^(i) − x‖₂ < δ (i = 1, 2), we have |f(x^(1)) − f(x^(2))| ≤ ε‖x^(1) − x^(2)‖₂. If f is Lipschitz near any point x ∈ R^n, then f is said to be locally Lipschitz in R^n.

Definition 3. Assume that f is Lipschitz near x. The generalized directional derivative of f at x in the direction v ∈ R^n is defined as

    f°(x; v) = lim sup_{y→x, ξ→0⁺} [f(y + ξv) − f(y)] / ξ.

The Clarke generalized gradient of f is defined as ∂f(x) = {y ∈ R^n : f°(x; v) ≥ y^T v, ∀v ∈ R^n}. When f is locally Lipschitz in R^n, f is differentiable for almost all (a.a.) x ∈ R^n (in the sense of Lebesgue measure). Then the Clarke generalized gradient of f at x ∈ R^n is equivalent to

    ∂f(x) = K{ lim_{n→∞} ∇f(x_n) : x_n → x, x_n ∉ N, x_n ∉ E },

where K(·) denotes the closure of the convex hull of the corresponding set, N ⊂ R^n is an arbitrary set with measure zero, and E ⊂ R^n is the set of points where f is not differentiable.

Definition 4. A function f : R^n → R which is locally Lipschitz near x ∈ R^n is said to be regular at x if, for any direction v ∈ R^n, the one-sided directional derivative

    f′(x; v) = lim_{ξ→0⁺} [f(x + ξv) − f(x)] / ξ

exists and f′(x; v) = f°(x; v). The function f is said to be regular in R^n if it is regular at any x ∈ R^n.

Consider the following ordinary differential equation (ODE):

    dx/dt = ψ(x),   x(t₀) = x₀.                                                               (11)

A solution of (11) in the Filippov sense is defined as follows:

Definition 5. Define the set-valued map

    φ(x) = ∩_{ε>0} ∩_{µ(N)=0} K[ψ(B(x, ε) − N)],

where µ(N) is the Lebesgue measure of the set N and B(x, ε) = {y : ‖y − x‖₂ ≤ ε}. A solution of (11) is an absolutely continuous function x(t) defined on an interval [t₀, t₁] (t₀ ≤ t₁ ≤ +∞), which satisfies x(t₀) = x₀ and the differential inclusion

    dx/dt ∈ φ(x),   a.a. t ∈ [t₀, t₁].

3. LP-based model

In this section, a k-WTA network based on the linear programming problem (3) is constructed.

3.1. Model description
According to the Karush-Kuhn-Tucker (KKT) conditions (see Bazaraa, Sherali, and Shetty (1993)), x* is an optimal solution of (3) if and only if there exist y* ∈ R and z* ∈ R^n such that (x*, y*, z*)^T satisfies the following optimality conditions:

    −u + ey + z = 0,                                                                          (12)
    e^T x = k,                                                                                (13)
    z_i ≥ 0 if x_i = 1,   z_i = 0 if 0 < x_i < 1,   z_i ≤ 0 if x_i = 0,                       (14)

where e = (1, 1, . . . , 1)^T ∈ R^n. From (12), we have

    x = x + u − ey − z.                                                                       (15)

Substituting (15) into (13), it follows that e^T(x + u − ey − z) = k. Then

    y = (1/n)(e^T x − e^T z + e^T u − k).                                                     (16)

Substituting (16) into (12), we have

    (ee^T/n)x + (I − ee^T/n)z − u + (ee^T/n)u − (k/n)e = 0.                                   (17)

Let P = ee^T/n and q = u − Pu + ke/n; then (17) can be written as

    Px + (I − P)z − q = 0.                                                                    (18)

The matrix P, called the projection matrix, is symmetric and satisfies P² = P and (I − P)² = I − P. The eigenvalues of P take the values 1 or 0 only. As a result, ‖P‖₂ = ‖I − P‖₂ = 1. According to (14), the following activation function is defined, as shown in Fig. 1:

    g(x_i) = { 1, if x_i > 1;  [0, 1], if x_i = 1;  0, if 0 < x_i < 1;  [−1, 0], if x_i = 0;  −1, if x_i < 0 }.   (19)

Fig. 1. Discontinuous activation function g(x_i).
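The quantities appearing in (18) can be set up numerically in a few lines; the following sketch (illustrative only, using NumPy; names are not from the paper) builds P and q for a sample input and checks the projection properties stated above.

```python
import numpy as np

def build_P_q(u, k):
    """P = e e^T / n and q = u - P u + k e / n, the quantities in Eq. (18)."""
    u = np.asarray(u, dtype=float)
    n = u.size
    e = np.ones(n)
    P = np.outer(e, e) / n
    q = u - P @ u + k * e / n
    return P, q

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
P, q = build_P_q(u, k=3)
I = np.eye(u.size)

assert np.allclose(P @ P, P) and np.allclose((I - P) @ (I - P), I - P)   # P^2 = P and (I - P)^2 = I - P
assert np.isclose(np.linalg.norm(P, 2), 1.0)                             # ||P||_2 = 1
assert np.isclose(np.linalg.norm(I - P, 2), 1.0)                         # ||I - P||_2 = 1
```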
Based on Eqs. (18) and (19), the dynamic equation of the LP-based k-WTA network model is described as follows:

    ε dx/dt = −Px − σ(I − P)g(x) + q,                                                         (20)

where g(x) = (g(x₁), g(x₂), . . . , g(x_n))^T, ε is a positive scaling constant and σ is a non-negative gain parameter. The architecture of the k-WTA network is depicted in Fig. 2, from which we can observe that a k-WTA circuit consists of n integrators, 2n + 3 summers, and 2n weighted connections.

Fig. 2. Architecture of the k-WTA network described in (20).

Definition 6. A solution of system (20) with initial condition x(t₀) = x₀ is an absolutely continuous function x(t) defined on an interval [t₀, t₁] (t₀ ≤ t₁ ≤ +∞), which satisfies x(t₀) = x₀ and the differential inclusion

    ε dx/dt ∈ −Px − σ(I − P)K[g(x)] + q,   a.a. t ∈ [t₀, t₁],

where K[g(x)] = (K[g(x₁)], . . . , K[g(x_n)])^T and K[g(x_i)] = [g(x_i⁻), g(x_i⁺)].

3.2. Convergence results

By using the Lyapunov method and non-smooth analysis (e.g., see Aubin and Cellina (1984); Clarke (1983); Forti and Nistri (2003) and Lu and Chen (2006)), the stability and global convergence of network (20) have been proved in Liu and Wang (in press-a). Denote Ω* as the optimal solution set and Ω̄ as the equilibrium point set. Throughout this paper, we always assume that the optimal solution set Ω* is not empty and that there exists a finite x* ∈ Ω*.

Definition 7. x̄ is said to be an equilibrium point of system (20) if there exists γ̄ ∈ K[g(x̄)] such that

    0 = −Px̄ − σ(I − P)γ̄ + q.                                                                 (21)

The following two propositions are obtained directly from Corollaries 1 and 2 in Liu and Wang (in press-a).

Proposition 1. The network (20) can perform the k-WTA operation if Ω̄ ⊂ {x ∈ R^n : 0 ≤ x ≤ 1}.

Proposition 2. The network (20) can perform the k-WTA operation if it has a unique equilibrium point and σ ≥ 0 when (I − ee^T/n)u = 0, or one of the following conditions holds when (I − ee^T/n)u ≠ 0:

(i)   σ ≥ n ∑_{i=1}^{n} |u_i − (1/n)∑_{j=1}^{n} u_j| / (2n − 2), or
(ii)  σ ≥ n [∑_{i=1}^{n} (u_i − (1/n)∑_{j=1}^{n} u_j)² / (n(n − 1))]^{1/2}, or
(iii) σ ≥ 2 max_i |u_i − (1/n)∑_{j=1}^{n} u_j|, or
(iv)  σ ≥ [∑_{i=1}^{n} (u_i − (1/n)∑_{j=1}^{n} u_j)²]^{1/2} / min⁺_{γ_i ∈ {−1,0,1}} |∑_{i=1}^{n} (u_i − (1/n)∑_{j=1}^{n} u_j)γ_i|,

where min⁺ denotes the minimum taken over the nonzero values of its argument.
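As a worked instance of condition (iii) as reconstructed above, the bound 2 max_i |u_i − (1/n)∑_j u_j| can be evaluated directly; for the input u_i = i with n = 5 used in Example 1 below, it gives the value 4 quoted there. The snippet is only an illustration of that arithmetic.

```python
import numpy as np

def sigma_bound_iii(u):
    """2 * max_i |u_i - mean(u)|: condition (iii) of Proposition 2 as reconstructed above."""
    u = np.asarray(u, dtype=float)
    return 2.0 * np.max(np.abs(u - u.mean()))

print(sigma_bound_iii([1, 2, 3, 4, 5]))   # 4.0, the lower bound quoted for Example 1 (n = 5)
```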
4. QP-based model

In this section, a k-WTA network is constructed based on the quadratic programming problem (4).

4.1. Model description

According to the Karush-Kuhn-Tucker (KKT) conditions (see Bazaraa et al. (1993)), x* is a solution of (4) if and only if there exist y* ∈ R and z* ∈ R^n such that (x*, y*, z*)^T satisfies the following optimality conditions:

    αx − u + ey + z = 0,                                                                      (22)
    e^T x = k,                                                                                (23)
    x_i = 1 if z_i > 0,   0 ≤ x_i ≤ 1 if z_i = 0,   x_i = 0 if z_i < 0.                       (24)

Similarly to the analysis in Section 3, (22) and (23) are equivalent to the following equation:

    (I − P)z + [αI + (1 − α)P]x − q = 0,                                                      (25)

where P and q are defined in (18). According to (24), the following activation function is defined, as shown in Fig. 3:

    h(z_i) = { 1, if z_i > 0;  [0, 1], if z_i = 0;  0, if z_i < 0 }.                          (26)

Fig. 3. Discontinuous activation function h(z_i).

Since [αI + (1 − α)P][(1/α)I + (1 − 1/α)P] = I, we have

    [αI + (1 − α)P]⁻¹ = (1/α)I + (1 − 1/α)P.                                                  (27)
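The inverse formula (27) is easy to verify numerically for any α > 0; the short check below is illustrative only, with an arbitrary choice of n and α, and also confirms how the inverse acts on I − P, which is used next.

```python
import numpy as np

n, alpha = 5, 0.25
e = np.ones(n)
I = np.eye(n)
P = np.outer(e, e) / n

M = alpha * I + (1.0 - alpha) * P                # alpha*I + (1 - alpha)*P
M_inv = I / alpha + (1.0 - 1.0 / alpha) * P      # claimed inverse from Eq. (27)

assert np.allclose(M @ M_inv, I)                        # the product is the identity
assert np.allclose(M_inv @ (I - P), (I - P) / alpha)    # the inverse maps (I - P) to (I - P)/alpha
```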
It follows that

    [αI + (1 − α)P]⁻¹(I − P) = (1/α)(I − P),                                                  (28)

and

    [αI + (1 − α)P]⁻¹ q = q/α + [k(α − 1)/(nα)] e.                                            (29)

From (25), (28) and (29), we have

    x = −(1/α)(I − P)z + q/α + [k(α − 1)/(nα)] e.                                             (30)

Based on Eqs. (25) and (30), the QP-based k-WTA network model is described as follows:

• State equation:

    ε dz/dt = −(I − P)z − [αI + (1 − α)P]h(z) + q,                                            (31)

• Output equation:

    x = −(1/α)(I − P)z + q/α + [k(α − 1)/(nα)] e,                                             (32)

where h(z) = (h(z₁), h(z₂), . . . , h(z_n))^T and ε is a positive scaling constant. I − P has the n eigenvalues {0, 1, 1, . . . , 1}, and the linear part of (31) is independent of α. Hence the convergence of network (31) is independent of α, which can be seen from the simulation results in Section 5.

Fig. 4. Architecture of the k-WTA network described in (31) and (32).

The architecture of the network described in (31) and (32) is depicted in Fig. 4, from which we can see that the network can be implemented with n integrators, 3n connections and 4n + 3 summers.

For system (31), the Filippov solution is defined as follows:

Definition 8. A solution of system (31) with initial condition z(t₀) = z₀ is an absolutely continuous function z(t) defined on an interval [t₀, t₁] (t₀ ≤ t₁ ≤ +∞), which satisfies z(t₀) = z₀ and, for a.a. t ∈ [t₀, t₁],

    ε dz/dt ∈ −(I − P)z − [αI + (1 − α)P]K[h(z)] + q,

in which K[h(z)] = (K[h(z₁)], . . . , K[h(z_n)])^T and K[h(z_i)] = [h(z_i⁻), h(z_i⁺)].

4.2. Convergence results

The stability and global convergence of system (31) can be proved by using the Lyapunov theory and non-smooth analysis method (see Liu and Wang (in press-b)). Since the optimal solution set Ω* is not empty and there exists a finite x* ∈ Ω*, the equilibrium point set Ω̄ of (31) is not empty.

Definition 9. z* is said to be an equilibrium point of system (31) if there exists γ* ∈ K[h(z*)] such that

    0 = −(I − P)z* − [αI + (1 − α)P]γ* + q.                                                   (33)

Proposition 3. The system (31) with any α > 0 is stable in the sense of Lyapunov and any trajectory is globally convergent to an equilibrium point.

Proof. See Theorem 1 in Liu and Wang (in press-b).

Proposition 4. x* = −(I − P)z*/α + q/α + (α − 1)ke/(nα) is an optimal solution of the k-WTA problem (4), where z* is an equilibrium point of system (31).

Proof. See Lemma 2(ii) in Liu and Wang (in press-b).

From the results of Propositions 3 and 4, the network in (31) and (32) is globally convergent and its output vector is globally convergent to an optimal solution of problem (4). Thus, if the kth and (k + 1)th largest elements of the input signals u_i (i = 1, 2, . . . , n) are different, the network in (31) and (32) is capable of guaranteeing the k-WTA operation.

The numbers of neurons and connections of the proposed k-WTA network models and of several existing models are listed in Table 1. From this table, it is clear that the models proposed herein have lower model complexity. Here, we have proposed two k-WTA networks, but they have different architectures and properties. The network in (20) has lower model complexity than the network in (31) and (32).
Table 1
Comparison of related networks in terms of model complexity

Model type                          Eqn(s)       Layer(s)   Neurons   Connections
LP-based primal-dual network        (5)          2          n + 1     2n + 2
QP-based primal-dual network        (7)          2          3n + 1    6n + 2
LP-based projection network         (8)          2          n + 1     2n + 2
QP-based projection network         (9)          2          n + 1     2n + 2
QP-based simplified dual network    (10)         1          n         3n
LP-based network herein             (20)         1          n         2n
QP-based network herein             (31), (32)   1          n         3n
However, when the problem size n is large, the value of σ in (20) needs to be large, which can be observed from the simulation results in Section 5. For network (20), the convergence time decreases as the problem size n increases. For the network in (31) and (32), the parameter α needs to be sufficiently small when the kth and (k + 1)th largest elements of u are close to each other, regardless of the problem size. Some comparisons are shown in the ensuing simulation results.

5. Simulation results

Example 1. Consider a k-WTA problem with input vector u_i = i (i = 1, 2, . . . , n) and k = 3. The two proposed k-WTA networks are utilized to determine the three largest inputs. According to Proposition 2(iii), a lower bound of σ is 4 when n = 5. The transient behaviors of the state variables of the k-WTA network (20) are depicted in Fig. 5 with ε = 10⁻⁶ and σ = 6. It shows that the states of the network are globally convergent to the unique optimal solution x* = (0, 0, 1, 1, 1)^T from 10 random initial points. However, if σ = 2, the equilibrium point is not unique and may not be an optimal solution to problem (3) (see Fig. 6). Fig. 7 shows the convergence of the k-WTA network (20) with different n, from which we can see that the convergence time of the first and last components of x decreases when the problem size n increases.

Fig. 5. Transient behaviors of the k-WTA network (20) with σ = 6 in Example 1.
Fig. 6. Transient behaviors of the k-WTA network (20) with σ = 2 in Example 1.
Fig. 7. Convergence behaviors of the k-WTA network (20) with respect to different n in Example 1.
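A minimal way to reproduce the qualitative behaviour reported in Figs. 5-7 is to integrate (20) with a forward Euler scheme. The sketch below is illustrative only: it uses an arbitrary single-valued selection of g, an untuned step size, and the time rescaling τ = t/ε so that ε itself drops out of the iteration; some chattering around the discontinuities of g is expected.

```python
import numpy as np

def simulate_kwta_lp(u, k, sigma, steps=200_000, dt=1e-4, seed=0):
    """Forward-Euler sketch of network (20): eps*dx/dt = -P x - sigma (I - P) g(x) + q,
    integrated in the rescaled time tau = t/eps."""
    u = np.asarray(u, dtype=float)
    n = u.size
    e = np.ones(n)
    I = np.eye(n)
    P = np.outer(e, e) / n                       # projection matrix P = e e^T / n
    q = u - P @ u + k * e / n                    # q = u - P u + k e / n, as in (18)

    def g(x):                                    # single-valued selection of the activation (19)
        out = np.zeros_like(x)
        out[x > 1.0] = 1.0                       # g = 1 above the upper bound
        out[x < 0.0] = -1.0                      # g = -1 below the lower bound
        return out                               # 0 elsewhere (including the set-valued points 0 and 1)

    x = np.random.default_rng(seed).uniform(-1.0, 2.0, size=n)   # random initial state
    for _ in range(steps):
        x += dt * (-P @ x - sigma * (I - P) @ g(x) + q)
    return x

u = np.arange(1.0, 6.0)                          # Example 1 data: u_i = i, n = 5
print(np.round(simulate_kwta_lp(u, k=3, sigma=6.0), 2))   # expected to approach (0, 0, 1, 1, 1)
```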
Fig. 8. Transient behaviors of the state vector z of the k-WTA network in (31) and (32) in Example 1.
Fig. 9. Transient behavior of the output vector x of the k-WTA network in (31) and (32) in Example 1.
Fig. 10. Convergence behavior of the k-WTA network in (31) and (32) with respect to different α in Example 1.
Fig. 11. Convergence behavior of the k-WTA network in (31) and (32) with respect to different n in Example 1.
The transient behaviors of the state variables and output variables of the k-WTA network in (31) and (32) are depicted in Figs. 8 and 9, respectively, with ε = 10⁻⁶, α = 0.25 and 20 random initial values. It is shown that the state vector z is globally convergent to the equilibrium point set and the output vector x is globally convergent to x* = (0, 0, 1, 1, 1)^T. Fig. 10 shows that the convergence rate remains steady with respect to various values of α. In Fig. 11, it can be observed that when the problem size n increases, the convergence time of the first and last components of x also increases.
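The trajectories of the network in (31) and (32) can be approximated in the same way by explicit Euler integration of the state equation, reading the output out through (32). Again this is only a sketch (single-valued selection of h, untuned step size, time rescaled by ε), run here on the Example 1 data with α = 0.25.

```python
import numpy as np

def simulate_kwta_qp(u, k, alpha=0.25, steps=40_000, dt=5e-4, seed=0):
    """Forward-Euler sketch of the state equation (31) in the rescaled time tau = t/eps,
    with the output computed through Eq. (32)."""
    u = np.asarray(u, dtype=float)
    n = u.size
    e = np.ones(n)
    I = np.eye(n)
    P = np.outer(e, e) / n
    q = u - P @ u + k * e / n
    A = alpha * I + (1.0 - alpha) * P            # the matrix alpha*I + (1 - alpha)*P in (31)

    def h(z):                                    # h(z_i) = 1 for z_i > 0, 0 otherwise (0 chosen at z_i = 0)
        return (z > 0.0).astype(float)

    z = np.random.default_rng(seed).uniform(-1.0, 1.0, size=n)
    for _ in range(steps):
        z += dt * (-(I - P) @ z - A @ h(z) + q)
    x = -(I - P) @ z / alpha + q / alpha + k * (alpha - 1.0) * e / (n * alpha)   # output, Eq. (32)
    return x

print(np.round(simulate_kwta_qp(np.arange(1.0, 6.0), k=3), 2))   # expected near (0, 0, 1, 1, 1)
```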
Example 2. Let us consider a set of four sinusoidal input signals with the following instantaneous values: u_p(t) = 10 sin[2π(t + 0.2(p − 1))] (p = 1, 2, 3, 4), and k = 2. The four input signals and the transient outputs of the two k-WTA networks are depicted in Fig. 12, in which ε = 0.01, σ = 15 and α = 0.001. The simulation results show that the k-WTA networks can generate the two largest signals in real time.

6. Conclusions

This paper presents two new k-WTA networks with discontinuous activation functions. Compared with the other networks for the k-WTA operation, the proposed networks have lower architectural complexity. The global convergence of the k-WTA networks is guaranteed by theoretical analysis. Simulation results show the effectiveness and efficiency of the k-WTA networks.
Fig. 12. Inputs and outputs of the two k-WTA networks in Example 2.
References

Andreou, A., Boahen, K., Pouliquen, P., Pavasovic, A., Jenkins, R., & Strohbehn, K. (1991). Current-mode subthreshold MOS circuits for analog VLSI neural systems. IEEE Transactions on Neural Networks, 2(2), 205–213.
Aubin, J., & Cellina, A. (1984). Differential inclusions: Set-valued maps and viability theory. New York: Springer-Verlag.
Bazaraa, M., Sherali, H., & Shetty, C. (1993). Nonlinear programming: Theory and algorithms (2nd ed.). New York: John Wiley.
Calvert, B., & Marinov, C. (2000). Another k-winners-take-all analog neural network. IEEE Transactions on Neural Networks, 11(4), 829–838.
Clarke, F. (1983). Optimization and nonsmooth analysis. New York: Wiley.
Filippov, A. (1988). Differential equations with discontinuous right-hand sides. Mathematics and its applications (Soviet series). Boston: Kluwer Academic Publishers.
Forti, M., & Nistri, P. (2003). Global convergence of neural networks with discontinuous neuron activations. IEEE Transactions on Circuits and Systems-I, 50(11), 1421–1435.
Gu, S., & Wang, J. (2007). A k-winner-take-all neural network based on linear programming formulation. In Proceedings of IJCNN 2007.
Hertz, J., Krogh, A., & Palmer, R. (1991). Introduction to the theory of neural computation. MA: Addison-Wesley.
Horn, R., & Johnson, C. (1985). Matrix analysis. Cambridge, UK: Cambridge University Press.
Jayadeva, & Rahman, S. (2004). A neural network with O(n) neurons for ranking n numbers in O(1/n) time. IEEE Transactions on Circuits and Systems-I, 51(10), 2044–2051.
Liu, Q., & Wang, J. (2007). A one-layer recurrent neural network with a discontinuous activation function for linear programming. Neural Computation, 20 (in press-a).
Liu, Q., & Wang, J. (2007). A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming. IEEE Transactions on Neural Networks (in press-b).
Liu, S., & Wang, J. (2006). A simplified dual neural network for quadratic programming with its k-WTA application. IEEE Transactions on Neural Networks, 17(6), 1500–1510.
Lu, W., & Chen, T. (2006). Dynamical behaviors of delayed neural network systems with discontinuous activation functions. Neural Computation, 18, 683–708.
Maass, W. (2000). On the computational power of winner-take-all. Neural Computation, 12, 2519–2535.
Marinov, C., & Calvert, B. (2003). Performance analysis for a k-winners-take-all analog neural network: Basic theory. IEEE Transactions on Neural Networks, 14(4), 766–780.
Marinov, C., & Hopfield, J. (2005). Stable computational dynamics for a class of circuits with O(n) interconnections capable of KWTA and rank extractions. IEEE Transactions on Circuits and Systems-I, 52(5), 949–959.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194(4262), 283–287.
Sekerkiran, B., & Cilingiroglu, U. (1999). A CMOS k-winners-take-all circuit with O(n) complexity. IEEE Transactions on Circuits and Systems-II, 46(1), 1–5.
Urahama, K., & Nagao, T. (1995). k-winners-take-all circuit with O(n) complexity. IEEE Transactions on Neural Networks, 6(3), 776–778.
Wang, J. (1994). Analogue winner-take-all neural networks for determining maximum and minimum signals. International Journal of Electronics, 77(3), 355–367.
Wolfe, W., Mathis, D., Anderson, C., Rothman, J., Gottler, M., Brady, G., et al. (1991). K-winner networks. IEEE Transactions on Neural Networks, 2(2), 310–315.
Xia, Y., & Wang, J. (1995). Neural network for solving linear programming problems with bounded variables. IEEE Transactions on Neural Networks, 6(2), 515–519.
Xia, Y. (1996). A new neural network for solving linear and quadratic programming problems. IEEE Transactions on Neural Networks, 7(6), 1544–1548.
Xia, Y., & Wang, J. (2004). A general projection neural network for solving monotone variational inequalities and related optimization problems. IEEE Transactions on Neural Networks, 15(2), 318–328.
Yen, J., Guo, J., & Chen, H. (1998). A new k-winners-take-all neural network and its array architecture. IEEE Transactions on Neural Networks, 9(5), 901–912.
Yuille, A., & Geiger, D. (2003). Winner-take-all networks. In The handbook of brain theory and neural networks (2nd ed.) (pp. 1228–1231). Cambridge, MA: MIT Press.