Unbiased consensus in wireless networks via collisional random broadcast and its application on distributed optimization


Signal Processing 98 (2014) 212–223


Hui Feng, Xuesong Shi, Tao Yang, Bo Hu
Department of Electronic Engineering, Fudan University, 200433, China

Article history: Received 9 May 2013; Received in revised form 26 September 2013; Accepted 12 November 2013; Available online 1 December 2013

Abstract

We first propose an unbiased consensus algorithm for wireless networks via random broadcast, by which all the nodes tend to the initial average in mean, almost surely. The innovation of the algorithm is that it works in any connected topology, despite the possible collisions when simultaneous transmissions arrive at a receiver over a shared channel. Based on the consensus algorithm, we propose a distributed optimization algorithm for a sum of convex objective functions, which is the fundamental model for many in-network signal processing applications. Simulation results show that our algorithms provide appealing performance with lower communication complexity compared with existing algorithms. © 2013 Elsevier B.V. All rights reserved.

Keywords: Consensus; Random broadcast; Gossip; Distributed optimization

1. Introduction

In a multi-hop wireless network, each node communicates with others via the broadcast wireless medium, where data are usually exchanged among neighboring nodes due to power constraints. In the early period of the study of wireless networks, many researchers focused on multi-hop communication strategies, such as MAC and routing protocols [1]. With the development of sensor networks and the Internet of Things (IoT), a smart network takes on necessary signal processing jobs in-network [2], treating data transmission and processing in an integrated way. The potential applications include distributed estimation [3–6], data fusion [7], classification [8], and recovery of sparse signals through $\ell_1$-norm minimization [9]. Consensus is a representative topic in network processing. The goal of consensus is to design a local (not global) information exchanging strategy, by which all nodes will

Corresponding author. Tel.: +86 21 65642762. E-mail addresses: [email protected] (H. Feng), [email protected] (X. Shi), [email protected] (T. Yang), [email protected], [email protected] (B. Hu). 0165-1684/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2013.11.017

tend to the same state asymptotically. The fundamental theory of linear consensus is rooted in classic linear dynamical systems and Markov chains in stochastic processes. Wang et al. gave a necessary and a sufficient condition for the consensus of a general output-feedback linear multi-agent system [10]. The consensus of networks with time delay, packet loss, and quantization in communication was discussed in [11]. A model predictive controller was introduced to accelerate the consensus rate [12]. Su et al. investigated the consensus of a distributed T–S fuzzy filter with time-varying delays and link failures [13]. Ren et al. [14] and Olfati-Saber et al. [15] surveyed many mainstream consensus algorithms, and analyzed the convergence results under various information exchanging strategies. By their styles of information exchanging, existing consensus algorithms can be divided into three categories:

Style I: pair-wise exchange;
Style II: local fusion;
Style III: asynchronous broadcast gossip.

In Style I, two neighboring nodes exchange and mix their data each time. The pioneering works of Style I grew out of P2P file sharing over the Internet [16].


As for Style II, each node acquires information from all its neighbors, and then makes a linear mixture with its own data simultaneously [17]. Style III works in an asynchronous broadcast way, where each node broadcasts data to its neighbors. If the data from a neighbor are received successfully, the node mixes the received data with its own. Obviously, Style III is very suitable for a wireless network, due to the inherent broadcast property of wireless communication. However, research on Style III is not as plentiful as that on Styles I and II; [18,19] are two representative works.

A consensus result can be reached in a deterministic or random way. Most algorithms collected in [15] rely on deterministic information exchanges, where all the nodes reach a consensus asymptotically as in the following definition.

Definition 1. Consider a network of N nodes, where node i has an initial value at slot 0 of $x_i^0 \in \mathbb{R}^M$, $i = 1, 2, \ldots, N$. The network reaches an asymptotic consensus if there exists $x^\infty \in \mathbb{R}^M$ such that $\lim_{k\to\infty} x_i^k = x^\infty$, $i = 1, 2, \ldots, N$. Further, if $x^\infty = \bar{x}^0 \triangleq N^{-1}\sum_i x_i^0$, we have the average consensus (AC) result.

In contrast to deterministic ones, the works of [18–21] investigated random algorithms involving random information exchanges over the network. For instance, in [19], each node performs random broadcast gossip as in Style III. Compared with deterministic algorithms, random algorithms may speed up the convergence rate, and substantially reduce the communication cost [22]. A random consensus algorithm usually drives all the nodes' values in a network to the same value almost surely (a.s.) asymptotically, as in the following definition.

Definition 2. Consider a network of N nodes, where node i has an initial value at slot 0 of $x_i^0 \in \mathbb{R}^M$, $i = 1, 2, \ldots, N$. The network reaches probabilistic consensus (PC) if there exists $x^\infty \in \mathbb{R}^M$ such that $\lim_{k\to\infty} x_i^k = x^\infty$, $i = 1, 2, \ldots, N$ a.s. Further, if $\mathbb{E}[x^\infty] = \bar{x}^0 = N^{-1}\sum_i x_i^0$, we have the unbiased consensus (UC) result.

In a wireless network using shared channels, there are possible collisions at receivers. Collisions may change the topology of a network temporarily, which makes the consensus procedure sophisticated. Two methods have been considered to solve the problem of collisions. The first is to avoid any possible collision by designing a special consensus strategy. Aysal et al. [19] proposed a random gossip algorithm, where only one node may wake up each time and broadcast. The authors proved that such an algorithm can reach UC, yet not efficiently, since only one node can broadcast each time. In a practical scenario, we expect more nodes to broadcast simultaneously, so that data are disseminated over the network faster. Fagnani et al. [21] designed a random broadcast strategy, the collision broadcast gossip algorithm (CBGA), where each node broadcasts with the same probability. However, the authors only proved the UC of CBGA for specific Abelian Cayley topologies, and their algorithm may not reach UC in other connected networks.

As a matter of fact, there is an inherent relationship between the consensus problem and the distributed optimization problem. As indicated in [23], the consensus problem is a special case of the variance minimization problem over the network, where the objective function is


$\min \sum_i \|x_i^k - \bar{x}^k\|^2$. Collaborative optimization over a network is an important method to solve practical problems in wired and wireless networks, such as signal processing [24], distributed learning [25], and automatic control [26]. In such cases, the objective function usually takes the form of a sum of multiple components, i.e., $f(x) = \sum_{i=1}^N f_i(x)$, where each component belongs to a specific processing node in the network. Each node expects to obtain a globally optimal result by local information exchanges among neighbors [27], which combines network communication and distributed computing [28]. There are mainly three methods to distribute an optimization problem in the literature. The first is to add explicit constraints by introducing auxiliary variables, and then solve the problem by the method of multipliers (MoM) [29, Section 3.4]. A widely applied variant of MoM is the alternating direction MoM (ADMoM) [29,30], which has been used to solve various distributed problems [3–6,8,9]. The second method is the incremental approach [22,31,32], where each node makes a local gradient descent, then relays the result to another node, and the procedure repeats. As it does not require global activity over the network, the incremental approach cannot make concurrent computations on multiple nodes, which leads to a slow convergence rate. The third method is to integrate a consensus algorithm with local computations. Nedic et al. combined the Style II and Style III consensus with local gradient descent in [33,34], respectively. Ref. [34] is the closest work to the distributed optimization part of this paper. However, the convergence rate of [34] is rather slow, since only one node may wake up in each slot.

The contributions of this paper are mainly twofold. First, we propose a random broadcast gossip strategy, which is a kind of Style III consensus approach. As in [21], we take into account the possible collisions at receivers due to the asynchronous behavior. We prove that our algorithm can reach UC in any connected network, which is a significant improvement over [19,21]. Second, we integrate gradient descent with the random broadcast gossip, by which we obtain a fast distributed optimization method for a sum of convex functions.

The rest of this paper is organized as follows. The problem statement and the algorithms are presented in Section 2. The convergence analysis of the proposed algorithms is given in Section 3. Simulations with interpretations can be seen in Section 4. Finally, we conclude this paper with some remarks in Section 5.

2. Problem statement and algorithms

The application scenario considered in this paper satisfies the following assumptions:

(A1) All nodes work in half-duplex mode in a slotted-time shared channel, i.e., each node interchanges the roles of transmitter and receiver in slotted time, but cannot transmit and receive in the same slot.
(A2) All nodes are equipped with omni-directional antennas with the same coverage radius, such that a pair of nodes can communicate with each other if and only if the distance between their locations is within the radius.
(A3) The underlying graph G of the network is connected.

There are two layers of topology of a wireless network in this paper. The first layer is the underlying graph G consisting of N nodes. By (A2), any two nodes within one-hop distance establish an edge in G linking them with each other. The $N \times N$ adjacency matrix A of the undirected G is defined entrywise by $a_{ij} = 1$ if there is an edge linking node j with node i, and $a_{ij} = 0$ otherwise. The $N \times N$ degree matrix D of the graph G is a diagonal matrix, where the i-th diagonal entry $d_i$ is the number of neighbors of node i in G.

The second layer of topology is a time-varying graph $G^k$ corresponding to slot k. In this paper, each node broadcasts its data to its neighbors following an independent and identical Bernoulli process with $\mathrm{prob}(\text{transmit}) = p$ and $\mathrm{prob}(\text{receive}) = 1 - p$. Receiving failures may occur in two circumstances. First, there is a collision when two or more packets arrive at the same node in the same slot. Second, a packet arrives at a node that is itself in transmitting mode. As shown in Fig. 1, among the three neighbors of node 2, only node 3 hears the broadcast successfully, and the others fail due to either a collision or being in transmitting mode themselves.

In order to describe the time-switching links of graph $G^k$, we define the time-varying adjacency matrix $A^k$ of $G^k$ row by row: $\mathrm{row}_i(A^k) = e_j^T$ if $i \leftarrow j$ successfully, and all zeros otherwise, where $e_j^T$ is a row vector with all N elements zero except a 1 in the j-th position. The condition "$i \leftarrow j$ successfully" means that node i hears the broadcast from neighboring node j successfully, which occurs only when node j broadcasts while node i is in receiving mode. We also define the in-degree matrix $D^k$ of $G^k$ with diagonal elements $d_i^k = 1$ if $i \leftarrow j$ successfully, and $d_i^k = 0$ otherwise.
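To make the reception model concrete, the following Python fragment sketches one slot of the collision model; it is our own illustrative code (the function name and structure are not from the paper). A row of $A^k$ gets a single 1 only when exactly one neighbor of a silent node transmits:

```python
import numpy as np

def slot_adjacency(A, p, rng):
    """Draw one slot of the collision model and return A^k (illustrative sketch).

    A : (N, N) 0/1 symmetric adjacency matrix of the underlying graph G.
    p : broadcast probability of each node.
    """
    N = A.shape[0]
    tx = rng.random(N) < p            # independent Bernoulli(p) transmit decisions
    A_k = np.zeros((N, N))
    for i in range(N):
        if tx[i]:                     # half-duplex: a transmitting node cannot receive
            continue
        senders = np.flatnonzero(A[i] * tx)
        if len(senders) == 1:         # two or more simultaneous packets collide
            A_k[i, senders[0]] = 1.0
    return A_k
```

Averaging many draws of the i-th row recovers the reception probability $p(1-p)^{d_i}$ derived in Section 3.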


The nodes hearing the broadcast successfully update their values via a linear mixing operation with mixing coefficient $\gamma_i$, and those failing to hear the broadcast keep their original data. This gives the first algorithm, listed as Algorithm 1. Main result 1: We prove that the algorithm C-RBG makes the whole network reach UC as in Definition 2.

Algorithm 1. Consensus via random broadcast gossip (C-RBG).

For slot k = 0, 1, 2, …
  Each node broadcasts with probability p.
  For node i = 1, 2, …, N
    If $i \leftarrow j$ successfully,
      $x_i^{k+1} = (1 - \gamma_i)x_i^k + \gamma_i x_j^k$   (1)
    Else
      $x_i^{k+1} = x_i^k$
until some stopping criterion is met.
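A single C-RBG slot is easy to simulate, and the simulation gives a quick sanity check of the average conservation for which the mixing coefficient $\gamma_i = (1-p)^{q-d_i}$ of Eq. (2) below is designed. The sketch is our own (assumed names, not the authors' code):

```python
import numpy as np

def crbg_slot(x, A, p, q, rng):
    """One C-RBG slot (illustrative sketch): broadcast, collide, then mix by (1)."""
    N = len(x)
    d = A.sum(axis=1)
    gamma = (1 - p) ** (q - d)        # mixing coefficients gamma_i of Eq. (2)
    tx = rng.random(N) < p            # Bernoulli(p) broadcast decisions
    x_new = x.copy()
    for i in range(N):
        if tx[i]:
            continue                  # half-duplex: transmitting nodes do not receive
        senders = np.flatnonzero(A[i] * tx)
        if len(senders) == 1:         # reception succeeds only without a collision
            j = senders[0]
            x_new[i] = (1 - gamma[i]) * x[i] + gamma[i] * x[j]
    return x_new
```

Averaging the post-slot network sum over many independent slots from the same state should reproduce the initial sum, in line with the unbiasedness claim; a Monte-Carlo run over a 4-node star graph agrees to within sampling error.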

The mixing coefficient $\gamma_i$ is crucial to average conservation over the whole network. We design $\gamma_i$ specifically as

$\gamma_i = (1-p)^{q - d_i}$,   (2)

where q is a constant larger than the maximum degree of the graph G, i.e., $q > \max_i d_i$, such that $0 < \gamma_i < 1$. The intuition behind this mixing rule is that a node with more neighbors is more willing to accept its neighbors' data, rather than keep its own. In a regular graph, where all nodes have the same number of neighbors, all nodes have the same mixing coefficient. In Section 3, we will see that the design of $\gamma_i$ as (2) aims to overcome the varying collision probabilities at nodes with different degrees, which is crucial to the unbiasedness of C-RBG.

The second proposed algorithm solves an optimization problem whose objective is a sum of component functions:

$\min\ f(x) = \sum_{i=1}^N f_i(x)$
$\text{s.t.}\ x \in X = \bigcap_i X_i$,   (P1)

where $f_i : \mathbb{R}^M \to \mathbb{R}$ is the continuous convex objective function (not necessarily differentiable) of node i. The feasible set $X_i$ of node i is a nonempty, closed, and convex subset of $\mathbb{R}^M$. As shown in Algorithm 2, each node randomly broadcasts its value; the nodes that receive successfully update their data as in C-RBG, followed by a step of subgradient descent.

Algorithm 2. Distributed gradient descent via random broadcast gossip (DGD-RBG).

For slot k = 0, 1, 2, …
  Each node broadcasts with probability p.
  For node i = 1, 2, …, N
    If $i \leftarrow j$ successfully,

      $v_i^k = (1 - \gamma_i)x_i^k + \gamma_i x_j^k$   (3)
      $x_i^{k+1} = P_{X_i}(v_i^k - \alpha_i^k g_i^k)$   (4)
    Else
      $x_i^{k+1} = x_i^k$
until some stopping criterion is met.

Fig. 1. Nodes 1, 2 and 5 are broadcasting, and nodes 3 and 4 are in receiving mode.

In (4), $g_i^k$ is a subgradient of $f_i$ at $v_i^k$, $\alpha_i^k$ is the stepsize, and $P_{X_i}$ is the projection operator onto the set $X_i$ of node i.


Clearly, parallel operations are feasible in (3) and (4). For the subgradient descent in (4) to be well defined, we also assume:

(A4) The norms of all subgradients in DGD-RBG are bounded, i.e., there exists a constant C such that $\|g_i^k\| \le C$ holds for all i, k.

Let $U_i^k$ be the event that node i is updated in slot k, and $I(\cdot)$ the 0–1 indicator function. Each node counts its accumulated number of updates up to slot k as $u_i^k = \sum_{l \le k} I(U_i^l)$, and the stepsize for each subgradient descent is chosen as [34]

$\alpha_i^k = \dfrac{1}{u_i^k + 1}$.   (5)

The stepsize (5) is diminishing over time. Main result 2: We prove that DGD-RBG makes the whole network reach PC as in Definition 2. At the same time, $\lim_{k\to\infty} x_i^k = x^*$ a.s., where $x^* \in X^*$ is an optimal solution of (P1).
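One DGD-RBG slot can be sketched for scalar values as below. The component functions and feasible sets are our own hypothetical choices for illustration — $f_i(x) = |x - a_i|$ (with subgradient $\mathrm{sign}(x - a_i)$) and $X_i = [0, 1]$, so the projection $P_{X_i}$ is a clip; the paper does not prescribe them:

```python
import numpy as np

def dgd_rbg_slot(x, u, A, p, q, targets, rng):
    """One DGD-RBG slot (sketch): mixing (3) then projected subgradient step (4).

    x       : (N,) scalar node values
    u       : (N,) update counters u_i^k, mutated in place; alpha = 1/(u_i + 1), Eq. (5)
    targets : (N,) the a_i of the hypothetical objectives f_i(x) = |x - a_i|
    """
    N = len(x)
    d = A.sum(axis=1)
    gamma = (1 - p) ** (q - d)
    tx = rng.random(N) < p
    x_new = x.copy()
    for i in range(N):
        if tx[i]:
            continue
        senders = np.flatnonzero(A[i] * tx)
        if len(senders) == 1:
            j = senders[0]
            v = (1 - gamma[i]) * x[i] + gamma[i] * x[j]   # Eq. (3)
            g = np.sign(v - targets[i])                   # a subgradient of |v - a_i|
            alpha = 1.0 / (u[i] + 1)                      # diminishing stepsize, Eq. (5)
            x_new[i] = np.clip(v - alpha * g, 0.0, 1.0)   # Eq. (4): project onto [0, 1]
            u[i] += 1
    return x_new
```

Iterating this slot drives the node values together while the diminishing steps pull the common value toward a minimizer of $\sum_i |x - a_i|$ (a median of the $a_i$).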


3. Convergence results

3.1. C-RBG

The iterations of all nodes in (1) can be written in matrix form as

$X^{k+1} = W^k X^k$,   (6)

where the i-th row of the matrix $X^k$ is the data of node i in slot k, $(x_i^k)^T$, and $W^k$ is the updating matrix in slot k. Considering (6) by columns, the linear dynamic system (6) can be described as

$z_m^{k+1} = W^k z_m^k$,   (7)

where $z_m^k = (x_{1,m}^k, x_{2,m}^k, \ldots, x_{N,m}^k)^T$ is the m-th column of $X^k$, a vector consisting of all nodes' m-th components. Since the conclusions drawn in the following discussion are the same for all components, we omit the column suffix m for brevity without confusion. From the description of C-RBG, the rows of $W^k$ are

$\mathrm{row}_i(W^k) = (1-\gamma_i)e_i^T + \gamma_i e_j^T$ if $i \leftarrow j$ successfully, and $\mathrm{row}_i(W^k) = e_i^T$ otherwise.   (8)

It follows that $W^k$ can be written in matrix form as $W^k = I - \Gamma L^k = I - \Gamma(D^k - A^k)$, where $\Gamma$ is the diagonal matrix with entries $\gamma_i$, and $L^k \triangleq D^k - A^k$ is the Laplacian matrix of $G^k$. Clearly, $W^k$ is a nonnegative stochastic matrix, i.e., $W^k\mathbf{1} = \mathbf{1}$, but it is not doubly stochastic, because $\mathbf{1}^T W^k \neq \mathbf{1}^T$. From the random broadcast behavior with possible collisions discussed above, we can write out the probability of each instance of $A^k$ by rows:

$\mathrm{prob}(R_{i\leftarrow j}^k) = \mathrm{prob}(\mathrm{row}_i(A^k) = e_j^T) = p(1-p)^{d_i}$,   (9)

where $R_{i\leftarrow j}^k$ is the event that node i successfully receives the broadcast from node j in slot k. The event occurs when exactly one neighbor of node i broadcasts and node i is in receiving mode. Correspondingly, we get the distribution of $D^k$ through its diagonal elements:

$\mathrm{prob}(U_i^k) = \mathrm{prob}(d_i^k = 1) = d_i p(1-p)^{d_i}$,   (10)

where $U_i^k$ is the event that node i updates successfully in slot k, which depends on the degree of each node in the undirected graph G.

Now we can define the expectation matrices for further discussion: $\bar{D} \triangleq \mathbb{E}[D^k]$, $\bar{A} \triangleq \mathbb{E}[A^k]$, $\bar{L} \triangleq \mathbb{E}[L^k] = \bar{D} - \bar{A}$, and $\bar{W} \triangleq \mathbb{E}[W^k]$. By these definitions, the expectation of (7) can be written as $\mathbb{E}[z^{k+1} \mid z^k] = \bar{W}z^k$, where $\bar{W}$ has the following properties.

Lemma 1. Let assumptions (A1)–(A3) hold. Then:
(1) $\bar{W} = I - \varepsilon L$, where $\varepsilon = p(1-p)^q$ and $L = D - A$ is the Laplacian matrix of the undirected graph G;
(2) $\bar{W}$ is nonnegative and doubly stochastic, i.e., $\bar{W}\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^T\bar{W} = \mathbf{1}^T$;
(3) the eigenvalues of $\bar{W}$ can be sorted as $0 \le \lambda_N(\bar{W}) \le \cdots \le \lambda_2(\bar{W}) < \lambda_1(\bar{W}) = 1$, where $\lambda_{i+1}(\bar{W}) = 1 - \varepsilon\lambda_{N-i}(L)$, $i = 0, 1, \ldots, N-1$, and 1 is the unique largest eigenvalue of $\bar{W}$.

Proof. From (8), we can calculate $\bar{W}$ row by row as

$\mathrm{row}_i(\bar{W}) = e_i^T\left(1 - d_i p(1-p)^{d_i}\right) + \sum_{j\in N_i} p(1-p)^{d_i}\left((1-\gamma_i)e_i^T + \gamma_i e_j^T\right) = e_i^T - \varepsilon\sum_{j\in N_i}\left(e_i^T - e_j^T\right)$,   (11)

where $N_i$ is the index set of the neighbors of node i. It follows from (11) that $\bar{W} = I - \varepsilon L$, which relates the time-varying updating matrix to the Laplacian matrix L of the time-invariant G. $\bar{W}$ is nonnegative because all $W^k$ are nonnegative. The doubly stochastic property of $\bar{W}$ comes from the facts that $L\mathbf{1} = \mathbf{0}$ and $\mathbf{1}^T L = \mathbf{0}^T$ for the Laplacian matrix of an undirected graph. The third result of this lemma can be derived by an analysis analogous to that of the Perron matrix in [15], to which we refer for details. □

Various existing random consensus algorithms can be rewritten in the form (7); some examples are given in [18]. A useful extension of (7) allows i.i.d. link failures with probability $\tau$, in which case the updating matrix in mean becomes $\bar{W} = I - \tau\varepsilon L$.

The following proposition provides a weak convergence result for the dynamic system (7), which can be derived from Lemma 1. A stronger convergence result will be given in Theorem 1.

Proposition 1 (Convergence in mean, Aysal et al. [19]). Let assumptions (A1)–(A3) hold. Then $\mathbb{E}[\lim_{k\to\infty} z^k] = \mathbf{1}\bar{z}^0$, where $\bar{z}^0 = \mathbf{1}^T z^0 / N$.
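Lemma 1 can be verified numerically on a small example. The check below (our own sketch, using an arbitrary 4-node path graph) builds $\bar{W} = I - \varepsilon L$ directly and confirms the doubly stochastic and spectral claims:

```python
import numpy as np

# Numerical check of Lemma 1 on a 4-node path graph (illustrative sketch).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A                      # Laplacian of the underlying graph G
p = 0.15
q = d.max() + 0.5                       # q slightly larger than the maximum degree
eps = p * (1 - p) ** q                  # epsilon = p(1-p)^q
W_bar = np.eye(4) - eps * L             # Lemma 1(1): E[W^k] = I - eps*L

assert (W_bar >= 0).all()                               # nonnegative
assert np.allclose(W_bar @ np.ones(4), np.ones(4))      # row sums are 1
assert np.allclose(np.ones(4) @ W_bar, np.ones(4))      # column sums are 1: doubly stochastic
eig = np.sort(np.linalg.eigvalsh(W_bar))
assert np.isclose(eig[-1], 1.0) and eig[-2] < 1.0       # 1 is the unique largest eigenvalue
```

Taking q closer to $\max_i d_i$ enlarges $\varepsilon$ and pushes $\lambda_2(\bar{W})$ down, which is the tuning consideration discussed next.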


From the eigen-decomposition view of $\bar{W}$, we have $\mathbb{E}[z^k] \to c\mathbf{1}$, and the decay rate of $\mathbb{E}[z^k]$ depends on the second largest eigenvalue $\lambda_2(\bar{W}) = 1 - \varepsilon\lambda_{N-1}(L)$. Since the eigenvalues of L depend only on the topology, we can choose a suitable $\varepsilon$ to make $\lambda_2(\bar{W})$ as small as possible. Recalling that $\varepsilon = p(1-p)^q$, we suggest choosing q slightly larger than $\max_i d_i$, but not by too much.

Corollary 1. The random sequence $\{\bar{z}^k\}$ constitutes a martingale: $\mathbb{E}[\bar{z}^{k+1} \mid \bar{z}^k] = \bar{z}^k$.

Proof. $\mathbb{E}[\bar{z}^{k+1} \mid \bar{z}^k] = \mathbb{E}[\mathbf{1}^T z^{k+1}/N \mid \bar{z}^k] = \mathbb{E}[\mathbf{1}^T W^k z^k/N \mid \bar{z}^k] = \mathbf{1}^T\bar{W}z^k/N = \mathbf{1}^T z^k/N = \bar{z}^k$, where we use Lemma 1 in the fourth equality. □

Theorem 1. Let assumptions (A1)–(A3) hold. Then C-RBG achieves UC.

Proof. For each $W^k$, the diagonal entries satisfy $w_{ii}^k > 0$. Since $\bar{W} = I - \varepsilon L$, the underlying graph of $\bar{W}$ is strongly connected. From Corollary 3.2 in [18], the dynamic system (7) achieves PC. Further, $\mathbf{1}^T\mathbb{E}[z^k] = \mathbf{1}^T z^0$ for any k, since $\mathbf{1}^T\bar{W} = \mathbf{1}^T$, which gives the UC result of C-RBG. □

Now we evaluate the possibility of $\bar{z}^k$ deviating from the initial average $\bar{z}^0$. Proposition 2 gives an upper bound on this deviation in a probabilistic sense. Before the formal statement, we need a lemma bounding all iteration values.

Lemma 2. Let assumptions (A1) and (A2) hold. For any k, we have $\|z^k\|_\infty \le \|z^0\|_\infty$.

Proof. From (1), all elements of $z^k$ are convex combinations of elements of $z^{k-1}$, which implies that $\max_i |z_i^k| \le \max_i |z_i^{k-1}|$, i.e., $\|z^k\|_\infty \le \|z^{k-1}\|_\infty$. Iterating back to slot 0 proves the lemma. □

Proposition 2. Let assumptions (A1) and (A2) hold. For any positive constant δ, we have

$\mathrm{prob}(|\bar{z}^k - \bar{z}^0| \ge \delta) \le \exp\left(-\dfrac{\delta^2}{8k(\gamma_{\max}\|z^0\|_\infty)^2}\right)$.

Proof. From the iteration formula (1), we have

$\bar{z}^{k+1} - \bar{z}^k = \dfrac{1}{N}\mathbf{1}^T z^{k+1} - \dfrac{1}{N}\mathbf{1}^T z^k = \dfrac{1}{N}\sum_{(i,j)\in A^k}\left(-\gamma_i z_i^k + \gamma_i z_j^k\right)$,

where $(i,j)\in A^k$ ranges over the node indices of all successful transmission pairs in slot k. Since $|-\gamma_i z_i^k + \gamma_i z_j^k| \le 2\gamma_i\max(|z_i^k|, |z_j^k|)$, we have $|\bar{z}^{k+1} - \bar{z}^k| \le 2\gamma_{\max}\|z^0\|_\infty$, where $\gamma_{\max} = \max_i\{\gamma_i\}$ and Lemma 2 is used. Applying the Azuma–Hoeffding inequality [35] to the martingale defined in Corollary 1 completes the proof:

$\mathrm{prob}(|\bar{z}^k - \bar{z}^0| \ge \delta) \le \exp\left(-\dfrac{\delta^2}{2k(2\gamma_{\max}\|z^0\|_\infty)^2}\right)$. □

To study the mean square convergence of the algorithm, we define the deviation vector as

$\xi^k = z^k - \mathbf{1}\bar{z}^k$.   (12)

Two convergence factors are defined in the existing literature.

Definition 3 (Asymptotic mean square convergence factor). $R_a = \sup_{\xi^0 \ne 0} \lim_{k\to\infty}\left(\mathbb{E}[\|\xi^k\|_2^2 \mid \xi^0] / \|\xi^0\|_2^2\right)^{1/k}$.

Definition 4 (Per-step mean square convergence factor). $R_s = \sup_{\xi^k \ne 0} \mathbb{E}[\|\xi^{k+1}\|_2^2 \mid \xi^k] / \|\xi^k\|_2^2$.

Since Lemma 1 in [36] shows that $R_a \le R_s$, we only consider $R_s$ in the following discussion. From (7) and (12), we have $\xi^{k+1} = W^k z^k - JW^k z^k = B^k z^k$, where $J = (1/N)\mathbf{1}\mathbf{1}^T$ and $B^k = (I-J)W^k$. Note that for any stochastic matrix $W^k$, $B^k\mathbf{1} = (I-J)W^k\mathbf{1} = (I-J)\mathbf{1} = \mathbf{0}$. It follows that

$\xi^{k+1} = B^k z^k - B^k\mathbf{1}\bar{z}^k = B^k\xi^k$.

The following lemma states several important properties of $B^k$.

Lemma 3 (Aysal et al. [19] and Nedic [34]). Let assumptions (A1)–(A3) hold, and let $\lambda_{1|S}(\mathbb{E}[(B^k)^T B^k])$ be the spectral radius of $\mathbb{E}[(B^k)^T B^k]$ restricted to the subspace $S = \{z \in \mathbb{R}^N \mid \mathbf{1}^T z = 0\}$. Then:
(1) $R_s = \lambda_{1|S}(\mathbb{E}[(B^k)^T B^k])$;
(2) $\lambda_{1|S}(\mathbb{E}[(B^k)^T B^k]) < 1$.

3.2. DGD-RBG

As with other gradient-like algorithms, the stepsize rule is crucial to the convergence of DGD-RBG. The following lemma bounds the stepsize defined in (5). It is worth noting that the bounds differ from those in [34], since the transmission behavior is different.

Lemma 4. Let assumptions (A1) and (A2) hold. For any positive integer $\tilde{k}$, there exists a constant K such that, for $k \ge K$, $1/((k+\tilde{k})\eta_{\max}) \le \alpha_i^k \le 1/((k-\tilde{k})\eta_{\min})$ a.s., where $\eta_{\min} = \min_i d_i p(1-p)^{d_i}$ and $\eta_{\max} = \max_i d_i p(1-p)^{d_i}$.

Proof. Applying the strong law of large numbers to (5), we have $\lim_{k\to\infty}\sum_{l\le k} I(U_i^l)/k = \lim_{k\to\infty} u_i^k/k = d_i p(1-p)^{d_i}$ a.s. Thus, for any positive δ, there exists a constant K such that, for $k \ge K$, $|u_i^k - k d_i p(1-p)^{d_i}| \le \delta$. Letting $\delta = \tilde{k} d_i p(1-p)^{d_i} - 1$, we complete the proof by simple substitutions. □
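The stepsize bounds of Lemma 4 rest on $u_i^k / k \to d_i p(1-p)^{d_i}$. A quick empirical check (our own sketch with arbitrary parameters) simulates the update events of one node of degree 3:

```python
import numpy as np

# Empirical check of the stepsize behavior behind Lemma 4 (illustrative sketch).
rng = np.random.default_rng(3)
d, p = 3, 0.2                       # degree of the node and broadcast probability
eta = d * p * (1 - p) ** d          # per-slot update probability, Eq. (10)

u = 0
alphas = []
for k in range(1, 20001):
    if rng.random() < eta:          # event U_i^k: node i updates in slot k
        u += 1
    alphas.append(1.0 / (u + 1))    # stepsize of Eq. (5)

# By the strong law of large numbers u_i^k/k -> eta, so alpha_i^k ~ 1/(k*eta).
ratio = alphas[-1] * 20000 * eta
assert 0.9 < ratio < 1.1
```

Hence $\alpha_i^k \approx 1/(k\eta_i)$ for large k, sitting between the two bounds of the lemma.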


The following lemma estimates the eigenvalues of $\mathbb{E}[(W^k)^T W^k]$, which will be used in the proof of Lemma 6.

Lemma 5. Let assumptions (A1) and (A2) hold. Then $\lambda_1(\mathbb{E}[(W^k)^T W^k]) \le 1$.

Proof. Since $0 < \gamma_i < 1$ in (2), $W^k$ is nonnegative, and so is $\mathbb{E}[(W^k)^T W^k]$. Since $W^k$ is stochastic, we have $\mathbb{E}[(W^k)^T W^k]\mathbf{1} = \mathbb{E}[(W^k)^T]\mathbf{1} = \bar{W}^T\mathbf{1} = \mathbf{1}$. Applying the Gershgorin circle theorem to $\mathbb{E}[(W^k)^T W^k]$, all of its eigenvalues lie in the disks $\bigcup_i\{\lambda : |\lambda - \mathrm{diag}_i(\mathbb{E}[(W^k)^T W^k])| \le \sum_{j\ne i}(\mathbb{E}[(W^k)^T W^k])_{ij}\}$; since the entries are nonnegative with unit row sums, this completes the proof. □

Lemma 6. Let assumptions (A1) and (A2) hold. The variance of all nodes does not expand under the update (3), i.e., $\sum_{i=1}^N \mathbb{E}[\|v_i^k - y\|_2^2] \le \sum_{i=1}^N \|x_i^k - y\|_2^2$ for any $y \in \mathbb{R}^M$.

Proof. Expanding the variance as a sum over components, we have

$\sum_{i=1}^N \mathbb{E}[\|v_i^k - y\|_2^2] = \sum_{i=1}^N\sum_{m=1}^M \mathbb{E}[|v_{i,m}^k - y_m|^2] = \sum_{m=1}^M \mathbb{E}[\|W^k z_m^k - \mathbf{1}y_m\|_2^2]$,

where $y_m$ is the m-th component of y. For each component m, we have

$\mathbb{E}[\|W^k z_m^k - \mathbf{1}y_m\|_2^2] = \mathbb{E}[\|W^k z_m^k\|_2^2] - 2\mathbb{E}[(W^k z_m^k)^T\mathbf{1}y_m] + \|\mathbf{1}y_m\|_2^2 \le \lambda_1(\mathbb{E}[(W^k)^T W^k])\|z_m^k\|_2^2 - 2(z_m^k)^T\bar{W}^T\mathbf{1}y_m + \|\mathbf{1}y_m\|_2^2$.

By applying Lemmas 1 and 5, we have

$\sum_{m=1}^M \mathbb{E}[\|W^k z_m^k - \mathbf{1}y_m\|_2^2] \le \sum_{m=1}^M\left(\|z_m^k\|_2^2 - 2(z_m^k)^T\mathbf{1}y_m + \|\mathbf{1}y_m\|_2^2\right) = \sum_{m=1}^M \|z_m^k - \mathbf{1}y_m\|_2^2$.

Reordering the sum by component index m, we complete the proof. □

The following lemma is the fundamental tool used to analyze the convergence of the stochastic sequence generated by DGD-RBG.

Lemma 7 (Prop. 4.2 in Bertsekas and Tsitsiklis [37]: Supermartingale convergence). Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{F}_k \subseteq \mathcal{F}_{k+1}$ be a filtration of sub-σ-fields of $\mathcal{F}$. Let $\{u_k\}$, $\{r_k\}$, and $\{w_k\}$ be three nonnegative sequences of random variables. Assume that $\sum_{k=0}^\infty w_k < \infty$ and $\mathbb{E}[u_{k+1} \mid \mathcal{F}_k] \le u_k - r_k + w_k$ hold a.s. Then the sequence $\{u_k\}$ converges to a nonnegative random variable and $\sum_{k=0}^\infty r_k < \infty$ a.s.
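Lemma 6 can be probed by Monte Carlo: apply the mixing step (3) many times from one fixed state and compare the empirical mean of $\sum_i \|v_i^k - y\|^2$ with $\sum_i \|x_i^k - y\|^2$. The sketch below uses our own naming and an arbitrary 4-node star graph:

```python
import numpy as np

def mixing_slot(x, A, p, q, rng):
    """One slot of the C-RBG mixing step, Eq. (3) (illustrative sketch)."""
    N = len(x)
    d = A.sum(axis=1)
    gamma = (1 - p) ** (q - d)
    tx = rng.random(N) < p
    v = x.copy()
    for i in range(N):
        if not tx[i]:
            senders = np.flatnonzero(A[i] * tx)
            if len(senders) == 1:     # successful, collision-free reception
                j = senders[0]
                v[i] = (1 - gamma[i]) * x[i] + gamma[i] * x[j]
    return v

# Monte-Carlo check of Lemma 6 on a 4-node star graph.
rng = np.random.default_rng(4)
A = np.zeros((4, 4)); A[0, 1:] = 1; A[1:, 0] = 1
x = np.array([0.9, 0.1, 0.4, 0.7])
y = 0.5
rhs = np.sum((x - y) ** 2)                                  # sum_i ||x_i - y||^2
lhs = np.mean([np.sum((mixing_slot(x, A, 0.2, 3.5, rng) - y) ** 2)
               for _ in range(20000)])                      # estimate of E[sum_i ||v_i - y||^2]
assert lhs <= rhs + 1e-3                                    # the variance does not expand
```

In this run the empirical mean falls below the right-hand side, as the lemma predicts.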

Fig. 2. Three kinds of network: (a) random geometric network, (b) tree network, (c) cycle network.


Proposition 3 states that the stochastic sequences $\{x_i^k\}$ and $\{v_i^k\}$ of all nodes reach consensus on the average value in slot k. Theorem 2 states that the consensus result lies in the optimal solution set of problem (P1). The proofs (see the Appendices) are analogous to [34], although the details differ because of the different broadcast behaviors of the two papers.

Proposition 3. Let assumptions (A1)–(A4) hold. For DGD-RBG, we have
(1) $\lim_{k\to\infty}\|x_i^k - \bar{x}^k\|_2^2 = 0$ a.s.,
(2) $\lim_{k\to\infty}\|v_i^k - \bar{x}^k\|_2^2 = 0$ a.s.

Theorem 2. Let assumptions (A1)–(A4) hold, and assume the optimal set $X^*$ of (P1) is nonempty. Then the stochastic sequences $\{x_i^k\}$ generated by DGD-RBG converge to some $x^* \in X^*$ a.s.

4. Simulations

We validate our algorithms on various connected undirected networks, three of which are a random geometric network, a tree network, and a cycle network, as shown in Fig. 2. Each topology contains 50 nodes. In the random geometric network, the nodes are uniformly distributed in a unit square. There is an edge linking any pair of nodes within the transmission radius, which is set to 0.08. In the tree network, the nodes form a hierarchical topology with 5 levels. In the cycle network, the nodes are connected one by one as a ring. The simulation results include two parts. In the consensus part, we investigate the average bias in addition to the consensus results. In the distributed optimization part, we validate that our algorithm reaches a consensus at some optimal solution.

4.1. C-RBG

For the convenience of visualization, the dimension of the node values is set to 1, i.e., $x_i^k \in \mathbb{R}$, $i = 1, 2, \ldots, 50$. The initial value of each node is randomly generated with a uniform distribution within [0, 1]. We compare C-RBG with two state-of-the-art broadcast gossip algorithms for consensus. One is the broadcast gossip algorithm (BGA) [19], where only one node broadcasts in each slot, so that there is no collision problem. The other is the collision broadcast gossip algorithm (CBGA) [21], where each node broadcasts with probability p in each slot. In both algorithms, each node updates its value by (1) with a fixed mixing coefficient γ. We choose three different values, 0.2, 0.5 and 0.8, of the mixing coefficient in BGA and CBGA to evaluate the performance of the algorithms.

We first present the convergence results of C-RBG, BGA and CBGA. For C-RBG, we set $q = \max_i d_i + 0.001$ for the random geometric network, and $q = \max_i d_i + 5$ for the tree network and the cycle network. In all these cases, we set p = 0.07 for both C-RBG and CBGA, so that the

Fig. 3. Average variance of nodes versus time slots: (a) random geometric network, (b) tree network and (c) cycle network.


communication complexity is expected to be comparable. The divergence of the whole network in slot k is measured by the average node variance $\mathrm{Var}^k = N^{-1}\sum_{i=1}^N (x_i^k - \bar{x}^k)^2$. Fig. 3 shows the evolution of the variance versus slots in different networks, where the y-axis is the average variance over 1000 simulation instances with independent random initial values. Fig. 3 shows that C-RBG converges at almost the same rate as CBGA. Both C-RBG and CBGA converge far faster than BGA in late iterations, because there may be multiple transmissions in a single slot for C-RBG and CBGA.

Fig. 4 shows the average number of slots needed to reach a certain variance $s = 10^{-3}$ for CBGA and C-RBG with various broadcast probabilities, again averaged over 1000 simulations. We notice that there is an optimal broadcast probability that minimizes the number of iteration slots for C-RBG. With too small a p, the concurrency gain is not significant; with too large a p, severe collisions degrade the performance of the algorithms.

Besides the speed of convergence, we are also concerned with the bias of the consensus results from the initial average. The bias at the k-th slot is defined as $\mathrm{Bias}^k = \sum_i x_i^k / N - \bar{x}^0$. The y-axis of Fig. 5 is the absolute value of the bias at slot k = 500, averaged over $10^4$ simulations from the same initial values. Evidently, in Fig. 5, the consensus results of CBGA are biased, except in the cycle network. The reason is that the cycle network is a regular graph. From (2), (9) and (10), both the collision probability and the mixing coefficient are identical for all nodes in a regular graph. Thus, CBGA becomes equivalent to C-RBG in networks with a regular topology, where both CBGA and C-RBG achieve unbiased consensus. In the random geometric network and the tree network, the degrees of the nodes are not all the same, and neither are the collision probabilities. However, CBGA applies the same mixing coefficient to all nodes, which causes the bias of CBGA, especially when the broadcast probability becomes larger. For C-RBG, the bias is very small regardless of network topology and broadcast probability, which validates the unbiasedness of C-RBG in various topologies.

4.2. DGD-RBG

As mentioned in the introduction, DGD-BGA in [34] is the closest work to our algorithm, and it is chosen for comparison in this subsection. We use DGD-RBG and DGD-BGA to solve the same distributed convex optimization problem from [34]:

$\min_{x \in \bigcap_i X_i}\ f(x) = \sum_{i=1}^N f_i(x)$,

Fig. 4. Average number of iterations needed to reach the variance $\sigma = 10^{-3}$: (a) random geometric network, (b) tree network, (c) cycle network.
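The existence of an optimal broadcast probability can also be seen from a simple collision model (an assumption for illustration, not the paper's exact expression (2)): if each node broadcasts independently with probability p, a node of degree d receives a clean packet in a slot with probability $d\,p\,(1-p)^{d}$ (the node stays silent and exactly one of its d neighbors transmits). This throughput-like quantity first rises and then falls in p:

```python
def clean_reception_prob(d, p):
    """P(node silent and exactly one of its d neighbors transmits),
    under an assumed independent-broadcast collision model."""
    return d * p * (1 - p) ** d   # (1-p) for silence, times d*p*(1-p)^(d-1)

# For degree d, d*p*(1-p)^d is maximized at p = 1/(d+1)
d = 4
probs = [p / 100 for p in range(1, 100)]
best = max(probs, key=lambda p: clean_reception_prob(d, p))
print(best)   # prints 0.2, i.e., 1/(d+1)
```

The simulated optimum in Fig. 4 need not coincide with this value, since the convergence speed also depends on the mixing dynamics, but the rise-then-fall shape of the curves matches this tradeoff.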


H. Feng et al. / Signal Processing 98 (2014) 212–223

Fig. 5. Absolute average bias of nodes versus broadcast probability p: (a) random geometric network, (b) tree network and (c) cycle network.


where $x \in \mathbb{R}$, $X_i = [-1, 1]$ and $f_i(x) = s_i^2 x^2 - s_i x$. The $\{s_i\}$ are independent random variables selected from $[-1, 1]$. The initial values $x_i^0$ are all set to zero. We set $q = \max_i d_i + 0.001$ in the random geometric network, and $q = \max_i d_i + 5$ in the tree network and the cycle network; p is set to 0.1 in all the networks. Fig. 6 shows an example of the evolution process of DGD-RBG in the random geometric network. It illustrates how the values fluctuate and tend towards a consensus state around the global optimal solution.

We compare the convergence rate of DGD-RBG with that of DGD-BGA [34]. The stepsize for DGD-BGA is set to be the same as that for DGD-RBG, as in Eq. (5). Define the target function error as $\mathrm{Err}^k = N^{-1}\sum_{i=1}^{N}[f(x_i^k) - f(x^*)]$, where $x^*$ denotes a global optimal solution of this convex optimization problem. The results are presented in Fig. 7, where the y-axis is the target function error averaged over 1000 simulation instances. It can be seen that DGD-RBG converges towards $x^*$ much faster than DGD-BGA in all the networks.

Fig. 6. An evolution instance of all 50 nodes' values in the random geometric network for DGD-RBG.

5. Conclusions

We have presented two distributed algorithms, C-RBG and DGD-RBG, via random broadcast gossip, which exploit the inherent broadcast property of wireless networks. Concurrent multiple transmissions in the network accelerate the convergence of both algorithms. Meanwhile, separating the mixing coefficients of each node keeps the

Fig. 7. Average target function errors versus time slots: (a) random geometric network, (b) tree network and (c) cycle network.

average invariant in mean, which is crucial to the correctness of DGD-RBG. We expect that our work will attract more interest in broadcast-based in-network signal processing. Several points are worthy of further discussion in theory. Link failure or noise can be discussed as in [18,3]. Transmission delay is also common in wireless networks and will substantially affect the network dynamics [17,11]. We would also like to investigate the performance of more stepsize rules, such as the fixed stepsize [34] and adaptive stepsize rules. The energy consumption can be evaluated more thoroughly in various topologies. Besides subgradient descent, we hope to introduce more efficient local computation methods to accelerate the global convergence rate.

Acknowledgments

We thank the anonymous reviewers for their comments to improve the quality of this paper. We also thank Mr. Yong Nie for his professional work on LaTeX typesetting for this paper. The work was supported in part by the NSTMP of China (2012ZX03001007-003, 2013ZX03003006-003), the NSF of China (60972024), and the Ph.D. Programs Foundation of China (20120071110028).

Appendix A. Proof of Proposition 3

Proof. Rewrite (3) and (4) in column style with the explicit column index m:
$$z_m^{k+1} = W^k z_m^k + s_m^k.$$
Here $s_m^k$ can be written explicitly by components as
$$s_{i,m}^k = \begin{cases} \big(P_{X_i}(v_i^k - \alpha_i^k g_i^k) - v_i^k\big)_m, & i \in \mathrm{Rx}(k), \\ 0, & i \notin \mathrm{Rx}(k), \end{cases}$$
where Rx(k) is the index set of the successfully updating nodes in slot k. Recalling that $\xi_m^k = z_m^k - \mathbf{1}\bar{z}_m^k$, we have $\xi_m^{k+1} = B^k \xi_m^k + M s_m^k$, where $M = I - J$. It follows that
$$E\big[\|\xi_m^{k+1}\|_2\big] \le E\big[\|B^k \xi_m^k\|_2\big] + E\big[\|M s_m^k\|_2\big]. \tag{A.1}$$
Since $\|M\|_2 \le 1$, by (A.4) and Lemma 4, for any sufficiently large k, we have
$$E\big[\|M s_m^k\|_2\big] \le E\big[\|s_m^k\|_2\big] \le \Big[\sum_{i \in \mathrm{Rx}(k)} (\alpha_i^k g_{i,m}^k)^2\Big]^{1/2} \le \frac{NC}{(k-\tilde{k})\eta_{\min}}, \tag{A.2}$$
where $g_{i,m}^k$ is the m-th component of $g_i^k$.


For any $\xi_m^k \in S$, there is
$$E\big[\|B^k \xi_m^k\|_2\big] \le \big(E\big[\|B^k \xi_m^k\|_2^2\big]\big)^{1/2} \le \sqrt{\lambda_{1|S}}\,\|\xi_m^k\|_2, \tag{A.3}$$
where $\lambda_{1|S}$ denotes $\lambda_{1|S}\big(E[(B^k)^T B^k]\big)$ for brevity. Inserting (A.2) and (A.3) into (A.1), we have
$$E\big[\|\xi_m^{k+1}\|_2\big] \le \sqrt{\lambda_{1|S}}\,\|\xi_m^k\|_2 + \frac{NC}{(k-\tilde{k})\eta_{\min}}.$$
Then we can construct a supermartingale as follows:
$$E\big[\|\xi_m^{k+1}\|_2\big] \le \|\xi_m^k\|_2 - \frac{1-\sqrt{\lambda_{1|S}}}{k+1}\,\|\xi_m^k\|_2 + \frac{NC}{k(k-\tilde{k})\eta_{\min}},$$
where $k \ge K$ as required in Lemma 4. Since $\sum_k NC/\big(k(k-\tilde{k})\eta_{\min}\big) < \infty$, it follows from Lemma 7 that
$$\sum_k \frac{1-\sqrt{\lambda_{1|S}}}{k+1}\,\|\xi_m^k\|_2 < \infty,$$
which implies that with probability 1,
$$\lim_{k\to\infty} \|\xi_m^k\|_2 = 0. \tag{A.4}$$
The result (A.4) implies that $\lim_{k\to\infty} z_m^k = c\mathbf{1}$ a.s., which holds for any component index m, so part (1) is proved. The proof of part (2) is trivial, since $\lim_{k\to\infty} v_m^k = \lim_{k\to\infty} W^k z_m^k = \lim_{k\to\infty} z_m^k$. □

Appendix B. Proof of Theorem 2

Proof. Without loss of generality, we consider node i. For any $x^* \in X^* \subseteq X_i$,
$$\|x_i^{k+1} - x^*\|_2^2 = \|P_{X_i}(v_i^k - \alpha_i^k g_i^k) - x^*\|_2^2 \le \|v_i^k - \alpha_i^k g_i^k - x^*\|_2^2,$$
where we use the projection property onto a convex set (Prop. 2.2.1 in [38]). By the subgradient property (Prop. 4.4.2 in [34]), we have
$$(g_i^k)^T (v_i^k - x^*) \ge f_i(\bar{x}^k) - f_i(x^*) - C\,\|v_i^k - \bar{x}^k\|_2^2.$$
It follows that
$$\|x_i^{k+1} - x^*\|_2^2 \le \|v_i^k - x^*\|_2^2 + (\alpha_i^k C)^2 - 2\alpha_i^k\big(f_i(v_i^k) - f_i(x^*)\big) + 2\alpha_i^k C\,\|v_i^k - \bar{x}^k\|_2^2.$$
Summing over all nodes' data and taking expectation, for sufficiently large k, we have
$$E\Big[\sum_i \|x_i^{k+1} - x^*\|_2^2 \,\Big|\, x^k\Big] \le E\Big[\sum_i \|v_i^k - x^*\|_2^2 + (\alpha_i^k C)^2\Big] - 2\alpha_i^k \sum_i \big(f_i(v_i^k) - f_i(x^*)\big) + 2\alpha_i^k C \sum_i E\big[\|v_i^k - \bar{x}^k\|_2^2\big]$$
$$\le \sum_i \|x_i^k - x^*\|_2^2 - 2\,\frac{f(\bar{x}^k) - f(x^*)}{(k+\tilde{k})\eta_{\max}} + \frac{NC}{\big((k-\tilde{k})\eta_{\min}\big)^2} + 2\,\frac{CN\,E\big[\|v_i^k - \bar{x}^k\|_2^2\big]}{(k-\tilde{k})\eta_{\min}},$$
where we use the facts of Lemmas 4 and 6. From Lemma 7, we have $\sum_{k=0}^{\infty} E\big[\|v_i^k - \bar{x}^k\|_2^2\big]/\big((k-\tilde{k})\eta_{\min}\big) < \infty$. Combined with $\sum_{k=0}^{\infty} NC/\big((k-\tilde{k})\eta_{\min}\big)^2 < \infty$, it claims that with probability 1,
$$\sum_{k=0}^{\infty} \frac{f(\bar{x}^k) - f(x^*)}{(k+\tilde{k})\eta_{\max}} < \infty,$$

which implies that $\lim_{k\to\infty} f(\bar{x}^k) = f(x^*)$, and the sequence $\{\sum_i \|x_i^k - x^*\|_2^2\}$ converges to a nonnegative random variable a.s. By Proposition 3, we have $\lim_{k\to\infty} f(x_i^k) = f(x^*)$, which implies that the sequence $\{x_i^k\}$ converges to the optimal set $X^*$. Since the convergence of $\{\sum_i \|x_i^k - x^*\|_2^2\}$ holds for any $x^* \in X^*$, it follows that $\{x_i^k\}$ generated by DGD-RBG must converge to some $x^*$ of problem (P1) a.s. □

References

[1] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, Wireless sensor networks: a survey, Comput. Netw. 38 (4) (2002) 393–422.
[2] A. Giridhar, P. Kumar, Toward a theory of in-network computation in wireless sensor networks, IEEE Commun. Mag. 44 (2006) 98–107.
[3] I. Schizas, A. Ribeiro, G. Giannakis, Consensus in ad hoc WSNs with noisy links – Part I: distributed estimation of deterministic signals, IEEE Trans. Signal Process. 56 (2008) 350–364.
[4] I. Schizas, G. Giannakis, S. Roumeliotis, A. Ribeiro, Consensus in ad hoc WSNs with noisy links – Part II: distributed estimation and smoothing of random signals, IEEE Trans. Signal Process. 56 (2008) 1650–1666.
[5] G. Mateos, I. Schizas, G. Giannakis, Distributed recursive least-squares for consensus-based in-network adaptive estimation, IEEE Trans. Signal Process. 57 (2009) 4583–4588.
[6] I. Schizas, G. Mateos, G. Giannakis, Distributed LMS for consensus-based in-network adaptive processing, IEEE Trans. Signal Process. 57 (2009) 2365–2382.
[7] F. Xi, J. He, Z. Liu, Adaptive fast consensus algorithm for distributed sensor fusion, Signal Process. 90 (5) (2010) 1693–1699.
[8] P.A. Forero, A. Cano, G.B. Giannakis, Consensus-based distributed support vector machines, J. Mach. Learn. Res. 11 (2010) 1633–1707.
[9] J.F.C. Mota, J.M.F. Xavier, P.M.Q. Aguiar, M. Puschel, Distributed basis pursuit, IEEE Trans. Signal Process. 60 (4) (2012) 1942–1956.
[10] X. Wang, L. Cheng, Z.-Q. Cao, C. Zhou, M. Tan, Z.-G. Hou, Output-feedback consensus control of linear multi-agent systems: a fixed topology, Int. J. Innov. Comput. Inf. Control 7 (5(A)) (2011) 2063–2074.
[11] R. Yang, P. Shi, G.-P. Liu, H. Gao, Network-based feedback control for systems with mixed delays based on quantization and dropout compensation, Automatica 47 (12) (2011) 2805–2809.
[12] Y. Wakasa, K. Tanaka, Y. Nishimura, Distributed output consensus via LMI-based model predictive control and dual decomposition, Int. J. Innov. Comput. Inf. Control 7 (10) (2011) 5801–5812.
[13] X. Su, L. Wu, P. Shi, Sensor networks with random link failures: distributed filtering for T-S fuzzy systems, IEEE Trans. Ind. Inf. 9 (3) (2013) 1739–1750.
[14] W. Ren, R.W. Beard, E.M. Atkins, A survey of consensus problems in multi-agent coordination, in: Proceedings of the 2005 American Control Conference (ACC 2005), 2005, pp. 1859–1864.
[15] R. Olfati-Saber, J.A. Fax, R.M. Murray, Consensus and cooperation in networked multi-agent systems, Proc. IEEE 95 (1) (2007) 215–233.
[16] D. Shah, Gossip algorithms, Found. Trends Netw. 3 (1) (2009) 1–125.
[17] R. Olfati-Saber, R.M. Murray, Consensus problems in networks of agents with switching topology and time-delays, IEEE Trans. Autom. Control 49 (9) (2004) 1520–1533.
[18] F. Fagnani, S. Zampieri, Randomized consensus algorithms over large scale networks, IEEE J. Sel. Areas Commun. 26 (4) (2008) 634–649.
[19] T.C. Aysal, M.E. Yildiz, A.D. Sarwate, A. Scaglione, Broadcast gossip algorithms for consensus, IEEE Trans. Signal Process. 57 (7) (2009) 2748–2761.
[20] S. Boyd, A. Ghosh, B. Prabhakar, D. Shah, Randomized gossip algorithms, IEEE Trans. Inf. Theory 52 (6) (2006) 2508–2530.
[21] F. Fagnani, P. Frasca, Broadcast gossip averaging: interference and unbiasedness in large Abelian Cayley networks, IEEE J. Sel. Top. Signal Process. 5 (4) (2011) 866–875.
[22] A. Nedic, D.P. Bertsekas, Incremental subgradient methods for nondifferentiable optimization, SIAM J. Optim. 12 (2001) 109–138.
[23] T. Erseghe, D. Zennaro, E. Dall'Anese, L. Vangelista, Fast consensus by the alternating direction multipliers method, IEEE Trans. Signal Process. 59 (11) (2011) 5523–5537.


[24] S. Kumar, F. Zhao, D. Shepherd, Collaborative signal and information processing in microsensor networks, IEEE Signal Process. Mag. 19 (2) (2002) 13–14.
[25] J.B. Predd, S.B. Kulkarni, H.V. Poor, Distributed learning in wireless sensor networks, IEEE Signal Process. Mag. 23 (2006) 56–69.
[26] A. Nedic, A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE Trans. Autom. Control 54 (1) (2009) 48–61.
[27] M. Rabbat, R. Nowak, Distributed optimization in sensor networks, in: Proceedings of IPSN, 2004, pp. 20–27.
[28] Q. Zhao, A. Swami, L. Tong, The interplay between signal processing and networking in sensor networks, IEEE Signal Process. Mag. 23 (4) (2006) 84–93.
[29] D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice Hall, NJ, 1989.
[30] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (1) (2011) 1–122.
[31] E.S.H. Neto, A.R.D. Pierro, Incremental subgradients for constrained convex optimization: a unified framework and new methods, SIAM J. Optim. 20 (2009) 1547–1572.


[32] L. Li, J.A. Chambers, A new incremental affine projection-based adaptive algorithm for distributed networks, Signal Process. 88 (10) (2008) 2599–2603.
[33] A. Nedic, A. Ozdaglar, P.A. Parrilo, Constrained consensus and optimization in multi-agent networks, IEEE Trans. Autom. Control 55 (4) (2010) 922–938.
[34] A. Nedic, Asynchronous broadcast-based convex optimization over a network, IEEE Trans. Autom. Control 56 (6) (2011) 1337–1351.
[35] M. Mitzenmacher, E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005.
[36] J. Zhou, Q. Wang, Convergence speed in distributed consensus over dynamically switching random networks, Automatica 45 (6) (2009) 1455–1461.
[37] D.P. Bertsekas, J.N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.
[38] D.P. Bertsekas, A. Nedic, A.E. Ozdaglar, Convex Analysis and Optimization, Athena Scientific, 2003.