17th IFAC Symposium on System Identification
Beijing International Convention Center
October 19-21, 2015. Beijing, China
Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 48-28 (2015) 863–868
Distributed subgradient algorithm for multi-agent optimization with directed communication topology

Yanhui YIN 1,2, Zhongxin LIU 1,2, Zengqiang CHEN 1,2,3

1. College of Computer and Control Engineering, Nankai University, Tianjin 300071, China
2. Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, 300071, China
3. College of Science, Civil Aviation University of China, Tianjin 300300, China
E-mail: [email protected], [email protected]

Abstract: This paper studies a distributed subgradient algorithm in directed graphs. In contrast to previous work, the problem is considered when the weighted adjacency matrices are not doubly stochastic. First the paper shows that an agreement can be reached in general directed graphs, but the global objective function may not be minimized. Then some knowledge about homogeneous Markov chains is used to analyze the transition matrices, and a new update rule is proposed to ensure that the algorithm converges to the optimal set for the case when the topology of the graphs is fixed and known to all agents. For switching topology the paper establishes the relationship between the optimal results and the limit vector sequence. The paper provides explicit proofs for the results, and simulation research validates their effectiveness.

Key Words: Distributed optimization, subgradient algorithm, directed graphs, Markov chains

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
1 Introduction

Distributed optimization of large-scale networks has received much attention in recent years due to its applications in a variety of problems such as distributed resource allocation [7], distributed estimation [15], distributed congestion control and distributed motion planning [5,9]. Compared to a centralized optimization method, where a fusion center responsible for collecting and computing information is needed, a decentralized one enjoys the advantages of offering a more balanced communication load and better privacy protection [4,14]. In a more specific way, we consider a distributed computation model in which the global objective function is taken as the sum of the agents' individual objective functions, i.e., $f(x) = \sum_{i=1}^m f_i(x)$, where $f_i(x)$ represents the cost function of agent $i$, and $x \in \mathbb{R}^n$ is a decision vector.

To address this problem, a number of distributed algorithms have been proposed and widely investigated. Most literature builds on the seminal work [7], which focused on optimizing the global objective function among different processors, each of which knew complete information about the global objective function. The authors of the paper [2] improved the model by supposing that each processor knew only its local information and that the global objective function was the sum of the agents' individual objective functions. A distributed algorithm combining consensus techniques with subgradient methods was also presented in [2]. As continuations of the work in [2], communication delays and state constraints were considered in [1, 3, 8, 11, 13, 16]. The work in [8] also revealed how to adjust the weights of the adjacency matrices to speed up convergence when the topology was fixed. The authors of the paper [6] studied the distributed optimization problem when links of the agents failed according to a given stochastic process. In [10] the authors did not require that local functions or state constraint sets be convex. Besides, other algorithms have also been studied, such as game-based methods [12].

However, most existing literature assumes that the adjacency matrices are doubly stochastic, which is difficult to ensure, especially when the graphs are directed. In fact, as the work in [9] shows, the model with fixed delays is equivalent to a delay-free one via an augmentation of the communication graphs. Our work is to extend the work in [2] to study the algorithm in a more general situation. We show that an agreement can be reached exactly under directed communication topology by choosing a proper stepsize rule. For the case when the weights of the adjacency matrices are constant and known to all agents, we propose a new update rule whereby the algorithm converges to the global optimal solution set. For switching topology we establish the relationship between the optimal results and the limit vector of the transition matrices. Our work provides a good understanding of the performance of distributed subgradient methods with directed topology.

This paper is organized as follows: In Section 2, we give some basic knowledge on convex analysis, graph theory and nonnegative matrices. Then we present the problem formulation and give some necessary assumptions. The main results are presented in Section 3. We make use of the properties of homogeneous Markov chains to study the transition matrices in a new way, and then a new update rule is proposed to ensure the optimal result. For switching topology we establish the relationship between the optimal results and the limit vector sequence. We provide detailed proofs and simulation research. Finally, concluding remarks are given in Section 4.
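The sum-of-local-objectives model described above can be made concrete with a small sketch. The quadratic local costs $f_i(x) = 0.5(x - c_i)^2$ and the data vector `c` below are hypothetical choices, used only to show that the global objective is a sum of private pieces no single agent knows in full.

```python
import numpy as np

# Hypothetical data: three agents, each privately holding c_i and the
# local convex cost f_i(x) = 0.5 * (x - c_i)^2.
c = np.array([0.0, 1.0, 5.0])

def f(x):
    """Global objective f(x) = sum_i f_i(x); no single agent knows it."""
    return 0.5 * float(np.sum((x - c) ** 2))

# For these quadratic costs the centralized minimizer of f is simply the
# mean of the c_i; a distributed algorithm must reach it through local
# exchanges only.
x_star = float(c.mean())
```

For this toy instance the minimizer is `x_star = 2.0` with `f(x_star) = 7.0`, which a centralized solver with access to all of `c` finds immediately; the rest of the paper is about reaching it in a distributed way.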
The authors would like to acknowledge the National Natural Science Foundation of China (Grant Nos. 61174094, 61273138), and the Tianjin Natural Science Foundation of China (Grant Nos. 13JCYBJC17400, 14JCYBJC18700).

2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.12.238
2 Preliminaries and problem formulation

In this section, we first introduce some preliminary knowledge on convex analysis, graph theory and nonnegative matrices, and then the problem formulation and some basic assumptions are given in the second subsection.

Notions: A vector $x \in \mathbb{R}^n$ is viewed as a column vector, and $x^T$ denotes its transpose. $\mathbf{1}_n$ denotes the $n$-dimensional vector with entries all equal to 1. For a matrix $A$, we use $A_{ij}$ or $a_{ij}$ to denote the matrix entry in the $i$-th row and $j$-th column. For two vectors $x, y \in \mathbb{R}^n$, the scalar product is denoted by $x^T y$. We use $\|\cdot\|$ to denote the standard Euclidean norm.

2.1 Preliminaries

A set $K$ is convex if for any $0 \le \lambda \le 1$,
$$\lambda x_1 + (1-\lambda) x_2 \in K, \quad \forall x_1, x_2 \in K.$$
A function $f: \mathbb{R}^n \to \mathbb{R}$ is called a convex function when its domain $\mathrm{dom}(f)$ is convex and the following relation holds:
$$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y), \quad \forall x, y \in \mathrm{dom}(f) \quad (1)$$
for any $0 \le \lambda \le 1$. For a convex function $f(x)$, we denote its subgradient at $\bar{x} \in \mathrm{dom}(f)$ by $s_f(\bar{x})$, satisfying
$$f(\bar{x}) + [s_f(\bar{x})]^T (x - \bar{x}) \le f(x), \quad \forall x \in \mathrm{dom}(f).$$
The set of all subgradients at $\bar{x}$ is denoted by $\partial f(\bar{x})$.

In this paper a digraph is denoted by $G = (V, E)$, where $V = \{1, 2, \ldots, m\}$ is the node set and $E \subseteq V \times V$ is the edge set. We say node $j$ is a neighbor of node $i$ if $(j, i) \in E$. Specially, we assume $(i, i) \in E$ for all $i \in V$. The set of neighbors of node $i$ is denoted by $N_i$. A directed path of $G$ is an alternating sequence of nodes and edges, beginning and ending at a node. A digraph is called strongly connected if there exists a directed path between any two nodes of the node set.

A stochastic vector is a vector whose components are nonnegative and whose sum is equal to 1. A square matrix is a stochastic matrix when each of its rows is a stochastic vector, and it is said to be doubly stochastic when its transpose is a stochastic matrix as well.

2.2 Problem formulation

We consider a network with $m$ agents $V = \{1, 2, \ldots, m\}$, each of which has an individual cost function known by itself only. The global cost function is taken as the sum of all agents' local cost functions. Every agent updates its information and communicates with its neighbors at discrete times $t_0, t_1, \ldots$. The information state of agent $i$ at time $t_k$ is denoted by $x_i(k)$, which can be regarded as an estimate of the global decision vector. Our goal is that the estimates of the agents reach consensus and minimize the global function, i.e., the agents are supposed to solve the following problem in a distributed way:
$$\text{minimize } f(x) = \sum_{i=1}^m f_i(x), \quad x \in \mathbb{R}^n,$$
where each $f_i: \mathbb{R}^n \to \mathbb{R}$ is a convex function. The optimal value of the problem is denoted by $f^*$, which is assumed to be finite. The optimal solution set $X^*$ is denoted by $X^* = \{x \in \mathbb{R}^n \mid f(x) = f^*\}$, which is assumed to be a nonempty set.

When agents communicate with each other, we assume there is an $m$-dimensional vector $a_i(k)$ for agent $i$. The scalar $a_{ij}(k)$ stands for the weight for the information agent $i$ receives from agent $j$ directly during the time interval $(t_k, t_{k+1})$. The distributed subgradient algorithm can be described as the following relation [2]:
$$x_i(k+1) = \sum_{j=1}^m a_{ij}(k) x_j(k) - \alpha_i(k) d_i(k), \quad (2)$$
where $\alpha_i(k) > 0$ is the stepsize at time $t_k$ used by agent $i$ and $d_i(k)$ is a subgradient of $f_i(x)$ at $x_i(k)$. The first term on the right can be regarded as an average consensus computation and the second term stands for the subgradient method part.

Let $a_i(k)$ be the $i$-th column of a square $m \times m$ matrix $A(k)$; $A(k)$ is said to be the weighted adjacency matrix. We can use $A(k)$ to establish the relationship between $x_i(s)$ and $x_i(k+1)$ for any $s \le k$:
$$x_i(k+1) = \sum_{j=1}^m [\Phi(k, s)]_{ij} x_j(s) - \sum_{r=s+1}^{k} \sum_{j=1}^m [\Phi(k, r)]_{ij} \alpha_j(r-1) d_j(r-1) - \alpha_i(k) d_i(k), \quad (3)$$
where the matrices $\Phi(k, s) = A(s) A(s+1) \cdots A(k)$. The paper [2] shows that the problem can be solved approximately when the following assumptions hold:

Assumption 1 There exists a scalar $\eta$, $0 < \eta < 1$, such that for all $i, j \in V$ and all $k \ge 0$: (a) $a_{ii}(k) \ge \eta$; (b) $a_{ij}(k) \ge \eta$ if $j$ is a neighbor of $i$ during $(t_k, t_{k+1})$; (c) $a_{ij}(k) = 0$ otherwise.

Assumption 2 (a) $\sum_{j=1}^m a_{ij}(k) = 1$; (b) $\sum_{i=1}^m a_{ij}(k) = 1$.

Assumption 3 The graph $G = (V, E_\infty)$ is strongly connected, where $E_\infty$ is denoted as follows:
$$E_\infty = \{(j, i) \mid (j, i) \in E_k \text{ for infinitely many indices } k\}.$$

Assumption 4 There exists an integer $B$ such that for every $B$ consecutive time slots,
$$(j, i) \in E_k \cup E_{k+1} \cup \cdots \cup E_{k+B-1}$$
for all $(j, i) \in E_\infty$ and $k \ge 0$.

Assumption 5 The subgradient sequence $\{d_j(k)\}$ is bounded for each $j$ and all $k \ge 0$: $\|d_j(k)\| \le L$. Let $x_j(0)$ denote the initial vector of agent $j$; there is a positive scalar $H$ satisfying $\|x_j(0)\| \le H$.

Assumption 6 The stepsize is a fixed positive scalar $\alpha$ for all $j \in V$ and $k \ge 0$, i.e., $\alpha_j(k) = \alpha$, $\forall k \ge 0$ and $j \in V$.

Next we will analyze the problem without Assumption 2(b), which is not easy to ensure based on local information for general graphs. Also, in this paper we may adopt the following stepsize rule replacing Assumption 6 to ensure exact results.

Assumption 7 $\{\alpha_i(k)\}$ is a nonnegative sequence for any $i \in V$, with (a) $\alpha_j(k) = \alpha(k)$, $\forall k \ge 0$, $j \in V$; (b) $\sum_{k \ge 0} \alpha_j(k) = \infty$, $\forall j \in V$; (c) $\sum_{k \ge 0} [\alpha_j(k)]^2 < \infty$, $\forall j \in V$.
2015 IFAC SYSID October 19-21, 2015. Beijing, China
3
Yanhui YIN et al. / IFAC-PapersOnLine 48-28 (2015) 863–868
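Update rule (2) is simple to simulate. The sketch below uses scalar quadratic costs $f_i(x) = (x - c_i)^2$ and a 5-node row-stochastic (not doubly stochastic) weight matrix; both the matrix reading and the $c_i$ values are illustrative assumptions. It shows the phenomenon motivating this paper: the agents agree, but the common limit of plain rule (2) need not be the minimizer of $f = \sum_i f_i$.

```python
# One step of update rule (2): x_i(k+1) = sum_j a_ij x_j(k) - alpha * d_i(k),
# with subgradient d_i(k) = 2 * (x_i(k) - c_i) for f_i(x) = (x - c_i)^2.
def step(A, x, c, alpha):
    m = len(x)
    return [sum(A[i][j] * x[j] for j in range(m)) - alpha * 2.0 * (x[i] - c[i])
            for i in range(m)]

# Hypothetical 5-node weight matrix: rows sum to 1, columns do not.
A = [[0.1, 0.2, 0.4, 0.0, 0.3],
     [0.3, 0.4, 0.1, 0.0, 0.2],
     [0.3, 0.2, 0.0, 0.5, 0.0],
     [0.3, 0.0, 0.0, 0.2, 0.5],
     [0.0, 0.0, 0.3, 0.0, 0.7]]
c = [10.0, -20.0, 30.0, 5.0, -5.0]   # illustrative local minimizers

x = c[:]                              # each agent starts at its own minimizer
for k in range(1, 5001):
    x = step(A, x, c, 0.5 / k)        # diminishing stepsize: sum = inf, sum of squares < inf

spread = max(x) - min(x)
print(spread)         # small: the estimates reach consensus
print(sum(c) / 5.0)   # 4.0: the true minimizer of f
print(x[0])           # the common limit differs from 4.0 with these weights
```

Because the weights are not doubly stochastic, the agents effectively minimize an unevenly weighted sum of the $f_i$; the new rule of Section 3.2 removes this bias.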
3 Main results

In this section we give our main results on the distributed subgradient algorithm with directed communication topology. First we show that the estimates of all agents reach consensus on general directed graphs. Then a new update rule, based on some basic facts about Markov chains, is proposed to ensure that the global function is minimized under fixed topology. For switching topology we establish the relationship between the optimization results and the limit vector sequence.

3.1 Convergence results

We first review convergence results for the matrices $\Phi(k, s)$.

Lemma 1 [2]. Let Assumptions 1, 3, 4 hold. Then: (a) the limit $\Phi(s) = \lim_{k \to \infty} \Phi(k, s)$ exists; (b) $\Phi(s) = \mathbf{1}\phi(s)^T$, where $\phi(s) \in \mathbb{R}^m$ is a stochastic vector; (c) for all $i \in V$ and $k \ge s$,

$\big| [\Phi(k, s)]_{ij} - \phi_j(s) \big| \le 2C \lambda^{k-s},$  (4)

where $C = (1 + \eta^{-B_0})/(1 - \eta^{B_0})$, $\lambda = (1 - \eta^{B_0})^{1/B_0}$, $\phi_j(s)$ is the $j$th component of $\phi(s)$, and $B_0 = (m-1)B$ is an upper bound on the number of time slots needed for communication between any two agents of $V$.

Lemma 2 [3]. Let $0 < \lambda < 1$ and let $\{\gamma(k)\}$ be a positive scalar sequence satisfying $\lim_{k \to \infty} \gamma(k) = 0$ and $\sum_{k \ge 0} \gamma(k) < \infty$. Then (a) $\lim_{k \to \infty} \sum_{\ell=0}^{k} \lambda^{k-\ell} \gamma(\ell) = 0$; (b) $\sum_{k \ge 0} \sum_{\ell=0}^{k} \lambda^{k-\ell} \gamma(\ell) < \infty$.

The next theorem shows that all agents' estimates reach consensus in the limit.

Theorem 1. Let Assumptions 1, 2(a), 3, 4, 5, 7 hold, and let the sequences $\{x^i(k)\}$ be generated by update rule (2). Define $N(k) = \max_{i, h \in V} \|x^i(k) - x^h(k)\|$. Then $\lim_{k \to \infty} N(k) = 0$.

Proof. From (3) we have

$\|x^i(k) - x^h(k)\| = \Big\| \sum_{j=1}^{m} x^j(0) \big( [\Phi(k-1, 0)]_{ij} - [\Phi(k-1, 0)]_{hj} \big) - \sum_{s=1}^{k-1} \sum_{j=1}^{m} \big( [\Phi(k-1, s)]_{ij} - [\Phi(k-1, s)]_{hj} \big) \alpha(s-1) d^j(s-1) - \alpha(k-1) \big( d^i(k-1) - d^h(k-1) \big) \Big\|.$

By (4), we obtain

$\big| [\Phi(k-1, 0)]_{ij} - [\Phi(k-1, 0)]_{hj} \big| \le \big| [\Phi(k-1, 0)]_{ij} - \phi_j(0) \big| + \big| \phi_j(0) - [\Phi(k-1, 0)]_{hj} \big| \le 4C \lambda^{k-1}.$

Note that $\|x^j(0)\| \le H$ and $\|d^j(k)\| \le L$; then

$N(k) \le 4mCH \lambda^{k-1} + 4mCL \sum_{s=1}^{k-1} \lambda^{k-s-1} \alpha(s-1) + 2\alpha(k-1) L.$

Let $k \to \infty$; by Lemma 2(a), together with $\lambda^{k-1} \to 0$ and $\alpha(k) \to 0$ (which follows from Assumption 7(c)), we have $\lim_{k \to \infty} N(k) = 0$. ∎

This result shows that the agents reach an agreement as $k \to \infty$, but they may not minimize the global function. We discuss the optimization results in the next two subsections.

3.2 Global optimization results for fixed topology

In this subsection we propose a new update rule that ensures the algorithm minimizes the global function under the following assumption; when the weights are fixed, we consider them easy to compute in a distributed way.

Assumption 8. The weights of the graphs are constant and known to all agents, i.e., $A(k) = A$ for all $k \ge 0$.

We will use some basic facts about Markov chains. Under Assumption 2(a), $A$ is a stochastic matrix with entries $0 \le a_{ij} \le 1$, so $A$ can be taken as the one-step transition matrix $P$ of a finite homogeneous Markov chain: $a_{ij} = p_{ij}$ can be regarded as the probability of moving from state $i$ to state $j$ in one step.

Lemma 3. A homogeneous Markov chain is ergodic if there exists an integer $n$ satisfying $(p_{ij})^{(n)} > 0$ for all $i, j \in V$, where $V$ denotes the state space and $(p_{ij})^{(n)}$ denotes the $n$-step transition probability from state $i$ to state $j$. Its stationary distribution $\pi \in \mathbb{R}^m$ satisfies $\lim_{n \to \infty} (p_{ij})^{(n)} = \pi_j$ for all $i, j \in V$, $\pi^T P = \pi^T$, and $\sum_{j=1}^{m} \pi_j = 1$, where $\pi_j > 0$ is the $j$th component of $\pi$.

Proof omitted due to space limitation.

In a homogeneous Markov chain the $n$-step state transition matrix is $P^n$, which implies that in our model $\Phi(k, s) = A^{k-s+1}$ can be regarded as the $(k-s+1)$-step transition matrix corresponding to the one-step transition matrix $P = A$. The next lemma is a basic property of $\Phi(k, s)$ (see [2]).

Lemma 4. Let Assumptions 1, 2(a), 3, 4 hold. Then $[\Phi(s + B_0 - 1, s)]_{ij} \ge \eta^{B_0} > 0$.

From Lemma 3 and Lemma 4 we know that the Markov chain corresponding to $A$ is ergodic. Furthermore, for fixed $A$ we have $\phi_j(s) = \pi_j$ for all $s \ge 0$.
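The stationary distribution $\pi$, characterized by $\pi^T A = \pi^T$ (equivalently $A^T \pi = \pi$), can be computed by power iteration. A minimal sketch, under the assumption that the 5x5 weight matrix is the one read off the example section:

```python
# Compute the stationary distribution pi of a stochastic matrix A
# by power iteration: pi <- A^T pi, i.e., pi_j <- sum_i a_ij * pi_i.
A = [[0.1, 0.2, 0.4, 0.0, 0.3],
     [0.3, 0.4, 0.1, 0.0, 0.2],
     [0.3, 0.2, 0.0, 0.5, 0.0],
     [0.3, 0.0, 0.0, 0.2, 0.5],
     [0.0, 0.0, 0.3, 0.0, 0.7]]
m = len(A)

pi = [1.0 / m] * m                 # start from the uniform distribution
for _ in range(2000):              # ergodicity (Lemmas 3-4) gives convergence
    pi = [sum(A[i][j] * pi[i] for i in range(m)) for j in range(m)]
    s = sum(pi)
    pi = [p / s for p in pi]       # renormalize for numerical safety

print([round(p, 4) for p in pi])   # componentwise positive, sums to 1
residual = max(abs(sum(A[i][j] * pi[i] for i in range(m)) - pi[j]) for j in range(m))
print(residual)                    # ~0: pi^T A = pi^T holds
```

Note that this computation is centralized; the paper's Assumption 8 (weights known to all agents) is what makes $\pi$ available to every agent.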
We use a new update rule for the agents:

$\alpha_i(k) = \alpha(k)/\pi_i, \quad i \in V,$  (5)

$x^i(k+1) = \sum_{j=1}^{m} a_{ij}(k) x^j(k) - \alpha_i(k) d^i(k),$  (6)

where $\{\alpha(k)\}$ is a sequence just as in Assumptions 7(b), 7(c), and $\pi$ is the stationary distribution of $A$.

Since this update rule is different from that in subsection 3.1, we first prove that the convergence results still hold in this case. Let $\bar{d}^i(k) = \pi_i^{-1} d^i(k)$, which is a subgradient of $\pi_i^{-1} f_i$ at $x^i(k)$. Then (6) can be written as

$x^i(k+1) = \sum_{j=1}^{m} a_{ij}(k) x^j(k) - \alpha(k) \bar{d}^i(k),$

whose form is the same as (2); hence Theorem 1 still holds for our model, which can be viewed as applying (2) to the reweighted problem of minimizing $\sum_{i=1}^{m} \pi_i^{-1} f_i(x)$. We will see that our model converges exactly to a global optimization point.

Lemma 5. Let Assumptions 1, 2(a), 3, 4, 5, 7(b), 7(c), 8 hold. Let $\{x^i(k)\}$ be generated by update rule (6), and let $N(k) = \max_{i, h \in V} \|x^i(k) - x^h(k)\|$. Then

$\sum_{k=1}^{\infty} \alpha(k) N(k) < \infty.$

Proof. From (3) and (6) we have

$x^i(k+1) = \sum_{j=1}^{m} [\Phi(k, s)]_{ij} x^j(s) - \sum_{r=s+1}^{k} \sum_{j=1}^{m} [\Phi(k, r)]_{ij} \pi_j^{-1} \alpha(r-1) d^j(r-1) - \pi_i^{-1} \alpha(k) d^i(k),$

so that

$\|x^i(k) - x^h(k)\| = \Big\| \sum_{j=1}^{m} x^j(0) \big( [\Phi(k-1, 0)]_{ij} - [\Phi(k-1, 0)]_{hj} \big) - \sum_{s=1}^{k-1} \sum_{j=1}^{m} \big( [\Phi(k-1, s)]_{ij} - [\Phi(k-1, s)]_{hj} \big) \pi_j^{-1} \alpha(s-1) d^j(s-1) - \alpha(k-1) \big( \pi_i^{-1} d^i(k-1) - \pi_h^{-1} d^h(k-1) \big) \Big\|.$

By Lemma 3, $\{\pi_j^{-1}\}$ is bounded; let $\beta = \max\{\pi_j^{-1} \mid j \in V\}$. Then, as in the proof of Theorem 1,

$N(k) \le 4mCH \lambda^{k-1} + 4mC\beta L \sum_{s=1}^{k-1} \lambda^{k-s-1} \alpha(s-1) + 2\beta \alpha(k-1) L.$

Multiplying this relation by $\alpha(k)$ and using $2ab \le a^2 + b^2$,

$\alpha(k) N(k) \le 4mCH \lambda^{k-1} \alpha(k) + 4mC\beta L \sum_{s=1}^{k-1} \lambda^{k-s-1} \alpha(s-1) \alpha(k) + 2\beta L \alpha(k-1) \alpha(k)$
$\le 2mCH \lambda^{2(k-1)} + 2mCH \alpha^2(k) + 2mC\beta L \sum_{s=1}^{k-1} \lambda^{k-s-1} \alpha^2(s-1) + 2mC\beta L \alpha^2(k) \sum_{s=1}^{k-1} \lambda^{k-s-1} + \beta L \alpha^2(k-1) + \beta L \alpha^2(k).$

Summing this relation over $k \ge 1$, we obtain

$\sum_{k=1}^{\infty} \alpha(k) N(k) \le 2mCH \sum_{k=1}^{\infty} \lambda^{2(k-1)} + \big( 2mCH + 2mC\beta L/(1-\lambda) + \beta L \big) \sum_{k=1}^{\infty} \alpha^2(k) + 2mC\beta L \sum_{k=1}^{\infty} \sum_{s=1}^{k-1} \lambda^{k-s-1} \alpha^2(s-1) + \beta L \sum_{k=1}^{\infty} \alpha^2(k-1).$

Note that $0 < \lambda < 1$ and $\sum_{k \ge 1} \alpha^2(k) < \infty$, so the first, second and fourth terms on the right are summable; by Lemma 2(b), the third term is summable as well. Hence $\sum_{k=1}^{\infty} \alpha(k) N(k) < \infty$. ∎

Theorem 2. Let Assumptions 1, 2(a), 3, 4, 5, 7(b), 7(c), 8 hold, and let $\{x^i(k)\}$ be generated by the distributed optimization algorithm (6). Then there exists a point $x^* \in X^*$ satisfying $\lim_{k \to \infty} \|x^i(k) - x^*\| = 0$ for all $i \in V$.

Proof. Our proof is divided into two parts. First we show that $\{\|x^i(k) - z^*\|\}$ is convergent for any $z^* \in X^*$; then we show that $x^i(k)$ converges to $X^*$ as $k \to \infty$. From (6),

$\|x^i(k) - z^*\|^2 = \big\| v^i(k-1) - z^* - \pi_i^{-1} \alpha(k-1) d^i(k-1) \big\|^2,$  (7)

where $v^i(k-1) = \sum_{j=1}^{m} a_{ij}(k-1) x^j(k-1)$. By the convexity of the squared norm and Assumption 2(a),

$\|v^i(k-1) - z^*\|^2 = \Big\| \sum_{j=1}^{m} a_{ij}(k-1) \big( x^j(k-1) - z^* \big) \Big\|^2 \le \sum_{j=1}^{m} a_{ij}(k-1) \|x^j(k-1) - z^*\|^2.$

Let $M(k) = \max\{\|x^i(k) - z^*\|^2 \mid i \in V\}$. Expanding (7) and bounding the subgradient terms via Assumption 5, we obtain

$M(k) \le M(k-1) + \beta^2 L^2 \alpha^2(k-1),$

and hence, for any $r \le k-1$,

$M(k) \le M(r) + \beta^2 L^2 \sum_{s=r}^{k-1} \alpha^2(s).$
Letting $r = 0$ and $k \to \infty$, by Assumptions 5 and 7(c) we find that $\{M(k)\}$ is bounded. Letting $r, k \to \infty$, we obtain

$\limsup_{k \to \infty} M(k) \le \liminf_{r \to \infty} M(r).$

Because $\{M(k)\}$ is bounded, it follows that $\{M(k)\}$ is convergent; hence $\{\|x^i(k) - z^*\|\}$ is convergent. By (7), we have

$\|x^i(k) - z^*\|^2 \le \|v^i(k-1) - z^*\|^2 + \big\| \pi_i^{-1} \alpha(k-1) d^i(k-1) \big\|^2 - 2 \pi_i^{-1} \alpha(k-1) [d^i(k-1)]^T \big( v^i(k-1) - x^i(k-1) + x^i(k-1) - z^* \big).$

From (1), $[d^i(k-1)]^T (x^i(k-1) - z^*) \ge f_i(x^i(k-1)) - f_i(z^*)$. Together with $\|d^i(k-1)\| \le L$, this yields

$\|x^i(k) - z^*\|^2 \le \sum_{j=1}^{m} a_{ij}(k-1) \|x^j(k-1) - z^*\|^2 - \pi_i^{-1} \alpha(k-1) \big( 2 (f_i(x^i(k-1)) - f_i(z^*)) - 2L \|v^i(k-1) - x^i(k-1)\| - \pi_i^{-1} L^2 \alpha(k-1) \big).$  (8)

It is interesting to notice that (8) is quite similar to (2), so based on (3) we can easily obtain the relationship between $\|x^i(k) - z^*\|^2$ and $\|x^j(s) - z^*\|^2$ for any $0 \le s \le k-1$:

$\|x^i(k) - z^*\|^2 \le \sum_{j=1}^{m} [\Phi(k-1, s)]_{ij} \|x^j(s) - z^*\|^2 - \sum_{r=s+1}^{k-1} \sum_{j=1}^{m} [\Phi(k-1, r)]_{ij} \pi_j^{-1} \alpha(r-1) \big( 2 (f_j(x^j(r-1)) - f_j(z^*)) - 2L \|v^j(r-1) - x^j(r-1)\| - \pi_j^{-1} L^2 \alpha(r-1) \big) - \pi_i^{-1} \alpha(k-1) \big( 2 (f_i(x^i(k-1)) - f_i(z^*)) - 2L \|v^i(k-1) - x^i(k-1)\| - \pi_i^{-1} L^2 \alpha(k-1) \big).$

Take $s = 0$ and let $\bar{x}(k) = \frac{1}{m} \sum_{j=1}^{m} x^j(k)$. Writing $f_j(x^j(r-1)) - f_j(z^*) = (f_j(x^j(r-1)) - f_j(\bar{x}(r-1))) + (f_j(\bar{x}(r-1)) - f_j(z^*))$, noting that $f(x) = \sum_{j=1}^m f_j(x)$, $\|v^j(r-1) - x^j(r-1)\| \le N(r-1)$ and $\|x^j(r-1) - \bar{x}(r-1)\| \le N(r-1)$, and letting $k \to \infty$ (so that $[\Phi(k-1, r)]_{ij} \to \pi_j$ by Lemma 1 and $\phi_j(r) = \pi_j$), we obtain

$\lim_{k \to \infty} M(k) \le M(0) - 2 \sum_{r=1}^{\infty} \alpha(r-1) \big( f(\bar{x}(r-1)) - f(z^*) \big) + 2mL(1+\beta) \sum_{r=1}^{\infty} \alpha(r-1) N(r-1) + m \beta^2 L^2 \sum_{r=1}^{\infty} \alpha^2(r-1) + 2\beta L \lim_{k \to \infty} \alpha(k-1) N(k-1) + 2\beta^2 L^2 \lim_{k \to \infty} \alpha^2(k-1).$  (9)

Because $\|x^j(r-1) - \bar{x}(r-1)\| \le N(r-1)$ and $\{M(k)\}$, $\{N(k)\}$ are bounded, and considering Assumption 7(c) and Lemma 5, all the remaining terms on the right are finite. So

$\sum_{r=1}^{\infty} \alpha(r-1) \big( f(\bar{x}(r-1)) - f(z^*) \big) < \infty.$

Since $f(\bar{x}(r-1)) - f(z^*) \ge 0$, by Assumption 7(b) we obtain $\liminf_{r \to \infty} \big( f(\bar{x}(r-1)) - f(z^*) \big) = 0$.

Since $\{\bar{x}(k)\}$ is bounded, it must have a limit point $x^* \in X^*$, attained along a subsequence on which $f(\bar{x}) \to f^*$. Because $\lim_{k \to \infty} N(k) = 0$ and $\{\|x^i(k) - z^*\|\}$ is convergent, $\{\|\bar{x}(k) - z^*\|\}$ is convergent. Taking $z^* = x^*$, $\{\|\bar{x}(k) - x^*\|\}$ is convergent, and in view of (9) and the subsequence above its limit must be 0: $\lim_{k \to \infty} \|\bar{x}(k) - x^*\| = 0$. Hence $\lim_{k \to \infty} \|x^i(k) - x^*\| = 0$ for all $i \in V$. ∎

Theorem 2 shows that our algorithm exactly minimizes the global function.

Example. We consider a graph with 5 nodes whose estimates are scalars, with $f_i(x) = (x - c_i)^2$, $i = 1, 2, 3, 4, 5$, where each $c_i$ is an arbitrary scalar. It is easy to see that the optimal solution of the problem is $x^* = (\sum_{i=1}^{5} c_i)/5$. $\{x^i(k)\}$ is generated by update rule (6) with $\alpha(k) = 0.1/k$, and the weighted adjacency matrix $A$ is

$A = \begin{bmatrix} 0.1 & 0.2 & 0.4 & 0 & 0.3 \\ 0.3 & 0.4 & 0.1 & 0 & 0.2 \\ 0.3 & 0.2 & 0 & 0.5 & 0 \\ 0.3 & 0 & 0 & 0.2 & 0.5 \\ 0 & 0 & 0.3 & 0 & 0.7 \end{bmatrix},$

whose rows sum to 1 while its columns do not, so $A$ is stochastic but not doubly stochastic.
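The example can be reproduced with a short simulation. This is a hedged sketch: it assumes the reading of the (garbled) weight matrix given above and illustrative values for the $c_i$; the stationary distribution $\pi$ is computed by power iteration as required by rule (6).

```python
# Simulate update rule (6) for f_i(x) = (x - c_i)^2 with alpha(k) = 0.1/k.
A = [[0.1, 0.2, 0.4, 0.0, 0.3],
     [0.3, 0.4, 0.1, 0.0, 0.2],
     [0.3, 0.2, 0.0, 0.5, 0.0],
     [0.3, 0.0, 0.0, 0.2, 0.5],
     [0.0, 0.0, 0.3, 0.0, 0.7]]
c = [10.0, -20.0, 30.0, 5.0, -5.0]   # illustrative c_i
m = len(A)

# Stationary distribution pi of A (pi^T A = pi^T) by power iteration.
pi = [1.0 / m] * m
for _ in range(2000):
    pi = [sum(A[i][j] * pi[i] for i in range(m)) for j in range(m)]
    s = sum(pi)
    pi = [p / s for p in pi]

# Rule (6): x_i <- sum_j a_ij x_j - (alpha(k) / pi_i) * d_i, d_i = 2 (x_i - c_i).
x = c[:]
for k in range(1, 20001):
    alpha = 0.1 / k
    x = [sum(A[i][j] * x[j] for j in range(m)) - (alpha / pi[i]) * 2.0 * (x[i] - c[i])
         for i in range(m)]

x_star = sum(c) / m                        # global minimizer of f = sum_i f_i
print(x_star)                              # 4.0 for these c_i
print(max(abs(xi - x_star) for xi in x))   # small: every agent approaches x*
```

In contrast to plain rule (2), the $\pi_i^{-1}$ reweighting makes the agents converge to the unweighted optimum even though $A$ is not doubly stochastic.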
Figure 1 shows that our algorithm converges to an optimal point in $X^*$: the estimates of agents 1-5, plotted against $k$ from 0 to 20, all approach the optimal value.

Fig. 1: The trajectories of the estimates of agents generated by (6).

3.3 Global optimization results for switching topology

In this subsection we briefly provide the relationship between the convergence results and the limit vectors $\phi_j(s)$ for general directed graphs.

Lemma 6 [13]. Let Assumptions 1, 2(a), 3, 4 hold. Then (a) $\lim_{k \to \infty} [\Phi(k, s)]_{ij} = \phi_j(s)$, i.e., $\lim_{k \to \infty} \Phi(k, s) = \mathbf{1}\phi(s)^T$; (b) $\phi_j(s) > 0$.

Theorem 3. Let Assumptions 1, 2(a), 3, 4, 5 hold, and let $\{x^i(k)\}$ be generated by (2). If $\alpha_i(k) = \alpha(k)/\phi_i(k)$, where $\{\alpha(k)\}$ is a sequence satisfying Assumptions 7(b), 7(c), the system converges to a point in $X^*$.

Proof. From Lemma 6(b) we know that $\{1/\phi_j(s)\}$ is bounded. The rest of the proof is similar to that of Theorem 2 with $\pi_j$ replaced by $\phi_j(k)$, and is omitted due to space limitation.

4 Conclusions

In this paper we studied the distributed subgradient method for the case where the weighted adjacency matrices are not doubly stochastic. For fixed topology we proposed a new update rule that guarantees exact optimization results. Our conclusion for general directed graphs showed that exact results are hard to obtain when the topology switches arbitrarily. An interesting idea is to estimate the behavior of $\{\phi_j(k)\}$ so as to achieve the optimization approximately, which we leave for future work.

References

[1] A. Nedić, A. Ozdaglar, Convergence rate for consensus with delays, Journal of Global Optimization, 47(3): 437-456, 2010.
[2] A. Nedić, A. Ozdaglar, Distributed Subgradient Methods for Multi-Agent Optimization, IEEE Transactions on Automatic Control, 54(1): 48-61, 2009.
[3] A. Nedić, A. Ozdaglar, P. A. Parrilo, Constrained Consensus and Optimization in Multi-Agent Networks, IEEE Transactions on Automatic Control, 55(4): 922-938, 2010.
[4] F. Yan, S. Sundaram, S. V. N. Vishwanathan, Y. Qi, Distributed Autonomous Online Learning: Regrets and Intrinsic Privacy-Preserving Properties, IEEE Transactions on Knowledge and Data Engineering, 25(11): 2483-2493, 2013.
[5] G. G. Rigatos, Distributed gradient and particle swarm optimization for multi-robot motion planning, Robotica, 26(3): 357-370, 2008.
[6] I. Lobel, A. Ozdaglar, Distributed Subgradient Methods for Convex Optimization Over Random Networks, IEEE Transactions on Automatic Control, 56(6): 1291-1306, 2011.
[7] J. N. Tsitsiklis, Problems in Decentralized Decision Making and Computation, Ph.D. dissertation, MIT, Cambridge, MA, 1984.
[8] K. I. Tsianos, M. G. Rabbat, Distributed consensus and optimization under communication delays, 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton, pp. 974-982, 2011.
[9] L. Jiang, D. Shah, J. Shin, J. Walrand, Distributed Random Access Algorithm: Scheduling and Congestion Control, IEEE Transactions on Information Theory, 56(12): 6182-6207, 2010.
[10] M. Zhu, S. Martinez, An Approximate Dual Subgradient Algorithm for Multi-Agent Non-Convex Optimization, IEEE Transactions on Automatic Control, 58(6): 1534-1539, 2013.
[11] M. Zhu, S. Martinez, On Distributed Convex Optimization Under Inequality and Equality Constraints, IEEE Transactions on Automatic Control, 57(1): 151-164, 2012.
[12] N. Li, J. R. Marden, Designing Games for Distributed Optimization, IEEE Journal of Selected Topics in Signal Processing, 7(2): 230-242, 2013.
[13] P. Lin, W. Ren, Distributed subgradient projection algorithm for multi-agent optimization with nonidentical constraints and switching topologies, Proceedings of the IEEE Conference on Decision and Control, pp. 6813-6818, 2012.
[14] Q. Ling, Z. Wen, W. Yin, Decentralized jointly sparse optimization by reweighted ℓq minimization, IEEE Transactions on Signal Processing, 61(5): 1165-1170, 2013.
[15] S. S. Ram, A. Nedić, V. V. Veeravalli, Stochastic Incremental Gradient Descent for Estimation in Sensor Networks, Conference Record of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, pp. 582-586, 2007.
[16] Y. Lou, G. Shi, K. H. Johansson, Y. Hong, Approximate Projected Consensus for Convex Intersection Computation: Convergence Analysis and Critical Error Angle, IEEE Transactions on Automatic Control, 59(7): 1722-1736, 2014.