Distributed continuous-time algorithm for nonsmooth optimal consensus without sharing local decision variables

Journal Pre-proof

Please cite this article as: Shu Liang, Le Yi Wang, George Yin, Distributed Continuous-time Algorithm for Nonsmooth Optimal Consensus without Sharing Local Decision Variables, Journal of the Franklin Institute (2019), doi: https://doi.org/10.1016/j.jfranklin.2019.12.028

Received 27 April 2019; revised 17 December 2019; accepted 25 December 2019. © 2019 Published by Elsevier Ltd on behalf of The Franklin Institute.

Distributed Continuous-time Algorithm for Nonsmooth Optimal Consensus without Sharing Local Decision Variables✩

Shu Liang a,b,∗, Le Yi Wang c, George Yin d

a Key Laboratory of Knowledge Automation for Industrial Processes of Ministry of Education, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, PR China.
b Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083, PR China.
c Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA.
d Department of Mathematics, Wayne State University, Detroit, MI 48202, USA.

Abstract

A distributed continuous-time algorithm is proposed for constrained nonsmooth convex optimization. A distinct feature of our algorithm is that it does not require agents to share their local decision variables with the network, yet it still achieves the optimal solution. With the help of Lagrangian functions, exact penalty techniques, differential inclusions with maximal monotone maps, and saddle-point dynamics, we prove the convergence of the proposed algorithm and show that it achieves an O(1/t) convergence rate. A numerical example also illustrates the effectiveness of the proposed method.

Keywords: distributed optimization, nonsmooth convex optimization, continuous-time algorithm, saddle-point dynamics

✩ This research was supported in part by the U.S. Army Research Office under grant W911NF-19-1-0176, and in part by the National Natural Science Foundation of China under grants 61873024 and 61903027. ∗ Corresponding author Email addresses: [email protected] (Shu Liang), [email protected] (Le Yi Wang), [email protected] (George Yin)

Preprint submitted to Journal of the Franklin Institute, January 3, 2020.

1. Introduction

Control and optimization of large-scale networks and multi-agent systems have been actively studied in engineering and applied mathematics. The design of distributed continuous-time algorithms has become increasingly popular and blurs the boundaries between the physical plant, information, and computation. Many distributed continuous-time algorithms have been developed for optimal consensus and optimization problems, with various considerations such as intersections of convex sets [1, 2], weight-balanced digraphs [3, 4], second-order agents [5, 6], and approximate projection with nonsmooth functions [7, 8].

Security and privacy of distributed algorithms have received increasing attention in recent years, partly because of threats from adversarial or malevolent machines and the potential leakage of sensitive data. In general, security and privacy have broad aspects spanning social science, economics, and engineering technologies; see [9, 10]. Various methods originally developed for centralized algorithms have been adapted to address privacy concerns in distributed algorithms. For example, differential privacy employs a randomized perturbation to trade off the accuracy of the solution against the guaranteed level of privacy preservation; differential privacy-based methods have recently been introduced into distributed optimization algorithms, see [11, 12] and the references therein. Another example is the encryption-based technique for secure multiparty computation, which can be adopted by a network system operator for distributed algorithms [13]. On the other hand, compared with centralized algorithms, a distinct feature of distributed algorithms is that each local agent has insufficient data for the whole problem and must resort to information/data sharing to cooperatively accomplish the task.
In this regard, privacy preservation for distributed algorithms is more challenging than for centralized ones, and different distributed designs may result in different levels of privacy preservation. For example, a distributed subgradient algorithm with asynchronous heterogeneous stepsizes is developed in [14] to improve the privacy preservation of the original distributed algorithm with synchronous homogeneous stepsizes. Moreover, a gossip algorithm design is investigated in [15] to protect node privacy against external eavesdroppers with full information flow.

In this paper, we design a distributed continuous-time algorithm for the constrained minimization of a sum of nonsmooth convex functions whose knowledge is scattered over a multi-agent network. Although this problem has been solved in the aforementioned works such as [8], we take the analysis a step further. Specifically, the distributed algorithm in [8] requires the agents to share their local primal and dual variables, which may lead to the leakage of private data or information; see also [14, 15, 16]. In contrast, we present a novel distributed design that does not share those private variables and still achieves the optimal solution. The main contributions are as follows. First, we transform the optimal consensus problem into an equivalent new optimization model by using an exact penalty technique and an auxiliary variable approach. Second, we present a distributed primal-dual gradient algorithm that solves the optimization problem while keeping the local decision variables private. Third, we prove the ergodic-type convergence of our algorithm by considering saddle-point dynamics and using the theory of differential inclusions with maximal monotone maps.

The rest of the paper is organized as follows. Section 2 provides necessary preliminaries. Section 3 formulates the problem. Section 4 introduces the algorithm design, while Section 5 presents the convergence analysis. Section 6 gives simulation results. Finally, Section 7 concludes the paper with some further remarks.

Notations: Denote by R the set of real numbers, and by ⟨·, ·⟩ and ‖·‖ the inner product and Euclidean norm in R^n, respectively. Denote by |·| the ℓ1 norm in R^n and by ⊗ the Kronecker product. I_n is the identity matrix in R^{n×n}, and col(x_1, ..., x_N) is the column vector obtained by stacking the vectors x_1, ..., x_N.

2. Preliminaries

In this section, we give some preliminary results on convex analysis, differential inclusions, and graph theory.


2.1. Convex analysis

A set C ⊂ R^n is convex if λz_1 + (1 − λ)z_2 ∈ C for any z_1, z_2 ∈ C and λ ∈ [0, 1]. For x ∈ C, the tangent cone to C at x, denoted by T_C(x), is defined as

T_C(x) ≜ { v ∈ R^n | ∃ x_k ∈ C, t_k > 0, x_k → x, t_k → 0 such that v = lim_{k→∞} (x_k − x)/t_k },

while the normal cone to C at x, denoted by N_C(x), is defined as

N_C(x) ≜ { w ∈ R^n | ⟨w, y − x⟩ ≤ 0, ∀ y ∈ C }.

A projection operator is defined as

P_C[z] ≜ argmin_{x∈C} ‖x − z‖,

while the distance function with respect to C, denoted by d_C(·), is defined as

d_C(z) ≜ min_{x∈C} ‖z − x‖ = ‖z − P_C[z]‖.

A function f : C → R is said to be convex if f(λz_1 + (1 − λ)z_2) ≤ λ f(z_1) + (1 − λ) f(z_2) for any z_1, z_2 ∈ C and λ ∈ [0, 1]. f is said to be proper if f(x) < ∞ for at least one point x ∈ C and f(x) > −∞ for all x ∈ C. The subdifferential of f at x, denoted by ∂f(x), is the set

∂f(x) ≜ { g ∈ R^n | ⟨g, x′ − x⟩ ≤ f(x′) − f(x), ∀ x′ ∈ C }.

In particular, the set-valued sign function Sgn(·) is the subdifferential of the ℓ1 norm, with each component given by

Sgn(y) = ∂|y| = {1} if y > 0, {−1} if y < 0, and [−1, 1] if y = 0.
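For a concrete instance, all of these objects have closed forms when C is a box; the following sketch (the box set, the tolerance, and the tie-breaking choice 0 ∈ Sgn(0) are our own illustrative assumptions, not from the paper) computes an element of Sgn(·), the projection P_C[·], and the distance d_C(·):

```python
import numpy as np

def sgn_element(y, tol=1e-12):
    """One element of the set-valued sign Sgn(y) = d|y| (componentwise).
    Any value in [-1, 1] is admissible where y == 0; we pick 0."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) <= tol, 0.0, np.sign(y))

def project_box(z, lo, hi):
    """Projection P_C[z] onto the box C = [lo, hi]^n, a closed convex set."""
    return np.clip(z, lo, hi)

def dist_box(z, lo, hi):
    """Distance d_C(z) = ||z - P_C[z]|| for the same box."""
    z = np.asarray(z, dtype=float)
    return np.linalg.norm(z - project_box(z, lo, hi))
```

For general closed convex sets the projection has no closed form and must itself be computed by optimization; the box case suffices for the simulation setting used later.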

For more details on convex analysis, the reader is referred to [17].

2.2. Differential inclusion

A set-valued map F from R^n to R^n associates with any x ∈ R^n a subset F(x) of R^n. The graph of F, denoted by gph F, is defined as

gph F ≜ { (x, y) ∈ R^n × R^n | y ∈ F(x) }.

The set-valued map F is said to be monotone if

⟨x_1 − x_2, y_1 − y_2⟩ ≥ 0, ∀ (x_i, y_i) ∈ gph F, i = 1, 2.

Moreover, F is said to be maximal monotone if there is no other monotone set-valued map F̃ whose graph strictly contains gph F.

A differential inclusion can be expressed as follows:

ẋ ∈ F(x), x(0) = x_0, (1)

where F is a set-valued map. A trajectory x(t) : [0, +∞) → R^n is said to be a solution to (1) if it is absolutely continuous and satisfies the inclusion for almost all t ∈ [0, +∞). In addition, a point x∗ is said to be an equilibrium of (1) if 0 ∈ F(x∗).

More details on set-valued maps and differential inclusions can be found in [18].

2.3. Graph theory

A graph of a network is denoted by G = (V, E), where V = {1, ..., N} is the set of nodes and E ⊂ V × V is the set of edges. Node j is said to be a neighbor of node i if (i, j) ∈ E. The set of all neighbors of node i is denoted by N_i. The adjacency matrix A = [a_ij] is an N × N matrix such that a_ij = a_ji > 0 if (i, j) ∈ E, j ≠ i, and a_ij = 0 otherwise. The Laplacian matrix is L = D − A, where D is the N × N diagonal matrix with D_ii = Σ_{j=1, j≠i}^N a_ij, i ∈ {1, ..., N}. Graph G is said to be undirected if (i, j) ∈ E ⇒ (j, i) ∈ E. A path of G is a sequence of distinct nodes in which any pair of consecutive nodes forms an edge of G. Node j is said to be connected to node i if there is a path from j to i, and G is said to be connected if any two nodes are connected. We refer the reader to [19] for further discussion of graph theory.
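The Laplacian construction above is straightforward to code; a minimal sketch for a symmetric adjacency matrix (the triangle graph below is an arbitrary example of ours, not the network used in the simulations):

```python
import numpy as np

def laplacian(adjacency):
    """Graph Laplacian L = D - A, where D is the diagonal degree matrix."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))
    return D - A

# For a connected undirected graph, L is symmetric positive semidefinite,
# its rows sum to zero, and 0 is a simple eigenvalue (eigenvector of ones).
L = laplacian([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
```

The simplicity of the zero eigenvalue is exactly the algebraic form of connectedness used below, where the consensus constraint is rewritten through L.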

3. Problem formulation

Consider the following optimal consensus problem:

min_{x_1,...,x_N} Σ_{i=1}^N f_i(x_i), s.t. x_1 = ··· = x_N ∈ Ω, (2)

where the feasible constraint set Ω ⊂ R^n is closed and convex. For each i ∈ V = {1, 2, ..., N}, f_i : Ω → R is convex. Moreover, there is a constant κ_0 > 0 such that

|f_i(x) − f_i(y)| ≤ κ_0 ‖x − y‖, ∀ x, y ∈ Ω, ∀ i ∈ V,

i.e., each f_i is κ_0-Lipschitz continuous. A multi-agent system is to solve this problem in a distributed manner over a communication graph G. That is, for each i ∈ V, the ith agent updates its local variable x_i ∈ Ω to reach the optimal solution using local data and communication. The following basic assumptions will be used.

Assumption 3.1.
1. The optimization problem (2) has at least one finite solution.
2. The graph G is undirected and connected.

Let x = col(x_1, ..., x_N) and Ω = Ω × ··· × Ω ⊂ R^{nN}. Since G is connected, the constraint x_1 = ··· = x_N is equivalent to (L ⊗ I_n) x = 0, where L is the Laplacian matrix of G. By employing a dual variable λ = col(λ_1, ..., λ_N) (also called a Lagrange multiplier), the first-order optimality condition can be written as

0 ∈ col(∂f_1(x_1), ..., ∂f_N(x_N)) + (L ⊗ I_n) λ + N_Ω(x),
0 = (L ⊗ I_n) x.

It can be observed that although problem (2) involves only primal variables, the first-order optimality condition, involving both primal and dual variables, has a decomposed structure, which is preferable for distributed algorithm design. Therefore, many distributed algorithms for problem (2) use both primal and

dual variables, i.e., x and λ. For example, the authors in [8] introduce the following distributed algorithm:

ẋ_i ∈ P_{T_Ω(x_i)}[ −∂f_i(x_i) − Σ_{j∈N_i} (x_i − x_j) − Σ_{j∈N_i} (λ_i − λ_j) ],
λ̇_i = Σ_{j∈N_i} (x_i − x_j). (3)

In this algorithm, x_i, λ_i, i ∈ V, are the primal and dual variables with respect to the Lagrangian function

L(x, λ) ≜ Σ_{i=1}^N [ f_i(x_i) + ⟨λ_i, Σ_{j∈N_i} (x_i − x_j)⟩ ]

for the optimization problem

min_{x_1∈Ω,...,x_N∈Ω} Σ_{i=1}^N f_i(x_i), s.t. x_1 = ··· = x_N, (4)

which is provably equivalent to (2). It is shown in [8] that the algorithm converges to a saddle point of L and therefore solves the original optimization problem. Note that algorithm (3) requires agents to share their local decision variables with the network, which may lead to the leakage of private information. In this paper, we are devoted to a novel distributed design that satisfies both of the following features.

F1 The algorithm should be distributed and solve the optimal consensus problem (2), the same goal as in [8].

F2 The algorithm should keep the local decision variables and subgradients private. That is, for any i ∈ V, the ith agent should not share x_i with the network.

4. Algorithm design

Our algorithm for problem (2) is given as follows.


Algorithm 1

Initialization: For each i ∈ V, the ith agent keeps and initializes variables x_i, λ_i, u_i with

x_i(0) ∈ Ω, λ_i(0) ∈ R^n, u_i(0) ∈ R^n. (5)

Update flows: For each i ∈ V,

ẋ_i ∈ P_{T_Ω(x_i)}[ −∂f_i(x_i) − λ_i ],
λ̇_i = x_i − u_i,
u̇_i ∈ −K Σ_{j∈N_i} Sgn(u_i − u_j) + λ_i, (6)

where K > N κ_0 is a constant parameter.

Output: The time average x̄_i(t) ≜ (1/t) ∫_0^t x_i(τ) dτ, which approaches an optimal solution as t → ∞.
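To see the flows in action, one can integrate (6) with a crude forward-Euler scheme on a toy instance. Everything below (the line graph, the objectives f_i(x) = |x − y_i|, the gain K = 4 > Nκ0 = 3, the step size, and the horizon) is an illustrative assumption of ours rather than the paper's setup, and the discontinuous sign terms will chatter at the discretization scale:

```python
import numpy as np

# Toy instance: n = 1, N = 3 agents on a line graph 0-1-2, Omega = [-1, 1],
# f_i(x) = |x - y_i| (1-Lipschitz, so kappa0 = 1); the optimal consensus
# point is the median of y, namely x* = 0.
h, T = 5e-3, 100.0
y = np.array([-0.5, 0.0, 0.5])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
K, lo, hi = 4.0, -1.0, 1.0

x = np.array([0.8, -0.6, 0.3])
lam = np.zeros(3)
u = np.zeros(3)
xbar_sum = np.zeros(3)

def proj_tangent(xi, v):
    """Projection onto the tangent cone of [lo, hi] at xi (scalar case)."""
    if xi <= lo and v < 0.0:
        return 0.0
    if xi >= hi and v > 0.0:
        return 0.0
    return v

steps = int(T / h)
for _ in range(steps):
    g = np.sign(x - y)                        # a subgradient of each f_i
    xdot = np.array([proj_tangent(x[i], -g[i] - lam[i]) for i in range(3)])
    udot = np.array([-K * sum(np.sign(u[i] - u[j]) for j in neighbors[i]) + lam[i]
                     for i in range(3)])
    lam = lam + h * (x - u)
    x = np.clip(x + h * xdot, lo, hi)         # guard Omega against Euler overshoot
    u = u + h * udot
    xbar_sum += x

xbar = xbar_sum / steps                       # the algorithm's output (time average)
```

Note that each agent's x update uses only its own data, while only u_j values cross the network, matching features F1 and F2. As the step size decreases, the Euler trajectory better approximates the differential inclusion, whose time average converges to x* = 0 by the results of Section 5.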

Remark 4.1. Algorithm 1 satisfies features F1 and F2. For each i ∈ V, the ith agent updates x_i, λ_i according to local information and updates u_i by using the information of u_j from its neighbors only. The ith agent does not need to share its local decision variable x_i.

To show the mechanism of Algorithm 1, we present the main idea of our design and postpone the rigorous analysis and proofs to the next section. First, we transform problem (4) into the following equivalent one:

min_{x∈Ω} f(x) + K g(x), (7)

where K > N κ_0 and

f(x) ≜ Σ_{i=1}^N f_i(x_i), g(x) ≜ (1/2) Σ_{(i,j)∈E} |x_i − x_j|.

Next, by using auxiliary variables u = col(u_1, ..., u_N), we further transform problem (7) into

min_{x∈Ω, u∈R^{nN}} f(x) + K g(u), s.t. x = u. (8)

We recall that x = col(x_1, ..., x_N) and Ω = Ω × ··· × Ω ⊂ R^{nN}. Then we associate problem (8) with the Lagrangian function

L̃(x, u, λ) ≜ f(x) + K g(u) + ⟨λ, x − u⟩ + δ_Ω(x), (9)

where the indicator function δ_Ω(·) is defined as δ_Ω(x) = 0 if x ∈ Ω and δ_Ω(x) = ∞ if x ∉ Ω.

Finally, Algorithm 1 is designed as a saddle-point dynamics with respect to

L̃.

5. Convergence analysis

In this section, we verify the correctness of our algorithm and give the convergence analysis. We first present two lemmas that will be used in our analysis. Lemma 5.1 transforms a constrained optimization problem into an unconstrained one by using the distance function as an exact penalty; see [20, page 50]. Lemma 5.2 gives the existence and uniqueness of the solution to a differential inclusion and establishes its ergodic-type convergence; see [18, pages 147–156].

Lemma 5.1 (Exact penalty). Let f : R^n → R be a κ_0-Lipschitz continuous function and Ω be a closed convex set. Then, for any κ > κ_0,

x∗ ∈ argmin_{x∈Ω} f(x) ⟺ x∗ ∈ argmin_{x∈R^n} { f(x) + κ d_Ω(x) }.
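A one-dimensional numerical check of the lemma (the instance f(x) = |x − 2|, Ω = [−1, 1] is our own illustrative choice): with κ0 = 1, any κ > 1 makes the penalty exact, while κ ≤ κ0 generally does not:

```python
import numpy as np

# f(x) = |x - 2| is 1-Lipschitz and its minimizer over Omega = [-1, 1]
# is x* = 1.  Compare the penalized minimizers on a fine grid.
xs = np.linspace(-4.0, 4.0, 8001)             # grid step 1e-3, contains 1.0 and 2.0
f = np.abs(xs - 2.0)
d_omega = np.maximum(0.0, np.abs(xs) - 1.0)   # distance to the interval [-1, 1]

x_exact = xs[np.argmin(f + 2.0 * d_omega)]    # kappa = 2 > kappa0: exact penalty
x_weak = xs[np.argmin(f + 0.5 * d_omega)]     # kappa = 0.5 <= kappa0: not exact
```

With the too-small weight, the unconstrained minimizer drifts to x = 2 ∉ Ω; this is exactly why Algorithm 1 needs the strict gain condition K > Nκ0.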

Remark 5.1. The exact penalty technique has been used in designing robust average consensus and optimization; see, e.g., [21, 22].

Lemma 5.2 (Maximal monotone inclusion). Consider the following differential inclusion with a maximal monotone set-valued map A:

ẋ ∈ −A(x), x(0) = x_0.

Then the following two statements hold.

1. There exists a unique solution x(·) defined on [0, +∞), which is also the solution of

ẋ(t) = −m(A(x(t))), for almost all t ∈ [0, +∞),

where m(·) is the minimal selection operator that selects the element of least norm from a set.

2. Moreover, the time average of x(t), denoted by x̄(t) ≜ (1/t) ∫_0^t x(τ) dτ, converges to an equilibrium point x∗, that is, lim_{t→∞} x̄(t) = x∗.
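Statement 1 is constructive: the slow solution follows the least-norm element of A. A scalar sketch for A = Sgn (so ẋ ∈ −Sgn(x), whose slow solution reaches the equilibrium x* = 0 in finite time |x(0)|; the step size and the snap-to-zero rule are discretization choices of ours):

```python
import numpy as np

def m_sgn(x, tol=1e-12):
    """Least-norm element of Sgn(x): sign(x) away from zero, 0 at zero."""
    return 0.0 if abs(x) <= tol else float(np.sign(x))

h, x = 1e-3, 1.0
for _ in range(2000):                  # horizon 2.0 > |x(0)| = 1.0
    if abs(x) <= h:                    # within one step of the equilibrium:
        x = 0.0                        # snap there instead of overshooting
    else:
        x = x - h * m_sgn(x)           # Euler step along the minimal selection
```

Once the trajectory hits zero, m(Sgn(0)) = 0 keeps it there, mirroring how the slow solution of the inclusion stays at an equilibrium.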

The following theorem establishes the equivalence of the original optimization problem and the transformed ones.

Theorem 5.1. Under Assumption 3.1, the optimization problems (2), (4), (7), and (8) are equivalent to each other in the sense that their optimal solutions coincide.

Proof. The equivalence of problems (2) and (4) is straightforward, and so is the equivalence of problems (7) and (8). Thus, it suffices to prove the equivalence of problems (4) and (7). First, we use the exact penalty result in Lemma 5.1. Since

f(x′) − f(x) = Σ_{i=1}^N (f_i(x′_i) − f_i(x_i)) ≤ κ_0 Σ_{i=1}^N ‖x′_i − x_i‖ ≤ √N κ_0 ‖x′ − x‖,

the function f is √N κ_0-Lipschitz continuous. Define

S ≜ { x ∈ R^{nN} | x_1 = ··· = x_N ∈ Ω }.

Then, by Lemma 5.1, for any κ > √N κ_0,

argmin_{x∈S} f(x) = argmin_{x∈R^{nN}} { f(x) + κ d_S(x) }.

Second, we compare the functions g(x) and d_S(x). Clearly,

g(x) = d_S(x) = 0, ∀ x ∈ S.

Moreover, for x ∈ Ω,

d_S^2(x) = min_{x′∈S} ‖x′ − x‖^2 ≤ Σ_{i=1}^N ‖x_i − (x_1 + ··· + x_N)/N‖^2 ≤ (1/N) Σ_{i=1}^N Σ_{j=1}^N ‖x_i − x_j‖^2 ≤ (1/N) Σ_{i=1}^N Σ_{j=1}^N |x_i − x_j|^2.

Since the graph G is connected and undirected, there is a path P_ij ⊂ E connecting nodes i and j for any i, j ∈ V. Then

g(x) = (1/2) Σ_{(k,l)∈E} |x_k − x_l| ≥ (1/2) Σ_{(k,l)∈P_ij} |x_k − x_l| ≥ |x_i − x_j|,

which leads to

d_S^2(x) ≤ N g^2(x), ∀ x ∈ Ω.

Therefore, since K > N κ_0 implies K/√N > √N κ_0, the function √N g(x) + δ_Ω(x) can replace d_S(x) as an alternative exact penalty function. In other words,

argmin_{x∈S} f(x) = argmin_{x∈Ω} { f(x) + K g(x) }.

This completes the proof. □

Remark 5.2. The novelty of Theorem 5.1 is that it provides (8) as an equivalent transformation of problem (2), which enables a distributed design with features F1 and F2. This transformation is accomplished by the exact penalty technique together with the auxiliary variable approach. Note that the exact penalty technique has been used in [22] for distributed optimization with coupled inequality constraints; the penalty term there is applied to the dual variable, whereas we apply the penalty in the domain of the primal variable.

Recall that L̃ in (9) is the Lagrangian function associated with the equivalent problem (8). Let us define a variable ξ ≜ (x, u, λ)

and a set-valued map

F(ξ) ≜ col( ∂_x L̃(x, u, λ), ∂_u L̃(x, u, λ), ∂_λ(−L̃)(x, u, λ) ). (10)

Then, by combining Theorem 5.1 with the well-known Karush–Kuhn–Tucker (KKT) conditions and the saddle-point conditions in terms of the Lagrangian function, we immediately obtain the following corollary.

Corollary 5.1. Let x∗ be a point in Ω and ξ∗ be a point whose first component is x∗ = (x∗, ..., x∗) ∈ S. Then the following three statements are equivalent.

1. (Optimal solution) x∗ is an optimal solution to problem (2).

2. (First-order condition) 0 ∈ F(ξ∗).

3. (Saddle-point condition) ξ∗ = (x∗, u∗, λ∗) is a saddle point of L̃, that is, for any ξ = (x, u, λ),

L̃(x∗, u∗, λ) ≤ L̃(x∗, u∗, λ∗) ≤ L̃(x, u, λ∗).

Our next result establishes the correctness of Algorithm 1.

Theorem 5.2. Under Assumption 3.1, any equilibrium point ξ∗ of the dynamics (6) renders 0 ∈ F(ξ∗).

Proof. Direct calculation yields

∂_x L̃(x, u, λ) = ∂f(x) + λ + N_Ω(x),
∂_u L̃(x, u, λ) = K ∂g(u) − λ,
∂_λ(−L̃)(x, u, λ) = −(x − u),

with K ∂_{u_i} g(u) = K Σ_{j∈N_i} Sgn(u_i − u_j). Therefore, dynamics (6) can be rewritten in the compact form

ξ̇ ∈ −Ψ(ξ), (11)

where

Ψ(ξ) ≜ col( −P_{T_Ω(x)}[ −∂f(x) − λ ], ∂_u L̃(x, u, λ), ∂_λ(−L̃)(x, u, λ) ).

To complete the task, we prove the stronger result that

m(F(ξ)) ∈ Ψ(ξ) ⊂ F(ξ), (12)

where m(·) is the minimal selection operator. Clearly, if (12) holds, then any equilibrium ξ∗ (i.e., 0 ∈ Ψ(ξ∗)) also renders 0 ∈ F(ξ∗). Recall the orthogonal decomposition with respect to the tangent cone T_Ω(x) and the normal cone N_Ω(x):

−z = P_{T_Ω(x)}[−z] + P_{N_Ω(x)}[−z], ∀ z ∈ R^{nN}.

This equality implies −P_{T_Ω(x)}[−z] = z + P_{N_Ω(x)}[−z] ∈ z + N_Ω(x), and

m(z + N_Ω(x)) = argmin_{y ∈ z+N_Ω(x)} ‖y‖ = z + P_{N_Ω(x)}[−z] = −P_{T_Ω(x)}[−z].
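This tangent–normal decomposition can also be checked numerically when Ω is a box, where both cone projections act componentwise (the box and the test point below are hypothetical):

```python
import numpy as np

lo, hi = -1.0, 1.0
x = np.array([-1.0, 0.2, 1.0])     # active at the lower and upper bounds
v = np.array([-0.7, 0.3, 2.0])     # plays the role of -z

def proj_tangent(x, v):
    """P onto T_Omega(x) for the box: zero out infeasible boundary directions."""
    out = np.array(v, dtype=float)
    out[(x <= lo) & (out < 0)] = 0.0
    out[(x >= hi) & (out > 0)] = 0.0
    return out

def proj_normal(x, v):
    """P onto N_Omega(x): keep only outward components at active bounds."""
    out = np.zeros_like(v, dtype=float)
    at_lo, at_hi = x <= lo, x >= hi
    out[at_lo] = np.minimum(v[at_lo], 0.0)
    out[at_hi] = np.maximum(v[at_hi], 0.0)
    return out

pT, pN = proj_tangent(x, v), proj_normal(x, v)
```

The two pieces sum back to v and are orthogonal, which is the content of the decomposition used in the proof.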

Thus, (12) holds. This completes the proof. The following theorem gives the convergence of our algorithm.

Theorem 5.3. Under Assumption 3.1, dynamics (6) yields a unique trajectory. Moreover, the output x̄_i(t) converges to an optimal solution x∗ of (2), i.e.,

lim_{t→∞} x̄_i(t) = x∗, ∀ i ∈ V.

Proof. We divide the proof into three steps.

Step 1: We prove that the set-valued map F(ξ) defined in (10) is maximal monotone. Since

F(ξ) = ∂_{(x,u)} L̃((x, u), λ) × ∂_λ(−L̃)((x, u), λ),

it follows from [17, page 548] that F(ξ) is maximal monotone if there exists a proper, convex function ψ((x, u), ω) such that

L̃((x, u), λ) = inf_{ω∈R^{nN}} { ψ((x, u), ω) − ⟨λ, ω⟩ }.

These requirements are fulfilled by choosing

ψ((x, u), ω) = f(x) + K g(u) + δ_Ω(x) + δ_{{−(x−u)}}(ω).

Thus, the maximal monotonicity of F holds.

Step 2: We prove the existence and uniqueness of the solution to the dynamics (6). Since F is maximal monotone, it follows from Lemma 5.2 that there exists a unique solution to the differential inclusion

ξ̇ ∈ −F(ξ), (13)

which also satisfies

ξ̇(t) = −m(F(ξ(t))), for almost all t ∈ [0, +∞). (14)

Recall from (11) and (12) that the compact form of dynamics (6) is ξ̇ ∈ −Ψ(ξ), where m(F(ξ)) ∈ Ψ(ξ) ⊂ F(ξ). Then any solution to (14) must be a solution to (6), and any solution to (6) must be a solution to (13). This verifies the existence and uniqueness of the solution to the dynamics (6), which coincides with the solution to (13).

Step 3: We prove the convergence of the solution. Since the solution to (6) is the same as the solution to (13), we can directly analyze (13). Again, it follows from Lemma 5.2 that

lim_{t→∞} (1/t) ∫_0^t ξ(τ) dτ = ξ∗,

where 0 ∈ F(ξ∗). By Corollary 5.1,

lim_{t→+∞} x̄(t) = x∗ = (x∗, ..., x∗),

where x∗ is an optimal solution. This completes the proof. □

Finally, we discuss the convergence rate. Recall the time-average notation

(x̄(t), ū(t), λ̄(t)) ≜ (1/t) ∫_0^t (x(τ), u(τ), λ(τ)) dτ,

where (x(·), u(·), λ(·)) is the trajectory of (6).

Theorem 5.4. Under Assumption 3.1, there exists a constant M_0 > 0 such that

|L̃(x̄(t), ū(t), λ̄(t)) − L̃(x∗, u∗, λ∗)| ≤ M_0 / t, ∀ t > 0,

where L̃ is defined in (9).

Proof. Since L̃(x, u, λ) is convex in (x, u) and linear in λ, it follows from Jensen's inequality that, for any x⁰ ∈ Ω, u⁰ ∈ R^{nN}, λ⁰ ∈ R^{nN},

(1/t) ∫_0^t L̃(x(τ), u(τ), λ⁰) dτ ≥ L̃(x̄(t), ū(t), λ⁰), (15a)

and

(1/t) ∫_0^t L̃(x⁰, u⁰, λ(τ)) dτ = L̃(x⁰, u⁰, λ̄(t)). (15b)

Moreover, it follows from (13) that, for almost all τ > 0,

d/dτ [ (1/2) ‖(x(τ), u(τ)) − (x⁰, u⁰)‖² ] = ⟨ (x(τ), u(τ)) − (x⁰, u⁰), −ζ ⟩, ζ ∈ ∂_{(x,u)} L̃(x(τ), u(τ), λ(τ)),
≤ L̃(x⁰, u⁰, λ(τ)) − L̃(x(τ), u(τ), λ(τ)), (16a)

and similarly,

d/dτ [ (1/2) ‖λ(τ) − λ⁰‖² ] = L̃(x(τ), u(τ), λ(τ)) − L̃(x(τ), u(τ), λ⁰). (16b)

Taking the time average of (16) over the interval [0, t] and using the relaxation (15a) and the equality (15b), we obtain

(1/t) ∫_0^t L̃(x(τ), u(τ), λ(τ)) dτ − L̃(x⁰, u⁰, λ̄(t)) ≤ ‖(x(0), u(0)) − (x⁰, u⁰)‖² / (2t), (17a)

and

L̃(x̄(t), ū(t), λ⁰) − (1/t) ∫_0^t L̃(x(τ), u(τ), λ(τ)) dτ ≤ ‖λ(0) − λ⁰‖² / (2t). (17b)

Substituting (x̄(t), ū(t), λ̄(t)) for (x⁰, u⁰, λ⁰) in (17) yields

| (1/t) ∫_0^t L̃(x(τ), u(τ), λ(τ)) dτ − L̃(x̄(t), ū(t), λ̄(t)) | ≤ M_0 / (2t), (18)

where

M_0 ≜ sup_{t>0} max{ ‖(x(0), u(0)) − (x̄(t), ū(t))‖², ‖λ(0) − λ̄(t)‖² }.

Since (x̄(t), ū(t), λ̄(t)) converges to an equilibrium point (x∗, u∗, λ∗), as shown in Theorem 5.3, it is uniformly bounded on [0, +∞), which implies M_0 < +∞. Note that ((x∗, u∗), λ∗) is a saddle point of L̃, which implies

L̃(x∗, u∗, λ̄(t)) ≤ L̃(x∗, u∗, λ∗) ≤ L̃(x̄(t), ū(t), λ∗). (19)

Substituting (x∗, u∗, λ∗) for (x⁰, u⁰, λ⁰) in (17) and using the relaxation (19) yield

| (1/t) ∫_0^t L̃(x(τ), u(τ), λ(τ)) dτ − L̃(x∗, u∗, λ∗) | ≤ M_1 / (2t), (20)

where

M_1 ≜ max{ ‖(x(0), u(0)) − (x∗, u∗)‖², ‖λ(0) − λ∗‖² } ≤ M_0.

Combining (18) and (20), we obtain

|L̃(x̄(t), ū(t), λ̄(t)) − L̃(x∗, u∗, λ∗)| ≤ M_0 / t,

which completes the proof. □

Remark 5.3. Theorems 5.1–5.4 provide a complete procedure proving that Algorithm 1 solves problem (2) with an O(1/t) convergence rate. Our approach combines Lagrangian methods in convex optimization, exact penalty techniques, differential inclusions with maximal monotone maps, and saddle-point dynamics in the distributed design. It is also interesting to point out that our convergence analysis does not require constructing any Lyapunov function, which is quite different from [8].

6. Simulations

Consider a parameter estimation problem for a linear system with input b ∈ R^n, output y ∈ R, disturbance e ∈ R, and parameter x ∈ Ω ⊂ R^n:

y = ⟨b, x⟩ + e.

An agent, labeled i, has applied some input signal b_i ∈ R^n and observed the disturbed output signal y_i ∈ R. Moreover, multiple agents may hold local input-output data of the same system. For example, when an aircraft crosses different areas, the local command center of each area obtains some input-output data of the aircraft. The goal is to estimate the parameter x through these agents based on the local data y_i, b_i. This estimation problem can be formulated as the optimization problem (2) with

f_i(x) = |y_i − ⟨b_i, x⟩|.
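For this choice of f_i, a valid common Lipschitz constant is κ0 = max_i ‖b_i‖ (by the Cauchy–Schwarz inequality), which in turn fixes an admissible gain K > Nκ0 for Algorithm 1. A sketch with data generated the way the experiment below describes (the random seed and the factor 1.1 are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 12, 2
lo, hi = -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2
B = rng.uniform(lo, hi, size=(N, n))       # rows are the local inputs b_i
y = rng.uniform(lo, hi, size=N)            # local disturbed outputs y_i

def f(i, x):
    """Local objective f_i(x) = |y_i - <b_i, x>|."""
    return abs(y[i] - B[i] @ x)

kappa0 = np.linalg.norm(B, axis=1).max()   # common Lipschitz constant
K = 1.1 * N * kappa0                       # any K strictly above N*kappa0 works
```

Each agent can compute ‖b_i‖ locally; obtaining the global maximum would itself require a (privacy-harmless) max-consensus or a known a priori bound.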

To test our distributed algorithm, we set Ω = [−√π, π²]^n and randomly generate each element of b_i, y_i from the interval [−(√5 − 1)/2, (√5 + 1)/2]. We take N = 12, n = 2 and use the network topology shown in Fig. 1.

Figure 1: The communication graph of twelve agents.

The computation results of our algorithm are shown in Figs. 2–4. Figure 2 indicates convergence to the optimal solution. Figure 3 shows the trajectories of the shared auxiliary variables. Figure 4 illustrates the convergence ū(t) → u∗ = x∗. To assess performance further, we consider the residuals

Res_f(t) ≜ max_{i∈V} [ Σ_{j=1}^N f_j(x̄_i(t)) − Σ_{j=1}^N f_j(x∗) ] and Res_x(t) ≜ max_{i∈V} ‖x̄_i(t) − x∗‖.
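These residuals are easy to evaluate along a run; a small helper (the toy data in the check below are ours, and x* must be supplied, e.g., from a centralized solve):

```python
import numpy as np

def residuals(xbar, x_star, fs):
    """Res_f(t) and Res_x(t) at a single time instant.

    xbar   : (N, n) array whose rows are the agents' time averages of x_i(t)
    x_star : (n,) optimal solution of problem (2)
    fs     : list of the local objectives f_j, callables from R^n to R
    """
    total = lambda x: sum(fj(x) for fj in fs)
    res_f = max(total(xi) - total(x_star) for xi in xbar)
    res_x = max(np.linalg.norm(xi - x_star) for xi in xbar)
    return res_f, res_x
```

Because each row of xbar lies in Ω, the first residual is nonnegative by optimality of x*, so no absolute value is needed.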

Both our algorithm and the one in (3) are tested. For (3), we use x_i(t) rather than x̄_i(t) to calculate the residuals. Figure 5 gives the comparison result. It indicates that both algorithms solve the optimization problem, while our algorithm does not share local decision variables.

Figure 2: Trajectories of the output x̄(t) of Algorithm 1 for the numerical example (first and second coordinates).

Figure 3: Trajectories of the shared auxiliary variables u(t) in the numerical example.


Figure 4: Trajectories of the time average ū(t) in the numerical example.


Figure 5: Performance of the algorithms in terms of residuals.

7. Conclusions

A distributed continuous-time algorithm has been derived for nonsmooth convex optimization. The algorithm solves the optimization problem while protecting the privacy of agents, in the sense that it only requires agents to share an auxiliary variable rather than their local decision variables. The effectiveness of our method is illustrated by both mathematical analysis and a numerical example. Future work may incorporate other techniques for security and privacy into the distributed design and consider other important classes of optimization problems.

Declaration of Competing Interest

None.

References

[1] G. Shi, K. H. Johansson, Y. Hong, Reaching an optimal consensus: dynamical systems that compute intersections of convex sets, IEEE Transactions on Automatic Control 58 (2013) 610–622.
[2] Z. Fu, X. He, T. Huang, H. Abu-Rub, A distributed continuous time consensus algorithm for maximize social welfare in micro grid, Journal of the Franklin Institute 353 (2016) 3966–3984.
[3] B. Gharesifard, J. Cortés, Distributed continuous-time convex optimization on weight-balanced digraphs, IEEE Transactions on Automatic Control 59 (2014) 781–786.
[4] Z. Deng, S. Liang, Y. Hong, Distributed continuous-time algorithms for resource allocation problems over weight-balanced digraphs, IEEE Transactions on Cybernetics 48 (2018) 3116–3125.
[5] Q. Liu, J. Wang, A second-order multi-agent network for bound-constrained distributed optimization, IEEE Transactions on Automatic Control 60 (2015) 3310–3315.
[6] G. Wang, C. Wang, L. Li, Z. Zhang, Designing distributed consensus protocols for second-order nonlinear multi-agents with unknown control directions under directed graphs, Journal of the Franklin Institute 354 (2017) 571–592.
[7] Y. Lou, Y. Hong, S. Wang, Distributed continuous-time approximate projection protocols for shortest distance optimization problems, Automatica 69 (2016) 289–297.
[8] X. Zeng, P. Yi, Y. Hong, Distributed continuous-time algorithm for constrained convex optimizations via nonsmooth analysis approach, IEEE Transactions on Automatic Control 62 (2017) 5227–5233.
[9] D. Wang, Z. Wang, B. Shen, F. E. Alsaadi, T. Hayat, Recent advances on filtering and control for cyber-physical systems under security and resource constraints, Journal of the Franklin Institute 353 (2016) 2545–2466.
[10] H. Song, G. A. Fink, S. Jeschke, Security and Privacy in Cyber-physical Systems: Foundations, Principles, and Applications, John Wiley & Sons, Chichester, UK, 2018.
[11] S. Han, U. Topcu, G. J. Pappas, Differentially private distributed constrained optimization, IEEE Transactions on Automatic Control 62 (2017) 50–64.
[12] E. Nozari, P. Tallapragada, J. Cortés, Differentially private distributed convex optimization via functional perturbation, IEEE Transactions on Control of Network Systems 5 (2018) 395–408.
[13] Y. Lu, M. Zhu, Privacy preserving distributed optimization using homomorphic encryption, Automatica 96 (2018) 314–325.
[14] Y. Lou, L. Yu, S. Wang, P. Yi, Privacy preservation in distributed subgradient optimization algorithms, IEEE Transactions on Cybernetics 48 (2018) 2154–2165.
[15] Y. Liu, J. Wu, I. Manchester, G. Shi, Dynamical privacy in distributed computing part II: PPSC gossip algorithms, arXiv preprint arXiv:1808.00120 (2019).
[16] H. Yun, H. Shim, H.-S. Ahn, Initialization-free privacy-guaranteed distributed algorithm for economic dispatch problem, Automatica 102 (2019) 86–93.
[17] R. T. Rockafellar, R. J. B. Wets, Variational Analysis, volume 317 of Grundlehren der mathematischen Wissenschaften, Springer-Verlag, New York, 1998.
[18] J. P. Aubin, A. Cellina, Differential Inclusions, volume 264 of Grundlehren der mathematischen Wissenschaften, Springer-Verlag, Berlin, 1984.
[19] C. Godsil, G. F. Royle, Algebraic Graph Theory, volume 207 of Graduate Texts in Mathematics, Springer-Verlag, New York, 2001.
[20] F. H. Clarke, Y. S. Ledyaev, R. J. Stern, P. R. Wolenski, Nonsmooth Analysis and Control Theory, volume 178 of Graduate Texts in Mathematics, Springer-Verlag, New York, 1998.
[21] W. Ben-Ameur, P. Bianchi, J. Jakubowicz, Robust distributed consensus using total variation, IEEE Transactions on Automatic Control 61 (2016) 1550–1564.
[22] S. Liang, X. Zeng, Y. Hong, Distributed nonsmooth optimization with coupled inequality constraints via modified Lagrangian function, IEEE Transactions on Automatic Control 63 (2018) 1753–1759.