Hierarchically Aggregated Optimization Algorithm for Heterogeneously Dispersed Utility Functions

Proceedings of the 20th World Congress
The International Federation of Automatic Control
Toulouse, France, July 9-14, 2017

Available online at www.sciencedirect.com

ScienceDirect

IFAC PapersOnLine 50-1 (2017) 14442–14446


Koji Tsumura ∗

∗ The University of Tokyo, Bunkyo-ku 7-3-1, Tokyo, JAPAN (e-mail: [email protected]), CREST, Japan Science and Technology Agency

Abstract: In this paper, we deal with the simplification/aggregation of large-scaled optimization problems. Motivated by the reduction of data transmission and computation in distributed optimization algorithms and by the necessity of a criterion for clustering agents in multi-agent systems such as electric power networks, we clarify the relationship between the optimal solutions of original large-scaled optimization problems and those of the corresponding approximated optimization problems. Based on this analysis, we propose a hierarchically distributed optimization algorithm which reduces the data transmission and computation considerably. Furthermore, we assume stochastic dispersion on the profit functions of agents and analyze the stochastic characteristics of the solutions of the optimization problems.
© 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: large-scaled optimization problem, distributed optimization, aggregation, hierarchical optimization, electric power network

1. INTRODUCTION

Recently, it has been recognized as an urgent necessity to develop modeling methods and optimization algorithms for large-scaled systems such as supply-demand balancing in electric power networks or the maximization of utility functions in social/economic systems. A significant issue in such large-scaled optimization problems is to find optimal solutions with a feasible amount of computation and with flexibility against changes of the information structure; it is therefore necessary to divide an optimization algorithm into small sub-algorithms (Dantzig & Wolfe (1960)). An idea to realize this is distributed optimization algorithms (Arrow et al. (1958)).

The distributed optimization algorithm for constrained optimization problems is given through the primal-dual algorithm with Lagrange functions (Arrow et al. (1958)). The algorithm is composed of update laws for the optimized variables in the primal problem and for the Lagrange multiplier in the dual problem. These update laws are performed iteratively and simultaneously, and they exchange the temporary variables with each other. Under an appropriate concavity condition on the utility function, global convergence to the equilibrium is guaranteed, and the equilibrium is the optimal solution. Moreover, when a utility function is in some class, the optimization algorithm can be decomposed into several sub-algorithms and thus becomes a distributed optimization algorithm. Many actual optimization problems on large-scaled systems, such as supply-demand balancing, are in this class, and the distributed optimization algorithm can be applied (Arrow et al. (1958); Roozbehani et al. (2010); Yamamoto & Tsumura (2012a,b); Inagaki & Tsumura (2014)).

The distributed optimization algorithm realizes parallelization of the computation, concealment of information, and flexibility against changes of the information network structure or connection/disconnection of each subsystem. On the other hand, the distributed optimization algorithm has drawbacks. One of them is the slow convergence rate, and there are investigations to improve the variable update laws (Kelly et al. (1998); Paganini (2002); Xia & Wang (2005); Jadbabaie et al. (2009); Wen & Arcak (2004); Yamamoto & Tsumura (2012a,b); Inagaki & Tsumura (2014)). Another is that the amount of data exchange between sub-algorithms inevitably becomes large, and the distributed algorithm is then not effective when enough channel capacity of the network is not provided. Compared to the research activity on the former issue, the latter has not been investigated enough.

Relevant to the above issue, in general, by collecting a number of sub-systems of a large-scaled system to constitute a cluster and regarding it as a single sub-system, it is expected that the original problem can be simplified and that the simplified problem is easily solved. Moreover, such simplification will decrease the amount of computation and of data transmission between sub-systems. For example, in the case of electric power networks, grouping suppliers or consumers, called aggregation, has been recognized as an effective approach for simplifying optimization problems, for stable supply, and for dispersion of risk. In particular, in the case of PV (photovoltaics) suppliers, the dispersion of the PV outputs is inevitably large, and thus aggregation of PV suppliers is highly expected in order to average out the dispersion. In such a case, the effect of averaging the dispersion depends on the variety of the suppliers' properties, such as differences in the power-supply characteristics

2405-8963 © 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2017.08.2287


or the utility functions. However, the design policy for aggregation and its effect have not been investigated enough.

From the above discussion, in order to decrease the amount of data transmission and computation in distributed optimization algorithms and to propose a design policy for clustering multi-agent systems, such as aggregation in electric power networks, we aim at the following in this paper. First, we simplify a large-scaled optimization problem by clustering and show the relationship between the original problem and the simplified problem. Based on this result, we also explain that a hierarchically distributed optimization algorithm is possible which restricts the amount of data transmission and computation. Second, we introduce statistical dispersion into the utility functions of the sub-systems and analyze the stochastic relationship between the simplified optimization problem and the original problem. Finally, we propose a design policy for clustering sub-systems based on the above results.

2. OPTIMIZATION PROBLEM AND DISTRIBUTED ALGORITHM

We consider the following optimization problem with a constraint:

\[
\max_{x} f(x), \quad x \in \mathbb{R}^N, \qquad \text{s.t. } g(x) = 0, \tag{1}
\]

where $x$ is the optimized variable and $f(x)$ is the evaluated function, called the utility function, which is assumed to be strictly concave; we also assume that the condition $g(x) = 0$ gives a convex set of $x$. Under these assumptions, it is known that the optimal solution is unique. In order to find the solution, first define the following Lagrange function:

\[
L(x, p) := f(x) + p \cdot g(x),
\]

where $p$ is the Lagrange multiplier. Define the dual function by

\[
D(p) := \max_{x} L(x, p);
\]

then the solution of its minimization

\[
\min_{p} D(p)
\]

gives a pair $(x^*, p^*)$, and $x^*$ is the optimal solution of the original problem (1). The algorithm to find the solution by the steepest descent method is given by

\[
\dot{x} = \frac{\partial L(x, p)}{\partial x}, \qquad \dot{p} = -\frac{\partial L(x, p)}{\partial p}. \tag{2}
\]

When the optimization problem (1) is of the form

\[
\max_{\{x_i\}} \sum_{i=1}^{N} f_i(x_i), \qquad \text{s.t. } X - \frac{1}{N}\sum_{i=1}^{N} x_i = 0, \tag{3}
\]

the algorithm (2) becomes a distributed form, as explained below. In this case, the Lagrange function is given by

\[
L(x, p) = \sum_{i=1}^{N} f_i(x_i) + p\left(X - \frac{1}{N}\sum_{i=1}^{N} x_i\right)
\]


and (2) is represented element-wise by

\[
\dot{x}_i = \nabla f_i(x_i) - \frac{1}{N}\,p, \qquad i = 1, 2, \ldots, N, \tag{4}
\]
\[
\dot{p} = -\left(X - \frac{1}{N}\sum_{i=1}^{N} x_i\right). \tag{5}
\]
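To make the distributed structure concrete, the following is a minimal simulation sketch (not from the paper) of the update laws (4) and (5), discretized by an explicit Euler step; the quadratic utilities $f_i(x_i) = -(x_i - a_i)^2/(2d_i) + b_i$ and all numerical constants are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the paper): explicit-Euler discretization of
# the distributed update laws (4)-(5) for problem (3). The quadratic
# utilities f_i(x) = -(x - a_i)^2 / (2 d_i) + b_i and all constants are
# illustrative assumptions.
rng = np.random.default_rng(0)
N, X = 100, 1.0                   # number of agents, balance target X
a = rng.normal(1.0, 0.3, N)       # utility peaks a_i (assumed data)
d = rng.uniform(0.5, 2.0, N)      # curvatures d_i > 0 (assumed data)

x = np.zeros(N)                   # primal variables, one per agent
p = 0.0                           # Lagrange multiplier ("price")
dt = 0.05
for _ in range(50000):
    grad = -(x - a) / d           # agent i uses only its own gradient of f_i
    x = x + dt * (grad - p / N)   # update law (4), fully local
    p = p - dt * (X - x.mean())   # update law (5), needs only the mean of x_i

print("residual X - mean(x*):", X - x.mean())   # ~0 at the saddle point
print("multiplier p*:", p)
```

Each agent touches only its own data and the broadcast multiplier $p$, which is exactly the distributed structure discussed next.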

It is known (Arrow et al. (1958)) that the update of each optimized variable $x_i$ is performed using only $x_i$, the derivative $\nabla f_i(x_i)$ of the utility function $f_i(x_i)$, and $p$; the other utility functions $f_j$ and variables $x_j$, $j \neq i$, are not necessary. The update of $p$ is also performed using only the sum of the $x_i$, and knowledge of $f_i$, $\forall i$, is not necessary. By preparing agents who calculate each update law in (4) or (5) in parallel and exchange the temporary optimized variables $x_i$ or $p$ with each other, $x_i$ and $p$ finally converge to the optimal solution $x_i^*$ and $p^*$. In this way, the algorithm is performed in a distributed form.

Because of this property of the distributed optimization algorithm, its application to a class of large-scaled optimization problems has been actively investigated, e.g., demand-supply balancing in electric power networks (Roozbehani et al. (2010); Yamamoto & Tsumura (2012a,b); Inagaki & Tsumura (2014)), in which consumers have utility functions on consuming electric power whereas suppliers have cost functions for generating it, and the purpose is the total optimization of the sum of these functions.

3. HIERARCHICAL AGGREGATION

The distributed algorithm is suitable for solving a class of large-scaled optimization problems; however, it is necessary to exchange the temporarily computed variables between agents repeatedly, which requires enough network capacity. When the number of agents $N$ is enormous, such a condition is difficult to satisfy, and some improved computation framework is necessary.

In order to resolve this difficulty, we consider dividing all the utility functions $\{f_i(x_i)\}$ into several groups called clusters, aggregating the set of utility functions in each cluster into an aggregated optimization problem, and regarding the original problem as a collection of the aggregated sub-problems. By this simplification, we expect to restrain the number of optimization variables and the amount of temporary variables exchanged between agents. As a result, we also expect to accelerate the convergence rate to the optimal solutions.

3.1 Clustering

First, let $M$ be the number of clusters, indexed by $j = 1, 2, \ldots, M$, and divide all the utility functions $\{f_i(x_i)\}$ and the optimized variables $\{x_i\}$ into $\{f_i^1(x_i^1)\}, \{f_i^2(x_i^2)\}, \ldots, \{f_i^j(x_i^j)\}, \ldots, \{f_i^M(x_i^M)\}$, where the number of functions in cluster $j$ is $N_j$, $j = 1, 2, \ldots, M$.



According to this clustering, the original problem (3) can be represented as

\[
\max_{\{x_i^j\}} \sum_{j=1}^{M}\sum_{i=1}^{N_j} f_i^j(x_i^j), \qquad \text{s.t. } X - \sum_{j=1}^{M}\frac{N_j}{N}\,\frac{1}{N_j}\sum_{i=1}^{N_j} x_i^j = 0. \tag{6}
\]

Note that we can regard $\sum_{i=1}^{N_j} f_i^j(x_i^j)$ as an aggregated utility function of cluster $j$. In this case, the Lagrange function can be represented as

\[
L(x, p) := \sum_{j=1}^{M}\sum_{i=1}^{N_j} f_i^j(x_i^j) + p\left(X - \sum_{j=1}^{M}\frac{N_j}{N}\,\frac{1}{N_j}\sum_{i=1}^{N_j} x_i^j\right).
\]

In order to clarify the fundamental property of the problem, we restrict the class of the utility functions to the following quadratic form:

\[
f_i^j(x_i^j) = -\frac{1}{2}\,\frac{1}{d_i^j}\,(x_i^j - a_i^j)^2 + b_i^j, \qquad d_i^j > 0. \tag{7}
\]

With this simplification, we will represent the behavior of the optimized variables and the values of the utility functions around the optimal solutions explicitly. We also define the following quantities according to the above clustering:

\[
\tilde{a}^j := \frac{1}{N_j}\sum_{i=1}^{N_j} a_i^j, \qquad
\tilde{b}^j := \frac{1}{N_j}\sum_{i=1}^{N_j} b_i^j, \qquad
\tilde{d}^j := \frac{1}{N_j}\sum_{i=1}^{N_j} d_i^j > 0. \tag{8}
\]

When the utility function is given by (7), (4) becomes

\[
\dot{x}_i^j = -\frac{1}{d_i^j}(x_i^j - a_i^j) - \frac{1}{N}\,p, \tag{9}
\]

and the equilibrium point $x_i^{j*}$ is given by

\[
x_i^{j*} = a_i^j - \frac{d_i^j}{N}\,p^*. \tag{10}
\]

On the other hand, the update law of $p$ can be represented by

\[
\dot{p} = \sum_{j=1}^{M}\frac{N_j}{N}\,\frac{1}{N_j}\sum_{i=1}^{N_j} x_i^j - X. \tag{11}
\]

Therefore, $\{x_i^{j*}\}$ satisfy

\[
\sum_{j=1}^{M}\frac{N_j}{N}\,\frac{1}{N_j}\sum_{i=1}^{N_j} x_i^{j*} - X = 0. \tag{12}
\]

By substituting (10) into (12), we get

\[
\sum_{j=1}^{M}\frac{N_j}{N}\,\frac{1}{N_j}\sum_{i=1}^{N_j}\left(a_i^j - \frac{d_i^j}{N}\,p^*\right) - X
= \sum_{j=1}^{M}\frac{N_j}{N}\left(\frac{1}{N_j}\sum_{i=1}^{N_j} a_i^j\right) - \sum_{j=1}^{M}\frac{N_j}{N}\left(\frac{1}{N_j}\sum_{i=1}^{N_j}\frac{d_i^j}{N}\right)p^* - X
= \sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{a}^j - \frac{1}{N}\sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{d}^j\,p^* - X = 0,
\]

and then

\[
p^* = \frac{1}{\dfrac{1}{N}\displaystyle\sum_{j=1}^{M}\frac{N_j}{N}\tilde{d}^j}\left(\sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{a}^j - X\right) \tag{13}
\]

is derived. By substituting the equilibrium point $(\{x_i^{j*}\}, p^*)$ into the Lagrange function, we get

\[
L\left(\{x_i^{j*}\}, p^*\right)
= \sum_{j=1}^{M}\sum_{i=1}^{N_j}\left(-\frac{d_i^j}{2N^2}(p^*)^2 + b_i^j\right)
+ p^*\left(X - \sum_{j=1}^{M}\frac{N_j}{N}\,\frac{1}{N_j}\sum_{i=1}^{N_j}\left(a_i^j - \frac{d_i^j}{N}\,p^*\right)\right)
\]
\[
= -\frac{1}{2}\left(\frac{1}{N}\sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{d}^j\right)(p^*)^2 + \sum_{j=1}^{M} N_j\,\tilde{b}^j
= N\left[-\frac{1}{2\sum_{j=1}^{M}\frac{N_j}{N}\tilde{d}^j}\left(\sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{a}^j - X\right)^2 + \sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{b}^j\right]. \tag{14}
\]

From the above, it is known that the optimized value (14) can be represented only by $\tilde{a}^j$, $\tilde{b}^j$, and $\tilde{d}^j$, which are defined in (8) as the averages of $a_i^j$, $b_i^j$, and $d_i^j$, respectively, in each cluster.

3.2 Aggregated Optimization Problem

Based on the discussion in Section 3.1, in the case that the utility functions $\{f_i^j\}$ are given by (7), we consider simplifying the original optimization problem (6) to the following optimization problem:

\[
\max_{\{\tilde{x}^j\}} \sum_{j=1}^{M} N_j\left(-\frac{1}{2}\,\frac{1}{\tilde{d}^j}(\tilde{x}^j - \tilde{a}^j)^2 + \tilde{b}^j\right), \qquad \text{s.t. } X - \sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{x}^j = 0. \tag{15}
\]

Note that $\tilde{a}^j$, $\tilde{b}^j$, and $\tilde{d}^j$ in (15) are given by (8), the averages of $\{a_i^j\}$, $\{b_i^j\}$, and $\{d_i^j\}$ in cluster $j$.

The number of optimized variables $\tilde{x}^j$ in (15) is $M$, which is considerably small compared with $N = \sum_{j=1}^{M} N_j$ in the original problem (6) if each $N_j$ is enormously large. Moreover, the variable $\tilde{x}^j$ in (15) can also be regarded as the average of $\{x_i^j\}$ in cluster $j$. In this sense, we call this problem "an aggregated optimization problem." The corresponding Lagrange function for (15) is given by

\[
\tilde{L}\left(\{\tilde{x}^j\}, \tilde{p};\, \{\tilde{a}^j\}, \{\tilde{b}^j\}, \{\tilde{d}^j\}\right)
= \sum_{j=1}^{M} N_j\left(-\frac{1}{2}\,\frac{1}{\tilde{d}^j}(\tilde{x}^j - \tilde{a}^j)^2 + \tilde{b}^j\right) + \tilde{p}\left(X - \sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{x}^j\right). \tag{16}
\]

In the following, we find the relationship between the optimal solution of this aggregated optimization problem (15) and that of the original problem (6). We get the following:

Theorem 1. On the optimization problem (6) and the corresponding aggregated problem (15), the following holds:

\[
L\left(\{x_i^{j*}\}, p^*\right) = \tilde{L}\left(\{\tilde{x}^{j*}\}, \tilde{p}^*;\, \{\tilde{a}^j\}, \{\tilde{b}^j\}, \{\tilde{d}^j\}\right) \tag{17}
\]
\[
= N\left[-\frac{1}{2\sum_{j=1}^{M}\frac{N_j}{N}\tilde{d}^j}\left(\sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{a}^j - X\right)^2 + \sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{b}^j\right], \tag{18}
\]
\[
\frac{1}{N_j}\sum_{i=1}^{N_j} x_i^{j*} = \tilde{x}^{j*}, \qquad p^* = \tilde{p}^*. \tag{19}
\]

This result implies that when the averages of $a_i^j$, $b_i^j$, and $d_i^j$ are known precisely, the original optimization problem (6) can be completely simplified via the aggregated optimization problem, and the hierarchically distributed algorithm explained below is possible.

3.3 Hierarchically Distributed Optimization Algorithm

When the variable $p$ in the update law (9) of $x_i^j$ for the original problem (6) can be replaced by its optimal value $p^*$, the update law (9) becomes completely closed, and its equilibrium is also equal to the optimal solution $x_i^{j*}$. From the relationship (19) in Theorem 1, $p^*$ is identical to $\tilde{p}^*$ of the aggregated optimization problem (15), so we can get the following hierarchically distributed optimization algorithm through (15). We consider two layers, the upper layer and the lower layer, which constitute the whole algorithm.

In the upper layer, consider solving the aggregated optimization problem (15). Assume that the number of clusters, $M$, is small enough compared with the number of variables $\{x_i^j\}$, $N$, of the original problem (6). Under such a condition, $\tilde{x}^{j*}$ and $\tilde{p}^*$ are found by the algorithms

\[
\dot{\tilde{x}}^j = N_j\left\{-\frac{1}{\tilde{d}^j}(\tilde{x}^j - \tilde{a}^j) - \frac{1}{N}\,\tilde{p}\right\}, \tag{20}
\]
\[
\dot{\tilde{p}} = \sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{x}^j - X, \tag{21}
\]

with a small amount of computation compared with that for finding the optimal solutions of the original problem (6). The optimized value of the original problem (6) can also be known by (17). On the other hand, in the lower layer, by using the optimal solution $\tilde{p}^*$ calculated in the upper layer, or the temporary $\tilde{p}$, we perform the following update law

\[
\dot{x}_i^j = -\frac{1}{d_i^j}(x_i^j - a_i^j) - \frac{1}{N}\,\tilde{p}^* \tag{22}
\]

or

\[
\dot{x}_i^j = -\frac{1}{d_i^j}(x_i^j - a_i^j) - \frac{1}{N}\,\tilde{p} \tag{23}
\]

as a substitute for (9). Differently from the ordinary distributed optimization algorithm (4) and (5), there is no variable transmission from the lower layer (22) or (23) to the upper layer (20) and (21). In this sense, these update laws can be regarded as a hierarchically distributed optimization algorithm. We summarize the algorithm in the following:

Hierarchically Distributed Optimization Algorithm

upper layer:
\[
\dot{\tilde{x}}^j = N_j\left\{-\frac{1}{\tilde{d}^j}(\tilde{x}^j - \tilde{a}^j) - \frac{1}{N}\,\tilde{p}\right\}, \qquad
\dot{\tilde{p}} = \sum_{j=1}^{M}\frac{N_j}{N}\,\tilde{x}^j - X
\]

lower layer:
\[
\dot{x}_i^j = -\frac{1}{d_i^j}(x_i^j - a_i^j) - \frac{1}{N}\,\tilde{p}^*
\quad \text{or} \quad
\dot{x}_i^j = -\frac{1}{d_i^j}(x_i^j - a_i^j) - \frac{1}{N}\,\tilde{p}
\]

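The following minimal sketch (not from the paper) simulates the two layers above for the quadratic utilities (7), with an illustrative clustering and Euler-discretized dynamics; it also numerically checks the closed form (13) for the price and the relationship (19) of Theorem 1. All constants are assumptions.

```python
import numpy as np

# Minimal sketch (not from the paper) of the hierarchical scheme for the
# quadratic utilities (7): the upper layer iterates (20)-(21) on M
# aggregated variables only; the lower layer then runs (22) with the
# upper-layer price. Clustering and all constants are illustrative.
rng = np.random.default_rng(1)
M, Nj, X = 4, 25, 1.0
N = M * Nj
a = rng.normal(rng.uniform(0.5, 1.5, M)[:, None], 0.1, (M, Nj))  # a_i^j
d = rng.uniform(0.5, 2.0, (M, Nj))                               # d_i^j
a_t, d_t = a.mean(axis=1), d.mean(axis=1)    # cluster averages, eq. (8)
w = np.full(M, Nj / N)                       # weights N_j / N

xt, pt, h = np.zeros(M), 0.0, 0.01
for _ in range(100000):                      # upper layer: M variables
    xt = xt + h * Nj * (-(xt - a_t) / d_t - pt / N)   # eq. (20)
    pt = pt + h * (w @ xt - X)                        # eq. (21)
p_closed = (w @ a_t - X) / ((1.0 / N) * (w @ d_t))    # eq. (13)
print("upper-layer price vs (13):", pt, p_closed)

x = np.zeros((M, Nj))
for _ in range(20000):                       # lower layer: no feedback up
    x = x + h * (-(x - a) / d - pt / N)               # eq. (22)
print("cluster means of x*:", x.mean(axis=1))         # eq. (19) check:
print("upper-layer x~*:   ", xt)                      # these coincide
```

The cluster means of the lower-layer solutions coincide with the upper-layer variables, as (19) asserts, and no lower-layer variable is ever transmitted upward.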
By using this hierarchically distributed optimization algorithm, the number of optimized variables and the amount of data transmission of the temporarily calculated optimization variables are drastically restrained compared with the original algorithm (4) and (5). As a result, an improvement of the convergence rate is also expected.

4. STOCHASTIC ANALYSIS

In the previous section, we explained that when the averages of $a_i^j$, $b_i^j$, and $d_i^j$ are known precisely, the original optimization problem (6) can be simplified via the aggregated optimization problem and the hierarchically distributed algorithm is possible. However, when the number of agents is enormous, it is difficult to obtain the averages of $a_i^j$, $b_i^j$, and $d_i^j$ precisely, and it is reasonable to assume that only their expectations are known. From this consideration, in this section we assume that $a_i^j$ and $b_i^j$ are random variables with statistical dispersion; on the other hand, we assume that $\tilde{d}^j$ is known, for simplicity. We then calculate the statistics of the optimal solutions.

Hereafter, we assume that the $a_i^j$ and $b_i^j$ are random variables that are independent of each other. We also assume the following:

\[
\mathrm{E}[a_i^j] =: \overline{a}^j, \quad \mathrm{V}[a_i^j] = \sigma_{a_i^j}^2 = \sigma_{a^j}^2, \qquad
\mathrm{E}[b_i^j] =: \overline{b}^j, \quad \mathrm{V}[b_i^j] = \sigma_{b_i^j}^2 = \sigma_{b^j}^2, \qquad
\mathrm{E}[(a_i^j)^4] < \infty. \tag{24}
\]

Under the above preparations, we will find the following quantities:

\[
\mathrm{E}\!\left[L\left(\{x_i^{j*}\}, p^*\right)\right], \tag{25}
\]
\[
\mathrm{V}\!\left[L\left(\{x_i^{j*}\}, p^*\right)\right]. \tag{26}
\]

In particular, we consider the aggregated optimization problem (15) defined in Section 3 with the substitutions $\tilde{a}^j \leftarrow \overline{a}^j$ and $\tilde{b}^j \leftarrow \overline{b}^j$, and compare its optimal solution with that of the original problem (6) from the statistical viewpoint. We get the following:



Theorem 4.1. On the optimization problem (6) and an aggregated optimization problem (15) with the substitutions $\tilde{a}^j \leftarrow \overline{a}^j$ and $\tilde{b}^j \leftarrow \overline{b}^j$, the following holds:

\[
\mathrm{E}\!\left[L\left(\{x_i^{j*}\}, p^*\right)\right]
= \tilde{L}\left(\{\tilde{x}^{j*}\}, \tilde{p}^*;\, \{\overline{a}^j\}, \{\overline{b}^j\}, \{\tilde{d}^j\}\right)
- \frac{1}{2\sum_{j=1}^{M}\frac{N_j}{N}\tilde{d}^j}\;\frac{1}{N}\sum_{j=1}^{M} N_j\,\sigma_{a^j}^2 \tag{27}
\]
\[
= N\left[-\frac{1}{2\sum_{j=1}^{M}\frac{N_j}{N}\tilde{d}^j}\left\{\left(\sum_{j=1}^{M}\frac{N_j}{N}\,\overline{a}^j - X\right)^2 + \frac{1}{N}\sum_{j=1}^{M}\frac{N_j}{N}\,\sigma_{a^j}^2\right\} + \sum_{j=1}^{M}\frac{N_j}{N}\,\overline{b}^j\right]. \tag{28}
\]

Remark 4.1. The following are known from (27) or (28):

• The values of the terms $\sum_{j=1}^{M}\frac{N_j}{N}\overline{a}^j$ and $\sum_{j=1}^{M}\frac{N_j}{N}\overline{b}^j$ do not depend on the choice of the clustering $\{f_i^1(x_i^1)\}, \{f_i^2(x_i^2)\}, \ldots, \{f_i^M(x_i^M)\}$.

• On the other hand, from the comparison between (14) or (18) and (27) or (28), the term $\sum_{j=1}^{M}\frac{N_j}{N}\sigma_{a^j}^2$ in (27) or (28) does depend on the choice of the clustering $\{f_i^1(x_i^1)\}, \{f_i^2(x_i^2)\}, \ldots, \{f_i^M(x_i^M)\}$ with the substitutions $\tilde{a}^j \leftarrow \overline{a}^j$ and $\tilde{b}^j \leftarrow \overline{b}^j$. That is, when the variance of $a_i^j$ in cluster $j$ is large, this term makes the optimized value of the original problem deteriorate in the sense of the expectation (it decreases; note that $-\frac{1}{2\sum_{j=1}^{M}\frac{N_j}{N}\tilde{d}^j} < 0$) compared with the optimal value of the aggregated problem with $\tilde{a}^j \leftarrow \overline{a}^j$ and $\tilde{b}^j \leftarrow \overline{b}^j$. In other words, the optimal value of the aggregated optimization problem (15) is an optimistic estimate of that of the original problem (6).

• In order to bring the optimal values of the aggregated problem (15) and the original problem (6) close to each other in the sense of their expectations, it is necessary to collect into cluster $j$ utility functions whose $\{a_i^j\}$ have as small a dispersion as possible.

Moreover, the variance of the optimal value of the original problem (6) satisfies the following:

Theorem 4.2. For simplicity, assume $d_i^j = 1$. Then the following holds:

\[
\lim_{N \to \infty} \mathrm{V}\!\left[L\left(\{x_i^{j*}\}, p^*\right)\right]
= \left(\sum_{j=1}^{M}\frac{N_j}{N}\,\overline{a}^j - X\right)^2 \sum_{j=1}^{M}\frac{N_j}{N}\,\sigma_{a^j}^2 + \sum_{j=1}^{M}\frac{N_j}{N}\,\sigma_{b^j}^2. \tag{29}
\]

Remark 4.2. From the above result, when the variances $\sigma_{a^j}^2$ and $\sigma_{b^j}^2$ of $a_i^j$ and $b_i^j$ in cluster $j$ are small, the statistical dispersion of the optimized value becomes small. This implies that when we estimate the optimal value of the original problem (6) by using the aggregated problem, it is preferable to collect into each cluster evaluation functions $\{f_i^j\}$ whose stochastic properties are similar.
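As an illustration of Remarks 4.1 and 4.2, the following Monte Carlo sketch (not from the paper; $d_i^j = 1$, $b_i^j = 0$, and all parameters are assumed) shows that the aggregated value $\tilde{L}$ is an optimistic estimate of $\mathrm{E}[L(\{x_i^{j*}\}, p^*)]$ and that the gap matches the variance term in (27).

```python
import numpy as np

# Monte Carlo sketch (not from the paper) for Remark 4.1, with d_i = 1
# and b_i = 0, so that by (14) the optimal value is -(N/2)(mean(a)-X)^2.
# Two equal clusters with E[a_i^1] = 0.5, E[a_i^2] = 1.5 (assumed).
rng = np.random.default_rng(2)
M, Nj, X, sd, trials = 2, 500, 1.0, 0.1, 20000
N = M * Nj
abar = np.array([0.5, 1.5])                   # cluster means of a_i^j

vals = np.empty(trials)
for t in range(trials):
    a = rng.normal(np.repeat(abar, Nj), sd)   # one draw of the population
    vals[t] = -0.5 * N * (a.mean() - X) ** 2  # optimal value of (6)

L_tilde = -0.5 * N * (abar.mean() - X) ** 2   # aggregated value, a~ <- abar
corr = -0.5 * sd ** 2          # -(1/2) sum_j (N_j/N) sigma_aj^2, as in (27)
print("aggregated estimate L~:", L_tilde)     # optimistic (Remark 4.1)
print("theory E[L*] via (27) :", L_tilde + corr)
print("Monte Carlo E[L*]     :", vals.mean())
```

Making the within-cluster dispersion of $a_i^j$ smaller shrinks the correction term, which is the clustering design policy stated above.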

5. CONCLUSION

In this paper, we dealt with a distributed optimization algorithm, and in order to reduce the number of optimized variables and the amount of variable transmission between subprograms, we considered a hierarchically distributed optimization algorithm. In particular, when the utility functions are quadratic, we considered dividing all the utility functions into several clusters and derived an aggregated optimization problem which regards each cluster as an averaged utility function with an averaged optimization variable. Then we showed that when the averaged utility functions are known precisely, the solution of the hierarchically distributed problem is equal to that of the original one. Moreover, we also considered the situation where only the expectations of the utility functions are known, and derived the expectations and the variances of the optimized values. From this analysis, it is known that the statistical dispersion of the utility functions in each cluster should be small in order to decrease the error between the optimized value estimated by the aggregated optimization problem and that of the original problem. This result gives a design policy for clustering the utility functions.

This work was supported by JST CREST Grant Number JPMJCR15K1, Japan.

REFERENCES

Arrow, K. J., Hurwicz, L., and Uzawa, H. (1958). Studies in Linear and Nonlinear Programming. Stanford University Press.

Dantzig, G. B., and Wolfe, P. (1960). Decomposition principle for linear programs. Operations Research, 8, 101–111.

Inagaki, K., and Tsumura, K. (2014). Distributed optimization via dual decomposition with distributed Newton's method. MSCS2014.

Jadbabaie, A., Ozdaglar, A., and Zargham, M. (2009). A distributed Newton method for network optimization. Proceedings of the 48th IEEE Conference on Decision and Control, 2736–2741.

Kelly, F., Maulloo, A., and Tan, D. (1998). Rate control in communication networks: shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49, 237–252.

Paganini, F. (2002). A global stability result in network flow control. Systems & Control Letters, 46, 165–172.

Roozbehani, M., Dahleh, M., and Mitter, S. (2010). On the stability of wholesale electricity markets under real-time pricing. 49th IEEE Conference on Decision and Control, 1911–1918.

Wen, J. T., and Arcak, M. (2004). A unifying passivity framework for network flow control. IEEE Transactions on Automatic Control, 49, 162–174.

Xia, Y., and Wang, J. (2005). A recurrent neural network for solving nonlinear convex programs subject to linear constraints. IEEE Transactions on Neural Networks, 16(2), 379–385.

Yamamoto, H., and Tsumura, K. (2012a). Control of smart grids based on price mechanism considering of network structure. Proceedings of the SICE 12th Annual Conference on Control Systems.

Yamamoto, H., and Tsumura, K. (2012b). Control of smart grids based on price mechanism and network structure. Mathematical Engineering Technical Reports, The University of Tokyo, METR 2012–11.
