Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 147 (2019) 354–360
www.elsevier.com/locate/procedia
2018 International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2018
Energy-Efficient Composition of Configurable Operators in Big Data Environment
Jiajia Yao (a), Zhangbing Zhou (a,b), Deng Zhao (b), Mengyu Sun (a)
(a) School of Information Engineering, China University of Geosciences (Beijing), Beijing 100083, China
(b) Computer Science Department, TELECOM SudParis, Evry 91011, France
Abstract
With the rapid development of edge computing, heterogeneous smart devices, serving as operators, need to be deployed close to data sources and to cooperate in accomplishing big data applications, considering factors including energy consumption and fast response. Specifically, an operators composition can be encapsulated and represented as a feasible solution, and an operator in a certain solution is encoded with an abstract matrix defined by evaluation indicators including the spatial and temporal constraints, the priority between consecutive operators, the load-balancing factor, and energy consumption. Discovering an optimal composition strategy in the constructed edge cluster boils down to a multi-objective and multi-constrained optimization problem, which can be solved by adopting heuristic algorithms. Experimental evaluation demonstrates that Grey Wolf Optimization outperforms Particle Swarm Optimization in discovering an approximately optimal operators composition in the corresponding edge cluster, and can minimize the energy consumption of the network.
© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 2018 International Conference on Identification, Information and Knowledge in the Internet of Things.
Keywords: Operators; Operators Composition; Edge Cluster; Energy Efficiency
∗ Corresponding author. E-mail address: [email protected]
1877-0509 © 2019 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2019.01.237
1. Introduction
The edge computing paradigm has facilitated ubiquitous data access, processing and combination close to data sources, which alleviates excessive network traffic in a big data environment [1]. Due
to the increasing amount of data to be processed and the complexity of the required functionalities, the requirements of certain applications can hardly be met by a single operator. Instead, they may require the collaboration of multiple neighboring operators in a collective fashion [2]. An example is given in [1], where video processing operators, social network data operators and mobile data operators are deployed in a neighboring region, while monitoring operators coordinate the functions of these three kinds of operators to detect the people involved in a car accident. In this setting, the functional composition of heterogeneous operators is a pressing and promising alternative. Considering that operators usually have spatial and temporal awareness, offer a single functionality, and are mostly limited in energy, storage and computing resources, the functionalities of operators are encapsulated and represented as an abstract matrix, through which we can evaluate the fitness of a feasible solution.
Service discovery and composition techniques have been developed in the domain of Web/REST services during the past decade [3]. Besides, with the advent of big data applications, there is renewed interest in treating network traffic as the primary optimization factor [5]. By adopting machine-to-machine connection techniques, big data sources are aggregated and processed by heterogeneous operators to satisfy the demands of an application. Therefore, an optimal edge cluster should be constructed to reduce the size of the search space during data transmission. Multiple operators with heterogeneous processing functionalities are distributed across the whole edge network, and their composition should respect the spatial and temporal constraints and ensure the efficient use of energy.
Moreover, operators of a certain type hosted by different edge nodes should be configurable by computing priorities between consecutive data processing steps. Since operators may take input data from more than one stream, we should consider whether a standby state increases energy consumption when multiple data sources transmit data to the same parent for aggregation or processing. To address this challenge, this article proposes an energy-efficient optimization mechanism for operators composition. Our contributions can be summarized as follows:
• For a certain application, an edge cluster is initially constructed by adopting the Multistage Shortest Path (MSP) algorithm. After that, the operators contained in a feasible solution are encapsulated into a corresponding abstract matrix, which is encoded by five evaluation indicators, Tpr, Spt, Pri, Lbf and Egc, which refer to the temporal constraint, the spatial constraint, the priority between consecutive operators, the load-balancing factor and energy consumption, respectively.
• The composition of operators can be formulated as an optimization problem with multiple objectives under multiple constraints. Grey Wolf Optimization (GWO) and Particle Swarm Optimization (PSO) are adopted to search for the optimal composition strategy in this article. Experimental results indicate that our techniques are efficient, and the evaluations demonstrate that GWO outperforms PSO when searching for an approximately optimal composition strategy.
2. Preliminaries
2.1. System Definition and Problem Formulation
The system comprises edge nodes that support heterogeneous operators to handle various data processing and computing functions. An edge node may support more than one functionality, depending on the diversity of its operators. Generally:
Definition 1 (Edge Node). An edge node $N_i$ is a tuple $(id, nm, o_{list}, spt, Reg_N)$, where $id$ is the unique identity, $nm$ is the name, $o_{list}$ is the set of operators, $spt$ is the spatial geographical range, and $Reg_N$ represents the remaining energy of $N_i$.
Definition 2 (Operator). An operator $o_m$ is a tuple $(nm, tpr, spt, N_i, wgt_{mn}, erg)$, where $nm$ is the name of $o_m$, $tpr$ represents the temporal duration, $spt$ represents the spatial constraint, $N_i$ represents the node
hosting $o_m$, $wgt_{mn}$ is the link weight between two operators $o_m$ and $o_n$, and $erg$ is the residual energy of $o_m$.
Definition 3 (Data Source). A data source $d_j$ is a tuple $(nm, wgt_{jn}, spt)$, whose attributes are specified in the same way as those of $o_m$.
Typically, we consider a group of operators and data sources that logically compose a hierarchical tree structure for the given application. Our goal is to search for an approximately optimal composition strategy that satisfies the requirements of a complex application and minimizes the energy consumed during communication, thus prolonging the lifetime of the edge network.
2.2. Energy Model
In this section, we use the widely used first order radio model [4] to compute the energy consumed when transmitting data packets across the application tree. Specifically, $E_{Tx_{ij}}(k, d)$ refers to the energy consumed when transmitting $k$ bits over a distance $d$ between two neighboring operators, while $E_{Rx}(k)$ is the energy consumed when receiving a $k$-bit packet:
$$E_{Tx_{ij}}(k, d) = E_{elec} \times k + \varepsilon_{amp} \times k \times d^{n} \tag{1}$$
$$E_{Rx}(k) = E_{elec} \times k \tag{2}$$
where $E_{elec}$ is the energy consumed to run the transmitter or receiver circuitry, and $\varepsilon_{amp}$ is the energy consumed by the transmit amplifier. Therefore, the total cost of transmitting $k$ bits from the operator $o_i$ to another operator $o_j$ for higher-level data processing, denoted as $E_{ij}(k)$, is:
$$E_{ij}(k) = E_{Tx_{ij}}(k, d) + E_{Rx}(k) \tag{3}$$
In particular, we distinguish packet transmission between operators and data sources from transmission that involves a sink: a sink is usually assumed to have unlimited energy and is therefore subject to no energy restriction.
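As an illustration of Eqs. (1) to (3), the following minimal sketch computes the per-link energy under the first order radio model. The numeric defaults for `e_elec`, `eps_amp` and the exponent `n` are assumptions chosen only for illustration; the paper does not report the values it uses, and the names `RadioParams`, `tx_energy`, `rx_energy` and `link_energy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadioParams:
    """Radio constants; all default values are illustrative assumptions."""
    e_elec: float = 50e-9     # J/bit spent in transmitter/receiver circuitry
    eps_amp: float = 100e-12  # J/bit/m^n spent by the transmit amplifier
    n: float = 2.0            # exponent applied to the distance d


def tx_energy(k_bits: float, d: float, p: RadioParams = RadioParams()) -> float:
    """Eq. (1): energy to transmit k bits over distance d."""
    return p.e_elec * k_bits + p.eps_amp * k_bits * d ** p.n


def rx_energy(k_bits: float, p: RadioParams = RadioParams()) -> float:
    """Eq. (2): energy to receive k bits."""
    return p.e_elec * k_bits


def link_energy(k_bits: float, d: float, p: RadioParams = RadioParams()) -> float:
    """Eq. (3): total cost of moving k bits across one link."""
    return tx_energy(k_bits, d, p) + rx_energy(k_bits, p)


if __name__ == "__main__":
    # 1000 bits over 10 m with the assumed constants
    print(link_energy(1000, 10.0))
```

When the receiver is a sink, only the transmit term would count against the network's energy budget, since the sink is assumed to be unconstrained.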
3. Edge Cluster Construction
Leveraging the MSP algorithm, we introduce the main ideas of constructing an edge cluster. Assume an application that contains five data sources ($d_1, d_2, d_3, d_4, d_5$), four operators ($o_1, o_2, o_3, o_4$), and one sink ($s_1$). The link weights between any two neighbouring hops specify the data dependencies between them, which are pre-defined by the nature of the given application and the degree of importance during data aggregation and processing. Typically, the algorithm processes paths iteratively in a bottom-up fashion based on the given application tree structure, and we enumerate all paths from every data source to the sink. Since applications are structured as hierarchies (trees), an operator can be appointed to coordinate the functions of more than one operator or data source, accumulate and filter data from them, and send notifications to a sink. We select as feasible solutions in the edge cluster those in which the link weight ratio between two adjacent operators is larger than the distance ratio.
The functional specification of operators in a feasible solution can be encoded as a corresponding abstract matrix specified by five evaluation indicators referring to the spatial and temporal constraints, the priority between consecutive operators, the load-balancing factor and energy consumption. A $p$-dimension abstract matrix is denoted as $M^p[o_i, o_j, \cdots, o_k]$, where $p$ is the dimension of the matrix, i.e., the number of operators contained in a feasible solution.
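The bottom-up path enumeration described above can be sketched as follows. This is not the MSP algorithm itself, only the step that lists every path from a data source to the sink. The tree in `PARENT` is a hypothetical arrangement of the five data sources, four operators and one sink, since the text does not spell out the actual structure.

```python
# Hypothetical application tree, stored as child -> parent links.
# The shape is an assumption made for illustration only.
PARENT = {
    "d1": "o1", "d2": "o1",
    "d3": "o2", "d4": "o2",
    "d5": "o3",
    "o1": "o4", "o2": "o4", "o3": "o4",
    "o4": "s1",
}


def path_to_sink(node, parent, sink="s1"):
    """Walk upwards from `node` until the sink is reached."""
    path = [node]
    while path[-1] != sink:
        path.append(parent[path[-1]])
    return path


def all_source_paths(parent, sink="s1"):
    """Enumerate one path per data source (names starting with 'd')."""
    sources = sorted(n for n in parent if n.startswith("d"))
    return {s: path_to_sink(s, parent, sink) for s in sources}


if __name__ == "__main__":
    for source, path in all_source_paths(PARENT).items():
        print(source, " -> ".join(path))
```

A feasibility filter based on the link weight ratio and the distance ratio would then be applied to candidate node assignments along these paths.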
4. Composition of Operators
In this section, we introduce the spatial and temporal constraints, the priority for selecting an optimal next hop, and the energy efficiency. Two optimization algorithms are then applied to discover and select the optimal composition.
4.1. Operators Constraints
4.1.1. Spatial and Temporal Constraints
Generally, operators execute their processing functionalities during a certain time duration and within a certain service region. Intuitively, the spatial constraint of an operator $o_i$ is specified in terms of its geographical position and service scope:
$$spt(o_i) = (p_{o_i}, r_{o_i}) \tag{4}$$
where $p_{o_i}$ represents the physical location of the operator $o_i$, defined by its geographic coordinates, and $r_{o_i}$ is the service radius, i.e., the range within which $o_i$ is able to process data requests. A given application (denoted as $apl$) and the corresponding edge cluster (denoted as $ec$) are specified by their spatial ranges in a similar fashion, so the spatial correlation of $apl$ and $ec$ can be calculated as:
$$spt(apl, ec) = (spt(ec) \cap spt(apl)) \div spt(apl) \tag{5}$$
The temporal correlation of $apl$ and an operator $o_i$ is calculated analogously:
$$tpr(apl, o_i) = (tpr(o_i) \cap tpr(apl)) \div tpr(apl) \tag{6}$$
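Eqs. (5) and (6) are both overlap ratios, which the sketch below makes concrete. It assumes that a spatial range is a circle given by a centre and a radius, in line with Eq. (4), and that a temporal range is a closed interval; the function names are hypothetical.

```python
import math


def circle_intersection_area(c1, r1, c2, r2):
    """Area shared by two circles (standard lens-area formula)."""
    d = math.dist(c1, c2)
    if d >= r1 + r2:          # disjoint circles
        return 0.0
    if d <= abs(r1 - r2):     # one circle lies inside the other
        return math.pi * min(r1, r2) ** 2
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                          * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - tri


def spatial_correlation(apl, ec):
    """Eq. (5): share of the application's area covered by the edge cluster."""
    (c_apl, r_apl), (c_ec, r_ec) = apl, ec
    return circle_intersection_area(c_apl, r_apl, c_ec, r_ec) / (math.pi * r_apl ** 2)


def temporal_correlation(apl, op):
    """Eq. (6): share of the application's interval covered by the operator."""
    (a_start, a_end), (o_start, o_end) = apl, op
    overlap = max(0.0, min(a_end, o_end) - max(a_start, o_start))
    return overlap / (a_end - a_start)


if __name__ == "__main__":
    print(spatial_correlation(((0, 0), 1.0), ((1, 0), 1.0)))
    print(temporal_correlation((0.0, 10.0), (2.0, 8.0)))
```

Both ratios lie in [0, 1], with 1 meaning that the application's range is fully covered.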
4.1.2. Priority Between Consecutive Operators
Without loss of generality, an edge node can host multiple operators, which should be configurable when more than one operator with the same functionality, hosted by different nodes, can work well simultaneously. The data processing delay should be considered when the parent takes input data from multiple streams and some of them can be instantiated by the same node. More precisely, we should take into account whether the next operator hosted by the same edge node can satisfy the requirements of both data processing and service time immediately, without any halt, as soon as the previous operator has completed. Usually, energy is consumed during the waiting time (denoted as $delay(N_i.o_m, N_i.o_{m+1})$). Therefore, we calculate the priorities for different cases in terms of the delay on the same node. The priority factor of a certain operator (denoted as $pri(N_i.o_m)$) is computed as:
$$pri(N_i.o_m) = \begin{cases} E_{inv}(N_i.o_m), & \text{same node, no delay} \\ E_{inv}(N_i.o_m) + delay(N_i.o_m, N_i.o_{m+1}), & \text{same node, with delay} \\ E_{inv}(N_i.o_m) + E_{TR}, & \text{different nodes} \end{cases} \tag{7}$$
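The three cases of Eq. (7) can be sketched as a single function. This is a minimal sketch under assumptions: the waiting time is passed in as an energy amount (`delay_energy`), the argument names are hypothetical, and the paper does not state how $E_{inv}$ and $E_{TR}$ are measured.

```python
def priority(e_inv: float, same_node: bool,
             delay_energy: float = 0.0, e_tr: float = 0.0) -> float:
    """Priority factor of an operator following the three cases of Eq. (7).

    e_inv        -- E_inv of the operator
    same_node    -- True if the next operator is hosted by the same node
    delay_energy -- energy spent while waiting (0 means no delay)
    e_tr         -- E_TR term used when the next operator is on another node
    """
    if not same_node:
        return e_inv + e_tr          # third case: different nodes
    return e_inv + delay_energy      # first case (no delay) or second case


if __name__ == "__main__":
    print(priority(2.0, same_node=True))
    print(priority(2.0, same_node=True, delay_energy=0.5))
    print(priority(2.0, same_node=False, e_tr=1.0))
```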
In Eq. (7), the first case covers the situation where there is no delay between two consecutive operators. The second case accounts for the energy consumed during the delay after the previous operator has finished. The last case shows how to calculate the priority of operators hosted by different nodes.
4.1.3. Energy Constraints
To alleviate network traffic and achieve load balancing, operators supported by nodes with more remaining energy should be preferred for the instantiation of a certain solution. Besides, according
to the residual energy of each node, we put forward a ratio to measure the energy still available to $N_i$, which is computed as:
$$ecr(N_i.o_m) = E_{inv}(N_i.o_m) \div (Reg_N - E_{cot}(N_i)) \tag{8}$$
The load-balancing factor is derived from this ratio:
$$lbf(N_i.o_m) = ecr(N_i.o_m) \div thrd \tag{9}$$
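A small sketch of Eqs. (8) and (9) follows. Here `thrd` is taken, as the text defines it, to be the average energy cost ratio over the nodes considered; the node names and energy values are made up for illustration, and the function names are hypothetical.

```python
def energy_cost_ratio(e_inv: float, reg_n: float, e_cot: float) -> float:
    """Eq. (8): invocation energy relative to what the node has left."""
    remaining = reg_n - e_cot
    if remaining <= 0:
        raise ValueError("node has no energy left")
    return e_inv / remaining


def load_balancing_factors(nodes: dict) -> dict:
    """Eq. (9) for every candidate node.

    `nodes` maps a node name to a tuple (E_inv, Reg_N, E_cot).
    """
    ecr = {name: energy_cost_ratio(*vals) for name, vals in nodes.items()}
    thrd = sum(ecr.values()) / len(ecr)   # average energy cost ratio
    return {name: ratio / thrd for name, ratio in ecr.items()}


if __name__ == "__main__":
    candidates = {"N1": (1.0, 10.0, 2.0), "N2": (1.0, 6.0, 2.0)}
    lbf = load_balancing_factors(candidates)
    print(lbf, "preferred:", min(lbf, key=lbf.get))
```

In this toy input the node with more residual energy obtains the smaller factor and would be preferred.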
Eq. (9) gives the load-balancing factor (denoted as $lbf(N_i.o_m)$) of any node supporting $o_m$, from which we select the optimal one; $thrd$ is the average energy cost ratio of all nodes in the whole network. Intuitively, a node with a smaller value of $lbf$ possesses a relatively large amount of residual energy to complete the demands of data processing.
4.2. Operators Composition
Considering the feasible solution $M^p[o_i, \cdots, o_j] = o_i \to \ldots \to o_j$, each operator contained should be instantiated to achieve a certain data aggregation or processing functionality while taking the constraints into account comprehensively. We formulate our problem as finding the optimal operators composition under multiple constraints and multiple objectives.
• Multi-Objective Functions:
1. Minimize:
$$f_1 = E(M^p[o_i, \cdots, o_j])$$
$$f_2 = \alpha * Pri(M^p[o_i, \cdots, o_j]) + \beta * Lbf(M^p[o_i, \cdots, o_j])$$
2. Maximize:
$$f_3 = \varphi * Spt(M^p[o_i, \cdots, o_j]) + \delta * Tpr(M^p[o_i, \cdots, o_j])$$
where $\varphi$, $\delta$, $\alpha$ and $\beta$ are constant influencing factors with $\varphi + \delta = 1$ and $\alpha + \beta = 1$.
• Constraints:
1. $(Reg_N - E_{cot}(N_i)) \geq E_{cot}(N_i)$
• Fitness Function:
$$fit(M^p[o_i, \cdots, o_j]) = w_1 * f_1 + w_2 * f_2 - w_3 * f_3$$
$$= w_1 * E(M^p[o_i, \cdots, o_j]) + w_2 * (\alpha * Pri(M^p[o_i, \cdots, o_j]) + \beta * Lbf(M^p[o_i, \cdots, o_j])) - w_3 * (\varphi * Spt(M^p[o_i, \cdots, o_j]) + \delta * Tpr(M^p[o_i, \cdots, o_j]))$$
where $w_1$, $w_2$ and $w_3$ are positive constant factors that indicate the relative importance of the three functions $f_1$, $f_2$ and $f_3$ during the selection of a feasible solution, with $w_1 + w_2 + w_3 = 1$. The fitness function measures whether an instantiated solution both satisfies the objective functions and respects the aforementioned constraints. The smaller the fitness value, the better the instantiated solution.
4.3. Optimization Algorithm in Operators Composition
In this section, we apply two optimization algorithms, GWO and PSO, to solve the multi-objective and multi-constrained optimization problem. Their procedures are as follows.
GWO is a swarm intelligence algorithm that simulates the behavior of grey wolves. Specifically, the α wolf is considered to have the strongest leadership, and is the most intelligent and powerful leader, with the best fitness, i.e., closest to the prey. Besides, the β and δ wolves have suboptimal fitness. In the course
of hunting, the β and δ wolves help the α wolf make decisions and enforce management. The remaining wolves are denoted as ω; their primary responsibilities are to balance the internal relations among grey wolves and to coordinate in encircling the prey. Moreover, we divide the hunting process into three stages, called encircling, pursuing and attacking, in order to capture the prey and obtain the global optimal solution.
PSO is a swarm intelligence algorithm that simulates the foraging behavior of bird flocks. We consider every instantiated solution as a bird that searches for the optimal operators composition in the edge cluster. Let pbest and gbest denote the local optimal solution and the global optimal solution, respectively. Driven by the update mechanism, each solution moves closer to the optimal operators composition.
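The weighted-sum fitness of Section 4.2 and the two search procedures can be sketched as below. This is a sketch under stated assumptions, not the authors' implementation: the paper does not describe how a composition is encoded for the search, so both optimizers here work on a continuous vector and are exercised on a toy objective (`sphere`). All default weights, population sizes and coefficients (`inertia`, `c1`, `c2`) are illustrative choices.

```python
import random


def fitness(f1, f2, f3, w1=0.4, w2=0.3, w3=0.3):
    """Weighted-sum fitness (smaller is better); the weights are illustrative."""
    return w1 * f1 + w2 * f2 - w3 * f3


def gwo(objective, dim, lo, hi, n_wolves=20, n_iter=100, seed=0):
    """Minimise `objective` with a basic continuous Grey Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = list(min(wolves, key=objective))
    for t in range(n_iter):
        # alpha, beta and delta: the three fittest wolves of the current pack
        leaders = [list(w) for w in sorted(wolves, key=objective)[:3]]
        a = 2.0 * (1.0 - t / n_iter)  # shrinks linearly from 2 towards 0
        for w in wolves:
            for j in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    estimate += leader[j] - A * abs(C * leader[j] - w[j])
                w[j] = min(hi, max(lo, estimate / 3.0))
            if objective(w) < objective(best):
                best = list(w)
    return best, objective(best)


def pso(objective, dim, lo, hi, n_particles=20, n_iter=100, seed=0,
        inertia=0.7, c1=1.5, c2=1.5):
    """Minimise `objective` with a basic global-best Particle Swarm Optimizer."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                vel[i][j] = (inertia * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                pos[i][j] = min(hi, max(lo, pos[i][j] + vel[i][j]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(pos[i]), val
                if val < gbest_val:
                    gbest, gbest_val = list(pos[i]), val
    return gbest, gbest_val


def sphere(x):
    """Toy objective standing in for the composition fitness."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    print("GWO:", gwo(sphere, dim=2, lo=-10.0, hi=10.0))
    print("PSO:", pso(sphere, dim=2, lo=-10.0, hi=10.0))
```

To apply either routine to operators composition, the objective would evaluate `fitness` on the indicators of the candidate solution, and the continuous position would need to be mapped to a choice of hosting node per operator; that mapping is left open here.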
Fig. 1. The constructed edge cluster and operators composition structures generated by applying GWO and PSO, respectively.
5. Implementation and Evaluation
Consider an example of a generic application that is used to search for the optimal operators composition in the edge network. Initially, we construct an edge cluster according to the requirements of the application's hierarchical (tree) structure; the cluster is marked by the red circle in Fig. 1. The black solid points indicate the locations of the edge nodes, and the red solid squares represent the data sources to be processed. GWO and PSO are used to search for the optimal solutions. As depicted in Fig. 1, the solution found by GWO is $M^4[o_1, o_2, o_3, o_4] = o_1(N_{43}) \to o_2(N_{46}) \to o_3(N_3) \to o_4(N_{20})$. The operators composition recommended by PSO is $M^4[o_1, o_2, o_3, o_4] = o_1(N_{28}) \to o_2(N_{30}) \to o_3(N_{32}) \to o_4(N_{26})$. The results show that GWO performs better than PSO. The experimental results for the two algorithms are analyzed as follows:
Fig. 2. The variation of fitness value and minimum residual energy for GWO and PSO in the edge cluster when the algorithms are executed over 100 contiguous time slots.
The left part of Fig. 2 contrasts the fitness values of GWO and PSO within the edge cluster. We can see that the PSO algorithm fluctuates considerably during the iterations, while GWO has a strong
advantage in finding the optimal solution during the iterations. The right part of Fig. 2 contrasts the variation of the minimum residual energy between GWO and PSO: the decline of the curve is pronounced for PSO, while the energy variation for GWO is approximately linear.
6. Related Works and Comparison
The problem of operators composition has been studied extensively in the past decades. The authors of [6] focus on optimizing the time needed to satisfy the demands of the user more quickly. In [7], mobile agents correspond to what are later called operators, while immobile agents are regarded as data sources. Our work builds on IoT service composition, which has been developed with the support of Service-Oriented IoT [2]. Various IoT devices can collaborate and cooperate with each other by means of service composition techniques. He et al. [8] propose an energy-efficient privacy-preserving content sharing scheme for mobile social networks, which makes full use of the benefits of SOA and promotes the collaboration of services. Subsequently, Chen et al. [9] focus on data aggregation scheduling to integrate and utilize IoT services and resources. Besides, a service-oriented system developed by Shi et al. [10] can adapt to various geo-spatial and resource constraints for exploring connected dominating sets in energy-harvest networks.
7. Conclusions
In this article, an optimal composition strategy is proposed to reduce network traffic by coordinating the actions of multiple operators for a given application. Operators compositions are encapsulated into feasible solutions by adopting the multistage shortest path algorithm to find an optimal edge cluster. The $p$ operators in a solution are encoded as a $p$-dimension abstract matrix defined by evaluation indicators including the spatial and temporal constraints, energy economy, the priority between consecutive operators, and the configurability of operators.
The operators composition problem can be reduced to a multi-objective and multi-constrained optimization problem. The results indicate that GWO outperforms PSO in searching for an approximately optimal operators composition in the corresponding edge cluster and in reducing the energy consumption.
References
[1] N. Tziritas, T. Loukopoulos, S.U. Khan, C.-Z. Xu, and A.Y. Zomaya. (2016) “On improving Constrained single and group operator placement using evictions in big data environments,” IEEE Transactions on Services Computing 9 (5): 818–831.
[2] Z. Zhou, D. Zhao, L. Liu, and P. C. Hung. (2018) “Energy-aware composition for wireless sensor networks as a service,” Future Generation Computer Systems 9: 299–310.
[3] H. Naim, M. Aznag, M. Quafafou, and N. Durand. (2016) “Probabilistic approach for diversifying web services discovery and composition,” IEEE International Conference on Web Services: 73–80.
[4] Y. T. Jiao, P. Wang, S. H. Feng, and D. Niyato. (2018) “Profit Maximization Mechanism and Data Management for Data Analytics Services,” IEEE Internet of Things Journal, doi: 10.1109/JIOT.2018.2819706.
[5] G. Chatzimilioudis, A. Cuzzocrea, D. Gunopoulos, and N. Mamoulis. (2013) “A novel distributed framework for optimizing query routing trees in wireless sensor networks via optimal operator placement,” Journal of Computer and System Sciences 79 (3): 349–368.
[6] R. Sappidi, A. Girard, and C. Rosenberg. (2013) “Maximum achievable throughput in a wireless sensor network using in-network computation for statistical functions,” IEEE/ACM Transactions on Networking 21 (5): 1581–1594.
[7] Xiaolin Fang, Junzhou Luo, Guangchun Luo, Weiwei Wu, Zhipeng Cai, and Yi Pan. “Big Data Transmission in industrial IoT System with Small Capacitor Supplying Energy,” IEEE Transactions on Industrial Informatics (TII).
[8] Zaobo He, Zhipeng Cai, Qilong Han, Weitian Tong, Limin Sun, and Yingshu Li. (2016) “An Energy Efficient Privacy-Preserving Content Sharing Scheme in Mobile Social Networks,” Personal and Ubiquitous Computing 20 (5): 833–846.
[9] Quan Chen, Hong Gao, Zhipeng Cai, L. Cheng, and Jianzhong Li. (2018) “Energy-Collision Aware Data Aggregation Scheduling for Energy Harvesting Sensor Networks,” The 37th Annual IEEE International Conference on Computer Communications (INFOCOM).
[10] Tuo Shi, Siyao Cheng, Zhipeng Cai, Yingshu Li and Jianzhong Li. (2017) “Exploring Connected Dominating Sets in Energy-Harvest Networks,” IEEE/ACM Transactions on Networking 25 (3): 1803–1817.