
Large-Scale and Adaptive Service Composition Based on Deep Reinforcement Learning

Jiang-Wen Liu(1), Li-Qiang Hu(2), Zhao-Quan Cai(4)*, Li-Ning Xing(3), Xu Tan(3)*

1 College of Mechanical and Electrical Engineering, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 221116, P.R. China
2 School of Electrical and Electronic Engineering, Shijiazhuang Tiedao University, Shijiazhuang 050043, P.R. China
3 School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, P.R. China
4 Department of Information Science and Technology, Huizhou University, Huizhou 516007, P.R. China

*Corresponding authors: Zhao-Quan Cai, Xu Tan.

Abstract: Service composition is a research hotspot with practical value. With the development of Web services, many Web services with the same functional attributes have emerged. However, service composition optimization remains a major challenge because of the complex and unstable composition environment. To solve this problem, we propose an adaptive service composition method based on deep reinforcement learning, in which a recurrent neural network (RNN) is used to predict the objective function, improving its expressiveness and generalization ability and effectively overcoming the shortcomings of traditional reinforcement learning when facing large-scale or continuous state-space problems. We leverage a heuristic behavior selection strategy to divide the state set into hidden states and fully visible states. Effectively simulating the hidden state space and the fully visible states of the evaluation function further improves the accuracy and efficiency of the composition results. We conduct comprehensive experiments, and the results show the effectiveness of our method.

Keywords: Service composition; deep reinforcement learning; QoS; behavior strategy.

1. Introduction

With the development of the Internet, traditional services can no longer meet users' demands. Inter-domain, multi-collaborative, integrated network collaboration has become a research hotspot [24-29]. The combination of IT and services is significant for modern intelligent systems. On the one hand, IT supports applications such as transportation, banking, and e-commerce; users access these service websites through the network, so that traditional services are extended onto the Internet platform. On the other hand, while focusing on their core business, enterprises also need services provided by business partners to complete some non-core business. As a result, the demand for integration technology is becoming more and more intense, and how to solve the system integration problems caused by platform, protocol, and language differences among many application systems has become a difficult challenge. Service-Oriented Architecture (SOA) aims to combine different application units to complete specific tasks.

Web services are the most promising technical means of implementing the SOA architecture and are its basic building blocks. These services are independent, modular, and interoperable. With the maturity of Web services standards, Web services have gradually become the standard form of resource encapsulation in the network environment, and these publicly accessible services constitute a huge standard component library. Therefore, building on the extensible interoperability of Web services, realizing service composition and providing integrated services have become a natural demand of the development of Web services technology. As a significant value-added capability of Web services, Web service composition has attracted much research attention [30-32]. Among the many available service components, there are many with the same functions but different quality of service [33, 34]. QoS describes the ability of a product or a service to meet user needs and is also known as the non-functional attributes of a service. The QoS model described by Menasce [1] consists of four attributes: availability, security, response time, and throughput. In practice, because of the dynamic nature of the network environment and the unpredictability of the future, Web services relying on the network are bound to be dynamic. In addition, the evolution of Web services increases the variability of the composition process. Therefore, a good Web service composition scheme should be able to cope with such dynamic variables and should not fail because of changes in the external environment or in the services themselves. Moreover, as user requirements become more complex, the number of abstract services in the composite flow increases, and the growth of functionally equivalent services also enlarges the space of candidate service sets. Aiming at these disadvantages of existing service composition approaches, in this paper we combine reinforcement learning and deep learning to achieve large-scale and adaptive service composition. A recurrent neural network (RNN) is used to predict the objective function, improving its expressiveness and generalization ability and effectively overcoming the shortcomings of traditional reinforcement learning when facing large-scale or continuous state-space problems.

2. Related work

The integration of data, applications, and systems is an urgent need for Internet services, and Web services are a research hotspot in both academia and industry. Web services are an important means of communication between devices. Published on the Internet for users to discover and invoke, Web services are essentially self-describing, modular, and self-contained programs that can provide services with different functions. Ankolekar et al. [2] presented DAML-S to describe the properties and capabilities of Web services; the authors introduced three aspects of the proposed method: the service profile, the process model, and the service grounding. Maximilien et al. [3] put forward a conceptual model of reputation to address the fact that current approaches had no basis for selecting services. Zheng et al. [4] proposed collaborative-filtering-based QoS-aware Web service recommendation: users are always overwhelmed by huge numbers of similar candidate modules when they require Web services, so Zheng et al. proposed service recommendation applications based on two collaborative filtering approaches. Souza et al. [5] proposed an integration architecture named SOCRADES to serve the requirements of enterprise manufacturing.
Foster et al. [6] proposed a model-based method to verify Web service compositions. Liu et al. [7] proposed a Quality of Service (QoS) model to verify the quality of Web services; an open, fair, and dynamic QoS model was used in their applications. Majithia et al. [8] proposed a graphical Web service composition toolkit, where users can visually create service compositions; in addition, the proposed framework allows users to record data on a P2P network. Ardagna et al. [9] defined an adaptive Web service process framework named PAWS. Web service composition can also be modeled as a graph planning problem, where an action sequence is used to reach a target status. Yan et al. [10] used a partially effective planning graph to model the service composition problem and replaced poor or invalid services during composition, so as to achieve a certain degree of environmental adaptability. Literature [11] used a hierarchical task network to model adaptive service composition. When the environment changes and services appear or disappear, these methods need to constantly update the planning graph and cannot adapt well to the dynamic environment.

Reinforcement learning is a trial-and-error learning paradigm that has been applied to adaptive service composition scenarios in recent years. Planning techniques require constructing a complex state diagram and are therefore suited to relatively stable environments. Unlike planning techniques, reinforcement learning is suitable for sequential decision-making problems in incomplete-information scenarios, as long as learners know their environment and their current learning strategies. Wang et al. [12] modeled service composition as a Markov decision process with QoS attributes as the reward functions of learning agents; the optimal composition results were obtained by a reinforcement learning algorithm, and the learning agents perceive changes in the external environment and make corresponding adjustments to maintain or regain good composition results. Literature [13] proposed a multi-standard-driven reinforcement learning algorithm to ensure that the system can respond to environmental changes and remain available. These methods are adaptive; however, when faced with increasingly complex composite workflows and growing numbers of candidate services, acceptable efficiency cannot be guaranteed. Literature [14] proposed a method based on Team Markov Games to solve the adaptive service composition problem. In their work, a multi-agent system was used to improve efficiency, but the model was computationally complex and the communication overhead was high; the communication cost and model complexity reduce the efficiency of the multi-agent approach.

3. Proposed method

To share Web services effectively, it is important to combine different Web services. Web service composition should satisfy three attributes: scalability, adaptability, and efficiency. In our work, we combine reinforcement learning and deep learning to achieve large-scale, adaptive Web service composition.

3.1 Reinforcement learning

Reinforcement learning (RL) has shown effective learning ability on unknown data. RL learns a mapping from environment states to actions so as to maximize the cumulative reward received from the environment.

Unlike supervised and unsupervised learning, reinforcement learning is characterized by the fact that the signal provided by the environment is an evaluation of the action taken, rather than a demonstration, through positive and negative examples, of how to produce the correct action. Because the agent knows little about the external environment, it must rely on its own learning experience to keep learning, acquire knowledge from the action-evaluation feedback of the environment, and improve its action plan to adapt to the environment. RL aims to learn an action strategy $\pi: S \rightarrow A$. We adopt the infinite-horizon discounted model: the agent considers the reward over an infinite future horizon and accumulates it in the value function through discounting. This process can be formulated as:

$$V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i} \qquad (1)$$

where $\gamma$ is the discount factor with $0 < \gamma < 1$, and $r_{t+i}$ is the reward received when the agent transitions from state $s_{t+i}$ to $s_{t+i+1}$. Based on the value function, the optimal action strategy is formulated as:

$$\pi^{*} = \arg\max_{\pi} V^{\pi}(s), \quad \forall s \in S \qquad (2)$$
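To make Equations (1) and (2) concrete, the following is a minimal Python sketch (our own illustration, not the authors' code) that computes the discounted return of Equation (1) for a rollout and then picks the best policy among a small candidate set as in Equation (2). The toy two-state MDP, its transition table, and the candidate policies are hypothetical and exist only to make the example runnable.

```python
def discounted_return(rewards, gamma=0.9):
    """Equation (1): V = sum_i gamma^i * r_{t+i} for one observed trajectory."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

# Hypothetical 2-state deterministic MDP: (state, action) -> (next_state, reward).
TRANSITIONS = {
    (0, "a"): (0, 1.0),
    (0, "b"): (1, 0.0),
    (1, "a"): (0, 0.5),
    (1, "b"): (1, 2.0),
}

def rollout(policy, start_state=0, horizon=50):
    """Follow a deterministic policy {state: action} and collect the reward trajectory."""
    s, rewards = start_state, []
    for _ in range(horizon):
        s, r = TRANSITIONS[(s, policy[s])]
        rewards.append(r)
    return rewards

def value_of(policy, gamma=0.9):
    """Estimate of V^pi(s_0); the toy MDP is deterministic, so one rollout suffices."""
    return discounted_return(rollout(policy), gamma)

# Equation (2): choose the policy with the largest value among a candidate set.
candidates = [{0: "a", 1: "a"}, {0: "a", 1: "b"}, {0: "b", 1: "b"}]
best = max(candidates, key=value_of)
print("best policy:", best, "value:", round(value_of(best), 3))
```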

In the face of an uncertain environment, reinforcement learning uses the reward signal from the environment to find the optimal behavior strategy without prior training samples. This is the biggest difference from supervised learning, so reinforcement learning is an online learning technology. In addition, in reinforcement learning the agent does not need to know the dynamic model of the system; it only needs to remember the environment feedback and the knowledge of its current strategy, and the interaction between agent and environment can be predicted by some search process. Therefore, reinforcement learning is widely used in practical problems.

In practice, the agent influences the distribution of training samples by choosing action strategies, which leads to the exploration-exploitation dilemma. Exploration repeatedly tries unknown states and actions, improving long-term performance and helping convergence to the optimal strategy. Exploitation makes full use of the states and actions already known to yield high returns, reducing search time and improving efficiency. Both have their own advantages and disadvantages: if the agent chooses exploratory actions and deviates from the current greedy strategy, part of the immediate reward is lost. Therefore, the agent faces a trade-off between exploration and exploitation.

In the face of large-scale MDPs or continuous-state-space MDP problems, reinforcement learning cannot traverse all states, and table-based reinforcement learning algorithms cannot cope with large-scale scenarios. Therefore, reinforcement learning has to improve on the original table-based storage strategy, enhance the generalization ability of the value function, and replace the lookup table with a parameterized fitting function. The essence of function estimation in reinforcement learning is to use parameterized functions to approximate the mapping relations. When using approximators to solve large-scale MDPs or continuous-state-space problems, Bhatnagar et al. [15, 16] trained linear and non-linear approximators respectively, analyzed their structures, and designed training methods to make them more accurate.
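As a small illustration of replacing the lookup table with a parameterized value function, here is a hedged sketch (our own, not the paper's algorithm) of Ξ΅-greedy Q-learning with a linear approximator over one-hot state-action features. The problem sizes, feature encoding, and learning rate are assumptions made only for the example.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 3          # sizes of a hypothetical toy problem
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.2

# Parameterized Q-function: Q(s, a) = theta . phi(s, a), replacing a lookup table.
theta = np.zeros(N_STATES * N_ACTIONS)

def phi(s, a):
    """One-hot feature vector for the state-action pair (illustrative encoding)."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def q(s, a):
    return float(theta @ phi(s, a))

def select_action(s, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit argmax_a Q(s, a)."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([q(s, a) for a in range(N_ACTIONS)]))

def td_update(s, a, r, s_next):
    """Gradient-descent Q-learning update on the approximator parameters."""
    target = r + GAMMA * max(q(s_next, b) for b in range(N_ACTIONS))
    theta[:] = theta + ALPHA * (target - q(s, a)) * phi(s, a)

# Example: one simulated transition with a placeholder reward.
rng = np.random.default_rng(0)
s = 0
a = select_action(s, rng)
td_update(s, a, r=1.0, s_next=1)
print("Q(0, %d) after one update: %.3f" % (a, q(0, a)))
```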

3.2 Long short-term memory

The recurrent neural network was proposed by Goller [17]. The internal state of an RNN can represent dynamic sequences. Different from feedforward neural networks, an RNN can use its internal memory to process input sequences of arbitrary length. Recurrent neural networks consume serialized data and model it accurately: they record the activation values at each time step, and enhance the temporal correlation of the network by adding self-connected hidden layers across time points. RNNs have been widely used in handwriting recognition [18], language modeling [19], machine translation [20], and speech recognition [21]. In our implementation, we leverage long short-term memory (LSTM). We define a standard recurrent neuron as follows:

$$h_i^{t+1} = f(a_i^{t+1}) \qquad (3)$$

$$a_i^{t+1} = \sum_{j} w_{ij} x_j^{t+1} + \sum_{k} u_{ik} h_k^{t} \qquad (4)$$

where $f(\cdot)$ is a nonlinear activation function, $h_i^{t}$ denotes the state of the $i$-th neuron at the $t$-th step, $x$ denotes the neurons of the previous layer, and $w$ and $u$ denote connection weights. Considering the three types of gates, formula (4) can be reorganized as follows:

$$a_i^{t+1} = c_i^{t+1} a_i^{t} + b_i^{t+1}\, g\Big(\sum_{j} w_{ij} x_j^{t+1} + \sum_{k} u_{ik} h_k^{t}\Big) \qquad (5)$$

where $c$ and $b$ denote the forget (keep) gate and the input gate, respectively. Similar to $f(\cdot)$, $g(\cdot)$ is a nonlinear activation function. The net input of a gate consists of three components: signals from the previous layers, the hosting cell, and the previous output. This can be formulated as:

$$\alpha_i^{t+1} = g\Big(\sum_{j} w_{ij}^{\alpha} x_j^{t+1} + \sum_{k} v_{ik}^{\alpha} h_k^{t} + v_{i}^{\alpha} a_i^{t} + i^{\alpha}\Big) \qquad (6)$$

where $\alpha$ denotes one of the gates in $\{b, c, d\}$.
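For readers who prefer code, the following minimal NumPy sketch of a single LSTM step mirrors the structure of Equations (3)-(6): gates b (input), c (forget), and d (output) modulate a recurrent cell state a. It is our own illustration, not the authors' implementation; the weight shapes, random initialization, and sigmoid/tanh choices are assumptions, and for simplicity the gate activations omit the cell-state (peephole) term of Equation (6).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyLSTMCell:
    """One LSTM step with gates b (input), c (forget), d (output) and cell state a."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        def W():  # small random weights for input and recurrent connections
            return rng.normal(0, 0.1, (n_hidden, n_in)), rng.normal(0, 0.1, (n_hidden, n_hidden))
        self.W_b, self.U_b = W()   # input gate
        self.W_c, self.U_c = W()   # forget/keep gate
        self.W_d, self.U_d = W()   # output gate
        self.W_g, self.U_g = W()   # candidate net input, cf. Eq. (4)

    def step(self, x, h_prev, a_prev):
        b = sigmoid(self.W_b @ x + self.U_b @ h_prev)   # input gate
        c = sigmoid(self.W_c @ x + self.U_c @ h_prev)   # forget/keep gate
        d = sigmoid(self.W_d @ x + self.U_d @ h_prev)   # output gate
        g = np.tanh(self.W_g @ x + self.U_g @ h_prev)   # Eq. (4) passed through g(.)
        a = c * a_prev + b * g                          # Eq. (5): gated cell update
        h = d * np.tanh(a)                              # Eq. (3) with an output gate applied
        return h, a

cell = TinyLSTMCell(n_in=4, n_hidden=8)
h = a = np.zeros(8)
for x in np.random.default_rng(1).normal(size=(5, 4)):  # a 5-step input sequence
    h, a = cell.step(x, h, a)
print("final hidden state:", np.round(h, 3))
```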

3.3 Deep reinforcement learning

Deep learning-based algorithms leverage multi-layer network structures and nonlinear transformations to combine low-level features into abstract, easily distinguishable high-level representations, discovering distributed feature representations of the data. RL aims to maximize the cumulative reward the agent receives from the environment and thereby learn the optimal strategy for the goal; the RL method therefore focuses on learning a strategy to solve the problem. Thus, we combine deep learning and reinforcement learning into deep reinforcement learning (DRL). Traditional reinforcement learning algorithms use a lookup table to store state-action pairs. In this paper, we exploit the ability of neural networks to approximate functions and use deep learning to model the states of the service composition problem. In table-based Q-learning, each action updates the Q-value table accordingly; here, a deep neural network is trained to fit the Q-value surface, which is then used as an online heuristic function to guide subsequent reinforcement learning. We propose an adaptive deep Q-learning and RNN composition network (ADQRCN), which consists of 30 LSTM units. The framework is shown in Figure 1. We follow the method in [22, 23] to train our model. We have made several improvements to the traditional Q-learning algorithm: a target value network is added to alleviate the instability of the network's value-function representation, a replay memory unit is used to store samples, random sampling from this memory is used to train the network parameters, and dropout is used to prevent over-fitting.
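The training scheme above (target network, replay memory, random mini-batch sampling, Ξ΅-greedy exploration) follows the general DQN recipe of [22, 23]. The sketch below is a hedged skeleton of that loop; the placeholder linear Q-network, the random environment dynamics, and all problem sizes are assumptions standing in for the paper's LSTM network and QoS-driven composition graph, which are not reproduced here.

```python
import random
from collections import deque

import numpy as np

GAMMA, EPSILON, BATCH, SYNC_EVERY = 0.9, 0.6, 32, 100   # epsilon = 0.6 as in Section 4.2

class LinearQ:
    """Placeholder for the RNN/LSTM Q-network: Q(s) returns a vector of action values."""
    def __init__(self, n_states, n_actions):
        self.w = np.zeros((n_states, n_actions))
    def __call__(self, s):
        return self.w[s]
    def copy_from(self, other):
        self.w = other.w.copy()

def train_step(q_net, target_net, replay, lr=0.05):
    """Sample a random mini-batch from the replay memory and do one Q-learning update."""
    batch = random.sample(list(replay), min(BATCH, len(replay)))
    for s, a, r, s_next, done in batch:
        target = r if done else r + GAMMA * np.max(target_net(s_next))
        q_net.w[s, a] += lr * (target - q_net(s)[a])

n_states, n_actions = 20, 4                  # sizes of a hypothetical composition graph
q_net, target_net = LinearQ(n_states, n_actions), LinearQ(n_states, n_actions)
replay = deque(maxlen=10_000)                # playback (replay) memory unit

rng = np.random.default_rng(0)
s = 0
for step in range(1, 501):
    a = int(rng.integers(n_actions)) if rng.random() < EPSILON else int(np.argmax(q_net(s)))
    # Placeholder dynamics: random next state and reward stand in for the real environment.
    s_next, r, done = int(rng.integers(n_states)), float(rng.random()), False
    replay.append((s, a, r, s_next, done))
    train_step(q_net, target_net, replay)
    if step % SYNC_EVERY == 0:               # periodically sync the target network
        target_net.copy_from(q_net)
    s = s_next
print("sample Q-values for state 0:", np.round(q_net(0), 3))
```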

Figure 1. The pipeline of ADQRCN.

4. Experimental results and analysis

This paper studies QoS-based service composition optimization, so, on the basis of deep reinforcement learning, the ultimate goal is to enable users to obtain high-quality composition schemes that meet their requirements. In our experiments, four QoS attributes are used to evaluate service quality: response time, throughput, availability, and reliability. The experiments are conducted on a PC equipped with an Intel i7-6700K 4.00 GHz CPU and 16 GB RAM.

4.1 Dataset introduction

The QWS dataset is used in our experiments. According to the distribution of each QoS attribute value in the QWS dataset, we use global quality constraints with randomly generated values to produce a set of test requirements, thereby enlarging the QoS data. First, the average value of each quality index of the services corresponding to each task is calculated. For each quality index of the service set under each task, there is a corresponding constraint that specifies its range (upper and lower limits), and the values of these limits are randomly selected between 0.7 and 1.1. Then we randomly generate a POMDP service composition transition graph. The number of candidate services corresponding to each abstract node is set according to the requirements of each experiment. The functions of these services are the same; they differ mainly in their QoS attributes. The values of these services and their QoS attributes are extracted from the expanded QWS dataset.
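To illustrate the kind of test-requirement generation described above, here is a small hedged sketch: for each abstract task it derives a random QoS constraint range by scaling the mean attribute values with factors drawn from [0.7, 1.1]. The attribute names, the uniform sampling, and the stand-in QoS values are our own assumptions, since the paper does not give the exact procedure.

```python
import numpy as np

ATTRIBUTES = ["response_time", "throughput", "availability", "reliability"]
rng = np.random.default_rng(42)

def make_constraints(qos_values):
    """qos_values: array of shape (n_candidate_services, 4) for one abstract task.
    Returns {attribute: (lower, upper)} by scaling each attribute mean with
    random factors in [0.7, 1.1], in the spirit of Section 4.1."""
    means = qos_values.mean(axis=0)
    constraints = {}
    for name, mean in zip(ATTRIBUTES, means):
        lo, hi = sorted(mean * rng.uniform(0.7, 1.1, size=2))
        constraints[name] = (float(lo), float(hi))
    return constraints

# Hypothetical QoS values for one task with 5 candidate services (stand-ins for QWS rows).
task_qos = rng.uniform(0.1, 1.0, size=(5, len(ATTRIBUTES)))
print(make_constraints(task_qos))
```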

4.2 Effectiveness analysis

We follow the experimental settings in [12]. To verify the effectiveness of our algorithm, we compare our method with OSON-WSC and QCN. We use Ξ΅-greedy as the learning strategy, with Ξ΅ = 0.6. The comparison result is shown in Figure 2. As can be seen from Figure 2, compared with the traditional reinforcement learning method QCN, our proposed ADQRCN performs better, and learning efficiency improves as the number of training samples increases. The cumulative return at convergence of the traditional reinforcement learning algorithm QCN is about 42, while those of ADQRCN and OSON-WSC are 47.1 and 48.9, respectively, so ADQRCN and OSON-WSC are significantly more effective than QCN. Second, in terms of convergence time, ADQRCN converges faster than QCN. In large-scale service composition scenarios QCN performs poorly, whereas ADQRCN generalizes better and is therefore more suitable for such scenarios. Moreover, the heuristic behavior strategy added by OSON-WSC optimizes the choice of strategies; by using exploratory experience to effectively distinguish the two kinds of states, ADQRCN is further improved in both efficiency and effectiveness.

Figure 2. The comparison results of the three algorithms.

To verify the effectiveness of the ADQRCN and OSON-WSC methods on the composition results, we compare the three methods under different numbers of states and candidate services. The composition result reflects the cumulative return value of the algorithm, which directly reflects the effectiveness and advantage of each method. In composition scenarios of different scales, the smaller the cumulative discount value, the higher the quality of the composition scheme. The comparison result is shown in Figure 3.

Under the same number of states, the cumulative returns of the three methods are significantly different. Compared with ADQRCN and OSON-WSC, the composition scheme learned by the traditional reinforcement method is deficient and its cumulative returns are significantly lower, reflecting differences in user satisfaction. Second, the experiment varies the number of candidate services under each state node to observe the effect on the cumulative returns of the composition scenarios.

Figure 3. Validation of the accumulated return value.

5. Conclusion

In this paper, we propose an adaptive service composition method based on deep reinforcement learning, in which a recurrent neural network (RNN) is used to predict the objective function, improving its expressiveness and generalization ability and effectively overcoming the shortcomings of traditional reinforcement learning when facing large-scale or continuous state-space problems. We leverage a heuristic behavior selection strategy to divide the state set into hidden states and fully visible states. Effectively simulating the hidden state space and the fully visible states of the evaluation function further improves the accuracy and efficiency of the composition results.

6. Acknowledgement

This paper was supported by the National Natural Science Foundation of China (61773120), the scientific project of the Mining Machinery Control and Parts Engineering Center in Jiangsu Province (JYAPT17-05), the Xuzhou Science and Technology Project (KH17003), and the Youth Project of Science and Technology of the Department of Education of Hebei Province (QN2016237).

References

[1] Menasce, D. A. (2002). QoS issues in web services. IEEE Internet Computing, 6(6), 72-75.

[2] Ankolekar, A., Burstein, M., Hobbs, J. R., Lassila, O., & Sycara, K. (2002). DAML-S: Web Service description for the Semantic Web. Proceedings of the First International Semantic Web Conference on The Semantic Web. Springer, Berlin, Heidelberg.
[3] Maximilien, E. M., & Singh, M. P. (2002). Conceptual model of web service reputation. ACM SIGMOD Record, 31(4), 36-41.
[4] Zheng, Z., Hao, M., Lyu, M. R., & King, I. (2011). QoS-aware web service recommendation by collaborative filtering. IEEE Transactions on Services Computing, 4(2), 140-152.
[5] Souza, L. M. S. D., Spiess, P., Guinard, D., KΓΆhler, M., Karnouskos, S., & Savio, D. (2008). SOCRADES: A Web Service Based Shop Floor Integration Infrastructure. International Conference on the Internet of Things.
[6] Foster, H., Uchitel, S., Magee, J., & Kramer, J. (2003). Model-based verification of Web service compositions. IEEE International Conference on Automated Software Engineering.
[7] Liu, Y., Ngu, A. H., & Zeng, L. Z. (2004). QoS computation and policing in dynamic web service selection. International Conference on World Wide Web - Alternate Track Papers & Posters.
[8] Majithia, S., Shields, M., Taylor, I., & Wang, I. (2004). Triana: A Graphical Web Service Composition and Execution Toolkit. IEEE International Conference on Web Services.
[9] Ardagna, D., Comuzzi, M., Mussi, E., Pernici, B., & Plebani, P. (2007). PAWS: A framework for executing adaptive web-service processes. IEEE Software, 24(6), 39-46.
[10] Yan, Y., Poizat, P., & Zhao, L. (2010). Self-Adaptive Service Composition Through Graphplan Repair. IEEE International Conference on Web Services. IEEE.
[11] Beauche, S., & Poizat, P. (2008). Automated service composition with adaptive planning. International Conference on Service-Oriented Computing. Springer-Verlag.
[12] Wang, H., Zhou, X., Zhou, X., Liu, W., Li, W., & Bouguettaya, A. (2010). Adaptive service composition based on reinforcement learning.
[13] Jureta, I. J., Faulkner, S., Achbany, Y., & Saerens, M. (2007). Dynamic Web Service Composition within a Service-Oriented Architecture. 2007 IEEE International Conference on Web Services (ICWS 2007), July 9-13, 2007, Salt Lake City, Utah, USA. IEEE.
[14] Wang, H., Wu, Q., Chen, X., Yu, Q., Zheng, Z., & Bouguettaya, A. (2014). Adaptive and Dynamic Service Composition via Multi-agent Reinforcement Learning. Proceedings of the 2014 IEEE International Conference on Web Services. IEEE Computer Society.
[15] Maei, H. R., SzepesvΓ‘ri, C., Bhatnagar, S., Precup, D., Silver, D., & Sutton, R. S. (2009). Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. International Conference on Neural Information Processing Systems. Curran Associates Inc.
[16] Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., & Wiewiora, E. (2009). Fast gradient-descent methods for temporal-difference learning with linear function approximation. In A. Danyluk et al. (Eds.), Proceedings of the 26th International Conference on Machine Learning (ICML).

[17] Goller, C., & Kuchler, A. (1996). Learning task-dependent distributed representations by backpropagation through structure. Proceedings of the International Conference on Neural Networks (ICNN'96). IEEE.
[18] Graves, A. (2013). Generating sequences with recurrent neural networks. Computer Science.
[19] Lanchantin, J., Singh, R., Wang, B., & Qi, Y. (2016). Deep motif dashboard: visualizing and understanding genomic sequences using deep neural networks. Pacific Symposium on Biocomputing, 22, 254.
[20] Zhang, C., Zhong, M., Wang, Z., Goddard, N., & Sutton, C. (2017). Sequence-to-point learning with neural networks for non-intrusive load monitoring.
[21] Graves, A., Mohamed, A. R., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks.
[22] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., et al. (2013). Playing Atari with deep reinforcement learning. Computer Science.
[23] Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
[24] Xiang, S., Xing, L. N., Wang, L., et al. (2019). Comprehensive Learning Pigeon-Inspired Optimization with Tabu List. Science China - Information Sciences, 62(7), Article 070208.
[25] Zhang, J. W., Wang, L., & Xing, L. N. (2019). Large-scale medical examination scheduling technology based on intelligent optimization. Journal of Combinatorial Optimization, 37(1), 385-404.
[26] Wu, G. H., Pedrycz, W., Li, H. F., et al. (2016). Coordinated Planning of Heterogeneous Earth Observation Resources. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46, 109-125.
[27] Wang, R., Lai, S. M., Wu, G. H., et al. (2018). Multi-clustering via evolutionary multi-objective optimization. Information Sciences, 450, 128-140.
[28] Yi, J. H., Xing, L. N., Wang, G. G., et al. (2019). Behavior of Crossover Operators in NSGA-III for Large-scale Optimization Problems. Information Sciences, https://doi.org/10.1016/j.ins.2018.10.005.
[29] Ren, T., Li, S., Zhang, X., et al. (2017). Maximum and minimum solutions for a nonlocal p-Laplacian fractional differential system from eco-economical processes. Boundary Value Problems, Article No. 118.
[30] Wu, G. H., Shen, X., Li, H., et al. (2018). Ensemble of differential evolution variants. Information Sciences, 423, 172-186.
[31] Cai, Z. Q., Chen, G. C., Xing, L. N., et al. (2019). Evaluating hedge fund downside risk using a multi-objective neural network. Journal of Visual Communication and Image Representation, 59, 433-438.
[32] Wu, G. H., Mallipeddi, R., & Suganthan, P. N. (2019). Ensemble Strategies for Population-based Optimization Algorithms - A Survey. Swarm and Evolutionary Computation, https://doi.org/10.1016/j.swevo.2018.08.015.
[33] Ren, T., Xing, L. N., Zhou, Z. B., et al. (2019). The iterative scheme and the convergence analysis of unique solution for a singular fractional differential equation from complex process. Complexity, accepted.

[34] Jiao, B., Shi, J. M., Zhang, W. S., et al. (2019). Graph sampling for Internet topologies using normalized Laplacian spectral features. Information Sciences, 481, 574-603.

Conflict of interest

We declare that we have no conflict of interest.