Available online at www.sciencedirect.com
ScienceDirect
Energy Procedia 152 (2018) 1188–1193
www.elsevier.com/locate/procedia
CUE2018-Applied Energy Symposium and Forum 2018: Low carbon cities and urban energy systems, 5–7 June 2018, Shanghai, China

Demand Side Data Generating Based on Conditional Generative Adversarial Networks

Jian Lan a,b, Qinglai Guo a,b,*, Hongbin Sun a,b

a Department of Electrical Engineering, Tsinghua University, 100084 Beijing, China
b State Key Lab of Control and Simulation of Power Systems and Generation Equipments, 100084 Beijing, China

Abstract
With the technological advancement in the field of advanced metering infrastructure (AMI), a massive amount of customers’ electricity consumption data is being collected. Meanwhile, energy providers need to make informed decisions based on the power consumption strategy of the demand side to reduce overall operational cost. How to generate demand side load data from historical energy consumption data or customer attributes is therefore a pressing issue. In this paper, we propose a new data-driven approach to generate power consumption data based on the intrinsic properties of the load pattern learnt from the demand side using conditional generative adversarial networks (cGANs), which are based on two interconnected deep neural networks known as the generator and the discriminator. By using several representative labels from the survey responses and the load data from the demand side to train the models, the generator is able to generate realistic power consumption data for given labels, which can be used for energy management and scheduling, and the discriminator is capable of detecting abnormal power consumption and system errors from the smart meter data.

Copyright © 2018 Elsevier Ltd. All rights reserved.
Selection and peer-review under responsibility of the scientific committee of the CUE2018-Applied Energy Symposium and Forum 2018: Low carbon cities and urban energy systems.
ISSN 1876-6102. DOI 10.1016/j.egypro.2018.09.157

Keywords: generative models, conditional GANs, demand side, smart meter data analysis, energy management.

1. Introduction

Electricity has become a main energy carrier for industry, transportation and the daily life of all human beings, and is especially irreplaceable for urban energy systems. Technological advancement in smart meters and communication has enabled energy providers to monitor changes in power consumption on the demand side. Advanced metering infrastructure (AMI), which includes smart meters and communication networks, has been deployed to achieve an intelligent grid by acquiring real-time demand side data, and the acquired data can be utilized for data analysis such as load forecasting and energy management [1]. However, due to the stochastic and dynamic nature of electricity consumption on the demand side, it is always a big challenge to know how a customer tends to consume electricity. Power providers need to learn from the consumption patterns of a small group of customers to infer how other customers consume power, so that they can schedule electricity purchases and adjust their pricing strategy. Recently, deep learning has been applied in many industries, and smart grids are no exception. As deep learning has advantages in handling massive amounts of data, it has been used in smart meter analytics: [2] proposed a
method to improve system-level intraday load forecasting by considering customer behavior similarities, and [3] presents how neural networks can be used for load identification in a non-intrusive load monitoring system.

Several new models have been proposed to improve the ability of deep learning networks, such as generative adversarial networks (GANs). GANs were first introduced by Ian Goodfellow et al. in 2014 [4] as a novel way to train generative models in unsupervised machine learning, and are composed of two interconnected neural networks contesting with each other in a zero-sum game framework. The aim of GANs is to capture the underlying distribution of the input data and to generate new, identically distributed data samples. Since their introduction, GANs have been widely used in computer vision and natural language processing, and have shown a distinctive ability to produce images and videos.

In this paper, we propose a new method to generate power consumption data on the demand side by using conditional generative adversarial networks (cGANs) together with some improved algorithms. After training the model with historical load data and several labels representing characteristics of the demand side, the model is able to generate a load curve showing how a customer consumes electricity in accordance with the given labels, even if this label combination has not appeared before.

Nomenclature

Indices, Sets, Vectors and Matrices
GANs            generative adversarial networks
cGANs           conditional generative adversarial networks
wGANs           Wasserstein generative adversarial networks
G               generator/generative model
D               discriminator/discriminative model
$z$             input noise vector
$\mathbb{P}$    probability distribution
$\theta$        parameters of the generative/discriminative model
$p$             probability density function
$f$             marginal distribution
$g$             gradient of the neural network

2. Methodology
2.1. Generative adversarial networks

There are two models in generative adversarial networks (GANs), known as the generator (G) and the discriminator (D), both of which can be non-linear mapping functions such as deep neural networks. Traditional deep learning models are primarily based on the backpropagation and dropout algorithms, which need piecewise linear units to compute well-behaved gradients [5]. However, deep generative models involve many probabilistic computations that are hard to approximate when performing maximum likelihood estimation on real samples, and Markov chain methods have high computational complexity. GANs have therefore attracted much attention, because they need no Markov chain, no inference is required during either training or generation, and backpropagation alone is used to obtain gradients. At the same time, GANs can yield specific training algorithms for many kinds of models and optimization algorithms [4], which means they can be used in various situations.

The generator and the discriminator are trained at the same time in a game process: the discriminator tries to determine whether the sample it receives comes from the data distribution, while the generator tries to produce samples closely resembling the original data in order to "fool" the discriminator. The competition in this game pushes both models to improve until they reach a Nash equilibrium, at which the samples produced by the generator cannot be distinguished from the original data.
The generation target of GANs is the original data, whose distribution is denoted $p_{data}$. A noise vector $z$ drawn from a known distribution $Z \sim \mathbb{P}_z$, such as a Gaussian distribution, is the input to the generative model, which maps it to the data space as $x = G(z; \theta_g)$, where $\theta_g$ denotes the parameters of the generative model. The output of the discriminative model $D(x; \theta_d)$, where $\theta_d$ denotes the parameters of the discriminative model, is a single scalar $D(x)$ representing the probability that $x$ comes from $p_{data}$ rather than from the generator’s distribution $p_g$. The purpose of training the generative model is therefore to maximize $D(G(z))$, and the purpose of training the discriminative model is to minimize $D(G(z))$ and to maximize $D(x)$ when $x$ comes from the original data. The objective function $V(D, G)$ can thus be written as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$
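As a small numerical illustration of equation (1), the value function can be estimated by averaging over discriminator outputs. The sketch below is not taken from the paper: the function name `gan_value` and the toy discriminator outputs are assumptions chosen for the example.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) in equation (1).

    d_real: discriminator outputs D(x) on real samples, values in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, values in (0, 1)
    """
    # Clip to avoid log(0); eps is an illustrative safeguard, not from the paper.
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that cannot tell real from generated data outputs 0.5 everywhere.
v_equilibrium = gan_value(np.full(8, 0.5), np.full(8, 0.5))
# A confident and correct discriminator pushes V up towards its maximum of 0.
v_confident = gan_value(np.full(8, 0.99), np.full(8, 0.01))
```

When the discriminator outputs 0.5 for every sample, the estimate equals $-\log 4$, the value attained at the equilibrium of the original GAN game [4].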
2.2. Conditional generative adversarial networks

However, the traditional GAN is an unconditioned model, so the generated samples cannot be controlled. When it is used on the MNIST dataset to create handwritten digits, the output can be recognized by humans, but which digit is generated is random. Conditional generative adversarial nets (cGANs) offer a solution to that problem [6]. In cGANs, there is an information vector $y$, such as class labels. Both the generative model and the discriminative model are trained with $y$ fed in as an additional input layer, which can be joined with the prior noise in a single hidden layer of a multilayer perceptron (MLP), or can interact fully with the prior noise during training, as with word embeddings. Equation (1) can then be adjusted as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)))] \tag{2}$$
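One simple way to feed $y$ as an additional input is to concatenate it with the noise (for the generator) or with the sample (for the discriminator). The following is a minimal sketch of that option only; the helper names and all dimensions are assumptions for illustration (48 corresponds to one day of half-hourly readings).

```python
import numpy as np

def generator_input(z, y_onehot):
    """Join prior noise z and label vector y into a single generator input."""
    return np.concatenate([z, y_onehot], axis=1)

def discriminator_input(x, y_onehot):
    """Join a (real or generated) sample x with its label vector y."""
    return np.concatenate([x, y_onehot], axis=1)

rng = np.random.default_rng(0)
batch, noise_dim, n_classes, load_dim = 4, 16, 6, 48  # hypothetical sizes

z = rng.standard_normal((batch, noise_dim))      # Gaussian prior noise
labels = rng.integers(0, n_classes, size=batch)  # random class indices
y = np.eye(n_classes)[labels]                    # one-hot label vectors
x = rng.random((batch, load_dim))                # stand-in for load curves

g_in = generator_input(z, y)      # shape (batch, noise_dim + n_classes)
d_in = discriminator_input(x, y)  # shape (batch, load_dim + n_classes)
```

Concatenation is the simplest conditioning scheme; richer interactions between $y$ and $z$, such as learned embeddings, would replace the two helper functions.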
2.3. Generative adversarial networks with Wasserstein distance

The minimax objective in equation (2) can be interpreted as the dual of the Wasserstein distance (also known as the Earth-Mover distance) [7]. For two random variables $X$ and $Y$ with marginal distributions $f_X$ and $f_Y$, let $\Gamma$ denote the set of all possible joint distributions $f_{XY}$ that have marginals $f_X$ and $f_Y$. Their Wasserstein distance can then be defined as follows [8]:

$$W(X, Y) = \inf_{f_{XY} \in \Gamma} \iint |x - y| \, f_{XY}(x, y) \, dx \, dy \tag{3}$$
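For intuition about equation (3), consider the one-dimensional empirical case: for two equally sized samples, the optimal transport plan pairs the sorted values, so the distance reduces to a mean absolute difference. The sketch below covers only this special case and is not the estimator used in the paper; the function name `wasserstein_1d` is an assumption.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equally sized 1-D samples.

    With equal sample sizes and uniform weights, the infimum in equation (3)
    is attained by matching the i-th smallest x with the i-th smallest y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized samples")
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
w_same = wasserstein_1d(a, a)         # identical samples: no mass has to move
w_shift = wasserstein_1d(a, a + 2.0)  # every point has to move by 2
```

Shifting a sample by a constant moves every unit of probability mass by that constant, which is why the distance grows smoothly with the shift.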
Equation (3) describes the smallest effort needed to transport the probability distribution of a random variable $X$ to the probability distribution of another random variable $Y$. In cGANs, equation (3) can be used to bring the two distributions $\mathbb{P}_X(D(x|y))$ and $\mathbb{P}_Z(D(G(z|y)))$ close to each other, so the Wasserstein distance can be used to optimize the model instead of the Jensen-Shannon (JS) divergence, as has been shown mathematically [9]. Training GANs with the Wasserstein distance as the loss function gives what are known as wGANs; wGANs have been shown to be more robust than traditional GANs and to alleviate the mode collapse problem. In this paper, we therefore use the Wasserstein distance as the loss function and adapt the conditional generative adversarial networks into w-cGANs.

2.4. Architecture and Algorithm

The architecture used in this paper is shown in Fig. 1. We use the power load consumption of small and medium enterprises (SMEs) in Ireland as the input data, and we take several representative labels from the dataset and from the questionnaire answered by the SME customers, such as their business type and the turnover of their last financial year, which are described in detail in the next section.
Fig. 1. The architecture of conditional generative adversarial networks
In this paper, the input of the generator is the Gaussian noise $z$ together with randomly generated labels that follow the same rules as the load labels. The input of the discriminator is the combination of the labels and the load data, taken either from the original dataset or from the generator. The generator aims to capture the distribution of the historical customers’ load data with labels and to generate new load curves imitating the customers’ behaviour, while the discriminator aims to distinguish samples of the original dataset from those of the generator.
The models are trained using the following algorithm:

Algorithm: Conditional Generative Adversarial Networks with Wasserstein distance
Default values in this paper: $\alpha = 0.001$, $c = 0.01$, $m = 5000$, $n_{critic} = 2$
Hyperparameters: $\alpha$, learning rate. $c$, clipping parameter. $m$, batch size. $n_{critic}$, the number of iterations of the critic per generator iteration.
Require: $\theta_g^0$, initial generator parameters. $\theta_d^0$, initial discriminator parameters.

while $\theta$ has not converged do
    for $t = 0, \ldots, n_{critic}$ do
        Sample $\{x^{(i)}, y_x^{(i)}\}_{i=1}^{m} \sim p_{data}$    # a batch from the real data
        Sample $\{z^{(i)}, y_z^{(i)}\}_{i=1}^{m} \sim p_z$ and $p_Y$    # a batch from the Gaussian distribution
        # Update the discriminator by ascending its stochastic gradient:
        $g_{\theta_d} \leftarrow \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} [D(x^{(i)}|y_x^{(i)}) - D(G(z^{(i)}|y_z^{(i)}))]$
        $\theta_d \leftarrow \theta_d + \alpha \cdot \mathrm{RMSProp}(\theta_d, g_{\theta_d})$
        $\theta_d \leftarrow \mathrm{clip}(\theta_d, -c, c)$
    end for
    # Update the generator using gradient descent:
    $g_{\theta_g} \leftarrow \nabla_{\theta_g} [-\frac{1}{m} \sum_{i=1}^{m} D(G(z^{(i)}|y_z^{(i)}))]$
    $\theta_g \leftarrow \theta_g - \alpha \cdot \mathrm{RMSProp}(\theta_g, g_{\theta_g})$
end while
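The structure of this training loop can be sketched with NumPy on a toy problem. This is not the authors' implementation: the critic and generator below are linear maps (so the gradients can be written by hand), the "real" data are synthetic, the batch size is reduced from 5000 to 64, the outer loop runs a fixed 200 steps instead of testing convergence, and the inner loop runs exactly $n_{critic}$ times. All names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, c, m, n_critic = 0.001, 0.01, 64, 2  # m reduced from 5000 for this toy
dz, dy, dx = 4, 3, 8                        # hypothetical noise/label/data sizes

class_means = rng.random((dy, dx))          # synthetic "real" profile per label

def sample_real(m):
    """A batch of synthetic real samples x with one-hot labels y_x."""
    y = np.eye(dy)[rng.integers(0, dy, m)]
    x = y @ class_means + 0.05 * rng.standard_normal((m, dx))
    return x, y

def sample_noise(m):
    """A batch of Gaussian noise z with randomly drawn one-hot labels y_z."""
    z = rng.standard_normal((m, dz))
    y = np.eye(dy)[rng.integers(0, dy, m)]
    return z, y

def rmsprop_step(grad, cache, rho=0.9, eps=1e-8):
    """Scale a gradient by a running RMS of past gradients (cache updated in place)."""
    cache[...] = rho * cache + (1.0 - rho) * grad ** 2
    return grad / (np.sqrt(cache) + eps)

w = rng.uniform(-c, c, dx + dy)               # linear critic: D(x|y) = w . [x, y]
A = 0.1 * rng.standard_normal((dz + dy, dx))  # linear generator: G(z|y) = [z, y] A
w_cache, A_cache = np.zeros_like(w), np.zeros_like(A)

for step in range(200):
    for _ in range(n_critic):
        x, y_x = sample_real(m)
        z, y_z = sample_noise(m)
        fake = np.hstack([z, y_z]) @ A
        # Gradient of mean[D(x|y_x) - D(G(z|y_z))] with respect to w.
        g_w = np.hstack([x, y_x]).mean(axis=0) - np.hstack([fake, y_z]).mean(axis=0)
        w = w + alpha * rmsprop_step(g_w, w_cache)  # gradient ascent on the critic
        w = np.clip(w, -c, c)                       # weight clipping to [-c, c]
    z, y_z = sample_noise(m)
    g_in = np.hstack([z, y_z])
    # Gradient of -mean D(G(z|y_z)) with respect to A.
    g_A = -np.outer(g_in.mean(axis=0), w[:dx])
    A = A - alpha * rmsprop_step(g_A, A_cache)      # gradient descent on the generator
```

A linear critic can only compare means, so this toy does not reproduce realistic load curves; it only shows the alternation between clipped critic ascent steps and a generator descent step.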
3. Case study

3.1. Dataset description

The methodology proposed above is applied to data from Irish small and medium enterprises (SMEs), consisting of electricity consumption at 30-minute intervals over a period of 1.5 years and of surveys answered before the start of the trial. The original dataset consists of data from 4232 Irish customers, about 90% of which are residential customers. Compared with SME customers, residential customers are more stochastic and cannot be imitated with only a few labels. In this paper, we therefore take only the SME customers as an example of the demand side to present the methodology; the dataset of residential customers will be trained in larger networks with more labels in our future research. This dataset is publicly available and was obtained by the Irish CER (Commission for Energy Regulation) in an electricity customer behavior trial. The data is stored and distributed by the ISSDA (Irish Social Science Data Archive) [10].

3.2. Data pre-processing

First, we checked the dataset for missing and abnormal data values, which can result from faulty data collection instruments. Data from such days is removed entirely, instead of extrapolating the missing values, in order to protect the original features of the data. After this procedure, 11301 days of power consumption data from 319 SME customers remain. Second, we chose several key questions from the surveys conducted before the trial, together with the months and weeks of the power data, as the labels to train the conditional GANs. The questions we chose are listed in Table 1.

Table 1. Survey features

Feature                          Description
Type of business                 {Agriculture, Industry, Construction, Wholesale, Business and Science, Others}
Employees amount                 {Sole trader, self employed, 1-5, 6-10, 11-24, 25-49, 50-99, 100+}
Annual turnover                  {<58k, 59k-100k, 101k-250k, 251k-500k, 501k-750k, 751k-1m,
(k for thousand, m for million)   1m-1.5m, 1.6m-2m, 3m-5m, 6m-10m, 11m-20m, >20m}
Operation hour in workdays       {8h-10h during day, 8h-12h into evening, 18h-24h, other}
Age of premises                  {<5yr, 6yr-10yr, 11yr-30yr, 31yr-75yr, >75yr}
Third, the purpose of this paper is to find how these features affect the characteristics of power consumption, so we normalized the original data to (0,1) to study the load characteristics. Without normalization, differences in the mean absolute consumption of the data may cause poor training approximation and generalization problems.
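A minimal sketch of such a normalization is given below. The text does not state whether scaling is applied per day, per customer, or over the whole dataset; scaling each daily curve by its own minimum and maximum is an assumption made here for illustration, as are the function name and the toy values.

```python
import numpy as np

def minmax_normalize(load, eps=1e-12):
    """Scale each load curve (last axis) to the range [0, 1].

    eps guards against division by zero for perfectly flat curves.
    """
    load = np.asarray(load, dtype=float)
    lo = load.min(axis=-1, keepdims=True)
    hi = load.max(axis=-1, keepdims=True)
    return (load - lo) / np.maximum(hi - lo, eps)

# Two toy customers with the same shape but very different absolute consumption.
daily = np.array([[2.0, 4.0, 6.0],
                  [100.0, 150.0, 200.0]])
scaled = minmax_normalize(daily)
```

After scaling, both toy curves become identical, which reflects the idea that the model should learn the shape of the load curve rather than its absolute level.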
Fig. 2. Load curve of one day randomly chosen from each of the 319 SME customers
Fig. 2 shows that the load consumption of these 319 SME customers cannot be described by a single pattern, and that it is influenced by multiple factors in a complex way.

3.3. Model training

Because the labels in this dataset are discrete categorical variables, we use a one-hot encoder to encode the labels for training. To show the reliability of the cGAN models, we use only several MLP layers instead of more complex layers such as convolutional layers in this paper, although our tests suggest that convolutional layers may work even better. The activation function of this model is mainly the leaky ReLU function [11], but the output layers use the sigmoid function, to compute the normalized power consumption in the generator and to estimate the probability that the discriminator has received real electricity data. Batch normalization and the dropout algorithm are also used in the training process. The methodology used to train the models was introduced in Section 2.4.

After training for 1000 epochs, the generative model is capable of generating power consumption data according to the input labels. We chose three typical SME customers, and their training process can be seen in Fig. 3. The red dotted lines are five load curves randomly chosen from the original dataset to reflect how each SME customer consumes electricity in normal times; the blue solid lines are generated by the generator according to the SME customer’s labels and a date, which defaults to a Monday in January.
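The one-hot encoding of survey labels used for model training can be sketched as follows. The category lists are copied from Table 1, but only two of the features are encoded to keep the example short; the helper names are assumptions.

```python
import numpy as np

# Categories taken from Table 1 (two of the survey features only).
BUSINESS_TYPES = ["Agriculture", "Industry", "Construction", "Wholesale",
                  "Business and Science", "Others"]
PREMISES_AGE = ["<5yr", "6yr-10yr", "11yr-30yr", "31yr-75yr", ">75yr"]

def one_hot(value, categories):
    """Return a vector with a single 1 at the position of `value`."""
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

def encode_customer(business, age):
    """Concatenate the one-hot codes of each feature into one label vector."""
    return np.concatenate([one_hot(business, BUSINESS_TYPES),
                           one_hot(age, PREMISES_AGE)])

label = encode_customer("Construction", "11yr-30yr")
```

Encoding each feature separately and concatenating the results keeps the features independent, so the generator can be asked for label combinations that never occurred together in the training data.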
Fig. 3. Load curves of three typical SME customers during training
3.4. Model testing

Before training, we set aside 20% of the dataset to test whether the models work when given label combinations they have not met before. The results are shown in Fig. 4: the red dotted lines are chosen at random from the corresponding test dataset, while the blue solid lines come from the generator, driven by the labels from the test dataset. The generated load curves are similar to the test data, which were never fed to the models before, and they reflect the load characteristics to some extent. This indicates that when the models are trained with labelled data, they can learn how those factors influence the load distribution. Meanwhile, the discriminator remains capable of distinguishing the generated load data from the original dataset. The strong performance of the discriminator means that the generator constantly receives meaningful gradients for gradient descent during training, so this model may achieve higher accuracy when trained for more epochs.
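Setting aside a test set before training can be sketched as below. The text does not describe how the 20% was drawn; a uniformly random split over the 11301 daily records is an assumption for illustration, and the function name and seed are hypothetical.

```python
import numpy as np

def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) as disjoint random index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]

# 11301 is the number of daily load curves left after pre-processing.
train_idx, test_idx = holdout_split(11301)
```

A split that holds out entire customers or entire label combinations, rather than random days, would be a stricter test of generalization to unseen labels.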
Fig. 4. Six generated load curves for labels randomly chosen from the test dataset
In addition, the discriminator could also be used to detect abnormal customer behaviour. Once the historical power consumption data of customers has been learnt by the GAN model, the discriminator from the model could be used to estimate whether a current power load conforms to what it should be, so that energy providers could easily detect smart meter errors or energy theft.

4. Conclusion

In this paper, we proposed a new method that uses conditional generative adversarial networks (cGANs) with the Wasserstein distance as the loss function to learn the distribution of historical load data and to generate power consumption data for the demand side. Compared with existing load forecasting methods, this method can more efficiently and accurately generate power consumption data that imitates the distribution of the historical data for given labels, so that power providers can make their strategies and schedules with more accuracy.

Acknowledgements

This work is supported in part by The National Key Research and Development Program of China (Basic Research Class 2017YFB0903000) and in part by funding of State Grid Corporation of China (Basic Theories and Methods of Analysis and Control of the Cyber Physical Systems for Power Grid). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

References

[1] Mohassel R R, Fung A, Mohammadi F, et al. A survey on advanced metering infrastructure[J]. International Journal of Electrical Power & Energy Systems, 2014, 63: 473-484.
[2] Quilumba F L, Lee W J, Huang H, et al. Using smart meter data to improve the accuracy of intraday load forecasting considering customer behavior similarities[J]. IEEE Transactions on Smart Grid, 2015, 6(2): 911-918.
[3] Chang H H, Yang H T, Lin C L. Load identification in neural networks for a non-intrusive monitoring of industrial electrical loads[C]//International Conference on Computer Supported Cooperative Work in Design. Springer, Berlin, Heidelberg, 2007: 664-674.
[4] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Advances in neural information processing systems. 2014: 2672-2680.
[5] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks[C]//AISTATS. 2011.
[6] Mirza M, Osindero S. Conditional generative adversarial nets[J]. arXiv preprint arXiv:1411.1784, 2014.
[7] Villani C. Optimal transport: old and new[M]. Springer Science & Business Media, 2008, vol. 338.
[8] Chen Y, Wang Y, Kirschen D S, et al. Model-free renewable scenario generation using generative adversarial networks[J]. IEEE Transactions on Power Systems, 2018.
[9] Arjovsky M, Chintala S, Bottou L. Wasserstein gan[J]. arXiv preprint arXiv:1701.07875, 2017.
[10] Commission for Energy Regulation (CER). CER Smart Metering Project - Electricity Customer Behaviour Trial, 2009-2010 [dataset]. 1st Edition. Irish Social Science Data Archive, 2012. SN: 0012-00. www.ucd.ie/issda/CER-electricity
[11] He K, Zhang X, Ren S, et al. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification[C]//Proceedings of the IEEE international conference on computer vision. 2015: 1026-1034.