A multi-tier architecture for data analytics in smart metering systems



Journal Pre-proof

PII: S1569-190X(19)30155-8
DOI: https://doi.org/10.1016/j.simpat.2019.102024
Reference: SIMPAT 102024
To appear in: Simulation Modelling Practice and Theory

Please cite this article as: Juan C. Olivares-Rojas, Enrique Reyes-Archundia, José A. Gutiérrez-Gnecchi, Johan W. González-Murueta, Jaime Cerda-Jacobo, A Multi-Tier Architecture for Data Analytics in Smart Metering Systems, Simulation Modelling Practice and Theory (2019), doi: https://doi.org/10.1016/j.simpat.2019.102024

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.

Juan C. Olivares-Rojas*1, Enrique Reyes-Archundia1, José A. Gutiérrez-Gnecchi1, Johan W. González-Murueta1 and Jaime Cerda-Jacobo2

1 Tecnológico Nacional de México / Instituto Tecnológico de Morelia. Avenida Tecnológico 1500, Lomas de Santiaguito, Morelia, Michoacán, México, 58120.
2 Universidad Michoacana de San Nicolás de Hidalgo / Facultad de Ingeniería Eléctrica. Avenida Francisco J. Mugica S/N. Ciudad Universitaria. Morelia, Michoacán, México, 58030.
*Corresponding author: [email protected]

Abstract. With the proliferation of smart meters in smart grids, new challenges have emerged in the energy sector, and applications are continuously developed, mainly concerning data analytics, to address those challenges. Traditionally, data analytics in smart grid systems is performed in the server-side tier; however, it is necessary to process data analytics close to the smart meter to achieve better performance. In order to process data effectively, it is also necessary to implement methodologies that facilitate the integration of data analysis processes in the Advanced Metering Infrastructure (AMI). This paper presents a novel architecture for data analytics in Smart Metering Systems based on an edge-fog-cloud computing architecture that permits different types of data analytics in a multi-tier context. The proposed architecture has the capability of learning and adapting to different contexts in smart metering systems using a reinforcement learning approach. The architecture was tested with three different analytic applications: forecasting energy consumption, prediction of power quality and prediction of energy theft. The results indicate that the methodology can be a feasible solution for direct implementation in Smart Metering Systems.

Keywords. Data Analytics, Smart Metering Systems, Data Streaming, Reinforcement Learning, AMI, Edge-Fog-Cloud Computing.

1. Introduction

The Smart Meter (SM) is the cornerstone of the Smart Grid (SG) due to its capabilities for measurement, storage, data processing and communication. Smart Meters are versatile devices that allow developing new applications such as automatic meter reading, connection and disconnection, price signaling, and detection and prevention of energy theft [1].

However, the operation of a Smart Metering System (SMS) includes processing and storage of massive datasets due to tasks such as real-time measurement, data analytics, data optimization and visualization, and finally, decision-making. The most popular architecture for SMS is the Advanced Metering Infrastructure (AMI), which comprises a hardware-software architecture for processing and storing data as well as data communication capabilities for transferring information from SMs to the utilities’ data centers. Figure 1 shows the general AMI architecture divided into four zones: HAN, NAN, FAN/WAN, and Head-End.

Figure 1. AMI Architecture.

Smart Meters can perform a range of operations, including measuring the energy consumption of electrical devices, such as Appliances (A) and Smart Appliances (SA), and the energy production of Distributed Energy Resources (DER) such as photovoltaic and wind generation facilities. The combination of A, SA, and DER constitutes a data network called the Home Area Network (HAN). Communication between devices in the HAN is performed by either wireless (ZigBee, Bluetooth Low Energy, 6LoWPAN) or wired (Power Line Communications (PLC) in Low Voltage (LV)) technologies. In addition, SMs can measure other electrical signal variables such as current, voltage, frequency and power consumption. Thus, SMs are the core components of AMI.

All the data produced in the HAN is stored in the SMs using an Embedded Database (EDB), which traditionally is very small. In general, the readings are reported at fifteen-minute intervals to the Data Concentrator (DC). On some occasions, the SMs can also communicate with each other. The data communication between SMs and DCs forms a Neighborhood Area Network (NAN), which in turn is accomplished through wireless (radiofrequency, WiMax) or wired (PLC in LV or Medium Voltage (MV)) means.

The Data Concentrator (DC) is an embedded system with superior hardware and data processing capabilities compared to SMs. The DC collects all the measurements from SMs, stores the data in a more complex database, and reports data back to the utility’s AMI head-end system. In general, data communication between DCs and the utility’s servers is carried out through a special Wide Area Network (WAN) called the Field Area Network (FAN). The FAN is used in high-voltage transmission and distribution power lines and requires specialized data communication equipment capable of operating in such harsh environments; data communication is regularly conducted through fiber-optic or cellular network links. Finally, the data reported by the DCs is collected by the AMI head-end system and stored in a large database using a Meter Database Management System (MDMS). The MDMS is responsible for storing data and distributing it to the utility’s other information systems such as billing, Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and Outage Management System (OMS), among others.

Despite the enormous advantages of AMI, there are some disadvantages. One of the main issues with AMI is the diversity of components in the overall architecture. For instance, data communication means are diverse, and interoperability with other AMI deployments is not easy. In general, the scalability of the AMI architecture is restricted and not distributed [2]. Typically, SM readings are not reported in real time, mainly because this is not necessary for billing [3]. In contrast, new applications such as power outage detection, demand response, data security and energy efficiency assessment require real-time measurements from SMs and other Intelligent Electronic Devices (IED) in the SG. In addition, AMI implementations produce an enormous quantity of data on all tiers [4], which is necessary for analytic processes.
The traditional approach to data processing consists of transferring the measured data to the data center for processing. However, other scenarios require including data processing capabilities in other tiers (i.e., on the SM side), where the hardware capabilities are more limited in comparison to the data center. Nowadays, the inclusion of data analytics applications in SG has inspired research interest worldwide. Several works have studied this approach [5], where SM data is used to perform data analytics processes [6]. Besides, diverse machine learning techniques can be applied to solve prediction and classification problems in SG and other contexts [7].

At present, one of the trends is the development of IoT architectures for Edge-Fog-Cloud Computing [8]. In Edge-Fog-Cloud Computing architectures, the edge tier is represented by small devices that process sensor data. The fog tier is represented by devices that concentrate data from edge devices; the cloud-computing tier represents the traditional big data approach. The Edge-Fog-Cloud architecture has been implemented in SG in different applications ranging from monitoring [9] to processing Smart Meter data [10]. However, in general, reported works do not address data analytics operations within IoT devices at different tiers of processing.

In this paper, the authors propose a novel AMI-based architecture for data analytics at different levels of processing and different tiers of data communication (in real time and delayed time). The proposed architecture intends to improve the efficiency of AMI for performing data analytics processes. The rest of this paper is organized as follows. Section 2 shows how the proposed architecture works. Section 3 shows the materials and methods. Section 4 describes and discusses the results of experimentation for testing three applications: forecasting energy consumption, prediction of power quality and prediction of energy theft. Finally, Section 5 describes the conclusions of this work.

2. Proposed Architecture

The premise relies on the assumption that the Edge-Fog-Cloud computing architecture can be adapted for SMSs including AMI. The edge tier is represented by SAs, DERs, and SMs. At present, SAs and DERs are seldom considered because they do not include hardware capabilities for processing and storing data. Nevertheless, due to current trends in IoT device development, the authors consider that SA devices with increased IoT, data storage and data processing capabilities will proliferate commercially during the next decade, and thus it is important to address the significance of SA and DER as part of the overall architecture. Similarly, SMs regularly include limited EDB storage capacity and limited data processing capabilities. The authors propose novel SM and DC architectures with increased hardware capabilities for storing and processing data (see Section 3). The fog tier is represented by DCs, which process real-time data obtained from SMs. The cloud tier is represented by the MDMS and the utility’s entire ICT infrastructure for storing and processing big data. The proposed architecture is shown in Figure 2.

Figure 2. High-Level Diagram of the architecture proposed.

The proposed architecture comprises four main areas of AMI data communication (HAN, NAN, FAN/WAN, Head-End) in three sections. Section 1 is the edge tier. Section 2 represents the fog tier. Section 3, composed of the head-end, represents the cloud tier. In Figure 2, the areas delimited by squares represent loads (A, SA, DER); the circles represent SMs, the triangles represent DCs, the cylinders represent DBs, the trapezoids represent the utility’s Gateway Servers (G), and the clouds represent the utility’s other external data communication networks. The Edge Analytics (EA) is represented inside each SM and the Fog Analytics (FA) is represented for each DC. The data communication between the components of the architecture is made through the same data communication network and protocols.

Data analytics is performed at the nearest tier. In the EA, data analytics is performed online using data streaming to process different data analytic applications. The architecture uses Machine Learning (ML), specifically Reinforcement Learning (RL), to calculate the new parameters of the different data analytics applications according to the available computational capabilities. RL is a kind of learning process based on feedback, in which the learners learn by trial and error. An RL problem is formed by agents, environments, actions, and rewards [11]. The RL implementation of this work is based on a Markov Decision Process (MDP). The MDP is a tuple M = ⟨S, A, Φ, R⟩ composed of:

• A finite set of states S (s_i ∈ S, i = 1, ..., n).
• A finite set of actions A, which depend on each state (a_j(s_i), j = 1, ..., m).
• A reward function R, which defines the goal by mapping each state-action pair to a number (the reward), indicating desirable states (f(s, a) ⇒ R).
• An environment model or state transition function Φ(s' | s, a) (Φ: A × S → S), which indicates the probability of reaching state s' ∈ S when action a ∈ A is completed in state s ∈ S.

Additionally, an MDP model used in RL requires three main features:

• Policy (π): defines the system behavior over time and consists of a mapping (sometimes stochastic) from states to actions (π: S → A).
• Value function (Vf): indicates what is good in the long term and corresponds to the total reward that an agent can expect to accumulate, either starting from a state s (Vf(s)) or taking action a in state s (Q(s, a)).
• The rewards are given by the environment, but the values must be estimated (learned) based on observations. The learning process is defined through the Q function: Vf^π(s) = max_a Q^π(s, a).

The agents are the processes executing in the SMs, DCs, and utility servers. The environments depend on the different variables of each data analytics application in the proposed architecture. The actions correspond to the different tasks: prediction, classification, and learning activities. The architecture constantly monitors the different states of the data analytics algorithms. The rewards are the values that optimize the different data analytics algorithms in the proposed architecture according to the policies. The policies and the value functions attempt to minimize or maximize the performance measure of the data analytic application. The architecture is flexible, and each tier can use a different variant of the data analytics algorithms.

2.1 Data Analytics Learning

Figure 3 depicts the learning procedure.
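As a point of reference for the MDP elements defined in Section 2 (policy π, value function Vf, and Q), the following Python sketch shows how they relate in a tabular setting. It is only an illustration under stated assumptions: the state labels, the two action names, the learning rate `alpha`, the discount `gamma`, and the Q-learning update rule are all hypothetical choices, since the paper does not specify its update rule.

```python
from typing import Dict, Tuple

State = str
Action = str
QTable = Dict[Tuple[State, Action], float]

# Hypothetical action set; the names are illustrative placeholders.
ACTIONS = ("predict", "adjust")


def state_value(q: QTable, state: State) -> float:
    """Vf(s) = max over actions a of Q(s, a); unseen pairs default to 0."""
    return max(q.get((state, a), 0.0) for a in ACTIONS)


def greedy_policy(q: QTable, state: State) -> Action:
    """pi(s): choose the action with the highest estimated Q(s, a)."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def q_update(q: QTable, state: State, action: Action, reward: float,
             next_state: State, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step (an assumed rule, not taken from the paper)."""
    old = q.get((state, action), 0.0)
    target = reward + gamma * state_value(q, next_state)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


# Example: a single positive reward makes "adjust" the preferred action.
q: QTable = {}
q_update(q, "lower_tier", "adjust", reward=1.0, next_state="upper_tier")
best_action = greedy_policy(q, "lower_tier")
```

The table-based form is chosen only because it keeps the relation Vf(s) = max_a Q(s, a) explicit; a deployment on constrained edge hardware might use a different estimator.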

Figure 3. The RL in the architecture proposed.

The process begins at the lower tier. The Agent in the lower tier (Al) executes its data analytics application and evaluates its policy (π) and the corresponding value function (Vf) to estimate whether the new state (S) has a better reward (R) using its local parameters. Next, the Al issues a request to the Agent in the upper tier (Au), enquiring about the global parameters of its neighbors. The Au then reports the parameters back to the Al that issued the request. Finally, the Al fits its parameters according to the new information and evaluates whether the updated parameters yield a better reward.

The lower and upper tiers depend on the current context of the data analytic application in the architecture. The combination of lower and upper tiers could be (SA/DER => SM), (SM => DC), (DC => DC), and (DC => G), which are the immediate tiers, but all the tiers are connected to each other, directly or indirectly. For instance, the SMs (edge tier) receive feedback directly from the DC (fog tier) and indirectly from G (cloud tier). The segmentation of the architecture in tiers allows better performance of data analytics according to the hardware capabilities of each device, using data when and where it is required.

3. Materials and Methods

The proposed methodology was tested for three data analytic applications: energy consumption forecasting, power quality prediction, and energy theft prediction. The tests were conducted in the HAN and NAN tiers (Parts 1 and 2 of the proposed architecture). The hardware and data communication used for testing the methodology consisted of three main components:

• Smart Meter: 4 Raspberry Pi model 3B+ with SmartPi board [12].
• Data Concentrator: 1 Latte Panda Alpha [13] with 8 GB of RAM and Linux Operating System.
• Means of communication: WiFi, Gigabit Ethernet, and PLC at 54 Mbps.

3.1 Forecasting energy consumption test setup

Several approaches have been reported in the literature to solve problems related to forecasting energy consumption [14-18]. Forecasting energy consumption is essential for planning the energy generated and provided by the utilities. In general, data analytics processing for forecasting energy consumption is carried out at the data center tier and requires considerable data processing capabilities for implementing forecasting methods ranging from time series and econometric models to artificial intelligence algorithms. Here, the authors propose that using data derived from SMs and DCs can improve energy consumption forecasting. Moreover, the proposed methodology results in a considerably simpler algorithm that can be implemented in the proposed edge and fog computing architecture. Figure 4 describes the Energy Consumption Forecasting procedure.

Figure 4. Architecture for Energy Consumption/Forecasting.

The process begins at the Smart Meter level. The Smart Meter processes the measured data using a time series method based on a pre-trained model. The model is trained using the data corresponding to one day (SM in idle measurement state). The time series algorithm used was ARIMA, implemented in Python with the statsmodels [19] library. The database in the SM was SQLite. The SM continuously compares the processed data with the predictions, adjusting the model based on delta errors. The time series algorithms used in the DC are more complex than those in the SM; in this case, the authors used SARIMAX. The database in the DC is also implemented using SQLite. The resulting time-series curve is compared using the Box-Jenkins method, fitting the best curve to the ARIMA model. Then the SMs and the DC adjust the parameters p, d, and q of the ARIMA model using regression model accuracy metrics (Mean Squared Error: MSE, Akaike Information Criterion: AIC, or Bayesian Information Criterion: BIC). Data derived from the model-based prediction (fifteen minutes, day, week, month, and year) allows forecasting before the event occurs and adjusting the models. This implementation is shown in Algorithm 1.

Algorithm 1: Forecasting of Energy Consumption and Production
Input: Energy Consumption and Production Time Series
  IF pre-trained model does not exist OR pre-trained model is obsolete THEN
    calculate new ARIMA(p, d, q) model
  END IF
  delta_Error <- prediction of the next data - observed data
  IF |delta_Error| >= 0.05 THEN
    anomaly_counter <- anomaly_counter + 1
  END IF
  IF anomaly_counter > 5% of total data THEN
    model <- obsolete
  END IF
Output: valid pre-trained model adjusted with the new input

3.2 Power Quality Prediction test setup

The estimation of power quality is an important research topic and has relevant implications for all stages of power generation, transmission, distribution, and consumption. Diverse works address the importance of prediction for the stable operation of end-user equipment. Power quality prediction is a complex task because real-time events have a profound influence on the reliability of prediction; the events are often rapid and sparse, which impedes effective analysis of incoming data [20, 21]. Figure 5 shows the power quality prediction process carried out in the proposed architecture.

Figure 5. Architecture for power quality classification and prediction.

The SM acquires and processes the measurements of the As, SAs, and DERs. The measured values (voltage and frequency) are evaluated and compared with standard power quality events such as undervoltage, overvoltage, sags (dips), and swells (Table I, [22]).

Table I. Classes of Power Quality Events

Category              Peak magnitude               Duration: 1 second – 3 seconds   3 seconds – 1 minute    > 1 minute
Voltage Variations    < 0.1 pu                     Momentary interruption           Temporal interruption   Sustained interruption
                      0.1 – 0.8 pu, 0.8 – 0.9 pu   Momentary sag                    Temporal sag            Undervoltage
                      1.1 – 1.2 pu, 1.2 – 1.4 pu   Momentary swell                  Temporal swell          Overvoltage
Frequency Deviation   < 60 Hz <                    Short                            Medium                  Long
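The voltage rows of Table I can be read as a small lookup on magnitude and duration. The Python sketch below is one possible reading, not the authors' decision tree. Its assumptions: the two sag bands are merged into 0.1 to 0.9 pu and the two swell bands into 1.1 to 1.4 pu, band edges are half-open, excursions shorter than one second or above 1.4 pu are left unclassified, and the helper names are hypothetical. The 127 V nominal value is the reference voltage used in this paper.

```python
NOMINAL_VOLTAGE = 127.0  # reference voltage in volts used in this paper


def to_pu(voltage_v: float, nominal_v: float = NOMINAL_VOLTAGE) -> float:
    """Convert a voltage reading in volts to per-unit (pu)."""
    return voltage_v / nominal_v


def classify_voltage_event(magnitude_pu: float, duration_s: float) -> str:
    """Map a voltage excursion to a Table I class name (assumed band edges)."""
    if magnitude_pu < 0.1:
        kind = "interruption"
    elif magnitude_pu < 0.9:
        kind = "sag"            # merged 0.1-0.8 and 0.8-0.9 pu bands (assumption)
    elif magnitude_pu < 1.1:
        return "No event"       # within the normal band
    elif magnitude_pu <= 1.4:
        kind = "swell"          # merged 1.1-1.2 and 1.2-1.4 pu bands (assumption)
    else:
        return "Unclassified"   # above the range covered by Table I

    if duration_s < 1:
        return "Unclassified"   # shorter than the first duration column
    if duration_s <= 3:
        return f"Momentary {kind}"
    if duration_s <= 60:
        return f"Temporal {kind}"
    # Longer than one minute: Table I uses distinct names in this column.
    sustained = {"interruption": "Sustained interruption",
                 "sag": "Undervoltage",
                 "swell": "Overvoltage"}
    return sustained[kind]


# Example: 108 V held for 4 seconds is about 0.85 pu.
example = classify_voltage_event(to_pu(108.0), 4)
```

Frequency deviations are omitted here because the table gives no tolerance around 60 Hz.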

A decision tree is used to classify the events, which allows them to be classified quickly. The variables of interest for the classifier are voltage and frequency. The reference values are 127 V and 60 Hz for voltage and frequency, respectively. The algorithm to classify data in the SM is shown in Algorithm 2.

Algorithm 2: Classifier using Decision Tree
Input: frequency (f), voltage (v), window size (w, default w = 60 seconds)
  FOR i = 1 to size(w) DO
    Check v and f
    IF v or f are at abnormal levels THEN
      update Vv(i) or Vf(i) according to decision tree
    END IF
  END FOR
  count Vv and Vf and update Ct in a continuous state
Output: Vv state vector voltage, Vf state vector frequency, Ct classification table

In addition, the SM uses historical data to try to predict a power quality event. The prediction model is complementary to the one in the DC. The algorithm used in the DC is a variation of k-means clustering. Using clustering in the DC makes it easy to predict when the most common power quality events occur in the neighborhood and to correlate them with the geographical position of the SMs. This can help to prevent some general disturbances in the distribution grid.

3.3 Energy Theft Prediction test setup

Non-technical losses are a big challenge for utilities. The most important non-technical loss is energy theft, which may result from SM tampering or tapping into distribution lines. In the literature, energy theft prediction is the subject of ongoing research and of a continuous stream of proposed solutions [23-26]. In order to implement the proposed energy theft prediction methodology, the authors use the results of applying the energy consumption forecasting method described in Section 3.1. The results are used to calculate a model for predicting abnormal consumption that could be classified as energy theft. Figure 6 shows the proposed architecture for classifying different cases of energy theft. In a similar manner to energy consumption forecasting, the SM analyzes the data reported by the electrical devices and compares the data readings with the predicted behavior. If the anomaly continues for a determined period, it is marked as a possible energy theft and it is reported to the SM and the Utility.

Figure 6. Architecture for energy theft prediction.

Considering that most energy theft events can be classified as probabilistic events, the algorithm implemented in this tier is a Bayesian classifier (Algorithm 3).

Algorithm 3: Classifier to predict Energy Theft
Input: Deltas of predicted and real consumption/production energy, size of the assessment window (w, default w = 60 minutes), % limit of variation (lv, default lv = 10%), and the average of power quality events on DC
  FOR i = 1 to size(w) DO
    Check deltas of energy consumption/production and compare with previous periods
    IF deltas >= lv THEN
      v := true
    END IF
    Check power quality events in SM and compare with DC
    Construct a probability tree with the events in SM
  END FOR
  IF v = true THEN
    walk the probability tree and calculate correlations with power quality events
    IF correlation events < 0.5 THEN
      probability = 1 - (probability * deltas)
    ELSE
      probability = deltas (percentage)
    END IF
  END IF
Output: % probability of energy theft
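The first part of this procedure, flagging deltas that exceed the variation limit lv and marking an anomaly that persists over consecutive periods, can be sketched in Python as follows. This sketch deliberately leaves out the probability tree and the correlation with power quality events, which the text does not specify in enough detail to reproduce. Treating each delta as relative to the predicted value, and the `min_run` parameter standing in for the "determined period", are assumptions.

```python
from typing import Sequence


def exceeds_limit(delta: float, predicted: float, lv: float = 0.10) -> bool:
    """True when |delta| is at least lv (default 10%) of the predicted value.

    Interpreting the delta relative to the prediction is an assumption.
    """
    if predicted == 0:
        return delta != 0
    return abs(delta) / abs(predicted) >= lv


def possible_theft(deltas: Sequence[float], predictions: Sequence[float],
                   lv: float = 0.10, min_run: int = 3) -> bool:
    """Flag a window when the limit is exceeded in min_run consecutive periods.

    min_run is a hypothetical parameter for how long an anomaly must continue.
    """
    run = 0
    for delta, predicted in zip(deltas, predictions):
        run = run + 1 if exceeds_limit(delta, predicted, lv) else 0
        if run >= min_run:
            return True
    return False


# Example windows of (delta, prediction) pairs in kWh.
persistent = possible_theft([0.02, 0.03, 0.05, 0.001], [0.1, 0.1, 0.1, 0.1])
sporadic = possible_theft([0.02, 0.001, 0.03, 0.001], [0.1, 0.1, 0.1, 0.1])
```

Requiring a consecutive run rather than a single exceedance reflects the statement that an anomaly is reported only if it continues for a period.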

4. Results and discussion

4.1 Results of testing the energy consumption forecasting methodology

Measuring energy consumption and production at a rate of 1 sample per minute over 30 days results in 43,200 database records in the SM. Table II shows an example of the database records obtained for energy consumption and production in the SM.

Table II. Smart Meter Database for recording Consumption and Production records.

Timestamp             Consumption (kWh)   Production (kWh)
2019/04/11 10:00:00   0.109               0
2019/04/11 10:01:00   0.083               0
2019/05/10 09:59:00   0.116               0.054

The DC analyzes data from all 4 Smart Meters in different time windows: day, week, month, and year. The DC computes a new model based on the previous data from a month and a year and sends it back to the corresponding SM. Since the DC obtains data from 4 SMs, the resulting database contains 172,800 records. Table III shows the database in the DC using SQLite.

Table III. DC database for recording consumption and production of energy in NAN

Timestamp             Smart Meter Identification Number (SM_ID)   Consumption (kWh)   Production (kWh)
2019/04/11 10:00:00   1                                           0.109               0
2019/04/11 10:00:05   2                                           0.097               0.114
2019/04/11 10:00:09   3                                           0.201               0
2019/04/11 10:00:13   4                                           0.145               0

Figures 7, 8, and 9 show the experimental results of testing the proposed architecture. Figure 7 shows the readings of the energy consumption of one SM. The consumption curve shows high peaks on some days, no consumption on other days, and constant consumption on others. The x-axis represents the days of one month and the y-axis represents the energy consumption in kWh per day.

Figure 7. Smart Meter #1 readings of energy consumption.

Figure 8 shows the readings of energy production using a PV panel. Again, the SM records production on some days, while production is zero on other days. The x-axis represents the days of one month and the y-axis represents the energy production in kWh per day. Negative values indicate that energy is produced; positive values indicate that energy is consumed.

Figure 8. Smart Meter #2 readings of energy production.

Furthermore, Figure 9 shows the forecasting of energy consumption calculated in Smart Meter 2 (SM2). SM2 has both production and consumption readings. The x-axis represents the consumption/production readings over the days of one month and the y-axis represents the energy consumption in kWh per day. The scale of the y-axis is adjusted.

Figure 9. Prediction of energy production in Smart Meter #2.

The results show an MSE of 0.019 in consumption readings and 0.001 in production readings. The experimental results show that the proposed architecture can provide close estimates of energy consumption/production using the time series approach.

The methodology works as follows: when new data is obtained, it is compared with the predicted data and then with the observed data, and the error deviation and the learning rate are calculated through an RL process. In this data analytic application, the states are Al and the different Au. The actions are predict or adjust. The rewards are 1 or -1, depending on the value function, which is given by error metrics such as the MSE. The environment and policies map the transition between Al and the different Au according to the hardware capabilities and Vf(s). Finally, the learning function is given by checking Vf(s) against the observation.

Table IV shows the results of experimenting over a 30-day period using 4 SMs and 1 DC. The evaluation was repeated 12 times. Note that Qi represents the learning rate, where i is the test number: Q1 is the learning rate in the first period, and so on. The Q values are expressed in percentages.

Table IV. Learning rates in the forecasting consumption/production data analytic application

SM   Q1      Q2      Q3      Q4      Q5      Q6      Q7      Q8      Q9      Q10     Q11      Q12
1    51.21   53.21   53.53   53.98   54.93   56.52   58.09   58.33   59.59   62.04   641.72   64.81
2    48.85   49.71   50.91   52.03   55.51   55.85   57.83   58.85   59.93   61.28   62.36    62.36
3    53.44   53.93   54.93   55.55   57.2    57.23   58.82   59.98   60.78   62.09   63.45    63.45
4    50.88   51.33   52.75   53.27   55.16   56.01   57.17   57.87   59.03   60.18   61.37    62.53
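The reward scheme described before Table IV (a reward of 1 or -1 driven by an error metric such as the MSE, with "predict" and "adjust" as the actions) might be sketched in Python as follows. The rule that a lower MSE than the previous evaluation earns +1, and the function names, are assumptions made for illustration; the paper does not give the exact comparison.

```python
from typing import Sequence


def mse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between two equally long, non-empty sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def reward(previous_mse: float, new_mse: float) -> int:
    """+1 when the error metric decreased, otherwise -1 (assumed rule)."""
    return 1 if new_mse < previous_mse else -1


def choose_action(previous_mse: float, new_mse: float) -> str:
    """Keep predicting while the reward is positive; otherwise adjust the model."""
    return "predict" if reward(previous_mse, new_mse) == 1 else "adjust"


# Example with made-up readings in kWh.
error = mse([1.0, 2.0], [1.0, 4.0])
action_when_worse = choose_action(previous_mse=0.014, new_mse=0.020)
```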

The results obtained show that the architecture proposed can learn effectively in this data analytic application: with RL, the new MSE was 0.014 in consumption and 0.07 in production.

4.2 Results of testing the power quality prediction methodology

The database structure of the SM is shown in Table V.

Table V. SM database to classify power quality events.

Timestamp             Voltage (V)   Frequency (Hz)
2019/04/11 10:00:00   120.53        60
2019/04/11 10:00:01   127.29        60
2019/04/11 10:00:03   128.12        59.98

The classifier uses a state vector of size 60 to evaluate the events that occurred. One state vector with voltage values and another with frequency values are used to record the power quality events. Each event has an ID number representing the ID_class of each category (see Table VI).

Table VI. Table of IDs representing power quality events.

ID_class   Description
0          No event
1          Momentary interruption
2          Temporal interruption
3          Sustained interruption
4          Momentary sag
5          Temporal sag
6          Undervoltage
7          Momentary swell
8          Temporal swell
9          Overvoltage
10         Short frequency deviation
11         Medium frequency deviation
12         Long frequency deviation

Table VII shows an example of the first 7 positions in the state vector of voltage values. This example is classified as a temporal sag event because the voltage stays between 0.8 and 0.9 per-unit (pu) for four seconds.

Table VII. First 7 positions in voltage state vector to classify power quality events.

0   0   1   1   1   1   0 … 0

Table VIII shows the classification results stored in the SM. The duration is expressed in seconds.

Table VIII. SM database of classes

Timestamp             ID_class   Duration
2019/04/11 10:01:00   5          4
…                     …          …
2019/04/11 10:14:00   2          11

The original measurements and the power quality events that occurred are sent to the DC for processing. The DC generates a model for the NAN, which represents the status of the distribution grid. Table IX shows the classification outputs for the events reported by each SM to the DC.

Table IX. DC database with a classification of power quality events in HAN.

Timestamp             SM_ID   ID_class   Duration
2019/04/11 10:01:00   1       5          4
2019/04/11 10:05:00   2       4          3
2019/04/11 10:14:00   4       2          11

During one month of experimentation, 5,417 power quality events were registered. The most common event was a momentary swell. The complete results are shown in Table X.

Table X. Power quality events classification in DC.

SMs \ Events   C1   C2   C3   C4     C5    C6    C7     C8    C9    C10   C11   C12
SM1            5    0    0    253    203   21    753    147   53    7     1     0
SM2            19   2    3    288    181   29    299    133   68    4     3     1
SM3            11   1    2    268    173   37    627    214   51    5     2     0
SM4            13   1    0    247    164   27    802    218   73    7     5     1
Subtotal       43   4    5    1056   721   114   2481   712   245   23    11    2

Table XI shows the number of correct and wrong classifications in the power quality classification. The results are an average of 30 experiments in the 4 SMs.

Table XI. Confusion matrix of the power quality events classification

                         Prediction: Positives   Prediction: Negatives
Observation: Positives   2124                    662
Observation: Negatives   661                     1970

The methodology is similar to that described in Section 4.1, with the following differences: the actions are classify or adjust, and the rewards take two values (1 or -1) depending on the value function, which is given by error metrics such as the confusion matrix (a correct classification gives a reward of 1, an incorrect classification gives -1). Table XII shows the results of experimenting over a 30-day period using 4 SMs and 1 DC, as in Section 4.1.

Table XII. Learning rates in the Power Quality Events Classification.

SM   Q1      Q2      Q3      Q4      Q5      Q6      Q7      Q8      Q9      Q10     Q11     Q12
1    92.79   93.07   94.18   95.43   96.26   96.67   97.09   97.64   97.92   98.34   98.89   99.31
2    96.8    96.99   97.28   97.96   98.35   98.74   99.32   99.51   99.71   99.71   99.71   99.71
3    79.3    80.88   84.54   86.13   87.92   90.15   91.23   91.88   93.39   94.32   95.11   95.18
4    93.26   94.74   95.51   95.89   96.53   97.75   97.88   98.01   98.2    98.4    98.46   98.46

The results show that the architecture can learn effectively in this data analytic application: the classification process alone scored 90.53% correct cases, and RL improved this score to 98.17%.

4.3 Results of testing the Energy Theft Prediction methodology

The data model is taken from the test of the energy consumption forecasting methodology (Table II and Table III for the SM and the DC, respectively). In addition, two more data repositories are used in both the SM and the DC to store the predictions and to calculate the percentage of the difference between the consumption/production energy data, as shown in Tables XIII and XIV.

Table XIII. Smart Meter database for recording consumption and production predictions.

| Timestamp | Consumption prediction (kWh) | Consumption delta (kWh) | Production prediction (kWh) | Production delta (kWh) |
|-----------|------------------------------|-------------------------|-----------------------------|------------------------|
| 2019/04/11 10:00:00 | 0.115 | 0.06 | 0 | 0 |
| 2019/05/11 10:01:00 | 0.080 | -0.03 | 0 | 0 |
| 2019/05/11 22:03:00 | 0.099 | +0.11 | 0.066 | -0.07 |
| 2019/05/12 09:59:00 | 0.116 | -0.04 | 0.054 | 0.05 |

Table XIV. DC database for recording consumption and production predictions in the NAN.

| Timestamp | SM_ID | Consumption predicted | Consumption delta | Production prediction | Production delta |
|-----------|-------|-----------------------|-------------------|-----------------------|------------------|
| 2019/04/11 10:00:00 | 1 | 0.115 | 0.06 | 0 | 0 |
| 2019/04/11 10:00:05 | 2 | 0.094 | -0.03 | 0.112 | -0.02 |
| 2019/04/11 10:00:09 | 3 | 0.204 | 0.03 | 0 | 0 |
| 2019/04/11 10:00:13 | 4 | 0.149 | 0.04 | 0 | 0 |
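The SM-side prediction repository of Table XIII could be modelled along the following lines. This is a minimal sketch using an in-memory SQLite database; the table name `sm_predictions`, the column names, and the 0.05 kWh threshold in the query are hypothetical choices for illustration and do not come from the paper.

```python
import sqlite3

# Illustrative sketch only: the table and column names are assumptions
# modelled on the column headings of Table XIII.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sm_predictions (
        ts                     TEXT PRIMARY KEY,  -- timestamp of the reading
        consumption_prediction REAL,              -- kWh
        consumption_delta      REAL,              -- kWh
        production_prediction  REAL,              -- kWh
        production_delta       REAL               -- kWh
    )
    """
)

# Rows copied from Table XIII.
rows = [
    ("2019/04/11 10:00:00", 0.115, 0.06, 0.0, 0.0),
    ("2019/05/11 10:01:00", 0.080, -0.03, 0.0, 0.0),
    ("2019/05/11 22:03:00", 0.099, 0.11, 0.066, -0.07),
    ("2019/05/12 09:59:00", 0.116, -0.04, 0.054, 0.05),
]
conn.executemany("INSERT INTO sm_predictions VALUES (?, ?, ?, ?, ?)", rows)

# Hypothetical query: readings whose consumption delta exceeds a threshold
# (0.05 kWh is an arbitrary value chosen for this example).
THRESHOLD_KWH = 0.05
flagged = conn.execute(
    "SELECT ts, consumption_delta FROM sm_predictions "
    "WHERE ABS(consumption_delta) > ? ORDER BY ts",
    (THRESHOLD_KWH,),
).fetchall()

for ts, delta in flagged:
    print(ts, delta)
```

A DC-side table in the style of Table XIV would add an `SM_ID` column so that records from several meters can share one repository.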

Table XV shows the results of the Energy Theft prediction compared with the observations. The values for the consumption and production predictions are percentages of correctly classified data.

Table XV. DC results in the Energy Theft Prediction data analytic application.

| SM_ID | Consumption predicted (%) | Production prediction (%) |
|-------|---------------------------|---------------------------|
| 1 | 88.35 | 93.12 |
| 2 | 89.12 | 92.15 |
| 3 | 87.23 | 92.63 |
| 4 | 90.01 | 92.27 |

Table XVI shows the overall number of correct and incorrect classifications in the detection of energy theft in energy consumption and production. The results are an average of 30 experiments across the 4 SMs.

Table XVI. Confusion matrix of the energy theft detection.

| | Predicted positives | Predicted negatives |
|---|---------------------|---------------------|
| Observed positives | 19421 | 2009 |
| Observed negatives | 2047 | 19723 |
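The confusion matrix in Table XVI can be summarised with the usual classification metrics. The sketch below is a worked calculation rather than part of the original study; it assumes that rows are observations and columns are predictions, and the variable names are illustrative.

```python
# Counts copied from Table XVI. Reading rows as observations and columns as
# predictions is an assumption about the layout of the table.
TP, FN = 19421, 2009   # observed positives: predicted positive / negative
FP, TN = 2047, 19723   # observed negatives: predicted positive / negative

total = TP + FN + FP + TN
accuracy = (TP + TN) / total                # share of correct classifications
precision = TP / (TP + FP)                  # predicted positives that were real
recall = TP / (TP + FN)                     # real positives that were detected
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy  = {accuracy:.4f}")
print(f"precision = {precision:.4f}")
print(f"recall    = {recall:.4f}")
print(f"F1 score  = {f1:.4f}")
```

Under this reading the accuracy is 39144 / 43200, about 90.61%.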

The methodology is similar to that described in Section 4.2, with the same states, actions, rewards, and value function. Table XVII shows the results of the 30-day experiment using 4 SMs and 1 DC, set up in the same way as in Sections 4.1 and 4.2.

Table XVII. Learning rates in the Energy Theft Classification.

| SM | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 |
|----|----|----|----|----|----|----|----|----|----|-----|-----|-----|
| 1 | 87.23 | 87.65 | 87.87 | 88.07 | 88.29 | 88.73 | 88.99 | 90.12 | 90.43 | 90.61 | 90.735 | 90.735 |
| 2 | 87.11 | 87.33 | 87.67 | 87.91 | 88.13 | 88.44 | 88.69 | 88.95 | 90.12 | 90.24 | 90.39 | 90.635 |
| 3 | 88.13 | 88.22 | 88.37 | 88.51 | 88.68 | 88.81 | 89.11 | 89.27 | 89.43 | 89.93 | 89.93 | 89.93 |
| 4 | 90.17 | 90.23 | 90.31 | 90.47 | 90.63 | 90.77 | 90.88 | 90.94 | 91 | 91.05 | 91.09 | 91.14 |
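As a quick check on the learning rates in Table XVII, the mean of the first (Q1) and last (Q12) columns can be computed across the four SMs. This is a worked calculation on the table values as read here, not code from the paper, and the variable names are illustrative.

```python
from statistics import mean

# First (Q1) and last (Q12) columns of Table XVII, one value per SM, as read
# from the table. The variable names are illustrative.
q1 = {1: 87.23, 2: 87.11, 3: 88.13, 4: 90.17}
q12 = {1: 90.735, 2: 90.635, 3: 89.93, 4: 91.14}

initial_score = mean(q1.values())
final_score = mean(q12.values())
gain = final_score - initial_score  # improvement in percentage points

print(f"Q1 mean:  {initial_score:.2f}%")
print(f"Q12 mean: {final_score:.2f}%")
print(f"gain:     {gain:.2f} points")
```

Under this reading the column means come to approximately 88.16% and 90.61%, a gain of roughly 2.45 percentage points.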

The results show that the architecture can learn effectively in this data analytic application: the energy theft detection process alone reached an overall production/consumption score of 88.16% correct cases, and RL improved this score to 90.61%.

5. Conclusions

The next generation of applications for SG and SMS needs improved analytic processing. A feasible solution is to segment the AMI into tiers to increase processing, storage and communication capabilities; data velocity and granularity are also important for processing data analytics effectively. The architecture described in this paper showed that an Edge-Fog-Cloud architecture is a feasible solution for SMS applications, offering data processing advantages through the use of multiple tiers. The resulting methodology can perform data analytics for a variety of SMS applications. Here, the authors tested the system architecture with three different applications (consumption/production energy forecasting, power quality, and energy theft). The score of the first application improved from 51.09% to 63.29%, that of the second from 90.53% to 98.17%, and that of the last from 88.16% to 90.61%. The results suggest that other processes related to energy production, transmission, distribution, and consumption can benefit from the proposed architecture. Moreover, this work showed that it is possible to implement an SMS multi-tier architecture for data analytics using both real-time and delayed-time data. The proposed architecture is versatile and can be used to implement different training and learning procedures for developing on-line and off-line models that adapt to different environments by using reinforcement learning.

Acknowledgement

This work was supported by Tecnológico Nacional de México [grant 6385.19-P].

References

[1] G. Giaconi, D. Gunduz and H. V. Poor, "Privacy-Aware Smart Metering: Progress and Challenges," in IEEE Signal Processing Magazine, vol. 35, no. 6, pp. 59-78, Nov. 2018. doi: 10.1109/MSP.2018.2841410

[2] J. Zhou, R. Qingyang Hu and Y. Qian, "Scalable Distributed Communication Architectures to Support Advanced Metering Infrastructure in Smart Grid," in IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 9, pp. 1632-1642, Sept. 2012. doi: 10.1109/TPDS.2012.53

[3] Jay D. Hmielowski, Amanda D. Boyd, Genevieve Harvey, Jinho Joo, “The social dimensions of smart meters in the United States: Demographics, privacy, and technology readiness” in Energy Research & Social Science, Vol. 55, 2019, pp. 189-197, ISSN: 2214-6296, doi: https://doi.org/10.1016/j.erss.2019.05.003

[4] Abdelazeem A. Abdelsalam, Hossam A. Gabbar, Farayi Musharavati, Shaligram Pokharel, “Dynamic aggregated building electricity load modeling and simulation” in Simulation Modelling Practice and Theory, Vol. 42, 2014, pp. 19-31, ISSN: 1569-190X, doi: https://doi.org/10.1016/j.simpat.2013.12.005

[5] Lulu Wen, Kaile Zhou, Shanlin Yang, Lanlan Li, “Compression of smart meter big data: A survey” in Renewable and Sustainable Energy Reviews, Vol. 91, 2018, pp. 59-69, ISSN: 1364-0321, doi: https://doi.org/10.1016/j.rser.2018.03.088

[6] Xiufeng Liu, Lukasz Golab, Wojciech Golab, Ihab F. Ilyas, and Shichao Jin, “Smart Meter Data Analytics: Systems, Algorithms, and Benchmarking”, ACM Trans. Database Syst. 42, 1, Article 2 (November 2016), pp. 1-39. doi: https://doi.org/10.1145/3004295

[7] T. Vafeiadis, K.I. Diamantaras, G. Sarigiannidis, and K.Ch. Chatzisavvas, “A comparison of machine learning techniques for customer churn prediction”, in Simulation Modelling Practice and Theory, Vol. 55, 2015, pp. 1-9, ISSN: 1569-190X, doi: https://doi.org/10.1016/j.simpat.2015.03.003

[8] D. S. Linthicum, "Connecting Fog and Cloud Computing," in IEEE Cloud Computing, vol. 4, no. 2, pp. 18-20, March-April 2017. doi: 10.1109/MCC.2017.37

[9] Miodrag Forcan, Mirjana Maksimović, “Cloud-Fog-based approach for Smart Grid monitoring” in Simulation Modelling Practice and Theory, 2019, ISSN: 1569-190X, doi: https://doi.org/10.1016/j.simpat.2019.101988

[10] I. Satoh, “A Framework for Edge Computing on Smart Meters”, 2018 AEIT International Annual Conference, Bari, 2018, pp. 1-6. doi: 10.23919/AIET.2018.8577317

[11] Amir-Mohsen Karimi-Majd, Masoud Mahootchi, Amir Zakery, “A reinforcement learning methodology for a human resource planning problem considering knowledge-based promotion” in Simulation Modelling Practice and Theory, Vol. 79, 2017, pp. 87-99, ISSN: 1569-190X, doi: https://doi.org/10.1016/j.simpat.2015.07.004

[12] nD-enerserve GmbH, “SmartPi”, [Online]. Available at: https://www.enerserve.eu/en/smartpi.html

[13] Latte Panda, “Latte Panda Alpha”, [Online]. Available at: https://www.lattepanda.com/

[14] R. Mohammad, "AMI Smart Meter Big Data Analytics for Time Series of Electricity Consumption," 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), New York, NY, 2018, pp. 1771-1776. doi: 10.1109/TrustCom/BigDataSE.2018.00267

[15] Zufferey T., Ulbig A., Koch S., Hug G., “Forecasting of Smart Meter Time Series Based on Neural Networks”, in: Woon W., Aung Z., Kramer O., Madnick S. (eds) Data Analytics for Renewable Energy Integration. DARE 2016. Lecture Notes in Computer Science, Vol. 10097. Springer, Cham. doi: https://doi.org/10.1007/978-3-319-50947-1_2

[16] S. Singh and A. Yassine, “Big Data Mining of Energy Time Series for Behavioral Analytics and Energy Consumption Forecasting”, in Energies, vol. 11, no. 2, 2018. doi: 10.3390/en11020452

[17] S. V. Oprea, A. Bâra and V. Diaconita, "Sliding Time Window Electricity Consumption Optimization Algorithm for Communities in the Context of Big Data Processing," in IEEE Access, vol. 7, pp. 13050-13067, 2019. doi: 10.1109/ACCESS.2019.2892902

[18] Marwa F. Mohamed, Abd El-Rahman Shabayek, Mahmoud El-Gayyar, Hamed Nassar, “An adaptive framework for real-time data reduction in AMI” in Journal of King Saud University - Computer and Information Sciences, Vol. 31, Issue 3, 2019, pp. 392-402, ISSN: 1319-1578, doi: https://doi.org/10.1016/j.jksuci.2018.02.012

[19] StatsModels, StatsModels Statistics in Python. [Online]. Available at: https://www.statsmodels.org/stable/index.html

[20] W. Zhu, W. Yu, B. Kan and G. Liu, "Smart Meter Data Analytics Based on Modified Streaming k-Means," 2017 3rd International Conference on Big Data Computing and Communications (BIGCOM), Chengdu, 2017, pp. 328-333. doi: 10.1109/BIGCOM.2017.49

[21] S. Joseph and E. A. Jasmin, "Stream computing framework for outage detection in smart grid," 2015 International Conference on Power, Instrumentation, Control and Computing (PICC), Thrissur, 2015, pp. 1-5. doi: 10.1109/PICC.2015.7455744

[22] D. Saxena, K.S. Verma, and S.N. Singh, “Power quality event classification: an overview and key issues” in International Journal of Engineering, Science and Technology, Vol. 2, No. 3, 2010, pp. 186-199. [Online]. Available at: https://pdfs.semanticscholar.org/a4fb/cb3166e9f8c7e126d993df5d101018ef630c.pdf

[23] F. A. A. Alseiari and Z. Aung, "Real-time anomaly-based distributed intrusion detection systems for advanced Metering Infrastructure utilizing stream data mining," 2015 International Conference on Smart Grid and Clean Energy Technologies (ICSGCE), Offenburg, 2015, pp. 148-153. doi: 10.1109/ICSGCE.2015.7454287

[24] Y. Hong, W. M. Liu and L. Wang, "Privacy Preserving Smart Meter Streaming Against Information Leakage of Appliance Status," in IEEE Transactions on Information Forensics and Security, vol. 12, no. 9, pp. 2227-2241, Sept. 2017. doi: 10.1109/TIFS.2017.2704904

[25] Soma Shekara Sreenadh Reddy Depuru, Lingfeng Wang, Vijay Devabhaktuni, “Electricity theft: Overview, issues, prevention and a smart meter based approach to control theft” in Energy Policy, Vol. 39, Issue 2, 2011, pp. 1007-1015, ISSN: 0301-4215, doi: https://doi.org/10.1016/j.enpol.2010.11.037

[26] S. Singh, A. Yassine and S. Shirmohammadi, "Incremental mining of frequent power consumption patterns from smart meters big data," 2016 IEEE Electrical Power and Energy Conference (EPEC), Ottawa, ON, 2016, pp. 1-6. doi: 10.1109/EPEC.2016.7771716