Applied Energy 238 (2019) 1337–1345
A time series clustering approach for Building Automation and Control Systems
Gerrit Bode⁎, Thomas Schreiber, Marc Baranski, Dirk Müller

RWTH Aachen University, E.ON Energy Research Center, Institute for Energy Efficient Buildings and Indoor Climate, Aachen, Germany

⁎ Corresponding author. E-mail address: [email protected] (G. Bode).

https://doi.org/10.1016/j.apenergy.2019.01.196
Received 22 May 2018; Received in revised form 21 January 2019; Accepted 22 January 2019
0306-2619/ © 2019 Elsevier Ltd. All rights reserved.
HIGHLIGHTS

• Unsupervised clustering algorithms can be applied to labeling building energy data.
• They cannot reach the performance of supervised alternatives.
• Results of clustering provide insights into the data structures found in buildings.
• Auto-encoders provide a stable, independent alternative for feature selection.
ARTICLE INFO

Keywords: Big data; Unsupervised machine learning; Building automation and control; Time series clustering; Feature extraction

ABSTRACT
Structured data of all sensors and actuators are a requirement for decisions about control strategies and efficiency optimization in building automation. In practice, the analysis of the data is a challenging and time-consuming task. In previous work, it has been demonstrated that classification algorithms can reach high classification accuracies when applied to building data. However, supervised algorithms require labelled training data sets and predefined classes, and depend highly on the selection of input features. In this paper, we investigate how unsupervised machine learning techniques can be used to tackle both the classification of time series and the problem of feature selection. We present a selection of the most promising algorithms and apply them to data extracted from the E.ON Energy Research Center. We then compare an unsupervised feature extraction to the statistical features used in previous literature by comparing the classification results on different data sets. Our investigations show that the unsupervised methods we apply do not find data clusters that represent the pre-defined class labels. They are, however, able to find groups of similar data points, showing that clustering is possible in general and that the time series have distinguishable properties. We also see a more robust performance of the classification algorithms when unsupervised feature extraction is used. The results of this paper show that unsupervised machine learning algorithms cannot generally mitigate the issue of missing training data. However, they can improve supervised classification by providing a more robust set of features compared to manual selection. From the clusters that were found, we derive insights into the properties of the time series that allow us to better assess which information can be extracted using data-driven algorithms.
1. Introduction

Buildings have a tremendous impact on our environment, as they consume more than 40% of the total energy used in the USA and the EU [1]. Suboptimal control strategies are responsible for about 40% of the energy losses in state-of-the-art Building Automation and Control Systems (BACS) [2]. As BACS have become increasingly complex and the amount of
gathered sensor streams and meta data has grown exponentially, the initial configuration of the monitoring system has become an increasingly time-consuming and expensive task, yet reliable data is fundamental. Unfortunately, configuration changes and issues with the data monitoring and storage system often lead to low data quality [3]. With the rise of the Internet of Things, it is to be expected that the number of sensors will increase further, yet the separation of the hardware installation, i.e. mounting a sensor, and the configuration, i.e.
adding the sensor into a system in the cloud, will increase the need for a clear identification process for sensor signals [4,5].

Currently, the most common process for the identification and labelling of time series data within BACS is manual effort by a system engineer during the commissioning of the system. The system engineer creates data point labels and links them to the actual data streams. However, with BACS now containing several thousand data points, input errors become almost unavoidable, and many state-of-the-art technologies like model-predictive control and fault detection and diagnosis depend on the correctness of the input data. As a result, wrong data labels may, for example, lead to wrong inputs into control strategies and optimizations, resulting in wrong control actions or wrong values calculated for the performance of a system. This in turn leads to poor overall performance of the building's energy systems.

To mitigate this issue, and in order to improve the efficiency of engineers facing this task, algorithms from the field of machine learning have been applied in several studies [6–8], which were able to show that artificial intelligence has a huge potential for time series classification in BACS data. A major obstacle for those techniques is that engineers working with BACS data usually face a huge amount of unlabelled data, and data differs between buildings, as buildings are highly individual and many different standards for the labelling of data points exist [9]. Unlike the supervised algorithms used in the above-mentioned studies, unsupervised techniques do not require labelled training data, but rely purely on similarities in the time series.

In this paper, we investigate the use of unsupervised machine learning techniques for time series identification and classification in BACS. We also demonstrate how an unsupervised machine learning technique, the auto-encoder, can be used to mitigate issues that arise from the feature selection process and how this can be used to improve the previous work in the field.

2. Related work

With the increasing availability of data storage and computing power, a growing interest in machine learning for big data analysis can be observed [10]. Classification is a major research area in supervised machine learning, in which correct class labels are necessary for the training of the algorithms. Time series classification is a problem faced by researchers from various domains such as physics, medicine, biology or finance, and the knowledge discovered in this field has increased steadily over the past decades [11]. Reviews of supervised machine learning techniques can be found in [12,13]. For building energy systems, classifiers are a valuable tool, e.g. for the identification of certain loads occurring in a building [14], or the detection and classification of occupants and occupant behaviour [15]. A complete review of classification approaches for building energy consumption is provided in [16].

Supervised techniques have also been applied to the problem of data point identification. In [8], promising results were achieved on a data set extracted from the BACS of the E.ON Energy Research Center. The authors tested time series classification with a variety of supervised machine learning algorithms. Their work is based on statistical features of the time series; the features served as representations for the applied classifiers. For the selection of the features, they followed the recommendation published in [17].
The best classifier they tested achieved an overall classification accuracy of 73.2%, and for some classes even 100%.

Besides supervised machine learning, unsupervised machine learning is another major research area. Unsupervised techniques rely purely on similarities in the data; it is not required to label the training data set manually. These techniques are in focus if labelled training data or correct class labels are not available or hard to create. In time series clustering, the goal is to find groups within the data without using pre-defined class labels during the training of the algorithms. An extensive review of unsupervised clustering methods can be found in [18].

For both supervised classification and unsupervised clustering, the accuracy of feature-based techniques highly depends on the selected features [19]. Features describe certain aspects of a time series in lower dimensionality, e.g. reducing a time series to its average [20]. It is plausible that a small group of features is not representative of the underlying time series. Considering this issue, there are aspirations to automate the feature extraction step [21]. Feature selection provides significant improvements in energy applications like forecasting [22] and fault detection [23] and has improved accuracy by up to 30% [24]. In order to find better representations for time series classification, a deep convolutional neural network based feature extraction technique was proposed in [25]. Convolutional neural networks are able to learn recurring shapes and patterns in time series, a technique which does not depend on hand-selected features as representations of the data. The proposed classifier achieved better results than 13 of the 14 benchmark classifiers (average accuracy over all data sets) and achieved the highest accuracy on 10 of the UCR data sets. The UCR archive is a freely accessible database widely accepted for classification performance measurement [26]. Unfortunately, the archive does not yet contain a BACS data set which one could use to evaluate a classifier [27].

Previous work on unsupervised feature learning has shown that it is possible to train convolutional neural networks in an unsupervised way using an auto-encoder design [28]. Auto-encoders consist of an encoder part reducing the data and a decoder part restoring the encoded data to its original time series. They learn to reconstruct the input by minimizing a cost function. By removing the decoder part, we obtain a low-dimensional representation of the data that, like statistical features, is processable by unsupervised clustering algorithms [29]. There is a large variety of designs for unsupervised neural networks, such as self-organizing maps. However, most of them do not work on raw time series data [30] and, unlike auto-encoders, depend on hand-selected features, which are the biggest factor in clustering accuracy. In the field of building energy systems, auto-encoders have been applied to fault detection and diagnosis [31], but to the best of our knowledge their impact on time series classification accuracy has not yet been investigated. For a deeper understanding of auto-encoder designs and convolutional neural networks, we recommend the following papers [25,28,32].

Aside from feature-based approaches, algorithms exist that do not require feature extraction but rather work with the raw time series data. A common example of these techniques is dynamic time warp (DTW). DTW is highly competitive in similarity measurement of time series and, combined with a clustering algorithm, is widespread in the data mining community [33]. It has been used for classification in a variety of different applications, for example on data gathered from machine learning repositories [34] and on UCR data sets [35], and has been applied to identify similarities in non-intrusive load measurements [36].
3. Data set

In this section, we give an introduction to our data set. All data we use in this paper is extracted from the database of the BACS installed in the E.ON Energy Research Center located in Aachen, Germany. The building and the energy system are described in more detail in [37]. The system contains more than 9000 sensors and actuators. Their measurement values are stored in an event-based database. All data points are labelled with one of 22 class labels; an overview is presented in Table 1. The labels were manually selected after the BACS was installed; the classes are the result of a plausibility consideration by a domain expert. A detailed description of the monitoring system is provided in [38].

With 5142 samples, December is the month with the highest data output in 2015 and 2016; therefore, we choose to apply all clustering techniques on data collected in this month. After removing incorrectly recorded samples, we obtain 3822 time series. We apply zero-order hold interpolation in order to obtain time series of equidistant values and equal length. As a result of the trade-off between calculation speed, resolution and observation time, we choose to investigate two different time frames and resolutions for the data:

Long term observations: 20 days with an interval of 15 min.
Short term observations: one day with an interval of 60 s.

In order to observe the dynamics of the system, the resolution should be as high as possible. However, due to the latency in the measurement system and the cost of transmitting and storing the data, wide-spread use of such high resolutions in BACS is unlikely. Algorithms will therefore work with larger patterns rather than the dynamics of the measurements. While some classes, like the CO2 concentration in the rooms, show their representative patterns over the course of one day, others, e.g. heating flow temperatures, can depend on ambient factors and will show their representative sequences over several days or weeks. Both resolution and observation time affect the number of entries in a time series. Especially the speed of the computationally expensive DTW calculation (Section 4.2) depends on the length of the time series.
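To make the preprocessing concrete, the following sketch (our own helper, not from the paper) resamples one event-based data point onto the two equidistant grids described above. It assumes the events are available as a pandas Series with a DatetimeIndex:

```python
# Hypothetical sketch: zero-order hold resampling of one event-based
# data point to the two time frames used in this paper.
import pandas as pd

def to_equidistant(events: pd.Series, freq: str, periods: int) -> pd.Series:
    """Resample an event-based series (DatetimeIndex -> value) to a
    fixed interval using zero-order hold (forward fill)."""
    start = events.index[0].floor(freq)
    grid = pd.date_range(start=start, periods=periods, freq=freq)
    # Reindex onto the equidistant grid; ffill holds the last event value.
    return events.reindex(events.index.union(grid)).ffill().loc[grid]

# Long term observations: 20 days at 15 min -> 1920 values per series.
# long_term = to_equidistant(events, freq="15min", periods=20 * 24 * 4)
# Short term observations: one day at 60 s -> 1440 values per series.
# short_term = to_equidistant(events, freq="60s", periods=24 * 60)
```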
Table 1
Manually selected classes of data points in the E.ON Energy Research Center.

No.  Name         Description
1    AL           Fault/Alarm/Maintenance message
2    C            Counter
3    CO2          CO2 concentration
4    HF           Heat flow
5    OM           Operating message (On, Off, Opened, Closed)
6    P            Power
7    pressure     Pressure
8    revs         Revolutions/frequency
9    rhBAS        Relative humidity
10   SP_O         Set point (operation/request/release)
11   SP_Percent   Set point in percent
12   SP_T         Temperature set point
13   SP_T_Pot     Temperature set point via potentiometer
14   T_g          Temperature of gaseous fluid
15   T_l          Temperature of liquid fluid
16   Vdot_g       Volume flow for gaseous fluid
17   Vdot_l       Volume flow for liquid fluid
18   VOC          Volatile organic compounds measured by BAS
19   VP           Device status in percent: valve position
20   w            Electric work
21   WSP_percent  Working set point (0–100) and positioning
22   WSP_T        Working set point (temperature)
4. Methodology

In the following section, we present the application of selected unsupervised clustering approaches. We implement both manual feature selection and unsupervised feature extraction based on an auto-encoder. In the last part, we apply dynamic time warp. We implement all data analysis on an ordinary desktop computer (CPU: Intel Core i5-2500K, 3.3 GHz; RAM: 8 GB) using open-source Python libraries. The clustering algorithms are provided by scikit-learn [39]. For the implementation of the auto-encoder, we use Keras with a TensorFlow backend [40]. For basic mathematical operations like the statistical feature extraction we use NumPy and SciPy [41]; for the visualisation of the results, we use Matplotlib [42] in combination with t-SNE for high-dimensional data [43]. In Fig. 1, we present an overview of the methodology. For the feature-based approach (1), we test several combinations of features, data set pre-processing and algorithms. In order to evaluate the performance, we compare the resulting clusters with the 22 predefined classes presented in Table 1, assuming that these classes are correct. Since unsupervised clusters are not labelled, we assume that the cluster label is equal to the most represented predefined class found in that cluster.

Fig. 1. Work-flow of the raw-data-based and the feature-based clustering techniques.
4.1. Feature-based clustering

In order to compare our results with the promising supervised approach published in [8], we use the same statistical feature selection. The recommended features are: mean value, variance, skewness, kurtosis, minimum and maximum. They were suggested for such an application in [17].
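A minimal sketch of this feature extraction with NumPy and SciPy (the function name is ours):

```python
# Reduce each equidistant time series to the six statistical features
# used as its representation: mean, variance, skewness, kurtosis, min, max.
import numpy as np
from scipy import stats

def statistical_features(series: np.ndarray) -> np.ndarray:
    """Six-dimensional feature vector for one time series."""
    return np.array([
        np.mean(series),
        np.var(series),
        stats.skew(series),
        stats.kurtosis(series),
        np.min(series),
        np.max(series),
    ])

# Feature matrix for all 3822 time series (rows: samples, columns: features).
# X = np.vstack([statistical_features(ts) for ts in time_series])
```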
4.1.1. Unsupervised feature extraction

In addition to the manually selected statistical features, we train a deep convolutional auto-encoder for unsupervised feature extraction. Auto-encoders are based on neural networks and achieve promising results in a variety of unsupervised feature learning tasks. By minimizing a binary cross-entropy between input and output, we train the neural networks to first compress the input into a lower dimensionality and then reconstruct the input. Fig. 2 shows the basic design of the auto-encoder. In deep convolutional auto-encoders, convolutional layers are stacked with max pooling layers. By applying a defined number of filters to sections of the data, convolutional neural networks are able to learn recurring shapes and patterns in the time series. Max pooling layers then reduce the dimension by only letting the highest filtered value pass. This way, only the most pronounced shape in the data is passed on to the next stage.

For validation of the design, we test two different designs and compare their reconstruction accuracy [40]: one very basic design, which generates a representation of 10 float numbers from the time series input, and a highly competitive classifier based on multi-scale convolutional neural networks (MCNN) [25]. In contrast to the clustering algorithms, neural network training depends heavily on the size of the data set. Therefore, we use the complete monitoring data of 2015 and 2016 for the training. Both designs are trained for 6 epochs on a GeForce GTX 560 (Keras supports calculations on graphics processing units), taking 28 h each. One epoch is defined as one training cycle in which the neural networks see all data from the data set. Since MCNN does not achieve a higher accuracy than the basic auto-encoder design (MCNN_acc = 0.3556; Basic_acc = 0.3567) in our test case, we use the basic design. After training, we separate encoder and decoder and use the encoder for the unsupervised feature extraction.

Fig. 2. Design of a basic auto-encoder model. The encoder compresses the input time series to a small representation, while the decoder learns to reconstruct the input from it.
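The paper does not publish its exact layer configuration, so the following Keras sketch is an assumed minimal design with the same ingredients: stacked convolution and max pooling layers in the encoder, a 10-float bottleneck, a mirrored decoder, and a binary cross-entropy loss (which presumes inputs scaled to [0, 1]):

```python
# Assumed architecture, not the paper's exact one: a basic convolutional
# auto-encoder with a 10-float bottleneck for one-day series (1440 values).
from tensorflow.keras import layers, models

def build_autoencoder(length: int = 1440) -> models.Model:
    inp = layers.Input(shape=(length, 1))
    # Encoder: stacked convolution and max pooling layers.
    x = layers.Conv1D(16, 5, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(4)(x)
    x = layers.Conv1D(4, 5, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(4)(x)                 # length/16 time steps remain
    x = layers.Flatten()(x)
    code = layers.Dense(10, name="code")(x)       # 10-float representation
    # Decoder: mirror the encoder to reconstruct the input.
    x = layers.Dense((length // 16) * 4, activation="relu")(code)
    x = layers.Reshape((length // 16, 4))(x)
    x = layers.UpSampling1D(4)(x)
    x = layers.Conv1D(16, 5, padding="same", activation="relu")(x)
    x = layers.UpSampling1D(4)(x)
    out = layers.Conv1D(1, 5, padding="same", activation="sigmoid")(x)
    autoencoder = models.Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
    return autoencoder

# After training, the encoder alone provides the unsupervised features:
# ae = build_autoencoder()
# encoder = models.Model(ae.input, ae.get_layer("code").output)
```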
4.1.2. The clustering algorithms

The algorithms we apply and their initial parameters are presented in Table 2. Corresponding to the 22 class labels in the data set, we choose 22 clusters as the initial parameter whenever possible. For the algorithms which do not use the number of clusters as an initial parameter, we select the parameter (for example, the bandwidth) so that they find 22 clusters in the data as well. We do not perform hyper-parameter tuning. Instead, we use the default values recommended in the documentation of scikit-learn [39]. The accuracy of some of the algorithms depends on data set pre-processing like standardisation or normalisation. In order to find the combination of data set pre-processing and algorithm which achieves the highest clustering accuracy, we perform a sensitivity analysis. The algorithms run very fast in our use case, except for Spectral Clustering, Ward and Affinity Propagation, which take comparatively long.

Table 2
Applied clustering algorithms [39].

Name                 Parameter
GaussianMixture      Clusters
MiniBatchKMeans      Clusters
Ward                 Clusters, connectivity
SpectralClustering   Clusters
Birch                Clusters, threshold
KMeans               Clusters
AffinityPropagation  Damping, preference
MeanShift            Bandwidth
DBSCAN               Neighbourhood range
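As an illustration, one configuration from this sensitivity analysis might look as follows (default scikit-learn parameters, 22 clusters, optional standardisation; the helper is ours):

```python
# Minimal sketch of one clustering configuration.
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler

def cluster_features(X, n_clusters: int = 22, standardise: bool = True):
    """Cluster the feature matrix X (one row per time series)."""
    if standardise:
        X = StandardScaler().fit_transform(X)
    return MiniBatchKMeans(n_clusters=n_clusters).fit_predict(X)
```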
4.2. Dynamic time warp
In contrast to the feature-based approach, the raw-time-series-based technique DTW (2) processes the time series directly. After we calculate 22 representative centroids from the raw time series data, we calculate the nearest neighbour for each time series. The result is a set of predicted clusters, which we compare with the predefined classes as well. DTW is based on the euclidean distance, where q_i and c_i are the equidistant values of two time series Q and C:

D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}    (1)

DTW finds the best match between two time series by optimizing the euclidean distance calculations between them [33]. For a detailed description of the technique, we recommend the following papers [33,44]. DTW is more robust, and in fact the method outperforms euclidean distances in speed and accuracy on all data sets made available in the UCR time series archive [26]. The calculation of the centroids takes about 31 h; calculating the nearest neighbour for each of the 3822 time series takes another 30 h.
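For illustration, a minimal textbook implementation of the DTW dynamic program (without the windowing and lower-bounding speed-ups discussed in [33]) could look as follows; the squared local cost and final square root mirror the euclidean formulation of Eq. (1):

```python
# Textbook O(n*m) dynamic time warp sketch, returning the optimal
# alignment cost between two equidistant time series q and c.
import numpy as np

def dtw_distance(q: np.ndarray, c: np.ndarray) -> float:
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # The best of insertion, deletion and match extends the warping path.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))
```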
4.3. Clustering performance evaluation

For accuracy measurement, we calculate the Adjusted Rand Index (ARI) between the pre-defined classes and the predicted clusters. The ARI is a key performance indicator for clustering techniques and takes values between zero and one, where a high ARI indicates a high similarity between two assignments. It is a slightly modified version of the Rand index (RI), which ignores permutations.
The RI is the relationship between (a), the number of pairs assigned to the same cluster in both assignments, plus (b), the number of pairs assigned to different clusters in both assignments, and C_2^{n_{samples}}, the total number of possible pairs in the data set [39]:

RI = \frac{a + b}{C_2^{n_{samples}}}    (2)

Another key performance indicator in data clustering is homogeneity (H). A high homogeneity compared to the ground truth indicates that each cluster contains mostly members of a single class. We use this figure for an additional cluster analysis in Section 5.4. It is based on the relationship between the conditional entropy of the classes (C) given the cluster assignments (K), H(C|K), and the entropy of the classes, H(C) [39]:

H = 1 - \frac{H(C|K)}{H(C)}    (3)

Both the ARI and the homogeneity calculations are implemented in scikit-learn.
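Both metrics can be computed directly with scikit-learn; the variable names below are placeholders:

```python
# Eq. (2) in its adjusted form and Eq. (3), as implemented in scikit-learn.
from sklearn.metrics import adjusted_rand_score, homogeneity_score

# y_true: the 22 predefined class labels; y_pred: predicted cluster labels.
# ari = adjusted_rand_score(y_true, y_pred)
# h = homogeneity_score(y_true, y_pred)
```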
5. Results

We compare all combinations of algorithms, features and data set pre-processing to obtain the best possible result. Then, we compare this result to both the previous supervised approaches and a dynamic time warp implementation. We further investigate the results of the best clustering approach by evaluating the confusion matrix and by increasing the number of clusters to obtain more information on the data.
5.1. Feature-based clustering

First, we use the statistical features as time series representation. For the 20-day data set, the ARIs of the results are shown in Fig. 3. The accuracies range from 22.9% (Mini batch k-means without any data set pre-processing) down to zero for Birch, Affinity Propagation and DBSCAN, which depend on data set pre-processing. Gaussian Mixture with data set normalization achieves almost the same accuracy as Mini batch k-means. The analysis of the short term data set shows a similar distribution with slightly higher values. We compare the results of clustering with Mini batch k-means for the short and long term time frames in Fig. 5.

Fig. 3. Accuracy of clustering algorithms using statistical features.

Fig. 4 shows the results using the encoded representations generated by the auto-encoder approach. Using this kind of representation, more algorithms achieve low accuracies below 5%. On the other hand, Mini batch k-means, which already achieved the highest performance using statistical features, outperforms every other algorithm and achieves an accuracy of 38.28%. Data set pre-processing proves to be beneficial for the encoded features.

Fig. 4. Accuracy of clustering algorithms using unsupervised extracted features.
5.2. Dynamic time warp

Fig. 5 shows the result of the most accurate clustering algorithm, Mini batch k-means, for both data and feature sets, together with the results of the dynamic time warp algorithm. In our test case, the computationally expensive DTW technique does not achieve a higher accuracy than the other techniques. The technique beats statistical features for the long term time frame (30.8%) but is outperformed by both feature-based methods in the short term time frame (28.31%), where statistical features achieve 41.6% and unsupervised extracted features achieve 37.4%.

Fig. 5. Accuracy of Mini batch k-means and DTW depending on the time frame.
5.3. Confusion matrix

Additionally, we create a modified confusion matrix to analyse the most accurate method. In Fig. 6, we present the comparison between the 22 predefined classes in the data set and the predicted clusters. The clusters are labelled with the class most often associated with them. The classes are then reordered so that a 100% accurate clustering would show a dark red diagonal from the top left corner to the bottom right corner. The algorithm has a high tendency to mix temperatures (T_g, T_l) with their set points (WSP_T, SP_T and SP_T_Pot). The cluster OM includes almost all operating messages, but also SP_T, AL (alert messages) and SP_O (set point). We observe that the clusters are neither complete nor homogeneous. There is a cluster which contains just one kind of P (power) and C (counter), but neither of them completely.

Fig. 6. Confusion matrix of Mini batch k-means with statistical features.
5.4. Homogeneity of clusters

We use homogeneity to determine the actual number of distinguishable subgroups in our data set. If the number of clusters were equal to the number of samples, the homogeneity would be 100%, i.e. each sample would have its own unique cluster, but this is not a useful clustering. Hence, there is a trade-off between homogeneity and useful group information. Fig. 7 shows the homogeneity for an increasing number of clusters, compared to the ground truth. A significant increase in homogeneity is observable until the algorithm distinguishes 100 clusters.

Fig. 7. Homogeneity depending on the number of clusters.
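The sweep behind Fig. 7 can be sketched as follows (the range of cluster counts is our assumption, not stated in the paper):

```python
# Homogeneity of Mini batch k-means for an increasing number of clusters.
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import homogeneity_score

def homogeneity_sweep(X, y_true, cluster_counts=range(22, 201, 10)):
    """Map each cluster count to the homogeneity of the resulting clustering."""
    return {k: homogeneity_score(y_true,
                                 MiniBatchKMeans(n_clusters=k).fit_predict(X))
            for k in cluster_counts}
```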
6. Discussion

6.1. Overall performance of the unsupervised clustering approach

In this section, we take a closer look at the 100 clusters we are able to distinguish and discuss potential modifications to the ground truth. When compared to the pre-defined 22 classes, even in our best performing scenario (Mini batch k-means applied on pre-selected statistical features of one day), the algorithms only achieve an accuracy of 41.6%. This is significantly lower than the supervised approach published in [8], which achieved an overall accuracy of 73.2% on a similar data set, with some classes reaching accuracies up to 100%. From this we conclude that unsupervised learning cannot be used to replace supervised learning for the intended application, as the results are not good enough to be trusted to correctly label time series in a productive setting. However, in contrast to supervised techniques, where the goal is to solve a pre-defined classification problem, our approach is based on inherent similarities of the data points. Hence, if these inherent data point classes do not match the predefined classes, we believe that the poor results can be attributed partly to a mismatch between both sets of classes. We therefore further investigated the results of the unsupervised clustering.
6.2. Data analysis using the clustering results

When increasing the number of allowed clusters, we observe (Fig. 7) an increase in homogeneity until we reach 100 clusters. This is a reason to believe that the actual number of classes that can be distinguished from the data is higher than the assumed 22 classes. By further analysing the clusters found by the algorithms, we can derive information about which classes are actually distinct enough to be found reliably by machine learning algorithms, while simultaneously finding the largest possible discrimination of classes, and therefore provide the most information.

To gain insights into our clusters, we choose t-distributed stochastic neighbour embedding (t-SNE). t-SNE is a dimension reduction algorithm which converts high-dimensional data into lower-dimensional representations [43]. In Fig. 8, we present the visualization of the data set of 3822 samples represented by their statistical features for one day. The legend shows the colour map for the classes according to their labels in the ground truth. In order to analyse the clusters, we plot their 100 centres, marked by X's, side by side with the data in the visualization.

Fig. 8. Two-dimensional visualization of statistical features of 3822 time series. The colour legend shows the classification of the time series according to the ground truth.
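A sketch of this visualisation with scikit-learn and Matplotlib (the plotting details are our own simplification; class labels are assumed to be encoded as integers):

```python
# Project the statistical features to two dimensions with t-SNE and
# overlay the 100 cluster centres (marked with X).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(X, y_true, centres):
    # Embed features and cluster centres together so both share one map.
    emb = TSNE(n_components=2).fit_transform(np.vstack([X, centres]))
    pts, cen = emb[: len(X)], emb[len(X):]
    plt.scatter(pts[:, 0], pts[:, 1], c=y_true, s=4, cmap="tab20")
    plt.scatter(cen[:, 0], cen[:, 1], marker="x", c="black")
    plt.show()
```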
We find a well-developed group at the top middle of the visualization. The algorithm also calculated four of its cluster centres in this region. One example, cluster C1 with mean value one, contains 648 operational messages (OM) and 100 set point actuators (SP_O). SP_O is the set point of OM, and both generate binary values between one (active) and zero (not active). The data set includes 834 OM sensors and 358 SP_O actuators. Based on this observation, we recommend a pre-grouping into digital binary (SP_O, OM), digital multi-state (AL, VP) and data points fed from measurements (T_l, T_g or pressure).

We also find some clusters with 100% homogeneity. C2 is a small cluster with a mean of 3508 containing only volatile organic compounds (VOC). The data set contains 86 such sensors; with a size of 23, the cluster contains a substantial part of them. We observe clear mixtures between liquid temperatures (T_l) and gas temperatures (T_g), for example in cluster C61. The mean value of 22.4 indicates that in C61 the internal room temperatures mix with liquid sensors at a similar temperature level. The data points mentioned, VOC, T_l and T_g, come from measurements and, according to the results published in [8], can be distinguished very well by methods of supervised learning, supporting our assumption that cluster homogeneity is a valid metric to predict classification performance in our use case.

The heating system of the E.ON Energy Research Center includes a liquid temperature level of 35 °C [38]. We find two clusters with 100% homogeneity containing T_l with mean values around 30 °C. The values in cluster C59 (Max: 36.45 °C; Mean: 31.2 °C; Min: 26.5 °C) are slightly higher than the values of the cluster centre of C55 (Max: 35.58 °C; Mean: 30.9 °C; Min: 26.94 °C). At a different temperature level, we observe a similar relationship between the two clusters C25 and C52. These cluster pairs can be interpreted as flow and return values of different heat network temperature levels. T_l sensors were classified 100% correctly in [8]; a more detailed classification seems possible and should be considered.
6.3. Evolution of time series classes

To further enhance the results of time series classification for BACS data, we propose to extend the classification approach to a layered classification system. First, a classification system can divide data points that are easily distinguishable. Further algorithms can then be trained to find specific classes within these subsystems. Because the algorithms can be more finely tuned to certain classes, the overall accuracy can be improved this way. An intuitive first distinction is between physical (e.g. measured values) and non-physical (e.g. set points) data points. In a first experiment, a random forest classifier reached 94% precision in this task using the one-day data set. Using only the values classified as physical, we can improve the average accuracy of a supervised approach to 86.9%, compared to the 73.2% reached in earlier works [8]. The conclusions from the unsupervised algorithms can be applied to further enhance these sub-classifiers and improve the total accuracy of the time series classification for all desired classes. We believe that sub-classifiers can also find distinctions like flow and return temperature within a classified temperature class, since they can be fine-tuned to small distinctions without losing overall accuracy and applicability for the class temperature.
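A sketch of this first layer (the binary labels and the train/test split are hypothetical placeholders, not published by the paper):

```python
# First layer of the proposed layered classification: a random forest
# separating physical (measured) from non-physical (set point) data points.
from sklearn.ensemble import RandomForestClassifier

# X: statistical features of the one-day data set;
# y_physical: hypothetical binary labels (1 = measured value, 0 = set point).
# clf = RandomForestClassifier().fit(X_train, y_train)
# is_physical = clf.predict(X)  # route physical points to the sub-classifiers
```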
6.4. Sensitivity of the algorithms vs. the selected data set

We observe a very different sensitivity of the clustering accuracies to the selection of the time frame and resolution of the data. The accuracy of DTW and of Mini batch k-means clustering based on unsupervised extracted features is almost the same for both data sets. Opposed to that, the accuracy of the statistical feature-based approach depends strongly on the selected time frame. Applied to the one-day observations data set, we achieve a much higher score (41.6%) than for the long term observations data set (22.9%). This leads to the conclusion that the chosen statistical features are very dependent on the observation time and resolution, and are therefore susceptible to any reduction in data fidelity.

7. Conclusion and future work

In this paper, we investigated how unsupervised machine learning can be used to simplify the process of classifying time series data in BACS. To this end, we applied a selection of the most promising, state-of-the-art unsupervised clustering techniques to a real-world BACS time series data set. We tested these techniques on a data set containing data for one day in high resolution and on a data set containing data for 20 days in reduced resolution. We also investigated a dynamic time warp algorithm and compared the results with clustering algorithms applied to both pre-selected statistical features and unsupervised extracted features.

We find that even the unsupervised clustering method with the highest accuracy in our test set-up performs poorly compared to previously developed supervised techniques. We conclude that unsupervised learning cannot be used to avoid the need for a training data set in supervised learning. We also observe that, in our test case, the promising approach of dynamic time warp does not reward its high calculation cost with a higher accuracy when applied to the test data sets. On the other hand, we also observe that unsupervised clustering is possible and generates logically sound clusters. The presented techniques, combined with some basic information about the system, revealed valuable insights about the data structure and about differences and similarities between the measured data points. Implementing these insights into future machine learning designs has a high potential to improve the classification of BACS time series data.

When further investigating the effect of the number of clusters, we found that the homogeneity of the clusters increased significantly until we reached 100 clusters. This suggests that a more granular distinction between the classes is possible or even necessary. Investigating the confusion matrix of the clusters can reveal which classes are distinct and which classes mix easily; from this, an improved class system can be derived. For classes that are indistinguishable from the data alone, we suggest that considering correlations between data points can be a useful addition to increase accuracy. When investigating the influence of unsupervised feature extraction, we found that a convolutional auto-encoder is able to reduce the effect of data fidelity and time frame on the results of the machine learning algorithms. This effect can be used to improve the accuracy and the resilience against time frame selection of any machine learning algorithm applied to BACS data.

Future work in the field will focus on dependencies between the time series. Considering correlations could, for example, help to distinguish actual values from their set values, as these classes have a very high tendency to mix. We also recommend focusing on larger time frames in order to capture representative time series behaviour of all classes in the system. A new set of classes will be defined building on
the information extracted from the unsupervised clustering to better represent the actual data, and will be applied as a ground truth for further investigations.
Acknowledgements

We gratefully acknowledge the financial support by the Federal Ministry for Economic Affairs and Energy (BMWi), promotional reference 03SBE006A.

References

[1] U.S. Energy Information Administration. Annual Energy Outlook 2018. Washington, DC: U.S. Department of Energy; 2018.
[2] International Energy Agency. Transition to sustainable buildings: strategies and opportunities to 2050. Paris: Organisation for Economic Cooperation and Development; 2013.
[3] Zucker G, Habib U, Blöchle M, Judex F, Leber T. Sanitation and analysis of operation data in energy systems. Energies 2015;8(11):12776–94. https://doi.org/10.3390/en81112337.
[4] Chakraborty T, Nambi AU, Chandra R, Sharma R, Swaminathan M, Kapetanovic Z. Sensor identification and fault detection in IoT systems. In: Ramachandran GS, Krishnamachari B, editors. Proceedings of the 16th ACM conference on embedded networked sensor systems, SenSys '18. New York (NY, USA): ACM Press; 2018. p. 375–6. https://doi.org/10.1145/3274783.3275190.
[5] Chakraborty T, Nambi AU, Chandra R, Sharma R, Swaminathan M, Kapetanovic Z, et al. Fall-curve: a novel primitive for IoT fault detection and isolation. Proceedings of the 16th ACM conference on embedded networked sensor systems, SenSys '18. New York (NY, USA): ACM; 2018. p. 95–107. https://doi.org/10.1145/3274783.3274853.
[6] Gao J, Ploennigs J, Berges M. A data-driven meta-data inference framework for building automation systems. Proceedings of the 2nd ACM international conference on embedded systems for energy-efficient built environments, BuildSys '15. New York (NY, USA): ACM; 2015. p. 23–32. https://doi.org/10.1145/2821650.2821670.
[7] Hong D, Wang H, Ortiz J, Whitehouse K. The building adapter: towards quickly applying building analytics at scale. Proceedings of the 2nd ACM international conference on embedded systems for energy-efficient built environments, BuildSys '15. New York (NY, USA): ACM; 2015. p. 123–32. https://doi.org/10.1145/2821650.2821657.
[8] Fütterer J, Kochanski M, Müller D. Application of selected supervised learning methods for time series classification in building automation and control systems. Energy Procedia 2017;122:943–8. https://doi.org/10.1016/j.egypro.2017.07.428.
[9] Stinner F, Kornas A, Baranski M, Müller D. Structuring building monitoring and automation system data. REHVA Eur HVAC J 2018;2018(04):10–5.
[10] Alfred R. The rise of machine learning for big data analytics. 2016 2nd international conference on science in information technology (ICSITech). IEEE; 2016. p. 1. https://doi.org/10.1109/ICSITech.2016.7852593.
[11] Zakaria J, Mueen A, Keogh E. Clustering time series using unsupervised-shapelets. 2012 IEEE 12th international conference on data mining. IEEE; 2012. p. 785–94. https://doi.org/10.1109/ICDM.2012.26.
[12] Kotsiantis SB, Zaharakis ID, Pintelas PE. Machine learning: a review of classification and combining techniques. Artif Intell Rev 2006;26(3):159–90. https://doi.org/10.1007/s10462-007-9052-3.
[13] Kotsiantis SB. Supervised machine learning: a review of classification techniques. Proceedings of the 2007 conference on emerging artificial intelligence applications in computer engineering. Amsterdam (The Netherlands): IOS Press; 2007. p. 3–24. http://dl.acm.org/citation.cfm?id=1566770.1566773.
[14] Verma A, Asadi A, Yang K, Tyagi S. A data-driven approach to identify households with plug-in electrical vehicles (PEVs). Appl Energy 2015;160:71–9. https://doi.org/10.1016/j.apenergy.2015.09.013.
[15] Wang W, Hong T, Li N, Wang RQ, Chen J. Linking energy-cyber-physical systems with occupancy prediction and interpretation through WiFi probe-based ensemble classification. Appl Energy 2019;236:55–69. https://doi.org/10.1016/j.apenergy.2018.11.079.
[16] Wei Y, Zhang X, Shi Y, Xia L, Pan S, Wu J, et al. A review of data-driven approaches for prediction and classification of building energy consumption. Renew Sustain Energy Rev 2018;82:1027–47. https://doi.org/10.1016/j.rser.2017.09.108.
[17] Nanopoulos A, Alcock R, Manolopoulos Y. Feature-based classification of time-series data. Int J Comput Res 2001;10:49–61.
[18] Rani S, Sikka G. Recent techniques of clustering of time series data: a survey. Int J Comput Appl 2012;52(15):1–9. https://doi.org/10.5120/8282-1278.
[19] Fulcher BD, Jones NS. Highly comparative, feature-based time-series classification. CoRR abs/1401.3531.
[20] Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. Springer series in statistics. New York (NY): Springer; 2017.
[21] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1798–828. https://doi.org/10.1109/TPAMI.2013.50.
[22] Wang J, Li Y. Multi-step ahead wind speed prediction based on optimal feature extraction, long short term memory neural network and error correction strategy. Appl Energy 2018;230:429–43. https://doi.org/10.1016/j.apenergy.2018.08.114.
[23] Belaout A, Krim F, Mellit A, Talbi B, Arabi A. Multiclass adaptive neuro-fuzzy classifier and feature selection techniques for photovoltaic array fault detection and classification. Renew Energy 2018;127:548–58. https://doi.org/10.1016/j.renene.2018.05.008.
[24] Feng C, Cui M, Hodge B-M, Zhang J. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting. Appl Energy 2017;190:1245–57. https://doi.org/10.1016/j.apenergy.2017.01.043.
[25] Cui Z, Chen W, Chen Y. Multi-scale convolutional neural networks for time series classification. CoRR abs/1603.06995.
[26] Dau HA, Keogh E, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, et al. The UCR time series classification archive; 2018.
[27] Miller C, Meggers F. The building data genome project: an open, public data set from non-residential building electrical meters. Energy Procedia 2017;122:439–44. https://doi.org/10.1016/j.egypro.2017.07.400.
[28] Goroshin R, Bruna J, Tompson J, Eigen D, LeCun Y. Unsupervised feature learning from temporal data. CoRR abs/1504.02518.
[29] Guo X, Liu X, Zhu E, Yin J. Deep clustering with convolutional autoencoders. In: Liu D, Xie S, Li Y, Zhao D, El-Alfy E-SM, editors. Neural information processing. Cham: Springer International Publishing; 2017. p. 373–82.
[30] Du K-L. Clustering: a neural network approach. Neural Networks 2010;23(1):89–107. https://doi.org/10.1016/j.neunet.2009.08.007.
[31] Fan C, Xiao F, Zhao Y, Wang J. Analytical investigation of autoencoder-based methods for unsupervised anomaly detection in building energy data. Appl Energy 2018;211:1123–35. https://doi.org/10.1016/j.apenergy.2017.12.005.
[32] Baldi P. Autoencoders, unsupervised learning, and deep architectures. In: Guyon I, Dror G, Lemaire V, Taylor G, Silver D, editors. Proceedings of the ICML workshop on unsupervised and transfer learning; 2012. p. 37–49.
[33] Keogh E, Ratanamahatana CA. Exact indexing of dynamic time warping. Knowl Inform Syst 2005;7(3):358–86. https://doi.org/10.1007/s10115-004-0154-9.
[34] Liu S, Liu C. Scale-varying dynamic time warping based on hesitant fuzzy sets for multivariate time series classification. Measurement 2018;130:290–7. https://doi.org/10.1016/j.measurement.2018.07.094.
[35] Wan Y, Chen X-L, Shi Y. Adaptive cost dynamic time warping distance in time series analysis for classification. J Comput Appl Math 2017;319:514–20. https://doi.org/10.1016/j.cam.2017.01.004.
[36] Liu B, Luan W, Yu Y. Dynamic time warping based non-intrusive load transient identification. Appl Energy 2017;195:634–45. https://doi.org/10.1016/j.apenergy.2017.03.010.
[37] Bode G, Fütterer J, Müller D. Mode and storage load based control of a complex building system with a geothermal field. Energy Build 2018;158:1337–45. https://doi.org/10.1016/j.enbuild.2017.11.026.
[38] Fütterer J, Constantin A, Schmidt M, Streblow R, Müller D, Kosmatopoulos E. A multifunctional demonstration bench for advanced control research in buildings—monitoring, control, and interface system. In: IECON 2013 - 39th annual conference of the IEEE Industrial Electronics Society; 2013. p. 5696–701. https://doi.org/10.1109/IECON.2013.6700068.
[39] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.
[40] Chollet F. Keras; 2015.
[41] Jones E, Oliphant T, Peterson P. SciPy: open source scientific tools for Python; 2001.
[42] Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng 2007;9(3):90–5. https://doi.org/10.1109/MCSE.2007.55.
[43] van der Maaten L, Hinton G. Visualizing high-dimensional data using t-SNE. J Mach Learn Res 2008;9:2579–605.
[44] Grabusts P, Borisov A. Clustering methodology for time series mining. Sci J Riga Tech Univ Comput Sci 2009;40(1):81–6. https://doi.org/10.2478/v10143-010-0011-0.