Neurocomputing 352 (2019) 54–63
A data mining method based on unsupervised learning and spatiotemporal analysis for sheath current monitoring

Y. Wang a, H. Ye a,∗, T. Zhang b,1, H. Zhang c

a Department of Automation, Tsinghua University, Beijing 100084, China
b China Property and Casualty Reinsurance Co., Ltd., Beijing 100033, China
c Zhengzhou Huali Information Technology Co., Ltd., Henan 450000, China
Article history: Received 9 January 2019; Revised 25 March 2019; Accepted 2 April 2019; Available online 25 April 2019.
Communicated by Dr. Nianyin Zeng
Abstract

Sheath current is one of the key indicators of underground power cable conditions. Considering the limitations of existing model-based methods for sheath current monitoring and difficulty in handling the increasing amount of unlabeled sheath current data accumulated by cable monitoring systems, we propose a data mining method based on unsupervised learning and spatiotemporal analysis of sheath currents for underground power cable monitoring. Tests based on real historical data demonstrate that the proposed method can effectively reveal unknown inherent patterns in unlabeled sheath current data.
Keywords: Sheath current monitoring; Data mining; Unsupervised learning; Spatiotemporal analysis
1. Introduction

With the continuous development of urbanization in China, an increasing number of underground power cables are being installed for energy transmission because they occupy less floor space and are more reliable compared to overhead lines [1,2]. As the length of an underground cable increases, the induced voltage and circulating current in its metallic sheath also increase, which may lead to sheath breakdown and power loss [3,4]. Therefore, cross-bonding of metallic sheaths is often employed in three-phase high-voltage alternating-current underground cables to reduce the currents in their metallic shields [5,6]. The current in the circulating path formed by the cross-bonded sheaths is called the sheath current [7]. The sheath current consists of two parts: the leakage current through the insulation and the circulating current caused by imbalances among the three phases [7,8]. As discussed in [6,9], the weakest points in a cross-bonded cable system are joints, link boxes, and terminals. According to cable failure statistics, most cable failures are caused by sheath system faults, such as corrosion or mechanical damage to cable jackets and flooded cable joints [3,4].
∗ Corresponding author. E-mail address: [email protected] (H. Ye).
1 He was with Tsinghua University from 2007 to 2017.
https://doi.org/10.1016/j.neucom.2019.04.006 0925-2312/© 2019 Published by Elsevier B.V.
Because sheath system faults often lead to excess sheath currents [3,4], such currents are indicative of the operating conditions of a cross-bonded cable system. Therefore, sheath current data have been analyzed in recent years to improve the asset management of cable systems and to monitor power cables [4,8,10–14]. In [4,10], sheath currents under normal conditions were first estimated based on a simulation model. Next, a rule based on the differences between the estimated and measured sheath currents was utilized to determine whether cables were operating under normal conditions. Similarly, a simulation model for cross-bonded cables was utilized in [13] to generate sheath currents under normal conditions. Next, a fixed threshold for the ratios between the estimated sheath currents and load currents was adopted to detect various abnormalities. In [14], a model was proposed to separate leakage currents from detected sheath currents at the cross-bonding joints of cables. Then, based on this model, the authors of [8,10,14] proposed a model-based method to detect insulation deterioration based on the phase differences (or an equivalent form) of the leakage currents. The common feature of the above methods is that all of them utilize an ideal first-principle model to describe the different behaviors of sheath currents under normal and faulty (or deteriorative) conditions, which is the key to defining rules for fault detection. However, this may limit their application in real and complex circumstances because the estimates provided by cable simulation models may deviate from measured values [15]. This is because (i) sheath currents may be influenced by many coupled and complex factors
(such as load currents, length imbalance, and burial formation [7]), which may be unknown or vary with time, meaning they cannot be completely considered or accurately described by simulation models; and (ii) simulation models for sheath currents may require strong assumptions (such as the invariance assumptions for impedance, thermal stress, and voltage given in [14]), which cannot be satisfied in real applications. In addition to research on fault-detection methods, an increasing number of condition monitoring systems for high-voltage underground power cables have also been reported, such as those in [16–18]. However, such studies typically focus on the structures and basic functions of systems, rather than core monitoring algorithms. To the best of our knowledge, most existing monitoring systems for underground power cables still utilize simple threshold-based alarms to judge the status of monitored cables, meaning they raise an alarm for an abnormality or fault only when the measurements of some key variables (such as current, voltage, or temperature [19]) exceed predetermined constant thresholds. Because the measured variables in real applications may fluctuate frequently with time, even when power cables are operating under normal conditions, these methods may lead to serious false alarms or missed detections if thresholds are improperly selected or unable to adapt to the time-varying changes of monitored cables or their environment. Considering the limitations of model-based methods, the increasing amount of unlabeled field data accumulated by underground cable monitoring systems (i.e., there is no classification information for the data), and severe lack of reports on data-driven methods for sheath current monitoring, in this paper, we propose a data mining method based on unsupervised learning and spatiotemporal analysis for sheath current monitoring. The proposed method first classifies historical data into different groups based on feature extraction and unsupervised clustering, then extracts unknown inherent patterns from the cable data based on spatiotemporal analysis of the clustering results. The merits and the contributions of the proposed method can be summarized as follows. (i) Requiring no a-priori mathematical model of the monitored cables makes it more applicable to real circumstances and complicated sheath current data. (ii) Classifying sheath currents according to the trends of sheath currents, rather than their amplitudes, is helpful to overcome the demerits of the simple threshold-based alarm strategy widely used in most existing monitoring systems for underground power cables. (iii) The adoption of unsupervised learning allows the proposed method to handle a large amount of unlabeled data, which is a common problem faced by data-driven methods for system monitoring or abnormality detection. As one of the most popular methods for data mining, clustering has been successfully applied in the electricity industry. One such application is load profile clustering, which can be divided into three main focuses: (i) load profile pattern analysis [20,21]; (ii) abnormality detection for electricity consumption [22,23]; and (iii) the prediction of the future electricity consumption patterns [24–26]. However, the inability to handle spatiotemporal attributes may limit the application of these methods to sheath current monitoring. Another application of clustering is the separation of partial discharge (PD) sources [27]. 
However, the data types of PD signals are always waveforms or phase-resolved patterns, which differ significantly from sheath currents. To the best of our knowledge, no papers have been published on the application of data mining techniques to sheath current analytics. In summary, although SdA, t-SNE, DBSCAN, and other machine learning methods have been widely used [28–31], there is no report on
Fig. 1. Flowchart for SdA.
the unsupervised learning of sheath currents. Moreover, in contrast to existing works on unsupervised learning or clustering methods themselves, this paper focuses on the application of unsupervised learning to sheath currents.

The remainder of this paper is organized as follows. Section 2 reviews the dimensionality reduction techniques utilized in the proposed method. A data mining problem for a reshaped sheath current dataset is described in Section 3. In Section 4, a clustering method based on feature extraction is proposed and the spatiotemporal analysis of the clustering results is detailed. Section 5 discusses our conclusions and plans for future work.

2. Preliminaries

Notation 1: In this section, we utilize row_j(∗) to denote the jth row vector of the matrix ∗, and s(row_j(∗)) to denote a nonlinear function of the vector row_j(∗) [32].

2.1. Introduction of stacked denoising autoencoders

For a dataset X ∈ R^{N×d}, where N and d denote the number of samples and the dimension of each sample, respectively, stacked denoising autoencoders (SdA), illustrated in Fig. 1 [32,33], can be utilized to robustly extract a low-dimensional representation, where E^(i) for i = 0, 1, ..., M − 1 are isotropic Gaussian noise matrices and f^(i) for i = 1, 2, ..., M are encoders defined as
row_j(Y^(i)) = f^(i)(row_j(Y^(i−1) + E^(i−1))) = s(row_j(Y^(i−1) + E^(i−1)) W_i + b_i),  j = 1, 2, ..., N    (1)

and g^(i) for i = 1, 2, ..., M are decoders defined as

row_j(X̂^(i)) = g^(i)(row_j(X̂^(i+1))) = s(row_j(X̂^(i+1)) W′_i + b′_i),  j = 1, 2, ..., N    (2)
The term Y^(i−1) + E^(i−1) in Eq. (1) is one of the corruption processes mentioned in [32], called additive isotropic Gaussian noise. As shown in Fig. 1, Y = Y^(M) ∈ R^{N×a} with a ≪ d is the final low-dimensional representation of X and X̂ = X̂^(1) ∈ R^{N×d} is the final reconstruction of the original dataset X. In Fig. 1, each pair of f^(i) and g^(i) forms a traditional denoising autoencoder [34,35], whose parameters W_i, b_i and W′_i, b′_i are determined by solving the optimization problem (3), where H is a loss function measuring the distance between two vectors. Additional details can be found in [34,35]. During the training procedure, (3) is solved recursively for i from 1 to M [32]. It should be noted that the corrupted input for each layer is only utilized during the training procedure; in the feature extraction stage, the original inputs without added noise are utilized. Based on stochastic gradient descent, the parameters of the entire network can then be fine-tuned [32].
min_{W_i, b_i, W′_i, b′_i} (1/N) Σ_{j=1}^{N} H(row_j(Y^(i−1)), row_j(X̂^(i)))
= min_{W_i, b_i, W′_i, b′_i} (1/N) Σ_{j=1}^{N} H(row_j(Y^(i−1)), g^(i)(row_j(X̂^(i+1))))
= min_{W_i, b_i, W′_i, b′_i} (1/N) Σ_{j=1}^{N} H(row_j(Y^(i−1)), g^(i)(row_j(Y^(i))))
= min_{W_i, b_i, W′_i, b′_i} (1/N) Σ_{j=1}^{N} H(row_j(Y^(i−1)), g^(i)(f^(i)(row_j(Y^(i−1) + E^(i−1)))))    (3)
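To make the layer-wise training of Eqs. (1)–(3) concrete, the following Python sketch trains one denoising autoencoder per layer and stacks the encoders. It is a minimal illustration in PyTorch; the layer class, activations, MSE loss, optimizer, and training schedule are our assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of greedy layer-wise SdA training (Eqs. (1)-(3)); illustrative only.
import torch
import torch.nn as nn

class DenoisingAutoencoderLayer(nn.Module):
    def __init__(self, d_in, d_out, noise_std=0.1):   # std 0.1 corresponds to variance 0.01
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())  # f^(i), activation assumed
        self.decoder = nn.Linear(d_out, d_in)                               # g^(i), linear output assumed

    def forward(self, y_prev):
        corrupted = y_prev + self.noise_std * torch.randn_like(y_prev)  # Y^(i-1) + E^(i-1)
        code = self.encoder(corrupted)                                   # Y^(i)
        return code, self.decoder(code)                                  # code and reconstruction

def train_sda(x, layer_dims, epochs=50, lr=1e-3):
    """Greedy layer-wise training: Eq. (3) is solved recursively for i = 1, ..., M."""
    layers, y = [], x
    for d_out in layer_dims:
        layer = DenoisingAutoencoderLayer(y.shape[1], d_out)
        opt = torch.optim.Adam(layer.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            _, recon = layer(y)
            loss = nn.functional.mse_loss(recon, y)  # H(row_j(Y^(i-1)), reconstruction)
            loss.backward()
            opt.step()
        layers.append(layer)
        with torch.no_grad():
            y = layer.encoder(y)  # clean (uncorrupted) output feeds the next layer
    return layers, y              # y = Y^(M), the low-dimensional representation
```

A global fine-tuning pass over the stacked encoders (as mentioned above) could be added afterwards with the same optimizer; it is omitted here for brevity.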
2.2. Introduction of T-distributed Stochastic Neighbor Embedding

T-distributed Stochastic Neighbor Embedding (t-SNE) is a novel data visualization technique that was first presented in [36]. It can map high-dimensional data into a two-dimensional or three-dimensional space. Unlike traditional linear dimensionality reduction techniques, which focus on global structures, or traditional nonlinear techniques, which prefer to maintain local structures [36], t-SNE can reveal both global and local structures in high-dimensional data simultaneously. Therefore, t-SNE is an excellent choice for dimensionality reduction. For a high-dimensional dataset U ∈ R^{N×d}, where N and d denote the number of samples and the dimension of each sample, respectively, t-SNE aims to map the dataset to a low-dimensional representation V ∈ R^{N×a} (with a = 2 or 3 and a ≪ d) by minimizing [36]
C = Σ_i Σ_j p_ij log(p_ij / q_ij)    (4)
where p_ij is the similarity between row_i(U) and row_j(U), and q_ij is the similarity between row_i(V) and row_j(V). By assuming that the neighbors of row_i(U) follow a Gaussian distribution centered on row_i(U), the similarity between row_i(U) and its neighbor row_j(U) is defined as [36]
p_ij = (p_{j|i} + p_{i|j}) / (2N)    (5)

with

p_{j|i} = exp(−||u_i − u_j||² / (2σ_i²)) / Σ_{k≠i} exp(−||u_i − u_k||² / (2σ_i²))    (6)
In the low-dimensional space, a Student t-distribution with one degree of freedom is used to define the similarity between row_i(V) and row_j(V) as [36]

q_ij = (1 + ||v_i − v_j||²)^(−1) / Σ_{k≠l} (1 + ||v_k − v_l||²)^(−1)    (7)
By applying t-SNE to a high-dimensional dataset, a two-dimensional or three-dimensional representation of that dataset can be obtained.
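As an illustration of this mapping, high-dimensional features can be passed to an off-the-shelf t-SNE implementation. The snippet below is a minimal sketch using scikit-learn; perplexity = 15 follows the choice reported later in Section 4.1, while the fixed random seed is our own assumption for reproducibility.

```python
# Minimal sketch: map high-dimensional features (e.g., the SdA output) to a 2-D t-SNE embedding.
from sklearn.manifold import TSNE

def tsne_features(high_dim_features, perplexity=15):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    return tsne.fit_transform(high_dim_features)  # rows are the 2-D feature vectors
```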
3. Problem formulation and basic idea

As mentioned in Section 1, the main purpose of this paper is to propose an unsupervised-learning-based data mining method that can discover the unknown inherent patterns in historical sheath current data. Therefore, a 144-day historical sheath current dataset from an underground power cable of length 3.9 km in Henan Province, China was utilized for data mining. The target cable has eight cross-bonding joints, each of which has three sheath current sensors for the three phases and one sensor for the sheath-to-ground current. For convenience, we refer to the dataset used for data mining as the training set. The dataset was collected with a sampling interval of five minutes and contains no classification information. Because nonzero sheath-to-ground currents are induced by asymmetries in three-phase minor sections and typically have very small amplitudes, they were not considered in our datasets.

Notation 2: For convenience, we utilize n#A, n#B and n#C to denote the sensors for the A, B, and C phases of joint n, respectively, for n = 1, 2, ..., 8.

Because the data from sensors 6#A, 6#B and 6#C were missing, these fields were not included in our dataset. Therefore, the training set contains data from 21 sensors corresponding to seven joints. From the original sheath current data, a daily periodicity (strong similarities between each day's sheath current patterns) can be clearly observed. This periodicity is caused by the periodicity of load currents related to the daily activities of humans. For convenience, we refer to the sheath current from a sensor for one day, from 00:00 to 24:00, as a day-sample. To illustrate the similarities between day-samples from different sensors or different days more clearly, some original sheath current curves are presented in Fig. 2. To be more specific, Fig. 2(a) presents 21 day-samples collected from sensor 1#A on 21 consecutive days, i.e., 04/17/2017–05/07/2017, where each curve corresponds to the sheath current on a specific day. The 21 day-samples collected from all 21 different sensors on the same day (i.e., 03/21/2017) are presented in Fig. 2(b), where each curve corresponds to the sheath current of a specific sensor. One can see that (i) many day-samples have similar trends, despite their varying amplitudes at each sampling instant; and (ii) although most day-samples show strong similarities, a few day-samples differ significantly from the others, which may indicate abnormalities. This motivated us to utilize day-samples instead of the original sheath currents as the basis for both data mining and abnormality detection. It is worth noting that normalization was performed on the entire time series of each sensor before the data were segmented into day-samples. Therefore, in the following analysis, we utilize the matrix X_Train ∈ R^{3024×288} to denote the 144-day training set, where each row is a day-sample. Because there are 21 joint sensors, there are a total of 21 × 144 = 3024 day-samples. Because the sampling interval is five minutes, each day-sample contains 60 × 24/5 = 288 sample points.
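As a concrete illustration of this reshaping, the sketch below normalizes each sensor's series and segments it into 288-point day-samples. The input layout (a pandas DataFrame indexed by 5-minute timestamps with one column per sensor), the z-score normalization, and the function name are our assumptions for illustration; the paper does not specify these details.

```python
# Minimal sketch: build the day-sample matrix X_Train (one row per sensor-day).
import numpy as np
import pandas as pd

SAMPLES_PER_DAY = 24 * 60 // 5  # 288 points per day at a 5-min sampling interval

def build_day_samples(readings: pd.DataFrame):
    rows, index = [], []
    for sensor in readings.columns:
        series = readings[sensor]
        series = (series - series.mean()) / series.std()       # normalize the whole series per sensor
        for day, chunk in series.groupby(series.index.date):    # split into 00:00-24:00 segments
            if len(chunk) == SAMPLES_PER_DAY:                   # keep only complete days
                rows.append(chunk.to_numpy())
                index.append((sensor, day))
    # e.g., 21 sensors x 144 days -> a 3024 x 288 matrix, plus (sensor, day) bookkeeping
    return np.vstack(rows), index
```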
Fig. 2. Examples of day-samples: (a) an example of day-samples from sensor 1#A for 21 consecutive days; (b) an example of day-samples from 21 sensors for the same day.
4. Data mining based on feature extraction, clustering and spatiotemporal analysis

4.1. SdA and t-SNE-based feature extraction and DBSCAN-based clustering

Like many clustering methods for high-dimensional data, the data clustering method proposed in this paper consists of two steps: (i) extracting low-dimensional features from the original data by transforming X_Train into a new matrix Y″ ∈ R^{3024×a} with a ≪ 288, and (ii) classifying the extracted features (i.e., each row of Y″) into different groups.

Both SdA and t-SNE can be utilized for feature extraction. The authors of [36] pointed out that utilizing autoencoders prior to applying t-SNE can enhance the performance of t-SNE. Additionally, as mentioned in Section 2.2, t-SNE can reveal both the global and local structure of high-dimensional data simultaneously [36]. It was pointed out in [37] that PCA and autoencoders outperform other unsupervised learning techniques on natural datasets; however, because PCA is sensitive to disturbances and is not suitable for extracting nonlinear features, SdA, which is modified from basic autoencoders to extract features robustly, was chosen as the first step of our dimensionality reduction procedure. Moreover, because a daily sheath current extends along only one dimension (i.e., the time dimension), it is not necessary to adopt a CNN as the dimensionality reduction technique, since CNNs are more suitable for images and videos [38]. Therefore, similar to [39], a feature extraction technique based on SdA and t-SNE was adopted in this paper to extract robust features from the training set X_Train. The proposed method first transforms X_Train ∈ R^{3024×288} into Y′ ∈ R^{3024×a′} by utilizing SdA with a′ ≪ 288, then transforms Y′ ∈ R^{3024×a′} into Y″ ∈ R^{3024×a} by utilizing t-SNE. For each row of X_Train (i.e., a day-sample with a length of 288 sample points), there is a corresponding a-dimensional (a = 2 or 3) row vector in Y″ representing its features.

Because distance-based clustering methods, such as k-means [40], can only distinguish spherical or ball-shaped clusters in raw datasets [41], whereas the distributions of features extracted by SdA and t-SNE may have arbitrary shapes, distance-based clustering methods may not be applicable to such features. Compared to distance-based clustering methods, DBSCAN, which was first proposed by Ester et al. in [42] and has become one of the most popular density-based clustering methods, is more applicable to our problem because it can discover clusters with arbitrary shapes within datasets [42]. Therefore, in this study, DBSCAN was adopted to classify the low-dimensional features extracted by SdA and t-SNE. The key idea of DBSCAN-based clustering is that each feature vector of a cluster should have at least a given minimum number (MinPts) of neighboring feature vectors within a given radius (Eps). In other words, the density of each cluster should exceed a given threshold [42,43].
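A minimal sketch of this density-based step using scikit-learn's DBSCAN is shown below; eps = 7 and min_samples = 3 correspond to the Eps and MinPts values reported later in this section, and the function name is our own illustrative helper.

```python
# Minimal sketch of the DBSCAN step: cluster the 2-D features of the day-samples.
from sklearn.cluster import DBSCAN

def cluster_day_samples(features_2d, eps=7.0, min_pts=3):
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(features_2d)
    # labels[j] is the cluster index of day-sample j; scikit-learn marks noise points as -1
    return labels
```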
Fig. 3. Flowchart for Algorithm 1.
A flowchart for SdA and t-SNE-based feature extraction and DBSCAN-based clustering is presented in Fig. 3. The entire process is summarized in Algorithm 1.

Algorithm 1 Feature extraction and clustering
Require: The training set of unlabeled day-samples X_Train; the network parameters of SdA; the perplexity of t-SNE; the parameters Eps and MinPts of DBSCAN.
Ensure: The number of clusters C; the cluster label of each day-sample.
1: Let the training set X_Train be the input for SdA, as shown in Fig. 1. Assign the isotropic Gaussian noises E^(i) for i = 0, 1, ..., M − 1 to create a corrupted dataset through the corruption method mentioned in Section 2.1.
2: As shown in Fig. 1, set the output dimensions of the encoder layers to a_1, a_2, ..., a′ [35], and assign a symmetric structure in reverse order for the decoders.
3: Obtain Y^(i) for i = 1, 2, ..., M, as shown in Fig. 1, by solving (3) recursively based on (1) and (2). Then utilize Y′ = Y^(M) ∈ R^{N×a′} as the intermediate representation of X_Train, as shown in Fig. 3.
4: Let the output Y′ of SdA be the high-dimensional input for the t-SNE introduced in Section 2.2 (i.e., U = Y′).
5: Utilize t-SNE to derive a low-dimensional representation V by minimizing the cost function (4), and let Y″ = V ∈ R^{N×a}, where Y″ is the final feature shown in Fig. 3.
6: With the preset parameters Eps and MinPts for DBSCAN, classify the features Y″ into C clusters using DBSCAN.
7: return The clustering results of Y″.

It is worth noting that because the row vectors in Y″ have a one-to-one relationship with the day-samples in X_Train, the clustering results of Y″ correspond to the clustering results of X_Train.

Remark 1. The parameter perplexity has a small influence on the performance of t-SNE, and is typically set in the range of 5–50 [36]. Additionally, the result of t-SNE is a two- or three-dimensional map (i.e., a = 2 or 3), and a was set to two in this study.

To apply Algorithm 1 to the training set X_Train, certain parameters must be set. As is typical for deep neural networks, hyperparameter selection for SdA is usually based on experience-guided optimization, which means that there is no silver bullet, i.e., no standard and general method for it.
Fig. 4. Distribution and clustering results of features.
Fig. 5. Spatiotemporal distribution of clustering results for X_Train.
As in [32], we found the final parameter settings through repeated experiments with manual guidance. During the selection process, the suggestions for parameter setting of deep neural networks given in [44] and [45], such as tricks for determining hyperparameters and adopting architectures as deep as possible without overfitting, were also considered. Therefore, in Step 1 of Algorithm 1, the isotropic Gaussian noises E^(i) for i = 0, 1, ..., M − 1 were set as independent and identically distributed normal distributions with mean 0 and variance 0.01. In Step 2, based on the hyperparameter setting approach mentioned above, a five-layer SdA with the output dimensions of its layers set to 400, 200, 100, 50, and 7, respectively, was employed to obtain the intermediate representation Y′. Another important parameter of Algorithm 1, namely the perplexity of t-SNE in Step 5, was set to 15 because it is an indicator of the effective number of neighbors [36]. Additionally, based on the criterion for parameter setting of DBSCAN presented in [46], Eps and MinPts in Step 6 were set to seven and three, respectively. With the parameters set as described above, the feature matrix Y″ for the training set X_Train and the final clustering results can be obtained by executing Algorithm 1. Fig. 4 presents the distribution of the extracted two-dimensional features in the feature space defined in Step 5 of Algorithm 1, where each point corresponds to a two-dimensional row vector of the matrix Y″ and the two axes (i.e., y1 and y2) correspond to the two dimensions of each row vector. Furthermore, the feature points in Fig. 4 are plotted in different colors to distinguish the clusters they belong to, as given by DBSCAN (i.e., Step 6 of Algorithm 1).
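For reference, a minimal end-to-end sketch of Algorithm 1 with the parameters of this section (layer sizes 400/200/100/50/7, perplexity 15, Eps 7, MinPts 3) is given below; it reuses the illustrative helper functions from the earlier sketches (train_sda, tsne_features, cluster_day_samples), which are our own assumptions rather than the authors' code.

```python
# Minimal end-to-end sketch of Algorithm 1 using the illustrative helpers defined earlier.
import torch

def run_algorithm_1(x_train_np):
    x = torch.as_tensor(x_train_np, dtype=torch.float32)
    _, y_sda = train_sda(x, layer_dims=[400, 200, 100, 50, 7])   # Steps 1-3: SdA features Y'
    y_feat = tsne_features(y_sda.numpy(), perplexity=15)         # Steps 4-5: 2-D features Y''
    labels = cluster_day_samples(y_feat, eps=7.0, min_pts=3)     # Step 6: DBSCAN cluster labels
    return y_feat, labels
```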
4.2. Spatiotemporal analysis of clustering results

4.2.1. Consistency analysis of sensors and recursive clustering

Because each day-sample corresponds to a specific day and a specific sensor, the clustering results can be plotted on a spatiotemporal plane, as shown in Fig. 5, where a row represents how the day-sample types (given by the clustering results) of a sensor change over time and a column represents how the day-sample types within one day change across sensors. From Fig. 5, one can see that among the 21 sensors, 15 sensors (i.e., sensors 1#A, 1#B, 1#C, 2#A, 2#B, 2#C, 3#A, 4#A, 4#B, 4#C, 5#A, 5#B, 5#C, 7#C, and 8#B, called the sensors of Group 1 for convenience) show very similar periodicities on most days. Based on this periodicity, clusters that only appear occasionally on specific days can be easily detected and identified as potential abnormalities. However, from Fig. 5, one can also see that the other six sensors (i.e., 3#B, 3#C, 7#A, 7#B, 8#A, and 8#C, called the sensors of Group 2 for convenience) behave differently from the sensors of Group 1. Five of them (i.e., sensors 3#B, 3#C, 7#B, 8#A, and 8#C) show no obvious periodicity. These differences are likely caused by measurement inconsistencies between the two groups of sensors, rather than by abnormalities on specific days. The inconsistent sensors can also be identified through quantitative indices, such as the number of day-samples each sensor contributes to the two largest clusters (i.e., clusters 1 and 2).
Fig. 6. Spatiotemporal distribution of clustering results for X^{G1}_Train.
Because our purpose in data mining is to discover occasional abnormal patterns on specific days or at specific locations among a large number of consistent normal day-samples, and because it is intuitive that the large number of inconsistent day-samples in Group 2 can create a large disturbance, we reconstructed a new training set X^{G1}_Train ∈ R^{2160×288} containing 144 × 15 = 2160 day-samples by removing the day-samples of the six sensors of Group 2 from the training set X_Train, and then executed Algorithm 1 again by replacing X_Train with X^{G1}_Train. It is worth noting that, except for joint 6, which had no data, the sensors of Group 1 cover all joints of the power cable considered in this paper, meaning they are able to reflect the condition of the entire cable.

The re-clustering results for X^{G1}_Train from Algorithm 1 are presented in Fig. 6. One can see that, compared to the clustering results for X_Train, which contained data from the 21 sensors shown in Fig. 5, removing the six inconsistent sensors significantly improved the consistency and periodicity of the clustering results.

4.2.2. Discovering the inherent patterns of clusters

According to Fig. 6, a cluster can be differentiated from other clusters based on the following three attributes: (i) how large the total area of the cluster is, (ii) whether or not the cluster's distribution over time shows clear periodicity, and (iii) which sensors the day-samples in the cluster come from. Therefore, in the following analysis, we will discover the inherent patterns of cluster i in Fig. 6 for i = 0, 1, ..., 8 based on the following statistics: (i) the total number of day-samples in cluster i, denoted as N_i; (ii) the distribution of N_i over the days of a week, denoted as D^t_{N_i}; and (iii) the distribution of N_i over the sensors, denoted as D^s_{N_i}.
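The sketch below illustrates how the spatiotemporal plane of Figs. 5 and 6 and the per-cluster statistics N_i, D^t_{N_i}, and D^s_{N_i} can be derived from the cluster labels. It assumes a pandas DataFrame with one row per day-sample carrying its sensor, date, and cluster label; the column names and framing are our own illustrative assumptions.

```python
# Minimal sketch: spatiotemporal plane and per-cluster statistics from the cluster labels.
import pandas as pd

def spatiotemporal_plane(samples: pd.DataFrame) -> pd.DataFrame:
    # sensor x day grid of cluster labels, i.e., the plane shown in Figs. 5 and 6
    return samples.pivot(index="sensor", columns="date", values="cluster")

def cluster_statistics(samples: pd.DataFrame, cluster_id: int):
    members = samples[samples["cluster"] == cluster_id]
    n_i = len(members)                                                          # N_i
    dist_time = members["date"].map(lambda d: d.strftime("%A")).value_counts()  # D^t_{N_i}
    dist_sensor = members["sensor"].value_counts()                              # D^s_{N_i}
    return n_i, dist_time, dist_sensor
```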
The number of day-samples in each cluster i (i.e., N_i) and the cumulative percentage of each cluster (i.e., (N_0 + N_1 + ... + N_i)/N for i = 0, 1, ..., 8) are shown in Fig. 7. One can see that clusters 1, 2, 3, and 4 all have a large number of day-samples and together account for 97.3% of X^{G1}_Train, whereas the remaining five clusters (i.e., clusters 0, 5, 6, 7, and 8) contain only a very small number of day-samples. Because a system should operate normally most of the time, it is logical to conclude that clusters 1–4 follow normal patterns and that the other clusters, which contain very few day-samples, are potentially abnormal patterns. Hereafter, we will refer to clusters 1–4 as main clusters and to clusters 0 and 5–8 as minor clusters.
Fig. 7. Number of day-samples and cumulative percentage of each cluster.
4.2.2.1. Analysis of main clusters. The original day-sample curves of each main cluster are shown below. To analyze the inherent patterns of the main clusters, Figs. 8–11 further present the other two statistics for each cluster (i.e., D^t_{N_i} and D^s_{N_i} for i = 1, 2, ..., 4).
Fig. 8 presents the original day-samples, the distribution of N_1 over time, and its distribution over sensors for cluster 1. From Fig. 8(a), one can see that the original sheath current curves of this cluster are visually consistent. This cluster belongs to the top-four largest clusters according to Fig. 7, meaning its day-samples likely correspond to a normal operation pattern because a system should operate normally most of the time. Additionally, the day-samples in this cluster are distributed mainly on Tuesday, Wednesday, Thursday, Friday, and Saturday in a periodic manner according to Fig. 8(b). Finally, the day-samples cover all of the sensors (or locations) with a relatively uniform distribution according to Fig. 8(c), meaning the inherent patterns in this cluster represent a common pattern for the entire power cable, rather than a local pattern for a specific sensor or location. Based on these factors, we can conclude that this cluster corresponds to a normal operation pattern of the entire cable for the days from Tuesday to Saturday.

Figs. 9 and 10 present the original day-samples, D^t_{N_i} (i = 2, 3), and D^s_{N_i} (i = 2, 3) for clusters 2 and 3, respectively. Following an analysis similar to that for cluster 1, from Figs. 9 and 10, we can conclude that these two clusters correspond to two normal patterns of the entire cable for Sunday and Monday, respectively.

The original day-samples, D^t_{N_4}, and D^s_{N_4} for cluster 4 are presented in Fig. 11. The consistency of the day-samples in Fig. 11(a) is not as good as that for clusters 1, 2, and 3 shown in Fig. 8(a), Fig. 9(a), and Fig. 10(a), respectively. From Fig. 11(b), one can see that the day-samples in this cluster do not show clear periodicity. However, by observing the corresponding color of cluster 4 in Fig. 6, one can see that the day-samples in cluster 4 are mainly
Fig. 8. Original day-samples, and distribution of N_1 over time and over sensors: (a) day-samples; (b) D^t_{N_1}, i.e., distribution of N_1 over the days of a week; (c) D^s_{N_1}, i.e., distribution of N_1 over the sensors.
Fig. 9. Original day-samples, and distribution of N_2 over time and over sensors: (a) day-samples; (b) D^t_{N_2}, i.e., distribution of N_2 over the days of a week; (c) D^s_{N_2}, i.e., distribution of N_2 over the sensors.
Fig. 10. Original day-samples, and distribution of N_3 over time and over sensors: (a) day-samples; (b) D^t_{N_3}, i.e., distribution of N_3 over the days of a week; (c) D^s_{N_3}, i.e., distribution of N_3 over the sensors.
Fig. 11. Original day-samples, and distribution of N_4 over time and over sensors: (a) day-samples; (b) D^t_{N_4}, i.e., distribution of N_4 over the days of a week; (c) D^s_{N_4}, i.e., distribution of N_4 over the sensors.
Fig. 12. Day-samples on 01/16/2017: (a) day-samples of cluster 6; (b) day-samples of cluster 0.
Fig. 13. Day-samples on 01/04/2017: (a) day-samples of cluster 7; (b) day-samples of cluster 5.
distributed on holidays in China, including 12/31/2016 and 01/01/2017 for New Year's Day; 01/25/2017–01/31/2017 and 02/01/2017 for the Spring Festival; and 05/01/2017 for May Day. Considering that this cluster also belongs to the four largest clusters and covers all of the sensors, we can conclude that cluster 4 corresponds to a normal operation pattern of the entire cable for holidays. In summary, of the four clusters with normal patterns, clusters 1, 2, and 3 represent normal patterns for typical days, whereas cluster 4 corresponds to a normal pattern on holidays. Additionally, the main merit of the proposed method stems from the fact that curves with similar trends but varying amplitudes are classified into the same cluster, based on which more effective monitoring methods can be developed compared to threshold-based alarms.
4.2.2.2. Analysis of minor clusters. The five minor clusters, namely clusters 5, 6, 7, 8, and 0, contain only 21, 13, 10, 8, and 7 day-samples, respectively. Again, because a system should operate normally most of the time, these minor clusters, which appear only occasionally, may indicate abnormalities. From Fig. 6, one can see that the 59 day-samples in these five clusters are distributed across only seven days: 12/30/2016, 01/04/2017–01/07/2017, 01/16/2017, and 01/17/2017. Therefore, it is reasonable to analyze the day-samples in these five clusters in a day-by-day fashion. Due to space limitations, only two days, i.e., 01/16/2017 and 01/04/2017, are discussed in detail in the following, and brief conclusions are given for the other five days.

(i) 01/16/2017. From Fig. 6, one can see that the day-samples on 01/16/2017 include those from cluster 6, distributed across sensors 1#A, 1#B, 1#C, 2#A, 2#B, 2#C, 3#A, 4#A, 4#B, 4#C, 5#A, 5#B, and 5#C, and those from cluster 0, distributed across sensors 7#C and 8#B. The original curves of clusters 6 and 0 are presented in Figs. 12(a) and 12(b), respectively. Because (i) these curves are visually consistent and cover all the sensors of Group 1, and (ii) all of these day-samples show rapid changes in amplitude after 22:00, we can conclude that there were some abnormalities around 22:00. This conclusion is also verified by the system log, which revealed that this power cable faced an increased demand for electricity supply because of an emergency at another
neighboring power cable. This event coincides with the rapid changes in the curves in Fig. 12.

(ii) 01/04/2017. According to Fig. 6, the day-samples on 01/04/2017 include those of cluster 7, distributed across sensors 1#A, 1#B, 1#C, 3#A, 4#A, 4#B, 4#C, 5#A, 5#B, and 5#C, as shown in Fig. 13(a), and those of cluster 5, distributed across sensors 7#C and 8#B, as shown in Fig. 13(b). Again, similar to the analysis for 01/16/2017, it is reasonable to conclude that there was an abnormal event on 01/04/2017 that affected the entire cable. This conclusion is verified by the system log, which shows that renovation of the cable occurred on 01/04/2017.

(iii) The other five days in the minor clusters. There are no system logs for the other five days covered by the minor clusters. Among them, the day-samples on 01/05/2017, 01/17/2017, and 12/30/2016 do have quite different trends from the normal clusters, i.e., clusters 1–3. According to our analysis, the day-samples on 01/05/2017 and 01/17/2017 may be related to the abnormal events on 01/04/2017 and 01/16/2017, respectively, and the day-samples on 12/30/2016 may be related to an unknown abnormal event. Additionally, the day-samples on 01/06/2017 and 01/07/2017 have no apparent visual abnormalities, and should be further analyzed and discussed with field engineers.

4.3. Summary

The inherent patterns of each cluster discovered via spatiotemporal analysis are summarized in Table 1. It is worth noting that all of the above conclusions were made based on the analysis of data with no labels, although two patterns (i.e., patterns 6 and 9) could be verified by records in the system log. Additionally, according to Section 4.2.1, we found that the quality of the data from the sensors of Group 2 (i.e., 3#B, 3#C, 7#A, 7#B, 8#A, and 8#C) was insufficient because these data are not consistent with those from the sensors of Group 1.

In the field of condition monitoring of power cables based on sheath currents, a-priori models are always required. In this paper, however, the discovered inherent modes, regarded as the basis of the condition monitoring of power cables, were revealed without any a-priori model, which makes the proposed method capable of handling more realistic circumstances and more complicated sheath current data.
Table 1. The inherent patterns of each cluster discovered via spatiotemporal analysis.

Pattern no. | Pattern type | Related cluster(s) | Distribution over time               | Distribution over sensors
1           | Normal       | Cluster 1          | Periodicity on Tuesdays to Saturdays | 15 sensors
2           | Normal       | Cluster 2          | Periodicity on Sundays               | 15 sensors
3           | Normal       | Cluster 3          | Periodicity on Mondays               | 15 sensors
4           | Normal       | Cluster 4          | Holidays                             | 15 sensors
5           | Abnormal     | Cluster 8          | 12/30/2016                           | 8 sensors
6           | Abnormal     | Clusters 7 & 5     | 01/04/2017                           | 10 sensors
7           | Abnormal     | Clusters 5 & 0     | 01/05/2017                           | 5 sensors
8           | Unknown      | Cluster 5          | 01/06/2017 and 01/07/2017            | 2 sensors
9           | Abnormal     | Cluster 6          | 01/16/2017                           | 15 sensors
10          | Abnormal     | Clusters 5 & 0     | 01/17/2017                           | 15 sensors
Besides, compared with the threshold-based alarm strategy widely adopted in existing monitoring systems, the proposed data mining method classifies sheath currents according to their trends rather than their amplitudes, which overcomes the demerits of the simple threshold-based alarm strategy. Additionally, the proposed data mining method based on unsupervised learning and spatiotemporal analysis can reveal inherent modes from massive amounts of unlabeled sheath current data, addressing a common problem faced by current data-driven methods for system monitoring or abnormality detection.

5. Conclusion

In this paper, we proposed a data mining method based on unsupervised learning and spatiotemporal analysis to discover unknown inherent patterns in sheath currents. Through the spatiotemporal analysis of clustering results, three types of inherent patterns, namely normal patterns with periodicity, normal patterns for holidays, and abnormal patterns, were discovered in unlabeled data. Furthermore, the spatiotemporal analysis revealed that the samples of a particular normal pattern with periodicity may have similar trends but varying amplitudes, which indicates that the proposed data mining method focuses on trends in sheath currents rather than on amplitudes. We have also developed a simple abnormality detection method to demonstrate the potential application of the proposed data mining method; it is not presented in this paper due to space limitations and needs further research in the future. Besides, other unsupervised learning techniques will be explored in our future work to achieve better performance for the data mining method. In addition, the computational efficiency of an online algorithm will be considered. Due to a limitation of t-SNE, the mapping from high-dimensional vectors to low-dimensional features in the proposed method cannot be applied to newly incoming day-samples directly, which needs further discussion for online detection in the future.

Conflict of interest

None.

Acknowledgement

This work was supported by the National Key Research and Development Program of China under Grant 2017YFB1200700, the National Natural Science Foundation of China under Grant 61490701, and the special fund of Suzhou-Tsinghua Innovation Leading Action under Grant 2016SZ0202.

References

[1] Y. Bicen, Trend adjusted lifetime monitoring of underground power cable, Electric Power Syst. Res. 143 (2017) 189–196, doi:10.1016/j.epsr.2016.10.045.
[2] S. TAS¸ KIN, S¸ .S. S¸ eker, Ö. Kalenderli, M. Karahan, Coherence analysis on thermal effect for power cables under different environmental conditions, Turkish J. Electr. Eng. Comput. Sci. 22 (1) (2014) 25–33. [3] M. Marzinotto, G. Mazzanti, The feasibility of cable sheath fault detection by monitoring sheath-to-ground currents at the ends of cross-bonding sections, IEEE Trans. Ind. Appl. 51 (6) (2015) 5376–5384, doi:10.1109/TIA.2015.2409802. [4] X. Dong, Y. Yang, C. Zhou, D.M. Hepburn, Online monitoring and diagnosis of HV cable faults by sheath system currents, IEEE Trans. Power Deliv. 32 (5) (2017) 2281–2290, doi:10.1109/TPWRD.2017.2665818. [5] J.J. Shea, Power and communication cables-theory and applications (Book review), IEEE Electr. Insul. Mag. 16 (3) (20 0 0), doi:10.1109/MEI.20 0 0.845027. 34– 34 [6] IEEE guide for bonding shields and sheaths of single-conductor power cables rated 5 kV through 500 kV, IEEE Std 575-2014 (Revision of IEEE Std 575-1988) (2014) 1–83, doi:10.1109/IEEESTD.2014.6905681. [7] C.-K. Jung, J.-B. Lee, J.-W. Kang, Sheath circulating current analysis of a crossbonded power cable systems, J. Electr. Eng. Technol. 2 (3) (2007) 320–328. [8] Y. Yang, D.M. Hepburn, C. Zhou, W. Zhou, Y. Bao, On-line monitoring of relative dielectric losses in cross-bonded cables using sheath currents, IEEE Trans. Dielectr. Electr. Insul. 24 (5) (2017) 2677–2685, doi:10.1109/TDEI.2017.005438. [9] IEEE standard for the testing, design, installation, and maintenance of electrical resistance trace heating for industrial applications, IEEE Std 515-2017 (Revision of IEEE Std 515-2011) (2017) 1–83, doi:10.1109/IEEESTD.2017.8118401. [10] C. Zhou, Y. Yang, L. Mingzhen, Z. Wenjun, An integrated cable condition diagnosis and fault localization system via sheath current monitoring, in: Proceedings of the 2016 International Conference on Condition Monitoring and Diagnosis (CMD), pp. 1–8. doi:10.1109/CMD.2016.7757837. [11] Y. Li, P. Fa-dong, C. Xiao-lin, C. Yong-hong, X. Li, Study on sheath circulating current of cross-linked power cables, in: Proceedings of the 2008 International Conference on High Voltage Engineering and Application, pp. 645–648. doi:10. 1109/ICHVE.2008.4774018. [12] B. Akbal, Hybrid ann methods to reduce the sheath current effects in high voltage underground cable line, in: Proceedings of the 2016 4th International Istanbul Smart Grid Congress and Fair (ICSG), pp. 1–5. doi:10.1109/SGCF.2016. 7492422. [13] X. Dong, Y. Yuan, Z. Gao, C. Zhou, P. Wallace, B. Alkali, B. Sheng, H. Zhou, Analysis of cable failure modes and cable joint failure detection via sheath circulating current, in: Proceedings of the 2014 IEEE Electrical Insulation Conference (EIC), pp. 294–298. doi:10.1109/EIC.2014.6869395. [14] Y. Yang, D.M. Hepburn, C. Zhou, W. Jiang, B. Yang, W. Zhou, On-line monitoring and trending of dielectric loss in a cross-bonded HV cable system, in: Proceedings of the 2015 IEEE 11th International Conference on the Properties and Applications of Dielectric Materials (ICPADM), pp. 301–304. doi:10.1109/ ICPADM.2015.7295268. [15] U.S. Gudmundsdottir, B. Gustavsen, C.L. Bak, W. Wiechowski, Field test and simulation of a 400-kv cross-bonded cable system, IEEE Trans. Power Deliv. 26 (3) (2011) 1403–1410, doi:10.1109/TPWRD.2010.2084600. [16] S. Jones, G. Bucea, A. McAlpine, M. Nakanishi, S. Mashio, H. Komeda, A. Jinno, Condition monitoring system for transgrid 330 kv power cable, in: Proceedings of the 2004 International Conference on Power System Technology, 2004. PowerCon 2004., 2, pp. 
1282–1287 Vol.2. doi:10.1109/ICPST.2004.1460199. [17] D. Gieselbrecht, W. Koltunowicz, A. Obralic, T. Ritz, P. Christensen, B. Schneider, K.H. Cohnen, Monitoring of 420 kv xlpe cable system in underground tunnel, in: Proceedings of the 2012 IEEE International Conference on Condition Monitoring and Diagnosis, pp. 917–920. doi:10.1109/CMD.2012.6416302. [18] X. Xie, L. Feng, Real-time health monitoring system for power tunnel, GeoCongress (2012) 25–29. [19] M. Kazerooni, H. Zhu, T.J. Overbye, Literature review on the applications of data mining in power systems, in: Proceedings of the 2014 Power and Energy Conference at Illinois (PECI), pp. 1–8. doi:10.1109/PECI.2014.6804567. [20] F. McLoughlin, A. Duffy, M. Conlon, A clustering approach to domestic electricity load profile characterisation using smart metering data, Appl. Energy 141 (2015) 190–199. [21] G. Chicco, Overview and performance assessment of the clustering methods for electrical load pattern grouping, Energy 42 (1) (2012) 68–80. [22] L.A.P. Junior, C.C.O. Ramos, D. Rodrigues, D.R. Pereira, A.N. de Souza, K.A.P. da Costa, J.P. Papa, Unsupervised non-technical losses identification through optimum-path forest, Electr. Power Syst. Res. 140 (2016) 413–423.
Y. Wang, H. Ye and T. Zhang et al. / Neurocomputing 352 (2019) 54–63 [23] P. Jokar, N. Arianpoo, V.C. Leung, Electricity theft detection in ami using customers’ consumption patterns, IEEE Trans. Smart Grid 7 (1) (2016) 216– 226. [24] J. Yang, C. Ning, C. Deb, F. Zhang, D. Cheong, S.E. Lee, C. Sekhar, K.W. Tham, k-shape clustering algorithm for building energy usage patterns analysis and forecasting model accuracy improvement, Energy Build. 146 (2017) 27–37. [25] T. Teeraratkul, D. O’Neill, S. Lall, Shape-based approach to household electric load curve clustering and prediction, IEEE Trans. Smart Grid 9 (2018) 5196–5206. [26] Y.-H. Hsiao, Household electricity demand forecast based on context information and user daily schedule analysis from meter data, IEEE Trans. Ind. Inform. 11 (1) (2015) 33–43. [27] M. Wu, H. Cao, J. Cao, H.-L. Nguyen, J.B. Gomes, S.P. Krishnaswamy, An overview of state-of-the-art partial discharge analysis techniques for condition monitoring, IEEE Electr. Insul. Mag. 31 (6) (2015) 22–35. [28] N. Zeng, Z. Wang, H. Zhang, W. Liu, F.E. Alsaadi, Deep belief networks for quantitative analysis of a gold immunochromatographic strip, Cogn. Comput. 8 (4) (2016) 684–692. [29] N. Zeng, H. Zhang, B. Song, W. Liu, Y. Li, A.M. Dobaie, Facial expression recognition via learning deep sparse autoencoders, Neurocomputing 273 (2018) 643–649. [30] N. Zeng, H. Zhang, W. Liu, J. Liang, F.E. Alsaadi, A switching delayed PSO optimized extreme learning machine for short-term load forecasting, Neurocomputing 240 (2017) 175–182. [31] E. Schubert, J. Sander, M. Ester, H.P. Kriegel, X. Xu, DBSCAN revisited, revisited: why and how you should (still) use DBSCAN, ACM Trans. Database Syst. (TODS) 42 (3) (2017) 19. [32] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (2010) 3371–3408. [33] L. Van Der Maaten, E. Postma, J. Van den Herik, Dimensionality reduction: a comparative, J. Mach. Learn. Res. 10 (2009) 66–71. [34] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the 25th International Conference on Machine Learning, ICML ’08, ACM, New York, NY, USA, 2008, pp. 1096–1103, doi:10.1145/1390156.1390294. [35] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504. [36] L.v.d. Maaten, G. Hinton, Visualizing data using T-SNE, J. Mach. Learn. Res. 9 (Nov) (2008) 2579–2605. [37] L. Van Der Maaten, E. Postma, J. Van den Herik, Dimensionality reduction: a comparative, J. Mach. Learn. Res. 10 (66-71) (2009) 13. [38] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324. [39] H. Wang, W. Dong, H. Ye, Y. Yan, X. Yan, Clustering of s700k point machine’s current curves based on reducing dimensions with denoising autoencoders and T-SNE, in: Proceedings of the 2017 Chinese Automation Congress & Intelligent Manufacturing International Conference, p. 6. [40] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Inc., 1988. [41] A.K. Jain, Data clustering: 50 years beyond k-means, Pattern Recognit. Lett. 31 (8) (2010) 651–666, doi:10.1016/j.patrec.2009.09.011. [42] M. Ester, H. Kriegel, J. Sander, X. 
Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, AAAI Press, Menlo Park, CA, 1996, pp. 226–231. [43] X. Rui, D. Wunsch, Survey of clustering algorithms, IEEE Trans. Neural Netw. 16 (3) (2005) 645–678, doi:10.1109/TNN.2005.845141.
[44] G. Montavon, G. Orr, K.-R. Müller, Neural Networks: Tricks of the Trade, 7700, Springer, 2012. [45] Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, Springer, pp. 437–478. [46] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the KDD, 96, pp. 226–231.

Yilin Wang received her Bachelor's degree in Automation from Central South University, Changsha, China, in 2014. She is currently pursuing the Ph.D. degree in Automation at Tsinghua University, Beijing, China. Her research interests include machine learning, data mining and fault detection.
Hao Ye was born in China, in 1969. He received his Bachelor and Doctoral Degrees in Automation from Tsinghua University, China, in 1992 and 1996 respectively. He has been with the Department of Automation, Tsinghua University since 1996. He is currently a professor and the director of the Institute of Process Control Engineering of the Department of Automation of Tsinghua University. He is mainly interested in fault detection and diagnosis of dynamic systems.
Tongshuai Zhang received his Bachelor and Doctoral Degrees in Automation from Tsinghua University, China, in 2011 and 2017, respectively. He is currently with China Property and Casualty Reinsurance Co., Ltd., Beijing, China. His research interests are fault diagnosis, machine learning and data mining.
Haipeng Zhang received his Bachelor Degree in Industrial Automation from Zhengzhou University in 2000. He is currently with Zhengzhou Huali Information Technology Co., Ltd., Zhengzhou, China. He is engaged in the development and management of automation of electric power systems.