Electric Power Systems Research 154 (2018) 474–483
High impedance fault detection in power distribution systems using wavelet transform and evolving neural network

Sergio Silva a, Pyramo Costa a, Maury Gouvea a, Alcyr Lacerda a, Franciele Alves a, Daniel Leite b,*

a Graduate Program in Electrical Engineering, Pontifical Catholic University of Minas Gerais (PUC Minas), Brazil
b Department of Engineering, Federal University of Lavras (UFLA), Brazil
Article history: Received 28 April 2017; Received in revised form 22 July 2017; Accepted 30 August 2017.

Keywords: Evolving neural network; Pattern recognition; High impedance fault detection; Power distribution system; Wavelet transform.

Abstract

This paper concerns the application of an incremental learning algorithm, driven by data streams, to detect high impedance faults in power distribution systems. A feature extraction method based on a discrete wavelet transform is combined with an evolving neural network to recognize spatial–temporal patterns in electrical current data. Different wavelet families, namely Haar, Symlet, Daubechies, Coiflet and Biorthogonal, and different decomposition levels were investigated in order to provide the most discriminative features for fault detection. The use of an evolving neural network was shown to be a very appropriate approach to fault detection, since high impedance fault detection is a time-varying problem. The performance of the proposed evolving system for detecting and classifying faults was compared with those of well-established computational intelligence methods: the multilayer perceptron neural network, the probabilistic neural network, and the support vector machine. The results showed that the proposed system is efficient and robust to changes. A classification performance on the order of 99% is exhibited by all classifiers in situations where the fault patterns do not change significantly during the tests. However, a performance drop of about 13–24% is exhibited by the non-evolving classifiers when fault patterns undergo gradual or abrupt changes in their behavior. The evolving system is capable, after incremental learning, of maintaining its detection and classification performance even in such situations.
1. Introduction

High impedance faults (HIFs) are common events in multi-grounded distribution networks. HIFs generally occur when an energized conductor comes into contact with an insulating or very poorly conductive surface, such as concrete, asphalt, sand or tree branches. The resulting fault current may be comparable in magnitude to the typical load current of the system, thus preventing overcurrent protection devices from operating correctly [1,2].

Several methods have emerged in recent decades that seek to provide an effective HIF detection system. A survey of such detection methods is given by Ghaderi et al. [1]. Among the most successful methods are those that combine a feature extraction algorithm with a pattern recognition technique. Feature extraction is desirable mainly because the electrical currents associated with HIFs are nonlinear, asymmetric, and time-varying [1]. Extracted
features usually facilitate HIF detection by means of pattern recognition tools specially designed for the purpose. Among these research studies, we highlight [3], whose main contribution is to show that a purely simulation-based procedure, using HIF models, can be used to investigate new schemes for detecting HIFs. The paper by Samantaray et al. [4] introduces the use of a probabilistic neural network as a pattern recognition tool for HIF detection; it is also a purely simulation-based study. Reference [5] is a research study based on a Discrete Wavelet Transform and Artificial Neural Network (DWT-ANN) arrangement; it introduces a detection method based on an ensemble of decision trees, in which an extended Kalman filter is used to analyze the magnitude and phase of the harmonics in the HIF current. In [6], a complete WT-ANN detection scheme is presented. A HIF model is also proposed by the authors in order to simulate the dynamic characteristics of electric arcs, and the paper highlights the importance of the robustness of the method against typical transients in power systems. The use of an adaptive neural fuzzy inference system (ANFIS) in a HIF detection method is proposed by Etemadi and Sanaye-Pasand [7]. The authors also investigated the best wavelet family to be used for feature extraction. This analysis is usually ignored by other HIF
detection studies. The detection scheme proposed by Michalik et al. [8] is an example of using the DWT as a feature extraction method; however, the authors did not use a pattern recognition tool for classification. Their detection process can be distinguished from others due to its simplicity and effectiveness. The article by Santos et al. [9] is a recent simulation-based research study whose most important contribution is the way it identifies the location of HIFs with good overall precision, resulting in a 70% reduction of the location time in some cases.

Ideally, fault detection systems should be: (i) highly efficient, meaning that the detection rate should be high; and (ii) robust, meaning that the system should not give false positives when typical transitory events occur on a power system, e.g. due to capacitor switching or transformer energization. In particular, some detection schemes have been proposed using the wavelet transform as a feature extraction method and different types of artificial neural networks as tools for pattern recognition. In some studies, neural-wavelet systems were subjected to typical transitory events and showed significant robustness [4,8,10].

All research studies discussed above use intelligent classifiers with fixed topology. This means that their structures (the rule base, or the connections and neurons of a neural network) are rigid and pre-constructed. A fixed number of neurons in the hidden layers of a neural network must be defined by the user based on tests or previous experience. Occasionally, a learning algorithm is used to create the network hidden layer based on training data. In Support Vector Machines, the number of support vectors is defined during the training process. In each of these cases, a learning approach is applied once to set the parameters of the classifier. Once trained, the resulting topology is static, i.e., the neural structure or support vectors cannot be changed. This means that the classifiers cannot learn during their online operation. If new information needs to be added to the classifier, a new offline training process must be performed from scratch, adding the new samples to the training dataset.

Undertaking pattern recognition tasks that deal with non-stationary data is a typical situation in which the ability to add new information to the system is of great value [11,4]. That is the case of the HIF detection problem, since fault patterns can change due to a series of factors. A classifier that is able to learn continuously from its environment, thereby adapting its structure and parameters, can maintain its performance, whereas a typical intelligent system can lose its ability to detect patterns.

This paper introduces an adaptive HIF detection method to deal with power systems in changing situations. The primary goal is to outperform existing detection and classification methods by exploring online incremental learning procedures to find new fault patterns in streams of data [12,13]. The proposed method is based on a discrete wavelet transform approach for feature extraction and an evolving artificial neural network for HIF recognition. Evolving neural networks differ from classical neural networks mainly because they do not have a fixed topology. References [11,14–17] are examples of evolving fuzzy neural networks. These systems are very flexible with respect to the data and able to adapt their structure online, thus allowing the system to learn in non-stationary environments.
References [11,17] are examples of evolving granular neural networks (eGNN) for classification. eGNN classifiers are able to undertake structural and parametric adaptation to handle the abrupt and gradual changes that are typical of non-stationary environments. Another class of evolving networks is proposed by Rubio [16]; the paper describes an algorithm with the ability to reorganize the network model in response to a changing environment, where structure and parameters are adapted simultaneously. Several classes of evolving intelligent systems discussed in [18,19] may be useful for further reference.

The existence of an evolving layer of neurons allows neural networks to change their connectionist structure to fit new information dynamically. Evolving neural networks have the following advantages: (i) fast learning due to one-pass incremental training; (ii) great resistance to catastrophic forgetting; and (iii) good generalization ability. These characteristics are advantageous especially because HIF current waveforms suffer from different types of changes. In other words, because signals and patterns are time-varying, the performance of 'non-evolving' classification models may drop substantially [18,19,11].

In order to compare the performance of the proposed evolving wavelet-neural system with that of other methods and systems, several families of wavelets are used for feature extraction and four types of classification models are considered for fault detection: the well-known Multilayer Perceptron network (MLP), the Probabilistic Neural Network (PNN), the Support Vector Machine (SVM), and the evolving neural network, namely, the Simple Evolving Connectionist System (SECoS). Non-evolving classifiers, such as MLP, PNN and SVM, exhibit the usual features of intelligent systems: (i) fixed topology during operation; (ii) all data samples are required to be available for offline training; (iii) they require multiple passes over the training samples; and (iv) they are subject to catastrophic forgetting if new patterns need to be learned [20]. SECoS addresses these weaknesses by adding and deleting neurons and/or adapting neural connections online [18]. SECoS is able to engage in online, one-pass-through-the-data training, thus helping to prevent catastrophic forgetting. This paper shows the advantages of using wavelet decomposition combined with an evolving neural network for the HIF detection problem.

The remainder of this paper is organized as follows. Section 2 introduces the HIF detection problem. Section 3 addresses the evolving neural network model and discusses its topology and learning algorithm. Sections 4 and 5 present the experimental results; different classification models are compared, and the advantages and disadvantages of the proposed wavelet-SECoS method for HIF detection are presented. Section 6 provides conclusions and suggestions for further investigation.
2. HIF detection This section describes the HIF detection method used in this work and its main characteristics. A general flowchart of the proposed detection scheme is shown in Fig. 1. Fundamentally, the flowchart consists of three elements: (i) a medium voltage network HIF algorithm; (ii) a feature extraction method performed over current signals; and (iii) an evolving intelligent method for recognition of patterns in data streams. Medium voltage network HIF simulation is carried out by using ATP (Alternative Transients Program) and its graphical processor ATPDraw. Feature extraction and pattern recognition algorithms are implemented in Matlab.
Fig. 1. Flowchart of the proposed HIF detection scheme.
This mathematical analysis and development environment has been used successfully in various pattern recognition applications, such as [21–26]. The next sections detail the processing steps.

2.1. HIF simulation in power distribution system

Computer simulations of power systems are quite often used in HIF research studies [27]. The models try to approximate HIF characteristics as closely as possible. Reference [27] presents a survey of HIF models for medium voltage networks. Aspects of real HIF behavior, such as nonlinearities, asymmetries, low-frequency transients and high-order harmonics, are included in these models [28,29].

A HIF can be modeled in ATP using either the available Models language or passive components. This work uses both: a model based on the ATP Models language, created using the ATPDraw front-end (Model 1; Fig. 2, left); and a model based on passive components (Model 2; Fig. 2, right). A detailed description of Model 1 can be found in [28]. HIF Model 2 was introduced in [27].

Fig. 2. General scheme of ATP HIF Models 1 and 2.

Fig. 3. IEEE 13 bus test feeder.

In Fig. 2, 'Model 1 – HIF' is an algorithm that receives a voltage signal from the fault point as input and provides a value of resistance, R, as output. The algorithm is based on differential equations that describe electric arcs [1,28]. For example, Torres et al. [28] used the differential equation

\frac{d\hat{g}}{dt} = \frac{G_0 - \hat{g}}{\tau} \qquad (1)

where \hat{g} is the instantaneous arc conductance, \tau is a time constant, and G_0 is the stationary arc conductance. The solution of (1) is

\hat{g}(t) = G_0 \left(1 - e^{-t/\tau}\right) \qquad (2)

The HIF simulation realizes this equation using the ATP Models language. The algorithm is given below; details on the calculation of the model parameters are given in [28].

Algorithm. HIF model using ATP MODELS language

MODEL hif
CONST gmin {val: 1.0E-6}
      v0   {val: 11267}
DATA  g0, tau, tinit
INPUT vt
OUTPUT rt
VAR   gs, gd, gt, sg, rt
EXEC
  gs := g0 * (vt**2 / v0**2)
  gs := gs * (1 - exp(-(t - tinit)/tau))
  IF t <= tinit THEN
    gd := gs
  ELSE
    LAPLACE (gd/gs) := 1| / (1| + 5e-4|s)
  ENDIF
  sg := (gd <= gmin) OR (prevval(gd) <= gmin)
  gt := gmin*sg + gd*not(sg)
  rt := 1/gt
ENDEXEC
ENDMODEL
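For intuition, the short sketch below evaluates Eq. (2) numerically and converts the arc conductance into the time-varying fault resistance delivered by the model. It is an illustration only; the parameter values are assumptions rather than the values given in [28].

import numpy as np

# Illustrative sketch (not the authors' ATP code): arc conductance from
# Eq. (2) and the resulting fault resistance r(t) = 1/g(t). Parameter
# values are hypothetical placeholders, not the values from [28].
G0 = 1.0 / 200.0        # stationary arc conductance (S): a 200-ohm fault
tau = 1.0e-3            # arc time constant (s)
gmin = 1.0e-6           # lower clamp, as in the MODELS listing above
t = np.linspace(0.0, 5 * tau, 6)

g = G0 * (1.0 - np.exp(-t / tau))   # Eq. (2)
g = np.maximum(g, gmin)             # keep conductance above gmin
r = 1.0 / g                         # fault resistance seen by the network
print(np.column_stack((t, r)))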
In 'Model 2', D_i, L_i, R_i and V_i refer to the values of the positive and negative components of a passive AC circuit. Inductances L_i are included to represent the high inductance that is typical of circuits that produce arcs [27].

Models 1 and 2 are useful to generate data to evaluate the performance and robustness of pattern recognition methods. The objective is to design and train a classifier using data generated by Model 1, and to detect HIFs from data generated by Model 2. The use of different models is a way of evaluating the generalization capability of the classifiers, since the data streams produced by each model are slightly different. A computational intelligence tool may be trained to identify a particular pattern; however, the fault current pattern may change over time due to several non-controlled aspects, such as noise, nonlinearity or the fault medium [30]. In this work, such changes in fault patterns are obtained from the HIF models.

In order to simulate a HIF in a typical power distribution system, the IEEE 13 bus test feeder is adopted [31]. Fig. 3 shows the feeder scheme. In particular, the voltage regulator between buses 650 and 632 is not considered in this work; this element has a complex model, and its presence in the feeder does not influence the analysis carried out in this study. The IEEE 13 bus test feeder is a small but heavily loaded feeder with 4,053 kVA apparent power and 0.85 power factor. The feeder is nearly 1.5 km long from bus 650 to bus 680. Several line configurations are possible: there are three-, two-, and single-phase segments, and several tower configurations that can be applied to different line segments. Most of the load is spot connected, but a distributed load is also considered in the segment between buses 632 and 671.

The training and recall datasets for the classifiers are obtained by applying several HIFs to the test feeder shown in Fig. 3. Specific assumptions are:

• For the training dataset, HIFs are applied to bus 680. Model 1 parameters are adjusted to result in a HIF current peak of 10%
of the typical feeder peak current, i.e., a maximum current of approximately 60 A. A total of 1260 data samples are generated for the training process. Half of these samples have a HIF current superposed on the steady-state one.

• For the recall datasets, several HIFs are applied to buses 633, 675, 671, 632, 650, 645, 684 and 646. The same model parameters are used to build the recall datasets. 1020 data samples are generated and used to test the classifiers. Half of these samples have a HIF current superposed on the steady-state one; the other half consists of steady-state signals that may be subject to typical electrical transients [29]. Model 1 is used to create the first recall dataset. The second dataset contains a mixture of samples generated from Models 1 and 2, in equal proportions.

Fig. 4. DWT realization using a filter bank.
Table 1
Wavelet families used for feature extraction.

Family         Wavelets
Haar           –
Symlet         Sym1 to Sym8
Daubechies     Db1 to Db8
Coiflet        Coif1 to Coif5
Biorthogonal   Bior1.1 to Bior1.5, Bior2.2 to Bior2.8, Bior3.1 to Bior3.9, Bior4.4, Bior5.5, and Bior6.8
In addition, all HIFs are applied to phase A. The following transients are considered:

• 200 kVAr three-phase capacitor bank switching at bus 692;
• 500 kVA three-phase transformer energization at bus 634.

Electrical transients should not be classified as faults. The objective of considering transients is to analyze the robustness of the classifiers in detecting HIFs regardless of the existence of transients.

2.2. Discrete wavelet transform

Wavelet transforms are a powerful tool for signal processing and analysis, and they are often considered for HIF detection. Fundamentally, the wavelet transform can simultaneously give information about the time and frequency content of a given signal. Signal analysis using wavelets is based on the principle of dilation and translation of a mother wavelet over the signal. The scaling operation dilates and compresses the mother wavelet, resulting in low- and high-frequency signals, respectively. The translation operation is applied to obtain the temporal response. The discrete wavelet transform (DWT) is the digital implementation of the wavelet transform; it has the following form [32]:

\mathrm{DWT}(x(m,n)) = \frac{1}{\sqrt{a_0^m}} \sum_{k} x(k)\, \psi^{*}\!\left(\frac{n - k b_0 a_0^m}{a_0^m}\right) \qquad (3)
where \psi denotes the mother wavelet and * denotes the complex conjugate. Parameters a and b are the scale and translation parameters, both functions of m. DWTs can be implemented through a low- and high-pass filter bank [32]. Let a_0 = 2, so that a_0^{-m} = 1, 1/2, 1/4, 1/8, ..., and b_0 = 1. This way, the mother wavelet is used as a low-pass filter l(n); its dual is a high-pass filter h(n). The scale factor applied over the mother wavelet is based on resampling the low-pass filter by a factor of 0.5, which leads to a two-to-one ratio for the wavelet applied at the next stage. The filter-bank construction is computationally efficient [33]; the decomposed signal takes the form of a series of detailed (D_n) and approximated (A_n) components. The former carry high-frequency components, whereas the latter carry low-frequency components. Fig. 4 shows the filter bank arrangement for the 3-level decomposition adopted in this paper.

A variety of mother wavelets has been used for extracting features from raw electrical current data. Table 1 shows the wavelet families considered for evaluation. For each wavelet listed in Table 1, a 3-level decomposition is performed, and only the detailed coefficients are extracted. Figs. 5–7 give examples of decomposition. Fig. 5 shows a typical current signal measured at the source of the IEEE 13 bus feeder; a HIF is applied in a bus at 0.1 s. Fig. 6 shows the first detailed coefficient, D1, from the Db2 wavelet, while Fig. 7 shows the second detailed coefficient, D2, from the Sym3 wavelet. One can see that the features extracted from the original current signal using the Db2 and Sym3 wavelets are essentially different. The system behavior under a fault condition becomes evident from the Sym3 – D2 coefficient in this case, since a different pattern can be clearly observed.

Fig. 5. Original current signal. HIF applied at 0.1 s.

A time window containing one cycle of the sinusoidal wave is used to decompose the current signal. This is equivalent to 16.67 ms, given the 60 Hz frequency of the underlying power system. A 3 kHz sampling rate was used for the acquisition of the current signals (measured from the secondary winding of transformers), resulting in 50 data samples per cycle. This is the typical sampling rate of protection relays.
Each time window is then passed forward to a DWT algorithm. The detailed coefficients from the multilevel decomposition are reconstructed, and a signal window containing 50 data samples is obtained. The 50-sample window is directly used as the input of a classifier. Fig. 8 illustrates the approach for a particular signal, where a Sym3 – D2 wavelet is used to provide the 50-element input vector for a classifier.
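To make the feature extraction procedure concrete, the sketch below reproduces it in Python with the PyWavelets library. The paper's implementation is in Matlab, so the function name and the synthetic test cycle here are illustrative assumptions, not the authors' code.

import numpy as np
import pywt

def extract_features(cycle, wavelet="sym3", level=3, keep_detail=2):
    """Decompose one 50-sample current cycle and reconstruct a single
    detailed coefficient as the classifier input vector."""
    coeffs = pywt.wavedec(cycle, wavelet, level=level)
    # coeffs = [A3, D3, D2, D1]; zero everything except the chosen detail
    kept = [np.zeros_like(c) for c in coeffs]
    pos = len(coeffs) - keep_detail        # D1 is the last entry
    kept[pos] = coeffs[pos]
    recon = pywt.waverec(kept, wavelet)
    return recon[: len(cycle)]             # 50-element feature vector

# Hypothetical test input: one 60 Hz cycle sampled at 3 kHz (50 samples)
t = np.arange(50) / 3000.0
features = extract_features(np.sin(2 * np.pi * 60.0 * t))
print(features.shape)  # (50,)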
Fig. 6. Db2 wavelet, first detailed coefficient.
3. Evolving classification neural network

Some classification tasks require online evolution of model parameters and structure [13,11,17]. In these cases, conventional machine learning and computational intelligence algorithms may not meet performance requirements because they assume forms of linearity and stationarity. Very often, learning algorithms demand multiple passes over entire datasets – a completely offline modeling approach. Evolving intelligent systems are characterized by the ability to adjust their structure and parameters to the changing characteristics of the environment [34–36]. Therefore, they overcome some drawbacks of conventional, non-evolving, intelligent and machine learning systems.

Evolving Connectionist Systems (ECoS) are systems that evolve their structure in an online manner to learn from a data stream. An overview of various ECoS classes can be found in [18,19]. Traditional connectionist structures, such as feedforward neural networks, are trained offline and therefore, unlike ECoS, do not deal with time-varying conditions. The operating features of ECoS are summarized below [18,37]. ECoS:

• can learn fast from a relatively large amount of data in single-pass training (incremental online learning);
• self-adapt on the fly, allowing new information to be added to their structure and parameters;
• memorize the essential information found in the data for refinement and information retrieval;
• learn and improve through interaction with other systems and with the environment.

3.1. Simple Evolving Connectionist System
Fig. 7. Sym3 wavelet, second detailed coefficient.
Fig. 8. Procedure for extracting features: a Sym3 – D2 wavelet is applied considering a 50-sample time window.

The Simple Evolving Connectionist System (SECoS) approach is an implementation of ECoS that is useful for the supervised training of classifiers [37]. SECoS consists of a three-layer neural network, as shown in Fig. 9. The first and third layers are called the input and output layers; the intermediate layer is named the evolving layer. The evolving layer is the layer whose connections and neurons are developed over time. The activation of the k-th neuron of the evolving layer is given by

A_k = 1 - D_{jk} \qquad (4)

where D_{jk} \in [0, 1] \;\forall j, k is a normalized distance measure. Consider a weight vector W_{ik} and the j-th input vector I^{(j)}, both m-dimensional column vectors. The normalized Manhattan distance between W_{ik} and I^{(j)} is

D_{jk} = \frac{\sum_{i=1}^{m} |W_{ik} - I_i^{(j)}|}{\sum_{i=1}^{m} |W_{ik} + I_i^{(j)}|} \qquad (5)

where m is the number of input neurons. The activation functions based on (4) and (5) imply that samples I^{(j)} that match a pattern W_{ik} result in full activation of a neuron; on the contrary, samples that are far from a pattern result in a near-zero activation value.

Fig. 9. SECoS neural structure.

The output, A_k, of an evolving neuron is multiplied by the weights between the evolving and output layers, W_{kl}. The output neuron sums the weighted activation levels of the evolving neurons and uses a saturated linear function to limit its output, \hat{O}_1, to [0, 1].

Different signal propagation methods have been proposed [19]. In the One-to-N method, only the most activated neuron of the evolving layer transmits an output signal to the output layer. In the Many-to-N method, the neurons whose activation value is greater than a given threshold, A_thr, transmit their outputs forward. In this paper, the One-to-N method is used.

3.2. SECoS learning algorithm

SECoS learning consists essentially in fitting new data samples as weights of the evolving layer of the neural network [18]. This can be done either by modifying the connection weights or by adding a new neuron to the network structure. The decision between creating a new neuron and adapting weights relies on the novelty of the input data. The degree of novelty is defined by comparing the activation level of the most activated neuron to a pre-defined threshold value, A_thr. New neurons can also be added to the network by measuring the output error and comparing it to a second pre-defined threshold value, E_thr. If the activation and error values are smaller than the respective thresholds, no neuron is created; learning in this case adapts the connection weights of the most activated neuron in order to fit the new information.

If a neuron n is added to the network structure, its input weight vector W_{in}, i = 1, ..., m, matches the input I^{(j)}, and its output weight W_{n1} is set to the desired output O^{(j)}. New neurons in the evolving layer are thus copies of samples whose level of novelty is higher than the threshold values. Naturally, the neural network structure can start from scratch. Adaptation of the input weights is given by

W_{ik}(t+1) = W_{ik}(t) + \eta_1 \left(I_i(t+1) - W_{ik}(t)\right) \qquad (6)

whereas the output weight is updated based on

W_{k1}(t+1) = W_{k1}(t) + \eta_2 A_k E_1 \qquad (7)

Note that (6) and (7) are recursive equations; \eta_1 and \eta_2 are learning rates; and

E_1 = O_1 - \hat{O}_1 \qquad (8)

is the estimation error.

Refinement procedures can be considered during learning. First, a neuron added to the network can be placed based on three different approaches:

• Linear allocation (LA): new neurons are placed at the end of the evolving layer;
• Maximum activation (MA): new neurons are placed adjacent to the most activated neuron. If k is the most activated neuron of the evolving layer, then the new one is placed in the (k + 1)-th position;
• Minimum distance (MD): new neurons are placed in the (k + 1)-th position, where neuron k is the neuron whose output weight vector is closest to the desired output vector.

In this paper we choose the MD approach, since it facilitates potential further aggregations. Because new data cause neurons to be added to the network structure, SECoS networks can grow in size, which may cause memory problems if an aggregation method is not applied. Neuron aggregation is the process in which two or more neurons are merged into one; the resulting neuron represents all the data lying in the region of the data space covered by the original neurons [18,37]. Input and output distances among neurons are calculated, and if the resulting distance is smaller than a given threshold, D_thr, the neurons are aggregated. Aggregation allows a reduction of the evolving layer size. During aggregation, the input and output distances between two evolving neurons, namely neurons o and p, are computed as

D_{op}^{in} = \frac{\sum_{i=1}^{m} |W_{io} - W_{ip}|}{\sum_{i=1}^{m} |W_{io} + W_{ip}|} \qquad (9)

and

D_{op}^{out} = \frac{|W_{o1} - W_{p1}|}{|W_{o1} + W_{p1}|} \qquad (10)

The resulting neuron has, as input and output weights, the mean values of those of its predecessors.
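As a concrete reading of Eqs. (9) and (10), the sketch below (a hypothetical helper name, assuming one scalar output weight per neuron) computes both distances and merges two neurons by averaging whenever both fall below D_thr, mirroring the aggregation rule just described.

import numpy as np

def aggregate_if_close(w_in_o, w_out_o, w_in_p, w_out_p, d_thr=0.1):
    """Merge evolving neurons o and p when the Eq. (9) and Eq. (10)
    distances both fall below d_thr; returns merged (input, output)
    weights, or None when no aggregation is warranted. Weights are
    assumed scaled so the denominators do not vanish."""
    d_in = np.sum(np.abs(w_in_o - w_in_p)) / np.sum(np.abs(w_in_o + w_in_p))  # Eq. (9)
    d_out = abs(w_out_o - w_out_p) / abs(w_out_o + w_out_p)                   # Eq. (10)
    if max(d_in, d_out) < d_thr:
        return (w_in_o + w_in_p) / 2.0, (w_out_o + w_out_p) / 2.0
    return None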
The SECoS learning algorithm is summarized below. The network structure requires no initialization, as no neurons exist in the evolving layer before training starts, and no weights need to be set.

Algorithm. SECoS online learning

1:  Set A_thr, E_thr, D_thr and learning rates \eta_1, \eta_2
2:  for each (I^{(j)}, O^{(j)}), j = 1, 2, ... do
3:    Propagate I^{(j)} through the network
4:    Find the most activated evolving neuron k
5:    Calculate the error E_1 between O^{(j)} and \hat{O}^{(j)}
6:    if A_k < A_thr \forall k or E_1 > E_thr then
7:      Add a neuron based on the MD approach
8:    else
9:      Update weights W_{ik} \forall i and W_{k1}
10:   end if
11:   if max(D_{op}^{in}, D_{op}^{out}) < D_thr then
12:     Aggregate neurons
13:   end if
14: end for
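The listing above translates almost line by line into code. The minimal sketch below is an illustrative Python reading of the loop; the class and method names are assumptions, a single output neuron and One-to-N propagation are assumed, and the aggregation step is left to the helper shown earlier.

import numpy as np

class SECoSSketch:
    """Illustrative reading of the SECoS learning loop (Eqs. (4)-(8))."""

    def __init__(self, a_thr=0.96, e_thr=5e-3, eta1=0.9, eta2=0.9):
        self.a_thr, self.e_thr = a_thr, e_thr
        self.eta1, self.eta2 = eta1, eta2
        self.w_in = []    # one input weight vector per evolving neuron
        self.w_out = []   # one scalar output weight per evolving neuron

    def _activations(self, x):
        # Eqs. (4)-(5): A_k = 1 - normalized Manhattan distance.
        # Assumes inputs are scaled so the denominator never vanishes.
        return np.array([1.0 - np.sum(np.abs(w - x)) / np.sum(np.abs(w + x))
                         for w in self.w_in])

    def predict(self, x):
        """One-to-N propagation: only the winner neuron fires; the
        output is saturated to [0, 1]."""
        acts = self._activations(x)
        k = int(np.argmax(acts))
        return float(np.clip(acts[k] * self.w_out[k], 0.0, 1.0)), k, acts

    def learn(self, x, y):
        """One incremental step: add a neuron on high novelty or large
        error, otherwise adapt the winner's weights (Eqs. (6)-(7))."""
        if not self.w_in:                       # structure starts from scratch
            self.w_in.append(np.asarray(x, dtype=float).copy())
            self.w_out.append(float(y))
            return
        y_hat, k, acts = self.predict(x)
        e1 = y - y_hat                          # Eq. (8)
        if acts[k] < self.a_thr or abs(e1) > self.e_thr:
            self.w_in.append(np.asarray(x, dtype=float).copy())
            self.w_out.append(float(y))         # new neuron copies the sample
        else:
            self.w_in[k] += self.eta1 * (x - self.w_in[k])    # Eq. (6)
            self.w_out[k] += self.eta2 * acts[k] * e1         # Eq. (7)

Feeding each 50-element DWT window and its 0/1 label to learn() reproduces the one-pass training regime; turning learning off, as in the first two experiments of Section 5, amounts to calling predict() only.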
Notice that the algorithm can also operate on a pre-constructed structure, adding neurons or adapting weights in the same way. Neural network models equipped with the online learning algorithm described above are natural candidates for nonlinear and time-varying classification applications.

4. Experimental study

Several experiments were conducted in order to validate the proposed evolving system. Different intelligent classifiers are employed for the task of detecting and classifying HIFs: the Multi-Layer Perceptron (MLP), the Probabilistic Neural Network (PNN), the Support Vector Machine (SVM), and SECoS. The classifiers are trained and tested using the same data. The number of inputs is 50, namely, the samples of a detailed coefficient regarding the rescaled DWT window. The output refers to the presence or absence of faults in the power distribution system.

4.1. The classification models

This subsection gives details on the parameterization of the different classifiers used in this study.

MLP: The patternnet and train Matlab functions were used to build a feedforward multilayer network model. A structure with two hidden layers was chosen. Hidden neurons use sigmoidal activation functions, whereas output neurons use softmax functions. A series of simulations showed that 40 and 15 neurons in the first and second hidden layers, respectively, is the setting that provides the best overall performance. The training algorithm is the Scaled Conjugate Gradient, which in Matlab requires no learning rate definition. The number of training epochs required to achieve the desired minimum gradient (1e-6) varies as a function of the wavelet chosen for decomposition; typical values are between 25 and 35 epochs.

PNN: The PNN is an RBF (Radial Basis Function) neural network for data classification. A PNN was created using the Matlab newpnn function, which builds a two-layer neural network. The spreads of the radial functions are initial parameters that must be set. A series of simulations suggested that a small initial spread value provides the best performance for HIF classification; a near-zero value makes the neural network play the role of a nearest-neighbor classifier [38]. The spread parameter was set to 0.9 in all simulations. In a PNN, the size of the hidden layer is equal to the number of training examples. The network has 50 input neurons, 1260 hidden neurons, and one output neuron. The input neurons consist of radial basis functions; neurons of the hidden layer have competitive activation functions based on the Euclidean distance between test examples and centroids. The training algorithm requires only one pass over the data [38].

SVM: SVMs are classifiers that search for an optimal hyperplane capable of separating samples of one class from those of other classes [39]. The Matlab svmtrain function was used to obtain an SVM-based HIF detection model. Several parameters need to be set, the kernel function being the most important. After a series of simulations aiming at the best classification accuracy, a third-order polynomial kernel was chosen; all other parameters were kept at their Matlab defaults. Typical numbers of support vectors range from 10 to 35, depending on the wavelet used for decomposition.

SECoS: The SECoS neural network model, as described in the previous section, was set with the following parameters: A_thr = 0.96; E_thr = 5 x 10^-3; D_thr = 0.1; \eta_1 = \eta_2 = 0.9. These values were found after a series of simulations. The number of neurons in the evolving layer is not known before the training process; it varies depending on the wavelet and detailed coefficient used in the pre-processing steps. Typical values for the final number of evolving neurons range from 5 to 12. SECoS neural networks are online adaptive.
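For reproduction outside Matlab, roughly equivalent baseline classifiers can be configured with scikit-learn. The sketch below mirrors the reported settings (two hidden layers with 40 and 15 neurons; third-order polynomial kernel) but is an assumption-laden approximation, not the authors' configuration; in particular, scikit-learn offers no Scaled Conjugate Gradient solver, so 'lbfgs' is a stand-in choice.

from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# MLP baseline: 40 and 15 hidden neurons, as reported for patternnet.
mlp = MLPClassifier(hidden_layer_sizes=(40, 15), activation="logistic",
                    solver="lbfgs", max_iter=500)

# SVM baseline: third-order polynomial kernel, other settings at default.
svm = SVC(kernel="poly", degree=3)

# X_train: (n_samples, 50) matrix of reconstructed detail coefficients;
# y_train: 0/1 labels (healthy / HIF). Both assumed to exist.
# mlp.fit(X_train, y_train); svm.fit(X_train, y_train)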
4.2. Training and recall datasets

Three major simulations were performed in ATP to construct the datasets for training and recalling the classifiers. For all datasets, the sampling frequency for ATPDraw data collection was set to 3 kHz.

A first simulation was carried out to obtain training and validation data. The IEEE 13 bus test feeder was considered in steady state with all components connected, and the simulation time was set to 21 s. HIFs are applied to bus 680 using Model 1 only. The goal is to have the same amount of data representing the presence and the absence of a fault (a balanced classification problem). Of a total of 1260 samples, 60% were used for training the classifiers; the rest of the data was used for validating (20%) and testing (20%) them, that is, for verifying whether the classification models generalize to never-before-seen data.

A second simulation was conducted to construct the first dataset for recalling the classifiers (conditions not used during the training process). HIFs were simulated at different buses using Model 1. Transients concerning the switching of a three-phase capacitor bank and the energization of a transformer were also taken into account. The third simulation, also used for recall purposes, has essentially the same prescription as that shown in Table 2 for the first recall dataset, but it combines data from Models 1 and 2. Both recall datasets have 1020 samples. Table 2 summarizes the characteristics of the datasets.

Training and recalling are performed using one of the mother wavelets shown in Table 1. Three-level decomposition based on a wavelet family allows the construction of appropriate windows of coefficients (input vectors for the classifiers). Classifiers are, naturally, requested to handle recall data generated from the same mother wavelet and detailed coefficients used for their training.
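A minimal sketch of the 60/20/20 split described above, assuming the 1260 feature vectors have already been generated (the random seed and the index-based approach are illustrative assumptions):

import numpy as np

# Split indices only; the 1260 feature vectors come from the ATP runs.
rng = np.random.default_rng(0)
idx = rng.permutation(1260)
train, val, test = np.split(idx, [int(0.6 * 1260), int(0.8 * 1260)])
print(len(train), len(val), len(test))  # 756 252 252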
Table 2
Summary of the datasets.

Dataset purpose   Simulation time (s)   Samples after DWT analysis (#)   Healthy data (%)   HIF data - Model 1 (%)   HIF data - Model 2 (%)
Training          21                    1260                             50                 50                       0
First recall      17                    1020                             51                 49                       0
Second recall     17                    1020                             51                 24.5                     24.5
Table 3
Average performance of classifiers for the first recall dataset. Each cell lists the values for detailed coefficients D1 / D2 / D3.

Wavelet   Classifier   True positive %         True negative %         False positive %     False negative %       Misclassified examples %
Db4       MLP          99.59 / 100 / 100       99.05 / 99.81 / 90.81   0.41 / 0 / 0         0.95 / 0.19 / 9.19     0.69 / 0.10 / 5.20
Db4       PNN          99.80 / 99.60 / 99.60   98.87 / 100 / 100       0.20 / 0.40 / 0.40   1.13 / 0 / 0           0.69 / 0.20 / 0.20
Db4       SVM          100 / 100 / 100         93.4 / 99.81 / 94.58    0 / 0 / 0            6.60 / 0.19 / 5.42     3.63 / 0.10 / 2.94
Db4       SECoS        99.60 / 99.79 / 100     99.81 / 94.58 / 97.58   0.40 / 0.21 / 0      0.19 / 5.42 / 2.42     0.29 / 3.04 / 1.27
Db4       Average      99.83                   97.36                   0.17                 2.64                   1.53
Sym4      MLP          100 / 100 / 99.80       98.13 / 99.62 / 99.62   0 / 0 / 0.20         1.87 / 0.38 / 0.38     0.98 / 0.20 / 0.29
Sym4      PNN          99.56 / 99.60 / 99.60   92.72 / 100 / 99.81     0.44 / 0.40 / 0.40   7.28 / 0 / 0.19        4.22 / 0.20 / 0.29
Sym4      SVM          100 / 100 / 99.79       83.97 / 99.62 / 98.12   0 / 0 / 0.21         16.03 / 0.38 / 1.88    9.80 / 0.20 / 1.08
Sym4      SECoS        93.75 / 95.34 / 99.78   52.09 / 85.28 / 93.73   6.25 / 4.66 / 0.22   47.91 / 14.72 / 6.27   47.25 / 10.49 / 3.53
Sym4      Average      98.94                   91.89                   1.06                 8.11                   6.54
Bior2.2   MLP          99.80 / 100 / 99.80     99.81 / 99.43 / 99.43   0.20 / 0 / 0.20      0.19 / 0.57 / 0.57     0.20 / 0.29 / 0.39
Bior2.2   PNN          99.80 / 99.80 / 99.60   100 / 100 / 99.81       0.20 / 0.20 / 0.40   0 / 0 / 0.19           0.10 / 0.10 / 0.29
Bior2.2   SVM          100 / 100 / 99.60       99.81 / 100 / 99.81     0 / 0 / 0.40         0.19 / 0 / 0.19        0.10 / 0 / 0.29
Bior2.2   SECoS        100 / 100 / 96.64       99.43 / 74.33 / 93.38   0 / 0 / 3.36         0.57 / 25.67 / 6.62    0.29 / 17.75 / 5.10
Bior2.2   Average      99.59                   97.10                   0.41                 2.90                   2.08
Bior2.8   MLP          99.80 / 100 / 100       99.81 / 99.62 / 99.62   0.20 / 0 / 0         0.19 / 0.38 / 0.38     0.20 / 0.20 / 0.20
Bior2.8   PNN          99.80 / 99.80 / 99.60   100 / 100 / 100         0.20 / 0.20 / 0.40   0 / 0 / 0              0.10 / 0.10 / 0.20
Bior2.8   SVM          100 / 100 / 100         99.81 / 99.43 / 99.24   0 / 0 / 0            0.19 / 0.57 / 0.76     0.10 / 0.29 / 0.39
Bior2.8   SECoS        99.80 / 100 / 100       99.24 / 99.05 / 99.81   0.20 / 0 / 0         0.76 / 0.95 / 0.19     0.49 / 0.49 / 0.10
Bior2.8   Average      99.90                   99.64                   0.10                 0.36                   0.24
5. Results and discussion

5.1. Preliminary experiments

In the first experiment, the MLP, PNN, SVM, and SECoS (with the online learning mechanism turned off) classifiers are evaluated using the first recall dataset. The purpose is to compare the influence of the mother wavelet and detailed coefficient on the classification results, and to evaluate the offline learning of the training algorithms. Table 3 shows the results for the most accurate wavelets, where 'positive class' means the existence of a HIF; only data for this class are shown.

The Bior2.8 mother wavelet presented the best overall results, i.e., the maximum numbers of true positives and true negatives and, therefore, the smallest number of misclassified data samples. For this reason, Bior2.8 is emphasized in the subsequent analyses.

The number of neurons in SECoS after the training process varied from 5 to 12, depending on the wavelet and detailed coefficient used in the pre-processing steps. For instance, for the Sym4 – D1 wavelet the neural network developed 5 neurons, while for Sym4 – D2 SECoS evolved 12 neurons. In the other cases, the neural network maintained a reasonable number of neurons and parameters in its structure to perform online classification. Unlike the other classifiers, SECoS did not reach a satisfactory performance using Bior2.2. Similar situations are not uncommon for the other classifiers using wavelets not presented in Table 3, such as Haar and Coif5. Indeed, the dynamic behavior of the HIF problem is the reason for using several wavelets in the experiments.

Fig. 10 shows examples of confusion matrices for the classifiers considering the Bior2.8 – D3 mother wavelet. All classifiers provided good performance for the first recall dataset: high rates of true positives and negatives were achieved using a proper mother wavelet. The test data in this experiment were generated by HIF Model 1, i.e., the same model used for generating the training data. Faults were applied at different buses, and transients were considered in order to evaluate the robustness of the classifiers. The small number of false positives indicates that the classifiers are relatively immune to typical transients of power distribution systems.

A second experiment was conducted using the second recall dataset, which consists of a mixture of data generated by HIF
Fig. 10. Confusion matrices considering the first recall dataset, Bior2.8 mother wavelet, and D3 .
Models 1 and 2. The classifiers were trained on data from HIF Model 1, and the recall data include examples generated by HIF Model 2. This is a more realistic problem due to the differences in the underlying electrical circuits, which aim to simulate the differences between a model and a true system. Table 4 shows the average performance of the classifiers for the wavelets selected in the previous experiment. All classifiers performed worse with respect to the results shown in Table 3 for the first recall dataset. The number of true positives was kept at a similar level; however, the number of false negatives increased for all wavelets and classifiers – an increase of about 24%. Naturally, the rate of misclassified examples also increased in all situations.
Table 4
Average performance of classifiers for the second recall dataset. Each cell lists the values for detailed coefficients D1 / D2 / D3.

Wavelet   Classifier   True positive %         True negative %         False positive %      False negative %        Misclassified examples %
Db4       MLP          99.55 / 100 / 100       90.63 / 80.37 / 67.27   0.45 / 0 / 0          9.38 / 19.63 / 32.75    5.49 / 12.55 / 25
Db4       PNN          99.60 / 99.35 / 99.42   67.66 / 73.31 / 77.22   0.40 / 0.65 / 0.58    32.34 / 26.69 / 22.78   24.61 / 18.82 / 15.29
Db4       SVM          100 / 100 / 100         67.53 / 67.79 / 67.79   0 / 0 / 0             32.47 / 32.21 / 32.21   24.71 / 24.41 / 24.41
Db4       SECoS        99.31 / 99.71 / 100     71.60 / 77.14 / 72.58   0.69 / 0.29 / 0       28.40 / 22.86 / 27.42   20.49 / 15.29 / 19.41
Db4       Average      99.74                   73.41                   0.26                  26.59                   19.21
Sym4      MLP          100 / 100 / 99.77       82.13 / 83.31 / 90.33   0 / 0 / 0.23          17.87 / 16.69 / 9.67    11.18 / 10.29 / 5.59
Sym4      PNN          99.19 / 99.29 / 99.57   67.62 / 70.83 / 93.72   0.81 / 0.71 / 0.43    32.38 / 29.17 / 6.28    24.71 / 21.27 / 3.63
Sym4      SVM          100 / 100 / 99.59       67.44 / 67.79 / 67.22   0 / 0 / 0.41          32.56 / 32.21 / 32.78   24.80 / 24.41 / 25.10
Sym4      SECoS        87.50 / 93.73 / 99.78   51.68 / 71.90 / 93.23   12.50 / 6.27 / 0.22   48.32 / 28.10 / 6.77    48.04 / 21.27 / 3.82
Sym4      Average      98.20                   75.60                   1.80                  24.40                   18.68
Bior2.2   MLP          99.73 / 100 / 99.77     80.59 / 84.11 / 88.49   0.27 / 0 / 0.23       19.41 / 15.89 / 11.51   12.45 / 9.71 / 6.76
Bior2.2   PNN          99.60 / 99.60 / 99.46   67.83 / 67.83 / 80.56   0.40 / 0.40 / 0.54    32.17 / 32.17 / 19.44   24.41 / 24.41 / 12.55
Bior2.2   SVM          100 / 100 / 99.54       67.88 / 67.88 / 88.93   0 / 0 / 0.46          32.12 / 32.12 / 11.07   24.31 / 24.31 / 6.57
Bior2.2   SECoS        100 / 100 / 95.47       79.51 / 65.01 / 76.16   0 / 0 / 4.53          20.49 / 34.99 / 23.84   13.24 / 27.65 / 17.16
Bior2.2   Average      99.43                   76.23                   0.57                  23.77                   16.96
Bior2.8   MLP          99.73 / 100 / 100       81.34 / 80.12 / 67.79   0.27 / 0 / 0          18.66 / 19.88 / 32.21   11.86 / 12.75 / 24.41
Bior2.8   PNN          99.60 / 99.60 / 99.44   67.83 / 67.83 / 78.97   0.40 / 0.40 / 0.58    32.17 / 32.17 / 21.03   24.41 / 24.41 / 13.82
Bior2.8   SVM          100 / 100 / 100         67.88 / 67.79 / 67.53   0 / 0 / 0             32.12 / 32.21 / 32.47   24.31 / 24.41 / 24.71
Bior2.8   SECoS        99.68 / 100 / 100       73.46 / 82.78 / 70.72   0.32 / 0 / 0          26.54 / 17.22 / 29.28   18.63 / 10.69 / 21.27
Bior2.8   Average      99.84                   72.84                   0.16                  27.16                   19.64
It is important to highlight that the online learning of SECoS was turned off; therefore, the neural network was not allowed to change its structure during the recall step. SECoS did not show any particular advantage over the other (non-evolving) classifiers. In fact, PNN reached the best performance for detecting HIFs in this experiment, with an average performance of 86.2%; see Fig. 11.
Fig. 11. Confusion matrices of the classifiers considering the second recall dataset and Bior2.8 – D3 mother wavelet.

Fig. 11 shows the confusion matrices of the classifiers considering the Bior2.8 – D3 wavelet. In general, a performance drop of about 13–24% with respect to the first experiment can be observed, especially due to the increase of false negatives. This means that, in given situations, the classifiers do not recognize HIFs. The performance drop is attributed to the difficulty of recognizing new fault patterns, i.e., patterns that differ from those used during training. New HIF examples, generated by a different HIF model, were included in the second recall dataset; of its 1020 examples, half of the fault examples were generated by HIF Model 2. The classifiers continue to detect HIFs generated by Model 1, but sometimes fail to detect those generated by Model 2. The 13–24% drop in the detection rates is associated with false negatives from these cases.

5.2. Exploring SECoS incremental learning

Online incremental learning of SECoS is evaluated in this section. The neural network adds neurons, and updates the connection weights, from the data stream. As in the second experiment, data samples generated by HIF Model 2 are considered; they are now used for online adaptation of SECoS. The same time window length and wavelet families were used in the pre-processing steps. In summary, for each electrical current cycle: (i) a type of DWT is applied; (ii) the input data vector of SECoS is formed based on detailed coefficients; and (iii) the learning algorithm of SECoS may change the classifier parameters and structure in order to fit new information on a sample-per-sample basis. Table 5 shows the classification results.

In Table 5, it is possible to note that SECoS reduced the number of misclassified examples to values close to those of the first experiment, which considered HIF Model 1. The average false negative rate for the Bior2.8 wavelet was 0.89% – the best rate among all wavelet families. This means that faults previously not identified could now be recognized, since the network topology is changed whenever new fault patterns occur in the electrical current. The evolving topology of SECoS is responsible for this performance: the evolving layer is able to change, adding new neurons that represent new prototypes of fault patterns, or aggregating two or more neurons into a single prototype.

In SECoS, retraining the neural network on all HIF training examples is not necessary. The incremental learning algorithm acts on the evolving layer, adding or removing neurons in order to accommodate new information as prototypes of data. The connections of the evolving layer to the input and output layers are also created or destroyed during online learning. In particular, the number of SECoS neurons using the Bior2.8 – D3 wavelet as feature extraction method increased from 9 to 17 during the online
Table 5
SECoS neural network classification performance based on online incremental learning. Each cell lists the values for detailed coefficients D1 / D2 / D3.

Wavelet   True positive %         True negative %         False positive %       False negative %        Misclassified examples %
Db4       99.50 / 99.60 / 97.35   84.60 / 100 / 96.42     0.50 / 0.40 / 2.65     15.40 / 0.00 / 3.58     9.51 / 0.20 / 3.14
Sym4      99.47 / 80.48 / 99.80   62.94 / 87.63 / 98.87   0.53 / 19.52 / 0.20    37.06 / 12.37 / 1.13    30.29 / 16.18 / 0.69
Bior2.2   99.79 / 84.34 / 96.39   98.12 / 87.23 / 88.04   0.21 / 15.66 / 3.61    1.88 / 12.77 / 11.96    1.08 / 14.22 / 8.33
Bior2.8   99.80 / 100 / 99.80     99.05 / 99.05 / 99.24   0.20 / 0 / 0.20        0.95 / 0.95 / 0.76      0.59 / 0.49 / 0.49
learning. Eight neurons were added to accommodate information from HIF Model 2 into the neural network. Seventeen neurons were enough to allow SECoS to detect HIFs generated by both Models 1 and 2.

6. Conclusion

A method for detecting high impedance faults using an evolving pattern recognition approach was proposed in this work. SECoS is a class of intelligent system based on a dynamic connectionist structure that is able to adapt its topology to incorporate new information through incremental learning. In order to validate the proposed method, several experiments were conducted using the IEEE 13 bus test feeder and two validated HIF models. Several wavelets and three other well-established pattern recognition tools were used in the experiments, aiming to compare their feature extraction performance and detection rates. The ability of SECoS to learn and adapt its topology continuously provides a great advantage over the other methods. SECoS outperformed the other methods with respect to robustness to changes in fault patterns; the other classifiers showed higher false negative rates than those presented by SECoS. The general results showed that SECoS combined with the Bior2.8 wavelet is the best scheme for HIF classification.

References

[1] A. Ghaderi, H.L. Ginn, H.A. Mohammadpour, High impedance fault detection: a review, Electr. Power Syst. Res. 143 (1) (2017) 376–388.
[2] M. Sedighizadeh, A. Rezazadeh, N.I. Elkalashy, Approaches in high impedance fault detection: a chronological review, Adv. Electr. Comput. Eng. 10 (3) (2010) 114–128.
[3] T.M. Lai, L.A. Snider, E. Lo, D. Sutanto, High-impedance fault detection using discrete wavelet transform and frequency range and RMS conversion, IEEE Trans. Power Deliv. 20 (1) (2005) 397–407.
[4] R. Samantaray, B.K. Panigrahi, P.K. Dash, High impedance fault detection in power distribution networks using time–frequency transform and probabilistic neural network, IET Gener. Transm. Distrib. 2 (2) (2008) 261–270.
[5] R. Samantaray, Ensemble decision trees for high impedance fault detection in power distribution network, Int. J. Electr. Power Energy Syst. 43 (1) (2012) 1048–1055.
[6] I. Baqui, I. Zamora, J. Mazon, G. Buigues, High impedance fault detection methodology using wavelet transform and artificial neural networks, Electr. Power Syst. Res. 81 (7) (2011) 1325–1333.
[7] A. Etemadi, M. Sanaye-Pasand, High-impedance fault detection using multi-resolution signal decomposition and adaptive neural fuzzy inference system, IET Gener. Transm. Distrib. 2 (1) (2008) 110–118.
[8] M. Michalik, W. Rebizant, M. Lukowicz, S. Lee, S. Kang, High-impedance fault detection in distribution networks with use of wavelet-based algorithm, IEEE Trans. Power Deliv. 21 (4) (2006) 1793–1802.
[9] W.C. Santos, F.V. Lopes, N.S.D. Brito, B.A. Souza, High-impedance fault identification on distribution networks, IEEE Trans. Power Deliv. 32 (1) (2017) 23–32.
[10] J.C. Chen, B.T. Phung, D.M. Zhang, Study on high impedance fault arcing current characteristics, Power Engineering Conference (2013) 1–6.
[11] D. Leite, P. Costa, F. Gomide, Evolving granular neural networks from fuzzy data streams, Neural Netw. 38 (1) (2013) 1–16.
[12] D. Leite, R. Ballini, P. Costa, F. Gomide, Evolving fuzzy granular modeling from nonstationary fuzzy data streams, Evol. Syst. 3 (2) (2012) 65–79.
[13] P. Angelov, D. Filev, N. Kasabov, Evolving Intelligent Systems: Methodology and Applications, Wiley – IEEE Press Series on Computational Intelligence, 2010.
[14] M. Pratama, M.J. Er, X. Li, R.J. Oentaryo, E. Lughofer, I. Arifin, Data driven modeling based on dynamic parsimonious fuzzy neural network, Neurocomputing 110 (1) (2013) 18–28.
[15] M. Pratama, S.G. Anavatti, P. Angelov, E. Lughofer, PANFIS: a novel incremental learning machine, IEEE Trans. Neural Netw. Learn. Syst. 25 (1) (2014) 55–68.
[16] J.J. Rubio, SOFMLS: online self-organizing fuzzy modified least-squares network, IEEE Trans. Fuzzy Syst. 17 (6) (2009) 1296–1309.
[17] D. Leite, P. Costa, F. Gomide, Evolving granular neural network for semi-supervised data stream classification, IEEE 2010 International Joint Conference on Neural Networks (IJCNN) (2010) 1–8.
[18] N. Kasabov, Evolving Connectionist Systems: The Knowledge Engineering Approach, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.
[19] M. Watts, A decade of Kasabov's evolving connectionist systems: a review, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 39 (3) (2009) 253–269.
[20] R.M. French, Catastrophic forgetting in connectionist networks, Trends Cogn. Sci. 3 (4) (1999) 128–135.
[21] M. Valipour, M.E. Banihabib, S.M. RezaBehbahani, Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir, J. Hydrol. 476 (7) (2013) 433–441.
[22] D. Viero, M. Valipour, Modeling anisotropy in free-surface overland and shallow inundation flows, Adv. Water Resour. 104 (1) (2017) 1–14.
[23] M. Valipour, Global experience on irrigation management under different scenarios, J. Water Land Dev. 32 (1) (2017) 95–102.
[24] M. Valipour, How much meteorological information is necessary to achieve reliable accuracy for rainfall estimations? Agriculture 6 (4) (2016) 1–9.
[25] M. Valipour, Variations of land use and irrigation for next decades under different scenarios, Braz. J. Irrig. Drain. 1 (1) (2016) 262–288.
[26] M. Valipour, Selecting the best model to estimate potential evapotranspiration with respect to climate change and magnitudes of extreme events, Agric. Water Manag. 180 (A) (2017) 50–60.
[27] N. Zamanan, J. Sykulski, The evolution of high impedance fault modeling, IEEE 16th International Conference on Harmonics and Quality of Power (ICHQP) (2014) 77–81.
[28] V. Torres, J.L. Guardado, H.F. Ruiz, S. Maximov, Modeling and detection of high impedance faults, Int. J. Electr. Power Energy Syst. 61 (1) (2014) 163–172.
[29] F.B. Costa, B.A. Souza, N.S. Brito, J.A.C. Silva, Real-time detection of transients induced by high impedance faults based on the boundary wavelet transform, IEEE Trans. Ind. Appl. 51 (6) (2015) 5312–5323.
[30] A. Ghaderi, H.A. Mohammadpour, H. Ginn, High impedance fault detection method efficiency: simulation vs. real-world data acquisition, IEEE Power and Energy Conference at Illinois (2015) 1–5.
[31] W.H. Kersting, Radial distribution test feeders, IEEE Power Engineering Society Winter Meeting (2001) 908–912.
[32] C.H. Kim, R. Aggarwal, Wavelet transforms in power systems. Part 1: General introduction to the wavelet transforms, Power Eng. J. 14 (2) (2000) 81–87.
[33] C.H. Kim, H. Kim, Y.H. Ko, S.H. Byun, R.K. Aggarwal, A novel fault-detection technique of high-impedance arcing faults in transmission lines using the wavelet transform, IEEE Trans. Power Deliv. 17 (4) (2002) 921–929.
[34] J.J. Rubio, Evolving intelligent algorithms for the modeling of brain and eye signals, Appl. Soft Comput. 14 (B) (2014) 259–268.
[35] D. Leite, R. Palhares, V. Campos, F. Gomide, Evolving granular fuzzy model-based control of nonlinear dynamic systems, IEEE Trans. Fuzzy Syst. 23 (4) (2015) 923–938.
[36] P. Angelov, A fuzzy controller with evolving structure, Inf. Sci. 161 (1) (2004) 21–35.
[37] A. Ghobakhlou, M. Watts, N. Kasabov, Adaptive speech recognition with evolving connectionist systems, Inf. Sci. 156 (1–2) (2003) 71–83.
[38] D.F. Specht, Probabilistic neural network, Neural Netw. 3 (1) (1990) 109–118.
[39] M.A. Hearst, S.T. Dumais, E. Osuna, J. Platt, B. Scholkopf, Support vector machines, IEEE Intell. Syst. Appl. 13 (4) (1998) 18–28.