Computers and Electrical Engineering 61 (2017) 80–94
Fault detection of aircraft based on support vector domain description

Yaoming Zhou a, Kan Wu b, Zhijun Meng a,*, Mingjun Tian c

a School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
b School of Mechanical and Aerospace Engineering, Nanyang Technological University, 639798, Singapore
c AVIC Shenyang Aircraft Design and Research Institute, Shenyang 110025, China
Article history: Received 26 January 2017; Revised 13 June 2017; Accepted 13 June 2017

Abstract: To realize intelligent fault detection of aircraft lacking fault samples, a novel fault detection algorithm for aircraft based on Support Vector Domain Description (SVDD) is proposed. The Genetic Algorithm (GA), a threshold scaling factor, rapid anomaly detection, a modified kernel function, and an SVDD model boundary based on equal loss are introduced into the fault detection algorithm. The empirical analyses show that the method has good fault detection ability. The classification accuracy is improved by 5.52% after using the GA. The fault detection time of the SVDD algorithm is 0.4 seconds shorter on average than that of the red line shutdown system. The accurate classification rate is enhanced by 0.0225, and the number of support vectors is reduced by 1 after adopting the modified kernel function. The fault detection algorithm in this paper provides a novel intelligent fault detection technology for aircraft.

Keywords: Fault detection; Support vector domain description; Aircraft; Modifying kernel function
1. Introduction

Support Vector Domain Description (SVDD) is a one-class classification method developed by Tax and Duin. The core idea is to treat the target class as a whole and construct a hypersphere in the eigenspace so that all, or at least the majority, of the target objects are enclosed by the hypersphere, while objects from other classes are excluded from it entirely or included only minimally. Points within the hypersphere are classified as the target class, and points outside it are classified as the non-target class; in this way the two classes are separated [1–3]. The hypersphere has two features, the center a and the radius R, as shown in Fig. 1. Minimizing the radius minimizes the volume enclosed by the hypersphere while keeping as much of the training data as possible inside it. In most scenarios it is not feasible to separate the target class and the non-target class completely: the classes overlap when mapped into the eigenspace. To accommodate these overlaps, slack variables εᵢ are added to allow some points to fall on the wrong side of the boundary. Therefore, the first problem to be solved for fault detection of aircraft based on SVDD is minimizing the radius of the hypersphere:
$$\min F(R, a, \varepsilon) = R^2 + C \sum_i \varepsilon_i \qquad (1)$$
Fig. 1. Schematic diagram of the hypersphere.
subject to the constraints:
$$(x_i - a)(x_i - a)^T \le R^2 + \varepsilon_i, \quad \varepsilon_i \ge 0, \ \forall i \qquad (2)$$
where C is the penalty parameter for misclassified samples. Applying Lagrange multipliers transforms the problem into minimizing the target function L:
$$L(R, a, \alpha_i, \gamma_i, \varepsilon_i) = R^2 + C \sum_i \varepsilon_i - \sum_i \alpha_i \left( R^2 + \varepsilon_i - \|x_i - a\|^2 \right) - \sum_i \gamma_i \varepsilon_i \qquad (3)$$
where the Lagrange multipliers satisfy αᵢ ≥ 0 and γᵢ ≥ 0. Setting the partial derivatives with respect to R, a, and εᵢ to zero yields the following optimization problem:
$$\max L(\alpha_i) = \sum_i \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \qquad (4)$$
$$\text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i = 1$$
Finally, the fault detection problem in aircraft based on SVDD is transformed into the following test function, where z is the sample to be tested:
$$\|z - a\|^2 = (z \cdot z) - 2 \sum_i \alpha_i (z \cdot x_i) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \le R^2 \qquad (5)$$
If the above test function is to be applied in discriminant decisions, the radius of the hypersphere must be calculated during training:
$$R^2 = (x_k \cdot x_k) - 2 \sum_i \alpha_i (x_k \cdot x_i) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \qquad (6)$$
where xₖ is any support vector from the set of support vectors satisfying αₖ < C. However, most real-world problems are non-linearly distributed. Under the kernel method, a non-linear mapping ϕ first maps the data into a high-dimensional eigenspace, where linear classification is performed; mapped back into the input space, the result is a non-linear classification [4,5]. To avoid complex calculations in the high-dimensional space, the kernel function K(xᵢ, xⱼ) replaces the inner product ⟨ϕ(xᵢ), ϕ(xⱼ)⟩ of the high-dimensional eigenspace. After introducing the kernel function, the target optimization function becomes:
$$\max L(\alpha_i) = \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \qquad (7)$$
which is constrained by 0 ≤ αᵢ ≤ C and Σᵢ αᵢ = 1. The corresponding R² and test function can be written as:
$$R^2 = K(x_k, x_k) - 2 \sum_i \alpha_i K(x_k, x_i) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \qquad (8)$$
For a new sample z, the detection discriminant function is (9), where $a = \sum_{i=1}^{l} \alpha_i \phi(x_i)$:

$$f(z) = \|\phi(z) - a\|^2 = K(z, z) - 2 \sum_i \alpha_i K(z, x_i) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \le R^2 \qquad (9)$$
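For concreteness, the following minimal Python sketch assembles Eqs. (7)–(9) into a trainable detector. It is not the authors' implementation: the dual is solved here with a generic SciPy SLSQP solver rather than a dedicated QP routine, and the parameter values (C, σ²) are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Z, sigma2):
    """Gaussian kernel of Eq. (10): k(x, z) = exp(-||x - z||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma2)

def train_svdd(X, C=0.1, sigma2=0.8):
    """Solve the dual of Eq. (7); returns (alpha, R2).
    Feasibility requires C >= 1/l (see Section 2.4)."""
    l = len(X)
    K = gaussian_kernel(X, X, sigma2)
    res = minimize(
        lambda a: -(a @ np.diag(K) - a @ K @ a),   # negated dual objective
        np.full(l, 1.0 / l),                       # feasible start
        bounds=[(0.0, C)] * l,
        constraints={'type': 'eq', 'fun': lambda a: a.sum() - 1.0})
    alpha = res.x
    # Radius from any support vector x_k with 0 < alpha_k < C, Eq. (8)
    k = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0][0]
    R2 = K[k, k] - 2.0 * alpha @ K[:, k] + alpha @ K @ alpha
    return alpha, R2

def f_z(z, X, alpha, sigma2):
    """Test function of Eq. (9); f(z) <= R^2 classifies z as the target class."""
    Kz = gaussian_kernel(np.atleast_2d(z), X, sigma2)[0]
    K = gaussian_kernel(X, X, sigma2)
    return 1.0 - 2.0 * alpha @ Kz + alpha @ K @ alpha   # K(z, z) = 1 here
```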
The rest of this paper is organized as follows. The implementation of the detection method based on SVDD is explained, mainly covering the optimization of the SVDD kernel parameters based on the Genetic Algorithm (GA), the introduction of the threshold scaling factor, the rapid anomaly detection algorithm, a method of modifying the kernel function, and the SVDD model boundary based on equal loss. The conclusions are given at the end.
Fig. 2. Fault detection process for aircraft based on SVDD.
2. Implementation of detection method based on SVDD

The basic process of the SVDD-based fault detection system for aircraft is presented in Fig. 2. In the detection system, historical trial-run data determine the generalizability of the SVDD classification; the validity of the running data is ensured by redundant sensors [6]. The selection of training data is very important: normalizing the characteristics of the data sample increases the accuracy of the classification. SVDD training is the focus of this article; it includes the selection and modification of the kernel function and the confirmation of the kernel parameters, and it determines the detection accuracy rate. Parameter storage retains the parameter values of the SVDD model. After parameter storage, alarm detection is performed on faults, and decisions are made using the test function for the discrimination calculation.

2.1. Data sample pre-processing

According to the characteristics of the SVDD classification algorithm, when SVDD is used to construct the hypersphere model for fault detection, the hypersphere should cover all normal samples as far as possible. The biggest challenge of this detection method is therefore the completeness of the data. Aircraft system samples, such as propulsion system samples, not only lack fault samples; the normal samples are often incomplete as well. The following solutions are adopted to overcome this shortcoming.

(1) With regard to the algorithm, the detection algorithm of this article trains only on an (incomplete) normal sample, and there is no fault data with which to adjust the compactness of the decision boundary relative to the normal data. The detection model can therefore control its generalizability by controlling the false alarm rate on the normal data, that is, by allowing a certain error rate on the normal training data. To compensate for the lack of fault samples, the threshold scaling factor ε [7] is introduced; this factor balances the false alarm rate on the normal data against the missed alarm rate on the fault samples. This issue is elaborated below.

(2) Maximum completeness of the selected sample data should be achieved. Normal data from homotype propulsion systems, or fault data of the same type, belong to the same category on the whole despite differing values. Thus, during training set selection, data from various ranges should be included as far as possible.

(3) The eigenvectors of the training sample should be normalized to increase classification accuracy; each eigenvalue should lie in the range [0, 1]:
$$x'_{i,j} = \frac{x_{i,j} - \min_j(x_{i,j})}{\max_j(x_{i,j}) - \min_j(x_{i,j})}$$

where xᵢ,ⱼ represents the measured value of the jth parameter in the ith sample.
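A minimal sketch of this normalization step, assuming the min/max are taken per parameter (column) over the training samples:

```python
import numpy as np

def normalize(X):
    """Min-max normalization of each parameter (column) into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
```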
2.2. Model selection based on SVDD

Model selection for the SVDD classification algorithm is essentially the search for the best kernel function for a specific problem: confirming the type of kernel function, optimizing its parameters, and adjusting the kernel to the given data. Model selection influences the generalizability of the learning machine, so the choice of the SVDD model is a key question in practical applications. Formulating and solving the SVDD algorithm involves determining several design parameters: first, the type of kernel function and its parameters; second, the penalty parameter C. The kernel function indirectly describes the high-dimensional eigenspace of the SVDD, while C determines the degree of penalty applied to distant samples. These parameters have a large impact on the learning performance of the SVDD algorithm.
Due to the advantages of the Gaussian kernel function [8,9], it was chosen for the SVDD algorithm in this article.
$$k(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / \sigma^2\right) \qquad (10)$$
2.3. Selection of σ

The choice of the Gaussian parameter σ has a great influence on the performance of the SVDD. When σ is very large relative to the distances between samples, the changes in kernel values caused by those distances are small, which shrinks the difference between target and non-target samples. In other words, the kernel function becomes insensitive to differences between data, the differences become harder to distinguish, and the identification efficiency drops. The SVDD then "over-learns": its generalizability is poor, it fits only the specific training set, and it has no practical value. Conversely, when σ is very small, even a small distance between samples causes a large change in the kernel value; the kernel is over-sensitive to the data, small differences within the same class are magnified, misjudgments result, and the identification ability again suffers. Therefore, the distances between samples are indicative for the selection of σ; one can consider the largest intra-class distance of the target samples when choosing the Gaussian parameter. If the eigenvalue changes triggered by some faults are very small, a smaller σ is needed so that the model remains sensitive to this type of fault. The choice of σ is thus the result of several considerations.

2.4. Selection of the error penalty parameter C

The error penalty parameter C determines the degree of penalty for misjudged samples, and its size has a large impact on the position of the optimal cutting plane. This paper illustrates the impact of C on the algorithm from two perspectives. According to the optimization formula [10,11], since $\sum_{i=1}^{l} \alpha_i = 1$ and 0 ≤ αᵢ ≤ C, when C < 1/l we would have $\sum_{i=1}^{l} \alpha_i < 1$, and the formula has no solution. When C > 1, a solution always exists and always satisfies αᵢ < C, so all training samples lie within the hypersphere (including its boundary). When 1/l ≤ C ≤ 1, the parameter C controls the number of training samples outside the hypersphere: the smaller C is, the more training samples fall outside. C thus balances the size of the hypersphere against the error rate on the normal samples, but C by itself has no direct physical meaning. In practical applications, C = 1/(vl) is therefore often used instead, where l is the number of training samples and v is a user-set parameter. Indeed, v is the upper limit of the error rate of the target class (the fraction of the sample made up of support vectors outside the hypersphere) and the lower limit of the fraction of training samples that are support vectors; the proof is given in the references [12,13] and is omitted here. This transformation turns the selection of C into the setting of v, and v carries a concrete meaning: C = 1/(vl) < 1/l would imply v > 1, which is impossible given the meaning of v; C > 1 implies vl < 1, in which case all training samples lie within the hypersphere; and if 1/l ≤ C ≤ 1, then 1 ≤ vl ≤ l, and the number of training samples outside the hypersphere lies within the range (1, l).
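A small helper capturing this re-parameterization; the bounds check mirrors the feasibility discussion above (the function name is hypothetical):

```python
def penalty_from_nu(nu, l):
    """C = 1/(nu * l); nu upper-bounds the target-class error rate and
    lower-bounds the support-vector fraction (Section 2.4)."""
    assert 0.0 < nu <= 1.0 and nu * l >= 1.0, "need 1/l <= C <= 1"
    return 1.0 / (nu * l)
```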
Therefore, this conversion is more indicative for the selection of C.

2.5. Optimization of parameter σ based on the genetic algorithm (GA)

Owing to the strength of the GA in optimization, this article optimizes the parameter σ with the GA. Each GA individual represents a group of parameters of the SVDD algorithm, and the fitness of an individual is the classification performance of the SVDD algorithm under that group of parameters. When performing the GA parameter search, the fitness function is chosen as:
$$F(C, \sigma^2) = p \qquad (11)$$
and the optimization target is to maximize F(C, σ²), where p is the classification accuracy rate on the SVDD test sample. A total of 664 groups of data samples were used for a simulation experiment: 230 groups as training samples and 434 groups as testing samples. By the 20th GA generation the classification accuracy of the SVDD was sufficiently high, and applying the parameters obtained from the GA to the SVDD training gave a higher classification accuracy rate than the SVDD algorithm using the data distribution estimation method, as shown in Table 1. These results show that using the GA to search for the SVDD kernel parameters yields more satisfactory results and an advantage over the standard SVDD algorithm: the classification accuracy is improved by 5.52%.
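A minimal real-coded GA sketch for this parameter search, assuming a caller-supplied fitness function that trains an SVDD with a candidate (C, σ²) and returns the test accuracy p of Eq. (11); the population size, crossover, and mutation scheme are illustrative choices, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_search(fitness, bounds, pop_size=20, generations=20,
              mutation_scale=0.05, elite=2):
    """Minimal real-coded GA: `fitness` maps a candidate (C, sigma2)
    to the classification accuracy p of Eq. (11); higher is better."""
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)[::-1]]           # best first
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - elite:
            p1, p2 = parents[rng.integers(len(parents), size=2)]
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2           # arithmetic crossover
            child += rng.normal(0.0, mutation_scale, child.shape) * (hi - lo)
            children.append(np.clip(child, lo, hi))   # Gaussian mutation
        pop = np.vstack([pop[:elite]] + children)     # elitism
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]
```

It could be invoked as, e.g., `best = ga_search(lambda p: svdd_accuracy(p[0], p[1]), bounds=[(1/230, 1.0), (0.1, 2.0)])`, where `svdd_accuracy` is a hypothetical wrapper that trains on the 230 training groups and scores the 434 test groups.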
Table 1. Comparison between GA-SVDD and SVDD.

Algorithm   Error penalty factor C   Gaussian kernel parameter σ²   Classification accuracy of testing sample (%)
SVDD        0.6                      0.74                           86.42
GA-SVDD     0.5                      0.88                           91.94
Fig. 3. Effect of parameter σ on model flexibility.
2.6. Fault detection of the aircraft system based on SVDD

This section presents an empirical analysis of SVDD applied to the aircraft propulsion system, in two main parts. First, to compensate for the missing adjustment effect that fault data would normally have on model compactness, the threshold scaling factor ε is introduced to balance the false alarm rate on normal data against the missed alarm rate on fault data; this overcomes the problem of sample incompleteness to a certain extent. Second, to further increase the separability of the data and more effectively separate the two modes, a method of modifying the kernel function is used, which magnifies the area surrounding each support vector xᵢ.

2.7. Adjustment of the threshold scaling factor ε

The penalty parameter C can control the false alarm rate of the detection model on the normal sample, and adjusting σ can increase model flexibility. Can a small value of v remain unchanged while σ is shrunk to increase the flexibility of the model? Using a simple two-dimensional dataset (40 samples) as an example, the hypersphere models obtained from training are shown in Fig. 3(a)–(c), where v = 0.02 remains unchanged and σ takes the values 30, 20, and 10, respectively. As σ shrinks, model flexibility increases, but the error rate on the normal sample (that is, the false alarm rate) also increases, and further shrinking of C does not change the results significantly, as shown in Fig. 3(d). Controlling C and σ alone therefore fails to generate a model with both a low false alarm rate and good flexibility. One solution is to introduce the proportional scaling factor ε for the R² threshold and to transform Eq. (9) into:
$$f(z) = K(z, z) - 2 \sum_i \alpha_i K(x_i, z) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \le \varepsilon \cdot R^2 \qquad (12)$$
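A one-function sketch of the scaled decision rule, reusing `f_z` from the earlier sketch; the default ε = 1.05 anticipates the value chosen in Section 2.9 and is otherwise arbitrary:

```python
def detect(z, X, alpha, R2, sigma2, eps=1.05):
    """Scaled decision rule of Eq. (12); True means "normal"."""
    return f_z(z, X, alpha, sigma2) <= eps * R2
```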
Fig. 4. Hypersphere model of threshold before and after ε scaling.
Fig. 5. Training results from split subsets of each sample.
The advantage of this solution is that artificially scaling the model boundary substitutes for the adjustment of model compactness that fault data would otherwise provide. The exact value of ε depends on the specific object, but it normally ranges from 0.90 to 1.10. Fig. 4 shows the hypersphere model before and after scaling the threshold with ε. The graph shows that adjusting ε not only adjusts the model compactness more effectively but also maintains model flexibility.

2.8. Improvements in training

In the optimization process, the non-support vectors are optimized along with the support vectors, which inevitably adds unnecessary computation. If the support vectors could be pre-selected and the optimization performed only on them, the calculation time and memory consumption could be greatly reduced. We therefore applied a rapid anomaly detection algorithm based on SVDD that lowers the cost of obtaining support vectors through sample splitting. The fundamental idea is to split the training sample into a number of subsets of identical size, train on each subset to obtain its support vectors, and then train on the union of the subset support vectors to obtain the support vectors of the complete training sample. After the parameters αᵢ and the threshold R² of the SVDD model are obtained, Eq. (9) is used to decide whether an abnormality exists; a sketch of this procedure follows below. In the following example, the two-dimensional data from the previous section were used (the 40 samples were split into two groups of 20; the asterisks denote samples from the first group and the dots samples from the second). Two subsets were constructed for support-vector split training, to demonstrate that this method not only obtains results similar to the standard method but also enhances training efficiency, so it can be applied to training on larger samples of high-dimensional data.
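A sketch of the split-training strategy, reusing `train_svdd` from the earlier sketch; the subset count and the support-vector threshold are illustrative:

```python
import numpy as np

def split_train_svdd(X, n_subsets=2, C=0.1, sigma2=0.8):
    """Rapid SVDD training via sample splitting: train on each subset,
    keep only its support vectors, then retrain on the pooled vectors."""
    sv_pool = []
    for S in np.array_split(X, n_subsets):
        alpha, _ = train_svdd(S, C, sigma2)
        sv_pool.append(S[alpha > 1e-6])       # subset support vectors
    X_sv = np.vstack(sv_pool)
    alpha, R2 = train_svdd(X_sv, C, sigma2)   # final pass on pooled SVs
    return X_sv, alpha, R2
```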
Fig. 6. Empirical data of split sample training.

Table 2. SVDD detection algorithm test results on steady-state fault data.

Test data number   Red line system shutdown time (s)   BP (Back Propagation) detection time (s)   SVDD detection time (s)
TEST10-1           43.2                                42.8                                       43.2
TEST15-5           275.8                               275.8                                      275.6
TEST20-0           30.2                                30.2                                       30.2
TEST61-9           402.7                               402.8                                      401.4
TEST62-7           13.0                                12.3                                       12.5
In Figs. 5 and 6, panels (a) and (b) show the results of training on one group of data each. Panel (c) shows the decision boundary from standard SVDD training on the complete sample; the boundary circumscribes all sample points. Fig. 6(d) shows the separate training performed on the two groups of samples: support vectors were found for each group and combined into a new training set, and training on this new set produced the decision boundary. The figures show that the decision boundaries of (d) and (c) are similar, so the improved training method yields results similar to the original one. However, the decision result of (d) is far cheaper to compute than that of (c), especially when the training sample is large. The sample-splitting training strategy can thus greatly increase training efficiency.

2.9. Empirical analysis

Using the SVDD model and the support vectors derived from training, combined with the fault detection logic of Eq. (12), the system detection threshold was found and the detection process for an aircraft propulsion system was constructed. There is no fixed rule for setting the smoothing parameter v, the parameter σ, and the threshold scaling factor ε; these values must be defined according to the objects. Based on the above analyses, the model parameters pre-defined in the detection system in this article are as follows: 1) a small v (0.01) is set to effectively control the false alarm rate during testing; 2) the chosen Gaussian parameter σ² is 0.8; 3) after choosing v and σ, ε is set at 1.05, moderately scaling the hypersphere model (scaling the threshold), because when the sample is incomplete, the range of normal data should be wider.

To reduce the false alarm rate and increase the robustness of the detection algorithm, a persistence criterion is added: the existence of a fault is confirmed only when multiple data abnormalities are detected consecutively. This detection logic reduces the influence of random interference and measurement noise. Only when the integrated detection indicator exceeds the threshold five consecutive times does the system identify a fault in the generator and release an alarm; a sketch of this logic follows below.

Empirical testing was performed on historical propulsion system data with the SVDD algorithm, using five groups of fault data and six groups of normal test data. No false alarm emerged from the algorithm. For the five groups of steady-state fault test data, SVDD likewise showed no false alarms; its fault detection results are presented in Table 2. Fig. 7 shows the detection result for the TEST62-7 test data (detection time 12.5 s), Fig. 8 for TEST20-0 (30.2 s), Fig. 9 for TEST15-5 (275.6 s), and Fig. 10 for TEST10-1 (43.2 s). The fluctuations in these test data were large, but under the continuity criterion the threshold was never exceeded five consecutive times prior to 43.2 s, so no false alarms occurred.
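A sketch of this persistence criterion; `exceed_flags` would be the per-sample results of the Eq. (12) test, and k = 5 matches the rule above:

```python
def first_alarm_time(exceed_flags, times, k=5):
    """Release an alarm only after k consecutive threshold exceedances,
    suppressing isolated spikes caused by noise or random interference."""
    run = 0
    for flag, t in zip(exceed_flags, times):
        run = run + 1 if flag else 0
        if run >= k:
            return t        # moment the fault is declared
    return None             # no alarm released
```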
According to Table 2, for the five failure tests, the SVDD algorithm always detected the fault earlier than the red line shutdown system used by this model of propulsion system.
Fig. 7. SVDD algorithm detection result for TEST62-7.
Fig. 8. SVDD algorithm detection result for TEST20-0.
Fig. 9. SVDD algorithm detection result for TEST 15–5.
Fig. 10. SVDD algorithm detection result for TEST 10–1.
On average, the fault detection time improved by 0.4 s. Table 2 also compares the detection times of the SVDD and BP algorithms. SVDD detected earlier in two tests, TEST15-5 and TEST61-9, while it lagged the BP algorithm in TEST62-7 and TEST10-1. The BP algorithm was on average 0.2 s earlier than the red line shutdown system, but it was 0.1 s slower than the red line system in TEST61-9, which is not permissible. Overall, the SVDD algorithm is superior to the BP algorithm.
2.10. Increasing SVDD identification efficiency with a modified Gaussian kernel function

2.10.1. Modified kernel function [14]

Analyzing the kernel function with Riemannian geometry, Amari and Wu showed that the original kernel function can be gradually modified, on the basis of empirical data, to better suit the application. Given the eigenvector mapping U = ϕ(x):
$$dU = \sum_i \frac{\partial \phi}{\partial x_i}(x)\, dx_i, \qquad \|dU\|^2 = \sum_{i,j} g_{ij}(x)\, dx_i\, dx_j \qquad (13)$$
where $g_{ij}(x) = \left(\frac{\partial}{\partial x_i}\phi(x)\right) \cdot \left(\frac{\partial}{\partial x_j}\phi(x)\right)$; the non-negative definite matrix gᵢⱼ(x) is the Riemannian metric tensor on Rⁿ, and $\|dU\|^2 = \sum_{i,j} g_{ij}(x)\, dx_i\, dx_j$ is the Riemann distance on Rⁿ. Regarding Rⁿ with this Riemann distance as a Riemannian manifold, the local volume is expressed as:

$$dv = \sqrt{g(x)}\, dx_1 \cdots dx_n \qquad (14)$$
where g(x) = det(gᵢⱼ(x)). In other words, g(x) reflects the degree of scaling of the area around the point ϕ(x) in the eigenspace. Since k(x, z) = (ϕ(x) · ϕ(z)), it can be verified that:

$$g_{ij}(x) = \frac{\partial^2}{\partial x_i\, \partial z_j} k(x, z)\Big|_{z=x} \qquad (15)$$
In particular, for the Gaussian kernel k(x, z) = exp(−‖x − z‖²/2σ²), gᵢⱼ(x) = (1/σ²)·δᵢⱼ. The aim is to separate the two modal areas effectively by enlarging the distance between them, that is, by magnifying the area around the separating surface as much as possible. Kernel function adjustment can achieve this aim. Let c(x) be a positive differentiable real function and k(x, z) a Gaussian kernel function; then:
$$\tilde{k}(x, z) = c(x)\, k(x, z)\, c(z) \qquad (16)$$
This is also a kernel function, and:

$$\tilde{g}_{ij}(x) = c_i(x)\, c_j(x) + c^2(x)\, g_{ij}(x) \qquad (17)$$
where $c_i(x) = \frac{\partial}{\partial x_i} c(x)$. Amari and Wu set c(x) as the following function:

$$c(x) = \sum_{x_i \in SV} h_i\, e^{-\|x - x_i\|^2 / 2\tau^2} \qquad (18)$$
where τ > 0 and hᵢ is a weighting coefficient. Around a support vector xᵢ:

$$\tilde{g}(x) = \frac{h_i^n}{\sigma^n}\, e^{-n r^2 / 2\tau^2} \left(1 + \frac{\sigma^2}{\tau^4} r^2\right) \qquad (19)$$
where r = ‖x − xᵢ‖ is the Euclidean distance. From Eq. (19) we can derive:

$$\frac{d\tilde{g}(x)}{dr} = \frac{h_i^n}{\sigma^n}\, e^{-n r^2 / 2\tau^2} \left[ \frac{2\sigma^2 r}{\tau^4} - \frac{n r}{\tau^2}\left(1 + \frac{\sigma^2}{\tau^4} r^2\right) \right] \qquad (20)$$
Analysis of Eq. (20) yields the following conclusions:

(1) When τ < σ/√n, d g̃(x)/dr = 0 at r = √(1/n − τ²/σ²) > 0; the meaning of r shows that the magnifying factor is largest at distance r from xᵢ.

(2) When τ ≥ σ/√n, d g̃(x)/dr ≤ 0 for any r, implying that the magnifying factor is largest at r = 0.

To ensure that g̃(x) is magnified in the area surrounding the support vector xᵢ, the chosen value of τ should not be much smaller than σ/√n (otherwise r becomes large and is no longer located around xᵢ), so the chosen value should be approximately:

$$\tau \approx \sigma / \sqrt{n} \qquad (21)$$
Thus, the new training process comprises two steps. Step 1: a kernel k (the Gaussian kernel) is used to perform training, and Eq. (16) is then applied to obtain the adjusted kernel k̃. Step 2: k̃ is used to perform training.
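A sketch of this two-step procedure under the stated assumptions: c(x) is built from the first-pass support vectors per Eq. (18) with τ² chosen near σ²/n following Eq. (21), and the modified kernel matrix of Eq. (16) is then used for retraining (the retraining step would need a variant of the earlier `train_svdd` that accepts a precomputed kernel matrix):

```python
import numpy as np

def conformal_factor(X_sv, tau2, h=None):
    """c(x) of Eq. (18): Gaussian bumps of width tau centred on the
    support vectors X_sv found in the first training pass."""
    h = np.ones(len(X_sv)) if h is None else np.asarray(h, dtype=float)
    def c(x):
        d2 = ((X_sv - x) ** 2).sum(axis=1)
        return float(h @ np.exp(-d2 / (2.0 * tau2)))
    return c

def modified_kernel_matrix(K, X, c):
    """k~(x_i, x_j) = c(x_i) K_ij c(x_j), Eq. (16), over a kernel matrix."""
    cx = np.array([c(x) for x in X])
    return cx[:, None] * K * cx[None, :]
```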
Fig. 11. Identification results from the standard Gaussian kernel function.
Fig. 12. Identification results from the modified kernel function.
Table 3. Comparison of kernel function (before and after modification).

Algorithm                           Number of support vectors   Accurate classification rate
Standard Gaussian kernel function   11                          0.9533
Modified Gaussian kernel function   10                          0.9758
2.10.2. Empirical analyses

A total of 664 groups of data samples were obtained from a simulation experiment: 230 groups were training samples and 434 groups were testing samples. Fig. 11 shows the training results using the standard Gaussian kernel function, and Fig. 12 shows the training results using the modified Gaussian kernel function; both use the relative distance f(z)/R² as the vertical coordinate. Comparing the two figures shows that fewer points exceed the threshold line in Fig. 12. To better reflect the magnification of the data distribution introduced by the modified kernel method, the vertical coordinate of Fig. 11 was subtracted from that of Fig. 12, as shown in Fig. 13, with time t as the horizontal coordinate. Most of the points in Fig. 13 lie above zero, meaning that the modified kernel method does indeed magnify the distribution of the data. The small number of points below zero arises because the algorithm magnifies only the area surrounding the support vectors xᵢ; the relative distances of data in other areas therefore shrink.
Fig. 13. Magnified results of the kernel function (before and after modification).
Fig. 14. Detection results of the TEST 10–1 test with the standard Gaussian kernel function and SVDD algorithm.
Table 3 compares the number of support vectors and the accurate identification rate before and after modifying the kernel function. As shown in Table 3, the accurate classification rate is enhanced by 0.0225 after adopting the modified kernel function; the change is small because the distribution of the empirical data was already relatively centralized (see Fig. 11) and the original training was already fairly precise. The number of support vectors is reduced by only 1. From the analysis above, v is the lower limit of the fraction of training samples that are support vectors; thus vl = 1/C (where l is the number of samples) is the lower limit of the number of support vectors. The value of C is therefore an important factor determining the number of support vectors, and since the same value of C was kept in both trainings, the minimum number of support vectors for the modified kernel function is confirmed.

Applying the standard Gaussian kernel function, accurate identification was performed on five fault data tests and six normal data tests, and no false alarms emerged. Nevertheless, modifying the Gaussian kernel function can effectively increase the accurate classification rate, and it can effectively prevent a false alarm, that is, the situation where the threshold is exceeded five consecutive times before a fault occurs. An empirical example follows, using data from the TEST 10–1 test between 38 and 48 s, where the fluctuation was greater. As shown in Fig. 14, when the standard Gaussian kernel function was used for training, the threshold was never exceeded five consecutive times between 38 and 42.6 s, but 30 data points exceeded it. Fig. 15 presents the training results using the modified Gaussian kernel function: similarly, the threshold was never exceeded five consecutive times between 38 and 42.6 s, and only 27 data points exceeded it. Although the difference is a mere 3 data points, the modified Gaussian kernel function further reduces the chance of five consecutive exceedances and thus of a false alarm. The figures show that the SVDD algorithm based on the modified Gaussian kernel function can increase the separability between categories.
Fig. 15. Detection results of the TEST 10–1 test with the modified Gaussian kernel function and SVDD algorithm.

Table 4. Risk decision diagnostic table of two fault types.

Diagnosis\Status   N    F
N                  0    L1
F                  L2   0
2.11. Discussion of the SVDD model boundary based on equal losses

There are two types of samples, the target sample set N and the non-target sample set F, corresponding to the classification {1, −1}. The decisions are presented in Table 4. Losses L2 and L1 in the table are the losses incurred when the system is operating in the target (respectively non-target) state while being diagnosed as non-target (respectively target). For standard fault diagnostic problems, normally L1 > L2: the losses caused by the two types of erroneous decisions differ, and losses caused by misjudging non-target status as target status are larger than losses caused by misjudging target status as non-target status. This section combines the different losses of the two types of misclassification and re-optimizes the design of the hypersphere boundary, with the aim of minimizing the losses caused by misclassification.

The section above artificially introduced the threshold scaling factor ε to control the SVDD detection model boundary and improve the identification performance of the classification model. However, there is no conventional rule for choosing ε, so uncertainty remains. In real identification problems the target class and the non-target class are not completely separable; the two classes overlap in the eigenspace. Within a certain region the diagnostic results are therefore subject to uncertainty, and this region can be called the questionable diagnostic area. If two scaling factors ε₁ and ε₂ (ε₂ > ε₁) can be found such that, for a test sample z, f(z) ≥ ε₂·R² or f(z) ≤ ε₁·R², then a clear diagnostic result can be generated:
$$y = \begin{cases} 1, & f(z) \le \varepsilon_1 \cdot R^2 \\ -1, & f(z) \ge \varepsilon_2 \cdot R^2 \end{cases} \qquad (22)$$
where "1" denotes the target category and "−1" the non-target category. Uncertainty exists in the area ε₁·R² < f(z) < ε₂·R², which is the questionable diagnostic area. To quantify the uncertainty of the diagnostic results in this area, a diagnostic reliability function can be used: T(z, y) assesses the reliability with which the current operating status z is diagnosed as y. By this definition, for any detection problem and any sample z:
$$T(z, 1) + T(z, -1) = 1 \qquad (23)$$
This article adopts the linear uncertainty function shown in Fig. 16, expressed as:
$$T(z, 1) = \frac{\varepsilon_2 \cdot R^2 - f(z)}{\varepsilon_2 \cdot R^2 - \varepsilon_1 \cdot R^2} \qquad (24)$$
Fig. 16. Diagnostic reliability function.
Fig. 17. Distribution of losses caused by two types of misclassification.
Considering that the losses triggered by the two types of misclassification should be equal at the cutting plane, the cutting plane should lie in the questionable diagnostic area and satisfy:
$$T(z, 1) \cdot L2 = T(z, -1) \cdot L1 \qquad (25)$$
Denote the cutting plane satisfying Eq. (25) by:

$$f(z) = d \qquad (26)$$

When f(z) = R²:
$$T(z, 1)\big|_{f(z)=R^2} = \frac{\varepsilon_2 - 1}{\varepsilon_2 - \varepsilon_1} \qquad (27)$$
Fig. 16 shows that a proportional relationship exists at f(z) = d:

$$\frac{(\varepsilon_2 - 1)/(\varepsilon_2 - \varepsilon_1)}{T(z, 1)\big|_{f(z)=d}} = \frac{\varepsilon_2 \cdot R^2 - R^2}{\varepsilon_2 \cdot R^2 - d} \qquad (28)$$
From the four simultaneous Eqs. (23), (25), (26), and (28), the following is obtained:

$$d = \frac{L1 \cdot \varepsilon_1 + L2 \cdot \varepsilon_2}{L1 + L2} \cdot R^2 \qquad (29)$$
Inserting Eq. (29) into Eq. (26) gives the cutting plane equation:

$$f(z) = \frac{L1 \cdot \varepsilon_1 + L2 \cdot \varepsilon_2}{L1 + L2} \cdot R^2 \qquad (30)$$
To sum up, the process of applying the equal-loss SVDD algorithm for fault detection is as follows (a sketch follows the list below):

(1) Choose an appropriate kernel function to build the optimization formula (7);
(2) Solve the optimization equation and obtain the support vectors and their corresponding αᵢ;
(3) Obtain the optimal cutting plane f(z) − R² = 0 from Eq. (9) under Vapnik's theory;
(4) Choose appropriate ε₁ and ε₂, and construct the cutting plane needed for fault detection from Eq. (30).
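A minimal sketch of the resulting decision rule; the ε₁, ε₂, L1, and L2 values are placeholders to be set per application:

```python
def equal_loss_cutting_plane(R2, eps1, eps2, L1, L2):
    """d of Eq. (29): the threshold inside the questionable diagnostic
    area at which the two misclassification losses are equal."""
    return (L1 * eps1 + L2 * eps2) / (L1 + L2) * R2

def diagnose(fz, R2, eps1=0.95, eps2=1.10, L1=10.0, L2=1.0):
    """Equal-loss decision: 1 = target (normal), -1 = non-target (fault)."""
    d = equal_loss_cutting_plane(R2, eps1, eps2, L1, L2)
    return 1 if fz <= d else -1
```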
Consider two special cases. (1) L1 = L2: the cutting plane is f(z) = R², identical to the optimal cutting plane under Vapnik's theory. (2) L1 >> L2: the cutting plane is f(z) = ε₁·R². The direct explanation is that the losses caused by misjudging non-target status as target status far exceed those caused by misjudging target status as non-target status, so all samples in the questionable diagnostic area are treated as non-target status.

The following discusses the differences between confirming the cutting plane based on equal losses (at the cutting plane) and the standard method for confirming boundaries. As presented in Fig. 17, within the questionable diagnostic area, the losses caused by the two types of misclassification change with different trends. At "1", the losses caused by the two types of misjudgment are identical; the corresponding value of f(z) is the cutting plane based on equal loss. Compared with other positions in the questionable diagnostic area, the sum of the losses caused by the two types of misjudgment is smallest at "1"; "2" and "3" are other cutting planes.
For the two cutting planes "1" and "2", the risks are identical in the part of the questionable diagnostic area before "2". For the area between "1" and "2", the risks of the two classifications differ. For example, if a sample z lies between the cutting planes "1" and "2": with cutting plane "1", z is identified as the target class, and the risk value is T(z, −1)·L1 (the probability that z belongs to the non-target class is T(z, −1), and the loss when a non-target class is diagnosed as a target class is L1); with cutting plane "2", z is identified as the non-target class, and the risk value is T(z, 1)·L2 (the probability that z is a target class is T(z, 1), and L2 is the loss of misdiagnosing a target class as a non-target class). The figure shows that T(z, 1)·L2 > T(z, −1)·L1.

Similarly, for the area between "1" and "3", the risks of the two classifications differ. For a sample z located between the cutting planes "1" and "3": with cutting plane "1", z is identified as the non-target class, and the risk value is T(z, 1)·L2; with cutting plane "3", z is identified as the target class, and the risk value is T(z, −1)·L1 (the probability that z is a non-target class is T(z, −1), and the loss of misdiagnosing a non-target class as a target class is L1). Additionally:
T(z, −1)·L1 > T(z, 1)·L2.

The above analysis shows that the cutting plane determined on the basis of equal losses involves smaller risks than other cutting planes. The method of drawing the SVDD model boundary based on equal loss is therefore logical and effective.

3. Conclusions

A novel fault detection method for aircraft based on SVDD is proposed in this paper. The concluding remarks are summarized as follows.

(1) In view of the absence of prerequisite knowledge for detector parameter selection, the GA is used to optimize the SVDD kernel parameters based on the estimation of the data sample distribution. The test results show that the classification accuracy is improved by 5.52% after using the GA.
(2) The threshold scaling factor ε introduced into the model can adjust model flexibility and compactness, overcoming the problem of incomplete samples.
(3) The rapid anomaly detection algorithm lowers the amount of calculation needed to obtain support vectors through sample splitting, which improves training efficiency.
(4) The fault detection time of the SVDD algorithm is always earlier than that of the red line shutdown system, improved by 0.4 s on average, and the algorithm is superior to the BP algorithm.
(5) The accurate classification rate is enhanced by 0.0225 after adopting the modified kernel function compared with the standard kernel function, and the number of support vectors is reduced by 1. The change is small because the distribution of the empirical data was relatively centralized and the original training with the standard kernel was already fairly precise.
(6) The SVDD algorithm based on the modified kernel function reduces the chance of exceeding the threshold five consecutive times, and thus of a false alarm, and increases the separability between categories.
(7) The method of drawing the SVDD model boundary based on equal loss is logical and effective; it involves smaller risks than other cutting planes and minimizes the losses caused by misclassification.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China (No. 91538204) and the Aerospace Science and Industry Foundation.

References
[1] Tax DMJ, Duin RPW. Support vector domain description. Pattern Recognit Lett 1999;20:1191–9.
[2] Tax DMJ. One-class classification. Delft University of Technology; 2001.
[3] Tax DMJ, Duin RPW. Support vector data description. Mach Learn 2004;54:45–66.
[4] Vapnik VN. The nature of statistical learning theory. New York: Springer-Verlag; 1999.
[5] Campbell C, Bennett KP. A linear programming approach to novelty detection. Adv Neural Inf Process Syst 2001;13:395–401.
[6] Yaoming Z, Yongchao W, Shunan D, Zhijun M. Innovative soft fault diagnosis method for dual-redundancy sensors. Sensor Rev 2016;36(1):14–22.
[7] Huina L, Yuan P. Recent advances in support vector clustering: theory and application. Int J Pattern Recognit Artif Intell 2015;29. doi:10.1142/S0218001415500020.
[8] Na SG, Jeon JH, Han IJ, Heo H. Fault detection algorithm of hybrid electric vehicle using SVDD. In: Korean Society for Noise and Vibration Engineering fall conf.; 2011. p. 224–9.
[9] Li Y, Tax DMJ, Duin RPW, Loog M. Multiple-instance learning as a classifier combining problem. Pattern Recognit 2013;46(3):865–74.
[10] Tax DMJ, Duin RPW. Multiple-instance learning as a classifier combining problem. Pattern Recognit 2013;46(3).
[11] Xiaoming W, Shitong W. Theoretical analysis for the optimization problem of support vector data description. J Softw 2011;22(7):1551–60.
[12] Phaladiganon P, Kim SB, Chen VCP. A density-focused support vector data description method. Qual Reliab Eng Int 2014;30(6):879–90.
[13] Na SG, Yang IB, Heo H. Abnormality detection via SVDD technique of motor-generator system in HEV. Int J Automot Technol 2014;15:637–43.
[14] Amari S, Wu S. Improving support vector machine classifiers by modifying kernel functions. Neural Netw 1999;12(6):783–9.
Yaoming Zhou received the Ph.D. degree in Aircraft Design in 2013 from the School of Aeronautic Science and Engineering, Beihang University. He now serves as a Lecturer and Master Tutor at the School of Aeronautic Science and Engineering, Beihang University. His research interests include aircraft design, machine learning, and intelligent control of unmanned aerial vehicles. His email is [email protected].

Kan Wu received the Ph.D. degree in Industrial and Systems Engineering from the Georgia Institute of Technology. He now serves as an Assistant Professor at the School of Mechanical and Aerospace Engineering, Nanyang Technological University. His current research interests are primarily in queueing theory, with applications in the performance evaluation of supply chains.

Zhijun Meng is an Associate Professor in the Department of Aircraft Design of Beihang University. He received the Ph.D. degree in 2009 from the School of Aeronautic Science and Engineering. His research interests include aircraft design and the modeling, control, and simulation of unmanned aerial vehicles and rotorcraft.

Mingjun Tian received the Bachelor degree in Aircraft Design in 2008 from the School of Aeronautic Science and Engineering, Beihang University. He now serves as an Engineer at the AVIC Shenyang Aircraft Design and Research Institute. His research interests include health management and weapons systems.