Aerospace Science and Technology 91 (2019) 70–81
Soft extreme learning machine for fault detection of aircraft engine

Yong-Ping Zhao a,∗, Gong Huang a, Qian-Kun Hu a, Jian-Feng Tan a, Jian-Jun Wang b, Zhe Yang b

a College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
b Chinese Flight Test Establishment, Xi'an 710089, China
Article info

Article history: Received 17 July 2018; Received in revised form 8 March 2019; Accepted 7 May 2019; Available online 13 May 2019.

Keywords: Fault detection; Aircraft engine; Extreme learning machine; Imbalanced classification; Machine learning.
Abstract

When the extreme learning machine (ELM) is used to cope with classification problems, ±1 labels are commonly used to construct the target vector. Since ELM adopts the square loss function, this means that, from the perspective of margin learning theory, it tends to force the margins of all training samples to be exactly one, which is unreasonable to some extent. To overcome this hard-margin flaw, a soft extreme learning machine (SELM) is proposed in this paper, which flexibly sets a soft target margin for each training sample. Through solving a series of regularized ELMs (RELMs), SELM can be computed efficiently. Based on SELM, an improved SELM (ISELM) is proposed to deal with imbalanced classification problems; it keeps the same computational efficiency as SELM by solving a series of weighted RELMs. The experimental results on benchmark data sets confirm the effectiveness and feasibility of SELM and ISELM. More importantly, when they are applied to fault detection of aircraft engine, they prove to be promising candidate techniques for this task, with ISELM especially favored.

© 2019 Elsevier Masson SAS. All rights reserved.
1. Motivation

The aircraft engine is the power source of an airplane, so its safety and reliability are very important. Any malfunction occurring in an aircraft engine may be accompanied by a serious air disaster, leading to loss of life and economic losses. Hence, it is necessary and significant to develop fault diagnosis techniques for aircraft engines. Advanced fault diagnosis techniques not only help avoid such misfortunes but also support condition-based maintenance decisions and actions, which can reduce the costs caused by stopped or aborted flights [1–4].

Generally speaking, fault diagnosis techniques are mainly divided into two groups, i.e., model-based and data-driven techniques [5]. Both have been extensively researched so far. In the model-based group, Kalman filters are quite popular [6–9]. Such techniques need a detailed mathematical model of the aircraft engine. Their advantages lie in on-board and real-time implementation capabilities, but their reliability decreases as the system nonlinearities, complexity, and modeling uncertainties increase. Data-driven techniques, such as those based on machine learning algorithms [5,10–16], mostly depend on real-time or collected historical data
from the engine sensors and measurements, and do not require a detailed mathematical model of the aircraft engine. Belonging to this group, neural networks are a promising tool for fault diagnosis due to their great success in system identification and their excellent capability for nonlinear transformations. Hence, in this paper, the popular neural networks are investigated as potential fault diagnosis techniques for aircraft engine.

In the neural network field, the single hidden layer feedforward network (SLFN) has been extensively studied, and a large number of algorithms have been proposed for it. Roughly speaking, these algorithms can be divided into three groups. In the first group, gradient-based strategies are used to optimize the weights in the network, such as the error back-propagation algorithm [17] and the Levenberg-Marquardt algorithm [18]. However, these gradient-based methods usually suffer from slow convergence or local minima. In the second group, a standard optimization procedure is adopted, as in the support vector machine (SVM) [19,20]. SVM has a solid mathematical background; that is, it seeks the solution maximizing the width between two parallel support hyperplanes while minimizing the training errors by solving a quadratic programming problem. Generally, the computational complexity of this kind of algorithm is relatively high. The third group mainly consists of least-squares based algorithms, such as the random vector functional-link neural network (RVFLNN) [21–23] and the extreme learning machine (ELM) [24]. To be specific, ELM can be
regarded as a special case of RVFLNN. Compared with those in the former two groups, the algorithms belonging to the third group show a remarkable advantage in computational cost and ease of implementation. Hence, as a representative, ELM has attracted a great deal of attention from the machine learning community and has obtained a wide range of applications in classification and regression. In ELM, the input weights and hidden biases are randomly generated, while the output weights are obtained by solving a linear system in the least-squares sense. To achieve this, ELM forces its outputs to equal the targets. Specifically, taking a single-output network as an example, ELM lets
o_j = f(x_j) = Σ_{i=1}^{#HN} θ_i h(a_i, b_i; x_j) = t_j    (1)
where o_j represents the network output, h(·) is the activation function, #HN is the number of hidden nodes, θ_i is the weight connecting the ith hidden node to the output node, a_i and b_i are the randomly generated parameters of the ith hidden node, and t_j is the target of x_j. Given a training set {(x_j, t_j)}_{j=1}^N with x_j ∈ ℝ^n and t_j ∈ ℝ, (1) is expressed compactly as
Hθ = t    (2)
where θ = [θ_1, · · · , θ_#HN]^T, t = [t_1, · · · , t_N]^T, and

H = [ h(a_1, b_1; x_1)   · · ·   h(a_#HN, b_#HN; x_1)
        ⋮                 ⋱              ⋮
      h(a_1, b_1; x_N)   · · ·   h(a_#HN, b_#HN; x_N) ]    (3)
is the so-called hidden nodes output matrix. Finding the solution of (2) in the least-squares sense is equivalent to solving the following optimization problem
min_θ J_ELM(θ) = (1/2) ||e||_2^2
s.t. t = Hθ + e    (4)

where e is the error vector. From (4), notice that ELM adopts the square loss function. When it is employed to cope with regression problems, that is, when the targets are continuous, ELM in (4) is sound, and it is usually mathematically tractable and computationally efficient. However, when ELM is faced with classification problems, where the target t_j is encoded with −1 or 1, that is, t is a label vector, the final model becomes
f(x) = sign( Σ_{i=1}^{#HN} θ_i h(a_i, b_i; x) )    (5)
where sign(·) = 1 when its argument is not less than zero, and sign(·) = −1 otherwise. It is plausible to obtain (5) by solving (4), but there are some unreasonable aspects worth further discussion. For example, if an instance x_j is classified into the positive class, the judgement criterion is Σ_{i=1}^{#HN} θ_i h(a_i, b_i; x_j) ≥ 0. That is to say, it is not necessary that Σ_{i=1}^{#HN} θ_i h(a_i, b_i; x_j) exactly equals one. However, in (4) ELM forces Σ_{i=1}^{#HN} θ_i h(a_i, b_i; x_j) to approach one. This mandatory behavior seems too rigorous and unreasonable. According to margin learning theory, this behavior in ELM amounts to letting the margins of all the training samples equal one. In other words, in (4) a hard margin is adopted. Obviously, this violates the maximum margin principle followed by machine learning algorithms such as SVM, which will impair the generalization performance. That is, (4) is not quite suitable
for the discrete-label classification task from the perspective of statistical learning theory. Hence, it is expected that some improvements can be made to relax the strict requirement of (4) on classification problems. Motivated by this, this paper makes the following contributions:

(1) A soft extreme learning machine (SELM) is proposed, which overcomes the shortcoming of the conventional ELM in (4) when dealing with classification problems. That is to say, the strict constraint on the label vector is cancelled, which means that a soft margin is adopted for the training samples. As a result, SELM not only improves the generalization performance but also keeps high computational efficiency by solving a series of linear systems.

(2) To cope with imbalanced classification problems [25], an improved SELM (ISELM) is proposed based on SELM. In comparison with SELM, ISELM further enhances the generalization performance on imbalanced classification problems. Actually, SELM is a special case of ISELM.

(3) To verify the effectiveness and feasibility of the proposed SELM and ISELM, several benchmark data sets are utilized in experiments. The experimental investigations support our viewpoints.

(4) Fault detection is very important for aircraft engine, since it can greatly boost safety and reliability. From the perspective of machine learning, fault detection of aircraft engine is an imbalanced classification problem. Hence, in this paper both SELM and ISELM are applied to fault detection of aircraft engine. The experimental results show that they can be developed as candidate techniques for this task. ISELM, in particular, is devised for imbalanced classification problems such as fault detection.

This paper is structured as follows. In section 2, ELM and its regularized version are introduced in brief. In the following section, the mathematical model of SELM and its solving method are elaborated, and its flowchart and computational complexity are given. To cope with imbalanced classification problems, ISELM is presented in section 4 based on SELM. In section 5, ELM, RELM, SELM, ISELM, and SVM are tested on benchmark data sets; the experimental results confirm the effectiveness of the proposed algorithms. In section 6, ELM, RELM, SELM, ISELM, and SVM are applied to fault detection of aircraft engine. According to the application results, SELM and ISELM are preferred and can be developed as candidate techniques for fault detection of aircraft engine. Finally, conclusions follow.

2. ELM

For a binary classification problem with a training set
{(x_j, t_j)}_{j=1}^N, where x_j ∈ ℝ^n and t_j ∈ {+1, −1}, when ELM in (4) is used to approximate it, the solution is

θ̂ = H^† t    (6)

where H^† = (H^T H)^{-1} H^T when H^T H is nonsingular, or H^† = H^T (H H^T)^{-1} when H H^T is nonsingular. The least-squares solution in (6) not only minimizes the training error but also minimizes the norm of the output weights, i.e., ||θ||_2^2 [26]. Then the decision function of ELM can be calculated as
f_ELM(x) = sign( Σ_{i=1}^{#HN} θ̂_i h(a_i, b_i; x) )    (7)
According to Bartlett’s conclusion [27], the neural networks with the smaller norm of weights tend to suggest better generalization performance. To realize the weight decay, based on
Tikhonov regularization theory [28], the regularized ELM (RELM) [29,30] is obtained as
min_θ J_RELM(θ) = (1/2) ||e||_2^2 + (λ/2) ||θ||_2^2
s.t. t = Hθ + e    (8)
where λ ∈ ℝ^+ is the regularization parameter controlling the tradeoff between the training errors and the model complexity. By solving (8), the RELM classifier is obtained as
f_RELM(x) = sign( Σ_{i=1}^{#HN} θ̂_i h(a_i, b_i; x) )    (9)
where

θ̂ = (H^T H + λI)^{-1} H^T t    if N ≥ #HN
θ̂ = H^T (H H^T + λI)^{-1} t    if N < #HN    (10)
where I is an identity matrix of appropriate dimension.

Remark 1. Comparing (10) with (6), it is found that ELM is a special case of RELM when λ = 0.
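To make the closed-form solutions above concrete, the following minimal NumPy sketch (an illustration written for this discussion, not the authors' original MATLAB code) builds the hidden-nodes output matrix H of (3) with random sigmoid nodes and computes the RELM weights of (10); with lam = 0 it reduces to the ELM solution (6) whenever the corresponding Gram matrix is nonsingular (Remark 1). The additive bias inside the sigmoid is one common choice and is assumed here.

```python
import numpy as np

def hidden_output(X, A, b):
    """Hidden-nodes output matrix H of (3), using sigmoid nodes h(a, b; x) = 1/(1 + exp(-(a^T x + b)))."""
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))               # shape (N, #HN)

def relm_fit(X, t, n_hidden=100, lam=1e-3, seed=0):
    """Closed-form RELM output weights, eq. (10); lam = 0 gives plain ELM, eq. (6)."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))   # random input weights a_i
    b = rng.uniform(0.0, 0.5, size=n_hidden)                  # random hidden parameters b_i
    H = hidden_output(X, A, b)
    N, HN = H.shape
    if N >= HN:
        theta = np.linalg.solve(H.T @ H + lam * np.eye(HN), H.T @ t)   # (H^T H + lam*I)^-1 H^T t
    else:
        theta = H.T @ np.linalg.solve(H @ H.T + lam * np.eye(N), t)    # H^T (H H^T + lam*I)^-1 t
    return A, b, theta

def relm_predict(X, A, b, theta):
    """Decision function (9), with sign(0) taken as +1 as in the text."""
    return np.where(hidden_output(X, A, b) @ theta >= 0.0, 1.0, -1.0)
```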
3. SELM

3.1. Mathematical model

The mathematical model of SELM is

min_{θ,Δ} J_SELM(θ, Δ) = (1/2) ||e||_2^2 + (λ/2) ||θ||_2^2
s.t. t ⊙ Δ = Hθ + e    (11)

where ⊙ is the Hadamard product operator and Δ = [Δ_1, · · · , Δ_N]^T. According to the definition of margin, Δ in (11) is the target margin vector. Instead of compulsorily adopting the hard margin in (8), here Δ will be optimized, which means that a soft margin is set for every training sample. This is more reasonable to some extent.

Remark 2. If Δ_i = 1 (i = 1, · · · , N), (11) is equivalent to (8). That is to say, RELM is a special case of SELM when Δ = 1, where 1 is a vector of all ones. According to Remark 1, ELM is a special case of RELM. Naturally, ELM is also a special case of SELM. This signifies that, in theory, SELM can perform better than ELM and RELM in terms of generalization performance.

3.2. Solving method

Although SELM has this theoretical advantage, it loses ground in computational complexity; that is, (11) is harder to optimize than (8). In this paper, a two-step method is adopted to solve it. In the first step, Δ is fixed and θ is optimized. In the second step, according to the current margins, an appropriate target margin is set for the next iteration. The two steps are elaborated in the following.

• The first step: fixing Δ^n, optimizing θ^{n+1}

Suppose that Δ^n is given and fixed (the superscript n denotes the nth iteration). Then (11) degenerates into (8), so

θ̂^{n+1} = (H^T H + λI)^{-1} H^T (t ⊙ Δ^n)    if N ≥ #HN
θ̂^{n+1} = H^T (H H^T + λI)^{-1} (t ⊙ Δ^n)    if N < #HN    (12)

• The second step: given θ̂^{n+1}, setting Δ^{n+1}

After θ̂^{n+1} is obtained, the current margin of x_j at the (n+1)th iteration is

s_j^{n+1} = t_j f_SELM^{n+1}(x_j) = t_j Σ_{i=1}^{#HN} θ̂_i^{n+1} h(a_i, b_i; x_j)    (13)

Next, we update the target margin of x_j obeying the following principles [31]:

(1) If s_j^{n+1} > 1, which indicates that x_j is correctly classified with a margin of at least 1, then its discriminant result is satisfactory, so the target margin of x_j at the (n+1)th iteration is set as

Δ_j^{n+1} = s_j^{n+1}    (14)

This places less emphasis on samples with sufficient margins and concentrates on the other samples at the next iteration, because the square loss is adopted in SELM.

(2) If 0 ≤ s_j^{n+1} ≤ 1, which means that x_j is correctly classified but with an insufficient margin, then we set

Δ_j^{n+1} = 1    (15)

so as to encourage an increase of its margin.

(3) If s_j^{n+1} < 0, i.e., x_j is misclassified, then we set

Δ_j^{n+1} = s_j^{n+1} + 1    (16)

so that at the next iteration the objective penalizes misclassified samples with the largest penalties.

Summarizing (14) to (16), we get

Δ_j^{n+1} = s_j^{n+1} + max{0, min{1, 1 − s_j^{n+1}}}    (17)

Once Δ^{n+1} has been obtained, the next iteration can proceed until the stopping criterion is satisfied.

Remark 3. For n = 0, let Δ^0 = 1. That is to say, RELM is used as the starter for SELM.

Remark 4. In both ELM and RELM, the target margins for all the training samples are fixed at 1. On the contrary, in SELM a soft margin is adopted for different training samples. This flexible strategy is expected to give SELM better generalization performance.

3.3. The flowchart of SELM

The flowchart of SELM is organized in Algorithm 1.

Remark 5. In Algorithm 1, the positive integer #itMax is utilized to terminate the algorithm. It is an empirical value. In this paper, #itMax ≤ 200 is set.

Remark 6. From Algorithm 1, finding a solution to SELM is equivalent to solving a series of RELMs.

Remark 7. In RELM, the computational complexity is usually O(#HN^2 · N) since #HN ≤ N always occurs. In addition, there is an extra loop in SELM, whose extra computational complexity is O(#HN · N) per iteration (the per-step costs are listed behind the % symbol in each row of Algorithm 1). When SELM halts, the total computational cost is max{O(#HN^2 · N), O(#HN · N · #itMax)}. When #HN is comparable to #itMax, SELM does not lose its competitive edge over RELM with respect to computational complexity. For example, in the experiments in section 5, setting #HN = 100 and #itMax ≤ 200 achieves satisfactory results.
Algorithm 1 SELM.
1: Input: Training samples set {(x_j, t_j)}_{j=1}^N and activation function h(a_i, b_i; x); parameters #HN, #itMax, and λ.
2: Output: f_SELM(x) = Σ_{i=1}^{#HN} θ̂_i^{#itMax} h(a_i, b_i; x)
3: Initialize:
   • Calculate the hidden nodes output matrix H according to (3);
   • Let n = 0 and Δ^0 = 1;
   • Denote U = (H^T H + λI)^{-1} H^T if N ≥ #HN, otherwise U = H^T (H H^T + λI)^{-1};
4: While n < #itMax do
5:   Calculate θ̂^{n+1} according to (12);    % O(#HN · N)
6:   Calculate s_j^{n+1} (j = 1, · · · , N) according to (13);    % O(#HN · N)
7:   Compute Δ_j^{n+1} according to (17);    % O(N)
8:   n ← n + 1;
9: End while
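As a complement to Algorithm 1, the short loop below sketches one way to realize SELM on top of the RELM pieces shown earlier (again an illustrative NumPy sketch rather than the authors' implementation); the factor U is precomputed as in step 3, and the target-margin update uses the summary formula (17).

```python
import numpy as np

def selm_fit(H, t, lam=1e-3, it_max=200):
    """SELM (Algorithm 1 sketch): repeatedly solve RELM with soft target margins."""
    N, HN = H.shape
    if N >= HN:
        U = np.linalg.solve(H.T @ H + lam * np.eye(HN), H.T)   # (H^T H + lam*I)^-1 H^T
    else:
        U = H.T @ np.linalg.inv(H @ H.T + lam * np.eye(N))     # H^T (H H^T + lam*I)^-1
    delta = np.ones(N)                                         # Delta^0 = 1: start from RELM (Remark 3)
    for _ in range(it_max):
        theta = U @ (t * delta)                                # eq. (12)
        s = t * (H @ theta)                                    # current margins, eq. (13)
        delta = s + np.maximum(0.0, np.minimum(1.0, 1.0 - s))  # soft target margins, eq. (17)
    return theta
```

Here H would be the matrix returned by a helper such as hidden_output above and t the ±1 label vector; each pass of the loop is a single matrix-vector product, which is why the per-iteration cost stays at O(#HN · N).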
4. ISELM

In theory, SELM works better than ELM and RELM; this conclusion is also supported by the following experimental results. However, when SELM is used to cope with imbalanced classification problems, its performance is compromised to some extent. For example, SVM is a popular machine learning algorithm that works effectively on balanced problems, but when it comes to imbalanced data sets it often produces suboptimal results, i.e., a classifier biased toward the majority class with low performance on the minority class [32]. In order to let SELM tackle imbalanced problems well, following the line of [33], an improved version of SELM is proposed in this paper. Its mathematical model is

min_{θ,Δ} J_ISELM(θ, Δ) = (1/2) e^T Λ e + (λ/2) θ^T θ
s.t. t ⊙ Δ = Hθ + e    (18)

where λ ∈ ℝ^+ is the regularization parameter and Λ is an N × N diagonal matrix with Λ_jj = 1 if x_j belongs to the majority class and Λ_jj = mr^2 otherwise, where r ∈ ℝ^+ is the ramp coefficient and m ∈ ℝ^+ is the margin coefficient. Meanwhile, (17) is modified correspondingly as

Δ_j^{n+1} = s_j^{n+1} + max{0, min{m, m − s_j^{n+1}}}    (19)

where

s_j^{n+1} = t_j f_ISELM^{n+1}(x_j) = t_j Σ_{i=1}^{#HN} θ̂_i^{n+1} h(a_i, b_i; x_j)    (20)

Likewise, ISELM can be solved with the two-step method.

• The first step: fixing Δ^n, optimizing θ^{n+1}

Similar to (12), θ̂^{n+1} is obtained as

θ̂^{n+1} = (H^T Λ H + λI)^{-1} H^T Λ (t ⊙ Δ^n)    if N ≥ #HN
θ̂^{n+1} = H^T (Λ H H^T + λI)^{-1} Λ (t ⊙ Δ^n)    if N < #HN    (21)

• The second step: given θ̂^{n+1}, setting Δ^{n+1}

Δ^{n+1} is set using (19).

Remark 8. If m = 1 and r = 1, (18) is equivalent to (11). That is, SELM is a special case of ISELM.

The flowchart of ISELM is shown in Algorithm 2.

Algorithm 2 ISELM.
1: Input: Training samples set {(x_j, t_j)}_{j=1}^N and activation function h(a_i, b_i; x); parameters #HN, #itMax, λ, r, and m.
2: Output: f_ISELM(x) = Σ_{i=1}^{#HN} θ̂_i^{#itMax} h(a_i, b_i; x)
3: Initialize:
   • Calculate the hidden nodes output matrix H according to (3);
   • Let n = 0 and Δ^0 = 1;
   • Denote U = (H^T Λ H + λI)^{-1} H^T Λ if N ≥ #HN, otherwise U = H^T (Λ H H^T + λI)^{-1} Λ;
4: While n < #itMax do
5:   Calculate θ̂^{n+1} according to (21);    % O(#HN · N)
6:   Calculate s_j^{n+1} (j = 1, · · · , N) according to (20);    % O(#HN · N)
7:   Compute Δ_j^{n+1} according to (19);    % O(N)
8:   n ← n + 1;
9: End while

Remark 9. Different from Algorithm 1, realizing Algorithm 2 amounts to solving a series of weighted RELMs. According to [25], this weighted strategy is equivalent to a cost-sensitive method, which is commonly used for imbalanced learning. This provides a solid theoretical foundation for ISELM.

Remark 10. Comparing Algorithm 2 with Algorithm 1, the computational complexity of ISELM is the same as that of SELM, i.e., max{O(#HN^2 · N), O(#HN · N · #itMax)}, so ISELM is also comparable to ELM and RELM with respect to computational cost.
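In the same spirit, the sketch below illustrates the ISELM iteration (illustrative only; the diagonal weighting Λ with minority weight m·r² and the placement of Λ in (21) follow the reconstruction given above, and the minority_mask argument is a hypothetical helper input marking minority-class samples).

```python
import numpy as np

def iselm_fit(H, t, minority_mask, lam=1e-3, it_max=200, m=1.0, r=1.0):
    """ISELM (Algorithm 2 sketch): weighted RELM solves with soft target margins."""
    N, HN = H.shape
    w = np.where(minority_mask, m * r ** 2, 1.0)               # diagonal of Lambda
    Hw = w[:, None] * H                                        # Lambda H
    if N >= HN:
        U = np.linalg.solve(H.T @ Hw + lam * np.eye(HN), Hw.T)             # (H^T Lam H + lam*I)^-1 H^T Lam
    else:
        U = H.T @ np.linalg.solve(Hw @ H.T + lam * np.eye(N), np.diag(w))  # H^T (Lam H H^T + lam*I)^-1 Lam
    delta = np.ones(N)
    for _ in range(it_max):
        theta = U @ (t * delta)                                # eq. (21)
        s = t * (H @ theta)                                    # eq. (20)
        delta = s + np.maximum(0.0, np.minimum(m, m - s))      # eq. (19)
    return theta
```

Setting m = 1 and r = 1 makes the weights identically one, and the routine collapses to the SELM sketch, mirroring Remark 8.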
5. Benchmark data sets

To demonstrate the effectiveness of the proposed SELM and ISELM, several benchmark data sets are utilized in experiments. Before testing, some preliminaries are introduced.

5.1. Preliminaries

• Testing environment: a personal desktop with an Intel Core i7-6600U CPU (2.60 GHz), 8.00 GB memory, the Windows 10 operating system, and MATLAB 2017b.

• Benchmark data sets: Balance, Statlog, Mushroom, Breast cancer, German.number, and Svmguide1. The former four data sets come from the UCI data repository (http://archive.ics.uci.edu/ml/), while the latter two are taken from the LIBSVM machine learning data collection (https://www.csie.ntu.edu.tw/%7Ecjlin/libsvmtools/datasets/). Their specifications are tabulated in Table 1. For each data set, we randomly divide it into two subsets, one for training and the other for testing. The training set consists of the majority-class samples (the positive samples, labeled with +1) and the minority-class samples (the negative samples, labeled with −1). In order to reflect the class imbalance, the imbalance ratio (IR) is defined as #PosTr/#NegTr, where #PosTr represents the number of positive training samples and #NegTr the number of negative training samples. From the IR values in Table 1, these benchmark data sets are imbalanced classification problems. In addition, due to the different dimensions, the input features are normalized into the closed interval [0, 1].
Table 1. Specifications of benchmark data sets.

Data sets | #Training (#PosTr+#NegTr) | #Testing | #Features | IR
Statlog | 117 (107+10) | 1993 | 36 | 107/10 = 10.70
Balance | 187 (173+14) | 389 | 4 | 173/14 = 12.36
Breast cancer | 234 (222+12) | 449 | 10 | 222/12 = 18.50
German.number | 365 (350+15) | 635 | 24 | 350/15 = 23.33
Mushroom | 881 (842+39) | 7243 | 20 | 842/39 = 21.59
Svmguide1 | 2093 (2000+93) | 4996 | 4 | 2000/93 = 21.51

Notes: #Training represents the number of training samples, #PosTr the number of positive training samples, #NegTr the number of negative training samples, #Testing the number of testing samples, #Features the number of input features; IR is the majority-to-minority imbalance ratio.
• Algorithms: ELM, RELM, SELM, ISELM, and SVM. For the former four algorithms, two types of hidden nodes, i.e., the sigmoid h(x) = 1/(1 + exp{−a_i^T x}) and the RBF h(x) = exp{−b_i ||x − a_i||_2^2}, are used in the experiments, where a_i is randomly chosen from the range [−1, 1] and b_i is chosen from the range (0, 0.5) [34]. #HN = 100 is set, which considers the tradeoff between the generalization performance and the testing time. As for the model selection problem, in RELM the regularization parameter λ is decided using the 10-fold cross validation technique [35] from the candidate set {2^-40, · · · , 2^0, · · · , 2^10}. Besides λ, there is another parameter, #itMax, to be determined in SELM. First, #itMax = 200 is set and λ is determined from {2^-40, · · · , 2^0, · · · , 2^10} with 10-fold cross validation; then, fixing the optimal λ, an appropriate #itMax is chosen from 1 to 200, also with 10-fold cross validation. ISELM adopts a similar strategy to SELM to determine λ, r, m, and #itMax; the difference is that the combination of λ, r, and m is chosen from {2^-40, · · · , 2^0, · · · , 2^10} × {1, · · · , 10} × {1, · · · , 10}. For SVM, the RBF k(x_i, x_j) = exp(−||x_i − x_j||_2^2 / (2γ^2)) is chosen as the kernel function, where γ is the kernel parameter; together with the regularization parameter C of SVM, they are decided from {2^-5, · · · , 2^5} × {2^0, · · · , 2^20} using 10-fold cross validation [36]. Since the sigmoid function does not satisfy the Mercer condition, it cannot be chosen as the kernel function for SVM [37,38]. Table 2 lists the model selection results, in which those for SVM are given in brackets. Therein, #SV represents the number of support vectors.

• Performance metric: the G-mean, defined by

G-mean = sqrt( TP/(TP + FN) × TN/(TN + FP) )    (22)

where TP is the number of true positives, FN is the number of false negatives, FP is the number of false positives, and TN is the number of true negatives. This metric is commonly used to evaluate algorithms dealing with imbalanced classification problems; a larger G-mean always means a better result. To report the average performance rather than the best one, thirty trials are conducted for each data set with every algorithm.
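For completeness, the G-mean of (22) can be computed from the four confusion-matrix counts as in the following small helper (an illustrative function, not part of the original experiments):

```python
import numpy as np

def g_mean(y_true, y_pred):
    """G-mean of eq. (22): geometric mean of the true-positive and true-negative rates."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    return np.sqrt(tp / (tp + fn) * tn / (tn + fp))
```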
5.2. Experimental analyses

Fig. 1 shows the experimental results on the benchmark data sets. From this figure, ISELM works best with respect to the generalization performance. On the contrary, ELM performs worst. In comparison with ELM, RELM is superior, but it loses its advantage over SELM in terms of the generalization performance. These conclusions follow from the fact that ELM ⊂ RELM ⊂ SELM ⊂ ISELM, and the experimental results are consistent with the theoretical discussions in the above sections. In addition, SVM lies between RELM and SELM with respect to the generalization performance. The detailed information is given in Table 3.

Fig. 1. G-mean of every algorithm on each benchmark data set.

Table 3 not only lists the G-mean values but also tabulates the computational cost. As for the training time, ELM and RELM are similar because they only need to solve a linear system.
Table 2. The model selection results on benchmark data sets for RELM, SELM, ISELM, and SVM. For SVM, the reported values are C, γ, and #SV.

Statlog, sigmoid: RELM λ=2^-9; SELM λ=2^-2, #itMax=14; ISELM λ=2^0, m=1, r=9, #itMax=6
Statlog, RBF: RELM λ=2^-8; SELM λ=2^-10, #itMax=32; ISELM λ=2^-11, m=1, r=10, #itMax=5; SVM C=2^1, γ=2^1, #SV=19
Balance, sigmoid: RELM λ=2^-18; SELM λ=2^-11, #itMax=195; ISELM λ=2^-2, m=1, r=10, #itMax=8
Balance, RBF: RELM λ=2^-19; SELM λ=2^-8, #itMax=198; ISELM λ=2^0, m=1, r=10, #itMax=59; SVM C=2^7, γ=2^1, #SV=11
Breast cancer, sigmoid: RELM λ=2^0; SELM λ=2^1, #itMax=17; ISELM λ=2^2, m=1, r=10, #itMax=6
Breast cancer, RBF: RELM λ=2^0; SELM λ=2^0, #itMax=17; ISELM λ=2^3, m=1, r=10, #itMax=9; SVM C=2^4, γ=2^2, #SV=10
German.number, sigmoid: RELM λ=2^-40; SELM λ=2^-40, #itMax=197; ISELM λ=2^2, m=1, r=10, #itMax=9
German.number, RBF: RELM λ=2^-40; SELM λ=2^-26, #itMax=196; ISELM λ=2^-4, m=1, r=10, #itMax=16; SVM C=2^15, γ=2^3, #SV=59
Mushroom, sigmoid: RELM λ=2^-40; SELM λ=2^-10, #itMax=34; ISELM λ=2^-6, m=1, r=10, #itMax=2
Mushroom, RBF: RELM λ=2^-20; SELM λ=2^-18, #itMax=17; ISELM λ=2^-15, m=1, r=10, #itMax=2; SVM C=2^15, γ=2^4, #SV=41
Svmguide1, sigmoid: RELM λ=2^-40; SELM λ=2^-40, #itMax=200; ISELM λ=2^-9, m=1, r=10, #itMax=17
Svmguide1, RBF: RELM λ=2^-39; SELM λ=2^-38, #itMax=199; ISELM λ=2^-5, m=1, r=10, #itMax=58; SVM C=2^9, γ=2^1, #SV=82
Comparatively speaking, SELM and ISELM need more training time since they require solving a series of RELMs or weighted RELMs. Generally, their training time is related to #itMax; that is, a larger #itMax means more training time for SELM and ISELM. This also means that SELM and ISELM can obtain computational efficiency comparable to ELM and RELM if #itMax is not large. From Table 2, #itMax, especially for ISELM, is usually small, which sufficiently guarantees their efficient computation. Due to adopting the same #HN, ELM, RELM, SELM, and ISELM require almost the same testing time. As for SVM, it needs more training time as #Training increases. For example, when #Training is small, as in Statlog, Balance, and Breast cancer, SVM is on the same level as ELM and RELM; however, for Mushroom and Svmguide1, it requires even more training time than SELM and ISELM. Moreover, past research [39] shows that #SV, which is in direct proportion to the testing time, scales linearly with #Training in SVM. This paper confirms that point again: the testing time increases with a growing #Training. Hence, it is easy to expect that SVM will take more training time and testing time than ELM, RELM, SELM, and ISELM for large-scale problems.

All in all, the proposed SELM and ISELM not only retain comparable computational complexity but also improve the generalization performance, especially on imbalanced classification problems, which is the motivation for developing them.

6. Fault detection of aircraft engine

From the experimental investigations on benchmark data sets in section 5, SELM and ISELM show excellent performance on imbalanced classification problems. As is known, fault detection of aircraft engine is an imbalanced task, since fault data are rarely obtained and are very valuable because their acquisition usually comes at a huge price. Hence, this paper applies the proposed SELM and ISELM to fault detection of aircraft engine. It is expected that they can be developed as candidate techniques for this task.
Table 3. The experimental results on benchmark data sets. In each sigmoid block the algorithm order is ELM / RELM / SELM / ISELM; in each RBF block it is ELM / RELM / SELM / ISELM / SVM.

Statlog, sigmoid:
  G-mean: 0.9440±0.0149 / 0.9809±0.0065 / 0.9922±0.0030 / 0.9993±0.0007
  Training time (sec.): 0.002±0.001 / 0.002±0.001 / 0.005±0.002 / 0.004±0.002
  Testing time (sec.): 0.007±0.002 / 0.007±0.002 / 0.006±0.001 / 0.007±0.002
Statlog, RBF:
  G-mean: 0.8094±0.0625 / 0.9572±0.0111 / 0.9840±0.0085 / 0.9927±0.0029 / 0.9799±0.0000
  Training time (sec.): 0.004±0.003 / 0.004±0.003 / 0.010±0.001 / 0.004±0.001 / 0.002±0.000
  Testing time (sec.): 0.021±0.004 / 0.021±0.003 / 0.016±0.002 / 0.020±0.002 / 0.008±0.000
Balance, sigmoid:
  G-mean: 0.7300±0.0258 / 0.8005±0.0195 / 0.9280±0.0074 / 0.9603±0.0010
  Training time (sec.): 0.001±0.000 / 0.001±0.002 / 0.037±0.004 / 0.008±0.005
  Testing time (sec.): 0.001±0.000 / 0.001±0.000 / 0.001±0.000 / 0.001±0.000
Balance, RBF:
  G-mean: 0.7174±0.0191 / 0.7922±0.0229 / 0.9249±0.0046 / 0.9545±0.0029 / 0.9001±0.0000
  Training time (sec.): 0.003±0.002 / 0.003±0.002 / 0.036±0.003 / 0.020±0.003 / 0.004±0.000
  Testing time (sec.): 0.003±0.001 / 0.003±0.003 / 0.001±0.000 / 0.001±0.000 / 0.001±0.000
Breast cancer, sigmoid:
  G-mean: 0.7412±0.0639 / 0.9109±0.0026 / 0.9229±0.0030 / 0.9479±0.0018
  Training time (sec.): 0.003±0.002 / 0.002±0.002 / 0.008±0.002 / 0.004±0.002
  Testing time (sec.): 0.001±0.001 / 0.002±0.001 / 0.002±0.001 / 0.001±0.001
Breast cancer, RBF:
  G-mean: 0.7168±0.0589 / 0.9161±0.0035 / 0.9316±0.0012 / 0.9550±0.0031 / 0.9328±0.0000
  Training time (sec.): 0.005±0.002 / 0.004±0.002 / 0.007±0.002 / 0.007±0.003 / 0.004±0.000
  Testing time (sec.): 0.004±0.001 / 0.004±0.001 / 0.003±0.001 / 0.002±0.001 / 0.001±0.000
German.number, sigmoid:
  G-mean: 0.1520±0.0356 / 0.1520±0.0356 / 0.3927±0.0322 / 0.4835±0.0125
  Training time (sec.): 0.004±0.002 / 0.003±0.001 / 0.038±0.005 / 0.007±0.002
  Testing time (sec.): 0.002±0.001 / 0.002±0.001 / 0.001±0.000 / 0.002±0.001
German.number, RBF:
  G-mean: 0.1981±0.0281 / 0.1981±0.0281 / 0.4008±0.0315 / 0.4809±0.0185 / 0.3614±0.0000
  Training time (sec.): 0.008±0.004 / 0.008±0.005 / 0.070±0.007 / 0.021±0.006 / 0.127±0.000
  Testing time (sec.): 0.007±0.002 / 0.007±0.003 / 0.004±0.000 / 0.006±0.002 / 0.003±0.000
Mushroom, sigmoid:
  G-mean: 0.9389±0.0117 / 0.9389±0.0117 / 0.9594±0.0118 / 0.9674±0.0064
  Training time (sec.): 0.006±0.002 / 0.004±0.002 / 0.014±0.002 / 0.016±0.002
  Testing time (sec.): 0.020±0.002 / 0.021±0.002 / 0.018±0.002 / 0.019±0.002
Mushroom, RBF:
  G-mean: 0.9279±0.0158 / 0.9316±0.0156 / 0.9490±0.0123 / 0.9667±0.0089 / 0.9489±0.0000
  Training time (sec.): 0.011±0.002 / 0.009±0.003 / 0.011±0.002 / 0.018±0.001 / 0.429±0.000
  Testing time (sec.): 0.075±0.010 / 0.076±0.008 / 0.069±0.003 / 0.065±0.002 / 0.031±0.000
Svmguide1, sigmoid:
  G-mean: 0.8649±0.0033 / 0.8644±0.0029 / 0.9177±0.0041 / 0.9612±0.0007
  Training time (sec.): 0.010±0.002 / 0.007±0.002 / 0.360±0.106 / 0.049±0.004
  Testing time (sec.): 0.010±0.002 / 0.010±0.001 / 0.009±0.001 / 0.008±0.000
Svmguide1, RBF:
  G-mean: 0.8680±0.0035 / 0.8657±0.0041 / 0.9202±0.0014 / 0.9615±0.0007 / 0.9100±0.0000
  Training time (sec.): 0.017±0.005 / 0.013±0.003 / 0.740±0.206 / 0.074±0.005 / 0.347±0.000
  Testing time (sec.): 0.015±0.002 / 0.016±0.001 / 0.015±0.002 / 0.011±0.002 / 0.017±0.000
6.1. Engine description

In this paper, the application object is a two-spool turbojet engine, whose schematic is plotted in Fig. 2. This engine mainly consists of seven components, i.e., inlet, low pressure compressor (LPC), high pressure compressor (HPC), combustor, high pressure turbine (HPT), low pressure turbine (LPT), and nozzle. The inlet is used to regulate and provide fresh air. When the air passes through the LPC and HPC sequentially, its pressure and temperature are increased. In the combustor, the high-temperature, high-pressure air is mixed with fuel and then ignited, so hot gas with very high pressure is generated.
Fig. 2. Turbojet schematic with station numbering.
Table 4. Couple factors of aircraft engine components.

Engine components | Couple factors | Notes
LPC | 1.10 | Coupled LPC (-1% efficiency, -1.10% flow capacity)
HPC | 0.80 | Coupled HPC (-1% efficiency, -0.80% flow capacity)
HPT | -2.30 | Coupled HPT (-1% efficiency, +2.30% flow capacity)
LPT | -1.60 | Coupled LPT (-1% efficiency, +1.60% flow capacity)
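To illustrate how the couple factors of Table 4 are used when simulating a fault (a hypothetical helper written for illustration, not the authors' component-level simulation code), the coupled health-parameter deviations for a -1% efficiency fault could be formed as:

```python
# Couple factors from Table 4: flow-capacity change (%) per -1% efficiency change.
COUPLE_FACTOR = {"LPC": 1.10, "HPC": 0.80, "HPT": -2.30, "LPT": -1.60}

def inject_fault(component, d_eff_pct=-1.0):
    """Return the (efficiency, flow capacity) deviations in percent for a faulty component."""
    d_flow_pct = COUPLE_FACTOR[component] * d_eff_pct
    return d_eff_pct, d_flow_pct

# Example: inject_fault("HPT") -> (-1.0, 2.3), i.e. -1% efficiency and +2.3% flow capacity.
```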
Table 5. Specifications of seven cases for fault detection of aircraft engine.

Data sets | #Training=#PosTr+#NegTr (#Normal+#LPC+#HPC+#HPT+#LPT) | #Testing | #Features | IR
Case1 | 441=349+23×4 (349+23+23+23+23) | 11188 | 14 | 349/(23×4) = 3.79
Case2 | 397=349+12×4 (349+12+12+12+12) | 11232 | 14 | 349/(12×4) = 7.27
Case3 | 377=349+7×4 (349+7+7+7+7) | 11252 | 14 | 349/(7×4) = 12.46
Case4 | 369=349+5×4 (349+5+5+5+5) | 11260 | 14 | 349/(5×4) = 17.45
Case5 | 361=349+3×4 (349+3+3+3+3) | 11268 | 14 | 349/(3×4) = 29.08
Case6 | 357=349+2×4 (349+2+2+2+2) | 11272 | 14 | 349/(2×4) = 43.63
Case7 | 353=349+1×4 (349+1+1+1+1) | 11276 | 14 | 349/(1×4) = 87.25

Notes: #Training represents the number of training samples; #Normal, #LPC, #HPC, #HPT, and #LPT represent the numbers of normal states, LPC faults, HPC faults, HPT faults, and LPT faults for training, respectively; #PosTr = #Normal is the number of positive training samples; #NegTr = #LPC+#HPC+#HPT+#LPT is the number of negative training samples; #Training = #PosTr+#NegTr; #Testing is the number of testing samples; #Features is the number of input features; IR is the majority-to-minority imbalance ratio.
When the hot gas flows through the HPT and LPT, the high pressure spool and low pressure spool are driven, respectively, which supply power for the HPC and LPC, correspondingly. Finally, the exhaust gas is ejected into the atmosphere at high speed, so thrust is generated for the aircraft. To facilitate expression, the engine station numbering is denoted in Fig. 2 as: inlet exit by 2, LPC exit by 22, HPC exit by 3, HPT exit by 42, and LPT exit by 46.

6.2. Data acquisition

ELM, RELM, SELM, and ISELM are data-driven machine learning algorithms, so the first task is to acquire the engine data, including the normal states and the faults. In general, the normal engine states are easy to obtain, but it is difficult to obtain engine fault data. When a fault, such as mechanical fatigue fracture, erosion, corrosion, foreign object damage, and so on, occurs in one component of an aircraft engine, its corresponding health parameters, like efficiency and flow capacity, will deviate from their normal values in the component thermodynamic maps. That is, these health parameters can reflect the aircraft engine health states. Hence, engine faults can be obtained by changing these parameters. According to previous experience, a component fault can be defined as a reduction of its efficiency parameter by 1% [40], while also considering the couple factor between efficiency and flow capacity (see Table 4) [41], because they are not independent when a fault occurs. In reality, the health parameters of aircraft engine
components cannot be obtained directly, because they are not sensed. Therefore, to realize fault detection, closely related measurements such as temperatures and pressures are acquired. In this paper, besides the normal states, we sample four kinds of aircraft engine faults, namely LPC faults, HPC faults, HPT faults, and LPT faults, because these components easily break down due to the harsh working environment, i.e., high temperature, high pressure, and high rotation speed. When a component fault occurs, its corresponding efficiency parameter is reduced by 1%. In this way, the fault injections are simulated. As a result, 11629 simulation data, consisting of 2324 normal states, 2322 LPC faults, 2324 HPC faults, 2333 HPT faults, and 2326 LPT faults, are obtained from a component-level model of the two-spool turbojet engine over the whole flight envelope by changing flight height and Mach number. Each datum has 14 variables: flight height, flight Mach number, the rotation speed of the high pressure spool, the rotation speed of the low pressure spool, T22, P22, T3, P3, T42, P42, T46, P46, the temperature ratio (T46/T2), and fuel flow, where T22 represents the temperature at the LPC exit, P22 represents the pressure at the LPC exit, and the remaining similar variables follow the same convention.

6.3. Testing results

After acquiring the aircraft engine data, we normalize the input features into the closed interval [0, 1] due to the different dimensions, and then we randomly divide them into a training subset and a testing subset under seven cases. From case 1 to case 7, the IR increases. Their specifications are showcased in Table 5. For each case, the number of training samples (#Training) is far less than the number of testing samples (#Testing). The main reason is that in reality aircraft engine samples, especially engine faults, are very difficult to obtain. Here, the engine normal states are viewed as the positive samples and the engine faults as the negative samples. That is, fault detection of aircraft engine is an imbalanced classification problem. When ELM, RELM, SELM, and ISELM are applied to fault detection of aircraft engine, #HN = 100 is set for them. In addition, the model selection results of RELM, SELM, ISELM, and SVM are shown in Table 6.

Fig. 3 plots the testing results, i.e., G-mean vs IR, on fault detection of aircraft engine. From this figure, SVM performs worst with respect to the generalization performance. The other four algorithms hold the rank ISELM > SELM > RELM > ELM. These conclusions are nearly consistent with those on the benchmark data sets. More importantly, with an increasing IR, the differences among them become very obvious. In other words, the proposed SELM and ISELM are more suitable for fault detection of aircraft engine than ELM, RELM, and SVM; ISELM, in particular, is devised for imbalanced classification problems like fault detection of aircraft engine.

Table 7 details the experimental results on fault detection of aircraft engine. According to this table, ELM and RELM require nearly the same training time, while SELM and ISELM need more training time, because ELM and RELM solve a linear system only once, whereas there is a loop in Algorithms 1 and 2, which indicates that SELM and ISELM need to solve a series of linear systems. Comparatively speaking, the training burden of ISELM is lighter than that of SELM, since the former usually needs a smaller #itMax than the latter. SVM needs the most training time among these algorithms. Since these algorithms, viz., ELM, RELM, SELM, ISELM, and SVM, are trained offline, the training complexity is not a bottleneck for them. As for the testing time, ELM, RELM, SELM, and ISELM need almost the same testing time because the same #HN is set for them. SVM is the cheapest in terms of testing time because #SV < #HN.
Fig. 3. G-mean vs IR.
In a nutshell, SELM and ISELM enhance the generalization performance compared with ELM, RELM, and SVM when facing imbalanced classification problems, so they can be developed as candidate techniques for fault detection of aircraft engine. Among them, ISELM is especially favored for fault detection of aircraft engine according to the experimental analyses above.

7. Conclusions

As an SLFN, ELM gains fast training speed due to the following aspects: one is that the hidden layer is randomly initialized; the other is that the output weights are analytically determined by solving a least-squares problem, which implies that the square loss function is adopted in ELM. When ELM is used to deal with regression problems, the square loss function is a good choice due to the continuous targets. However, for classification tasks, where the targets are discrete values like ±1, using the square loss function leads to an unreasonable point to some extent: from the viewpoint of margin learning theory, ELM forces the target margins of all training samples to approach one. To ease this drawback, some improvements can be made. On one hand, an alternative loss function could be adopted in ELM, but its training speed would then be impaired. On the other hand, the practice of using discrete values such as ±1 as the targets can be cancelled.
Table 6. The model selection results on aircraft engine for RELM, SELM, ISELM, and SVM. For SVM, the reported values are C, γ, and #SV.

Case1, sigmoid: RELM λ=2^-27; SELM λ=2^-30, #itMax=10; ISELM λ=2^-27, m=3, r=4, #itMax=25
Case1, RBF: RELM λ=2^-29; SELM λ=2^-33, #itMax=4; ISELM λ=2^-30, m=4, r=7, #itMax=16; SVM C=2^18, γ=2^1, #SV=55
Case2, sigmoid: RELM λ=2^-27; SELM λ=2^-35, #itMax=7; ISELM λ=2^-28, m=4, r=8, #itMax=170
Case2, RBF: RELM λ=2^-38; SELM λ=2^-25, #itMax=12; ISELM λ=2^-29, m=3, r=8, #itMax=19; SVM C=2^19, γ=2^1, #SV=46
Case3, sigmoid: RELM λ=2^-27; SELM λ=2^-37, #itMax=200; ISELM λ=2^-28, m=2, r=7, #itMax=9
Case3, RBF: RELM λ=2^-30; SELM λ=2^-32, #itMax=42; ISELM λ=2^-28, m=2, r=7, #itMax=5; SVM C=2^19, γ=2^1, #SV=36
Case4, sigmoid: RELM λ=2^-25; SELM λ=2^-33, #itMax=200; ISELM λ=2^-25, m=2, r=8, #itMax=22
Case4, RBF: RELM λ=2^-30; SELM λ=2^-38, #itMax=200; ISELM λ=2^-27, m=2, r=9, #itMax=8; SVM C=2^20, γ=2^2, #SV=35
Case5, sigmoid: RELM λ=2^-25; SELM λ=2^-32, #itMax=199; ISELM λ=2^-26, m=1, r=5, #itMax=20
Case5, RBF: RELM λ=2^-29; SELM λ=2^-39, #itMax=200; ISELM λ=2^-26, m=1, r=4, #itMax=12; SVM C=2^18, γ=2^1, #SV=30
Case6, sigmoid: RELM λ=2^-37; SELM λ=2^-31, #itMax=200; ISELM λ=2^-19, m=1, r=6, #itMax=50
Case6, RBF: RELM λ=2^-27; SELM λ=2^-33, #itMax=187; ISELM λ=2^-25, m=1, r=10, #itMax=51; SVM C=2^20, γ=2^2, #SV=19
Case7, sigmoid: RELM λ=2^-23; SELM λ=2^-30, #itMax=200; ISELM λ=2^-21, m=1, r=10, #itMax=34
Case7, RBF: RELM λ=2^-26; SELM λ=2^-30, #itMax=200; ISELM λ=2^-25, m=1, r=10, #itMax=110; SVM C=2^19, γ=2^2, #SV=15
That is, a soft (continuous) target margin is set for each training sample, and thus a novel SELM for classification problems is proposed. As a result, SELM improves the generalization performance and only needs to solve a series of RELMs, so it obtains computational complexity comparable to ELM and RELM. To cope with imbalanced classification problems, ISELM is proposed. ISELM further enhances the generalization performance and keeps the same computational complexity as SELM in theory by solving a series of weighted RELMs. To test the effectiveness and feasibility of the proposed SELM and ISELM, several benchmark data sets are employed in experiments. From the testing results, ISELM performs best among all the algorithms in terms of generalization performance, and ELM performs worst. In addition, SELM works better than RELM, and SVM is between SELM and RELM. With regard to the training time, SELM and ISELM need more than ELM and RELM. In comparison, ISELM needs less training time than SELM because the former requires a smaller #itMax. In a word, ISELM has advantages over SELM in both generalization performance and training time. When they are applied to fault detection of aircraft engine, ISELM and SELM are superior to ELM, RELM, and SVM with respect to the generalization performance.
Table 7. The experimental results on aircraft engine. In each sigmoid block the algorithm order is ELM / RELM / SELM / ISELM; in each RBF block it is ELM / RELM / SELM / ISELM / SVM.

Case1, sigmoid:
  G-mean: 0.9665±0.0136 / 0.9670±0.0138 / 0.9703±0.0116 / 0.9737±0.0105
  Training time (sec.): 0.004±0.002 / 0.003±0.001 / 0.007±0.003 / 0.017±0.002
  Testing time (sec.): 0.027±0.004 / 0.029±0.004 / 0.027±0.003 / 0.024±0.003
Case1, RBF:
  G-mean: 0.9630±0.0122 / 0.9647±0.0118 / 0.9656±0.0120 / 0.9701±0.0104 / 0.9546±0.0000
  Training time (sec.): 0.005±0.001 / 0.005±0.002 / 0.004±0.000 / 0.011±0.001 / 17.755±0.000
  Testing time (sec.): 0.080±0.004 / 0.080±0.004 / 0.077±0.002 / 0.076±0.005 / 0.049±0.000
Case2, sigmoid:
  G-mean: 0.9423±0.0144 / 0.9436±0.0145 / 0.9511±0.0141 / 0.9545±0.0110
  Training time (sec.): 0.003±0.002 / 0.002±0.000 / 0.005±0.003 / 0.068±0.010
  Testing time (sec.): 0.023±0.003 / 0.024±0.002 / 0.026±0.004 / 0.020±0.002
Case2, RBF:
  G-mean: 0.9407±0.0147 / 0.9407±0.0148 / 0.9452±0.0183 / 0.9500±0.0165 / 0.8960±0.0000
  Training time (sec.): 0.005±0.002 / 0.003±0.000 / 0.007±0.002 / 0.012±0.003 / 22.169±0.000
  Testing time (sec.): 0.080±0.004 / 0.079±0.002 / 0.085±0.013 / 0.080±0.008 / 0.047±0.000
Case3, sigmoid:
  G-mean: 0.9275±0.0144 / 0.9294±0.0140 / 0.9464±0.0178 / 0.9529±0.0145
  Training time (sec.): 0.003±0.002 / 0.003±0.002 / 0.043±0.003 / 0.007±0.001
  Testing time (sec.): 0.023±0.003 / 0.023±0.003 / 0.019±0.002 / 0.021±0.001
Case3, RBF:
  G-mean: 0.9347±0.0182 / 0.9365±0.0162 / 0.9495±0.0178 / 0.9543±0.0143 / 0.8279±0.0000
  Training time (sec.): 0.005±0.002 / 0.004±0.001 / 0.012±0.002 / 0.006±0.001 / 4.245±0.000
  Testing time (sec.): 0.080±0.005 / 0.080±0.004 / 0.073±0.003 / 0.077±0.003 / 0.040±0.000
Case4, sigmoid:
  G-mean: 0.8851±0.0263 / 0.8985±0.0106 / 0.9153±0.0204 / 0.9325±0.0128
  Training time (sec.): 0.003±0.002 / 0.003±0.002 / 0.049±0.005 / 0.013±0.003
  Testing time (sec.): 0.027±0.004 / 0.030±0.002 / 0.020±0.003 / 0.026±0.003
Case4, RBF:
  G-mean: 0.9095±0.0152 / 0.9124±0.0156 / 0.9285±0.0205 / 0.9410±0.0185 / 0.7994±0.0000
  Training time (sec.): 0.005±0.001 / 0.004±0.002 / 0.040±0.002 / 0.007±0.001 / 7.037±0.000
  Testing time (sec.): 0.079±0.002 / 0.081±0.004 / 0.071±0.002 / 0.079±0.004 / 0.042±0.000
Case5, sigmoid:
  G-mean: 0.8081±0.0283 / 0.8280±0.0170 / 0.8651±0.0155 / 0.8767±0.0180
  Training time (sec.): 0.003±0.001 / 0.002±0.001 / 0.046±0.003 / 0.011±0.001
  Testing time (sec.): 0.027±0.004 / 0.029±0.003 / 0.021±0.002 / 0.024±0.003
Case5, RBF:
  G-mean: 0.8358±0.0372 / 0.8419±0.0300 / 0.8773±0.0440 / 0.8985±0.0358 / 0.7708±0.0000
  Training time (sec.): 0.005±0.002 / 0.003±0.001 / 0.045±0.001 / 0.011±0.002 / 1.528±0.000
  Testing time (sec.): 0.091±0.007 / 0.090±0.005 / 0.074±0.004 / 0.088±0.007 / 0.037±0.000
Case6, sigmoid:
  G-mean: 0.7125±0.0335 / 0.7125±0.0335 / 0.8133±0.0433 / 0.8513±0.0249
  Training time (sec.): 0.004±0.003 / 0.002±0.002 / 0.044±0.003 / 0.023±0.006
  Testing time (sec.): 0.028±0.002 / 0.029±0.003 / 0.021±0.002 / 0.022±0.002
Case6, RBF:
  G-mean: 0.6927±0.0403 / 0.7043±0.0386 / 0.8242±0.0308 / 0.8385±0.0389 / 0.6821±0.0000
  Training time (sec.): 0.004±0.001 / 0.004±0.001 / 0.044±0.005 / 0.023±0.002 / 1.628±0.000
  Testing time (sec.): 0.093±0.008 / 0.095±0.011 / 0.078±0.015 / 0.081±0.008 / 0.036±0.000
Case7, sigmoid:
  G-mean: 0.6516±0.0471 / 0.6697±0.0400 / 0.7992±0.0591 / 0.8442±0.0206
  Training time (sec.): 0.003±0.001 / 0.003±0.002 / 0.046±0.007 / 0.015±0.002
  Testing time (sec.): 0.029±0.004 / 0.028±0.003 / 0.021±0.002 / 0.027±0.003
Case7, RBF:
  G-mean: 0.6429±0.0727 / 0.6856±0.0268 / 0.8262±0.0313 / 0.8397±0.0250 / 0.6543±0.0000
  Training time (sec.): 0.005±0.002 / 0.004±0.002 / 0.046±0.003 / 0.047±0.014 / 1.263±0.000
  Testing time (sec.): 0.087±0.008 / 0.089±0.007 / 0.077±0.008 / 0.074±0.010 / 0.032±0.000
This implies that ISELM and SELM can be developed as candidate techniques for fault detection of aircraft engine, with ISELM especially favorable.

Declaration of Competing Interest

There is no conflict of interest.

Acknowledgements

This research was supported by the Fundamental Research Funds for the Central Universities under grant no. NS2017013.

References

[1] A.J. Volponi, T. Brotherton, R. Luppold, D.L. Simon, Development of an Information Fusion System for Engine Diagnostics and Health Management, Tech. Rep. NASA/TM-2004-212924, National Aeronautics and Space Administration, 2004. [Online], available: https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20040040086.pdf.
[2] D.L. Simon, S. Garg, M. Venti, Propulsion Control and Health Management (PCHM) Technology for Flight Test on the C-17 T-1 Aircraft, Tech. Rep. NASA/TM-2004-213303, National Aeronautics and Space Administration, 2004. [Online], available: https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20040172764.pdf.
[3] J.S. Litt, D.L. Simon, S. Garg, T.-H. Guo, C. Mercer, R. Millar, A. Behbahani, A. Bajwa, D.T. Jensen, A survey of intelligent control and health management technologies for aircraft propulsion systems, J. Aerosp. Comput. Inf. Commun. (2004) 543–563.
[4] T. Kobayashi, D.L. Simon, Integration of on-line and off-line diagnostic algorithms for aircraft engine health management, J. Eng. Gas Turbines Power 129 (4) (2007) 986–993. [Online], available: https://doi.org/10.1115/1.2747640.
[5] S. Sina Tayarani-Bathaie, K. Khorasani, Fault detection and isolation of gas turbine engines using a bank of neural networks, J. Process Control 36 (2015) 22–41. [Online], available: https://doi.org/10.1016/j.jprocont.2015.08.007.
[6] T. Kobayashi, D.L. Simon, Application of a Bank of Kalman Filters for Aircraft Engine Fault Diagnostics, Tech. Rep. NASA/TM-2003-212526, National Aeronautics and Space Administration, 2003. [Online], available: https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20030067984.pdf.
[7] S. Borguet, O. Leonard, Coupling principal component analysis and Kalman filtering algorithms for on-line aircraft engine diagnostics, Control Eng. Pract. 17 (4) (2009) 494–502. [Online], available: https://doi.org/10.1016/j.conengprac.2008.09.008.
[8] D.L. Simon, S. Garg, Optimal tuner selection for Kalman filter-based aircraft engine performance estimation, in: Proceedings of the ASME Turbo Expo, vol. 1, Orlando, FL, United States, 2009, pp. 659–671. [Online], available: https://doi.org/10.1115/GT2009-59684.
[9] D.L. Simon, A.W. Rinehart, Sensor selection for aircraft engine performance estimation and gas path fault diagnostics, J. Eng. Gas Turbines Power 138 (7) (2016). [Online], available: https://doi.org/10.1115/1.4032339.
[10] M. Schwabacher, K. Goebel, A survey of artificial intelligence for prognostics, in: Artificial Intelligence for Prognostics, Arlington, VA, United States, 2007, pp. 107–114.
[11] A. Kyriazis, K. Mathioudakis, Gas turbine fault diagnosis using fuzzy-based decision fusion, J. Propuls. Power 25 (2) (2009) 335–343. [Online], available: https://doi.org/10.2514/1.38629.
[12] Z. Sadough Vanini, K. Khorasani, N. Meskin, Fault detection and isolation of a dual spool gas turbine engine using dynamic neural networks and multiple model approach, Inf. Sci. 259 (2014) 234–251. [Online], available: https://doi.org/10.1016/j.ins.2013.05.032.
[13] Z.-T. Wang, N.-B. Zhao, W.-Y. Wang, R. Tang, S.-Y. Li, A fault diagnosis approach for gas turbine exhaust gas temperature based on fuzzy c-means clustering and support vector machine, Math. Probl. Eng. 2015 (2015). [Online], available: https://doi.org/10.1155/2015/240267.
[14] D. Zhou, H. Zhang, S. Weng, A new gas path fault diagnostic method of gas turbine based on support vector machine, J. Eng. Gas Turbines Power 137 (10) (2015). [Online], available: https://doi.org/10.1115/1.4030277.
[15] Y.-P. Zhao, F.-Q. Song, Y.-T. Pan, B. Li, Retargeting extreme learning machines for classification and their applications to fault diagnosis of aircraft engine, Aerosp. Sci. Technol. 71 (2017) 603–618. [Online], available: https://doi.org/10.1016/j.ast.2017.10.004.
[16] A.D. Fentaye, S.I. Ul-Haq Gilani, A.T. Baheta, Y.-G. Li, Performance-based fault diagnosis of a gas turbine engine using an integrated support vector machine and artificial neural network method, Proc. Inst. Mech. Eng. A, J. Power Energy (2018). [Online], available: https://doi.org/10.1177/0957650918812510.
[17] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536.
[18] M.H. Hagan, M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw. 5 (6) (1994) 989–993. [Online], available: https://doi.org/10.1109/72.329697.
[19] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., New York, NY, USA, 1995.
[20] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297. [Online], available: https://doi.org/10.1023/A:1022627411411.
[21] Y.-H. Pao, Y. Takefuji, Functional-link net computing: theory, system architecture, and functionalities, Computer 25 (5) (1992) 76–79. [Online], available: https://doi.org/10.1109/2.144401.
[22] Y.-H. Pao, G.-H. Park, D. Sobajic, Learning and generalization characteristics of the random vector functional-link net, Neurocomputing 6 (2) (1994) 163–180. [Online], available: https://doi.org/10.1016/0925-2312(94)90053-1.
[23] B. Igelnik, Y.-H. Pao, Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Trans. Neural Netw. 6 (6) (1995) 1320–1329. [Online], available: https://doi.org/10.1109/72.471375.
[24] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1–3) (2006) 489–501. [Online], available: https://doi.org/10.1016/j.neucom.2005.12.126.
[25] H. He, E.A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263–1284. [Online], available: https://doi.org/10.1109/TKDE.2008.239.
[26] Y.-P. Zhao, B. Li, Y.-B. Li, An accelerating scheme for destructive parsimonious extreme learning machine, Neurocomputing 167 (2015) 671–687. [Online], available: https://doi.org/10.1016/j.neucom.2015.04.002.
[27] P.L. Bartlett, Sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inf. Theory 44 (2) (1998) 525–536. [Online], available: https://doi.org/10.1109/18.661502.
[28] A.N. Tikhonov, A.V. Arsenin, Solution of Ill-Posed Problems, Winston & Son, Washington, 1977.
[29] W. Deng, Q. Zheng, L. Chen, Regularized extreme learning machine, in: 2009 IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, United States, 2009, pp. 389–395. [Online], available: https://doi.org/10.1109/CIDM.2009.4938676.
[30] Y.-P. Zhao, K.-K. Wang, Y.-B. Li, Parsimonious regularized extreme learning machine based on orthogonal transformation, Neurocomputing 156 (2015) 280–296. [Online], available: https://doi.org/10.1016/j.neucom.2014.12.046.
[31] S.H. Yang, B.G. Hu, A stagewise least square loss function for classification, in: SIAM International Conference on Data Mining, 2008, pp. 120–131.
[32] R. Batuwita, V. Palade, FSVM-CIL: fuzzy support vector machines for class imbalance learning, IEEE Trans. Fuzzy Syst. 18 (3) (2010) 558–571. [Online], available: https://doi.org/10.1109/TFUZZ.2010.2042721.
[33] G. Xu, B.-G. Hu, J.C. Principe, An asymmetric stagewise least square loss function for imbalanced classification, in: Proceedings of the International Joint Conference on Neural Networks, Beijing, China, 2014, pp. 1107–1114. [Online], available: https://doi.org/10.1109/IJCNN.2014.6889606.
[34] Y.-P. Zhao, Y.-T. Pan, F.-Q. Song, L. Sun, T.-H. Chen, Feature selection of generalized extreme learning machine for regression problems, Neurocomputing 275 (2018) 2810–2823. [Online], available: https://doi.org/10.1016/j.neucom.2017.11.056.
[35] Y.-P. Zhao, K.-K. Wang, Fast cross validation for regularized extreme learning machine, J. Syst. Eng. Electron. 25 (5) (2014) 895–900. [Online], available: https://doi.org/10.1109/JSEE.2014.000103.
[36] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 1–27. [Online], available: https://doi.org/10.1145/1961189.1961199.
[37] T. Hofmann, B. Schölkopf, A.J. Smola, Kernel methods in machine learning, Ann. Stat. 36 (3) (2008) 1171–1220.
[38] Y.-P. Zhao, J.-G. Sun, Recursive reduced least squares support vector regression, Pattern Recognit. 42 (5) (2009) 837–842. [Online], available: https://doi.org/10.1016/j.patcog.2008.09.028.
[39] I. Steinwart, Sparseness of support vector machines, J. Mach. Learn. Res. 4 (6) (2004) 1071–1105. [Online], available: https://doi.org/10.1162/1532443041827925.
[40] S. Borguet, O. Leonard, Comparison of adaptive filters for gas turbine performance monitoring, J. Comput. Appl. Math. 234 (7) (2010) 2202–2212. [Online], available: https://doi.org/10.1016/j.cam.2009.08.075.
[41] Pratt & Whitney, Module Analysis Program Network (MAPNET) Training Guide, Pratt & Whitney Customer Training Center, USA, 1997.