
Detection of Incipient Faults in EMU Braking System Based on Data Domain Description and Variable Control Limit

Jianxue Sang^a, Junfeng Zhang^a, Tianxu Guo^a, Donghua Zhou^{b,a,∗}, Maoyin Chen^{a,∗∗}, Xiuhua Tai^c

^a Department of Automation, Tsinghua University, Beijing 100084, China
^b College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China
^c CRRC Qingdao Sifang Rolling Stock Research Institute Co., Ltd, Qingdao 266112, China

Abstract: The performance of the braking system strongly affects the safe operation of the electric multiple unit (EMU). During practical operation, it is of great significance to detect incipient faults in the braking system. Since the braking process is a typical non-Gaussian and multi-stage process, it is difficult to detect these incipient faults. In particular, the overlap of normal and faulty samples that frequently occurs during the braking process makes detection even more difficult. In this paper, a novel method based on data domain description and a variable control limit (VCL) is developed for detecting incipient faults in the EMU braking system. The local reachability density (LRD) weighted support vector data description with negative samples (NSVDD) is introduced for offline modeling to obtain a more accurate domain description, while the Gaussian kernel trick is utilized to obtain a hypersphere with a soft boundary. In the presence of sample overlap, the VCL strategy is adopted for online fault detection, which reduces the false alarm rate (FAR) and increases the fault detection rate (FDR) simultaneously. A case study of three kinds of incipient faults in the EMU braking system fully demonstrates the effectiveness of the proposed method.

Keywords: Incipient Fault, Electric Multiple Unit (EMU), Braking System, Local Reachability Density (LRD), Variable Control Limit.

This work was supported by NSFC under Grants No. 61490701, 61751307, 61873143, and the Research Fund for the Taishan Scholar Project of Shandong Province of China.
∗ Corresponding author
∗∗ Corresponding author
Email addresses: [email protected] (Donghua Zhou), [email protected] (Maoyin Chen)

1. Introduction

Safe operation is crucial for the electric multiple unit (EMU), and an incipient fault in an EMU may cause serious economic loss and even endanger personal safety. For example, the fracture of the steel tyre of a composite wheel of the ICE884 train caused it to derail in Germany on 3 June 1998, resulting in 101 deaths and 88 injuries[1]. The rear-end collision on the Yongwen railway line in China on 23 July 2011 left 40 dead and 172 injured; the cause of that accident was an equipment fault of the train control center induced by a lightning strike[2]. These serious accidents imply that high safety and reliability must be ensured during the operation of EMUs.

For modern industry, fault detection has been widely adopted to improve system safety and reliability, as it can judge whether the current system is in a normal state[3]. In the literature, model-based approaches are the earliest and most mature branch[4, 5, 6]. However, due to the large scale and complex structure of industrial systems, it is usually difficult or expensive to obtain accurate models for fault detection[7]. Since industrial systems are usually equipped with a large number of sensors and a variety of data acquisition devices, many kinds of data-based approaches have been proposed in the past decades[7], including methods based on multivariate statistical process monitoring (MSPM)[8, 9, 10, 11, 12], signal processing[13, 14], and artificial intelligence[15, 16]. Recently, fault detection for EMUs has begun to attract researchers' interest[17, 18, 19, 20, 21]. Yin et al. propose a novel DBN-based approach for the fault diagnosis of vehicle on-board equipment in high-speed railways[20]. Based on the total measurable fault information residual, the problem of incipient sensor biases in the EMU traction device is considered in[21].

Note that the braking system is key to ensuring the safe and stable operation of an EMU. Due to long-term operation, the braking system may suffer incipient faults. Throughout this paper, incipient faults are faults whose amplitudes are too small to be well detected[21, 22]. Incipient faults may degrade the performance of the braking system if they develop and spread. For the practical operation of an EMU, incipient faults in each stage of the braking process should be detected in real time. The existing on-board detection technologies for the EMU braking system are usually based on a univariate monitoring strategy, such as the KNORR logic, which limits the allowable deviation between the target pressure value and the upper/lower limit to 20 kPa[23]. Although this kind of strategy can detect significant faults in an EMU, it is not always effective for incipient faults. In[24], the intervariable variance (IVV) is introduced to measure the degree of dispersion among different brake cylinder pressures, and thus to monitor the EMU braking system. Based on characteristic dimensionality reduction, a robust detection approach is proposed for control loop fault detection in the EMU braking system[25].

During the operation of the EMU braking system, normal and faulty samples labelled by operation and maintenance staff can be obtained. However, due to the small amplitude of incipient faults and the multi-stage characteristic of the braking process, normal and faulty samples often overlap seriously, which makes fault detection difficult. Unfortunately, the methods developed in [24] and [25] cannot effectively detect incipient faults in this case.
In addition, the braking process is non-Gaussian, which makes the detection of incipient faults more difficult. For non-Gaussian processes, the monitoring methods mainly include independent component analysis (ICA), Gaussian mixture model (GMM), and support vector data description (SVDD)[7]. ICA and GMM usually consider only the modeling of normal data, ignoring the effective information inherent in historical fault data. Based on data domain description, SVDD makes no assumption about the data distribution, and SVDD with negative samples (NSVDD) makes use of historical fault information in modeling[26]. Moreover, NSVDD is not designed for classification, so it does not need to consider the problem of class imbalance.

Motivated by the above facts, a novel detection method is developed based on data domain description and a variable control limit (VCL). According to the characteristics of historical data, the local reachability density (LRD) weighted NSVDD is introduced for offline modeling to obtain a more accurate data domain description, and the Gaussian kernel trick is used to obtain a hypersphere with a soft boundary. The VCL strategy can effectively reduce the false alarm rate (FAR) and increase the fault detection rate (FDR) simultaneously when facing the problem of sample overlap. Superior detection performance for three kinds of incipient faults in the EMU braking system is achieved by the proposed method.

The rest of this paper is organized as follows. In Section 2, the composition and working mechanism of the EMU braking system are introduced. In Section 3, the LRD weighted NSVDD method based on data domain description is proposed for offline modeling, and the VCL strategy is proposed for online detection. In Section 4, experimental studies on the EMU braking system are conducted to illustrate the effectiveness of the proposed method. Finally, the conclusion is given in Section 5.

2. EMU Braking System Description

Although the braking systems in different kinds of EMUs are not exactly the same, the main working principles are similar. As shown in Fig. 1, the CRH2 EMU braking system can be split into three parts: the brake control unit (BCU), the brake cylinder system and the foundation brake device. The BCU consists of the electric brake control unit (EBCU), the electric-pneumatic (EP) valve and the relay valve. The brake cylinder system is composed of four identical units, each of which includes a pressure control valve, a pressurized cylinder and a pressure sensor. The foundation brake device contains the oil cylinders and disc brake devices. The pressures collected in the EMU braking system are denoted by Ppre, Pout, and Pi (i = 1, 2, 3, 4), respectively.

The braking process can be briefly described as follows. To begin with, the EBCU receives the braking instruction from the information control unit and calculates the required air braking force. After that, the EBCU controls the opening and closing of the EP valve in order to reach the pre-control pressure value Ppre. Then, the relay valve produces the brake pressure Pout by air flow amplification. The compressed air passes through the pressure control valve and the pressurized cylinder successively. Finally, the air pressure is converted into braking force by the foundation brake device to complete the braking action.


Figure 1: Structure diagram of CRH2 EMU braking system[25].

Figure 2: Brake test platform in CRRC Qingdao Sifang Rolling Stock Research Institute Co., Ltd., China.

In this paper, fault detection for the braking system is investigated on the brake test platform of CRRC Qingdao Sifang Rolling Stock Research Institute Co., Ltd., as shown in Fig. 2. Based on the test platform, brake tests can be performed under various braking modes, and faulty samples can be obtained by fault injection technology[25]. To illustrate the characteristics of the EMU braking system intuitively, Fig. 3 shows the dynamic curve of brake pressure in a typical braking process. As shown in Fig. 3, the braking process is inherently a non-Gaussian and multi-stage process. There are four stages in the whole process, namely the charging-, holding-, venting- and standby-stages. At the charging-stage, the compressed air from the relay valve enters the brake cylinders quickly, and the pressure values rise from zero almost simultaneously. At the holding-stage, the pressure values are maintained near the target pressure value, which is determined by the brake level.

At the venting-stage, the compressed air is gradually discharged through the discharge valve, and the pressure values gradually reduce accordingly. At last, the pressure values reduce to zero, and the braking process enters the standby-stage.

Figure 3: The dynamic curve of brake pressure in a typical braking process.

The overlap of normal and faulty samples often occurs during the practical braking process. For example, for a component fault in a brake cylinder that degrades the performance of the brake cylinder in the charging- and venting-stages, the faulty samples are almost completely overlapped with normal samples in the holding-stage. Hence the main aim of this paper is to propose an effective method to detect incipient faults in the EMU braking system, even when normal and faulty samples overlap.

3. Fault Detection Method

3.1. Offline Modeling Based on Data Domain Description

It is known that domain description (also called one-class classification) concerns the spatial characterization of a data set. A qualified description uses a compact boundary to cover all target data while excluding superfluous space. In the literature, several domain-based approaches have been proposed to find such a compact boundary[26, 27, 28]. If faulty samples are included in the historical data set, NSVDD can also be used to find a spherically shaped boundary that encloses the normal data set and excludes the fault data set[26].

Given a set of historical data X = (x1; x2; ...; xn) and Y = (y1; y2; ...; ym), where {xi ∈ R^d, i = 1, · · · , n} and {yj ∈ R^d, j = 1, · · · , m} denote normal and faulty samples respectively, NSVDD solves the following optimization problem[26]:

$$\begin{aligned}
\min_{R,a,\xi,\theta}\ & R^2 + C\sum_{i=1}^{n}\xi_i + D\sum_{j=1}^{m}\theta_j \\
\text{s.t. } & \|\varphi(x_i)-a\|^2 \le R^2+\xi_i,\ \xi_i \ge 0,\ i=1,\cdots,n \\
& \|\varphi(y_j)-a\|^2 \ge R^2-\theta_j,\ \theta_j \ge 0,\ j=1,\cdots,m
\end{aligned} \tag{1}$$

where ϕ is a function mapping the original data to a higher-dimensional space in order to obtain a flexible boundary[26]. ξ = (ξ1, · · · , ξn)^T and θ = (θ1, · · · , θm)^T are slack vectors for normal and faulty samples respectively, representing how far each sample is allowed to cross the boundary. The penalty coefficients C > 0 and D > 0 are selected as a tradeoff between the sphere volume and the errors. From the above optimization, a hyperspherical model is characterized by the center a and the radius R. Following[26], xj is detected as an outlier if

$$\|\varphi(x_j)-a\|^2 - R^2 > 0 \tag{2}$$

However, the penalty coefficients C and D in problem (1) are constant. In fact, different samples have different importance for the data domain description. To illustrate this intuitively, the offline modeling of NSVDD is sketched in Fig. 4(a). The black dots represent normal samples, the red dots represent faulty samples, the circle stands for the optimal boundary of the optimization problem (1), and the samples in the triangle box are those that need to be penalized. Consider two normal samples A and B: in offline modeling there are more normal samples close to A, so it is reasonable to expect that more normal samples will also surround A during online detection. If A is placed outside the boundary of the normal data domain, the loss of detection performance will therefore be greater. To achieve a better performance, the penalty coefficient corresponding to A should be larger than that of B. Comparing the offline modeling illustrations of Fig. 4(a) and Fig. 4(b), the boundary in Fig. 4(b) can achieve a better performance in online detection because it contains more normal data and excludes more faulty data in the high-density regions.

Figure 4: The illustration of offline modeling in two dimensions. (a) NSVDD; (b) our method.

Summing up the above analysis, the accuracy of the data domain description can be improved by using different penalty coefficients for samples with different densities. Based on the optimization problem (1), the new optimization problem with variable penalty coefficients is written as

$$\begin{aligned}
\min_{R,a,\xi,\theta}\ & R^2 + \sum_{i=1}^{n} c_i\xi_i + \sum_{j=1}^{m} d_j\theta_j = R^2 + c\cdot\xi + d\cdot\theta \\
\text{s.t. } & \|\varphi(x_i)-a\|^2 \le R^2+\xi_i,\ \xi_i \ge 0,\ i=1,\cdots,n \\
& \|\varphi(y_j)-a\|^2 \ge R^2-\theta_j,\ \theta_j \ge 0,\ j=1,\cdots,m
\end{aligned} \tag{3}$$

where c = (c1, · · · , cn)^T and d = (d1, · · · , dm)^T are penalty vectors for normal and faulty samples, respectively. ci and dj are determined from the LRD of the corresponding samples xi and yj, as shown in the following subsection. The Lagrange dual of problem (3) can be described as follows[29]:

$$\max_{s,t,\alpha,\beta}\ \Big\{ \inf_{R,a,\xi,\theta} L(R,a,\xi,\theta,s,t,\alpha,\beta) \Big\},\quad s \ge 0,\ t \ge 0,\ \alpha \ge 0,\ \beta \ge 0 \tag{4}$$

where s = (s1, · · · , sn)^T, t = (t1, · · · , tm)^T, α = (α1, · · · , αn)^T and β = (β1, · · · , βm)^T are Lagrange multipliers, and L is the Lagrangian, given by

$$L = R^2 + c\cdot\xi + d\cdot\theta - s\cdot\xi - t\cdot\theta - \sum_{i=1}^{n}\alpha_i\big(R^2+\xi_i-\|\varphi(x_i)-a\|^2\big) + \sum_{j=1}^{m}\beta_j\big(R^2-\theta_j-\|\varphi(y_j)-a\|^2\big) \tag{5}$$

Setting the partial derivatives of L with respect to R, a, ξ and θ to zero respectively, we get

$$\frac{\partial L}{\partial R}=0 \Rightarrow \sum_{i=1}^{n}\alpha_i - \sum_{j=1}^{m}\beta_j = 1 \tag{6}$$

$$\frac{\partial L}{\partial a}=0 \Rightarrow a = \sum_{i=1}^{n}\alpha_i\varphi(x_i) - \sum_{j=1}^{m}\beta_j\varphi(y_j) \tag{7}$$

$$\frac{\partial L}{\partial \xi_i}=0 \Rightarrow \alpha_i + s_i = c_i \tag{8}$$

$$\frac{\partial L}{\partial \theta_j}=0 \Rightarrow \beta_j + t_j = d_j \tag{9}$$

Substituting Eqs. (6)-(9) into Eq. (5), it can be learned that

$$\begin{aligned}
L ={}& \sum_{i=1}^{n}\alpha_i\big(\varphi(x_i)\cdot\varphi(x_i)\big) - \sum_{j=1}^{m}\beta_j\big(\varphi(y_j)\cdot\varphi(y_j)\big) - \sum_{i=1}^{n}\sum_{k=1}^{n}\alpha_i\alpha_k\big(\varphi(x_i)\cdot\varphi(x_k)\big) \\
& - \sum_{j=1}^{m}\sum_{l=1}^{m}\beta_j\beta_l\big(\varphi(y_j)\cdot\varphi(y_l)\big) + 2\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_i\beta_j\big(\varphi(x_i)\cdot\varphi(y_j)\big)
\end{aligned} \tag{10}$$

Here, the Gaussian kernel is utilized to avoid computing the inner product in the high-dimensional space[30], which satisfies

$$K(x_i,x_j) = \varphi(x_i)\cdot\varphi(x_j) = \exp\!\big(-\|x_i-x_j\|^2/s^2\big) \tag{11}$$

Substituting Eq. (11) and Eq. (6) into Eq. (10) leads to

$$\begin{aligned}
L ={}& \sum_{i=1}^{n}\alpha_i\times 1 - \sum_{j=1}^{m}\beta_j\times 1 - \sum_{i=1}^{n}\sum_{k=1}^{n}\alpha_i\alpha_k K(x_i,x_k) - \sum_{j=1}^{m}\sum_{l=1}^{m}\beta_j\beta_l K(y_j,y_l) + 2\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_i\beta_j K(x_i,y_j) \\
={}& 1 - \sum_{i=1}^{n}\sum_{k=1}^{n}\alpha_i\alpha_k K(x_i,x_k) - \sum_{j=1}^{m}\sum_{l=1}^{m}\beta_j\beta_l K(y_j,y_l) + 2\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_i\beta_j K(x_i,y_j)
\end{aligned} \tag{12}$$

From Eqs. (8) and (9), considering si ≥ 0 and tj ≥ 0, it can be obtained that

$$0 \le \alpha_i \le c_i \tag{13}$$

$$0 \le \beta_j \le d_j \tag{14}$$

According to Eqs. (12)-(14) and Eq. (6), the optimization problem (4) can be rewritten as

$$\begin{aligned}
\min_{\alpha,\beta}\ & \Big( \sum_{i=1}^{n}\sum_{k=1}^{n}\alpha_i\alpha_k K(x_i,x_k) + \sum_{j=1}^{m}\sum_{l=1}^{m}\beta_j\beta_l K(y_j,y_l) - 2\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_i\beta_j K(x_i,y_j) \Big) \\
\text{s.t. } & 0 \le \alpha_i \le c_i,\ i=1,\cdots,n \\
& 0 \le \beta_j \le d_j,\ j=1,\cdots,m \\
& \sum_{i=1}^{n}\alpha_i - \sum_{j=1}^{m}\beta_j = 1
\end{aligned} \tag{15}$$

Let w = [α; β]. Then problem (15) can be treated as a standard quadratic programming (QP) problem with linear constraints, represented by

$$\begin{aligned}
\min_{w}\ & w^{T}Hw \\
\text{s.t. } & 0 \le w \le [c;d],\quad A\cdot w = 1
\end{aligned} \tag{16}$$

in which

$$A = [\underbrace{1;1;\cdots;1}_{n};\underbrace{-1;-1;\cdots;-1}_{m}]_{(n+m)\times 1} \tag{17}$$

$$H = \begin{bmatrix} K(X,X) & -K(X,Y) \\ -K(Y,X) & K(Y,Y) \end{bmatrix}_{(n+m)\times(n+m)} \tag{18}$$

with K(X, X) being the n × n kernel matrix over normal samples, K(Y, Y) the m × m kernel matrix over faulty samples, and K(X, Y) the kernel matrix between normal and faulty samples, i.e.,

$$K(X,Y) = \begin{bmatrix} K(x_1,y_1) & K(x_1,y_2) & \cdots & K(x_1,y_m) \\ K(x_2,y_1) & K(x_2,y_2) & \cdots & K(x_2,y_m) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_n,y_1) & K(x_n,y_2) & \cdots & K(x_n,y_m) \end{bmatrix}_{n\times m} \tag{19}$$
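To make the structure of problem (16) concrete, a minimal sketch is given below. It assembles A and H from Eqs. (17)-(19) using the Gaussian kernel of Eq. (11) and hands the QP to a general-purpose SciPy solver. The function names are illustrative assumptions and the solver choice is made only for readability; the authors' own implementation (e.g., an SMO-type solver) is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, s):
    """Gaussian kernel of Eq. (11): K(a, b) = exp(-||a - b||^2 / s^2)."""
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-d2 / s**2)

def build_qp(X, Y, s):
    """Assemble A of Eq. (17) and H of Eqs. (18)-(19) for problem (16)."""
    Kxy = gaussian_kernel(X, Y, s)
    H = np.block([[gaussian_kernel(X, X, s), -Kxy],
                  [-Kxy.T, gaussian_kernel(Y, Y, s)]])
    A = np.concatenate([np.ones(len(X)), -np.ones(len(Y))])
    return H, A

def solve_qp(H, A, c, d):
    """Minimise w^T H w subject to 0 <= w <= [c; d] and A.w = 1 (Eq. 16).

    A generic NLP solver is used purely for illustration; for realistic
    sample sizes a dedicated QP/SMO solver would be preferable.
    """
    upper = np.concatenate([c, d])
    w0 = np.minimum(upper, 1.0 / H.shape[0])          # crude starting point
    res = minimize(lambda w: w @ H @ w, w0,
                   jac=lambda w: 2.0 * H @ w,
                   bounds=[(0.0, ub) for ub in upper],
                   constraints=[{'type': 'eq',
                                 'fun': lambda w: A @ w - 1.0,
                                 'jac': lambda w: A}],
                   method='SLSQP')
    return res.x                                      # w = [alpha; beta]
```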

Problem (16) can be solved by the ellipsoid method, sequential minimal optimization (SMO), or other methods optimized for SVMs[31]. For simplicity, the optimal solutions of problem (16) are still denoted by α and β. According to the Karush-Kuhn-Tucker (KKT) conditions, primal and dual optimal solutions must satisfy the following complementary slackness conditions[29]:

$$s\cdot\xi = 0,\quad t\cdot\theta = 0,\quad \alpha_i\big(R^2+\xi_i-\|\varphi(x_i)-a\|^2\big)=0,\quad \beta_j\big(R^2-\theta_j-\|\varphi(y_j)-a\|^2\big)=0 \tag{20}$$

From Eqs. (8), (9) and (20), the following relationships can be obtained:

$$\begin{aligned}
&\text{if } \alpha_i = 0 \Rightarrow \|\varphi(x_i)-a\|^2 \le R^2 \\
&\text{if } 0 < \alpha_i < c_i \Rightarrow \|\varphi(x_i)-a\|^2 = R^2 \\
&\text{if } \alpha_i = c_i \Rightarrow \|\varphi(x_i)-a\|^2 \ge R^2 \\
&\text{if } \beta_j = 0 \Rightarrow \|\varphi(y_j)-a\|^2 \ge R^2 \\
&\text{if } 0 < \beta_j < d_j \Rightarrow \|\varphi(y_j)-a\|^2 = R^2 \\
&\text{if } \beta_j = d_j \Rightarrow \|\varphi(y_j)-a\|^2 \le R^2
\end{aligned} \tag{21}$$

That is to say, when 0 < αi < ci or 0 < βj < dj, the corresponding sample xi or yj is located on the boundary of the hypersphere. All these samples are called support vectors, in analogy with support vector classifiers[26, 32]. For a new sample z, if ϕ(z) is located inside the hypersphere in the high-dimensional space, it is considered normal; otherwise, it is faulty. Similar to Eq. (2), define

$$\begin{aligned}
T(z) &= \|\varphi(z)-a\|^2 - R^2 \\
&= \|\varphi(z)-a\|^2 - \|\varphi(x_{sv})-a\|^2 \\
&= 2\sum_{i=1}^{n}\alpha_i K(x_{sv},x_i) - 2\sum_{j=1}^{m}\beta_j K(x_{sv},y_j) - 2\sum_{i=1}^{n}\alpha_i K(z,x_i) + 2\sum_{j=1}^{m}\beta_j K(z,y_j)
\end{aligned} \tag{22}$$

where x_sv is any one of the support vectors. If T(z) is zero, ϕ(z) lies on the boundary of the hypersphere in the high-dimensional space; a negative or positive T(z) means that ϕ(z) is inside or outside the hypersphere, respectively. Based on this, we take T(z) as the detection statistic, and the control limit is 0. When T(z) ≤ 0, z is regarded as a normal sample; otherwise, z is a faulty sample.

3.2. LRD Based Penalty Weight

Note that some popular outlier-detection methods, such as the local outlier factor (LOF) and the local correlation integral (LOCI), are based on local densities[33, 34]. They regard a sample as an outlier when its surrounding space contains relatively few samples, i.e., when the local density of this sample is relatively low. For the hypersphere obtained from Eq. (3), it is desirable to maximize the number of normal samples contained in the hypersphere and to minimize the number of faulty samples in the hypersphere. Inspired by LOF and LOCI, a local density based penalty weight is proposed as follows. For samples with higher local densities in the normal data set, bigger penalty weights are assigned so that they are more likely to be enclosed in the hypersphere. Correspondingly, for samples with higher local densities in the fault data set, bigger penalty weights are assigned so that they are more likely to be excluded from the hypersphere. It should be noted that the local densities of normal and faulty samples are calculated in the normal and fault data sets, respectively.

In this paper, we use the local reachability density (LRD)[33] as the local density estimate. Denote the set of k nearest neighbors of x as Nk(x), the k-th nearest neighbor of x as x^(k), and the distance of x to x^(k) as Dk(x). The reachability distance[33] from x to y is defined by

$$RD_k(x,y) = \max\big\{\,\|x - x^{(k)}\|,\ \|x-y\|\,\big\} \tag{23}$$

Note that the reachability distance from x to y is the true distance between the two objects, but at least Dk(x). The local reachability density[33] of x is represented by

$$lrd_k(x) = \frac{|N_k(x)|}{\sum_{y\in N_k(x)} RD_k(y,x)} \tag{24}$$

which is the inverse of the average reachability distance of x from its neighbors. If the sample density around x is high, the value of lrd_k(x) is large; this value can become infinite for duplicate points. In this paper, the LRD is computed with the same Gaussian kernel as defined by Eq. (11). Then we get

$$\|\varphi(x)-\varphi(y)\|^2 = K(x,x)+K(y,y)-2K(x,y) = 2 - 2\exp\!\big(-\|x-y\|^2/s^2\big) \tag{25}$$

As shown in Eq. (25), the distance between ϕ(x) and ϕ(y) in the high-dimensional space is positively correlated with the distance between x and y in the original space. That is to say, the ranking of distances from other points to x in the original space is identical to that in the kernel space. It follows that

$$y \in N_k(x) \Rightarrow \varphi(y) \in N_k(\varphi(x)) \tag{26}$$

Therefore, the LRD of ϕ(x) in the Gaussian kernel space can be obtained by

$$lrd_k(\varphi(x)) = \frac{|N_k(x)|}{\sum_{y\in N_k(x)} RD_k(\varphi(y),\varphi(x))} \tag{27}$$

According to the above analysis, the Gaussian kernel does not affect the LRD ordering of the data in the original space, so the hypersphere obtained in the kernel space can truly describe the data in the original space. Note that the LRDs of the normal and fault data sets are computed separately. After calculating lrd_k(ϕ(xi)) of each normal sample xi and lrd_k(ϕ(yj)) of each faulty sample yj, the penalty coefficients ci of xi and dj of yj can be determined. The values of ci and dj are scaled to the ranges [cmin, cmax] and [dmin, dmax] respectively as

$$c_i = c_{\min} + (c_{\max}-c_{\min})\,\frac{lrd_k(\varphi(x_i)) - d_x^{\min}}{d_x^{\max} - d_x^{\min}} \tag{28}$$

$$d_j = d_{\min} + (d_{\max}-d_{\min})\,\frac{lrd_k(\varphi(y_j)) - d_y^{\min}}{d_y^{\max} - d_y^{\min}} \tag{29}$$

where d_x^max and d_x^min respectively denote the maximum and minimum local reachability densities of all the normal samples in the high-dimensional space, and d_y^max and d_y^min respectively denote the maximum and minimum local reachability densities of all the faulty samples in the high-dimensional space. cmin, cmax, dmin and dmax can be selected according to the requirements of fault detection and the effects of offline modeling; for example, a larger cmax results in a lower FAR, while a larger dmax results in a higher FDR. In this paper, the parameters required by the LRD weighted NSVDD are set according to Table 1.
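A direct, brute-force sketch of Eqs. (23)-(29) is given below: it computes the LRD of every sample of a data set in the Gaussian-kernel feature space using the distance of Eq. (25), and then rescales the values linearly to the prescribed penalty range. The helper names and the O(N²) neighbour search are assumptions made for brevity, not the authors' implementation.

```python
import numpy as np

def lrd_kernel_space(S, k, s):
    """LRD of each sample of S in the Gaussian kernel space (Eqs. 23-27).

    Pairwise distances follow Eq. (25):
    ||phi(x)-phi(y)||^2 = 2 - 2*exp(-||x-y||^2 / s^2).
    """
    d2 = (np.sum(S**2, axis=1)[:, None] + np.sum(S**2, axis=1)[None, :]
          - 2.0 * S @ S.T)
    D = np.sqrt(np.maximum(2.0 - 2.0 * np.exp(-d2 / s**2), 0.0))
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]        # indices of the k nearest neighbours
    kth = np.sort(D, axis=1)[:, k - 1]       # D_k(x): distance to the k-th neighbour
    lrd = np.empty(len(S))
    for i in range(len(S)):
        # reachability distance from each neighbour y to x_i: max(D_k(y), ||y - x_i||)
        reach = np.maximum(kth[nn[i]], D[i, nn[i]])
        lrd[i] = k / np.sum(reach)           # Eq. (24)/(27)
    return lrd

def penalty_weights(lrd, w_min, w_max):
    """Scale LRD values linearly into [w_min, w_max], as in Eqs. (28)-(29)."""
    lo, hi = lrd.min(), lrd.max()
    return w_min + (w_max - w_min) * (lrd - lo) / (hi - lo + 1e-12)
```

The normal and fault data sets would each be passed through lrd_kernel_space separately, consistent with the remark above that the two LRDs are computed independently.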

3.3. VCL Strategy for Online Detection

For practical data sets of the EMU braking system, normal and faulty samples overlap seriously in some cases, and the soft boundary of the hypersphere passes through the overlapping area. If the proposed method is used directly to detect new samples, a high FAR for normal samples and a low FDR for faulty samples may result. In this subsection, we propose a novel strategy of variable control limit (VCL) to resolve this dilemma.

Here, a numerical example is given to illustrate the overlapping problem intuitively. The simulation data are generated from the following two-dimensional linear system:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} + e \tag{30}$$

where t1 and t2 are Gaussian distributed random variables, and each component of e is zero-mean white noise with standard deviation 0.01. Two data sources are simulated to represent different operating modes, given by

Mode 0 (Normal Mode): t1 ∼ N(0, 1.5), t2 ∼ N(0, 1.5)
Mode 1 (Faulty Mode): t1 ∼ N(−3, 1), t2 ∼ N(0, 1)

where the means and standard deviations of t1 and t2 are different to reflect the shift of modes. For offline modeling, 3000 normal samples and 500 faulty samples are generated from Mode 0 and Mode 1, respectively. As shown in Fig. 5(a), normal and faulty samples overlap seriously.

When the LRD weighted NSVDD is applied for offline modeling, the parameters are set as k = 3, cmin = e−3, cmax = 0.8, dmin = e−3, dmax = 0.8, s = 5; the meaning of each parameter can be found in Table 1. Three contours are drawn according to the value of T(z), calculated by Eq. (22). Each contour corresponds to a specific hypersphere in the kernel space; for convenience, hyperspheres are named after the corresponding contours, e.g., the 0-sphere corresponds to the 0-contour. As shown in Fig. 5(a), the 0-contour passes through more of the overlapping area than the other two contours. Correspondingly, the boundary of the 0-sphere also passes through more of the overlapping area in the kernel space than the other two hyperspheres. Hence, choosing the 0-sphere to test new data leads to a high FAR for normal samples and a low FDR for faulty samples.
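For readers who wish to reproduce this toy example, the two operating modes of Eq. (30) can be simulated roughly as follows; the function name, random seeds, and sampling details are illustrative choices, not part of the original study.

```python
import numpy as np

def simulate_mode(n, mean_t1, std_t1, mean_t2, std_t2, noise_std=0.01, seed=None):
    """Generate n samples x = [t1, t2] + e according to Eq. (30)."""
    rng = np.random.default_rng(seed)
    t1 = rng.normal(mean_t1, std_t1, n)
    t2 = rng.normal(mean_t2, std_t2, n)
    e = rng.normal(0.0, noise_std, (n, 2))   # zero-mean white noise, std 0.01
    return np.column_stack([t1, t2]) + e

# Offline modeling data: 3000 normal samples (Mode 0) and 500 faulty samples (Mode 1)
X_normal = simulate_mode(3000, 0.0, 1.5, 0.0, 1.5, seed=0)
Y_faulty = simulate_mode(500, -3.0, 1.0, 0.0, 1.0, seed=1)
```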

Figure 5: Offline modeling. (a) Sample overlap of known Modes 0 and 1, and boundaries based on data description; (b) Relationship between boundaries and unknown faulty Modes 2 and 3.

Now we choose various hyperspheres for testing new data. First, we use the ρ-sphere corresponding to a small positive number ρ = 5e−4. As shown in Fig. 5, most of the overlapped data are included in the corresponding contour. The ρ-sphere completely encloses the 0-sphere in the kernel space, and thus contains more overlapped data than the 0-sphere. A lower FAR may be achieved by applying the ρ-sphere, but the FDR may be reduced simultaneously. Next, we test the σ-sphere corresponding to a small negative number σ = −5e−4. As shown in Fig. 5, most of the overlapped data are not included in the corresponding contour. In fact, the σ-sphere is smaller than the 0-sphere in the kernel space, and thus does not contain most of the overlapped data. Hence a higher FDR may be achieved by applying the σ-sphere, but the FAR may increase simultaneously. It should be noted that a lower FAR and a higher FDR are usually contradictory goals in fault detection, especially when samples overlap.

As shown in Fig. 5, a small change of the control limit of the LRD weighted NSVDD obviously affects the sample overlapping area, but has little influence on other areas. This inspires the variable control limit (VCL) strategy proposed as follows. The first step is to roughly predict the operating mode of the braking system: normal or abnormal. If the system is predicted to work in normal mode, the focus of detection should be on reducing interruption to smooth operations, so the criterion for fault detection should be relaxed appropriately to reduce the FAR. Otherwise, if the system is predicted to work in abnormal mode, the focus of detection shifts to improving the FDR, and the criterion for fault detection should be tightened accordingly. To satisfy these needs, the control limit ρ of T(z), corresponding to a bigger hypersphere in the kernel space, is used when the system is predicted to be in normal mode, and the control limit σ, corresponding to a smaller hypersphere in the kernel space, is used when the system is predicted to be in abnormal mode.

Specifically, we use ρ as the initial control limit. For a new test sample z_l, denote the average of the detection statistics by

$$\bar{T}(z_l) = \frac{1}{w}\sum_{k=l-w+1}^{l} T(z_k) \tag{31}$$

where w is a user-specified parameter representing the width of the sliding window. For a positive ε0 (ε0 < ρ), if T̄(z_l) > ε0, the system is predicted to work in abnormal mode, and the control limit is switched to σ to implement more rigorous detection. For a negative ε1 (σ < ε1), if T̄(z_l) < ε1, the system is predicted to work in normal mode, and the control limit is switched to ρ to reduce false alarms. The control limit does not change when ε1 ≤ T̄(z_l) ≤ ε0.

Generally speaking, superior detection performance with strong robustness can usually be obtained by adjusting the parameters of the VCL strategy. Among these parameters, ρ and σ can be chosen according to the requirements of fault detection and the effects of offline modeling. Increasing the absolute values of ρ and σ may lead to a better performance; however, they should not be so large that the detection performance for unknown faulty modes is degraded. In other words, when selecting ρ and σ, the tradeoff between detection performance and robustness should be considered. Similarly, when selecting ε0 and ε1, the tradeoff between the accuracy of the operating-mode prediction and the flexibility of control-limit switching should be considered. When selecting w, although a larger w makes the prediction of the operating mode more stable, it leads to a larger switching delay of the control limit and a lower correlation between the predicted and real modes. The empirical settings for these parameters are shown in Table 1.

Table 1: Parameters required by our proposed method

LRD weighted NSVDD (offline modeling)
  Parameter   Meaning                                               Range
  s           The bandwidth of the Gaussian kernel                  2 – 10
  k           The number of nearest neighbors used for LRD          2 – 50
  cmin        Minimum penalty coefficient of normal samples         e−3 – e−4
  cmax        Maximum penalty coefficient of normal samples         0.1 – 10
  dmin        Minimum penalty coefficient of faulty samples         e−3 – e−4
  dmax        Maximum penalty coefficient of faulty samples         0.1 – 10

VCL strategy (online detection)
  ρ           Upper control limit                                   e−3 – e−4
  σ           Lower control limit                                   (−e−4) – (−e−3)
  ε0          Upper threshold used for control limit switching      0.1ρ – 0.9ρ
  ε1          Lower threshold used for control limit switching      0.9σ – 0.1σ
  w           The number of samples used for mode prediction        2 – 50
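One possible realization of this switching rule is sketched below, combining the sliding-window mean of Eq. (31) with the thresholds ε0 and ε1 described above. The class name and interface are illustrative only.

```python
from collections import deque

class VariableControlLimit:
    """Sliding-window control-limit switching (Eq. 31 and the rule above).

    rho   > 0 : relaxed limit used while the system is predicted to be normal
    sigma < 0 : tightened limit used while the system is predicted to be abnormal
    eps0, eps1: switching thresholds with sigma < eps1 < 0 < eps0 < rho
    w         : width of the sliding window
    """

    def __init__(self, rho, sigma, eps0, eps1, w):
        self.rho, self.sigma, self.eps0, self.eps1 = rho, sigma, eps0, eps1
        self.window = deque(maxlen=w)
        self.limit = rho                      # start with the relaxed limit

    def update(self, T_z):
        """Feed one detection statistic T(z_l); return (is_fault, current limit)."""
        self.window.append(T_z)
        T_bar = sum(self.window) / len(self.window)   # Eq. (31)
        if T_bar > self.eps0:                 # predicted abnormal -> tighten
            self.limit = self.sigma
        elif T_bar < self.eps1:               # predicted normal -> relax
            self.limit = self.rho
        return T_z > self.limit, self.limit
```

With the settings used later in this paper, this would be instantiated as, for example, VariableControlLimit(rho=5e-4, sigma=-5e-4, eps0=2.5e-4, eps1=-2.5e-4, w=5), i.e., ε0 = 0.5ρ and ε1 = 0.5σ.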

For online detection, 3,000 test samples are generated according to Eq. (30). Samples No. 601 to No. 2400 come from Mode 1, and the others come from Mode 0. The parameters required by the VCL strategy are chosen as ρ = 5e−4, σ = −5e−4, ε0 = 0.5ρ, ε1 = 0.5σ; the meaning of each parameter can be found in Table 1. Note that w is not fixed here. Table 2 shows the fault detection results of Mode 1 with and without the VCL strategy. As seen from Table 2, the detection performance is improved evidently when the VCL strategy is applied. Table 2 also shows how the detection performance is influenced by the choice of w: when w does not exceed 10, the performance improves as w increases, but it degrades when w = 30. In addition, the fault detection results of Mode 1 are shown in Fig. 6.

Figure 6: Detection results of Mode 1. (a) Without VCL (FAR: 12.58%, FDR: 73.67%); (b) With VCL: w = 5 (FAR: 3.42%, FDR: 94.22%).

Table 2: Fault detection results of Mode 1 with and without VCL

                Parameter   FAR       FDR
  without VCL   –           11.28%    73.67%
  with VCL      w = 2       5.75%     77.89%
                w = 5       3.42%     94.22%
                w = 10      3.33%     94.83%
                w = 30      5.07%     93.34%

With the VCL strategy, the characteristics of the data description can be effectively used to improve the detection performance for known faults in the case of sample overlap. Moreover, the detection performance for unknown faults, which have not occurred in history, is not reduced. Taking the above simulation case as an example, consider two kinds of unknown faults given by

Mode 2 (Faulty Mode): t1 ∼ N(5, 1), t2 ∼ N(0, 1)
Mode 3 (Faulty Mode): t1 ∼ N(−3, 1), t2 ∼ N(1, 1)

The relationship between the boundaries used by VCL and the unknown faulty modes is shown in Fig. 5(b). Obviously, the samples from Mode 3 are close to the overlapping area of Mode 0 and Mode 1, while the samples from Mode 2 are far away from the overlapping area. It can be inferred that the detection of Mode 3 should be significantly affected by VCL, while the detection of Mode 2 should hardly be influenced by it. To verify these inferences, the online detection of Mode 2 and Mode 3 is carried out respectively. Here, 3,000 samples are generated according to Eq. (30), and samples No. 601 to No. 2400 come from the faulty mode. The parameters for VCL remain unchanged. The detection results for these two kinds of unknown faults are shown in Fig. 7, and the comparison of our method with and without the VCL strategy is briefly listed in Table 3. As seen from Table 3, the FDR of Mode 3 is increased evidently, the FDR of Mode 2 changes only slightly, and the FARs of both Mode 2 and Mode 3 are reduced obviously when using VCL.

Figure 7: Detection results of Mode 2 and Mode 3. (a) Mode 2 without VCL (FAR: 9.75%, FDR: 38.22%); (b) Mode 2 with VCL: w = 5 (FAR: 2.67%, FDR: 38.00%); (c) Mode 3 without VCL (FAR: 10.00%, FDR: 71.44%); (d) Mode 3 with VCL: w = 5 (FAR: 3.25%, FDR: 90.50%).


Table 3: Fault detection results of Mode 2 and Mode 3 with and without VCL

                Parameter   Mode 2 FAR   Mode 2 FDR   Mode 3 FAR   Mode 3 FDR
  without VCL   –           10.58%       75.89%       10.02%       71.44%
  with VCL      w = 2       3.83%        76.44%       3.67%        83.44%
                w = 5       3.67%        77.28%       3.25%        90.50%
                w = 10      3.67%        77.39%       3.33%        91.39%

Summing up the above analysis, the offline modeling and online detection procedures of our method are summarized in Algorithm 1.

Algorithm 1: Fault detection method based on data domain description and VCL

Require (offline modeling): normal data set X = [x1, x2, · · · , xn]^T, fault data set Y = [y1, y2, · · · , ym]^T, parameters s, k, cmin, cmax, dmin, dmax.
Ensure: hypersphere enclosing most of the normal data.
1: Calculate lrd_k(ϕ(xi)) of each normal sample xi and lrd_k(ϕ(yj)) of each faulty sample yj as in Eq. (27).
2: Calculate the penalty coefficient ci of ϕ(xi) and dj of ϕ(yj) as in Eq. (28) and Eq. (29), respectively.
3: Solve the QP problem (16) by SMO.
4: Obtain the hypersphere with the sets of αi and βj.

Require (online detection): new data Z = (z1; z2; ...; zq), parameters ρ, σ, ε0, ε1, w.
Ensure: determine whether each sample zl ∈ Z is faulty.
5: Select one xi corresponding to 0 < αi < ci as x_sv, and calculate T(zl) as in Eq. (22).
6: Calculate T̄(zl) as in Eq. (31), and determine the current control limit θ using the VCL strategy, where θ = ρ or θ = σ.
7: If T(zl) ≤ θ, zl is normal; otherwise, zl is a faulty sample.

4. Experimental Study

In this section, experimental studies are carried out on the brake test platform shown in Fig. 2. Data pre-processing is carried out first to map the data of the whole braking process into a small domain. For a vector x = [x1, x2, ..., xm]^T ∈ R^m and a matrix T ∈ R^{m×m}, the following linear mapping is applied:

$$\Phi(x) = Tx \tag{32}$$

where the element in the i-th row and j-th column of T is

$$t_{ij} = \begin{cases} 1, & i = j \\ -1/(m-1), & i \ne j \end{cases} \tag{33}$$
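A small sketch of this pre-processing step might look as follows. Note that the off-diagonal sign in Eq. (33) is taken here as −1/(m−1), an assumption consistent with the statement below that the components of Φ(x) fluctuate around zero when the cylinder pressures are consistent; the function name and the example values are illustrative.

```python
import numpy as np

def preprocess(x):
    """Linear mapping of Eqs. (32)-(33): Phi(x) = T x.

    With the assumed off-diagonal sign, each component of Phi(x) equals the
    corresponding measurement minus the mean of the remaining ones, so
    consistent brake-cylinder pressures are mapped close to zero.
    """
    x = np.asarray(x, dtype=float)
    m = x.size
    T = np.full((m, m), -1.0 / (m - 1))
    np.fill_diagonal(T, 1.0)
    return T @ x

# Example: four nearly consistent cylinder pressures map close to the origin
print(preprocess([248.0, 250.0, 251.0, 249.0]))
```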

Denote the pressure vector of the brake cylinders as x = [P1, P2, P3, P4]^T and apply the linear mapping (32) to x. Given the consistency among the pressure data of different brake cylinders, it is easy to see that the components of Φ(x) fluctuate around zero if the brake cylinder system is normal, and all the data in the multiple stages are mapped into a small domain. The brake cylinder pressure data after pre-processing are shown in Fig. 8, in which normal and faulty samples of the brake cylinder system overlap very seriously. Based on large amounts of historical data, the LRD weighted NSVDD is used for offline modeling after pre-processing. The parameters are set as s = 3, k = 5, cmin = 0.001, cmax = 0.2, dmin = 0.001, and dmax = 2. Three kinds of incipient faults of the brake cylinder system are studied for online detection, and the parameters of the VCL strategy are set as ρ = 5e−4, σ = −5e−4, ε0 = 0.5ρ, ε1 = 0.5σ, and w = 5. Note that these faults cannot be detected by the KNORR-based strategy that is currently embedded in the brake test platform.

Figure 8: Brake cylinder pressure data after pre-processing (illustrated by two dimensions of Φ(x)).

4.1. Online Detection of Sensor Deviation Fault

Fig. 9(a) shows the evolution of the brake cylinder pressures in two consecutive braking processes. A sensor deviation fault is applied to the first sensor starting from 22.3 s in the standby-stage, marked as tf. The amplitude of this fault is chosen as 2.0 kPa, which is too small to be detected by the KNORR logic, represented by the upper control limit UCL and the lower control limit LCL. Fig. 9(b) shows the control limit and the statistic T(z) obtained by the LRD weighted NSVDD. For the samples before tf, there is no fault; in this case, the statistic T(z) does not exceed zero most of the time, and the control limit is always ρ > 0. From tf, the sensor deviation fault begins to influence P1 and makes the statistic T(z) rise rapidly. Meanwhile, T̄(z) exceeds the threshold ε0, and the control limit is switched to σ < 0 according to the VCL strategy.


Figure 9: Detection results of sensor deviation fault (fault amplitude: 2.0kPa). (a) KNORR-based strategy (Invalid); (b) Our strategy (FAR: 0%, FDR: 100%).

It is noted that the fault direction and amplitude have a heavy influence on the detection performance. Table 4 shows the detection results for several sensor deviation faults with different amplitudes. It can be seen that our method achieves superior performance for incipient sensor deviation faults.

Table 4: Fault detection results of different sensor deviation faults

  Fault position   Fault amplitude   FAR    FDR
  sensor 1         -3.0              0%     91.25%
  sensor 1         -4.0              0%     100%
  sensor 3         +3.0              0%     65.63%
  sensor 3         +4.0              0%     99.38%

4.2. Online Detection of Cylinder Component Fault

Brake cylinder component faults are usually caused by long-term running of the brake cylinder, which degrades the brake cylinder's performance in the air charging and discharging processes. Under the influence of a component fault, the air charging and discharging speed of the faulty brake cylinder is slower than that of the others. Fig. 10(a) shows the evolution of the brake cylinder pressures in two consecutive braking processes, where a real component fault on the first brake cylinder occurs from the charging-stage of the second braking process.


Figure 10: Detection of cylinder component fault on the first brake cylinder. (a) Brake cylinder pressures; (b) Detection results (FAR: 0, FDR: 98.90%).

Fig. 10(b) shows the detection results for this kind of fault. For the samples before tf, there is no fault; for these samples, the statistic T(z) does not exceed zero most of the time, and the control limit is always ρ > 0. From tf, the statistic T(z) rises rapidly due to the cylinder component fault. Meanwhile, T̄(z) exceeds the threshold ε0, and the control limit is switched to σ < 0 according to the VCL strategy. In the holding-stage and the latter part of the venting-stage of the second braking process, the faulty samples overlap with normal samples to a considerable extent, and the statistic T(z) is close to 0 most of the time, even below 0 at some points. Nevertheless, the fault can be well detected thanks to the VCL strategy; by contrast, the FDR is only 59.67% if the VCL strategy is not used.

In the presence of sample overlap, this kind of fault is difficult to detect with conventional methods. In this paper, several classical fault detection methods are selected for comparison. Note that PCA, DPCA and one-class SVM only use normal historical data, while C-SVC uses normal and fault data simultaneously[9, 27, 32]. All data are processed by the pre-processing of Eq. (32). The time-lag parameter for DPCA is set to 5, and the confidence level for calculating the control limits of PCA and DPCA is set to 95.0%. The LIBSVM toolbox[35] is used to train the C-SVC and one-class SVM models, with a Gaussian kernel. Table 5 shows the detection results, where the average results of 20 tests with the best combination of parameters are given for C-SVC and one-class SVM. As shown in Table 5, the FDR of our method reaches 98.90% with a FAR of 0%, which demonstrates high performance. PCA, DPCA, C-SVC and one-class SVM cannot achieve such a high FDR, and their FARs are higher. It can be seen that the performance of our method is superior to that of the other methods.


Table 5: Fault detection results of cylinder component fault

  Algorithm         Statistic   FAR       FDR
  PCA               SPE         6.94%     39.78%
  PCA               T²          6.12%     41.44%
  DPCA              SPE         2.5%      42.54%
  DPCA              T²          12.08%    48.62%
  C-SVC             –           4.82%     32.65%
  One-class SVM     –           6.94%     43.09%
  Proposed method   T(z)        0%        98.90%

4.3. Online Detection of Air Leakage Fault

Air leakage faults usually occur at the connection between the pipeline and the brake cylinder. Due to the effect of a leakage fault, the pressure of the brake cylinder is usually lower than its normal value. Fig. 11(a) shows the evolution of the brake cylinder pressures in two consecutive braking processes, where a real air leakage fault on the fourth brake cylinder occurs from the charging-stage of the second braking process.

Figure 11: Detection of air leakage fault on the fourth brake cylinder. (a) Brake cylinder pressures; (b) Detection results (FAR: 0, FDR: 99.08% ).

Fig. 11(b) shows the detection results for this kind of fault. Similar to the brake cylinder component fault, when the braking process starts from the charging-stage, the statistic T(z) becomes larger than the control limit ρ. Meanwhile, T̄(z) exceeds the threshold ε0, and the control limit is switched to σ < 0 according to the VCL strategy. This phenomenon lasts until the early part of the venting-stage due to the leakage fault. In the venting-stage, owing to the deflation of the brake cylinder system, the pressure difference between different brake cylinders becomes smaller, and the statistic T(z) becomes smaller too. Although the faulty samples overlap with normal samples at the end of the venting-stage, they can still be detected thanks to the VCL strategy; the FDR is only 83.03% if the VCL strategy is not used.

The detection results for the air leakage fault are briefly listed in Table 6. The parameters for PCA and DPCA are set as before, and the average results of 20 tests are given for C-SVC and one-class SVM. As shown in Table 6, the FDR of our method reaches 99.08% with a FAR of 0%, which shows high performance, whereas PCA, DPCA, C-SVC and one-class SVM achieve lower performance than the proposed method.

Table 6: Fault detection results of air leakage fault

  Algorithm         Statistic   FAR       FDR
  PCA               SPE         7.36%     6.42%
  PCA               T²          6.49%     80.28%
  DPCA              SPE         2.65%     19.27%
  DPCA              T²          12.83%    84.4%
  C-SVC             –           3.87%     75.69%
  One-class SVM     –           7.36%     79.82%
  Proposed method   T(z)        0%        99.08%

5. Conclusion

A novel method based on data domain description and the VCL strategy has been developed for detecting incipient faults in the EMU braking system. The LRD weighted NSVDD used for offline modeling leads to a more accurate domain description, and the Gaussian kernel trick is utilized to obtain a hypersphere with a soft boundary. Besides, the VCL strategy is used to handle the sample overlapping problem. Experimental studies on the brake test platform of CRRC Qingdao Sifang Rolling Stock Research Institute Co., Ltd. have been carried out to demonstrate the effectiveness of the proposed method. Since only the L1 (hinge) loss is considered in the proposed optimization problem, designing optimization problems with alternative loss functions is worth further study.

Conflict of Interest

None declared.

References

[1] M. Brumsen, “Case description: the ICE train accident near Eschede,” in European Business Ethics Cases in Context. Springer, pp. 157–168, 2011.
[2] T. Song, D. Zhong, and H. Zhong, “A STAMP analysis on the China-Yongwen railway accident,” in International Conference on Computer Safety, Reliability, and Security. Springer, pp. 376–387, 2012.
[3] D. Zhou and Y. Ye, “Modern fault diagnosis and fault-tolerant control,” Tsinghua University Press, 2000.
[4] R. Beard, “Failure accommodation in linear system through self-reconfiguration,” Ph.D. dissertation, 1971.
[5] P. M. Frank and X. Ding, “Survey of robust residual generation and evaluation methods in observer-based fault detection systems,” Journal of Process Control, vol. 7, no. 6, pp. 403–424, 1997.
[6] J. Zhang, P. Christofides, X. He, Z. Wu, Z. Zhang, and D. Zhou, “Event-triggered filtering and intermittent fault detection for time-varying systems with stochastic parameter uncertainty and sensor saturation,” International Journal of Robust and Nonlinear Control, vol. 28, no. 16, pp. 4666–4680, 2018.


[7] Z. Ge, Z. Song, and F. Gao, “Review of recent research on data-based process monitoring,” Industrial & Engineering Chemistry Research, vol. 52, no. 10, pp. 3543–3562, 2013. [8] B. M. Wise, N. Ricker, D. Veltkamp, and B. R. Kowalski, “A theoretical basis for the use of principal component models for monitoring multivariate processes,” Process Control and Quality, vol. 1, no. 1, pp. 41–51, 1990. [9] J. Chen, K. Liu, “On-line batch process monitoring using dynamic PCA and dynamic PLS models,” Chemical Engineering Science, vol. 57(1), pp. 63–75, 2002 [10] U. Kruger and G. Dimitriadis, “Diagnosis of process faults in chemical systems using a local partial least squares approach,” AIChE Journal, vol. 54, no. 10, pp. 2581–2596, 2008. [11] J. Shang, M. Chen, H. Ji, and D. Zhou, “Recursive transformed component statistical analysis for incipient fault detection,” Automatica, vol. 80, pp. 313–327, 2017. [12] J. Shang and M. Chen, “Recursive dynamic transformed component statistical analysis for fault detection in dynamic processes,” IEEE Transactions on Industrial Electronics, vol. 65, no. 1, pp. 578–588, 2018. [13] R. Rubini and U. Meneghetti, “Application of the envelope and wavelet transform analyses for the diagnosis of incipient faults in ball bearings,” Mechanical Systems and Signal Processing, vol. 15, no. 2, pp. 287–302, 2001. [14] W. Yang, R. Court, P. J. Tavner, and C. J. Crabtree, “Bivariate empirical mode decomposition and its contribution to wind turbine condition monitoring,” Journal of Sound and Vibration, vol. 330, no. 15, pp. 3766–3782, 2011. [15] L. Jack and A. Nandi, “Support vector machines for detection and characterization of rolling element bearing faults,” Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 215, no. 9, pp. 1065–1074, 2001. [16] F. Zakaria, D. Johari, and I. Musirin, “Artificial neural network (ann) application in dissolved gas analysis (dga) methods for the detection of incipient faults in oil-filled power transformer,” in IEEE International Conference on Control system, Computing and Engineering (ICCSCE). pp. 328–332, 2012. [17] I. Aydin, M. Karakose, and E. Akin, “A robust anomaly detection in pantograph-catenary system based on mean-shift tracking and foreground detection,” in IEEE International Conference on Systems, Man, and Cybernetics (SMC). pp. 4444–4449, 2013. [18] Y. Wu, B. Jiang, N. Lu, and Y. Zhou, “Bayesian network based fault prognosis via bond graph modeling of high-speed railway traction device,” Mathematical Problems in Engineering, vol. 2015, 2015. [19] J. Guzinski, M. Diguet, Z. Krzeminski, A. Lewicki, and H. Abu-Rub, “Application of speed and load torque observers in high-speed train drive for diagnostic purposes,” IEEE Transactions on Industrial Electronics, vol. 56, no. 1, pp. 248–256, 2009. [20] J. Yin and W. Zhao, “Fault diagnosis network design for vehicle on-board equipments of high-speed railway: A deep learning approach,” Engineering Applications of Artificial Intelligence, vol. 56, pp. 250–259, 2016. [21] Y. Wu, B. Jiang, N. Lu, H. Yang, and Y. Zhou, “Multiple incipient sensor faults diagnosis with application to high-speed railway traction devices,” ISA Transactions, vol. 67, pp. 183–192, 2017. [22] C. Wen, F. Lv, Z. Bao, and M. Liu, “A review of data driven-based incipient fault diagnosis,” Acta Automatica Sinica, vol. 42, no. 9, pp. 1285–1299, 2016. [23] S. Wang, P. Li, F. Wang, and Q. Ji, “Control of brake cylinder pressure on metro trains for beijing no. 
5 line,” Rolling Stock, vol. 52, no. 7, pp. 5–8, 2014. [24] D. Zhou, H. Ji, X. He, and J. Shang, “Fault detection and isolation of the brake cylinder system for electric multiple units,” IEEE Transactions on Control Systems Technology, vol. 26, no. 5, pp. 1744–1757, 2018. [25] T. Guo, D. Zhou, J. Zhang, M. Chen, and X. Tai, “Fault detection based on robust characteristic dimensionality reduction,” Control Engineering Practice, vol. 84, pp. 125–138, 2019. [26] D. M. Tax and R. P. Duin, “Support vector data description,” Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.


[27] Y. Chen, X. Zhou, and T. S. Huang, “One-class svm for learning in image retrieval,” in IEEE International Conference on Image Processing, pp. 34–37, 2001. [28] E. M. Knorr, R. T. Ng, and V. Tucakov, “Distance-based outliers: algorithms and applications,” The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000. [29] S. Boyd and L. Vandenberghe, “Convex optimization,” Cambridge University Press, pp. 215–232, 2004. [30] J. P. Vert, K. Tsuda, and B. Sch¨olkopf, “A primer on kernel methods,” Kernel Methods in Computational Biology, vol. 47, pp. 35–70, 2004. [31] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” Advances in Kernel Methods, pp. 185–208, 1999. [32] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995. [33] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” ACM Sigmod Record, vol. 29, no. 2, pp. 93–104, 2000. [34] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, “Loci: Fast outlier detection using the local correlation integral,” in IEEE 19th International Conference on Data Engineering Proceedings, pp. 315–326, 2003. [35] C. Chang and C. Lin, “Libsvm: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.

Jianxue Sang received the B.Eng. and M.Sci. degrees in automation from National University of Defense Technology, Changsha, China, in 2003 and 2005, respectively. He is currently working toward the Ph.D. degree in control science and engineering with the Department of Automation, Tsinghua University, Beijing, China. His main research interest is fault diagnosis of high-speed trains.

Junfeng Zhang received the B.E. degree in automation from Northeastern University and the Ph.D. degree in control science and engineering from Tsinghua University. His research interests include fault diagnosis and robust filtering, model predictive control and machine learning, stochastic networked systems and biomedical systems, and their applications.


Tianxu Guo received the B.Eng. degree in automation from Northeastern University, Shenyang, China, in 2012. He is currently working toward the Ph.D. degree in control science and engineering with the Department of Automation, Tsinghua University, Beijing, China. His research interests include data-driven industrial process monitoring and fault diagnosis with application in high-speed train.

Donghua Zhou received the B.Eng., M.Sci., and Ph.D. degrees, all in electrical engineering, from Shanghai Jiaotong University, China, in 1985, 1988, and 1990, respectively. He was an Alexander von Humboldt research fellow with the University of Duisburg, Germany, from 1995 to 1996, and a visiting scholar with Yale University, USA, from 2001 to 2002. He joined Tsinghua University in 1996 and was promoted to full professor in 1997; he was the head of the Department of Automation, Tsinghua University, from 2008 to 2015. He is now a vice president of Shandong University of Science and Technology and a joint professor of Tsinghua University. He has authored and coauthored over 210 peer-reviewed international journal papers and 7 monographs in the areas of fault diagnosis, fault-tolerant control and operational safety evaluation. Dr. Zhou is a fellow of IEEE, CAA and IET, a member of the IFAC TC on SAFEPROCESS, an associate editor of the Journal of Process Control, the vice Chairman of the Chinese Association of Automation (CAA), and the TC Chair of the SAFEPROCESS committee, CAA. He was also the NOC Chair of the 6th IFAC Symposium on SAFEPROCESS 2006.

Maoyin Chen received the B.Sci. degree in mathematics and the M.Sci. degree in control theory and control engineering from Qufu Normal University, Shandong, China, in 1997 and 2000, respectively, and the Ph.D. degree in control theory and control engineering from Shanghai Jiao Tong University, Shanghai, China, in 2003. From 2003 to 2005, he was a Postdoctoral Researcher with the Department of Automation, Tsinghua University, Beijing, China. From 2006 to 2008, he visited Potsdam University, Potsdam, Germany, as an Alexander von Humboldt Research Fellow. Since October 2008, he has been an Associate Professor with the Department of Automation, Tsinghua University, Beijing, China. His research interests include fault prognosis and complex systems.


Xiuhua Tai is an expert of CRRC Qingdao Sifang Rolling Stock Research Institute Co.,Ltd, Qingdao, China. His research interests include R&D of braking system and control system of the high-speed train.
