Feature extraction of internal dynamics of an engine air path system: Deep autoencoder approach


Proceedings, 18th IFAC Symposium on System Identification, July 9-11, 2018, Stockholm, Sweden




IFAC PapersOnLine 51-15 (2018) 736–741


Kazuhiro Shimizu ∗, Hayato Nakada ∗∗, Kenji Kashima ∗∗∗

∗ Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, Japan (e-mail: [email protected])
∗∗ Toyota Motor Corporation, Higashifuji Technical Center, Susono, Shizuoka, Japan (e-mail: hayato [email protected])
∗∗∗ Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, Japan (e-mail: [email protected])

⋆ This work is partially supported by JSPS KAKENHI 26289130 and JSPS KAKENHI 60401551.

Abstract: In order to model and understand complex dynamics such as automotive engines, it is meaningful to find a low dimensional structure embedded in a large number of physical variables. In this paper, we utilize several types of autoencoders for feature extraction of internal dynamics data of an engine air path system. In particular, the practical usefulness is examined through its application to dimensionality reduction, state estimation, and data replication. In addition, a unified framework of feature extraction and dynamics identification is also discussed.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Engine air path system, Machine learning, dimensionality reduction.

1. INTRODUCTION

For the MBD (Model Based Development) of automotive engines, it is crucial to develop a simple method to construct accurate and low-complexity engine models. To this end, we investigated a sparse modeling approach to data-driven modeling of a diesel engine air path system in our previous work: in Shimizu et al. (2017), we modeled the behavior of a scalar output while providing a confidence interval. It should be noted that there are a large number of physical variables inside the engine. Although similar modeling of each variable as in Shimizu et al. (2017) may be possible, such results would not be preferable considering the required memory size, computational complexity, and interpretability. To circumvent this issue, feature extraction from internal dynamics data seems a promising direction.

In this paper, several types of autoencoders are utilized for this purpose. The autoencoder is a typical unsupervised learning method, which has made rapid progress in the machine learning community (Deng and Yu (2014)). We investigate not only dimensionality reduction, but also its application to state estimation and data replication. In addition, a unified framework of feature extraction and dynamics identification is also discussed.

This paper is organized as follows: Section 2 briefly reviews the air path system. In Section 3, we construct standard autoencoders for stationary data of the internal variables, and examine the relationship between the neural network structure and data reducibility. In Sections 4 and 5, denoising and generative autoencoders are designed, and we explore their practically useful applications. In Section 6, a system identification problem is addressed, where an algorithm with its convergence proof is provided. Finally, concluding remarks are given in Section 7.

Notation: The i-th element of a vector x is denoted by a subscript, as in x_i. The normal distribution with mean vector µ and variance-covariance matrix Σ is denoted by N(µ, Σ).

Fig. 1. Engine air path system

2. ENGINE AIR PATH SYSTEM: OVERVIEW

In this section, we describe the air path system under consideration, whose schematic picture is shown in Fig. 1. The numbers below correspond to those in the figure.


(1) The fresh air is compressed and cooled. Then, it flows into the cylinder. Between the cooler and the cylinder, there is a throttle valve. The amount of the fresh air flowing into the cylinder is controlled by this valve. In this paper, the throttle position is the first control input, denoted as u1.

(2) Between the exhaust manifold and the intake manifold, there is an EGR path. A part of the exhaust gas can be introduced through the EGR path into the intake manifold. This results in a reduction of nitrogen oxide in the emission. The EGR valve position is the second control input, denoted as u2.

(3) This engine has a variable geometry turbocharger (VGT). By changing the position of the VGT, the intake manifold pressure can be controlled indirectly. This VGT position is the third control input, denoted as u3.

There are also environmental parameters that represent the driving condition. One is the engine speed d1, and the other is the injection quantity d2. The variables introduced above are summarized in Table 1.

Table 1. Control inputs and environmental parameters

    d1   Engine speed
    d2   Injection quantity
    u1   Throttle position
    u2   EGR valve position
    u3   VGT position

In Shimizu et al. (2017), only the EGR rate and the intake manifold pressure, which directly affect the engine performance, were regarded as outputs. In the current work, we measured 21 variables that are relevant to understanding the internal behavior, e.g., the pressure, temperature, flow amount, and air density at several points.¹ More specifically, the available data are the steady-state values of the 21 relevant internal variables x ∈ R^21, corresponding to the 6^5 (= 7776) combinations of (d, u). These data are divided into the training data Xt and the verification data Xv, where |Xt| = 5776 and |Xv| = 2000.

¹ All the data in this paper are normalized so that they have mean 0 and variance 1.

From a mathematical point of view, each element of x can be viewed as a function of (d, u). This means that x is located on a 5-dimensional manifold in R^21. On the other hand, elementwise regression (e.g., function approximation of x_i(d, u), i = 1, 2, ..., 21) as in Shimizu et al. (2017) does not provide any insight into the relationship between these variables. Toward better interpretability of the engine mechanism, the main purpose of this work is to utilize several types of autoencoders to directly extract such a low dimensional structure embedded in the high dimensional internal physical variables.

3. AUTOENCODER

3.1 Autoencoder

An autoencoder is an artificial neural network used for unsupervised learning of efficient codings. The aim of an autoencoder is to learn a representation of a set of data, typically for the purpose of dimensionality reduction. An autoencoder consists of two parts. One is an encoder, which compresses the inputs, and the other is a decoder, which reconstructs the inputs from the encoded vector. These are denoted by the maps E : R^d → F and D : F → R^d, where d is the dimension of the input and F is usually called the feature space. If an autoencoder successfully reconstructs a set of data X and the dimension of the feature space F is lower than d, then the feature vector E(x) can be regarded as a compressed representation of the input x ∈ X.

The encoder and the decoder are designed by minimizing the reconstruction error:
$$L(E, D) := \sum_{x \in \mathcal{X}} \| D(E(x)) - x \|^2. \tag{1}$$

An autoencoder has an input layer, an output layer, and some hidden layers connecting them; see Fig. 2. The output layer has the same number of nodes as the input layer because the purpose is the reconstruction of its own inputs. A brief mathematical description of each circle in Fig. 2 is given by
$$a_j^{(i+1)} = f\Big( \sum_k \omega_{jk}^{(i+1)} a_k^{(i)} + b_j^{(i+1)} \Big),$$
where $a_j^{(i)}$ is the scalar output of the $j$-th node in the $i$-th layer, and the real constants $\omega$ and $b$ represent weights and biases; see, e.g., Deng and Yu (2014) for details. In this paper, the nonlinear activation function $f$ was chosen to be $f(x) = \max(0, x)$. Standard methods such as stochastic gradient descent, mini-batch processing, and back propagation are employed for efficient optimization.

3.2 Feature extraction

We construct an autoencoder for the data set of the engine air path system in order to reveal the hidden low dimensional structure. The cost function is given by (1) with $\mathcal{X} = \mathcal{X}_t$. Fig. 3 shows the relationship between the dimension of the feature space $\mathcal{F}$ and the reconstruction error evaluated on the verification data $\mathcal{X}_v$:
$$V(E, D) := \frac{1}{21} \sum_{i=1}^{21} \sqrt{\frac{1}{|\mathcal{X}_v|} \sum_{x \in \mathcal{X}_v} |D(E(x))_i - x_i|^2}. \tag{2}$$
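To make the construction concrete, the following is a minimal sketch (ours, not the authors' implementation) of such an autoencoder in PyTorch, using the 21−300−dim(F)−300−21 structure adopted later in this section; the training loss is the reconstruction error (1), and `verification_error` evaluates (2). The tensor names `X_train` and `X_ver` are placeholders for the normalized data sets.

```python
# Minimal sketch of a 5-layer autoencoder with ReLU activations,
# trained by minimizing the reconstruction error (1).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, dim_f=4, d=21, hidden=300):
        super().__init__()
        # Encoder E: R^d -> F and decoder D: F -> R^d
        self.encoder = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                     nn.Linear(hidden, dim_f))
        self.decoder = nn.Sequential(nn.Linear(dim_f, hidden), nn.ReLU(),
                                     nn.Linear(hidden, d))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def verification_error(model, X_ver):
    # Eq. (2): per-variable RMS reconstruction error, averaged over the 21 variables.
    with torch.no_grad():
        err = (model(X_ver) - X_ver) ** 2           # shape: |Xv| x 21
        return torch.sqrt(err.mean(dim=0)).mean()   # mean_i sqrt(mean_x |.|^2)

def train(model, X_train, epochs=200, batch=64):
    opt = torch.optim.Adam(model.parameters())
    loader = torch.utils.data.DataLoader(X_train, batch_size=batch, shuffle=True)
    for _ in range(epochs):
        for x in loader:                          # mini-batch SGD with backprop
            loss = ((model(x) - x) ** 2).sum()    # reconstruction error (1)
            opt.zero_grad(); loss.backward(); opt.step()
```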

The results for the following four structures are shown in Fig. 3 (a builder sketch for these variants is given after the list):

(1) 3 layers: 21 − dim(F) − 21 (yellow)
(2) 5 layers: 21 − 22 − dim(F) − 22 − 21 (blue)
(3) 5 layers: 21 − 100 − dim(F) − 100 − 21 (green)
(4) 5 layers: 21 − 300 − dim(F) − 300 − 21 (red)
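The four variants differ only in their layer-width lists, so they can be generated by a small helper. This is an illustrative sketch under the same ReLU units as in Section 3.1, not the authors' code.

```python
import torch.nn as nn

def make_autoencoder(widths):
    # widths, e.g. [21, 300, 4, 300, 21]: a symmetric stack of linear
    # layers with ReLU in between; the middle entry is dim(F).
    layers = []
    for i in range(len(widths) - 1):
        layers.append(nn.Linear(widths[i], widths[i + 1]))
        if i < len(widths) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# The four structures compared in Fig. 3 (shown here with dim(F) = 4):
ae_3   = make_autoencoder([21, 4, 21])
ae_22  = make_autoencoder([21, 22, 4, 22, 21])
ae_100 = make_autoencoder([21, 100, 4, 100, 21])
ae_300 = make_autoencoder([21, 300, 4, 300, 21])
```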

First, the generalization error increases as dim(F) becomes smaller. In particular, the error quickly increases when dim(F) is less than 5, which is the dimension of (d, u). Next, we compare the effect of the choice of the neural network structure. The yellow line shows the result of the three-layer autoencoder, while the other three lines show the results of five-layer autoencoders with different numbers of nodes in the second and fourth layers. The figure indicates that a larger degree of freedom actually leads to better approximation. On the other hand, the results for seven-layer autoencoders, which are not shown in this paper, are not much better than their five-layer counterparts. Although it is difficult to determine the optimal structure, the effect of the number of nodes was more significant than that of the number of layers in most of the experiments we performed. In what follows, we fix the structure with 5 layers, 300 nodes in the middle layers, and dim(F) = 4, since the generalization ability appears satisfactorily high in Fig. 3. The autoencoder designed with this structure is denoted by (E⋆, D⋆); it attains V(E⋆, D⋆) = 0.06831 (see the red line in Fig. 3).

Fig. 2. Structure of a 5-layer autoencoder

Fig. 3. Neural network structure and resulting reconstruction error

4. DENOISING AUTOENCODER FOR ROBUSTIFICATION

The observation in the previous section suggests that a suitable low dimensional feature can be extracted from the data. In this section, we attempt to compute the feature vector based on low accuracy or partially observed data.

4.1 Denoising autoencoder

Denoising autoencoders take a partially corrupted input while being trained to recover the original, undistorted input. The aim of denoising autoencoders is to obtain autoencoders equipped with a recovering ability, in the sense that they can robustly generate the corresponding undistorted data from corrupted inputs. To this end, the encoder and the decoder are designed by minimizing
$$\sum_{x \in \mathcal{X}} \mathbb{E}_{d \sim p(d)} \| D(E(x + d)) - x \|^2, \tag{3}$$
where $\mathbb{E}_{d \sim p(d)}$ denotes the expectation with respect to $p(d)$, the probability distribution of the random variable $d$; see Vincent et al. (2008) for more detail. In this section, the usefulness of the denoising autoencoder for $\mathcal{X}_t$ is examined through two typical situations.
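In practice, the expectation in (3) is typically approximated by redrawing the corruption d at every mini-batch while keeping the clean x as the target. A sketch of one such training step, reusing `model` and `opt` from the earlier sketch and assuming a Gaussian p(d):

```python
import torch

def denoising_step(model, opt, x, noise_std=0.1):
    # One SGD step on (3): corrupt the input with a sample d ~ p(d),
    # but penalize the distance to the *clean* data x.
    d = noise_std * torch.randn_like(x)        # d ~ N(0, noise_std^2 I) (assumed p(d))
    loss = ((model(x + d) - x) ** 2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```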

4.2 Low resolution data

First, we consider the situation in which we attempt to recover high resolution data from low resolution sensor data. Let us denote the elementwise uniform quantizer (to 8 and 12 values) by Q. The original data Xt are uniformly quantized, and {Q(x) : x ∈ Xt} is used as the corrupted data for the training.
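A possible implementation of the quantizer Q is sketched below; the clipping range [−3, 3] is our assumption (the paper specifies only the 8 and 12 quantization steps), and the pairs (Q(x), x) then serve as corrupted inputs and clean targets in (3).

```python
import torch

def quantize(x, steps=8, lo=-3.0, hi=3.0):
    # Elementwise uniform quantizer Q to `steps` values on [lo, hi].
    xc = x.clamp(lo, hi)
    idx = torch.round((xc - lo) / (hi - lo) * (steps - 1))
    return lo + idx * (hi - lo) / (steps - 1)

# Corrupted training data for the denoising autoencoder of this section:
# input quantize(x), target x, for each x in the training set.
```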


The verification results are shown in Table 2, where the reconstruction error is defined as
$$V_q(E, D) := \frac{1}{21} \sum_{i=1}^{21} \sqrt{\frac{1}{|\mathcal{X}_v|} \sum_{x \in \mathcal{X}_v} |D(E(Q(x)))_i - x_i|^2}. \tag{4}$$

Table 2. Robustness to quantization

    Quantization steps            8       12
    Vq(I, I)                      0.22    0.14
    Vq(E⋆, D⋆) (Normal)           0.14    0.11
    Vq(Ed, Dd) (Denoising)        0.11    0.09


The first row shows the effect of quantization, i.e., Vq(I, I) with I being the identity operation. The second row is for the normal autoencoder (E⋆, D⋆) of the previous section, and the third row shows the result for the denoising autoencoder (Ed, Dd). For both step sizes, Vq(I, I) > Vq(E⋆, D⋆) > Vq(Ed, Dd) > V(E⋆, D⋆). This means that the denoising autoencoder is more robust against quantization.

4.3 State estimation

Next, we consider the situation in which we cannot observe some of the 21 variables in Xt. Given j (= 1, 2, ..., 21), let M_{1:j} be the mask of the 1st to j-th variables, where these elements are replaced by i.i.d. Gaussian random variables N(0, 1). We used {M_{1:j}(x) : x ∈ Xt} as the corrupted data for training a denoising autoencoder, which is denoted by (Ed^j, Dd^j). For the verification, we calculate the reconstruction error of the masked variables
$$V_p(E, D, j) := \frac{1}{j} \sum_{i=1}^{j} \sqrt{\frac{1}{|\mathcal{X}_v|} \sum_{x \in \mathcal{X}_v} |D(E(M_{1:j}(x)))_i - x_i|^2}, \tag{5}$$
and that of the observed variables
$$V_p^+(E, D, j) := \frac{1}{21 - j} \sum_{i=j+1}^{21} \sqrt{\frac{1}{|\mathcal{X}_v|} \sum_{x \in \mathcal{X}_v} |D(E(M_{1:j}(x)))_i - x_i|^2}. \tag{6}$$
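Our reading of the mask M_{1:j} and the errors (5)-(6) as code (a sketch, not the authors' implementation):

```python
import torch

def mask_first_j(x, j):
    # M_{1:j}: replace the 1st..j-th variables by i.i.d. N(0, 1) noise.
    xm = x.clone()
    xm[..., :j] = torch.randn_like(xm[..., :j])
    return xm

def masked_errors(model, X_ver, j):
    # Returns (Vp, Vp+): eq. (5) averaged over the masked coordinates
    # and eq. (6) averaged over the observed coordinates.
    with torch.no_grad():
        err = (model(mask_first_j(X_ver, j)) - X_ver) ** 2
        rms = torch.sqrt(err.mean(dim=0))   # per-variable RMS over Xv
        return rms[:j].mean(), rms[j:].mean()
```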

First, Fig. 4 shows Vp(E, D, j) for (E⋆, D⋆) and (Ed^j, Dd^j); the denoising autoencoder successfully recovers the original data from the (21 − j) observed variables when j ≤ 17. Second, Fig. 5 shows Vp^+(E, D, j). It should be noted that, for the normal autoencoder, the masking deteriorates the reconstruction error of the observed variables, whereas for the denoising autoencoder the masking does not affect them. In this sense, we obtained a feature vector that can be computed from only a limited number of variables, which is useful for better interpretability.

Fig. 4. Average reconstruction error of unobserved variables

Fig. 6. Adversarial autoencoder

Fig. 7. Distribution of the feature vector {E⋆(x) : x ∈ Xt}

Fig. 5. Average reconstruction error of observed variables

Similar numerical experiments were also performed with randomly generated combinations of observable/unobservable variables. The results suggest that 6 arbitrarily chosen sensors are enough to reconstruct all 21 variables. On the other hand, for some combinations of 4 or 5 sensors, we were unable to reconstruct the unobserved variables.

5. GENERATIVE AUTOENCODER

One of the major drawbacks of normal autoencoders is that we have no physical information about the obtained feature space F. Note that, in the minimization criterion (1), there is a degree of freedom of nonlinear coordinate transformation, which makes it difficult to understand the meaning of the feature vector. Actually, it is preferable to know the distribution of the feature vector in order to use it for anomaly detection or for the design of a control law based on the feature vector. To overcome this issue, the adversarial autoencoder introduced below is a promising method.

5.1 Adversarial learning of generative autoencoder

The adversarial autoencoder is aimed at matching the posterior distribution of E(x) to an arbitrarily given prior distribution denoted by p(z); see Makhzani et al. (2015). To construct such autoencoders, a neural network called the discriminator is attached as in Fig. 6. The role of the discriminator, denoted by H : R^f → [0, 1] with f = dim(F), is to distinguish whether its input is generated by the encoder or drawn from the prior distribution p(z). To achieve such a property, given an encoder E, we train the discriminator by maximizing
$$\sum_{x \in \mathcal{X}} \log(1 - H(E(x))) + \sum_{z \in \mathcal{Z}} \log H(z), \tag{7}$$
where $\mathcal{Z}$ is a set of samples generated according to p(z); see Makhzani et al. (2015). On the contrary, given a discriminator H, the encoder interferes with this task by minimizing
$$\sum_{x \in \mathcal{X}} \log(1 - H(E(x))). \tag{8}$$

Similarly to the previous methods, we also attempt to reduce the reconstruction error in (1). By iterating these three optimizations (i.e., updating E, D by minimizing (1), H by maximizing (7), and E by minimizing (8)), we finally obtain the desired autoencoder. Once the training procedure is done, the decoder of the autoencoder defines a generative model that maps the imposed prior distribution p(z) to the data distribution.
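A compact sketch of this three-step iteration, assuming a small MLP discriminator, the Gaussian prior used in Section 5.2, and the `model`/`loader` objects from the earlier sketches; network sizes and optimizer settings are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(4, 64), nn.ReLU(),
                     nn.Linear(64, 1), nn.Sigmoid())   # H: R^f -> [0, 1]
opt_ae   = torch.optim.Adam(model.parameters())
opt_disc = torch.optim.Adam(disc.parameters())
eps = 1e-8   # numerical guard inside the logs

for x in loader:
    # (i) reconstruction: update E, D by minimizing (1)
    loss_rec = ((model(x) - x) ** 2).sum()
    opt_ae.zero_grad(); loss_rec.backward(); opt_ae.step()

    # (ii) discriminator: maximize (7), i.e., minimize its negation
    z = 5.0 * torch.randn(x.shape[0], 4)        # z ~ N(0, 5^2 I_4)
    e = model.encoder(x).detach()               # fake samples from the encoder
    loss_disc = -(torch.log(1 - disc(e) + eps).sum()
                  + torch.log(disc(z) + eps).sum())
    opt_disc.zero_grad(); loss_disc.backward(); opt_disc.step()

    # (iii) encoder: minimize (8) to fool the discriminator
    loss_enc = torch.log(1 - disc(model.encoder(x)) + eps).sum()
    opt_ae.zero_grad(); loss_enc.backward(); opt_ae.step()
```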

5.2 Data replication

We first show the distribution of the feature vector of the normal autoencoder, {E⋆(x) : x ∈ Xt} ⊂ R^4, in Fig. 7. It is difficult to obtain any useful information from it, because the design criterion (1) for the normal autoencoder does not regularize the distribution of the feature vector.

We here design a generative autoencoder (Eg, Dg) for Xt with the prior distribution
$$p(z) \sim \mathcal{N}(0,\ 5^2 \cdot I_4), \tag{9}$$
where I_4 ∈ R^{4×4} is the identity matrix. Fig. 8 shows the normalized histograms of {(Eg(x))_1 : x ∈ Xt, Eg(x) ∈ Z_k} (k = 1, 2, ..., 9) in order to evaluate the posterior distribution of the feature vector Eg(x), where Z_k represents the conditioning, e.g., Z_1 := {z ∈ R^4 : −100 < z_2 < −4}. We can observe that the distribution fits the prespecified one in (9) well.

As mentioned before, the decoder of the adversarial autoencoder can play the role of a generative model that maps the imposed prior distribution p(z) to the data distribution. To see this, we generated a set of 10000 i.i.d. samples Z_p ⊂ R^4 according to the prior distribution p(z) in (9), and compared {Dg(z) : z ∈ Z_p} with Xt. Since it is difficult to compare 21-dimensional distributions, we show the scatter plot of the 1st and 2nd elements in Fig. 9. The distributions of these two sets are almost identical, which means the obtained decoder is a reasonable generative model.
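Data replication then amounts to decoding prior samples. A short sketch of the comparison behind Fig. 9 (matplotlib assumed for the scatter plot; `model` and `X_train` as in the earlier sketches):

```python
import torch
import matplotlib.pyplot as plt

with torch.no_grad():
    z = 5.0 * torch.randn(10000, 4)     # 10000 i.i.d. samples from (9)
    fake = model.decoder(z)             # replicated data {D_g(z)}

# Scatter plot of the 1st and 2nd elements, original vs. replicated.
plt.scatter(X_train[:, 0].numpy(), X_train[:, 1].numpy(), s=2, label="original")
plt.scatter(fake[:, 0].numpy(), fake[:, 1].numpy(), s=2, label="replicated")
plt.legend(); plt.show()
```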

Fig. 8. Posterior (blue histogram) and prior (red line) distributions of the feature vector

Fig. 9. Original (blue) and replicated (red) data

6. IDENTIFICATION OF TRANSIENT RESPONSE

In this section, we propose a unified framework for feature extraction and system identification of the system under investigation. The results in the preceding sections motivate us to construct a model of a low dimensional feature vector instead of models for x_i (i = 1, 2, ..., 21). The essential difference from standard linear system identification (Katayama (2005); Ljung (2010)) is the degree of freedom of the nonlinear coordinate transformation of the autoencoder. This nonlinearity provides the possibility of obtaining a linear model for nonlinear dynamics. It should be emphasized that the design of the transition matrix and the selection of the feature vector cannot be optimized independently. We investigate this similarly to Tanaka et al. (2016), which studied the same issue for manifold learning.

Suppose that time series data {x(k)}_k ⊂ R^21 of the internal physical variables are available for the modeling. For our engineering purpose, it is important to obtain a model that provides suitable stationary values. In what follows, an expression for the stationary values x̄(d, u) is assumed to be available, which seems possible via regression of the 4 or 5 dimensional feature vector, but not of the 21 dimensional original variables. By using this as an input operator, we construct a model of the form
$$E(x(k+1)) = A \big( E(x(k)) - E(\bar{x}(k)) \big) + E(\bar{x}(k)) \tag{10}$$
with a stable square matrix A and the simplified notation
$$\bar{x}(k) := \bar{x}(d(k), u(k)); \tag{11}$$
see Shimizu et al. (2017). Note that, independently of A, E(x(k)) = E(x(k+1)) if and only if E(x(k)) = E(x̄(k)), due to the nonsingularity of I − A. Therefore, the stationary state of the system is close to the prespecified x̄ as long as the reconstruction error (see (13) below) is small. In what follows, we discuss how to determine the autoencoder (E, D) and the system matrix A.
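Once an encoder and a matrix A are available, (10) can be rolled forward in the feature space. The following sketch assumes the stationary trajectory x̄(k) is stacked row-wise in `Xbar` (all names hypothetical):

```python
import torch

def rollout_features(model, A, z0, Xbar):
    # Iterate (10): z(k+1) = A (z(k) - E(xbar(k))) + E(xbar(k)).
    with torch.no_grad():
        zbar = model.encoder(Xbar)        # E(xbar(k)) for k = 0..K-1
        zs, z = [z0], z0
        for k in range(len(zbar) - 1):
            z = (z - zbar[k]) @ A.T + zbar[k]
            zs.append(z)
        return torch.stack(zs)            # predicted feature trajectory
```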

Since the simultaneous optimization of E and A seems difficult, we employ a two-stage optimization procedure as follows. First, let us consider
$$L_1(E, A) := \sum_k \| E(x(k+1)) - A E(x(k)) + (A - I) E(\bar{x}(k)) \|^2 \tag{12}$$
for the trajectory-fitting constraint. The other criterion is the reconstruction error, similar to the previous sections:
$$L_2(E, D) := \sum_k \| D(E(x(k))) - x(k) \|^2. \tag{13}$$
Then, we minimize
$$L(E, D, A) := L_1(E, A) + \gamma L_2(E, D) - \delta \sum_k \log\big( \| E(x(k)) \|^2 \big), \tag{14}$$
where γ, δ > 0 are tuning parameters. It should be noted that, for any fixed A, L1(E, A) can be made arbitrarily small by scaling E and D without changing L2(E, D). Therefore, the minimization of L1(E, A) + γL2(E, D) with respect to (E, D) would ignore the first term for any γ > 0. To avoid this, the third term is added to prevent E(·) from converging to 0.

Algorithm:
Step 0: Determine ϵ > 0, initialize A^(0), and set t := 0.
Step 1: Update E and D by

6. IDENTIFICATION OF TRANSIENT RESPONSE In this section, we propose a unified framework of feature extraction and system identification of the system under investigation. The results in the preceding sections motivate us to construct a model of a low dimensional feature vector instead of models for xi (i = 1, 2, . . . , 21). The essential difference from the standard linear system identification (Katayama (2005); Ljung (2010)) is the degree of freedom of the nonlinear coodinate transformation of the autoencoder. This nonlinearity provides a possibility to obtain a linear model for nonlinear dynamics. It should be emphasized that the design of transition matrix and feature vector selection cannot be optimized independently. 740

$$(E^{(t+1)}, D^{(t+1)}) := \mathop{\arg\min}_{(E,D)}\; L(E, D, A^{(t)}). \tag{15}$$
Step 2: Update A by
$$A^{(t+1)} := \mathop{\arg\min}_{A}\; L_1(E^{(t+1)}, A). \tag{16}$$

Step 3: Go back to Step 1 with t := t + 1 if |L(E^(t+1), D^(t+1), A^(t+1)) − L(E^(t), D^(t), A^(t))| > ϵ; otherwise stop.

The convergence of this algorithm can be guaranteed as follows:

Theorem 1. Suppose that sup_{x,E} ∥E(x)∥ is bounded, where the supremum over E is taken over the class of encoders under consideration. Then, by the algorithm above, L(E^(t), D^(t), A^(t)) is a monotonically non-increasing sequence and converges to a value as t → ∞.

Proof. In Step 1, L(E^(t+1), D^(t+1), A^(t)) ≤ L(E^(t), D^(t), A^(t)) holds for any t. Similarly, in Step 2, we have L1(E^(t+1), A^(t+1)) ≤ L1(E^(t+1), A^(t)). By combining these inequalities and the definition of L, L(E^(t+1), D^(t+1), A^(t+1)) ≤ L(E^(t), D^(t), A^(t)). Therefore, L(E^(t), D^(t), A^(t)) is a monotonically non-increasing sequence. By the assumption, it is bounded from below. Hence, the Weierstrass theorem completes the proof.

Note that Step 2, a standard least-squares optimization, can be solved efficiently. In Step 1, L1(E, A^(t)) plays the role of a regularizer in quadratic form. The partial derivatives of the second and third terms of L with respect to the neural network parameters can be given in a simple form. As a result, we can implement heavy but feasible stochastic gradient descent for L(E, D, A^(t)).
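The two-stage procedure can be sketched as follows: Step 2 solves the least-squares problem (16) in closed form, while Step 1 applies stochastic gradient descent to (14) with A fixed. Tensor names (`X`, `Xbar`) and hyperparameters are placeholders, not the authors' settings.

```python
import torch

def fit_A(model, X, Xbar):
    # Step 2 (16): least squares for A in
    #   E(x(k+1)) - E(xbar(k)) ≈ A (E(x(k)) - E(xbar(k))).
    with torch.no_grad():
        Ex, Exb = model.encoder(X), model.encoder(Xbar)
        e = Ex[:-1] - Exb[:-1]            # rows: E(x(k)) - E(xbar(k))
        t = Ex[1:]  - Exb[:-1]            # rows: E(x(k+1)) - E(xbar(k))
        return torch.linalg.lstsq(e, t).solution.T   # A

def loss_total(model, A, X, Xbar, gamma=1.0, delta=1e-3):
    # Eq. (14) = L1 + gamma * L2 - delta * log-barrier on ||E(x(k))||^2.
    Ex, Exb = model.encoder(X), model.encoder(Xbar)
    L1 = ((Ex[1:] - Exb[:-1] - (Ex[:-1] - Exb[:-1]) @ A.T) ** 2).sum()  # (12)
    L2 = ((model.decoder(Ex) - X) ** 2).sum()                           # (13)
    reg = torch.log((Ex ** 2).sum(dim=1)).sum()
    return L1 + gamma * L2 - delta * reg

# Alternate Step 1 (SGD on loss_total with A fixed) and Step 2 (fit_A)
# until the decrease of (14) falls below the threshold eps (Step 3).
```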

It is also possible to combine several types of autoencoders. For example, we may constrain the obtained state variable E(x) to depend only on specific variables as in Section 4, or to evolve in a specific region as in Section 5.

7. CONCLUSION

In this paper, several types of autoencoders were applied to feature extraction of the internal dynamics data of an engine air path system. We observed that they can successfully find a low-dimensional structure embedded in high-dimensional stationary data of internal physical variables. Furthermore, recent techniques such as denoising and adversarial learning enable us to deal with various practical constraints. Finally, in Section 6, a unified framework of system identification and feature extraction was discussed, whose application to transient response experimental data is currently under investigation.

REFERENCES

Deng, L. and Yu, D. (2014). Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3-4), 197-387.
Kashima, K. (2016). Nonlinear model reduction by deep autoencoder of noise response data. In IEEE 55th Conference on Decision and Control, 5750-5755.
Katayama, T. (2005). Subspace Methods for System Identification. Communications and Control Engineering. Springer.
Konaka, E. (2016). Model-free controller design for discrete-valued input systems based on autoencoder. In SICE Annual Conference 2016, 685-690.
Ljung, L. (2010). Perspectives on system identification. Annual Reviews in Control, 34(1), 1-12.
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. (2015). Adversarial autoencoders. ArXiv e-prints.


Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., and Ljung, L. (2014). Kernel methods in system identification, machine learning and function estimation: A survey. Automatica, 50(3), 657-682.
Schenato, L. (2016). Multi-agent map-building over lossy networks: From parametric to non-parametric approaches. In 6th IFAC Workshop on Distributed Estimation and Control in Networked Systems, Tokyo.
Shimizu, K., Nakada, H., and Kashima, K. (2017). Experimental study on sparse modeling of a diesel engine air path system. In 2017 IEEE Conference on Control Technology and Applications (CCTA), 1426-1431. IEEE.
Tanaka, D., Matsubara, T., and Sugimoto, K. (2016). Input-output manifold learning with state space models. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E99.A(6), 1179-1187.
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), 1096-1103. ACM Press, New York, USA.