A Spatial Multivariable SVR Method for Spatiotemporal Fuzzy Modeling with Applications to Rapid Thermal Processing Recommended by Prof. T Parisini
Journal Pre-proof
Xian-Xia Zhang, Han-Yu Yuan, Han-Xiong Li, Shi-Wei Ma
PII: S0947-3580(19)30371-1
DOI: https://doi.org/10.1016/j.ejcon.2019.11.006
Reference: EJCON 396
To appear in: European Journal of Control
Received date: 16 August 2019
Revised date: 25 October 2019
Accepted date: 23 November 2019
Please cite this article as: Xian-Xia Zhang, Han-Yu Yuan, Han-Xiong Li, Shi-Wei Ma, A Spatial Multivariable SVR Method for Spatiotemporal Fuzzy Modeling with Applications to Rapid Thermal Processing, European Journal of Control (2019), doi: https://doi.org/10.1016/j.ejcon.2019.11.006
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd on behalf of European Control Association.
Highlights
• Spatial multivariable SVR based 3-D fuzzy modeling methodology is proposed for distributed parameter systems.
• Multivariable SVR with spatial kernel functions is proposed to cope with a set of spatiotemporal data.
• Spatial multivariable SVR builds up a complete 3-D fuzzy rule-base.
• 3-D fuzzy model is constructed in the form of data-driven design.
[Graphical abstract: a DPS with input x and output u(θ, t) is modeled by a 3-D fuzzy model with rules R1, ..., RN, built from the spatial SVs (SV1, SV2, ..., SVN) and learning parameters of a spatial multivariable SVR. SFBF: spatial fuzzy basis function of 3-D fuzzy model; SKF: spatial kernel function of spatial multivariable SVR.]
A Spatial Multivariable SVR Method for Spatiotemporal Fuzzy Modeling with Applications to Rapid Thermal Processing
Xian-Xia Zhang a,∗, Han-Yu Yuan a, Han-Xiong Li b, Shi-Wei Ma a
a Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics and Automation, Shanghai University, Shanghai 200444, China
b Department of Systems Eng & Eng Management, City University of Hong Kong, Hong Kong, China
Abstract
Many industrial processes have significant spatiotemporal dynamics and are usually called distributed parameter systems (DPSs). Modeling such systems is challenging because of their nonlinearity, time-varying dynamics, and spatiotemporal coupling. Traditional DPS modeling methods usually use model reduction techniques to reduce an infinite-dimensional system to a finite-dimensional one, which leads to unknown nonlinearity and unmodeled dynamics. Both the modeling method and the resulting model are hard to understand. Here, we propose a spatial multivariable support vector regression (SVR) based three-domain (3-D) fuzzy modeling method for complex nonlinear DPSs. The proposed 3-D modeling method integrates time-space separation and time-space synthesis into a 3-D fuzzy model. Therefore, it does not require model reduction and is linguistically interpretable. A spatial multivariable SVR with spatial kernel functions is proposed to deal with spatiotemporal data. The spatial fuzzy basis functions of a 3-D fuzzy model serve as spatial kernel functions for the spatial multivariable SVR, and they satisfy Mercer's theorem. Hence, the spatial multivariable SVR can be directly employed to build a complete 3-D fuzzy rule-base for the 3-D fuzzy model. The proposed modeling method integrates the learning ability of a spatial multivariable SVR with the fuzzy space processing and fuzzy linguistic expression of a 3-D fuzzy model. The proposed 3-D fuzzy modeling method is successfully applied to a simulated rapid thermal processing system. In comparison with several recently developed modeling methods for DPSs, the simulation results validate the superiority of the proposed modeling method.
Keywords: Fuzzy modeling, Distributed parameter system, Support vector regression
∗ Corresponding author. Email address: [email protected] (Xian-Xia Zhang)
Preprint submitted to European Journal of Control, November 27, 2019
1. Introduction
Many industrial processes have significant spatiotemporal dynamics and are usually called distributed parameter systems (DPSs) [1]. For example, flexible beams [2], transport-reaction processes [3], and thermal processes [4] are typical spatiotemporal systems. These systems are often represented by partial differential equations (PDEs). A PDE-described system can be transformed into an infinite-dimensional system of ordinary differential equations (ODEs). The spatially distributed feature requires infinite-dimensional modeling [1][5][6], which is more difficult and complicated than the modeling of lumped parameter systems (LPSs). In spite of this difficulty, modeling DPSs is indispensable for control design, optimization, and dynamic prediction.
DPS modeling has been broadly investigated since the 1960s. Much research is based on first-principle models, which are represented by PDEs. A DPS is inherently infinite-dimensional. However, because only a finite number of actuators and sensors are available for practical sensing and control, and computing power for implementation is limited, such an infinite-dimensional system needs to be approximated by a finite-dimensional system [5][6]. In traditional DPS modeling methods, therefore, model reduction is critical to derive a low-order model for practical application. The finite-element method [7] and the finite-difference method [8] are commonly used to transform a first-principle PDE into high-order ODEs, which is also called early lumping [5]. Additionally, the spectral method [1] and other methods can be employed to reduce a PDE model, which is also called late lumping [5]. Furthermore, fuzzy PDE models [9][10] have been reported in recent years to model a DPS. All these methods are only applicable
for situations where the PDEs of a DPS are known. However, the PDEs of many practical DPSs are unknown. Therefore, data-driven modeling methods [11] are usually used.
In recent decades, many researchers have worked on this problem. Much of this research builds on a time-space separation framework [6], in which a spatiotemporal variable is decomposed into a series of spatial functions and temporal coefficients. In some methods, the spatial functions are chosen beforehand. For instance, Jacobi polynomials, Fourier series, and spline functions can be used as spatial functions [6]. In addition, the number of spatial functions is also determined by prior knowledge. Traditional modeling methods for lumped parameter systems (LPSs) can be employed to estimate the time model. Instead of being selected a priori, the spatial functions can also be estimated from a set of spatiotemporal data using Karhunen-Loève (KL) decomposition [12]. Wiener models [13], Hammerstein models [14], and spatiotemporal Volterra models [15] have been used to model DPSs based on KL decomposition. Model reduction is required to obtain low-order ODEs. However, it introduces unmodeled dynamics and unknown nonlinearity [16]. The standard KL decomposition is linear, which means that it ignores the nonlinear variations among data. To deal with the nonlinearity, the authors of Ref. [17] introduce adaptive KL decomposition into a T-S fuzzy modeling method, updating the spatial functions in real time with adaptive KL decomposition and updating the temporal model with a T-S fuzzy model.
3-D fuzzy modeling [16][18] has been developed in recent years as a new modeling method for DPSs. Time-space separation and time-space synthesis are naturally implemented in each 3-D fuzzy model. On the one hand, time-space separation is naturally realized in a 3-D fuzzy rule, i.e., the computation of the antecedent part represents the temporal coefficient and the consequent part represents the spatial function. On the other hand, time-space synthesis is realized by the union of all fired 3-D fuzzy rules. In comparison with traditional time-space based modeling methods, 3-D fuzzy modeling has two obvious features: no reliance on model reduction and linguistic interpretability. The 3-D fuzzy models developed so far [16][18] have been successfully applied to rapid thermal processing (RTP) systems. In [18], an embryonic 3-D fuzzy model is constructed via KL decomposition for the spatial functions and particle swarm optimization (PSO) for the membership functions in the antecedent parts of the 3-D fuzzy rules. Since KL decomposition is employed, this method still relies on model reduction. In [16], a 3-D fuzzy model is obtained using a nearest neighborhood clustering (NNC) algorithm and a similarity measure for the antecedent parts and multiple support vector regressions (SVRs) for the spatial functions. It is the first 3-D fuzzy model without model reduction.
Here, we develop a novel 3-D fuzzy modeling method using spatial multivariable support vector regression (SVR). Traditional SVR [19][20] has no inherent capability to handle spatiotemporal data; therefore, when traditional SVR is used for modeling a DPS, it usually has to be combined with KL decomposition, i.e., spatial functions are acquired by KL decomposition and time coefficients are estimated by SVR [12]. The conventional KL-SVR modeling method relies on model reduction and offers no linguistic explanation. In this study, a multivariable SVR with spatial kernel functions is proposed to deal with spatiotemporal data. We call it spatial multivariable SVR (abbreviated as spatial MSVR). Unlike the multiple traditional SVRs used to learn spatial functions in [16], the spatial MSVR is directly employed here to build a complete 3-D fuzzy rule-base. The spatial fuzzy basis functions of a 3-D fuzzy model serve as spatial kernel functions for the spatial MSVR, and they satisfy Mercer's theorem. The proposed 3-D fuzzy modeling method naturally integrates the learning ability of a spatial multivariable SVR with the fuzzy space processing and fuzzy linguistic expression of a 3-D fuzzy system.
The main contributions of this paper are as follows.
1) Spatial multivariable SVR is proposed to cope with a set of spatiotemporal data.
2) Spatial multivariable SVR is used to build up a complete 3-D fuzzy rule-base.
3) A data-driven 3-D fuzzy modeling method is proposed using a spatial multivariable SVR.
The paper is organized as follows. Section 2 presents the problem description. In Section 3, the spatial multivariable SVR based 3-D fuzzy modeling approach is described in detail. In Section 4, an RTP system is taken as an application to validate the effectiveness of the proposed 3-D fuzzy modeling method. Finally, the conclusion is given in Section 5.
2. Problem description
Here a nonlinear DPS is considered. Let $u(\theta, t) \in \mathbb{R}$ be the spatiotemporal output, $c(t) \in \mathbb{R}^m$ be the temporal input, $\theta \in \bar{\Theta}$ be the spatial variable, $\bar{\Theta}$ be the spatial domain, and $t$ be the time variable. The considered DPS is infinite-dimensional. However, only a limited number of sensors and actuators are available in practice. It is assumed that $P$ sensors are located at spatial points $\theta_1, \theta_2, \cdots, \theta_P$ for measuring the output of the system and $m$ actuators are spatially distributed over the space domain for controlling the system. The executable temporal signals of the $m$ actuators are represented by $c(t) = [c_1(t), c_2(t), \cdots, c_m(t)]$. The space domain is written as $\bar{\Theta} = [\theta_1, \theta_2, \cdots, \theta_P]'$, and the spatial output is then written as $u(\bar{\Theta}, t) = [u(\theta_1, t), u(\theta_2, t), \cdots, u(\theta_P, t)]'$.
The modeling problem is to identify a spatiotemporal model from the input $\{c(t)\}$ $(t = 1, \cdots, L)$ and the output $\{u(\theta_i, t)\}$ $(i = 1, \cdots, P;\ t = 1, \cdots, L)$, with $L$ being the time length.
2.1. Conventional KL-SVR modeling
The conventional KL-SVR modeling [12] for a DPS is shown in Figure 1. Firstly, a spatiotemporal variable $u(\theta, t)$ is separated spatially and temporally via a set of spatial functions $\{\varphi_i(\theta)\}_{i=1}^{\infty}$ as in (1)
\[ u(\theta, t) = \sum_{i=1}^{\infty} g_i(t)\varphi_i(\theta) \tag{1} \]
Figure 1: Scheme of conventional KL-SVR modeling method
where $g_i(t)$ and $\varphi_i(\theta)$ are the temporal coefficient and the spatial function, respectively. Eq. (1) shows the inherent infinite-dimensional characteristics of a DPS. For practical implementation, an infinite-dimensional system needs to be approximated by a finite-dimensional system using model reduction methods. Usually, the spatial functions are ordered from slow to fast in the spatial frequency domain. In practice, only the first few slow modes are retained, since the fast modes contribute little to the whole system. Thus, Eq. (1) can be written as
\[ u(\theta, t) \approx \sum_{i=1}^{n} g_i(t)\varphi_i(\theta) \tag{2} \]
Then, the KL method is used to acquire the spatial functions $\varphi_i(\theta)$ $(i = 1, \cdots, n)$ and determine the model order $n$ from the data set $\{c(t), u(\theta_i, t)\}$ $(i = 1, \cdots, P;\ t = 1, \cdots, L)$. Thus, the temporal coefficient $g_i(t)$ $(i = 1, \cdots, n)$ is calculated as in (3)
\[ g_i(t) = \langle \varphi_i(\theta), u(\theta, t) \rangle = \int_{\Omega} \varphi_i(\theta)\, u(\theta, t)\, d\theta \tag{3} \]
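The separation in (2)-(3) can be illustrated with a short numerical sketch. The version below is one common way to obtain empirical spatial functions (a singular value decomposition of the snapshot matrix) and is not necessarily the exact procedure of [12]. It assumes uniformly spaced sensors with spacing `dtheta`, so that the integral in (3) becomes a Riemann sum; the function name `kl_decomposition` and its arguments are hypothetical.

```python
import numpy as np

def kl_decomposition(U, dtheta, n):
    """Estimate n spatial functions and their temporal coefficients.

    U      : (P, L) snapshot matrix, rows = sensor locations, columns = time samples
    dtheta : sensor spacing (assumed uniform; an assumption of this sketch)
    n      : number of retained modes (model order)
    """
    # Left singular vectors of the snapshot matrix act as empirical spatial modes
    W, _, _ = np.linalg.svd(U, full_matrices=False)
    # Scale so that sum_k phi_i(theta_k) * phi_j(theta_k) * dtheta = delta_ij
    Phi = W[:, :n] / np.sqrt(dtheta)
    # Eq. (3) approximated by a Riemann sum over the sensor locations
    G = dtheta * Phi.T @ U
    return Phi, G
```

With `Phi` of shape (P, n) and `G` of shape (n, L), the product `Phi @ G` is the truncated reconstruction in (2); it is exact only when the data have rank at most `n`.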
In the conventional KL-SVR modeling, the least-squares SVR (LS-SVR) algorithm is used to estimate the temporal coefficients and build up a time model, which is given as follows.
\[ \hat{g}(t) = \sum_{\tau=1}^{L} \alpha K(\gamma(t), \gamma(\tau)) + b \tag{4} \]
where $\hat{g}(t) = [g_1(t), \cdots, g_n(t)]'$ is the temporal coefficient, $\gamma(t) = [\hat{g}(t-1)', \cdots, \hat{g}(t-n_g)', c(t-1)', \cdots, c(t-n_c)']'$, $\tau \in \{1, 2, \cdots, L\}$ is the $\tau$th sample time, $\alpha = [\alpha_1, \cdots, \alpha_n]'$ with $\alpha_i = [\alpha_i(1), \cdots, \alpha_i(L)]$ $(i = 1, \cdots, n)$, $b = [b_1, \cdots, b_n]$, and $K(\cdot, \cdot)$ is a kernel function related to time. Using the least-squares algorithm, $\alpha$ and $b$ can be calculated from the spatiotemporal data set $\{\gamma(\tau), \hat{g}(\tau)\}$ $(\tau = 1, \cdots, L)$.
KL decomposition is the key model reduction technique in the conventional KL-SVR modeling method. It is bound to result in unmodeled dynamics and unknown nonlinearity. Additionally, the established model cannot provide a clear linguistic explanation; therefore, it is difficult to understand. The SVR employed here cannot cope with a set of spatiotemporal data and can therefore only be used to build the time model.
2.2. 3-D fuzzy modeling
3-D fuzzy modeling is a new intelligent modeling method for nonlinear DPSs developed in recent years. A 3-D fuzzy model is designed based on 3-D fuzzy sets
for expressing spatial information and 3-D fuzzy inference for processing spatial information [21]. As shown in Figure 2, a 3-D fuzzy set has three coordinates: one for the universe of discourse of the variable $x$, another for the spatial information $\theta$, and the third for the membership degree $\mu(x, \theta)$. 3-D fuzzy inference consists of three operations: spatial information fusion, dimension reduction, and traditional inference. For detailed operations, one can refer to Ref. [21]. Unlike the conventional KL modeling method, 3-D fuzzy modeling directly applies 3-D fuzzy linguistic rules to describe a spatiotemporal system and naturally implements time-space separation and time-space synthesis, as shown in Figure 3. The obvious advantages of 3-D fuzzy modeling are no reliance on model reduction and linguistic interpretability [16].
Figure 2: Sketch of 3D fuzzy set
Figure 3: Scheme of 3-D fuzzy modeling
For the input $c(t)$ and the output $u(\bar{\Theta}, t)$, let the orders of the input variable $c(t)$ and the output variable $u(\bar{\Theta}, t)$ be $A$ and $B$, respectively. Then $\{c(t-1), c(t-2), \cdots, c(t-A)\}$ and $\{u(\bar{\Theta}, t-1), u(\bar{\Theta}, t-2), \cdots, u(\bar{\Theta}, t-B)\}$ are the input variables for 3-D fuzzy modeling. A 3-D fuzzy model can describe a DPS with the following 3-D fuzzy linguistic rules.
\[
\begin{aligned}
\bar{R}^{l}:\ & \text{if } u(\bar{\Theta}, t-1) \text{ is } \bar{O}^{1l} \text{ and } \cdots \text{ and } u(\bar{\Theta}, t-B) \text{ is } \bar{O}^{Bl} \\
& \text{and } c_1(t-1) \text{ is } U_1^{1l} \text{ and } \cdots \text{ and } c_1(t-A) \text{ is } U_1^{Al} \\
& \text{and } c_m(t-1) \text{ is } U_m^{1l} \text{ and } \cdots \text{ and } c_m(t-A) \text{ is } U_m^{Al} \\
& \text{then } u(\bar{\Theta}, t) \text{ is } \varphi_l(\bar{\Theta})
\end{aligned}
\tag{5}
\]
where $\varphi_l(\bar{\Theta})$ denotes a spatial function, $U_s^{jl}$ $(s = 1, \cdots, m;\ j = 1, \cdots, A)$ denotes a conventional fuzzy set, $\bar{O}^{kl}$ $(k = 1, \cdots, B)$ denotes a 3-D fuzzy set, and $l = 1, \cdots, N$.
As shown in Figure 3, the antecedent part of $\bar{R}^{l}$ is employed to calculate the temporal coefficients (e.g., $g_i(t)$ in (1)) and the consequent part of $\bar{R}^{l}$ is employed to represent the spatial function (e.g., $\varphi_i(\theta)$ in (1)). The rule $\bar{R}^{l}$ naturally realizes time-space separation as in conventional time-space based DPS modeling [6]. Time-space synthesis (e.g., $\sum_{i=1}^{\infty} g_i(t)\varphi_i(\theta)$ in (1)) is realized by the union of the fired 3-D fuzzy rules. A detailed description of the 3-D fuzzy model is given in [18]. A nonlinear mathematical expression of the 3-D fuzzy model is presented in Appendix A.
To design such a 3-D fuzzy model, expert experience [21][22] has traditionally been applied to design the 3-D fuzzy rules and 3-D membership functions, which involves much work. In recent years, expert-based design has given way to machine-learning based design (also called data-driven design). For instance, in [18] spatial functions are acquired by KL decomposition and all the parameters of the membership functions are optimized by the PSO algorithm; in [16] the antecedent parts of the 3-D fuzzy rules are designed by NN clustering and a similarity measure, and the spatial functions are learned by multiple SVRs. The first method still relies on model reduction since KL decomposition is used. The second method is time consuming and only locally optimal.
From subsection 2.1, it is seen that the traditional SVR can only be used to learn a temporal model and cannot learn a space-time coupled model. In this study, we concentrate on the spatial multivariable SVR, a kind of multivariable SVR with spatial kernel functions, which enables a multivariable SVR to cope with spatiotemporal data sets; we then apply this spatial multivariable SVR to 3-D fuzzy modeling.
3. Spatial multivariable SVR learning based 3-D fuzzy modeling
3.1. Methodology
With the KL decomposition technique, the traditional SVR can be applied to model a DPS. However, the traditional SVR only models the time coefficients. In this study, since the 3-D fuzzy model has the characteristic of space-time separation, we investigate a multivariable SVR with spatial kernel functions (called spatial MSVR), combine this spatial MSVR with the 3-D fuzzy model, and construct a novel spatial MSVR based 3-D fuzzy modeling method. This method integrates two distinct merits: the fuzzy space processing and fuzzy linguistic expression of the 3-D fuzzy model, and the learning capability of the spatial MSVR.
The methodology of the spatial MSVR based 3-D fuzzy modeling is depicted in Figure 4. The proposed 3-D fuzzy modeling method has the following features.
(1) The 3-D fuzzy model naturally formulates spatial fuzzy basis functions (SFBFs) [23], which satisfy Mercer's theorem.
(2) The SFBFs from the 3-D fuzzy model are used as spatial kernel functions (SKFs) for the spatial MSVR.
(3) Trained with a set of spatiotemporal data, the spatial MSVR produces spatial support vectors and Lagrange multipliers.
(4) The spatial support vectors and Lagrange multipliers are directly used to design the 3-D fuzzy rules of the 3-D fuzzy model.
3.2. Theory of spatial multivariable SVR learning based 3-D fuzzy modeling
3.2.1. Spatial multivariable SVR
For the nonlinear DPS described in Section 2, suppose we have a spatiotemporal training set $S$ as follows.
\[ S = \{x_\theta^t, u_\theta^t\} \quad (t = 1, \cdots, L) \tag{6} \]
where
\[ u_\theta^t = u(\bar{\Theta}, t) = [u_{\theta_1}^t, \cdots, u_{\theta_P}^t]' = [u(\theta_1, t), \cdots, u(\theta_P, t)]' \in \mathbb{R}^{P} \]
\[ u_{\theta_i}^t = u(\theta_i, t) \]
\[ x_\theta^t = x(\bar{\Theta}, t) = [u(\bar{\Theta}, t-1)', \cdots, u(\bar{\Theta}, t-B)', c_1(t-1), \cdots, c_1(t-A), c_m(t-1), \cdots, c_m(t-A)]' \in \mathbb{R}^{PB+mA} \]
Figure 4: Methodology of spatial MSVR based 3-D fuzzy modeling
A spatial MSVR seeks a function vector $f(x_\theta, w) = [f_1(x_\theta, w_1), \cdots, f_P(x_\theta, w_P)]'$ that deviates from the target vector by at most $\varepsilon$ for all training samples in $S$. The function vector $f(x_\theta, w)$ can be described as follows.
\[ f(x_\theta, w) = \langle w, \psi(x_\theta) \rangle_{\wp} + b \tag{7} \]
where $w = [w_1, \cdots, w_P]'$, $w_i = [w_{i1}, \cdots, w_{i,PB+mA}]'$, $i = 1, \cdots, P$, $b = [b_1, \cdots, b_P]'$, and $\psi(\cdot): \mathbb{R}^{PB+mA} \to \wp$ is a mapping from the input space to a feature space. Eq. (7) can be written in its component form as below.
\[
f(x_\theta, w) =
\begin{bmatrix} f_1(x_\theta, w_1) \\ \vdots \\ f_P(x_\theta, w_P) \end{bmatrix}
=
\begin{bmatrix} \langle w_1, \psi(x_\theta) \rangle_{\wp} + b_1 \\ \vdots \\ \langle w_P, \psi(x_\theta) \rangle_{\wp} + b_P \end{bmatrix}
\tag{8}
\]
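The spatial MSVR problem can be formulated as the following optimization problem.
\[ \min_{w, b}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \sum_{j=1}^{P} \left[ f_j(x_\theta^i, w_j) - u_{\theta_j}^i \right]_{\varepsilon_j} \tag{9} \]
where $C$ denotes a parameter determining the trade-off between the approximation error and the complexity of $f(x_\theta, w)$; $[\cdot]_{\varepsilon_j}$ denotes an $\varepsilon$-insensitive loss function [19] defined as below.
\[
\left[ f_j(x_\theta^i, w_j) - u_{\theta_j}^i \right]_{\varepsilon_j} =
\begin{cases}
0 & \text{if } \left| f_j(x_\theta^i, w_j) - u_{\theta_j}^i \right| \le \varepsilon_j \\
\left| f_j(x_\theta^i, w_j) - u_{\theta_j}^i \right| - \varepsilon_j & \text{otherwise}
\end{cases}
\]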
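To make the objective in (9) concrete, the following minimal sketch evaluates the ε-insensitive loss and the resulting cost for given predictions. It assumes the standard absolute-value form of the loss from [19]; the function names and the argument `w_norm_sq` (the value of the squared weight norm, supplied by the caller) are hypothetical.

```python
import numpy as np

def eps_insensitive_loss(f, u, eps):
    """Element-wise loss: 0 inside the tube |f - u| <= eps, |f - u| - eps outside it."""
    return np.maximum(np.abs(np.asarray(f) - np.asarray(u)) - eps, 0.0)

def msvr_objective(w_norm_sq, F, U, eps, C):
    """Cost of (9) for predictions F and targets U, both of shape (L, P).

    eps may be a scalar or a length-P vector of per-output tolerances.
    """
    return 0.5 * w_norm_sq + C * eps_insensitive_loss(F, U, eps).sum()
```

Errors smaller than the tolerance contribute nothing to the cost, which is why only a subset of the training samples ends up with nonzero multipliers after training.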
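Using a kernel $K(\cdot, \cdot): \mathbb{R}^{PB+mA} \times \mathbb{R}^{PB+mA} \to \mathbb{R}$, we have the following computation of a scalar product:
\[ w = \sum_{i} w_i K(x_\theta^i, \cdot) \tag{10} \]
\[ \psi(x) = K(x_\theta^i, \cdot) \tag{11} \]
\[ \langle w, \psi(x) \rangle_{\wp} = \sum_{i} w_i K(x_\theta^i, x_\theta) \tag{12} \]
Using the Lagrange multipliers $\alpha_k^j$ and $\alpha_k^{j*}$ and then utilizing the saddle point condition, the dual optimization problem of (9) is formulated as follows.
\[
\max_{\alpha_k^j,\, \alpha_k^{j*}}\
-\frac{1}{2} \sum_{k=1}^{P} \sum_{j,l=1}^{L} (\alpha_k^j - \alpha_k^{j*})(\alpha_k^l - \alpha_k^{l*}) K(x_\theta^j, x_\theta^l)
+ \sum_{k=1}^{P} \sum_{j=1}^{L} (\alpha_k^j - \alpha_k^{j*})\, u_{\theta_k}^j
- \sum_{k=1}^{P} \sum_{j=1}^{L} (\alpha_k^j + \alpha_k^{j*})\, \varepsilon_k
\tag{13}
\]
subject to
\[ \sum_{j=1}^{L} (\alpha_k^j - \alpha_k^{j*}) = 0, \quad 0 \le \alpha_k^j \le C, \quad 0 \le \alpha_k^{j*} \le C, \quad k = 1, 2, \ldots, P \]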
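After the problem in (13) is solved, the optimal weight vector $w$ and the optimal bias $b$ are derived as in (14).
\[
w = \sum_{i=1}^{L} (\alpha^i - \alpha^{i*})\, \psi(x_\theta^i), \qquad
b = \frac{1}{L} \sum_{i=1}^{L} \left( u_\theta^i - w \cdot \psi(x_\theta^i) \right)
\tag{14}
\]
where $\alpha^i = [\alpha_1^i, \cdots, \alpha_P^i]'$ and $\alpha^{i*} = [\alpha_1^{i*}, \cdots, \alpha_P^{i*}]'$.
Finally, we have the best regression hyper-surface as follows
\[
f(x_\theta, w) = \sum_{i=1}^{L} (\alpha^i - \alpha^{i*}) K(x_\theta^i, x_\theta) + b
= \sum_{i \in SV} (\alpha^i - \alpha^{i*}) K(x_\theta^i, x_\theta) + b
\tag{15}
\]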
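The evaluation of (15) can be sketched in a few lines once the multipliers are available. The example below is illustrative only: it uses a Gaussian kernel as a stand-in, whereas the method described here uses the spatial fuzzy basis function of Appendix A as the kernel, and the names `gaussian_kernel`, `msvr_predict`, and `tol` are hypothetical. Training samples whose multiplier difference is zero are skipped, since they do not contribute to the sum.

```python
import numpy as np

def gaussian_kernel(xi, x, width=1.0):
    # Stand-in kernel (an assumption of this sketch, not the SFBF of Appendix A)
    return np.exp(-np.sum((xi - x) ** 2) / (2.0 * width ** 2))

def msvr_predict(x, X_train, alpha, alpha_star, b, kernel=gaussian_kernel, tol=1e-8):
    """Evaluate (15) at input x.

    X_train           : (L, d) training inputs
    alpha, alpha_star : (L, P) multipliers, row i holds the values for sample i
    b                 : (P,) bias vector
    Returns the P-dimensional prediction and the indices of the contributing samples.
    """
    beta = alpha - alpha_star
    # Samples with a nonzero multiplier difference are the only ones that contribute
    sv = np.flatnonzero(np.abs(beta).max(axis=1) > tol)
    k = np.array([kernel(X_train[i], x) for i in sv])
    return beta[sv].T @ k + b, sv
```

Each output component shares the same kernel evaluations and differs only in its column of multiplier differences and its bias.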
The sample $x_\theta^i$ in $S$ corresponding to a nonzero $(\alpha^i - \alpha^{i*})$ is defined as a spatial support vector (SV). $K(x_\theta^i, x_\theta)$ is a kernel function of the space variable. Since it is capable of handling spatial information, we call it a spatial kernel function (SKF). As in the traditional SVR, the SKF of the spatial MSVR should satisfy Mercer's theorem [24]. The spatial fuzzy basis function $g_l(t)$ (shown in (A.2) of Appendix A) of a 3-D fuzzy model is a spatial function satisfying Mercer's theorem, which is proven in Appendix B. Thus, $g_l(t)$ can be used as the SKF for a spatial MSVR. A three-layer network structure concisely depicts the working mechanism of a spatial MSVR, as shown in Figure 5.
Figure 5: Three-layer network structure of spatial MSVR
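3.2.2. Equivalence theory
We rewrite the decision function of a spatial MSVR shown in (15) as (16).
\[
f(x_\theta, w) =
\begin{bmatrix} f_1(x_\theta, w) \\ \vdots \\ f_P(x_\theta, w) \end{bmatrix}
=
\begin{bmatrix}
\sum_{i \in SV} (\alpha_1^i - \alpha_1^{i*}) K(x_\theta^i, x_\theta) + b_1 \\
\vdots \\
\sum_{i \in SV} (\alpha_P^i - \alpha_P^{i*}) K(x_\theta^i, x_\theta) + b_P
\end{bmatrix}
\tag{16}
\]
We also rewrite the nonlinear mathematical expression of a 3-D fuzzy model, given as (A.1) in Appendix A, with bias terms as (17).
\[
u(\bar{\Theta}, t) =
\begin{bmatrix} u(\theta_1, t) \\ \vdots \\ u(\theta_P, t) \end{bmatrix}
=
\begin{bmatrix}
\sum_{l=1}^{N} \varphi_l(\theta_1)\, g_l(t) + b_0^1 \\
\vdots \\
\sum_{l=1}^{N} \varphi_l(\theta_P)\, g_l(t) + b_0^P
\end{bmatrix}
\tag{17}
\]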
Comparing (16) with (17) and Figure 5 with Figure 17 in Appendix A, and letting $K(x_\theta^i, x_\theta) = g_i(t)$, $\alpha_j^i - \alpha_j^{i*} = \varphi_i(\theta_j)$, and $b_j = b_0^j$, we have the equivalence relationship below.
\[ f(x_\theta, w) = u(\bar{\Theta}, t) \tag{18} \]
Eq. (18) shows that the decision function of a spatial MSVR is equal to a 3-D fuzzy model when three conditions are satisfied. These conditions are easy to satisfy. The first condition is to make the SFBFs of the 3-D fuzzy model the SKFs of the spatial MSVR; the spatial support vectors are then used to generate the centers of the membership functions in the antecedent parts of the 3-D fuzzy rule-base. The second condition is to produce the spatial functions in the consequent parts of the 3-D fuzzy rule-base from the Lagrange multipliers. The third condition is to balance the bias terms of the spatial MSVR by adding an additional 3-D fuzzy rule $\bar{R}^{0}$. Utilizing the equivalence relationship, the problem of designing a 3-D fuzzy model is transferred to the problem of training a spatial MSVR.
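As a worked check of the equivalence, row $j$ of (16) can be compared with row $j$ of (17) under the substitutions stated above. The sketch assumes that the number of rules $N$ equals the number of spatial support vectors and that the index $l$ simply enumerates them; the re-indexing symbol $i_l$ is introduced here for illustration only.

```latex
\begin{aligned}
f_j(x_\theta, w)
  &= \sum_{i \in SV} (\alpha_j^{i} - \alpha_j^{i*})\, K(x_\theta^{i}, x_\theta) + b_j \\
  &= \sum_{l=1}^{N} (\alpha_j^{i_l} - \alpha_j^{i_l *})\, K(x_\theta^{i_l}, x_\theta) + b_j \\
  &= \sum_{l=1}^{N} \varphi_l(\theta_j)\, g_l(t) + b_0^{j}
   = u(\theta_j, t), \qquad j = 1, \cdots, P
\end{aligned}
```

Stacking the $P$ rows gives (18).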
3.3. Procedure of spatial MSVR based 3-D fuzzy modeling
The procedure of the proposed spatial MSVR based 3-D fuzzy modeling consists of the following steps, which are also shown in Figure 6.
Step 1: Set the components of a 3-D fuzzy model, including the type of 3-D fuzzifier, the format of the 3-D fuzzy rules, the type of 3-D membership function, the fuzzy operator, the type of defuzzifier, and so on.
Step 2: Produce SFBFs (e.g., see (A.2) in Appendix A) from the 3-D fuzzy model.
Step 3: Generate and collect a set of spatiotemporal input-output data from a DPS.
Step 4: Train a spatial MSVR model with SFBF-type SKFs using the collected spatiotemporal data; the resulting spatial support vectors and Lagrange multipliers are used to design the 3-D fuzzy rules. The spatial support vectors are used to design the antecedent parts of the 3-D fuzzy rules (i.e., the centers of the membership functions in (5)), while the Lagrange multipliers are used to design the consequent parts of the 3-D fuzzy rules (i.e., the spatial functions in (5)).
Step 5: Complete the 3-D fuzzy rule-base by combining the antecedent parts and the consequent parts.
Step 6: The 3-D fuzzy model is obtained.
Figure 6: Procedure of spatial MSVR based 3-D fuzzy modeling
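Steps 4 and 5 can be sketched as a simple mapping from the trained multipliers to a rule list. The data layout below (dictionaries with `center` and `consequent` keys) and the function name `build_rule_base` are hypothetical choices made for illustration; the training itself is assumed to have been carried out already.

```python
import numpy as np

def build_rule_base(X_train, alpha, alpha_star, b, tol=1e-8):
    """Turn a trained spatial MSVR into a list of 3-D fuzzy rules.

    X_train           : (L, d) training inputs
    alpha, alpha_star : (L, P) Lagrange multipliers
    b                 : (P,) bias vector
    """
    beta = alpha - alpha_star
    # Spatial support vectors: samples with a nonzero multiplier difference
    sv = np.flatnonzero(np.abs(beta).max(axis=1) > tol)
    rules = [
        {
            "center": X_train[i].copy(),    # antecedent: membership-function centers
            "consequent": beta[i].copy(),   # consequent: spatial function at the P sensors
        }
        for i in sv
    ]
    # Additional rule that carries the bias terms of the spatial MSVR
    bias_rule = {"center": None, "consequent": np.asarray(b, dtype=float)}
    return rules, bias_rule
```

The number of rules equals the number of spatial support vectors, so the size of the rule-base is determined by training rather than chosen in advance.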
4. Application to an RTP system
4.1. RTCVD system
In this work, an important RTP process, rapid thermal chemical vapor deposition (RTCVD) in semiconductor manufacturing [25], is investigated. As shown in Figure 7, the RTCVD system is divided into three heating zones, i.e., Lamp banks 1, 2, and 3. A 6-in silicon wafer is positioned on a rotatable support to guarantee azimuthal temperature uniformity. Silane (SiH4) injected from the top of the reactor decomposes into silicon (Si) and hydrogen (H2). At a temperature close to 1000 K, a 0.5 µm thick polysilicon layer is deposited on the wafer within about one minute. Temperature uniformity along the wafer radius, achieved by controlling the power to Lamp banks 1, 2, and 3, is required to ensure uniform deposition of polysilicon on the wafer. Because the wafer is quite thin and the wafer support rotates, only the temperature variation in the radial direction is considered. A one-dimensional PDE model [26] representing the thermal dynamics in the RTCVD system is described as follows.
\[
\frac{\partial T_d'}{\partial t'} = \kappa_0 \left[ \frac{1}{d'} \frac{\partial T_d'}{\partial d'} + \frac{\partial^2 T_d'}{\partial d'^2} \right] + \sigma_0 \left( 1 - T_d'^4 \right) + \omega_d q_1(d') c_1 + \omega_d q_2(d') c_2 + \omega_d q_3(d') c_3
\tag{19}
\]
subject to the following boundary conditions:
\[
\begin{cases}
\partial T_d' / \partial d' = \sigma_{ed} \left( 1 - T_d'^4 \right) + q_{ed} c_2 & \text{when } d' = 1 \\
\partial T_d' / \partial d' = 0 & \text{when } d' = 0
\end{cases}
\tag{20}
\]
where $T_d' = T_d / T_{amb}$ is the non-dimensional wafer temperature, $T_d$ is the actual wafer temperature, and $T_{amb} = 300$ K is the actual ambient temperature; $t' = t/\tau$ is the non-dimensional time, $t$ is the actual time, and $\tau = 2.9$ s; $d' = d / D_w$ is the non-dimensional radial location, $d$ is the actual radial location, and $D_w = 7.6$ cm is the actual wafer radius; $c_1$, $c_2$, and $c_3$ are the percentages of power of Lamps 1, 2, and 3; $q_1(d')$, $q_2(d')$, and $q_3(d')$ (whose distributions are shown in Figure 8) are the incident radiation fluxes from Lamps 1, 2, and 3 to the wafer, respectively. The parameters in (19)-(20) are listed as follows:
$\kappa_0 = 0.0021$, $\sigma_0 = 0.0012$, $\sigma_{ed} = 0.0037$, $q_{ed} = 4.022$, $\omega_d = 0.0256$.
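One way to integrate (19)-(20) numerically is a method-of-lines discretization: central finite differences in the radial direction and an ODE integrator in time. The sketch below is a rough illustration under several assumptions that are not taken from the paper: the flux profiles in `flux_profiles` are hypothetical placeholders (the real $q_1$, $q_2$, $q_3$ are only given graphically in Figure 8), the lamp powers are treated as fractions, the singular term at $d' = 0$ is replaced by its limit, and explicit Euler is used for time stepping. All function names are hypothetical.

```python
import numpy as np

# Parameters listed for (19)-(20)
KAPPA0, SIGMA0, SIGMA_ED, Q_ED, OMEGA_D = 0.0021, 0.0012, 0.0037, 4.022, 0.0256

def flux_profiles(d):
    """HYPOTHETICAL placeholder shapes for q1, q2, q3 (the true ones are in Figure 8)."""
    return np.stack([np.exp(-(d / 0.4) ** 2),
                     np.exp(-((d - 0.6) / 0.3) ** 2),
                     np.exp(-((d - 1.0) / 0.3) ** 2)])

def rhs(T, c, d):
    """Semi-discrete right-hand side of (19) with the boundary conditions (20)."""
    h = d[1] - d[0]
    edge_flux = SIGMA_ED * (1.0 - T[-1] ** 4) + Q_ED * c[1]
    # Ghost nodes impose dT/dd' = 0 at d' = 0 and dT/dd' = edge_flux at d' = 1
    Te = np.concatenate(([T[1]], T, [T[-2] + 2.0 * h * edge_flux]))
    d1 = (Te[2:] - Te[:-2]) / (2.0 * h)
    d2 = (Te[2:] - 2.0 * Te[1:-1] + Te[:-2]) / h ** 2
    lap = np.empty_like(T)
    lap[1:] = d1[1:] / d[1:] + d2[1:]
    lap[0] = 2.0 * d2[0]  # limit of (1/d') dT/dd' + d2T/dd'2 as d' -> 0
    return KAPPA0 * lap + SIGMA0 * (1.0 - T ** 4) + OMEGA_D * (c @ flux_profiles(d))

def simulate(T0, c, d, dt, steps):
    """Explicit Euler integration in non-dimensional time with constant lamp powers c."""
    T = np.array(T0, dtype=float)
    c = np.asarray(c, dtype=float)
    for _ in range(steps):
        T = T + dt * rhs(T, c, d)
    return T
```

For example, `simulate(np.ones(11), [0.3, 0.3, 0.3], np.linspace(0.0, 1.0, 11), dt=0.05, steps=200)` advances an initially ambient wafer on an 11-point radial grid. Explicit Euler is chosen only for brevity; a stiff ODE solver would be a more robust choice for long simulations.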
Figure 7: Sketch for an RTP system
Figure 8: Radiation flux distribution of Lamp banks 1, 2, and 3
4.2. Spatial MSVR based 3-D fuzzy modeling
The components of the 3-D fuzzy model were set as described in Appendix A. The SFBFs from the 3-D fuzzy model were written as in (A.2). The PDE model represented by (19)-(20) was solved by the method of lines [27] to simulate the RTP system. Ten-percent disturbance signals were added to the manipulated variables $c_1$, $c_2$, and $c_3$ to generate a set of spatiotemporal data. Eleven sensors were located uniformly along the wafer radius for measuring the temperature. The data-collection interval was 0.5 s, the simulation duration was 7000 s, and 14000 samples were collected. After that, 600 samples and 300 samples were picked at random from the 14000 samples for training and testing, respectively. For the sake of simplicity, $A$ and $B$ were both chosen as 1. Accordingly, the data set $S$ was rewritten as below.
\[
S = \{x^k, u_\theta^k\} = \{(x_\theta^k, x_c^k), u_\theta^k \mid x_\theta^k \in \mathbb{R}^{11 \times 1},\ x_c^k \in \mathbb{R}^{1 \times 3},\ u_\theta^k \in \mathbb{R}^{11 \times 1},\ k = 1, \cdots, 600\}
\]
where
\[ x^k = (x_\theta^k, x_c^k) \]
\[ x_\theta^k = [u(\bar{\Theta}, k-1)] \]
\[ x_c^k = [c_1(k-1), c_2(k-1), c_3(k-1)] \]
\[ u_\theta^k = u(\bar{\Theta}, k) = [u(\theta_1, k), \cdots, u(\theta_{11}, k)]' \]
The spatial MSVR with SFBF-type spatial kernel functions was trained on $S$. Using five-fold cross-validation, $\varepsilon$ and $C$ of the spatial MSVR were chosen as 0.05 and 125, respectively. After training, 163 spatial support vectors were obtained and 163 3-D fuzzy rules were then formulated. For example, the first three 3-D fuzzy rules were extracted as below.
$\bar{R}^{1}$: if $u(\bar{\Theta}, t-1)$ is Positive Medium and $c_1(t-1)$ is Positive Large and $c_2(t-1)$ is more than Zero and $c_3(t-1)$ is Positive Large, then $u(\bar{\Theta}, t)$ is [-124.997 -0.006 -71.178 -123.038 -0.006 -0.014 -0.002 -0.012 -124.993 -124.997 -124.998];
$\bar{R}^{2}$: if $u(\bar{\Theta}, t-1)$ is Positive Large and $c_1(t-1)$ is less than Positive Large and $c_2(t-1)$ is more than Zero and $c_3(t-1)$ is Positive Large, then $u(\bar{\Theta}, t)$ is [0.003 14.391 124.994 39.913 108.550 124.997 124.998 0.016 124.976 124.996 124.995];
$\bar{R}^{3}$: if $u(\bar{\Theta}, t-1)$ is more than Zero and $c_1(t-1)$ is more than Positive Small and $c_2(t-1)$ is more than Zero and $c_3(t-1)$ is very Positive Small, then $u(\bar{\Theta}, t)$ is [-0.001 -55.686 -65.873 -0.005 -0.00219 -0.014 -0.011 -36.883 0.008 -95.155 -90.552];
Furthermore, the first six spatial functions were plotted in Figure 9.
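The construction of the data set $S$ described earlier in this subsection (one-step regressors with $A = B = 1$, followed by a random selection of 600 training and 300 test samples) might be sketched as follows. The function name `build_dataset`, the fixed random seed, and the choice to draw disjoint training and test sets are assumptions of this sketch; the text only states that the samples were picked at random.

```python
import numpy as np

def build_dataset(U, C, n_train=600, n_test=300, seed=0):
    """Assemble regressors x^k = [u(k-1), c(k-1)] and targets u(k).

    U : (K, 11) temperatures at the 11 sensors for K time samples
    C : (K, 3) lamp powers c1, c2, c3 for the same samples
    """
    X = np.hstack([U[:-1], C[:-1]])   # inputs use the values from step k-1
    Y = U[1:]                         # targets are the sensor readings at step k
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))[: n_train + n_test]  # disjoint draws (assumption)
    tr, te = idx[:n_train], idx[n_train:]
    return (X[tr], Y[tr]), (X[te], Y[te])
```

Each regressor row has 14 entries (11 lagged temperatures and 3 lagged lamp powers) and each target row has 11 entries, matching the dimensions stated for $x_\theta^k$, $x_c^k$, and $u_\theta^k$.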
Figure 9: Spatial distribution of spatial functions from the first six 3-D fuzzy rules
Therefore, a 3-D fuzzy model was established. For short, the spatial MSVR based 3-D model is written as the MSVR 3-D model. Figure 10 and Figure 11 show the predicted output error of the MSVR 3-D model on the training data and the test data, respectively. Over the space domain, the maximal predicted output errors on the training data and the test data were 2.4533 and 2.1683, respectively. The MSVR 3-D model can satisfactorily approximate the spatiotemporal system. The following performance index (root of the mean squared error, RMSE) was used to assess the modeling performance, as shown in Table 1.
\[ RMSE = \left( \int \sum e(\theta, t)^2 \, d\theta \Big/ \int d\theta \sum \Delta t \right)^{1/2} \]
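For sampled data, the RMSE index can be approximated directly from the error array. The sketch below assumes uniformly spaced sensors, so that the spatial integrals reduce to averages over the sensor locations, and it interprets $\sum \Delta t$ as the number of time samples; both are assumptions of this illustration, and the function name `rmse` is hypothetical.

```python
import numpy as np

def rmse(E):
    """Root of the mean squared error for an error array E of shape (P, L).

    Rows are sensor locations and columns are time samples. With a uniform
    sensor grid and unit sample weight the index reduces to a plain mean.
    """
    return float(np.sqrt(np.mean(np.asarray(E, dtype=float) ** 2)))
```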
Figure 10: Predicted output error of MSVR 3-D model on training data
Figure 11: Predicted output error of MSVR 3-D model on test data

4.3. Method comparison
Four modeling methods were considered for comparison. The first method was the conventional KL decomposition and SVR based DPS modeling (abbreviated as KL-SVR) [12], where 11 spatial functions were estimated by KL decomposition and the temporal coefficients were identified by least-squares SVR. The second method was the spatiotemporal least-squares support vector machine modeling (abbreviated as ST-LS-SVM) [28], where the nonlinear spatial correlation was described by spatial kernel functions and the time dynamics were represented by time Lagrange multipliers, so that an ST-LS-SVM model is a linear combination of products of time Lagrange multipliers and spatial kernel functions. In this application, since 11 sensors were employed, 11 Gaussian-type spatial kernel functions were used. The third method was the conventional KL decomposition and least-squares (LS) based DPS modeling method (abbreviated as KL-LS), where 11 spatial functions were acquired by KL decomposition and the time coefficients were identified by the LS algorithm. The fourth method was the nearest neighborhood clustering and SVR based 3-D fuzzy modeling (abbreviated as NNC-SVR 3-D model) [16], where the antecedent parts of the rules were designed by NN clustering and a similarity measure, and the spatial functions were estimated by multiple traditional SVRs. One can refer to Ref. [16] for the detailed design of the NNC-SVR 3-D model.

Table 1: Performance comparison of five modeling methods (RMSE)
Model                     Training data   Test data
MSVR 3-D model            0.8969          0.8703
KL-SVR model [12]         0.9327          0.8899
ST-LS-SVM model [28]      0.9436          0.9264
KL-LS model               2.4481          2.2903
NNC-SVR 3-D model [16]    1.2862          1.2214

In both the conventional KL-SVR DPS modeling method and the KL-LS DPS modeling method, the spatial functions were obtained by KL decomposition techniques. Since 11 sensors were used for measurement in this application, 11 spatial functions were estimated. The spatial distribution of the first six spatial functions is depicted in Figure 12. Two further performance indices were employed to assess these five modeling methods. The first one is the relative $L_2$-norm error (RLNE), given as follows.
$$\mathrm{RLNE}(t) = \left( \int e(\theta, t)^2 \, d\theta \right)^{1/2} \Big/ \left( \int u(\theta, t)^2 \, d\theta \right)^{1/2}$$
The second one is the temporal normalized absolute error (TNAE), given as follows.
$$\mathrm{TNAE}(\theta) = \sum |e(\theta, t)| \Big/ \sum \Delta t$$
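The two indices above can be evaluated on sampled data along the following lines. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, the trapezoidal approximation of the spatial integrals, the hypothetical grid of 11 sensors on [0, 8], the synthetic output field, and the reading of $\sum \Delta t$ as the number of time samples are all illustrative choices.

```python
import numpy as np


def integrate_space(values, theta):
    """Trapezoidal rule along axis 0 (space) of a (P, T) array; returns shape (T,)."""
    widths = np.diff(theta)[:, None]  # (P-1, 1) spacings between neighbouring sensors
    return (0.5 * (values[1:] + values[:-1]) * widths).sum(axis=0)


def rlne(error, output, theta):
    """Relative L2-norm error for every time sample; returns shape (T,)."""
    error_norm = np.sqrt(integrate_space(error ** 2, theta))
    output_norm = np.sqrt(integrate_space(output ** 2, theta))
    return error_norm / output_norm


def tnae(error):
    """Temporal normalized absolute error for every sensor; returns shape (P,)."""
    n_steps = error.shape[1]  # assumption: sum of Delta t is the number of time samples
    return np.abs(error).sum(axis=1) / n_steps


# Hypothetical data: a smooth synthetic output field and a small synthetic error.
theta = np.linspace(0.0, 8.0, 11)
t = np.arange(200)
output = 1000.0 + 5.0 * np.outer(np.cos(theta / 4.0), np.sin(0.05 * t))
rng = np.random.default_rng(1)
error = rng.normal(scale=0.9, size=output.shape)

print(rlne(error, output, theta).shape)  # one value per time sample
print(tnae(error).shape)                 # one value per sensor
```

RLNE is indexed by time and TNAE by sensor location, which is why the two indices are plotted against different horizontal axes.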
Figure 12: Spatial distribution of spatial functions estimated from KL decomposition (legend: spatial function 1 to spatial function 6)

Figure 13 and Figure 15 show the RLNE on the training set and the test set, respectively. Figure 14 and Figure 16 show the TNAE on the training set and the test set, respectively. Table 1 shows the comparison of RMSE.
From Figures 13-16 and Table 1, it can be seen that the proposed MSVR 3-D modeling method was superior to the KL-SVR modeling method, the ST-LS-SVM modeling method, the KL-LS modeling method, and the NNC-SVR 3-D modeling method. Specifically, the MSVR 3-D modeling method outperformed the KL-SVR modeling method and the KL-LS modeling method because both of them rely on model reduction, which is sure to cause unmodeled dynamics and uncertainty. The MSVR 3-D modeling method outperformed the ST-LS-SVM modeling method since the SFBFs of the proposed MSVR 3-D model fuse the functions of spatial fuzzy information processing and spatial fuzzy inference. The modeling performance of the MSVR 3-D model was also better than that of the NNC-SVR 3-D model, because the spatial MSVR in an MSVR 3-D model is directly employed to learn 3-D fuzzy rules in the sense of global optimization.
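To make the time-space separation step behind the KL-based baselines more concrete, the sketch below extracts spatial functions from a snapshot matrix with a singular value decomposition and measures the reconstruction error left by keeping only a few of them. It is an illustration under assumptions, not the procedure of Ref. [12]: the function names, the synthetic snapshot data, and the use of a plain SVD without mean removal are hypothetical choices, and the temporal identification by SVR or LS is not reproduced.

```python
import numpy as np


def kl_decompose(snapshots, n_modes):
    """Split a (P, T) snapshot matrix into spatial functions and temporal coefficients.

    Returns spatial functions of shape (P, n_modes) and coefficients of shape (n_modes, T).
    """
    left, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    spatial = left[:, :n_modes]        # orthonormal empirical eigenfunctions
    temporal = spatial.T @ snapshots   # projection onto the orthonormal basis
    return spatial, temporal


def truncation_rmse(snapshots, n_modes):
    """RMS reconstruction error when only n_modes spatial functions are kept."""
    spatial, temporal = kl_decompose(snapshots, n_modes)
    residual = snapshots - spatial @ temporal
    return float(np.sqrt(np.mean(residual ** 2)))


# Hypothetical snapshots: two smooth spatial patterns plus small measurement noise.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 8.0, 11)
t = np.arange(300)
snapshots = (
    np.outer(np.cos(theta / 3.0), np.sin(0.02 * t))
    + 0.3 * np.outer(theta / 8.0, np.cos(0.05 * t))
    + 0.01 * rng.normal(size=(11, 300))
)

for n_modes in (1, 2, 11):
    print(n_modes, truncation_rmse(snapshots, n_modes))
```

Keeping fewer spatial functions than sensors leaves a residual that the reduced model cannot represent, while keeping as many spatial functions as sensors reproduces the snapshots up to rounding; in that case any remaining model error comes from the temporal identification step.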
Figure 13: RLNE comparison of five modeling methods on training data (vertical axis: RLNE(t)(%), horizontal axis: t; curves: KL-LS, NNC-SVR 3-D, ST-LS-SVM, KL-SVR, MSVR 3-D)
Figure 14: TNAE comparison of five modeling methods on training data (curves: KL-LS, NNC-SVR 3-D, ST-LS-SVM, KL-SVR, MSVR 3-D)
Figure 15: RLNE comparison of five modeling methods on test data (vertical axis: RLNE(t)(%), horizontal axis: t; curves: KL-LS, NNC-SVR 3-D, ST-LS-SVM, KL-SVR, MSVR 3-D)
Figure 16: TNAE comparison of five modeling methods on test data (curves: KL-LS, NNC-SVR 3-D, ST-LS-SVM, KL-SVR, MSVR 3-D)

5. Conclusions
A spatiotemporal 3-D fuzzy modeling approach using spatial multivariable SVR was put forward for modeling unknown nonlinear distributed parameter systems. The proposed modeling method naturally fuses the time-space separation and the time-space synthesis into a 3-D fuzzy model. The SFBFs of the 3-D fuzzy model serve as the spatial kernel functions of the spatial multivariable SVR. Therefore, the proposed spatial MSVR based 3-D fuzzy model integrates the advantages of both the 3-D fuzzy model and the spatial MSVR. On the one hand, it has the capability of linguistic expression and explanation. On the other hand, it has strong learning ability. The proposed modeling method was applied to an RTCVD system. The experimental results indicated that the proposed modeling method achieved the lowest RMSE among the five compared models on both the training data and the test data (Table 1).
Appendix A Mathematical expression of 3-D fuzzy model
In this study, the 3-D fuzzy model contains the following components: singleton-type 3-D fuzzifier, 3-D fuzzy rules as in (4), Gaussian-type membership functions, linear defuzzifier, weighted aggregation dimension reduction, and product t-norm. Through mathematical derivation, the 3-D fuzzy model can be described as in (A.1).
$$u(\bar{\Theta}, t) = \sum_{l=1}^{N} \left[ \sum_{j=1}^{P} a_j^l \prod_{i=1}^{B} \exp\left( -\left( \frac{u(\theta_j, t-i) - r_{ij}^l}{\sigma_{ij}^l} \right)^2 \right) \times \prod_{i=1}^{m} \prod_{k=1}^{A} \exp\left( -\left( \frac{c_i(t-k) - d_i^{kl}}{\delta_i^{kl}} \right)^2 \right) \right] \phi_l(\bar{\Theta}) = \sum_{l=1}^{N} g_l(t) \phi_l(\bar{\Theta}) \tag{A.1}$$
$$g_l(t) = \sum_{j=1}^{P} a_j^l \prod_{i=1}^{B} \exp\left( -\left( \frac{u(\theta_j, t-i) - r_{ij}^l}{\sigma_{ij}^l} \right)^2 \right) \times \prod_{i=1}^{m} \prod_{k=1}^{A} \exp\left( -\left( \frac{c_i(t-k) - d_i^{kl}}{\delta_i^{kl}} \right)^2 \right) \tag{A.2}$$
where $r_{ij}^l$ and $\sigma_{ij}^l$ are the center and the width of the 3-D fuzzy set $\bar{O}^{il}$ at the sensing point $\theta_j$; $d_i^{kl}$ and $\delta_i^{kl}$ are the center and the width of the traditional fuzzy set $U_i^{kl}$; $a_j^l$ is the spatial weight of the sensing point $\theta_j$ ($j = 1, \cdots, P$).
$g_l(t)$ is a temporal coefficient in a 3-D fuzzy model [16][18], and is also called a spatial fuzzy basis function (abbreviated as SFBF) in a 3-D fuzzy system [23]. SFBFs integrate the spatial fuzzy information processing and spatial fuzzy inference in a 3-D fuzzy model. Therefore, $g_l(t)$ is a spatial function. Since a 3-D fuzzy model is a linear combination of multiple SFBFs, it can also be depicted by a three-layer network structure as demonstrated in Figure 17, where a fuzzy rule $R^0$ is added to realize the function of the bias terms $b_0^1, \cdots, b_0^p$.
$R^0$: if $u(\bar{\Theta}, t-1)$ is $\bar{O}^{10}$ and $\cdots$ and $u(\bar{\Theta}, t-B)$ is $\bar{O}^{B0}$ and $c_1(t-1)$ is $U_1^{10}$ and $\cdots$ and $c_1(t-A)$ is $U_1^{A0}$ and $\cdots$ and $c_m(t-1)$ is $U_m^{10}$ and $\cdots$ and $c_m(t-A)$ is $U_m^{A0}$, then $u(\bar{\Theta}, t)$ is $b_0(\bar{\Theta})$
where $\bar{O}^{i0}$ ($i = 1, \cdots, B$) denotes a universal 3-D fuzzy set, the grade of which is 1 over $\bar{\Theta}$ for any $u(\bar{\Theta}, t-i)$. $U_s^{p0}$ ($s = 1, \cdots, m$; $p = 1, \cdots, A$) denotes a universal traditional fuzzy set, the grade of which is 1 over the domain of discourse for any $c_s(t-p)$.

Figure 17: Three-layer network structure of 3-D fuzzy model

Appendix B Proof of spatial fuzzy basis function satisfying Mercer theorem
In the theory of SVR, the precondition for a function to be used as a KF is that the function satisfies Mercer theorem. If SFBFs satisfy Mercer theorem, they can serve as spatial KFs in the MSVR. In terms of the mathematical description of the SFBF in (A.2), the SFBF of the 3-D fuzzy model is rewritten as follows.
$$g_l(t) = \sum_{j=1}^{P} a_j^l \prod_{i=1}^{B} \exp\left( -\left( \frac{u(\theta_j, t-i) - r_{ij}^l}{\sigma_{ij}^l} \right)^2 \right) \times \prod_{i=1}^{m} \prod_{k=1}^{A} \exp\left( -\left( \frac{c_i(t-k) - d_i^{kl}}{\delta_i^{kl}} \right)^2 \right) \tag{B.1}$$
Mathematically, the 3-D fuzzy model can be regarded as the linear combination of the SFBFs of all the fired 3-D fuzzy rules. Furthermore, we rewrite (B.1) into (B.2)
$$g_l(t) = \sum_{j=1}^{P} a_j^l \varphi_j^l(t) \tag{B.2}$$
$$\varphi_j^l(t) = \prod_{i=1}^{B} \exp\left( -\left( \frac{u(\theta_j, t-i) - r_{ij}^l}{\sigma_{ij}^l} \right)^2 \right) \times \prod_{i=1}^{m} \prod_{k=1}^{A} \exp\left( -\left( \frac{c_i(t-k) - d_i^{kl}}{\delta_i^{kl}} \right)^2 \right) \tag{B.3}$$
where $\varphi_j^l(t)$ is called a traditional fuzzy basis function (FBF) [29]. Therefore, the SFBF $g_l(t)$ is assembled from multiple traditional FBFs $\varphi_j^l(t)$ ($j = 1, \cdots, P$) with spatial weights. As described in Ref. [30], $\varphi_j^l(t)$ is a KF, which satisfies Mercer theorem. Since a linear combination of KFs with nonnegative coefficients is still a KF, $g_l(t)$ is a KF.
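The argument of Appendix B can be checked numerically on a small example. The sketch below evaluates a Gaussian FBF and a weighted sum of FBFs over the sensing points, builds the Gram matrix obtained when each sample plays the role of a rule center, and inspects its smallest eigenvalue. Everything specific in it is an assumption made for illustration: the function names, the random samples, the feature dimension `d` (standing for the lagged outputs and inputs seen at one sensing point), the widths shared by all rules, and the nonnegative spatial weights. A numerical check on one data set illustrates the property but does not prove it.

```python
import numpy as np


def fbf(x, center, width):
    """Traditional FBF: product of Gaussian memberships for one sensing point."""
    return float(np.exp(-np.sum(((x - center) / width) ** 2)))


def sfbf(inputs, centers, widths, weights):
    """SFBF: spatially weighted sum of FBFs over P sensing points.

    inputs, centers, widths have shape (P, d); weights has shape (P,).
    """
    return sum(a * fbf(x, r, w) for a, x, r, w in zip(weights, inputs, centers, widths))


def gram_matrix(samples, widths, weights):
    """Gram matrix of the SFBF kernel, using every sample as a rule center."""
    n = len(samples)
    gram = np.empty((n, n))
    for p in range(n):
        for q in range(n):
            gram[p, q] = sfbf(samples[p], samples[q], widths, weights)
    return gram


# Hypothetical setting: 11 sensing points, 4 features per point, 40 samples.
rng = np.random.default_rng(0)
n_samples, n_points, d = 40, 11, 4
samples = rng.normal(size=(n_samples, n_points, d))
widths = rng.uniform(0.5, 2.0, size=(n_points, d))   # assumption: shared by all rules
weights = rng.uniform(0.0, 1.0, size=n_points)       # assumption: nonnegative weights

gram = gram_matrix(samples, widths, weights)
min_eigenvalue = np.linalg.eigvalsh(gram).min()
print(min_eigenvalue >= -1e-9)  # True when the Gram matrix is numerically positive semidefinite
```

Nonnegative weights are used because a sum of kernels is guaranteed to remain a kernel only when every coefficient is nonnegative.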
Competing Interests
The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments
This work was supported by the project from the National Science Foundation of China under Grant no. 61273182.
References
[1] P. D. Christofides, Nonlinear and Robust Control of Partial Differential Equation Systems: Methods and Applications to Transport-Reaction Processes. Boston, MA, USA: Birkhäuser, 2001.
[2] A. H. V. Flotow, B. Schaefer, Wave-absorbing controllers for a flexible beam, J. Guid. Control Dyn. 9(6)(2012) 673-680.
[3] L. Lao, M. Ellis, P. D. Christofides, Handling state constraints and economics in feedback control of transport-reaction processes, Journal of Process Control 32(2015) 98-108.
[4] Y. Chen et al., Application studies of activated carbon derived from rice husks produced by chemical-thermal process-a review, Adv. Colloid Interface Sci. 163(1)(2011) 39-52.
[5] W. H. Ray, Advanced Process Control. New York: McGraw-Hill, 1981.
[6] H.-X. Li, C. K. Qi, Modeling of distributed parameter systems for applications-A synthesized review from time-space separation, J. Process Control 20(8)(2010) 891-901.
[7] F. Y. Kuo, I. H. Sloan, Multi-level quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients, Found. Comput. Math. 15(2)(2015) 411-449.
[8] M. L. Wang, C. K. Qi, H. C. Yan, Hybrid neural network predictor for distributed parameter system based on nonlinear dimension reduction, Neurocomputing 171(2016) 1591-1597.
[9] J.-W. Wang, H.-N. Wu, Exponential pointwise stabilization of semi-linear parabolic distributed parameter systems via the Takagi-Sugeno fuzzy PDE model, IEEE Transactions on Fuzzy Systems 26(1)(2018) 155-173.
[10] J.-W. Wang, S.-H. Tsai, H.-X. Li, H.-K. Lam, Spatially piecewise fuzzy control design for sampled-data exponential stabilization of semi-linear parabolic PDE systems, IEEE Transactions on Fuzzy Systems 26(5)(2018) 2967-2980.
[11] Y. C. Jiang, S. Yin, O. Kaynak, Data-driven monitoring and safety control of industrial cyber-physical systems: basics and beyond, IEEE Access 6(2018) 47374-47384.
[12] C. K. Qi, H.-X. Li, X.-X. Zhang, X. C. Zhao, S. Y. Li, F. Gao, Time/Space-Separation-Based SVM Modeling for Nonlinear Distributed Parameter Processes, Industrial & Engineering Chemistry Research 50(1)(2011) 332-341.
[13] C. K. Qi, H.-X. Li, A Karhunen-Loeve decomposition-based Wiener modeling approach for nonlinear distributed parameter processes, Ind. Eng. Chem. Res. 47(12)(2008) 4184-4192.
[14] C. K. Qi, H.-X. Li, A Time/space separation based Hammerstein modeling approach for nonlinear distributed parameter processes, Comput. Chem. Eng. 33(7)(2009) 1247-1260.
[15] H.-X. Li, C. K. Qi, Y. G. Yu, A Spatio-temporal Volterra modeling approach for a class of nonlinear distributed parameter processes, J. Process Control 19(7)(2009) 1126-1142.
[16] X.-X. Zhang, L. R. Zhao, H.-X. Li, S. W. Ma, A Novel Three-Dimensional Fuzzy Modeling Method for Nonlinear Distributed Parameter Systems, IEEE Transactions on Fuzzy Systems 27(3)(2019) 489-501.
[17] X. J. Lu, W. Zou, M. H. Huang, An adaptive modeling method for time-varying distributed parameter processes with curing process applications, Nonlinear Dyn. 82(2015) 865-876.
[18] X.-X. Zhang, Z. Q. Fu, S. Y. Li, T. Zou, B. Wang, A time/space separation-based 3D fuzzy modeling approach for nonlinear spatially distributed systems, International Journal of Automation and Computing 15(1)(2018) 52-65.
[19] V. Vapnik, Statistical Learning Theory. New York, NY, USA: Wiley, 1998.
[20] S. Yin, Y. C. Jiang, Y. Tian, O. Kaynak, A data-driven fuzzy information granulation approach for freight volume forecasting, IEEE Transactions on Industrial Electronics 64(2)(2017) 1447-1456.
[21] H.-X. Li, X.-X. Zhang, S. Y. Li, A three-dimensional fuzzy control methodology for a class of distributed parameter system, IEEE Trans. Fuzzy Syst. 15(3)(2007) 470-481.
[22] X.-X. Zhang, H.-X. Li, B. Wang, S. W. Ma, A hierarchical intelligent methodology for spatiotemporal control of wafer temperature in rapid thermal processing, IEEE Trans. Semicond. Manuf. 30(1)(2017) 52-59.
[23] X.-X. Zhang, Y. Jiang, H.-X. Li, S. Y. Li, SVR learning-based spatiotemporal fuzzy logic controller for nonlinear spatially distributed dynamic systems, IEEE Trans. Neural Netw. Learn. Syst. 24(10)(2013) 1635-1647.
[24] C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining Knowl. Discovery 2(2)(1998) 121-167.
[25] A. Theodoropoulou, R. A. Adomaitis, E. Zafiriou, Model reduction for optimization of rapid thermal chemical vapor deposition systems, IEEE Trans. Semicond. Manuf. 11(1)(1998) 85-98.
[26] R. A. Adomaitis, RTCVD model reduction: A collocation on empirical eigenfunctions approach, Inst. Syst. Res., Univ. Maryland, College Park, MD, USA, Tech. Rep. T.R. 95-64, 1995.
[27] W. E. Schiesser, The Numerical Method of Lines: Integration of Partial Differential Equations. San Diego, CA, USA: Academic Press, 1991.
[28] X. J. Lu, W. Zou, M. H. Huang, A novel spatiotemporal LS-SVM method for complex distributed parameter systems with applications to curing thermal process, IEEE Transactions on Industrial Informatics 12(3)(2016) 1156-1165.
[29] L. X. Wang, A course in fuzzy systems and control. Prentice-Hall, Upper Saddle River, 1997.
[30] Y. Chen, J. Z. Wang, Support vector learning for fuzzy rule-based classification systems, IEEE Trans. Fuzzy Syst. 11(6)(2003) 716-728.