Journal Pre-proof

An adaptive mode convolutional neural network based on bar-shaped structures and its operation modeling to complex industrial processes

Yongjian Wang, Hongguang Li, Chu Qi

PII: S0169-7439(19)30802-0
DOI: https://doi.org/10.1016/j.chemolab.2020.103932
Reference: CHEMOM 103932
To appear in: Chemometrics and Intelligent Laboratory Systems
Received Date: 10 December 2019
Revised Date: 25 December 2019
Accepted Date: 6 January 2020

Please cite this article as: Y. Wang, H. Li, C. Qi, An adaptive mode convolutional neural network based on bar-shaped structures and its operation modeling to complex industrial processes, Chemometrics and Intelligent Laboratory Systems (2020), doi: https://doi.org/10.1016/j.chemolab.2020.103932. © 2020 Published by Elsevier B.V.

Author contributions: Yongjian Wang: Writing - reviewing; conceptualization; visualization. Hongguang Li: Supervision; language improvement. Chu Qi: Validation; language improvement.
An adaptive mode convolutional neural network based on bar-shaped structures and its operation modeling to complex industrial processes

Yongjian Wang1,2, Hongguang Li1*, Chu Qi1
(1. College of Information Science and Technology, Beijing University of Chemical Technology. 2. Department of Chemical and Biomolecular Engineering, University of California, Los Angeles)
*Corresponding author, E-mail: [email protected]
Abstract: Optimal operation modeling plays an important role in complex industrial processes; however, with the increasing complexity and high nonlinearity of industrial processes, it becomes more and more difficult to establish an accurate operation model using first-principles methods. In this paper, an adaptive mode convolutional neural network framework based on bar-shaped structures (BS-AMCNN), a data-driven model, is proposed. First, a bar-shaped structure is designed specifically to handle industrial process data; it transfers the advantages of CNNs in processing image data to the processing of industrial process data. Meanwhile, the convolution windows and pooling windows in the proposed BS-AMCNN algorithm are replaced by translation-only sliding bar-shaped windows. The algorithm can therefore adjust the CNN structure adaptively among three different modes depending on the process status, and the optimal operation model can be obtained with the proposed BS-AMCNN method accordingly. An experiment on a real complex industrial process, the methanol production process, is carried out, which validates the effectiveness of the proposed method. The proposed method is further compared with the traditional CNN method and the back propagation (BP) method, and the results demonstrate its effectiveness.

Keywords: Convolutional neural networks; Bar-shaped; Operation modeling; Adaptive mode; Methanol production process
1. Introduction

With the development of industrial processes, the efficiency of industrial processes has attracted more and more attention [1-2]. High productivity can be achieved through research on industrial processes, and accurate operations are required to ensure optimal production efficiency. Therefore, an appropriate operation model is crucial to this process. Generally, modeling methods can be divided into mechanism modeling methods and data-driven modeling methods [3-4]. The mathematical relationship between variables can be established by analyzing and interpreting the physical and chemical mechanisms of processes using mechanism modeling methods [5]. Glarborg et al. [6] proposed a mechanism model for the gaseous sulfation of alkali hydroxide and alkali chloride; the method relies on a detailed chemical kinetic model for the high-temperature gas-phase interaction between alkali metals, the O/H radical pool, and chlorine/sulfur species. Frenklach et al. [7] proposed a detailed chemical reaction model for the growth of polycyclic aromatic hydrocarbons and soot particle nucleation and growth; the established model could calculate the optical properties of an arbitrary ensemble of soot particles. Jiang et al. [8] proposed a new numerical model to describe the micro-interacting situations between grains and workpiece material in the grinding contact zone to predict the roughness of ground surfaces accurately. However, as the complexity of industrial processes increases, it is almost impossible to build a precise mechanism model. Optimal operating strategies are often manually adjusted by experienced operators, and the optimization of industrial processes can hardly be implemented without a vivid description from the model or advice from experienced operators. Therefore, an accurate optimal operation model is indispensable.
Meanwhile, many researchers have found that with the application of various instrumentation and distributed control systems, industrial data have grown explosively and are increasingly accessible. Industrial process data reflect operation knowledge of the industrial process, which means the data contain abundant operating information. Therefore, many researchers have adopted data-driven modeling methods [9-11]. Data-driven modeling methods are dedicated to mining useful information from the input and output data; the mathematical relationship between the independent and dependent variables can be established using data-driven models [12]. Data-driven modeling methods need less process knowledge than mechanism modeling methods, mainly depending on the collected process data instead. As a result, data-driven modeling has seen increasing popularity compared to traditional mechanism modeling. Depending on whether the object is non-linear, data-driven modeling methods can be divided into the following categories [13]: linear regression methods, fuzzy modeling methods, artificial neural network (NN) methods, etc. The NN is one of the most widely used modeling methods; it represents the mathematical model as a discrete parallel information process and can solve complex nonlinear problems effectively. Since industrial process data are highly non-linear, NNs have been successfully applied to the modeling of complex processes by various researchers [14]. Lee et al. [15] built a hybrid neural network model of a full-scale industrial wastewater treatment process. He et al. [16] proposed a hybrid robust model based on an improved functional link neural network integrated with partial least squares, and the proposed model was successfully applied to predicting key process variables. Ling et al. [17] improved a hybrid particle swarm optimized wavelet neural network for modeling the development of fluid dispensing for electronic packaging.
The modeling problem of industrial processes can be successfully solved by these NN methods: even without an accurate mechanism model, complex industrial process data can be effectively used for modeling. NN-based data-driven methods have also been successfully applied to optimal operation modeling for optimization purposes [18]. Cui et al. [19] proposed operational-pattern optimization in blast furnace pulverized coal injection based on a neural network prediction model. Rangwala et al. [20] used the computing abilities of neural networks to learn and optimize machine operations. Ochoa-Estopier et al. [21] proposed a new methodology for optimizing heat-integrated crude oil distillation systems.
However, there is a serious problem when it comes to modeling industrial process objects using traditional NNs: the neurons are fully connected. When there are several hidden layers in the neural network, the fully connected neurons significantly increase the computational burden. As the time cost grows greatly with the rising amount of computation, NN methods are not suitable for overly complex data modeling. The convolutional neural network (CNN) is a recently developed NN [22], in which sparse sampling and shared weighting are used to reduce the amount of computation rapidly. This structure provides a solution to the computational burden problem.
CNN is a feedforward neural network. With the advancement of hardware devices, especially the use of GPUs, deep learning has attracted more and more attention in academic research and industrial applications [23-24]. For modeling problems in different fields, more and more scholars tend to use deep learning to solve practical problems [25-26]. Deep learning neural networks can extract features from complex high-dimensional data. Four basic deep neural networks are currently available: deep belief networks (DBNs) [27-28], stacked auto-encoders (SAEs) [29-30], recurrent neural networks (RNNs) [31-32] and convolutional neural networks (CNNs) [33-34]. A CNN does not need manually selected features; it only trains the weights to process high-dimensional data, thus achieving good training results. It has been widely used in image processing and feature extraction. Chen et al. [35] proposed a regularized deep feature extraction method to classify hyperspectral images using a convolutional neural network. Wiatowski et al. [36] proposed a mathematical theory of deep convolutional neural networks for feature extraction. Acharya et al. [37] proposed a computer-aided diagnosis system with the CNN method. Data with grid-topology features can be processed well with a CNN, and time-series data can be regarded as one-dimensional grid data with fixed time intervals, so CNNs can also be used to analyze time series. The optimal operation modeling problem is a complex multidimensional nonlinear data modeling problem with time-series characteristics. Considering the advantages mentioned above, CNN is adopted in this paper as a solution to optimal operation modeling for industrial processes with time series.
However, the CNN structure cannot be used to process industrial process data directly. In order to solve this problem, an adaptive mode convolutional neural network modeling framework based on bar-shaped structures (BS-AMCNN) is proposed in this paper, in which the structure of the traditional CNN is modified. The inputs of a traditional CNN are usually a square image of size n × n; image features are extracted by a convolution window sliding in sequence, and the pooling window is also a sliding window translated in sequence. The proposed structure in the BS-AMCNN breaks the input square image into long bar data: the sliding convolution windows and pooling windows are both replaced by long bar translation-only sliding windows. What is more, the operating conditions of a complex industrial process are changing all the time. If only one simple single algorithm is used during the data analysis, some important features hidden in the historical data cannot be extracted, and some unnecessary computation will cost extra calculation time. In our proposed BS-AMCNN method, we adjust the single network structure into three alternative network structures: using the previous-time modeling results as a standard; using ordinary convolutional neural network structures; and increasing the number of filters and the network depth to perform deep-level extraction of local features. The range of the input data selected each time, the length of the convolution windows and the length of the pooling windows are determined to minimize the final error using the trial and error method. Therefore, using the proposed BS-AMCNN model, optimal operational strategies in the industrial process data can be learned and predicted efficiently. In order to validate the performance of the proposed BS-AMCNN, optimal operation modeling simulations on the methanol production process are carried out. Simulation results show that the proposed BS-AMCNN can achieve high accuracy in operation modeling. Some other methods are compared with our proposed BS-AMCNN method, and the results show that the BS-AMCNN method is relatively advanced in operation modeling.
The article consists of the following parts: Section 2 describes the basic CNN structure; Section 3 introduces the steps of the proposed BS-AMCNN model in detail and clearly illustrates the intelligent operation framework; Section 4 gives a case study of the methanol production process using the proposed BS-AMCNN model and proves the effectiveness of the proposed method; Section 5 summarizes this paper.
2. Traditional CNN structure

CNN is a new type of deep learning method. LeCun [38] proposed the original CNN structure to deal with handwritten number recognition problems. The learning performance of the system can be improved by the three core points of CNN technology: sparse interaction, parameter sharing and pooling.
2.1 Sparse interactions

A traditional neural network has a fully connected structure: the connection between inputs and outputs is realized by matrix multiplication. The interaction between input units and output units is reflected by the entries of the parameter matrix, each of which represents the relationship between one corresponding output unit and input unit. The sparse connection of a CNN, however, makes the size of the kernel much smaller than the size of the inputs, thus reducing the dimension and computational complexity of the system parameters. Figure 1(a) describes the full connection between two neuron layers of a traditional neural network, where each neuron unit is connected to every neuron unit in the next layer. Figure 1(b) describes the sparse connection structure in a CNN, in which each neuron unit is connected only to an adjacent subset of neuron units. As shown in Figure 2, although a sparsely connected network structure is adopted, most inputs can still contribute to the outputs after passing through a deep network, since deep units are indirectly connected to all or most inputs at the same time. Sparse connection holds the advantages of a simpler network structure and lower computational complexity, and complex relationships among multiple variables can be effectively extracted through the partial connections between neurons.
Figure 1(a) Fully connected structure
Figure 1(b) Sparsely connected structure
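To make the contrast concrete, the following sketch (with illustrative layer sizes of our own choosing, not taken from the paper) compares the parameter count of a fully connected layer with that of a sparsely connected, weight-shared 1-D convolution:

```python
import numpy as np

# Illustrative sizes (not from the paper): 100 inputs, 100 outputs.
n_in, n_out = 100, 100
dense_params = n_in * n_out          # fully connected: 10,000 weights

# Sparse connection with parameter sharing: one 5-tap kernel,
# reused at every position along the input.
kernel_size = 5
conv_params = kernel_size            # only 5 weights in total

x = np.random.randn(n_in)
w = np.random.randn(kernel_size)
# 'valid' 1-D convolution: each output unit sees only 5 adjacent inputs
y = np.convolve(x, w, mode="valid")
```

The dense layer needs 10,000 parameters, the convolutional one 5, which is the reduction in storage and computation the text refers to.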
Figure 2 The acceptable domain of the CNN architecture

2.2 Parameter sharing
Figure 3(a) Traditional neural network without parameter sharing
Figure 3(b) Neural network with parameter sharing
The weights of the convolution kernel are acquired in the learning process, and they do not change during the convolution process. Figure 3(a) describes a traditional neural network: the purple-red arrow represents the use of the intermediate element of the weight matrix in the fully connected model. This model does not use parameter sharing, so each parameter is used only once. In Figure 3(b), the purple-red arrow indicates the use of the intermediate element of the kernel in the convolution model. Because of parameter sharing, this single parameter is used at all input locations. Parameter sharing reduces the network complexity and the number of parameters that the model needs to store in calculation.
2.3 Pooling

Typical convolutional neural networks mainly consist of three parts: convolution, detection and pooling. Pooling replaces the network output at a given point with an overall statistic of the adjacent locations, thus greatly reducing the dimension of the parameters. The two most common pooling methods, average pooling and maximum pooling, are shown in Figure 4. In Figure 4(a) and 4(b), the entries of the input matrix X are pooled: in maximum pooling the output is the largest value within each pooling window, while in average pooling it is the mean value. The transformation matrix of the pooling processing is represented by W. The input dimension is clearly reduced by the pooling layers; therefore, pooling structures can greatly reduce the computational burden of the network and improve the efficiency of network operation.
Figure 4(a) Maximum pooling
Figure 4(b) Average pooling
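The two pooling statistics can be sketched as follows (a minimal illustration with a made-up 4 × 4 input and non-overlapping 2 × 2 windows; the helper name `pool2x2` is ours):

```python
import numpy as np

# 2x2 pooling over a 4x4 input, stride 2: each output keeps one
# statistic (max or mean) per non-overlapping window.
x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 3., 2.],
              [2., 6., 1., 1.]])

def pool2x2(x, op):
    h, w = x.shape
    out = np.empty((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = op(x[i:i + 2, j:j + 2])
    return out

max_pooled = pool2x2(x, np.max)   # [[4., 5.], [6., 3.]]
avg_pooled = pool2x2(x, np.mean)  # [[2.5, 2.], [2.25, 1.75]]
```

Either way the 4 × 4 input collapses to 2 × 2, which is the dimension reduction described above.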
3. The proposed BS-AMCNN

3.1 The structure of the BS-CNN
Traditional CNN methods are usually used to process image information; however, industrial process data are always in digital format with time series, so traditional CNN methods cannot be directly applied to industrial process modeling. In order to solve this problem, the structure of the traditional CNN has been modified. In traditional CNN methods, the sizes of the input image data, convolution windows and pooling windows are n × n, t_c × t_c and t_p × t_p, respectively, with n > t_c and n > t_p. The amount of data in an industrial process is much larger than the number of variables. The number of associated variables in the industrial process data can be regarded as the width of the input data of the proposed BS-CNN structure, and the amount of industrial process data can be regarded as the total length of the input data. The pixel points of an image are independent of each other; however, the industrial process data are related to each other. When using a CNN to process the industrial process data, convolution windows need to cover all the associated variables. As shown in Figure 5(a), in this paper the number of the associated variables is the same as the width of the convolution windows in the proposed BS-CNN structure. In the pooling structure, the characteristics of the required variables cannot be removed by the pooling process, so the associated variables still need to be retained. As shown in Figure 5(b), the width of the pooling windows is also consistent with the number of associated variables. Considering the information of industrial process data, pooled values are averaged for each variable in the pooling windows to make the results more accurate. The lengths of the convolution windows and pooling windows, l_c and l_p, can both be obtained by the trial and error method. In Figure 5(a) and Figure 5(b), t_c and t_p represent the size of the traditional convolution window and the size of the traditional pooling window, respectively.
Figure 5(a) The proposed convolution window
Figure 5(b) The proposed pooling window
The structure of the BS-CNN is shown in Figure 6. Assume that the initial samples have m rows and s columns, where s represents the number of the input variables and m stands for the number of the initial samples. With the aid of the proposed panning bar-shaped convolution window, the first convolution layer can be obtained; the width of the proposed bar-shaped convolution window is equal to the number of the input attributes s, and the length of the convolution layer follows from the sample length m and the window length l_c. The convolution kernel numbers stand for different sets of shared weights, with which the features are extracted from the input. The result is then put into the first pooling layer. According to the proposed bar-shaped pooling window, the pooled output can be obtained after dimensionality reduction; the width of the proposed bar-shaped pooling window is also equal to the number of the input attributes s, and the length of the pooling layer follows from the window length l_p. After the calculation of several convolution layers and pooling layers, a flat data vector can be obtained from the outputs of the last pooling layer. Then, two fully connected layers are added to the proposed BS-CNN structure. Finally, the final outputs of the proposed BS-CNN are calculated from the output of the last fully connected layer.
Figure 6 The BS-CNN architecture
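A rough sketch of one bar-shaped convolution-plus-pooling pass follows. The shapes, helper names and ReLU placement are our own illustrative choices, not the authors' exact implementation; what the sketch shows is the defining property of the bar-shaped windows, namely that they span all s variables and slide only along the time axis:

```python
import numpy as np

def bar_conv(x, kernels):
    """x: (m, s) process data; kernels: (k, l_c, s) bar-shaped filters."""
    m, s = x.shape
    n_k, l_c, _ = kernels.shape
    out = np.empty((m - l_c + 1, n_k))
    for t in range(m - l_c + 1):          # translation-only sliding
        window = x[t:t + l_c, :]          # window covers every variable
        out[t] = np.maximum(
            0.0, np.tensordot(kernels, window, axes=([1, 2], [0, 1])))
    return out                            # ReLU-activated feature maps

def bar_avg_pool(feats, l_p):
    """Average-pool along the time axis only, stride l_p."""
    m = (feats.shape[0] // l_p) * l_p
    return feats[:m].reshape(-1, l_p, feats.shape[1]).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((40, 7))          # 40 samples, 7 process variables
kernels = rng.standard_normal((8, 4, 7))  # 8 filters, window length l_c = 4
feats = bar_avg_pool(bar_conv(x, kernels), 3)   # pooling length l_p = 3
```

Each filter shares one weight set across all time positions, and the pooled output keeps one averaged value per filter per pooling window, mirroring the averaging rule stated for the bar-shaped pooling windows.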
3.2 The structure of the proposed BS-AMCNN

The BS-CNN structure solves the problem that a CNN cannot be directly applied to industrial process data, and considers all the input variables at the same time. However, the operation statuses in the production process are changing all the time. When the steady-state changes are not obvious, the modeling speed will decrease if all the data are used to build the operation model; when the data change drastically, the fixed network structure cannot extract deep features, which causes underfitting of the segment. To overcome this weakness, we propose a new adaptive mode CNN based on bar-shaped structures (BS-AMCNN). When modeling the industrial process data, the network structures are divided into three different modes for different steady-state data conditions. All boundary values are obtained by combining process knowledge with trial and error algorithms. The three different modes are as follows.
Inheritance mode

If the steady state fluctuates within a small range, we can regard the industrial data as having the same characteristics as the previous data. We do not have to use data with the same features to establish the operation model again; the previously trained model can represent a steady-state process that is essentially unchanged. Equation (1) shows the boundary:

N_IM = N_previous,  if |x_t − x_(t−1)| / |x_(t−1)| ≤ α    (1)

where N_IM stands for the network structure selected under the inheritance mode, N_previous stands for the previous network structure, x_(t−1) represents the previous value of the input variable, x_t represents the current value of the input variable, and α stands for the bound value of the mode, which usually ranges between 0% and 5%.
Normal mode

If the steady state fluctuates between the boundary of the inheritance mode and the boundary of the enhanced mode, we keep the normal structure of the proposed BS-AMCNN. The normal mode is usually the most frequently used mode, and we continue to train the operation model with the proposed BS-AMCNN method using the default number of layers and filters for the CNN. The model can be described as follows:

N_NM = N_normalCNN,  if α < |x_t − x_(t−1)| / |x_(t−1)| ≤ β    (2)

where N_NM represents the network selected under the normal mode, N_normalCNN represents the normal BS-AMCNN structure, in which the number of filters and layers is set as in the normal CNN method, and β stands for the bound value of the enhanced mode, which usually ranges between 5% and 10%.
Enhanced mode

When the steady-state fluctuation is large enough, the normal BS-AMCNN structure cannot extract the deeper features hidden in the process data. Without changing the form of the network, we can only extract the deeper features hidden in the data by increasing the number of layers of the network and the number of filters. The final numbers of layers and filters in the network can be obtained by the trial and error method. The enhanced mode can be described as follows:

N_EM = N*_normalCNN,  if β < |x_t − x_(t−1)| / |x_(t−1)| ≤ γ    (3)

where N_EM represents the network selected under the enhanced mode, N*_normalCNN represents the enhanced BS-AMCNN structure, in which the depth of the network and the number of filters can be adjusted to the optimal state, and γ stands for the upper bound of the enhanced mode, which usually ranges from 10% to 15%. When the data change by more than 15%, we treat the process data as non-steady-state.
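The three-mode rule of Equations (1)-(3) can be sketched as a small selector. The threshold values below are illustrative picks from the ranges quoted in the text (the paper determines them by trial and error combined with process knowledge), and the scalar fluctuation measure is a simplification of the per-variable comparison:

```python
def select_mode(x_t, x_prev, alpha=0.05, beta=0.10, gamma=0.15):
    """Pick the BS-AMCNN mode from the relative steady-state fluctuation."""
    r = abs(x_t - x_prev) / abs(x_prev)   # |x_t - x_(t-1)| / |x_(t-1)|
    if r <= alpha:
        return "inheritance"     # Eq. (1): reuse the previous model
    if r <= beta:
        return "normal"          # Eq. (2): train the default structure
    if r <= gamma:
        return "enhanced"        # Eq. (3): deepen network, add filters
    return "non-steady-state"    # change above gamma: outside all modes
```

For example, a 2% change keeps the inherited model, while a 13% change triggers the enhanced structure.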
3.3 The parameter assignment of the proposed BS-AMCNN

Two computing functions, the activation function and the cost function, are involved in the proposed BS-AMCNN structure. Nonlinearity is introduced through the activation function to deal with nonlinear data. The cost function often affects the experimental results, and a small cost function value indicates a desirable result.
3.3.1 Activation functions

The ReLU function has the advantages of simple calculation, fast convergence and unsaturation, and it is a commonly used activation function. In the proposed BS-AMCNN architecture, the ReLU function is used instead of the traditional sigmoid function, and its characteristics are as follows:

ReLU(z) = z, z > 0;  0, z ≤ 0    (4)

where z represents the value before the activation function. ReLU is a special activation function, which alters negative values to zero while keeping positive values unchanged. The unilateral inhibition of ReLU creates conditions for the sparse connection of the CNN, and the main features of complex objects can also be obtained through the sparse property of the ReLU function.
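Equation (4) is a one-liner in practice; the sample values below are our own illustration of the unilateral inhibition:

```python
import numpy as np

# ReLU as in Eq. (4): negatives are clamped to zero, positives pass
# through unchanged, which produces the sparse activations noted above.
def relu(z):
    return np.maximum(z, 0.0)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(z)   # only the positive entries survive
```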
3.3.2 Cost functions

In a complex neural network structure, the value of the cost function is usually used to evaluate the fit between the predicted value and the target value. For a single sample (x^(i), y^(i)), the cost function can be defined as follows:

J(W, b; x^(i), y^(i)) = || h_(W,b)(x^(i)) − y^(i) ||    (5)

where (x^(1), y^(1)), …, (x^(Q), y^(Q)) represent the samples. Equation (6) describes the whole cost function:

J(W, b) = (1/Q) Σ_(i=1)^(Q) J(W, b; x^(i), y^(i)) + (λ/2) Σ_l Σ_i Σ_j (W_ij^l)^2    (6)

where the unit i in layer l and the unit j in layer l + 1 are related by the weight W_ij^l; λ represents the weight decay parameter; J(W, b) represents the mean of the cost function as a whole. Combining this with the principle of gradient descent, we can obtain the update formulas of W_ij^l and b_i^l in the direction of the negative gradient, as shown in Equations (7) and (8):

W_ij^l := W_ij^l − η ∂J(W, b)/∂W_ij^l    (7)

b_i^l := b_i^l − η ∂J(W, b)/∂b_i^l    (8)

where η represents the learning rate. Since W_ij^l and b_i^l are layered, we can use iterative computation to complete the whole algorithm as long as the partial derivatives of W_ij^l and b_i^l can be obtained.
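A minimal sketch of the weight-decay gradient step in Equations (7)-(8), shown on a single linear layer with a squared-error cost; the data, layer shape and hyperparameter values are assumptions for illustration, not the network used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 4))          # 32 samples, 4 inputs
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.7   # known target mapping

W = np.zeros(4)
b = 0.0
eta, lam = 0.1, 1e-3                      # learning rate eta, decay lambda
for _ in range(500):
    err = X @ W + b - y                   # h_(W,b)(x) - y
    grad_W = X.T @ err / len(y) + lam * W # data gradient + decay term
    grad_b = err.mean()
    W -= eta * grad_W                     # Eq. (7)
    b -= eta * grad_b                     # Eq. (8)
```

After the loop, W and b approach the true coefficients (very slightly shrunk toward zero by the decay term), which is the iterative computation the text describes.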
In summary, because of the proposed bar-shaped input data, bar-shaped convolution windows and bar-shaped pooling windows, industrial process data associated with optimal operating conditions can be extracted using the proposed BS-AMCNN. With the proposed adaptive mode in the BS-AMCNN structure, the optimal operating strategy of industrial processes can be modeled. Each layer of the proposed BS-AMCNN structure can adjust itself for the final task, so effective communication between layers can be achieved. The application procedure of the proposed BS-AMCNN consists of two phases: the off-line training phase and the on-line prediction phase. In the off-line training phase, convolution and pooling processing are carried out repeatedly, and the outputs of the final pooling layer are then put into the fully connected layers to calculate the outputs of the BS-AMCNN. The optimal BS-AMCNN model can be confirmed by changing the lengths of the convolution and pooling windows using the trial and error method. During the on-line prediction phase, real-time process variables are put into the well-trained BS-AMCNN to predict the optimal operation strategies. Based on the descriptions above, the block diagram of the proposed BS-AMCNN method is graphically illustrated in Figure 7.
Figure 7. The application procedure of the proposed BS-AMCNN
4. Case Study

With the development of modern industry, the consumption of non-renewable energy sources such as petroleum is increasing, and alternative renewable energy sources must be found. Methanol is an important industrial product; it is a renewable, green and clean energy source, which can alleviate the problem of oil shortage to a large extent. Due to its complex engineering background, the methanol production process involves many random, fuzzy, uncertain and uncontrollable factors. Improper operation settings tend to reduce product quality and increase material consumption; therefore, it is necessary to monitor the state of complex chemical processes, evaluate the performance of manual operations and adjust operations in a timely and accurate manner. In this paper, we use the proposed BS-AMCNN method to establish the operation model, which will help reduce material consumption and improve production efficiency. Figure 8 shows the flowchart of the methanol production process.
Figure 8 The flow chart of the methanol production process

There are several important variables related to the production, which are shown in Table 1.

Table 1 The most relevant operation variables with target variables

Device Number    Variable Name                                           Variable Unit
FIC3103          The airflow of the factory                              Nm3/h
FIC3106          Hydrogen flow                                           Nm3/h
FIC3503          Crude methanol flow                                     t/h
PIC3302          Outlet pressure of methanol separator                   MPa
PIC3701B         Outlet pressure of purge gas                            MPa
TIC3401          Outlet gas temperature of the first synthetic tower     ℃
TIC3402          Outlet gas temperature of the second synthetic tower    ℃

There are many links in the design of the methanol production process. The main steps are as follows: first, the raw material gas is compressed and fed into the preheater; after the temperature reaches the optimal reaction temperature, the gas is sent to the reactor for synthesis under the action of the catalyst. The obtained products are first cooled by a cooler and then, after reaching a certain temperature, sent to a separator. Crude methanol is separated in the separator, and the raw material gas that has not fully reacted re-enters the cycle to continue the reaction. The main reactions involved are as follows:

CO + 2H2 --catalyst--> CH3OH    (9)

CO2 + 3H2 --catalyst--> CH3OH + H2O    (10)

CO2 + H2 --catalyst--> CO + H2O    (11)
All the above are exothermic reactions. FIC3503 is regarded as the output of the methanol production process in this paper. There are 20000 samples of each variable; 12000 samples are selected to train the BS-AMCNN model, and the remaining 8000 samples are used to test the well-trained model. Figure 9 shows the historical data of the selected samples. The program runs on a computer with an Intel(R) Core(TM) i7-7700HQ CPU and an NVIDIA GeForce GTX 1060 GPU.
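The trial-and-error selection of the window lengths described below amounts to an exhaustive search over candidate values. The sketch uses a hypothetical `evaluate` callback standing in for training a BS-AMCNN and returning its test error; the toy error surface is ours, chosen only so that its minimum sits at the optima reported in the text (l_c = 4, l_p = 3):

```python
def grid_search(candidates_lc, candidates_lp, evaluate):
    """Try every (l_c, l_p) pair and keep the one with the smallest error."""
    best = (None, None, float("inf"))
    for lc in candidates_lc:
        for lp in candidates_lp:
            err = evaluate(lc, lp)      # train a model, return its test error
            if err < best[2]:
                best = (lc, lp, err)
    return best

# Toy stand-in error surface with its minimum at lc = 4, lp = 3.
lc, lp, err = grid_search(range(2, 8), range(2, 6),
                          lambda lc, lp: (lc - 4) ** 2 + (lp - 3) ** 2)
```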
15
350 351 352
Figure 9 The historical data of selected samples

First, the range of the selected input data each time is obtained by the trial-and-error method. The error with different range values is shown in Figure 10. When the range of the selected input data is 40, the final operation model reaches the minimum error of 7.482 × 10⁻⁶. The length of the proposed bar-shaped convolution window is also determined by the trial-and-error method. The error with different length values is shown in Figure 11. When the bar-shaped convolution window size is 4, the final operation model reaches the minimum error of 7.397 × 10⁻⁶. The length of the proposed bar-shaped pooling window is likewise determined by the trial-and-error method in Figure 11. When the size of the bar-shaped pooling window is 3, the final operation model reaches the minimum error of 5.459 × 10⁻⁶. These selected parameters are used in the proposed BS-AMCNN method, with which its optimal result can be obtained.
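As a rough illustration of the bar-shaped windows selected above (convolution length 4, pooling length 3), the following sketch applies a 1-D, translation-only convolution and max pooling along the time axis. The kernel values and the segment length are arbitrary assumptions, not the trained BS-AMCNN weights.

```python
import numpy as np

def bar_convolve(x, kernel):
    """Slide a bar-shaped (1-D) kernel along the time axis only;
    'valid' mode, so the output has len(x) - len(kernel) + 1 points."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def bar_max_pool(x, size):
    """Non-overlapping bar-shaped max pooling along the time axis."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.sin(np.linspace(0.0, 6.0, 40))    # one input segment with range 40
feat = bar_convolve(x, np.ones(4) / 4)   # convolution window of length 4
pooled = bar_max_pool(feat, 3)           # pooling window of length 3
```

Because the window slides only along the time axis, temporal ordering of the process variables is preserved, which is the point of the bar-shaped structure.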
Figure 10 Output errors with different ranges of selected data
Figure 11 Output errors with different sizes of pooling windows and convolution windows
After the parameters are determined, the training sets are fed into the proposed BS-AMCNN algorithm to verify the effectiveness of the method. Figure 12 shows the modeling results and Figure 13 shows the errors of the built model. It is clear that the proposed BS-AMCNN helps to build a good operation model. The trained model can be used in the methanol production process to predict the production of crude methanol.
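The training step can be sketched, in highly simplified form, as gradient descent on a mean-squared-error loss. Here a plain linear readout on synthetic features stands in for the full BS-AMCNN network, so every name and number below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))   # stand-in for pooled feature vectors
w_true = rng.normal(size=12)
y = X @ w_true                   # synthetic target (crude methanol flow stand-in)

w = np.zeros(12)                 # model weights to be learned
lr = 0.01
for _ in range(2000):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE loss
    w -= lr * grad                            # gradient-descent update

mse = float(np.mean((X @ w - y) ** 2))        # training error after fitting
```

A real run would of course backpropagate through the bar-shaped convolution and pooling layers rather than a single linear map; the loop above only shows the shape of the optimization.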
Figure 12 The training results of the crude methanol production
Figure 13 The training errors of the methanol production

Next, we use the test sets to verify the established model; the testing results meet our requirements. Figure 14 shows the test results of the crude methanol production, and Figure 15 shows the errors of the production. Table 2 shows the training and testing results of the proposed BS-AMCNN method.
Figure 14 The test results of the crude methanol production
Figure 15 The test errors of the crude methanol production

Table 2 The training and testing results of the proposed BS-AMCNN method

The proposed BS-AMCNN method | Training errors | Testing errors
Crude methanol production | −4.26 × 10⁻² | −5.95 × 10⁻²

In order to validate the performance of the proposed BS-AMCNN method, we further compare it with two other methods: the traditional CNN method and the Back Propagation (BP) neural network method. The experimental results of the three different optimization operation models are shown in Figure 16, and the errors of the three different models are shown in Figure 17.
Figure 16 The comparison of three different operation modeling methods
Figure 17 The errors of three different operation modeling methods

Table 3 The errors for the crude methanol production with different methods

Methods | Errors
BP | −3.42 × 10⁻²
The traditional CNN | −2.65 × 10⁻²
The proposed BS-AMCNN | −5.95 × 10⁻²
In order to compare the results more precisely, we list the errors in Table 3, which shows that the prediction errors for crude methanol production with the BP method, the traditional CNN method, and the proposed BS-AMCNN method are −3.42 × 10⁻², −2.65 × 10⁻², and −5.95 × 10⁻², respectively. It is demonstrated that the proposed BS-AMCNN method achieves the most accurate operation modeling results with the help of the bar-shaped structures and the adaptive network mode. This kind of long-bar, translation-only sliding window suits time-series industrial process data well.
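The signed values reported in Tables 2 and 3 suggest a mean (rather than absolute) prediction error; a minimal sketch of such a metric, assuming that interpretation, is:

```python
import numpy as np

def mean_error(y_true, y_pred):
    """Signed mean prediction error; a negative value indicates
    that the model under-predicts the target on average."""
    return float(np.mean(np.asarray(y_pred) - np.asarray(y_true)))

# Illustrative values only, not the paper's data.
e = mean_error([1.0, 2.0, 3.0], [0.96, 1.94, 2.98])   # e is about -0.04
```

If the intended metric were instead a mean absolute or squared error, the sign information in the tables would be lost, which is why the signed form is assumed here.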
5. Conclusions
A novel intelligent modeling framework combining an adaptive mode convolutional neural network with bar-shaped structures (BS-AMCNN) is proposed to deal with optimal operation modeling problems in industrial processes. With the aid of the proposed bar-shaped structures, the CNN algorithm can be successfully applied to complex time-sequence industrial process data. Meanwhile, the adaptive mode allows the proposed BS-AMCNN method to cope with time-varying working environments. To prove the effectiveness of the proposed BS-AMCNN, a real methanol production process is taken as the experimental object, and the proposed BS-AMCNN structure is used to solve an optimal operation modeling problem. The BP method and the traditional CNN method are further used to benchmark the performance of the proposed BS-AMCNN method. The results show that the proposed BS-AMCNN method achieves the highest prediction accuracy. Therefore, the proposed BS-AMCNN method can serve as an effective operation modeling tool for industrial processes.
References
[1] Carlini M, Mennuni A, Allegrini E, et al. Energy Efficiency in the Industrial Process of Hair Fiber Depigmentation: Analysis and FEM Simulation[J]. Energy Procedia, 2016, 101: 550-557.
[2] Giacone E, Mancò S. Energy efficiency measurement in industrial processes[J]. Energy, 2012, 38(1): 331-345.
[3] Hameed B H, El-Khaiary M I. Malachite green adsorption by rattan sawdust: Isotherm, kinetic and mechanism modeling[J]. Journal of Hazardous Materials, 2008, 159(2-3): 574-579.
[4] Hill D J, Minsker B S. Anomaly detection in streaming environmental sensor data: A data-driven modeling approach[J]. Environmental Modelling & Software, 2010, 25(9): 1014-1022.
[5] Hameed B H, El-Khaiary M I. Malachite green adsorption by rattan sawdust: Isotherm, kinetic and mechanism modeling[J]. Journal of Hazardous Materials, 2008, 159(2-3): 574.
[6] Glarborg P, Marshall P. Mechanism and modeling of the formation of gaseous alkali sulfates[J]. Combustion & Flame, 2005, 141(1): 22-39.
[7] Frenklach M, Wang H. Detailed Mechanism and Modeling of Soot Particle Formation[J]. 1994, 59(59): 165-192.
[8] Jiang J, et al. Study on micro-interacting mechanism modeling in grinding process and ground surface roughness prediction[J]. International Journal of Advanced Manufacturing Technology, 2013, 67(5-8): 1035-1052.
[9] Wu C L, Chau K W, Li Y S. Predicting monthly streamflow using data-driven models coupled with data-preprocessing techniques[J]. Water Resources Research, 2009, 45(8).
[10] Boets P, Lock K, Messiaen M, et al. Combining data-driven methods and lab studies to analyse the ecology of Dikerogammarus villosus[J]. Ecological Informatics, 2010, 5(2): 133-139.
[11] Nuhic A, Terzimehic T, Soczka-Guth T, et al. Health diagnosis and remaining useful life prognostics of lithium-ion batteries using data-driven methods[J]. Journal of Power Sources, 2013, 239: 680-688.
[12] Zhang X, Liu J, Li B, et al. DONet/CoolStreaming: A data-driven overlay network for live media streaming[J]. Proc IEEE INFOCOM, 2005, 3: 2102-2111.
[13] Han J, Cai Y, Cercone N. Data-driven discovery of quantitative rules in relational databases[J]. IEEE Transactions on Knowledge and Data Engineering, 1993, 5(1): 29-40.
[14] Huuskonen J J, Livingstone D J, Tetko I V. Neural Network Modeling for Estimation of Partition Coefficient Based on Atom-Type Electrotopological State Indices[J]. Journal of Chemical Information & Computer Sciences, 2000, 40(4): 947.
[15] Lee D S, Jeon C O, Park J M, et al. Hybrid neural network modeling of a full-scale industrial wastewater treatment process[J]. Biotechnology & Bioengineering, 2010, 78(6): 670-682.
[16] He Y L, Xu Y, Geng Z Q, et al. Hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) and its application to predicting key process variables[J]. ISA Transactions, 2016, 61: 155-166.
[17] Ling S H, Iu H H C, Leung F H F, et al. Improved Hybrid Particle Swarm Optimized Wavelet Neural Network for Modeling the Development of Fluid Dispensing for Electronic Packaging[J]. IEEE Transactions on Industrial Electronics, 2008, 55(9): 3447-3460.
[18] Beşikçi E B, Arslan O, Turan O, et al. An artificial neural network based decision support system for energy efficient ship operations[J]. Computers & Operations Research, 2016, 66: 393-401.
[19] Cui G M, Hu D F, Ma X. Operational-Pattern Optimization in Blast Furnace PCI Based on Prediction Model of Neural Network[J]. Journal of Iron & Steel Research, 2014.
[20] Rangwala S S, Dornfeld D A. Learning and optimization of machining operations using computing abilities of neural networks[J]. IEEE Transactions on Systems, Man & Cybernetics, 1989, 19(2): 299-314.
[21] Ochoa-Estopier L M, Jobson M, Smith R. Operational optimization of crude oil distillation systems using artificial neural networks[J]. Computers & Chemical Engineering, 2013, 59(5): 178-185.
[22] Lawrence S, Giles C L, Tsoi A C, et al. Face recognition: a convolutional neural-network approach[J]. IEEE Transactions on Neural Networks, 1997, 8(1): 98-113.
[23] Deng L, Yu D. Deep Learning: Methods and Applications[J]. Foundations & Trends in Signal Processing, 2014, 7(3): 197-387.
[24] Liu Z, Luo P, Wang X, et al. Deep Learning Face Attributes in the Wild[J]. 2014: 3730-3738.
[25] Längkvist M, Karlsson L, Loutfi A. A review of unsupervised feature learning and deep learning for time-series modeling[J]. Pattern Recognition Letters, 2014, 42(1): 11-24.
[26] Majumder N, Poria S, Gelbukh A, et al. Deep learning-based document modeling for personality detection from text[J]. IEEE Intelligent Systems, 2017, 32(2): 74-79.
[27] Mohamed A, Dahl G E, Hinton G. Acoustic Modeling Using Deep Belief Networks[J]. IEEE Transactions on Audio, Speech & Language Processing, 2011, 20(1): 14-22.
[28] Basu S, Karki M, Ganguly S, et al. Learning sparse feature representations using probabilistic quadtrees and deep belief nets[J]. Neural Processing Letters, 2017, 45(3): 855-867.
[29] Du B, Xiong W, Wu J, et al. Stacked convolutional denoising auto-encoders for feature representation[J]. IEEE Transactions on Cybernetics, 2017, 47(4): 1017-1027.
[30] Ding Y, Zhang X, Tang J. A Noisy Sparse Convolution Neural Network Based on Stacked Auto-encoders[J]. network, 2017, 2: 1.
[31] Williams R J, Zipser D. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks[J]. Neural Computation, 1989, 1(2): 270-280.
[32] Zhang X Y, Yin F, Zhang Y M, et al. Drawing and recognizing Chinese characters with recurrent neural network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 849-862.
[33] Zhu J, Liao S, Lei Z, et al. Multi-label convolutional neural network based pedestrian attribute classification[J]. Image and Vision Computing, 2017, 58: 224-229.
[34] Kim Y. Convolutional Neural Networks for Sentence Classification[J]. EPrint Arxiv, 2014.
[35] Chen Y, Jiang H, Li C, et al. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks[J]. IEEE Transactions on Geoscience & Remote Sensing, 2016, 54(10): 6232-6251.
[36] Wiatowski T, Bölcskei H. A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction[J]. IEEE Transactions on Information Theory, 2015, PP(99): 1-1.
[37] Acharya U R, Fujita H, Lih O S, et al. Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network[J]. Information Sciences, 2017, 405: 81-90.
[38] Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
An adaptive mode CNN based on bar-shaped structure (BS-AMCNN) is proposed.
The convolution and pooling windows are transformed into bar-shaped structures.
The adaptive mode in BS-AMCNN can suit changeable working conditions.
Optimal operations can be effectively extracted using the proposed BS-AMCNN method.
The proposed method outperforms standard models on a methanol production process.
Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Yongjian Wang
Hongguang Li