Safety control modeling method based on Bayesian network transfer learning for the thickening process of gold hydrometallurgy

Hui Li, Fuli Wang, Hongru Li, Qingkai Wang
PII: S0950-7051(19)30586-6
DOI: https://doi.org/10.1016/j.knosys.2019.105297
Reference: KNOSYS 105297

To appear in: Knowledge-Based Systems

Received date: 4 June 2019; Revised date: 25 November 2019; Accepted date: 27 November 2019

Please cite this article as: H. Li, F. Wang, H. Li et al., Safety control modeling method based on Bayesian network transfer learning for the thickening process of gold hydrometallurgy, Knowledge-Based Systems (2019), doi: https://doi.org/10.1016/j.knosys.2019.105297.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.
Safety control modeling method based on Bayesian network transfer learning for the thickening process of gold hydrometallurgy

Hui Li b, Fuli Wang a,b, Hongru Li b, Qingkai Wang c

a State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China; postal code: 110819
b Information Science and Engineering, Northeastern University, Shenyang, China; postal code: 110819
c BGRIMM Technology Group, Daxing District, Beijing, China

E-mails: [email protected]; [email protected]; [email protected]; [email protected]

The corresponding author is Hui Li (e-mail: [email protected]; postal address: P.O. Box 135, No. 11 St. 3 Wenhua Road, Heping District, Shenyang, Liaoning Province, P.R. China; postal code: 110819).
Abstract: When the data of the target domain are very limited, it is difficult to establish an accurate model for the target problem. For the safety control modeling problem, this paper develops a new Bayesian network (BN) transfer learning strategy for the thickening process of gold hydrometallurgy. First, the safety control modeling problem of this process is analyzed in depth: when the abnormality data are insufficient, the safety control modeling problem is transformed into a BN transfer learning problem. A new BN transfer learning strategy is then proposed, comprising structure and parameters transfer learning methods. For the structure transfer learning, the final structure of the target is determined by integrating the common structural information of multiple sources with the useful information of the target. For the parameters transfer learning, the parameters of the target are obtained by a fusion algorithm that distinguishes the similarity of the multiple sources. Finally, the proposed method is verified on the Asia network and applied to establish the safety control model for the thickening process of gold hydrometallurgy. The simulation results demonstrate that the proposed method is effective and performs better than the traditional modeling method.

Keywords: Bayesian network; Transfer learning; Gold hydrometallurgy; Safety control; Expert knowledge
1. Introduction
As an important technique for refining gold in industrial processes (de Andrade Lima, 2006), hydrometallurgy includes the sub-processes of flotation, concentration, leaching, washing and cementation. Before cyanide leaching, the slurry needs to be concentrated by the thickener and the pressure filter to obtain a high-solid slurry. This stage is called the thickening process, and it is key to guaranteeing the efficiency of the subsequent cyanide leaching. When the characteristics of the raw materials change or the operation strategy is improper, abnormities occur. Because of the high economic value of gold, an abnormity can lead to serious financial losses or even safety threats. Therefore, monitoring and identifying abnormities and designing the corresponding safety control schemes have attracted more and more attention, because such research helps to keep the process running well. The existing fault detection and safety control methods mainly include model-based methods (Jin, Zhang, Jing, & Fu, 2019; L. Li, Luo, Ding, Yang, & Peng, 2019; Tran & Trinh, 2019) and data-driven methods (Y. Liu, Wang, & Chang, 2013; Wu, Wu, Chai, & Sun, 2015; Zhao & Gao, 2017; Zhao & Huang, 2018). However, when no accurate mechanism model is available in the research field, model-based methods do not perform well. Data-driven methods remove this limitation: based on the collected data, the condition of the process can be monitored by data-driven models. These methods assume that the data are sufficient to establish the monitoring model and the safety control model, and most of them have a limited ability to use expert knowledge and operation experience.

As an intelligent machine learning method, the Bayesian network (BN) provides a new way to solve this problem, because it can fuse expert knowledge and data to establish the model effectively. For the safety control problem in the thickening process of gold hydrometallurgy, the paper (H. Li, Wang, & Li, 2017) proposed a BN-based safety control scheme for two common abnormities. Building on those results, the paper (H. Li, Wang, Li, & Wang, 2019) analyzed a third common abnormity and proposed an updating learning scheme for the established BN model. However, the methods in (H. Li, et al., 2017; H. Li, et al., 2019) all assume that sufficient abnormality data are available. Because an abnormity may cause huge losses and safety threats, few factories are willing to create abnormities deliberately in order to collect abnormality data, so collecting a good safety control scheme for every abnormity is difficult. In addition, some abnormities take a long time to develop, which makes collecting abnormality data even harder. When the abnormality data are insufficient, it is difficult to establish an accurate model for the target problem. In this situation, transfer learning and domain adaptation suggest a new angle: by collecting useful information from related sources whose abnormality data have been collected and/or whose models have been established, the target problem can be solved by exploiting this related information. Transfer learning has been applied in various fields and has received extensive attention, for example in prediction (X. Liu, Li, & Chen, 2019; Pereira, Lima, Leite, Gomes, & Machado, 2017; Sun, et al., 2019), classification (Anam & Rushdi, 2019; Talo, Baloglu, Yildirim, & Acharya, 2019) and filtering (Grolman, Bar, Shapira, Rokach, & Dayan, 2016; K. Li & Principe, 2017). Therefore, for the safety control modeling problem in the thickening process of gold hydrometallurgy, when the abnormality data are insufficient, other thickening processes in the same factory and/or in other factories that use hydrometallurgy to produce gold can be considered as related sources of useful information. When the information of the related sources is applied, the differences between sources and target in the relationships among the variables and in the distributions of the parameters must be taken into account. Based on the existing research results (H. Li, et al., 2017; H. Li, et al., 2019) on safety control for the thickening process of gold hydrometallurgy, this paper considers how to establish an effective model for the safety control problem when the abnormality data are insufficient. Because a BN is used to model the target problem in the existing research, the safety control modeling problem is transformed into a BN transfer learning problem.
A survey of transfer learning based on computational intelligence methods is provided in (Lu, et al., 2015). Most existing transfer learning research focuses on neural network models (Salaken, Khosravi, Nguyen, & Nahavandi, 2017; Shin, et al., 2016); studies on BN transfer learning are relatively limited. For BN structure transfer learning, the paper (Luis, Sucar, & Morales, 2009) proposed a new weighted sum of conditional independence measures that combines measures from the target task with those from the auxiliary tasks. The papers (Niculescu-Mizil & Caruana, 2007; Oyen & Lane, 2014) considered BN structure transfer learning for multitask learning based on search-and-score techniques. For BN parameters transfer learning, the paper (Y. Zhou, Hospedales, & Fenton, 2016) proposed an algorithm based on both network and fragment relatedness, in which the problem of heterogeneous relatedness was analyzed and solved. The paper (Luis, et al., 2009) introduced distance-based linear pooling and local linear pooling probability aggregation methods to combine the probability estimates from the target task with those from the auxiliary tasks. However, when measuring the weights of the different sources, that method only considered the influence of the conditional probability table (CPT) entry size and the dataset size, and the fitness of a source to the target domain was ignored. In addition, expert knowledge plays an important role in establishing the model of the target problem, and integrating it into transfer learning is an effective way to improve the accuracy of the model. The paper (Zhu, Yao, & Gao, 2018) transferred qualitative and quantitative knowledge to monitor a similar batch process. By incorporating domain knowledge, the paper (Oyen & Lane, 2012) relaxed the assumptions made when evaluating task-relatedness in multitask BN structure learning. The paper (Y. Zhou, Fenton, Hospedales, & Neil, 2015) presented a probabilistic graphical model parameters transfer learning method using a transferred prior and constraints based on expert knowledge.
To the best of our knowledge, no existing work addresses the safety control modeling problem for the thickening process of gold hydrometallurgy when the target data are too scarce to establish an accurate model. Therefore, inspired by expert knowledge and transfer learning, this paper proposes a new safety control modeling method for the thickening process of gold hydrometallurgy based on a BN transfer learning strategy. The safety control problem is first analyzed on the basis of the existing research results; when the target data are very limited, the safety control modeling problem is transformed into a BN transfer learning problem. A new BN transfer learning strategy is then proposed. By extracting the common structural information (CSI) of multiple sources and integrating the useful information of the target, the final structure of the target is obtained, and the parameters of the target are learned by fusing the target parameters with those of multiple sources that have different similarities to the target. Finally, simulation results are presented to verify the effectiveness of the proposed method. The proposed transfer learning strategy is applied to establish the safety control model for the thickening process of gold hydrometallurgy when the dosages of flocculants are in different conditions. The simulation results imply that the proposed approach is effective and performs better than the traditional modeling method, which uses only the limited target data.

The contributions of this paper can be summarized as follows. On the one hand, this paper proposes a new safety control modeling method based on transfer learning for the thickening process of gold hydrometallurgy. On the other hand, this paper proposes a new BN transfer learning strategy: the final structure of the target is determined by integrating the CSI of multiple sources with the useful information of the target, and the parameters of the target are obtained by a fusion algorithm that distinguishes the similarity of the multiple sources. The proposed method is general and can be applied to similar problems in other research backgrounds.

The remainder of this article is organized as follows. Based on the existing research results on safety control for the thickening process, the problem to be solved is analyzed in Section 2. The new BN transfer learning method, comprising the structure and parameters transfer learning methods, is presented in Section 3. In Section 4, the proposed algorithm is verified on a set of simulations and applied to establish the safety control model for the thickening process of gold hydrometallurgy. Finally, Section 5 concludes the paper.
2. Problem formulation
2.1 The existing safety control research results for the thickening process
The simplified schematic diagram of the thickening process is depicted in Figure 1.

Figure 1. The simplified schematic diagram of the thickening process
The process consists of a thickener, a pressure filter, buffer slots, slurry pumps and valves. For the problems of abnormity identification and safety control in the thickening process of gold hydrometallurgy, expert knowledge and operation experience have been extracted and summarized, and the data of the relevant variables have been collected by various sensors. In the existing research (H. Li, et al., 2017), a BN-based safety control scheme was proposed from the expert knowledge and the data for two common abnormities: "the underflow concentration of the thickener is too high" and "the buffer slot 1 under the thickener is empty". The causes, phenomena and corresponding removal measures of these abnormities were analyzed in depth and used to define the variables and construct the BN structure. In (H. Li, et al., 2019), another common abnormity, "the overflow turbidity is too high", was analyzed, and by integrating the results of (H. Li, et al., 2017) a new safety control BN model was established for the three abnormities in the thickening process. The analysis of the three abnormities shows that the relationships among the BN variables differ for different dosages of flocculants. Therefore, although most of the BN structure can be obtained from expert knowledge, the specific structure has to be determined from the characteristics of the data, and it differs between situations. In addition, the BN parameters need to be learned from the abnormality data. The models in (H. Li, et al., 2017; H. Li, et al., 2019) were established under the condition that the abnormality data are sufficient. When the abnormality data of a thickening process are limited, this paper attempts to establish the safety control model for that target thickening process using the abnormality data of other relevant thickening processes as sources.

The problem to be solved can be described as follows. The objective is to establish the safety control model for the thickening process of gold hydrometallurgy. The target domain is the thickening process with scarce abnormality data. The source domains are thickening processes with sufficient abnormality data, from the same factory and/or from other factories that apply hydrometallurgy to refine gold. In this setting the tasks of the target and the sources are the same, and the target and the sources share the same variables, but the specific BN structures may differ between conditions and the parameters may have different distribution characteristics because of differences in equipment size or operation. Therefore, the safety control modeling problem for a thickening process with scarce abnormality data is transformed into a BN transfer learning problem. The source information includes the abnormality data and the corresponding safety control model; an available source may provide both or only one of them. To widen the scope of application of the proposed BN transfer learning method, a corresponding strategy is proposed for sources with different characteristics.
2.2 Problem description

In the BN transfer learning setting, a domain D = {V, G, Da} includes three components: the variables V = {X_1, X_2, X_3, ..., X_n} represent the BN nodes, Da represents the associated data, and G represents a directed acyclic graph that encodes the statistical dependencies among the variables. pa(X_i) denotes the parent nodes of node X_i. The CPTs specify the probability p(X_i | pa(X_i)) of every variable given its parents as defined by the graph G. In this paper there is one target domain D^t = {V^t, G^t, Da^t} and a set of sources {D_1^s, D_2^s, ..., D_L^s} (L ≥ 1), with D_l^s = {V_l^s, G_l^s, Da_l^s}. The target domain and each source domain have training data Da^t = {d_1^t, d_2^t, ..., d_N^t} and Da_l^s = {d_1^s, d_2^s, ..., d_M^s}, where N is the number of target-domain samples and M is the number of source-domain samples. For BN transfer learning, the target-domain data are assumed to be relatively scarce, 0 < N ≪ M, or N is small relative to the dimensionality of the target problem. The target-domain parameters are denoted θ^t. The objective of BN transfer learning is to improve the accuracy of the BN model in D^t using the information in {D_1^s, D_2^s, ..., D_L^s} (L ≥ 1). Therefore, BN transfer learning can be defined as

$\hat{G}^t = \arg\max_{G^t} p(D^t, \{D_1^s, D_2^s, \ldots, D_L^s\} \mid G^t)$    (1)

$\hat{\theta}^t = \arg\max_{\theta^t} p(D^t, \{D_1^s, D_2^s, \ldots, D_L^s\} \mid \theta^t)$    (2)

where Ĝ^t is the estimate of the target-domain structure and θ̂^t is the estimate of the target-domain parameters. The following conditions are assumed: V^t = V_l^s, while {D_1^s, D_2^s, ..., D_L^s} (L ≥ 1) and D^t may have different distribution properties. The available useful information of the multiple sources may vary: sometimes both the models and the data of the multiple sources are known, and sometimes only one of the two is known. Different strategies are proposed for the different situations.
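As a concrete illustration of this setting, the following minimal Python sketch shows one possible in-memory representation of a domain D = {V, G, Da} and of the target and source domains of Eqs. (1)-(2). The class name, sizes and random data are illustrative assumptions of ours, not part of the paper.

```python
# A minimal sketch (not from the paper) of the target domain and the source
# domains of Section 2.2. All names and sizes are illustrative.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Domain:
    variables: List[str]   # V = {X_1, ..., X_n}, shared by target and sources
    graph: np.ndarray      # G: n x n adjacency matrix of the directed acyclic graph
    data: np.ndarray       # Da: (samples x n) matrix of discrete observations

n_vars = 8
rng = np.random.default_rng(0)
# One target domain with scarce data (N samples) ...
target = Domain(variables=[f"X{i}" for i in range(n_vars)],
                graph=np.zeros((n_vars, n_vars), dtype=int),
                data=rng.integers(0, 2, size=(100, n_vars)))     # N = 100
# ... and L source domains with abundant data (M samples each), 0 < N << M.
sources = [Domain(variables=target.variables,
                  graph=np.zeros((n_vars, n_vars), dtype=int),
                  data=rng.integers(0, 2, size=(2500, n_vars)))  # M = 2500
           for _ in range(3)]                                    # L = 3
```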
3. A new BN transfer learning method

3.1 The proposed BN structure transfer learning strategy

In this section, the new BN structure transfer learning strategy is proposed to obtain the structure of the target. The question is how to use the useful information of the multiple sources and the limited information of the target to learn the target structure. The overall procedure is depicted in Figure 2.

Figure 2 The proposed BN structure transfer learning strategy
The proposed BN structure transfer learning method involves three main tasks. First, the CSI of the multiple sources is extracted to reflect their common characteristics; because only information from the same background is selected as a source, the CSI of the multiple sources is likely to appear in the structure of the target. Second, the structure of the target is determined from the available useful information in the target together with the obtained CSI. Third, different transfer learning strategies are applied depending on which information is available in the multiple sources and in the target. The objective is that the searched optimal structure of the target both makes full use of the common characteristics of the multiple sources and reflects the available useful information of the target. These three problems are analyzed in turn in the following subsections.
3.1.1 The CSI extraction

Based on the related research on BN structure learning from related tasks (Oyen & Lane, 2012; Oyen & Lane, 2014), the new BN structure transfer learning strategy is proposed in this section. Given data sets Da_1^s, ..., Da_L^s and structures G_1^s, ..., G_L^s from the L related sources, the CSI is extracted from

$P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto P(G^g) P(G_{1:L}^s, Da_{1:L}^s \mid G^g) \propto P(G^g) P(G_{1:L}^s \mid G^g) P(Da_{1:L}^s \mid G^g)$    (3)

It is assumed that $P(G_{1:L}^s \mid G^g) = \prod_{l=1}^{L} P(G_l^s \mid G^g)$ and $P(Da_{1:L}^s \mid G^g) = \prod_{l=1}^{L} P(Da_l^s \mid G^g)$. Equation (3) then simplifies to

$P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto P(G^g) \prod_{l=1}^{L} P(G_l^s \mid G^g) \prod_{l=1}^{L} P(Da_l^s \mid G^g)$    (4)

Taking logarithms of Equation (4) gives

$\log P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto \log P(G^g) + \sum_{l=1}^{L} \log P(G_l^s \mid G^g) + \sum_{l=1}^{L} \log P(Da_l^s \mid G^g)$    (5)

Based on the results in (Oyen & Lane, 2012; Oyen & Lane, 2014), $P(G_l^s \mid G^g)$ can be expressed as

$P(G_l^s \mid G^g) \propto (1-\alpha)^{\beta \Delta(G_l^s, G^g)}$    (6)

where Δ is a graph distance metric that measures the number of structural differences between the graphs G_l^s and G^g, the parameter β is an adjustment coefficient, and the parameter α ∈ [0,1] controls the similarity of the structures. When α = 1, the structures of G_l^s and G^g are forced to be exactly the same, because the only non-zero probability is at Δ(G_l^s, G^g) = 0; when α = 0, the structures of G_l^s and G^g are learned independently, because all values of Δ(G_l^s, G^g) have equal probability. Based on the above analysis, Equation (3) can finally be expressed as

$\log P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto \log P(G^g) + \beta \sum_{l=1}^{L} \log (1-\alpha)^{\Delta(G_l^s, G^g)} + \sum_{l=1}^{L} \log P(Da_l^s \mid G^g)$    (7)

where log P(G^g) represents the prior structural information of G^g, the term $\beta \sum_{l=1}^{L} \log (1-\alpha)^{\Delta(G_l^s, G^g)}$ controls the degree of similarity between G^g and G_l^s, and $\sum_{l=1}^{L} \log P(Da_l^s \mid G^g)$ controls the closeness between G^g and Da_l^s.
Based on the search-and-score method, the complexity of the structure is considered further. The final score function for the CSI extraction is therefore

$\mathrm{score}_{G^g \mid G_{1:L}^s, Da_{1:L}^s} = \log P(G^g) + \beta \sum_{l=1}^{L} \log (1-\alpha)^{\Delta(G_l^s, G^g)} + \sum_{l=1}^{L} \mathrm{score}_{Da_l^s \mid G^g}$    (8)

$\mathrm{score}_{Da_l^s \mid G^g} = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} m_{ijk}^{l} \log(m_{ijk}^{l} / m_{ij}^{l}) - \frac{1}{2} \log m^{l} \sum_{i=1}^{n} q_i (r_i - 1)$    (9)

where n is the number of nodes in the BN model (1 ≤ i ≤ n); q_i is the number of alternative combination states of the parent nodes of the i-th node (1 ≤ j ≤ q_i); r_i is the number of alternative states of the i-th node (1 ≤ k ≤ r_i); m_{ijk}^l is the number of records of the l-th source in which the i-th node is in its k-th state and its parent set is in its j-th state; m_{ij}^l is the number of records of the l-th source in which the parent set of the i-th node is in its j-th state; and m^l is the sample size of the l-th source.
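The data-dependent term of Eq. (9) can be computed directly from counts. The following self-contained Python sketch is our own illustration of that computation under the stated definitions of m_{ijk}^l, m_{ij}^l and m^l; the function name and the data layout (one integer-coded column per node) are assumptions, not the authors' code.

```python
# A sketch of score_{Da|G} from Eq. (9) for one data set and a candidate structure
# given as a list of parent sets. States are assumed to be encoded as 0..r_i-1.
import numpy as np
from itertools import product

def data_score(data, parents, n_states):
    """data: (m, n) int array; parents[i]: parent indices of node i;
    n_states[i]: r_i, the number of states of node i."""
    m, n = data.shape
    log_lik, penalty = 0.0, 0.0
    for i in range(n):
        r_i, pa = n_states[i], parents[i]
        q_i = int(np.prod([n_states[p] for p in pa]))   # 1 when pa is empty
        # enumerate parent configurations j and count m_ijk, m_ij
        for j_conf in product(*[range(n_states[p]) for p in pa]):
            rows = np.ones(m, dtype=bool)
            for p, v in zip(pa, j_conf):
                rows &= (data[:, p] == v)
            m_ij = rows.sum()
            if m_ij == 0:
                continue
            for k in range(r_i):
                m_ijk = np.logical_and(rows, data[:, i] == k).sum()
                if m_ijk > 0:
                    log_lik += m_ijk * np.log(m_ijk / m_ij)
        penalty += q_i * (r_i - 1)
    return log_lik - 0.5 * np.log(m) * penalty          # Eq. (9)

# Illustrative usage on random binary data with structure X0 -> X1, {X0, X1} -> X2.
rng = np.random.default_rng(0)
demo = rng.integers(0, 2, size=(500, 3))
print(data_score(demo, parents=[[], [0], [0, 1]], n_states=[2, 2, 2]))
```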
Remark one: Before calculating Δ(G_l^s, G^g), some related symbols are explained. g_{ij'}^s (1 ≤ i ≤ n, 1 ≤ j' ≤ n) denotes the element of the matrix G_l^s, and its value range is {1, −1, 0}. When g_{ij'}^s = 1, there is an arc from node i to node j'; when g_{ij'}^s = −1, there is an arc from node j' to node i; when g_{ij'}^s = 0, there is no arc between node i and node j'. For the matrix G_l^s, the elements that are symmetric about the diagonal sum to zero, that is, g_{ij'}^s + g_{j'i}^s = 0. $\bar{G}_l^s$ denotes the upper triangular part of G_l^s, with elements $\bar{g}_{ij'}^s$ (j' ≥ i), and $\bar{G}^g$ denotes the upper triangular part of G^g, with elements $\bar{g}_{ij'}^g$ (j' ≥ i). Δ(G_l^s, G^g) is then calculated as

$\Delta(G_l^s, G^g) = \sum_{i=1}^{n} \sum_{j'=1}^{n} \left| \bar{g}_{ij'}^s - \bar{g}_{ij'}^g \right|, \quad j' \geq i$
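A small sketch of the distance Δ of Remark one, written by us under the signed-adjacency-matrix convention described above; the helper name and the example matrices are illustrative.

```python
# Graph distance Δ(G_l^s, G^g) between two signed adjacency matrices with
# entries in {1, -1, 0}; only the upper triangles (j' >= i) are compared.
import numpy as np

def graph_distance(G_s, G_g):
    iu = np.triu_indices(G_s.shape[0])          # indices with j' >= i
    return int(np.abs(G_s[iu] - G_g[iu]).sum())

# Example: the source has an arc 1 -> 2, the CSI candidate has 2 -> 1;
# a reversed arc contributes 2 to the distance, an added/missing arc contributes 1.
G_s = np.array([[0, 1], [-1, 0]])
G_g = np.array([[0, -1], [1, 0]])
print(graph_distance(G_s, G_g))  # 2
```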
Remark two: The parameter β is an adjustment coefficient used to control the role of $\sum_{l=1}^{L} \log (1-\alpha)^{\Delta(G_l^s, G^g)}$ and to keep its value at the same order of magnitude as the other terms; the larger β is, the greater the role of this term. The parameter α ∈ [0,1] controls the similarity between G_l^s and G^g: when α is close to 1, the structures of G_l^s and G^g are required to be more similar, and when α is close to 0 they may be less similar. The values of α and β are determined from the practical situation and the above principle.
3.1.2 The determination of the target structure

Although the available information in the target is limited, it still reflects valuable characteristics of the target. Therefore, when searching for the target structure, both the CSI and the information in the target are used. The posterior probability of the target structure given the target data and the CSI can be expressed as

$P(G^t \mid G^g, Da^t) \propto P(G^t) P(G^g, Da^t \mid G^t)$    (10)

It is assumed that $P(G^g, Da^t \mid G^t) = P(G^g \mid G^t) P(Da^t \mid G^t)$. Equation (10) then simplifies to

$P(G^t \mid G^g, Da^t) \propto P(G^t) P(G^g \mid G^t) P(Da^t \mid G^t)$    (11)

where $P(G^g \mid G^t) \propto (1-\alpha')^{\beta' \Delta(G^g, G^t)}$. Taking logarithms of Equation (11) gives

$\log P(G^t \mid G^g, Da^t) \propto \log P(G^t) + \beta' \log (1-\alpha')^{\Delta(G^g, G^t)} + \log P(Da^t \mid G^t)$    (12)

Based on the search-and-score method, the following score function is used to search for the target structure G^t:

$\mathrm{score}_{G^t \mid G^g, Da^t} = \log P(G^t) + \beta' \log (1-\alpha')^{\Delta(G^g, G^t)} + \mathrm{score}_{Da^t \mid G^t}$    (13)

where the parameter β' is an adjustment coefficient and the form of score_{Da^t | G^t} follows Equation (9). The parameter α' ∈ [0,1] controls the similarity between the CSI and the structure of the target: when α' is close to 1, the structures of G^g and G^t are required to be more similar, and when α' is close to 0 they may be less similar. The values of α' and β' are determined from the practical situation.
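Eq. (13) is a simple sum of three terms, which the following hedged sketch makes explicit; it assumes that Δ(G^g, G^t) and score_{Da^t | G^t} have already been computed (for example with the helpers sketched earlier), and the parameter values are arbitrary illustrations.

```python
# Score of Eq. (13) for one candidate target structure (our arithmetic illustration).
import numpy as np

def target_structure_score(log_prior, delta_to_csi, data_score_target,
                           alpha_p=0.6, beta_p=1.0):
    """log_prior: log P(G^t) from expert knowledge (0.0 if unavailable);
    delta_to_csi: Δ(G^g, G^t); data_score_target: score_{Da^t | G^t} of Eq. (9)."""
    return log_prior + beta_p * delta_to_csi * np.log(1.0 - alpha_p) + data_score_target

# Candidates closer to the CSI are penalised less, because log(1 - α') < 0.
print(target_structure_score(0.0, 2, -350.0))   # Δ = 2
print(target_structure_score(0.0, 5, -350.0))   # Δ = 5, lower score
```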
3.1.3 The transfer learning strategies for the different cases

When the available useful information of the multiple sources and of the target is in different situations, different strategies need to be applied to learn the structure of the target. When searching for the CSI of the multiple sources, three cases need to be considered.

Case one: both the models and the data of the multiple sources are known. As described in Figure 2, when the structures of the multiple sources are known, they are used to determine the prior structural information of G^g, and the CSI is extracted with Equation (8). In this process, the more often an edge appears in the structures of the multiple sources, the greater the probability that this edge appears in the CSI.

Case two: only the data of the multiple sources are known. In this case, when searching for the CSI, Equation (8) simplifies to

$\mathrm{score}_{G^g \mid G_{1:L}^s, Da_{1:L}^s} = \sum_{l=1}^{L} \mathrm{score}_{Da_l^s \mid G^g}$    (14)

Case three: only the models of the multiple sources are known. In this case, when searching for the CSI, Equation (8) simplifies to

$\mathrm{score}_{G^g \mid G_{1:L}^s, Da_{1:L}^s} = \log P(G^g) + \beta \sum_{l=1}^{L} \log (1-\alpha)^{\Delta(G_l^s, G^g)}$    (15)

Cases two and three are special cases of case one.

When searching for the structure of the target, two cases need to be considered.

Case four: expert knowledge about the target is available. In this case, the prior structural information of G^t can be determined, and the structure of the target is obtained from Equation (13).

Case five: expert knowledge about the target is not available. In this case, when searching for the structure of the target, Equation (13) simplifies to

$\mathrm{score}_{G^t \mid G^g, Da^t} = \beta' \log (1-\alpha')^{\Delta(G^g, G^t)} + \mathrm{score}_{Da^t \mid G^t}$    (16)

Case five is a special case of case four.
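The three CSI cases only differ in which terms of Eq. (8) are available. The sketch below is our own illustration of this reduction; it assumes the per-source distances and data scores are precomputed, and the numerical values are placeholders.

```python
# CSI score of Eq. (8), reducing to Eq. (14) or Eq. (15) when terms are missing.
import numpy as np

def csi_score(log_prior=None, deltas=None, data_scores=None, alpha=0.6, beta=1.0):
    score = 0.0
    if log_prior is not None:                 # prior built from the source structures
        score += log_prior
    if deltas is not None:                    # structure-similarity term
        score += beta * sum(d * np.log(1.0 - alpha) for d in deltas)
    if data_scores is not None:               # data-closeness term, Eq. (9) per source
        score += sum(data_scores)
    return score

# Case one: models and data of the sources are known (all three terms, Eq. (8)).
full = csi_score(log_prior=-3.0, deltas=[2, 4, 3], data_scores=[-310.0, -295.0, -330.0])
# Case two: only source data are known (Eq. (14)).
data_only = csi_score(data_scores=[-310.0, -295.0, -330.0])
# Case three: only source models are known (Eq. (15)).
model_only = csi_score(log_prior=-3.0, deltas=[2, 4, 3])
```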
3.2 The proposed BN parameters transfer learning strategy

In this section, the new BN parameters transfer learning method is proposed to improve the performance of the target parameter learning; it is shown in Figure 3. The goal of parameter learning is to determine all p(X_i | pa(X_i)), and the parameters can be obtained by maximum likelihood estimation (MLE). Three problems need to be considered for parameters transfer learning: whether to transfer a parameter or not; how to evaluate the similarity of the parameters between the multiple sources and the target; and how to fuse the parameter information from the multiple sources to learn the target parameters. Given data sets Da_1^s, ..., Da_L^s, Da^t and structures G_1^s, ..., G_L^s, G^t, the specific procedure is as follows:
Step 1: The parameter estimate θ̃^t of the target domain is obtained from the target data Da^t by MLE:

$\tilde{\theta}^t = \arg\max_{\theta^t} p(Da^t \mid \theta^t)$    (17)

Step 2: The parameter estimates of the target domain are obtained from the source data Da_l^s by MLE:

$\theta_l^t = \arg\max_{\theta^t} p(Da_l^s \mid \theta^t)$    (18)

where θ_l^t denotes the estimate of the target-domain parameters obtained from the l-th source data Da_l^s by MLE.
Step 3: The similarity of the multiple sources is distinguished based on the difference of structures between the multiple sources and the target (DSST):

$\mathrm{score}_l^{sim} = 1 / \max(1, N_l), \quad 1 \leq l \leq L$    (19)

where N_l is the number of arcs that differ between the l-th source structure and the target structure. N_l is calculated in the same way as Δ (see Remark one). The larger N_l is, the smaller the similarity score of the l-th source.

Step 4: The DSST is used to decide whether the parameter information of a source is used for transfer learning. If the answer is 'Yes', go to Step 6; if the answer is 'No', go to Step 5. The target and the multiple sources may share only part of their structures. The parameters of a source node whose parent node set (PNS) differs from that in the target structure are less likely to have a probability distribution similar to the target. Therefore, the structures of the multiple sources are compared with the searched structure of the target: if the PNS of a node is the same in the source and in the target, the parameters of this node in this source are used for transfer learning; otherwise they are not. For each node in the target, all the sources are evaluated in this way.

Step 5: The parameters of a node with a different PNS in the source are not used for transfer learning, and this source is not regarded as an alternative source for that node.
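The PNS comparison of Steps 4-5 can be implemented as a simple per-node check. The following sketch is an assumption of ours about one possible implementation on 0/1 adjacency matrices, not the authors' code.

```python
# Per-node gate of Steps 4-5: a source is kept as an alternative source for node i
# only if node i has the same parent node set (PNS) in the source structure and
# in the learned target structure.
import numpy as np

def parent_set(adj, i):
    """adj: (n, n) 0/1 adjacency matrix with adj[p, i] = 1 meaning p -> i."""
    return frozenset(np.flatnonzero(adj[:, i]).tolist())

def alternative_sources(source_adjs, target_adj, i):
    """Indices l of the sources whose PNS of node i matches the target's."""
    target_pns = parent_set(target_adj, i)
    return [l for l, adj in enumerate(source_adjs) if parent_set(adj, i) == target_pns]
```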
Step 6: The weights of the alternative sources for the fusion algorithm are calculated as

$\omega_l = \mathrm{score}_l^{sim} \Big/ \sum_{l=1}^{L'} \mathrm{score}_l^{sim}$    (20)

where ω_l is the weight of the l-th source, score_l^{sim} is the similarity score of the l-th source, and L' (L' ≤ L) is the number of alternative sources.

Step 7: The final parameters of the target domain are calculated by the fusion function

$\hat{\theta}^t = \eta \tilde{\theta}^t + (1-\eta) \sum_{l=1}^{L'} \omega_l \theta_l^t$    (21)

where η (0 < η ≤ 1) is the weight of the parameters obtained from the target data, determined from the practical situation, and θ̂^t is the resulting estimate of the target parameters. The above operations are carried out for each node of the target separately.
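The following self-contained sketch illustrates Steps 1-7 for a single target node. It assumes that the states are integer-coded and that the list of source data sets passed in contains only the alternative sources that already passed the PNS check of Steps 4-5; the function names, the value of η and the synthetic data are illustrative, not the authors' implementation.

```python
# MLE of one node's CPT from target and source data, DSST weights of Eq. (19)-(20),
# and the fusion of Eq. (21).
import numpy as np

def mle_cpt(data, node, parents, n_states):
    """Maximum likelihood estimate of p(X_node | pa(X_node)) as a (q, r) table."""
    r = n_states[node]
    q = int(np.prod([n_states[p] for p in parents])) if parents else 1
    counts = np.zeros((q, r))
    for row in data:
        j = 0
        for p in parents:                       # index of the parent configuration
            j = j * n_states[p] + row[p]
        counts[j, row[node]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                   # unseen configurations stay at zero
    return counts / totals

def fuse_node_cpt(target_data, alt_source_datas, n_arc_diffs, node, parents,
                  n_states, eta=0.4):
    theta_tilde = mle_cpt(target_data, node, parents, n_states)              # Step 1
    theta_src = [mle_cpt(d, node, parents, n_states) for d in alt_source_datas]  # Step 2
    sims = np.array([1.0 / max(1, n) for n in n_arc_diffs])                  # Step 3, Eq. (19)
    weights = sims / sims.sum()                                              # Step 6, Eq. (20)
    fused = sum(w * t for w, t in zip(weights, theta_src))
    return eta * theta_tilde + (1.0 - eta) * fused                           # Step 7, Eq. (21)

# Usage on synthetic binary data: node 2 with parents {0, 1}, two alternative sources.
rng = np.random.default_rng(1)
tgt = rng.integers(0, 2, size=(100, 3))
srcs = [rng.integers(0, 2, size=(2500, 3)) for _ in range(2)]
cpt = fuse_node_cpt(tgt, srcs, n_arc_diffs=[1, 3], node=2, parents=[0, 1],
                    n_states=[2, 2, 2])
```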
Remark three: In the proposed BN parameters transfer learning strategy, the similarity of the multiple sources is measured in order to avoid negative transfer and to ensure the effectiveness of the transfer. If a source has a structure more similar to the target, it is more likely to have a similar probability distribution. The DSST is used both to decide whether the parameter information of a source is used for transfer learning and to determine the weights of the multiple sources. In the proposed strategy, the parameters of every node in the target are calculated separately, because the alternative parameter information in the multiple sources may differ from node to node.

Similar to the structure transfer learning method, when the available useful information of the multiple sources and of the target is in different situations, different strategies need to be applied to learn the parameters of the target. Three cases need to be considered.
Case one: both the models and the data of the multiple sources are known. As described in Figure 3, when the models of the multiple sources are known, their structures are used to decide whether the parameter information of a source is used for transfer learning and to determine the weights of the multiple sources. The parameters of the target are then obtained from Equations (17)-(21).

Case two: only the data of the multiple sources are known. In this case, the structures of the multiple sources are first learned from their sufficient data; the remaining steps are the same as in case one.

Case three: only the models of the multiple sources are known. In this case, when the PNS of a node in the source is the same as in the target, the parameters of this node are used for transfer learning directly; this operation replaces Step 2 above. The remaining steps are the same as in case one. Cases two and three are special cases of case one.

To describe the symbols of this paper clearly, they are summarized in Table 1.
Figure 3 The proposed BN parameters transfer learning strategy
Table 1 The symbols and their corresponding descriptions in this paper

D = {V, G, Da}: domain
V = {X_1, X_2, X_3, ..., X_n}: nodes
Da: data
G: a directed acyclic graph
pa(X_i): parent nodes of node X_i
D^t = {V^t, G^t, Da^t}: target domain
{D_1^s, D_2^s, ..., D_L^s} (L ≥ 1), D_l^s = {V_l^s, G_l^s, Da_l^s}: a set of source domains
Da^t = {d_1^t, d_2^t, ..., d_N^t}: target domain data
Da_l^s = {d_1^s, d_2^s, ..., d_M^s}: source domain data
Ĝ^t: the estimate of the target domain structure
G^t: target domain structure
G^g: CSI
G_l^s: structure of the l-th source domain
G_{1:L}^s: structures of the source domains
Da_{1:L}^s: data sets of the source domains
Δ: a graph distance metric
m_{ijk}^l: the number of records of the l-th source in which the i-th node is in its k-th state and its parent set is in its j-th state
m_{ij}^l: the number of records of the l-th source in which the parent set of the i-th node is in its j-th state
m^l: the sample size of the l-th source
θ^t: target domain parameters
θ̂^t: the estimate of the target domain parameters
θ̃^t: the estimate of the target domain parameters obtained from the target data Da^t
θ_l^t: the estimate of the target domain parameters obtained from the source data Da_l^s
N_l: the number of arcs of the l-th source structure that differ from the target structure
L' (L' ≤ L): the number of alternative sources
ω_l: the weight of the l-th source
4. Experimental Results

First, to demonstrate the feasibility of the proposed method, this section presents experiments on the well-known Asia network (Lauritzen & Spiegelhalter, 1988; Kabli, Herrmann, & McCall, 2007; Kim, Ko, & Kang, 2013; Vafaee, 2014). The Asia network is a demonstrative diagnostic BN that represents the relationships of the Chest Clinic problem; its structure is shown in Figure 4. The proposed BN transfer learning strategy is then applied to establish the safety control model for the thickening process of gold hydrometallurgy and is compared with the traditional modeling method, which uses only the limited data of the target process, to verify its superiority.

4.1 The experiments on the Asia network

Figure 4 The structure of the Asia network
To verify the proposed BN transfer learning strategy, three related source models are constructed. The structures of the three related source models are shown in Figure 5.

Figure 5 The structures of the three related source models: (a) source one; (b) source two; (c) source three
A dataset of 1000 samples from the true Asia network is used as the target data and is denoted data0-1000. From each of the three source networks, 2500 samples are collected as the source data, denoted data1-2500, data2-2500 and data3-2500. Using the BN structure transfer learning method of Section 3.1, the CSI G^g is extracted with Equations (8) and (9), and the structure of the target is learned with Equation (13). A genetic algorithm is used to search for the structure, with crossover probability 0.9, mutation probability 0.01 and a maximum of 300 generations; the search is run 20 times. To evaluate the learned target structure, it is compared with the true Asia network. The averages of the numbers of spurious edges, missing edges and reversed edges are used as performance indexes. A reversed edge can be regarded as deleting an edge and then adding one, so the total edge difference is computed as the number of spurious edges plus the number of missing edges plus twice the number of reversed edges. To evaluate the proposed BN structure transfer learning strategy, the structure learned using only the target data data0-1000 is compared with the structure learned by the proposed method. The comparison results are shown in Table 2.
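The edge-difference indexes described above can be computed as follows; this evaluation helper is our own sketch of the stated definitions (total edge difference = spurious + missing + 2 × reversed), not the authors' script.

```python
# Spurious, missing and reversed edges of a learned structure w.r.t. the true one.
import numpy as np

def edge_differences(learned, true):
    """learned, true: (n, n) 0/1 adjacency matrices, entry [i, j] = 1 for arc i -> j."""
    spurious = missing = reversed_ = 0
    n = true.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            l = (learned[i, j], learned[j, i])
            t = (true[i, j], true[j, i])
            if l == t:
                continue
            if sum(l) == 1 and sum(t) == 1:      # arc present in both but flipped
                reversed_ += 1
            elif sum(l) == 1 and sum(t) == 0:    # arc only in the learned graph
                spurious += 1
            elif sum(l) == 0 and sum(t) == 1:    # arc only in the true graph
                missing += 1
    total = spurious + missing + 2 * reversed_
    return spurious, missing, reversed_, total
```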
Table 2 The structure difference of the target obtained by the proposed BN structure transfer learning method (columns α'=0.1 to α'=0.9) and by using only the data of the target (last column)

                                α'=0.1    α'=0.4    α'=0.6    α'=0.9    Only the data of the target
The average of spurious edges   4.7       3.85      3.45      3.55      10.4
The average of missing edges    2.35      2.9       3.1       3         1.55
The average of reversed edges   0.7       0.7       0.7       0.4       1.65
The total edge difference       8.45      8.15      7.95      7.35      15.25
The comparison in Table 2 shows that the network structure obtained by the proposed method is much closer to the true network. The proposed BN structure transfer learning strategy is therefore effective for obtaining the target structure, and it performs better than learning the structure from the limited target data alone. To show the influence of the target data size on the performance of the proposed method, simulation results for different target data sizes are shown in Figure 6. The total edge difference of the method that uses only the target data is denoted N_1, the total edge difference of the proposed BN structure transfer learning method is denoted N_2, and their difference is M = N_1 − N_2. In Figure 6, the vertical axis is the difference value M and the horizontal axis is the target data size; the different line styles correspond to different values of α'.
Figure 6. The difference values of the two methods under the different data sizes of the target (lines for α' = 0.1, 0.4, 0.6 and 0.9)
Figure 6 shows that as the data size increases, the difference value M becomes smaller and smaller; that is, as the target data size grows, the role of transfer learning decreases. This agrees with the general experience that transfer learning plays a more important role in learning the target structure when the target data size is small.

For the BN parameter learning, the Kullback-Leibler (KL) divergence is used to evaluate how close the learned parameters are to the true parameters; the smaller the KL divergence, the better the learned parameters. The parameters of the target are learned by the proposed BN parameters transfer learning method of Section 3.2. Table 3 shows the KL divergence values under different weights η. The smaller η is, the bigger the role of the parameters transfer learning; when η = 1, the parameters of the target are learned only from the target data.
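A hedged sketch of one way to compute such a KL-divergence measure between the true and the learned conditional probability tables, summed over nodes and parent configurations; the exact form used by the authors is not spelled out in the paper, so this is an assumption for illustration.

```python
# KL divergence between true and learned CPTs, summed over all nodes and
# parent configurations; smaller values mean parameters closer to the true ones.
import numpy as np

def cpt_kl(true_cpts, learned_cpts, eps=1e-12):
    """Both arguments: lists of (q_i, r_i) arrays, one CPT per node, rows summing to 1."""
    kl = 0.0
    for p_true, p_hat in zip(true_cpts, learned_cpts):
        p_true = np.clip(p_true, eps, 1.0)
        p_hat = np.clip(p_hat, eps, 1.0)
        kl += float(np.sum(p_true * np.log(p_true / p_hat)))
    return kl
```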
Table 3 The values of KL divergence under the different weights η by the proposed method

Data size    η=0        η=0.2      η=0.4      η=0.6      η=0.8      η=1
100          16.5178    15.6102    15.1453    15.1720    15.9189    22.4935
200          16.5178    15.1808    14.3580    14.0956    14.6597    22.6415
300          16.5178    15.4437    14.8574    14.8048    15.5283    22.1781
400          16.5178    15.3717    14.7241    14.6130    15.2689    21.5119
500          16.5178    15.2640    14.5034    14.2833    14.8492    21.1441
The comparison in Table 3 shows that the KL divergence of the method that uses only the target data (η = 1) is larger than all the other values obtained with the proposed BN parameters transfer learning method; the parameters learned by the proposed method are closer to the true parameters. The proposed BN parameters transfer learning method is therefore feasible for obtaining good parameters, and the learned parameters are better than those learned from the target data alone. As η increases, the role of the multiple sources decreases and the role of the target increases; the KL divergence first becomes smaller and then becomes larger. This is because the useful information in the target plays a positive role to some extent, but the limited target data cannot by themselves produce good parameters, so when too much weight is put on the target data the quality of the parameters decreases.

In the proposed BN parameters transfer learning method of Section 3.2, if the PNS of a node in a source differs from that in the target, the parameters of this node in this source are not used for transfer learning. To verify the necessity of this rule, Table 4 shows the KL divergence values under different weights η when the DSST is not considered.
Table 4 The values of KL divergence under the different weights η when the DSST is not considered

Data size    η=0        η=0.2      η=0.4      η=0.6      η=0.8      η=1
100          17.3136    16.2461    15.6304    15.5053    16.0883    22.4935
200          17.3136    15.8293    14.8607    14.4452    14.8380    22.6415
300          17.3136    16.0885    15.3577    15.1555    15.7113    22.1781
400          17.3136    16.0883    15.3380    15.0926    15.5683    21.5119
500          17.3136    15.9584    15.0816    14.7231    15.1174    21.1441
Comparing Tables 3 and 4, when the DSST is not considered the KL divergence values are larger than those of the proposed BN parameters transfer learning method for the corresponding weights η and data sizes (the two settings coincide at η = 1, where no transfer occurs). Considering the DSST therefore yields better parameters transfer learning performance, and the proposed method can avoid the influence of negative transfer.

The numerical experiments on the Asia network show that the proposed BN transfer learning method is effective for learning the target model and performs better than the method that uses only the target data.
4.2 Simulation results on the thickening process of gold hydrometallurgy

The proposed BN transfer learning strategy is applied to establish the safety control model for the thickening process of gold hydrometallurgy on the semi-physical simulation platform. The hydrometallurgy semi-physical simulation platform has been designed and constructed by our team over the past few years. Based on mechanism analysis and actual data, the platform can simulate the hydrometallurgical process, including the thickening, cyanide leaching, washing and cementation sub-processes, and new optimal control, monitoring and fault diagnosis methods can be verified on it. The hardware structure of the system can be found in the related research results (H. Li, et al., 2017; H. Li, et al., 2019). In this paper, the common abnormities of the thickening process are taken as the research background.
4.2.1 Example one
Based on the analysis in Section 2, the relationships among the related variables of the abnormities differ for different dosages of flocculants. Therefore, data are collected under the different dosages of flocculants, and the corresponding models are learned from the different data sets. The model established when the dosage of flocculants is too high is shown in Figure 7, and the model established when the dosage of flocculants is too low is shown in Figure 8; the physical meanings and the grades of the nodes are given in Table 5. When the dosage of flocculants is too high, it may increase the underflow concentration, but it is not a cause of abnormal overflow turbidity. When the dosage of flocculants is too low, it may cause abnormal overflow turbidity, but it is not a cause of high underflow concentration. Figures 7 and 8 show that the relationships among the related variables learned from the data conform to this mechanism analysis.

Figure 7. The established BN model when the dosage of flocculants is too high

Figure 8. The established BN model when the dosage of flocculants is too low
Table 5. The physical meanings and the grades of the nodes

Node A. Physical meaning: the opening degree of valve 3 and the power of slurry pump 1. Grades: 1. unchanged; 2. middle high grade; 3. high grade.
Node B. Physical meaning: the opening degree of valve 4. Grades: 1. closed grade; 2. open grade.
Node C. Physical meaning: the opening degree of valve 1. Grades: 1. unchanged; 2. middle low grade; 3. low grade.
Node D. Physical meaning: the underflow rate is too low. Grades: 1. nonoccurrence; 2. medium grade; 3. severe grade.
Node E. Physical meaning: the underflow concentration is too high. Grades: 1. nonoccurrence; 2. medium grade; 3. severe grade.
Node F. Physical meaning: the buffer slot 1 under the thickener is empty. Grades: 1. nonoccurrence; 2. occurrence.
Node G. Physical meaning: the motor current in the thickener is too large. Grades: 1. nonoccurrence; 2. medium grade; 3. severe grade.
Node H. Physical meaning: the bed pressure in the thickener is too high. Grades: 1. nonoccurrence; 2. medium grade; 3. severe grade.
Node I. Physical meaning: the electricity of slurry pump 1 is not stable. Grades: 1. nonoccurrence; 2. occurrence.
Node J. Physical meaning: the dosage of flocculants. Grades: 1. normal; 2. too low; 3. too high.
Node K. Physical meaning: the overflow turbidity is too high. Grades: 1. nonoccurrence; 2. medium grade; 3. severe grade.
The two models in Figures 7 and 8 are regarded as two related sources. From each source model, 2500 samples are collected and used as the source data. In addition, 100 samples and 500 samples are collected when the dosage of flocculants is too low and are used as the target data, respectively. To verify the proposed BN transfer learning method, the target data and the information of the two related sources are used to establish the model of the target, and the learned model is denoted "Model one". To verify the superiority of the proposed method, it is compared with the traditional on-site modeling method, in which the model is learned only from the limited target data and the useful information of the related sources is not exploited; the model learned by the traditional method is denoted "Model two". Based on the analysis of the practical situation, some possible abnormal scenarios are listed in Table 6. Each abnormal scenario in Table 6 specifies four characteristics with different degrees, whose specific meanings are given in Table 5. Taking abnormal scenario 1 as an example, nodes G, H and K are all at the severe grade and node I is at the occurrence grade. Not all scenarios are included in Table 6; only some typical scenarios are considered, and other similar scenarios can be analyzed in the same way. The abnormal scenarios are used as evidence to obtain the inference results for nodes A, B, C and J from Models one and two. The state with the largest posterior probability is taken as the final decision. The total number of reasoning results is 24, obtained by multiplying the number of abnormal scenarios (6) by the number of nodes to be inferred (4). If a final decision conforms to the expert knowledge, operation experience and data, the inference result is considered correct. The precision rate is the number of correct inference results divided by the total number of reasoning results, and it is shown in Table 7.
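The precision-rate bookkeeping described above can be sketched as follows; the posterior values and the expected states in the example are hypothetical, and the function is our illustration rather than the authors' evaluation code.

```python
# For every abnormal scenario the state with the largest posterior probability is
# taken as the decision for each inferred node, and the precision rate is the
# number of correct decisions over the total number of reasoning results.
import numpy as np

def precision_rate(posteriors, expected):
    """posteriors[s][v]: posterior distribution of inferred node v under scenario s;
    expected[s][v]: state index regarded as correct by the expert knowledge."""
    correct = total = 0
    for s, nodes in posteriors.items():
        for v, dist in nodes.items():
            total += 1
            if int(np.argmax(dist)) == expected[s][v]:
                correct += 1
    return correct / total

# Hypothetical example with two scenarios and two inferred nodes each.
post = {1: {"A": [0.1, 0.2, 0.7], "J": [0.2, 0.1, 0.7]},
        5: {"A": [0.8, 0.1, 0.1], "J": [0.1, 0.8, 0.1]}}
exp = {1: {"A": 2, "J": 2}, 5: {"A": 0, "J": 1}}
print(precision_rate(post, exp))  # 1.0
```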
Table 6 Some possible abnormal scenarios for the three abnormities in the thickening process

Scenario number    G    H    I    K
1                  3    3    2    3
2                  3    3    1    3
3                  3    3    2    1
4                  3    3    1    1
5                  1    1    1    3
6                  1    1    1    1
Table 7 The precision rate of the inference results by Models one and two

Data size    The precision rate of inference by Model one    The precision rate of inference by Model two
100          95.8%                                           75%
500          95.8%                                           91.7%
Journal Pre-proof limited data information of target. In addition, when the data size of target is less, the proposed transfer learning method will play more important role.
4.2.2 Example two
pro of
In the Example one, the collected target data information is only from the condition that the dosage of flocculants is too low. The learned model structure of target is the same with the Figure 8. In the Example two, the collected target data information is from two conditions. When the dosage of flocculants is too low, 250 samples are collected. When the dosage of flocculants is too high, another 250 samples are collected. Above collected 500 samples are used as the data information of target. When the collected target data information is from the different dosages of flocculants, how will the learned model change? Therefore, in the following example, the model of this condition is established
re-
by the proposed BN transfer learning method. The related sources information is the same with the Example one. The learned model of target is shown in the Figure 9, which is represented as “Model
lP
three”.
J
A
B
C
urn a
D
K
E
G
F
H
I
Figure 9. The learned model by the proposed transfer learning method when the data information
Jo
of target from the different dosages of flocculants
From the Figure 9, it can conclude that when the data information of target from the different dosages of flocculants, the relationships between the node J and the node K and the relationships between the node J and the node E all exist. Based on the analysis in the Example one, this structure conforms to the practical situation. To compare the performance of transfer learning, the model is also
Journal Pre-proof learned only by the limited target data information, and the learned model is represented as “Model
four”. In the proposed BN parameters transfer learning method, the DSST is considered to decide whether the parameter is used to transfer learning. To show the performance of this way, the “Model
five” is learned when the DSST is not considered. The abnormal scenarios in the Table 6 are used as the evidences to obtain the inference results. The precision rates of inference result by three kinds of
pro of
models are shown in the Table 8.
Table 8 The precision rates of inference result by three kinds of models The precision rate of inference result Model three (η =0.1)
75%
Model four Model five (η =0.1)
66.7%
70.8%
re-
From the results in the Table 8, it can conclude that the model learned by the proposed BN transfer learning method can obtain the higher precision rate of inference than the learned model only by the limited data information of target. In addition, when the DSST is not considered, the precision rate of
lP
inference will decrease.
Based on the simulation results in the Examples one and two, it can conclude that the proposed BN transfer learning strategy is effective to establish the safety control model for the thickening process of gold hydrometallurgy. It owns the better performance than the traditional modeling method
5. Conclusions
urn a
which only uses the limited data information of target.
This paper develops a new safety control modeling method based on the BN transfer learning strategy for the thickening process of gold hydrometallurgy. First of all, by analyzing the existing research
Jo
results on the safety control for the thickening process of gold hydrometallurgy, the problem to solve is transformed into the BN transfer learning problem. Furthermore, a new BN transfer learning strategy is proposed, which includes the structure transfer learning and the parameters transfer learning. Finally, the experimental results demonstrate that the proposed BN transfer learning strategy is effective and owns the better performances. The influences of data size of target and DSST on the transfer learning are analyzed and compared. Finally, the proposed BN transfer learning strategy is applied to establish
Journal Pre-proof the safety control model for the thickening process of gold hydrometallurgy. The simulation results demonstrate that the proposed method is effective to establish the model when the dosages of flocculants are in the different situations and it owns the better performances than the traditional modeling method which only uses the limited data information of target.
pro of
Conflict of interest

There is no conflict of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [grant numbers 61533007, 61873049, 61973057], the Foundation for Innovative Research Groups of the National Natural Science Foundation of China [grant number 61621004] and the National Key Research and Development Program of China [grant number 2017YFB0304205].
References
Anam, A., & Rushdi, M. 2019. Classification of scaled texture patterns with transfer learning. Expert Systems with Applications, 120, 448-460.
de Andrade Lima, L.R.P. 2006. Nonlinear data reconciliation in gold processing plants. Minerals Engineering, 19, 938-951.
Grolman, E., Bar, A., Shapira, B., Rokach, L., & Dayan, A. 2016. Utilizing transfer learning for in-domain collaborative filtering. Knowledge-Based Systems, 107, 70-82.
Jin, Y., Zhang, Y., Jing, Y., & Fu, J. 2019. An average dwell-time method for fault-tolerant control of switched time-delay systems and its application. IEEE Transactions on Industrial Electronics, 66, 3139-3147.
Kabli, R., Herrmann, F., & McCall, J. 2007. A chain-model genetic algorithm for Bayesian network structure learning. In GECCO '07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, London, England, pp. 1264-1271.
Kim, D.W., Ko, S., & Kang, B.Y. 2013. Structure learning of Bayesian networks by estimation of distribution algorithms with transpose mutation. Journal of Applied Research and Technology, 11, 586-596.
Lauritzen, S.L., & Spiegelhalter, D.J. 1988. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50, 157-224.
Li, H., Wang, F., & Li, H. 2017. A safe control scheme under the abnormity for the thickening process of gold hydrometallurgy based on Bayesian network. Knowledge-Based Systems, 119, 10-19.
Li, H., Wang, F., Li, H., & Wang, X. 2019. The updating strategy for the safe control Bayesian network model under the abnormity in the thickening process of gold hydrometallurgy. Neurocomputing, 338, 237-248.
Li, K., & Principe, J.C. 2017. Transfer learning in adaptive filters: The nearest instance centroid-estimation kernel least-mean-square algorithm. IEEE Transactions on Signal Processing, 65, 6520-6535.
Li, L., Luo, H., Ding, S.X., Yang, Y., & Peng, K. 2019. Performance-based fault detection and fault-tolerant control for automatic control systems. Automatica, 99, 308-316.
Liu, X., Li, Y., & Chen, G. 2019. Multimode tool tip dynamics prediction based on transfer learning. Robotics and Computer-Integrated Manufacturing, 57, 146-154.
Liu, Y., Wang, F.-l., & Chang, Y.-q. 2013. Reconstruction in integrating fault spaces for fault identification with kernel independent component analysis. Chemical Engineering Research and Design, 91, 1071-1084.
Lu, J., Behbood, V., Hao, P., Zuo, H., Xue, S., & Zhang, G. 2015. Transfer learning using computational intelligence: A survey. Knowledge-Based Systems, 80, 14-23.
Luis, R., Sucar, L.E., & Morales, E.F. 2009. Inductive transfer for learning Bayesian networks. Machine Learning, 79, 227-255.
Niculescu-Mizil, A., & Caruana, R. 2007. Inductive transfer for Bayesian network structure learning. In Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS), San Juan, Puerto Rico.
Oyen, D., & Lane, T. 2012. Leveraging domain knowledge in multitask Bayesian network structure learning. In Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, Canada, pp. 1091-1097.
Oyen, D., & Lane, T. 2014. Transfer learning for Bayesian discovery of multiple Bayesian networks. Knowledge and Information Systems, 43, 1-28.
Pereira, F.L.F., Lima, F.D.d.S., Leite, L.G.d.M., Gomes, J.P.P., & Machado, J.d.C. 2017. Transfer learning for Bayesian networks with application on hard disk drives failure prediction. pp. 228-233.
Salaken, S.M., Khosravi, A., Nguyen, T., & Nahavandi, S. 2017. Extreme learning machine based transfer learning algorithms: A survey. Neurocomputing, 267, 516-524.
Shin, H.-C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., & Summers, R.M. 2016. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35, 1285-1298.
Sun, C., Ma, M., Zhao, Z., Tian, S., Yan, R., & Chen, X. 2019. Deep transfer learning based on sparse autoencoder for remaining useful life prediction of tool in manufacturing. IEEE Transactions on Industrial Informatics, 15, 2416-2425.
Talo, M., Baloglu, U., Yildirim, O., & Acharya, U. 2019. Application of deep transfer learning for automated brain abnormality classification using MR images. Cognitive Systems Research, 54, 176-188.
Tran, H.M., & Trinh, H. 2019. Distributed functional observer based fault detection for interconnected time-delay systems. IEEE Systems Journal, 13, 940-951.
Vafaee, F. 2014. Learning the structure of large-scale Bayesian networks using genetic algorithm. In GECCO '14: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, Vancouver, BC, Canada, pp. 855-862.
Wu, Z., Wu, Y., Chai, T., & Sun, J. 2015. Data-driven abnormal condition identification and self-healing control system for fused magnesium furnace. IEEE Transactions on Industrial Electronics, 62, 1703-1715.
Zhao, C., & Gao, F. 2017. Critical-to-fault-degradation variable analysis and direction extraction for online fault prognostic. IEEE Transactions on Control Systems Technology, 25, 842-854.
Zhao, C., & Huang, B. 2018. A full-condition monitoring method for nonstationary dynamic chemical processes with cointegration and slow feature analysis. AIChE Journal, 64, 1662-1681.
Zhou, Y., Fenton, N., Hospedales, T.M., & Neil, M. 2015. Probabilistic graphical models parameter learning with transferred prior and constraints. In 31st Conference on Uncertainty in Artificial Intelligence, pp. 972-981.
Zhou, Y., Hospedales, T.M., & Fenton, N. 2016. When and where to transfer for Bayes net parameter learning. Expert Systems with Applications, 55, 361-373.
Zhu, J., Yao, Y., & Gao, F. 2018. Transfer of qualitative and quantitative knowledge for similar batch process monitoring. IEEE Access, 6, 73856-73870.
Author Contribution Statement

Hui Li: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - Original Draft.
Fuli Wang: Writing - Review & Editing, Supervision, Project administration, Funding acquisition.
Hongru Li: Writing - Review & Editing, Supervision, Funding acquisition.
Qingkai Wang: Resources.