Safety control modeling method based on Bayesian network transfer learning for the thickening process of gold hydrometallurgy

To appear in: Knowledge-Based Systems
DOI: https://doi.org/10.1016/j.knosys.2019.105297
Received 4 June 2019; revised 25 November 2019; accepted 27 November 2019.

© 2019 Published by Elsevier B.V.


Hui Li (b), Fuli Wang (a,b), Hongru Li (b), Qingkai Wang (c)

(a) State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China; postal code: 110819
(b) Information Science and Engineering, Northeastern University, Shenyang, China; postal code: 110819
(c) BGRIMM Technology Group, Daxing District, Beijing, China

E-mails: [email protected]; [email protected]; [email protected]; [email protected]

The corresponding author is Hui Li (e-mail: [email protected]; postal address: P.O. Box 135, No. 11 St. 3 Wenhua Road, Heping District, Shenyang, Liaoning Province, P.R. China; postal code: 110819).

Abstract: When the data information of the target domain is very limited, it is difficult to establish an accurate model to analyze the target problem. For the safety control modeling problem, this paper develops a new Bayesian network (BN) transfer learning strategy for the thickening process of gold hydrometallurgy. First, the safety control modeling problem in this process is analyzed in depth. When the data information on abnormality is insufficient, the safety control modeling problem is transformed into a BN transfer learning problem. A new BN transfer learning strategy is then proposed, which includes structure and parameters transfer learning methods. For structure transfer learning, the final structure of the target is determined by integrating the common structural information of the multiple sources with the useful information of the target. For parameters transfer learning, the parameters of the target are obtained by a fusion algorithm that distinguishes the similarity of the multiple sources. Finally, the proposed method is verified on the Asia network and applied to establish the safety control model for the thickening process of gold hydrometallurgy. The simulation results demonstrate that the proposed method is effective and performs better than the traditional modeling method.

Keywords: Bayesian network; Transfer learning; Gold hydrometallurgy; Safety control; Expert knowledge

1. Introduction

As an important technique for refining gold in industry (de Andrade Lima, 2006), hydrometallurgy includes the sub-processes of flotation, concentration, leaching, washing and cementation. Before cyanide leaching, the slurry needs to be concentrated by the thickener and the pressure filter to obtain high-solid slurry. This stage is called the thickening process, and it is key to guaranteeing the efficiency of the subsequent cyanide leaching. When the characteristics of the raw materials change or the operation strategy is improper, abnormity will occur. Because of the high economic value of gold, abnormity can lead to serious financial losses or even safety threats. Therefore, monitoring and identifying abnormity and devising the corresponding safety control scheme have attracted more and more attention; such research helps ensure that the process runs well.

The existing fault detection and safety control methods mainly include model-based methods (Jin, Zhang, Jing, & Fu, 2019; L. Li, Luo, Ding, Yang, & Peng, 2019; Tran & Trinh, 2019) and data-driven methods (Y. Liu, Wang, & Chang, 2013; Wu, Wu, Chai, & Sun, 2015; Zhao & Gao, 2017; Zhao & Huang, 2018). However, when no accurate mechanism model is available, model-based methods do not perform well. Data-driven methods break through this limitation: based on the collected data, the process condition can be monitored by data-driven modeling methods. These methods assume that the data information is sufficient to establish the monitoring model and the safety control model. However, for most data-driven methods, the ability to use expert knowledge and operation experience is limited.

As an intelligent machine learning method, the Bayesian network (BN) provides a new way to solve this problem, because it can effectively fuse expert knowledge and data information to establish the model. For the safety control problem in the thickening process of gold hydrometallurgy, the paper (H. Li, Wang, & Li, 2017) proposed a BN-based safety control scheme for two common abnormities. Building on those results, the paper (H. Li, Wang, Li, & Wang, 2019) analyzed a third common abnormity and proposed an updating learning scheme for the established BN model. However, the methods in (H. Li, et al., 2017; H. Li, et al., 2019) all assume sufficient data information on the abnormities. Because abnormity may cause huge losses and safety threats, few factories are willing to create abnormity deliberately in order to collect abnormality data, which makes it difficult to collect a good safety control scheme for every abnormity. In addition, some abnormities may take a long time to occur, which makes collecting abnormality data even more difficult. When the abnormality data are insufficient, it is difficult to establish an accurate model to solve the target problem.

In this situation, transfer learning and domain adaptation offer a new angle: by collecting the useful information of related sources whose abnormality data have been collected and/or whose models have been established, the target problem can be solved by utilizing the related information effectively. Transfer learning has been applied in various fields and has attracted extensive attention, for example in prediction (X. Liu, Li, & Chen, 2019; Pereira, Lima, Leite, Gomes, & Machado, 2017; Sun, et al., 2019), classification (Anam & Rushdi, 2019; Talo, Baloglu, Yildirim, & Acharya, 2019) and filtering (Grolman, Bar, Shapira, Rokach, & Dayan, 2016; K. Li & Principe, 2017). Therefore, for the safety control modeling problem in the thickening process of gold hydrometallurgy, when the abnormality data are insufficient, other thickening processes in the same factory and/or in other factories that use hydrometallurgy to produce gold can be treated as related sources of useful information. When applying the information of the related sources, attention must be paid to the differences between the sources and the target in the relationships among the variables and in the distributions of the parameters. Based on the existing research results (H. Li, et al., 2017; H. Li, et al., 2019) on safety control for the thickening process of gold hydrometallurgy, this paper considers how to establish an effective model for the safety control problem when the abnormality data are insufficient. Because a BN is used to model the target problem in the existing research, the safety control modeling problem is transformed into a BN transfer learning problem.

A survey of transfer learning based on computational intelligence methods is provided in (Lu, et al., 2015). Existing transfer learning research mainly focuses on neural network models (Salaken, Khosravi, Nguyen, & Nahavandi, 2017; Shin, et al., 2016), and studies on BN transfer learning are relatively limited. For BN structure transfer learning, the paper (Luis, Sucar, & Morales, 2009) proposed a new weighted sum of conditional independence measures that combines measures from the target task with those of the auxiliary tasks. The papers (Niculescu-Mizil & Caruana, 2007; Diane Oyen & Lane, 2014) considered BN structure transfer learning for multitask learning based on search-and-score techniques. For BN parameters transfer learning, the paper (Y. Zhou, Hospedales, & Fenton, 2016) proposed a BN parameters transfer learning algorithm based on both network and fragment relatedness, in which the problem of heterogeneous relatedness was analyzed and solved. The paper (Luis, et al., 2009) introduced distance-based linear pooling and local linear pooling probability aggregation methods to combine the probability estimates from the target task with those from the auxiliary tasks. However, when measuring the weights of the different sources, that method only considered the influences of the conditional probability table (CPT) entry size and the dataset size, and the fitness of a source to the target domain was ignored. In addition, expert knowledge plays an important role in establishing the model of the target problem, and integrating expert knowledge into transfer learning is an effective way to improve the accuracy of the model. The paper (Zhu, Yao, & Gao, 2018) transferred qualitative and quantitative knowledge to monitor a similar batch process. By incorporating domain knowledge, the paper (Diane Oyen & Lane, 2012) relaxed the assumption used when evaluating task-relatedness in multitask BN structure learning. The paper (Yun Zhou, Fenton, Hospedales, & Neil, 2015) presented a new probabilistic graphical model parameters transfer learning method that uses transferred priors and constraints based on expert knowledge.

To the best of our knowledge, no existing research addresses the safety control modeling problem for the thickening process of gold hydrometallurgy when the data information of the target is too scarce to establish an accurate model. Therefore, inspired by expert knowledge and transfer learning, this paper proposes a new safety control modeling method for the thickening process of gold hydrometallurgy based on a BN transfer learning strategy. The safety control problem is first analyzed based on the existing research results. When the data information of the target is very limited, the safety control modeling problem is transformed into a BN transfer learning problem. A new BN transfer learning strategy is then proposed. By extracting the common structural information (CSI) of the multiple sources and integrating the useful information of the target, the final structure of the target is obtained. The parameters of the target are learned by fusing the parameters of the multiple sources, which have different similarities to the target, with those of the target. Finally, simulation results are presented to verify the effectiveness of the proposed method. The proposed transfer learning strategy is applied to establish the safety control model for the thickening process of gold hydrometallurgy under different flocculant dosage conditions. The simulation results imply that the proposed approach is effective and performs better than the traditional modeling method, which uses only the limited data information of the target.

The contributions of this paper can be summarized as follows. On the one hand, this paper proposes a new safety control modeling method based on transfer learning for the thickening process of gold hydrometallurgy. On the other hand, it proposes a new BN transfer learning strategy: the final structure of the target is determined by integrating the CSI of the multiple sources with the useful information of the target, and the parameters of the target are obtained by a fusion algorithm that distinguishes the similarity of the multiple sources. The proposed method is general and can be applied to similar problems in other research backgrounds.

The remainder of this article is organized as follows. Section 2 analyzes the problem to be solved, based on the existing research on safety control for the thickening process. Section 3 proposes the new BN transfer learning method, presenting the structure and parameters transfer learning methods respectively. In Section 4, the proposed algorithm is verified by a set of simulation results and is then applied to establish the safety control model for the thickening process of gold hydrometallurgy. Finally, Section 5 concludes the paper.

2. Problem formulation

2.1 The existing safety control research results for the thickening process

The simplified schematic diagram of the thickening process is depicted in Figure 1.

Figure 1. The simplified schematic diagram of the thickening process

This process consists of a thickener, pressure filter, buffer slots, slurry pumps and valves. For the problems of abnormity identification and safety control in the thickening process of gold hydrometallurgy, expert knowledge and operation experience have been extracted and summarized, and the data of the relevant variables have been collected by various sensors. In the existing research (H. Li, et al., 2017), based on the expert knowledge and data information, a BN-based safety control scheme was proposed for two common abnormities: "the underflow concentration of the thickener is too high" and "buffer slot 1 under the thickener is empty". The causes, phenomena and corresponding removal measures of these abnormities were analyzed in depth and used to define the variables and construct the BN structure. In (H. Li, et al., 2019), another common abnormity, "the overflow turbidity is too high", was analyzed further, and by integrating the results of (H. Li, et al., 2017), a new safety control BN model was established for the three abnormities in the thickening process.

The analysis of the three abnormities shows that, for different dosages of flocculants, the relationships among the BN variables are different. Therefore, although most of the BN structure can be obtained from expert knowledge, the specific structure, which differs among situations, needs to be determined from the characteristics of the data. In addition, the BN parameters need to be learned from the abnormality data. The models in the existing research (H. Li, et al., 2017; H. Li, et al., 2019) are established under the condition that the abnormality data are sufficient. When the abnormality data are limited for the target thickening process, we attempt to establish its safety control model using abnormality data from other relevant thickening processes as sources. The problem to be solved can be described as follows. The objective is to establish the safety control model for the thickening process of gold hydrometallurgy. The target domain is the thickening process with scarce abnormality data. The source domains are thickening processes with sufficient abnormality data, from the same factory and/or from other factories that apply hydrometallurgy to refine gold. In this setting, the tasks of the target and the sources are the same, and the target and the sources share the same variables, but the specific BN structures may differ under different conditions, and the parameters may have different distribution characteristics due to differences in equipment size or operation. Therefore, the safety control modeling problem for the thickening process with scarce abnormality data is transformed into a BN transfer learning problem. The source information includes the abnormality data and the corresponding safety control model; an available source may include both aspects or only one of them. To make the proposed BN transfer learning method applicable as widely as possible, a corresponding strategy is proposed for sources with different characteristics.

2.2 Problem description

In the BN transfer learning setting, a domain D = {V, G, Da} includes three components: the variables V = {X_1, X_2, X_3, ..., X_n} represent the BN nodes, Da represents the associated data, and G represents a directed acyclic graph that encodes the statistical dependencies among the variables. pa(X_i) denotes the parent nodes of node X_i. The CPTs specify the probability p(X_i | pa(X_i)) of every variable given its parents, as defined by the graph G. In this paper there is one target domain D^t = {V^t, G^t, Da^t} and a set of sources {D_1^s, D_2^s, ..., D_L^s} (L ≥ 1), with D_l^s = {V_l^s, G_l^s, Da_l^s}. The target domain and each source domain have training data Da^t = {d_1^t, d_2^t, ..., d_N^t} and Da_l^s = {d_1^s, d_2^s, ..., d_M^s}, where N is the number of samples in the target domain and M is the number of samples in a source domain. For BN transfer learning, the target domain data are assumed to be relatively scarce, 0 < N ≪ M, or N is small relative to the dimensionality of the target problem. The target domain parameters are denoted θ^t. The objective of BN transfer learning is to improve the accuracy of the BN model in D^t using the information in {D_1^s, D_2^s, ..., D_L^s}. Therefore, BN transfer learning can be defined as

$$\hat{G}^t = \arg\max_{G^t} p\left(D^t, \{D_1^s, D_2^s, \ldots, D_L^s\} \mid G^t\right) \qquad (1)$$

$$\hat{\theta}^t = \arg\max_{\theta^t} p\left(D^t, \{D_1^s, D_2^s, \ldots, D_L^s\} \mid \theta^t\right) \qquad (2)$$

where Ĝ^t denotes the estimate of the target domain structure and θ̂^t denotes the estimate of the target domain parameters. The following conditions are assumed: V^t = V_l^s, and {D_1^s, D_2^s, ..., D_L^s} (L ≥ 1) and D^t may have different distribution properties. The available useful information of the multiple sources may differ from case to case: sometimes both the models and the data of the multiple sources are known, and sometimes only one of the two is known. Different strategies are proposed for these different situations.
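As a concrete illustration of the notation above, the sketch below shows one plausible in-memory representation of a domain D = {V, G, Da}: the DAG G as an integer adjacency matrix and the data Da as integer-coded samples. The class and field names are illustrative assumptions, not an API from the paper.

```python
# A minimal sketch (an assumption for illustration, not the authors' code)
# of the domain representation used throughout Section 2.2.
from dataclasses import dataclass
import numpy as np


@dataclass
class Domain:
    variables: list        # V = {X_1, ..., X_n}, the BN node names
    graph: np.ndarray      # G: n x n 0/1 adjacency matrix of the DAG (graph[i, j] = 1 means X_i -> X_j)
    data: np.ndarray       # Da: (samples, n) integer-coded observations


# One target with scarce data (N rows) and L >= 1 richer sources (M rows each, N << M):
# target  = Domain(variables, G_t, Da_t)
# sources = [Domain(variables, G_l, Da_l) for G_l, Da_l in zip(source_graphs, source_data)]
```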

3. A new BN transfer learning method

3.1 The proposed BN structure transfer learning strategy

In this section, a new BN structure transfer learning strategy is proposed to obtain the structure of the target. The question is how to use the useful information of the multiple sources and the limited information of the target to learn the target structure. The specific process is depicted in Figure 2.

Figure 2. The proposed BN structure transfer learning strategy

The proposed BN structure transfer learning method involves three main tasks. First, the CSI of the multiple sources is extracted to reflect their common characteristics; because only information from the same background is selected as a source, the CSI of the multiple sources is likely to appear in the structure of the target. Second, the structure of the target is determined from the available useful information in the target together with the obtained CSI. Third, different transfer learning strategies are applied according to the available useful information in the multiple sources and the target. The objective is that the searched optimal structure of the target not only makes full use of the common characteristics of the multiple sources but also reflects the characteristics of the available useful information in the target. These three problems are analyzed in turn below.

3.1.1 The CSI extraction

Based on the related research results on BN structure learning from related tasks (Diane Oyen & Lane, 2012; Diane Oyen & Lane, 2014), the new BN structure transfer learning strategy is proposed in this section. Given data sets Da_1^s, ..., Da_L^s and structures G_1^s, ..., G_L^s from the L related sources, the CSI is extracted from

$$P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto P(G^g)\, P(G_{1:L}^s, Da_{1:L}^s \mid G^g) \propto P(G^g)\, P(G_{1:L}^s \mid G^g)\, P(Da_{1:L}^s \mid G^g) \qquad (3)$$

It is assumed that $P(G_{1:L}^s \mid G^g) = \prod_{l=1}^{L} P(G_l^s \mid G^g)$ and $P(Da_{1:L}^s \mid G^g) = \prod_{l=1}^{L} P(Da_l^s \mid G^g)$. Equation (3) is then simplified to

$$P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto P(G^g) \prod_{l=1}^{L} P(G_l^s \mid G^g) \prod_{l=1}^{L} P(Da_l^s \mid G^g) \qquad (4)$$

Taking the logarithm of Equation (4), it can be obtained that

$$\log P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto \log P(G^g) + \sum_{l=1}^{L} \log P(G_l^s \mid G^g) + \sum_{l=1}^{L} \log P(Da_l^s \mid G^g) \qquad (5)$$

Based on the research results in the papers (Diane Oyen & Lane, 2012; Diane Oyen & Lane, 2014), $P(G_l^s \mid G^g)$ can be expressed in the form

$$P(G_l^s \mid G^g) \propto (1-\alpha)^{\beta\, \Delta(G_l^s, G^g)} \qquad (6)$$

where $\Delta$ is a graph distance metric, which measures the number of structural differences between the graphs $G_l^s$ and $G^g$; the parameter $\beta$ is an adjustment coefficient; and the parameter $\alpha \in [0,1]$ controls the similarity of the structures. When $\alpha = 1$, the structures $G_l^s$ and $G^g$ are forced to be exactly the same, because the only non-zero probability is at $\Delta(G_l^s, G^g) = 0$; when $\alpha = 0$, the structures $G_l^s$ and $G^g$ are learned independently, because all values of $\Delta(G_l^s, G^g)$ have equal probability.

Finally, based on the above analysis, Equation (3) is expressed in the following form:

$$\log P(G^g \mid G_{1:L}^s, Da_{1:L}^s) \propto \log P(G^g) + \beta \sum_{l=1}^{L} \log(1-\alpha)^{\Delta(G_l^s, G^g)} + \sum_{l=1}^{L} \log P(Da_l^s \mid G^g) \qquad (7)$$

where $\log P(G^g)$ represents the prior structural information of $G^g$; $\beta \sum_{l=1}^{L} \log(1-\alpha)^{\Delta(G_l^s, G^g)}$ controls the degree of similarity between $G^g$ and the $G_l^s$; and $\sum_{l=1}^{L} \log P(Da_l^s \mid G^g)$ controls the closeness between $G^g$ and the $Da_l^s$.

Based on the search and score method, the complexity of the structure is considered further. The final score function for the CSI extraction therefore takes the following form:

$$\mathrm{score}_{G^g \mid G_{1:L}^s, Da_{1:L}^s} = \log P(G^g) + \beta \sum_{l=1}^{L} \log(1-\alpha)^{\Delta(G_l^s, G^g)} + \sum_{l=1}^{L} \mathrm{score}_{Da_l^s \mid G^g} \qquad (8)$$

$$\mathrm{score}_{Da_l^s \mid G^g} = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} m_{ijk}^{l} \log\!\left(m_{ijk}^{l} / m_{ij}^{l}\right) - \frac{1}{2} \log(m^{l}) \sum_{i=1}^{n} q_i (r_i - 1) \qquad (9)$$

where $n$ is the number of nodes in the BN model ($1 \le i \le n$); $q_i$ is the number of alternative combination states of the parent nodes of the $i$-th node ($1 \le j \le q_i$); $r_i$ is the number of alternative states of the $i$-th node ($1 \le k \le r_i$); $m_{ijk}^{l}$ is the number of records for which the $i$-th node is in its $k$-th state and its parent set is in its $j$-th state in the $l$-th source; $m_{ij}^{l}$ is the number of records for which the parent set of the $i$-th node is in its $j$-th state in the $l$-th source; and $m^{l}$ is the sample size of the $l$-th source.

Remark one: Before calculating $\Delta(G_l^s, G^g)$, some related symbols are explained. $g_{ij'}^{s}$ ($1 \le i \le n$, $1 \le j' \le n$) denotes the element of the matrix $G_l^s$, and its value range is $\{1, -1, 0\}$. When $g_{ij'}^{s} = 1$, there is an arc from node $i$ to node $j'$; when $g_{ij'}^{s} = -1$, there is an arc from node $j'$ to node $i$; and when $g_{ij'}^{s} = 0$, there is no arc between node $i$ and node $j'$. For the matrix $G_l^s$, the diagonal elements are zero and $g_{ij'}^{s} + g_{j'i}^{s} = 0$. The upper triangular part of $G_l^s$ has elements $g_{ij'}^{s}$ ($j' \ge i$), and the upper triangular part of $G^g$ has elements $g_{ij'}^{g}$ ($j' \ge i$). $\Delta(G_l^s, G^g)$ is calculated as

$$\Delta(G_l^s, G^g) = \sum_{i=1}^{n} \sum_{j'=1}^{n} \left| g_{ij'}^{s} - g_{ij'}^{g} \right| \quad (j' \ge i)$$

Remark two: The parameter $\beta$ is an adjustment coefficient, which controls the role of $\sum_{l=1}^{L} \log(1-\alpha)^{\Delta(G_l^s, G^g)}$ and ensures that this term has the same order of magnitude as the other terms; the larger $\beta$ is, the greater the role of $\sum_{l=1}^{L} \log(1-\alpha)^{\Delta(G_l^s, G^g)}$. The parameter $\alpha \in [0,1]$ controls the similarity between $G_l^s$ and $G^g$: when $\alpha$ is close to 1, the structures $G_l^s$ and $G^g$ are more similar; when $\alpha$ is close to 0, they are less similar. The values of $\alpha$ and $\beta$ need to be determined based on the practical situation and the above principles.
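The score of Eqs. (8)-(9) and the graph distance of Remark one are straightforward to compute once the structures are stored as matrices. The sketch below is an assumed implementation, not the authors' code: graph_distance follows Remark one for matrices with entries in {1, -1, 0}, data_fit_score computes the count-based term of Eq. (9), and csi_score combines the pieces as in Eq. (8). All function and argument names are assumptions made for illustration.

```python
# A hedged sketch of Eqs. (8)-(9) and the graph distance of Remark one.
import numpy as np


def graph_distance(G_s: np.ndarray, G_g: np.ndarray) -> int:
    """Delta(G_s, G_g) of Remark one: sum of |g_s - g_g| over the upper triangle.

    Both matrices use entries in {1, -1, 0}; for a 0/1 adjacency A this is A - A.T."""
    iu = np.triu_indices(G_s.shape[0], k=1)          # j' > i (the diagonal is zero)
    return int(np.sum(np.abs(G_s[iu] - G_g[iu])))


def data_fit_score(data: np.ndarray, G01: np.ndarray, card: np.ndarray) -> float:
    """score_{Da | G} of Eq. (9): log-likelihood minus a BIC-style penalty.

    data: (m, n) integer-coded samples; card[i] = r_i; G01[i, j] = 1 means an arc i -> j."""
    m, n = data.shape
    total = 0.0
    for i in range(n):
        parents = [p for p in range(n) if G01[p, i] == 1]          # pa(X_i)
        q_i = int(np.prod(card[parents])) if parents else 1
        r_i = int(card[i])
        counts = np.zeros((q_i, r_i))                               # m_ijk
        j_idx = (np.ravel_multi_index(data[:, parents].T, card[parents])
                 if parents else np.zeros(m, dtype=int))
        np.add.at(counts, (j_idx, data[:, i]), 1)
        m_ij = counts.sum(axis=1, keepdims=True)                    # m_ij
        nz = counts > 0
        ratio = counts / np.maximum(m_ij, 1)
        total += float(np.sum(counts[nz] * np.log(ratio[nz])))
        total -= 0.5 * np.log(m) * q_i * (r_i - 1)
    return total


def csi_score(G_g, source_structs, source_datafits, log_prior=0.0, alpha=0.6, beta=1.0):
    """Eq. (8): prior + structural-similarity term + data-fit terms of the L sources."""
    structural = beta * sum(np.log(1.0 - alpha) * graph_distance(G_s, G_g)
                            for G_s in source_structs)
    return log_prior + structural + sum(source_datafits)
```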

3.1.2 The determination of the target structure

Although the available information in the target is limited, it reflects valuable information about the target. Therefore, in the search for the target structure, both the CSI and the information in the target are utilized. The posterior probability of the target structure given the target data and the CSI can be expressed as

$$P(G^t \mid G^g, Da^t) \propto P(G^t)\, P(G^g, Da^t \mid G^t) \qquad (10)$$

It is assumed that $P(G^g, Da^t \mid G^t) = P(G^g \mid G^t)\, P(Da^t \mid G^t)$. Equation (10) is then simplified to

$$P(G^t \mid G^g, Da^t) \propto P(G^t)\, P(G^g \mid G^t)\, P(Da^t \mid G^t) \qquad (11)$$

where $P(G^g \mid G^t) \propto (1-\alpha')^{\beta' \Delta(G^g, G^t)}$. Taking the logarithm of Equation (11), it can be obtained that

$$\log P(G^t \mid G^g, Da^t) \propto \log P(G^t) + \beta' \log(1-\alpha')^{\Delta(G^g, G^t)} + \log P(Da^t \mid G^t) \qquad (12)$$

Based on the search and score method, the following score function is used to search for the target structure $G^t$:

$$\mathrm{score}_{G^t \mid G^g, Da^t} = \log P(G^t) + \beta' \log(1-\alpha')^{\Delta(G^g, G^t)} + \mathrm{score}_{Da^t \mid G^t} \qquad (13)$$

where the parameter $\beta'$ is an adjustment coefficient and the form of $\mathrm{score}_{Da^t \mid G^t}$ follows Equation (9). The parameter $\alpha' \in [0,1]$ controls the similarity between the CSI and the structure of the target: when $\alpha'$ is close to 1, the structures $G^g$ and $G^t$ are more similar; when $\alpha'$ is close to 0, they are less similar. The values of $\alpha'$ and $\beta'$ are determined based on the practical situation.
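In Section 4 the authors search this score with a Genetic Algorithm; any search-and-score procedure can be plugged in. The following is a minimal greedy hill-climbing sketch (an assumption for illustration, not the paper's search code) that maximizes a user-supplied score_fn, for example Eq. (13) built from the CSI, the target data and the functions sketched above. A 0/1 adjacency matrix A corresponds to the {1, -1, 0} matrix of Remark one via A - A.T.

```python
# A hedged hill-climbing sketch over add/delete/reverse single-arc moves.
from itertools import permutations
import numpy as np


def _has_cycle(adj: np.ndarray) -> bool:
    """Detect a directed cycle in an n x n 0/1 adjacency matrix by DFS coloring."""
    n = adj.shape[0]
    color = [0] * n                      # 0 = unvisited, 1 = on stack, 2 = finished

    def dfs(u: int) -> bool:
        color[u] = 1
        for v in range(n):
            if adj[u, v] and (color[v] == 1 or (color[v] == 0 and dfs(v))):
                return True
        color[u] = 2
        return False

    return any(color[u] == 0 and dfs(u) for u in range(n))


def _neighbours(G: np.ndarray, i: int, j: int):
    """Yield the graphs obtained by deleting/reversing arc i -> j, or adding it."""
    if G[i, j]:
        deleted = G.copy()
        deleted[i, j] = 0
        reversed_arc = deleted.copy()
        reversed_arc[j, i] = 1
        yield deleted
        yield reversed_arc
    else:
        added = G.copy()
        added[i, j] = 1
        yield added


def hill_climb(n: int, score_fn, max_sweeps: int = 100) -> np.ndarray:
    """Greedy maximization of score_fn (e.g. Eq. (13)) over DAG structures."""
    G = np.zeros((n, n), dtype=int)
    best = score_fn(G)
    for _ in range(max_sweeps):
        improved = False
        for i, j in permutations(range(n), 2):
            for candidate in _neighbours(G, i, j):
                if not _has_cycle(candidate):
                    s = score_fn(candidate)
                    if s > best:
                        G, best, improved = candidate, s, True
        if not improved:
            break
    return G
```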

3.1.3 The different transfer learning strategies for the different cases

When the available useful information of the multiple sources and the target is in different situations, different strategies need to be applied to learn the structure of the target. When searching for the CSI of the multiple sources, three cases need to be considered.

Case one: the models and the data information of the multiple sources are all known. As described in Figure 2, when the structures of the multiple sources are known, they are used to determine the prior structural information of $G^g$, and the CSI can then be extracted based on Equation (8). In this process, the more often an edge appears in the structures of the multiple sources, the greater the probability that this edge appears in the CSI.

Case two: only the data information of the multiple sources is known. In this case, when searching for the CSI, Equation (8) is simplified into the following form:

$$\mathrm{score}_{G^g \mid G_{1:L}^s, Da_{1:L}^s} = \sum_{l=1}^{L} \mathrm{score}_{Da_l^s \mid G^g} \qquad (14)$$

Case three: only the models of the multiple sources are known. In this case, when searching for the CSI, Equation (8) is simplified into the following form:

$$\mathrm{score}_{G^g \mid G_{1:L}^s, Da_{1:L}^s} = \log P(G^g) + \beta \sum_{l=1}^{L} \log(1-\alpha)^{\Delta(G_l^s, G^g)} \qquad (15)$$

Cases two and three are special situations of case one.

When searching for the structure of the target, two cases need to be considered.

Case four: the expert knowledge of the target is available. In this case, the prior structural information of $G^t$ can be determined, and the structure of the target can be extracted based on Equation (13).

Case five: the expert knowledge of the target is not available. In this case, when searching for the structure of the target, Equation (13) is simplified into the following form:

$$\mathrm{score}_{G^t \mid G^g, Da^t} = \beta' \log(1-\alpha')^{\Delta(G^g, G^t)} + \mathrm{score}_{Da^t \mid G^t} \qquad (16)$$

Case five is a special situation of case four.

3.2 The proposed BN parameters transfer learning strategy

In this section, a new BN parameters transfer learning method is proposed to improve the performance of learning the target parameters; it is shown in Figure 3. The goal of parameters learning is to determine all p(X_i | pa(X_i)), and the parameters can be obtained by maximum likelihood estimation (MLE). Three problems need to be considered for parameters transfer learning: whether to transfer a parameter or not; how to evaluate the similarity of the parameters between the multiple sources and the target; and how to fuse the parameter information from the multiple sources to learn the target parameters. Given data sets Da_1^s, ..., Da_L^s, Da^t and structures G_1^s, ..., G_L^s, G^t, the specific procedure is as follows.

Step 1: The estimated parameters θ̃^t of the target domain are obtained from the target data Da^t by MLE:

$$\tilde{\theta}^t = \arg\max_{\theta^t} p(Da^t \mid \theta^t) \qquad (17)$$

Step 2: The estimated parameters of the target domain are obtained from each source data set Da_l^s by MLE:

$$\theta_l^t = \arg\max_{\theta^t} p(Da_l^s \mid \theta^t) \qquad (18)$$

where θ_l^t denotes the estimated parameters of the target domain obtained from the l-th source data Da_l^s by MLE.

Step 3: The similarity of the multiple sources is distinguished based on the difference of structures between the multiple sources and the target (DSST):

$$\mathrm{score}_l^{sim} = 1 / \max(1, N_l), \quad 1 \le l \le L \qquad (19)$$

where N_l is the number of arcs that differ between the l-th source structure and the target structure. N_l is calculated in the same way as Δ (see Remark one). The larger N_l is, the smaller the similarity score of the l-th source.

Step 4: The DSST is used to determine whether the parameter information of a source is applied to parameters transfer learning. If the answer is 'Yes', go to Step 6; if the answer is 'No', go to Step 5. The target and the multiple sources may share only part of their structures. The parameters of source nodes whose parent node sets (PNS) differ from those in the target structure are less likely to have a probability distribution similar to the target. Therefore, the structures of the multiple sources are compared with the searched structure of the target. If the PNS of a node is the same in the source and the target, the parameters of this node in this source are used for transfer learning; otherwise, they are not considered as information for transfer learning. For each node in the target, all sources are evaluated in the same way.

Step 5: The parameters of a node whose PNS differs in the source are not used for transfer learning, and this source is not regarded as an alternative source for that node.

Step 6: The weights of the alternative sources for the fusion algorithm are calculated:

$$\omega_l = \mathrm{score}_l^{sim} \Big/ \sum_{l=1}^{L'} \mathrm{score}_l^{sim} \qquad (20)$$

where ω_l is the weight of the l-th source, score_l^{sim} is the similarity score of the l-th source, and L′ (L′ ≤ L) is the number of alternative sources.

Step 7: The final parameters of the target domain are calculated by the following fusion function:

$$\hat{\theta}^t = \eta \tilde{\theta}^t + (1-\eta) \sum_{l=1}^{L'} \omega_l \theta_l^t \qquad (21)$$

where η (0 < η ≤ 1) is the weight of the parameters obtained from the target data, which is determined based on the practical situation, and θ̂^t denotes the estimated parameters of the target. The above operations are carried out for each node in the target.
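Steps 1-7 reduce to a few lines of code once the CPTs are estimated per node. The sketch below is an assumed illustration of Eqs. (17)-(21) for a single node X_i, not the authors' implementation: MLE of the CPT from the target and source data, the DSST-based similarity of Eq. (19), the weights of Eq. (20), and the fusion of Eq. (21). All names are assumptions.

```python
# A hedged sketch of the per-node parameters fusion of Steps 1-7.
import numpy as np


def mle_cpt(data: np.ndarray, parents: list, i: int, card: np.ndarray,
            pseudo: float = 1e-6) -> np.ndarray:
    """p(X_i | pa(X_i)) by maximum likelihood; returns a (q_i, r_i) table."""
    q_i = int(np.prod(card[parents])) if parents else 1
    counts = np.full((q_i, int(card[i])), pseudo)     # tiny prior only to avoid 0/0
    j_idx = (np.ravel_multi_index(data[:, parents].T, card[parents])
             if parents else np.zeros(len(data), dtype=int))
    np.add.at(counts, (j_idx, data[:, i]), 1)
    return counts / counts.sum(axis=1, keepdims=True)


def fuse_node_cpt(target_data, source_datas, n_diff_arcs, same_pns,
                  parents, i, card, eta=0.4):
    """Eq. (21): theta_hat = eta * theta_target + (1 - eta) * sum_l w_l * theta_l."""
    theta_t = mle_cpt(target_data, parents, i, card)               # Step 1, Eq. (17)
    # Steps 4-5: keep only sources whose parent set of X_i matches the target
    keep = [l for l, ok in enumerate(same_pns) if ok]
    if not keep:
        return theta_t
    sims = np.array([1.0 / max(1, n_diff_arcs[l]) for l in keep])  # Eq. (19)
    weights = sims / sims.sum()                                    # Eq. (20), Step 6
    theta_s = sum(w * mle_cpt(source_datas[l], parents, i, card)   # Step 2, Eq. (18)
                  for w, l in zip(weights, keep))
    return eta * theta_t + (1.0 - eta) * theta_s                   # Step 7, Eq. (21)
```

Here same_pns[l] indicates whether the l-th source shares the parent set of X_i with the target, and n_diff_arcs[l] is N_l from Eq. (19); both are obtained by comparing the source structures with the searched target structure.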

Remark three: In the proposed BN parameters transfer learning strategy, the similarity of the multiple sources needs to be measured to avoid the influence of negative transfer and to ensure the effectiveness of transfer learning. If a source has a structure more similar to the target, it is more likely to have a similar probability distribution. The DSST is used to determine whether the parameter information of a source is applied to transfer learning and to determine the weights of the multiple sources. In the proposed strategy, the parameters of every node in the target are calculated separately, because the alternative parameter information in the multiple sources may differ from node to node.

Similar to the structure transfer learning method, when the available useful information of the multiple sources and the target is in different situations, different strategies need to be applied to learn the parameters of the target. Three cases need to be considered.

Case one: the models and the data information of the multiple sources are all known. As described in Figure 3, when the models of the multiple sources are known, their structures are used to determine whether the parameter information of a source is applied to transfer learning and to determine the weights of the multiple sources. The parameters of the target are then obtained from Equations (17)-(21).

Case two: only the data information of the multiple sources is known. In this case, the structures of the multiple sources are first learned from their sufficient data; the other steps are the same as in case one.

Case three: only the models of the multiple sources are known. In this case, when the PNS of a node in the source is the same as in the target, the parameters of this node are used for transfer learning directly; this operation replaces Step 2 above. The other steps are the same as in case one.

Cases two and three are special situations of case one.

To describe the symbols used in this paper clearly, they are summarized in Table 1.

Figure 3. The proposed BN parameters transfer learning strategy (flowchart: the target parameters are estimated by MLE from the target data and from the data of the multiple sources; the similarity of the multiple sources is calculated based on the DSST; sources whose parameters are not used for transfer learning are discarded as alternative sources; the weights of the alternative sources are calculated; and the final target parameters are obtained by the fusion algorithm)

Table 1. The symbols used in this paper and their descriptions

| Symbol | Description |
| D = {V, G, Da} | Domain |
| V = {X_1, X_2, X_3, ..., X_n} | Nodes |
| Da | Data |
| G | A directed acyclic graph |
| pa(X_i) | Parent nodes of node X_i |
| D^t = {V^t, G^t, Da^t} | Target domain |
| {D_1^s, D_2^s, ..., D_L^s} (L ≥ 1), D_l^s = {V_l^s, G_l^s, Da_l^s} | A set of source domains |
| Da^t = {d_1^t, d_2^t, ..., d_N^t} | Target domain data |
| Da_l^s = {d_1^s, d_2^s, ..., d_M^s} | Source domain data |
| Ĝ^t | The estimate of the target domain structure |
| G^t | Target domain structure |
| G^g | CSI |
| G_l^s | Structure of the l-th source domain |
| G_{1:L}^s | Structures of the source domains |
| Da_{1:L}^s | Data sets of the source domains |
| Δ | A graph distance metric |
| m_{ijk}^l | The number of records for which the i-th node is in its k-th state and its parent set is in its j-th state, for the l-th source |
| m_{ij}^l | The number of records for which the parent set of the i-th node is in its j-th state, for the l-th source |
| m^l | The sample size of the l-th source |
| θ^t | Target domain parameters |
| θ̂^t | The estimate of the target domain parameters |
| θ̃^t | The estimate of the target domain parameters obtained from the target data Da^t |
| θ_l^t | The estimate of the target domain parameters obtained from the source data Da_l^s |
| N_l | The number of arcs of the l-th source that differ from the target domain |
| L′ (L′ ≤ L) | The number of alternative sources |
| ω_l | The weight of the l-th source |

4. Experimental Results

To demonstrate the feasibility of the proposed method, this section first presents experiments on the well-known Asia network (Kabli, Herrmann, & McCall, 2007; Kim, Ko, & Kang, 2013; Vafaee, 2014; Lauritzen & Spiegelhalter, 1988). The structure of the Asia network is shown in Figure 4. The Asia network represents the relationships of the Chest Clinic problem and is a standard diagnostic demonstration BN. The proposed BN transfer learning strategy is then applied to establish the safety control model for the thickening process of gold hydrometallurgy, and it is compared with the traditional modeling method, which uses only the limited data information of the target process, to verify its superiority.

4.1 The experiments on the Asia network

Figure 4. The structure of the Asia network

To verify the proposed BN transfer learning strategy, three related source models are constructed; their structures are shown in Figure 5.

Figure 5. The structures of the three related source models: (a) source one; (b) source two; (c) source three

A dataset of 1000 samples from the true Asia network is used as the target data information and is denoted data0-1000. 2500 samples are collected from each of the three source networks as the source data information, denoted data1-2500, data2-2500 and data3-2500, respectively.

Based on the proposed BN structure transfer learning method in Section 3.1, the CSI G^g is extracted using Equations (8) and (9), and the structure of the target is learned using Equation (13). A Genetic Algorithm is used to search the structure space: the crossover probability is set to 0.9, the mutation probability to 0.01, and the maximum number of generations to 300. The search algorithm is run 20 times. To evaluate the learned target structure, it is compared with the true Asia network. The averages of the numbers of spurious edges, missing edges and reversed edges are used as indexes of the performance of the proposed method. A reversed edge can be regarded as deleting an edge and then adding one, so the total edge difference is calculated as the sum of the spurious edges, the missing edges and twice the reversed edges. To evaluate the proposed BN structure transfer learning strategy, the structure learned using only the target data data0-1000 is compared with that learned by the proposed method. The comparison results are shown in Table 2.
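The structure-quality indexes of Table 2 can be computed directly from the true and learned adjacency matrices. The sketch below is one plausible way to do it (an assumption, not the paper's evaluation script): a learned arc whose reverse is in the true graph counts as reversed, and the total edge difference counts each reversed edge twice, as described above.

```python
# A hedged sketch of the spurious/missing/reversed edge counts used in Table 2.
import numpy as np


def edge_differences(G_true: np.ndarray, G_learned: np.ndarray):
    """Return (spurious, missing, reversed, total) for 0/1 adjacency matrices."""
    true_edges = set(zip(*np.nonzero(G_true)))
    learned_edges = set(zip(*np.nonzero(G_learned)))
    reversed_edges = {(i, j) for (i, j) in learned_edges if (j, i) in true_edges}
    spurious = {e for e in learned_edges - reversed_edges if e not in true_edges}
    missing = {(i, j) for (i, j) in true_edges
               if (i, j) not in learned_edges and (j, i) not in learned_edges}
    total = len(spurious) + len(missing) + 2 * len(reversed_edges)
    return len(spurious), len(missing), len(reversed_edges), total
```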

Table 2. The structure difference of the target obtained by the proposed BN structure transfer learning method compared with using only the data information of the target

| | Proposed method (α′ = 0.1) | Proposed method (α′ = 0.4) | Proposed method (α′ = 0.6) | Proposed method (α′ = 0.9) | Only the data information of the target |
| The average of spurious edges | 4.7 | 3.85 | 3.45 | 3.55 | 10.4 |
| The average of missing edges | 2.35 | 2.9 | 3.1 | 3 | 1.55 |
| The average of reversed edges | 0.7 | 0.7 | 0.7 | 0.4 | 1.65 |
| The total edge difference | 8.45 | 8.15 | 7.95 | 7.35 | 15.25 |

The comparison in Table 2 shows that the network structure obtained by the proposed method is much closer to the true network. Therefore, the proposed BN structure transfer learning strategy is effective for obtaining the structure of the target, and it performs better than the method that uses only the limited information of the target to learn the structure. To show the influence of the amount of target data on the performance of the proposed method, simulation results for different target data sizes are shown in Figure 6. The total edge difference of the method that uses only the target data is denoted N_1, the total edge difference of the proposed BN structure transfer learning method is denoted N_2, and the difference between them is denoted M = N_1 − N_2. In Figure 6, the vertical axis represents the difference value M and the horizontal axis represents the data size of the target; the different lines correspond to different values of α′ ∈ {0.1, 0.4, 0.6, 0.9}.

Figure 6. The difference value M between the two methods under different data sizes of the target

Figure 6 shows that as the data size increases, the difference value M becomes smaller and smaller. It can be concluded that when the data size of the target increases, the role of transfer learning decreases. This conforms to the general experience that when the target data size is small, transfer learning plays a more important role in learning the structure of the target.

For BN parameters learning, the Kullback-Leibler (KL) divergence is used to evaluate the performance of parameters learning, that is, to measure how close the learned parameters are to the true parameters; the smaller the KL divergence, the better the learned parameters. The parameters of the target are learned by the proposed BN parameters transfer learning method of Section 3.2. Table 3 shows the values of the KL divergence under different weights η. The smaller the value of η, the larger the role of parameters transfer learning; when η = 1, the parameters of the target are learned using only the target data.
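The paper does not spell out the exact form of the KL metric; a common choice, assumed in the sketch below, is to sum the divergence between the true and learned conditional distributions over every node and every parent configuration.

```python
# A hedged sketch of a CPT-wise KL divergence between true and learned parameters.
import numpy as np


def cpt_kl(true_cpts, learned_cpts, eps=1e-12):
    """Sum of KL(true || learned) over all nodes and parent configurations.

    Each element of the two sequences is a (q_i, r_i) array of p(X_i | pa(X_i))."""
    kl = 0.0
    for p_true, p_learned in zip(true_cpts, learned_cpts):
        p = np.clip(np.asarray(p_true, dtype=float), eps, 1.0)
        q = np.clip(np.asarray(p_learned, dtype=float), eps, 1.0)
        kl += float(np.sum(p * np.log(p / q)))
    return kl
```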

Table 3. The values of the KL divergence under different weights η obtained by the proposed method

| Data size | η = 0 | η = 0.2 | η = 0.4 | η = 0.6 | η = 0.8 | η = 1 |
| 100 | 16.5178 | 15.6102 | 15.1453 | 15.1720 | 15.9189 | 22.4935 |
| 200 | | 15.1808 | 14.3580 | 14.0956 | 14.6597 | 22.6415 |
| 300 | | 15.4437 | 14.8574 | 14.8048 | 15.5283 | 22.1781 |
| 400 | | 15.3717 | 14.7241 | 14.6130 | 15.2689 | 21.5119 |
| 500 | | 15.2640 | 14.5034 | 14.2833 | 14.8492 | 21.1441 |

The comparison in Table 3 shows that the KL divergence of the method that uses only the target data is larger than all of the values obtained with the proposed BN parameters transfer learning method; the parameters learned by the proposed method are closer to the true parameters. Therefore, the proposed BN parameters transfer learning method is feasible and yields better parameters than learning from the target data alone (η = 1). As η increases, the role of the multiple sources decreases and the role of the target increases, and the KL divergence first decreases and then increases. This is because the useful information in the target plays a positive role to some extent, but the limited target data alone cannot produce good parameters, so when more weight is placed on the target information, the quality of the parameters eventually decreases.

For the proposed BN parameters transfer learning method of Section 3.2, if the PNS of a node in a source differs from that in the target, the parameters of this node in this source are not used for transfer learning. To verify the necessity of this step, the values of the KL divergence under different weights η when the DSST is not considered are shown in Table 4.

divergence under the different weights η when the DSST is not considered are shown in the Table 4.

Table 4 The values of KL divergence under the different weights η when the DSST is not considered 0

The value of KL divergence when data size is 100

The value of KL divergence when

Jo

data size is 200

The value of KL divergence when data size is 300

The value of KL divergence when data size is 400

The value of KL divergence when data size is 500

17.3136

The value of η 0.2

0.4

0.6

0.8

1

16.2461

15.6304

15.5053

16.0883

22.4935

15.8293

14.8607

14.4452

14.8380

22.6415

16.0885

15.3577

15.1555

15.7113

22.1781

16.0883

15.3380

15.0926

15.5683

21.5119

15.9584

15.0816

14.7231

15.1174

21.1441

Comparing the results in Tables 3 and 4, when the DSST is not considered, the obtained KL divergence values are all larger than those of the proposed BN parameters transfer learning method for the corresponding weight η and data size. It can be concluded that considering the DSST yields better parameters transfer learning performance, and the proposed method can avoid the influence of negative transfer.

The numerical experiments on the Asia network show that the proposed BN transfer learning method is effective for learning the model of the target and performs better than the method that uses only the target data.

4.2 Simulation results on the thickening process of gold hydrometallurgy

The proposed BN transfer learning strategy is applied to establish the safety control model for the thickening process of gold hydrometallurgy on a semi-physical simulation platform. The hydrometallurgy semi-physical simulation platform has been designed and constructed by our team over the past few years. Based on mechanism analysis and actual data, the platform can simulate the hydrometallurgical process, including the thickening, cyanide leaching, washing and cementation sub-processes, and new optimal control, monitoring and fault diagnosis methods can be verified on it. The hardware structure of the system can be found in the related research results (H. Li, et al., 2017; H. Li, et al., 2019). In this paper, the common abnormities of the thickening process are taken as the research background.

Based on the analysis in the section 2, it can conclude that for the different dosages of flocculants, the relationships among the related variables of abnormity are different. Therefore, the data information

Jo

under the different dosages of flocculants is collected. The corresponding models are learned by the different data information. When the dosage of flocculants is too high, the established model is shown in the Figure 7. When the dosage of flocculants is too low, the established model is shown in the Figure 8. The physical meanings and the grades of nodes are shown in the Table 5. When the dosage of flocculants is too high, it may result in the increase of the underflow concentration, but it is not the reason of abnormal overflow turbidity. When the dosage of flocculants is too low, it may result in the

Journal Pre-proof abnormal overflow turbidity, but it is not the reason of high underflow concentration. From the Figures 7 and 8, it can obtain that the learned relationships among the related variables by the data information

pro of

conform to the analysis of mechanism.

urn a

lP

re-

Figure 7. The established BN model when the dosage of flocculants is too high

Figure 8. The established model when the dosage of flocculants is too low

Table 5. The physical meanings and the grades of nodes The physical meanings of

The nodes of BN

Jo

A

B

C

nodes

The grades of nodes

The opening degree of valve

1. unchanged

3 and the power of slurry

2. middle high grade

pump 1

3. high grade

The opening degree of valve

1. closed grade

4

2. open grade

The opening degree of valve

1. unchanged

1

2. middle low grade 3. low grade

Journal Pre-proof

G

H

I J

3. severe grade The underflow concentration

1. nonoccurrence

is too high

2. medium grade 3. severe grade

The buffer slot 1 under the

1. nonoccurrence

thickener is empty

2. occurrence

The motor current in the

1. nonoccurrence

thickener is too large

2. medium grade 3. severe grade

The bed pressure in the

1. nonoccurrence

thickener is too high

2. medium grade 3. severe grade

The electricity of slurry

1. nonoccurrence

pump 1 is not stable

2. occurrence

The dosage of flocculants

1. normal 2. too low 3. too high

The overflow turbidity is too

1. nonoccurrence

high

2. medium grade

urn a

K

2. medium grade

pro of

F

low

re-

E

1. nonoccurrence

lP

D

The underflow rate is too

3. severe grade

The two models in the Figures 7 and 8 are regarded as two related sources. 2500 samples are collected from two sources models respectively, which are used as the data information of sources. In

Jo

addition, 100 samples and 500 samples are collected when the dosage of flocculants is too low, which are used as the target data information respectively. To verify the proposed BN transfer learning method, the target data information and two related sources information are applied to establish the model of target, and the learned model is represented as “Model one”. To verify the superiority of proposed method, the traditional modeling method on the scene is used to compare the modeling results. In the traditional method, the model is learned only by the limited target data information. The

Journal Pre-proof useful information from the related sources is not utilized effectively. The learned model by the traditional method is represented as “Model two”. Based on the analysis of practical situation, some possible abnormal scenarios are extracted, which are shown in the Table 6. In the Table 6, every abnormal scenario includes four characteristics with the different degrees. The specific meanings of different degrees can be found in the Table 5. Taking the abnormal scenario 1 as the example, the states

pro of

of nodes G, H and K are all at the severe grade and the state of node I is occurrence. Not all scenarios are included in the Table 6. Only some typical scenarios are considered and other similar scenarios can be analyzed in the same way. The abnormal scenarios will be used as the evidences to obtain the inference results of nodes A, B, C and J by the Models one and two. The reasoning result which owns the largest posterior probability will be regarded as the final decision. The total number of reasoning result is 24, and it is obtained by the number of abnormal scenarios (6) multiplying the number of

re-

nodes to be inferred (4). If the obtained final decisions conform to the expert knowledge, operation experience and data information, the inference result is considered to be correct. The precision rate is calculated by the correct number of inference results dividing the total number of reasoning results,

lP

which is shown in the Table 7.

Table 6 Some possible abnormal scenarios for three abnormities in the thickening process G

H

I

K

1

3

3

2

3

2

3

3

1

3

3

3

3

2

1

4

3

3

1

1

5

1

1

1

3

6

1

1

1

1

urn a

Scenario number

Table 7 The precision rate of inference result by the Models one and two The precision rate of inference result by the

The precision rate of inference result by the

size

Model one

Model two

95.8%

75%

95.8%

91.7%

100 500

Jo

Data

From the results in the Table 7, it can conclude that the model learned by the proposed BN transfer learning method can obtain the higher precision rate of inference than the learned model only by the

Journal Pre-proof limited data information of target. In addition, when the data size of target is less, the proposed transfer learning method will play more important role.

4.2.2 Example two

pro of

In the Example one, the collected target data information is only from the condition that the dosage of flocculants is too low. The learned model structure of target is the same with the Figure 8. In the Example two, the collected target data information is from two conditions. When the dosage of flocculants is too low, 250 samples are collected. When the dosage of flocculants is too high, another 250 samples are collected. Above collected 500 samples are used as the data information of target. When the collected target data information is from the different dosages of flocculants, how will the learned model change? Therefore, in the following example, the model of this condition is established

re-

by the proposed BN transfer learning method. The related sources information is the same with the Example one. The learned model of target is shown in the Figure 9, which is represented as “Model

lP

three”.

J

A

B

C

urn a

D

K

E

G

F

H

I

Figure 9. The learned model by the proposed transfer learning method when the data information

Jo

of target from the different dosages of flocculants

From the Figure 9, it can conclude that when the data information of target from the different dosages of flocculants, the relationships between the node J and the node K and the relationships between the node J and the node E all exist. Based on the analysis in the Example one, this structure conforms to the practical situation. To compare the performance of transfer learning, the model is also

Journal Pre-proof learned only by the limited target data information, and the learned model is represented as “Model

four”. In the proposed BN parameters transfer learning method, the DSST is considered to decide whether the parameter is used to transfer learning. To show the performance of this way, the “Model

five” is learned when the DSST is not considered. The abnormal scenarios in the Table 6 are used as the evidences to obtain the inference results. The precision rates of inference result by three kinds of

pro of

models are shown in the Table 8.

Table 8 The precision rates of inference result by three kinds of models The precision rate of inference result Model three (η =0.1)

75%

Model four Model five (η =0.1)

66.7%

70.8%

re-

From the results in the Table 8, it can conclude that the model learned by the proposed BN transfer learning method can obtain the higher precision rate of inference than the learned model only by the limited data information of target. In addition, when the DSST is not considered, the precision rate of

lP

inference will decrease.

Based on the simulation results in the Examples one and two, it can conclude that the proposed BN transfer learning strategy is effective to establish the safety control model for the thickening process of gold hydrometallurgy. It owns the better performance than the traditional modeling method

5. Conclusions

urn a

which only uses the limited data information of target.

This paper develops a new safety control modeling method based on a BN transfer learning strategy for the thickening process of gold hydrometallurgy. First, by analyzing the existing research results on safety control for this process, the problem to be solved is transformed into a BN transfer learning problem. A new BN transfer learning strategy is then proposed, which includes structure transfer learning and parameter transfer learning. The experimental results demonstrate that the proposed strategy is effective, and the influences of the target data size and of the DSST on transfer learning are analyzed and compared. Finally, the proposed strategy is applied to establish the safety control model for the thickening process of gold hydrometallurgy. The simulation results show that the proposed method can establish an effective model when the dosages of flocculants are in different situations and that it outperforms the traditional modeling method that uses only the limited target data.

Conflict of interest

There is no conflict of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China [grant numbers 61533007, 61873049, 61973057], the Foundation for Innovative Research Groups of the National Natural Science Foundation of China [grant number 61621004], and the National Key Research and Development Program of China [grant number 2017YFB0304205].


Author Contribution Statement

Hui Li: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - Original Draft.
Fuli Wang: Writing - Review & Editing, Supervision, Project administration, Funding acquisition.
Hongru Li: Writing - Review & Editing, Supervision, Funding acquisition.
Qingkai Wang: Resources.