Dynamic control model of BOF steelmaking process based on ANFIS and robust relevance vector machine




Expert Systems with Applications 38 (2011) 14786–14798


Min Han*, Yao Zhao
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, 116023 Dalian, PR China

Keywords: Basic oxygen furnace (BOF) steelmaking; Dynamic control model; Adaptive-network-based fuzzy inference system (ANFIS); Robust relevance vector machine

Abstract

This study concerns the control of the basic oxygen furnace (BOF) steelmaking process and proposes a dynamic control model based on the adaptive-network-based fuzzy inference system (ANFIS) and a robust relevance vector machine (RRVM). The model aims to control the second blow period of BOF steelmaking and consists of two parts: the first calculates the values of the control variables, viz., the required amounts of oxygen and coolant, and the second predicts the endpoint carbon content and temperature of the molten steel. In the first part, an ANFIS classifier is constructed to determine whether coolant should be added; an ANFIS regression model is then utilized to calculate the amounts of oxygen and coolant. In the second part, a novel robust relevance vector machine is presented to predict the endpoint. RRVM solves the sensitivity to outliers characteristic of the classical relevance vector machine, thus obtaining higher prediction accuracy. The key idea of the proposed RRVM is to introduce an individual noise variance coefficient for each training sample. In the process of training, the noise variance coefficients of outliers gradually decrease so as to reduce the impact of outliers and improve the robustness of the model. Simulations on industrial data show that the proposed dynamic control model yields good results on the oxygen and coolant calculation as well as on endpoint prediction, and it is promising for use in the practical BOF steelmaking process.

© 2011 Elsevier Ltd. All rights reserved.

Corresponding author: M. Han. Tel./fax: +86 411 84707847. E-mail address: [email protected].
doi:10.1016/j.eswa.2011.05.071

1. Introduction

Basic oxygen furnace (BOF) steelmaking is an important metallurgical technology. It is one of the most efficient methods to produce molten steel from hot metal and is also the pre-process of continuous casting and rolling. Because of its high productivity and low cost, almost 65% of the total steel in the world is currently produced by this method. In general, the aim of controlling BOF steelmaking is to guarantee the proper temperature and element contents of the molten steel under the metallurgical standards. In practice, the criterion of whether the molten steel is acceptable mainly depends on the values of the endpoint carbon content and temperature (Chou, Pal, & Reddy, 1993). After smelting, the carbon content generally decreases from approximately 4% in hot metal to less than 0.08% in molten steel, and the temperature increases from about 1250 °C to more than 1650 °C (Han & Huang, 2008). Therefore, methods are required to control the decarburization and temperature rise. In the early years, the process was controlled according to the experience of operators, which often could not obtain satisfactory results. Thus, some mathematical models were developed to assist the operations. These models were mainly

based on material balance and heat balance, which accounts for their name "static control models". With the development of measuring techniques, many new-style sensors and devices, such as sonic sensors, automatic sublances and off-gas analysis, have been applied in BOF steelmaking to improve the control effect (Dippenaar, 1999; Iida et al., 1984). Based on these advanced measurements, the process control models have also been improved. The models are able to automatically calculate the required amounts of oxygen and auxiliary additions as well as predict the endpoint carbon content and temperature (Birk, Johansson, Medvedev, & Johansson, 2002; Blanco & Diaz, 1993; Chou, Pal, & Reddy, 1993; Johansson, Medvedev, & Widlund, 2000). To distinguish them from the static control models, the improved models are named "dynamic control models". Notwithstanding their superiority to the static ones, the dynamic models are nevertheless based on physical and chemical laws, which inevitably have some inherent limitations in practical application. This is because the BOF steelmaking process couples heat transfer, mass transfer and a large number of chemical reactions. The complexity of the nature of BOF steelmaking makes its modeling and control extremely difficult and too hard to be reduced to sets of equations. To overcome this difficulty and establish an exact mathematical relationship between the input and output variables of BOF steelmaking, data-driven models such as artificial neural networks (ANNs)


are adopted. Because of their accurate identification of complex nonlinear dynamic systems (Narendra & Parthasarathy, 1990), ANNs are suitable for both modeling and control purposes in iron- and steel-making processes (Bloch, Sirou, Eustache, & Fatrez, 1997). Radhakrishnan and Mohamed (2000) utilized neural networks as soft sensors to predict the silicon and sulfur content of blast furnace hot metal, and created an expert control system to improve hot metal quality. Pernía-Espinoza, Castejón-Limas, González-Marcos, and Lobato-Rubio (2005) proposed several robust learning algorithms to train neural networks and described the steel annealing process. As for BOF steelmaking, Cox, Lewis, Ransing, Laszczewski, and Berni (2002) used ANNs to predict oxygen and coolant requirements during the second blow period. Fileti, Pacianotto, and Cunha (2006) developed an inverse neural network model to calculate the end-blow process adjustments, and the model was successfully implemented on line in a Brazilian steelmaking plant. Das, Maiti, and Banerjee (2009) used ANNs with Bayesian regularization to predict the control action for a steelmaking furnace. Many successful applications of ANNs to steelmaking modeling have been reported in the literature; however, ANNs have some limitations. It is difficult to tune the structure parameters, which essentially affect the efficiency and prediction accuracy of ANNs. In addition, the models are sensitive to the initialized parameters.

In recent years, statistical learning theory has developed rapidly. It takes structural risk minimization as its principle and focuses on controlling the generalization ability of the learning process (Vapnik, 2000). Based on this theory, the support vector machine (SVM) was invented. It enhances computational ability by using kernel functions to map the data into a high-dimensional space. Moreover, a regularization parameter C is defined to control the trade-off between model complexity and training error (Müller, Mika, Rätsch, Tsuda, & Schölkopf, 2001). Therefore, SVM has become a powerful tool for identifying nonlinear systems, and many successful applications have been achieved (Esen, Ozgen, Esen, & Sengur, 2009; Vong, Wong, & Li, 2006; Zhang & Wang, 2008). In the application of steelmaking process control, SVM has also delivered good performance. Yuan, Mao and Wang (2007b) integrated multiple support vector machines with principal component regression to predict the endpoint parameters of electric arc furnace steelmaking. Valyon and Horváth (2009) proposed a sparse and robust extension of least-squares SVM (LS-SVM) to calculate the amount of oxygen blown in BOF steelmaking, and demonstrated that the performance of LS-SVM was better than that of ANNs. However, despite its success, SVM has a number of significant practical disadvantages. For example, predictions are not probabilistic, and the kernel function must satisfy Mercer's condition. The error/margin trade-off parameter C needs to be estimated by cross validation, which consumes a lot of time. Moreover, although SVM is relatively sparse, the number of support vectors still grows linearly with the size of the training sample set (Tipping, 2000). These disadvantages limit the further application of SVM.

To alleviate the above drawbacks, Tipping (2000, 2001) proposed the relevance vector machine (RVM). RVM is a nonlinear probabilistic model based on the Bayesian evidence framework. It uses the type-II maximum likelihood method, also referred to as the "evidence procedure" (Mackay, 1992a, 1992b), to optimize the hyperparameters of the model and obtain a sparse solution. The generalization performance of RVM is comparable to that of SVM, whereas RVM is a sparser model (Tipping, 2001). Due to these advantages, RVM has obtained state-of-the-art results in different applications, such as microbiological fermentation (Sun & Sun, 2005), mechanical engineering (Yang, Zhang, & Sun, 2007), medical image processing (Wei, Yang, Nishikawa, Wernick, & Edwards, 2005) and fault diagnosis (Widodo et al., 2009). However, RVM


has a serious weakness: it assumes that all of the training samples are coupled with independent Gaussian noise, ε ~ N(0, σ²). A well-known disadvantage of the Gaussian noise model is that it is not robust. If the training samples are contaminated by outliers, the accuracy of the RVM model will be significantly compromised (Faul & Tipping, 2001). In this paper, a novel robust relevance vector machine (RRVM) is contrived, which assumes that each training sample has its individual coefficient of noise variance. During the model training procedure, the coefficients corresponding to outliers decrease drastically so as to detect and eliminate the outliers. We utilize the proposed RRVM as an identifier to predict the endpoint carbon content and temperature of the molten steel. In the BOF steelmaking process, measured data are often interfused with outlying observations, while RRVM can reduce the impact of outliers and has good generalization ability (these properties will be demonstrated by simulations). Therefore, it is suitable for constructing the endpoint prediction model. On the other hand, the amounts of oxygen and coolant required in the second blow, which are considered as control variables, are critical to achieving the expected endpoint. Based on the control experience of operators and the production data of a steel plant, the adaptive-network-based fuzzy inference system (ANFIS) is adopted to calculate the values of these control variables. ANFIS can acquire knowledge from a set of input–output data and has competitive calculation accuracy (Jang, 1993). The proposed dynamic control model is implemented as follows. First, ANFIS is utilized to calculate the amounts of oxygen and coolant based on the metallurgical standards and the parameters measured by sublance. Then the calculation results and measured data are used as input variables of the RRVM model to predict the endpoint carbon content and temperature. If the predicted values are in the expected range, the control variables determined by ANFIS will be accepted.
Otherwise, the calculated control variables should be adjusted by operators in order to achieve the optimal values. Combining ANFIS and RRVM, a dynamic control model of the BOF steelmaking process is constructed. In order to acquire the expected control effect, the premise is that the RRVM model must be well trained as an identifier to approximate the relationship between input and output and to accurately predict the endpoint carbon content as well as the temperature. In the latter part of this paper, simulations will demonstrate that RRVM has good approximation ability and robustness. The remainder of this paper is organized as follows: In Section 2, the production process of BOF steelmaking is briefly described. Section 3 presents the structure of the dynamic control model for the second blow period. Section 4 introduces the methods, ANFIS and RRVM, utilized in this paper. Simulations on benchmark data and industrial data are given in Section 5. In Section 6, the conclusions are drawn.

2. Description of BOF steelmaking process

The BOF comprises a vertical solid-bottom furnace with a vertical water-cooled oxygen lance entering the furnace from above. The furnace is tilted for charging and tapping. Above the vessel, there are a hood and a duct for the exhaust gas. The general view of the BOF is shown in Fig. 1 (Han & Huang, 2008). The molten steel capacity of a furnace generally ranges from 150 to 180 tons, and the whole production process is as follows:

Step 1: Approximately 20–30 tons of scrap and 120–130 tons of molten hot metal are charged into the furnace. The hot metal has been preprocessed for desulfuration.

Step 2: Oxygen is blown into the furnace through the lance at a rate of 500 cubic meters per minute. Meanwhile, burnt lime, dolomite and other auxiliary materials are added. On the


surface of the hot metal, oxygen reacts with elements such as carbon, silicon and manganese to remove the impurities and raise the temperature continuously. Slag and exhaust gas are also generated.

Fig. 1. General view of basic oxygen furnace (BOF).

Step 3: During the whole process, oxygen is blown for about 25 min. At a predetermined point in the blowing (about 85% of the whole blowing period), an automatic sublance is inserted into the molten steel to measure the temperature and take a sample. The carbon content is estimated by analyzing the sample. Comparing the estimated values with the expected endpoints, the amounts of oxygen and coolant for the second blow period are calculated.

Step 4: The blowing continues until the second blow period ends. Then the endpoint temperature and carbon content of the molten steel are measured for the second time. If the endpoint measurements are acceptable, the molten steel will be tapped. Otherwise, additional oxygen will be blown until the steel components and temperature meet the technical standards. This period is called "reblow"; it consumes more oxygen and should be avoided.

The definitions of the different blowing periods are illustrated in Fig. 2. Throughout the whole process, there are three periods, namely first blow, second blow and reblow. The first blow period is also referred to as "main blow". After the first blow, the temperature and carbon content of the molten steel are measured by sublance to calculate the oxygen and coolant requirements for the second blow. In this paper, we design the dynamic control model to assist the second blow control. Based on the measured parameters, the model is able to calculate the appropriate amounts of oxygen and coolant as well as predict the expected endpoint carbon content and temperature. This avoids reblow and thus saves operation time and oxygen.

3. Structure of dynamic control model

As pointed out in Section 2, the aim of the dynamic control model is to accurately calculate the required volume of oxygen and weight of coolant. Furthermore, it should achieve the expected endpoints at the end of the second blow, so the control model consists of two parts. The first part focuses on calculating the amounts of oxygen and coolant by using the ANFIS model. The input variables are the temperature and carbon content of the molten steel measured at the end of the first blow. However, in practical operations, coolant is not always added; it depends on the current condition and the expected endpoints (Fileti et al., 2006). We analyzed 1000 groups of production data collected from Benxi Steel Sheet Co., Ltd. in China from May 2008 to July 2008, and illustrate the distribution histogram of the coolant data in Fig. 3. From the histogram, it is noted that there are a number of heats in which no coolant is added. In addition, the ANFIS model cannot predict a zero value; hence, whether coolant will be added or not should be determined first (Cox et al., 2002). This is a binary classification problem, and we also choose ANFIS as the classifier for this task. Combined with the binary classifier, the block diagram of the dynamic control model is given in Fig. 4.

When a new heat is under operation and the first blow period ends, the temperature and carbon content are measured by the use of the sublance. First, the measured values are fed into the ANFIS oxygen predictor and coolant classifier as inputs to calculate the volume of oxygen as well as to determine whether coolant should be added. If coolant is needed, the ANFIS coolant predictor calculates how much coolant should be added. Next, based on the predicted amounts of oxygen and coolant, the endpoint model is utilized to predict the temperature and carbon content of the molten steel. In case the predicted endpoint values are in the required range, namely a "hit", the amounts of oxygen and coolant calculated by ANFIS will be utilized in the real system. Otherwise, the calculated amounts should be adjusted by operators and used to predict the endpoint again. Possibly this procedure will be repeated several times until the predicted temperature and carbon content are acceptable. Then the calculated amounts of oxygen and coolant are finally used to control the steelmaking process. After the operation of the current heat, the next heat will begin.
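The decision loop described above can be sketched as follows. This is an illustrative Python sketch: the model objects, the operator-adjustment rule and all function names are hypothetical stand-ins for the ANFIS predictors, the ANFIS classifier and the endpoint model of this paper, not the authors' implementation.

```python
def second_blow_control(temp, carbon,
                        oxygen_model, coolant_classifier, coolant_model,
                        endpoint_model, endpoint_ok, operator_adjust,
                        max_rounds=5):
    """Return the oxygen/coolant amounts accepted for the second blow.

    All model arguments are callables standing in for the ANFIS
    predictors/classifier and the endpoint prediction model.
    """
    oxygen = oxygen_model(temp, carbon)
    # Decide the zero/non-zero case first: the regression model
    # cannot predict a zero coolant amount.
    if coolant_classifier(temp, carbon):
        coolant = coolant_model(temp, carbon)
    else:
        coolant = 0.0

    for _ in range(max_rounds):
        endpoint = endpoint_model(temp, carbon, oxygen, coolant)
        if endpoint_ok(endpoint):          # "hit": accept the amounts
            break
        # Otherwise the operator adjusts the amounts and the endpoint
        # is predicted again.
        oxygen, coolant = operator_adjust(oxygen, coolant, endpoint)
    return oxygen, coolant
```

Here `endpoint_ok` encodes the required endpoint range and `operator_adjust` stands in for the manual correction step performed by operators.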

4. Methods

In this section, the methods utilized in the dynamic control model, ANFIS and RRVM, are described in detail. ANFIS is used to calculate the amounts of oxygen as well as coolant,

Fig. 2. The whole process of BOF steelmaking (first blow with oxygen and additions, first measurement, second blow with oxygen and coolant, second measurement, reblow, tapping).


whereas RRVM is utilized to predict the endpoint carbon content and temperature.

Fig. 3. Histogram of coolant added in the second blow for 1000 groups of production data.

4.1. Adaptive-network-based fuzzy inference system

Adaptive-network-based fuzzy inference system (ANFIS) is an off-line learning model. It has been widely used in the modeling and control of nonlinear systems by constructing a set of fuzzy if-then rules with appropriate membership functions (Melin & Castillo, 2005; Mon, 2007). Generally, an ANFIS model consists of five layers; the architecture is shown in Fig. 5. The fuzzy rules extracted from the input–output pairs are described as (1):

R_r: if x_1 is A_1^{s_1} and x_2 is A_2^{s_2} ... and x_n is A_n^{s_n}, then f_r = f_r(x_1, x_2, ..., x_n), r = 1, ..., K   (1)

where R_r denotes the rth fuzzy rule, and A_1^{s_1}, ..., A_n^{s_n} are the fuzzy sets associated with the input variables x_1, ..., x_n. The function f_r = f_r(x_1, x_2, ..., x_n) is the output of the rth fuzzy rule. The functions of the five layers are described as follows:

Layer 1: Input variables are fuzzified, and the membership of x_l (l = 1, ..., n) on the different fuzzy sets is calculated according to formula (2):

μ_l^{s_l} = μ_{A_l^{s_l}}(x_l)   (2)

where μ_{A_l^{s_l}}(·) denotes the membership function of variable x_l on fuzzy set A_l^{s_l}, and μ_l^{s_l} is the membership degree.

Layer 2: Calculate the confidence degrees of the fuzzy rules. For the rth fuzzy rule, the degree of confidence is calculated as formula (3):

ω_r = μ_1^{s_1} · μ_2^{s_2} ··· μ_n^{s_n}, r = 1, ..., K   (3)

Layer 3: All of the confidence degrees are normalized:

ω̄_r = ω_r / ∑_{p=1}^{K} ω_p, r = 1, ..., K   (4)

Layer 4: Calculate the output of each fuzzy rule according to formula (5). Here Takagi–Sugeno type fuzzy rules are adopted:

f_r = p_{r1} x_1 + p_{r2} x_2 + ··· + p_{rn} x_n + q_r   (5)

where p_{r1}, p_{r2}, ..., p_{rn} and q_r are fuzzy consequent parameters which can be determined by least-squares regression.

Layer 5: Calculate the final output of ANFIS. It is the weighted summation of the f_r with weights ω̄_r (r = 1, ..., K):

y = ∑_{r=1}^{K} ω̄_r f_r   (6)

4.2. Classical relevance vector machine

Relevance vector machine (RVM) is a probabilistic model based on Bayesian theory. Consider a data set of input–target pairs {x_i, t_i}_{i=1}^{N} as the training samples, where x_i ∈ R^n denotes an n-dimensional input vector and t_i ∈ R denotes a scalar measured output. Furthermore, assume that the targets are independently sampled from the regression model with additive noise ε_i:

t_i = y(x_i; w) + ε_i   (7)

where ε_i is assumed to be zero-mean Gaussian noise with variance σ², namely ε_i ~ N(ε_i | 0, σ²). Similar to SVM, the prediction function y(x; w) of RVM is defined as a linear combination of weighted basis functions:

y(x; w) = ∑_{i=1}^{N} w_i K(x, x_i) + w_0   (8)

where K(x, x_i) is a basis function; effectively, one basis function is defined for each sample in the training data set. The weight parameter vector is defined as w = [w_0, ..., w_N]^T. According to Eq. (7) and the noise assumption on ε_i, we have a Gaussian distribution over t_i with mean y(x_i; w) and variance σ², viz., p(t_i | x_i) = N(t_i | y(x_i; w), σ²). For convenience, a hyperparameter β is defined as β = 1/σ². Therefore, the likelihood function of the complete training data set is expressed as

p(t | w, β) = (β/2π)^{N/2} exp(−(β/2) ‖t − Φw‖²)   (9)

where t = [t_1, t_2, ..., t_N]^T and Φ ∈ R^{N×(N+1)} is the design matrix, defined as Φ = [φ(x_1), φ(x_2), ..., φ(x_N)]^T with φ(x_i) = [1, K(x_i, x_1), K(x_i, x_2), ..., K(x_i, x_N)]^T, i = 1, ..., N.

The essence of training RVM is to determine the posterior distribution over the weight vector w. In order to maintain sparsity and maximize the likelihood function, the prior distribution over w_j (j = 0, ..., N) should be defined first. Assume that w_j follows a zero-mean Gaussian distribution with variance α_j^{−1}; hence the prior distribution over w is expressed as

p(w | α) = ∏_{j=0}^{N} N(w_j | 0, α_j^{−1})   (10)

which is a multivariate Gaussian distribution, where α = [α_0, α_1, ..., α_N]^T and α_j is the individual hyperparameter independently associated with each weight parameter w_j. With the defined prior distribution (10) and the likelihood function (9), the posterior distribution over w can be computed on the basis of the Bayesian rule:

p(w | t, α, β) = p(w | α) p(t | w, β) / p(t | α, β)   (11)

Since p(w | α) and p(t | w, β) are both Gaussian, their product is also Gaussian. Furthermore, p(t | α, β) does not involve w, so it is considered a normalization coefficient. The posterior distribution over w is therefore also Gaussian and can be expressed as

p(w | t, α, β) = N(w | μ, Σ)   (12)

Fig. 4. Block diagram of dynamic control model for BOF steelmaking process.

where μ is the mean vector and Σ is the covariance matrix, which are expressed as formulas (13) and (14), respectively:

Σ = (βΦ^TΦ + A)^{−1}   (13)

μ = βΣΦ^T t   (14)

where A = diag(α_0, α_1, ..., α_N). The posterior distribution over w is determined by the hyperparameters β and α; thus the hyperparameters are optimized by using the evidence procedure. The iterative optimization formulas for the hyperparameters are (15) and (16), respectively:

α_j = 1/(μ_j² + Σ_jj) = γ_j/μ_j², j = 0, 1, ..., N   (15)

β = (N − ∑_{j=0}^{N} γ_j) / ‖t − Φμ‖²   (16)

where μ_j denotes the jth element of vector μ, Σ_jj denotes the jth diagonal element of matrix Σ, and γ_j = 1 − α_j Σ_jj. In the process of training, formulas (13)–(16) are computed iteratively. Most of the α_j tend toward infinity, and the corresponding μ_j tend toward zero. The training stops when all the hyperparameters have converged or the maximum number of iterations is reached.

4.3. Robust relevance vector machine

As inferred above, classical RVM is based on the assumption that the noise ε_i for each training sample is zero-mean Gaussian with the same variance σ² (or hyperparameter β). However, in practical applications, measured data are often contaminated by outlying observations, which makes the Gaussian-noise assumption untenable. This compromises the robustness of the RVM regression model and reduces its prediction accuracy. To alleviate this problem, researchers have proposed some modified

Fig. 5. The architecture of ANFIS.

methods. Faul and Tipping (2001) presented a variational relevance vector machine (VRVM) to deal with outliers. They introduced an explicit distribution, incorporated in a mixture with the standard likelihood function, to explain outliers, and utilized a variational approximation to implement the inference strategy. Tipping and Lawrence (2005) modified RVM with a Student-t noise model, whose distribution has heavier tails than that of the Gaussian noise model. However, the modified algorithm was also implemented by using variational approximation, which consumes more computation time. Yang, Zhang, and Sun (2007) proposed a trimmed relevance vector machine (TRVM) that redefines the likelihood function as a trimmed one. The outliers are eliminated during model training, and a reweighted strategy is introduced to find the trimmed subset. The new algorithm is able to detect outliers and enhance the robustness of the model. The above modified strategies are mainly based on variational inference or trimming the data set. In this section, a robust relevance vector machine is presented to reduce the impact of outliers, and the model can still be implemented by using the evidence procedure. Instead of setting the same noise variance for all the samples, we assume that each training sample has its individual coefficient of noise variance. Then, based on the Bayesian evidence framework, the iteration formulas are derived to optimize the hyperparameters and the noise variance coefficients. During the optimization process, the noise variance coefficients of outliers will decrease so as to detect and eliminate the outliers. The detailed optimization procedure is as follows. Following Bayesian weighted linear regression (Ting, D'Souza, & Schaal, 2007), assume that the individual noise distribution of the ith training sample is:

p(ε_i) = N(ε_i | 0, σ²/β_i), i = 1, ..., N   (17)

where σ² denotes the average noise variance of all the training samples and β_i denotes the noise variance coefficient of the ith sample. The prior distribution of β_i is assumed to be a Gamma distribution, namely

p(β_i) = Gamma(a_i, b_i) = Γ(a_i)^{−1} b_i^{a_i} β_i^{a_i − 1} e^{−b_i β_i}   (18)

with the gamma function Γ(a_i) = ∫_0^∞ t^{a_i − 1} e^{−t} dt. Define the vector β = [β_1, β_2, ..., β_N]^T; the likelihood function of the complete training sample set then changes from formula (9) to

p(t | w, β, σ²) = (2πσ²)^{−N/2} |B|^{1/2} exp(−(1/(2σ²)) (t − Φw)^T B (t − Φw))   (19)

where B = diag(β_1, β_2, ..., β_N) and |·| denotes the determinant of a matrix. The definitions of t, w and Φ are the same as before. The prior distribution over w is still expressed as formula (10). According to the Bayesian rule, the posterior distribution of w is computed as

p(w | t, α, β, σ²) = p(w | α) p(t | w, β, σ²) / p(t | α, β, σ²) = N(w | μ, Σ)   (20)

where the covariance matrix Σ and the mean vector μ can be computed by using formulas (21) and (22), respectively:

Σ = (A + σ^{−2} Φ^T B Φ)^{−1} = (A + σ^{−2} ∑_{i=1}^{N} β_i φ(x_i) φ(x_i)^T)^{−1}   (21)

μ = σ^{−2} Σ Φ^T B t = σ^{−2} Σ (∑_{i=1}^{N} β_i φ(x_i) t_i)   (22)
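The weighted posterior update of formulas (21) and (22) can be sketched in a few lines of NumPy. This is an illustrative sketch under the notation above, not the authors' code:

```python
import numpy as np

def weighted_posterior(Phi, t, alpha, beta, sigma2):
    """Posterior covariance and mean of w, formulas (21) and (22).

    Phi    : (N, M) design matrix
    t      : (N,)   targets
    alpha  : (M,)   weight-prior precisions (diagonal of A)
    beta   : (N,)   per-sample noise variance coefficients (diagonal of B)
    sigma2 : average noise variance
    """
    A = np.diag(alpha)
    B = np.diag(beta)
    Sigma = np.linalg.inv(A + Phi.T @ B @ Phi / sigma2)  # Eq. (21)
    mu = Sigma @ Phi.T @ B @ t / sigma2                  # Eq. (22)
    return Sigma, mu
```

With all β_i = 1 this reduces to the classical RVM posterior of formulas (13) and (14), since β = 1/σ².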

Since the computation formulas of the covariance matrix and the mean vector are both influenced by α, β and σ², these hyperparameters need to be optimized so as to maximize the posterior distribution of w. The optimization method is also based on the Bayesian evidence framework (Mackay, 1992a, 1992b), which is implemented by maximizing the product of the marginal likelihood function p(t | α, β, σ²) and the prior distribution over β, p(β) = ∏_{i=1}^{N} p(β_i). The marginal likelihood function is computed as follows:

p(t | α, β, σ²) = ∫ p(t | w, β, σ²) p(w | α) dw = (2π)^{−N/2} |C|^{−1/2} exp(−(1/2) t^T C^{−1} t)   (23)

where C = σ²B^{−1} + ΦA^{−1}Φ^T. Equivalently, we can optimize the logarithm of the product of p(t | α, β, σ²) and p(β). Moreover, we maximize this quantity with respect to log α, log β and log σ² for convenience of computation. Therefore, the objective to be optimized is

log p(t | log α, log β, log σ²) + ∑_{i=1}^{N} log p(log β_i)   (24)

Noting that p(log β_i) = β_i · p(β_i), and deleting the terms which are independent of α, β and σ², we get the objective function

L = −(1/2)[−log|Σ| − log|A| + N log σ² − log|B| + μ^T A μ + σ^{−2} (t − Φμ)^T B (t − Φμ)] + ∑_{i=1}^{N} (a_i log β_i − b_i β_i)   (25)

The optimized values of α, β and σ² cannot be obtained in closed form and have to be re-estimated iteratively. Taking the partial derivatives of formula (25) with respect to log α_j (j = 0, 1, ..., N), log β_i (i = 1, ..., N) and log σ², and rearranging the equations, we obtain the iteration formulas of α, β and σ². They are expressed as formulas (26)–(28), respectively (the detailed inference is in the Appendix):

α_j = 1/(μ_j² + Σ_jj) = γ_j/μ_j²   (26)

β_i = (a_i + 0.5) / (b_i + 0.5[σ^{−2} (t_i − φ(x_i)^T μ)² + σ^{−2} tr(Σ φ(x_i) φ(x_i)^T)])   (27)

σ² = (t − Φμ)^T B (t − Φμ) / (N − ∑_{j=0}^{N} γ_j)   (28)

where j = 0, ..., N, i = 1, ..., N, Σ_jj is the jth diagonal element of the covariance matrix Σ, γ_j = 1 − α_j Σ_jj, and tr(·) denotes the trace of a matrix. Finally, the iterative formulas for optimization are all obtained. Formulas (21), (22), (26), (27) and (28) are the iterative estimations of Σ, μ and the hyperparameters α_j, β_i and σ², respectively.

In practical utilization of this algorithm, we should set the initialization of the priors used in formulas (21), (22), (26), (27) and (28). First of all, α and σ² can be initialized according to the characteristics of the data set, e.g. α_j = N/var(t) and σ² = var(t), where var(t) is the variance of t. Secondly, the scale parameters a_i and b_i, which are included in β_i's prior distribution Gamma(a_i, b_i), should be selected so that the prior means of the β_i are 1. For example, when the parameters are set as a_i = 1 and b_i = 1, the noise variance coefficient β_i has a prior mean of a_i/b_i = 1 with a variance of a_i/b_i² = 1. That means we start by assuming the noise distributions of all the samples are Gaussian with the same variance, that is to say, all of the training samples are inliers. With these values, the range of β_i is 0 < β_i < 1.5, which can be inferred from formula (27). This setting of the prior parameter values is generally valid for most applications and data sets. During the process of iteration, the β_i corresponding to outliers will gradually become small.

To sum up the above arguments, the whole training procedure of RRVM is as follows:

Step 1: Initialize the hyperparameters α, β and σ², as well as a_i and b_i, i = 1, ..., N.

Step 2: Compute the covariance matrix Σ and the mean vector μ of the posterior distribution over w by the use of formulas (21) and (22), respectively.

Step 3: Optimize the hyperparameters α, β and σ² according to (26)–(28) iteratively. During the optimization procedure, many of the α_j will tend to infinity (this can be judged by a large threshold value, such as 10^9). From (26), this implies that the corresponding μ_j will tend to zero, and so will w_j. The corresponding basis functions are pruned, and the sparsity of the model is realized.

Step 4: Determine whether all the parameters have converged or the maximal iteration number is reached. If so, stop the iteration and the training. If not, go back to Step 2.

After the training is finished, the basis functions corresponding to non-zero μ_j are called "relevance vectors".

During the training process, formulas (21), (22), (26), (27) and (28) are computed iteratively until the termination conditions are satisfied. Formula (27) reveals that the prediction error (ti − φ(xi)ᵀμ)² of data point {xi, ti} appears in the denominator. If the prediction error of ti is so large that it dominates the other denominator terms, the corresponding noise variance coefficient βi of that point becomes very small; as the error term tends to infinity, βi approaches zero. As can be seen from (21) and (22), the calculation formulas of Σ and μ of the posterior distribution over w both include a term which is a linear weighted combination of all the samples, with weights exactly βi. A sample with an extremely small coefficient therefore contributes little to the estimates of Σ and μ. This effect is equivalent to detecting and removing an outlier when the coefficient of the data sample {xi, ti} is small enough, which improves the robustness of the model. After training, RRVM makes predictions based on the posterior distribution over w: for a new input x*, the output is y* = φ(x*)ᵀμ.

5. Simulations and discussion

In this section, benchmark and industrial data are used to evaluate the performance of the dynamic control model. The simulations consist of three parts: the first calculates the amounts of oxygen and coolant based on ANFIS; the second validates the robustness of RRVM on a benchmark data set; the third predicts the endpoint carbon content and temperature of molten steel using RRVM.

5.1. Evaluation indicators

To evaluate the performance of the dynamic control model, two indicators are considered here: the hit ratio and the root mean square error (RMSE). The hit ratio of the calculation and prediction models is defined as follows:

$$\text{Hit ratio} = \frac{N_h}{N}\times 100\% \tag{29}$$

where N is the total number of test samples, and Nh is the number of test samples whose prediction error is within the required range,

Table 1
Description of BOF steelmaking data.

Variable                                                      Range           Mean     Standard deviation
Carbon content measured at the end of the first blow (0.01%)  4.5–98.3        34.3     18.9
Temperature measured at the end of the first blow (°C)        1517.4–1708.4   1627.9   28.5
The volume of oxygen (m³)                                     201.0–3221.0    914.1    338.4
The amount of coolant (kg)                                    0.0–4089.0      446.7    791.2
Endpoint carbon content (0.01%)                               2.3–10.0        5.6      1.3
Endpoint temperature (°C)                                     1613.7–1730.4   1675.2   16.1


Fig. 6. Comparison between predicted value and actual value of the volume of oxygen.

Table 2
Comparison between CBR and ANFIS.

               The volume of oxygen          The amount of coolant
Method         Hit ratio (%)   RMSE (m³)     Hit ratio (%)   RMSE (kg)
CBR            72              196.82        68              624.97
ANFIS          86              170.45        80              447.73

Table 3
RMSE comparison of two-dimensional sinc function approximation.

               RMSE (training sample set with different numbers of outliers)
Method         0           10          20          40
Classical RVM  0.04919     0.12933     0.14777     0.19132
TRVM           0.04755     0.04981     0.05415     0.08337
RRVM           0.04915     0.05074     0.05400     0.06774

Fig. 7. The distribution of heats to which coolant was added and to which it was not.

viz. |y_predict − y_actual| ≤ ε, where y_predict and y_actual are the predicted and actual values of the output, respectively. The hit ratio represents the computational accuracy of the model: the bigger it is, the better the accuracy. RMSE is defined as formula (30):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{predict,i} - y_{actual,i}\right)^2} \tag{30}$$

where the index i denotes the ith sample and the definitions of the other variables are the same as before. RMSE represents the deviation of the predicted values from the actual ones: the smaller it is, the better the prediction result.
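Both indicators are easy to compute; a minimal sketch follows (the tolerance `eps` corresponds to the permissible error ranges quoted later, e.g. ±200 m³ for oxygen):

```python
import numpy as np

def hit_ratio(y_pred, y_actual, eps):
    """Formula (29): share of test samples with |error| <= eps, in percent."""
    y_pred, y_actual = np.asarray(y_pred, float), np.asarray(y_actual, float)
    return 100.0 * float(np.mean(np.abs(y_pred - y_actual) <= eps))

def rmse(y_pred, y_actual):
    """Formula (30): root mean square error."""
    y_pred, y_actual = np.asarray(y_pred, float), np.asarray(y_actual, float)
    return float(np.sqrt(np.mean((y_pred - y_actual) ** 2)))
```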

5.2. Calculating the amounts of oxygen and coolant

In this part, the volume of oxygen and the amount of coolant in the second blow period are calculated by using ANFIS. The ANFIS

Fig. 8. Comparison between predicted value and actual value of the amount of coolant.


Fig. 9. The surface of two-dimensional sinc function and training samples with 40 outliers.

toolbox of MATLAB 7.5 is utilized to implement the algorithm. One thousand groups of BOF steelmaking data, collected from a 150 ton furnace of Benxi Steel Sheet Co., Ltd. in China, are used for the simulations. The variables, including their ranges, mean values and standard deviations, are described in Table 1. Firstly, the volume of second-blow oxygen is calculated based on ANFIS.

Table 4
Comparison of endpoint prediction results.

               Endpoint carbon content        Endpoint temperature
Method         Hit ratio (%)  RMSE (0.01%)    Hit ratio (%)  RMSE (°C)
SVM            92             1.18            70             10.76
Classical RVM  89             1.30            74             10.71
VRVM           90             1.25            73             10.68
TRVM           89             1.26            75             10.49
RRVM           93             1.22            76             10.42

Fig. 10. The approximation result of classical RVM.

Fig. 11. The approximation result of RRVM.

The inputs of the oxygen model are the carbon content and temperature measured at the end of the first blow period. We select 400 groups of data for oxygen calculation: 300 groups for model training and 100 groups for test. The result is shown in Fig. 6, where the curves of predicted and actual values are both drawn. When the permissible error range is ±200 m³, the hit ratio is 86% and the RMSE is 170.45 m³.

As for coolant calculation, two ANFIS models need to be trained: a classification model to determine whether coolant should be added, and a regression model to calculate the amount of coolant. In the experiment, the classifier first determines whether coolant needs to be added; if it is required, the amount of coolant is then calculated by the regression model. The input variables of the coolant models are also the carbon content and temperature measured at the end of the first blow. Fig. 7 shows the distribution of the heats: the circles represent heats to which coolant was added, and the crosses represent those to which it was not. There is a clear boundary between the two groups, so the ANFIS classifier can be trained on these data.

The same 100 groups of data tested in the oxygen model are used to calculate the amount of coolant. The result is shown in Fig. 8. The accuracy of the ANFIS classifier is 88%. When the permissible error range is ±500.0 kg, the hit ratio of the coolant model is 80% and the RMSE is 447.7 kg. In some cases the classifier makes the right decision to add coolant, but the calculation error of the regression model falls outside the permissible range, so the hit ratio is lower than the classification accuracy. Of the 12 groups that are not classified correctly, 11 are determined by the ANFIS model not to need coolant, whereas coolant was in fact added. This may depend on the experience of different operators; in any case, the classification accuracy of the ANFIS model reaches 88%, which is acceptable in practical BOF steelmaking.
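The classify-then-regress control flow can be sketched as follows (a simplified illustration: a nearest-class-mean classifier and a least-squares regressor stand in for the two ANFIS models, and all names are our own):

```python
import numpy as np

class TwoStageCoolantModel:
    """Sketch of the two-stage coolant scheme: a classifier first decides
    whether coolant is needed; only then does a regression model estimate
    the amount. Simple stand-ins replace the two ANFIS models."""

    def fit(self, X, amounts):
        X = np.asarray(X, float)
        amounts = np.asarray(amounts, float)
        added = amounts > 0
        # stand-in classifier: nearest class mean in (carbon, temperature) space
        self.mean_add_ = X[added].mean(axis=0)
        self.mean_skip_ = X[~added].mean(axis=0)
        # stand-in regressor: least squares fit on the heats with coolant added
        A = np.column_stack([X[added], np.ones(int(added.sum()))])
        self.coef_, *_ = np.linalg.lstsq(A, amounts[added], rcond=None)
        return self

    def predict(self, x):
        x = np.asarray(x, float)
        # classification step: is this heat closer to the "no coolant" group?
        if np.linalg.norm(x - self.mean_skip_) <= np.linalg.norm(x - self.mean_add_):
            return 0.0
        # regression step: estimated amount of coolant (kg)
        return float(np.append(x, 1.0) @ self.coef_)
```

In the paper both stages are ANFIS models; only the two-stage control flow is reproduced here.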
For comparison, we also use case-based reasoning (CBR) to calculate the amounts of oxygen and coolant. Table 2 shows the comparison between CBR and ANFIS, which demonstrates that the ANFIS model achieves better results than CBR.


5.3. Two-dimensional sinc function approximation

In order to evaluate the robustness of RRVM, the two-dimensional sinc function y(x₁, x₂) = sin|x₁|/|x₁| + 0.1x₂ is used in an approximation experiment. We take the input vector x = [x₁, x₂]ᵀ, where both x₁ and x₂ are uniformly drawn from [−10, 10], and y is corrupted by Gaussian noise with zero mean and standard deviation 0.1. As for the choice of kernel function, test and analysis showed that the linear spline kernel is more suitable than the Gaussian kernel for this approximation experiment, so we choose the linear spline function as the kernel of RRVM. Given two input vectors xᵢ = [x_{i1}, x_{i2}]ᵀ and xⱼ = [x_{j1}, x_{j2}]ᵀ, the kernel for the two-dimensional spline function is generated as:

$$K(x_i, x_j) = K(x_{i1}, x_{j1})\,K(x_{i2}, x_{j2}) \tag{31}$$
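Formula (31), together with the one-dimensional linear spline kernel of formula (32), translates directly into code (a small sketch; the helper names are our own):

```python
import numpy as np

def spline1d(a, b):
    """Linear spline kernel in one dimension, formula (32)."""
    m = min(a, b)
    return 1.0 + a * b + a * b * m - 0.5 * (a + b) * m ** 2 + m ** 3 / 3.0

def spline2d(x, z):
    """Two-dimensional kernel as a product of 1-D kernels, formula (31)."""
    return spline1d(x[0], z[0]) * spline1d(x[1], z[1])

def gram(X):
    """Kernel (Gram) matrix for a set of 2-D inputs X (n x 2)."""
    n = len(X)
    return np.array([[spline2d(X[i], X[j]) for j in range(n)] for i in range(n)])
```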

For the linear spline function, K(x_{i1}, x_{j1}) is expressed as follows (Vapnik, Golowich, & Smola, 1997):

$$K(x_{i1}, x_{j1}) = 1 + x_{i1}x_{j1} + x_{i1}x_{j1}\min(x_{i1}, x_{j1}) - \frac{x_{i1} + x_{j1}}{2}\min(x_{i1}, x_{j1})^2 + \frac{1}{3}\min(x_{i1}, x_{j1})^3 \tag{32}$$

Similarly, K(x_{i2}, x_{j2}) has the same expression. The size of the training sample set is 400. At first, we investigate the approximation performance of RRVM on the clean training sample set. Then outliers generated from a standard Gaussian distribution are added to the training set: 10, 20 and 40 outliers are mixed with the clean training samples, respectively. After training, the model is used to approximate the two-dimensional sinc function; the test samples are noise-free.

For comparison, two other methods are also implemented in the experiment: classical RVM and TRVM (Yang et al., 2007). The RMSE comparison of the three methods is listed in Table 3. When the training sample set contains no outliers, the RMSE of RRVM is very close to that of classical RVM but worse than that of TRVM. When outliers are added, the approximation performance of classical RVM deteriorates drastically, while TRVM and RRVM still obtain good results. As the number of outliers increases, RRVM obtains better results than both classical RVM and TRVM, which demonstrates that RRVM can effectively resist the impact of outliers and has good robustness. Fig. 9 shows the actual surface of the two-dimensional sinc function and the distribution of the training samples when 40 outliers are added. Figs. 10 and 11 are the approximation results of classical RVM and RRVM, respectively. They clearly show that classical RVM cannot accurately approximate the function because of the outliers, while RRVM reduces their impact and obtains a better approximation result.

5.4. Endpoint prediction of BOF steelmaking

Endpoint carbon content [C]_End and temperature T_End prediction is the most important function of the dynamic control model, and RRVM is used here to predict the endpoint. The data used for the experiment are the same as described in Section 5.2: the data set of 400 groups is split into 300 groups for training and 100 groups for test. The input variables are the carbon content [C]_S and temperature T_S measured at the end of the first blow and the amount of oxygen in the second blow. In this simulation, the volume of oxygen would ideally be the value calculated by the

Fig. 12. Comparison between RRVM predicted value and actual value of endpoint carbon content.

Fig. 13. Comparison between RRVM predicted value and actual value of endpoint temperature.


ANFIS model. However, the test data are collected from the practical BOF steelmaking process. Although the calculation accuracy of the ANFIS model is above 80%, the calculated amounts of oxygen nevertheless differ slightly from the actual amounts. Furthermore, in practical operation, the recorded endpoint carbon content and temperature result from blowing the actual volume of oxygen. Therefore, if we used the calculated amount of oxygen as input to predict the endpoint in the simulation, the result could not properly reflect the real performance of the endpoint prediction model. Since our concern in this section is mainly to validate the prediction ability of the RRVM endpoint model, it is more appropriate to use the actual volume of oxygen in the data set to predict the endpoint carbon content and temperature. Moreover, the outputs are the decrease of carbon content and the rise of temperature in the second blow period, denoted Δ[C] and ΔT, respectively. The prediction models are then created as:

$$\Delta[C] = f_{[C]}\left([C]_S,\, T_S,\, V_{O_2}\right) \tag{33}$$

$$\Delta T = f_T\left([C]_S,\, T_S,\, V_{O_2}\right) \tag{34}$$

Based on the prediction results, the endpoint carbon content and temperature can be calculated as follows:

$$[C]_{End} = [C]_S - \Delta[C] \tag{35}$$

$$T_{End} = T_S + \Delta T \tag{36}$$
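Formulas (33)–(36) chain together as below (a sketch; `predict_dC` and `predict_dT` stand for the two trained RRVM models and are replaced here by hypothetical linear placeholders):

```python
def endpoint(c_s, t_s, v_o2, predict_dC, predict_dT):
    """Formulas (35)-(36): combine the predicted carbon drop and temperature
    rise of the second blow with the first-blow measurements."""
    dC = predict_dC(c_s, t_s, v_o2)   # formula (33)
    dT = predict_dT(c_s, t_s, v_o2)   # formula (34)
    return c_s - dC, t_s + dT         # [C]_End, T_End

# illustrative placeholder models (NOT the trained RRVMs)
c_end, t_end = endpoint(30.0, 1630.0, 900.0,
                        predict_dC=lambda c, t, v: 0.025 * v,
                        predict_dT=lambda c, t, v: 0.05 * v)
```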

In order to evaluate the prediction performance, we compare RRVM with four other methods: SVM, classical RVM, VRVM (Faul & Tipping, 2001) and TRVM. The popular software LIBSVM is used to implement SVM (Chang & Lin, 2001). The Gaussian kernel function K(xᵢ, xⱼ) = exp{−‖xᵢ − xⱼ‖²/c²} is chosen, and the kernel widths c of the carbon content and temperature models are selected as 1.4 and 0.7, respectively. For SVM, the parameter of the ε-insensitive loss function and the regularization coefficient C are determined by cross validation. Table 4 lists the comparison results. The permissible error range of carbon content prediction is ±0.02% and that of temperature prediction is ±12 °C. RRVM achieves better performance than the other four methods, except that the RMSE of its carbon content prediction is not as good as that of SVM. Figs. 12 and 13 compare the RRVM predicted values with the actual values of carbon content and temperature, respectively.

In addition, the prediction accuracy for temperature is not as good as that for carbon content. This is because more factors are involved in the prediction of endpoint temperature than in that of carbon content, such as splashing and heat transfer through the furnace wall. In practical production, endpoint temperature is more difficult to predict and control. In the simulation, the hit ratio of the RRVM temperature prediction model exceeds 75%, which is acceptable in practical application. Furthermore, if the dynamic model control is combined with operators' experience, the control effectiveness will probably be improved further.

6. Conclusions

In this paper, a dynamic model based on ANFIS and robust relevance vector machine is presented to control the second blow period of the BOF steelmaking process. Our main concern is to accurately calculate the volume of oxygen and the amount of coolant needed in the second blow, and to predict the endpoint carbon content and temperature of molten steel. Firstly, the oxygen and coolant models are constructed based on ANFIS. A classifier is first trained to determine whether coolant should be added, and the amount of coolant is calculated if it is indeed required; the volume of oxygen is calculated directly by the ANFIS regression model. Secondly, endpoint prediction models are constructed using the robust relevance vector machine, which introduces noise variance coefficients to modify classical RVM and can resist the impact of outliers during training. Evaluation on a benchmark function demonstrates that RRVM has better robustness than comparable methods. The experiments on industrial data also show that ANFIS and RRVM obtain good performance in both hit ratio and RMSE: the proposed ANFIS models accurately calculate the amounts of oxygen and coolant, and the RRVM endpoint prediction model provides more promising results than the other methods. Both have practical value in BOF steelmaking production.

Acknowledgements

This research was supported by the Project (2007AA04Z158) of the National High Technology Research and Development Program of China (863 Program) and the Project (60674073) of the National Natural Science Foundation of China. These supports are gratefully acknowledged.

Appendix A. Derivation of iteration formulas (26)–(28) on hyperparameters α, β and σ²

The objective function of the hyperparameter optimization process is formula (25):

$$L = -\frac{1}{2}\left[-\log|\Sigma| - \log|A| + N\log\sigma^2 - \log|B| + \mu^T A\mu + \sigma^{-2}(t-\Phi\mu)^T B(t-\Phi\mu)\right] + \sum_{i=1}^{N}\left(a_i\log\beta_i - b_i\beta_i\right)$$

A.1. The derivation of formula (26)

Formula (26) iteratively estimates the hyperparameters αj (j = 0, 1, . . . , N). First take the partial derivative of formula (25) with respect to log αj, ignoring the terms independent of log αj:

$$\frac{\partial L}{\partial\log\alpha_j} = -\frac{1}{2}\frac{\partial}{\partial\log\alpha_j}\left(-\log|A| + \mu^T A\mu - \log|\Sigma|\right) = -\frac{1}{2}\left[-\frac{\partial\log|A|}{\partial\log\alpha_j} + \frac{\partial}{\partial\alpha_j}\left(\mu^T A\mu - \log|\Sigma|\right)\cdot\frac{\partial\alpha_j}{\partial\log\alpha_j}\right] \tag{37}$$

Because A = diag(α₀, α₁, . . . , α_N) and μ = [μ₀, μ₁, . . . , μ_N]ᵀ, we obtain the following equations:

$$\frac{\partial\log|A|}{\partial\log\alpha_j} = \frac{\partial}{\partial\log\alpha_j}\left(\sum_{j=0}^{N}\log\alpha_j\right) = 1 \tag{38}$$

$$\frac{\partial\left(\mu^T A\mu\right)}{\partial\alpha_j} = \frac{\partial}{\partial\alpha_j}\left(\mu^T\,\mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)\,\mu\right) = \mu_j^2 \tag{39}$$

where μj is the jth element of μ. According to formula (40), which gives the derivative of a determinant (Roweis, 1999):

$$\frac{\partial\log|X(z)|}{\partial z} = \mathrm{tr}\left(X^{-1}\frac{\partial X}{\partial z}\right) \tag{40}$$

we can obtain

$$\frac{\partial\log|\Sigma|}{\partial\alpha_j} = -\frac{\partial\log|\Sigma^{-1}|}{\partial\alpha_j} = -\mathrm{tr}\left(\Sigma\frac{\partial\Sigma^{-1}}{\partial\alpha_j}\right) \tag{41}$$

From equation (21) in this paper, Σ⁻¹ = A + σ⁻²ΦᵀBΦ, so ∂Σ⁻¹/∂αj is a sparse matrix whose elements are all zero except the jth diagonal element, which is 1. Consequently the value of formula (41) is

$$\frac{\partial\log|\Sigma|}{\partial\alpha_j} = -\mathrm{tr}\left(\Sigma\frac{\partial\Sigma^{-1}}{\partial\alpha_j}\right) = -\Sigma_{jj} \tag{42}$$

where Σjj is the jth diagonal element of Σ. Moreover, ∂αj/∂log αj = αj holds. Substituting this result together with (38), (39) and (42) into formula (37) gives

$$\frac{\partial L}{\partial\log\alpha_j} = -\frac{1}{2}\left[-1 + \alpha_j\left(\mu_j^2 + \Sigma_{jj}\right)\right]$$

Setting this to zero, we obtain formula (26):

$$\alpha_j = \frac{1}{\mu_j^2 + \Sigma_{jj}} = \frac{\gamma_j}{\mu_j^2}$$

where γj = 1 − αjΣjj.

A.2. The derivation of formula (27)

Formula (27) iteratively estimates the hyperparameters βi (i = 1, . . . , N). The partial derivative of (25) with respect to log βi is

$$\frac{\partial L}{\partial\log\beta_i} = -\frac{1}{2}\frac{\partial}{\partial\log\beta_i}\left[-\log|B| + \log|\Sigma^{-1}| + \sigma^{-2}(t-\Phi\mu)^T B(t-\Phi\mu)\right] + \frac{\partial}{\partial\log\beta_i}\sum_{i=1}^{N}\left(a_i\log\beta_i - b_i\beta_i\right) \tag{43}$$

Because B = diag(β₁, β₂, . . . , β_N), the first term of formula (43) is

$$\frac{\partial\log|B|}{\partial\log\beta_i} = \frac{\partial}{\partial\log\beta_i}\left(\sum_{i=1}^{N}\log\beta_i\right) = 1 \tag{44}$$

The second term is

$$\frac{\partial\log|\Sigma^{-1}|}{\partial\log\beta_i} = \mathrm{tr}\left(\Sigma\frac{\partial\Sigma^{-1}}{\partial\beta_i}\right)\cdot\beta_i = \mathrm{tr}\left(\Sigma\frac{\partial\left(A + \sigma^{-2}\Phi^T B\Phi\right)}{\partial\beta_i}\right)\cdot\beta_i = \sigma^{-2}\,\mathrm{tr}\left(\Sigma\,\phi(x_i)\phi(x_i)^T\right)\cdot\beta_i \tag{45}$$

the third term is

$$\frac{\partial}{\partial\log\beta_i}\left[\sigma^{-2}(t-\Phi\mu)^T B(t-\Phi\mu)\right] = \sigma^{-2}\left(t_i - \phi(x_i)^T\mu\right)^2\cdot\beta_i \tag{46}$$

and the fourth one is

$$\frac{\partial}{\partial\log\beta_i}\sum_{i=1}^{N}\left(a_i\log\beta_i - b_i\beta_i\right) = a_i - b_i\beta_i \tag{47}$$

Substituting (44)–(47) into formula (43) gives

$$\frac{\partial L}{\partial\log\beta_i} = -\frac{1}{2}\left[-1 + \sigma^{-2}\,\mathrm{tr}\left(\Sigma\,\phi(x_i)\phi(x_i)^T\right)\cdot\beta_i + \sigma^{-2}\left(t_i - \phi(x_i)^T\mu\right)^2\cdot\beta_i\right] + a_i - b_i\beta_i$$

Setting this to zero and rearranging, we obtain the calculation formula of βi (i = 1, . . . , N), that is, formula (27) in this paper:

$$\beta_i = \frac{a_i + 0.5}{b_i + 0.5\left[\sigma^{-2}\left(t_i - \phi(x_i)^T\mu\right)^2 + \sigma^{-2}\,\mathrm{tr}\left(\Sigma\,\phi(x_i)\phi(x_i)^T\right)\right]}$$

A.3. The derivation of formula (28)

Take the partial derivative of (25) with respect to log σ², ignoring the terms independent of log σ²:

$$\frac{\partial L}{\partial\log\sigma^2} = -\frac{1}{2}\left[N + \frac{\partial\left(\log|\Sigma^{-1}| + \sigma^{-2}(t-\Phi\mu)^T B(t-\Phi\mu)\right)}{\partial\log\sigma^2}\right] \tag{48}$$

where

$$\frac{\partial}{\partial\log\sigma^2}\left(\log|\Sigma^{-1}| + \sigma^{-2}(t-\Phi\mu)^T B(t-\Phi\mu)\right) = \sigma^2\cdot\frac{\partial\log|\Sigma^{-1}|}{\partial\sigma^2} - \sigma^{-2}(t-\Phi\mu)^T B(t-\Phi\mu) \tag{49}$$

The first term of formula (49) is calculated as follows:

$$\sigma^2\cdot\frac{\partial\log|\Sigma^{-1}|}{\partial\sigma^2} = \sigma^2\cdot\mathrm{tr}\left(\Sigma\frac{\partial\left(A + \sigma^{-2}\Phi^T B\Phi\right)}{\partial\sigma^2}\right) = \sigma^2\cdot\mathrm{tr}\left(-\sigma^{-4}\Sigma\Phi^T B\Phi\right) = -\mathrm{tr}\left(\Sigma\left(\Sigma^{-1} - A\right)\right) = -\mathrm{tr}\left(I - \Sigma A\right)$$

Moreover, because γj = 1 − αjΣjj, this can be further expressed as

$$\sigma^2\cdot\frac{\partial\log|\Sigma^{-1}|}{\partial\sigma^2} = -\mathrm{tr}\left(I - \Sigma A\right) = -\sum_{j=0}^{N}\left(1 - \alpha_j\Sigma_{jj}\right) = -\sum_{j=0}^{N}\gamma_j \tag{50}$$

Substituting (49) and (50) into formula (48), we get

$$\frac{\partial L}{\partial\log\sigma^2} = -\frac{1}{2}\left[N - \sum_{j=0}^{N}\gamma_j - \sigma^{-2}(t-\Phi\mu)^T B(t-\Phi\mu)\right]$$

Setting this to zero and rearranging, we obtain formula (28):

$$\sigma^2 = \frac{(t-\Phi\mu)^T B(t-\Phi\mu)}{N - \sum_{j=0}^{N}\gamma_j}$$

where j = 0, . . . , N and γj = 1 − αjΣjj. These are the derivations of the iteration formulas (26)–(28) on the hyperparameters α, β and σ².

References

Birk, W., Johansson, A., Medvedev, A., & Johansson, R. (2002). Model-based estimation of molten metal analysis in the LD converter: experiments at SSAB Tunnplåt AB in Luleå. IEEE Transactions on Industry Applications, 38(2), 560–565. Blanco, C., & Diaz, M. (1993). Model of mixed control for carbon and silicon in a steel converter. ISIJ International, 33(7), 757–763. Bloch, G., Sirou, F., Eustache, V., & Fatrez, P. (1997). Neural intelligent control for a steel plant. IEEE Transactions on Neural Networks, 8(4), 910–918. Chang, C. C., & Lin, C. J. (2001). LIBSVM: A library for support vector machines. Software available from .


Chou, K. C., Pal, U. B., & Reddy, R. (1993). A general model for BOP decarburization. ISIJ International, 33(8), 862–868. Cox, I. J., Lewis, R. W., Ransing, R. S., Laszczewski, H., & Berni, G. (2002). Application of neural computing in basic oxygen steelmaking. Journal of Material Processing Technology, 120(1–3), 310–315. Das, A., Maiti, J., & Banerjee, R. N. (2009). Process control strategies for a steel making furnace using ANN with Bayesian regularization and ANFIS. Expert Systems with Applications, 37(2), 1075–1085. Dippenaar, R. (1999). Towards intelligent steel processing. In Proceedings of the 2nd international conference on intelligent processing and manufacturing of materials, Honolulu, USA (pp. 75–84). Esen, H., Ozgen, F., Esen, M., & Sengur, A. (2009). Modelling of a new solar air heater through least-squares support vector machines. Expert Systems with Applications, 36(7), 10673–10682. Faul, A. C., & Tipping, M. E. (2001). A variational approach to robust regression. Lecture Notes in Computer Science, 2130, 95–102. Fileti, A. M. F., Pacianotto, T. A., & Cunha, A. P. (2006). Neural modeling helps the BOS process to achieve aimed end-point conditions in liquid steel. Engineering Applications of Artificial Intelligence, 19(1), 9–17. Han, M. & Huang, X. Q. (2008). Greedy kernel component acting on ANFIS to predict BOF steelmaking endpoint. In Proceedings of the 17th world congress of the international federation of automatic control, Seoul, Korea (pp. 11007–11012). Iida, Y., Emoto, K., Ogawa, M., Masuda, Y., Onishi, M., & Yamada, H. (1984). Fully automatic blowing technique for basic oxygen steelmaking furnace. ISIJ International, 24, 540–546. Jang, J. S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man and Cybernetics, 23(3), 665–685. Johansson, A., Medvedev, A., & Widlund, D. (2000). Model-based estimation of metal analysis in steel converters. 
In Proceedings of the 39th IEEE conference on decision and control, Sydney, Australia (pp. 2029–2034). Mackay, D. J. C. (1992a). Bayesian interpolation. Neural Computation, 4(3), 415–477. Mackay, D. J. C. (1992b). The evidence framework applied to classification networks. Neural Computation, 4(5), 720–736. Melin, P., & Castillo, O. (2005). Intelligent control of a stepping motor drive using an adaptive neuro-fuzzy inference system. Information Sciences, 170(2–4), 133–151. Mon, Y. J. (2007). Airbag controller designed by adaptive-network-based fuzzy inference system (ANFIS). Fuzzy Sets and Systems, 158(24), 2706–2714. Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181–201. Narendra, K. S., & Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1), 4–27. Pernía-Espinoza, A., Castejón-Limas, M., González-Marcos, A., & Lobato-Rubio, V. (2005). Steel annealing furnace robust neural network model. Ironmaking and Steelmaking, 32(5), 418–427.

Radhakrishnan, V. R., & Mohamed, A. R. (2000). Neural networks for the identification and control of blast furnace hot metal quality. Journal of Process Control, 10(6), 509–524. Roweis, S. (1999). Matrix identities. Document available from . Sun, Z. H., & Sun, Y. X. (2005). Soft sensor based on relevance vector machines for microbiological fermentation. Developments in Chemical Engineering and Mineral Processing, 13(3–4), 243–248. Ting, J. A., D’Souza, A., & Schaal, S. (2007). Automatic outlier detection: A Bayesian approach. In Proceedings of 2007 IEEE international conference on robotics and automation, Roma, Italy (pp. 2489–2494). Tipping, M. E. (2000). The relevance vector machine. Advances in neural information processing systems (Vol. 12). Cambridge, Massachusetts, USA: MIT Press, pp. 652–658. Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1(3), 211–244. Tipping, M. E., & Lawrence, N. D. (2005). Variational inference for student-t models: Robust Bayesian interpolation and generalised component analysis. Neurocomputing, 69(1–3), 123–141. Valyon, J., & Horváth, G. (2009). A sparse robust model for a Linz-Donawitz steel converter. IEEE Transactions on Instrumentation and Measurement, 58(8), 2611–2617. Vapnik, V. N. (2000). The nature of statistical learning theory (second ed.). New York, USA: Springer-Verlag. Vapnik, V. N., Golowich, S. E., & Smola, A. (1997). Support vector method for function approximation, regression estimation, and signal processing. Advances in neural information processing system (Vol. 9). Cambridge, Massachusetts, USA: MIT Press, pp. 281–287. Vong, C. M., Wong, P. K., & Li, Y. P. (2006). Prediction of automotive engine power and torque using least squares support vector machines and Bayesian inference. Engineering Applications of Artificial Intelligence, 19(3), 277–287. Wei, L. Y., Yang, Y. Y., Nishikawa, R. M., Wernick, M. N., & Edwards, A. (2005). 
Relevance vector machine for automatic detection of clustered microcalcifications. IEEE Transactions on Medical Imaging, 24(10), 1278–1285. Widodo, A., Kim, E. Y., Son, J. D., Yang, B. S., Tan, A. C. C., Gu, D. S., et al. (2009). Fault diagnosis of low speed bearing based on relevance vector machine and support vector machine. Expert Systems with Applications, 36(3), 7252–7261. Yang, B., Zhang, Z. K., & Sun, Z. S. (2007). Robust relevance vector regression with trimmed likelihood function. IEEE Signal Processing Letters, 14(10), 746–749. Yuan, J., Wang, K., Yu, T., & Fang, M. (2007a). Integrating relevance vector machines and genetic algorithms for optimization of seed-separating process. Engineering Applications of Artificial Intelligence, 20(7), 970–979. Yuan, P., Mao, Z. Z., & Wang, F. L. (2007b). Endpoint prediction of EAF based on multiple support vector machines. Journal of Iron and Steel Research, International, 14(2), pp. 20–24, 29. Zhang, R., & Wang, S. (2008). Support vector machine based predictive functional control design for output temperature of coking furnace. Journal of Process Control, 18(5), 439–448.