Knowledge-Based Systems 83 (2015) 66–80

Optimization of gear blank preforms based on a new R-GPLVM model utilizing GA-ELM

Zhiyong Cao a,b, Juchen Xia a, Mao Zhang a, Junsong Jin a, Lei Deng a, Xinyun Wang a,*, June Qu b

a State Key Laboratory of Materials Processing and Die & Mould Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, Hubei, China
b Hubei Collaborative Innovation Center for Advanced Organic Chemical Materials, Key Laboratory of Green Preparation and Application for Materials, Ministry of Education, Hubei University, 368 Youyi Road, 430062 Wuhan, Hubei, China

* Corresponding author. Tel./fax: +86 27 87543491. E-mail address: [email protected] (X. Wang).

Article info

Article history: Received 15 April 2014; Received in revised form 9 March 2015; Accepted 13 March 2015; Available online 24 March 2015.
Keywords: Preform optimization; R-GPLVM; ELM; Gear blank; GA

Abstract

The determination of the key dimensions of gear blank preforms with complicated geometries is a highly nonlinear optimization task. To determine critical design dimensions, we propose a novel and efficient dimensionality reduction (DR) model that adapts Gaussian process regression (GPR) to construct a topological constraint between the design latent variables (LVs) and the regression space. This procedure is termed the regression-constrained Gaussian process latent variables model (R-GPLVM), which overcomes GPLVM's drawback of ignoring the regression constraints. To determine the appropriate sub-manifolds of the high-dimensional sample space, we combine the maximum a posteriori method with the scaled conjugate gradient (SCG) algorithm. This procedure can estimate the coordinates of preform samples in the space of LVs. Numerical experiments reveal that the R-GPLVM outperforms pure GPR in various dimensional spaces once the proper hyper-parameters and kernel functions are determined. The extreme learning machine (ELM) achieves better prediction precision than the back-propagation (BP) method when the dimensions are reduced to seven and a Gaussian kernel function is adopted. After the seven key variables are screened out, the ELM model is constructed with realistic inputs and obtains improved prediction accuracy. However, since the ELM has a problem with the validity of its predictions, a genetic algorithm (GA) is exploited to optimize the connection parameters between the network layers to improve reliability and generalization. In terms of prediction accuracy on testing datasets, GA performs better than the differential evolution (DE) approach, which motivates the choice of the genetic algorithm-extreme learning machine (GA-ELM). Moreover, GA-ELM is employed to assess the aforementioned DR using engineering criteria. In the end, to obtain the optimal geometry, a parallel selection method of multi-objective optimization is proposed to obtain the Pareto-optimal solution, while the maximum finisher forming force (MFFF) and the maximum finisher die stress (MFDS) are both minimized. Comparative analysis with other numerical models, including finite element model (FEM) simulation, is conducted using the GA-optimized preform. Results show that the values of MFFF and MFDS predicted by GA-ELM and R-GPLVM agree well with the experimental results, which validates the feasibility of our proposed methods.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Flashless forging of gears has been widely used because of its high production efficiency, good product performance and low material waste compared to machining [1,2]. Presently, a common practice is to rapidly forge the gear blank first and then machine the teeth to obtain the final part, which combines the rapidity of flashless die forging with the high accuracy of machining [3–6].


Due to the complexity of the gear structure, gears are mostly formed by multi-station forging processes to effectively reduce the forming force and die stress, aiming to increase the deformation uniformity and improve the die life. In the past ten years, finite element model (FEM) methods, together with sensitivity analysis and reverse trace simulation, have been employed to optimize the design of preforms [7–13]. However, due to the complexity of the calculation, the time-consuming nature of these methods, and the limited accuracy of the assumed boundary conditions, these technologies are rarely applied to complex preforms. Meanwhile, FEM has certain limitations because its calculations are based on a simplified model [14].


The goal of this work is to predict the finisher forming force for an optimized preform. However, the prediction models for the preforms are nonlinear, and the traditional empirical approaches and the FEM method cannot always address the difficulties involved in complex preforms. A machine-learning model (MLM) can deal with these nonlinear problems that pose difficulties in mathematical modeling. For example, BP neural networks (BPNN) are well suited to address these problems, and they have been used in the prediction of metal forming loads [15–18]. There have also been cases in which BPNN has been employed to optimize preforms [19,20]. Nevertheless, BPNN learning algorithms have their own shortcomings that have become the main impediments to wider adoption, namely slow training speed and inadequate generalization performance [21]. Extreme learning machine (ELM) algorithms based on single hidden layer feed-forward neural networks (SLFN) overcome these disadvantages, and thus are used in a wide range of engineering applications [22–26]. However, there are two problems in the training of the ELM network model. Firstly, compared to other gear blanks, there are more design variables in this preform and relatively little training data can be obtained from experiments. This makes it necessary to conduct dimensionality reduction (DR) on the input variables to remove redundant information from the input data, so that the subsequent regression analysis can obtain a more accurate prediction model. Secondly, there is inevitably a mix of unwanted and unoptimized data within the selected data. This arises from the inherent randomness in the weights between the input and hidden layers, as well as in the biases of the hidden layer neurons. This randomness may sometimes lead to degeneracies among the columns of the hidden-layer output matrix [27], rendering the model used to train the output weights unsolvable, which limits the predictability of ELM. Meanwhile, Huang [28], who originally proposed ELM, indicates that since evolutionary algorithms (EAs) are widely used as global search methods for optimization, hybrids of EAs and analytical methods should be promising for network training. Huang proposed a modified differential evolution (DE) to globally search for the optimal input weights and hidden biases, while the Moore–Penrose generalized inverse is used to analytically calculate the output weights. Although many versions of DE have proven to be effective approaches for parameter optimization, widespread adoption is limited by their susceptibility to premature convergence and over-fitting, which limits the effectiveness of their one-to-one parameter adjustment in practice. The genetic algorithm (GA) overcomes these disadvantages and also has the advantage of being amenable to parallel adjustment [29,30]. Thus, it is worthwhile to explore the use of GA as another practical global search method for optimizing the weights and biases.

The key to solving DR problems is to find the low-dimensional sub-manifold that is embedded in the higher-dimensional space. Linear mapping is adopted in many DR methods such as singular value decomposition (SVD) [31], factor analysis (FA) and principal component analysis (PCA) [32]. Since many realistic sample data are nonlinearly distributed, the linearity hypothesis usually results in poor model performance. As a result, nonlinear DR methods have become popular recently, including Gaussian process latent variables models (GPLVM) [33], local linear embedding (LLE) [34], kernel principal component analysis (KPCA) [35] and isometric mapping (ISOMAP). Among them, GPLVM is a very prominent nonlinear mapping model. It overcomes the limitation of linear DR methods but is still able to find the low-dimensional manifold hidden in the observational data, even when the number of observation samples is relatively small. GPLVM is therefore well suited to handling the high-dimensional, small-sample data encountered in the gear blank preform optimization problems considered in this paper. However, the originally proposed GPLVM only takes the observation sample space itself into consideration, regardless of the regression constraint corresponding to that space, which limits its utility in many practical situations.


Therefore, in this paper, a regression-constrained Gaussian process latent variables model (R-GPLVM) is proposed. Firstly, the relationship between the latent variables (LVs) and the observational data is established using GPLVM, and then GPR is exploited as the topological constraint to accomplish the DR of the higher-dimensional observational data. Finally, a follow-up genetic algorithm-extreme learning machine (GA-ELM) is employed to assess the aforementioned DR using engineering criteria.

As far as generalization and stability are concerned, Bartlett's theory on the generalization performance of feed-forward learning networks has pointed out that smaller connection weight norms lead to smaller output errors in network training, resulting in better generalization performance [36]. Many researchers use GA, DE and other evolutionary algorithms to promote generalization performance, but there are few reports on the optimization of ELM, especially using GA, except for Dr. Huang and Refs. [37,38]. In Refs. [37,38], the authors proposed an artificial bee colony algorithm (ABC) and particle swarm optimization (PSO) to tune the weights and biases of ELM to improve its generalization performance. However, most of the datasets used in their experiments are based upon huge benchmark data, which may generate an unbearable time cost if GA optimization is adopted. As we have indicated, GPLVM is very suitable for handling high-dimensional data of small samples, so the time cost of using GA can be alleviated in our experiments. Owing to the unique global-optimization advantage of GA [39–41], we propose a tuned GA to improve the effectiveness, generalization and stability of the ELM prediction models of the preform, even though the approach has a relatively higher time cost. Moreover, unlike other approaches with a single fitness function, we use two fitness criteria to optimize the connection parameters between the ELM network layers, namely the norm of the output weights of the hidden layer neurons and the root mean square error (RMSE). When the RMSE of the whole training set is used as the fitness function, network over-fitting is easily encountered. To mitigate this, training and validation datasets are used without repetition. Moreover, since the minimum-norm connection weights for the training dataset are obtained by solving the least-squares solution, the individual training errors of each training dataset are similar. Thus, in this paper, the RMSE of the testing dataset, rather than that of the whole training dataset, is chosen as a fitness function to improve the efficiency of the model.

Until now, there have been few reports on the GA-ELM method for regression issues, mainly because of the insufferable time cost for huge benchmark data with high dimensionality. In literature [42], the authors presented a revised genetic algorithm based on ELM. They indicated that by utilizing its exceptional function estimation and continuously retraining the ELM-based evolution operators, the global search and convergence capabilities of GA could be improved. Different from literature [42], to obtain a reliable convergent solution, we adjusted the selection strategy of the genetic operators, i.e., besides the root mean square error (RMSE), the norm of the output weights of the hidden layer was introduced as a convergence criterion. When combining the ELM with the adjusted GA, the calculation results demonstrated better convergence performance. It should be emphasized that since a compact network leads to a faster response of the trained networks, we proposed a new R-GPLVM method to compact the network with reduced dimensions. This method can accelerate the response of GA since the amount of data becomes smaller after dimensionality reduction. In literature [43], the researchers explored a novel feature selection method for classification issues based on a special GA using restricted search in conjunction with ELM.


Unlike the feature selection algorithm in literature [43], we adopt the GA method to tune the weights and biases of ELM to improve the generalization performance for regression problems. In literature [44], the authors proposed a new model termed the genetic algorithm-extreme learning machine-particle swarm optimization (GA-ELM-PSO) classifier for magnetic resonance imaging (MRI) images. They use ELM for classification, with its performance optimized by PSO, and the GA method for feature selection; GA is also used to reduce the high-dimensional features needed for classification. Different from these, we use the new R-GPLVM method to conduct dimensionality reduction (feature selection) for regression issues, and we have shown that R-GPLVM is very suitable for small samples with higher-dimensional data. Lastly, in literature [45], the authors carry out a statistical study of the micro-genetic algorithm (μ-GA) ELM algorithm. They analyzed the performance of the μ-GA ELM only on small one-dimensional problems with small populations and very few generations. By contrast, our method can cope not only with one-dimensional problems but also with high-dimensional problems with large populations and many generations. In summary, the major differences between our method and other GA-ELM methods are as follows: (1) Most of the previous GA methods, including traditional evolutionary ELM, address classification issues, and few reports are associated with regression problems; our proposed GA-ELM method can be used to cope with regression issues. (2) The previous GA-ELM procedures require more and more convergence iterations and runtime as the dimension grows; by comparison, in our scheme a new dimensionality reduction method, R-GPLVM, is proposed to decrease the amount of data and accelerate the response of the GA optimization.

The optimization of the preform design model is the ultimate goal of this work: minimizing the maximum finisher forming force (MFFF) and maximum finisher die stress (MFDS) when the finisher cavity is just filled. GA multi-objective optimization methods already have some engineering applications [46–49]. In this paper, a parallel selection method is exploited to solve the multi-objective optimization based on the ELM network model. That is, two prediction models of MFFF and MFDS calculated from the GA-ELM network (DR optimized) serve as sub-goal functions. All the individuals of the genetic population are equally divided among the two sub-goal functions. Each subgroup has a sub-goal function, with the selection operation conducted independently. Some individuals with high fitness are selected for each sub-goal function, and then these subgroups are merged into a larger group. The division, parallel selection and merging are conducted repeatedly. Finally, the Pareto-optimal solutions of the two goal functions are obtained.

The rest of this paper is organized as follows. In Section 2, the modeling of R-GPLVM and the optimization of the hyper-parameters are presented, with the latent variables analyzed using a probabilistic analysis and the scaled conjugate gradient (SCG) method [50]. In Section 3, a new GA-ELM framework seamlessly integrated with R-GPLVM is proposed, and the corresponding theories and the building of the ELM model of the preform are introduced, with the connection parameters between the ELM network layers optimized by GA. In Section 4, the prediction accuracies of GA-ELM, differential evolution-extreme learning machine (DE-ELM), ELM and BP, and the results before and after DR with R-GPLVM, are compared. The SCG optimization of the hyper-parameters is illustrated in comparison with the CG method, and the prediction accuracies using different kernel functions in various dimensional spaces are discussed. The multi-objective parallel selection method via GA is employed to obtain a Pareto-optimal solution, and the optimized preform is applied in actual production and compared with the FEM simulation. In Section 5, conclusions are presented.

2. Model construction

2.1. Bases of Gaussian process regression and latent variables models

The nonlinear regression model with Gaussian processes can be defined as follows:

G_i = f(x_i) + σ²I    (1)

Here, σ²I is the noise term, and the function f(x_i) is a Gaussian process defined as:

f(x_i) ~ GP(0, k(x_i, x'))    (2)

As can be seen from the above formula, the relation between X = [x_1, x_2, ..., x_N]^T and G = [g_1, g_2, ..., g_N]^T can be defined with the Gaussian process:

G ~ N(0, K + σ²I)    (3)

According to Eq. (3), we can obtain the log-likelihood function of GPR:

ln p(G|X) = -1/2 G^T (K + σ²I)^{-1} G - 1/2 ln|K + σ²I| - 1/2 ln 2π    (4)

Here, K is the covariance function matrix, namely the kernel function. Since the model parameters comprise the kernel function and the noise term, the training of the model is mainly determined by the kernel function.
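For illustration only, the log-likelihood of Eq. (4) can be evaluated numerically as in the short Python sketch below; the function name and the use of a Cholesky factorisation are our own choices, and the constant term is written with the usual N/2 factor.

```python
import numpy as np

def gpr_log_likelihood(K, g, sigma2):
    """Evaluate ln p(G|X) of Eq. (4) for a kernel matrix K and targets g."""
    N = g.shape[0]
    C = K + sigma2 * np.eye(N)              # K + sigma^2 I
    L = np.linalg.cholesky(C)               # stable inverse and log-determinant
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, g))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * float(g @ alpha) - 0.5 * log_det - 0.5 * N * np.log(2.0 * np.pi)
```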

The latent variables model (LVM) is established based on the following assumptions. (1) The errors and individual differences can be filtered out as much as possible in the observation samples and LVs, i.e., the samples are independent and identically distributed (IID). (2) There exists an internal relation between the high-dimensional observation data and the low-dimensional LVs. As the LVM is established within a probabilistic framework, it can estimate the probability distribution from small samples, and it is well suited to handling high-dimensional data with few samples. A commonly used ingredient is the kernel function, and different kernel functions yield different LVMs. It can be assumed that the mapping f_d in each dimension is IID with the Gaussian process as follows:

p(f) = ∏_{d=1}^{D} p(f_d) = ∏_{d=1}^{D} N(f_d | 0, K)    (5)

The likelihood of the observation data in the dth dimension can be expressed as:

p(y_{:,d} | X, ξ₁) = ∫ p(y_{:,d} | x_n, f_d, ξ₁) p(f_d) df_d = N(y_{:,d} | 0, K)    (6)

Here y_{:,d} denotes the dth dimension of all samples. As the dimensions are independent from one another, the likelihood of the observed data can be represented as the product of the likelihoods of each spatial dimension:

p(Y | X, ξ₁) = ∏_{d=1}^{D} p(y_{:,d} | X, ξ₁) = ∏_{d=1}^{D} (2π)^{-N/2} |K|^{-1/2} exp(-1/2 y_{:,d}^T K^{-1} y_{:,d})    (7)

Eq. (7) can be expressed as:

p(Y | X, ξ₁) = (2π)^{-DN/2} |K|^{-D/2} exp(-1/2 tr(K^{-1} Y Y^T))    (8)

Here, K is the kernel function matrix. In this paper, the Gaussian radial basis kernel function is exploited and can be formalized as follows:


k(x_i, x_j) = (1/π^{3/2}) exp(-1/2 (x_i - x_j)^T (x_i - x_j))    (9)

Here, k(x_i, x_j) is the ijth element of the kernel function matrix, which will be discussed in Section 4.4.
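A minimal sketch (ours, not the authors') of the kernel matrix of Eq. (9) built with numpy; any length-scale or variance hyper-parameters contained in ξ₁ are omitted here for brevity.

```python
import numpy as np

def gaussian_kernel_matrix(X):
    """K[i, j] = pi^(-3/2) * exp(-0.5 * ||x_i - x_j||^2), as in Eq. (9)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    return np.pi ** (-1.5) * np.exp(-0.5 * np.maximum(d2, 0.0))
```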

2.2. R-GPLVM objective function

Assume that Y = [y_1, y_2, ..., y_N]^T represents the observation sample dataset, where each sample y_i is a D-dimensional feature vector, y_i ∈ R^D. Define G = [g_1, g_2, ..., g_N]^T as the regression matrix corresponding to the samples Y; its dimension is determined by the dimension of the output matrix, with a 1-dimensional (or M-dimensional) regression value. In our test cases, the matrix G holds a 2-dimensional regression value, and the LVs can be expressed by X = [x_1, x_2, ..., x_N]^T, x_i ∈ R^d. In order to effectively utilize the regression of the observation samples, the relationship between the observation data space Y and the regression value space G is built through the LV space, as shown in Fig. 1. Firstly, the LVs are initialized using the singular value decomposition (SVD) method, and subsequently the relationship between X and Y is established via GPLVM. Meanwhile, the relationship between X and G is constructed using a GPR model. Thus, we can constrain the preceding DR process using the regression values. In addition, the regression values of the testing datasets can be obtained with the model. Finally, the DR process and the regression results are evaluated using the GA-ELM model introduced afterward.

Fig. 1. The schematic of R-GPLVM regression model.

R-GPLVM can be expressed by Eq. (10):

Y = f_Y(X, ξ₁) + λ_Y,  G = f_G(X, ξ₂) + λ_G    (10)

Here, the function f_Y represents the mapping from the low-dimensional latent space X to the high-dimensional space Y, and ξ₁ represents the hyper-parameters of the DR mapping. The function f_G represents the regression mapping from X to G, and its hyper-parameters are expressed by ξ₂. The parameters λ_Y and λ_G respectively represent the noise, assumed to have a Gaussian distribution. In the LV model, the observation data (Y, G) are independent in each dimension, namely,

p(y, g | x) = p(y | x) p(g | x)    (11)

The joint likelihood of the observed data can thus be given as:

p(Y, G | X) = ∏_{n=1}^{N} p(y_n, g_n | x_n) = ∏_{n=1}^{N} p(y_n | x_n) p(g_n | x_n) = p(Y | X) p(G | X)    (12)

Bayes' theorem relates the probabilities as:

p(X | Y, G) = p(Y, G | X) p(X) / p(Y, G)    (13)

By combining Eq. (12) with Eq. (13), Eq. (13) can be rewritten as:

p(X | Y, G) = p(Y | X) p(G | X) p(X) / p(Y, G)    (14)

Taking the logarithm on both sides of Eq. (14) gives:

ln p(X | Y, G) = ln p(Y | X) + ln p(G | X) + ln p(X) - ln p(Y, G)    (15)

The first term on the right-hand side of Eq. (15) is the log-likelihood of GPLVM:

F_ξ1 = ln p(Y | X, ξ₁) = -(DN/2) ln 2π - (D/2) ln|K_ξ1| - 1/2 tr(K_ξ1^{-1} Y Y^T)    (16)

The second term on the right-hand side of Eq. (15) is the log-likelihood of GPR:

F_ξ2 = ln p(G | X, ξ₂) = -1/2 G^T (K_ξ2 + σ²I)^{-1} G - 1/2 ln|K_ξ2 + σ²I| - 1/2 ln 2π    (17)

Here, K_ξ1 = K_ξ1(x, x') is the GPLVM kernel function, and K_ξ2 = K_ξ2(x, x') is the GPR kernel function. As for Eq. (15), since ln p(Y, G) does not depend on X, maximizing the log-posterior ln p(X | Y, G) is equivalent to maximizing the joint log-likelihood of GPLVM and GPR plus the log prior probability ln p(X). The objective function of R-GPLVM can then be expressed with the following functional:

{X, ξ₁, ξ₂} = arg max_{X, ξ₁, ξ₂} {ln p(Y | X) + ln p(G | X) + ln p(X)}    (18)

2.3. Hyper-parameter optimization

The maximization in Eq. (18) is equal to arg min_{X} {-F_ξ1 - F_ξ2} according to Eqs. (11)–(17), and the modeling process is the same as the process of training the hyper-parameters ξ₁, ξ₂ and determining the LVs X. Since R-GPLVM is established upon a Gaussian process, the probability distribution of the data can be obtained. In addition, the posterior probability and the coordinates of the LVs X in the low-dimensional space can be calculated and determined further using Bayes' theorem. In this paper, the posterior probability of the observation data Y is maximized with the SCG method, and the optimized LVs X and the hyper-parameters ξ₁ and ξ₂ are output. The derivative of Eq. (15) with respect to X is:

∂ln p(X|Y,G)/∂X = ∂ln p(Y|X)/∂X + ∂ln p(G|X)/∂X + ∂ln p(X)/∂X - ∂ln p(Y,G)/∂X    (19)

Since the derivative of the full probability p(Y, G) with respect to X is zero, Eq. (19) is the same as the following, based on the chain rule:

∂ln p(X|Y,G)/∂X = ∂ln p(Y|X)/∂K_ξ1 · ∂K_ξ1/∂X + ∂ln p(G|X)/∂K_ξ2 · ∂K_ξ2/∂X    (20)

Here,

∂ln p(Y|X)/∂K_ξ1 = K_ξ1^{-1} Y Y^T K_ξ1^{-1} - D K_ξ1^{-1},  ∂ln p(G|X)/∂K_ξ2 = K_ξ2^{-1} G G^T K_ξ2^{-1} - M K_ξ2^{-1}    (21)

The derivatives ∂K_ξ1/∂X and ∂K_ξ2/∂X take different values according to the selection of the kernel function, and the gradients of the log-likelihood with respect to the hyper-parameters ξ₁, ξ₂ can be obtained as follows:

∂ln p(X|Y,G)/∂ξ₁ = ∂ln p(Y|X)/∂K_ξ1 · ∂K_ξ1/∂ξ₁,  ∂ln p(X|Y,G)/∂ξ₂ = ∂ln p(G|X)/∂K_ξ2 · ∂K_ξ2/∂ξ₂    (22)
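The objective of Eq. (18) can be sketched as a single negative log-posterior to be minimised over X, ξ₁ and ξ₂. The snippet below is an illustrative reading of Eqs. (16)–(18) only: the kernel builders k1 and k2 are hypothetical placeholders for the GPLVM and GPR kernels, G is treated as a single regression column, and a unit Gaussian prior is assumed for p(X) since the paper does not specify it.

```python
import numpy as np

def neg_log_posterior(X, Y, g, xi1, xi2, sigma2, k1, k2):
    """Return -(F_xi1 + F_xi2 + ln p(X)), cf. Eqs. (16)-(18)."""
    N, D = Y.shape
    K1 = k1(X, xi1)                                  # GPLVM kernel, Eq. (16)
    K2 = k2(X, xi2) + sigma2 * np.eye(N)             # GPR kernel + noise, Eq. (17)
    F1 = (-0.5 * D * N * np.log(2.0 * np.pi)
          - 0.5 * D * np.linalg.slogdet(K1)[1]
          - 0.5 * np.trace(np.linalg.solve(K1, Y @ Y.T)))
    F2 = (-0.5 * float(g @ np.linalg.solve(K2, g))
          - 0.5 * np.linalg.slogdet(K2)[1]
          - 0.5 * np.log(2.0 * np.pi))
    log_prior = -0.5 * np.sum(X ** 2)                # assumed unit Gaussian prior on X
    return -(F1 + F2 + log_prior)
```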

In summary, by using an SCG iterative method, the posterior probability based on Bayes' theorem is maximized to determine the optimized hyper-parameters ξ₁ and ξ₂. In addition, the LVs X are determined by the same process. Once the initial values and the required accuracy are given, the training process terminates when the iterations of the hyper-parameters converge. The algorithm to optimize the LVs and hyper-parameters using the SCG method contains the following steps:

Input: the high-dimensional space matrix Y ∈ R^{N×D}, the GPR regression constraint matrix G ∈ R^{N×M}, and the number of iterations V.
Initialization: the initial latent variables X ∈ R^{N×d} are obtained via SVD, with initial hyper-parameters ξ₁ = [1, 1, 1] and ξ₂ = [1, 1, 1].
Pseudo-code: Loop (v = 1 to V)
  Step 1: Calculate the GPLVM and GPR kernels K_ξ1^{(v-1)} = K(X^{(v-1)}, ξ₁^{(v-1)}), K_ξ2^{(v-1)} = K(X^{(v-1)}, ξ₂^{(v-1)}).
  Step 2: Calculate the log-likelihood F_ξ1^{(v-1)} = -(DN/2) ln 2π - (D/2) ln|K_ξ1^{(v-1)}| - 1/2 tr((K_ξ1^{(v-1)})^{-1} Y Y^T).
  Step 3: Calculate the log-likelihood F_ξ2^{(v-1)} = -1/2 G^T (K_ξ2^{(v-1)} + σ²I)^{-1} G - 1/2 ln|K_ξ2^{(v-1)} + σ²I| - 1/2 ln 2π.
  Step 4: Optimize {ξ₁^{(v)}, ξ₂^{(v)}} = arg min_{ξ₁, ξ₂} {-F_ξ1^{(v-1)} - F_ξ2^{(v-1)}}. Update K_ξ1^{(v-1)} = K(X^{(v-1)}, ξ₁^{(v)}), K_ξ2^{(v-1)} = K(X^{(v-1)}, ξ₂^{(v)}).
  Step 5: Re-calculate the log-likelihood F_ξ1^{(v-1)} = -(DN/2) ln 2π - (D/2) ln|K_ξ1^{(v-1)}| - 1/2 tr((K_ξ1^{(v-1)})^{-1} Y Y^T).
  Step 6: Re-calculate the log-likelihood F_ξ2^{(v-1)} = -1/2 G^T (K_ξ2^{(v-1)} + σ²I)^{-1} G - 1/2 ln|K_ξ2^{(v-1)} + σ²I| - 1/2 ln 2π.
  Step 7: Optimize X^{(v)} = arg min_X {-F_ξ1^{(v-1)} - F_ξ2^{(v-1)}}.
  Convergence criterion: if Error^{(v)} = Σ_{i=1}^{N} ||X_i^{(v)} - X_i^{(v-1)}||² ≤ ε, then R-GPLVM converges.
End Loop over V.
Output: the hyper-parameters ξ₁, ξ₂ and the latent variables X.

Furthermore, the DR of the high-dimensional testing dataset Y_test is conducted using the LVs X and the hyper-parameters ξ₁ and ξ₂ obtained from the aforementioned training of the model. Subsequently the LVs X_test are determined. Finally, the regression output value g_test can be obtained using GPR.
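The loop above can be mirrored in code roughly as follows. This is a schematic only: SciPy's nonlinear conjugate gradient ('CG') is used as a stand-in for SCG, `neg_log_posterior` is the sketch given after Eq. (22), and the SVD initialisation follows the Initialization step.

```python
import numpy as np
from scipy.optimize import minimize

def train_r_gplvm(Y, g, d, k1, k2, V=50, sigma2=1e-2, eps=1e-4):
    """Alternately optimise hyper-parameters (xi1, xi2) and latent variables X."""
    U, S, _ = np.linalg.svd(Y - Y.mean(axis=0), full_matrices=False)
    X = U[:, :d] * S[:d]                              # SVD initialisation of the LVs
    xi = np.ones(6)                                   # xi1 = xi[:3], xi2 = xi[3:]

    def obj_xi(p):    # Step 4: objective as a function of the hyper-parameters
        return neg_log_posterior(X, Y, g, p[:3], p[3:], sigma2, k1, k2)

    def obj_X(x):     # Step 7: objective as a function of the latent variables
        return neg_log_posterior(x.reshape(X.shape), Y, g, xi[:3], xi[3:], sigma2, k1, k2)

    for _ in range(V):
        X_old = X.copy()
        xi = minimize(obj_xi, xi, method="CG").x
        X = minimize(obj_X, X.ravel(), method="CG").x.reshape(X.shape)
        if np.sum((X - X_old) ** 2) <= eps:           # convergence criterion
            break
    return X, xi[:3], xi[3:]
```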

3. The establishment of the preform model

3.1. Experimental setup and results

A diagram of the finisher is shown in Fig. 2. The flashless die forging of the gear blank is conducted with the die and equipment shown in Fig. 3. As illustrated in Fig. 4, the multi-station forging process can be described as upsetting → preforming → finishing → punching. The equipment used is Hauteur's APM70 high-speed hot upsetting automated production line from Switzerland. Since the loss of metal to burrs during forging is small, the dimensional accuracy is high and the utilization of material is high, so the forging process is stable and reliable with obvious economic advantages. In this work, the primary focus is the influence of the key dimensional parameters of the preform on the formation of the finisher forging, while the parameters of the forging process are given. It is of great importance to obtain a preform model, and the parameterized preform model is shown in Fig. 5. Since the preform is complex with many dimensions, the parameters marked in red in Fig. 5 are selected as the most important ones, according to previous engineering experience and the data analysis presented afterward. Various design models of the preform are proposed by varying the dimensional parameters until the approximate equation for the volume of the finisher achieves a tolerance of less than 10%, the shape of the finisher being known. The experimental dataset has 96 samples, of which the first 76 groups of MFFF and MFDS are obtained from FEM, with the statistics listed in Table 1. The last 20 groups are used for designing the preform die and punch; pressure sensors are installed to measure the MFFF, and high-temperature strain gauges are employed to measure the die strain and stress, with the results shown in Table 2. The primary optimization of the gear blank preform determines whether the MFFF and MFDS are both minimal when the die cavity is just filled. The purpose of this work is to quantify the aforementioned process to obtain the optimized dimensions of the preform and provide the basis for the design of the finisher die, with as many samples observed as possible.

Fig. 2. The drawing of the finisher used.

Fig. 3. Die and equipment used in the experiment.

Fig. 4. Flow of the multi-station forging process of gear blank.

Table 1. The statistics of the FEM simulation dataset of the preform.

Parameters | MAX | MIN | AVERAGE | STDEV | VAR
Input variables
Maximum diameter (d1/mm) | 79.6 | 67.9 | 74.34 | 3.22 | 10.34
Slope starting diameter (d2/mm) | 73.6 | 60.3 | 66.89 | 3.52 | 12.37
Bore diameter (d3/mm) | 31.7 | 26.5 | 28.75 | 1.61 | 2.59
Boss diameter (d4/mm) | 37.6 | 33.2 | 35.58 | 1.25 | 1.55
Depth of the hole (h1/mm) | 13.6 | 8.2 | 10.64 | 1.76 | 3.11
Boss height (h2/mm) | 16.1 | 12.3 | 14.46 | 0.96 | 0.91
Slope value (a1/°) | 163.7 | 145.1 | 155.4 | 5.02 | 25
Output variables
Maximum finisher forming force (MFFF, f_p/kN) | 6130 | 4996 | 5595 | 291.20 | 84802
Maximum finisher die stress (MFDS, f_m/MPa) | 2029 | 1611 | 1823 | 100.90 | 10179

2 Fig. 5. A parameterized preform model in the experiment.

3.2. GA-ELM in conjunction with R-GPLVM

The ELM model can be classically expressed as Hβ = T′, where T′ represents the transpose of the output matrix T of the network and H denotes the output matrix of the hidden layer. Without loss of generality, the numbers of neurons of the input and output layers can be set to n and m respectively. Moreover, β_jk denotes the weight between the jth neuron of the hidden layer and the kth neuron of the output layer, while ω_ij denotes the weight between the ith neuron of the input layer and the jth neuron of the hidden layer. Assume that the input matrix of the training dataset can be expressed as X_{n×Q} with Q samples, that the output matrix can be expressed as Y_{m×Q}, and that the activation function of the hidden layer neurons is g(x). Based on this, the output matrix T of the ELM can be written as follows:

T = [t_1, t_2, ..., t_Q]_{m×Q},  t_j = [t_1j, t_2j, ..., t_mj]^T = [Σ_{i=1}^{L} β_i1 g(w_i · x_j + b_i), Σ_{i=1}^{L} β_i2 g(w_i · x_j + b_i), ..., Σ_{i=1}^{L} β_im g(w_i · x_j + b_i)]^T,  j = 1, 2, ..., Q    (23)

Here, w_i = [ω_i1, ω_i2, ..., ω_in] and x_j = [x_1j, x_2j, ..., x_nj]^T, L is the number of neurons of the hidden layer, and b = [b_1, b_2, ..., b_L]^T_{L×1} is the vector of thresholds of the hidden layer neurons.

Table 2. The experiment dataset of the preform.

id | d1 | d2 | d3 | d4 | h1 | h2 | a1 | MFFF | MFDS
1 | 78.4 | 70.6 | 28.1 | 37.1 | 10.8 | 13.8 | 158 | 5734 | 2004
2 | 71.1 | 64.1 | 28.5 | 36.7 | 11.1 | 15.6 | 155 | 5550 | 1969
3 | 70.3 | 62.8 | 28.1 | 36.5 | 10.8 | 14.8 | 160 | 5409 | 1914
4 | 72.7 | 65.1 | 27.9 | 35.9 | 11.4 | 14.3 | 153 | 5309 | 1967
5 | 79.1 | 71.8 | 28.4 | 37.1 | 10.6 | 13.1 | 158 | 5836 | 2047
6 | 76.6 | 69.1 | 28.8 | 37.2 | 11.3 | 13.5 | 155 | 5522 | 1972
7 | 76.7 | 69.4 | 28.1 | 36.3 | 10.2 | 14.3 | 160 | 5395 | 1972
8 | 74.9 | 67.4 | 27.9 | 35.1 | 10.4 | 15.8 | 153 | 5278 | 2025
9 | 75.3 | 68.1 | 28.1 | 35.6 | 10.5 | 15.3 | 153 | 5990 | 1934
10 | 74.7 | 67.5 | 27.5 | 35.2 | 10.7 | 15.5 | 160 | 5662 | 1917
11 | 77.8 | 70.7 | 28.6 | 36.3 | 11.2 | 14.6 | 155 | 5207 | 1873
12 | 78.3 | 71.1 | 28.2 | 36.5 | 10.3 | 13.8 | 158 | 5890 | 1924
13 | 79.1 | 72.1 | 27.4 | 35.4 | 11.1 | 15.1 | 160 | 6125 | 1792
14 | 70.5 | 63.2 | 28.1 | 35.8 | 10.9 | 14.6 | 153 | 5911 | 1746
15 | 73.6 | 66.3 | 28.5 | 36.7 | 10.7 | 13.8 | 158 | 5451 | 1705
16 | 72.3 | 65.5 | 28.4 | 36.8 | 11.6 | 13.3 | 155 | 5724 | 1749
17 | 71.7 | 64.3 | 27.2 | 37.1 | 10.4 | 13.1 | 155 | 5957 | 2001
18 | 77.3 | 70.2 | 28.1 | 35.9 | 10.8 | 15.3 | 158 | 5139 | 1973
19 | 76.4 | 69.4 | 27.5 | 36.4 | 11.5 | 13.8 | 153 | 5200 | 1929
20 | 78.3 | 71.3 | 27.6 | 36.3 | 10.5 | 13.6 | 160 | 5894 | 1951

For any given w and b, if the number of hidden layer neurons is equal to the number of training samples, the SLFN model can approximate the training samples with zero error, namely,

Σ_{j=1}^{Q} ||t_j - y_j|| = 0,  y_j = [y_1j, y_2j, ..., y_mj]^T,  j = 1, 2, ..., Q    (24)

Since the weights w and thresholds b are randomly generated before training, we only need to determine the number of hidden layer neurons and the activation function, and β can then be calculated. However, in order to reduce the number of operations, the number of hidden layer neurons L is usually taken to be smaller than Q when the number of training samples is relatively large. The connection weights β between the hidden layer and the output layer can then be obtained from the least-squares solution of the following equation:

min_β ||Hβ - T′|| → β̂ = H⁺T′    (25)

Here, H⁺ is the Moore–Penrose generalized inverse of the output matrix H. The seven screened-out dimensions are used as input parameters. These are the maximum diameter (d1), the slope starting diameter (d2), the bore diameter (d3), the convex diameter (d4), the bore depth (h1), the convex height (h2), and the slope value (a1). Meanwhile, the MFFF (f_p) and MFDS (f_m) are used as the output parameters. The ELM model can be represented as:

F(f_p, f_m) = ψ(d1, d2, d3, d4, h1, h2, a1)    (26)
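A minimal single-hidden-layer ELM consistent with Eqs. (23)–(26) can be written as below; the sigmoid activation, the layer size and the function names are our own illustrative choices. With the seven screened dimensions as inputs and (MFFF, MFDS) as the two outputs, X has shape (Q, 7) and T has shape (Q, 2).

```python
import numpy as np

def elm_fit(X, T, L=30, seed=0):
    """Random input weights/biases; output weights beta = H^+ T' as in Eq. (25)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(L, n))       # input-to-hidden weights w
    b = rng.uniform(-1.0, 1.0, size=L)            # hidden-layer thresholds
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))      # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                  # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta                               # predicted [f_p, f_m]
```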

In this work, the ELM model of the preform is constructed with a typical SLFN structure, as shown in Fig. 6. As mentioned in Section 1, since the values of w and b are both assigned randomly, the validity of the ELM prediction will be reduced and consequently the generalization of the model will be affected. As a result, the connection weights and biases of the neurons should be optimized and adjusted effectively. The optimization and adjustment of the parameters of GA-ELM is seamlessly integrated with R-GPLVM and mainly comprises the following three parts, as shown in Fig. 7.

Fig. 6. The established ELM model of the preform.

Part_1 R-GPLVM: The topology between the LVs and the observation samples is established using GPLVM, and the DR of the observation samples is performed. GPR is exploited as a constraint to restrict the DR of the high-dimensional observation data. GA-ELM regressions are used to evaluate the preceding DR and regression; the workflow can be seen in the R-GPLVM module of Fig. 7.

Part_2 ELM: The dimensions retained after DR are employed as the inputs of the ELM, with w and b randomly assigned in [-1, 1], and the output matrix H and connection weights β are calculated using the least-squares method. Based on this procedure, the output of the ELM network is verified with the test samples. The w and b can be optimized using GA if the output accuracy cannot meet the requirements. Subsequently, the DR is terminated when the accuracy meets the engineering requirements; otherwise the course of DR (in Part_1) continues, in the same way as the optimization of w and b.

Part_3 GA: The optimization of the connection parameters of the ELM network layers includes four sections. (1) It is assumed that the ELM input layer has n neurons and the hidden layer has l neurons, with real-number encoding adopted. Each individual of the population consists of two parts: the connection weights and the neuron biases of the hidden layer. The length of each individual is (n × l + l). All the values of ω_ij and b_j are assigned in [-1, 1], and the size of the initial population is set to 96 according to the number of samples. (2) For each individual, the output matrix H of the ELM hidden layer and the Moore–Penrose generalized inverse H⁺ are calculated using the training set. The connection weights β can then be calculated according to the formula β̂ = H⁺T′. (3) In this work, in order to improve the efficiency, the prediction RMSE of the testing set is chosen as the fitness function instead of that of the whole training set, namely

f(·) = RMSE = sqrt(SSE/n) = sqrt((1/n) Σ_{i=1}^{n} (ŷ_i - y_i)²) = sqrt(||T̂ - T_r||²/n)    (27)

Here, n is the number of samples in the testing dataset, T̂ = [ŷ_1, ŷ_2, ..., ŷ_n] is the prediction for the testing dataset, and T_r = [y_1, y_2, ..., y_n] denotes the true values. The RMSE decreases with the sum of the squared errors (SSE); a smaller value indicates that the model fits better and the data prediction is more successful. (4) After the fitness of all individuals in the population has been calculated, the GA operations of selection, mutation and crossover are conducted on the individuals in the population. The termination condition is that the fitness function reaches a desired tolerance. The detailed algorithm procedure is shown in the GA module of Fig. 7. The individuals with larger fitness values in the population are preserved by the selection operation; their mutated variable vectors are compared to the original ones and are passed to the next generation. In this work, in order to further improve the generalization performance of the network, the genetic selection strategy is adjusted, i.e., the norm of the output weights of the hidden layer is added as a criterion in addition to the RMSE. After that, crossover and mutation are conducted to generate the optimized sub-population. If the results are not satisfactory, the procedure returns to step (2), and the optimal individual is exported once the termination conditions on the RMSE and the weight norm ||β|| are met. In the end, all the optimizations are finished, and the aforementioned seven key dimensions of the preform are determined.
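An illustrative fitness evaluation for one GA individual, following Eq. (27) and the adjusted selection criterion described above (testing-set RMSE together with the output-weight norm ||β||); the chromosome layout and the helper names are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def evaluate_individual(chrom, X_tr, T_tr, X_te, T_te, L=30):
    """Decode (w, b) from a chromosome, train the ELM analytically, and return
    the testing-set RMSE of Eq. (27) and ||beta|| used by the selection strategy."""
    n = X_tr.shape[1]
    W = chrom[: L * n].reshape(L, n)                 # input weights
    b = chrom[L * n: L * n + L]                      # hidden-layer biases
    H_tr = 1.0 / (1.0 + np.exp(-(X_tr @ W.T + b)))
    beta = np.linalg.pinv(H_tr) @ T_tr               # beta = H^+ T'
    H_te = 1.0 / (1.0 + np.exp(-(X_te @ W.T + b)))
    rmse = np.sqrt(np.mean((H_te @ beta - T_te) ** 2))   # Eq. (27)
    return rmse, np.linalg.norm(beta)                # both "smaller is better"
```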

Fig. 7. Flow chart of ELM network using GA optimization hybridized with R-GPLVM.

4. Results and discussion

4.1. Data pre-processing

There are 96 groups of preform dimensions, of which the first 76 groups obtained from FEM are used in the training of the network model. In addition, the last 20 groups obtained from experiments are used as the testing samples employed to verify the generalization ability of the network model. A linear conversion function is adopted to normalize the data, as shown in Eq. (28):

y = (y_max - y_min)(x - x_min)/(x_max - x_min) + y_min    (28)

Here, x and y respectively represent the values before and after the conversion; x_max, x_min, y_max, and y_min respectively denote the maximum and minimum values of the samples.

4.2. Comparison results of GA-ELM and other models

Fig. 8. (a) The training of maximum finisher forming force. (b) The training of maximum finisher die stress.

Table 3. Comparisons of the training time and accuracy (training dataset) by various methods.

Problem | Algorithm | Training time (s) | ARE (%) | RMSE
MFFF | BP | 3.56 | 6.32 | 569
MFFF | GA-ELM | 345.4 | 3.15 | 256
MFFF | ELM | 0.43 | 4.34 | 356
MFFF | DE-ELM | 187.32 | 2.23 | 203
MFDS | BP | 2.65 | 5.98 | 465
MFDS | GA-ELM | 453.13 | 3.21 | 189
MFDS | ELM | 0.23 | 4.23 | 276
MFDS | DE-ELM | 216.21 | 2.45 | 156

In this section, the number of hidden-layer neurons of the ELM is set to 30. A sigmoid activation function is employed and 76 datasets are used to establish the ELM training network. A comparative analysis of the four algorithms (GA-ELM, ELM, DE-ELM, BP) is performed. The training processes for the maximum finisher forming force (MFFF) and the maximum finisher die stress (MFDS) are presented in Fig. 8(a) and (b), compared to the experimental results. From this figure, it can be seen that the learning outcomes of DE-ELM and GA-ELM (with w and b optimized) present the best fitting performance, followed by those of ELM (with random w and b assigned) and BP. Comparisons of the RMSE and the average relative error (ARE %) are listed in Table 3, which shows that DE-ELM performs best among the four algorithms, including GA-ELM, for the training of both MFFF and MFDS. As pointed out in the Introduction, however, the DE optimization is likely to be premature due to its one-to-one tuning mechanism during the training process, which can easily lead to incorrect optimization solutions despite giving the best training results. For the MFFF problem, the training time of DE-ELM (187.32 s) compared to that of GA-ELM (345.4 s) indicates that DE helps to reduce the convergence time, and the same holds for MFDS. Although GA prolongs the training process, the training time of GA-ELM is still within reasonable bounds for typical engineering problems. One should keep in mind that ELM always has the shortest training time because of its unique "extreme" characteristic; unfortunately, ELM cannot always achieve perfect learning. The predicted and experimental values of MFFF and MFDS for the 20 testing sample sets are shown in Fig. 9(a) and (b), which shows that GA-ELM outperforms all the other algorithms (including DE-ELM) with the highest precision.

Fig. 9. (a) Test of the prediction of MFFF. (b) Test of the prediction of MFDS.


Fig. 10. The prediction accuracy of test samples based on different algorithms.

Fig. 11. The optimization of n1 and n2 using the SCG method.

The prediction errors on the testing dataset for the different algorithms are shown in Fig. 10(a) and (b). The RMSE and average relative error (ARE) for the MFFF using GA-ELM are 182 and 2.83% respectively. In addition, the RMSE and ARE for the MFDS are 20 and 0.85% respectively. From the above results, it can be concluded that GA-ELM exhibits excellent prediction performance compared to the BP, DE-ELM and ELM algorithms.

4.3. Hyper-parameters optimization and dimensionality reduction of R-GPLVM

The hyper-parameters are optimized using the SCG method, with the LVs output. In Fig. 11, the norm of ξ₁ converges to 0.6821 after 45 iterations, and that of ξ₂ to 0.8812 after 23 iterations, when the initial value and accuracy are set to (1, 1, 1) and 1e-4 respectively. The results in Table 4 illustrate that SCG has a faster convergence rate than the conjugate gradient (CG) method under the same initial values and precision, with the optimized values of the hyper-parameters determined. Using the SCG method, the optimized hyper-parameters ξ₁ and ξ₂ are (0.5132, 0.3411, 0.2925) and (0.7412, 0.3812, 0.2861) respectively; for CG, they are (0.5113, 0.3401, 0.2914) and (0.7398, 0.3802, 0.2845) respectively.

Table 4. Comparisons of the optimization of hyper-parameters by SCG and CG methods.

Hyper-parameter | Algorithm | Initial value | Accuracy | Iterations | Optimized value
ξ₁ | CG | (1, 1, 1) | 1e-4 | 70 | (0.5113, 0.3401, 0.2914)
ξ₁ | SCG | (1, 1, 1) | 1e-4 | 45 | (0.5132, 0.3411, 0.2925)
ξ₂ | CG | (1, 1, 1) | 1e-4 | 34 | (0.7398, 0.3802, 0.2845)
ξ₂ | SCG | (1, 1, 1) | 1e-4 | 23 | (0.7412, 0.3812, 0.2861)

Following Eq. (28), data pre-processing is performed on the 20 groups of test data, and the data are then tested with the R-GPLVM algorithm. Three kinds of R-GPLVM can be established from Fig. 5, with dimensionalities of d = 15, 11, and 7 respectively. Comparisons of the regression of MFFF and MFDS indicate that R-GPLVM has better accuracy than GPR, no matter which dimension (15, 11, 7) is employed, as shown in Fig. 13(a) and (b). It can also be seen that the prediction accuracy tends to decrease as the dimension is increased. As shown in Fig. 13(c) and (d), the RMSE of MFFF for the three dimensionalities are 219.5, 144.8, and 82.4 respectively, and the ARE of MFFF are 3.75%, 2.3%, and 1.19% respectively. For MFDS, the RMSE are 206.7, 146.7, and 82.8 respectively, and the ARE are 10.43%, 6.99%, and 3.97% respectively. From the calculations, the maximum relative errors of MFFF are 6.59%, 5.80%, and 4.77% respectively, and the maximum relative errors of MFDS are 2.70%, 2.47%, and 2.43% respectively. The results demonstrate that the R-GPLVM prediction has excellent performance. Since the noise in the data can be eliminated by using LVs to conduct dimensionality reduction, the R-GPLVM training algorithm is more suitable and reliable for small-sample data. The experiments show that the proposed R-GPLVM algorithm can obtain better performance than GPR for different dimensions of the latent space. This is because it can find the potential manifold structure hidden in the high-dimensional feature space, and a topology between the low-dimensional manifold and the output value can be established. Dimensionality reduction is achieved by the LVM to remove the redundant information of the observed data, and the most accurate results are then obtained when the regression analysis is conducted. The advantage is especially obvious when the size of the training data is small. Since the redundant information in GPR has not been removed, the accuracy of that model is affected.

Fig. 13. Comparisons of R-GPLVM with different dimensions of the latent variables.

4.4. Discussion of kernel functions

4.4.1. Five kernel functions used in the numerical experiments

In this work, five different kernel functions (the Gaussian, quadratic, cubic, quartic and quintic kernels) are exploited to investigate the prediction accuracy of MFFF. The expressions for the RMSE and R-squared can be seen in Eqs. (27) and (29) respectively. In Fig. 12, the shapes of the five kernel functions are plotted together for comparison, where b_ij = ||x_i - x_j||/ρ, k_ij = k(x_i, x_j), and ρ is the accommodation coefficient whose value is determined from b_ij. In this work, the domains of the quadratic, cubic and quartic kernel functions are all spherical with radius b_ij ∈ (0, 2), and for the quintic kernel function the domain is spherical with radius b_ij ∈ (0, 3).

Fig. 12. The shapes of the five kernel functions used.

R² = 1 - Σ_{i=1}^{N} (y_i - ŷ_i)² / Σ_{i=1}^{N} (y_i - ȳ)²    (29)

The Gaussian kernel function is:

k_ij = (1/π^{3/2}) exp(-1/2 b_ij²)    (30)

Other kernels considered are polynomial functions. The quadratic kernel function is:

k_ij = (5/(4π)) (3/4 - (3/4) b_ij + (3/16) b_ij²)  if 0 ≤ b_ij ≤ 2;  k_ij = 0 elsewhere    (31)

The cubic kernel function is:

k_ij = (3/(2π)) { 2/3 - b_ij² + (1/2) b_ij³  if 0 ≤ b_ij ≤ 1;  (1/6)(2 - b_ij)³  if 1 ≤ b_ij ≤ 2;  0 elsewhere }    (32)

The quartic kernel function is:

k_ij = (315/(208π)) (2/3 - (9/8) b_ij² + (19/24) b_ij³ - (5/32) b_ij⁴)  if 0 ≤ b_ij ≤ 2;  k_ij = 0 elsewhere    (33)

The quintic kernel function is:


k_ij = (3/(359π)) { (3 - b_ij)⁵ - 6(2 - b_ij)⁵ + 15(1 - b_ij)⁵  if 0 ≤ b_ij ≤ 1;  (3 - b_ij)⁵ - 6(2 - b_ij)⁵  if 1 ≤ b_ij ≤ 2;  (3 - b_ij)⁵  if 2 ≤ b_ij ≤ 3;  0 elsewhere }    (34)
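For reference, the Gaussian kernel of Eq. (30) and the piecewise cubic kernel of Eq. (32) can be coded directly as functions of the normalised distance b_ij. This is a sketch only: the cubic normalisation constant is taken from the reconstruction above and passed as a parameter, since the authors' implementation may use a different value.

```python
import numpy as np

def gaussian_kernel(b):
    """Eq. (30); unbounded support, hence the extra CPU time noted in Section 4.4.3."""
    return np.pi ** (-1.5) * np.exp(-0.5 * np.asarray(b, dtype=float) ** 2)

def cubic_kernel(b, c=3.0 / (2.0 * np.pi)):
    """Piecewise cubic spline of Eq. (32); c is the (assumed) normalisation factor."""
    b = np.asarray(b, dtype=float)
    out = np.zeros_like(b)
    m1 = (b >= 0.0) & (b <= 1.0)
    m2 = (b > 1.0) & (b <= 2.0)
    out[m1] = 2.0 / 3.0 - b[m1] ** 2 + 0.5 * b[m1] ** 3
    out[m2] = (2.0 - b[m2]) ** 3 / 6.0
    return c * out
```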

4.4.2. Robustness of the kernel functions of the model

In this section, different training methods (BP, ELM) are applied to measure the robustness of the five kernel functions, using the accuracy criteria RMSE and R-squared for different dimensions. As described above, the robustness of the model increases as R-squared increases and RMSE decreases. Comparisons of accuracy in different dimensional spaces for the different kernel functions are presented in Table 5. The best result is obtained when d = 7 (corresponding to the seven key LVs mentioned above). As seen in Table 5, for the low-dimensional spaces (below dimension 7), the Gaussian kernel function obtains the best values of R-squared and RMSE no matter which training method is employed, with ELM obtaining the better results. For the higher-dimensional spaces (above dimension 7), the Gaussian kernel function also shows good performance: its R-squared is closer to 1 and its RMSE is smaller than those of the others. A general trend is noticeable in which the precision increases gradually as the dimension decreases; however, this is not always the case, since when the dimension was decreased to 6 the R-squared value decreased. Through a comprehensive comparison of the R-squared and RMSE values for the low- and high-dimensional data, it can be found that the Gaussian kernel function has a higher robustness for both low- and high-dimensional sample sets. In addition, the robustness of the model decreases as the degree of the polynomial kernel functions increases; the quintic kernel function was thus observed to have the lowest performance. Moreover, the performance of all the R-GPLVM models improves when the ELM training method is adopted, which indicates that ELM has better generalization than BP.

4.4.3. Computational efficiency

In order to evaluate the efficiency of the five kernel functions for the optimal number of dimensions d = 7, a dimensionless CPU time is employed as the criterion. All of the numerical calculations were performed on a PC equipped with an Intel(R) Core(TM) i7 and 16 GB RAM. The Gaussian kernel function was observed to require about 1.5 times more computation time due to its unlimited domain. On the other hand, the other four kernel functions needed less CPU time (less than 0.85) due to their finite domains. The simulation robustness and computational efficiency together indicate that the Gaussian kernel function is the most suitable.

4.5. Multi-objective optimization

As mentioned before, determining the optimized dimensions of the preform is the ultimate goal of this work, such that the MFFF (f_p) and MFDS (f_m) are both minimized, namely,

min [f_p, f_m]  subject to  x_i^{LB} ≤ x_i ≤ x_i^{UB},  min_β ||Hβ - T′||    (35)
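A compressed sketch of the parallel selection scheme used for Eq. (35): the population is split evenly between the two sub-goals, selection runs independently on each half using the corresponding GA-ELM predictor, and the survivors are merged before crossover and mutation. The predictor callables and the survival fraction are placeholders, not the paper's exact settings.

```python
import numpy as np

def parallel_selection_step(pop, predict_mfff, predict_mfds, keep=0.5, seed=1):
    """One selection step of the parallel method for min[f_p, f_m], Eq. (35)."""
    rng = np.random.default_rng(seed)
    pop = pop[rng.permutation(len(pop))]              # random split of the population
    half = len(pop) // 2
    survivors = []
    for sub, objective in ((pop[:half], predict_mfff), (pop[half:], predict_mfds)):
        scores = np.array([objective(ind) for ind in sub])
        order = np.argsort(scores)                    # smaller force/stress is fitter
        survivors.append(sub[order[: int(keep * len(sub))]])
    return np.vstack(survivors)                       # merged group -> crossover, mutation
```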

Table 5. Robustness comparisons of different kernel functions based on different training algorithms.

Dimension | Criterion | Gaussian ELM | Gaussian BP | Quadratic ELM | Quadratic BP | Cubic ELM | Cubic BP | Quartic ELM | Quartic BP | Quintic ELM | Quintic BP
R-GPLVM (d = 2) | RMSE | 356 | 587 | 599 | 564 | 708 | 897 | 563 | 430 | 765 | 899
R-GPLVM (d = 2) | R-square | 0.7426 | 0.8021 | 0.8485 | 0.7794 | 0.8021 | 0.8144 | 0.7688 | 0.7832 | 0.8212 | 0.8122
R-GPLVM (d = 3) | RMSE | 374 | 678 | 765 | 795 | 683 | 756 | 894 | 1345 | 1298 | 998
R-GPLVM (d = 3) | R-square | 0.8574 | 0.7834 | 0.8785 | 0.8659 | 0.7990 | 0.7712 | 0.8832 | 0.6745 | 0.8123 | 0.8321
R-GPLVM (d = 4) | RMSE | 342 | 517 | 692 | 1053 | 287 | 1107 | 643 | 645 | 543 | 453
R-GPLVM (d = 4) | R-square | 0.8713 | 0.7945 | 0.8686 | 0.7106 | 0.7544 | 0.8776 | 0.7963 | 0.8055 | 0.7543 | 0.6321
R-GPLVM (d = 5) | RMSE | 293 | 534 | 492 | 299 | 526 | 487 | 765 | 296 | 432 | 409
R-GPLVM (d = 5) | R-square | 0.8821 | 0.8532 | 0.8791 | 0.8832 | 0.8007 | 0.7421 | 0.8123 | 0.7434 | 0.6432 | 0.7102
R-GPLVM (d = 6) | RMSE | 328 | 309 | 369 | 412 | 608 | 394 | 586 | 453 | 564 | 1173
R-GPLVM (d = 6) | R-square | 0.8813 | 0.8412 | 0.8806 | 0.8063 | 0.8678 | 0.8443 | 0.8656 | 0.8721 | 0.7854 | 0.7663
R-GPLVM (d = 7) | RMSE | 276 | 325 | 259 | 432 | 398 | 387 | 438 | 332 | 869 | 345
R-GPLVM (d = 7) | R-square | 0.8926 | 0.8453 | 0.8894 | 0.8793 | 0.8812 | 0.8563 | 0.8845 | 0.7909 | 0.8396 | 0.8561
R-GPLVM (d = 8) | RMSE | 305 | 276 | 284 | 692 | 403 | 987 | 390 | 275 | 564 | 765
R-GPLVM (d = 8) | R-square | 0.8849 | 0.8321 | 0.8897 | 0.8854 | 0.8774 | 0.8533 | 0.8664 | 0.8112 | 0.8661 | 0.8469
R-GPLVM (d = 9) | RMSE | 297 | 478 | 452 | 513 | 386 | 458 | 365 | 325 | 965 | 867
R-GPLVM (d = 9) | R-square | 0.8849 | 0.8421 | 0.8786 | 0.8085 | 0.7665 | 0.8644 | 0.8545 | 0.7676 | 0.7973 | 0.7437
R-GPLVM (d = 10) | RMSE | 312 | 297 | 945 | 996 | 785 | 897 | 1279 | 1123 | 654 | 265
R-GPLVM (d = 10) | R-square | 0.8816 | 0.7932 | 0.8421 | 0.7871 | 0.8655 | 0.7322 | 0.7832 | 0.8231 | 0.6654 | 0.6432
R-GPLVM (d = 11) | RMSE | 358 | 537 | 784 | 421 | 294 | 1308 | 987 | 476 | 432 | 321
R-GPLVM (d = 11) | R-square | 0.8705 | 0.8127 | 0.7636 | 0.7659 | 0.8541 | 0.8296 | 0.8233 | 0.6324 | 0.6932 | 0.6981
R-GPLVM (d = 12) | RMSE | 314 | 506 | 853 | 299 | 673 | 732 | 294 | 776 | 712 | 663
R-GPLVM (d = 12) | R-square | 0.8745 | 0.7743 | 0.8193 | 0.8398 | 0.7584 | 0.7652 | 0.7521 | 0.6543 | 0.8234 | 0.8332
R-GPLVM (d = 13) | RMSE | 423 | 865 | 643 | 895 | 1099 | 1207 | 458 | 655 | 674 | 999
R-GPLVM (d = 13) | R-square | 0.8601 | 0.8067 | 0.7287 | 0.7195 | 0.6655 | 0.7021 | 0.7632 | 0.7963 | 0.7621 | 0.7559
R-GPLVM (d = 14) | RMSE | 387 | 745 | 597 | 1176 | 562 | 564 | 673 | 1599 | 764 | 866
R-GPLVM (d = 14) | R-square | 0.8787 | 0.8165 | 0.8329 | 0.7906 | 0.7633 | 0.6732 | 0.6432 | 0.6323 | 0.7992 | 0.7232
R-GPLVM (d = 15) | RMSE | 532 | 634 | 896 | 754 | 890 | 804 | 298 | 1211 | 288 | 544
R-GPLVM (d = 15) | R-square | 0.7487 | 0.7932 | 0.7492 | 0.8209 | 0.8022 | 0.7932 | 0.6343 | 0.8121 | 0.8121 | 0.8032


Fig. 14. The trend of the optimal value and average values of the objective functions as a function of evolutionary generation.

Table 6. Comparisons of the experiments using the optimal solution based on different methods.

Model | d1 | d2 | d3 | d4 | h1 | h2 | a1 | f_p | f_m
BP | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5552 | 1954
ELM | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5451 | 1885
GA-ELM | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5245 | 1701
DE-ELM | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5217 | 1681
GPR | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5483 | 1619
R-GPLVM | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5203 | 1863
Experiment | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5324 | 1765
FEM | 73.2 | 67.4 | 27.3 | 36.2 | 10.6 | 14.7 | 153 | 5172 | 1897

In this work, a parallel selection method is employed to solve the multi-objective optimization based on the GA-ELM network model. In the GA, the parameter settings directly affect the optimization speed and quality, and they are set as follows: the initial population size is 96; the number of genetic generations is 200; the selection sampling function is stochastic universal sampling ('sus'); the two-point crossover probability is 0.7; and 20-bit binary encoding is adopted. MFFF and MFDS are chosen as the first and second objective functions respectively. The trends of the values of the objective functions, the sum of the objective functions, and their variation over the genetic generations are presented in Fig. 14(a)–(c). As can be seen from Fig. 14(d), the two objective functions have stabilized over the last 10 generations of the 200 iterations, which indicates that the results have been optimized.

Fig. 15. The FEM results by using the optimized preform.


Fig. 16. The optimized preform and finisher produced in the experiment.

A set of Pareto-optimal solutions is exploited as the inputs to the different models (BP, ELM, DE-ELM, GA-ELM, GPR, R-GPLVM) for comparison with the experiment, with the obtained results listed in Table 6. The values predicted by GA-ELM, DE-ELM and R-GPLVM are the closest to the experimental ones, as also presented in Table 6.

4.6. Simulation and engineering demonstration

Deform 2D 9.0 was chosen to simulate our system, with the conditions set as follows. (1) The material of the workpiece was set to low-carbon alloy steel (20CrMoTi), assumed to be a rigid-plastic body, and the material of the dies was H13, assumed to be a rigid body. (2) A shear friction condition was applied between the workpiece and the dies with a friction factor of 0.1. (3) The temperature of the workpiece was initialized to 850 °C, and the preheating temperature of the dies was set to 200 °C. The heat transfer coefficient between the workpiece and the dies was set to 11 kW/(m² °C). (4) The numbers of mesh elements of the workpiece and dies were both set to 2000, with the step set to 0.01.

The simulation results obtained using the aforementioned optimized preform are shown in Fig. 15(a) and (b). From the stroke–load curve in Fig. 15(a), the MFFF is 5172 kN when the stroke is 15.7 mm. It was also determined that the finisher is formed with a precisely filled contour. As can be seen from the distribution of the maximum principal stress in Fig. 15(b), the MFDS is 1897 MPa, with the stress concentration area located at the convex diameter (d4) of the lower die. Comparisons of the predictions of the FEM simulation, the experimental observation, and the various other models are also shown in Table 6. It is generally known that optimized dimensions will improve the deformation uniformity during the forming of the preform, and the optimized preform benefits the forming of the finisher. In our experiment, the produced preform and finisher with the optimized shape are presented in Fig. 16(a) and (b), which verifies the feasibility of our proposed method.

5. Conclusions

In this paper, a new GA-ELM framework was integrated with R-GPLVM. In this framework, the ELM model was constructed including the key dimensions, MFFF and MFDS. This has an advantage over traditional FEM optimizations, which have certain limitations when dealing with the complex nonlinear modeling of the preform. However, there were two problems to overcome in the training of the ELM network model.

4.6. Simulation and engineering demonstration

Deform 2D 9.0 was chosen for the simulation, with the conditions set as follows. (1) The workpiece material was the low-carbon alloy steel 20CrMoTi, modeled as a rigid-plastic body, and the die material was H13, modeled as a rigid body. (2) A shear friction condition with a friction factor of 0.1 was applied between the workpiece and the dies. (3) The initial temperature of the workpiece was 850 °C and the preheating temperature of the dies was 200 °C; the heat transfer coefficient between the workpiece and the dies was set to 11 kW/(m² °C). (4) The workpiece and the dies were each meshed with 2000 elements, and the step size was set to 0.01.

The simulation results obtained with the optimized preform are shown in Fig. 15(a) and (b). From the stroke–load curve in Fig. 15(a), the MFFF is 5172 kN at a stroke of 15.7 mm, and the finisher is formed with a precisely filled contour. The distribution of the maximum principal stress in Fig. 15(b) shows that the MFDS is 1897 MPa, with the stress concentration located at the convex diameter (d4) of the lower die. Comparisons of the predictions of the FEM simulation, the experimental observation and the other models are also given in Table 6. It is generally known that optimized dimensions improve deformation uniformity during forming of the preform, and the optimized preform in turn benefits the forming of the finisher. The preform and finisher produced in our experiment with the optimized shape are presented in Fig. 16(a) and (b), which verifies the feasibility of the proposed method.

5. Conclusions

In this paper, a new GA-ELM framework was integrated with R-GPLVM. In this framework, the ELM model was constructed over the key preform dimensions, MFFF and MFDS, which gives it an advantage over traditional FEM-based optimizations, which have limitations when dealing with the complex nonlinear modeling of the preform. Two problems had to be overcome in training the ELM network model. First, because the preform model has more input variables and relatively fewer data than conventional cases, we proposed a novel DR model, termed R-GPLVM. In this model, the weakness of GPLVM is overcome by exploiting the regression constraint information of the sample observation space, which makes it well suited to small samples of high-dimensional data. Regression experiments showed that R-GPLVM outperforms GPR in various dimensions. Second, the random assignment of the connection weights between the input and hidden layers and of the hidden-layer neuron biases degrades the predictive validity of ELM, leading to insufficient generalization and unsatisfactory stability of the regression model. To address this, the connection parameters of each layer of the ELM network were optimized using GA, with the fitness function strategy adjusted accordingly. As another important optimization approach, DE was also employed to tune the parameters of the ELM, and the results demonstrated that GA-ELM achieves higher stability and more accurate predictions than the BP, DE-ELM and ELM models. Furthermore, the results of the GA-ELM optimization were used to evaluate the preceding DR and regression processes. Since the choice of kernel function strongly affects the model precision, different kernel functions were also compared, indicating that the Gaussian kernel yields the highest prediction accuracy. In the training of R-GPLVM, the n1 and n2 hyper-parameters were optimized via the SCG algorithm, with the LVs determined in the process; comparative analysis with CG also shows that SCG converges faster. Finally, since optimizing the preform design is the ultimate goal, a parallel selection method was adopted to obtain the optimized preform, with the predicted MFFF and MFDS both minimized. Comparing the FEM simulation results for the optimized preform with those of the actual production verified the feasibility of the proposed method.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 5100145 and 51175202). The authors would like to thank the relevant engineers of Dongfeng Motor Co., Ltd. and the company itself for providing the experimental platform, including the equipment and dies. The authors are also grateful to the anonymous reviewers for their helpful comments and suggestions.
