Response surface methodology: A neural network approach

European Journal of Operational Research 101 (1997) 65-73

Theory and Methodology

M. Farooq Anjum (a), Imran Tasadduq (b), Khaled Al-Sultan (c)

(a) Department of Electrical Engineering, University of Maryland at College Park, College Park, MD, USA
(b) Division VI, Research Institute, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
(c) Department of Systems Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia

Received 26 June 1995; accepted 6 June 1996

Abstract

Response surface methodology is used to optimize the parameters of a process when the function that describes it is unknown. The procedure involves fitting a function to the given data and then using optimization techniques to obtain the optimal parameters. This procedure is usually difficult because obtaining the right model may not be possible, or is at best very time consuming. In this paper, a two-stage procedure for obtaining the best parameters for a process with an unknown model is developed. The procedure is based on implementing response surface methodology via neural networks. Two neural networks are trained: one for the unknown function and the other for derivatives of this function, which are computed using the first neural network. These neural networks are then used iteratively to compute parameters for an equation which is ultimately used for optimizing the function. Results of some simulation studies are also presented. © 1997 Elsevier Science B.V.

1. Introduction

Response surface methodology (RSM) is a collection of mathematical and statistical techniques that are useful for modelling and analysis of problems in which a response of interest is influenced by several variables and the objective is to optimize this response [1]. RSM is a useful technique in cases where system optimization is the goal and the system is largely unknown. If some dependent variable y is a non-linear function of an independent vector of variables x, x ∈ R^n, and the expected response is E(y), then the surface E(y) = f(x) is called a response surface [2].

The true model of the system, also called the response surface, is generally unknown. Hence the first step in system optimization is to determine a suitable approximation. The most widely used approximating functions are polynomials. The data generated are then used to find a best-fit polynomial. In the initial stages, a first order polynomial is usually used, and in the region of the optimum a second order polynomial may be used. To estimate the optimum values more precisely, optimization methods derived from calculus may be used. As can be seen, this process involves trial and error, is time consuming and may not eventually arrive at the right model.

Various authors have investigated the application of response surface methodology. In [3], response surface methodology has been used to investigate the effects of adaptive control of a machining operation and operating variables on the process-dependent parameters, utilizing a second order relationship.



Two different optimization algorithms, one of which is RSM, have been used in [4], along with an SMOS (statistical metal oxide semiconductor) model, to create an efficient CAD environment for integrated circuit design. In a study reported in [5], an algorithm incorporating NLP (non-linear programming) software and RSM has been employed for optimal design and operation of filter units. A multi-level multi-dimensional response surface modelling technique is presented in [6] for effective and efficient yield-driven design. The approach makes it possible to perform yield optimization as well as a yield sensitivity analysis of circuits with microstrip structures simulated by an electromagnetic (EM) simulator. Hewidy and Fattouh [7] have reported experimental investigations of electrochemical cutting operations on the basis of the RSM technique. Three mathematical models correlating process parameters and their interactions with response parameters have been proposed. These models can be used in selecting optimum process parameters for obtaining the desired controlled width of cut with restricted consumed current. In [1], a near optimal solution has been obtained for maximizing the production output of a computer integrated manufacturing system. Montgomery and Bettencourt Jr. [2] have reviewed the application of multiple response surfaces to multiple variable optimization problems and have described how these techniques may be used in analyzing computer simulation experiments. An example with four response surfaces has been presented to illustrate the method. Vining and Myers [8] have shown how the dual response approach of optimizing a primary response function while satisfying conditions on a secondary response function can be used to achieve the goals of the Taguchi philosophy within a more rigorous statistical methodology. An example has been given to illustrate the point.

In recent years, artificial neural networks (ANNs) have come to be an important element in describing non-linear functions. Neural networks are also known as universal function approximators [9]. It has been shown in [10] that a feedforward multi-layered neural network can approximate a continuous function arbitrarily well.

The behavior of an artificial neural network depends on both the weights and the input-output function that is specified for the units.

This function typically falls into one of three categories: linear, threshold or sigmoid. For linear units, the output activity is proportional to the total weighted input. For threshold units, the output is set at one of two levels, depending on whether the total input is greater than or less than some threshold value. For sigmoid units, the output varies continuously, but not linearly, as the input changes [11]. In [12], the use of artificial neural networks for non-linear IMC (internal model control) has been explored. In doing so, the authors have studied the invertibility of a class of non-linear dynamical systems and derived an invertibility characterization. Sotirov and Shafai [13] have used neural networks for interval identification of non-linear systems, which is a new way of looking at such problems and may be helpful in arriving at a more tractable methodology.

In this paper, we present a neural-network-based algorithm for optimizing the parameters of a process with an unknown model. In Section 2, we compare our approach with other existing techniques. In Section 3, we briefly introduce neural networks. We then present our proposed algorithm in Section 4. Results and discussion are presented in Section 5. Finally, conclusions are given in Section 6.

2. Comparison with other techniques

There are several techniques that are directly or indirectly related to our proposed approach for system optimization. In this section, we briefly describe some of these techniques, compare them to our proposed approach, and summarize the advantages and disadvantages of the proposed approach.

System optimization is usually achieved in two stages. The first stage is to determine a suitable model describing the behavior of the system, while the second stage is to find the values of the input variables that achieve the optimal response of the system. The first stage can be achieved via regression analysis [14]. One can arrive at a regression model by guessing a model and then fitting it to the given data to find the best values of the parameters of the model, usually by using a least squares criterion. After fitting the model, it is checked for adequacy. If the model is not adequate, then another model is tried and the process is repeated until adequacy is achieved.


When one makes a regression model of a simulation model (i.e. the data to be fitted are obtained from a simulation model), one speaks of regression metamodels [15,16], but the idea remains the same as above. From the above, it is clear that regression models and metamodels arrive at the model of the system only by trial and error (i.e. by guessing a model and then checking its adequacy), which necessitates direct intervention from the analyst. Moreover, it is possible that one may not arrive at the appropriate model, and the process could be very time consuming. In addition, regression models and metamodels achieve only the first stage, but not the second stage, of system optimization.

Our neural network approach avoids most of the shortcomings pointed out above. As will be seen later, our approach does not need trial and error to arrive at a reasonably adequate model. The approach requires less intervention from the analyst. Moreover, it achieves both stages of system optimization (i.e. modelling and optimization).

Although response surface methodology achieves the two stages of system optimization, it uses regression for modelling and therefore suffers from most of the disadvantages pointed out above. Moreover, most RSM approaches use linear regression models in the early stages of the analysis and quadratic regression models only at the final stages, which could be very misleading in the case of highly non-linear responses. There have been some attempts to avoid this by fitting cubic models [17], but even these may not be adequate. Moreover, the whole process of RSM needs a lot of intervention from the analyst, and it keeps the two stages of modelling and optimization very distinct (for a survey of RSM approaches, see [18]; for recent approaches to RSM, see [17,19-21]). In contrast to RSM, our approach combines modelling and optimization in such a way that the analyst does not have to interfere. Moreover, it is more accurate in the case of highly non-linear responses.

To summarize, our approach has the following advantages:
1. It combines both modelling and optimization.
2. It requires less intervention from the analyst as compared to other approaches.
3. There is no need for trial and error to arrive at a suitable model, as the process of modelling is almost entirely data driven. Moreover, the possibility of finding a good model is high.


4. The accuracy of the approach improves as more points become available (this is not necessarily the case for some of the other approaches, where if one uses the wrong model, more data will be of little help).

The approach, however, suffers from some disadvantages, which include:
1. It can only be used if one has many points, but this disadvantage is also shared by some other approaches.
2. The model uses first order information, as represented by the gradient, and therefore may get stuck at a local minimum. But this is shared by most RSM techniques, which use the steepest ascent approach (a first order method) for optimization.

Finally, the proposed approach can be used for optimizing the response of any system. However, the number of points available has to be greater than or equal to 30 in order to obtain reasonable accuracy with this approach.

3. Neural networks

Neural networks (NNs) are increasingly receiving attention for solving complex practical problems. Neural networks are known as universal function approximators: they are capable of approximating any continuous non-linear function to arbitrary accuracy [10]. Consider a dynamic non-linear relationship described by

y(t) = f{φ(t)},   (1)

where y(t) is the output, φ(t) is an n_φ × 1 vector composed of past outputs and inputs, and f(·) is a non-linear function that represents the dynamic relationship. A feedforward neural network may be trained to assume the function f. Upon completion of training, the non-linear relationship can be expressed by the trained neural network. The feedforward neural network is assumed to have multiple layers, including one or more hidden layers. Units in each layer feed the units in the forward layer through a set of weights and a non-linear differentiable activation function. Each activation function may also have a bias term. The n_w × 1 vector w is assumed to contain all the concatenated weights of the network.
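To make the network just described concrete, the following minimal sketch (illustrative Python/NumPy, not taken from the paper; the layer sizes, bias handling and the particular symmetric sigmoid are assumptions) evaluates a one-hidden-layer feedforward network NN{w, φ(t)} from a single concatenated weight vector w.

import numpy as np

def sigmoid(a):
    # symmetric sigmoid with outputs in (-1, 1); non-linear and differentiable
    return (1.0 - np.exp(-a)) / (1.0 + np.exp(-a))

def unpack(w, n_in, n_hid, n_out):
    # split the concatenated n_w x 1 weight vector w into layer matrices and bias vectors
    i = 0
    W2 = w[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in     # input -> hidden weights
    b2 = w[i:i + n_hid]; i += n_hid                                        # hidden biases
    W1 = w[i:i + n_out * n_hid].reshape(n_out, n_hid); i += n_out * n_hid  # hidden -> output weights
    b1 = w[i:i + n_out]                                                    # output biases
    return W2, b2, W1, b1

def nn(w, phi, n_hid=5):
    # NN{w, phi(t)}: forward pass through one hidden layer
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    W2, b2, W1, b1 = unpack(np.asarray(w, dtype=float), phi.size, n_hid, 1)
    out1 = sigmoid(W2 @ phi + b2)    # hidden layer outputs, OUT(1) in the notation of Appendix A
    return sigmoid(W1 @ out1 + b1)   # network output

With n_in inputs, n_hid hidden units and one output, w has n_hid*n_in + n_hid + n_hid + 1 entries; training (Section 3.1) amounts to adjusting this vector.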


We further denote the output of the feedforward neural network for a given weight vector w and input pattern φ(t) as

NN{w, φ(t)}.   (2)

3.1. Neural net training

Training consists of adjusting the weights of the network such that the network replicates the observed phenomenon. In feedforward (equation error) training, the network is trained by minimizing the sum squared error defined as

J = Σ_{t=1}^{q} E(t),   (3)

with

E(t) = ½ ||e_f(t)||² = ½ ||y(t) − ŷ(t)||²,   (4)

ŷ(t) = NN{w, φ(t)},   (5)

where q is the total number of available data points. Training may be classified into batch learning and pattern learning [22]. In batch learning, the weights of the neural network are adjusted after a complete sweep of the entire training data, while in pattern learning the weights are updated at every time step in an on-line fashion. Batch learning has better mathematical validity in the sense that it exactly implements the gradient descent method. Pattern learning algorithms are usually derived as approximations of batch learning; pattern learning, however, has the advantage of on-line implementation (and is hence applicable to adaptive control design). The back propagation algorithm is the most widely used method in neural net training. In this approach, the training is based on a simple-minded gradient scheme. Back propagation provides a convenient method to compute the gradient using the chain rule. The back propagation algorithm for the weight update is given as follows.

Batch learning.

w^k = w^{k−1} − η_b [∂⁺J/∂w^{k−1}]^T = w^{k−1} + η_b Σ_{t=1}^{q} Γ_b(t) e_f(t),   (6)

where

Γ_b(t) = [∂⁺ŷ(t)/∂w^{k−1}]^T   (an n_w × n_y matrix),
e_f(t) = y(t) − NN{w, φ(t)},   (7)

η_b denotes the batch learning rate and the plus sign indicates the ordered partial derivative in the network [23]. The ordered derivative can be computed conveniently by the back propagation network [24]. However, instead of backpropagating the error e(t), as is normally done, we backpropagate a 1 through the network. This facilitates the derivative calculation. The learning rate η_b can be either a matrix or a scalar. Usually a matrix learning rate uses higher order information and converges faster. The matrix learning rate in this case is given by

η_b = μ (εI + H)^{−1},   0 ≤ μ ≤ 2,   H = Σ_{t=1}^{q} Γ_b(t) Γ_b^T(t),

where μ is a parameter that affords additional flexibility and ε is a small positive quantity added to deal with a near singular H. It can be shown that in the vicinity of the optimal weight w°, the optimal value of μ becomes 1. However, when the weights are away from w° and/or in the presence of strong nonlinearities, the linearized approximation used by the Gauss-Newton algorithm becomes poor. Based on this consideration, at times a conservative (small) value of μ may yield better results. The following variable μ may be used for this purpose:

μ(k) = μ_∞ − μ_r [μ_∞ − μ(k−1)],   μ(0) ≤ μ_∞ ≤ 1,   (8)

where μ_∞ is the final value of μ and μ_r controls the rate of transition from μ(0) to μ_∞; μ_r normally ranges between 0.9 and 1. The back propagation algorithm for pattern learning is given as follows.

Pattern learning.

w(t) = w(t−1) − η_p(t) [∂⁺E(t)/∂w(t−1)]^T = w(t−1) + η_p(t) Γ_p(t) e_f(t),   (9)

where

Γ_p(t) = [∂⁺ŷ(t)/∂w(t−1)]^T,
e_f(t) = y(t) − NN{w(t−1), φ(t)},   (10)

and η_p(t) is the pattern learning rate. Application of the Gauss-Newton algorithm provides an expression for the matrix η_p(t).

Our algorithm for the optimization of unknown functions using neural networks consists of two stages. Stage one consists of training two neural networks off-line, one for the unknown function f(x) and the other for its derivatives ∇f(x). Stage two consists of using these trained neural networks to optimize the unknown function. The optimization process is iterative: at every iteration ∇f(x) and ∇²f(x) (the Hessian of f(x)) are computed until the stopping criterion is satisfied. At this point, optimality has been reached. The value(s) of the input variable(s) at this stage give the optimal point, while the output of the first neural network gives the optimal value of the function.
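Before moving to the algorithm itself, the batch update (6) with the matrix learning rate and the schedule (8) can be illustrated by the sketch below. It assumes that the ordered derivatives Γ_b(t) and the errors e_f(t) have already been obtained by backpropagating a 1 through the network; it is a sketch of the equations above, not the authors' implementation.

import numpy as np

def batch_update(w, gammas, errors, mu, eps=0.1):
    # gammas: list of Gamma_b(t), each an (n_w x n_y) matrix of ordered derivatives
    # errors: list of e_f(t) = y(t) - NN{w, phi(t)}, each of length n_y
    n_w = w.size
    H = sum(G @ G.T for G in gammas)                   # H = sum_t Gamma_b(t) Gamma_b(t)^T
    g = sum(G @ e for G, e in zip(gammas, errors))     # sum_t Gamma_b(t) e_f(t)
    eta_b = mu * np.linalg.inv(eps * np.eye(n_w) + H)  # matrix learning rate, Gauss-Newton flavour
    return w + eta_b @ g                               # Eq. (6)

def mu_schedule(mu_prev, mu_inf=1.0, mu_r=0.9):
    # Eq. (8): mu moves gradually from mu(0) towards its final value mu_inf
    return mu_inf - mu_r * (mu_inf - mu_prev)

One training sweep would recompute gammas and errors at the current w, advance mu with mu_schedule, and call batch_update; pattern learning (9) applies the analogous per-sample step.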


4. The algorithm

Consider an unknown function f(x) which is to be optimized. The algorithm consists of the following two stages.

4.1. Training (Stage 1)

1. Train a neural network, call it NN1, to assume this function f(x), i.e., after successful training, f(x) ≈ NN{w, φ(t)}.
2. Backpropagate a 1 from the output(s) of NN1 to get ∇f(x), defined as
   ∇f(x) = (∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n)^T,   (11)
   where n denotes the number of inputs to the neural network (see Appendix A for details).
3. Train another neural network, call it NN2, having the same inputs as NN1 and ∇f(x) as its desired outputs. Hence it is possible to calculate ∇²f(x) by applying the procedure given in the appendix to this second neural network.

4.2. Finding the optimum (Stage 2)

Set k = 1 and generate n random numbers from a uniform distribution. Call this vector x_k; it serves as the initial value of the input vector.
1. Give the vector x_k as input to NN1 and NN2 and determine the outputs.
2. Compute ∇f(x_k) and ∇²f(x_k) by backpropagating a 1 from the outputs of NN1 and NN2 respectively, where ∇²f(x_k), or H(x_k), is the Hessian matrix whose (i, j) entry is ∂²f/∂x_i ∂x_j, i, j = 1, ..., n.
3. Compute x_{k+1} = x_k − H^{−1}(x_k) ∇f(x_k) [25].
4. If ||∇f(x_{k+1})|| < α, where α is a small positive number, then x_{k+1} is the optimal point. Otherwise, increment k by one and go to step 1.

5. Results and discussion

The above algorithm has been implemented and several examples have been solved.

5.1. Example 1

The function is assumed to be described as follows:

f(x) = (x_2 − 0.5)² + (x_2 − 2x_1)².

Data are generated using the above model and it is desired to find the point that minimizes the given function. The generated data are then used in our algorithm to test its efficacy.
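For concreteness, the following sketch shows Stage 2 (steps 1-4 of Section 4.2) as it would run on this example. It is purely illustrative Python/NumPy: grad_nn stands for the trained NN2, and, to keep the sketch short, the Hessian is approximated by finite-differencing grad_nn rather than by backpropagating a 1 through NN2 as the paper does; the exact gradient of the test function is substituted for NN2 below.

import numpy as np

def newton_stage2(grad_nn, x0, alpha=1e-4, h=1e-4, max_iter=50):
    # Stage 2: x_{k+1} = x_k - H^{-1}(x_k) grad f(x_k), stopping when ||grad f|| < alpha.
    # grad_nn plays the role of the trained NN2 surrogate for grad f(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_nn(x)
        n = x.size
        H = np.empty((n, n))
        for j in range(n):                        # column j approximates d(grad f)/dx_j
            dx = np.zeros(n); dx[j] = h
            H[:, j] = (grad_nn(x + dx) - grad_nn(x - dx)) / (2.0 * h)
        x = x - np.linalg.solve(H, g)             # Newton step, cf. step 3 of Section 4.2
        if np.linalg.norm(grad_nn(x)) < alpha:    # stopping criterion, step 4
            break
    return x

# Illustration only: the analytic gradient of f(x) = (x_2 - 0.5)^2 + (x_2 - 2 x_1)^2
grad_f = lambda x: np.array([-4.0 * (x[1] - 2.0 * x[0]),
                             2.0 * (x[1] - 0.5) + 2.0 * (x[1] - 2.0 * x[0])])
print(newton_stage2(grad_f, x0=[-0.2810, -0.4530]))   # converges near (0.25, 0.5)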


Fig. 1. Average sum squared error during function identification (example 1).

A neural net with one hidden layer and five neurons in the hidden layer has been trained to model the function. Only one hidden layer was used, since it has been proven that a neural network with a single hidden layer is adequate to model any nonlinearity [10]. The selection of five neurons involved slight trial and error. The number of neurons in the input and output layers is equal to the number of independent and dependent variables respectively. Batch training has been performed along with the matrix learning rate given in Section 3.1. In Eq. (8), ε, μ(0), μ_r and μ_∞ were 0.1, 0.1, 0.9 and 1.0 respectively. These initial values have been found to be effective in nearly all the simulations carried out. The training error is displayed as a function of the sweep iteration in Fig. 1.

From this neural network, 1000 sets of first order derivatives have been generated. These data sets have been used to train another neural network to replicate ∇f(x). One hidden layer with 11 neurons and two outputs is observed to capture the required characteristics adequately. The matrix learning rate of Section 3.1 has been used, with ε, μ(0), μ_r and μ_∞ being 0.1, 0.1, 0.9 and 1.0 respectively. The average sum squared error for this multi-input multi-output neural network is shown in Fig. 2. This neural network has the following structure:

∇f(x) = h(x_1, x_2) = (∂f/∂x_1, ∂f/∂x_2)^T.

Fig. 2. Average sum squared error during function's first order derivative identification (example 1).

After successful training, this neural network has been used to compute the Hessian, ∇²f(x). The algorithm of Section 4 was then used to determine the optimal values of x_1 and x_2. Initial values for x_1 and x_2 were randomly selected as −0.2810 and −0.4530 respectively, α was set at 10⁻⁴, and it took five iterations for the algorithm to converge to the optimal values of 0.2435 and 0.4777, as compared to the actual values of 0.25 and 0.5.
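As a quick analytic check of these target values, setting the gradient of the test function to zero gives ∂f/∂x_1 = −4(x_2 − 2x_1) = 0 and ∂f/∂x_2 = 2(x_2 − 0.5) + 2(x_2 − 2x_1) = 0, so x_2 = 2x_1 and x_2 = 0.5; the true minimizer is therefore x_1 = 0.25, x_2 = 0.5, consistent with the values quoted above.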

5.2. Example 2

The algorithm was also tested on a practical example of an electrochemical cutting mechanism (ECM). This example has been taken from Ref. [7], in which experimental results are reported showing the influence of machining parameters such as feed rate, applied voltage, electrolyte conductivity and electrolyte flow rate on the width of cut, the electrolyzing current and the volumetric metal removal rate. We used the data of the 31 experiments reported in that paper and studied the effect on one output only, i.e., the volumetric metal removal rate (VMRR). A neural network with one hidden layer and seven neurons in the hidden layer was trained to represent these experiments and their results; ε, μ(0), μ_r and μ_∞ were respectively 0.1, 0.1, 0.9 and 1.0, and batch learning was used for training (Section 3.1). The training error and outputs are shown in Figs. 3 and 5. This neural network was then used to generate first order derivatives, which served as training data for another neural network that was trained to replicate ∇f(x).


Fig. 3. Average sum squared error during function identification (example 2).

Fig. 4. Average sum squared error during function's first order derivative identification (example 2).

Fig. 5. A comparison of the actual (NN) and desired output of the function (example 2).

This second neural network had 19 neurons in the hidden layer, 5 neurons in the input layer and 4 neurons in the output layer. A plot of the cumulative average sum squared error versus the number of sweeps is shown in Fig. 4. After successful training, this neural network mimics the following function:

∇f(x) = g(x_1, x_2, x_3, x_4) = (∂f/∂x_1, ∂f/∂x_2, ∂f/∂x_3, ∂f/∂x_4)^T.

As a 1 is backpropagated from the output of this neural network, we get the second order derivatives, or the Hessian ∇²f(x), of the function. The algorithm of Section 4 was then applied to determine the optimum volumetric metal removal rate and the optimal values of the machining parameters required to achieve such a rate. We randomly generated initial values of the four machining parameters as 1.3041, −0.3495, 0.0582 and 0.9929. The value of α was 10⁻¹, and it took three iterations for the algorithm to converge to the optimal value of 19.5212 (mm³/min)/mm for the output (VMRR). The optimal values of the machining parameters, i.e. feed rate, applied voltage, conductivity and electrolyte flow rate, emerged as 2.6919, 19.4149, 0.0190 and 15.8320 respectively. All these results match exceptionally well with the experimental results reported in Ref. [7]; the corresponding values reported there are 2.5 mm/min, 20 V, 0.02 Ω⁻¹ mm⁻¹ and 16 l/min.


6. Conclusion

In this paper, a neural-network-based algorithm has been developed for optimizing the parameters of a process with an unknown model. The procedure is carried out in two stages. In stage one, two neural networks are trained, one to identify the unknown function and the other to replicate the first order derivatives of the function. Optimization is then carried out in the next stage. Simulation studies have been presented on two examples, establishing the effectiveness and feasibility of the approach. Although we considered only single output functions, the concept is readily extendable to multiple output functions. Directions for future work include looking for a method to accomplish constrained optimization of unknown models. Calculation of the Hessian inverse is also expensive in an actual implementation; hence it would be profitable to look for a method to calculate the Hessian inverse directly using neural networks.

Acknowledgements

The authors would like to acknowledge the King Fahd University of Petroleum and Minerals for supporting this research. The second author would also like to acknowledge the support provided by the Research Institute at the University. The authors are grateful to the anonymous referees for comments that have improved the presentation of the paper.

Appendix A. Calculation of derivatives

The method to determine the derivatives of the dependent variable with respect to the independent variables using neural networks is as follows. Refer to Fig. A.1. Let y be the output, which is a scalar; for a vector output, the extension is quite simple. Let IN consist of the inputs (independent variables) to the network. Then the required coefficients are obtained from

y = F2[W1^T · OUT(1)]   (A.1)
  = F2[W1^T · F1(W2^T · IN)],   (A.2)

∂y/∂IN = (∂y/∂OUT(1)) · (∂OUT(1)/∂IN)   (A.3)
       = W1 · F2′[·] · F1′[·] · W2,   (A.4)

F2′[·] = ½ [1 + OUT(2)] [1 − OUT(2)],   (A.5)
F1′[·] = ½ [1 + OUT(1)] [1 − OUT(1)],   (A.6)

where ′ indicates the derivative.

Fig. A.1. Derivative calculation using neural networks. (W1: weights in the output layer; OUT(1): outputs of the hidden layer neurons; W2: weights in the input layer; NET(1) = Σ W2 · IN.)
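A minimal sketch of this derivative calculation, using the same one-hidden-layer layout and symmetric sigmoid assumed in the earlier network sketch (an illustration of Eqs. (A.3)-(A.6), not the authors' code):

import numpy as np

def sigmoid(a):
    return (1.0 - np.exp(-a)) / (1.0 + np.exp(-a))

def dsigmoid_from_output(out):
    # For this sigmoid, F'(.) = 0.5 * [1 + F(.)] * [1 - F(.)], cf. Eqs. (A.5)-(A.6)
    return 0.5 * (1.0 + out) * (1.0 - out)

def dy_dIN(W2, b2, W1, b1, IN):
    # "Backpropagate a 1" through the trained net: the chain rule of Eqs. (A.3)-(A.4).
    # W2: (n_hid x n_in) input-layer weights, W1: (n_out x n_hid) output-layer weights.
    out1 = sigmoid(W2 @ IN + b2)                      # hidden layer outputs, OUT(1)
    out2 = sigmoid(W1 @ out1 + b1)                    # network output(s), OUT(2)
    # dy/dIN = F2'[.] . W1 . diag(F1'[.]) . W2, an (n_out x n_in) matrix of derivatives
    return (dsigmoid_from_output(out2)[:, None] * W1) @ (dsigmoid_from_output(out1)[:, None] * W2)

Applying dy_dIN to NN1 yields the training targets ∇f(x) for NN2; applying it to NN2 yields the Hessian used in Stage 2.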

References

[1] Shang, J.S., and Tadikamalla, P.R., "Output maximization of a CIM system: simulation and statistical approach", International Journal of Production Research 31 (1993) 19-41.
[2] Montgomery, D.C., and Bettencourt Jr., V.M., "Multiple response surface methods in computer simulation", Simulation (1977) 113-121.
[3] Barnwal, S., Bayoumi, A.E., and Hutton, D.V., "Prediction of flank wear and engagement from force measurements in end milling operations", Wear 170 (1993) 255-266.
[4] Christopher, M., Su, H., and Ismail, M., "Yield optimization of analog MOS integrated circuits including transistor mismatch", Proceedings of the 1993 IEEE International Symposium on Circuits and Systems 3 (1993) 1801-1804.
[5] Fujiwara, O., Dharmappa, H.B., Vemik, J., and Vigneswaran, S., "Optimization of granular bed filtration treating polydispersed suspension", Water Resources 26 (1992) 1307-1318.
[6] Chen, S.H., Grobelny, P.A., Bandler, J.W., Biernacki, R.M., and Ye, S., "Yield-driven electromagnetic optimization via multilevel multidimensional models", IEEE Transactions on Microwave Theory and Techniques 41 (1993) 2269-2278.
[7] Hewidy, M.S., and Fattouh, M., "Electrochemical cutting using tubular cathodes: response surface approach", International Journal of Production Research 27 (1989) 953-963.
[8] Vining, G.G., and Myers, R.H., "Combining Taguchi and response surface philosophies: A dual response approach", Journal of Quality Technology 22 (1990) 38-45.
[9] Hornik, K., Stinchcombe, M., and White, H., "Multilayer feedforward networks are universal approximators", Neural Networks 2 (1989) 359-366.
[10] Cybenko, G., "Approximation by superposition of a sigmoidal function", Mathematics of Control, Signals and Systems 2 (1989) 303-314.
[11] Hinton, G.E., "How neural networks learn from experience", Scientific American, September (1992) 105-109.
[12] Hunt, K.J., and Sbarbaro, D., "Neural networks for nonlinear internal model control", IEE Proceedings-D 138 (1993) 431-438.
[13] Sotirov, G., and Shafai, B., "Interval identification of nonlinear systems using neural networks", Proceedings of the American Control Conference, San Francisco, CA (1993) 2529-2530.
[14] Montgomery, D.C., and Runger, G.C., Applied Statistics and Probability for Engineers, John Wiley, New York, 1994.
[15] Tew, J.D., "Simulation metamodel estimation using a combined correlation-based variance reduction technique for first and higher-order metamodels", European Journal of Operational Research 87 (1993) 349-363.
[16] Kleijnen, J., and Groenendaal, W.V., Simulation: A Statistical Perspective, John Wiley, London, 1992.
[17] Lin, D.K., and Tu, W., "Dual response surface optimization", Journal of Quality Technology 27 (1995) 34-39.
[18] Myers, R.H., and Khuri, A.I., "Response surface methodology: 1966-1988", Technometrics 31 (1989) 137-157.
[19] Myers, R.H., Vining, G.G., Giovannitti-Jensen, A., and Myers, S.L., "Variance dispersion properties of second-order response surface designs", Journal of Quality Technology 24 (1992) 1-11.
[20] Vining, G.G., and Myers, R.H., "Combining Taguchi and response surface philosophies: A dual response approach", Journal of Quality Technology 22 (1990) 38-45.
[21] Castillo, E.D., and Montgomery, D.C., "A nonlinear programming solution to the dual response problem", Journal of Quality Technology 25 (1993) 199-204.
[22] Ahmed, M.S., and Tasadduq, I.A., "Neural-net controller for nonlinear plants: design approach through linearization", IEE Proceedings - Control Theory and Applications 141 (1994) 323-328.
[23] Werbos, P.J., "Backpropagation through time: what it does and how to do it", Proceedings of the IEEE 78 (1990) 1550-1560.
[24] Rumelhart, D.E., Hinton, G.E., and Williams, R.J., "Learning internal representation by back propagation", in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, 1986.
[25] Bazaraa, M.S., Sherali, H.D., and Shetty, C.M., Nonlinear Programming: Theory and Algorithms, 2nd ed., John Wiley, New York, 1993.