Neural Networks in Materials Science


1. Introduction

There are difficult problems in materials science where the general concepts might be understood but which are not as yet amenable to scientific treatment. We are at the same time told that good engineering has the responsibility to reach objectives in a cost- and time-effective way. Any model which deals with only a small part of the required technology is therefore unlikely to be treated with respect. Neural network analysis is a form of regression or classification modeling which can help resolve these difficulties while striving for longer-term solutions. It is not surprising that the method has had wide application in all aspects of materials science. We introduce here the essence of the method.

The familiar regression analysis best-fits a specified relationship, which is usually linear, to a collection of data. The result is an equation in which each of the inputs x_j is multiplied by a weight w_j; the sum of all such products and a constant θ then gives an estimate of the output, y = Σ_j w_j x_j + θ. Notice that a relationship has to be chosen before the analysis, and there is a tendency for this to be linear, or with nonlinear terms added together to form a pseudolinear equation, e.g., y = w_1 x_1 + w_2 x_2 + w_12 x_1 x_2. The regression equation, once derived, is assumed to apply across the entire span of the input space.

A general method of regression which avoids these difficulties is neural network analysis, illustrated at first using the familiar linear regression method. A network representation of linear regression is illustrated in Fig. 1. The inputs x_i define the input nodes, y

the output node. Each input is multiplied by a random weight w_i and the products are summed together with a constant θ to give the output y = Σ_i w_i x_i + θ. The summation is an operation which is hidden at the hidden node. Since the weights and the constant θ were chosen at random, the value of the output will not match the experimental data. The weights are systematically changed until a best-fit description of the output is obtained as a function of the inputs; this operation is known as training the network.

The network can be made nonlinear as shown in Fig. 2. As before, the input data x_j are multiplied by weights w_j^(1), but the sum of all these products forms the argument of a hyperbolic tangent:

h = tanh( Σ_j w_j^(1) x_j + θ^(1) )   with   y = w^(2) h + θ^(2)

where w^(2) is a weight and θ^(2) is another constant. The strength of the hyperbolic tangent transfer function is determined by the weight w_j^(1) (Fig. 3(a)). The output y is therefore a nonlinear function of x_j, the function usually chosen being the hyperbolic tangent because of its flexibility. Further degrees of nonlinearity can be introduced by combining several of these hyperbolic tangents (Fig. 3(b)), so that the neural network method is able to capture almost arbitrarily nonlinear relationships. Each hyperbolic tangent is equivalent to introducing

Figure 1 Neural representation of linear regression. Modified from Bhadeshia H K D H 1999 Neural networks in materials science. ISIJ Int. 39, 966–79.
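The linear regression of Fig. 1 can be sketched in a few lines of code. This is an illustrative example, not code from the article: the data are invented, and the constant θ is fitted as an extra weight on a column of ones.

```python
import numpy as np

# Illustrative sketch (not from the article): fit y = sum_j w_j x_j + theta
# by ordinary least squares, the linear regression the text describes.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))          # two inputs, x1 and x2
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 0.7      # noiseless synthetic data

# Append a column of ones so the constant theta is fitted as a third weight.
A = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print(w)  # approximately [3.0, -1.5, 0.7]
```

Because the synthetic data are noiseless, the recovered weights match the generating values; with real, noisy data they would only be best-fit estimates.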

Figure 2 Nonlinear neural network. Modified from Bhadeshia H K D H 1999 Neural networks in materials science. ISIJ Int. 39, 966–79.
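The training operation described above can likewise be sketched for the single-hidden-unit network of Fig. 2. This is an illustrative example, not from the article: the data, the learning rate, and the use of plain gradient descent on a squared error are all assumptions made for the demonstration.

```python
import numpy as np

# Sketch of the Fig. 2 network: h = tanh(sum_j w1[j]*x[j] + t1), y = w2*h + t2.
# Weights start at random values and are adjusted until the output fits the
# data, i.e., the network is "trained".
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
target = np.tanh(1.5 * X[:, 0] - 0.5 * X[:, 1])   # representable by the net

w1 = rng.normal(size=2); t1 = 0.0                  # random initial weights
w2 = rng.normal();       t2 = 0.0

lr = 0.1
for _ in range(5000):
    h = np.tanh(X @ w1 + t1)
    y = w2 * h + t2
    err = y - target
    # Gradients of the mean squared error with respect to each parameter.
    dw2 = np.mean(err * h);        dt2 = np.mean(err)
    dh = err * w2 * (1 - h**2)
    dw1 = X.T @ dh / len(X);       dt1 = np.mean(dh)
    w2 -= lr * dw2; t2 -= lr * dt2; w1 -= lr * dw1; t1 -= lr * dt1

print(np.mean((w2 * np.tanh(X @ w1 + t1) + t2 - target) ** 2))  # small residual
```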


Figure 4 Two-hidden-unit network.


Figure 3 (a) Three hyperbolic tangent functions, with shape dependent on the weights. (b) Combination of two hyperbolic tangents to produce a more complex model. Reproduced from Bhadeshia H K D H 2006 Neural networks in materials science: the importance of uncertainty. In: Int. Workshop Neural Networks Genet. Algorithms Mater. Sci. Eng., 11–13 Jan., 2006, Shibpur, India. Tata McGraw-Hill Publishing, New Delhi, pp. 1–9.

a further hidden unit. The function for a network with i hidden units is given by

y = Σ_i w_i^(2) h_i + θ^(2)   with   h_i = tanh( Σ_j w_ij^(1) x_j + θ_i^(1) )

Notice that in calculating h_i, linear functions of the inputs x_j are operated on by a hyperbolic tangent function, such that each input contributes to every hidden unit (Fig. 4).
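The formula above translates directly into a forward pass. This sketch is illustrative, not from the article; the weight values are invented to match the two-hidden-unit, three-input network of Fig. 4.

```python
import numpy as np

def network(x, W1, t1, w2, t2):
    """Forward pass of the network described in the text:
    y = sum_i w2[i]*h_i + t2, with h_i = tanh(sum_j W1[i, j]*x[j] + t1[i])."""
    h = np.tanh(W1 @ x + t1)   # every input feeds every hidden unit
    return w2 @ h + t2

# Invented example: 3 inputs, 2 hidden units (as in Fig. 4).
W1 = np.array([[0.5, -1.0, 0.2],
               [1.5,  0.3, -0.7]])
t1 = np.array([0.1, -0.2])
w2 = np.array([2.0, -1.0])
t2 = 0.5

print(network(np.array([1.0, 0.0, -1.0]), W1, t1, w2, t2))
# = 2*tanh(0.4) - tanh(2.0) + 0.5
```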


Figure 5 The points represent a set of noisy experimental data. There are two fitted functions, one a straight line and the other a nonlinear polynomial. Reproduced from Bhadeshia H K D H 2006 Neural networks in materials science: the importance of uncertainty. In: Int. Workshop Neural Networks Genet. Algorithms Mater. Sci. Eng., 11–13 Jan., 2006, Shibpur, India. Tata McGraw-Hill Publishing, New Delhi, pp. 1–9.

2. Overfitting


A potential difficulty with the use of flexible nonlinear regression methods is the possibility of overfitting the data. In Fig. 5, the points represent all of the available (noisy) experimental data. There are two fitted functions, one a straight line and the other a nonlinear polynomial. It is not possible, without any guiding physical principles relating x to y, to assess which of these functions is the more reliable. To avoid this difficulty, the experimental data can be divided into two sets, a training data set and a test data set. The model is produced using only the training data. The test data are then used to check that the model behaves itself when presented with previously unseen data. This is illustrated in Fig. 6, which shows three attempts at modeling noisy data for a case where y should vary with x^3. A linear model (Fig. 6(a)) is too simple and does not capture the real complexity in the data. An overcomplex function such as the fifth-order polynomial in Fig. 6(a) accurately models the training data but generalizes badly. The optimum model is illustrated in Fig. 6(b). The training and test errors are shown schematically in Fig. 6(c); not surprisingly, the training error tends to decrease continuously as the model complexity increases. It is the minimum in the test error which identifies the model that generalizes best to unseen data.
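The procedure described above can be sketched as follows. This is an illustrative example, not from the article: noisy data from y = x^3 are fitted by polynomials of increasing degree, and the test error, rather than the training error, selects the model complexity.

```python
import numpy as np

# Synthetic noisy data from y = x**3, in the spirit of the Fig. 6 example.
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=40)
y = x**3 + rng.normal(scale=0.5, size=40)

# Split into training and test sets; models are fitted only on the training half.
xtr, xte = x[:20], x[20:]
ytr, yte = y[:20], y[20:]

test_err = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(xtr, ytr, degree)          # fit on training data only
    test_err[degree] = np.mean((np.polyval(coeffs, xte) - yte) ** 2)

# The training error always falls as the degree rises; the test error is
# smallest near the true complexity (degree 3), not at the most flexible model.
print(test_err)
```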

3. Noise and Modeling—Uncertainty

There are two kinds of errors to consider. When conducting experiments, noise results in a different output for the same set of inputs when the experiment is repeated. This is because there are variables which are not controlled, so their influence is not included in the analysis. The second kind deals with the uncertainty of modeling; there may exist many mathematical functions which adequately represent the same set of empirical data but which behave differently in extrapolation. The noise in the output can be assessed by comparing the predicted values (y_j) of the output against those measured (t_j), e.g.,

E_D ∝ Σ_j (t_j - y_j)^2

E_D is expected to increase if important input variables have been excluded from the analysis. Whereas E_D gives an overall perceived level of noise in the output parameter, it is not, on its own, a satisfying description of the uncertainties of prediction. By contrast, the uncertainty of modeling is illustrated in Fig. 7, where a set of precise data (x = 2, 4, 6) are fitted to two different functions, one linear and the other nonlinear:

y = -x^3/44 + 3x^2/11 + 34/11

Both of the functions illustrated precisely reproduce these data but behave quite differently when extrapolated (or indeed, interpolated: for x = 3, y = 4.931, not y = 5 according to the linear function). The difference in the predictions of the two functions in domains where data do not exist is a measure of the uncertainty of modeling, since both functions correctly represent the data at x = 2, 4, 6 used in creating the models. Notice also that unlike the noise, the magnitude of the modeling uncertainty is not constant, but varies as a function of the position in the input space. The uncertainty

Figure 6 Filled points represent training data and the circles the test data. (a) A linear function which is too simple and a fifth-order polynomial which is too complicated. (b) A cubic polynomial with optimum representation of both the training and test data. (c) Variation in the test and training errors as a function of the model complexity. (a) Modified, (b and c) reproduced from Bhadeshia H K D H 1999 Neural networks in materials science. ISIJ Int. 39, 966–79.


Figure 7 Modeling uncertainty. Reproduced from Bhadeshia H K D H 2006 Neural networks in materials science: the importance of uncertainty. In: Int. Workshop Neural Networks Genet. Algorithms Mater. Sci. Eng., 11–13 Jan., 2006, Shibpur, India. Tata McGraw-Hill Publishing, New Delhi, pp. 1–9.

becomes larger in domains of input space where knowledge is sparse or nonexistent (Fig. 7). One way of representing the uncertainty is to create a large variety of models, all of which reasonably represent the experimental data. The predictions made by these models will not be identical; the standard deviation in the predicted values then is a quantitative measure of the modeling uncertainty. Figure 8 illustrates the problem; the practice of using the best-fit function (i.e., the most probable values of the weights) does not adequately describe the uncertainties in regions of the input space where data are sparse (B), or where the data are particularly noisy (A). The modeling uncertainty is expressed by not having a unique set of weights, but rather a probability distribution of sets of weights. This recognizes the existence of many functions which can be fitted or extrapolated into uncertain regions of the input space, without unduly compromising the fit in adjacent regions which are rich in accurate data. The error bars depicting the modeling uncertainty then become large when data are sparse or locally noisy, as illustrated in Fig. 8. This methodology has proved to be extremely useful in materials science where properties need to be estimated as a function of a vast array of inputs. It is then most unlikely that the inputs are uniformly distributed in the input space.
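The idea can be sketched numerically. This is an illustrative example, not code from the article: the cubic is the function quoted in the text, the linear fit through the same points is here taken as y = x + 2 (an assumption consistent with the data x = 2, 4, 6), and the spread of the two predictions stands in for the modeling uncertainty.

```python
import numpy as np

# Precise data at x = 2, 4, 6, as in the Fig. 7 discussion.
x_data = np.array([2.0, 4.0, 6.0])
y_data = np.array([4.0, 6.0, 8.0])

linear = lambda x: x + 2.0                            # assumed linear fit
cubic = lambda x: -x**3 / 44 + 3 * x**2 / 11 + 34 / 11  # nonlinear fit from text

# Treat the two fits as a tiny "committee" of models.
models = [linear, cubic]
grid = np.array([3.0, 10.0])   # one point inside the data, one far outside
preds = np.array([[m(v) for v in grid] for m in models])

mean = preds.mean(axis=0)      # committee prediction
spread = preds.std(axis=0)     # modeling uncertainty

print(mean, spread)            # spread is far larger at x = 10 than at x = 3
```

Both functions pass exactly through the data, yet their disagreement (the spread) grows rapidly where no data exist, which is precisely the behavior Figs. 7 and 8 describe.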

Figure 8 Uncertainty in defining a fitting function in regions where data are sparse (B) or where they are noisy (A). The thinner lines represent error bounds due to uncertainties in determining the weights.


Figure 9 Predicted effect of sulfur, nickel, and carbon on solidification cracking in steel welds.

4. Committee of Models

It is possible that a committee of models can make a more reliable prediction than an individual model. The best models are ranked using the values of the test errors. Committees are then formed by combining the predictions of the best L models, where L = 1, 2, …; the size of the committee is therefore given by the value of L. A plot of the test error of the committee versus its size L gives a minimum which defines the optimum size of the committee. It is good practice to use multiple good models and combine their predictions. This will make little difference in regions of the input space where the fit is good, but may improve the reliability where there is great uncertainty.

5. Classification Network

There are cases where the quantity being modeled is a binary variable which can only have discrete values (e.g., 0 and 1). Solidification cracking is such a case, where the only thing that matters is whether the sample is cracked or not. Such a problem is known as a binary classification problem, and a classification neural network is used to model the probability of a crack as a function of the input variables (Fig. 9). A probability of 1 represents definite cracking, whereas a probability of 0 represents the absence of cracking. Given the function h_i as defined earlier, the transfer to the output y is

y = 1 / (1 + exp{-a})   with   a = Σ_i w_i^(2) h_i + θ^(2)

The output y is bounded between 0 and 1 (0 corresponding to the not-cracked and 1 to the cracked condition). The value of y indicates the probability of obtaining a cracked sample, so that y = 0.5 indicates the highest level of uncertainty. The error function used in the regression neural network is replaced by a logarithmic likelihood:

G = Σ_m [ t_m ln y_m + (1 - t_m) ln(1 - y_m) ]

where y_m is the output for example m and t_m is the target. Training involves minimization of -G, i.e., maximization of the likelihood.
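The logistic output and its log-likelihood can be sketched as follows. This is an illustrative example, not from the article: the hidden-unit activations and weights are invented values chosen only to exercise the two formulas above.

```python
import numpy as np

def logistic_output(h, w2, t2):
    """Transfer to the output: y = 1/(1 + exp(-a)), a = sum_i w2[i]*h_i + t2."""
    a = np.dot(w2, h) + t2
    return 1.0 / (1.0 + np.exp(-a))

def log_likelihood(y, t):
    """G = sum_m [ t_m ln y_m + (1 - t_m) ln(1 - y_m) ]; training minimizes -G."""
    y = np.asarray(y); t = np.asarray(t)
    return np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Invented example: two hidden-unit activations and their output weights.
h = np.array([0.3, -0.8])
w2 = np.array([1.2, 0.5])
t2 = 0.1

y = logistic_output(h, w2, t2)   # probability of cracking, between 0 and 1
print(y, log_likelihood([y], [1]))
```

A value of y near 0.5, as here, signals that the network is maximally uncertain about whether the sample cracks.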

6. Best Practice

Because of their empirical character, the quality of neural network models depends critically on a number of factors:

* Creating a model is not sufficient; the model should be explored in order to reveal trends. An attempt should then be made to interpret those trends in terms of known science.
* The behavior of the model should be similarly interpreted during extrapolation, outside of the domain of the data used to create it.
* Rather than a single best-fit model, it is useful to consider the behavior of each of the members of a committee of models. This at the very least gives an indication of the modeling uncertainty.
* Networks are not black boxes. They can readily be expressed explicitly using simple mathematical equations. They should therefore be disseminated by providing details of the structure of the network, the transfer functions, and the set of weights. Dissemination of the model is important because it ensures that the work presented in a publication can be reproduced and validated widely.
* If possible, the data should also be disseminated.
* Neural networks are capable of revealing new phenomena associated with large modeling uncertainties. These can be tested experimentally.

Bibliography

Bhadeshia H K D H 1999 Neural networks in materials science. ISIJ Int. 39, 966–79
Bhadeshia H K D H 2006 Neural networks in materials science: the importance of uncertainty. In: Int. Workshop Neural Networks Genet. Algorithms Mater. Sci. Eng., 11–13 Jan., 2006, Shibpur, India. Tata McGraw-Hill Publishing, New Delhi, pp. 1–9
MacKay D J C 1992 A practical Bayesian framework for backpropagation networks. Neural Comput. 4, 448–72
MacKay D J C 2003 Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge
Sourmail T, Bhadeshia H K D H 2005 Introduction to Materials Modeling. Maney, London, Chap. 9, pp. 153–65
Sterjovski Z, Nolan D, Carpenter K, Dunne D, Norrish J 2005 Artificial neural networks for modeling the mechanical properties of steels in various applications. J. Mater. Process. Technol. 170, 536–44
Sumpter B G, Noid D W 1996 On the design, analysis and characterisation of materials using computational neural networks. Annu. Rev. Mater. Sci. 26, 223–77

H. K. D. H. Bhadeshia

Copyright © 2008 Elsevier Ltd. All rights reserved. No part of this publication may be reproduced, stored in any retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission in writing from the publishers.

Encyclopedia of Materials: Science and Technology ISBN: 0-08-043152-6 pp. 1–5