Process Systems Engineering 2003, B. Chen and A.W. Westerberg (editors)
© 2003 Published by Elsevier Science B.V.
Information Directed Sampling and Ordinal Optimization for Combinatorial Material Synthesis and Library Design

Chia Huang Yen, David Shan Hill Wong, S.S. Jang

Department of Chemical Engineering, National Tsing Hua University, Hsinchu, Taiwan 30043
Abstract: Combinatorial synthesis techniques have become increasingly important in many areas of process and material design. Simulated annealing is often suggested as a sampling policy in combinatorial methods. However, without model estimates of the fitness function, true importance sampling cannot be performed. We suggest that a simple prediction model can be constructed as a generalized regression neural network using the currently available data. An information free energy index is defined using this model, which directs the search to points that have a potentially high fitness function (low information energy) and to areas that are sparsely sampled (high information-based entropy). Two benchmark problems were used to model the optimization problem involved in combinatorial synthesis and library design. We show that when importance sampling is performed, the combinatorial technique becomes much more effective. The improvement in efficiency is explained using the concept of ordinal optimization.

Keywords: combinatorial synthesis, simulated annealing, ordinal optimization.
1. INTRODUCTION

In recent years, combinatorial synthesis has become an important optimization technique in product and process development [1], e.g. material synthesis [2], design of catalysts [3], drug design [4], improvement of enzyme activity [5], and solvent selection [6]. One of the stochastic search methods often employed as a search strategy for large-scale optimization problems is simulated annealing (SA) [7]. In a typical SA scheme, a reference point is selected and a set of candidates is generated. The merit functions at these candidates are compared to that of the reference. The reference is updated by a candidate according to a transition probability that mimics the Metropolis importance-sampling strategy used in Monte Carlo simulation of molecular ensembles. However, the "actual" merit function of a data point is known only if the actual experiment is performed. Therefore, unless a "model" of the merit function is available, importance sampling cannot be performed; yet for problems that need to be solved by combinatorial synthesis, accurate models are usually unavailable.

Sampling policy in combinatorial synthesis is just another form of experimental design. In the past, we have proposed experimental design methods based on information theory for process optimization and recipe selection [8]. The philosophy is as follows. Given a set of data, a "rough" model can be constructed using a neural network. This model can be used to direct the search and suggest experiments at points that have a potentially high fitness function (low information energy). However, such a model is untrustworthy when the number of data points is sparse compared to the entire search space (high information-based entropy), so areas that have not been investigated must also be explored. A temperature-annealing schedule can be used to balance the two needs. The reason why a rough model of limited accuracy can be useful in directing a search was explained by Ho and coworkers [9] in terms of ordinal optimization theory. Using probability theory and numerical simulation, they showed that the number of top-ranked objects selected can be significantly increased if even a very poor model is used to aid a stochastic picking procedure. In this paper, we apply this concept of using a rough model to direct the search to two benchmark problems in combinatorial synthesis. The objective is to demonstrate the effectiveness of this approach in solving large-scale combinatorial optimizations.

2. BENCHMARK PROBLEMS

2.1 N-K model
A benchmark problem that captures the basic physics of high-dimensional "structural" search is the N-K problem [10]. The simplest N-K model can be described as follows. Consider an N-dimensional array of integer variables. The merit function depends on the interaction of the K nearest neighbors:
$$Y = \left[\frac{1}{N-K}\right]^{1/2} \sum_{j=1}^{N-K+1} \mu(a_j, a_{j+1}, \ldots, a_{j+K-1}) \qquad (1)$$
$\mu$ is a predefined discrete-variable function and $a_j$ is the state of the j-th element. The N-K model captures the basic physics of many phenomena such as genomics and protein evolution. Two characteristics of the N-K model are especially important. First, the fitness landscape becomes more and more rugged, with many local minima, as N and K increase. Second, while the fitness function is very rugged, the values at neighboring points are correlated rather than random. This ensures that the accumulation of knowledge is helpful and that the identification of a predictive model is possible.
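As an illustration, a minimal sketch of an N-K merit function per Eq. (1) follows. The paper only states that $\mu$ is predefined; here it is assumed, for illustration, to be a random lookup table over all local configurations, and all names and parameter values are placeholders of this sketch.

```python
import numpy as np

def make_nk_fitness(N, K, n_states=2, seed=0):
    # Build an N-K merit function per Eq. (1).  mu is assumed to be a
    # random lookup table over the n_states**K local configurations.
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=(n_states,) * K)

    def fitness(a):
        # a: length-N sequence of integer states in {0, ..., n_states-1}
        total = sum(mu[tuple(a[j:j + K])] for j in range(N - K + 1))
        return total / np.sqrt(N - K)

    return fitness

# Hypothetical usage on a random configuration:
Y = make_nk_fitness(N=20, K=4)
a = np.random.default_rng(1).integers(0, 2, size=20)
print(Y(a))
```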
2.2 RPV model

In many combinatorial material synthesis problems, the experimental variables include the compositions of the material and the processing conditions. The input space consists of both discrete and continuous variables and is typically of very high dimension. The merit function is usually one or a set of specific properties of the material, such as superconductivity, luminescence, catalytic activity, or tensile strength. Such properties depend on the particular phase of the product material. Since the physical properties of the material change discontinuously across a phase boundary, the objective function encountered in combinatorial synthesis optimization is only piecewise continuous. Falcioni and Deem [11] proposed a "random phase volume" (RPV) model to simulate the merit function encountered in combinatorial synthesis. Essentially, the RPV model is a relation between the merit function Y and a set of composition variables $\mathbf{x}$ and non-composition variables $\mathbf{z}$. Composition variables are expressed as mole fractions; therefore, for a C-component system there are C-1 independent composition variables. The entire composition space is divided into $\alpha = 1 \ldots M$ different phases by defining phase center points $\mathbf{x}_\alpha$; each point in the composition space belongs to the phase of the nearest phase center point. Non-composition variables may be processing conditions such as temperature, pressure, pH, reaction time, or film thickness. They may be discrete or continuous and are normalized to the range [-1, 1] in the RPV model. Similarly, the non-composition space is divided into $\gamma = 1 \ldots N$ different phases by defining phase center points $\mathbf{z}_\gamma$:
$$Y(\mathbf{x}, \mathbf{z}) = Y_{\alpha\gamma} \quad \text{if} \quad \|\mathbf{x} - \mathbf{x}_\alpha\| = \min_{\alpha' = 1 \ldots M} \|\mathbf{x} - \mathbf{x}_{\alpha'}\| \;\; \text{and} \;\; \|\mathbf{z} - \mathbf{z}_\gamma\| = \min_{\gamma' = 1 \ldots N} \|\mathbf{z} - \mathbf{z}_{\gamma'}\| \qquad (2)$$
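Eq. (2) amounts to a nearest-center rule. A minimal sketch, assuming a Euclidean norm and hypothetical phase-center coordinates:

```python
import numpy as np

def assign_phase(x, centers):
    # Each point belongs to the phase of the nearest phase center, Eq. (2).
    distances = np.linalg.norm(centers - x, axis=1)
    return int(np.argmin(distances))

# Hypothetical centers x_alpha for a 3-component system (2 free mole fractions):
centers = np.array([[0.2, 0.3], [0.6, 0.1], [0.1, 0.7]])
print(assign_phase(np.array([0.5, 0.2]), centers))  # -> 1
```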
The merit function $Y_{\alpha\gamma}$ in this phase is represented by a $Q_x$-th order polynomial in the composition variables and a $Q_z$-th order polynomial in the non-composition variables:
$$Y_{\alpha\gamma} = V_\alpha + \sigma_x \sum_{k=1}^{Q_x} \sum_{i_1 \ge \cdots \ge i_k = 1}^{d} f_{i_1 \cdots i_k}\, \xi_x^{k}\, A^{\alpha}_{i_1 \cdots i_k}\, y_{i_1} y_{i_2} \cdots y_{i_k} + \frac{1}{\sqrt{2}} \left( W_\gamma + \sigma_z \sum_{k=1}^{Q_z} \sum_{i_1 \ge \cdots \ge i_k = 1}^{b} f_{i_1 \cdots i_k}\, \xi_z^{k}\, B^{\gamma}_{i_1 \cdots i_k}\, w_{i_1} w_{i_2} \cdots w_{i_k} \right) \qquad (3)$$

$V_\alpha$, $W_\gamma$, $\sigma_x$, $\sigma_z$, $\xi_x$, $\xi_z$, $A^{\alpha}_{i_1 \cdots i_k}$ and $B^{\gamma}_{i_1 \cdots i_k}$ are coefficients of the model, and $f_{i_1 \cdots i_k}$ is a symmetry factor. A typical merit function landscape of an RPV model with C=3, N=0, M=15 and $Q_x$=2 makes the piecewise-continuous and nonlinear nature of the model evident. The complexity of the problem can be increased by increasing the dimension of the variables and the number of phases. Given the N-K model and the RPV model, sampling algorithms for optimization in combinatorial synthesis and library design can be benchmarked.

3. METHODOLOGY

3.1 IFEDSA

The proposed Information Free Energy Directed Simulated Annealing (IFEDSA) procedure is as follows (a sketch of the meta-model of step 2 is given after this list):

1. A set of 100 initial configurations is sampled randomly and the merit functions are measured.
2. The data are used to train a meta-model, a generalized regression neural network (GRNN, [12]).
3. A simulated annealing search is performed using this GRNN meta-model:
   i. The existing minimum is selected as the reference and its information free energy is calculated.
   ii. A potential test candidate is generated and its information free energy is calculated.
   iii. If the information free energy of the candidate is less than that of the reference:
       a. the candidate is collected as a new test point;
       b. the reference point is replaced by the candidate;
       c. if NT candidates have been collected, go to step 4; otherwise go to step ii.
       Otherwise, go to step ii.
4. The actual merit functions at this new batch of candidate solutions are evaluated (the experiments are performed).
5. If enough tests have been performed and the optimum found is satisfactory, the process is stopped. Otherwise, the temperature is reduced and the procedure returns to step 2, so that the modeling-selection-testing cycle is repeated.
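As a rough illustration of step 2, a minimal meta-model in the spirit of a GRNN [12] is sketched below, assuming a single isotropic Gaussian kernel of width sigma; all names and values are placeholders of this sketch, not the paper's settings.

```python
import numpy as np

def grnn_predict(x, X_train, Y_train, sigma=0.1):
    # GRNN prediction: a kernel-weighted average of the observed merit
    # values, where closer training points receive larger weights.
    d2 = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.sum(w * Y_train) / np.sum(w)

# Hypothetical usage: predict the merit at an untested point.
X = np.random.default_rng(0).random((100, 3))   # 100 sampled points
Y = np.sin(X).sum(axis=1)                       # stand-in merit values
print(grnn_predict(np.array([0.5, 0.5, 0.5]), X, Y))
```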
3.2 Information Energy

Given a model trained on all presently available experimental data, we define the information energy index of any candidate point in the search space as
$$U(\mathbf{x}) = \begin{cases} \hat{Y}(\mathbf{x}) & \text{for minimization problems} \\ -\hat{Y}(\mathbf{x}) & \text{for maximization problems} \end{cases} \qquad (4)$$

where $\hat{Y}(\mathbf{x})$ is the model prediction.
At any candidate point, the better the predicted fitness, the lower the information energy and the more valuable the information at that point.
3.3 Information Entropy

Our knowledge of a candidate point can be measured by the information entropy:
$$S(\mathbf{x}) = \frac{\sum_{i} (\mathbf{x} - \mathbf{x}_i)^T \Lambda^{-1} (\mathbf{x} - \mathbf{x}_i) \exp\left[-(\mathbf{x} - \mathbf{x}_i)^T \Lambda^{-1} (\mathbf{x} - \mathbf{x}_i)\right]}{\sum_{i} \exp\left[-(\mathbf{x} - \mathbf{x}_i)^T \Lambda^{-1} (\mathbf{x} - \mathbf{x}_i)\right]} \qquad (5)$$

$\Lambda^{-1}$ is a scaling-factor matrix. The information entropy is a weighted average of the squared scaled distance between a candidate point and all existing data points. The higher the information entropy, the less we know about the candidate point.
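A minimal sketch of Eq. (5); the identity scaling matrix used in the example and all names are illustrative:

```python
import numpy as np

def information_entropy(x, X_data, lam_inv):
    # Eq. (5): kernel-weighted mean of the squared scaled distances
    # between candidate x and all sampled points x_i.
    diff = X_data - x                                   # shape (n, d)
    d2 = np.einsum('ij,jk,ik->i', diff, lam_inv, diff)  # (x-xi)^T Lam^-1 (x-xi)
    w = np.exp(-d2)
    return np.sum(d2 * w) / np.sum(w)

# Hypothetical usage: entropy is high far from the data, low near it.
X = np.random.default_rng(0).random((50, 2))
print(information_entropy(np.array([0.5, 0.5]), X, np.eye(2)))  # small
print(information_entropy(np.array([5.0, 5.0]), X, np.eye(2)))  # large
```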
3.4 Information free energy and temperature annealing

A candidate is worthy of experiment if it has the potential of a good fitness value (low information energy), or if its neighborhood has not been sufficiently explored (high information entropy). During the initial stages, when the number of experiments is small, the model predictions are of little value and experiments should be devoted to sampling uncharted search space. When sufficient information has been gathered, only samples that are potentially important should be tested. Chen et al. [8] proposed an information free energy index:

$$F(\mathbf{x}) = U(\mathbf{x}) - T S(\mathbf{x}) \qquad (6)$$

T is an annealing temperature proportional to the number of experiments. The proper convergence of the sampling procedure to the global minimum depends on the annealing schedule. In this work, the very fast annealing scheme proposed by Ingber [13] was used:
$$T = T_0 \exp(-a N^b) \qquad (7)$$
with $T_0$, $a$ and $b$ being adjustable constants and $N$ the total number of experiments.
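Putting Eqs. (4)-(7) together, a minimal sketch of the free-energy-directed candidate selection (steps 3.i-3.iii of the procedure in Section 3.1) might look as follows. The Gaussian proposal move, the box-bounded continuous search space, the identity scaling matrix, and all parameter values are assumptions of this sketch:

```python
import numpy as np

def temperature(n_exp, T0=1.0, a=0.5, b=0.5):
    # Very fast annealing schedule, Eq. (7); T0, a, b are placeholders.
    return T0 * np.exp(-a * n_exp ** b)

def collect_batch(X, Y, T, n_collect=10, step=0.1, rng=None):
    # Collect NT candidates whose information free energy, Eq. (6),
    # improves on the running reference (minimization convention, Eq. (4)).
    rng = rng or np.random.default_rng()
    d = X.shape[1]

    def free_energy(x):
        d2 = np.sum((X - x) ** 2, axis=1)   # Lambda^-1 = I assumed
        w = np.exp(-d2)
        U = np.sum(w * Y) / np.sum(w)       # GRNN prediction as info energy
        S = np.sum(w * d2) / np.sum(w)      # information entropy, Eq. (5)
        return U - T * S

    ref = X[np.argmin(Y)]                   # step 3.i: current best point
    batch, tries = [], 0
    while len(batch) < n_collect and tries < 100000:
        tries += 1                          # guard against a stuck search
        cand = np.clip(ref + rng.normal(scale=step, size=d), 0.0, 1.0)
        if free_energy(cand) < free_energy(ref):   # step 3.iii
            batch.append(cand)              # collect as a new test point
            ref = cand                      # reference replaced by candidate
    return np.array(batch)                  # evaluated (tested) in step 4

# Hypothetical usage: select the next batch from 100 initial samples.
rng = np.random.default_rng(0)
X0 = rng.random((100, 2))
Y0 = np.sum((X0 - 0.3) ** 2, axis=1)        # stand-in merit function
print(collect_batch(X0, Y0, T=temperature(100), rng=rng))
```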
4. RESULTS

Table 1 shows the results of solving several N-K and RPV problems after 1000 function evaluations. Our IFEDSA approach was found to be much more effective than simple SA for both the N-K and RPV problems.

Table 1: Search results for the RPV and N-K problems using SA and IFEDSA

          RPV (M=15, N=3, Z=0, Q=2)   RPV (M=37, N=6, Z=0, Q=6)   N-K (N=20, K=4)     N-K (N=30, K=5)
          Avg.(a)    Std.(b)          Avg.(a)    Std.(b)          Avg.(a)   Std.(b)   Avg.(a)   Std.(b)
SA        -1.71      0.08             -1.02      0.12             1.54      0.08      1.88      0.11
IFEDSA    -1.83      0.01             -1.62      0.02             1.13      0.00      1.58      0.07

(a) average of 10 runs; (b) standard deviation of 10 runs.
Fig. 1 illustrates the decrease in the objective function with the number of batches of experiments for an RPV model. An undirected simulated annealing search makes the system deviate from the current minimum so that the search process avoids being trapped in a local minimum. However, aided by the GRNN, IFEDSA is able to locate the correct minimum almost one order of magnitude faster.
Fig. 1: Changes in the optimum located during the solution of a simple RPV problem (M=15, N=3, Z=0, Q=2); panels: SA and IFEDSA, abscissa: No. of Batch.

Ordinal optimization theory [9] found that the improvement in the alignment of top-ranked candidates through the use of a model depends strongly on (i) how the fitness function is distributed, i.e. the shape of the ordered performance curve, but only weakly on (ii) the accuracy of the model and (iii) the selection rule, i.e. how the model is used. Both the N-K and RPV models exhibit what is called a bell-shaped distribution curve, which is the most amenable to help by a model. A parameter W, defined as
$$W = \frac{\max \left| Y - \hat{Y} \right|}{Y_{\max} - Y_{\min}} \qquad (8)$$
was used in their studies to represent model accuracy. Lau and Ho [14] found that when W ≈ 0.5, the alignment can be increased 6-8 times if a horse-race selection rule is used. Our IFEDSA is just another form of selection rule, and for both the N-K and RPV models we found W ≈ 0.3 to 0.5. Therefore, an increase in efficiency of approximately one order of magnitude is expected.
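As a rough illustration, W in Eq. (8) can be estimated from the model predictions at the tested points; a minimal sketch with illustrative names:

```python
import numpy as np

def model_error_W(y_true, y_pred):
    # Eq. (8): worst-case prediction error normalized by the
    # range of the true merit values.
    return np.max(np.abs(y_true - y_pred)) / (np.max(y_true) - np.min(y_true))
```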
Fig. 2 shows the distribution of the data sampled in the 1st to 10th, 21st to 30th, and 51st to 60th batches for an N-K problem. The distribution of the sampled data clearly shifts towards the low-energy end; such a shift was not found for simple SA. This shows that IFEDSA allows us to perform importance sampling in the early stages of the search.

Fig. 2: Distribution of the data sampled at different stages of optimization for the N-K model
5. CONCLUSION

In this work, we have demonstrated the necessity of true importance sampling in solving high-dimensional optimization problems. Our results show that while brute-force combinatorial techniques may be powerful, they become much more effective if we periodically organize the present knowledge to direct the search. The organization of knowledge need not be done in a theoretical or precise manner; a simple empirical model that memorizes all existing data is sufficient. A "rough" model, together with appropriate consideration of model uncertainty, is able to significantly improve search efficiency.

REFERENCES
[1] M.E. Davis, "Combinatorial Methods: How will they integrate into Chemical Engineering?", AIChE J., 45(11), 2270, 1999.
[2] X.D. Xiang, X. Sun, G. Briceno, Y. Lou, K.A. Wang, H. Chang, W.G. Wallace-Freedman, S.W. Chen and P.G. Schultz, "A Combinatorial Approach to Materials Discovery," Science, 268, 1738, 1995.
[3] B.M. Cole, K.D. Shimizu, C.A. Kreuger, J.P.A. Harrity, M.L. Snapper and A.H. Hoveyda, "Discovery of Chiral Catalysts Through Ligand Diversity: Ti-Catalyzed Enantioselective Addition of TMSCN to Meso Epoxides," Angew. Chem. Int. Ed. Engl., 35, 1668, 1996.
[4] B.M. Gordon, M.A. Gallop and D.V. Patel, "Strategy and Tactics in Combinatorial Organic Synthesis. Applications to Drug Discovery," Acc. Chem. Res., 29, 144, 1996.
[5] L. You and F.H. Arnold, "Directed Evolution of Subtilisin E in Bacillus Subtilis to Enhance Total Activity in Aqueous Dimethylformamide," Protein Eng., 9(1), 77, 1996.
[6] E.J. Pretel, P.A. Lopez, S.B. Bottini and E.A. Brignole, "Computer-aided molecular design of solvents for separation processes," AIChE J., 40(8), 1349, 1994.
[7] P.J.M. van Laarhoven and E.H.L. Aarts, "Simulated Annealing: Theory and Applications," Reidel, Dordrecht, The Netherlands, 1987.
[8] J.H. Chen, S.S. Jang, D.S.H. Wong and S.L. Yang, "Product and Process Development Using Artificial Neural-Network Model and Information Analysis," AIChE J., 44, 876, 1998.
[9] Y.C. Ho, "Tutorial on ordinal optimization," http://hrl.harvard.edu/people/faculty/ho/DEDS/00/OOTOC.html
[10] S.A. Kauffman and S. Levin, "Towards a general theory of adaptive walks on rugged landscapes," J. Theor. Biol., 128, 11, 1987.
[11] M. Falcioni and M.W. Deem, "Library Design in Combinatorial Chemistry by Monte Carlo Methods," Physical Review E, 61(5), 5948, 2000.
[12] D.F. Specht, "A General Regression Neural Network," IEEE Transactions on Neural Networks, 2(6), 568, 1991.
[13] L. Ingber, "Very Fast Simulated Annealing," Math. Comput. Modelling, 12, 967, 1989.
[14] T.W.E. Lau and Y.C. Ho, "Universal Alignment Probability and Subset Selection for Ordinal Optimization," Journal of Optimization Theory and Applications, 93(3), 455, 1997.