Discovery of scientific laws by a hybrid evolutionary model


Neurocomputing 148 (2015) 143–149


Fei Tang a, Sanfeng Chen a, Xu Tan a, Tao Hu a, Guangming Lin a,*, Zuo Kang b

a Shenzhen Institute of Information Technology, Shenzhen, China
b School of Computer Science, Wuhan University, Wuhan, China

* Corresponding author. Tel.: +86 755 89226652. E-mail address: [email protected] (G. Lin).

http://dx.doi.org/10.1016/j.neucom.2012.07.058

Article history: Received 9 April 2012; received in revised form 28 June 2012; accepted 30 July 2012; available online 28 July 2014.

Abstract

Constructing a mathematical model is an important issue in engineering applications and scientific research. Automatically discovering high-level knowledge, such as laws of natural science, in observed data is a very important and difficult task in systematic research. The authors have obtained some significant results on this problem. In this paper, high-level knowledge modelled by systems of ordinary differential equations (ODEs) is discovered routinely in observed data by a hybrid evolutionary algorithm called HEA-GP. Applications are used to demonstrate the potential of HEA-GP. The results show that the dynamic models discovered automatically in observed data by the computer can sometimes be compared with the models discovered by humans. In addition, a prototype of a KDD automatic system has been developed which can be used to discover models in observed data automatically. © 2014 Elsevier B.V. All rights reserved.

Keywords: Hybrid evolutionary algorithm; Discovering scientific laws; Genetic programming

1. Introduction

For centuries, scientists have attempted to identify and document analytical laws that underlie physical phenomena in nature [1]. A major impediment to scientific progress in many fields is the inability to make sense of the huge amounts of data that have been collected from a variety of sources. In the field of knowledge discovery in databases (KDD), there have been major efforts to develop automatic methods that find significant and interesting models (or patterns) in complex data and forecast the future based on those data. In general, however, the success of such efforts has been limited in the degree of automation achieved during the KDD process and in the level of the models discovered by data mining methods. Usually the goals of description and prediction are achieved by performing the following primary data mining tasks: summarization, classification, regression, clustering, dependency modelling, and change and deviation detection [2].

Recently, Schmidt and Lipson [1] demonstrated the discovery of physical laws, from scratch, directly from experimentally captured data with the use of a computational search. They used the presented approach to detect nonlinear energy conservation laws, Newtonian force laws, geometric invariants, and system manifolds in various synthetic and physically implemented systems without prior knowledge about physics, kinematics, or geometry. Ngan et al. [3] used grammar-based genetic programming for data mining of medical knowledge.


Despite all these methods and models, our research focuses on discovering high-level knowledge in complex data modelled by complicated functions and systems of ordinary differential equations (ODEs). Cao and Kang [4] proposed a two-level evolutionary modelling algorithm to approach this task, and some numerical experiments were done to test the algorithm's effectiveness. In this paper, we run a Hybrid Evolutionary Algorithm (HEA) with GP on time-series applications to demonstrate its potential for discovering dynamic models in observed data automatically.

The rest of the paper is organized as follows. Section 2 presents related work. Section 3 describes the HEA with GP. Section 4 gives two examples of the application of HEA-GP. Section 5 is the discussion and Section 6 gives some conclusions.

2. Related works

2.1. Problem statement

Suppose a dynamic system can be described by $n$ interrelated functions $x_1(t), x_2(t), \ldots, x_n(t)$, and a series of observed data collected at the times $t_i = t_0 + i\,\Delta t$ $(i = 0, 1, 2, \ldots, m-1)$ can be written in the following form:

$X = \begin{bmatrix} x_1(t_0) & x_2(t_0) & \cdots & x_n(t_0) \\ x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ \vdots & \vdots & & \vdots \\ x_1(t_{m-1}) & x_2(t_{m-1}) & \cdots & x_n(t_{m-1}) \end{bmatrix} \qquad (1)$


where $t_0$ denotes the starting time (here $t_0 = 0$), $\Delta t$ denotes the interval between two observations, and $x_j(t_i)$ $(j = 1, 2, \ldots, n)$ denotes the observed value of variable $x_j$ at the time $t_i$. If we denote $x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]$ and $f(t, x) = [f_1(t, x), f_2(t, x), \ldots, f_n(t, x)]$, where $f_j(t, x) = f_j(t, x_1(t), x_2(t), \ldots, x_n(t))$ $(j = 1, 2, \ldots, n)$ is a composite of elementary functions involving the variables $x_i$ $(i = 1, 2, \ldots, n)$ and $t$, and the function space defined by those functions is denoted $F$, then the modelling problem for a system of ODEs is to find the model, a system of first-order differential equations having the general form

$\mathrm{d}x^*/\mathrm{d}t = f(t, x^*) \qquad (2)$

such that $\min\{\lVert X^* - X \rVert : f \in F\}$, where $X^*$ is the matrix of the values of $x^*(t)$ at the time points corresponding to matrix (1), and

$\lVert X^* - X \rVert = \sqrt{\sum_{i=0}^{m-1} \sum_{j=1}^{n} \left( x_j(t_i) - x^*_j(t_i) \right)^2}.$
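For illustration, the following is a minimal sketch of evaluating this objective for a candidate right-hand side. It is not the authors' implementation (the paper's experiments use Visual C++ and a modified Euler integrator); the function name, the use of NumPy/SciPy and the tolerance are our assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def fitness(f, X, t0=0.0, dt=1.0):
    """Distance ||X* - X|| between the observed data matrix X (m rows of
    n variables) and the trajectory X* of a candidate system dx/dt = f(t, x)
    started from the first observed row."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    t = t0 + dt * np.arange(m)                 # observation times t_0 ... t_{m-1}
    sol = solve_ivp(f, (t[0], t[-1]), X[0], t_eval=t, rtol=1e-8)
    X_star = sol.y.T                           # shape (m, n), row i is x*(t_i)
    return np.sqrt(np.sum((X_star - X) ** 2))
```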

Based on model (2), the values of $x_i$ $(i = 1, 2, \ldots, n)$ at the next $\tau$ time steps,

$\begin{bmatrix} x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \\ x_1(t_{m+1}) & x_2(t_{m+1}) & \cdots & x_n(t_{m+1}) \\ \vdots & \vdots & & \vdots \\ x_1(t_{m+\tau-1}) & x_2(t_{m+\tau-1}) & \cdots & x_n(t_{m+\tau-1}) \end{bmatrix},$

can be predicted. To ensure the validity of the models, we assume that the system of ODEs implied in the observed data is stable to some degree with respect to the initial condition; that is, a small change in the initial condition does not give rise to a great change in the solution of the ODEs.

2.2. Background of evolutionary modelling

Constructing a mathematical model is an important issue in engineering applications and scientific research. A complex model is composed of a model structure and model parameters, so modelling problems can be categorized into two types: parameter identification and system modelling. In parameter identification the model structure is known and the model parameters are unknown, while in system modelling both the model structure and the model parameters are unknown. Classical modelling methods require the modeler to fix the model structure before modelling and then identify the parameters of the model; because of this disadvantage, the parameters often become trapped in local optima. Evolutionary Computation (EC) [5], and especially Genetic Programming (GP) [6], has made automatic modelling a reality. GP uses a structured representation as its individual, which in effect is a model structure, so the model structure can be optimized by GP. Moreover, EC algorithms are global optimization algorithms, so both the model structure and the model parameters can be optimized globally by combining EC and GP.

In the Genetic Programming (GP) [7] paradigm, the individuals in the population are compositions of functions and terminals appropriate to the particular problem domain. The set of functions used typically includes arithmetic operations, mathematical functions, conditional logical operations, and domain-specific functions. Each function in the function set must be well defined for every element in the range of any other function in the set. The set of terminals used typically includes inputs appropriate to the problem domain and various constants. The search space is the hyperspace of all possible compositions that can be recursively composed from the available functions and terminals.

The symbolic expressions (S-expressions) of the LISP programming language are an especially convenient way to create and manipulate the compositions of functions and terminals described above. These S-expressions correspond directly to the parse trees that are internally created by most compilers, so entire computer programs can be genetically bred to solve problems in a variety of areas of artificial intelligence, machine learning, and symbolic processing. GP started to receive attention from a wide group of researchers in the early 1990s [6]. Since then, it has rapidly developed into a popular research field of artificial intelligence. GP has been recognized as being able to find promising solutions in many areas, including signal filters [9], circuit design [10], image recognition [11], symbolic regression [12], financial prediction [13], and classification [14].
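As an illustration of this parse-tree representation, the following minimal sketch shows one possible encoding and evaluation of a candidate right-hand side. The nested-tuple encoding, names and protected division are our own choices, not the authors' representation.

```python
import math

# A GP individual encoded as a nested tuple: (operator, child, child) or a leaf.
# Leaves are terminal names ('x', 't') or numeric constants.
candidate = ('*', 'x', ('-', 30.912327, ('*', 'x', 0.151)))   # x*(30.912327 - 0.151*x)

FUNCTIONS = {
    '+': lambda a, b: a + b,
    '-': lambda a, b: a - b,
    '*': lambda a, b: a * b,
    '/': lambda a, b: a / b if b != 0 else 1.0,   # protected division (our assumption)
    'sin': math.sin,
    'cos': math.cos,
}

def evaluate(node, env):
    """Recursively evaluate a parse tree given values for the terminals."""
    if isinstance(node, (int, float)):
        return node                                # constant terminal
    if isinstance(node, str):
        return env[node]                           # variable terminal, e.g. 'x' or 't'
    op, *children = node
    return FUNCTIONS[op](*(evaluate(c, env) for c in children))

print(evaluate(candidate, {'x': 10.0, 't': 0.0}))  # 10*(30.912327 - 1.51) = 294.02327
```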

3. Description of HEA with GP

3.1. The general non-linear programming problem

The general non-linear programming (NLP) problem can be expressed in the following form:

$\min f(X, Y)$
$\text{s.t. } h_i(X, Y) = 0, \; i = 1, 2, \ldots, k_1,$
$\quad\;\; g_j(X, Y) \le 0, \; j = k_1+1, k_1+2, \ldots, k,$
$\quad\;\; X_{\mathrm{lower}} \le X \le X_{\mathrm{upper}}, \; Y_{\mathrm{lower}} \le Y \le Y_{\mathrm{upper}} \qquad (3)$

where $X \in R^p$, $Y \in N^q$, and the objective function $f(X, Y)$, the equality constraints $h_i(X, Y)$ and the inequality constraints $g_j(X, Y)$ are usually nonlinear functions involving both the real variable vector $X$ and the integer variable vector $Y$. Denote the domain

$D = \{(X, Y) \mid X_{\mathrm{lower}} \le X \le X_{\mathrm{upper}}, \; Y_{\mathrm{lower}} \le Y \le Y_{\mathrm{upper}}\}.$

We introduce the concept of a subspace $V$ of the domain $D$. $m$ pairs of points $(X_j, Y_j)$, $j = 1, 2, \ldots, m$, in $D$ are used to construct the subspace $V$, defined as

$V = \left\{ (X_V, Y_V) \in D \;\middle|\; (X_V, Y_V) = \sum_{i=1}^{m} a_i (X_i, Y_i) \right\}$

where the coefficients $a_i$ are subject to

$\sum_{i=1}^{m} a_i = 1, \qquad -0.5 \le a_i \le 1.5.$
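A minimal sketch of drawing a point from such a subspace is given below. It assumes real variables only and uses a simple rejection step to satisfy the coefficient bounds; the paper does not specify how the coefficients are actually sampled, so this is only one possible scheme.

```python
import numpy as np

def sample_in_subspace(parents, rng=None):
    """Draw a random point of the subspace V spanned by m parent points:
    a combination sum_i a_i * X_i with sum_i a_i = 1 and -0.5 <= a_i <= 1.5."""
    if rng is None:
        rng = np.random.default_rng()
    parents = np.asarray(parents, dtype=float)     # shape (m, p)
    m = parents.shape[0]
    while True:
        a = rng.uniform(-0.5, 1.5, size=m - 1)
        a_last = 1.0 - a.sum()                     # enforce the sum-to-one constraint
        if -0.5 <= a_last <= 1.5:
            return np.append(a, a_last) @ parents
```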

Because we deal mainly with optimization problems that have real variables and inequality constraints, we assume $k_1 = 0$ and $q = 0$ in expression (3). Denoting

$W_i(X) = \begin{cases} 0, & g_i(X) \le 0 \\ g_i(X), & \text{otherwise} \end{cases} \qquad \text{and} \qquad W(X) = \sum_{i=1}^{k} W_i(X),$

problem (3) can be expressed as follows:

$\min f(X), \quad \text{s.t. } W(X) = 0, \; X \in D. \qquad (4)$

We define a Boolean function better as

$\mathrm{better}(X_1, X_2) = \begin{cases} \mathrm{TRUE}, & W(X_1) < W(X_2) \\ \mathrm{FALSE}, & W(X_1) > W(X_2) \\ \mathrm{TRUE}, & (W(X_1) = W(X_2)) \wedge (f(X_1) \le f(X_2)) \\ \mathrm{FALSE}, & (W(X_1) = W(X_2)) \wedge (f(X_1) > f(X_2)) \end{cases}$

If $\mathrm{better}(X_1, X_2)$ is TRUE, this means that the individual $X_1$ is better than the individual $X_2$.
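The constraint violation $W(X)$ and the comparison rule translate directly into code; a short sketch follows, with function names of our own choosing.

```python
def violation(x, constraints):
    """W(X): total violation of the inequality constraints g_i(X) <= 0."""
    return sum(max(0.0, g(x)) for g in constraints)

def better(x1, x2, f, constraints):
    """True iff X1 is better than X2: a smaller constraint violation wins,
    and ties on W are broken by the objective value f."""
    w1, w2 = violation(x1, constraints), violation(x2, constraints)
    if w1 != w2:
        return w1 < w2
    return f(x1) <= f(x2)
```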


3.2. A hybrid evolutionary algorithm

We introduce a new hybrid evolutionary algorithm to solve the global optimization problem:

Algorithm 1. A hybrid evolutionary algorithm.

where $Z^*_{Gi}(j)$, $Z^*_{Ci}(j)$ and $\sigma_i(j)$ denote the $j$th components of the vectors $Z^*_{Gi}$, $Z^*_{Ci}$ and $\sigma_i$, respectively. $N(0,1)$ denotes a normally distributed one-dimensional random number with mean zero and standard deviation one. $N_j(0,1)$ indicates that the Gaussian random number is generated anew for each value of $j$. $C_j(1)$ denotes a Cauchy-distributed one-dimensional random number with $t = 1$. The factors $\tau$ and $\tau'$ are commonly set to $\left(\sqrt{2\sqrt{p+q}}\right)^{-1}$ and $\left(\sqrt{2(p+q)}\right)^{-1}$, respectively.

3.3. HEA with GP

The HEA-GP is a new kind of hybrid evolutionary modelling algorithm that we have proposed to approach the task of automatic modelling of ODEs for dynamic systems. Its main idea is to embed a Hybrid Evolutionary Algorithm (HEA) in genetic programming (GP), where GP is employed to optimize the structure of a model, while the HEA is employed to optimize the parameters of the model. It operates on two levels: one level is the evolutionary modelling process and the other is the parameter optimization process. The structure of HEA-GP can be described in pseudo code as follows:

Algorithm 2. Procedure HEA-GP.

Once the best evolved model is obtained in one run, to check its effectiveness we take the last line of observed data as the initial conditions, advance the solution of the ODE model by numerical integration using a numerical method such as the modified Euler method, and obtain the predicted values for the unknown data at the next time steps.
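As a concrete illustration of this prediction step, the following is a minimal sketch (ours, not the authors' code) of advancing an evolved model from the last observed state with the modified Euler (Heun) method; the step size of 0.01 with 100 sub-steps per observation interval is an assumption consistent with the step size quoted later in the paper.

```python
import numpy as np

def modified_euler_step(f, t, x, h):
    """One modified Euler (Heun) step for dx/dt = f(t, x); f returns an
    array of the same shape as x."""
    k1 = f(t, x)
    k2 = f(t + h, x + h * k1)
    return x + 0.5 * h * (k1 + k2)

def predict(f, x_last, t_last, n_steps, h=0.01, steps_per_obs=100):
    """Advance an evolved model from the last observed state x_last and
    return the predicted states at the next n_steps observation times."""
    t, x = t_last, np.asarray(x_last, dtype=float)
    predictions = []
    for _ in range(n_steps):
        for _ in range(steps_per_obs):
            x = modified_euler_step(f, t, x, h)
            t += h
        predictions.append(x.copy())
    return np.array(predictions)
```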

Table 1. The modelling results of the three best evolutionary models for Example 1.

Year   Number i   x_i      Model I x̂_i    Model II x̂_i   Model III x̂_i
1790   1          3.9      3.535357       3.535357       3.535357
1800   2          5.3      5.156120       4.768738       5.634524
1810   3          7.2      7.570995       7.238487       7.929233
1820   4          9.6      9.915323       9.632812       10.166016
1830   5          12.9     13.087546      12.867659      13.206610
1840   6          17.1     17.293678      17.148132      17.261953
1850   7          23.2     22.678036      22.613745      22.490780
1860   8          31.4     30.352379      30.378368      30.010643
1870   9          38.6     40.267639      40.369537      39.829693
1880   10         50.2     49.184601      49.320061      48.743813
1890   11         62.9     62.618629      62.753670      62.287102
1900   12         76.0     76.934494      77.020882      76.814316
1910   13         92.0     91.309525      91.324280      91.436218
1920   14         106.5    107.662506     107.611816     108.040558
1930   15         123.2    122.202255     122.157379     122.736099
1940   16         131.7    131.699997     137.396286     137.936386
1950   17         150.7    150.699997     150.477142     151.287262
FE                         3.069454       3.091522       3.350870
PE                         5.646193       5.700646       6.263979


As for the representation, fitness evaluation and genetic operators of these two processes, interested readers can refer to [3,4] for more details.

In addition, to measure the modelling results, we define the fitting error (FE) and the prediction error (PE) of an ODE model as

$FE = \sqrt{\sum_{i=1}^{m} (\hat{x}_i - x_i)^2}, \qquad PE = \sqrt{\sum_{i=m+1}^{m+n} (\hat{x}_i - x_i)^2} \qquad (5)$

where $x_i$ denotes the observed value of $X$, $\hat{x}_i$ denotes the fitting value (for FE) or the predicted value (for PE) of the ODE model, and $m$ and $n$ are the numbers of data points to fit and to predict, respectively.
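A minimal sketch of these two error measures, with names of our own choosing, is given below.

```python
import numpy as np

def fe_pe(observed, fitted, predicted):
    """Fitting error over the m fitted points and prediction error over the
    n predicted points, following Eq. (5)."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    m, n = len(fitted), len(predicted)
    fe = np.sqrt(np.sum((fitted - observed[:m]) ** 2))
    pe = np.sqrt(np.sum((predicted - observed[m:m + n]) ** 2))
    return fe, pe
```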

4. The application of HEA-GP

In this section, we provide two examples to illustrate the process of identifying models by genetic programming.

4.1. Example 1

The experimental data are cited from [8] and give the populations of the United States from 1790 to 1950 (see the observed-data column in Table 1, where the unit of $x_i$ is millions). Our task is to build an ODE model based on the observed data from 1790 to 1930 to describe the growth of the population and to predict the populations of 1940 and 1950.

To examine the effectiveness of the HEA-GP, we run it on two time-series applications, one of which is one-dimensional and the other three-dimensional. Forty runs and twenty runs are conducted independently for them, respectively, and the best models are shown and discussed. All the experiments are performed on a PC using Visual C++ compilers. The parameter settings are as follows.

For the evolutionary modelling process, we use the function set F = {+, −, *, /, ^, sin, cos, exp, ln}, where x ^ k symbolizes $x^k$ (0 < k < 5); the terminal set T = {x_1, …, x_n, t, c}, where n is the number of equations in the system of ODEs and c is a random constant; a population size of 50; a maximum tree depth of 3; and a maximum of 50 generations per run.

For the parameter optimization process, we use a population size of 80, a 60% crossover rate, a 30% mutation rate and a 10% reproduction rate, and the termination criterion that the fitness value of the best individual has remained unchanged for 3 generations.
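For concreteness, the function and terminal sets above might be encoded as follows. This is a sketch with our own names; the protected division, protected logarithm and overflow guard are common GP conventions assumed here, not details stated in the paper.

```python
import math

# Function set F = {+, -, *, /, ^, sin, cos, exp, ln}.
# Each entry maps a symbol to (arity, implementation).
FUNCTION_SET = {
    '+':   (2, lambda a, b: a + b),
    '-':   (2, lambda a, b: a - b),
    '*':   (2, lambda a, b: a * b),
    '/':   (2, lambda a, b: a / b if abs(b) > 1e-12 else 1.0),
    '^':   (2, lambda a, k: a ** int(k)),                   # x ^ k with 0 < k < 5
    'sin': (1, math.sin),
    'cos': (1, math.cos),
    'exp': (1, lambda a: math.exp(min(a, 50.0))),
    'ln':  (1, lambda a: math.log(abs(a)) if a != 0 else 0.0),
}

def terminal_set(n_equations):
    """Terminal set T = {x1, ..., xn, t, c}; 'c' stands for a random constant."""
    return [f'x{i}' for i in range(1, n_equations + 1)] + ['t', 'c']
```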

The three best models discovered automatically by the computer in forty runs are:

Model I: $\mathrm{d}x/\mathrm{d}t = (1.384360 - (x \cdot t)) \cdot ((x + 1.548616) \cdot 20.574356)$
Model II: $\mathrm{d}x/\mathrm{d}t = x \cdot (30.912327 - (x \cdot 0.151000))$, with the simplified form $\mathrm{d}x/\mathrm{d}t = 30.912327x - 0.151000x^2$
Model III: $\mathrm{d}x/\mathrm{d}t = ((x \cdot 21.926580) + 111.811684) \cdot \cos(x \cdot t)$

Their modelling results are listed in Table 1, and Fig. 1 illustrates the fitting and predicting results of Model I. From Table 1 and Fig. 1, we can see that multiple superior ODE models can be discovered automatically by running the HEA-GP. Although the structures of these models differ enormously, their fitting values and predicted values coincide with the practical data very well. More interestingly, the simplified form of the HEA-GP Model II agrees with the result in [8] obtained with the classical logistic model. According to the assumption of the logistic model

$\frac{\mathrm{d}x}{\mathrm{d}t} = r\left(1 - \frac{x}{x_m}\right)x \qquad (6)$

where $r$ and $x_m$ are two constants, the values of $r$ and $x_m$ are determined as 0.31 and 197, respectively, based on the practical data in [8], that is,

$\frac{\mathrm{d}x}{\mathrm{d}t} = 0.31x - 0.001537x^2 \qquad (7)$

As the form of an ODE model is related to the unit of the time $t$, the models for the same data set may look different under different time units. In the HEA-GP, we use the modified Euler method with step size 0.01 to do the numerical integration, whose time unit is 0.01 of that of the logistic model; therefore there is a factor of 100 between the two models, while the results are identical in essence. It is worth noticing that the evolutionary Model II is discovered automatically by the computer based on the dynamic data of the system, while model (7) is built based on the assumption of the classical model. Their agreement indicates that the HEA-GP has great potential for discovering scientific laws in dynamic observed data.
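The factor-of-100 relationship can be checked directly by rescaling Model II's coefficients by the 0.01 time unit; a short arithmetic sketch:

```python
# Model II was evolved with a time unit that is 0.01 of the logistic model's:
#   dx/dt' = 30.912327*x - 0.151000*x**2
r_evolved, s_evolved = 30.912327, 0.151000

# Removing the factor of 100 introduced by the 0.01 time unit:
r, s = 0.01 * r_evolved, 0.01 * s_evolved
print(r, s)                                    # ~0.3091 and ~0.00151

# Classical logistic fit (7) from [8]: dx/dt = 0.31*x - 0.001537*x**2
print(round(r, 2), abs(s - 0.001537) < 1e-4)   # 0.31  True
```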

Fig. 1. The fitting and predicting results of Model I: observed, fitting and predicted population values (millions) against the observation number.

4.2. Example 2

The experimental data are taken from Gan [15] and give the annual sunspot numbers over 176 years, from 1749 to 1924. We take the observed data of the first 170 years as historical data to build ODE(1), ODE(2), ODE(3) and ODE(4) models, and predict the values of the last 6 years. The parameter settings are shown in Table 2.

Table 2. Parameter settings of the modelling experiments for the ODE models.

Structure optimization (GP)
  Function set:        F = {+, −, *, /, ^, sin, cos, exp, ln}, where x ^ n symbolizes $x^n$ (0 < n < 5)
  Terminal set:        T = {y_1, …, y_n, t, c}, where n is the order of the ODE and c is a random constant
  Control parameters:  Popsize = 50; MaxDepth = 4; MaxGen = 50

Parameter optimization (GA)
  Control parameters:  Popsize = 20, p_c = 0.6, p_m = 0.3; termination criterion: the fitness value of the best individual has remained unchanged for 3 generations.


Table 3. The statistical results for Example 2 (10 runs).

Model     ODE(1)    ODE(2)    ODE(3)    ODE(4)
AFE       260.657   102.560   26.474    93.686
APE       99.586    30.452    71.277    32.405
ANnodes   8.1       9.1       7.8       8.0
Nsucc     10        10        9         2

Table 4. The best ODE(2) model for Example 2.

Evolutionary solution:            dy1/dt = y2; dy2/dt = (ln(|y2|) − (y1 − 57.668)) · (y1 · 55.339)
Equivalent ODE:                   x″ = 55.339455 · x · (ln|x′| − x + 57.668163)
FE:                               102.390884
PE:                               10.011124
Observed values (last 6 years):   63.6   37.6   26.1   14.2   5.8    16.7
Predicted values:                 60.484 40.62  25.95  16.75  11.86  10.63

Fig. 2. The fitting and prediction curves of the best ODE(2) model for Example 2: observed, fitting and predicted sunspot numbers against the serial number of the year.

Table 3 shows the statistical results of ten runs for the HODE models of different orders built for Example 2. From the fact that the success rate is rather low for the ODE(4) model (it succeeded in only two of ten runs), we infer that a fourth-order ODE model is not appropriate for describing this time series. In addition, comparing the AFE and the APE of the other three models, we see that both kinds of error are largest for the ODE(1) model. For the ODE(3) model, the AFE is quite small while the APE is large. For the ODE(2) model, the AFE and the APE are both moderate. If we choose a model with an emphasis on describing the system, the ODE(3) model is the best; since the main objective of modelling a system is to make predictions rather than only to fit the observed data, we consider the ODE(2) model preferable to the ODE(3) model for describing and predicting the dynamic data. We choose the best ODE(2) model in the ten runs to illustrate the effectiveness of the HEA-GP. Table 4 presents its modelling results and Fig. 2 depicts its fitting and prediction curves. The computer can search out multiple superior ODE models whose structures are usually unimaginable for humans. This shows that computational intelligence can be competitive with human intelligence, and in some sense even surpass it.
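To make the "equivalent ODE" of Table 4 concrete, the second-order model can be integrated as the first-order system listed in the table. The following is a minimal sketch (ours, not the authors' code); the small constant added inside the logarithm is a guard of our own, and the initial state in the usage comment is hypothetical.

```python
import numpy as np

def ode2_rhs(t, y):
    """Best ODE(2) model of Table 4 as a first-order system, y[0] = x, y[1] = x'."""
    y1, y2 = y
    return np.array([y2,
                     55.339455 * y1 * (np.log(abs(y2) + 1e-12) - y1 + 57.668163)])

def integrate(rhs, y0, n_steps, h=0.01):
    """Modified Euler (Heun) integration, as used elsewhere in the paper."""
    t, y = 0.0, np.asarray(y0, dtype=float)
    trajectory = [y.copy()]
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h, y + h * k1)
        y = y + 0.5 * h * (k1 + k2)
        t += h
        trajectory.append(y.copy())
    return np.array(trajectory)

# e.g. trajectory = integrate(ode2_rhs, [x0, dx0], n_steps=100)
```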

5. Discussion

In view of the drawbacks of most available methods for modelling complex systems, we consider using HEA with GP to approach the modelling problem of complex systems. That is, in the case that only limited information about a system is known, we partly replace human intelligence with computational intelligence in some steps of traditional modelling, including the development of assumptions and the construction and calculation of a model, to complete the whole modelling task.

The results from Section 4 do not indicate that genetic programming is a methodology that outperforms more traditional modelling methods. Indeed, the numerical performance of models from HEA-GP is not better than that of simpler linear models, while their complexity is larger. However, we have only used a single set of parameters for the HEA-GP algorithm, without trying to determine the optimal HEA-GP parameters for the problem at hand. Hence, we conclude that economic modelling with HEA-GP is still in its infancy, but its similar performance to more traditional modelling techniques nevertheless suggests that there is potential to be explored further. In this section, we discuss various issues regarding searching for economic models with HEA-GP and indicate future directions along which this potential could be explored.

As with many data-driven methods, HEA-GP assumes that data have been collected independently along different dimensions. In economic data, however, different definitions might be used for the same quantity. Since the definitions form an integral part of the models that we search for, it is important to use data obtained according to the definitions assumed in the model. In our example, for instance, the results would have been different if the national demand were defined in another way, because equilibrium is expected between the national demand and the gross domestic product. Therefore, it is important to study the way the data have been collected when applying the HEA-GP methodology for modelling. The modeler has to form a thorough understanding of the way the input data have been collected and of the assumptions that are implicit in the data.

6. Conclusions

We have discussed how genetic programming can be used for generating models of economic processes in a data-driven manner. A framework has been proposed within which the development of an economic model can be formulated as a HEA-GP search. The advantage of the method is that various assumptions regarding model structure can be relaxed, letting the data speak for themselves. The proposed framework follows the modelling approach familiar to economists, in which they can specify definitions, equilibrium conditions and behavioural equations. The definitions are not changed by the proposed method, the equilibrium conditions are used to determine model fitness, and the behavioural equations are optimized to increase the fit to the observed data.

In this paper, based on the hybrid evolutionary modelling algorithm called HEA-GP, we report two applications of this algorithm: one is the dynamic population model of the United States from 1790 to 1950, and the other is a model of the annual sunspot numbers over 176 years. The experimental results show that, by running the HEA-GP, the computer can discover high-level knowledge modelled by a system of ordinary differential equations (ODEs) in observed data automatically. It can discover not only dynamic models that are comparable with classical or exact models, but also suitable models whose structures are usually unimaginable for humans. The HEA-GP is promising as a powerful tool for the automatic discovery of knowledge in dynamic data, especially for the discovery of scientific laws in observed data.

We present a new idea for modelling and predicting one-dimensional dynamic data using high-order ordinary differential equation (HODE) models instead of traditional time-series analysis models. Accordingly, a real-time hybrid evolutionary modelling algorithm is proposed to approach this task.


The main idea of the algorithm is to embed a genetic algorithm (GA) into genetic programming (GP), where GP is employed to optimize the structure of a model, while the GA is employed to optimize the parameters of the model. This algorithm has the following advantages compared with traditional modelling methods used for time-series analysis: (1) it overcomes the limitation of linear models in traditional methods and is capable of building complex non-linear HODE models for one-dimensional dynamic data; (2) the optimization of the model structure and the optimization of the model parameters can be performed simultaneously by the two main processes of the real-time HEA-GP; (3) it depends very little on domain details and human expertise, and the whole modelling process is carried out automatically; (4) it implements real-time modelling and prediction of dynamic data as the observed data are renewed, and it takes a first step toward automatic programming for real-time prediction of dynamic data.

Our research in this work offers a new tool for automatic KDD. Further research is needed: (a) to study the possibility of determining the optimal order of the ODE model family in advance, rather than experimenting with different orders one by one; (b) to explore techniques for reducing the computational cost of the algorithm and thus shorten the modelling time; and (c) to implement the algorithm on parallel computer systems. All these issues remain to be addressed in our future research.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 71101096 and 61172165), the Natural Science Foundation of Guangdong Province (Nos. S201101000849, S2011010003890 and S2012010008540), the Shenzhen Basic Research Project for Development of Science and Technology (Nos. JC201006020807A and JC201105190819A), and the Research Project of SZIIT (Nos. CXTD2-005 and BC2009014).

References

[1] M. Schmidt, H. Lipson, Distilling free-form natural laws from experimental data, Science 324 (5923) (2009) 81–85.
[2] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press, 1996.
[3] P.S. Ngan, M.L. Wong, K.S. Leung, J.C.Y. Cheng, Using grammar based genetic programming for data mining of medical knowledge, in: Genetic Programming 1998: Proceedings of the Third Annual Conference, 1998, pp. 254–259.
[4] H.Q. Cao, L.S. Kang, Z. Michalewicz, Y.P. Chen, A two-level evolutionary algorithm for modeling system of ordinary differential equations, in: Genetic Programming 1998: Proceedings of the Third Annual Conference, 1998, pp. 17–22.
[5] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA, 1996.
[6] J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, 1992.
[7] J.R. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs, MIT Press, Cambridge, MA, 1994.
[8] Q.Y. Jiang, Mathematical Models, 2nd ed., Education Press, Beijing, 1993, pp. 110–221 (in Chinese).
[9] P. Andreae, H. Xie, M. Zhang, Genetic programming for detecting rhythmic stress in spoken English, Int. J. Knowl.-Based Intell. Eng. Syst. 12 (1) (2008) 15–28 (special issue on Genetic Programming).
[10] L.B. Desa, A. Mesquita, Evolutionary synthesis of low-sensitivity equalizers using adjacency matrix representation, in: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, ACM, 2008, pp. 1283–1290.
[11] D. Agnelli, A. Bollini, L. Lombardi, Image classification: an evolutionary approach, Pattern Recognit. Lett. 23 (2002) 303–309.
[12] F. Castillo, A. Kordon, J. Sweeney, W. Zirk, Using genetic programming in industrial statistical model building, in: U.-M. O'Reilly (Ed.), Genetic Programming Theory and Practice II, vol. 3, Springer, 2006, pp. 31–48.
[13] W. Lee, Genetic programming decision tree for bankruptcy prediction, in: Proceedings of the 2006 Joint Conference on Information Sciences, Atlantis Press, 2006.
[14] J. Hong, S. Cho, Lymphoma cancer classification using genetic programming with SNR features, in: Proceedings of the 7th EuroGP Conference, 2004, pp. 78–88.
[15] R.C. Gan, The Statistical Analysis of Dynamic Data, Beijing University of Science and Technology Press, Beijing, 1991 (in Chinese).

Fei Tang received his M.S. degree from Shenzhen University, China, in 2007. Currently, he is an assistant professor at Shenzhen Institute of Information Technology. His research interests include artificial intelligence and image processing.

Sanfeng Chen was born in 1979 and received the Ph.D. degree from the University of Science and Technology of China in 2008. She is an assistant professor at the Shenzhen Key Lab of Visual Media Processing and Transmission and the Shenzhen Institute of Information Technology, Shenzhen, China. Her research interests include artificial intelligence, signal processing and pattern recognition.

Xu Tan was born in 1981 and received his Ph.D. degree in Management Science and Engineering from the National University of Defense Technology, China, in 2009. Currently, he is an associate professor at Shenzhen Institute of Information Technology. His research interests include granular computing, intelligent decision making and knowledge discovery.

Tao Hu received the Ph.D. degree from Huazhong University of Science and Technology in 2009. He is an assistant professor at the Shenzhen Key Lab of Visual Media Processing and Transmission, Shenzhen Institute of Information Technology, Shenzhen, China. His research interests include machine vision and image processing.

Guangming Lin was born in 1963 and received the Ph.D. degree from the University of New South Wales in 2003. He is a professor at the Shenzhen Key Lab of Visual Media Processing and Transmission and the Shenzhen Institute of Information Technology, Shenzhen, China. His research interests include evolutionary algorithms, parallel computing and optimization.

Zuo Kang received the Ph.D. degree from Wuhan University, China, in 2006. He is a professor at Wuhan University. His research interests include evolutionary computation and parallel computing.