Proposal of the “data to equation” algorithm

Proposal of the “data to equation” algorithm

Chaos, Solitons and Fractals 114 (2018) 423–432 Contents lists available at ScienceDirect Chaos, Solitons and Fractals Nonlinear Science, and Nonequ...

1MB Sizes 0 Downloads 18 Views

Chaos, Solitons and Fractals 114 (2018) 423–432

Contents lists available at ScienceDirect

Chaos, Solitons and Fractals Nonlinear Science, and Nonequilibrium and Complex Phenomena journal homepage: www.elsevier.com/locate/chaos

Frontiers

Proposal of the “data to equation” algorithm Shin-ichi Inage a,b,∗, Kazumi Yamaguchi c a

Advanced Energy Systems for Sustainability, Institute of Innovation Research, Tokyo Institute of Technology, Ishikawadai-6 Bld. Room 419 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan b Hitachi Ltd., Energy Solution Business Unit, Akihabara Daibiru Building, 18-13 Soto-Kanda 1-chome, Chiyoda-ku, Tokyo 101-8608, Japan c Hitachi Industry & Control Solutions, Ltd., Asahi-Seimei Building 1-22-1 Saiwai-cho, Hitachi-shi, Ibaraki-ken 317-0073, Japan

a r t i c l e

i n f o

Article history: Received 23 February 2018 Revised 11 April 2018 Accepted 11 June 2018

a b s t r a c t A new algorithm to determine the master equation behind time-series data is proposed. For this algorithm, a library of potential terms for the master equation for a finite difference formula was given. Using time-series data, each time-series potential term was estimated. The master equation was written as a linear sum of each potential term. Additionally, each coefficient in front of the potential terms was determined to minimize  (linear sum)2 during the given period using the genetic algorithm and Gauss–Seidel method. The genetic algorithm minimized both  (linear sum)2 and number of potential terms, because the estimated master equation by the time-series data should be simple. This approach was applied to several functions and so-called Lorenz equations, which demonstrate typical chaotic motion. The timeseries data was given through the numerical calculation of sample functions and Lorenz equations. The estimated master equation based on the proposed approach was compared with sample functions and the Lorenz equations. In each case, the coefficients in the equations were in good agreement. There was a small discrepancy between the coefficients of the original Lorenz equations and the estimated master equation. For the Lorenz equations, in the short-term, the attracters were in good agreement. By contrast, the discrepancies influenced the long-term attracters based on chaotic motion. © 2018 Elsevier Ltd. All rights reserved.

1. Introduction In this paper, we propose a concept to determine the master equation or law of physics behind time-series data. Many approximation techniques for time-series data have been proposed and there are many review articles. The approximation techniques are roughly classified into (1) local approximation; (2) global approximation, which includes response surface methodology, design and analysis of computer experiments, neural networks and the genetic algorithm and (3) mid-range approximations. Local approximations are valid in the vicinity of the point at which they are generated, whereas global approximations are valid in the entire design space. They allow the study of the region for the location of optimal points. Typically, global approximation techniques include the response surface methodology (RSM), neural networks [10,12], and the design and analysis of computer experiments. In particular, George Cybenko proved that multi-layer neural networks based on a sigmoidal function can express any linear or nonlinear func∗ Corresponding author at: Advanced Energy Systems for Sustainability, Institute of Innovation Research, Tokyo Institute of Technology, Ishikawadai-6 Bld. Room 419 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550 Japan. E-mail addresses: [email protected], [email protected] (S.-i. Inage).

https://doi.org/10.1016/j.chaos.2018.06.015 0960-0779/© 2018 Elsevier Ltd. All rights reserved.

tions [8]. Abdulrahman Baqais proposed the genetic algorithm approach for optimal coefficients and power of the variables in function approximation [1]. Many approaches to estimating approximation functions using time-series data have been proposed and they contribute to practical applications. By contrast, these approaches can never estimate the physical law or determine the master equation behind time-series data. For example, the major disadvantage of neural models is that the input/output is approximated using a black-box approach, and no understanding of the underlying relationship can be gained. As remarkable research, Michael Schmidt and Hod Lipson applied artificial intelligence to determine the natural law from time-series data [14]. S Inage proposed an algorithm to determine the optimal coefficients on an assumed master equation using time-series data as a reverse problem to forecast irradiation in a broad geographical region [18]. Using the generalized concept, in this paper, we propose an algorithm to determine the master equation as a differential equation. To determine the optimal master equation, we consider a difference equation that consists of multiple potential terms that are estimated using time-series data, such as first, second, third and fourth-order differences, and the power of t and f(t). To minimize the linear sum of the potential term, the coefficients of each potential term are estimated numerically. Additionally, to minimize the number of contributing poten-

424

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432

Fig. 1. Overall flow of the proposed concept.

tial terms, the genetic algorithm is applied to simplify the master equation or physical law. If the coefficients of the potential term are sufficiently small, then the term is omitted. This approach is called data to equation (D2E).

included in the master equation is required. If the first-order difference term is chosen as the reference term, then Eq. (3) can be rewritten as



2. Fundamental concept Fig. 1 shows the overview of the proposed concept. Principally, the concept consists of reverse and forward problems of a differential equation. The forward problem is used to solve the differential equation with appropriate initial and boundary conditions to generate time-series data. By contrast, the reverse problem is used to estimate the initial and boundary conditions from time-series data with the differential equation. For the proposed algorithm, as a reverse problem, the master equation is estimated using time-series data for the genetic algorithm and Gauss–Seidel method. Additionally, to verify the estimated differential equation, the equation is solved numerically and the results can be compared with original time-series data. The details of each part are described as follows: The following time-series data are considered:





=

d f dt 2



i

1 = 

  df

dt i+1



 df  dt i

=

fi+1 + fi−1 −2 fi



2

(2)

···

If time-series data (f0 , f1 , f2 , f3 ,…, fN ) satisfy a differential equation as the master equation, then we obtain



C1

df dt



+Cm





+ C2 f i

d2 f dt 2



df dt





+ C3 f i

+·····=0

2

df dt





+ C4 f i

3

df dt





+·····



ei =

df dt

E =

N





df dt

+ A3 f 3 i



+····· i

(3 )

+·····=0 i





+ A1 f



i



df dt

d2 f dt 2

N

( ei )2 =



+A3 f 3



+ A2 f



2

i



df dt



+ A3 f

3

i



i

+···

(4)

i



df dt

0

df dt

df dt





+ A1 f i





+ · · · + Am i

2

d f dt 2

df dt





+ A2 f 2 i



df dt

 i

2

+···

.

(5)

i

Constants A1, A2, …are estimated to minimize the total error. Then, the following conditions are satisfied:



∂E ∂ A1





=



∂E ∂ A2



=

 ∂E · · · = 0. ∂ A3

For example, for A1 , N



O

df dt

  i

(3)

i

i

df dt

The total error during time 0–tN is defined as

i

where Ci denotes the coefficients of each potential term, which is estimated using real time-series data. For the proposed approach, coefficients C1 ,C2 … are estimated using time series. By contrast, as described below, at least one “reference” term that should be



+ A2 f 2

where Ai = Ci+1 /C1 . For the proposed concept, the potential term estimated using time-series data should be prepared as many as possible, because, generally, the correct master equation behind the time-series data is unknown. Constant Ai is estimated to minimize the error of the left-hand side of Eq. (3’), whose terms are estimated using time-series data. The error for each time is defined as

(1)



The second order: 2

d2 f dt 2

0

fi − fi−1

i



i



df dt

+ · · · +Am

For time step t of each time series, from the time-series data of f(t), the first, second and higher order difference formulas are given as follows: The first order:

df dt



+ A1 f

+Am

Time : (t0 ,t1 ,t2 ,t3 , . . . ,tN ), Data : ( f0 , f1 , f2 , f3 , . . . , fN ).





df dt



+ · · · +Am



N

0



df dt

df dt

d2 f dt 2





  f i

 + A1 f

i



df dt

(6)



 + A2 f 2 i

df dt



 + A3 f 3

i

df dt



+··· =0 i

df dt



A1 +

i

N

0



df dt



 f2

i

df dt

 A2 + · · · i

i

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432 Table 2 Estimated constants.

Table 1 Estimated constants.

=−

N



0



df dt

i

A1

4.091419e−11

A1

−0.1

A2 A3

0.405284 −6.843723e−11

A2 A3

0.2 −0.1



df dt

.

(7)

i

As mentioned, to maintain the right-hand side of Eq. (7), the non-zero “reference” term that should be included in the master equation is required. In the case of Eq. (3), without the reference term, the right-hand side is zero; therefore, the coefficients C1 , C2 ,… can never be solved. Thus, other constants satisfy the simultaneous equations, which includes all constants A1, A2, …An . This simultaneous equation can be solved using the Gauss–Seidel method numerically [2]. If these constants are given, then the differential equation can be treated as a physical model for time-series data. Additionally, if the considered potential terms are sufficiently small, then the terms can be omitted. This fundamental concept is checked using several simple examples as follows: Example-1 As a simple example, f = sin(2t/π ) is considered. As potential terms, the first and second difference formulas, f and f2 , based on the time series during t = 0–50 are considered. Then, Eq. (4) is written as



ei =

d2 f dt 2





+ A1 i

df dt



2

+ A2 f + A3 f .

(8)

i

As mentioned, constants A1 –A3 can be estimated to minimize total error E in Eq. (5). In this case, the simultaneous equation can be written as

⎡    2       ⎤⎡ ⎤ ⎡   df   d2 f  ⎤ 2 df − 2 dt fi df fi df A1 dt i dt i ⎢  i 2 dt i ⎥ ⎢  dt i ⎥   ⎢ ⎥ ⎢ ⎥ d f 2 3 df ⎢ fi ⎥⎣A2 ⎦ = ⎢ − fi 2 ⎥. fi fi dt dt i ⎣ ⎦ i ⎦ ⎣    2  df   3  4  2 d2 f fi

fi

dt i

fi

A3



fi

dt 2

Table 1 shows the results of estimated constants A1 –A3 using the Gauss–Seidel method. Two hundred time-series data during t = 0–2 for t = 0.01 are given. Compared with A2 , A1 and A3 are negligible. Hence, the master equation is

d2 f dt 2



= −0.405284 f.

(10)

Theoretically, in the case of f = sin(2t/π ), the coefficient of f is (2/π )2 = 0.40528. This discrepancy depends on the approximation degree between the finite difference equation and difference equation. To reduce the error, time step t of the time-series data should be sufficiently small. In this case, the total error is 5.730635e−17. Example-2 On this example, time-series data based on the following nonlinear differential equations are considered. The time-series data are given through a numerical simulation for the initial conditions t = 0, f = 0; t = 1, f = 0.1:

d2 f df = 0.1 − 0.2 f + 0.1 f 2 dt dt 2



ei =

d2 f dt 2





+ A1 i

df dt

(11)



+ A2 f + A3 f 2 . i

The second difference term is chosen as the reference term. Table 2 shows the estimated A1, A2 and A3 based on the proposed approach. There are 200 time-series data during t = 0–50 for t = 0.25. The total error is 2.399e−11 and By contrast, the coefficients A1 –A3 agree with Eq. (11), because the time series data is also given using the finite difference method of Eq. (11). Though these examples, using the appropriate choice of potential terms, the fundamental concept is confirmed. By contrast, as emphasized, the correct choice of the reference term is a key in the proposed approach. The described examples are rather simple cases used to choose the reference term. To expand this fundamental concept more generally, the following genetic algorithm is applied to estimate the optimal reference term. 3. Application of the genetic algorithm 3.1. Requirement of the genetic algorithm As is well known, the genetic algorithm is one of the most powerful algorithms based on the simulation of creatures’ heredity and evolution as multi-point search methods [3,7,13,19–21]. The following benefits are expected: a) The genetic algorithm can be applied to problems even in discrete search space. b) An optimal solution can be determined even when the landscape of the objective function is multi-modal. c) In multi-objective optimization, the genetic algorithm, which is a multi-point search, can determine a Pareto-optimal set using one trial to check the trade-off relations between multiobjectives.

i

(9)



425

(12)

To estimate the optimal master equation for a given time series, the genetic algorithm can support this fundamental concept in the following three aspects: 3.1.1. To search the simplest master equation For example, if f = sin(2t/π ) is considered, it satisfies differential equations, such as

 2

d2 f 2 =− π dt 2

f,

 2 df

d3 f 2 =− π dt 3

dt

,···

(13)

Thus, if first, second and third differences, and f are chosen as potential terms, such as



ei =

d3 f dt 3





+ A1 i

df dt





+ A2 i

d2 f dt 2



+ A3 f

(14)

Then, the constants are estimated as A1 = A3 = 0.4052496, A2 = 1. This can be written as

d2 f + 0.405284 f + dt 2



d3 f df + 0.405284 dt dt 3



= 0.

(15)

This includes two of the same master equations. To make the master equation simpler, the application of the genetic algorithm reduces the number of potential terms for minimum total error E in Eq. (5).

426

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432 Table 3 Referred equations. No,

Types

Differential equations

1

Sturm–Liouville equation

d dt

( p(t ) dtd f (t )) + q(t ) f (t ) + λW (t ) f (t ) = 0

2

+ p(t ) dtd f (t ) + q(t ) f (t ) = 0

2

Fuchsian equations

d f dt 2

3

Gauss equations

t(1 − t ) ddt 2f + (γ − (α + β + 1 ) ) dtd f (t ) − αβ f (t ) = 0

4

Bessel equations

t2 ddt 2f + t dtd f (t ) − (t 2 − α 2 ) f (t ) = 0

5

Legendre equations

d dt

6

Painlevé equations (except PⅤ , PⅥ )

PⅠ : ddt 2f = 6 f 2 + t

2

2

( (1 − t 2 ) dtd f (t )) + ν (1 + ν ) f (t ) = 0 2

PⅡ : ddt 2f = 2 f 3 + t f + α 2

2

7

Elliptic equations

( df )2 − dt

PⅢ : ddt 2f =

1 f

2 PⅣ : ddt 2f

1 2f

=

1 df t dt

( ) + df dt

2

3 2

+

1 t

(α f 2 + β ) + γ f 3 +

δ f

f + 4t f + 2(t − α ) f + βf 3

2

2

For f(t) = sn(t): d2 f dt 2

)2 = (1 − y2 )(1 − k2 y2 ) + (1 + k2 ) f − 2k2 f 3 = 0 and ( df dt

For f(t) = cn(t): d2 f dt 2

)2 = (1 − y2 )(1 − k2 + k2 y2 ) + (1 − 2k2 ) f + 2k2 f 3 = 0 and ( df dt

For f(t) = dn(t): d2 f dt 2

)2 = (y2 − 1 )(1 − k2 − y2 ) − (2 − k2 ) f + 2 f 3 = 0 and ( df dt

3.1.2. To search the optimal constants in the potential terms As potential terms, any functions, such as sin(Ct), cos(Ct) and exp(Ct), that include unknown constant C can also be chosen. Constant C in the potential term can never be estimated using the Gauss–Seidel method. By contrast, the genetic algorithm can optimize any constants in the potential terms to minimize total error E. 3.1.3. To choose the optimal reference term The incorrect reference term, which is never included in the master equation, will lead to an incorrect master equation. When the incorrect reference is chosen, the total error using Eq. (5) is much larger than the case with the correct reference term. Therefore, using the genetic algorithm, the optimal reference is also chosen to minimize the total error. In this paper, the reference term is searched from the first, second, third and fourth difference terms. On this algorithm, the multi-objectives are as follows: (1) Minimize the number of terms in the master equation. (2) Minimize the total error in Eq. (5). On this paper, the neighborhood cultivation genetic algorithm (NCGA) is applied [4,5,9,16,17]. The NCGA has a neighborhood crossover mechanism in addition to the mechanisms of general genetic algorithms. The NCGA has advantages compared with other genetic algorithms from the point of view of the solution survey in discrete search space. 3.2. Overall algorithm Preparation of the potential term based on time-series data: In this paper, as a library, the following potential terms are prepared: 1) 2) 3) 4) 5)

first, second, third and fourth-order difference data; power functions of t, such as t0 = 1, t, t2 , t3 ,…, t10 , 1/t, tC ; power functions of f, such as f, f2 , f3 ,…, f10 , 1/f, fC ; general functions of t and f, such as sin(Ct), cos(Cf), exp(Ct); product of terms of 1)–4).

In total, 140 potential terms are considered. The potential terms are chosen to cover the typical differential equations in Table 3. Some of these differential equations are generally discussed in the complex plane. By contrast, in this paper, we focus on differential equations in the real plane.

Flow of the algorithm: The overall flow of the algorithm is summarized as follows: STEP-1 Set the time-series data. STEP-2 Estimate the considered potential terms in a library using time-series data, STEP-3 Initialize the combination of the potential terms based on the genetic algorithm STEP-4 Estimate the constants of the coefficients of the potential terms based on the Gauss–Seidel method. STEP 5 Search for the optimal combination of the potential terms using the genetic algorithm. 4. Validation model The optimized master equation is compared with time-series data and validated. For this approach, the results are based on the differential equation. Therefore, for a direct comparison with the time series, the numerical calculation based on the estimated optimized master equation is required. In this section, the numerical approach is discussed based on Euler’s method [2]. For this approach, the master equation is written generally as

C1 t + C2 t 2 + · · · + Ci eC j t + Ck f + Cl f 2 + · · ·   df + Cm t + Cn t 2 + · · · + C p f + · · · dx   d2 f 2 2 + Cq t + Cr t + · · · + Cs f + Ct f + · · · d x2 3   d f + Cu t + Cv t 2 + · · · + Cw f + · · · d x3   d4 f + Cz t + Caa t 2 + · · · + Cab f + Cac f 2 + · · · = e. d x4

(16)

More simply, Eq. (16) can be written as follows:

df d2 f + F2 (t, f ) 2 dx dx d3 f d4 f +F3 (t, f ) 3 + F4 (t, f ) 4 =  ei . dx dx

G(t, f ) + F1 (t, f )

(17)

For the up-wind scheme, the first, second, third and fourthorder differences are defined as

df fi+1 − fi = dx 

(18)

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432

427

Table 4 Difference formulas for validation. Highest order

Master equations

1st order

df = e. G(t, f ) + F1 (t, f ) dx

From Eq. (18), df dx

=

f i+1 − f i t

=

e−G(ti , f i ) . F1 (ti , f i )

(ti , fi ) . (Initial conditions: f0 ) ∴ f i+1 = f i + t e−G F (t , f ) 1

2nd order

i

i

2

df + F2 (t, f ) ddx2f = e. G(t, f ) + F1 (t, f ) dx

From Eqs. (18) and (19), d2 f d x2

=

f i+1 −2 f i + f i−1 t 2

=

e−G(ti , f i ) F2 (ti , f i )

∴ f i+1 = 2 f i − f i−1 + t ( 2

3rd order



F1 (ti , f i ) f i − f i−1 t F2 (ti , f i )

e−G(ti , f i ) F2 (ti , f i )



.

F1 (ti , f i ) f i − f i−1 t F2 (ti , f i )

2

). (Initial conditions: f1 , f0 )

3

df + F2 (t, f ) ddx2f F3 (t, f ) ddx3f = e. G(t, f ) + F1 (t, f ) dx

From Eqs. (18)–(20), d3 f d x3

=

f i+1 −3 f i +3 f i−1 − f i−2 t 3

=

e−G(ti , f i ) F3 (ti , f i )

∴ f i+1 = 3 f i − 3 f i−1 + f i−2 (ti , fi ) − +t 3 ( e−G F (t , f ) 3

i

i



F1 (ti , f i ) f i − f i−1 t F3 (ti , f i )

F1 (ti , f i ) f i − f i−1 t F3 (ti , f i )





F2 (ti , f i ) f i −2 f i−1 + f i−2 F3 (ti , f i ) t 2

F2 (ti , f i ) f i −2 f i−1 + f i−2 F3 (ti , f i ) t 2

.

)

(Initial conditions: f2 , f1 , f0 ) 4th order

2

3

4

df + F2 (t, f ) ddx2f + F3 (t, f ) ddx3f + F4 (t, f ) ddx4f = e. G(t, f ) + F1 (t, f ) dx

From Eqs. (18)–(21), d4 f d x4

=

f i+1 −4 f i +6 f i−1 −4 f i−2 + f i−3 t 4 f −f (ti , fi ) − FF1 ((tti ,, ffi )) i ti−1 = e−G F4 (ti , f i ) 4 i i



F2 (ti , f i ) f i −2 f i−1 + f i−2 F4 (ti , f i ) t 2

f) − FF3 ((t, t, f ) 4

f i −3 f i−1 +3 f i−2 − f i−3 t 3

∴ f i+1 = 4 f i − 6 f i−1 + 4 f i−2 − f i−3 (ti , fi ) − +t 4 ( e−G F (t , f ) 4

i

i

F1 (ti , f i ) f i − f i−1 t F4 (ti , f i ) f i −2 f i−1 + f i−2 F3 (ti , f i ) f i −3 f i−1 +3 f i−2 − f i−3 F4 (ti , f i ) t 2 t3 4 i i

− FF2 ((tti ,, ffi ))

)

(Initial conditions: f3 , f2 , f1 , f0 )

5. Verification of the proposed algorithm

d2 f fi+1 − 2 fi + fi−1 = d x2 2

(19)

d f fi+1 − 3 fi + 3 fi−1 − fi−2 = d x3 3

(20)

d4 f fi+1 − 4 fi + 6 fi−1 − 4 fi−2 + fi−3 = . d x4 4

(21)

3

The validation flow is as follows: STEP-1: Estimate the highest order of difference terms based on the current approach. STEP-2: Numerically estimate the finite difference equation as follows: First, estimate the highest order of the differential terms. Then define the differential terms from the first order to fourth order. Thus, the finite difference method is applied as follows: On the case of the highest order it the first order, Eq. (17) is

df G(t, f ) + F1 (t, f ) = e. dx

(22)

From Eq. (18),

df fi+1 − fi e − G(t, f ) = = . dx t F1 (t, f ) e − G(t, f ) . F1 (t, f )

 ei =

d2 f dt 2



 + A1 i

df dt



 + A2

i

d3 f dt 3

 + A3 f + A4 f 2 .

(25)

i

Using the proposed method, A1 = A2 = A4 = 0 is confirmed. Additionally, A1 = 0.405284720881364 and e = 4.768564e−15. The total error is 2.404785e−22. By citing a figure, estimated f(t) was in good agreement with sin(2t/π ). Example-2 2

This example is the simple case of ddt 2f = 0.1 df − 0.2 f and was dt discussed in the previous section. The time-series data is given by a numerical solution of the difference equation for t = 0.001. The potential master equation is written as



(23) ei =

Therefore,

fi+1 = fi + t

The proposed algorithms was developed as a tool based on VBA on HP Compaq Pro 6300 All in one PC (Processer: Intel(R) Core (TM) i7-2770S CPU @3.10 GHz, Memory: 4.00GB). Table 5 shows the verification of the proposed algorithm. Example-1 This example is the simple case of f = sin(2t/π ) and was discussed in the previous section. In this case, the potential master equation is written as

d2 f dt 2



 + A1 i

df dt



 + A2

i

d3 f dt 3

 + A3 f + A4 f 2 + A5 f 3 . i

(24)

(26)

If the initial condition is given, then Eq. (24) can be solved. General progress with higher order difference terms is summarized in Table 4. Using this approach, estimated master equation can be compared with original time-series data.

Using the proposed method, A1 = −0.1, A2 = A4 = A5 = 0, A3 = 0.2 is confirmed. Additionally, e࣌ 0. The total error is 4.989007e−12. By citing a figure, there is no numerical discrepancy between sin(2t/π ) and the estimated f(t).

428

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432

Table 5 Comparison of the results and validation. Function or differential equation

Theoretical master equation

Estimated master equation

d2 f dt 2

d2 f dt 2

= −0.40528 f +4.7685e−15

d2 f dt 2

− 0.2 f −5.59101e-10 = 0.1 df dt

d2 f dt 2

− 0.2 f = 0.1 df dt

Example-1 f = sin( 2πt ) Period: t = 1–4 t = 1e−3 Data number:30 0 0

= −( π2 )2 f

Example-2 d2 f dt 2

− 0.2 f = 0.1 df dt

Period: t = 0–4 t = 1e−3 Data number:40 0 0

Same as on the left

Example-3 d2 f dt 2

− 0.2 f + 0.2 f 2 = 0.1 df dt

Same as on the left

Period: t = 0 − 4t = 1e − 3 Data number:40 0 0

+0.2 f 2 −5.72618e-09

d2 f dt 2

Example-4 ∞

f= (t ) = ∫ xt−1 e−x dx 0

Period: 0.1–4 t = 1e-3 Data number:3900

None

+(7.61115500799875 − 0.11864 f ) df dt −3.876703t − 2.6139007t 2 2 +16.12076 f − 3.885793 f = 0.1112640

Validation

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432

Example-3 As mentioned above, the error has a clear meaning. The proposed method is applied to a differential equation, which includes the nonlinear term of f as follows:

d2 f df = 0.1 − 0.2 f + 0.2 f 2 . dt dt 2

(27)

The time-series data is given by the numerical results of Eq. (48) for the initial conditions t = 0, f = 0; t = 0.001, f = 0.001 for t = 1e−3. The potential master equation is



ei =

2

d f dt 2





+ A1 i

df dt





+ A2 i

3

d f dt 3



+ A3 f + A4 f 2 + A5 f 3 . i

Using the proposed method, A1 = −0.1, A4 = −0.2, A2 = A5 = 0, A3 = 0.2 is confirmed. Additionally, e = −5.726186e−09. The total error is 5.233166e−10. By citing a figure, there is no numerical discrepancy between sin(2t/π ) and the estimated f(t). Example-4 On this example, f = (t). As well known, the gamma function (t) has already been proved to never satisfy any differential equation (Otto Ludwig [15], Eliakim Hastings [6]). The proposed method is applied to t = 1–4 for t = 1e-3. The estimated difference function is

d2 f df + (7.61115500799875 − 0.11864 f ) dt dt 2 −3.876703t − 2.6139007t 2 + 16.12076 f −3.885793 f

2

Therefore, the total error can be defined for X, Y and Z as

E1 =

N 

e1 ( i )

2

(36)

0

E2 =

N 

e2 ( i )

2

(37)

0

E3 =

N 

2

e3 ( i ) .

(38)

0

(28)

0.1112640 =

429

Using the same procedure for Eqs. (33)–(38), instead of Eqs. (3)–(7), all constants A1 ,A2 ,B1 …are estimated. In the case of simultaneous differential equations, the potential terms, such as XY, XZ and XYZ are prepared. For the genetic algorithm, the total error can be given as a single value of E1 + E2 + E3 . Table 6 shows the comparison between Lorenz equations and the proposed approach for 10,0 0 0 time-series data during t = 0– 1 for t = 1e−4. Using this approach, the estimated coefficients are in good agreement with the original coefficients of the Lorenz equations. Additionally, the coefficients for the extra potential terms are sufficiently small. The calculation time was 32 min by HP Compaq Pro 6300 All in one PC. Fig. 2 shows the validation of the proposed approach. As described in Table 6, some extra potential terms remain. Therefore, the validation difference equations are as follows based on Table 6:

Xi+1 = Xi + t

(−9.99Xi + 10.0Yi ) (1 + 1.71e − 06Yi Zi )

(39)

Yi+1 = Yi + t

(−1.0Xi Zi + 27.9Xi − 0.97Yi ) (1 − 2.34e − 6Xi Zi )

(40)

Zi+1 = Zi + t

(1.00XiYi − 2.68Zi ) . (1 − 7.59E − 5Zi + 3.49e − 6Xi Yi )

(41)

(29)

In Table 5, the red line indicates the numerical results of Eq. (29). The blue dashed line indicates the gamma function. Both lines are in good agreement, which means that even in the case of the gamma function, which never satisfies any differential equation, this approach can estimate an approximate differential equation for a local region. All of calculation of Example-1 to Example-3 was done during less than 1 min. 6. Verification of the algorithm by the application to a simultaneous differential equation The proposed concept can be applied to not only simple differential equations but also simultaneous differential equations. As an example, the following Lorenz equations of X, Y and Z were considered [11]:

dX = −pX + pY dt

(30)

dY = −X Z + rX − Y dt

(31)

dZ = X Y − bZ, dt

(32)

where p, r and b are constants. These equations are well known as typical chaotic phenomena. For this system, the proposed concept is as follows given the time-series data of X, Y and Z:

e1 ( i ) =

dX + A1 X + A2 Y + · · · dt

(33)

e2 ( i ) =

dY + B1 X Z + B2 X + B3 Y + · · · dt

(34)

e3 ( i ) =

dZ + C1 X Y + C2 Z + · · · dt

(35)

Eqs. (39)–(41) are solved during t = 0–30 for t = 1e−4. This time span is 30 time periods compared with the time span of simple cases in Table 3. Fig. 2 compares time series X, Y and Z between the original Lorenz equations in Eqs, (30)–(32) and estimated equations in Eqs. (39)–(41). As well known, each trend of X, Y and Z shows typical chaotic motion. The solid circles signify X, Y and Z based on the Lorenz equations, and the solid lines signify the results of Eqs. (39)–(41) for the same initial conditions: X0 = 0.1, Y0 = 0.15 and Z0 = 0.3. Fig. 3 compares the attractors between the Lorenz equations and Eqs. (39)–(41). The solid circles indicated X, Y and Z based on the Lorenz equations, and the solid lines show the results using the proposed approach. The proposed results simulate chaotic motion. The solid circles signify the attractors using Eqs. (39)–(41). By citing a figure, the proposed approach is in good agreement with the Lorenz equations. By contrast, the Lorenz attractor is very sensitive to the initial conditions and coefficients in the equations. As shown in Table 7 shows the sample values of X, Y and Z using the Lorenz equations and proposed approach at the representative moment. Clearly, with increasing time, the discrepancy between the Lorenz equations vs Eqs. (39)–(41) increases. As mentioned, the Lorenz attractor is very sensitive to the initial conditions of X, Y and Z. Table 8 checks the influence of the initial conditions on the coefficients of the potential terms. The different initial conditions induce different attractors of X, Y and Z. By contrast, the estimated coefficients of the potential terms are in good agreement. This means that the proposed approach is a robust method for estimating the potential master equation, even in the case of chaotic motion.

430

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432

Table 6 Comparison of coefficients between the Lorenz equations and proposed approach. Time-series data

Considered potential terms

X

dX dt

X Y X2 X dX dt Y Z dX dt Y

dY dt

X Y Y2 XZ Y dY dt XZ dY dt Z

dZ dt

Z Z2 XY Z dZ dt XY dZ dt

Coefficients Lorenz equation

Proposal

1 10 −10 − − − 1 −28 1 − 1 − − 1 8/3 = 2.666667 − −1 − −

1 9.991003 −9.999383 0 0 1.7142e−06 1 −27.896400 0.970553 −0.0 0 0629888 0.997455 0 −2.3374e−06 1 2.684706593 −0.0 0 0727547 −0.999108922 −7.58534e−05 3.48495e−06

Fig. 2. Comparison of X, Y and Z between the Lorenz equations and proposed approach.

Fig. 3. Comparison of X, Y and Z between the Lorenz attractor and proposed approach.

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432 Table 7 Comparison of X, Y and Z. Time

Then, the master equation is



Lorenz equation

0 5 10 15 20 25 30

       ∂f ∂f ∂f 2 ∂f C1 + C2 + C3 f + C4 f +··· ∂t i j ∂x ij ∂t i j ∂t i j  m  ∂ f +Cm + · · · = 0. (46) ∂ xm i j

Proposed approach

X

Y

Z

X

Y

Z

0.1 −8.029 −7.060 −2.489 −2.428 4.230 1.277

0.15 −6.956 −4.999 −4.455 0.326 8.018 2.261

0.3 27.832 28.002 10.933 25.150 8.990 16.372

0.1 −8.089 −7.017 −8.529 −0.324 15.601 4.850

0.15 −7.127 −5.231 −14.447 −4.139 14.512 7.578

0.3 27.744 27.606 16.104 25.713 37.802 16.157

Additionally, the error is defined as follows when ( ∂∂ tf )i j is chosen as a reference term:



ei =

Table 8 Comparison of the coefficients for different initial conditions. Time-series data

Considered potential terms

Coefficients

dX dt

X

X Y X2 X dX dt Y Z dX dt dY dt

Y

X Y Y2 XZ Y dY dt XZ dY dt dZ dt

Z

Z Z2 XY Z dZ dt XY dZ dt

X0 = 0.2, Y0 = 0.3, Z0 = 0.1

1 9.991003 −9.999383 0 0 1.7142e−06 1 −27.89640 0.970553 −0.0 0 062988 0.997455 0 −2.3374e−06 1 2.68476 −0.0 0 0727547 −0.999108 −7.58534e−05 3.48495e−06

1 9.991551 −9.999348 0 0 1.610772e−06 1 −27.89692 0.970211 −0.0 0 05830 0 0.997408 0 −2.49382e−06 1 2.68474 −0.0 0 0728772 −0.999111 −7.59098e−05 3.48501e−06

 = ij

fi j − fi( j−1) , t

∂2 f ∂t2

 ij

1 = t



∂f ∂t



∂f − ∂t (i+1) j

  ij

 = ij

∂2 f ∂t2

···



fi j − f (i−1) j , x

ij

= 1t

  ∂f

∂ t i( j+1 ) −

(44)

∂f  ∂t i j

=

 + A1 ij

∂m f ∂ xm



     ∂f ∂f ∂f + A2 f + A3 f 2 ∂x ij ∂t i j ∂t i j 2

+···

(48)

ij

In this paper, a new algorithm to determine the master equation behind time-series data was proposed. This approach was based on a library of potential terms for the master equation and linear sum of potential terms with appropriate coefficients. To optimize the coefficients, the genetic algorithm based on NCGA and the Gauss–Seidel method was applied. Additionally, this approach was applied to linear and nonlinear sample time-series data and a simultaneous differential equation of Lorenz equations. Through these comparisons, the proposed algorithm was verified. In particular, the genetic algorithm reduced the number of related potential terms for the master equation. Additionally, even in the case of chaotic motion, this algorithm worked well. This approach can be expanded to a more general case, which includes functions of time-space series data. This proposal approach can be applied the following regions: 1) Time-series data analysis:

fi( j+1) + fi( j−1) − 2 fi j = , t 2

second order for space x at tj :





8. Conclusions

first order for space x at tj :

∂f ∂x

∂f ∂t

Therefore, if time-space series data (t, x; f(t, x)) are given, the proposed algorithm is applicable as same as the time-series case. Not only is the case one-dimensional but also the f(t, x, y) and f(t, x, y, z) cases are applicable. In fact, one of the authors applied the case of f(t, x, y) to forecast the photovoltaic generation output in a broad geographical region based on grand measurement timeseries data (Shinichi [18]).

(43)



j

+ · · · +Am

(42)







second order for time t at xi :



E=



i

The proposed approach can be applied to a more general case using time-space series. For a function of f(t;x), which depends on time t and space x, the finite difference term is as follows: first order for time t at xi :

∂f ∂t

       ∂f ∂f ∂f ∂f + A1 + A2 f + A3 f 2 +··· ∂t i j ∂x ij ∂t i j ∂t i j  m  ∂ f +Am +··· (47) ∂ xm i j

The total error is

X0 = 0.1, Y0 = 0.15, Z0 = 0.3

7. Expansion to time-space series data



431

f (i+1) j + f (i−1) j −2 fi j t 2

(45)

This is essential application of this approach. As discussed, for given time-series data, this approach can find master equation, and can make a physical model. On the case of timeseries data which have theoretical master equation, this approach can find the coefficients in the equation as reverse problem. 1) Confirmation of functions of system or product: For example, if time-series data of output of wind turbine and mean wind speed at the wind turbine is given with enough time resolution, this approach can estimate the operation curve as the master equation between wind speed and output. Trough the comparison with the specification curve, weather the wind turbine is on the right situation or not can be checked.

432

S.-i. Inage, K. Yamaguchi / Chaos, Solitons and Fractals 114 (2018) 423–432

As the next step, a more rational library should be prepared. To learn the optimal potential terms, this approach should be machine learning or deep learning. Additionally, the accuracy of coefficients heavily depends on the time step between each time-series datum because the finite difference method is applied. The relationship between the time step and coefficient error should be discussed. Furthermore, a discussion on the numerical stability and uniqueness of the solution is required for the proposed approach to be a more universal method. The proposed approach is based on a simple concept. Therefore, it might be a powerful tool. Acknowledgments We thank Maxine Garcia, PhD, from Edanz Group for editing a draft of this manuscript. References [1] Baqais Abdulrahman. Generic algorithm for function approximation: an experimental investigation. Int J Artif Intell Appl 2016;Vol. 7(3) May. [2] Atkinson, Kendall A. An introduction to numerical analysis, New York: John Wiley & Sons; 1989. (2nd ed.)ISBN 978-0-471-50023-0. [3] Ak Bilgesu, Koc Erdem. A guide for genetic algorithm based on parallel machine scheduling and flexible job-shop scheduling. Proc - Soc Behav Sci 2012;62:817–23. [4] Deb K. Multi-objective optimization using evolutionary algorithms. Chichester, UK: Wiley; 2001. [5] Deb K, Agrawal S, Pratab A, Meyarivan T. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGENETIC ALGORITHM-II, Kanpur, India: Indian Institute of Technology; 20 0 0. Kangal report 20 0 0 01. [6] Moore EliakimHastings. Concerning transcendentally transcendental functions. Math. Ann. 1897;48:49–74. doi:10.1007/BF01446334.

[7] Fonseca CM, Fleming PJ. Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. In: Proceedings of the 5th international conference on genetic algorithms, 423; 1993. p. 416. [8] George Cybenko. Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 1989;2:303–14. [9] Goldberg DE. Genetic algorithms. Optimization and machine learning Search. Addison-Wesly; 1989. [10] Hornik Kurt. Maxwell stinchcombe, halbert white, multilayer feedforward networks are universal approximators. Neural Netw 1989;2(5):359–66. [11] Lorenz EN. Deterministic nonperiodic flow. J Atmosp Sci 1963;Vol.20:130–41. [12] Langkvist Martin, Karlsson Lars, Loutfi Amy. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognit Lett 2014;42:11–24. [13] Kumar Manoj, Husian Mohammad, Upreti Naveen, Gupta Deepti. Genetic algorithm: review and application. Int J Inf Technol Knowl Manag 2010;Volume 2(2):451–4 July-December. [14] Michael Schmidt, Hod Lipson. Distilling free-form natural laws from experimental data. Science 2009;324(3) April. [15] Hölder OttoLudwig. Über die Eigenschaft der Gammafunction keiner algebraischen differentialgleichung zu genügen. Math Ann 1887;28:1–13. doi:10.1007/ BF02430507. [16] Schaffer JD. Multiple objective optimization with vector evaluated genetic algorithms. In: Proceedings of 1st international conference on genetic algorithms and their applications, vol. 100; 1985. p. 93. [17] Watanabe Shinya, Hiroyasu Tomoyuki, Miki Mitsunori. Neighborhood cultivation genetic algorithm for multi-objective optimization problems. In: Late breaking papers at the genetic and evolutionary computation conference (GECC; 2002. p. 9–13. [18] Inage Shinichi. Development of advection model for solar-forecasting based on ground data 1st report: development of fundamental model and its verification. Sol Energy 2017;153:414–34. [19] Srinivas N, Deb K. Multi objective optimization using non dominated sorting in genetic algorithms. Evol Comput 1994;vol. 2(3):221. [20] Zitzler E, Thiele L. Multi objective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans Evol Comput 1999;vol. 3(4):257. [21] Zitzler E, Laumanns M, Thiele L. Computer engineering genetic algorithm and communication networks lab (TIK) Technical Report 103. Swiss Federal Institute of Technology (ETH) Zurich; 2001.