Parameter identification and uncertainty quantification in stochastic state space models and its application to texture analysis


Applied Numerical Mathematics 146 (2019) 38–54


B. Pedretscher a,b,∗, B. Kaltenbacher a, O. Pfeiler b

a Alpen-Adria-Universität, Klagenfurt, Austria
b KAI – Kompetenzzentrum Automobil- und Industrieelektronik GmbH, Villach, Austria

Article history: Received 26 June 2019. Accepted 26 June 2019. Available online 5 July 2019.

Keywords: Stochastic state space model; Parameter identification; Uncertainty quantification; Profile likelihood; Adjoint approach; Fokker-Planck equation; Thermo-mechanical fatigue; Texture analysis

Abstract: In this paper, a computational framework is presented which enables efficient and robust parameter identification, as well as uncertainty quantification, in state space models based on Itô stochastic processes. For optimization, a Maximum Likelihood approach based on the system's corresponding Fokker-Planck equation is followed. Gradient information is included by means of an adjoint approach, which is based on the Lagrangian of the optimization problem. To quantify the uncertainty of the Maximum-A-Posteriori estimates of the model parameters, a Bayesian inference approach based on Markov Chain Monte Carlo simulations, as well as profile likelihoods, are implemented and compared in terms of runtime and accuracy. The framework is applied to experimental electron backscatter diffraction data of a fatigued metal film, where the aim is to develop a model which consistently and physically meaningfully captures the metal's microstructural changes caused by external loading. © 2019 IMACS. Published by Elsevier B.V. All rights reserved.

1. Introduction

Dynamical system modeling in finance, biology and mechanics by stochastic process models has become popular and accepted in recent years. Usually, parameter identification in real applications is not straightforward, due to a lack of observability of the quantities of interest. This means that most frequently the states of the dynamical system cannot be observed directly, but mostly (if at all) only indirectly, which complicates investigations of the physical or economic law that triggers the observed effects. In addition, observations are usually prone to random measurement error, causing ill-posedness of the problem.

Nowadays, power semiconductor devices are used in a large and increasing number of applications, including driving assistance, where safe operation and reliable lifetime models are obligatory for obvious reasons. In the automotive industry it has therefore become mandatory to test the robustness of the devices not only at nominal use, but also at overload conditions by accelerated lifetime tests. During such a power cycling stress test, thermo-mechanical loads are applied by electric pulses, causing material degradation and, in the worst case, device failure. To forecast the safe operating area of the devices, reliable lifetime models are indispensable.

∗ Corresponding author at: KAI – Kompetenzzentrum Automobil- und Industrieelektronik GmbH, Villach, Austria. E-mail address: [email protected] (B. Pedretscher).

https://doi.org/10.1016/j.apnum.2019.06.020 0168-9274/© 2019 IMACS. Published by Elsevier B.V. All rights reserved.


Previously used regression models, based on Arrhenius-type equations or the Coffin–Manson law, show prediction and extrapolation weaknesses because they are not able to capture the complex degradation process of the devices [15,41]. Thus, the need for mathematical degradation modeling, based on physical laws that describe the degradation process, became apparent. First investigations in this direction were done in a diploma thesis [39], where a state space model (SSM) based on ordinary differential equations was developed. Detailed analysis showed that the developed model could not reasonably explain the damage process, and thus the lifetime data, due to a lack of understanding of the material response to loading. Therefore, further research on the characteristic behavior of the used materials, as well as an extension of the mathematical models and methods, is necessary.

In this work, we focus on the behavior of a metal film, which is investigated with the help of a test structure that will be introduced and explained later in the text. Different types of measurements of the fatigued metal, provided by scanning acoustic microscopy, scanning electron microscopy or electron backscatter diffraction (EBSD), combined with literature knowledge from material science, allow the conclusion that stochastic processes are present, which need to be considered and included in any reasonable model. Since the entire degradation process of the structure is not yet completely understood nor modeled, a mathematical framework that makes it possible to check and compare different modeling hypotheses in a reasonable amount of computation time is required. Exactly this requirement is the reason why spatial models based on finite elements cannot be used for this purpose, because of their high computational cost.
More specifically, the aim of this work is to provide a framework that, first, enables combining the available indirect observation data with the internal states of the system, and, second, can be efficiently implemented and numerically solved. Therefore, state space modeling combined with adjoint-based gradient computation, as well as profile-likelihood-based uncertainty quantification (UQ), is used. In general, SSMs allow one to link internal (not directly measurable) system behavior with noisy experimental data under consideration of the experimental setting. The experimental setting refers to the input variables, the internal dynamics refer to the states, and the observables to the output. Input, states and output are linked by means of a parametrized state and observation equation (system). The aim is to identify the modeling parameters of this complex system. At this point note that when speaking of identifiability one has to distinguish between structural and practical identifiability; these concepts are discussed in section 2.4.

As internal stochastic dynamics need to be taken into account, as mentioned above, an SSM framework based on Itô stochastic processes needs to be considered. Due to the noise and sparsity of the observations, parameter identification is likely to be an ill-posed inverse problem. To nonetheless ensure reliable and robust parameter estimates, the system's corresponding Fokker-Planck (FP) equation is considered for the optimization task; an approach proposed by Dunker and Hohage in 2014 [22]. A similar approach is used for optimal control in the articles of Annunziato and Borzì [3,4]. Basically, the FP equation is a partial differential equation (PDE) of parabolic type, which describes the evolution of the transition probability density of the states over time. Due to the use of probability densities, non-negativity and mass conservation of the solution of the FP equation have to be ensured at all times.
For that reason, adequate boundary conditions, as well as the so-called Chang-Cooper (CC) discretization scheme, are implemented [18,35]. The density data required for the optimization is computed from the experimental data by means of kernel density estimation (KDE). Under certain smoothness conditions, it can be shown that the kernel density estimate is asymptotically normal, see [26,19,38,27,50]. This result is important to validly define the total error in the observation equation, as not only the random measurement error but also the approximation error of the KDE needs to be included. The considered l2 loss function of the squared distances between the densities given by the FP equation and the densities estimated from the empirical observations is weighted by the variance and centered around the mean of the total observation error. Since in the limit the loss function coincides with the negative log-likelihood of the system, these terms are used synonymously.

Effective optimization requires gradient information, for which reason the problem's Lagrangian is introduced. By combining optimality conditions, the properties of Nemytskii operators and the adjoint system, it is possible to give a representation of the gradient of the loss function with respect to the parameters, which can be computed accurately and efficiently compared to routines based on finite differences or sensitivity analysis. The quantification of the uncertainty of the parameter estimates is mandatory in any application. Here, a profile likelihood approach is followed, as the literature reports it to be most promising in terms of accuracy and numerical efficiency. It is based on repeated local optimizations; further details are given in [16,49,34,42]. The parameter fitting procedure and UQ framework are validated in a synthetic test setting where an Ornstein-Uhlenbeck (OU) state process is considered.
After validation, the framework is applied to EBSD data of the fatigued metal film of interest, where it is investigated whether the loading-induced changes in microstructure can be explained by an FP approach. Understanding the texture evolution is important, as it is the process driving force for local stress concentrations triggering degradation. The goodness-of-fit is illustrated visually and checked by means of the Kullback-Leibler (KL) divergence. Further, the UQ results obtained by profile likelihoods are compared with the credible sets computed by parallel tempering (PT) Markov Chain Monte Carlo (MCMC) simulations. The implementation is done in MATLAB®, with the help of the open source GitHub repository toolboxes MTEX [7,8], which is used for the pre-processing of the data, and AMICI [24] and PESTO [46], which are used for the parameter estimation and UQ.


2. Mathematical & statistical fundamentals

In this chapter the modeling framework is introduced and the mathematics required for efficient parameter estimation and UQ is summarized. For clarity, the chapter is divided into four sections, introducing

- the modeling framework based on stochastic SSMs,
- the CC discretization of the FP equation,
- the gradient computation by an adjoint approach, as well as
- the UQ based on profile likelihoods.
2.1. Problem statement based on stochastic state space modeling

As outlined in the introduction, the state dynamics of interest are captured by a stochastic process

$$X^\theta = \left\{ X_t^\theta : t \in I \subseteq \mathbb{R}_0^+ \right\}, \tag{1}$$

i.e., a family of random variables

$$X_t^\theta : \Omega \to D \subseteq \mathbb{R}^m \tag{2}$$

defined on a probability space $(\Omega, \mathcal{F}, P)$. More precisely, the process in our setting is an Itô process defined by an SDE of the form

$$dX_t = a^\theta(t, X_t)\,dt + b^\theta(t, X_t)\,dW_t, \tag{3}$$

where $a^\theta$ corresponds to the drift, $b^\theta$ to the volatility or diffusion coefficient, and $W$ to a standard Wiener process or Brownian motion [32]. The superscript $\theta$ indicates the model parameters $\theta \in \Theta \subset \mathbb{R}^d$, which have to be identified by means of the available observation data. To do so, instead of using the SDE (3) directly, its corresponding FP equation is used as state equation in the SSM framework [22]. In statistics, the FP equation is also known as the forward Kolmogorov equation. Basically, it is a linear PDE of parabolic type that models the evolution of the transition probability density







$$u(t, x) = u\left(t, x \,\middle|\, X_0 \sim u_0^\theta\right), \tag{4}$$

for $t > 0$ and $x \in D$. The time interval $I$ in our application refers to the finite and closed interval $[0, T]$. By means of

$$A^\theta(t, x) = \left( \tfrac{1}{2}\, \nabla \cdot \left( b^\theta b^{\theta\,T} \right) - a^\theta \right)(t, x) \tag{5}$$

and

$$B^\theta(t, x) = \tfrac{1}{2} \left( b^\theta b^{\theta\,T} \right)(t, x), \tag{6}$$

the FP equation can be expressed as a continuity equation in divergence form as follows:

$$\frac{\partial u}{\partial t}(t, x) = \nabla \cdot \left( B^\theta \nabla u + A^\theta u \right)(t, x) =: -\nabla \cdot J^\theta(t, x), \tag{7}$$

where $J^\theta$ denotes the probability flux, and $\nabla \cdot C = \left( \sum_{j=1}^{m} \frac{\partial}{\partial x_j} C_{ij} \right)_{i=1,\dots,m}^{T}$ refers to a matrix divergence.
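The SDE (3) and the FP equation (7) describe the same dynamics from two viewpoints, which can be sanity-checked by direct path simulation: the histogram of many sample paths approximates the density governed by (7). A minimal Euler-Maruyama sketch in Python (the paper's implementation is in MATLAB; the drift and volatility below are illustrative placeholders, not the application model):

```python
import numpy as np

def euler_maruyama(a, b, x0, t_grid, n_paths, rng):
    """Sample end points of dX_t = a(t, X_t) dt + b(t, X_t) dW_t, eq. (3)."""
    x = np.full(n_paths, x0, dtype=float)
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Wiener increments
        x = x + a(t0, x) * dt + b(t0, x) * dw
    return x

# Illustrative mean-reverting drift and constant volatility
rng = np.random.default_rng(0)
x_T = euler_maruyama(a=lambda t, x: 1.0 * (1.2 - x),
                     b=lambda t, x: 0.3 * np.ones_like(x),
                     x0=0.0, t_grid=np.linspace(0.0, 2.0, 401),
                     n_paths=20000, rng=rng)
print(x_T.mean(), x_T.std())  # a histogram of x_T approximates u(2, .)
```

For this linear drift the exact moments are known, so the empirical mean and standard deviation of `x_T` can be compared against them directly.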

Due to the fact that the PDE in (7) models the evolution of the probability density $u$, its solution needs to stay non-negative for every $t \in I$, and needs to conserve its mass, i.e., the integral $\int_D u(t, x)\,dx$ has to be normed to one at any time $t \in I$. For the latter reason, a no-flux boundary condition

$$J^\theta(t, x) \cdot n_D(x) = 0, \qquad x \in \partial D, \tag{8}$$

ensuring that no probability mass can enter or leave the system, is used. Non-negativity requires further studies; in particular, maximum principles do not apply directly. The case with a scalar, positive diffusion coefficient was investigated by Carillo et al. in 2011 [17]. The authors showed the decay of the relative entropy, which allowed them to conclude that the solution converges to a stationary solution (equilibrium) and stays non-negative. Generalizing to the multidimensional isotropic case, characterized by a diagonal diffusion matrix with equal positive elements, non-negativity is ensured provided that the initial density is non-negative, the diffusion matrix is symmetric and positive definite, which is obviously fulfilled, and diffusion dominates convection at the boundary, which can be formulated mathematically as:


$$A^\theta \cdot n_D > 0 \quad \text{on } \partial D. \tag{9}$$

The general anisotropic case without boundary conditions was investigated in Arnold et al. [6], where the authors also proved that the relative entropy is decreasing with time. It is expected that a combination of the techniques presented in the two mentioned papers ensures non-negativity in the anisotropic case even with no-flux boundary conditions. At this point it is also interesting to mention the results of Droniou and Vázquez [21], who studied elliptic PDEs with Neumann boundary conditions comparable to the no-flux condition. Due to this similarity, the results can be applied to the stationary FP equation. One of their results gives almost everywhere positive or negative, or constant, solutions if certain smoothness conditions on the domain $D$ and the coefficients $A^\theta$ and $B^\theta$ are fulfilled.

For parameter estimation we assume to have indirect, discrete density data $f(\cdot, x_{j^\dagger})$ at prescribed time points $0 < t_1 < \cdots < t_{n^\dagger} = T$, at the spatial positions $x_{j^\dagger} \in D$, $j^\dagger \in \{0, \dots, N^\dagger\}$. This leads to the following strong hybrid formulation of the SSM on the $k^\dagger$-th sub time interval:

$$\Sigma_{k^\dagger}(\theta, u) : \begin{cases} \dfrac{\partial u}{\partial t}(t, x) = -\nabla \cdot J^\theta(t, x) & \text{in } (t_{k^\dagger-1}, t_{k^\dagger}) \times D, \\[4pt] J^\theta(t, x) \cdot n_D = 0 & \text{on } (t_{k^\dagger-1}, t_{k^\dagger}) \times \partial D, \\[4pt] f_{k^\dagger} = h^\theta_{k^\dagger}\left( u(t_{k^\dagger}, \cdot) \right) + \eta_{k^\dagger}, \end{cases} \qquad k^\dagger \in \{1, \dots, n^\dagger\}, \tag{10}$$

which in variational form reads as follows [29]:

$$\Sigma^{\mathrm{weak}}_{k^\dagger}(\theta, u) : \begin{cases} \displaystyle \int_D \left( \frac{\partial u}{\partial t}(t, x)\, v(x) - J^\theta(t, x) \cdot \nabla v(x) \right) dx = 0 & \forall v \in H_0^1(D), \\[6pt] f_{k^\dagger} = h^\theta_{k^\dagger}\left( u(t_{k^\dagger}, \cdot) \right) + \eta_{k^\dagger}, & k^\dagger \in \{1, \dots, n^\dagger\}. \end{cases} \tag{11}$$

The operators $h^\theta_{k^\dagger}$ in the observation equation are maps of the form

$$h^\theta_{k^\dagger} : C(D; \mathbb{R}) \to \mathbb{R}^{l}. \tag{12}$$

The observation equation accounts for the total error $\eta_{k^\dagger} \in \mathbb{R}^{l}$, which combines the random measurement error and the KDE approximation error. In our application, we consider independent Gaussian measurement noise for every time instance $t_{k^\dagger}$ of available observations. Further, the density is estimated from empirical data by KDE, whose pointwise approximation error is asymptotically normal, see e.g. [19,38,50]. Thus, the vector of total noise $\eta_{k^\dagger}$ additionally contains a systematic component $\mu_{k^\dagger}$. Summarized, we consider independent (with respect to the time instances $t_{k^\dagger}$) normal noise of the form

$$\eta_{k^\dagger} \sim \mathcal{N}\left( \mu_{k^\dagger}, \left( \sigma^\theta_{k^\dagger} \right)^2 I \right) \quad \text{for } k^\dagger \in \{1, \dots, n^\dagger\}.$$

As $l^2$ loss function, we use the asymptotic negative log-likelihood function for the problem (10) or (11), respectively, which due to independence and normality of the error reads as follows:

$$\psi(f) = \frac{1}{2} \sum_{k^\dagger=1}^{n^\dagger} \left[ l \, \log\left( 2\pi \left( \sigma^\theta_{k^\dagger} \right)^2 \right) + \left\| \frac{f_{k^\dagger} - h_{k^\dagger}\left( u(t_{k^\dagger}, \cdot) \right) - \mu_{k^\dagger}}{\sigma^\theta_{k^\dagger}} \right\|_2^2 \right]. \tag{13}$$
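Equation (13) transcribes directly into code. The helper below is an illustrative sketch (array names and shapes are assumptions, not the authors' implementation), with one scalar noise level per time instance and $l$ observation positions:

```python
import numpy as np

def neg_log_likelihood(f, h_u, mu, sigma):
    """Asymptotic negative log-likelihood, eq. (13).

    f, h_u, mu : arrays of shape (n_obs, l) -- data, model output h(u(t_k)),
                 and systematic KDE bias per time instance.
    sigma      : array of shape (n_obs,)   -- noise level per time instance.
    """
    n_obs, l = f.shape
    res = (f - h_u - mu) / sigma[:, None]          # variance-weighted residuals
    return 0.5 * np.sum(l * np.log(2.0 * np.pi * sigma**2)
                        + np.sum(res**2, axis=1))

# Perfect fit: only the normalization term 0.5*n_obs*l*log(2*pi) survives
f = np.zeros((3, 4))
print(neg_log_likelihood(f, f, np.zeros_like(f), np.ones(3)))  # 6*log(2*pi) ~ 11.03
```

With unit noise and a perfect fit the residual norm vanishes, which is a convenient quick check of the implementation.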

Reliable numerical simulation further requires well-posedness of the SSM in finite and infinite time. Standard analysis [23, Theorems 2, 3, 4, Section 7.1.2] guarantees well-posedness of (7) in finite time if the coefficients $A^\theta$ and $B^\theta$ are in $L^\infty((0,T) \times D)$, $u_0^\theta \in L^2(D)$, and $B^\theta$ is uniformly symmetric and positive definite on $(0,T) \times D$. Regarding the parameter-to-drift and parameter-to-volatility map, respectively, this means that well-posedness is guaranteed if

$$\theta \mapsto a^\theta \in C\left( \mathbb{R}^d; L^\infty((0,T) \times D) \right), \quad \text{and} \quad \theta \mapsto b^\theta \in C\left( \mathbb{R}^d; \left\{ b^\theta \in L^\infty\left( (0,T); W^{1,\infty}(D) \right) : \left\| \left( b^\theta \right)^{-1} \right\|_{L^\infty((0,T) \times D)} \leq M \right\} \right),$$

for a constant $M > 0$. Global-in-time well-posedness requires further that condition (9) is fulfilled, as mentioned in [29]. Finally, parameter identification amounts to solving the following optimization problem:

$$\min_{\theta, u} \; \mathcal{J}(\theta, u; f) \quad \text{s.t.} \quad \Sigma(\theta, u; f), \tag{14}$$

where the operator $\mathcal{J}$ is defined by the loss function $\psi$ of equation (13), and $\Sigma$ refers to the SSM on $I$, which is defined by the sub-SSMs (10) or (11), respectively.


2.2. A mass and non-negativity conserving discretization of the Fokker-Planck equation due to Chang and Cooper

For simplicity of notation, we skip the superscript $\theta$ in this subsection and consider a cubic domain $D$. To numerically solve the FP equation, a semi-discretization, based on a spatial discretization of the PDE by a method-of-lines approach, is implemented. For this purpose, an equidistant spatial discretization of the bounded hypercube $D = x_0 + L \cdot Q_m$, where $Q_m$ is the $m$-dimensional unit hypercube $[0,1]^m$ and $x_0$ is an arbitrary corner of $D$, with mesh size $\Delta x = L/N$, is considered. This gives the discretized hypercube





$$D_{\Delta x} = \left\{ x_j \in \mathbb{R}^m : x_j = x_0 + j\,\Delta x,\ j \in \mathbb{N}^m \right\} \cap D, \tag{15}$$

where $j = (j_1, \dots, j_m)$ is a multi-index that indicates the spatial position. Time is discretized based on the steps $t_k = k\,\Delta t$, for $k = 1, \dots, n$. Thus, the solution $u$ of the FP equation is represented by the discrete set $u_j^k = u_j(t_k) = u(t_k, x_j)$. Since standard finite difference schemes do not guarantee mass and non-negativity conservation of the solution of the FP equation, the CC scheme, which conserves these two intrinsic properties, is introduced and summarized in this section. More precisely, in [18] Chang and Cooper demonstrated that commonly used difference schemes (even implicit schemes) are not necessarily particle (mass) conserving.¹ Indeed, they showed that the change in the number of particles per time step is proportional to the squared mesh size $\Delta x^2$. Thus, even if very fine grids are applied, the numerical error can become severe because of accumulation over time until a quasi-equilibrium of the PDE is reached [18]. To overcome this issue, the authors discretized the FP equation according to





$$D_{\Delta t} u_j(t_{k+1}) \approx \frac{1}{\Delta t}\left( u_j(t_{k+1}) - u_j(t_k) \right) = \sum_{i=1}^{m} \frac{1}{\Delta x} \left( J^{i,k}_{j+1_i/2} - J^{i,k}_{j-1_i/2} \right), \tag{16}$$

where $J^{i,k}_{j+1_i/2}$ represents the generalized flux at $x_{j+1_i/2}$ in direction $i$, expressed in implicitly known values of $u$. Due to the implicit formulation in (16), the term $u^{k+1}_{j+1_i/2}$, which is required to compute $J^{i,k}_{j+1_i/2}$, is not known directly, for which reason it was approximated by

$$u^{k+1}_{j+1_i/2} \approx \frac{1}{2}\left( u^{k+1}_{j+1_i} + u^{k+1}_j \right) \tag{17}$$

in a first approach. The spatial derivative $\frac{\partial u}{\partial x}$ in the $i$-th direction at $x_{j+1_i/2}$ in $J^{i,k}_{j+1_i/2}$ was approximated according to

$$\left. \frac{\partial u^{k+1}}{\partial x} \right|_{x_{j+1_i/2}} \approx \frac{1}{\Delta x}\left( u^{k+1}_{j+1_i} - u^{k+1}_j \right). \tag{18}$$

The problem that arises at this point lies in the fact that by applying (17), negative eigenvalues can emerge unless a very restrictive condition on the mesh $\Delta x$ is imposed. Due to runtime issues, such a condition is not suitable for real data simulations. Chang and Cooper solved the problem by means of a convex combination of values of $u$ at adjacent spatial positions and defined $u^{k+1}_{j+1_i/2}$ as

$$u^{k+1}_{j+1_i/2} = \left( 1 - \delta^{i,k}_j \right) u^{k+1}_{j+1_i} + \delta^{i,k}_j\, u^{k+1}_j, \tag{19}$$

for values $\delta^{i,k}_j \in [0, 1]$ to be chosen as below. This approach is consistent with the difference scheme.² Due to this definition, $u_{j+1_i/2}$ does not automatically refer to the mid-point between $u_{j+1_i}$ and $u_j$, as is the case with the common central difference scheme, but denotes any value between these two adjacent points. Thus, depending on the choice of $\delta^{i,k}_j$, the scheme switches between backward, centered and forward differencing. Based on the introduced convex combination in (19), the generalized flux in the $i$-th direction at position $x_{j+1_i/2}$ can be defined implicitly as

$$J^{i,k}_{j+1_i/2} = \left[ \left( 1 - \delta^{i,k}_j \right) A^{i,k}_{j+1_i/2} + \frac{1}{\Delta x} B^{i,k}_{j+1_i/2} \right] u^{k+1}_{j+1_i} - \left[ \frac{1}{\Delta x} B^{i,k}_{j+1_i/2} - \delta^{i,k}_j A^{i,k}_{j+1_i/2} \right] u^{k+1}_j, \tag{20}$$

where the value of $\delta^{i,k}_j$ is chosen in such a way that the scheme restores equilibria and a non-negative spectrum. To preserve the equilibrium condition, remember that the flux is zero in an equilibrium, which gives that

¹ Mass and particle conservation are used as synonyms in this context.
² Originally, the scheme was developed for FP type equations with positive coefficients $A^\theta$ and $B^\theta$, leading $\delta$ to be any value in the interval $[0, 1/2]$. In later works it was proved that it is sufficient to use a positive diffusion matrix $B^\theta$ and a Lipschitz continuous convection term $A^\theta$, which increased the interval for $\delta$ up to $[0, 1]$, see [36,25].

$$\frac{u^{k+1}_{j+1_i}}{u^{k+1}_j} = \frac{\frac{1}{\Delta x} B^{i,k}_{j+1_i/2} - \delta^{i,k}_j A^{i,k}_{j+1_i/2}}{\left( 1 - \delta^{i,k}_j \right) A^{i,k}_{j+1_i/2} + \frac{1}{\Delta x} B^{i,k}_{j+1_i/2}}. \tag{21}$$

The ratio in (21) should coincide with the exact ratio

$$\frac{u^{k+1}_{j+1_i}}{u^{k+1}_j} = \exp\left( -\int_{x_j}^{x_{j+1_i}} \frac{A^i(t_{k+1}, x)}{B^i(t_{k+1}, x)}\, dx_i \right), \tag{22}$$

which can be approximated by

$$\frac{u^{k+1}_{j+1_i}}{u^{k+1}_j} \approx \exp\left( -\frac{A^{i,k}_{j+1_i/2}}{B^{i,k}_{j+1_i/2}}\, \Delta x \right), \tag{23}$$

for sufficiently small values of $\Delta x$. By combining (21) and (23), $\delta^{i,k}_j$ can be approximated by

$$\delta^{i,k}_j = \frac{1}{w^{i,k}_j} - \frac{1}{\exp\left( w^{i,k}_j \right) - 1}, \tag{24}$$

with

$$w^{i,k}_j = \Delta x\, \frac{A^{i,k}_{j+1_i/2}}{B^{i,k}_{j+1_i/2}}. \tag{25}$$

To obtain the range for $\delta^{i,k}_j$, the limits $w^{i,k}_j \to \pm\infty$ have to be evaluated, which gives that $\delta^{i,k}_j \in [0, 1]$, see [25]. Non-negativity of the solution of the FP equation can be ensured for a Lipschitz continuous convection term $A$, see equation (7):

$$|A(t, x + \Delta x) - A(t, x)| \leq \gamma\, \Delta x, \tag{26}$$

where $\gamma > 0$ is the Lipschitz constant. In fact, it was proved that if $\Delta t < \frac{1}{\gamma}$, the CC discretization scheme preserves the non-negativity of the solution of the FP equation [36, Theorems 3.3, 3.6]. Finally, to preserve the mass, a no-flux boundary condition as in equation (8) is used. This condition is implemented by introducing the ghost variables $x_{-1}$ and $x_{N+1}$, and fulfilling

$$J^k_{-1_i/2} = J^k_{N+1_i/2} = 0, \tag{27}$$

for $i = 1, \dots, m$. To further guarantee particle conservation for any backward time differencing scheme used to approximate the partial derivative $\frac{\partial u}{\partial t}$ on the left-hand side of equation (16), note that Lemma 3.1 in the work of Mohammadi and Borzì [36], which treats the special case of the backward Euler method, can be generalized by induction with respect to time. Additionally, in [36] stability of the schemes is proven and local error estimates are established, leading to optimal convergence rates of first and second order in time for CC difference schemes. Summarizing, the implicit CC scheme takes the form

$$\begin{aligned} D_{\Delta t} u_j(t_{k+1}) &\approx \frac{1}{\Delta t}\left( u^{k+1}_j - u^k_j \right) \\ &= \sum_{i=1}^m \frac{1}{\Delta x} \Bigg[ \left( \left( 1 - \delta^{i,k}_j \right) A^{i,k}_{j+1_i/2} + \frac{1}{\Delta x} B^{i,k}_{j+1_i/2} \right) u^{k+1}_{j+1_i} \\ &\qquad - \left( \frac{1}{\Delta x}\left( B^{i,k}_{j+1_i/2} + B^{i,k}_{j-1_i/2} \right) + \left( 1 - \delta^{i,k}_{j-1_i} \right) A^{i,k}_{j-1_i/2} - \delta^{i,k}_j A^{i,k}_{j+1_i/2} \right) u^{k+1}_j \\ &\qquad + \left( \frac{1}{\Delta x} B^{i,k}_{j-1_i/2} - \delta^{i,k}_{j-1_i} A^{i,k}_{j-1_i/2} \right) u^{k+1}_{j-1_i} \Bigg], \end{aligned} \tag{28}$$

where (16), (19), (20), (24), (25) and (27) are combined. Due to the numerical no-flux boundary condition in (27), the corresponding summands in (28) vanish. Finally, by replacing $t_k$ in (28) by an arbitrary time instance $t \in (0, T)$, and assuming $A$ and $B$ to be time independent, a system of ordinary differential equations (ODEs)

$$\dot{u} = \mathbf{A} u, \tag{29}$$


where $\mathbf{A} = \mathbf{A}(t, x, \delta, \theta)$ represents the discretization matrix in (28),³ remains to be solved. Note that depending on the dimension $m$ of the space $D$, one obtains a band matrix consisting of the main diagonal and the two secondary diagonals in one dimension, and a block band matrix with two further block diagonals in two dimensions. The scheme can be generalized to non-equidistant spatial grids, as shown in [14]. Simulations show that, depending on the problem, such an adaptive mesh can improve the numerical results significantly.

2.3. Adjoint based gradient computation with discrete time observations

Efficient parameter identification techniques require gradient information to compute descent directions for the optimization. To compute the gradient of the negative log-likelihood $\psi$ with respect to the unknown parameters $\theta$ at low computational cost, an adjoint approach, based on the Lagrangian of the system, is followed. At this point note that we use the semi-discretization (29) of the state equation and additionally introduce a space discretization for the observations by means of the function

$$h^{N^\dagger}_{k^\dagger} : \mathbb{R}^{N^\dagger+1} \to \mathbb{R}^{l}, \tag{30}$$

which satisfies

$$h^{N^\dagger}_{k^\dagger}(v) \approx h_{k^\dagger}(I v), \qquad \forall v \in \mathbb{R}^{N^\dagger+1}, \tag{31}$$

where $I : \mathbb{R}^{N^\dagger+1} \to C(D; \mathbb{R})$ is a piecewise multilinear interpolant such that $I v(x_{j^\dagger}) = v_{j^\dagger}$, $\forall j^\dagger \in \{0, \dots, N^\dagger\}$. We assume that $h^{N^\dagger}_{k^\dagger}$ is differentiable with Jacobian $\left( h^{N^\dagger}_{k^\dagger} \right)' \in L^\infty\left( \mathbb{R}^{N^\dagger+1} \right)$.
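To make the discretization matrix $\mathbf{A}$ of (29) concrete, the following one-dimensional sketch builds it from the CC flux (20), with the weights (24)-(25) computed inline and the no-flux condition (27), and checks mass conservation and positivity under implicit Euler stepping. The coefficient choices are hypothetical, for illustration only:

```python
import numpy as np

def cc_matrix(a_face, b_face, dx):
    """1-D Chang-Cooper discretization matrix A of u' = A u, eqs. (24)-(29).

    a_face, b_face: convection A and diffusion B at the interior cell faces
    x_{j+1/2}; the boundary fluxes are set to zero (no-flux condition (27)).
    """
    w = dx * a_face / b_face                               # eq. (25)
    delta = np.where(np.abs(w) < 1e-8, 0.5,
                     1.0 / w - 1.0 / np.expm1(w))          # eq. (24)
    # generalized face flux (20): F_{j+1/2} = c_plus*u_{j+1} - c_minus*u_j
    c_plus = (1.0 - delta) * a_face + b_face / dx
    c_minus = b_face / dx - delta * a_face
    n = a_face.size + 1                                    # number of grid nodes
    A = np.zeros((n, n))
    for j in range(n - 1):  # du_j/dt gains +F_{j+1/2}/dx, du_{j+1}/dt gains -F_{j+1/2}/dx
        A[j, j + 1] += c_plus[j] / dx
        A[j, j] -= c_minus[j] / dx
        A[j + 1, j + 1] -= c_plus[j] / dx
        A[j + 1, j] += c_minus[j] / dx
    return A

# Mean-reverting toy coefficients on [-1, 1]; implicit Euler preserves the mass
dx = 0.1
faces = np.arange(-1.0 + dx / 2, 1.0, dx)                  # 20 interior faces
A = cc_matrix(a_face=faces, b_face=0.05 * np.ones_like(faces), dx=dx)
u = np.ones(21) / 21.0                                     # normalized density
for _ in range(200):
    u = np.linalg.solve(np.eye(21) - 0.01 * A, u)
print(u.sum())  # stays 1 up to round-off; u also stays non-negative
```

Because every face flux enters one node with a plus and its neighbor with a minus sign, the columns of `A` sum to zero; this is exactly the discrete counterpart of mass conservation.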

Under consideration of the above introduced discretizations of the state and observation equation in case of empirical observations, the Lagrangian of the discretized problem can be formulated as

$$L(u, \theta, p) = \psi^{N^\dagger}(f) + \sum_{k^\dagger=1}^{n^\dagger} \int_{t_{k^\dagger-1}}^{t_{k^\dagger}} p_{k^\dagger}^T \left( \dot{u} - \mathbf{A} u \right) dt. \tag{32}$$



To set up the adjoint problem, the optimality condition $L_u s = 0$ for $s \in C^1\left( [0, T]; \mathbb{R}^{N^\dagger+1} \right)$ with $s(0) = 0$, integration by parts with respect to time on the sub-intervals, and the differentiability properties of linear and Nemytskii operators [2] are combined. Therewith it can be concluded that

$$\begin{aligned} L_u(u, \theta, p)\, s &= \left( \psi^{N^\dagger} \right)_u s + \sum_{k^\dagger=1}^{n^\dagger} \int_{t_{k^\dagger-1}}^{t_{k^\dagger}} p_{k^\dagger}^T \left( \dot{s} - \mathbf{A} s \right) dt \\ &= \left( \psi^{N^\dagger} \right)_u s + \sum_{k^\dagger=1}^{n^\dagger} \left[ -\int_{t_{k^\dagger-1}}^{t_{k^\dagger}} \left( \dot{p}_{k^\dagger} + \mathbf{A}^T p_{k^\dagger} \right)^T s \, dt + p_{k^\dagger}^T(t_{k^\dagger})\, s(t_{k^\dagger}) - \lim_{t \to t_{k^\dagger-1}^+} p_{k^\dagger}(t)^T s(t_{k^\dagger-1}) \right] \\ &\overset{!}{=} 0, \end{aligned} \tag{33}$$

where $t_0 := 0$ and





$$\left( \psi^{N^\dagger} \right)_u s = -\sum_{k^\dagger=1}^{n^\dagger} \left( \sigma^\theta_{k^\dagger} \right)^{-2} \left( f_{k^\dagger} - h^{N^\dagger}_{k^\dagger}\left( u(t_{k^\dagger}) \right) \right)^T \left( h^{N^\dagger}_{k^\dagger} \right)'\left( u(t_{k^\dagger}) \right) s(t_{k^\dagger}). \tag{34}$$



Based on (33), the adjoint system for the discretized problem on each sub-interval $(t_{k^\dagger-1}, t_{k^\dagger})$ is an ODE of the form

$$\dot{p}_{k^\dagger}(t) = -\mathbf{A}^T p_{k^\dagger}(t), \tag{35}$$

with final conditions

$$p_{k^\dagger}(t_{k^\dagger}) = \begin{cases} \rho_{n^\dagger} & k^\dagger = n^\dagger, \\[4pt] \displaystyle \lim_{t \to t_{k^\dagger}^+} p_{k^\dagger+1}(t) - \rho_{k^\dagger} & k^\dagger = 1, \dots, n^\dagger - 1, \end{cases} \tag{36}$$

³ In case $A$ and $B$ depend on time in a smooth way, (28) gives a valid approximation of (29).

where $\rho_{k^\dagger} = \left( \sigma^\theta_{k^\dagger} \right)^{-2} \left( f_{k^\dagger} - h^{N^\dagger}_{k^\dagger}\left( u(t_{k^\dagger}) \right) \right)^T \left( h^{N^\dagger}_{k^\dagger} \right)'\left( u(t_{k^\dagger}) \right)$. The Lagrange multipliers $p_{k^\dagger}$ are referred to as the adjoint states of the boundary value problem. If the FP equation is not discretized by the CC scheme but the PDE setting is used directly, the adjoint system is a PDE of the form

$$\begin{aligned} -\frac{\partial p_{k^\dagger}}{\partial t} - \frac{1}{2}\left( b^\theta b^{\theta\,T} \right)^T : \nabla^2 p_{k^\dagger} - a^\theta \nabla p_{k^\dagger} &= 0, && \text{in } (t_{k^\dagger-1}, t_{k^\dagger}) \times D, \\ \left( \frac{1}{2}\left( b^\theta b^{\theta\,T} \right) \nabla p_{k^\dagger} \right) \cdot n_D &= 0, && \text{on } (t_{k^\dagger-1}, t_{k^\dagger}) \times \partial D, \end{aligned} \tag{37}$$

where “:” denotes the Frobenius inner product. This equation represents the Kolmogorov backward equation combined with no-flux boundary conditions [29,32]. However, in both settings the gradient of the negative log-likelihood with respect to the unknown parameters is obtained by applying the chain rule to the Lagrangian. Therewith, the gradient can be expressed as

$$\nabla_\theta \psi = L_u u_\theta + L_p p_\theta + L_\theta, \tag{38}$$

where the first two summands vanish since $u$ and $p$ are the solutions of the FP and adjoint equation, respectively. Thus, it only remains to compute the direct derivative of the Lagrangian with respect to the parameters, giving

$$\nabla_\theta \psi = L_\theta. \tag{39}$$

The time-consuming computation of the gradient via sensitivity analysis, where sensitivities of the states and the output would have to be computed successively through further PDE systems, is avoided by the adjoint approach, which makes it highly advantageous compared to straightforward gradient computation methods [48].

2.4. Profile likelihood based uncertainty quantification

Since predictions, apart from the prediction error, are affected by the estimation error of the parameter estimates, which is unavoidable when calibrating a model with experimental data, it is mandatory to infer how well model parameters are determined by experimental data. Therefore, identifiability and parameter confidence have to be studied in detail. The literature offers different approaches to compute confidence intervals, where one has to distinguish between asymptotic and likelihood-based confidence intervals. In a general SSM setting, likelihood-based confidence intervals are used to quantify the confidence of the parameter estimates. Assuming a Gaussian likelihood, the likelihood ratio statistic is approximately $\chi^2$ distributed, giving the following definition of the hyperelliptic confidence region in the parameter space $\mathbb{R}^d$ for a $1-\alpha$ confidence level:

$$CR_{1-\alpha}(\hat{\theta}) = \left\{ \theta \in \Theta \;\middle|\; 2\left( \psi(\theta) - \psi(\hat{\theta}) \right) \leq \Delta_{1-\alpha} \right\}, \tag{40}$$

where $\Delta_{1-\alpha}$ denotes the $1-\alpha$ quantile of the chi-squared distribution $\chi^2_{df}$ with $df = \#\theta = d$, see [49,34,42]. To compute confidence intervals of single parameters, a profile likelihood approach is followed. This concept is based on exploring the parameter space by re-optimizing the likelihood in such a way that it is reduced to a function of a single parameter, which is achieved by treating the remaining parameters as nuisance parameters. Mathematically speaking, the profile likelihood of the $j$-th parameter at value $\beta$ is given as

$$PL_j(\beta) = \min_{\theta \in \Theta_j(\beta)} \psi(\theta), \tag{41}$$

where $\Theta_j(\beta) = \left\{ \theta \in \Theta : \theta_j = \beta \right\}$, see [49]. By means of the results of (41) for different values of $\beta$, likelihood-based confidence intervals are obtained as

$$CI_{1-\alpha}(\hat{\theta}_j) = \left\{ \beta \;\middle|\; 2\left( PL_j(\beta) - \psi(\hat{\theta}) \right) \leq \Delta_{1-\alpha} \right\} \tag{42}$$

$$\qquad\qquad\;\; = \left\{ \beta \;\middle|\; \exists\, \theta \in CR_{1-\alpha}(\hat{\theta}) \wedge \theta_j = \beta \right\}, \tag{43}$$

where $\Delta_{1-\alpha}$ is the $1-\alpha$ quantile of $\chi^2_{df}$ with $df = 1$. If the confidence interval of a parameter is finite, the parameter is said to be practically identifiable. On the contrary, unbounded confidence intervals indicate non-identifiability of a parameter, resulting from data insufficiency (practical non-identifiability) or modeling/parameter redundancy (structural non-identifiability) [47,42]. At this point it is important to mention that in our application we only investigate whether or not a parameter is practically identifiable. This means we investigate whether the credible intervals are finite for sufficiently small noise. The concept of practical identifiability differs from the usual concept of identifiability, where the lengths of the confidence intervals additionally need to converge to zero for “infinitely” large samples.
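For a model that is linear in the nuisance parameter, the re-optimization in (41) has a closed form, so the whole profile-and-threshold recipe (41)-(42) fits in a few lines. The exponential-decay model below is a synthetic stand-in, not the paper's application:

```python
import numpy as np

def profile_theta2(beta, t, y, sigma):
    """Profile likelihood (41): for fixed theta2 = beta, the nuisance theta1
    of the toy model y = theta1*exp(-theta2*t) has a closed-form optimum."""
    g = np.exp(-beta * t)
    theta1 = (y @ g) / (g @ g)                    # least-squares re-optimization
    return 0.5 * np.sum(((y - theta1 * g) / sigma) ** 2)

t = np.linspace(0.0, 3.0, 30)
rng = np.random.default_rng(1)
y = 2.0 * np.exp(-0.7 * t) + 0.05 * rng.normal(size=t.size)

betas = np.linspace(0.3, 1.3, 201)
pl = np.array([profile_theta2(b, t, y, 0.05) for b in betas])
# threshold with the chi-squared quantile, eq. (42): Delta_0.95 ~ 3.84 for df = 1
inside = 2.0 * (pl - pl.min()) <= 3.84
ci = (betas[inside][0], betas[inside][-1])
print(ci)   # a finite interval around the true value 0.7
```

Since the interval comes out finite and well inside the scanned range, the decay-rate parameter is practically identifiable in the sense discussed above; an interval hitting the grid boundary would instead flag practical non-identifiability.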


3. Validation and application

In order to validate the introduced parameter identification framework, an exemplary setting based on the Ornstein-Uhlenbeck process is considered. This standard example is used as errors in the implementation would become visible immediately, and the drawbacks of MCMC simulations can be highlighted in this simple setting. Then, the approach is applied to measurement data of a polycrystalline metal film, which reveals the change in microstructural properties due to applied stress.

3.1. Validation of the parameter identification framework

For the test setting, a process of the form

dX_t = ζ(μ − X_t) dt + σ dW_t,   (44)

with the parameters (ζ, μ, σ), where it is of importance that ζ > 0, σ > 0 and μ ∈ ℝ, is considered as the state SDE. In the literature, such a process is referred to as an OU process. Generally speaking, it describes the movement of particles under Brownian motion subject to friction [45]. For ease of exposition, an identity observation operator h ≡ Id is considered in this setting, leading to the SSM

(ζ, μ, σ):  { ∂u/∂t (t, x) = ∇ · ( (1/2) σ² ∇u − ζ(μ − x) u )(t, x),
              f_{k†}(x_{j†}) = u_{k†}(x_{j†}) + η_{k†,j†},   (45)

with no-flux boundary conditions, where x_{j†} ∈ D_x, 0 < t_1 < … < t_{n†} = T. The noise η_{k†,j†} accounts for the bias and variation of the KDE. To obtain the density data f_{k†}(x_{j†}), trajectories of the state SDE representing the empirical data are required first. Usually one would (if possible) apply e.g. a Lamperti transformation, such that the volatility becomes state independent, and then do path sampling in the transformed setting by means of the Milstein scheme with sufficiently small time increments. However, since the OU process is explicitly solvable, namely

X_t = μ + (X_0 − μ) e^{−ζt} + σ ∫_0^t e^{−ζ(t−s)} dW_s
    = μ + (X_0 − μ) e^{−ζt} + σ e^{−ζt} ∫_0^t e^{ζs} dW_s,   (46)

the analytic solution is used for sampling instead. For this purpose, the stochastic integral in equation (46), i.e.,

∫_0^t e^{ζs} dW_s,   (47)

is transformed by Doob's transformation [1,20]. To this end, the Dambis–Dubins–Schwarz theorem [43] is applied, which guarantees the existence of a Wiener process fulfilling

∫_0^t e^{ζs} dW_s = W_{τ(t)},   t ≥ 0,   (48)

where τ(t) = (e^{2ζt} − 1)/(2ζ). Combining (46) with (48) gives the analytic solution of the OU process:

X_t = X_0 e^{−ζt} + μ (1 − e^{−ζt}) + σ e^{−ζt} W_{τ(t)}.   (49)

Based on this representation, ten thousand trajectories of the process within the time interval [0, 2], with the parameters (ζ, μ, σ) = (1, 1.2, 0.3) and a uniform initial distribution X_0 ∼ U[0, 0.1], were sampled. The gained sample paths are visualized in Fig. 1, where the horizontal dotted line indicates the mean or long-time equilibrium of the process. In particular, one can show for the large-time behavior of the process that E[lim_{t→∞} X_t] = μ and Var[lim_{t→∞} X_t] = σ²/(2ζ), resulting in converging sample paths that fluctuate around the long-time mean. Thus, if the domain D for the FP equation is chosen large enough, in the sense that the sample paths neither touch nor cross the boundary ∂D, the introduced fitting procedure can be applied, as the no-flux boundary condition does not falsely affect the evolution of the density.
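A sketch of this exact sampling scheme in Python (NumPy), using the time change τ(t) to generate the Brownian motion W_{τ(t)} on a grid, could look as follows; the parameter values are those of the test setting above.

```python
import numpy as np

def sample_ou_exact(t_grid, zeta, mu, sigma, x0, rng):
    """Exact OU paths via (49): X_t = X_0 e^{-zeta t} + mu (1 - e^{-zeta t})
    + sigma e^{-zeta t} W_{tau(t)}, with time change tau(t) = (e^{2 zeta t} - 1)/(2 zeta)."""
    tau = (np.exp(2.0 * zeta * t_grid) - 1.0) / (2.0 * zeta)
    # Brownian increments over the transformed time grid tau(t_k)
    dW = rng.normal(0.0, np.sqrt(np.diff(tau)), size=(len(x0), len(t_grid) - 1))
    W_tau = np.concatenate([np.zeros((len(x0), 1)), np.cumsum(dW, axis=1)], axis=1)
    decay = np.exp(-zeta * t_grid)
    return x0[:, None] * decay + mu * (1.0 - decay) + sigma * decay * W_tau

rng = np.random.default_rng(42)
t = np.linspace(0.0, 2.0, 101)
x0 = rng.uniform(0.0, 0.1, 10_000)              # X_0 ~ U[0, 0.1]
paths = sample_ou_exact(t, zeta=1.0, mu=1.2, sigma=0.3, x0=x0, rng=rng)
```

The empirical moments of paths[:, -1] can be checked against the analytic mean E[X_0] e^{−ζT} + μ(1 − e^{−ζT}) and variance Var[X_0] e^{−2ζT} + σ²(1 − e^{−2ζT})/(2ζ), which makes this a convenient sanity check before fitting.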


Fig. 1. Trajectories of the OU process with true parameters.

Fig. 2. KDE estimated FP density and its temporal evolution.

For parameter estimation, density data f_{k†}(x_{j†}) out of the sampled trajectories at the prescribed time instances

t_{k†} = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0)   (50)

was computed by KDE, where a uniform kernel was used. The estimated density at the time instances (50) is shown in Fig. 2. At this point note that estimated data with an absolute value smaller than 1E−30 is neglected and set to NaN; thus, for parameter identification only the data highlighted by the black lines at the spatial grid points in Fig. 2 is used. The optimization routine based on the CC discretization and the adjoint approach is implemented in MATLAB®, where in particular the toolboxes AMICI and PESTO are used for the parameter identification and UQ tasks [24,46]. It is worth mentioning that the model needs to be implemented symbolically, as the toolboxes build on the symbolic toolbox of MATLAB®. Latin hypercube sampling within the lower [0.1, 0.5, 0.1] and upper [2.0, 2.9, 0.5] bounds for the parameters was performed to obtain random start parameter vectors for the optimization. Note that the ℓ2 loss function is treated as the negative log-likelihood function, as this is true in the asymptotic sense. Further, prior information for the parameters, namely uniform distributions within the lower and upper bounds, is applied. For these two reasons, the estimate is biased and called the Maximum-A-Posteriori (MAP) estimate. Thus, by following Bayes' law, the global MAP estimate (ζ̂, μ̂, σ̂) = (1.0006, 1.2025, 0.2970) was reached. A comparison with the true parameters gives relative errors of (0.06%, 0.21%, 1%). Goodness-of-fit, as well as identifiability, are checked with the help of the posterior probability profiles of the parameters, which are shown in Fig. 3. These profiles are obtained by following the profile likelihood approach described in section 2.4. Due to the usage of the prior knowledge, posterior profiles instead of only likelihood profiles could be computed. For the computation of the profiles, an optimization based method, highlighted by the red lines in Fig. 3, is applied.
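A uniform-kernel KDE of the kind used to generate the density data can be written in a few lines; the normal samples below are illustrative stand-ins for the trajectory values at one time instance, and the bandwidth is a hypothetical choice.

```python
import numpy as np

def uniform_kde(samples, grid, h):
    """Kernel density estimate with the uniform (box) kernel K(u) = 1/2 on |u| <= 1:
    f_hat(x) = 1/(n h) * sum_i K((x - X_i) / h)."""
    u = (grid[:, None] - samples[None, :]) / h
    return 0.5 * (np.abs(u) <= 1.0).mean(axis=1) / h

rng = np.random.default_rng(1)
samples = rng.normal(1.2, 0.3, 10_000)      # illustrative sample values at one t_k
grid = np.linspace(0.0, 2.4, 241)
f_hat = uniform_kde(samples, grid, h=0.05)
mass = f_hat.sum() * (grid[1] - grid[0])    # Riemann sum, should be close to 1
```

The bandwidth h controls the bias-variance trade-off of the estimate; grid values below the 1E−30 threshold would then be masked to NaN as described above.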
For further details on these methods, the reader is referred to [16]. A close look at the profiles indicates that each of the parameters is practically identifiable. This can be concluded by visual inspection alone, since every statistical confidence level gives finite confidence intervals for each of the three parameters. To highlight the efficiency of the above introduced parameter estimation technique based on optimization and adjoint based gradient information, the results were compared with estimates obtained by following a Bayesian inference approach based on MCMC simulations. More precisely, a parallel tempering (PT) algorithm was used to simulate Markov chains for the three unknown parameters. Generally speaking, PT rescales the negative log-likelihood by introducing an artificial temperature level. Therewith, multi-modality of any cost functional can be handled reliably, as the chains do not stop in local minima


Fig. 3. Posterior probability profiles of the parameters of the OU process. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

Fig. 4. Results of PT MCMC simulations.

if an appropriate rescaling is implemented [9,10,44]. Here, six different temperature levels were considered to generate the Markov chains shown in Fig. 4. In total, each of the chains consists of one hundred thousand samples. A detailed look at the figure indicates that each of the three chains converged to a stationary posterior probability after the Burn-In phase, i.e., the (initial) time the chain is run until it is close to stationarity, has passed. When comparing the confidence intervals for the parameter estimates gained by sampling with those gained by computing the posterior probability profiles, one can recognize differences, even when the Burn-In numbers are neglected. For this, compare Figs. 5 and 6, where (80%, 90%, 95%) confidence intervals with and without Burn-In are plotted. For explanation, the longest intervals (lightest color) refer to the 95% credible intervals, the middle ones (medium color intensity) to the 90% credible intervals, and the shortest (darkest color) to the 80% credible intervals. The abbreviations on the y-axis indicate whether the intervals are gained by MCMC (Bay., Sampl.) or based on the profile likelihood approach (Profile L.) and the corresponding posterior probability profiles, respectively, which are shown in Fig. 3. At this point note the


Fig. 5. Confidence intervals due to MCMC (Burn-In included) and profile likelihood.

Fig. 6. Confidence intervals due to MCMC (no Burn-In included) and profile likelihood.

different scaling of the horizontal axes in Figs. 5 and 6 and the fact that the intervals computed by Bayesian sampling are excessively extended to the left. One can see that even for the simple example with an OU process, confidence intervals gained by MCMC simulations are not as accurate as those generated via posterior probability profiles; a fact confirmed by the literature. In fact, it can be seen that the credible intervals based on sampling and profile likelihoods differ. Due to the fact that in practical applications the Burn-In period is mostly defined heuristically, i.e., it is the user who declares this threshold by visual inspection of the Markov chain, the profile likelihood based confidence intervals are more reliable and independent of user actions. Further, it has to be mentioned that the computation time for the MCMC simulation is much higher than the time needed for the optimization routine that includes the gradient information, highlighting the usability of the introduced approach in practical applications. However, in case only a few trajectories of a process are observed, KDE is very inaccurate, for which reason it is advisable to perform MCMC simulations directly on the path observations. Here, the Euler approximation, which is used to sample the trajectories of the SDE, is the basis for the likelihood construction [28]. To finally visualize the difference between the density estimated from the empirical data (due to the true θ) and the fitted density (due to the MAP θ̂), the Kullback-Leibler (KL) divergence is shown in Fig. 7. Note that due to the exact inclusion of the initial density as initial condition in the optimization setting, the KL distance is zero at the beginning.
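The tempering mechanism used in the MCMC comparison above can be sketched in a few lines. The bimodal toy target, step sizes, and temperature ladder below are illustrative choices for demonstration, not the settings of the paper; the cold (T = 1) chain targets the untempered posterior, while hot chains explore the rescaled, flattened cost landscape and exchange states with their neighbors.

```python
import numpy as np

def neg_log_post(x):
    """Toy bimodal target: mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    return -np.log(np.exp(-0.5 * ((x + 3) / 0.5) ** 2)
                   + np.exp(-0.5 * ((x - 3) / 0.5) ** 2) + 1e-300)

def parallel_tempering(n_iter, temps, rng, step=1.0):
    """Random-walk Metropolis at each temperature T_k, targeting the rescaled
    density exp(-neg_log_post(x) / T_k); adjacent levels exchange states."""
    x = np.zeros(len(temps))
    cold = np.empty(n_iter)
    for it in range(n_iter):
        for k, T in enumerate(temps):          # within-temperature MH moves
            prop = x[k] + step * np.sqrt(T) * rng.normal()
            if np.log(rng.random()) < (neg_log_post(x[k]) - neg_log_post(prop)) / T:
                x[k] = prop
        k = int(rng.integers(len(temps) - 1))  # propose one adjacent swap
        log_r = (neg_log_post(x[k]) - neg_log_post(x[k + 1])) \
                * (1.0 / temps[k] - 1.0 / temps[k + 1])
        if np.log(rng.random()) < log_r:
            x[k], x[k + 1] = x[k + 1], x[k]
        cold[it] = x[0]                        # the T = 1 chain samples the posterior
    return cold

rng = np.random.default_rng(0)
cold = parallel_tempering(20_000, temps=(1.0, 2.0, 4.0, 8.0, 16.0, 32.0), rng=rng)
```

Without the hot levels and swaps, a single T = 1 random-walk chain started in one mode would rarely cross the deep barrier between the modes, which is exactly the multi-modality problem PT addresses.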


Fig. 7. Temporal evolution of KL distance between prescribed and fitted FP density.
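A KL distance of the kind plotted in Fig. 7 can be evaluated from two densities discretized on a common grid; in the small sketch below, the two normal densities are illustrative stand-ins for the prescribed and fitted FP densities.

```python
import numpy as np

def kl_divergence(p, q, dx):
    """Discretized Kullback-Leibler divergence D_KL(p || q) = int p log(p/q) dx;
    grid points with p = 0 contribute zero by convention."""
    mask = p > 0.0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)            # N(0, 1)
q = np.exp(-0.5 * (x - 0.5) ** 2) / np.sqrt(2.0 * np.pi)    # N(0.5, 1)
d = kl_divergence(p, q, dx)   # analytic value (mu1 - mu2)^2 / 2 = 0.125
```

Note that D_KL is non-negative and zero exactly when both densities coincide, but it is not symmetric in its arguments, so the order of the prescribed and fitted density matters.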

3.2. Application to EBSD data of a fatigued metal to explain texture evolution

EBSD is a prominent investigation tool in material science to study the microstructure and the characteristic properties of materials. As mentioned previously, the focus of this work lies on the characterization of a metal film, which shows textural changes because of applied thermo-mechanical stress. More precisely, the metal of interest is a polycrystalline solid that consists of a large number of single crystals, termed grains. During loading, these grains change their integrity, which leads to local stress concentrations at the grain boundaries, triggering the initialization of intergranular voids. The voids grow, coalesce and finally accumulate to cracks, which in the worst case spread through the film. The mentioned observations stem from experiments on a test structure; further details can be found in [37]. To reliably model and predict the degradation of the metallization, it is first important to reasonably model the changes in texture that are caused by loading, as they are the driving force of the degradation process. A useful concept in this context is the so-called Grain Boundary Character Distribution (GBCD) – a distribution of relative lengths (2D) or facets (3D) with a given misorientation. In the literature, this statistic has been studied heavily by Barmak et al. and Kinderlehrer et al., who modeled coarsening cellular networks based on curvature driven growth of grain boundaries [12,11,30,31]. In a simplified setting, the authors were able to prove a FP type evolution for the GBCD, which is defined as



ρ(α, t) = relative length of arc of misorientation α ∈ D at time t,   normalized such that ∫_D ρ dα = 1.   (51)

In some more detail, they started from a local dissipation principle, which was upscaled from the local level by means of the GBCD. To account for critical events like grain deletion, they added an entropy term and included the kinetic energy, which finally gave them a FP equation for the GBCD:

μ ∂ρ/∂t = ∂/∂α ( λ ∂ρ/∂α + (dζ/dα) ρ ),   in D, 0 < t < ∞,   (52)

where μ > 0 is a constant, ζ = ζ(α) ≥ 0 is the surface energy potential of the grain boundaries, which depends on the misorientation α between adjacent grains, λ denotes the diffusion parameter of the metal, and D is the space of misorientation angles [11]. The underlying Langevin equation of (52) shows high similarity to the SDE model of Barmak et al. [12], who in a first approach studied the relation between coarsening and random critical events by an SDE of the form:

dα_t = −(dγ/dα) dt + σ dW_t,   (53)

with energy potential γ and volatility σ. However, the authors found that this SDE model is too naive, as it only considers the misorientation angle. This was the reason why they started modeling based on curvature driven growth and the GBCD, which incorporates the relative lengths of the grain boundaries. In the following, we follow the framework introduced in the sections above, i.e., we consider the FP equation (52) as state equation. For the definition of an appropriate observation equation it has to be remarked that the GBCD can directly be computed out of the experimental observation data, giving an identity observation operator in the SSM framework (10). To reliably set up the model, the misorientation space D and the surface energy ζ have to be defined carefully. Due to crystal symmetry reasons, the interval for the misorientation angles α is the closed interval [0, 62.8°], see [33], and the surface energy is assumed to be of periodic nature. Mathematically speaking, in a first approach the surface energy is modeled by a periodic potential of the form ζ(α) = ε sin(fα), where ε is a modeling parameter and the frequency f is fixed to f = 180/125.6. Summarized, we end up with the following FP state equation:


Fig. 8. The empirical GBCD and its evolution over stress cycles.

Fig. 9. Confidence intervals based on MCMC simulations and profile likelihoods.

μ ∂ρ/∂t = ∂/∂α ( λ ∂ρ/∂α + ε f cos(fα) ρ ),   in D, 0 < t < ∞,   (54)

where D = [0, 62.8°], μ and ε are the free modeling parameters that need to be inferred from the measurement data, and the diffusion parameter λ is fixed by means of the material's characteristic activation energy and the applied stress condition in the experiment. The empirical data is allocated from a time series of EBSD scans of the fatigued metal film. The post-processing of the data, including the removal of inaccurate measurements and the extraction of the GBCD, is implemented in MATLAB®, where the open source toolbox MTEX is used [7,8]. The evolution of the experimental GBCD is shown in Fig. 8, where the misorientations are given in rad. On the basis of the data shown in Fig. 8, the Maximum-A-Posteriori estimates for the model parameters are computed, giving a maximum KL value of 0.0608, which indicates a reasonable fit. To quantify the uncertainty of the estimates, the parameters' corresponding confidence intervals are computed, where the results of MCMC simulations are compared against profile likelihood results, as shown in Figs. 9 and 10. Visual inspection of Fig. 9 enables the conclusion that both modeling parameters are practically identifiable, as each of the CIs is finite. Further, the difference between the CIs out of MCMC simulations and the credible sets obtained from profile likelihoods is eye-catching. The mentioned difference is illustrated in more detail in Fig. 10, where the histograms corresponding to the MCMC results are plotted against the posterior probability profiles for the parameters. The convoluted results of the multistart optimization are highlighted by the red circles (MS – conv.). The results presented in this section are important to better understand the process of void formation. Interesting at this point is the decrease of probability mass around 60°, which was already reported in the work of Bigl [13], who showed a decrease of twin boundaries. Although the heating method of his experiments was different, the results are comparable to the observations in this experiment. A detailed discussion of the model can be found in [40].
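A mass-conserving discretization of the state equation (54) with no-flux boundaries can be sketched as an explicit finite-volume scheme. This is a simplified stand-in for the Chang-Cooper (CC) discretization actually used in the framework, and the parameter values below are hypothetical, chosen only to illustrate the structure of the solver.

```python
import numpy as np

def fp_step(rho, dalpha, dt, mu, lam, eps, f, alpha_faces):
    """One explicit finite-volume step for (54):
    mu * d(rho)/dt = d/dalpha( lam * d(rho)/dalpha + eps*f*cos(f*alpha) * rho ),
    with the flux set to zero at both domain ends (no-flux boundary condition)."""
    drift = eps * f * np.cos(f * alpha_faces)            # dzeta/dalpha at cell faces
    flux = lam * np.diff(rho) / dalpha + drift * 0.5 * (rho[1:] + rho[:-1])
    flux = np.concatenate([[0.0], flux, [0.0]])          # no-flux boundaries
    return rho + (dt / mu) * np.diff(flux) / dalpha

mu, lam, eps, f = 1.0, 0.05, 0.1, 180.0 / 125.6          # hypothetical values
alpha = np.linspace(0.0, 62.8 * np.pi / 180.0, 201)      # D = [0, 62.8 deg] in rad
dalpha = alpha[1] - alpha[0]
alpha_faces = 0.5 * (alpha[1:] + alpha[:-1])
rho = np.ones_like(alpha) / (alpha[-1] - alpha[0])       # start from a uniform GBCD
mass0 = rho.sum() * dalpha
dt = 0.2 * mu * dalpha ** 2 / lam                        # conservative explicit step
for _ in range(2000):
    rho = fp_step(rho, dalpha, dt, mu, lam, eps, f, alpha_faces)
mass = rho.sum() * dalpha
```

Because the update telescopes over the face fluxes and the boundary fluxes vanish, the total mass Σ_i ρ_i Δα is conserved, mirroring the conservation property of the no-flux FP problem that the fitting procedure relies on.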


Fig. 10. Detailed comparison between MCMC results and posterior probability profiles.

4. Conclusion

In this paper, parameter identification and UQ for SSMs based on Itô stochastic processes were performed via the system's corresponding FP equation. For efficient numerical optimization, the gradient of the cost functional with respect to the unknown parameters was computed by an adjoint approach. The UQ results based on profile likelihoods were compared with the results reached by PT MCMC simulations. The algorithmic optimization approach based on gradient information shows benefits over the commonly applied MCMC simulations for two reasons: first, the confidence intervals obtained by computing the likelihood or posterior probability profiles, respectively, are more accurate; second, the computation time for the optimization is much lower than the expensive time required for the MCMC simulations. Further, the presented fitting results for the GBCD of a fatigued Cu film are important as they help in understanding the response of the metal to thermo-mechanical fatigue, which is crucial for future degradation modeling. Future work incorporates the extension of the fitting procedure to Lévy processes, as the void initialization has to be modeled by a Poisson process, whose intensity is assumed to depend on the change in local misorientation density. For this purpose, the work of Annunziato and Gottschalk [5], who present an optimal control approach for the calibration of a model based on a Lévy process, will be considered.

Acknowledgement

The authors want to thank Ralf Hielscher (TU Chemnitz, Germany) and Paul Stapor (Helmholtz Zentrum Munich) for their assistance, as well as Michael Nelhiebel (KAI GmbH, Austria), Max Hermann Poech (Fraunhofer-Institut für Siliziumtechnologie (ISIT), Germany), Florian Peter Pribahsnik (KAI GmbH, Austria), Johannes Zechner (KAI GmbH, Austria) and Thomas Svensson (Ingenjörsstatistik, Borås, Sweden) for fruitful discussions. This work was funded by the Austrian Research Promotion Agency (FFG, Project No. 863947).
Moreover, the authors wish to thank all reviewers for their very careful reading of the manuscript and their detailed reports with valuable comments and suggestions that have led to an improved version of the paper.

References

[1] L. Alili, P. Patie, J.L. Pedersen, Representations of the first hitting time density of an Ornstein-Uhlenbeck process, Stoch. Models 21 (2005) 967–980, https://doi.org/10.1080/15326340500294702.
[2] H. Amann, J. Escher, Analysis II, Grundstudium Mathematik, Birkhäuser, Basel, 2008.
[3] M. Annunziato, A. Borzì, Optimal control of probability density functions of stochastic processes, Math. Model. Anal. 15 (2010) 393–407, https://doi.org/10.3846/1392-6292.2010.15.393-407.
[4] M. Annunziato, A. Borzì, A Fokker–Planck control framework for multidimensional stochastic processes, J. Comput. Appl. Math. 237 (2013) 487–507, https://doi.org/10.1016/j.cam.2012.06.019.
[5] M. Annunziato, H. Gottschalk, Calibration of Lévy processes using optimal control of Kolmogorov equations with periodic boundary conditions, Math. Model. Anal. 23 (2018), https://doi.org/10.3846/mma.2018.024.
[6] A. Arnold, E. Carlen, Q. Ju, Large-time behavior of non-symmetric Fokker-Planck type equations, Commun. Stoch. Anal. (2008) 153–175.
[7] F. Bachmann, R. Hielscher, H. Schaeben, Texture analysis with MTEX – free and open source software toolbox, in: Texture and Anisotropy of Polycrystals III, Trans Tech Publications Ltd., 2010, pp. 63–68.
[8] F. Bachmann, R. Hielscher, H. Schaeben, Grain detection from 2D and 3D EBSD data – specification of the MTEX algorithm, Ultramicroscopy 111 (2011) 1720–1733, https://doi.org/10.1016/j.ultramic.2011.08.002.
[9] B. Ballnus, S. Hug, K. Hatz, L. Görlitz, J. Hasenauer, F.J. Theis, Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems, BMC Syst. Biol. 11 (2017) 63, https://doi.org/10.1186/s12918-017-0433-1.
[10] B. Ballnus, S. Schaper, F.J. Theis, J. Hasenauer, Bayesian parameter estimation for biochemical reaction networks using region-based adaptive parallel tempering, Bioinformatics 34 (2018) i494–i501, https://doi.org/10.1093/bioinformatics/bty229.
[11] K. Barmak, E. Eggeling, M. Emelianenko, Y. Epshteyn, D. Kinderlehrer, R. Sharp, S. Ta'asan, Critical events, entropy, and the grain boundary character distribution, Phys. Rev. B 83 (2011) 134117, https://doi.org/10.1103/PhysRevB.83.134117.
[12] K. Barmak, M. Emelianenko, D. Golovaty, D. Kinderlehrer, S. Ta'asan, Towards a statistical theory of texture evolution in polycrystals, SIAM J. Sci. Comput. 30 (2008) 3150–3169, https://doi.org/10.1137/070692352.
[13] S. Bigl, S. Wurster, M.J. Cordill, D. Kiener, Advanced characterisation of thermo-mechanical fatigue mechanisms of different copper film systems for wafer metallizations, Thin Solid Films 612 (2016) 153–164, https://doi.org/10.1016/j.tsf.2016.05.044.
[14] J.P.S. Bizarro, P. Rodrigues, Numerical Fokker–Planck calculations in nonuniform grids, Phys. Plasmas 8 (2001) 1903–1909, https://doi.org/10.1063/1.1348331.
[15] O. Bluder, Prediction of Smart Power Device Lifetime Based on Bayesian Modeling, Ph.D. thesis, Alpen-Adria-Universität, Klagenfurt, 2011.
[16] R. Boiger, J. Hasenauer, S. Hroß, B. Kaltenbacher, Integration based profile likelihood calculation for PDE constrained parameter estimation problems, Inverse Probl. 32 (2016) 125009.
[17] J.A. Carrillo, S. Cordier, S. Mancini, A decision-making Fokker-Planck model in computational neuroscience, J. Math. Biol. 63 (2011) 801–830, https://doi.org/10.1007/s00285-010-0391-3.
[18] J.S. Chang, G. Cooper, A practical difference scheme for Fokker-Planck equations, J. Comput. Phys. 6 (1970) 1–16.
[19] Y.C. Chen, A tutorial on kernel density estimation and recent advances, Biostat. Epidemiol. 1 (2017) 161–187, https://doi.org/10.1080/24709360.2017.1396742.
[20] J.L. Doob, The Brownian movement and stochastic equations, Ann. Math. 43 (1942) 351–369, https://doi.org/10.2307/1968873.
[21] J. Droniou, J.L. Vázquez, Noncoercive convection-diffusion elliptic problems with Neumann boundary conditions, Calc. Var. Partial Differ. Equ. 34 (2009) 413–434, https://doi.org/10.1007/s00526-008-0189-y.
[22] F. Dunker, T. Hohage, On parameter identification in stochastic differential equations by penalized maximum likelihood, Inverse Probl. 30 (2014) 095001, https://doi.org/10.1088/0266-5611/30/9/095001.
[23] L.C. Evans, Partial Differential Equations, Graduate Studies in Mathematics, vol. 19, American Mathematical Society, Providence, RI, 1998.
[24] F. Fröhlich, J. Hasenauer, D. Weindl, P. Stapor, AMICI – Advanced Matlab Interface to CVODES and IDAS, GitHub repository, http://icb-dcm.github.io/AMICI/.
[25] B. Gaviraghi, M. Annunziato, A. Borzì, Analysis of splitting methods for solving a partial integro-differential Fokker–Planck equation, Appl. Math. Comput. 294 (2017) 1–17.
[26] A. Gramacki, Nonparametric Kernel Density Estimation and Its Computational Aspects, Studies in Big Data, Springer International Publishing, 2017.
[27] W. Härdle, M. Müller, S. Sperlich, A. Werwatz, Nonparametric and Semiparametric Models, Springer Series in Statistics, Springer, Berlin, Heidelberg, 2012.
[28] S. Hermann, Bayesian Prediction for Stochastic Process Models in Reliability, Ph.D. thesis, Technische Universität Dortmund, 2016.
[29] B. Kaltenbacher, B. Pedretscher, Parameter estimation in SDEs via the Fokker–Planck equation: likelihood function and adjoint based gradient computation, J. Math. Anal. Appl. 465 (2018) 872–884, https://doi.org/10.1016/j.jmaa.2018.05.048.
[30] D. Kinderlehrer, C. Liu, Evolution of grain boundaries, Math. Models Methods Appl. Sci. 11 (2001) 713–729, https://doi.org/10.1142/S0218202501001069.
[31] D. Kinderlehrer, I. Livshits, S. Ta'asan, A variational approach to modeling and simulation of grain growth, SIAM J. Sci. Comput. 28 (2006) 1694–1715, https://doi.org/10.1137/030601971.
[32] P.E. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations, Applications of Mathematics – Stochastic Modelling and Applied Probability, vol. 23, Springer, Berlin, Heidelberg, 2010.
[33] J.K. Mackenzie, Second paper on statistics associated with the random disorientation of cubes, Biometrika 45 (1958) 229–240, https://doi.org/10.1093/biomet/45.1-2.229.
[34] W.Q. Meeker, L.A. Escobar, Teaching about approximate confidence regions based on maximum likelihood estimation, Am. Stat. 49 (1995) 48–53, https://doi.org/10.1080/00031305.1995.10476112.
[35] M. Mohammadi, Analysis of Discretization Schemes for Fokker-Planck Equations and Related Optimality Systems, Ph.D. thesis, Julius-Maximilians-Universität, Würzburg, 2015.
[36] M. Mohammadi, A. Borzì, Analysis of the Chang–Cooper discretization scheme for a class of Fokker–Planck equations, J. Numer. Math. 23 (2015) 271–288, https://doi.org/10.1515/jnma-2015-0018.
[37] S. Moser, G. Zernatto, M. Kleinbichler, M. Nelhiebel, J. Zechner, M.J. Cordill, R. Pippan, A novel setup for in-situ monitoring of thermo-mechanically cycled thin film metallizations, JOM J. Mineral. Metal. Mater. Soc. (2019), submitted for publication.
[38] E. Parzen, On estimation of a probability density function and mode, Ann. Math. Stat. 33 (1962) 1065–1076, https://doi.org/10.1214/aoms/1177704472.
[39] B. Pedretscher, An Inverse Problem to Simulate the Degradation Process of Semiconductor Devices, Master's thesis, Alpen-Adria-Universität, Klagenfurt, 2015.
[40] B. Pedretscher, M. Nelhiebel, B. Kaltenbacher, Applying a statistical model to the observed texture evolution of fatigued metal films, J. Mech. Phys. Solids (2019), submitted for publication.
[41] K. Plankensteiner, Application of Bayesian Networks and Stochastic Models to Predict Smart Power Switch Lifetime, Ph.D. thesis, Alpen-Adria-Universität, Klagenfurt, 2015.
[42] A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingmüller, J. Timmer, Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood, Bioinformatics 25 (2009) 1923–1929, https://doi.org/10.1093/bioinformatics/btp358.
[43] D. Revuz, M. Yor, Continuous Martingales and Brownian Motion, Grundlehren der mathematischen Wissenschaften, Springer, Berlin, Heidelberg, 2013.
[44] M. Sambridge, A parallel tempering algorithm for probabilistic sampling and multimodal optimization, Geophys. J. Int. 196 (2014) 357–374, https://doi.org/10.1093/gji/ggt342.
[45] R. Serfozo, Basics of Applied Stochastic Processes, Probability and Its Applications, Springer, Berlin, Heidelberg, 2009.
[46] P. Stapor, D. Weindl, B. Ballnus, S. Hug, C. Loos, A. Fiedler, S. Krause, S. Hroß, F. Fröhlich, J. Hasenauer, PESTO: Parameter EStimation TOolbox, Bioinformatics 34 (2018) 705–707, https://doi.org/10.1093/bioinformatics/btx676.
[47] I. Swameye, T.G. Müller, J. Timmer, O. Sandra, U. Klingmüller, Identification of nucleocytoplasmic cycling as a remote sensor in cellular signaling by data-based modeling, Proc. Natl. Acad. Sci. 100 (2003) 1028–1033, https://doi.org/10.1073/pnas.0237333100.
[48] F. Tröltzsch, Optimal Control of Partial Differential Equations: Theory, Methods, and Applications, Graduate Studies in Mathematics, vol. 112, American Mathematical Society, 2010.
[49] D.J. Venzon, S.H. Moolgavkar, A method for computing profile-likelihood-based confidence intervals, J. R. Stat. Soc., Ser. C, Appl. Stat. 37 (1988) 87–94, https://doi.org/10.2307/2347496.
[50] L. Wasserman, All of Nonparametric Statistics, Springer Texts in Statistics, Springer, New York, 2006.