Identification of diffusive transport by means of an incremental approach

Identification of diffusive transport by means of an incremental approach

Computers and Chemical Engineering 28 (2004) 585–595 Identification of diffusive transport by means of an incremental approach André Bardow, Wolfgang...

188KB Sizes 0 Downloads 35 Views

Computers and Chemical Engineering 28 (2004) 585–595

Identification of diffusive transport by means of an incremental approach André Bardow, Wolfgang Marquardt∗ Lehrstuhl für Prozesstechnik, RWTH Aachen, Turmstraße 46, D-52064 Aachen, Germany

Abstract Predictive models for diffusion in liquids contain still large uncertainties due to a lack of experimental data. Modern measurement techniques offer high-resolution concentration data but data analysis tools are usually designed for scarce data. A new incremental approach to model identification is therefore introduced and applied to the estimation of diffusion coefficients. The identification problem is split here in a sequence of inverse problems following the steps of model development. Thereby, model uncertainty and computational cost are minimized. The concentration dependence of binary diffusion coefficients can now be efficiently established from a single experiment. Furthermore, the robust estimation of the ternary diffusion matrix from a single data set is demonstrated. © 2004 Elsevier Ltd. All rights reserved. Keywords: Diffusion; Inverse problem; Parameter estimation; Error-in-variables regression; Smoothing spline; Akaike information criterion

1. Introduction Diffusion coefficients in liquid mixtures can still not be predicted with sufficient accuracy (Shapiro, 2003). Experiments are time-consuming and require rather specialized equipment. Furthermore, the use of data stemming from diffusion experiments is rather conservative. Only constant values of the diffusion coefficients are estimated from experimental data even for binary diffusion (e.g. Wild, 2003). At least two experiments are required for the measurement of a constant diffusion matrix in ternary diffusion (e.g. van de Ven-Lucassen & Kerkhof, 1999). Recently, a new measurement method using Raman spectroscopy has been introduced which reduces experimental effort significantly (Bardow, Marquardt, Göke, Koss, & Lucas, 2003). But in order to extract most information from the data, modern high-resolution measurement techniques such as optical spectroscopy have to be complemented by appropriate data analysis tools for model identification. In this work, diffusion coefficients will be estimated using a general stepwise approach to model identification which will be called incremental identification subsequently. It ∗ Corresponding author. Tel.: +49-241-806712; fax: +49-241-8888326. E-mail address: [email protected] (W. Marquardt).

0098-1354/$ – see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.compchemeng.2004.02.003

follows the steps common to model development (Marquardt, 1995) in the identification procedure. Thereby, model uncertainty is minimized and computational cost of the estimation problems is reduced. A flexible framework is provided which easily incorporates cases where model candidates are available and cases where unstructured models have to be employed due to a lack of a priori knowledge. By the application of this efficient and robust model identification method to the analysis of diffusion experiments it will be shown that the new high-resolution measurement techniques allow the direct estimation of concentration dependent binary diffusion coefficients and of a constant ternary diffusion matrix from a single experiment. In the common approach to model identification candidate models based on prior knowledge or physical insight are proposed and inserted into the balance equations to determine fluxes and the contained transport coefficients. Measured data is then used to estimate unknown parameters in the fully specified models. The approach is here therefore called simultaneous identification. Finally, the most suitable model is chosen using model discrimination techniques (see e.g. Asprey & Macchietto, 2000). The simultaneous method suffers from several drawbacks: • the estimation problem may be biased due to the direct use of an unsuitable model structure in the balance equation (e.g. Walter & Pronzato, 1997);

586

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

• computational cost is high since the whole estimation problem is solved for every possible candidate (e.g. Asprey & Macchietto, 2000). This may be prohibitive for the distributed systems considered here; • initialization and convergence may be difficult since the whole problem is solved in one step (e.g. Bates & Watts, 1988); • the methods are usually tailored for point measurements with low-resolution (e.g. Schittkowski, 2002) whereas modern experimental techniques offer data with high spatial and temporal resolution (e.g. Bardow et al., 2003). A stepwise model identification approach has been intuitively followed by Mahoney, Doherty, and Ramkrishna (2002) in the context of population balances. Here, the growth rate in crystallization processes was directly estimated from the balance equations and then correlated with state variables. Stepwise identification methods are also common in reaction kinetics (Froment & Bischoff, 1990). These approaches are derived from physical insight and intuition and often employ ad hoc solution methods. The proposed incremental approach provides here a rigorous unifying framework. In the next section, a model of the diffusion experiment will be formulated using the common approach to model development. The incremental approach to model identification will be derived from it. The method will then be applied and evaluated for the estimation of the concentration dependence of the binary diffusion coefficient. Cases of known and unknown model candidates will be highlighted. It is then shown that the approach carries over to the more complex case of multicomponent diffusion where the main benefit is expected. Finally, conclusions are given.

2. Incremental modelling and model identification As the basis for the identification of diffusion coefficients a model of the used experiment is required. The experimental set-up considered here has been recently reported by Bardow et al. (2003). In their experiments, two liquids of different composition are layered on top of each other in a diffusion cell, Fig. 1. One-dimensional mixing occurs by diffusion. ˜ j , tk ) of all components are measured The mole fractions x(z by Raman spectroscopy during the diffusive concentration equilibration process at discrete times tk and locations zj .1 Similar diffusion cells are used, e.g. in interferometry (Tyrell & Harris, 1984). Modeling of the experiment and model identification will address isothermal binary diffusion experiments first.

1 For ease of presentation the measurements x(z , t ), j = 1, J, k = 1, j k K will be indexed by a single index x(zi , ti ), i = 1, K ×J in the following. The total number of measurements distributed in space and time will be denoted n = K × J.

Fig. 1. Diffusion cell used in Raman experiments (setup rotated by 90◦ ).

2.1. Model development The key idea of incremental model identification is to exploit the model structure by following the steps of model development. Model development is commonly performed in incremental steps starting with the formulation of the balance equations. The mole balance for component 1 can be given as Model B :

∂c1 ∂N1 =− , ∂t ∂z

z ∈ (0, L), t > t0

(1)

where c1 denotes the concentration and N1 the molar flux of component 1. The boundary is usually impermeable and the initial conditions are known, i.e. N1 (0, t) = N1 (L, t) = 0,

and

c1 (z, t0 ) = c0 (z)

(2)

In a next step, a constitutive model for the molar flux N1 has to be given. By defining a suitable reference velocity the flux can be decomposed into diffusive and convective contributions as N1 = c1 uV + J1V

(3)

J1V is the diffusive flux of component 1 relative to the volume average velocity uV . Other reference frames for diffusion are possible (cf. Taylor & Krishna, 1993). The choice of the laboratory reference frame is especially convenient in experimental studies. In diffusion experiments the volume average velocity is usually negligible (Tyrell & Harris, 1984). Still, the modeller has to specify the diffusive flux J1V . For binary diffusion, all flux models can be related to Fick’s law (Taylor & Krishna, 1993). The model structure used for the fluxes is therefore Model F: uV = 0 V J1V = −D12

(4) ∂c1 ∂z

(5)

V denotes the Fick diffusion coefficient. where D12 Alternatively, the Maxwell–Stefan equation may be used to model the diffusive flux. Fick diffusion coefficients are usually measured in experiments whereas the Maxwell–Stefan approach is preferred in theoretical studies (Taylor & Krishna, 1993). The Maxwell–Stefan diffusivities –D(z, t) can be related to the Fick diffusion coefficients by (Taylor & Krishna, 1993)   ∂ ln γ1 D12 = –D12 1 + x1 = –D12 Γ1 (6) ∂x1

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

The thermodynamic factor Γ 1 is a function of composition only in isothermal diffusion. It can be easily computed if a suitable model for the activity coefficient γ 1 is available (Taylor & Kooijman, 1991). A further constitutive relation is required to describe the diffusion coefficients contained in the flux models (5) and (6). Since the functional form of its concentration dependence is today still subject to discussion a generic relationship is given Model D :

V D12 =f(x1 , θ),

–D12 = f (x1 , θ)

2.2. Simultaneous model identification In the simultaneous approach only fully specified models are considered. Commonly, a least-squares problem is then formulated to estimate the unknown parameters θ contained in the model. For the diffusion experiment, it can be stated as min = θ

s.t.

n 

(x1 (zi , ti ) − x˜ 1 (zi , ti ))

2

(8)

i=1

  ∂c1 ∂c1 ∂ = f(x1 , θ) , ∂t ∂z ∂z ∂c1 (0, t) ∂c1 (L, t) = = 0, ∂z ∂z

measured data

z ∈ (0, L), t > t0 , c1 (z, t0 ) = c0 (z)

A number of such complicated parameter estimation problems may have to be solved if several candidates for the concentration dependence of the diffusion coefficient are considered. This simultaneous approach to identification is computationally expensive because of the high effort for the solution of one problem and the repetitive solution for the case of various candidate models. It may lead to biased estimates if there is modelling error anywhere in the model. It neglects the inherent model structure with its sequence of models, each containing further assumptions about the process. In contrast, the incremental approach follows the steps of model development. 2.3. Incremental model identification The close relation of the incremental procedure to model development is shown in Fig. 2. It will be explained in the following.

states xi(z,t)

balance envelope

fluxes N1 ( z, t )

model B

balances

model BF

balances

flux models

model BFD

balances

flux models

flux model

model for coefficient

(7)

where x1 represents the mole fraction of component 1 and θ collects all constant model parameters. If the model structure f (or f , respectively) and the value of its parameters θ are given, the model of the experiment can be solved in simulation. It should be stressed that the sequence in model development shown here is generic and not limited to mass transfer processes (Marquardt, 1995).

587

V

coefficients D12 ( z, t )

para-

diffusion meters coefficient

model structure and parameters for diffusion process

Fig. 2. Incremental approach to identification of diffusion models.

2.3.1. Model B: balances Balance equations contain the least uncertainty. The only assumptions to be made concern the balance envelope and the coordinates considered (cf. Fig. 2). Such choice is obvious in our case if one-dimensional transport prevails in the diffusion cell. Hence, Eq. (1) holds. The molar flux N1 itself can be computed as a function of space and time if concentration can be measured with high resolution in space and time. From the balance Eq. (1) and boundary condition (2) we obtain  z ∂c1 (z , t)

Model B : N1 (z, t) = (9) dz ∂t 0 The molar flux N1 is computed directly from the experimental data without introducing any further potentially uncertain constitutive equations. Note that no requirements are posed on the initial condition (2). A sharp concentration step between the lower and upper mixture is often required by the available data analysis methods for diffusion (Tyrell & Harris, 1984). The establishment of this step profile usually requires substantial experimental effort and care (see e.g. Pertler, Blass, & Stevens, 1996). The proposed method allows any initial profile and requires only measurements in the whole region of significant concentration gradients. In Eq. (9), the molar concentration c1 (z, t) of component 1 is used which has to be related to the measured mole fractions x˜ 1 (z, t). Assuming ideal mixture behavior we obtain c˜ 1 (z, t) =

x˜ 1 (z, t) (V20

+ (V10 − V20 )˜x1 (z, t))

(10)

where Vi0 is the molar volume of pure component i. Nonideal mixtures can be treated if a suitable reference frame is chosen (Crank, 1975). The main difficulty in the solution of Eq. (9) though is the estimation of the time derivative of the measured concentration data. This is known to be an ill-posed problem, i.e. small errors in the data will be amplified (Hansen, 1998). Regularization techniques have to be employed to compute a stable approximation. Here, a smoothing spline technique ˆ 1 is computed is used (Reinsch, 1967). A flux estimate N

588

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

from Eq. (9) replacing c(z, t) by a smoothed concentration profile cˆ (z, t) that is the minimizer of the functional  2  n  ∂ c  1 2 (11) (c(ti , zi ) − c˜ (ti , zi )) + λ  2  min c n ∂t i=1

Here λ is a regularization parameter. This approach corresponds to the well-known Tikhonov regularization method. An extended Simpson’s rule is used here for solving the integral in Eq. (9). The crucial step in the estimation problem (11) is the selection of the parameter λ, which balances data and regularization error. Different methods are available from the literature (Hansen, 1998). Two heuristic methods will be used here since they apply even if there is no a priori knowledge about the measurement error. The first method, generalized cross-validation (GCV), is derived from leave-one-out cross-validation where one concentration data point c˜ (ti , zi ) is dropped from the data set. The regularization parameter is chosen for which the estimated spline predicts the missing point best on average (Craven & Wahba, 1979). The L-curve is a log–log–plot of a smoothing n norm (here: ||∂2 c/∂t 2 ||) over a residual norm (here: i=1 (c(ti , zi ) − c˜ (ti , zi ))2 ) (Hansen, 1998). This graph usually has a typical L-shape since the residual norm will be large for large λ, while the smoothing norm is minimized. For small λ, the residual norm will be minimized but the smoothing norm is large due to the ill-posed nature of the problem leading to oscillations in the solution. At intermediate λ values both error contributions are balanced. The optimal regularization parameter is therefore chosen as the corner point of the L-curve corresponding to the maximum curvature with respect to the regularization parameter. Computational routines for both methods are available in the Regularization Toolbox (Hansen, 1999). It should be noted that the estimation of the diffusive flux requires only the solution of the linear problem, Eq. (9), independent of the number of candidate models. All following estimation problems on the flux and coefficient model level (Fig. 2) are only algebraic. This decoupling of the problem reduces the computational expense substantially. But the decoupling comes at the price of an infinite dimensional estimation problem, the estimation of the molar flux N1 (z, t), which is only feasible given sufficient data. 2.3.2. Model F: flux models On the next level, several models for the fluxes may be considered (cf. Fig. 2). The unknown transport coefficients contained are then computed as functions of space and time. In the binary case, the diffusion coefficients are calculated from Eqs. (4) and (5) as Model F :

J1V (z, t) = N1 (z, t) V ˆ 12 (z, t) = − ⇒D

Jˆ 1V (z, t) ∂ˆc1 (z, t)/∂z

(12)

ˆ 1 (z, t) is taken from the previous The flux estimate N identification step. The spatial derivative ∂ˆc1 (z, t)/∂z is calculated using the smoothing spline approach; time differentiation has to be replaced by spatial derivatives in Eq. (11). The Maxwell–Stefan diffusion coefficients (6) could also be directly computed Model F :

ˆ 12 (z, t) = −D

Jˆ 1V (z, t) Γ1 (˜x1 (z, t))∂ˆc1 (z, t)/∂z

(13)

The thermodynamic factor Γ 1 is usually subject to substantial uncertainty because it contains the derivative of the activity coefficient (Taylor & Kooijman, 1991). It can be seen from Eq. (13) that this uncertainty strongly influences the quality of the estimated Maxwell–Stefan diffusivˆ 12 (z, t) (see also Rutten, 1992). ity –D It can be seen from Eqs. (12) and (13) that the diffusion coefficients can only be estimated if the spatial concentration gradient differs from zero. Further, the estimate will be very sensitive to noise if the gradient is small. A first test of model adequacy is already possible at this level. Since transport coefficients have a physical interpretation which results in certain restrictions (e.g. positivity), those model candidates violating any restriction could be discarded already at this stage. Such constraints have been commonly employed for model selection in reaction kinetics (Froment & Bischoff, 1990). Although, this step should be taken with care here due to the ill-posedness nature of the preceding steps. 2.3.3. Model D: diffusion coefficients Finally, the unknown model for the diffusion coefficient itself, Eq. (7), needs to be identified employing the ˆ V (z, t) (or –D12 (z, t)) of the model-based measurements D 12 previous step and the measured mole fraction data xˆ 1 (zj , tk ) (cf. Fig. 2). The actual computations depend strongly on the availability of reasonable candidate models f(·). First, the lack of sufficient a priori knowledge to formulate candidate models is treated. Here, a general approximation scheme is used. Then, the incremental identification will be described for situations where model candidates can be formulated. 2.3.3.1. Unstructured model candidates. If no reasonable model candidate f(·) can be formulated, a general parameterization for the diffusion coefficient (7) is introduced. The parameterization should be capable of approximating any function. Following closely Hanke and Scherzer (1999) the concentration range [0, 1] is divided into m intervals. The diffusion coefficient is approximated by a piecewise constant function in each interval   i−1 i f(x, θ) = θi , for x ∈ , , i = 1, . . . , m (14) m m

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

Collecting the estimated diffusion coefficients sampled at ˆ V the residual equations ti , zi in a vector D 12  V     ˆ (t1 , z1 ) D θ1 0···1···0···0 12    .   . .   .   = .. .. (15)    .   V ˆ (tn , zn ) 0···0···1···0 θm D      12  ≡A

ˆV ≡D 12

can be derived from Eqs. (7) and (14). Matrix A is extremely sparse containing only a single “1” per row denoting the appropriate concentration interval. The parameter estimates θˆ follow from minimizing some norm of the equation residuals. It turns out in practice that it is more advantageous to insert the diffusion coefficient model (14) into the transport law (5) to avoid explicit division by the spatial concentration gradient (see Eq. (12)). The resulting residual equations read V ˜ Jˆ 1 = −Aθ

(16) V Jˆ 1

The vector now contains estimated diffusive fluxes Jˆ 1V (zi , ti ) sampled at ti , zi , as determined in the previous identification step (cf. Fig. 2). The “1” in row i of matrix A is replaced by the value of the estimated spatial derivative ˜ ∂ˆc(zi ti )/∂z to form matrix A. The estimation problem for the unknown parameter vector θ can now be given as V

˜ − Jˆ i ||2 min || − Aθ θ

(17)

For the solution of such discrete ill-posed problems several methods have been proposed (Hansen, 1998). Because ˜ of the large problem size and the sparsity of matrix A iterative regularization methods are the most appropriate choice for the solution (Hanke & Scherzer, 1999). Here, the conjugate gradient method (CG) is employed using the Regularization Toolbox (Hansen, 1999). A preconditioner enhancing smoothness is used. The number of CG-iterations serves as the regularization parameter. The L-curve is used as a stopping rule. This procedure leads to an unstructured model for the unknown diffusion coefficient. It is represented as a piecewise constant function of concentration. 2.3.3.2. Structured model candidates. The generic estimation scheme presented above provides a data-driven approach which could be used to generate structured model candidates. In this section, it is therefore shown how the incremental approach can be applied if several models f(·) (Eq. (7)) for the diffusion coefficient are available—either from a priori knowledge or from an investigation as above. In either case, the parameters θ contained in these structured models can be adjusted using once more the estimated diffusion coefficient model from Eq. (12). Note that both, ˆ V (zi , ti ) and the meathe estimated diffusion coefficients D 12 sured mole fractions x˜ (zi , ti ), contain errors. The estimation

589

has therefore to be formulated as an error-in-variables (or orthogonal distance regression (ODR)) problem:   n  δ2i V 2 ˆ 12 (zi , ti ) −f(˜x(zi , ti ) + δi , θ)] + [D min φ = min θ,δ θ,δ ω2 i=1

(18) where δ represents the errors in the measured mole fraction and ω balances both types of errors. If the weight ω is chosen as the variance ratio ω = σδ /σD then the maximum likelihood estimate is computed (see Appendix A). The efficient solution method by Boggs, Byrd, Rogers, and Schnabel (1992) is used to solve this problem. Now, the adequacy of the model candidates has to be quantified. Carroll (2003) suggested the use of Akaike’s information criterion (AIC) embedded within a likelihood framework. Since the formulation of the AIC for the ODR problem appears to be a new result, the full derivation is given in Appendix A. A modified form of the criterion can finally be given as V ˆ 12 ΦD (zi , ti ) − f(˜x(zi , ti ) + δi , θ)Φ ODR = n ln φ + np (19)

where np is the number of parameters contained in the model and φ the ODR objective (18). The basic idea is to balance the goodness-of-fit, represented by the ODR objective, with the number of parameters used. Note that the strict statistical interpretation of the AIC criterion is lost in the incremental approach due to the error propagation. It is used here as a heuristic: low AIC values Φ ODR indicate a preference for the corresponding model. The use of other model discrimination criteria might enhance the performance but the choice of an optimal criterion is difficult in practice (Verheijen, 2003). AIC is a well-established method that has worked well given sufficient data (Stewart, Shon, & Box, 1998). For the considered single-response case with known variance, AIC is closely related to the Bayesian criterion suggested by Stewart et al. (1998).

3. Application and evaluation of the suggested method In order to exemplify the incremental approach to model identification for diffusive mass transfer the “true” relation between the diffusion coefficient and concentration is considered V D12 = ϑ1 + ϑ2 (x1 − 0.5)2 + ϑ3 (x1 − 0.5)6

(20)

This constitutive equation should be recovered from measurements of the mole fraction x1 . The example considered is particularly challenging because of the non-monotonous behaviour of the diffusion coefficient (Cannon & DuChateau, 1980). The diffusion cell is assumed to be of length L = 10 mm. At t = 0 the lower half is filled with pure component 1, pure component 2 is layered on top. Measurements of the

590

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

mole fraction x˜ 1 (zj , tk ) are taken with a resolution of #z = 0.1 mm and #t = 120 s. The experiment runs for tfinal = 2 h. Gaussian noise is added to simulated mole fraction data to represent noisy measurements. The noise level was chosen as σ = 0.01. This corresponds to very unfavourable experimental conditions for binary Raman experiments (Bardow et al., 2003). Because of the use of simulated data it is possible to evaluate the different steps of the incremental algorithm. First, no structured model for the concentration dependence will be assumed. It will be shown that the concentration dependence can be identified in a fully data-driven approach. The piecewise constant representation of the diffusion coefficient, Eq. (14), is therefore employed first. Based on these results different structured models for the diffusion coefficient are proposed. Their parameters are re-adjusted using the ODR method, Eq. (18). Finally, model candidates are selected based on AIC, Eq. (19). 3.1. Flux estimation Following the formulation of the balance Eq. (1) the molar flux is estimated from Eq. (9). The crucial step of derivative estimation can be done here either by differentiating the mole fraction data x˜ 1 (z, t) or by first computing the concentration c˜ 1 (z, t) (Eq. (10)) and the subsequent computation of its derivative. The latter choice may seem more natural because the concentration derivative is used in Eq. (9). But the derivation of general cross validation assumes Gaussian noise (Craven & Wahba, 1999). This error distribution can be expected for the mole fractions x˜ 1 (z, t) but will be skewed for the concentration c˜ 1 (z, t) if the difference between the molar volumes is large. Both approaches were therefore tested. In practical situations the difference is negligible and both methods lead to very similar solutions (not shown here). In the following the derivative will be calculated from the concentration data c˜ 1 (z, t). The performance of the regularization parameter choice methods is now analyzed. Fig. 3 shows the errors in the ˆ 1 . The error ε is defined by estimated fluxes N ε(z, t) =

ˆ 1 (z, t) − N1 (z, t) N maxz N1 (z, t)

(21)

Table 1 RSS for flux estimation using GCV and L-curve based on mole fraction measurements with variance σ 2 Method

σ2 = 0

σ 2 = 10−6

σ 2 = 10−5

σ 2 = 10−4

σ 2 = 10−3

GCV L-curve

0.0181 0.0376

0.0197 0.0437

0.0299 0.0765

0.0853 0.1147

0.2439 1.5227

The errors are large initially. Note that the flux is a Dirac impulse for time t = 0. The steep gradients at the beginning of the run cannot easily be distinguished from measurement noise since they contradict the proposed smoothness assumption. This is in accordance with observations by other authors who realized that their estimates improved if the first data points were omitted (Hanke & Scherzer, 1999). This suggestion is also followed here. For larger times the estimate is quite good. It is difficult to decide from Fig. 3 which regularization parameter choice method performs better. Table 1 gives the residual sum of squares (RSS) for different noise levels covering the experimental range. The RSS is calculated as RSS =

n 

ˆ 1 (zi , ti ) − N1 (zi , ti ))2 (N

(22)

i=1

The shown values are averaged over 16 realizations of the simulated noise. Both methods give errors in the same order of magnitude. The resulting error sum for the L-curve method is higher by a factor of 1.3–6.2. It should be noted that the L-curve approach suffers from some objections from a theoretical viewpoint (Hansen, 1998). Even so, the results here show its practical value. The drawback of the GCV method is the high computational effort for large problems. 3.2. Diffusion coefficient using unstructured models A piecewise constant representation of the diffusion coefficient (14) is estimated next using the computed flux values Jˆ 1V (zi , ti ), the estimated spatial derivatives ∂ˆc(zi , ti )/∂z and the measured mole fractions x˜ (zi , ti ) using Eq. (17). The optimal iteration number for the CG iterations is chosen by the L-curve as shown in Fig. 4. The smoothing norm here 0

10

5 t =8 min

0

smoothing norm

− 5 [%]

− 10 − 15 − 20

t =2 min

corner point

−1

10

−2

10

−2

−5.23

10

−5.22

10

10

− 25 − 30 −35

iteration number

GCV L−curve 0

2

4

6

8

−3

10

10

z [mm]

Fig. 3. Errors in estimated flux J1 using GCV and L-curve (σ = 0.01).

−5.23

10

−5.21

10

−5.19

10

residual norm

Fig. 4. L-curve for choice of iteration number.

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595



-3

1.6

591

x 10

true estimated

4π Lx



M2 :

f2 (x, ϑ) = ϑ1 + ϑ2 x + ϑ3 cos

M3 :

f3 (x, ϑ) = ϑ1 + ϑ2 (x − 0.5)2 + ϑ3 (x − 0.5)4

(24)

V

2

D12 [mm /s]

1.4 1.2

(25)

5%error band

1

f4 (x, ϑ) = ϑ1 + ϑ2 (x − 0.5)2

M4 :

0.8

+ ϑ3 (x − 0.5)4 + ϑ3 (x − 0.5)6 0.6 0

0.2

0.4

0.6

0.8

1

Fig. 5. Estimated and true diffusion coefficient. V with respect to approximates the second derivative of D12 concentration; the residual norm is the objective function value (17). The estimated and the true concentration depenV are compared in Fig. 5. dence of the diffusion coefficient D12 The shape of the concentration dependence is well captured. It should be noted that only data from one experiment were used. Commonly, more than 10 experiments are employed (Tyrell & Harris, 1984). Nevertheless, the error is well below 5% for most of the concentration range. The minima and the maximum are found quite accurately in location and value. The values of the diffusion coefficient at the boundaries of the concentration range are not identifiable since the concentration gradient vanishes here (cf. Eq. (12)). Better estimates are only possible with a more sophisticated experimental procedure which establishes large gradients in these regions of dilution, for example, by some kind of periodic forcing at the boundaries. The discretization level of the diffusion coefficient had only minor influence on the final result. Here, the concentration range was split into m = 500 intervals, i.e. 500 parameters have to be estimated. This prohibits the use of the simultaneous approach whereas the incremental approach takes an average CPU time of only 8 s on a standard desktop PC. This substantial reduction in computation time is mainly due to the decoupling of the problem. The use of an equation error scheme further reduces computational cost because the repeated solution of the model is avoided.

3.3. Diffusion coefficient from model candidates Even though a model for the diffusion coefficient has been found, it is desirable in practice to obtain a simpler expression than the piecewise constant representation. Such a simple form can now easily be found based on the previous investigation. Usually, polynomials of different order are used to represent the diffusion coefficient (Tyrell & Harris, 1984). Fig. 5 suggests that low order polynomials would not be suitable. The following candidate models f(x, ϑ) are considered: M1 :

f1 (x, ϑ) = ϑ1 + ϑ2 (x − 0.5)2 + ϑ3 (x − 0.5)6

(true model)

f5 (x, ϑ) = ϑ1

M5 :

mole fraction [−]

(23)

(26) (27)

The parameters ϑ in the models M1–M5 are initialized by fitting them to the piecewise constant representation of the diffusion coefficient (14). The following least squares problem is here solved: min ϑ

m 

(θi − fl (xi , ϑ))2 ,

l = 1, . . . , 5

(28)

i=1

This is a linear regression problem in the considered example. Therefore, structured models for the desired diffusion coefficient which are linear in the parameters are obtained with very low additional computational effort. The estimates ϑˆ can be improved using the orthogonal distance regression as given in Eq. (18) if f(·) is replaced by Eqs. (23)–(27). This step removes the influence of potential identification errors of the piecewise constant representation. Since good initial guesses were available from problem (28) convergence was achieved in only two iterations for each model. As noted above it turned out to be advantageous to avoid explicit division by the estimated spatial derivative. The diffusion coefficient models M1–M5 were therefore inserted in the flux Eq. (5) to formulate the ODR problem min

θ,δ1 ,δ2

n  

Jˆ 1V (zi , ti ) + fl (˜x(zi , ti ) + δ1,i , θ)

i=1

 ×

∂ˆc(zi , ti ) + δ2,i ∂z

2 +

δ21,i ω12

+

δ22,i ω22

(29)

for l = 1, . . . , 5. The fit of the different models is compared to the true solution in Fig. 6. Models M2 and M5 differ strongly from the true solution whereas the other candidates capture the concentration dependence very well. In order to assess quantitatively the performance of the model candidates Akaike’s information criterion is employed, Eq. (19). The resulting values for Φ ODR are given in Table 2. The criterion does not lead to a strong preference for any model. This is mainly due to the fact that a large contribution to the likelihood function results from the correction of the measured concentration and the estimated derivatives, i.e. the last two terms in Eq. (29). These terms occur for all models and therefore reduce the significance of the model structure contribution.

592

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595 −3

−3

x 10

1.6 true M2 M5

2

1.4

Diffusion coefficient [mm /s]

2

Diffusion coefficient [mm /s]

1.6

1.2 1 0.8 0.6 0

0.2

0.4

0.6

0.8

1

mole fraction [−]

x 10

true M1 M3 M4

1.4 1.2 1 0.8 0.6 0

0.2

0.4

0.6

0.8

1

mole fraction [−]

Fig. 6. Fitted model candidates and true diffusion coefficient.

Table 2 Scaled values of AIC for model discrimination using ODR results, the standard least-squares formulation (LS) and discrimination from the simultaneous method (sim) Model

Φ ODR Φ LS Φ sim

M1

M2

M3

M4

M5

1.003 1 1

1.024 1.33 –

1.001 1.02 1.009

1 1.01 1.003

1.037 1.73 –

Furthermore, Boggs and Rogers (1990) pointed out that sufficient knowledge of the variance ratio ω is crucial in orthogonal distance regression. They state that reliable parameter estimates can be obtained if ω is known within one order of magnitude. But in order to obtain reliable statistical estimates ω should be known within a factor of 2. Since it is difficult in practice to predict ω with high precision,2 the AIC results should be interpreted with care. This uncertainty in the statistical estimate of model quality can be easily overcome: if a single simulation run is performed for each model M1–M5 with the estimated parameters, the well-known AIC criterion for least-squares estimation Φ LS can be applied. The values of this criterion are also given in Table 2. It should be pointed out that this adds only the cost of one additional simulation per model candidate which is still much lower than the effort of simultaneous parameter estimation. Models M2 and M5 lead now to significantly higher AIC values and can be discarded from further investigations. The true model M1 has the smallest AIC value but models M3 and M4 lead to almost identical values. These

2 A maximum likelihood estimate of ω cannot be computed (Bard, 1974). Estimation methods for the variance are available (see e.g. Seifert, Gasser, & Wolf, 1993) but the estimates contain substantial uncertainty in practice. Boggs et al. (1992) recommend to choose ω such that the magnitude of both contributions to the ODR objective (18) is approximately the same in order to improve the performance of the algorithm. This approach was used in the computations shown here.

model structures are very similar and one cannot expect to discriminate them based on a single experiment. Due to the regularization and error propagation, estimates are slightly biased using the incremental approach (Bardow & Marquardt, 2003). It is therefore advantageous to re-adjust the parameter estimates using the simultaneous approach (8) if computationally feasible. The incremental approach here helps to generate a small number of suitable candidate models and good starting values for their parameters. The standard AIC values after the re-fitting using the simultaneous approach are also included in Table 2. These values retain the statistical interpretation of the AIC criterion and should form the final basis for model discrimination. But it should be noted that the least-squares objective was only reduced by less than 10% in the re-adjustment. This shows that not only model structure candidates are found by the incremental approach but also the parameters are obtained with very low bias. In order to select one structure with sufficient confidence, further investigations which should be based upon optimal experimental design techniques for model discrimination (Chen & Asprey, 2003) are required.

4. Extension to ternary diffusion The extension of the approach presented in Section 2 to ternary diffusion is straightforward. It will be first shown for the case of constant Fick diffusion coefficients which is usually considered in practice. The estimation of Maxwell–Stefan diffusivities is demonstrated subsequently. In this section, the full diffusion coefficient matrix is estimated from a single experiment using the incremental approach. Sample diffusion coefficients are taken from Arnold and Toor (1967) who studied gaseous systems with constant diffusion coefficients. Current measurement procedures in ternary mixtures require at least two experiments to estimate these coefficients and the cross coefficients are still ill-determined (e.g. van de Ven-Lucassen & Kerkhof, 1999).

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

The balance equations are again the starting point for incremental identification. They retain their structure (Eq. (1)), but are now formulated for two components. Using already u = 0 they can be stated as ∂ci ∂Ji =− , i = 1, 2 (30) ∂t ∂z The balance equations for both components are decoupled and the flux estimation can be performed exactly as in Section 2.3.1. The computational complexity therefore increases only in a linear fashion in the case of multicomponent mixtures. The multicomponent nature shows on the next level of the model refinement hierarchy, namely in the flux model (cf. Fig. 2). The diffusive flux of one component is now also influenced by the concentration gradients of the other components. This requires the use of generalized Fick’s law:       ∂c1 J1 D11 D12  ∂z  Model F : =− (31)  ∂c  2 J2 D21 D22 ∂z leading to a matrix of four Fick diffusion coefficients (Taylor & Krishna, 1993). The use of Fick diffusion coefficients is still often preferred in practical applications whereas diffusion theory is usually formulated using the Maxwell–Stefan transport law (Taylor & Krishna, 1993):  ∂x  1

  Model F :  ∂z  ∂x2 ∂z   x 1 J2 − x 2 J1 x1 (−J1 − J2 ) − (1 − x1 − x2 )J1 +   ct –D12 ct –D13  =   x 2 J1 − x 1 J2 x2 (−J1 − J2 ) − (1 − x1 − x2 )J2  + ct –D12 ct –D23 (32) The quantities –Dij denote the Maxwell–Stefan diffusion coefficients. The Maxwell–Stefan formulation leads to implicit equations for the fluxes and is therefore often avoided. It will be used here as a second model candidate for the diffusive flux. It will be shown that they naturally integrate into the incremental procedure. Neither generalized Fick’s law (31) nor the Maxwell– Stefan approach (32) can be solved for the diffusion coefficients. Hence, they are not identifiable as functions of space and time. The correlation of these model-based measurements –Dij (z, t) and –D(z, t) with other state variables is therefore not possible in the multicomponent case. Hence, identification step F is not possible and has to be eliminated. If parameterized constitutive laws for the diffusion coefficients (Eq. (7)) can be formulated they can be inserted into the flux expression, Eqs. (31) and (32). The contained parameters θ can then be estimated by orthogonal distance regression using the estimated fluxes Jˆ i , spatial derivatives ∂ˆci /∂z and the measured mole fractions x˜ i (cf. Eq. (29)).

593

Table 3 Ternary diffusion coefficients from a single experiment in 10−5 m2 /s, simulated mole fraction corrupted by Gaussian noise with standard deviation σ Coefficient Dij

D11

D12

D21

D22

Fick (σ = 0.001) Error (%)

4.41 −0.7

1.78 −2.8

3.50 −3.8

6.08 −3.5

Fick (σ = 0.01) Error (%)

4.04 −9.0

1.22 −33.4

3.25 −10.8

5.64 −10.4

Coefficient –Dij

–D12

–D13

–D23

Maxwell–Stefan (σ = 0.001) Error (%) Maxwell–Stefan (σ = 0.01) Error (%)

2.38 1.73

6.06 0.57

9.50 −3.64

2.48 −7.38

5.52 4.61

10.07 2.09

First, the case of a constant Fick diffusion coefficient matrix is treated. The estimated values are compared to the true solution in Table 3. The diffusion matrix can be estimated from a single experiment with good precision using the incremental approach. The measurement variance currently found in binary Raman experiments corresponds to the better case (σ = 0.001). It is expected that the quality decreases if additional components are introduced. But it can be seen in Table 3 that estimation still gives reasonable values for high noise levels. The fact that all deviations are negative suggests that the smoothing spline tends to oversmooth the initially very sharp profiles. Additionally, it should be stressed that this estimation problem is very difficult to solve by the simultaneous approach. The Fick matrix is positive definite which is enforced by three inequality constraints (Taylor & Krishna, 1993). In parameter estimation, a sequential approach with an infeasible path optimization routine is often used. This allows the constraints to be violated. In this case, the model cannot be integrated and the procedure usually aborts. This limitation does not apply to the incremental approach since no solution of the direct problem is required. It could therefore also be used to initialize the simultaneous procedure. The orthogonal distance regression can be performed in the same manner using the Maxwell–Stefan formulation (32). The ODR problem is now nonlinear in the parameters but the package ODRPACK (Boggs et al., 1992) handles these cases as well. The estimated Maxwell–Stefan diffusivities are also given in Table 3. The accuracy is even better than for the Fick matrix because fewer parameters have to be determined.

5. Conclusions The advent of new high-resolution measurement techniques for diffusion allows the reduction of experimental effort and helps to expand the data basis to generate predictive models. But suitable methods for data analysis have to

594

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

be used to extract most information from experimental data sets. A new incremental approach to model identification is therefore introduced and applied to the estimation of diffusion coefficients from optical spectroscopy measurements. The incremental approach leads to efficient and robust model identification. It could be shown that the information from a single experiment is sufficient to estimate the concentration dependence of the binary diffusion coefficient in the full concentration interval. It can be established with little computational effort in the order of seconds using an entirely data-driven approach or model candidates from a priori knowledge. In ternary mixtures, the full diffusion matrix can be measured in a single experiment with good precision. Generalized Fick’s model as well as the Maxwell–Stefan equations integrate naturally into the incremental framework. Furthermore, the introduced incremental approach provides a very general framework for identification and is not limited to diffusion. The incremental approach exploits the inherent model structure. Thereby, uncertainty in each step of the calculation is reduced and intermediate tests on assumptions are possible. It decouples the inverse problems on every level of the identification hierarchy which reduces the computational cost substantially and allows fast model identification even in distributed systems. In order to retain statistically optimal estimation and discrimination results, a single simultaneous parameter readjustment should be performed as a last step if computationally feasible. A work process for model identification is suggested in Bardow and Marquardt (2003) where a rigorous comparison between incremental and simultaneous methods is presented. The application of the method for reaction kinetics and transport in flow systems is currently investigated (see e.g. Brendel, Mhamdi, Bonvin, & Marquardt, 2004). Acknowledgements The authors gratefully acknowledge the financial support by the Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center (SFB) 540 “Model-based Experimental Analysis of Kinetic Phenomena in Fluid Multiphase Reactive Systems”. Appendix A. Akaike’s information criterion for orthogonal distance regression The objective function Φ of Akaike’s information criterion can be given in general form as (Walter & Pronzato, 1997) 1 Φ = (−ln L + np ) (A.1) n where n denotes the number of measurements, L the likelihood function and np the number of parameters in the model.

We will now consider an error-in-variables problem. The model can be given in general form as (Bard, 1974) yi = f (xi , θ)

(A.2)

Now, the measurement of the model outputs yi ∈ Rm as well as measurement of the independent variables xi ∈ Rs contain errors. A single vector of residuals is therefore defined as   x˜ i − xi ei (θ, xi ) = (A.3) y˜ i − f (xi , θ) Here, x˜ i and y˜ i denote the measured quantities. The log-likelihood for an error-in-variables model is defined as (Bard, 1974)  n n ln L = 0 − (s + m) ln 2π − ln det V 2 2 n 1  T −1 ei V ei (A.4) − 2 i=1

The quantities s, m denote the dimensionality of the model outputs and the independent variables, respectively. V is the covariance matrix of the measured quantities. In case of error-in-variable regression, a maximum likelihood estimate of the covariance matrix V cannot be computed if it is entirely unknown (Bard, 1974). It is therefore assumed here that V is parameterized according to     σx2 I 0 σx2 I 0 V = = (A.5) 0 σy2 I 0 ω2 σx2 I where the variance ratio ω = σy /σx is assumed to be a known constant. The log-likelihood function (A.4) becomes now  n nm ln L = − (s + m) (ln 2π + ln σx2 ) − 2 2 ln ω2 n T T 2 1  exi exi + (eyi eyi )/ω − (A.6) 2 σx2 i=1

The maximum likelihood estimate of the variance σˆ x2 can then be computed from ∂ ln L/∂σx2 = 0 as   n Te  e yi 1 yi σˆ x2 = eTxi exi + (A.7) n(s + m) ω2 i=1

Using this result the log-likelihood can be given as  m 1 (s + m)n ln 2π + 1 + ln ω2 + ln ln L = − 2 m+s n(s + m)   n T  eyi eyi + ln eTxi exi + (A.8) ω2 i=1

Note that the last term coincides with the objective function used in orthogonal distance regression (ODR), Eq. (18).

A. Bardow, W. Marquardt / Computers and Chemical Engineering 28 (2004) 585–595

ODR therefore computes the maximum likelihood estimate if the covariance can be parameterized by Eq. (A.5). The expression for the log-likelihood can be inserted in Akaike’s information criterion (A.1). In order to compare different models it is convenient to consider a reduced criterion in which all model independent terms are removed. This reduced AIC Φ ODR is then   n eTyi eyi (s + m)n  T

ΦODR = ln (A.9) exi exi + + np 2 ω2 i=1

For the binary diffusion example, the dimension of the measured concentration is s = 1, the dimension of computed diffusion coefficient is m = 1. The equation therefore reduces to the form given in Eq. (19). References Arnold, K. R., & Toor, H. L. (1967). Unsteady diffusion in ternary gas mixtures. AIChE Journal, 13, 909–914. Asprey, S. P., & Macchietto, S. (2000). Statistical tools in optimal model building. Computers & Chemical Engineering, 24, 1261–1267. Bard, Y. (1974). Nonlinear parameter estimation. New York: Academic Press. Bardow, A., & Marquardt, W. (2003). Incremental and simultaneous identification of reaction kinetics: Methods and comparison, submitted for publication. Bardow, A., Marquardt, W., Göke, V., Koss, H. J., & Lucas, K. (2003). Model-based measurement of diffusion using Raman spectroscopy. AIChE Journal, 49, 323–334. Bates, D. M., & Watts, D. G. (1988). Nonlinear regression analysis and its applications. New York: Wiley. Boggs, P. T., & Rogers, J. E. (1990). Orthogonal distance regression. In P. J. Brown, & W. A. Fuller (Eds.), Statistical analysis of measurement error models and their applications. Contemporary mathematics (vol. 112, pp. 183–194). Providence: American Mathematical Society. Boggs, P. T., Byrd, R. H., Rogers, J. E., & Schnabel, R. B. (1992). User’s reference guide for ODRPACK, version 2.01. Software for weighted orthogonal distance regression, NISTIR 92-483. US Department of Commerce, National Institute of Standards and Technology. Brendel, M., Mhamdi, A., Bonvin, D., & Marquardt, W. (2004). An incremental approach for the identification of reactor kinetics. In F. Allgöwer, F. Gao (Eds.), Preprint of the 7th IFAC Symposium on Advanced Control of Chemical Processes, ADCHEM (vol. I, pp. 177–182). Hong Kong. Cannon, J. R., & DuChateau, P. (1980). An inverse problem for a nonlinear diffusion equation. SIAM Journal on Applied Mathematics, 39, 272– 289. Carroll, R. J. (2003). Personal communication. Chen, B. H., & Asprey, S. P. (2003). On the design of optimally informative dynamic experiments for model discrimination in multiresponse nonlinear situations. Industrial & Engineering Chemistry Research, 42, 1379–1390.

595

Crank, J. (1975). The mathematics of diffusion. Oxford: Clarendon. Craven, P., & Wahba, G. (1979). Smoothing noisy data with spline functions-estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377– 403. Froment, G. F., & Bischoff, K. B. (1990). Chemical reactor analysis and design. New York: Wiley. Hanke, M., & Scherzer, O. (1999). Error analysis of an equation method for the identification of the diffusion coefficient in a quasi-linear parabolic differential equation. SIAM Journal on Applied Mathematics, 59, 1012–1027. Hansen, P. C. (1998). Rank-deficient and discrete ill-posed problems. Philadelphia: SIAM. Hansen, P. C. (1999). Regularization tools version 3.0 for Matlab 5.2. Numerical Algorithms, 20, 195–196. Mahoney, A. W., Doyle, F. J., & Ramkrishna, D. (2002). Inverse problems in population balances: Growth and nucleation from dynamic data. AIChE Journal, 48, 981–990. Marquardt, W. (1995). Towards a process modeling methodology. In R. Berber (Ed.), Methods of model-based control (pp. 3–41). Dordrecht: Kluwer Academic Publishers. Pertler, M., Blass, E., & Stevens, G. W. (1996). Fickian diffusion in binary mixtures that form two liquid phases. AIChE Journal, 42, 910–920. Reinsch, C. H. (1967). Smoothing by spline functions. Numerische Mathematik, 10, 177–183. Rutten, P. W. M. (1992). Diffusion in liquids. Ph.D. thesis, Technical University Delft. Schittkowski, K. (2002). EASY-FIT: A software system for data fitting in dynamic systems. Structural and Multidisciplinary Optimization, 23, 153–169. Seifert, B., Gasser, T., & Wolf, A. (1993). Nonparametric estimation of residual variance revisited. Biometrika, 80, 373–383. Shapiro, A. A. (2003). Evaluation of diffusion coefficients in multicomponent mixtures by means of the fluctuation theory. Physica A, 320, 211–234. Stewart, W. E., Shon, Y., & Box, G. E. P. (1998). Discrimination and goodness of fit of multiresponse mechanistic models. AIChE Journal, 44, 1404–1412. Taylor, R., & Kooijman, H. A. (1991). Composition derivatives of activity coefficient models (for the estimation of thermodynamic factors in diffusion). Chemical Engineering Communications, 102, 87–106. Taylor, R., & Krishna, R. (1993). Multicomponent mass transfer. New York: Wiley. Tyrell, H. J. V., & Harris, K. R. (1984). Diffusion in liquids. London: Butterworths. van de Ven-Lucassen, I. M. J. J., & Kerkhof, P. J. A. M. (1999). Diffusion coefficients of ternary mixtures of water, glucose, and dilute ethanol, methanol, or acetone by the Taylor dispersion method. Journal of Chemical and Engineering Data, 44, 93–97. Verheijen, P. J. T. (2003). Model selection: An overview of practices in chemical engineering. In S. P. Asprey, & S. Macchietto (Eds.), Dynamic model development: Methods, theory and applications (pp. 85–104). Amsterdam: Elsevier. Walter, E., & Pronzato, L. (1997). Identification of parametric models: From experimental data. Berlin: Springer. Wild, A. (2003). Multicomponent diffusion in liquids. Ph.D. thesis, Düsseldorf: VDI Verlag.