Chemical Engineering Science, Vol. 45, No. 12, pp. 3417-3426, 1990.
Printed in Great Britain.
0009-2509/90 $3.00 + 0.00 © 1990 Pergamon Press plc
TARGET FACTOR ANALYSIS FOR THE IDENTIFICATION OF STOICHIOMETRIC MODELS

D. BONVIN† and D. W. T. RIPPIN
Technisch-chemisches Laboratorium, E.T.H. Zürich, 8092 Zürich, Switzerland
(First received 6 April 1989; accepted in revised form 19 February 1990)

Abstract. A methodology for identifying stoichiometric models for complex reaction systems is developed. The approach is useful for (1) deriving simple, approximate stoichiometric models for complex systems, or (2) testing possible stoichiometries and thereby investigating reaction pathways. The method of factor analysis is used to determine the number of reactions and derive an observed stoichiometric space from measured composition and possibly thermal data. The validity of proposed target stoichiometries can then be tested for compatibility with this stoichiometric space. The approach is straightforward with noise-free data. Various ways of coping with measurement errors are also presented. A major contribution of this paper is an approach for incorporating known stoichiometric information, thereby considerably reducing the effect of measurement noise. The methodology is tested on a simulated model of an industrial system. It is found to be very useful for clarifying the unknown part of the stoichiometric model. Extensions that should facilitate the applicability of the approach to real industrial systems are also discussed.
1. INTRODUCTION
Successful use of chemical-reaction models for the purpose of reactor and plant design, production planning and optimization, or monitoring and control of actual operation depends on the quality of the kinetic information built into the models. This quality, in turn, depends directly on the validity of the stoichiometric expressions chosen to represent the reaction network. The chemist specifies a reaction mechanism through a set of stoichiometries. These stoichiometries determine the form of the rate laws if the mechanism is elementary; otherwise the form must be given explicitly. As it is often neither desirable nor possible to work with elementary reactions, a representative stoichiometric model is postulated which is capable of describing the different reaction steps well with respect to the intended use of the model. Hence, identification of stoichiometric models represents an important step before attempting to derive kinetic models.

This work aims at identifying stoichiometric models for the following two applications: (1) the derivation of simple, approximate stoichiometric models for complex reaction systems; and (2) the investigation of incompletely known reaction pathways. A methodology that can help assess the validity of stoichiometries the chemist may propose on the basis of a priori knowledge will be developed. The approach utilizes the concepts of stoichiometric linear space and target testing to meet its objectives.

† Present address: Institut d'Automatique, EPFL, 1015 Lausanne, Switzerland.

2. STOICHIOMETRY AND STOICHIOMETRIC SPACES
Let us introduce some notation. N represents an R × S-dimensional stoichiometric matrix:

$$N = (\nu_{rs}), \quad r = 1, 2, \ldots, R; \quad s = 1, 2, \ldots, S \tag{1}$$

where $\nu_{rs}$ is the stoichiometric coefficient for the sth species in the rth reaction. M is an A × S-dimensional atomic matrix:

$$M = (m_{as}), \quad a = 1, 2, \ldots, A; \quad s = 1, 2, \ldots, S \tag{2}$$

where $m_{as}$ indicates the number of atoms of the ath type in the sth chemical species.

Theoretical stoichiometric space
Since the number of atoms of each atom type is conserved during chemical reaction, any possible composition change (characterized here by the stoichiometries of a reaction system) must necessarily obey:

$$M N^T = 0. \tag{3}$$
In mathematical terms, one can say that the composition change due to any chemical reaction must lie in the null space of the matrix M. The dimension of the null space of M, denoted by $\delta$, is equal to $S - \rho(M)$, where $\rho(M)$ represents the rank of M (Chen, 1984). Hence, the permissible stoichiometries span a vector space of dimension $\delta$, called here the theoretical stoichiometric space. However, the number of actual reactions may in fact be larger (Denbigh, 1971) or smaller (Björnbom, 1975) than $\delta$ and must be determined empirically (Aris and Mah, 1963). This theoretical stoichiometric space has also been labelled theoretical reaction subspace by Liao (1989).

Unfortunately the theoretical stoichiometric space suffers from three major drawbacks which limit its utility for deriving stoichiometric models: (i) since it is derived solely from the atomic matrix M, any stoichiometry which does not violate the conservation of atom types belongs to that space; (ii) it is of dimension $\delta$ and not r, the number of independent reactions present in the system ($r \le \delta$); (iii) the complete matrix M is not available when unknown species are present.
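As a minimal numerical sketch of eq. (3) and of the dimension $\delta = S - \rho(M)$, the snippet below uses NumPy and, as an assumption, the ten-species propane oxidation system treated in Section 5. The atomic matrix is written out from the molecular formulas, and the species order and variable names are illustrative choices rather than part of the original method.

```python
import numpy as np

# Assumed species order: C3H8, O2, (CH3)2CO, H2O, C3H7OH,
# CH3CHO, CH3OH, CH3COOH, CO2, HCOOH
M = np.array([
    [3, 0, 3, 0, 3, 2, 1, 2, 1, 1],   # C atoms per species
    [8, 0, 6, 2, 8, 4, 4, 4, 0, 2],   # H atoms per species
    [0, 2, 1, 1, 1, 1, 1, 2, 2, 2],   # O atoms per species
], dtype=float)

S = M.shape[1]                      # number of species
rho = np.linalg.matrix_rank(M)      # rank of the atomic matrix
delta = S - rho                     # dimension of the null space of M

# Basis of the theoretical stoichiometric space: the last delta right
# singular vectors of M span its null space.
_, _, Vt = np.linalg.svd(M)         # Vt is S x S (full_matrices=True)
null_basis = Vt[rho:]               # shape (delta, S)

# A candidate stoichiometry: C3H8 + O2 -> (CH3)2CO + H2O
n = np.array([-1, -1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
atom_balance_ok = np.allclose(M @ n, 0.0)   # eq. (3) for a single reaction

print(rho, delta, atom_balance_ok)
```

Any stoichiometry that conserves atoms passes this check, which illustrates drawback (i): the atomic matrix alone cannot tell which of the permissible reactions actually occur.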
Observed stoichiometric space
Consequently, this study attempts to derive an observed stoichiometric space based on measured data, i.e. for the situation observed in the reactor. A modelling approach that can be used to identify stoichiometric models for batch reactors from off-line concentration measurements (and possibly on-line temperature measurements) is presented in Appendix A. The measured data matrix D of dimension B × S is expressed as:

$$D = XN \tag{4}$$

where B represents the number of observations (either from the same or from different batches). The unknown matrix X of dimension B × R contains the extents of R reactions for B observations. These extents of reaction contain the effects of kinetics, thermodynamics, transport limitations, etc. The unknown matrix N of dimension R × S contains the stoichiometric coefficients for R reactions. Representation (4) is useful as it enables the separation of operational information (X) from structural information (N) in measured data (D).

The problem to be solved can be formulated as follows: given a data matrix D and modelling equation (4), is it possible to determine the number of independent reactions and corresponding stoichiometries which are acceptable in chemical terms? One approach in which both X and N are determined by non-linear regression (Filippi et al., 1986) suffers from a number of disadvantages. In Section 3 a more structured approach is presented in which factor analysis is used to derive an observed stoichiometric space from the data matrix D.

3. FACTOR ANALYSIS AND TARGET TESTING
Factor analysis is a statistical technique that has been used to interpret numerous types of data (Harman, 1970; Malinowski and Howery, 1980; Cureton and D'Agostino, 1983). In this particular application, the data matrix D is the product of the two matrices X and N, each with a rank equal to the number of independent reactions, r. Factor analysis enables the decomposition of D into two matrices of rank r, $X_a$ and $N_a$, thereby uncovering the factors, e.g. as the rows of $N_a$. Since the decomposition of D is not unique, the factors obtained may not be the real ones and are therefore labelled abstract factors. Target factor analysis can be used to confirm the presence of real factors and to compute the corresponding matrices X and N from the abstract factors.

Factor analysis
The case where D is not corrupted by measurement noise is first investigated. Upon decomposing D into singular values (Stewart, 1973; Lawson and Hanson, 1974; Horn and Johnson, 1985) one obtains:

$$D = U_D S_D V_D^T = U_{D1} S_{D1} V_{D1}^T \tag{5}$$

with $U_D$ a B × B orthonormal matrix, $S_D$ a B × S matrix whose diagonal entries are the singular values of D and $V_D$ an S × S orthonormal matrix. $S_{D1}$ contains the r non-zero singular values on the diagonal. $U_{D1}$ and $V_{D1}$ contain the corresponding r left and right singular vectors, respectively. $V_D^T$ denotes the transposed matrix of $V_D$. D can be expressed as

$$D = X_a N_a \tag{6}$$

with the matrices $X_a$ and $N_a$ of dimensions B × r and r × S, respectively, given as follows:

$$X_a = U_{D1} S_{D1} \tag{7}$$

$$N_a = V_{D1}^T. \tag{8}$$

The columns of $V_{D1}$ represent a basis for the observed stoichiometric space but they may not correspond at all to the stoichiometries observed by the chemist from study of the reaction mechanism. In fact, any linear combination of the r orthonormal singular vectors lies in the same space. Hence, the matrix $N_a$ is labelled the abstract stoichiometric matrix. The number of reactions involved in the system equals the number of abstract factors (rows of $N_a$ or columns of $V_{D1}$).

The physically meaningful stoichiometries lie in the observed stoichiometric space and, therefore, are linear combinations of the r columns of $V_{D1}$. There remains the problem of finding an r × r linear transformation matrix T to calculate the acceptable stoichiometries from the abstract basis:

$$D = U_{D1} S_{D1} V_{D1}^T = X_a N_a = X_a T^{-1} \cdot T N_a = X_t N_t \tag{9}$$

where $X_t$ and $N_t$ are the transformed extent-of-reaction and stoichiometric matrices, respectively. Several methods are available to calculate the transformation matrix T, using either computational (Rummel, 1970; Harman, 1970) or graphical (Hamer, 1987) procedures. These methods, however, have been found to be insufficient when the number of reactions is larger than two or three or when the transformed stoichiometries must be physically meaningful. Consequently, and since a large body of knowledge about the reaction system may be available from the chemist, an approach is sought that incorporates this knowledge into a procedure for deriving stoichiometric models.
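The decomposition of eqs (5)-(9) can be sketched numerically. The example below is only an illustration under stated assumptions: it uses NumPy, a hypothetical system of two reactions and four species, randomly generated extents of reaction, and a relative tolerance (chosen here for convenience) to decide which singular values count as non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical system: R = 2 reactions, S = 4 species, B = 6 observations
N_true = np.array([[-1.0, 1.0, 0.0, 0.0],     # A -> B
                   [0.0, -1.0, 1.0, 1.0]])    # B -> C + D
X_true = rng.uniform(0.1, 1.0, size=(6, 2))   # assumed extents of reaction
D = X_true @ N_true                           # eq. (4), noise-free data

# Singular value decomposition of D, eq. (5)
U, s, Vt = np.linalg.svd(D, full_matrices=False)
tol = 1e-10 * s[0]                            # illustrative tolerance
r = int(np.sum(s > tol))                      # number of independent reactions

X_a = U[:, :r] * s[:r]                        # eq. (7): abstract extents
N_a = Vt[:r]                                  # eq. (8): abstract stoichiometries

# One transformation T that maps the abstract factors onto the true
# stoichiometries (possible here only because N_true is known), eq. (9)
T = N_true @ np.linalg.pinv(N_a)
N_t = T @ N_a

print(r)
print(np.round(N_t, 6))
```

The rows of `N_a` are orthonormal and generally bear no resemblance to the rows of `N_true`; they only span the same space, which is why a transformation T is needed before the factors can be interpreted chemically.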
Target testing
The idea of target testing, also called target factor analysis (Malinowski and Howery, 1980; Roscoe and Hopke, 1981; Lorber, 1984), is presented next. The method is aimed at checking whether a target factor (in our case a target stoichiometry), which may stem from a priori information or from deductive reasoning, is compatible with the abstract factors. More specifically, once a target factor is proposed, a transformation vector is computed that minimizes the Euclidean norm of the difference between the transformed abstract factor and the target factor. Following the notation of eq. (9) one can write:

$$N_t = T N_a \tag{10}$$

or for the case of a single stoichiometry, i.e. for a row in eq. (10):

$$n_t^T = t^T N_a \tag{11}$$

where $n_t$ is the transformed stoichiometric vector resulting from linear combination of the abstract stoichiometric vectors contained in the r × S matrix $N_a$. Given a target vector $n_{tar}$, the transformation vector t is obtained as the least-squares solution to

$$n_{tar}^T = t^T N_a + \varepsilon^T \tag{12}$$

where $\varepsilon$ represents the model error (i.e. the target does not necessarily lie in the stoichiometric space):

$$t^T = n_{tar}^T N_a^+ \tag{13}$$

where $N_a^+$ is the pseudo-inverse of $N_a$ satisfying the Moore-Penrose conditions (Horn and Johnson, 1985). Upon combining eqs (11) and (13) the transformed abstract vector can be directly related to the target vector:

$$n_t^T = n_{tar}^T N_a^+ N_a = n_{tar}^T P. \tag{14}$$

The S × S matrix $P = N_a^+ N_a$ is the projection matrix associated with the row space of $N_a$ (Lawson and Hanson, 1974). In other words, the target stoichiometry $n_{tar}$ in $\mathbb{R}^S$ is projected via P onto the stoichiometric space of dimension r. Thus, $n_t$ can be viewed as either a transformed abstract vector or the projection of the target vector onto the stoichiometric space spanned by the rows of $N_a$. If the target stoichiometry is consistent with the observations, it lies in the observed stoichiometric space, and target and transformed stoichiometries are identical. On the other hand, if the target stoichiometry is not consistent with the observations, its projection will differ from itself, thus leading immediately to its rejection. The projection approach is of use only if r < S. That condition is always met since r was found above to be less than or equal to $\delta = S - \rho(M)$. The larger the difference S - r, the more severe the test. The projection matrix P is calculated from the data matrix as follows:

$$P = N_a^+ N_a = V_{D1} V_{D1}^T = D^+ D. \tag{15}$$

Dealing with measurement errors
The projection matrix P depends on the data matrix D. Errors in that matrix will affect the validity of target testing. The noisy data matrix D does not in general exhibit rank r but rather full rank, i.e. $\rho(D) = \min(B, S)$, and since the number of observations is typically larger than the number of species, $\rho(D) = S$. Direct target testing is no longer applicable.

Malinowski and Howery (1980) discuss at length ways of determining the number of true factors in D. The simplest one consists of deleting some of the smallest singular values of D, assuming that they contribute only to noise in D. This, of course, is only partly true because the measurement error (ME) affects both the imbedded error (IE) associated with the r dominant singular values of D and the extracted error (EE) associated with the (S - r) least dominant ones. Under the assumption of equal error variance associated with each element of D, the following relations exist between the standard deviations of the various error terms (Malinowski and Howery, 1980):

$$(ME)^2 = (IE)^2 + (EE)^2 \tag{16}$$

with

$$IE = ME \sqrt{\frac{r}{S}} \tag{17}$$

$$EE = ME \sqrt{\frac{S - r}{S}}. \tag{18}$$

Equation (17) indicates that IE, the error that affects the factor analysis reproduction process, is negligible only for $r \ll S$. Assuming IE = 0, an approximate data matrix $\bar{D}$ of dimension B × S and rank r can be reconstructed:

$$\bar{D} = U_{D1} S_{D1} V_{D1}^T \tag{19}$$

where the r × r matrix $S_{D1}$ contains the r dominant singular values of D. The columns of $U_{D1}$ and $V_{D1}$ are the corresponding left and right singular vectors. Target testing can be performed as discussed above using $\bar{D}$ instead of D. The effect of IE is to distort the projection results.

The discussion above assumed equal error variance for each element of D, a rather unlikely situation. If some of the elements of D are small but accurately measured, their information is almost completely lost when the smallest singular values are discarded. For such a situation, each column of D can be weighted approximately inversely proportional to the standard error or the mean value in that column. Target testing can then be performed using the weighted data matrix. Consider the general weighting (scaling) in which each element of D can be scaled individually:

$$\tilde{D} = W_1 D W_2 \tag{20}$$

where $W_1$ and $W_2$ represent square non-singular weighting matrices of appropriate dimension. Upon applying SVD on the weighted matrix $\tilde{D}$ and retaining
only the dominant singular values one obtains:

$$\bar{\tilde{D}} = \tilde{U}_{D1} \tilde{S}_{D1} \tilde{V}_{D1}^T \tag{21}$$

and

$$\bar{D} = W_1^{-1} \bar{\tilde{D}} W_2^{-1}. \tag{22}$$

The scaling of eq. (20) transforms meaningful but numerically negligible terms in D into numerically significant terms in $\tilde{D}$. The back transformation of eq. (22) is carried out after the small singular values of $\tilde{D}$ (hopefully mostly associated with noise) have been discarded.

If the unknown r is small compared to S, it may be wise in the face of experimental errors to test with possibly too large values of r (i.e. retain one or more additional singular values, since these may not be purely related to noise). This will result in a larger-dimensional stoichiometric space and, hence, in a higher likelihood for a transformed vector to match a real stoichiometric target. However, incorrect stoichiometries are rejected easily since r is still small compared to S.

4. INCORPORATION OF A PRIORI STOICHIOMETRIC INFORMATION
Testing target stoichiometries according to the projection approach derived in Section 3 remains prone to measurement errors, especially when the number of dominant factors is not considerably smaller than the number of measured species. However, by the time a reaction system has gone through its usual development stage, many of the reactions may be known. Then it is only reasonable to try to utilize all the available stoichiometric information in an attempt to reduce the sensitivity of the calculated stoichiometric space to measurement errors. This section is directed at deriving a stoichiometric space by differentiating between the known and the unknown reactions.

If $r_A$ independent reactions are known to take place in a reaction system, their contribution can be separated from the contribution of the remaining $r_B$ reactions as follows:

$$D = XN = X_A N_A + X_B N_B \tag{23}$$

where $N_A$ contains the known stoichiometries. In eq. (23) only D and $N_A$ are known. Even the number of remaining reactions is uncertain. The singular value decomposition of the $r_A$ × S matrix $N_A$ gives:

$$N_A = U_A S_A V_A^T = U_A \, [\, S_{A1} \mid 0 \,] \begin{bmatrix} V_{A1}^T \\ V_{A2}^T \end{bmatrix}. \tag{24}$$

The S × (S - $r_A$) matrix $V_{A2}$ contains the S - $r_A$ last right singular vectors of $N_A$, which span the null space of $N_A$. Because $N_A V_{A2} = 0$, reactions whose stoichiometries lie in the row space of $N_A$ do not contribute to $D V_{A2}$. Upon postmultiplying eq. (23) with $V_{A2}$ one obtains:

$$D V_{A2} = X_B N_B V_{A2}$$

or

$$D^* = X_B N_B^*. \tag{25}$$

$D^*$ contains the original data in terms of S - $r_A$ pseudo-species which are defined so as to be invariant w.r.t. the $r_A$ known reactions:

$$n^* = V_{A2}^T n \tag{26}$$

where n and $n^*$ are the vectors of number of moles of original and transformed species, respectively. Equation (25) can be analysed further to determine $N_B^*$ independently of the value of $X_B$ (which cannot be computed correctly before a complete stoichiometric network has been found). The contribution of the remaining unknown reactions is, therefore, treated orthogonally to the contribution of the known reactions.

The number of remaining (unknown) reactions is given by $r_B$, the number of dominant singular values of $D^*$. A stoichiometric space for these reactions is given by $V_{D1}^*$, the first $r_B$ right singular vectors of $D^*$. This stoichiometric space is orthogonal to $N_A$ and is described in terms of the pseudo-species $A^*$. Considering the definition of the pseudo-species of eq. (26), the stoichiometric space for the $r_B$ reactions in terms of the original species is given by:

$$\mathcal{N}_B^T = V_{A2} V_{D1}^*. \tag{27}$$

Hence, the complete stoichiometric space becomes:

$$\mathcal{N}^T = [\, \mathcal{N}_A^T \mid \mathcal{N}_B^T \,] = [\, V_{A1} \mid V_{A2} V_{D1}^* \,] \tag{28}$$

and target testing can be performed as described in Section 3. The S × S projection matrix of rank r reads in this case:

$$P = \mathcal{N}^+ \mathcal{N}. \tag{29}$$
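The separation of known and unknown reactions in eqs (23)-(29) can be illustrated with a short sketch. It assumes NumPy, a hypothetical system of five species and three reactions (two of them treated as known), noise-free data, and an illustrative tolerance for counting the dominant singular values of $D^*$; none of these choices comes from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical system: S = 5 species, 3 reactions, the first two known
N = np.array([
    [-1.0, 1.0, 0.0, 0.0, 0.0],    # known:   A -> B
    [0.0, -1.0, 1.0, 0.0, 0.0],    # known:   B -> C
    [0.0, 0.0, -1.0, 1.0, 1.0],    # unknown: C -> D + E
])
N_A = N[:2]
X = rng.uniform(0.1, 1.0, size=(8, 3))   # assumed extents, B = 8 observations
D = X @ N                                # eq. (23), noise-free

# SVD of the known stoichiometries, eq. (24)
r_A = np.linalg.matrix_rank(N_A)
_, _, Vt_A = np.linalg.svd(N_A)          # Vt_A is S x S
V_A1 = Vt_A[:r_A].T                      # spans the row space of N_A
V_A2 = Vt_A[r_A:].T                      # spans the null space of N_A

# Data in terms of pseudo-species, eq. (25)
D_star = D @ V_A2
_, s_star, Vt_star = np.linalg.svd(D_star, full_matrices=False)
r_B = int(np.sum(s_star > 1e-10 * s_star[0]))   # illustrative tolerance
V_star_D1 = Vt_star[:r_B].T

# Back to the original species, eqs (27)-(29)
N_B_basis = V_A2 @ V_star_D1                  # S x r_B
N_space = np.hstack([V_A1, N_B_basis]).T      # rows span the complete space
P = np.linalg.pinv(N_space) @ N_space

def distance(target):
    """Infinity norm of the difference between a target and its projection."""
    return np.max(np.abs(target - target @ P))

wrong = np.array([-1.0, 0.0, 0.0, 1.0, 0.0])  # A -> D, not part of the system
print(r_B, distance(N[2]), distance(wrong))
```

In this noise-free setting the unknown third reaction is reproduced exactly by the projection, whereas the incorrect target `wrong` is left at a clearly non-zero distance from the stoichiometric space.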
5. SIMULATED EXAMPLE
The liquid-phase oxidation of propane with oxygen at 150°C and 50 atm is considered (Bulygin et al., 1972). The products are acetone, water, isopropanol, acetaldehyde, methanol, acetic acid, carbon dioxide and formic acid. There are three atom types: C, H and O. The 3 × 10 atomic matrix M (Table 1) is of rank 3, indicating that seven independent reactions are possible, for example, as a slight modification of the scheme proposed by Björnbom (1974):

$$\text{R1:} \quad \mathrm{C_3H_8 + O_2 \rightarrow (CH_3)_2CO + H_2O} \tag{30}$$

$$\text{R2:} \quad \mathrm{C_3H_8 + 0.5\,O_2 \rightarrow C_3H_7OH} \tag{31}$$

$$\text{R3:} \quad \mathrm{C_3H_8 + O_2 \rightarrow CH_3CHO + CH_3OH} \tag{32}$$

$$\text{R4:} \quad \mathrm{CH_3CHO + 0.5\,O_2 \rightarrow CH_3COOH} \tag{33}$$

$$\text{R5:} \quad \mathrm{CH_3CHO + 2.5\,O_2 \rightarrow 2\,CO_2 + 2\,H_2O} \tag{34}$$

$$\text{R6:} \quad \mathrm{CH_3OH + O_2 \rightarrow HCOOH + H_2O} \tag{35}$$

$$\text{R7:} \quad \mathrm{2\,C_3H_8 + 2.5\,O_2 \rightarrow 3\,CH_3CHO + 2\,H_2O}. \tag{36}$$

Bulygin et al. (1972) showed experimentally that the seventh reaction, given by eq. (36), fails to occur. A 13 × 10 data matrix D (Table 2) was constructed from experimental data extracted from their figure "Kinetic curves for the accumulation of products of liquid-phase oxidation of propane". The data in D were very slightly modified so as to be generated exactly from the first six reactions. Hence, $\rho(D) = 6$.

Case without measurement error
A six-dimensional stoichiometric space was derived from D using the approach discussed in Section 3. Any of the first six true reactions was accepted by target testing; all other trials, including the seventh reaction, were clearly rejected.

Case with measurement errors
The elements of the measured data matrix D were artificially corrupted with noise. The relative error was given Gaussian properties with zero mean, fixed standard deviations and no correlation between elements. A 7 × 10 target stoichiometric matrix $N_{tar}$ (Table 3) was constructed: it contains the six real reactions and a seventh reaction which failed to occur [eqs (30)-(36)].
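The noise-free target test can be reproduced in outline with the stoichiometries of eqs (30)-(36). The sketch below assumes NumPy and, because it does not use the data of Table 2, generates a data matrix from hypothetical random extents of the six true reactions; the 0.3 threshold and the infinity norm are the acceptance criterion used in this section.

```python
import numpy as np

rng = np.random.default_rng(2)

# Rows: R1-R7 of eqs (30)-(36). Columns (assumed order): C3H8, O2,
# (CH3)2CO, H2O, C3H7OH, CH3CHO, CH3OH, CH3COOH, CO2, HCOOH
N_tar = np.array([
    [-1.0, -1.0, 1.0, 1.0, 0.0,  0.0,  0.0, 0.0, 0.0, 0.0],   # R1
    [-1.0, -0.5, 0.0, 0.0, 1.0,  0.0,  0.0, 0.0, 0.0, 0.0],   # R2
    [-1.0, -1.0, 0.0, 0.0, 0.0,  1.0,  1.0, 0.0, 0.0, 0.0],   # R3
    [ 0.0, -0.5, 0.0, 0.0, 0.0, -1.0,  0.0, 1.0, 0.0, 0.0],   # R4
    [ 0.0, -2.5, 0.0, 2.0, 0.0, -1.0,  0.0, 0.0, 2.0, 0.0],   # R5
    [ 0.0, -1.0, 0.0, 1.0, 0.0,  0.0, -1.0, 0.0, 0.0, 1.0],   # R6
    [-2.0, -2.5, 0.0, 2.0, 0.0,  3.0,  0.0, 0.0, 0.0, 0.0],   # R7
])

# Hypothetical noise-free data generated by the six true reactions only
X = rng.uniform(0.05, 0.5, size=(13, 6))
D = X @ N_tar[:6]

# Observed stoichiometric space and projection matrix, eqs (5), (8), (15)
_, s, Vt = np.linalg.svd(D, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))       # illustrative rank tolerance
N_a = Vt[:r]
P = np.linalg.pinv(N_a) @ N_a

# Infinity norm of the difference between target and projection
norms = np.max(np.abs(N_tar - N_tar @ P), axis=1)
accepted = norms < 0.3

print(r)
print(np.round(norms, 2))
print(accepted)
```

Since the projection depends only on the space spanned by R1-R6 and not on the particular extents, the value obtained for R7 should agree with the zero-error entry of Table 4 (about 0.99), while R1-R6 are reproduced to within rounding error.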
Direct approach. The seven target stoichiometries contained in $N_{tar}$ were projected onto the six-dimensional stoichiometric space spanned by the columns of $V_{D1}$. The ∞-norm (maximum absolute value of the elements) of the difference vector between target and projected stoichiometries was chosen as a measure of distance of the target reaction from the observed stoichiometric space. Table 4 gives values of the ∞-norm of the difference vectors for various levels of measurement error. In this study, a target stoichiometry is accepted (considered plausible) if the ∞-norm of the difference vector is less than 0.3. This
Table 1. Atomic matrix M (three atom types: C, H and O; 10 species: propane, oxygen, acetone, water, isopropanol, acetaldehyde, methanol, acetic acid, carbon dioxide and formic acid)

      C3H8   O2   (CH3)2CO   H2O   C3H7OH   CH3CHO   CH3OH   CH3COOH   CO2   HCOOH
C       3     0       3       0       3        2       1        2       1      1
H       8     0       6       2       8        4       4        4       0      2
O       0     2       1       1       1        1       1        2       2      2

Table 2. Data matrix D (13 observations; 10 species as in Table 1)

Obs.    C3H8      O2      (CH3)2CO   H2O     C3H7OH   CH3CHO   CH3OH    CH3COOH  CO2      HCOOH
 1    -0.1190  -0.1824    0.0313    0.0848   0.0310   0.0063   0.0342   0.0350   0.0309   0.0226
 2    -0.2111  -0.2883    0.0706    0.1466   0.0654   0.0075   0.0440   0.0453   0.0448   0.0312
 3    -0.2796  -0.3617    0.1053    0.1940   0.0894   0.0085   0.0495   0.0497   0.0533   0.0354
 4    -0.3203  -0.4278    0.1231    0.2362   0.0994   0.0095   0.0571   0.0522   0.0723   0.0408
 5    -0.3402  -0.4855    0.1286    0.2722   0.0994   0.0096   0.0644   0.0548   0.0958   0.0478
 6    -0.3612  -0.5288    0.1407    0.2986   0.0956   0.0101   0.0719   0.0624   0.1049   0.0531
 7    -0.3775  -0.5664    0.1508    0.3215   0.0911   0.0082   0.0786   0.0706   0.1137   0.0570
 8    -0.3942  -0.6050    0.1594    0.3439   0.0870   0.0082   0.0859   0.0783   0.1226   0.0619
 9    -0.4108  -0.6437    0.1676    0.3662   0.0833   0.0078   0.0931   0.0862   0.1318   0.0668
10    -0.4222  -0.6630    0.1748    0.3770   0.0813   0.0078   0.0978   0.0915   0.1338   0.0684
11    -0.4467  -0.7067    0.1819    0.3931   0.0797   0.0080   0.1104   0.1089   0.1365   0.0748
12    -0.4695  -0.7459    0.1877    0.4057   0.0785   0.0081   0.1231   0.1263   0.1378   0.0802
13    -0.4951  -0.7893    0.1948    0.4211   0.0781   0.0083   0.1351   0.1444   0.1392   0.0871

Table 3. Set of target reactions (seven reactions; 10 species as in Table 1)

      C3H8    O2    (CH3)2CO   H2O   C3H7OH   CH3CHO   CH3OH   CH3COOH   CO2   HCOOH
R1    -1.0   -1.0     1.0      1.0    0.0      0.0      0.0      0.0     0.0    0.0
R2    -1.0   -0.5     0.0      0.0    1.0      0.0      0.0      0.0     0.0    0.0
R3    -1.0   -1.0     0.0      0.0    0.0      1.0      1.0      0.0     0.0    0.0
R4     0.0   -0.5     0.0      0.0    0.0     -1.0      0.0      1.0     0.0    0.0
R5     0.0   -2.5     0.0      2.0    0.0     -1.0      0.0      0.0     2.0    0.0
R6     0.0   -1.0     0.0      1.0    0.0      0.0     -1.0      0.0     0.0    1.0
R7    -2.0   -2.5     0.0      2.0    0.0      3.0      0.0      0.0     0.0    0.0

Table 4. Results of target testing for various levels of error in the data matrix (R1-R6 are true reactions, R7 did not take place): ∞-norm of difference vector between target and projected stoichiometries

% rel. error    R1     R2     R3     R4     R5     R6     R7
   0            0      0      0      0      0      0      0.99
   0.1          0.02   0.02   0.12   0.12   0.14   0.21   0.97
   0.5          0.06   0.07   0.46   0.45   0.48   0.75   1.53
   1            0.08   0.10   0.68   0.69   0.74   0.87   2.15
   5            0.19   0.14   0.90   0.96   0.97   0.92   2.82
  10            0.32   0.18   0.93   0.99   1.00   0.90   2.90
value was chosen in relation to the typical values 1 and 2 of stoichiometric coefficients. The indicated values are representative of 20 randomly perturbed data matrices, each generated with a different seed. Each column of the data was weighted inversely proportional to the mean value in that column. When the error was kept very small (e.g. standard deviation < 0.1%) the simple approach described in Section 3, by which the non-dominant singular values assumed to contain the effect of measurement noise are discarded, was successfully used. With larger noise amplitudes this direct approach was no longer satisfactory. The stoichiometric space was too inaccurately determined to allow effective target testing using the projection matrix P given in eq. (15) with D replaced by $\bar{D}$. Furthermore, in this example with S = 10 and r = 6, one cannot significantly increase r to cope with experimental errors. Thus, a priori information concerning some of the stoichiometries was utilized according to the idea of Section 4.

Recursive approach using prior knowledge. This study was performed using data corrupted with relative errors of 5% standard deviation. In a first test R1 and R2 were assumed to be known (which, by the way, could also be extracted from Table 4, i.e. without explicit prior knowledge) and formed a 2 × 10 matrix $N_A$. Following the procedure of Section 4, a 13 × 8 matrix $D^*$ was computed. Stoichiometric spaces of dimension 6, 7, 8 and 9 were successively derived by retaining for $N_B^*$ 4, 5, 6 and 7 singular vectors of $D^*$, respectively. Projection matrices (10 × 10) of rank 6, 7, 8 and 9 were computed according to eq. (29). Projection of the seven target stoichiometries contained in $N_{tar}$ produced the ∞-norms of the difference vectors given in Table 5a. The known reactions R1 and R2 by definition belong exactly to the stoichiometric space. The real reaction R5 is accepted, but only for r = 9, whereas the other reactions generate a maximum difference of stoichiometric coefficients larger than 0.3 and cannot be accepted. These results show that, in the presence of noise, slightly increasing the dimension of the stoichiometric space can be quite useful.

In a second test, reactions R1, R2 and R5 were assumed to be known and formed a 3 × 10 matrix $N_A$. Following this procedure, an additional basis for the stoichiometric space was generated, and 10 × 10 projection matrices of rank 6, 7 and 8 were computed according to eq. (29). The results of the projection tests are given in Table 5b. R4 is accepted as a true reaction. The procedure is repeated in the same fashion. Table 5c shows that with R1, R2, R4 and R5 known, R3 can be accepted, and Table 5d shows that with R1-R5 known R6 is easily accepted. However, reaction R7, which failed to occur, is clearly rejected.

This example has shown that the approach is best used in a recursive manner since adding information about one or several reactions reduces the sensitivity of the target testing approach to measurement noise. Furthermore, if the measurement noise is random, increasing the number of observations helps in reducing the adverse effects that noise has on target testing. This last point is not illustrated in this study.
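The column weighting and truncation used for the noisy case, eqs (19)-(22), can be sketched as follows. This is only an illustration under stated assumptions: NumPy, a hypothetical two-reaction system rather than the propane example, 5% relative Gaussian noise with a fixed seed, $W_1$ taken as the identity, and $W_2$ a diagonal matrix of inverse absolute column means. The number of retained factors r is simply assumed to be known.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical system: 2 reactions, 4 species, 20 observations
N_true = np.array([[-1.0, 1.0, 0.0, 0.0],     # A -> B
                   [0.0, -1.0, 1.0, 1.0]])    # B -> C + D
X_true = np.column_stack([rng.uniform(0.5, 1.0, 20),
                          rng.uniform(0.1, 0.4, 20)])
D_exact = X_true @ N_true
# Relative Gaussian error, zero mean, 5% standard deviation, uncorrelated
D = D_exact * (1.0 + 0.05 * rng.standard_normal(D_exact.shape))

r = 2                                    # number of retained factors (assumed)

# Column weighting, eq. (20) with W_1 = I and W_2 diagonal
w2 = 1.0 / np.abs(D.mean(axis=0))
D_w = D * w2

# Truncated SVD of the weighted matrix and back transformation, eq. (22)
U, s, Vt = np.linalg.svd(D_w, full_matrices=False)
D_w_r = (U[:, :r] * s[:r]) @ Vt[:r]
D_bar = D_w_r / w2

# Projection matrix from the rank-r reconstruction, eq. (15) with D -> D_bar
_, _, Vt_bar = np.linalg.svd(D_bar, full_matrices=False)
P = Vt_bar[:r].T @ Vt_bar[:r]

# Distance of each true reaction from the estimated stoichiometric space
norms = np.max(np.abs(N_true - N_true @ P), axis=1)
print(np.round(norms, 3))
```

The printed distances depend on the noise realization; repeating the calculation over several seeds, as was done for Table 4, gives a more representative picture than a single run.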
6. CONCLUSIONS
The method of factor analysis was proposed to help elucidate the stoichiometry of complex reaction networks. The basic idea is to use measured data in order to define a stoichiometric space. Testing possible reactions is very appealing to the chemist, who can play the active role of proposing the chemical reactions.
Table 5. Recursive target testing using prior information and stoichiometric spaces of various dimensions r: ∞-norm of difference vector as in Table 4 ([illegible] marks entries that cannot be read in the source)

 r     R1    R2    R3            R4     R5     R6     R7

(a) R1 and R2 known
 6     0     0     0.72          0.66   0.67   0.95   1.93
 7     0     0     0.67          0.58   0.55   0.88   1.63
 8     0     0     0.54          0.45   0.38   0.68   1.28
 9     0     0     0.43          0.35   0.26   0.45   0.85

(b) R1, R2 and R5 known
 6     0     0     0.50          0.41   0      0.98   1.21
 7     0     0     0.46          0.36   0      0.91   1.13
 8     0     0     0.43          0.29   0      0.76   0.95

(c) R1, R2, R4 and R5 known
 6     0     0     [illegible]   0      0      0.97   0.93
 7     0     0     [illegible]   0      0      0.79   0.94
 8     0     0     0.27          0      0      0.66   0.92

(d) R1-R5 known
 6     0     0     0             0      0      0.08   0.97

(e) R1-R6 known
 6     0     0     0             0      0      0      0.99
Target testing indicates whether a target stoichiometry is compatible with measured data. Several remarks concerning factor analysis, target testing and possible extensions can be formulated.

(1) It is possible to include thermal information to help infer the apparent reaction enthalpies (Appendix A).
(2) It is not mandatory that all species be measured. In fact, if r is significantly smaller than S, a number of species can be left unmeasured (Appendix B).
(3) It is possible to include unknown (so-called free-floating) elements in the target stoichiometric vector. Estimates for the unknown elements are obtained from the transformed stoichiometric vector (Appendix C).
(4) The measured species must be identifiable by analytical methods, but there is no need to know their atomic structure exactly. In fact, if the atomic structure of all species is known, the theoretical stoichiometric space of dimension $\delta$ can be inferred from the atomic matrix M alone. Knowledge of the atomic matrix may also help reduce measurement noise in D or reconstruct columns of D corresponding to unmeasured species (Appendix D).
(5) The testing of a single reaction stoichiometry is independent of other reactions. This allows the chemist to test a hypothetical reaction without having to consider the global, still unknown reaction system. However, prior knowledge of some of the reactions is most useful when dealing with noisy data (see also point 8 below).
(6) Once the extent-of-reaction matrix X is known at several times during a single batch, it can be used in kinetic investigations.
(7) It is possible to check the validity of measured data by comparing them to the stoichiometric space obtained from previous (standard) data (Liao, 1989).
(8) The detrimental effect of noise in D can be substantially reduced by means of the novel approach developed in this work to separate, in an orthogonal fashion, the contribution of the known and the unknown reactions. The latter can then be studied independently of the former. The method is best used in a recursive way since each newly identified reaction simplifies the identification of other reactions. This makes target testing useful also when the data matrix is corrupted with experimental errors.
NOTATION
$a$   ath atom type
$A$   number of atom types
$\mathbf{A}$   S-dimensional vector of species
$B$   number of batches (observations)
$C_{PR}$   heat capacity of the reactor contents, J/K
$d$   S-dimensional composition data vector [eq. (A3)]
$D$   composition data matrix of dimension B × S
$e$   thermal data [eq. (A10)]
$\mathbf{e}$   B-dimensional vector of thermal data
$f$   number of known elements in a target vector
$g$   number of measured species
$\Delta H$   reaction enthalpy, J/mol
$h$   R-dimensional vector of reaction enthalpies
$h_f$   S-dimensional vector of heats of formation
$l$   number of unmeasured species
$M$   atomic matrix of dimension A × S
$n$   number of moles
$n_0$   normalizing factor [eq. (A2)]
$\mathbf{n}$   S-dimensional stoichiometric vector
$N$   stoichiometric matrix of dimension R × S
$\mathcal{N}$   stoichiometric vector space
$p$   number of free-floating elements
$P$   projection matrix of dimension S × S
$q_F$   rate of heat flow through the inner wall, W
$q_r$   rate of heat production for the rth reaction, W
$q_{sec}$   rate of secondary heat effects, W
$Q_F$   heat flow through the inner wall, J
$Q_{sec}$   secondary heat effects, J
[symbol illegible]   reduction matrix of dimension S × g
[symbol illegible]   reduction matrix of dimension S × f
$r$   number of independent reactions
$r_r$   rate of the rth reaction, mol/m³ s
$R$   number of reactions
$r_s$   rate of production of the sth species, mol/m³ s
$S$   number of species
$S$   matrix of singular values
$t$   r-dimensional transformation vector
$T$   transformation matrix of dimension r × r
$T_r$   reactor temperature, K
$U$   matrix of left singular vectors
$V$   matrix of right singular vectors
$V_r$   reactor volume, m³
$X$   extent-of-reaction matrix of dimension B × R
$x_r$   extent of reaction for the rth reaction
[symbol illegible]   reconstruction matrix of dimension g × S
$W_{1,2}$   weighting matrices

Superscripts
$*$   refers to pseudo-species
$+$   pseudo-inverse
$T$   transposed
$\sim$   weighted

Subscripts
$a$   abstract
$a$   quantity for the ath atom type
$A$   associated with known reactions
$B$   associated with unknown reactions
$D$   related to the D matrix
$f$   free floating
$g$   refers to measured species
$l$   refers to unmeasured species
$M$   related to the M matrix
$r$   for the rth reaction
$s$   for the sth species
$t$   transformed
$tar$   target
$0$   initial
$1$   associated with the dominant singular values
$2$   associated with the non-dominant singular values

Greek symbols
$\delta$   dimension of the null space of M
$\varepsilon$   error term
$\nu_{rs}$   stoichiometric coefficient for the sth species in the rth reaction
$\rho(M)$   rank of the matrix M

REFERENCES
Aris, R. and Mah, R. H. S., 1963, Independence of chemical reactions. Ind. Engng Chem. Fundam. 2, 90-94.
Björnbom, P. H., 1974, Failure of the element-by-species method to calculate the number of independent reactions. A.I.Ch.E. J. 20, 1026-1027.
Björnbom, P. H., 1975, The independent reactions in calculations of complex chemical equilibria. Ind. Engng Chem. Fundam. 14, 102-106.
Bonvin, D., De Vallière, P. and Rippin, D. W. T., 1989, Application of estimation techniques to batch reactors. I-Modelling thermal effects. Comp. chem. Engng 13, 1-9.
Bulygin, M. G., Blyumberg, E. A. and Al'tshuler, L. A., 1972, The liquid-phase oxidation of propane. Int. chem. Engng 12, 50-52.
Chen, C. T., 1984, Linear System Theory and Design. CBS College Publishing, New York.
Cureton, E. E. and D'Agostino, R. B., 1983, Factor Analysis: an Applied Approach. Lawrence Erlbaum Associates, Hillsdale, NJ.
Denbigh, K., 1971, The Principles of Chemical Equilibrium. Cambridge University Press, Cambridge.
Filippi, C., Greffe, J. L., Bordet, J., Villermaux, J., Barnay, J. L., Bonte, P. and Georgakis, C., 1986, Tendency modelling of semi-batch reactors for optimisation and control. Chem. Engng Sci. 41, 913-920.
Hamer, J. W., 1987, Stoichiometric interpretation of whole cell responses. A.I.Ch.E. Annual Meeting, paper 152g, New York.
Harman, H. H., 1970, Modern Factor Analysis. The University of Chicago Press, Chicago.
Horn, R. A. and Johnson, C. R., 1985, Matrix Analysis. Cambridge University Press, Cambridge.
Lawson, C. L. and Hanson, R. J., 1974, Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, NJ.
Liao, J. C., 1989, Fermentation data analysis and state estimation in the presence of incomplete mass balance. Biotechnol. Bioengng 33, 613-622.
Lorber, A., 1984, Validation of hypothesis on a data matrix by target factor analysis. Analyt. Chem. 56, 1004-1010.
Malinowski, E. R. and Howery, D. G., 1980, Factor Analysis in Chemistry. Wiley, New York.
Roscoe, B. A. and Hopke, P. K., 1981, Comparison of weighted and unweighted target transformation rotations in factor analysis. Comput. Chem. 5, 1-7.
Rummel, R. J., 1970, Applied Factor Analysis. Northwestern University Press, Evanston, IL.
Stewart, G. W., 1973, Introduction to Matrix Computations. Academic Press, New York.

APPENDIX A
The material balance is formulated in terms of the extents of reaction $x_r$ ($r = 1, 2, \ldots, R$) of the various reactions as follows:

$$n_s = n_{s0} + n_0 \sum_{r=1}^{R} \nu_{rs} x_r, \quad s = 1, 2, \ldots, S \tag{A1}$$

where $n_s$ is the total number of moles of species $A_s$, $n_{s0}$ its initial value and $n_0$ the initial number of moles of a key reagent. An integrated energy balance for the reactor section, considering the reactor volume and the heat capacity of the reactor contents $C_{PR}$ to be constant, reads:

$$C_{PR} (T_r - T_{r,0}) = n_0 \sum_{r=1}^{R} (-\Delta H_r) x_r + Q_F + Q_{sec} \tag{A2}$$

where $Q_F$ and $Q_{sec}$ represent the energy terms for heat flow through the wall and secondary effects, respectively. It is possible to express the material and energy balances in matrix form: eq. (A1) gives

$$d^T = x^T N \tag{A3}$$

where

$$d^T = \frac{1}{n_0} (n_1 - n_{10}, \; n_2 - n_{20}, \; \ldots, \; n_S - n_{S0})$$

$$x^T = (x_1, x_2, \ldots, x_R)$$

$$N = (\nu_{rs}), \quad r = 1, 2, \ldots, R; \quad s = 1, 2, \ldots, S;$$

eq. (A2) gives:

$$e = x^T h \tag{A4}$$

with

$$e = \frac{1}{n_0} \left[ Q_F + Q_{sec} - C_{PR} (T_r - T_{r,0}) \right]$$

$$h^T = (\Delta H_1, \Delta H_2, \ldots, \Delta H_R).$$

Upon combination of eqs (A3) and (A4) one obtains:

$$[\, d^T \mid e \,] = x^T [\, N \mid h \,]. \tag{A5}$$

Equation (A5) holds for any observation (during or at the end of a batch run) with a given reaction system. If the process is repeated for B observations or batch runs, eq. (A5) can be expanded to give:

$$[\, D \mid \mathbf{e} \,] = X [\, N \mid h \,]. \tag{A6}$$

In this equation, the reaction enthalpies are introduced as additional unknown quantities and are determined together with N so that composition and thermal measurements are matched as well as possible. However, the reaction enthalpies can be calculated from the stoichiometric coefficients and the heats of formation:

$$h = N h_f \tag{A7}$$

with $h_f^T = (\Delta H_{f,1}, \Delta H_{f,2}, \ldots, \Delta H_{f,S})$. Substituting expression (A7) in eq. (A6) gives:

$$[\, D \mid \mathbf{e} \,] = X N [\, I \mid h_f \,]. \tag{A8}$$

The S × (S + 1) matrix $[\, I \mid h_f \,]$ is known and, as long as it is of full rank, can be put on the LHS of eq. (A8) as follows:

$$[\, D \mid \mathbf{e} \,] \, [\, I \mid h_f \,]^+ = XN. \tag{A9}$$

Since $[\, D \mid \mathbf{e} \,] = D \, [\, I \mid h_f \,]$
(AlO)
eq. (A9) reduces to APPENDIX
A: CONSERVATION
NON-ISOTHERMAL
EQUATIONS
BATCH
FOR
A
REACTOR
This appendix summarizes the conservation equations for
a batch reactor (Bonvin et al., 1989). A mole balance in a batch reactor for the sth species can be
D=XN.
(All)
A comparison of eqs (A9) and (Al 1) indicates that addition of thermal information is redundant when the S species can be measured without error, i.e. eq. (AS) holds exactly. Since this is rarely the case in practice, eq. (A9) can be used to calculate a derived data matrix D, of dimension B x S (least-
Target factor analysis squares approach): D,, = [Die]
[Zjh/]’
= XN.
(A12)
The matrix D, is known as it contains data that depend on measurements of concentration (possibly off-line) and temperature (on-line), and on knowledge of the heats of formation for the various species. If thermal effects are not considered, or in the case of no measurement error, D, = D. Otherwise, D, differs slightly from D. If the elements of [Die] are of widely differing magnitudes or if they are known with various degrees of accuracy, it is possible to use a weighted least-squares approach instead of eq. (Al2) to obtain: fid = [Die]
W, {[ZIhJ]
W,}’
= XN
(A13)
where W, = YW1/’ and Y is a diagonal matrix containing the mean variances of the columns of [Die]. Use of composition and thermal measurements [eq. (A12) or (A13) in the weighted case] typically allows a data matrix to be obtained with reduced error compared with the case using only composition measurements.
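As a rough numerical illustration of eq. (A12), the following sketch assumes NumPy and uses made-up values: the function name `derived_data_matrix`, the stoichiometric matrix, the extents and the heats of formation are all hypothetical and are not taken from the paper. With noise-free data the derived matrix coincides with D, as eqs (A9)-(A11) state.

```python
import numpy as np

def derived_data_matrix(D, e, h_f):
    """Least-squares derived data matrix D_d = [D | e] [I | h_f]^+ (eq. A12)."""
    S = D.shape[1]
    De = np.hstack([D, e.reshape(-1, 1)])               # B x (S+1)
    I_hf = np.hstack([np.eye(S), h_f.reshape(-1, 1)])   # S x (S+1), full row rank
    return De @ np.linalg.pinv(I_hf)

# Hypothetical system: R = 2 reactions, S = 4 species, B = 5 observations.
rng = np.random.default_rng(0)
N = np.array([[-1.0, -1.0, 1.0, 0.0],
              [0.0, -1.0, -1.0, 1.0]])
X = rng.uniform(0.0, 1.0, size=(5, 2))        # extents of reaction (made up)
h_f = np.array([-10.0, 5.0, -30.0, -50.0])    # heats of formation (made up)

D = X @ N                # noise-free composition data, eq. (A11)
e = X @ (N @ h_f)        # thermal data, e = X h with h = N h_f (eqs A6, A7)
D_d = derived_data_matrix(D, e, h_f)
```

With measurement noise added to `D` and `e`, `D_d` would no longer equal `D`; the weighted variant of eq. (A13) would additionally scale the columns of `[D | e]` before the pseudo-inverse is taken.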
APPENDIX B: NOT ALL SPECIES MEASURED

Often information about all the S species is not available. The general matrix equation (9) can be reduced so as to include only g columns, g being the number of observed species. The resulting expression reads:

$$D_g = X N_g \tag{B1}$$

where $D_g$ is a $B \times g$ data matrix and $N_g$ an $R \times g$ matrix containing some of the stoichiometric coefficients for R reactions. The matrix $D_g$ contains the observed part of the $B \times S$ matrix D. Mathematically, one can write:

$$D_g = D Q_g \tag{B2}$$

where the matrix $Q_g$ of dimension $S \times g$ is introduced so as to hide the columns corresponding to the unmeasured species. Typically, the elements of each column of $Q_g$ comprise zeros and a single one. The $g \times g$ projection matrix associated with the row space of $D_g$ reads:

$$P_g = D_g^+ D_g. \tag{B3}$$

Target testing can then proceed as follows:

(i) the S-dimensional target stoichiometry $n_{tar}^T$ proposed by the chemist is postmultiplied by $Q_g$ to give the g-dimensional target stoichiometry $n_{tar,g}^T = n_{tar}^T Q_g$;
(ii) the transformed (or projected) stoichiometric vector of dimension g is calculated: $\hat{n}_{tar,g}^T = n_{tar,g}^T P_g$.

If $\hat{n}_{tar,g}$ is found to be an acceptable target stoichiometry, then $n_{tar}$ is, in turn, accepted as a stoichiometry with respect to the measured species. The target testing approach is of use if $r < g$. Another approach for dealing with unmeasured species consists in reconstructing them using the constraint equation $NM^T = 0$ (cf. Appendix D).

APPENDIX C: TARGET TESTING WITH FREE-FLOATING ELEMENTS

It is also possible to include unknown (so-called free-floating) elements in the specified target stoichiometry. If there are p unknown elements in the target stoichiometry, a reduced target vector of dimension $f = S - p$ can be obtained:

$$n_{tar,f}^T = n_{tar}^T Q_f \tag{C1}$$

where the matrix $Q_f$ of dimension $S \times f$ is introduced so as to hide the unknown elements in $n_{tar}$. The elements of each column of $Q_f$ comprise zeros and a single one. Target testing can be implemented for the f known elements. Similarly to eqs (17)-(19), one obtains

$$n_{tar,f}^T = t^T N_{abs} Q_f + \varepsilon^T \tag{C2}$$

$$t^T = n_{tar,f}^T\, Q_f^T\, N_{abs}^+ \tag{C3}$$

$$\hat{n}_{tar}^T = n_{tar,f}^T\, Q_f^T\, P \tag{C4}$$

or

$$\hat{n}_{tar}^T = n_{tar}^T\, [Q_f Q_f^T]\, P. \tag{C5}$$

If $\hat{n}_{tar,f}$ is found to be a good reduced target stoichiometry then, in turn, $\hat{n}_{tar}$ is accepted as a stoichiometry. A nice feature of the approach consists in the possibility of predicting the p unknown elements in the target vector as shown by eq. (C4) or (C5). It is important to notice that the $S \times S$ matrix $[Q_f Q_f^T]$ is not the unity matrix of dimension S. The reconstruction of free-floating elements can be implemented only if $f > r$, i.e. $p < S - r$. It is also possible to implement target testing with free-floating elements when only g species are measured (cf. Appendix B). The condition for being able to reconstruct unknown elements of the target vector becomes $p < g - r$.

APPENDIX D: USE OF THE CONSTRAINT $NM^T = 0$

Upon transposition of eq. (3) one obtains:

$$N M^T = 0. \tag{D1}$$

This represents a constraint for the stoichiometric matrix N.

(a) Use of $NM^T = 0$ to reduce measurement noise in D

The null space of M describes the theoretical stoichiometric space of dimension $\delta$. The observed stoichiometric space of dimension r is a subspace of $\mathbb{R}^{\delta}$. Each observation (row of D) if measured without error would lie in the null space of M, as demonstrated here. From

$$D = XN \tag{D2}$$

one can write

$$D M^T = X N M^T \tag{D3}$$

and from eq. (D1)
$$D M^T = 0 \tag{D4}$$

or

$$M D^T = 0, \tag{D5}$$

i.e. each column of $D^T$ lies in the null space of M. One can use this property to obtain a data matrix $D_\delta$ of rank $\delta$ by projecting the noisy D matrix onto the null space of M. The projection matrix of dimension $S \times S$ associated with the null space of M reads (Lawson and Hanson, 1974):

$$P_\delta = I - M^+ M. \tag{D6}$$

Hence

$$D_\delta = D P_\delta. \tag{D7}$$
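The projection of eqs (D6) and (D7) can be sketched numerically as follows, assuming NumPy. The atomic matrix M (species CH4, O2, CO2, H2O, CO; atoms C, H, O), the single reaction, the noise level and the function name `project_onto_null_space` are all made up for illustration and do not come from the paper.

```python
import numpy as np

def project_onto_null_space(D, M):
    """Return (D P, P) with P = I - M^+ M, the projector onto null(M) (eqs D6, D7)."""
    S = M.shape[1]
    P = np.eye(S) - np.linalg.pinv(M) @ M
    return D @ P, P

# Hypothetical atomic matrix: rows C, H, O; columns CH4, O2, CO2, H2O, CO.
M = np.array([[1, 0, 1, 0, 1],
              [4, 0, 0, 2, 0],
              [0, 2, 2, 1, 1]], dtype=float)
# One reaction, CH4 + 2 O2 -> CO2 + 2 H2O, so r = 1 while delta = S - rank(M) = 2.
N = np.array([[-1, -2, 1, 2, 0]], dtype=float)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(6, 1))                       # extents (made up)
D_exact = X @ N
D_noisy = D_exact + 0.01 * rng.standard_normal(D_exact.shape)

D_proj, P = project_onto_null_space(D_noisy, M)
err_before = np.linalg.norm(D_noisy - D_exact)
err_after = np.linalg.norm(D_proj - D_exact)
```

Because P is an orthogonal projector and the noise-free rows already lie in the null space of M, the projected error `err_after` cannot exceed `err_before`; only the noise component outside the null space is removed.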
If $r = \delta$, the noise left in $D_\delta$ does not affect the stoichiometric space obtained from $D_\delta$. However, this case is of little practical significance for target testing because any stoichiometry which does not violate the conservation of atom types belongs to that space. If $r < \delta$, target testing is meaningful but the measurement noise left in $D_\delta$ affects (although in a reduced way) the derived stoichiometric space.

(b) Use of $NM^T = 0$ to reconstruct unmeasured species

If only g of the S species are measured it is possible, under certain conditions, to reconstruct the composition of the $l = S - g$ unmeasured species using the constraint equation $NM^T = 0$.
Equation (D4) can be partitioned as

$$D_g M_g^T + D_l M_l^T = 0 \tag{D8}$$

where subscripts g and l refer to measured and unmeasured species, respectively. $M_g$ and $M_l$ have dimension $A \times g$ and $A \times l$. From eq. (D8) one obtains:

$$D_l = -(D_g M_g^T)(M_l^T)^+ \tag{D9}$$

under the assumption that the $A \times l$ matrix $M_l$ is of full rank and $l \le A$. The $B \times S$ matrix D can therefore be calculated from the $B \times g$ matrix $D_g$ as follows:

$$D = [D_g \mid D_l] = D_g\,[I \mid -M_g^T (M_l^T)^+] = D_g W_g \tag{D10}$$

where $W_g$ is a $g \times S$ matrix which is known from knowledge of the atomic matrix M. Equation (D9) indicates that $D_l$ is a linear combination of $D_g$. Hence, $\rho(D) = \rho(D_g)$.
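A minimal sketch of the reconstruction in eqs (D9) and (D10), assuming NumPy and assuming that the measured species occupy the first g columns of D. The atomic matrix, the two reactions, the extents and the function name `reconstruct_unmeasured` are hypothetical values chosen for illustration only.

```python
import numpy as np

def reconstruct_unmeasured(D_g, M_g, M_l):
    """Rebuild D = [D_g | D_l] with D_l = -(D_g M_g^T)(M_l^T)^+ (eqs D9, D10).

    M_g (A x g) and M_l (A x l) hold the columns of the atomic matrix M for the
    measured and unmeasured species; M_l needs full column rank (l <= A).
    """
    l = M_l.shape[1]
    if np.linalg.matrix_rank(M_l) < l:
        raise ValueError("M_l must have full column rank")
    D_l = -(D_g @ M_g.T) @ np.linalg.pinv(M_l.T)
    return np.hstack([D_g, D_l])

# Hypothetical atomic matrix: rows C, H, O; columns CH4, O2, CO2, H2O, CO.
M = np.array([[1, 0, 1, 0, 1],
              [4, 0, 0, 2, 0],
              [0, 2, 2, 1, 1]], dtype=float)
# Two made-up reactions: CH4 + 2 O2 -> CO2 + 2 H2O and 2 CO + O2 -> 2 CO2.
N = np.array([[-1, -2, 1, 2, 0],
              [0, -1, 2, 0, -2]], dtype=float)

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(6, 2))
D_true = X @ N

g = 3                                   # CH4, O2, CO2 measured; H2O, CO unmeasured
D_g = D_true[:, :g]
D_rec = reconstruct_unmeasured(D_g, M[:, :g], M[:, g:])
```

In this noise-free example the reconstructed columns match the true ones and the rank of the rebuilt matrix equals that of `D_g`, in line with the closing remark that $\rho(D) = \rho(D_g)$. With noisy `D_g`, the reconstructed columns inherit that noise.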