European Polymer Journal, Vol. 16, pp. 1 to 6. Pergamon Press Ltd 1980. Printed in Great Britain.
POLYNOMIAL SERIES EXPANSION OF THE MOLECULAR WEIGHT DISTRIBUTION FUNCTION

Y. SICOTTE
Département de Chimie, Université de Montréal, Montréal, H3C 3V1, Canada
(Received 17 July 1979)

Abstract: The Gram-Charlier series expansion of the molecular weight distribution, using Laguerre polynomials, can be considered as an analysis of the distribution in terms of gamma subdistributions. The two scaling parameters involved in this series are not to be chosen arbitrarily. They play an important role in the suitability of the series to fit the distribution. The best fit occurs for the values of the parameters that locate the averages of the subdistributions symmetrically with respect to the average of the whole distribution, and simultaneously ensure that the extreme subdistributions do not extend significantly outside the interval of non-zero values of the distribution, but come as close as possible to it. Several numerical applications are presented.
INTRODUCTION

The distribution function, $P$, of a variable such as molecular weight, $M$, can be approximated by a finite series of the type:

$$P_{app}(M) = P_0(M) \sum_{n=0}^{j} C_n f_n(M). \tag{1}$$

If $P_0(M)$ is chosen as being the two-parameter Schulz-Zimm (gamma) distribution,

$$P_0(M) = \gamma(M) = \frac{\beta^{\alpha+1} M^{\alpha} e^{-\beta M}}{\Gamma(\alpha+1)}, \tag{2}$$

the $f_n(M)$ are the Laguerre polynomials and the series is known as the Gram-Charlier series:

$$f_n(M) = \sum_{k=0}^{n} \frac{\Gamma(n+\alpha+1)\,(-x)^k}{\Gamma(k+\alpha+1)\,(n-k)!\,k!}, \tag{3}$$

where

$$x = \beta M. \tag{4}$$

The coefficients, $C_n$, of Eqn. (1) can be obtained from the moments of $P(M)$, or from a least squares fitting, or by combining the two. All these methods were discussed previously [1]. The least squares method is of course the best for fitting purposes, and will be the only one used in the present work. In this method, each $C_n$ will have the value that minimizes

$$E^2 = \overline{[P_{app}(M) - P(M)]^2}. \tag{5}$$

The minimum value of $E$ obtained in this way, divided by the maximum value of $P(M)$, will be denoted $\epsilon_j$ and will serve as a measure of the fitting. $\epsilon_j$ will be the standard deviation between $P(M)$ and the series of length $j$.

THE DISTRIBUTION DATA USED

Four sets of molecular weight distribution data were used and are shown in Fig. 3. D-1 is a relatively symmetric distribution, obtained by Daoust and Hoduc [2] by sedimentation velocity analysis on a sample of PVC (Gulf Oil Canada, FR 1551-106) produced by emulsion polymerization at -20°. D-2 is a negatively skewed distribution function of a cellulose nitrate sample. The integral distribution data were obtained by Schulz and Marx, using fractional precipitation, and transformed into the differential distribution function [3]. D-3 is a bimodal distribution obtained by McCormick [4], using sedimentation velocity analysis, on a 50:50 mixture of two narrow-distribution polystyrenes, S 102 and S 111. D-4 is a simulated distribution used as an example of highly positive skewness.

THE CLASSICAL MODE OF CHOOSING THE SCALING PARAMETERS

In this Gram-Charlier series, $\alpha$ and $\beta$ are referred to as the scaling parameters, and are generally obtained by setting

$$M_w = (\alpha + 1)/\beta \tag{6}$$

$$M_z = (\alpha + 2)/\beta \tag{7}$$

where $M_w$ and $M_z$ are the weight and $z$ molecular weight averages of the weight distribution $P(M)$. By doing so, one is choosing as starting point approximation, $\gamma(M)$, the gamma distribution having the same weight average as $P(M)$ and also the same variance,

$$\sigma^2 = \overline{(M - M_w)^2} = M_w M_z - M_w^2. \tag{8}$$

If the fitting is good, $C_0$ will then be near unity for a normalized $P(M)$ and both $C_1$ and $C_2$ will be near zero [5]. The other $C_n$ of the series will be a measure of the departure of $P(M)$ from a gamma distribution. Table 1 shows the values of $\epsilon_j$ obtained under this mode for the four distributions, at $j = 4$ and 9. There is of course no evidence that the fit obtained under this first mode will be the best possible.
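As an illustration of Eqns (1) to (5), the following Python sketch evaluates the Laguerre polynomials of Eqn. (3), builds the series of Eqn. (1) on the gamma weight of Eqn. (2), and obtains the $C_n$ by ordinary least squares. It is only a minimal sketch under stated assumptions: the function names (`laguerre`, `gamma_weight`, `solve_linear`, `fit_series`) are hypothetical, the mean in Eqn. (5) is taken as an unweighted mean over the data points, and the normal equations are solved by plain Gaussian elimination, which is adequate only for short series.

```python
import math

def laguerre(n, alpha, x):
    """f_n of Eqn (3), evaluated at x = beta * M."""
    total = 0.0
    for k in range(n + 1):
        coeff = math.gamma(n + alpha + 1.0) / (
            math.gamma(k + alpha + 1.0) * math.factorial(n - k) * math.factorial(k))
        total += coeff * (-x) ** k
    return total

def gamma_weight(m, alpha, beta):
    """Schulz-Zimm (gamma) distribution of Eqn (2)."""
    return beta ** (alpha + 1.0) * m ** alpha * math.exp(-beta * m) / math.gamma(alpha + 1.0)

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def fit_series(ms, ps, alpha, beta, j):
    """Least-squares C_n of Eqn (1); returns (coefficients, epsilon_j)."""
    # Each row holds gamma(M) * f_n(M) for n = 0..j at one data point.
    basis = [[gamma_weight(m, alpha, beta) * laguerre(n, alpha, beta * m)
              for n in range(j + 1)] for m in ms]
    normal = [[sum(row[p] * row[q] for row in basis) for q in range(j + 1)]
              for p in range(j + 1)]
    rhs = [sum(row[p] * y for row, y in zip(basis, ps)) for p in range(j + 1)]
    coeffs = solve_linear(normal, rhs)
    resid = [sum(c * b for c, b in zip(coeffs, row)) - y
             for row, y in zip(basis, ps)]
    e = math.sqrt(sum(r * r for r in resid) / len(ms))  # E of Eqn (5)
    return coeffs, e / max(ps)                          # epsilon_j
```

If the data are themselves an exact gamma distribution with the same $\alpha$ and $\beta$, the fit should return $C_0$ near unity and the other coefficients near zero, as stated in the text.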
Table 1. Values of the scaling parameters and of the standard deviations obtained under the classical mode: Eqns (6) and (7)

         $\alpha$    $\beta$            $\epsilon_4$ (%)   $\epsilon_9$ (%)
D-1      8.0         3.55 x 10^-5       4.5                1.6
D-2      13.4        2.36 x 10^-3       11.7               7.3
D-3      2.84        2.37 x 10^-5       15.4               2.7
D-4      2.24        1.24 x 10^-5       6.2                3.7
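Equations (6) and (7) can be inverted in closed form: subtracting them gives $M_z - M_w = 1/\beta$, and then $\alpha = \beta M_w - 1$. The short Python sketch below shows this inversion. The function name `classical_scaling` is hypothetical, and the averages used in the usage note are not quoted from the paper; they are back-calculated from the D-1 row of Table 1 purely as a round-trip check.

```python
def classical_scaling(m_w, m_z):
    """Invert Eqns (6) and (7): return (alpha, beta) from M_w and M_z."""
    if m_z <= m_w:
        raise ValueError("M_z must exceed M_w for a distribution of finite width")
    beta = 1.0 / (m_z - m_w)    # (7) minus (6): M_z - M_w = 1 / beta
    alpha = beta * m_w - 1.0    # from (6): M_w = (alpha + 1) / beta
    return alpha, beta
```

For example, the D-1 entries of Table 1 ($\alpha = 8.0$, $\beta = 3.55 \times 10^{-5}$) correspond through Eqns (6) and (7) to $M_w = 9/\beta$ and $M_z = 10/\beta$; feeding these two averages back into `classical_scaling` recovers the tabulated parameters.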
The fit obtained under the classical mode will only be the best fit under the conditions (6) and (7). In fact, it readily appears that $\epsilon_j$ varies with the choice of the scaling parameters, as will be expressed by writing $\epsilon_j(\alpha, \beta)$.

AN ALTERNATIVE MODE OF CHOOSING THE SCALING PARAMETERS

To locate the set of values of $\alpha$ and $\beta$ giving the best possible fit, it is useful to go back to Eqn. (1) and rewrite it, with Eqns (2) and (3), in the form:

$$P_{app}(M) = \sum_{n=0}^{j} C'_n M^{\alpha+n} e^{-\beta M}. \tag{9}$$

Equation (9) shows that the series represents the distribution as a sum of $j + 1$ gamma distributions, to be called subdistributions. The subdistribution $n$ will have a weight average

$$M_{w,n} = (\alpha + 1 + n)/\beta \tag{10}$$

and variance

$$\sigma_n^2 = (\alpha + 1 + n)/\beta^2. \tag{11}$$

The $M_{w,n}$ will be equally spaced, extending from $M_{w,0} = M_w$ up to

$$M_{w,j} = M_w + j/\beta. \tag{12}$$

Increasing the length of the series will eventually add subdistributions that will exceed the interval over which $P(M)$ differs experimentally from zero. We will denote $M_l$ and $M_h$ the lower and higher limits of this interval. For high enough values of $n$, $M_{w,n}$ will fall above $M_h$. The corresponding subdistributions will then have to cancel each other to a great extent to fit the data,

$$P(M) = 0, \quad M > M_h, \tag{13}$$

and will be lost, at least partly, for fitting purposes, in the interval $M_l$ to $M_h$. Moreover a residual oscillation about the abscissa will be left above $M_h$.

If one looks for the best fit, it appears that, instead of using Eqns (6) and (7), the scaling parameters should be chosen so as to provide a more symmetric spreading of the subdistributions over the interval $M_l$ to $M_h$ and to ensure that the subdistributions do not extend significantly outside this interval: this will generate two conditions that will simultaneously determine $\alpha$ and $\beta$. Instead of having $M_{w,0} = M_w$, one can choose to fix the average of the mid-subdistribution at the average of the whole distribution, $M_{w,j/2} = M_w$, i.e. by using Eqn. (10),

$$(\alpha + 1 + j/2)/\beta = M_w. \tag{14}$$
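The first condition is simple enough to sketch in a few lines of Python: Eqn. (14) fixes $\beta$ once $\alpha$, $j$ and $M_w$ are given, and Eqn. (10) then gives the averages of the subdistributions, which come out symmetric about $M_w$. The function names below are hypothetical, and the value of $M_w$ in the usage note is not quoted from the paper but back-calculated from the D-1 row of Table 1 through Eqn. (6).

```python
def beta_centered(alpha, j, m_w):
    """Eqn (14): beta that places the mid-subdistribution average at M_w."""
    return (alpha + 1.0 + j / 2.0) / m_w

def subdistribution_averages(alpha, beta, j):
    """Eqn (10): weight averages M_{w,n} for n = 0..j."""
    return [(alpha + 1.0 + n) / beta for n in range(j + 1)]
```

As a rough consistency check, taking $M_w = 9/(3.55 \times 10^{-5})$ for D-1 together with $\alpha = 9.7$ and $j = 4$ gives a $\beta$ close to the $5.0 \times 10^{-5}$ listed for D-1 in Table 2.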
The averages $M_{w,n}$ will then be symmetric with respect to $M_w$. For odd values of $j$, the $j/2$ subdistribution does not exist, but Eqn. (14) still has a solution. To verify this first condition, Eqn. (14), $\epsilon_j(\alpha, \beta)$ has been calculated as a function of $\beta$ for fixed values of $\alpha$, at several values of $j$. The order in which $\alpha$ and $\beta$ are considered here is, of course, purely arbitrary. Figure 1 shows typical results. $\epsilon_j(\alpha, \beta)$ varies markedly with $\beta$ for a given value of $\alpha$. The small details of this variation are highly specific to the distribution, but the general features are evident. The curve shows a well with steep descents and an irregular bottom.
Fig. 1. Standard deviations obtained as a function of $\beta$, for fixed value of $\alpha$. The arrow corresponds to Eqn. (14).
The value of $\beta$ corresponding to Eqn. (14), indicated by an arrow on each curve, falls within the well and this happens consistently whatever value of $\alpha$ is considered. It is interesting to note that increasing $j$ not only deepens but broadens and flattens the bottom of the well. Conditions other than Eqn. (14) were also tried, such as centering $M_{w,j/2}$ at the mid-interval, $(M_h + M_l)/2$, or at half the surface of the distribution. Centering the maximum of the mid-subdistribution at the maximum of $P(M)$ was also tried for unimodal distributions. But the value of $\beta$ corresponding to these various conditions was, at least in certain cases, significantly out of the well, especially for small values of $j$ where the well is narrower. Equation (14) seems to locate properly the minimum of $\epsilon_j(\alpha, \beta)$ and we will thus keep it as one of the two conditions necessary to choose the two scaling parameters. We will denote as $\epsilon_j(\alpha)$ the value of $\epsilon_j(\alpha, \beta)$ at the value of $\beta$ corresponding to Eqn. (14).

THE SECOND CONDITION
Calculations show readily that $\epsilon_j(\alpha)$ varies with $\alpha$ and presents a minimum that it would be interesting to locate by a general condition. This variation of $\epsilon_j(\alpha)$ occurs while the $M_{w,n}$ are distributed symmetrically about $M_w$. According to the value of $\alpha$, however, the subdistributions will be more or less spread on the abscissa. By increasing $\alpha$, the subdistributions move from both sides toward $M_w$, and they move outward by decreasing $\alpha$. It is then evident that, for a low enough value of $\alpha$, one of two things will happen. For a negatively skewed distribution, the subdistribution $n = j$ will extend above $M_h$. For a positively skewed distribution, the subdistribution $n = 0$ will extend below $M_l$. It would be interesting to choose for $\alpha$ the value that just prevents this extension of the subdistributions outside the interval $M_l$ to $M_h$.

The above statements refer of course not to the full mathematical extension of the distributions, but to the interval over which experimental data can be considered as significantly different from zero. Mathematically this can be considered as equivalent to the interval that comprises the total sample within experimental accuracy. For example, one could define $M_h - M_l$ as the interval that comprises 99% of the surface under the distribution, neglecting 0.5% at each end, on the basis that, outside this interval, the distribution is practically never measured to any reasonable degree of accuracy. For a wide variety of unimodal distributions, the ratio of this 99% interval to the square root of the variance, Eqn (8), is consistently very near 5. For example, the two unimodal experimental distributions that we consider, D-1 and D-2, yield respectively 4.9 and 5.0. The normal (standard error) distribution yields 5.15. We have verified that the gamma distribution, Eqn. (2), also gives 5.15 for this ratio, at least for $\alpha$ between 1 and 50. It will be useful to divide this interval into two parts:

$$\frac{M_w - M_l}{\sigma} = k' \tag{15}$$

$$\frac{M_h - M_w}{\sigma} = k''. \tag{16}$$
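The widths $k'$ and $k''$ of Eqns (15) and (16) can be evaluated numerically for a gamma distribution. The Python sketch below does so under some simplifying assumptions that are not in the original: $\alpha$ is restricted to integer values (so that the cumulative distribution has a closed form), $M$ is measured in units of $1/\beta$, and the limits $M_l$ and $M_h$ are located by bisection at the 0.5% and 99.5% points. All function names are hypothetical.

```python
import math

def gamma_cdf_int(x, shape):
    """CDF of a unit-rate gamma distribution with integer shape."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for i in range(1, shape):
        term *= x / i
        total += term
    return 1.0 - math.exp(-x) * total

def gamma_quantile_int(p, shape):
    """Quantile by bisection; the upper bracket is a generous assumption."""
    lo, hi = 0.0, shape + 50.0 * math.sqrt(shape) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def width_ratios(alpha, tail=0.005):
    """Return (k', k'') of Eqns (15) and (16) for integer alpha."""
    shape = alpha + 1              # M^alpha e^{-beta M} has shape alpha + 1
    mean = float(shape)            # M_w in units of 1 / beta, Eqn (6)
    sigma = math.sqrt(shape)       # square root of the variance, same units
    x_low = gamma_quantile_int(tail, shape)
    x_high = gamma_quantile_int(1.0 - tail, shape)
    return (mean - x_low) / sigma, (x_high - mean) / sigma
```

Because both the interval and $\sigma$ scale as $1/\beta$, the ratios depend on $\alpha$ only, which is why $\beta$ does not appear as a parameter.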
If $M_w$ lies at the mid-interval of the distribution, both $k'$ and $k''$ will be near 2.5. For negatively skewed distributions, $k' > k''$, and the reverse holds for positively skewed distributions. We can apply this to each of the subdistributions of the series, which are gamma distributions, $C' M^{\alpha} e^{-\beta M}$. This type of distribution is positively skewed if $\alpha$ is low enough and nearly symmetric, in this respect, for high values of $\alpha$. In this last case, we have found effectively $k' \approx k'' \approx 2.5$. At $\alpha = 10$, $k' = 2$ and $k'' = 3.1$; while at $\alpha = 2$, $k' = 1.5$ and $k'' = 3.6$.

Of course, all the figures given above for the linear width of a distribution, $k'$, $k''$ as well as their sum, refer to the interval that comprises 99% of the sample. An interval corresponding to another level would yield different values. For the 98% level interval, for example, $(M_h - M_l)/\sigma$ is reduced by 0.2 to 0.3 as compared with the values corresponding to the 99% interval seen above. A value of 5 for this ratio seems then to be a very good first approximation for any experimentally plausible level of the interval. It should apply in the case of the subdistributions, at least for $\alpha$ above 1.

To apply this notation to the subdistributions, we will divide the interval $M_l$ to $M_h$ into $M_l$ to $M_{w,0}$, $M_{w,0}$ to $M_{w,j}$, and $M_{w,j}$ to $M_h$. Eqn (10) gives $j/\beta$ for the interval $M_{w,0}$ to $M_{w,j}$. The interval $M_l$ to $M_{w,0}$ can be defined as

$$M_{w,0} - M_l = u\,\sigma_0. \tag{17}$$

If $M_{l,0}$, the lower limit of the subdistribution $n = 0$, is equal to $M_l$, $u$ should be equal to $k'_0$ and the subdistribution $n = 0$ should then extend down to but not below $M_l$. Similarly, in

$$M_h - M_{w,j} = v\,\sigma_j, \tag{18}$$

$v$ should be equal to $k''_j$ if $M_{h,j}$ is equal to $M_h$. The subdistribution $n = j$ should then extend up to but not above $M_h$.

To verify if $u = k'_0$ and/or $v = k''_j$ effectively correspond to a minimum in $\epsilon_j(\alpha)$, we have plotted this last quantity as a function of $u$ (see Fig. 2). Here again, the curves seem highly specific to the distribution but a region of minimum occurs in each case. For D-1, D-3 and D-4, this region of minimum is between $2 < u < 3$. Since the minima are relatively flat on this scale, we can as well define

$$\frac{M_{w,0} - M_l}{\sigma_0} = 2.5 \tag{19}$$

as corresponding to the best fit in these cases, i.e. to the minimum of $\epsilon_j(\alpha)$. Equation (19) neglects variations of $u$ around 2.5, that are partly specific and partly due to the expected variations of $k'_0$ with the skewness of the subdistribution. These variations of $u$ however correspond to small variations in $\epsilon_j(\alpha)$. On each curve of Fig. 2, we have also indicated the values of $v$, Eqn. (18). For D-1 and D-3, the minimum corresponds to $2 < v < 3$. Here again, we can as well
define

$$\frac{M_h - M_{w,j}}{\sigma_j} = 2.5 \tag{20}$$

as corresponding to the best fit. In these two cases, where Eqns (19) and (20) apply simultaneously, $M_w$ is very near the mid-interval of the distribution. For D-4, $v$ at the minimum is significantly greater than 2.5. Equation (19) is satisfied but not Eqn. (20). In this case, $M_w$ is significantly smaller than the mid-interval.

Fig. 2. Standard deviations obtained with $\beta$ given by Eqn. (14), as a function of $u$, Eqn (17). Values of $v$, Eqn (18), are given on each curve. In every case, the upper curve is for $j = 4$, the lower one for $j = 9$.
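Equations (17) and (18) can be evaluated directly once $\beta$ is tied to $\alpha$ by Eqn. (14) and the subdistribution averages and variances are taken from Eqns (10) and (11). The following Python sketch does this. The function names are hypothetical and the numbers in the usage note are arbitrary illustrative values, not data from the paper.

```python
import math

def beta_from_eq14(alpha, j, m_w):
    """Eqn (14): beta for given alpha, series length j and M_w."""
    return (alpha + 1.0 + j / 2.0) / m_w

def u_and_v(alpha, j, m_w, m_low, m_high):
    """Return (u, v) of Eqns (17) and (18) for the interval m_low..m_high."""
    beta = beta_from_eq14(alpha, j, m_w)
    m_w0 = (alpha + 1.0) / beta                 # Eqn (10), n = 0
    sigma_0 = math.sqrt(alpha + 1.0) / beta     # Eqn (11), n = 0
    m_wj = (alpha + 1.0 + j) / beta             # Eqn (10), n = j
    sigma_j = math.sqrt(alpha + 1.0 + j) / beta # Eqn (11), n = j
    u = (m_w0 - m_low) / sigma_0
    v = (m_high - m_wj) / sigma_j
    return u, v
```

With the illustrative values $M_w = 100$, $M_l = 50$, $M_h = 150$ and $j = 4$, raising $\alpha$ from 20 to 40 increases $u$, in line with the statement that the subdistributions move toward $M_w$ as $\alpha$ increases.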
For D-2, Eqn. (19) is clearly not satisfied: $u$ is much larger than 2.5. Equation (20) is satisfied for $j = 9$, where $v$ is near 2.5. For the same distribution, however, the situation seems to be different for $j = 4$; but this is only due to the fact that the fitting is very bad in this case for such a short series. Even then, however, a strong inflexion point appears at $v \approx 2.5$ and develops into a minimum for longer series. With this restriction, Eqn. (20) holds for D-2, but not Eqn. (19). For this sample, $M_w$ is significantly greater than the mid-interval.

Table 2. Values of the scaling parameters and of the standard deviations obtained under the alternative mode: Eqns (14) and (19) and/or (20)

         $j$    $\alpha$    $\beta$           $\epsilon_j$ (%)   $u$    $v$
D-1      4      9.7         5.0 x 10^-5       2.5                2.5    2.7
D-1      9      10.8        6.5 x 10^-5       0.7                2.5    2.5
D-2      4      34.8        6.2 x 10^-3       8.6                6.0    2.5
D-2      9      41.5        7.7 x 10^-3       3.8                6.5    2.5
D-3      4      7.7         6.6 x 10^-5       4.6                2.5    3.1
D-3      9      8.4         8.5 x 10^-5       1.4                2.5    2.9
D-4      4      5.3         3.2 x 10^-5       3.9                2.5    4.5
D-4      9      5.3         4.1 x 10^-5       1.1                2.5    4.3

Fig. 3. Examples of the fitting obtained (dotted line) (a) under the classical mode, Eqns (6) and (7); (b) under the alternative mode, Eqns (14) and (19) and/or (20).

We can then conclude that Eqns (19) and (20) describe properly the minimum in $\epsilon_j(\alpha)$. On decreasing $\alpha$, the first of these equations that will be satisfied will constitute the second condition, the first being Eqn. (14), for choosing the two scaling parameters corresponding very nearly to the best fit. In fact, Eqn. (19) corresponds to positively skewed distributions, $M_w$ smaller than the mid-interval, $u \approx 2.5$, $v > 2.5$, while Eqn. (20) corresponds to negatively skewed distributions, $M_w$ greater than the mid-interval, $u > 2.5$, $v \approx 2.5$. Both equations are satisfied simultaneously if $M_w$ corresponds to the mid-interval. We will denote by $\epsilon_j$ the value of $\epsilon_j(\alpha)$ at the $\alpha$ satisfying Eqn. (19) and/or (20) in the way stated above.

Figure 3 shows some examples of both the classical mode of choosing the scaling parameters, Eqns (6) and (7), and the present alternative mode, Eqns (14) and (19) and/or (20). The improvement of the fitting is evident in every case. The great reduction of the oscillations at $M > M_h$ is to be noted. Table 2 shows numerical values of $\epsilon_j$ and is to be compared to Table 1. Except for the special case of D-2 at $j = 4$, improvements by factors of 2 to 3 are obtained.

Of course the useful limit of $\epsilon$ is not zero but is the experimental accuracy of the data. At best, the series should smooth the experimental data and $\epsilon$ will then be a measure of the scattering of the data about this smoothed curve. The $\epsilon$ of Table 2 however have not attained this limit. Even the small ones are still a composite of experimental scattering of the data together with deviations of the series from the distribution, the latter being the most important. This point was verified by artificially smoothing the experimental data. The values of $\epsilon$ do not then change significantly.

It is evident that, even under the present alternative mode, the Gram-Charlier series is particularly well suited for the representation of positively skewed or symmetric distributions. Negatively skewed distributions require longer series to obtain a given fitting. This is related of course to the fact that the subdistributions used are gamma distributions. Preliminary results obtained by using subdistributions of variable skewness seem to indicate that a more general series could allow a still better fit, independently of the skewness of the distribution.
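The two conditions can be combined into a small numerical procedure, sketched below in Python. With $\beta$ eliminated through Eqn. (14), Eqns (10), (11), (17) and (18) give $u$ and $v$ as functions of $\alpha$ alone, and both increase with $\alpha$ when $M_l < M_w < M_h$ (this monotonicity is a property of the rearranged formulas used here, not a statement from the paper). "The first equation satisfied on decreasing $\alpha$" is therefore taken to be the larger of the two roots of $u = 2.5$ and $v = 2.5$. The function names, the bisection bracket and the test values are assumptions made for illustration only.

```python
import math

def u_of(alpha, j, m_w, m_low):
    """u of Eqn (17) with beta eliminated through Eqn (14)."""
    beta = (alpha + 1.0 + j / 2.0) / m_w
    return ((alpha + 1.0) - beta * m_low) / math.sqrt(alpha + 1.0)

def v_of(alpha, j, m_w, m_high):
    """v of Eqn (18) with beta eliminated through Eqn (14)."""
    beta = (alpha + 1.0 + j / 2.0) / m_w
    return (beta * m_high - (alpha + 1.0 + j)) / math.sqrt(alpha + 1.0 + j)

def solve_increasing(func, target, lo=0.0, hi=1.0e4):
    """Bisection for func(alpha) = target, func assumed increasing on [lo, hi]."""
    if func(lo) >= target:
        return lo
    if func(hi) < target:
        raise ValueError("target not reached inside the assumed bracket")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def choose_scaling(j, m_w, m_low, m_high, target=2.5):
    """Return (alpha, beta) from Eqn (14) and Eqn (19) and/or (20)."""
    alpha_u = solve_increasing(lambda a: u_of(a, j, m_w, m_low), target)
    alpha_v = solve_increasing(lambda a: v_of(a, j, m_w, m_high), target)
    alpha = max(alpha_u, alpha_v)   # first condition met when alpha decreases
    beta = (alpha + 1.0 + j / 2.0) / m_w
    return alpha, beta
```

For a hypothetical positively skewed case with $j = 4$, $M_w = 2.5 \times 10^5$, $M_l = 5 \times 10^4$ and $M_h = 5 \times 10^5$, the procedure returns the $\alpha$ at which $u = 2.5$ while $v$ remains above 2.5, which is the situation described by Eqn. (19).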
CONCLUSIONS

The Gram-Charlier series can be used to represent molecular weight distributions, but the fitting will vary markedly with the choice of the scaling parameters. The best fit corresponds to the following conditions. The subdistributions that constitute the series must be located symmetrically about the average of the distribution. The subdistributions must not exceed the interval over which the data are significantly different from zero, but must extend to the limit of this interval, i.e. the one nearer to the average of the distribution.

Acknowledgement: The financial aid given by the CAFIR (Internal Research Funding Committee) of the University of Montreal is gratefully acknowledged.

REFERENCES
1. Y. Sicotte, Eur. Polym. J. 13, 515 (1977).
2. H. Daoust and N. Hoduc, private communication.
3. G. Champetier and L. Monnerie, Introduction à la Chimie Macromoléculaire, p. 180. Masson, Paris (1969).
4. H. W. McCormick, Polymer Fractionation (Edited by M. J. R. Cantow), Chap. C2, p. 274. Academic Press, New York (1967).
5. F. C. Goodrich, Chap. F, p. 443 of ref. [4].
Résumé: The representation of a molecular weight distribution function as a Gram-Charlier series, with Laguerre polynomials, can be considered as an analysis of the distribution in terms of gamma subdistributions. The two scaling parameters involved in this series must not be chosen arbitrarily. On the contrary, the precision with which the series can represent the distribution varies enormously with the value of these parameters. The best representation corresponds to the values of the parameters that bring about a symmetric arrangement of the averages of the subdistributions with respect to the average of the whole distribution and that, at the same time, make the subdistributions cover as much as possible, without however exceeding it, the interval where the distribution has non-zero values. Several examples of numerical applications are studied.