Atmospheric Environment Vol. 18, No. 8, pp. 1517-1537, 1984
Printed in Great Britain.
0004-6981/84 $3.00 + 0.00
Pergamon Press Ltd.
INTERLABORATORY COMPARISON OF SOURCE APPORTIONMENT PROCEDURES: RESULTS FOR SIMULATED DATA SETS*†

LLOYD A. CURRIE,‡ ROBERT W. GERLACH
National Bureau of Standards, Washington, DC 20234, U.S.A.
CHARLES W. LEWIS
U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, U.S.A.
W. DAVID BALFOUR
Radian Corporation, Austin, TX 78766, U.S.A.
JOHN A. COOPER
NEA, Inc., Beaverton, OR 97005, U.S.A.
STUART L. DATTNER
Texas Air Control Board, Austin, TX 78723, U.S.A.
RICHARD T. DE CESAR
Oregon Graduate Center, Beaverton, OR 97005, U.S.A.
GLEN E. GORDON
Department of Chemistry, University of Maryland, College Park, MD 20742, U.S.A.
STEVE L. HEISLER
Environmental Research and Technology, Inc., Concord, MA 01742, U.S.A.
PHILIP K. HOPKE
Institute for Environmental Studies, University of Illinois at Urbana-Champaign, Urbana, IL 61801, U.S.A.
JITENDRA J. SHAH
Environmental Research and Technology, Inc., Concord, MA 01742, U.S.A.
GEORGE D. THURSTON
School of Public Health, Harvard University, Boston, MA 02115, U.S.A.
and
HUGH J. WILLIAMSON
Radian Corporation, Austin, TX 78766, U.S.A.

(First received 9 June 1983 and in final form 19 January 1984)
Abstract: Three sets of simulated aerosol compositional data were prepared to (1) assess the current state of the art of source apportionment procedures and (2) provide initial sets of test data to aid method development. The data sets were generated from reported source profiles, real meteorological data (St. Louis, 1976), and two constructed city plans. Following plume dispersion by means of the RAM model, 40 ‘samples’ having known source contributions and error structure were generated for each set. Seven laboratories participating in the Mathematical and Empirical Receptor Models Workshop (Quail Roost II) undertook deconvolution of one or two of the sets by various numerical techniques (chemical mass balance or multivariate). Comparison of the participants’ results with the known source contributions showed that the source contribution estimates were consistent with the truth within a factor of ~2, and that the uncertainty estimates ranged from much too conservative (broad) to much too small. No unique method of choice emerged from this exercise; the participants’ various techniques appeared complementary and capable of resolving ~6-9 different sources. The intercomparison did allow us to formulate suggestions for improving the simulation process per se and for improving the various treatments (especially with respect to estimating uncertainties).

Key word index: Atmospheric particles, chemical mass balance, error propagation, factor analysis, laboratory intercomparison, multivariate analysis, simulation data, source apportionment, uncertainty estimates.

* A publication of the National Bureau of Standards, not subject to copyright.
† A Report from the Mathematical and Empirical Receptor Models Workshop (Quail Roost II).
‡ To whom requests for reprints of this article should be addressed. Requests for reprints of the entire series of six articles (including this one) documenting the Quail Roost II workshop should be directed to Robert K. Stevens, MD-47, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, U.S.A.
INTRODUCTION

Interlaboratory comparison represents a crucial step in the validation or improvement of analytical measurement processes, particularly as applied to difficult, ‘real-world’ problems involving multiple species at the trace level. Almost without exception, such intercomparisons reveal large discrepancies among laboratories, discrepancies that may arise from errors in sampling, chemical measurements, or data evaluation models and methods. Important insight is often gained by separately examining these three steps of the overall measurement process. Assessment of the state of the art of the last step (data evaluation) is the objective of this article.

The evaluation and reporting of analytical data can be separately investigated, in the interlaboratory environment, through the medium of a simulation exercise in which the system under investigation is modeled by computer. This approach permits one to simulate, more or less realistically, the physical (chemical, meteorological, etc.) processes that occur in the field, while maintaining full control over the system model and its error structure. That is, the ‘truth’, including the magnitude and distribution of the individual random and systematic errors, is fully known (to those who generate the data). Such was the stimulus and the spirit of the present interlaboratory exercise. It was modeled, in part, on the International Atomic Energy Agency’s very well crafted and very informative gamma-ray deconvolution interlaboratory comparison using convolved, simulated gamma-ray spectra (Parr et al., 1979).

Within the context of source apportionment analysis, a simulated data set is a collection of mathematically generated numbers that are intended to represent a set of ambient air quality measurements (typically, aerosol elemental concentrations).
In contrast to a ‘real’ data set resulting from actual field measurements, however, in a simulated data set the impacts of the contributing sources are known exactly from knowledge of the data generation process. It is this absolute standard of truth, against which predictions of various receptor models may be compared, that makes simulated data sets such a potentially powerful (and, prior to now, little used) means of evaluating the performance of alternative mathematical source apportionment methods. This report describes the generation of the simulated data sets made available to participants of the Mathematical and Empirical Receptor Models Workshop (Quail Roost II) (Stevens and Pace, 1984), and the source apportionment results based on the participants’ analysis of those data sets. While many different aspects of the participants’ results are presented, Tables 12 and 13 and Fig. 5 give the key results.

The simulated data sets produced for Quail Roost II are expected to be useful well beyond the original application for which they were created. Despite their various limitations (discussed later in this report), they are the product of the most comprehensive attempt to date to realistically model the ambient elemental concentrations that could plausibly be expected from a distribution of commonly occurring sources. The data sets should be useful for testing new or modified receptor models, as well as for debugging existing receptor model computer programs. The data sets are in fact already being used for these purposes at New York University, the University of Maryland, the National Bureau of Standards, the State University of New York College of Environmental Science and Forestry at Syracuse, and the U.S. Environmental Protection Agency.
GENERATION OF SIMULATED DATA SETS
Three data sets were constructed as outlined in Table 1. Each source profile consisted of 19 elements and the isotopic ratio 14C/12C (not provided in Data Set I), with component concentrations (mg g⁻¹) meant to be representative of a fine-fraction aerosol component. Using contrived source geographies (Figs 1 and 2) and the RAM Gaussian plume dispersion model Version 80336 (Novak and Turner, 1976), ambient concentrations of each species were generated for each of forty 12-h averaging periods. All simulations were run with real meteorological data obtained from the National Climatic Center (Asheville, NC). Starting with Julian day 249 of 1976, hourly surface data from St. Louis/Lambert, MO, were combined with twice-daily mixing height information from Salem, IL, in the meteorological preprocessing program RAMMET (Turner and Novak, 1978) to obtain hourly wind speeds, wind directions, stability classes, mixing heights, and air temperatures. These output quantities were then employed in the generation of ambient concentrations by the dispersion model. The original unprocessed meteorological information was provided to the participants. An additional component of source variation in the model, not provided to participants, was the hourly source strength fluctuation due to daily operating schedules created for each classification of sources. For example, type A and B steel plants followed the same diurnal pattern based on a weekly schedule.

Table 1. Structure of simulated data sets

Generating equation:

$$C_i^{(K)} = \sum_{j=1}^{p}\left[A_{ij} - e_{m,ij} + e_{H,ij}^{(K)}\right] S_j^{(K)} + e_i^{(K)}$$

p = number of active sources (p ≤ 13)
K = sampling period (1 ≤ K ≤ 40)
C_i^(K) = ‘observed’ concentration of species i for period K (1 ≤ i ≤ N, N ≤ 20)
S_j^(K) = true intensity (at receptor) of source j (1 ≤ j ≤ p)
A_ij = ‘observed’ source profile matrix (element i, source j)
e_i = random measurement errors, independent and normally distributed
e_m = systematic source profile errors, independent and normally distributed (systematic because fixed over the 40 sampling periods)
e_H = random source profile variation errors, independent and log-normally distributed

Data set characteristics
Set I: p = 9 (including one unknown source)*; errors = e_i, e_m; City Plan No. 1 (Fig. 1)
Set II: p = 13 (all known); errors = e_i, e_m; City Plan No. 2 (Fig. 2)
Set III: p = 13 (all known); errors = e_i, e_m, e_H; City Plan No. 2 (Fig. 2)

* For Data Set I, participants were told only that p ≤ 13.
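The generating equation in Table 1 can be made concrete with a small simulation. The sketch below covers only the Set I/II error structure (e_i and e_m); it assumes, as the table's definitions imply, that e_m is drawn once per profile entry and held fixed for all periods, while e_i is redrawn each period. The function name, argument layout and the use of a single standard deviation per error type are illustrative assumptions, not the procedure actually used to build the Quail Roost II sets.

```python
import random


def simulate_set(true_profiles, intensities, sigma_m, sigma_meas, seed=1):
    """Hypothetical sketch of the Table 1 generating equation (Sets I and II).

    true_profiles[i][j] -- true profile of species i in source j (mg/g)
    intensities[K][j]   -- true intensity S_j^(K) at the receptor (ug/m3)
    sigma_m             -- SD of the systematic profile error e_m (mg/g)
    sigma_meas          -- SD of the random measurement error e_i (ng/m3)

    Returns (observed_profiles, concentrations); only these two would be
    handed to participants.
    """
    rng = random.Random(seed)
    n_species, n_sources = len(true_profiles), len(true_profiles[0])

    # e_m is drawn once per profile entry, so it is the same in every period.
    observed_profiles = [
        [true_profiles[i][j] + rng.gauss(0.0, sigma_m) for j in range(n_sources)]
        for i in range(n_species)
    ]

    concentrations = []
    for s_k in intensities:  # one list of S_j^(K) per sampling period K
        row = []
        for i in range(n_species):
            # A_ij - e_m,ij is the true profile; mg/g x ug/m3 = ng/m3.
            signal = sum(true_profiles[i][j] * s_k[j] for j in range(n_sources))
            row.append(signal + rng.gauss(0.0, sigma_meas))  # fresh e_i
        concentrations.append(row)
    return observed_profiles, concentrations
```

For example, `simulate_set([[100.0, 0.0], [0.0, 50.0], [10.0, 10.0]], [[2.0, 4.0], [1.0, 0.0]], 0.5, 5.0)` returns the perturbed profiles a participant would receive and one row of ‘observed’ concentrations per period. Drawing e_m outside the period loop is what makes it systematic: averaging over periods shrinks the effect of e_i but leaves e_m untouched.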
Expressed in matrix form, the generating equation from Table 1 is

$$\mathbf{C} = \mathbf{A}\mathbf{S} + \mathbf{e} \qquad (1)$$

where the vector of ‘observed’ concentrations (ng m⁻³) is represented by C, the matrix of source profiles (mg g⁻¹) by A, and the source contributions (µg m⁻³) by S. As Table 1 indicates, each data set included normally distributed measurement errors for the individual observations (e_i, random) and for the source profiles (e_m, systematic). Set III also included log-normal profile errors (e_H, random) for each of the sources. In other words, the profiles themselves changed from one period to the next. In all cases participants were informed of the distributions and standard deviations of the error components (Tables 2 and 3). A ‘world list’ of 13 source profiles was also provided. All of these profiles were active in Sets II and III, but only 8 were active in Set I. In addition, Set I included one source type, ‘sandblast’, of which the participants were not informed. The simulated data sets thus had a well-defined structure in terms of the linear model and error terms. The sets were realistic because mixing of source emissions at the receptor was accomplished via dispersion, and random and systematic errors were added in accordance with typical field values (although the selected e_H errors were somewhat smaller than what realism would probably require). The data and model did assume, however, that all species were conserved from source to receptor. Some deficiencies of the dispersion model employed are noted later in this report. It is important to emphasize that any such deficiencies have no effect on the accuracy of the resulting simulated data sets per se. The dispersion model is employed to link source emissions, locations and meteorology in order to assign exact numerical values to the source intensities S_j at the receptor site. These intensities are then convoluted and perturbed with random errors to produce the ‘observed’ ambient elemental concentrations C_i according to Equation (1). The job of any source apportionment analysis is then to recover the S_j from the C_i values. Dispersion model deficiencies do affect the ‘realism’ of the simulated data sets to the degree that the deficiencies distort the relationships among the data set values and the emissions inventory, and among the data set values and the meteorological conditions and source ‘geography’.

Fig. 1. City Plan No. 1 (for Data Set I). Receptor (R) and sources: road (auto + soil), steel-A, incinerator, oil-B, sandblast, coal-B, soil, basalt, wood. Each area source is 2 km square.

Fig. 2. City Plan No. 2 (for Data Sets II and III).

Table 2. ‘Measurement’ errors* for simulated data sets
Columns: Element; Relative error (%); Absolute error (ng m⁻³)
Elements: C, Na, Al, Si, S, Cl, K, Ca, Ti, V, Cr, Mn, Fe, Ni, Cu, Zn, As, Br, Pb, CC†
[Numerical entries illegible in the extracted text.]
* Quoted measurement error = absolute error + (elemental concentration × relative error)/100.
† Contemporary carbon (determined from 14C concentration) as a percentage of total carbon.
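The footnote to Table 2 gives the rule for the quoted measurement error: a fixed absolute term plus a term proportional to the elemental concentration. A minimal sketch of that rule, together with drawing the normally distributed e_i it implies, is shown below; the function names are hypothetical, and the numbers in the usage lines are invented, since the Table 2 entries are not legible here.

```python
import random


def quoted_measurement_error(concentration, absolute_error, relative_error_pct):
    """Table 2 footnote: absolute error + (concentration x relative error)/100.

    concentration and absolute_error in ng/m3; relative_error_pct in percent.
    """
    return absolute_error + concentration * relative_error_pct / 100.0


def observe(true_concentration, absolute_error, relative_error_pct, rng):
    """Add a normally distributed measurement error e_i with the quoted SD."""
    sigma = quoted_measurement_error(true_concentration, absolute_error,
                                     relative_error_pct)
    return true_concentration + rng.gauss(0.0, sigma), sigma


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical element: 2000 ng/m3 true, 10 ng/m3 absolute, 5 % relative.
    value, sigma = observe(2000.0, 10.0, 5.0, rng)
    print(f"observed {value:.0f} ng/m3, quoted error {sigma:.0f} ng/m3")  # sigma = 110
```

Because the relative term grows with concentration, major elements are dominated by the relative error and trace elements by the absolute floor.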
Table 3. Random errors* in source profiles for Set III

Source       C    Na   Al   Si   S    Cl   K    Ca   Ti   V    Cr   Mn   Fe   Ni   Cu   Zn   As   Br   Pb   CC†
Steel-A      1.2  1.3  1.3  2.0  1.5  1.3  2.0  2.0  2.0  1.5  2.5  2.0  1.5  1.5  2.0  2.0  1.5  1.5  2.5  1.1
Steel-B      1.2  1.3  1.3  2.0  1.5  1.3  2.0  2.0  2.0  1.5  2.5  2.0  1.5  1.5  2.0  2.0  1.5  1.5  2.5  1.1
Oil-A        1.2  1.2  1.1  1.3  1.1  3.1  1.5  3.0  1.5  1.5  2.0  1.3  1.2  2.0  2.0  2.0  3.0  3.0  2.5  1.1
Incinerator  1.2  2.0  3.0  2.5  3.0  1.3  1.5  2.0  3.0  3.0  3.0  3.0  2.5  2.0  1.5  1.3  1.5  1.5  1.5  1.1
Oil-B        1.2  1.2  1.1  1.3  1.1  3.1  1.5  3.0  1.5  1.5  2.0  1.3  1.2  2.0  2.0  2.0  3.0  3.0  2.5  1.1
Coal-A       1.2  2.0  1.2  1.3  2.0  1.3  2.0  1.2  1.3  2.0  3.0  2.5  1.5  2.0  2.0  3.0  1.5  1.5  3.0  1.1
Aggregate    1.5  1.5  1.1  1.1  1.2  1.3  1.5  1.2  1.2  1.1  1.3  1.3  1.3  1.3  1.3  1.3  1.2  1.5  2.0  1.1
Glass        1.5  1.3  2.5  1.2  1.3  1.1  1.3  1.1  1.2  1.1  3.0  2.5  3.0  3.0  2.0  3.0  3.0  3.0  2.5  1.1
Basalt       1.2  1.5  1.1  1.1  1.2  1.3  1.5  1.2  1.2  1.2  1.3  1.2  1.3  2.0  1.5  1.2  1.1  1.5  2.0  1.1
Soil         1.2  1.5  1.1  1.1  1.2  1.3  1.5  1.2  1.2  1.2  1.3  1.2  1.3  2.0  1.5  1.2  1.1  1.5  2.0  1.1
Wood         1.3  2.5  3.0  3.0  1.2  2.0  1.5  1.2  2.0  2.0  2.0  2.0  2.0  2.5  1.5  1.3  2.0  1.5  1.2  1.1
Coal-B       1.2  2.0  1.2  1.3  2.0  1.3  2.0  1.2  1.3  2.0  3.0  2.5  1.5  2.0  2.0  3.0  1.5  1.5  3.0  1.1

* Errors have a log-normal distribution whose standard deviation is given by log10 (tabulated value).
† Contemporary carbon (determined from 14C concentration) as a percentage of total carbon.
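The Table 3 footnote states that the profile errors are log-normal with a standard deviation of log10(tabulated value). One reading, assumed in the sketch below, is that each profile entry is multiplied in every period by a factor 10**z, with z drawn from N(0, log10(g)) and g the tabulated value; the additive e_H of Table 1 would then be the true value times (factor - 1). The Steel-A Fe and Mn entries (1.5 and 2.0) come from Table 3; the profile values themselves and the function name are hypothetical.

```python
import math
import random


def perturb_profile(profile, geometric_sd, rng):
    """Apply one period's log-normal variation to a source profile (Set III).

    profile      -- {element: true value} (mg/g)
    geometric_sd -- {element: Table 3 value g}; the SD of log10(error factor)
                    is taken to be log10(g), following the Table 3 footnote.
    Assumption: the error acts multiplicatively, so e_H = value * (factor - 1).
    """
    perturbed = {}
    for element, value in profile.items():
        z = rng.gauss(0.0, math.log10(geometric_sd[element]))
        perturbed[element] = value * 10.0 ** z
    return perturbed


if __name__ == "__main__":
    rng = random.Random(7)
    steel_a = {"Fe": 300.0, "Mn": 20.0}   # hypothetical true values, mg/g
    steel_a_sd = {"Fe": 1.5, "Mn": 2.0}   # Table 3, Steel-A row
    for period in range(3):
        print(period, perturb_profile(steel_a, steel_a_sd, rng))
```

Under this reading, about two periods in three fall within a factor g of the true value, and the perturbed profile entries can never become negative.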
These aspects were generally unimportant in the present case because the source apportionment strategies presented at Quail Roost II made little direct use of the emissions inventories and meteorological information that were supplied. ‘Realism’ also may have suffered somewhat as a result of the large numbers of duplicate sources (especially with City Plan No. 2). This duplication was intended partly to compensate for the narrowness of the dispersion plume, and tended to decrease the variability of source contributions from period to period. (See Tables 5A-C for the actual variations in source contributions.) The complexity of the data sets increased in the order I < II < III, because of the increased number of active components in the latter two sets and especially because of the considerable log-normal variability in source profiles in Set III. In many respects, however, Set I was the most interesting because participants were not told how many sources from the world list were active, and because an additional, totally unknown source was also present. In summary, the simulated data sets were believed to comprise fairly realistic but well-controlled sets of test data for reliable evaluation of precision and accuracy in the final data evaluation step (deconvolution) of the source apportionment process. To illustrate the general nature and level of difficulty of the simulated data, several highlights of Set I are given in Fig. 3. Here, the 13 source types given to the participants are indicated in the first column, together with 2 additional source types not sent to the participants. The first additional source type, ‘road’, was simply a linear combination of the known source types ‘auto’ (75%) and ‘soil’ (25%) and was used to represent the reentrainment of road dust by motor vehicles.
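The ‘road’ composite and its intensity bookkeeping are simple linear operations. The sketch below assumes mass-weighted mixing, element by element, and follows the rule stated in the text for all three data sets: the auto intensity is 75% of the road intensity, and the soil intensity is 25% of the road intensity plus an independent part. The function names and example values are hypothetical.

```python
def road_profile(auto, soil, auto_fraction=0.75):
    """'road' composite: 75 % auto + 25 % soil, element by element (mg/g)."""
    elements = set(auto) | set(soil)
    return {
        e: auto_fraction * auto.get(e, 0.0) + (1.0 - auto_fraction) * soil.get(e, 0.0)
        for e in elements
    }


def auto_and_soil_intensities(road_intensity, independent_soil_intensity,
                              auto_fraction=0.75):
    """Split a road intensity into the auto and total soil intensities."""
    auto = auto_fraction * road_intensity
    soil = (1.0 - auto_fraction) * road_intensity + independent_soil_intensity
    return auto, soil


if __name__ == "__main__":
    # Hypothetical profiles (mg/g) and intensities (ng/m3).
    print(road_profile({"Pb": 400.0, "Br": 100.0}, {"Si": 200.0, "Al": 80.0}))
    print(auto_and_soil_intensities(1000.0, 500.0))
```

Because auto never appears on its own, any receptor model that fits both ‘auto’ and ‘soil’ profiles is really apportioning the road composite plus a separate soil source.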
Fig. 3. Characteristics of Data Set I (source types; average concentration at the receptor; S_j/σ_j bar graph).
While soil was used on its own as one of the sources in the data set generation, auto was used only in the road combination. Thus, the auto source intensity is simply 75% of the road intensity, whereas the soil intensity is the sum of 25% of the road intensity and a quantity independent of the road source. This method of using soil, auto and road was the same in the generation of all three data sets. The second additional source type, ‘sandblast’, was a source totally unknown to the participants. Average concentrations of the 9 active sources at the receptor, given in the second column of Fig. 3, were adjusted during the data set generation step so that a range of difficulty (or detectability) would be represented, as indicated by the S_j/σ_j bar graph. The theoretical (signal/noise) detection limit of 3.29 (Currie, 1977) is indicated by the arrow on the horizontal axis. The weakest source, steel-A, was well below the detection limit. However, several sources (including the unknown sandblast) were easily detectable, since the true average concentrations exceeded by many times the standard errors that would result from weighted least-squares deconvolution of the average elemental concentrations.

The intensity of each source type varied greatly over the 40 averaging periods. Figure 4 depicts the range of intensity variation due to the incinerator source at the receptor site in Set I over all sampling periods. For this typical source, a large fraction of the intensities exceeded the (approximate, weighted least-squares) detection limit (L_D in the figure). The rather sharp variations seen in this plot are due to the unrealistically narrow dispersion characteristics of the RAM model. Minimizing the undesirable effect of this narrowness on the resulting data sets was a primary factor in planning the number of source locations for each source type. Even so, this artifact of the RAM model was never completely overcome. Thus the resulting distributions of each source contribution as a function of concentration were noticeably skewed, as can be inferred from Fig. 4, for example. This effect has been previously characterized by Bowne et al. (1981).

Table 4 lists the true source profiles from which the data sets were generated. The participants were provided not with these, however, but with a similar set of profiles perturbed by the systematic errors e_m. (The participants were not given the sandblast or road source profiles.) Tables 5A-5C list the true mass contributions by source type and observation period for all three data sets. These contributions were, of course, not given to the participants. Table 6 lists the emissions inventory given to the participants, which was input to the RAM dispersion model for generation of Sets II and III. The primary consideration in choosing emission rates was to produce (in conjunction with the dispersion model) elemental ambient concentrations that were consistent with common experience. That this consideration required unrealistically high emission rates is a direct indication of limitations of the RAM model. Our goal of matching ‘common experience’, however, was not fully achieved. For example, the contributions of the crustal components (soil, basalt) exceeded what is commonly found in the fine-fraction aerosol in U.S. cities. The three data sets themselves, analyzed by the participants, are not included in this article because of their length.
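The S_j/σ_j comparison in Fig. 3 rests on the standard errors of a weighted least-squares fit of Equation (1). Because those standard errors depend only on the profiles and the measurement uncertainties, the detection limit 3.29σ_j can be computed before any data are fitted. The following is a minimal plain-Python sketch, assuming weights 1/σ_i² on the measurement errors only (no profile-error term); the function names are hypothetical.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for a few sources)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal matrix is singular (collinear profiles?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weighted_cmb(profiles, conc, sigma):
    """Weighted least squares for C = A S + e.

    profiles[i][j] = A_ij, conc[i] = C_i, sigma[i] = SD of C_i.
    Returns (S, sigma_S), with sigma_S = sqrt(diag((A^T W A)^-1)).
    """
    n_i, n_j = len(conc), len(profiles[0])
    w = [1.0 / s ** 2 for s in sigma]
    normal = [[sum(w[i] * profiles[i][a] * profiles[i][b] for i in range(n_i))
               for b in range(n_j)] for a in range(n_j)]
    rhs = [sum(w[i] * profiles[i][a] * conc[i] for i in range(n_i))
           for a in range(n_j)]
    cov = invert(normal)
    s = [sum(cov[a][b] * rhs[b] for b in range(n_j)) for a in range(n_j)]
    return s, [math.sqrt(cov[a][a]) for a in range(n_j)]


def detection_limits(profiles, sigma, k=3.29):
    """k * sigma_S_j; sigma_S does not depend on the measured values."""
    _, sig = weighted_cmb(profiles, [0.0] * len(sigma), sigma)
    return [k * s for s in sig]
```

Comparing a source's true average intensity with `detection_limits(A, sigma)[j]` gives the kind of screening shown in the Fig. 3 bar graph: a ratio S_j/σ_j below 3.29, as for steel-A, marks a source that would often go undetected.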
Fig. 4. Incinerator impact at receptor for Data Set I, by sampling period.

SOURCE APPORTIONMENT SOLUTION STRATEGIES

Several distinct approaches were chosen by individual participants in their attempts to estimate average source intensities S_j, as well as uncertainties, from one or more of the simulated data sets. Each of these approaches is summarized below. The form of each summary is an abbreviated series of steps followed by a narrative giving details and illustrations of some of the steps. This method of presentation is intended to show the basic structure of each approach as simply as possible, and to reveal which portions are ‘recipe-like’ and which portions require experienced judgment. One measure of uncertainty, σ_j (i.e. ±σ_j, the standard error in the estimated source impact S_j), was reported by
11 13 8 27 30 9.1 70 1.3 1.8 3.3 16 120 2.8 2.6 2s 0.08 0.1 1.6 0
150
Steel-A
13 34 45 1.0 1.3 3.2 22 160 2.9 3.7 52 0.1 0.1 14 0
14 210 10 12 48
Steel-B
E t-II1 14 4.9 0.7 0.4 0.02 0.03 0.8 0
z‘; OI9 0.7
:: 7 180 140
Oil-A
Incin-
120 4.2 7.0 130 270 200 11 0.9 0.03 0.9 0.3 6.1 0.1 3.6 26 0.2 0.8 17 50
40
erator
0.4 0.1 13 0.06 0.8 1.3 0.3 0.4 3.8 0
0.1
200 22 5.4 70 110 12 1.4 8.6 1.1
Oil-B
*Contemporary carbon (determined from 14C concentration) nuclear testing.
C Na Al Si S Cl K ca Ti V Cr Mn Fe Ni Cu Zn As Br Pb CC*
Element
70 300 7.5 0.8 14 40 2.8 0.07 0.06 0.8 53 0.03 0.6 0.4 0.03 0.01 0.1 0
:,
Aggregate
28 7 8.2 4 0.03 5.8 1.4 63 3.2 6.9 4.7 0.1 0.05 5.8 0
s5 4oi 8.8
Sandblast
z 17d 2.7 27 4.6 0.4 1.5 3.4 0.3 10 1.3 0.4 0.1 0.6 0.02 0.5 0
2 150
Glass
0.3 0.2 8.3 67 9 0.2 0.1 1.5 86 0.1 0.1 0.08 0 0 0.01 0
2:
0.1 19
Basalt 66.4 4 222 543 0.3 0.1 12 20.5 12.7 0.3 0.3 2 97 0.1 0.2 0.4 0.01 0.01 0.3 97
Soil 550 70 8.6 15 8 10.7 2.7 11.8 0 0.1 0.09 0.6 7 0.06 0 3.2 0.01 41.7 120 2
Auto
z.08 0.1 108
s’.9 1 10 8 9 0.02 0 0 0.02 1 0,Ol 0 0.7
3
600
wood
;tz 180 17 3 40 9 0.8 0.3 0.2 70 0.6 0.4 0.9 0.2 0.01 1.2 0
8 4
Coal-B
429 52.5 62 147 6.1 8.1 5 13.9 3.2 0.1 0.1 0.9 29.5 0.07 0.05 2.5 0.01 31‘3 90. I 6
Road
as a percentage of total carbon. Values may exceed 100% because of the age of the wood and atmospheric
26 150 15 11 10 23 2.6 0.1 0.4 0.3 18 0.2 1.3 3 0.3 0.2 2 0
9.7
30
Coal-A
Table 4. True source profiles (mg g⁻¹)
Table 5A. True mass contributions
Period 1
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Steel-A 39 260 96 168 I2 2 25 50 84 147 I1 122 28 0 22 6 39 23 18 5 49 0
I 0 10 142 40 1 85 21 88 43 6 5 43 8 1 0 61 260
Incinerator 3200 147 717 1244 250 61 578 18 1154 1384 1663 8622 279 49 1048 0 111 61 877 1257 2349 180 4081 814 11524 45 54 151 719 2304 548 812 1683 340 266 881 650 0 399 1361
Oil-B 48 1253 10 9780 3 596 64 11 608 888 2416 4039 78 358 5 0 663 143 1021 567 12 2121 232 67 632 1162 188 455 8330 2711 5459 3922 1139 10912 1342 1495 380 817 1911
(ng m⁻³) by source type and observation
Sandblast
Basalt
Soil
7827 652 916 855 1861 2139 3462 14565 1 0 3024 200 3612 17246 3683 14722 4028 5533 82 5819 5422 2939 993 10408 5742 565 1213 30169 1175 2700 0 5804 5800 2876 2059 2 0 1940 0 2
2767 2374 3126 5298 4554 6586 13921 5646 1613 722 4360 12216 8650 4456 3608 8125 7085 1078 1636 1556 3028 3854 1598 1068 4091 2881 3607 4336 3924
5153 6400 1999 2014 11558 8179 6589 6859 805 2475 4232 8779 19077 8183 8968 7837 10991 1157 4926 3178 14098 7102 3085 2185 4017 9513 7091 4539 11496 3767 7267 13050 2678 23079 24806 4493 21625 10968 2570 13973
most participants. The overall uncertainty should, of course, include information on bounds for model error and systematic error. It is useful to define at the outset how the terms ‘regression’ and ‘classical factor analysis’ are to be understood in this report. A ‘regression’ model of source apportionment is the same as a chemical mass balance (CMB) model, except that the former includes an additive intercept constant determined along with the source intensities in the fitting procedure. ‘Classical factor analysis’ generally means initial factor extraction by principal component analysis of the correlation matrix of elemental concentration pairs, retention of a number of factors equal to the number of correlation matrix eigenvalues greater than unity, and finally orthogonal VARIMAX rotation. These choices are generally consistent with the default options present in most factor analysis computing packages.
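The distinction between ‘regression’ and CMB drawn above amounts to one extra column of ones in the design matrix. The sketch below shows this with unweighted least squares for brevity (the participants' methods were weighted; a weighted version would divide each row and each C_i by σ_i). The function names are hypothetical, and factor analysis is not illustrated.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(design, conc):
    """Ordinary least squares via the normal equations (X^T X) b = X^T C."""
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)]
           for a in range(k)]
    xtc = [sum(row[a] * c for row, c in zip(design, conc)) for a in range(k)]
    return solve(xtx, xtc)


def cmb(profiles, conc):
    """CMB: concentrations modelled as profiles times intensities only."""
    return least_squares(profiles, conc)


def regression(profiles, conc):
    """'Regression' in this report's sense: CMB plus a fitted intercept.

    Returns the source intensities followed by the intercept.
    """
    return least_squares([list(row) + [1.0] for row in profiles], conc)
```

With data that carry a constant offset, `regression` attributes the offset to the intercept, whereas `cmb` has to spread it over the source intensities.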
3530 5900 1864 13157 11671 298 1 9839 3426 1132 3928
period
Wood 2272 7032 3486 8082 3333 8896 I899 3607 484 6218 1552 8393 425 3057 3078 4460 4261 918 1136 3252 2352 II? 813 2371 1917 1912 I465 7181 1497 5139 1032 9367 1304 3473 61’ 4569 1210 0 4oi 6151
(1) Effective variance CMB
for Set
1
Coal-B
Road
49x 674 924 43 5233 1244 2387 1295 298 3332 892 15873
9464 3579 3033 4292 4013 3644 4704 18 16 2768 21861 138 1521 935 1106 2099 739 1 0 1772 63 1503 69615 42894 31773 22200 1 2630 3230 2212 717 15033 4256 1626 167 307 133 6497 76 0 X84
1678 1306 2614 1544 1373 1549 2794 6203 1113 11577 2X46 995 83 4’28 805 2096 1714 60 1 $51 6 4713 432’ 64x (! 0 0
[CMB(EY-l)]
(SLH)*
(a) Include all source profiles and elements in the initial CMB fit.
(b) Remove sources with negative predicted intensities one at a time, discarding first the source where σ_j/S_j is largest, and repeat the CMB calculation after each removal.
(c) When all remaining source intensities are positive, remove any additional sources one at a time, discarding first the source where σ_j/S_j is largest and greater than 1, and repeat the CMB calculation after each removal.
*The descriptions of data evaluation methods 1-7 were provided by the authors whose initials are shown in parentheses.
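Steps (a)-(c) of the CMB(EV-1) procedure are mechanical enough to sketch; step (d), which relies on judgment, is left out. The sketch assumes the usual effective-variance weighting, V_i = σ_Ci² + Σ_j S_j² σ_Aij², iterated a fixed number of times, and reads ‘largest σ_j/S_j’ for a negative S_j as the largest σ_j/|S_j|. Both choices, and all names, are assumptions for illustration rather than a description of the code SLH actually used.

```python
import math


def _invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("normal matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ev_cmb(A, sA, C, sC, iterations=20):
    """Effective-variance CMB: weights 1 / (sC_i^2 + sum_j S_j^2 sA_ij^2)."""
    n_i, n_j = len(A), len(A[0])
    S = [0.0] * n_j  # so the first pass is ordinary weighted least squares
    for _ in range(iterations):
        V = [sC[i] ** 2 + sum((S[j] * sA[i][j]) ** 2 for j in range(n_j))
             for i in range(n_i)]
        normal = [[sum(A[i][a] * A[i][b] / V[i] for i in range(n_i))
                   for b in range(n_j)] for a in range(n_j)]
        rhs = [sum(A[i][a] * C[i] / V[i] for i in range(n_i)) for a in range(n_j)]
        cov = _invert(normal)
        S = [sum(cov[a][b] * rhs[b] for b in range(n_j)) for a in range(n_j)]
    return S, [math.sqrt(cov[a][a]) for a in range(n_j)]


def fit_with_elimination(A, sA, C, sC, names):
    """Steps (a)-(c): start with every source, then drop one at a time."""
    active = list(range(len(names)))  # step (a): all sources

    def columns(M):
        return [[row[j] for j in active] for row in M]

    while active:
        S, sig = ev_cmb(columns(A), columns(sA), C, sC)
        negative = [k for k, s in enumerate(S) if s < 0]
        if negative:  # step (b): worst negative source first
            drop = max(negative, key=lambda k: sig[k] / abs(S[k]))
        else:  # step (c): drop the least certain source if sigma/S > 1
            ratio = [sig[k] / S[k] if S[k] > 0 else math.inf for k in range(len(S))]
            drop = max(range(len(S)), key=lambda k: ratio[k])
            if ratio[drop] <= 1.0:
                return {names[j]: (s, e) for j, s, e in zip(active, S, sig)}
        del active[drop]
    return {}
```

The profile uncertainties sA would come from the e_m information given to participants; setting them to zero reduces the fit to ordinary weighted least squares.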
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1
Period
2:: 94 135 1508 24 12 110 443 607
3:: 64 575 53 498 91i 614
1903 190 68 206 99 280 878 442 42 42 1128 83 460 426 669 329 41 404 602 297 1990 367 155 264 411 2721 1921 268 349 200 432 2131 453 777 280 83 15 169 576 239
27 338 0 292 0
243 1546 292 1566 121 51 607 1668 88 133 33 509 18 116 632 1042 254 753 238 95 394 77
8 93 1176 790 363 483 379 3 30 0 307 979 2493 315 256 0 0 0 205 247 388
39: 973 770 207 0 0 0 0 0 0
;
Oil-A
Steel-B
Steel-A 4667 477 2648 266 16563 772 661 582 185 0 1887 2 834 680 205 1993 9022 1546 508 273 389 2878 4238 3955 6153 60 439 33% 99 148 0 219 1684 92 199 1560 2 669 145 2390
Incinerator
E 799 648 126 209 24 214 0 394 149 262 481 688
2:: 33 1899 0 123 214 194 368 216 404 409 318 200 103 308 243 2237
112 241 495 479 434 612 692 150
Oil-B
89: 3847 10918
:
2: 1099 938 719 4202 1459 1182 0 1734 0 5740 2025 129 0 43515 13603 8086 4958 0 1743 0 434 3969
a: 2141
1651 2763 6210 1117 2360 16484 2354
Coal-A a4 977 308 585 1366 273 276 353 104 644 1150 2162 0 172 255 918 340 413 288 576 243 78 115 33 11 539 Ill 1212 472 988 1172 1397 81 86 0 3 232 111 42 301
Aggregate 723 1147 1016 4659 1247 275 49 3 31 421 4 1040 271 43 0 0 2 0 1136 215 67 148 587 79 0 753 625 106 405 519 382 125 74 86 819 83 207 336 219 666
Glass 4469 11549 7845 11598 9934 22385 34763 21112 4547 4321 6968 17189 14794 16476 9890 16970 9806 5233 5492 6902 9141 9291 3388 3980 3504 10382 10322 14646 15910 7598 13264 13533 3338 27525 10453 9674 15127 8802 3570 17465
Basalt 2236 9561 5688 10756 4960 10235 14038 14094 867 1798 4714 25224 20442 13609 3125 22546 11302 1673 1198 2747 8259 2483 2129 181 4070 3784 8648 5812 4973 5377 8255 3952 2289 8207 6489 438 2151 797 203 1 19755
Soil 275 2972 486 2773 685 698 259 454 139 1764 229 2402 343 607 345 8% 366 92 401 690 288 2645 588 1301 411 3589 395 766 300 1568 347 1163 360 1360 713 739 496 1120 258 2629
Wood
Table 5B. True mass contributions (ng m⁻³) by source type and observation period for Set II
273:: 110 4335 12466 0 0
935 2246 0 126 0 0 0 3314 395 1120 1090 1484 6346 436 0 0 2724 0 1874 4 10167 12906 4873 8629 1085 1856 829 1013 1938 4343 1503 8806 6183
Coal-B 1535 593s 5269 6082 5263 3277 1773 263 221 3745 2189 5255 2560 890 1037 843 2951 0 2075 927 2075 9897 9072 10644 5806 3436 2736 2105 2263 3195 2388 3230 2620 7487 2889 1849 1526 2657 0 5833
Road
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Period
4z 94 57 385 78 665 53 581 826 566 105 268 88 127 1608 30 18 117 473 685
247 1633 343 1671 126 52 593 1676 99 135 28 488 20 98 645 1003 313 782 258
Steel-A
24 366 0 317 0 5 0 0 383 1118 756 218 0 0 0 0 0 0 0 0 85 I246 1032 385 556 388 3 29 0 315 1012 2379 346 293 0 0 0 193 250 446
Steel-B 1991 I85 70 221 92 270 906 442 44 41 1127 79 439 425 718 372 43 422 580 286 1791 363 157 285 387 2849 1910 287 336 197 457 2069 463 769 261 85 16 171 557 222
Oil-A 5087 507 2788 264 72341 860 669 1235 361 0 1876 2 1395 638 490 2164 7.565 1443 525 285 443 3899 5325 3894 4510 337 373 4373 110 132 0 342 2029 119 249 1907 2 716 137 4609
Incinerator
:‘s: 219 403 413 321 194 100 321 227 2473 870 970 836 625 140 229 25 217 0 430 161 288 505 8.50
122 265 531 517 434 630 747 155 74 214 32 1841 0 139 222
Oil-B
Table 5C. True mass contributions
1610 2591 6146 1180 2216 18077 2302 102 824 2203 0 29 1100 891 771 4046 1530 1165 0 1690 0 5401 2215 140 0 46760 13716 7857 5134 0 1856 0 439 4139 0 0 0 892 3928 10720
Coal-A 81 983 296 594 1328 280 274 341 104 670 1088 2257 0 181 251 961 334 436 296 563 257 79 108 34 12 544 109 1252 484 956 1202 1407 82 85 0 3 228 107 44 312
Aggregate
(ng m⁻³) by source
708 1113 999 4633 1261 272 54 3 33 459 4 1232 295 42 0 0 2 0 1227 219 68 134 598 75 0 799 677 103 411 497 398 128 79 82 836 84 203 361 234 695
Glass 4432 11550 8171 11717 9807 22100 35417 21814 4716 4242 7547 17818 16361 16957 10123 17124 10160 5715 5484 6893 9345 9447 3624 3961 3713 10853 10538 15326 15773 7251 13755 13705 3276 27302 10120 9766 15635 9199 3580 18333
Basalt
type and observation
2217 10005 5768 10650 4492 10154 14320 14910 847 1680 4880 25499 19748 14726 3176 22406 11450 1622 1254 2967 9036 2583 2112 173 3914 4159 9874 5902 4763 5257 8135 4018 2278 8150 6632 460 2049 818 1999 22929 295 2696 584 2566 605 840 265 388 131 1828 255 3225 289 559 339 316 343 79 498 604 272 2035 657 1558 458 4233 362 617 257 1373 383 1121 473 1531 675 762 524 1024 313 3153
Wood
for Set III
Soil
period
1032 2396 0 202 0 0 0 3280 585 1489 %2 1451 10309 396 0 0 2592 0 2278 5 10498 13975 5307 9394 937 2010 696 991 1911 4663 1799 8114 6576 76 35466 106 4757 13923 0 0
Coal-B
2039 10618 9918 9685 6886 3833 2667 2205 2076 3529 2382 2990 3209 7750 3308 1983 1845 2384 0 7058
,827
E: 860 1035 985 2773 0 1903
1485 6050 4745 3786 4707 3183 1702 261 207 3566 2195
Road
Interlaboratory comparison of source apportionment
(d) Experiment with adding or substituting discarded sources to improve the agreement between the measured and predicted elemental concentrations, to reduce the percentage error in predicted total mass, and to minimize the reduced chi-square value.

This procedure was applied to the individual observation periods separately and also to the average of the periods. The former requires much more computing, but it was considered preferable because it handles multicollinearity more satisfactorily. Even though a particular source is discarded in the analysis of one observation period, it will likely be retained in another. Average intensities for all sources are then obtained by averaging the results from all periods.

Uncertainties in the source contributions were obtained differently in the two treatments. When the procedure was applied to the average of the individual periods, the uncertainties in the average concentrations were estimated by propagating the errors (Bevington, 1969) for the individual periods through the averaging process:

$$\sigma_{\bar{C}_i} = \frac{1}{N}\left[\sum_{K=1}^{N}\left(\sigma_{C_i}^{(K)}\right)^{2}\right]^{1/2} \qquad (2)$$

Here N = 40, and $\sigma_{C_i}^{(K)}$ is the uncertainty in the concentration of element i in observation period K. The CMB procedure was then applied using the average concentrations and their uncertainties. The uncertainties in the average source contributions were those estimated by the effective variance method.

When the procedure was applied to the individual periods, the uncertainties in the average source contributions were estimated by propagating through the averaging process the uncertainties estimated by the effective variance method for each period:

$$\sigma_{\bar{S}_j} = \frac{1}{N}\left[\sum_{K=1}^{N}\left(\sigma_{S_j}^{(K)}\right)^{2}\right]^{1/2} \qquad (3)$$

Here $\sigma_{S_j}^{(K)}$ is the uncertainty in the contribution of source j in period K. When a source was not included in the CMB for a particular period, the uncertainty in its zero contribution was also assumed to be zero. This assumption is probably not completely valid, but no means has yet been developed to estimate the 'detection limit' for sources excluded from the CMB.

The uncertainty estimates obtained via Equation (3) from the individual-period results were substantially smaller than those from the average-ambient-composition approach. As will be seen in Table 13, many of the source contributions estimated by the former method deviate from the 'true' contributions by more than the uncertainty estimates would lead one to expect. This suggests that the method used to estimate the uncertainties in results from individual periods may not be appropriate.

Table 6. Emissions inventory for Sets II and III

Source          Emission rate
Steel-A         74,000
Steel-B         107,000
Oil-A           43,000
Incinerator     47,000
Oil-B           49,000
Coal-A          503,000
Aggregate       31,600
Glass           48,400
Basalt          50,000
Soil            862,000
Auto            2,285,000
Wood            934
Coal-B          795,000

(2) Effective variance CMB [CMB(EV-2)] (JAC)
This procedure is the same as the preceding one. It was applied only to the individual observation periods separately, and the results were averaged. In place of Equation (3), however, uncertainties in the average source intensities were calculated from Equation (4). The CMB(EV-2) results were submitted after completion of the workshop, when the true source intensities were known.

(3) Weighted ridge regression CMB [CMB(RR)] (HJW)
(a) Calculate variance inflation factors (VIFs) for each source profile (Henry et al., 1984).
(b) Perform weighted ridge regression for a series of ridge k values between 0 and 0.72.
(c) Select the solution that meets three criteria: (1) the calculated regression coefficients (source intensities and intercept) do not change appreciably for a small change in k; (2) no significant negative coefficients are calculated; and (3) the calculated total mass agrees well with the measured total mass.
The VIFs of Step (a) are not required in the subsequent steps, but serve as a diagnostic for establishing the need for ridge regression. Twenty-eight k values were used, spaced more closely near 0 than near 0.72. Like CMB(EV), the procedure can be applied both to individual observation periods and to the average of the periods. The latter was considered preferable because it makes valid uncertainty values easier to calculate and requires fewer computing resources.

(4) Weighted CMB/factor analysis [CMB(W)/FA] (GDT)
(a) Perform classical factor analysis on all elemental observations to screen the data and to determine the number and characteristics of the major sources.
(b) Perform classical factor analysis on all available source profiles to classify the sources into distinct source groups.
(c) Attempt to match each ambient factor from Step (a) with a source group factor from Step (b), based on correspondences in the elements with heavy loadings.
(d) For each source group factor that has been matched, choose one source profile as its representative. Use the resulting set of source profiles in a CMB calculation weighted by the ambient concentration uncertainties only.

Step (a) applied to Set I initially identified six major factors, on the basis of the number of eigenvalues exceeding unity, as shown in Table 7. One factor, however, consisted almost entirely of arsenic. This factor matched none of the source profiles, and arsenic was a very 'noisy' variable (mean concentration < uncertainty). Arsenic was therefore eliminated from further consideration, which reduced the solution to five major factors.

Table 8 gives the results of Step (b), showing seven source groups:
(1) crustal: soil, basalt, coal-A, aggregate furnace;
(2) carbon-rich: automotive, wood combustion;
(3) glass manufacturing;
(4) incinerator;
(5) steel furnaces: steel-A, steel-B;
(6) coal-B;
(7) oil combustion: oil-A, oil-B.

Four of the five ambient factors could be reasonably matched to the first four source groups listed above. The match was made by comparing the factors listed in Table 7 with the supplied elemental profiles of the sources in the listed groups. The lack of a good match for the remaining ambient factor (Factor 3) was taken to indicate (correctly) that an important source profile
Table 7. Factor analysis of Set I by CMB(W)/FA*

Rotated factor pattern
Element   Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Factor 6
C         0.26      0.92      0         0         0         0
Na        0.30      0.79      0         0         0         0
Al        0.92      0.33      0.44      0         0         0
Si        0.84      0         0.39      0         0         0
S         0         0         0         0.58      0.73      0
Cl        0         0         0         0.95      0         0
K         0         0         0         0.96      0         0
Ca        0.88      0         0         0         0         0
Ti        0.97      0         0         0         0         0
V         0.59      0         0         0         0.74      0
Cr        0         0         0.97      0         0         0
Mn        0.89      0.35      0         0         0         0
Fe        0.92      0         0.26      0         0         0
Ni        0         0         0.97      0         0         0
Cu        0         0         0.96      0         0         0
Zn        0         0.43      0.28      0.83      0         0
As        0         0         0         0         0         0.97
Br        0         0.96      0         0         0         0
Pb        0         0.95      0         0         0         0
Percentage of total variance explained by each factor
          30.3      20.8      17.4      16.9      6.8       5.3

* Loadings of < 0.25 have been replaced by 0.

Table 8. Factor analysis of source profiles by CMB(W)/FA*

Rotated factor pattern
Source         Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Factor 6  Factor 7
Steel-A        0         0.45      0         0         0.88      0         0
Steel-B        0         0.54      0         0         0.83      0         0
Oil-A          0.75      0         0.40      0         0         0         0.47
Incinerator    0         0         0         0.97      0         0         0
Oil-B          0         0.79      0.31      0         0.27      0         0.33
Coal-A         0.97      0         0         0         0         0         0
Aggregate      0.99      0         0         0         0         0         0
Glass manuf.   0         0         0.97      0         0         0         0
Basalt         0.95      0         0         0         0         0         0
Soil           0.97      0         0         0         0         0         0
Auto           0         0.94      0         0         0.25      0         0
Wood           0         0.93      0         0         0.31      0         0
Coal-B         0.72      0         0.25      0         0         0.62      0
Percentage of total variance explained by each factor
               38.2      23.2      10.2      7.8       13.5      3.5       3.0

* Loadings of < 0.25 have been replaced by 0.
had been omitted from the list. For lack of a better alternative, the source profile steel-B was chosen as a surrogate for the missing source because of its high loadings of chromium, nickel and copper. The five representative source profiles chosen for Step (d) are italicized in the list above. Step (d) was applied to each observation period, and the source intensities obtained for the individual periods were averaged. The uncertainties in the average source intensity estimates were calculated by Equation (3), with one difference: N counted only the periods for which source j was used in a CMB (i.e. $S_j^{(K)} \neq 0$).

Step (a) was repeated with several meteorological variables added to the elemental ones: percentage of each wind direction (NE, SE, SW, NW), percentage of stable conditions, percentage of unstable conditions, and mixing height during the sampling period. The four-factor solution for Set I is shown in Table 9. Its most conspicuous feature is the association of NE wind direction with the carbon-rich Factor 2. This was taken to indicate a major source of automotive emissions to the northeast, a conclusion that appears consistent with the automotive corridor in the northeast sector of City Plan 1 (Fig. 1).
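Procedures (1) through (5) all rest on a weighted least-squares solution of the mass balance. The sketch below illustrates that step under stated assumptions. It is not any participant's program, and `cmb_fit` and `average_over_periods` are hypothetical helpers. Assumptions:

- The profile matrix `F` holds mass fractions, with elements as rows and sources as columns.
- By default, weights come from the ambient uncertainties only, as in step (d) of CMB(W)/FA.
- Passing `sigma_F` switches on effective-variance weighting.
- The ridge constant `k` is added directly to the weighted normal equations. The scaling of k actually used in CMB(RR) is not given in the text.

Period averages and their uncertainties follow Equation (3).

```python
import numpy as np


def cmb_fit(F, c, sigma_c, sigma_F=None, k=0.0, n_iter=20, tol=1e-10):
    """Weighted least-squares chemical mass balance (illustrative sketch).

    F       : (n_elements, n_sources) source profiles (mass fractions)
    c       : (n_elements,) ambient concentrations
    sigma_c : (n_elements,) one-sigma uncertainties of c
    sigma_F : optional (n_elements, n_sources) profile uncertainties; when given,
              weights use the effective variance sigma_c**2 + sum_j S_j**2 * sigma_F**2
    k       : ridge constant added to the diagonal of the normal equations
    Returns source intensities S and their one-sigma uncertainties.
    """
    F = np.asarray(F, dtype=float)
    c = np.asarray(c, dtype=float)
    var_c = np.asarray(sigma_c, dtype=float) ** 2
    n_src = F.shape[1]
    S = np.zeros(n_src)
    # Ordinary weighting needs one pass; effective variance is iterated.
    for _ in range(n_iter if sigma_F is not None else 1):
        if sigma_F is None:
            var = var_c
        else:
            var = var_c + (np.asarray(sigma_F, dtype=float) ** 2) @ (S ** 2)
        W = np.diag(1.0 / var)
        G = F.T @ W @ F                      # weighted normal matrix
        A = G + k * np.eye(n_src)            # k = 0 gives ordinary weighted LS
        S_new = np.linalg.solve(A, F.T @ W @ c)
        done = np.max(np.abs(S_new - S)) < tol
        S = S_new
        if done:
            break
    A_inv = np.linalg.inv(A)
    cov = A_inv @ G @ A_inv                  # reduces to G^-1 when k = 0
    return S, np.sqrt(np.diag(cov))


def average_over_periods(S_periods, sigma_periods):
    """Mean intensities over N periods; uncertainty as in Equation (3)."""
    S_periods = np.asarray(S_periods, dtype=float)
    sigma_periods = np.asarray(sigma_periods, dtype=float)
    N = S_periods.shape[0]
    return S_periods.mean(axis=0), np.sqrt((sigma_periods ** 2).sum(axis=0)) / N


# Usage with hypothetical 4-element, 2-source profiles and noise-free data.
F = np.array([[0.30, 0.01],
              [0.05, 0.20],
              [0.10, 0.10],
              [0.01, 0.40]])
S_true = np.array([10.0, 5.0])
c = F @ S_true
S, sigma_S = cmb_fit(F, c, sigma_c=0.05 * c + 0.01)
```

With exact synthetic data the fit recovers `S_true` for any positive weights. A positive `k` shrinks the estimates toward zero, trading bias for lower variance when profiles are collinear.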
(5) Effective variance CMB/factor analysis [CMB(EV)/FA] (CWL)
(a) Perform classical factor analysis on all elemental
Table 9. Factor analysis of Set I including meteorological variables by CMB(W)/FA*
Rotated factor pattern Factor 1 Factor 2 Factor 3 Factor 4
C Na Al Si S Cl K Ca Ti V Cr Mn Fe Ni Cu Zn Br Pb PERCNE PERCSE PERCSW PERCNW PERCUNST PERCSTBL MIXHT
0.36 0.36 0.91 0.86 0.36 0 0 0.90 0.96 0.17 0 0.88 0.93 0 0 0 0.26 0.25 0 0 0.36 0 0 0.46 0
0.86 0.76 0.29 0 0 : 0 0 0 0 0.31 0 0 0 0.44 0.92 0.91 0.84 0 -0.43 0 0.27 0 0
0
:40 0 0 :: 0 0 0.95 0 0.28 0.95 0.94 0.30 0 : 0.46 0 -0.33 0 0 0.44
0.47 0 0 0.77 0.91 0.91 0 0 0.35 0 0 0 0 0 0.80 0 0 :40 d 0.29 0 0.29 0
Percentage of total variance explained by each factor 26.4
18.4
14.9
* Loadings of < 0.25 have been replaced by 0.
15.4
observations to screen the data and to determine the number and characteristics of the major sources.
(b) On the basis of similar prominent elements, associate a single representative source profile with each factor found in Step (a).
(c) Perform a CMB(EV) with the smaller set of source profiles chosen in Step (b).
This procedure is essentially a simpler, but less systematic and less objective, version of CMB(W)/FA. Step (a) applied to Set I produced results virtually identical to those of CMB(W)/FA, with five nontrivial factors identified (arsenic excluded). In Step (b), after some trial and error, soil, auto, incinerator, oil-B and steel-A were selected as representatives of the five factors. In Step (c), the effective variance CMB program of Dzubay et al. (1981) was applied only to the average of the Set I observations, using all elemental concentrations except arsenic. Step (a) was then repeated with two kinds of meteorological variable added: percentage of each wind direction, and average wind speed. In contrast to CMB(W)/FA, none of the new variables was found to be significantly associated with the original factors. The reason for this difference is not understood.

(6) Multiple linear regression by tracers/factor analysis [MLR(T)/FA] (SLD)
(a) Perform classical factor analysis on all elemental observations to screen the data and to determine the number and characteristics of the major sources.
(b) For each factor found in Step (a), choose the element with the highest loading as the tracer for that factor.
(c) Perform a backward stepwise unweighted multiple linear regression with total mass as the dependent variable and the tracer elements as independent variables. Remove an independent variable whenever its regression coefficient is < 1.96 times the calculated uncertainty in the coefficient.
(d) Calculate the intensity of source j during observation period K, $S_j^{(K)}$, from

$$S_j^{(K)} = B_j\, C_j^{(K)} \qquad (5)$$

Here $B_j$ is the regression coefficient found in Step (c) for the tracer element of source j, and $C_j^{(K)}$ is the observed concentration of that tracer in observation period K.
Step (a) applied to Set I produced the same results as the corresponding step in the two previous procedures. Step (b) selected titanium, bromine, nickel, potassium and vanadium as the tracer elements for the five nontrivial sources found in Step (a). The source intensities estimated in Step (d) were averaged across observation periods. The uncertainty in each average was obtained in a relatively crude way, by considering only the standard error of $B_j$.

A feature of MLR(T)/FA is its ability to generate its own source profiles from the ambient observations, without requiring any external source profiles as input. Source profiles were generated by repeating Step (c) with each element in succession, instead of just total mass, as the dependent variable. (When a tracer element is the dependent variable, the case is trivial: one regression coefficient is unity and the rest are 0.) The resulting matrix of regression coefficients was converted to the normal form of a source profile matrix by applying a normalization constant to each column (i.e. source), chosen so that the total mass regression coefficient became unity. The resulting source profile matrix for Set I is given in Table 10A. The true profiles are given in Table 10D.

(7) Target transformation factor analysis [TTFA] (PKH)
(a) Perform 'classical' factor analysis on all elemental observations to screen the data and to determine an upper limit on the number of major sources.
(b) Perform target transformation factor analysis to determine the number of major sources, their average intensities, and their profiles.
The 'classical' analysis in Step (a) was based on correlations between observation periods (Q-mode analysis). The three preceding factor analyses used correlations between elemental concentrations (R-mode). The upper limit in Step (a) was set by the number of factors that accounted for at least one unit of variance after orthogonal (VARIMAX) rotation. This also differs from the three previous procedures, which used the more common method of equating the number of significant factors with the number of correlation-matrix eigenvalues greater than unity. For Set I (after eliminating arsenic), Step (a) indicated a maximum of seven factors, compared with the five found in the three previous procedures. In Step
Table 10A. Source profiles (mg g⁻¹) for Set I from MLR(T)/FA
Element C Na Al Si S Cl
Soil
(Road)
NS* NS 245217 6521t47 -40.3+ 13 - 22.3 54
K
537+31 52.2+4 56.5t12 NS NS 11.0*2
NS
Ca Ti V Cr MrS Fe Ni cu Zn
NS
Br Pb
NS NS
* NS = no solution
Wood
c Na
464&24 8.3&9.0 o+so 15Zso 0.040*0.123 3.7+5.f 0.26 & 7.36 OkI8 0.41 + 2.43 0.074+0.216 0.25 zko.49 0.42 IO.33 0.69 & 3.38 O.OSO+ 0.330 0.001&0.398 0.42 + 1.66 0.014+0.952 0.27 * 3.04
z S C1 K Ca l-i V Cr Mn Fe Ni cli Zn Br Pb
NS f7Si6 NS 363 & 58 NS 25.4+4
NS 94.2+ 12 NS NS 1683-22 277+8 192 NS NS NS NS NS NS NS 2.50_+0.8 26.5 &.1
NS NS NS NS 6.92 f 0.2 0.679 & 0.2 46.1 +6 4.12 7.65 + 0.29 5.55+_0.5
7.84*3 NS NS NS 0.735kO.l NS :5 3.S8rtO.3
NS 4Q.O 140+2
Oil 240*57 22.2+7 Gs NS I35&16 NS NS 18.8?7 NS 1.06 NS NS NS NS NS NS
NS
NS
NS
NS NS
NS NS
NS NS
(no nonzero ratio found, i.e. $B_j < 1.96\,\sigma_{B_j}$)
Table 10B. Source Element
Incinerator
NS
53.5k6 19.8 NS NS 2.8 4 0.2 14425 NS NS NS
As
(Sandblast)
Soil 34+10 4.3 f 3.8 2.56+21 518134 12rfr6 4.2+2.2 9.6 * 3.4 7.3 + 7.6 11.3kl.l 0.30j_0.10 0.78_tO.21 2.2orto.14 s5.0+5.7 0.2940.14 0.18kO.23 1.2&O.? 0.009~0.400 1.511.3
profiles
(mg g⁻¹) for Set I from TTFA
Incinerator
Oil
(~ndblast~
(Road)
33*40 115116 76581 0.%X + 0.646 149&23 245&Q 213* 14 f2*31 o.s5*4.11 0.052 to.364 0.088+0.821 0.22+0.55 36-123 0.5OcO.56 6.12+0.92 12+3 0.021 *oh09 13*4
140+40 12112 8.0k63.1 37 + 105 87ki8 6.2k6.7 7.9 & 10.6 41224 4.6k3.3 1.94 ?: 0.29 0.11 *oil5 0.36kO.43 4.42 17.7 0.93 & 0.44 0.63 + 0.72 0.52 t_ 2.19 0.037 kO.257 3.4*4.1
3.6k 12.8 13&S 83127 3X,44 Ilk8 2.2 i 2.8 17k5 49+_ 10 3.7 & 1.4 0.14&0.12 3.58 k 0.27 0.83*0.18 11328 2.09*0.1X 4.18rf:O.30 3.1 * 1.o 0.010~0.519 0.35 & 1.66
642+30 84.7 k 11.2 9.1 rt: 59.6 llf99 47417 16&t 0.51 f9.W 54+23 8.5k3.1
0.007i: 0.688 0.78 ltO.61 1.1 z0.s 6.9 & 16.8 0.7110.42 0.98 + 0.68 5.442.1 73.3 _t 1.2 223f4
Table 10C. Source profiles
Element
wood
C
591+42 4.0* 20 10+31 12+92 0.25 f 26 16+11 13+5 14+ 14 0+3 OkO.11 OkO.4 0+0.4 0.31 k9.32 0+0.11 OkO.7 0.45 f 2.43 0.12+ 1.13 0.014k4.61
Na Al Si S Cl K Ca Ti V Cr Mn Fe Ni cu Zn Br Pb l
Submitted
Na Al Si S Cl K Ca Ti V Cr Mn
Fe Ni CU Zn As Br Pb
68k 12 0.0061+ 5 166k9 466*25 5.7k7.2 0.35 f 2.90 7.5 + 1.3 41.4k3.6 12.9kO.7 0.315f0.030 0.54*0.09 1.97*0.09 96.4f2.6 0.196*0.029 0.52kO.19 0.28 + 0.66 0.89kO.31 5.0* 1.3
after completion
Element C
Soil
Oil-B 200 22 5.4 70 110 12 1.40 8.6 1.10 0.67 0.42 0.140 13.0 0.06 0.75 1.30 0.25 0.39 3.8
(mg g⁻¹) for Set I from TTFA*
Incinerator
Oil
35+37 64_+18 0.0002 + 26 0.53 + 79 64k24 196k 10 122+4 1.6k11.4 5.4+2 0.25 io.10 1.2kO.3 1.3kO.3 2.4k8.2 0.12+0.10 4.13kO.59 lo&3 0.37 f 0.98 9.2k4.1
185_+26 14+13 11*20 44+57 12Ok 17 5.4 k 6.7 2.9k2.8 26k9 2.7* 1.4 0.660 + 0.068 0.026 + 0.203 0.13kO.20 35_+6 0.094 + 0.066 0.4OkO.42 0.25 f 1.52 0.48 k 0.71 3.4k2.9
Coal
Sandblast
15551 16+25 35k25 8.5k11.8 210+40 0.00099~ 18 2305 120 508_+54 18Ok40 0.025 + 15 23* 14 37*7 17+6 8.5k2.7 1.5k7.7 81k16 16*3 2.7* 1.4 0.41 kO.14 0.15*0.07 0.35 *0.40 5.49 * 0.20 1.9io.4 1.1 +0.2 143212 45.6 k 5.5 0.50*0.13 3.06+0.07 0.014 kO.820 6.63 k 0.40 0.53 f 2.97 6.0& 1.5 o* 1.4 0.7OkO.67 1.6k5.7 5.9k2.8
Road 573f 17 54.8 L+8 44* 13 73+37 lo* 11 16&5 0.00004~ 1.7 13&6 0.77 f 0.90 0.00029 k 0.043 0.027 -+ 0.130 0.61+0.13 9.2 & 3.8 0.0037 * 0.042 0.060 _+0.269 4.2+ 1.0 52.4 + 0.5 152_+2
of workshop.
Table 10D. True source profiles (mg g⁻¹) for Set I*
Incinerator
Soil
Road t
wood
66.4 4.0 222 543 0.26 0.13 12.0 20.5 12.7 0.25 0.31 2.0 97 0.090 0.20 0.41 0.010 0.010 0.26
429 52.5 62 147 6.06 8.06 5.0 13.9 3.17 0.143 0.141 0.94 29.5 0.070 0.050 2.51 0.010 31.3 90.1
600 3.0 7.0 8.9 1.00 10.0 8.0 9.0 0.020 0 0.002 0.020 1.00 0.005 0 0.70 0 0.080 0.100
40 120 4.2 7.0 130 270 200 11.0 0.90 0.027 0.85 0.33 6.1 0.100 3.6 26 0.150 0.82 17.0
Sandblast 5.0 8.0 4tG5 8.8 28 7.0 8.2 4.0 0.031 5.8 1.4 63 3.2 6.9 4.7 0.099 0.047 5.8
Coal-B 8.0 4.0 140 200 180 17 8.0 40 9.0 0.8 0.3 0.2 70 0.6 0.4 0.9 0.2 0.01 1.2
* Steel-A and basalt sources were also present but were not deduced by any workshop participants.
† Road is a linear combination of soil (25%) and auto (75%).

(b), several indicator parameters (Table 11A) were examined to arrive at the final number of source factors. The chi-square and average error indicators were the most useful, fixing this number at 6. (An indicator parameter should decrease more slowly once the proper number of factors has been used.) Details of the procedure used in Step (b) are given by Roscoe and Hopke (1981a). Like MLR(T)/FA, TTFA generates its own source profiles without requiring any source information as input. The source profiles resulting from TTFA applied to Set I are given in Table 10B and can be compared with the true profiles in Table 10D. The uncertainties given in Table 10B represent an upper limit for one standard deviation, as discussed by Roscoe and Hopke (1981b).
After the workshop concluded, a second TTFA solution for Set I was submitted, which is regarded as superior to the first. For the second solution, the chi-square and average error indicators (Table 11B) pointed to a seven-factor solution, in agreement with the classical factor analysis of Set I. The source profiles generated in the second solution are given in Table 10C. The second solution used an R-mode analysis instead of the Q-mode analysis employed in the first solution.
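Choosing the number of factors can be illustrated with two generic screening statistics:

- the count of correlation-matrix eigenvalues above unity, used in the R-mode procedures;
- Malinowski's real error RE(n) and indicator function IND(n), one of the indicators listed in Tables 11A and 11B.

This is a sketch that assumes Malinowski's standard definitions, RE(n) = [Σ_{j>n} λ_j / (r(c − n))]^{1/2} and IND(n) = RE(n)/(c − n)², where λ_j are the eigenvalues of XᵀX for an r × c data matrix with r ≥ c. The matrix scaling used by the TTFA program (Q- vs R-mode, correlation vs cross-product) is not specified here, so the sketch will not reproduce the tabulated values. The function names and synthetic data are hypothetical.

```python
import numpy as np


def kaiser_count(X):
    """Number of correlation-matrix eigenvalues exceeding unity (R-mode screening)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))


def malinowski_indicators(X):
    """Real error RE(n) and Malinowski's indicator IND(n) = RE(n) / (c - n)**2.

    X is an (r, c) data matrix with r >= c (e.g. periods x elements).
    Eigenvalues are those of the cross-product matrix X^T X.
    Returns n = 1..c-1 together with RE(n) and IND(n).
    """
    X = np.asarray(X, dtype=float)
    r, c = X.shape
    lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]  # descending order
    n = np.arange(1, c)
    re = np.array([np.sqrt(lam[m:].sum() / (r * (c - m))) for m in n])
    return n, re, re / (c - n) ** 2


# Synthetic example: 40 "periods", 10 "elements", three underlying sources
# plus small random noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 10))
X += 0.01 * rng.standard_normal((40, 10))
n, re, ind = malinowski_indicators(X)
print("Kaiser count:", kaiser_count(X))
print("IND minimum at n =", n[np.argmin(ind)])
```

IND is expected to reach its minimum at the true number of factors when the noise is small relative to the signal. As the text notes, however, an indicator's change in slope can be more informative than its minimum in practice.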
Table 11A. Diagonalization results and reproduction summary for Set I from TTFA*

Factor  Eigenvalue  r.m.s. error  Chi-square  Exner   Malinowski empirical indicator  Average error (%)
1       3.60E+01    4.48E+01      3.20E+02    0.346   4.75E-05                        130
2       2.97E+00    1.48E+01      8.11E+01    0.152   7.23E-05                        70
3       3.26E-01    1.09E+01      6.30E+01    0.112   1.75E-05                        58
4       2.37E-01    7.30E+00      6.94E+01    0.071   1.18E-05                        51
5       8.10E-02    4.65E+00      8.18E+01    0.048   8.75E-06                        48
6       3.19E-02    3.06E+00      1.78E+01    0.036   7.06E-06                        37
7       2.18E-02    2.14E+00      1.55E+01    0.025   5.26E-06                        34
8       1.25E-02    1.59E+00      9.28E+00    0.0156  3.52E-06                        25
9       5.65E-03    7.19E-01      1.04E+01    0.0078  1.90E-06                        23
10      1.42E-03    4.57E-01      5.62E+00    0.0038  1.02E-06                        14
11      2.66E-04    3.38E-01      2.80E+00    0.0025  7.35E-07                        10
12      9.80E-05    2.55E-01      5.51E+00    0.0018  5.72E-07                        10
13      6.92E-05    1.17E-01      1.70E+00    0.0010  3.50E-07                        6.4
14      1.50E-05    6.51E-02      1.62E+00    0.0007  2.75E-07                        5.1
15      9.50E-06    4.14E-02      1.43E+00    0.0004  1.95E-07                        3.7
16      2.95E-06    3.14E-02      1.51E+00    0.0003  1.63E-07                        2.9
17      2.29E-06    1.94E-02      1.68E+00    0.0002  1.17E-07                        1.9

* Six-factor solution submitted prior to workshop.

Table 11B. Diagonalization results and reproduction summary for Set I from TTFA*

Factor  Eigenvalue  r.m.s. error  Chi-square  Exner   Malinowski empirical indicator  Average error (%)
1       1.35E+01    4.9E+01       9.5E+02     0.75    2.80E-04                        180
2       1.92E+00    3.4E+01       4.0E+02     0.56    2.46E-04                        115
3       1.12E+00    2.5E+01       4.0E+02     0.42    7.16E-04                        97
4       1.01E+00    1.8E+01       1.3E+01     0.22    1.39E-04                        29
5       2.48E-01    1.6E+01       1.1E+01     0.14    1.06E-04                        23
6       4.46E-02    1.3E+01       1.3E+01     0.12    1.11E-04                        23
7       3.17E-02    1.1E+01       3.1E+00     0.107   1.18E-04                        14
8       2.46E-02    8.3E+00       2.2E+00     0.091   1.28E-04                        12
9       1.67E-02    5.8E+00       2.7E+00     0.079   1.45E-04                        11
10      1.08E-02    3.5E+00       3.3E+00     0.070   1.72E-04                        11
11      9.53E-03    3.2E+00       3.4E+00     0.060   2.09E-04                        10
12      9.24E-03    2.5E+00       4.3E+00     0.050   2.54E-04                        8.9
13      8.42E-03    1.5E+00       3.9E+00     0.038   3.06E-04                        6.6
14      5.15E-03    1.2E+00       3.4E+00     0.028   4.01E-04                        4.7
15      3.18E-03    9.5E-01       4.2E+00     0.020   5.93E-04                        3.5
16      1.44E-03    7.4E-01       7.7E+00     0.0157  1.24E-03                        3.3
17      1.31E-03    3.8E-01       1.7E+01     0.0091  4.02E-03                        3.0

* Seven-factor solution submitted after completion of workshop.

RESULTS OF THE INTERCOMPARISON

Tables 12 and 13 are the most important in this report. They give the average source strengths that the Quail Roost II participants determined from their analyses of simulated Data Sets I and II. No results for Set III were reported. Presenting the results in an understandable and equitable manner requires addressing two complications.

The first complication is the difference between the CMB and factor analysis approaches to source apportionment. The former uses a supplied collection of source profiles. The latter generates its own profiles from the variability of the ambient data, without prior reference to the supplied profiles. In general, the two groups of profiles will not agree exactly, and judgment is needed to establish a correspondence between them. For Data Set I, factor analysis was part of the approach used by every reporting participant, and there was fairly good agreement on the number and nature of the sources found (see Tables 10A-D). Accordingly, for Set I the supplied sources were grouped to correspond as closely as possible to the source categories identified in the factor analyses. This step was necessarily somewhat arbitrary, but it was based on the information in Tables 10A-D and Table 8. The groupings are indicated in Table 12. No such problem arises with Data Set II, since only CMB analyses were reported for that set.

The second complication arises from the use of the road source, a fixed linear combination of the supplied auto and soil sources, in the generation of all three data sets. A factor analysis approach is by nature sensitive to
Table 12. Estimated source impacts (µg m⁻³) compared with true values for Set I*
Source
CMB(EV)/FAt
CMB( W)/FA$
MLR(T)/FA§
Steel-A Steel-B
1.0+0.3
-
-
Oil-A Oil-B
6.1 k 1.0
Incinerator
1.5kO.2
-
-
0.05 -
-
2.3kO.4
2.1il.O
0.7rto.3 -
2.4kO.4 -
1.8kO.l -
1.9&0.14 -
1.3 -
-
-
-
-
2.2kO.73
2.4 -
15.9+ 1.1
15.9kO.2
10.6+0.9
12.7FO.5
12510.75
5.3 +0.4
8.3 +0.2
3.0fO.l
4.0*0.09
7.4 j: 0.7
4.3 f 0.52
3.3
-
1.9kO.4
5.2 * 0.4
4.1*0.20
4.2
Glass
Road¶ Wood
S.OkO.2
-
Sandblast†† Total
Truth
7.9kO.8
Coal-B Coal-A Aggregate Basalt Soil¶
TTFA‖
TTFA
29.8
4.OkO.3
35.5
29.9
32.4
2.0
4.7 8.0** 7.1
33.05
31.1
* Uncertainties represent one standard deviation.
† CMB result for the average of the observation periods.
‡ Average of CMB results for each observation period.
§ Fitted intercept = 3.3 ± 1.5 µg m⁻³.
‖ Results submitted after completion of workshop.
¶ Data were generated using road as a source component, where road = 0.25 soil + 0.75 auto.
** Soil from all sources (including road component) = 9.8 µg m⁻³.
†† Sandblast was an extraneous source, not included among the 13 profiles furnished to the participants. The labels originally used for this source by those who furnished estimates were: 'other (steel-B used as surrogate)', CMB(W)/FA; 'unknown (nickel tracer)', MLR(T)/FA; and 'aggregate', TTFA.

Table 13. Estimated source impacts (µg m⁻³) compared with true values for Set II*
Source
CMB(EV-1)†
N1‡
CMB(EV-1)§
CMB(RR)§ CMB(RR)§‖ CMB(EV-2)†‖
Steel-A Steel-B Oil-A Oil-B Incinerator
0.58+0.033 0.24kO.017 0.44*0.033 1.2k0.13 1.5+0.05
19 11 17 11 27
0.97kO.14 0.47kO.12 1.7& 1.1 1.5+0.1
0.29 + 0.73 0.3 1 * 0.49 0.1 I f0.66 2.3 f 3.3 1.3kO.84
0.66 f 2.4 0.26* 1.6 0.45 rto.56 0.11 f2.7 1.5f0.29
Glass Coal-B Coal-A Aggregate Basalt Soil Auto wood
0.16+0.03 3.7fO.17 4.3kO.3 2.1 +0.3 8.2kO.5 9.5 + 0.4 2.5kO.05 0.5kO.l
213 11 11 24 33 40 11
- 1.1 3.8& 6.2f1.3 11.2k1.7 7.8+ 1.4 2.SrtO.l 0.6kO.5
0.15fi.3 4.3f3.1 3.Ok4.9 7.3f4.0 9.0* 3.4 6.3 f 2.1 2.3 + 0.78 1.1kO.38
0.31 f 1.1 4.9k2.3 4.6k4.7 2.5 f 6.9 8.2f6.1 8.6* 3.5 2.5 * 0.2 1.0*0.54
Total
34.92
36.74
37.76
35.59
N2**
Truth
0.55+0.11 0.22 + 0.08 0.47+0.10 0.72 f 0.25 1.5*0.43
21 8 21 8 26
0.42 0.29 0.56 0.41 1.81
0.28*0.11 3.850.91 5.3* 1.40 2.1 kO.57 9.5* 1.28 8.4k1.26 2.5kO.31 0.91 kO.16
258 22 11 31 33 40 26
I? 3.7 0.46 11.3 7.9 2.5 0.94
36.25
34.05
* Uncertainties represent one standard deviation.
† Average of CMB results for each observation period (40 total).
‡ Number of periods (of 40 total) for which the source was used in CMB(EV-1) (separate analysis of each period).
§ CMB result for the average of the observation periods.
‖ Results submitted after completion of workshop.
¶ Average intercept = 0.52 µg m⁻³.
** Number of periods (of 40 total) for which the source was used in CMB(EV-2).

a combined source rather than to its components, whereas a CMB approach resolves the auto and soil components and the combination goes unrecognized. For this reason it is natural to express the results of the factor analyses (Table 12) in terms of the road source, but the results of the CMB analyses (Table 13) in terms of the auto source.
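The road-source complication follows directly from the profile algebra. The minimal sketch below uses made-up three-element profiles, not the workshop profiles. Including road together with its two components makes the profile matrix rank deficient. A CMB restricted to soil and auto splits a pure road impact 25:75 between those two sources. The impact value `R` is hypothetical.

```python
import numpy as np

# Hypothetical three-element profiles (mass fractions); not the workshop profiles.
soil = np.array([0.20, 0.50, 0.01])
auto = np.array([0.05, 0.02, 0.60])
road = 0.25 * soil + 0.75 * auto   # road built as in the simulated data sets

# With soil, auto and road all included, the profile matrix is rank deficient,
# so a CMB cannot assign a unique road contribution.
rank_all = np.linalg.matrix_rank(np.column_stack([soil, auto, road]))

# With only soil and auto, a pure road impact R is split 25:75 between them.
R = 8.0  # hypothetical road impact, ug m^-3
S, *_ = np.linalg.lstsq(np.column_stack([soil, auto]), R * road, rcond=None)
print(rank_all, np.round(S, 6))
```

The road impact therefore reappears entirely as soil and auto contributions. This is why Table 13 reports the auto source rather than road.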
Set I results

From the Set I results in Table 12, we conclude the following.
(1) There were no false positives (absent sources reported as present).
(2) False negatives (sources missed) were primarily restricted to sources that were 'weak' (large relative standard deviation [RSD]) or highly 'collinear' (correlated) with other sources. This is especially evident in the approaches that used factor analysis to determine the source categories and the number of sources. For example, of the nine sources present, TTFA found only six initially and seven in the post-workshop solution.
(3) The unknown source, sandblast, was purposely made large (RSD + 1; see Fig. 3). Three of the four responding participants reported a value for an extra component, and each value appears quantitatively consistent with the truth.
(4) Arsenic appeared as a lone component in all factor analyses. However, every participant correctly judged this component to be related to measurement error rather than to indicate a true source impact. Consequently, arsenic was omitted from all the final factor analyses of Set I.
(5) Two of the methods, MLR(T)/FA and TTFA, provided independent estimates of the source profiles, a very interesting achievement. A comparison of the statistically significant estimates listed in Tables 10A-D shows that the post-workshop TTFA profiles (Table 10C) and the pre-workshop MLR(T)/FA profiles (Table 10A) are both reasonably consistent with the true profiles (Table 10D). For the unknown profile (sandblast), the normalized residuals (i.e. the ratios of deviations to estimated standard errors) range from -3.2 to +2.0 for TTFA (post-workshop) and from -3.6 to +5.6 for MLR(T)/FA. The pre-workshop TTFA profile estimates deviate more, with residuals ranging from -9.2 to +6.2. Notably, TTFA correctly identified the largest number of source classes: 7 of 9 in the post-workshop solution and 6 of 9 in the pre-workshop solution. Each of the other methods identified five classes.
(6) The quality of fit for the source impact estimates, as reflected in the normalized residuals, is roughly comparable for all methods (see Table 12). Excluding the road estimates,
the maximum absolute normalized residuals range from 4.5 to 16. The methods that inferred their own profiles [MLR(T)/FA and TTFA] show large negative residuals for the road estimates (-10.5 and -34, respectively).

The overall success of a particular source apportionment analysis can be judged in several ways. For example, a graphical approach is used for the Set II results (Fig. 5). As a numerical alternative, we define a 'figure of merit', F:

$$F = \frac{1}{P}\sum_{i=1}^{P}\min\left(\frac{\mathrm{Calc}_i}{\mathrm{True}_i},\ \frac{\mathrm{True}_i}{\mathrm{Calc}_i}\right) \qquad (6)$$

Here $\mathrm{Calc}_i$ and $\mathrm{True}_i$ are the predicted and actual intensities of the ith source, and the summation runs over all P sources contributing to the receptor site. F lies between 0 and 1, and equals 1 if and only if agreement is perfect. A value of 0.5 indicates that the predicted and actual intensities agree within an average factor of 2.

Such a definition admittedly has some deficiencies:
- As an averaged measure of agreement, F may be greatly influenced by poor agreement for weak source intensities.
- F takes no account of the magnitudes of a solution's calculated uncertainties.
- F is subject to significant uncertainty because there are relatively few degrees of freedom.

Bearing these limitations in mind, for the five solutions shown in Table 12, F ranges from a best value of 0.76 to a worst value of 0.42, with an average of 0.54. On the basis of this average, the source apportionment procedures were able to predict source impacts (for the sources resolved) within a factor of 2 on average.

Finally, it is of interest to consider whether the calculated uncertainties for the Set I solutions have reasonable magnitudes. The uncertainties are intended to represent one standard deviation. The differences between the source intensity estimates and the true values should therefore not exceed ±3 times the stated uncertainties, and in the long run about 68% of the differences should lie within the stated uncertainties. All sets of results are consistent with this expected proportion (68%), with the second TTFA solution lying closest, but there are too few degrees of freedom for a very sensitive test. (Because of sampling-size error, with just seven comparisons the proportion could range from 2/7 to 7/7 by chance [90% confidence interval].) All solutions have normalized residuals (differences from the truth divided by stated uncertainties) that are larger than expected. CMB(EV)/FA is best in this respect (residuals of -4.5 to 4.1), and TTFA (pre-workshop) has the widest range (residuals of -41 to 5.9). It follows that the estimated uncertainties were generally too small.

Fig. 5. Estimated vs true Set II source intensities for the CMB(EV-1), CMB(RR) and CMB(EV-2) solutions.

Set II results
Table 13 contains results for Set II that were submitted both before and after the workshop. The pre- and post-workshop CMB(RR) results differ in how the data were weighted in the fitting process. In the former, the weight for an average elemental concentration took into account both its measurement standard error and its day-to-day variability. In the latter, the weight was based only on the quoted concentration (measurement) error. Also, in the latter case, the criteria used to choose the ridge regression k value led to selecting k = 0. In other words, the best solution corresponded to ordinary weighted least-squares CMB! The general agreement between the post-workshop CMB(RR) solution and the CMB(EV-1) solution for the period average is therefore to be expected, for two reasons. Both resulted from averaging the elemental concentrations, and the difference between ordinary and effective variance weighting is probably not large in this case. Similarly, the close agreement between the individual-period CMB(EV-1) solution and CMB(EV-2) is to be expected.

Not shown in Table 13 is a four-source TTFA solution obtained prior to the workshop: auto = 6.2 ± 0.2 µg m⁻³, incinerator = 3.2 ± 0.1 µg m⁻³, coal = 5.3 ± 0.3 µg m⁻³, soil = 19.6 ± 0.3 µg m⁻³. This approach thus significantly underestimated the number of major sources present in Set II. Post-workshop attempts with this method brought little improvement. The author who contributed this solution (PKH) attributes these difficulties in source resolution to collocation of many of the sources, so that their variation was governed predominantly by the dispersion model.

The figures of merit calculated from Equation (6) for the five solutions in Table 13 range from a best value of 0.76 to a worst value of 0.57, with an average of 0.67. On this basis, the post-workshop CMB(EV-2) solution was closest to the true source intensities, but only marginally closer than the pre-workshop CMB(EV-1) solution. From the figure of merit values, we conclude that these source apportionment procedures predicted source impacts to better than a factor of 2 on average. Fig. 5 gives a graphical alternative to the numerical figure of merit, comparing the Set II results for CMB(EV-1), CMB(RR) and CMB(EV-2) with the true values.
Because the axes are logarithmic, the vertical distance of each point from the straight line is proportional to the percentage deviation of the result from the true value. Note that, because of collinearity among source profiles, the quality of agreement does not improve monotonically with increasing source intensity.

A striking feature of Table 13 is that the intensity estimates for a given source generally agree well, while the calculated uncertainties differ widely. They range from very small for the individual-period CMB(EV-1) solution to very large for CMB(RR). Examining the Table 13 uncertainties as we examined those in Table 12, we find that only the period-average CMB(EV-1) and the CMB(EV-2) uncertainties are consistent with the deviations. The individual-period CMB(EV-1) estimates and their uncertainties almost never overlap the true values, whereas the CMB(RR) results always do. (For a true standard deviation, the expected overlap probability is about 68%. However, because of sampling-size error, with 13 comparisons this proportion could range from 5/13 to 11/13 by chance [90% confidence interval].) In light of the expected similarity of the period-average CMB(EV-1) and
procedures: results for simulated data sets
1535
CMB(RR), solutions (discussed earlier), the large difference in relative uncertainties between these two is inconsistent. Also, the CMB(RR) uncertainties are so large (RSD > 60 “,d)that eight sources must be considered ‘undetected’ by that method. Set III observation No results for Set III were submitted by any participant, but it was evident (because of the additional log-normal error components) that deconvolution would have been still more difficult than for Set II. The importance of additional physi~hemi~l characterization of source and ambient samples for these difficult but realistic deconvolution situations is evident from the demonstrated effects of collinearity among source profiles. An explicit illustration of this was given by the National Bureau of Standards, where the effect of i4C characterization was investigated. For Set III, without 14C measurements, there were 16 very strong pairwise correlations among the 13 source estimates; the addition of “C characterization reduced that number to only 1 severe interference (between steel-A and steel-B). (These interference questions were addressed mathematically by examining the var~n~~o~ri~~ matrix resulting from weighted least-squares analysis.) Carbon-14 characterization was obviously crucial in reducing the interference between wood and automotive sources. Clearly, other correlation-breaking observations, such as additional isotopic analyses and individ~l particle and crystalline phase analyses, may be the key for resolving a host of sources with adequate precision for control purposes.
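The interference check described for Set III can be sketched as follows, assuming the usual weighted least-squares result that the covariance of the source estimates is (AᵀWA)⁻¹ with W = diag(1/σ²). The function names, the 0.9 "severe" threshold, and the toy profiles are hypothetical; the extra row only loosely mimics a ¹⁴C-like tracer carried by one source and absent from the other.

```python
import numpy as np

def estimate_correlations(A, sigma):
    """Correlation matrix of weighted least-squares source estimates."""
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    cov = np.linalg.inv(A.T @ (w[:, None] * A))   # variance-covariance matrix (A^T W A)^-1
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def strong_pairs(corr, threshold=0.9):
    """Index pairs (i, j), i < j, whose |correlation| exceeds a chosen threshold."""
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > threshold]

# Hypothetical profiles: two nearly collinear sources (columns) over three elements
A_before = np.array([[0.50, 0.48],
                     [0.30, 0.32],
                     [0.20, 0.20]])
sigma_before = np.full(3, 0.01)

# Add a hypothetical correlation-breaking tracer present only in source 0
A_after = np.vstack([A_before, [1.0, 0.0]])
sigma_after = np.full(4, 0.01)

corr_before = estimate_correlations(A_before, sigma_before)
corr_after = estimate_correlations(A_after, sigma_after)
print(strong_pairs(corr_before), strong_pairs(corr_after))
```

The near-collinear pair is flagged before the tracer is added and drops below the threshold afterwards, which is the pattern the ¹⁴C comparison describes.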
UNCERTAINTIES
One topic that deserves further investigation in future intercomparisons and methods development is the matter of uncertainties. It is quite encouraging (and represents real progress) that all of the reported results included approximate standard errors. Some obvious differences in the treatment of different error components, degrees of freedom, and confidence intervals were seen, but an attempt was made to quote all results on an equivalent basis. Limitations in current treatments (especially with respect to systematic error and the treatment of negative estimates) are summarized below.
(1) None of the reported results included a completely adequate statement of uncertainties. Comparisons of deviations from the truth with error estimates (particularly for the results shown in Table 13) confirm that different investigators employed different bases for stating uncertainties.
(2) For methods using the profiles as submitted (some form of CMB), only the effective variance CMBs explicitly included the effect of source profile (0,) uncertainties, but these were treated as random rather than as systematic error components. The use of effective variance itself is an approximate treatment of the total least-squares problem (errors in the source matrix as well as in the observations); a more rigorous approach is given by Golub and Van Loan (1980). Still more approximate were CMBs that explicitly included only the observation measurement error (a,). This was generally by choice, however, and not indicative of an inherent limitation in the method (e.g. ridge regression).
(3) Multivariate and factor analysis solutions quoted uncertainties that were to be taken as no greater than the standard error; no information on bias bounds (if any) was provided.
(4) Ridge regression provided no quantification of bounds for the bias that this mathematical method introduces. However, according to the author who contributed this solution (HJW), the absence of significant negative or unreasonably large positive values precludes extreme biases.
(5) Uncertainty about such items as the appropriate number of degrees of freedom prevented the construction of confidence intervals based on Student's t.
(6) Consistency tests between the MLR(T)/FA and TTFA source profile estimates and the true profiles were likewise difficult because of insufficient information about uncertainties in the estimated profiles.
(7) For Set I (fewer than 13 sources operating), the questions of uncertainty intervals for the estimated number of sources, or of alternative possible solutions within such intervals, were generally not addressed.
(8) Quality-of-fit measures would have been desirable for all methods and results. In at least some cases discussed at the workshop, it was apparent that such measures (e.g. chi-square) did not always remain within statistically acceptable bounds.
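The effective variance weighting discussed in item (2), together with the chi-square quality-of-fit measure of item (8), can be sketched as an iteration. This assumes the commonly cited form in which each element's variance is the observation variance plus the profile variances propagated through the current estimates, Vᵢ = σ_cᵢ² + Σⱼ sⱼ² σ_Aᵢⱼ². It is an approximation, not the total least-squares solution of Golub and Van Loan, and it is not any participant's code; the function name, iteration limit, tolerance, and toy data are assumptions.

```python
import numpy as np

def effective_variance_cmb(A, sigma_A, c, sigma_c, n_iter=50, tol=1e-10):
    """Hypothetical iterative effective-variance CMB.

    A, sigma_A : (n_elements, n_sources) profiles and their standard deviations
    c, sigma_c : (n_elements,) ambient concentrations and their standard deviations
    Returns (estimates, standard errors, chi-square per degree of freedom).
    """
    n, p = A.shape
    s = np.zeros(p)  # with s = 0 the first pass weights by observation variance only
    for _ in range(n_iter):
        V = sigma_c**2 + (sigma_A**2) @ (s**2)   # effective variance per element
        w = 1.0 / V
        s_new = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * c))
        converged = np.max(np.abs(s_new - s)) < tol * (1.0 + np.max(np.abs(s_new)))
        s = s_new
        if converged:
            break
    V = sigma_c**2 + (sigma_A**2) @ (s**2)
    cov = np.linalg.inv(A.T @ (A / V[:, None]))   # covariance of the estimates
    resid = c - A @ s
    dof = n - p
    chi2 = float(np.sum(resid**2 / V)) / dof if dof > 0 else float("nan")
    return s, np.sqrt(np.diag(cov)), chi2

# Hypothetical noise-free example: 4 elements, 2 sources, 10% profile uncertainty
A = np.array([[0.30, 0.05],
              [0.10, 0.40],
              [0.02, 0.20],
              [0.25, 0.01]])
sigma_A = 0.10 * A
c = A @ np.array([10.0, 5.0])
sigma_c = 0.05 * c + 0.01
s, se, chi2 = effective_variance_cmb(A, sigma_A, c, sigma_c)
print(s, se, chi2)
```

Note that the profile uncertainties enter only as random variance terms here, which is exactly the limitation item (2) points out for systematic profile errors.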
CONCLUSIONS
Despite the extremely short time available for analysis prior to the workshop, the methods of deconvolution were generally satisfactory, providing consistent estimates for nearly all sources present. In quantitative terms, the best solutions for the two data sets that were analyzed gave source intensity estimates that averaged within about 30% of the true values. It was not obvious that any one method was 'best' in all circumstances. Specifically, the multivariate methods (factor analysis and multiple linear regression by tracers) were very successful with Set I, which contained a relatively small number of sources, one of whose profiles was unknown. On the other hand, the least-squares (CMB) method was clearly the better choice for Set II, which contained a large number of active sources with no unknown source profiles. Applying these observations to the source apportionment of a real data set would be difficult, because a real data set is likely to contain a large number of sources, at least some of which are probably unknown. That is, a real data set would probably combine the more difficult aspects of Sets I and II. There appears to be value in applying both CMB and factor analysis approaches to a given data set, and in insisting on some degree of consistency between them before regarding the results as valid.
Time was insufficient for full use of meteorological data by the various methods. The two factor analyses that did use meteorological variables had different degrees of success. Given the limited total number of samples available for analysis, no participant attempted to use the meteorological data to stratify the ambient concentration data prior to analysis. This approach might have been useful if the participants had been supplied with the source geography (Figs 1 and 2) and with more observations.
Limitations in the analysis of the data sets related primarily to limited resolving power and to different (or incomplete) approaches to stating uncertainties. The latter limitation has already been discussed. The former is intrinsic in the data (not the deconvolution methods) and is expected to be overcome by adding correlation-breaking data (e.g. isotopic and microscopic), which tend to carry unique information about different sources. Sampling regimes that permit the application of such techniques as wind trajectory analysis may also be very effective in reducing the effects of profile collinearities and in discovering superpositions (such as 'road' in the current study). The eventual marriage between deterministic (dispersion and source inventory) and stochastic (receptor deconvolution) models may also lead to greatly improved resolution together with improved accuracy of identification and quantification. Whether either model is sufficiently advanced for an effective linkage is unclear at present; this linkage would seem to be a matter for further research. (See comments below concerning the limitations of the RAM dispersion model.)
Strengths and weaknesses in the simulated data per se were evident in the generation process and in the results of the intercomparison. The greatest strength was that rigorously controlled models, with known error structures and source concentrations, could be applied. The ranges of concentrations and source types were reasonable (realistic) and 'interesting', because participants could readily quantify some components but scarcely detect others. The unknown number of components, the 'extra' component, the linear combination component, and the collinearity of profiles served very useful purposes by representing situations that can be expected to occur in real field data. The simulated data usefully demonstrated the need for more attention to treating the different error classes and estimating uncertainties, as well as the need for additional kinds of physicochemical information (if one hopes to resolve cases as difficult as Set III).
The exercises were not, of course, broad enough. More extensive world lists of sources, data sets containing more than a single unknown source, and multiple receptors with additional wind direction data would be desirable. More observation periods would also be desirable, since the number of degrees of freedom in each data set was barely adequate to apply factor analysis properly (Henry et al., 1984). The greatest limitations in the simulation process were various peculiarities of the RAM dispersion model. The narrowness of the dispersion led to extreme fluctuations in source contributions at the receptor. Also, this narrowness, together with RAM's different treatments of point and area sources and its lack of a subroutine for line sources (highways), led to unrealistic (too large, too variable) emission strengths for several source types.
Based on all of the foregoing considerations (apportionment techniques, simulation method, and data), we conclude that future simulation exercises should consider the following improvements.
(1) Generate replicate observations to provide direct measures of imprecision and bias.
(2) Overcome plume narrowness by using an improved dispersion model.
(3) Provide participants with geographical information for both sources and receptors.
(4) Use multiple (~3) receptors to evaluate the ability to pinpoint unknown sources and to estimate upper limits of emissions from specified sources.
(5) Increase potential resolving power by providing not only hourly wind directions but also information on direction variability (σθ) for each hour.
(6) Treat u,, as a discrete (feedstock) change rather than as random with each sampling period.
(7) Change sampling times from 1200-2400 to 0700-1900 to better expose diurnal variations.
(8) Introduce unknown sources to test for false positives and negatives (and for profile estimation) by including > 2 unknown sources and expanding the world list.
(9) Use both fine and coarse fractions.
(10) Include morphological data.
(11) Include sensitivity studies of source intensity detection threshold and of the resolution of sources with similar profiles.
(12) Increase the number of sampling periods to meet the statistical requirements of factor analysis.
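Recommendation (5) calls for hourly wind-direction variability (σθ). Because directions wrap around at 360°, a plain standard deviation of the angles is misleading (350° and 10° are only 20° apart). One widely used single-pass estimator is Yamartino's method, sketched below; its use here is an assumption for illustration, not part of the original exercises, and the function name is hypothetical.

```python
import math

def yamartino_sigma_theta(directions_deg):
    """Hypothetical helper: Yamartino estimate of wind-direction standard deviation (degrees)."""
    n = len(directions_deg)
    if n == 0:
        raise ValueError("need at least one direction")
    # Mean sine and cosine of the directions (unit vectors, so wrap-around is handled)
    sa = sum(math.sin(math.radians(d)) for d in directions_deg) / n
    ca = sum(math.cos(math.radians(d)) for d in directions_deg) / n
    # Guard against tiny negative values from floating-point rounding
    eps = math.sqrt(max(0.0, 1.0 - (sa * sa + ca * ca)))
    sigma = math.asin(eps) * (1.0 + (2.0 / math.sqrt(3.0) - 1.0) * eps**3)
    return math.degrees(sigma)

# Readings straddling north and straddling south give the same spread
print(yamartino_sigma_theta([350, 10]), yamartino_sigma_theta([170, 190]))
```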
Acknowledgements. RWG gratefully acknowledges support for part of this work through a Fellowship awarded by the National Research Council.
REFERENCES
Bevington P. R. (1969) Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, New York.
Bowne N. E., Cher M. and Liu M. (1981) EPRI Plume Model Validation Project: preliminary evaluation of model performance at a Plains site. In Proc. 74th Annual Meeting and Exposition of the Air Pollution Control Association, Philadelphia, Pennsylvania, pp. 1-16 (Session 24, Paper 1).
Currie L. A. (1977) Detection and quantitation in X-ray fluorescence spectrometry. In X-Ray Fluorescence Analysis of Environmental Samples (edited by Dzubay), pp. 289-306. Ann Arbor, Ann Arbor, Michigan.
Dzubay T. G., Stevens R. K., Courtney W. J. and Drane E. A. (1981) Chemical element balance analysis of Denver aerosol. In Electron Microscopy and X-ray Applications to Environmental and Occupational Health Analysis, Vol. 2 (edited by Russell), pp. 23-42. Ann Arbor, Ann Arbor, Michigan.
Golub G. H. and Van Loan C. F. (1980) An analysis of the total least squares problem. SIAM J. Numer. Anal. 17, 883-893.
Henry R. C., Lewis C. W., Hopke P. K. and Williamson H. J. (1984) Review of receptor model fundamentals. Atmospheric Environment 18, 1507-1515.
Novak J. H. and Turner D. B. (1976) An efficient Gaussian-plume multiple-source air quality algorithm. J. Air Pollut. Control Ass. 26, 570-575.
Parr R. M., Houtermans H. and Schaerf K. (1979) The IAEA intercomparison of methods for processing Ge(Li) gamma-ray spectra. In Computers in Activation Analysis and Gamma-Ray Spectroscopy, pp. 544-562. CONF-780421, National Technical Information Service, Springfield, Virginia.
Roscoe B. A. and Hopke P. K. (1981a) Comparison of weighted and unweighted target transformation rotations in factor analysis. Comput. Chem. 5, 1-7.
Roscoe B. A. and Hopke P. K. (1981b) Error estimates for factor loadings and scores obtained with target transformation factor analysis. Analytica Chimica Acta 132, 89-97.
Stevens R. K. and Pace T. G. (1984) Overview of the Mathematical and Empirical Receptor Models Workshop (Quail Roost II). Atmospheric Environment 18, 1499-1506.
Turner D. B. and Novak J. H. (1978) User's Guide for RAM. EPA-600/8-78-016a, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina.