Comparative testing of PMF and CFA models


Chemometrics and Intelligent Laboratory Systems 61 (2002) 75 – 87 www.elsevier.com/locate/chemometrics

Y. Qin a, K. Oduyemi a,*, L.Y. Chan b

a Division of Construction and Environment, University of Abertay Dundee, Dundee DD1 1HG, UK
b Department of Civil and Structural Engineering, The Hong Kong Polytechnic University, Hong Kong, People's Republic of China

Received 10 March 2001; accepted 25 September 2001

Abstract

Positive matrix factorization (PMF) and convenient factor analysis (CFA) models have been tested using a large aerosol database measured in Hong Kong. As many chemical components as possible (good elements as well as so-called weak elements [Atmos. Environ. 33 (1999) 2169]) were selected in order to compose as large a database as possible. Error estimates and enforced rotation techniques were used in the PMF model trial; these important aspects were not included in a recently published work [Atmos. Environ. 33 (1999) 2169]. The test results of the two models were assessed qualitatively by analyzing factor characteristics, and quantitatively by comparing factor mass profiles. The CFA model has been shown to be a convenient tool for aerosol source identification and can qualitatively treat the elements that serve as source tracers as well as the PMF model does. The PMF model provides an expert tool for the identification of aerosol sources and the estimation of source contributions. It can treat chemical components that come from various sources by apportioning them among the factors more reasonably than the CFA model can. Quantitatively, the factor mass profiles produced by the PMF model are better at describing the source structure than those derived by the CFA model. © 2002 Published by Elsevier Science B.V.

Keywords: Positive matrix factorization; Convenient factor analysis; Respirable suspended particulate; Source identification

* Corresponding author.

0169-7439/02/$ - see front matter © 2002 Published by Elsevier Science B.V. PII: S0169-7439(01)00175-7

1. Introduction

Identification of airborne pollutant sources and estimation of source contributions to air quality "hot spots" are very important in the management of ambient air quality. Dispersion models are usually used to estimate the contributions of a single source or of multiple sources to air quality. Receptor models, on the other hand, provide a useful means of identifying air pollution sources and of quantitatively apportioning air pollutant concentrations to their sources, even when these sources are not reasonably defined. Receptor models have generally been used to identify sources of atmospheric aerosols [2]. Various approaches have been developed for receptor model analysis, and these are outlined below.

- In rural or remote sites where there is no local emission source, air parcel backward trajectories have been used to identify the source regions of the air pollutants [3-5].
- A graphical technique for determining major components in a mixed aerosol has been developed by Rahn [6].
- Chemical mass balance (CMB) is a method used to estimate the contributions of various sources to atmospheric pollution when the number of sources and the source profiles are reasonably defined [7-10]. The source contributions are quantified by "regressing" pollutant concentrations on source profiles. The precision of a CMB model depends on the precision of the source fingerprints. In many situations, precise source profiles are not easily obtained because many fugitive and small sources with widely varying compositions exist. Furthermore, the source profiles may have changed since they were last investigated.
- In cases where the source profiles are not reasonably defined, convenient factor analysis (CFA) or principal component analysis (PCA), a mathematical tool for data analysis, can be used to extract the common sources (factors). Either tool may also be used to apportion the mass among the sources from aerosol chemical data or from aerosol and gas pollutant data [11-13].

The use of the CFA model to identify air pollution sources works on the basis that the components from the same source will be correlated, and that this correlation can be used to estimate the source composition. The principal method of the CFA or PCA model is the singular value decomposition X = GF, where X is the airborne pollutant concentration matrix with dimensions n × m (n is the number of samples measured at the receptor and m is the number of chemical species analyzed for each sample). The CFA model decomposes X into two new matrices, G and F, with dimensions n × p and p × m, in such a way that GF explains the variation in X as well as possible (p is the rank of the factorization, G is the factor score matrix and F is the factor loading matrix). The CFA model is available in several commercial data analysis software packages, such as SAS, STATISTICA and MINITAB. The results of CFA analysis usually possess rotational ambiguity, and rotation may have to be performed to impose the characteristics expected for the factors. Equimax, Varimax, Quartimax and Orthomax are the four choices for orthogonal rotation in the CFA model software. Varimax is usually used to identify air pollution sources because it maximizes the variance of the squared loadings.
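As an illustration of the decomposition X = GF described above, the following minimal sketch factorizes a small synthetic data matrix with a truncated singular value decomposition. The synthetic source profiles, the column standardization and the function name `factorize_svd` are assumptions made for this example; they are not taken from the paper or from any of the software packages named above, and no rotation is applied.

```python
import numpy as np

def factorize_svd(X, p):
    """Rank-p factorization Z ~ G @ F of the standardized data Z via truncated SVD."""
    # Assumption: each species (column) is standardized to zero mean and unit
    # variance, as is common when factor analysis works on correlations.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    G = U[:, :p] * s[:p]          # factor scores, n x p
    F = Vt[:p, :]                 # factor loadings, p x m
    explained = float((s[:p] ** 2).sum() / (s ** 2).sum())
    return Z, G, F, explained

rng = np.random.default_rng(0)
# Synthetic data: 200 samples of 6 species driven by 2 hypothetical sources.
contributions = rng.gamma(2.0, 1.0, size=(200, 2))
profiles = np.array([[5.0, 3.0, 1.0, 0.0, 0.0, 0.2],
                     [0.0, 0.1, 0.5, 4.0, 2.0, 1.0]])
X = contributions @ profiles + rng.normal(0.0, 0.05, size=(200, 6))

Z, G, F, explained = factorize_svd(X, p=2)
print("fraction of variance explained by 2 factors:", round(explained, 3))
print("loadings:\n", np.round(F, 2))
```

The loadings in F generally carry both signs, and replacing G by GR and F by RᵀF for any orthogonal matrix R reproduces the data equally well, which is the rotational ambiguity mentioned above.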
However, the negative loadings that exist in the results of the CFA model are difficult to interpret physically. A new type of factor analysis method, positive matrix factorization (PMF), has been developed and applied to air pollution source identification [14-18]. The PMF model has attracted general interest because of its distinctive features. PMF, unlike CFA, produces strictly nonnegative factor loadings and factor scores with less rotational ambiguity. Concentrations of chemical components measured in the ambient environment vary widely, especially in the case of some trace elements in aerosols. In a PMF model, the original data can be weighted point by point using error estimates. This function is very useful in handling data with different variation ranges and uncertainties. Furthermore, subjective information, such as knowledge of air pollution source composition, may be combined with factor analysis in a PMF model trial. A PMF model provides specially enforced rotation techniques to enable the results to approach ideal situations. The research group headed by Dr. Hopke has made some progress in PMF model applications [19-21].

Huang et al. [1] tested and optimized CFA and PMF models on a set of aerosol data from Narragansett, USA, when trying to answer two specific questions: which technique distinguishes sources better, and how much can each technique be 'tuned' to extract the most information? They found the following.

- The choice of factor analysis technique was much less important than the proper use of the technique.
- The selection of good elements was the key to obtaining good results. The resolution of factor analysis might be degraded by some elements that vary randomly in response to analytical uncertainties or by 'under-detectable' elements. The so-called weak elements, that is, the elements that could not serve as source markers and those that have large analytical uncertainties, should not be included in factor analysis.
- The CFA model was easy to use, fast and especially good for surveys and qualitative work.
- The PMF model used information more efficiently and separated sources up to an order of magnitude better than the CFA model, although it was hard to use.

However, the 'weak' elements contain source information just as other elements do and, therefore, we decided to include them in the factor analysis in order to have full confidence in the results of such analysis.

The following two important functions of the PMF model, as mentioned above, were not tested in Huang et al.'s PMF model trial:

- The weighting of data using error estimates.
- The complementary use of subjective information with the factor analysis.
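The idea of weighting each data point by its error estimate while keeping both factor matrices nonnegative can be sketched as a weighted least-squares problem, Q = Σ ((X − GF)/S)². The example below minimizes Q with weighted multiplicative updates, a generic nonnegative matrix factorization scheme that is swapped in here for illustration; it is not the algorithm of the PMF program used in this work, and the synthetic data, the error model for S and all names are assumptions.

```python
import numpy as np

def weighted_nonneg_factorization(X, S, p, n_iter=500, seed=0):
    """Minimize Q = sum(((X - G @ F) / S) ** 2) subject to G >= 0 and F >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = 1.0 / S ** 2                  # point-by-point weights from the error estimates
    G = rng.uniform(0.1, 1.0, size=(n, p))
    F = rng.uniform(0.1, 1.0, size=(p, m))
    eps = 1e-12                       # guards against division by zero
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative by construction.
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)
        history.append(float((W * (X - G @ F) ** 2).sum()))
    return G, F, history

rng = np.random.default_rng(1)
# Hypothetical source contributions and profiles (not data from the paper).
true_G = rng.gamma(2.0, 1.0, size=(100, 2))
true_F = np.array([[5.0, 3.0, 1.0, 0.0, 0.2],
                   [0.0, 0.5, 4.0, 2.0, 1.0]])
X = np.clip(true_G @ true_F + rng.normal(0.0, 0.05, size=(100, 5)), 0.0, None)
S = 0.05 + 0.1 * X                    # assumed error estimates: constant plus 10%

G, F, history = weighted_nonneg_factorization(X, S, p=2)
print("Q after first and last iteration:", history[0], history[-1])
```

Points with a large error estimate contribute little to Q, so imprecise or highly variable components constrain the factors less than well-measured ones.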


In this work, PMF and CFA models have been tested using a large database of respirable suspended particulate (RSP) measured in Hong Kong. As many chemical components as possible (good or 'weak' elements) were selected in order to compose as large a database as possible. In the PMF model trial carried out as part of this study, different error estimates were used to weight different chemical components, and enforced rotation was used to modify some factor profiles towards ideal patterns. The inclusion of these important functions in our model trial distinguishes our work from that of Huang et al. [1]. In this paper, the test results are interpreted qualitatively by analyzing source characteristics and compared quantitatively by comparing source profiles with artificial ideal source profiles.

2. Selection of aerosol data and components

An air quality monitoring network has been operated by the Hong Kong Environmental Protection Department (HKEPD) since 1984. The network comprises thirteen monitoring stations in different districts of the city, five of which are no longer in existence. Typical gas pollutants and meteorological factors are measured continually by automatic analyzers and equipment located at the air quality monitoring stations. RSP was sampled for 24 h every 6 days using high volume samplers at all stations. The chemical composition of RSP was analyzed in a Government Laboratory. HKEPD summarized all the air quality monitoring network data from 1986 to 1995 and published the information on a CD-ROM [22]. This archived information includes concentrations of mass and 29 chemical components of RSP, and it provides a good database for testing factor analysis models. Based on this database, Qin et al. [23] analysed the characteristics of the chemical composition of RSP in Hong Kong. Lee et al. [24] applied PMF to this database to apportion sources of particulate pollution in Hong Kong. Nine factors, which include secondary ammonium sulfate, chloride-depleted marine aerosols, crustal/soil dust, nonferrous metal smelter, vehicular emission, particulate bromide, particulate copper and fuel oil burning, were identified. However, the weighting of the original data by error estimates was not included in their PMF trial. Carbon (C), which held about 46.3% of the RSP mass, and NO₃⁻ were not included in their PMF trial either. C is usually not included in factor analysis because its presence may change the structure of a source profile, even if only a small portion of C is apportioned to a factor of which C is not a major component (see Section 4.2). However, C is the most important chemical component in atmospheric aerosol, and we believe that the source contributions cannot be estimated reasonably if C is not included in the analysis. NO₃⁻ is also excluded from such analysis because there is relatively large uncertainty in sampling and analyzing this component.

In comparing the data-treating abilities of the PMF and CFA models, as many chemical components as possible were selected in this work. The only principle for selecting chemical components was to obtain as many samples as possible with as few missing data ('empty' values) as possible in the database. Mass and 20 chemical components of RSP (Al, Ca, Mg, Pb, Na⁺, V, Cl⁻, NH₄⁺, NO₃⁻, SO₄²⁻, Br⁻, Mn, Ni, Zn, C, THC, Cd, K⁺, Ba, Cu) measured at 11 air quality monitoring stations in Hong Kong in the period 1992-1994 were selected for the comparative testing of the CFA and PMF models. The 20 selected chemical components held about 81.4% of the RSP mass. In the PMF model test, we found that large residuals were usually produced in samples with more than two missing data, although the PMF model has a special function to handle missing data. As a result of this observation, samples with more than two missing data were deleted from the database. The final database used in this work contained 1401 RSP samples. This database should show the original relationships among the chemical components of the aerosol reasonably well.
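The sample screening rule described above, deleting every sample with more than two missing data, can be sketched as follows. The sample values and the use of `None` to mark a missing measurement are assumptions made for this illustration.

```python
def drop_sparse_samples(samples, max_missing=2):
    """Return the samples that have at most `max_missing` missing values (None)."""
    return [row for row in samples if sum(v is None for v in row) <= max_missing]

# Hypothetical samples; each row holds concentrations for five components.
samples = [
    [1.2, 0.4, None, 3.1, 0.9],    # one missing value: kept
    [None, None, 2.2, 0.7, 1.5],   # two missing values: kept
    [None, 0.3, None, None, 1.1],  # three missing values: dropped
]

kept = drop_sparse_samples(samples)
print(len(kept), "of", len(samples), "samples retained")
```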

3. Model trials

3.1. PMF model

There are two main types of PMF model:

- Two-way PMF
- Three-way PMF

Two-way PMF, like the normal CFA model, is a two-dimensional principal component analysis model in which a two-dimensional matrix is used as the input database. Three-way PMF is a straightforward generalisation of two-way PMF: it is a trilinear factor analysis model in which a three-dimensional matrix is used as the input database. The two-way PMF was tested using the Hong Kong database. Sij, the error estimate for Xij (the measured concentration of chemical component j in sample i), was calculated using the following formula [25]:

$$S_{ij} = T_j + U_j \sqrt{\max(|X_{ij}|, |Y_{ij}|)} + V_j \max(|X_{ij}|, |Y_{ij}|) \qquad (1)$$
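Eq. (1) can be evaluated for a single data point as in the minimal sketch below. The function name and the measured and fitted concentrations are hypothetical; T = 200 and V = 0.1 are the values quoted in this section for sulfate, and U defaults to zero because the second term was neglected in this work.

```python
import math

def error_estimate(x, y, t, v, u=0.0):
    """Eq. (1): S = T + U*sqrt(max(|X|, |Y|)) + V*max(|X|, |Y|) for one data point."""
    level = max(abs(x), abs(y))
    return t + u * math.sqrt(level) + v * level

# Hypothetical measured (x) and fitted (y) sulfate concentrations.
s = error_estimate(x=9500.0, y=10000.0, t=200.0, v=0.1)
print("error estimate:", s)
```

Because the larger of the measured and fitted values sets the error level, a point is not given an unrealistically small error estimate merely because its measured value happens to be low.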

In Eq. (1), Yij is the fitted value for Xij using a PMF model. There is a relationship between the effective digit of concentration and the detection limit for a chemical component. Two units of the effective digit of concentration for each chemical component of the aerosol were used as Tj. For example, the effective digits of concentration are 1 × 10² for SO₄²⁻ and 1 × 10⁻¹ for NO₃⁻, so the values of 200 and 0.2 were used as Tj for SO₄²⁻ and NO₃⁻, respectively. The second term in Eq. (1) was neglected in this work (Uj = 0).

There is no simple rule for selecting values of the coefficient Vj. The optimal selection depends on a basic understanding of the data and on trial. Experience dictates that large values should be used for the less precise chemical components. The initial selections of Vj were based upon the variation ranges of the chemical components, because the chemical analysis precision and sample uncertainty for each chemical component of the aerosol were unavailable on the CD-ROM. Relatively large initial values of Vj were used in the model test because the variation ranges of the concentrations of aerosol mass and chemical components are very large. The value of Vj was then modified by trial, basically in accordance with the standard deviation of the scaled residuals (Eij/Sij, where Eij = Xij − Yij) for each chemical component. A small deviation of the scaled residuals for a chemical component means that the PMF model provides a good fitted concentration for it, which may be interpreted as the random error being small for this component; a small value of Vj can therefore be used. A large deviation of the scaled residuals means that the PMF model provides a poor fitted concentration, which may be interpreted as the random error being large for this component; consequently, a large value of Vj should be used. For example, the same initial value of Vj was used for both SO₄²⁻ and Br⁻. The value of Vj for SO₄²⁻ was reduced in the trial because the standard deviation of its scaled residuals was small, whereas the value of Vj for Br⁻ was increased because the standard deviation of its scaled residuals was large. Other factors were considered as well in the choice of Vj. For example, the random errors for NH₄⁺ and NO₃⁻ are relatively large in aerosol sampling and analysis, so relatively high values of Vj were used for these two chemical components although the standard deviations of their scaled residuals were small. The final value of Vj varies between 0.1 and 0.3. A relatively low value (0.1) was used for Al, Mg, Na⁺, SO₄²⁻ and C, while a relatively high value (0.3) was used for mass, NH₄⁺, Br⁻, Mn, Zn, THC and Ba.

As mentioned in Section 2, there are few missing data in our database because the samples with more than two missing data have been deleted. Geometric mean values, with corresponding error estimates equal to four times the mean values, were used for missing data. The relative error estimates for missing values therefore equal 400%, so the missing values have much lower weights than the measured (actual) values. This is the technique used in PMF models for treating missing data.

The PMF model was tested on the Hong Kong database with factorization ranks between 5 and 10. The overriding principle used for judging the appropriate number of factors is linked to what the authors considered to be suitable for physical interpretation. The results of the PMF model runs using different factorization ranks were found to be stable, i.e. the major characteristics of the initial five factors did not change with increasing factorization rank. It was found that physical interpretations can be given to the source characteristics of the newly derived factors, with the best physically interpreted results being those with a factorization rank of nine. These nine factors represented 74.8-98.6% of the RSP mass and the various chemical components. The results from the PMF model are presented as scaled factors and factor loadings. The scaled factors usually show the source characteristics well, while the factor loadings show the source profiles directly. After analyzing the source characteristics, a few source profiles that were biased with respect to their source characteristics were made to approach the ideal pattern by using enforced rotation.

3.2. CFA model

There is little difference among the factor analysis tools in different data analysis software packages [1]. The factor analysis tool in MINITAB, a data analysis software package (Minitab), was tested using the Hong Kong database. The data analysis was performed on a worksheet in the software package. This approach is very convenient for entering data, selecting elements and obtaining results quickly. Varimax rotation was selected for the model trial. The CFA model was tested with factorization ranks between 5 and 10. The characteristics of the factors derived from the CFA model using different factorization ranks were very similar to those of the factors derived by using PMF. Only the results derived with a factorization rank of nine were used for comparison purposes. The nine factors already explain 88.4% of the total variability. The results of the CFA model show the factor characteristics reasonably well. However, they do not show the source profiles directly: the loadings produced by CFA have to be converted to relative mass. The conversion method used by Lowenthal and Rahn [26] was adopted in this work, in which the loading for each chemical component is multiplied by the component's standard deviation in the database. The negative loadings derived by using the CFA model seemed to represent sources from opposite directions [1].
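The conversion of CFA loadings to relative mass described above can be sketched as follows. Only the multiplication of each loading by the component's standard deviation is stated in the text; taking absolute values of the loadings and normalizing each factor to 100% are assumptions of this sketch, and the numbers are hypothetical.

```python
def loadings_to_mass_profile(loadings, std_devs):
    """Scale each loading by the component's standard deviation, then express
    the result as percentages of the factor total.

    Assumptions of this sketch: absolute loadings are used, and the profile
    is normalized so that it sums to 100%.
    """
    scaled = [abs(loading) * sd for loading, sd in zip(loadings, std_devs)]
    total = sum(scaled)
    return [100.0 * value / total for value in scaled]

# Hypothetical loadings of three components on one factor, and their
# standard deviations in the database (concentration units).
loadings = [0.9, 0.8, -0.1]
std_devs = [2.0, 5.0, 10.0]

profile = loadings_to_mass_profile(loadings, std_devs)
print([round(p, 2) for p in profile])
```

A component with a modest loading but a large standard deviation, such as one present at high concentrations, can therefore dominate the converted mass profile.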

4. Results and discussion

4.1. Factor characteristics

The scaled factors (explained variances) produced by the PMF model show the factor characteristics qualitatively. These are shown in Fig. 1. The characteristics of factors 1, 2, 3, 4 and 8 are very clear and can be easily interpreted. These factors are, respectively, the source of soil and construction dust, the secondary pollution source of ammonium sulfate, the secondary pollution source of ammonium nitrate, the fresh marine source and the fuel oil burning emission source. This is because there are high loadings for the chemical components that serve as source markers: Al, Ca and Mn in factor 1, NH₄⁺ and SO₄²⁻ in factor 2, NH₄⁺ and NO₃⁻ in factor 3, Na⁺, Cl⁻ and Mg in factor 4, and V and Ni in factor 8.

Factor 5 (Fig. 1) has a high loading for C and THC. It also has a loading for Br⁻ and Pb. Many vehicles in Hong Kong used diesel as their low cost fuel during the period corresponding to the database used in this work. The particles emitted from diesel vehicles contain a significant amount of C and THC. Thus, this factor should represent a vehicle emission source.

Factor 6 has a high loading for Na⁺ and Mg and a loading for SO₄²⁻. Na⁺ and Mg are major components of marine aerosol. There was a fraction of SO₄²⁻ that cannot be associated with NH₄⁺ in Hong Kong (the average equivalent concentration ratio of SO₄²⁻/NH₄⁺ was 1.54). This factor should represent the aged marine aerosol source with chloride replaced by sulfate.

Factor 7 has a loading for various heavy metals (Pb, Zn and Cd) and K⁺, as well as for NH₄⁺, SO₄²⁻ and C. There were two large coal burning power stations and two incinerators in operation in Hong Kong during the period 1992-1994. This factor may be linked to incinerator emissions.

The scaled factors show that factor 9 is a unique factor. It has a high loading for only one chemical component, Cu. Cu may originate from the brushes of the motors in the high volume samplers. The concentration of Cu varied with the locations of the monitoring stations in Hong Kong, as expected. It was especially high at three monitoring stations located in industrial and mixed areas with many printed circuit board facilities. Acid spray from nozzles was used to etch the circuit boards in these facilities, and ventilation fans were usually used to remove the fugitive mists.
Factor 9 probably represents a particulate Cu source from these facilities and from the brushes of the motors in the high volume samplers.

The characteristics of eight of the factors derived in our PMF trial are qualitatively the same as those reported in Lee et al.'s [24] paper. However, a more reasonable factor of ammonium nitrate in this paper replaces the factor of particulate bromide in Lee et al.'s [24] paper. The particulate bromide factor was interpreted as a vehicle emission/road dust source in Lee et al.'s [24] paper, but the interpretation was not convincing.

Fig. 1. Scaled factors produced by PMF model.

The loadings obtained from the CFA model show the factor characteristics clearly (Fig. 2). If the sign difference in the loadings is assumed to represent sources from opposite directions, the source characteristics represented by the factors derived from the CFA model are qualitatively similar to those produced by the PMF model, except in the case of factor 6. Factor 6 derived by the PMF model is the aged marine aerosol source with chloride replaced by sulfate, whereas factor 6 derived using the CFA model has a high loading for only one element, Br⁻. This factor is qualitatively the same as the factor of particulate bromide obtained by Lee et al. [24]. Error estimates were not used when Lee et al. applied the PMF model; hence, the PMF results obtained by Lee et al. show similar characteristics to the CFA results.

Fig. 2. Factor loading produced by CFA model.

Comparing the two results shown in Figs. 1 and 2, the CFA model tends to attribute each chemical component to a small number of factors, whereas the PMF model does this for some chemical components and attributes others to a larger number of factors. For the chemical components that can serve as source tracers, the CFA model can identify their sources as well as the PMF model does, and sometimes more clearly. For example, the loadings of Al, Ca, Mn and Mg in factor 1, the absolute loadings of NH₄⁺ and SO₄²⁻ in factor 2, the loadings of Cl⁻, Na⁺ and Mg in factor 4 and the loadings of V and Ni in factor 8 (see Fig. 2) are generally higher than the corresponding values in Table 1. However, for some common chemical components that may come from various sources, the way in which the CFA model attributes them to a small number of factors may cause confusion. For example, the high loadings of Mg and Ba in factor 1 and the high loadings of Pb, Cd and K⁺ in factor 2 (see Fig. 2) cannot be given any reasonable interpretation.

4.2. Factor mass profiles

The factor mass profiles are known to show source structure quantitatively. The best way to assess the result of factor analysis is to compare the derived factor profiles with the real source profiles. Unfortunately, measured profiles for the local Hong Kong air pollution sources are unavailable. The factor mass profiles were therefore assessed by analysing source characteristics and comparing them with some artificial ideal source profiles applied in SCAQS (Southern California Air Quality Study) PM2.5 and PM10 receptor modelling [27]. Some real source profiles, such as those of secondary pollution sources and the fresh marine source, should be very similar to the ideal source profiles. However, contamination in the environment could make real primary source profiles depart slightly from the ideal patterns. Ideal source profiles are usually used when real source profiles are unavailable.

F-factors derived using the PMF model show the mass profiles for the factors directly. The mass profiles for a total of 20 chemical components produced by the PMF model (in percentages) are shown in Table 1.

Table 1. Mass profiles produced by PMF model

| Component | Factor 1 (%) | Factor 2 (%) | Factor 3 (%) | Factor 4 (%) | Factor 5 (%) | Factor 6 (%) | Factor 7 (%) | Factor 8 (%) | Factor 9 (%) |
|---|---|---|---|---|---|---|---|---|---|
| Al | 9.40 | 0.26 | 0.03 | 0.01 | 0.00 | 0.01 | 0.03 | 0.33 | 0.27 |
| Ca | 19.30 | 0.00 | 0.28 | 0.70 | 0.65 | 0.85 | 0.78 | 0.65 | 1.54 |
| Mg | 3.33 | 0.00 | 0.00 | 1.93 | 0.03 | 2.49 | 0.00 | 0.01 | 0.00 |
| Pb | 0.00 | 0.04 | 0.44 | 0.00 | 0.08 | 0.00 | 0.73 | 0.00 | 0.00 |
| Na⁺ | 1.67 | 0.01 | 0.02 | 19.18 | 0.00 | 22.81 | 0.26 | 0.05 | 1.58 |
| V | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.03 | 1.00 | 0.01 |
| Cl⁻ | 0.00 | 0.00 | 2.74 | 34.54 | 0.00 | 0.01 | 0.00 | 0.01 | 0.73 |
| NH₄⁺ | 0.01 | 14.24 | 18.33 | 0.00 | 0.00 | 0.01 | 0.03 | 0.03 | 0.01 |
| NO₃⁻ | 1.07 | 0.19 | 69.61 | 0.47 | 0.01 | 2.76 | 0.05 | 0.11 | 3.44 |
| SO₄²⁻ | 21.46 | 40.57 | 5.98 | 0.26 | 2.54 | 63.36 | 12.27 | 11.96 | 8.94 |
| Br⁻ | 0.01 | 0.01 | 0.04 | 0.06 | 0.02 | 0.04 | 0.02 | 0.04 | 0.03 |
| Mn | 0.32 | 0.00 | 0.05 | 0.01 | 0.01 | 0.01 | 0.09 | 0.00 | 0.01 |
| Ni | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.28 | 0.00 |
| Zn | 0.13 | 0.09 | 0.63 | 0.01 | 0.13 | 0.11 | 0.72 | 0.06 | 0.07 |
| C | 41.71 | 44.58 | 0.09 | 41.88 | 86.68 | 7.00 | 77.60 | 85.14 | 61.40 |
| THC | 0.45 | 0.00 | 0.09 | 0.03 | 9.81 | 0.01 | 0.01 | 0.01 | 0.02 |
| Cd | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| K⁺ | 1.06 | 0.00 | 1.58 | 0.91 | 0.00 | 0.42 | 7.35 | 0.15 | 2.63 |
| Ba | 0.07 | 0.01 | 0.00 | 0.01 | 0.02 | 0.10 | 0.02 | 0.10 | 0.12 |
| Cu | 0.01 | 0.00 | 0.07 | 0.00 | 0.01 | 0.01 | 0.00 | 0.10 | 19.19 |
| Equivalent anion/cation | 4.64 | 1.07 | 1.25 | 1.15 | 213.67 | 1.36 | 1.28 | 35.52 | 1.92 |

C is known to be the most important chemical component in aerosol, and it comes from various sources. C holds about 46.3% of the RSP mass in the Hong Kong database. C may modify some factor mass profiles and cause them to depart markedly from the expected ideal patterns, even where a very small percentage of C is apportioned to the corresponding factors. For example, C holds about 41.88%, 7.00% and 61.40% of the mass in factors 4, 6 and 9, respectively, although only 5.8%, 1.2% and 1.9% of C are apportioned to these factors by the PMF model. The authors have good reason to believe that the fresh and aged marine aerosol sources and the particulate Cu source should not contain a high percentage mass of C. The results thus show some rotational ambiguities in the factors. The PMF model contains four enforced rotation techniques for imposing the characteristics expected for the factors. FKEY, a technique for pulling individual factor elements to zero [25], has been used by Lee et al. to pull sulfate in some factors to zero in order to obtain useful profiles.
The FKEY technique was also used in this work to pull the high percentage mass of C in factors 4, 6 and 9 to zero. The 'pulling down' operation is controlled by the FKEY matrix. This matrix of integer values has the same dimensions as F, and each element of the FKEY matrix controls the behavior of the corresponding element in F. The three elements in the F matrix that represent C in factors 4, 6 and 9 were identified. An FKEY matrix was constructed with zero for all elements except the three elements that control the behavior of C in factors 4, 6 and 9 in the F matrix. A value of 9 was chosen for these three elements (a "medium-strong" pulling). The F and G matrices resulting from the previous model trial without enforced rotation were used as initial values for the matrices. With the same data matrix and error estimates, C in factors 4, 6 and 9 was pulled to zero successfully. The factor characteristics produced using the PMF model with enforced rotation are qualitatively the same as those without enforced rotation. There is no obvious difference between the performance of the PMF model without rotation and with enforced rotation: the calculated Q value for the 'enforced rotation' case is only about 1.7% higher than its value for the 'without rotation' case. The modified mass profiles after applying enforced rotation are shown in Table 2. The percentages of C in factors 4, 6 and 9 have been reduced to zero, while the source profiles of the other factors remain virtually the same as when there was no enforced rotation. The converted mass profiles from the factor loadings derived by the CFA model are shown in Table 3.

Factor 1 represents the source from soil and construction dust. Although the mass percentages are different, the proportions of the major crustal elements and sulfate are comparable in the results of the PMF and CFA models. The ratio of Al/Ca/Mg/Mn/SO₄²⁻ is 1:2.07:0.36:0.03:2.41 in Table 2 and 1:2.17:0.59:0.04:2.39 in Table 3. The ratios of the remaining chemical components to Al are higher in Table 3 than the corresponding ratios in Table 2. The ions are not the major components in this factor. The ionic balance is poor in both tables and, consequently, it is difficult to assess quantitatively which result is better at describing the soil and construction dust source.

Factor 2 represents the secondary aerosol source of ammonium sulfate.
There are only two chemical components (27.3% NH₄⁺ + 72.7% SO₄²⁻; SO₄²⁻/NH₄⁺ = 2.66) in the artificial ideal source profile of ammonium sulfate [27], and the anion and cation are fully balanced. A high mass percentage of C in factor 2 makes the source profile depart from the ideal type. In Hong Kong, secondary aerosol comes mainly from long-distance transport [23]. The high mass percentage of C can be attributed to the secondary ammonium sulfate aerosol being accompanied by some anthropogenic sources related to fuel combustion. In Table 2, NH₄⁺ and SO₄²⁻ hold about 49.72% of the source mass. The SO₄²⁻/NH₄⁺ ratio (2.92) approaches the ideal value. The loading of NO₃⁻ is very low (0.03%). The equivalent anion/cation ratio is 1.07, so the anion and cation are almost fully balanced. In Table 3, NH₄⁺ and SO₄²⁻ hold about 37.59% of the source mass. The SO₄²⁻/NH₄⁺ ratio (2.96) also approaches the ideal value, but the loading of NO₃⁻ is relatively high (2.77%) and the equivalent anion/cation ratio is 1.67. It is noted that a fraction of the SO₄²⁻ is not associated with NH₄⁺. The result produced by the PMF model is better at describing the secondary aerosol source of ammonium sulfate than that produced by the CFA model.

Table 2. Mass profiles produced by the PMF model with enforced rotation (columns are factors 1 to 9, values in %; the last row is the equivalent anion/cation ratio)

Component                     1      2      3      4      5      6      7      8      9
Al                         6.87   0.21   0.05   0.00   0.00   0.00   0.21   0.18   0.35
Ca                        14.20   0.00   0.29   1.01   0.62   0.92   1.37   0.37   3.62
Mg                         2.45   0.00   0.01   3.43   0.02   2.66   0.00   0.02   0.01
Pb                         0.00   0.04   0.39   0.00   0.06   0.00   1.07   0.00   0.00
Na⁺                        1.08   0.05   0.01  33.52   0.00  24.52   0.02   0.07   6.14
V                          0.00   0.00   0.00   0.00   0.00   0.02   0.04   0.62   0.01
Cl⁻                        0.00   0.00   2.62  52.86   0.00   0.01   0.00   0.01   2.35
NH₄⁺                       0.01  12.68  18.40   0.00   0.00   0.01   0.05   0.02   0.04
NO₃⁻                       0.50   0.03  73.57   1.19   0.00   3.85   0.12   0.04   9.32
SO₄²⁻                     16.58  37.03   3.01   6.23   3.08  67.25  13.82  10.64  14.19
Br⁻                        0.01   0.01   0.04   0.09   0.02   0.04   0.02   0.03   0.05
Mn                         0.23   0.00   0.04   0.01   0.01   0.01   0.13   0.00   0.01
Ni                         0.00   0.00   0.03   0.00   0.00   0.00   0.00   0.17   0.00
Zn                         0.10   0.08   0.59   0.00   0.12   0.13   1.05   0.05   0.12
C                         56.71  49.85   0.11   0.00  86.50   0.00  71.66  87.64   0.00
THC                        0.41   0.00   0.08   0.01   9.52   0.00   0.01   0.01   0.03
Cd                         0.00   0.00   0.01   0.00   0.00   0.00   0.02   0.00   0.01
K⁺                         0.78   0.01   0.71   1.63   0.00   0.46  10.39   0.02   7.88
Ba                         0.05   0.01   0.00   0.01   0.02   0.11   0.03   0.07   0.32
Cu                         0.01   0.00   0.06   0.01   0.02   0.01   0.00   0.07  55.54
Equivalent anion/cation    5.26   1.09   1.27   1.09 383.70   1.36   1.08  52.96   1.09

Table 3. Converted mass profiles derived from the CFA model (columns are factors 1 to 9, values in %; the last row is the equivalent anion/cation ratio)

Component                     1      2      3      4      5      6      7      8      9
Al                         4.58   0.37   1.56   0.30   0.03   0.19   0.66   0.60   0.86
Ca                         9.97   0.58   2.41   1.53   1.11   0.74   2.76   0.96   1.29
Mg                         2.69   0.03   1.18   2.52   0.05   0.11   0.09   0.34   0.14
Pb                         0.19   0.25   0.07   0.26   0.08   0.32   0.88   0.09   0.19
Na⁺                        1.09   0.09   2.37  21.41   0.76   0.37   1.63   0.62   0.69
V                          0.01   0.01   0.01   0.01   0.01   0.02   0.00   0.27   0.08
Cl⁻                        1.95   0.32   0.72  28.99   0.49   5.37   0.07   1.97   1.82
NH₄⁺                       0.34   9.49  16.93   1.21   0.02   0.63   5.57   7.99   2.56
NO₃⁻                       5.71   2.77  39.74   5.25   0.91   9.07   1.92   4.70  10.39
SO₄²⁻                     10.94  28.10  17.09  13.88   1.62  11.30  19.75  24.86  12.77
Br⁻                        0.01   0.01   0.05   0.04   0.02   0.41   0.02   0.01   0.04
Mn                         0.20   0.03   0.07   0.01   0.02   0.01   0.10   0.00   0.00
Ni                         0.01   0.00   0.02   0.00   0.01   0.00   0.02   0.10   0.01
Zn                         0.21   0.17   0.39   0.01   0.08   0.05   2.95   0.19   0.34
C                         57.34  54.95  14.89  22.72  74.85  65.34  58.24  53.25  50.70
THC                        2.37   0.82   1.19   0.90  19.81   4.24   2.83   1.94   5.27
Cd                         0.00   0.00   0.00   0.00   0.00   0.00   0.02   0.01   0.00
K⁺                         2.27   2.00   1.02   0.86   0.02   1.64   2.43   1.46   2.94
Ba                         0.07   0.01   0.03   0.00   0.01   0.00   0.00   0.02   0.05
Cu                         0.06   0.00   0.26   0.09   0.10   0.17   0.06   0.61   9.86
Equivalent anion/cation    3.18   1.62   1.39   1.20   1.81   6.68   1.34   1.86   2.47

Factor 3 represents the secondary aerosol source of ammonium nitrate. There are only two chemical components (22.6% NH₄⁺ + 77.5% NO₃⁻; NO₃⁻/NH₄⁺ = 3.43) in the artificial ideal source profile of ammonium nitrate [27], and the anion and cation are fully balanced. In Table 2, NH₄⁺ and NO₃⁻ hold about 91.97% of the source mass. The NO₃⁻/NH₄⁺ ratio (3.80) approaches the ideal value and the loading of SO₄²⁻ is relatively low (3.01%). The anion and cation are well matched: the equivalent anion/cation ratio is 1.25. In Table 3, NH₄⁺ and NO₃⁻ hold only about 56.67% of the source mass. The NO₃⁻/NH₄⁺ ratio (2.34) is lower than the ideal value, and the loading of SO₄²⁻ (17.09%) is much higher than in Table 2. The equivalent anion/cation ratio is 1.39, so the ionic balance is poorer than in Table 2. The result produced by the PMF model describes an almost ideal secondary aerosol source of ammonium nitrate, whereas the high percentage of SO₄²⁻ in the CFA result makes the source profile depart from the ideal type.

Factor 4 represents a fresh marine aerosol source.
In the ideal profile of a fresh, pure marine aerosol source [28], the percentages of the major chemical components are 57.40%, 32.00%, 8.00%, 1.22% and 1.18% for Cl⁻, Na⁺, SO₄²⁻, Ca and K⁺, respectively, and the equivalent anion/cation ratio is 1.26. In Table 2, the percentages for Cl⁻, Na⁺, SO₄²⁻, Ca and K⁺ are 52.86%, 33.52%, 6.23%, 1.01% and 1.63%, respectively. These values are very similar to the ideal values, and the equivalent anion/cation ratio (1.09) is comparable to the ideal ratio. In Table 3, C (22.72%) modifies the source profile and makes it depart from the ideal type. The percentages for Cl⁻, Na⁺, SO₄²⁻, Ca and K⁺ are 28.99%, 21.41%, 13.88%, 1.53% and 0.86%, respectively, so the structure of the source profile deviates from the ideal type. The result produced by the PMF model is much better at describing the fresh marine aerosol source than that produced by the CFA model.

Factor 5 represents a vehicle emission source. C and THC are the two major chemical components in this factor, holding about 95% of the source mass in both Tables 2 and 3. The ratio of two trace components, Br⁻/Pb, is 0.33 in Table 2, similar to the ratio for typical leaded auto exhaust (0.38) [29]. In Table 3, however, the Br⁻/Pb ratio is 0.25, which is lower than the typical leaded auto exhaust ratio quoted above. Ions are not the dominant components in this factor, and the ionic balance is poor in both cases, especially in Table 2. It is difficult to assess quantitatively which of the PMF and CFA results better describes the vehicular emission source.

Factor 6 in Table 2 represents an aged marine aerosol source with chloride replaced by sulfate. An ideal aged marine aerosol source profile with all chloride replaced by nitrate was used by Watson et al. [27], in which NO₃⁻, Na⁺ and SO₄²⁻ were the three major chemical components, holding about 70.5%, 22.2% and 5.54% of the source mass, respectively. The equivalent anion/cation ratio of that profile is 1.27. In factor 6 produced by the PMF model, SO₄²⁻, Na⁺ and NO₃⁻ are the three major chemical components, holding about 67.25%, 24.52% and 3.85% of the source mass, respectively. The equivalent anion/cation ratio is 1.36. The SO₄²⁻/Na⁺ ratio in factor 6 is 2.74, much larger than that in fresh sea salt (0.25). Factor 6 produced by the PMF model therefore describes an ideal aged marine aerosol source with chloride replaced by sulfate reasonably well.
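
The sulfate-to-sodium comparison above can be reproduced directly from the quoted mass percentages. The following is a minimal Python sketch and not part of the original analysis: the dictionary labels and the helper `sulfate_to_sodium` are illustrative names, and the inputs are the ideal fresh marine profile [28] and factors 4 and 6 of Table 2.

```python
# Mass percentages quoted in the text (ideal fresh marine profile [28])
# and in Table 2 (PMF factors 4 and 6). Labels are illustrative only.
profiles = {
    "ideal fresh marine": {"Na+": 32.00, "SO4^2-": 8.00},
    "PMF factor 4 (fresh marine)": {"Na+": 33.52, "SO4^2-": 6.23},
    "PMF factor 6 (aged marine)": {"Na+": 24.52, "SO4^2-": 67.25},
}


def sulfate_to_sodium(profile):
    """Return the SO4^2-/Na+ mass ratio of one source profile."""
    return profile["SO4^2-"] / profile["Na+"]


# Round to two decimals, the precision used in the text.
ratios = {name: round(sulfate_to_sodium(p), 2) for name, p in profiles.items()}

for name, ratio in ratios.items():
    print(f"{name}: SO4^2-/Na+ = {ratio:.2f}")
```

With these inputs the ratio is 0.25 for the ideal fresh profile, about 0.19 for factor 4 and 2.74 for factor 6, which is the sulfate enrichment that separates the aged marine factor from the fresh one.
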
On the other hand, factor 6 in Table 3 can hardly be interpreted as a vehicular emission/road dust source. C holds about 65.34% of the source mass. The ratio of two trace components, Br⁻/Pb, is 1.28, which is much larger than the ratio for typical leaded auto exhaust. The ionic balance is poor in this factor. It is difficult to assess quantitatively this source profile derived by the CFA model.

Factor 7 represents sources from incinerator emissions, while factor 8 represents the source of fuel oil burning. C and SO₄²⁻ are the two major chemical components for these two factors in both Tables 2 and 3. Measured profiles for the incinerator and fuel oil burning emission sources in Hong Kong are unavailable. It is not appropriate to compare these profiles with incinerator and fuel oil source profiles used in other places or countries, because the compositions of municipal refuse and fuel oil in Hong Kong differ from those in the USA, for example. It is therefore difficult to assess quantitatively which of the two models is better at describing these two source profiles.

Factor 9 represents a particulate Cu source. In Table 2, Cu is the most important chemical component. It holds about 55.54% of the source mass, while SO₄²⁻ and NO₃⁻ hold another 23.51%. The equivalent anion/cation ratio is 1.09, so the anion and cation are almost fully balanced. The mass profile shows very clearly the characteristics of a particulate Cu source, which may come from the brushes of the motors in the high-volume sampler and from acid spray in circuit board facilities. In Table 3, C holds about 50.70% of the source mass, while SO₄²⁻ and NO₃⁻ hold another 23.16%; Cu holds only about 9.86%. The ionic balance is poor, and this mass profile is therefore very difficult to interpret as a particulate Cu source. The result produced by the PMF model is much better at describing the particulate Cu source than that produced by the CFA model.
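
The ideal ammonium sulfate profile used earlier in this section as the benchmark for factor 2 follows from stoichiometry, and the quoted SO₄²⁻/NH₄⁺ ratios follow from Tables 2 and 3. The sketch below is a minimal Python check under stated assumptions: the atomic masses are rounded standard values supplied here (they do not appear in the paper), and all variable names are illustrative.

```python
# Rounded standard atomic masses in g/mol (an assumption of this sketch).
ATOMIC_MASS = {"N": 14.007, "H": 1.008, "S": 32.06, "O": 15.999}

m_nh4 = ATOMIC_MASS["N"] + 4 * ATOMIC_MASS["H"]  # NH4+
m_so4 = ATOMIC_MASS["S"] + 4 * ATOMIC_MASS["O"]  # SO4^2-

# (NH4)2SO4 contains two ammonium ions per sulfate ion.
total = 2 * m_nh4 + m_so4
ideal_nh4_pct = 100 * 2 * m_nh4 / total
ideal_so4_pct = 100 * m_so4 / total
ideal_ratio = m_so4 / (2 * m_nh4)

# (SO4^2-, NH4+) mass percentages of factor 2 in Tables 2 and 3.
factor_2 = {"PMF (Table 2)": (37.03, 12.68), "CFA (Table 3)": (28.10, 9.49)}
observed_ratio = {model: so4 / nh4 for model, (so4, nh4) in factor_2.items()}

print(f"ideal: {ideal_nh4_pct:.1f}% NH4+, {ideal_so4_pct:.1f}% SO4^2-, "
      f"ratio {ideal_ratio:.2f}")
for model, ratio in observed_ratio.items():
    print(f"{model}: SO4^2-/NH4+ = {ratio:.2f}")
```

Under these assumptions the ideal profile comes out at 27.3% NH₄⁺ and 72.7% SO₄²⁻ with a ratio of 2.66, and the factor 2 ratios are 2.92 (PMF) and 2.96 (CFA), in agreement with the values quoted in the text.
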

5. Conclusions

PMF and CFA models have been tested using a Hong Kong aerosol database with very few missing data. Error estimates and enforced rotation were used in the PMF model trial, while a normal analysis was conducted in the CFA model trial. The advantages and disadvantages of the two models have been discussed and should be taken into consideration when selecting a model and applying it to aerosol source identification.

As a mathematical tool for data analysis, the CFA model provides a convenient way of identifying aerosol sources. It is easy to operate and gives results quickly. For elements that can serve as source tracers, the CFA model can identify their sources qualitatively as well as the PMF model can. However, it is difficult to physically interpret the negative loadings that exist in the results of the CFA model, and a conversion is needed to derive factor mass profiles from those results. For chemical components that may come from various sources, the CFA model cannot apportion them among the factors reasonably, and some source profiles derived by the CFA model may deviate from the ideal patterns. In general, the CFA model may be conveniently used to identify aerosol sources qualitatively, but users of receptor models should be very cautious when applying it to estimate source structure and source contributions quantitatively.

As a special factor analysis model designed for air pollution source identification, the PMF model, with more optional functions than the CFA model, provides an expert tool for aerosol source identification. It can handle elements more effectively by using error estimates, and it can feed subjective information into the factor analysis through enforced rotation. For elements that can serve as source tracers, the PMF model performs as well as the CFA model. For chemical components that come from various sources, the PMF model can apportion them among the factors more reasonably than the CFA model. Quantitatively, the factor mass profiles produced by the PMF model are better at describing the source structure than those derived by the CFA model. However, non-professionals may find the PMF model difficult to use. The subjective information fed into the factor analysis must be supported by sound chemical or physical reasoning, to guard against wrong subjective information producing misleading results. In general, the PMF model is a powerful tool for professionals in identifying aerosol sources and estimating source contributions quantitatively. Two important functions, namely error estimates and enforced rotation, should be used in PMF model trials.
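
The role of the error estimates can be illustrated with the weighted residual sum Q that PMF minimizes, written here in its usual form as the sum over all samples i and species j of (e_ij/s_ij)², where e = X − GF and s_ij is the error estimate of x_ij. The sketch below is a minimal pure-Python illustration under that assumption. It is not the PMF2 program: the function name `pmf_q` and the toy matrices are hypothetical, and no non-negativity constraint or rotation is applied.

```python
def pmf_q(x, g, f, s):
    """Weighted sum of squared residuals for the model X ~ G F.

    x: samples x species data, g: samples x factors contributions,
    f: factors x species profiles, s: samples x species error estimates.
    """
    q = 0.0
    for i, row in enumerate(x):
        for j, observed in enumerate(row):
            # Fitted value is the (i, j) element of the product G F.
            fitted = sum(g[i][k] * f[k][j] for k in range(len(f)))
            q += ((observed - fitted) / s[i][j]) ** 2
    return q


# Toy example (hypothetical numbers): two samples, two species, one factor.
x = [[1.0, 2.0], [3.0, 4.0]]
g = [[1.0], [2.0]]
f = [[1.0, 2.0]]

tight = [[0.5, 0.5], [0.5, 0.5]]  # same small error estimate everywhere
loose = [[0.5, 0.5], [1.0, 0.5]]  # larger error estimate on the misfit value

q_tight = pmf_q(x, g, f, tight)
q_loose = pmf_q(x, g, f, loose)
print(q_tight, q_loose)
```

In this toy case only one value is fitted imperfectly (residual 1.0). Doubling its error estimate reduces its contribution to Q from 4.0 to 1.0, which shows how a value with a large uncertainty is kept from dominating the fit.
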

Acknowledgements

The authors gratefully acknowledge the financial support granted by the University of Abertay Dundee for this project. The support of Dr. David Lill, particularly in discussions on chemistry, is also gratefully acknowledged.


References

[1] S. Huang, K.A. Rahn, R. Arimoto, Atmos. Environ. 33 (1999) 2169–2185.
[2] P.K. Hopke, Chemom. Intell. Lab. Syst. 10 (1991) 21–43.
[3] C.A. Pio, I.M. Santos, T.D. Anaclet, T.V. Nunes, Atmos. Environ. 25A (1991) 669–680.
[4] R.R. Yaaqub, T.D. Davies, T.D. Jickells, J.M. Miller, Atmos. Environ. 25A (1991) 985–996.
[5] S. Wakamatsu, A. Utsunomiya, J.S. Han, A. Mori, I. Uno, K. Uehara, Atmos. Environ. 30 (1996) 2343–2354.
[6] K.A. Rahn, Atmos. Environ. 33 (1999) 1441–1455.
[7] D.H. Lowenthal, J.C. Chow, J.G. Watson, G.R. Neuroth, R.B. Robbins, B.P. Shafritz, R.J. Countess, Atmos. Environ. 26A (1992) 2341–2351.
[8] J.C. Chow, J.G. Watson, D.H. Lowenthal, P.A. Solomon, K.L. Magliano, S.D. Ziman, L.W. Richards, Atmos. Environ. 26A (1992) 3335–3354.
[9] J.G. Watson, J.C. Chow, Z. Lu, E.M. Fujita, D.H. Lowenthal, D.R. Lawson, Aerosol Sci. Technol. 21 (1994) 1–36.
[10] W.C. Malm, K.A. Gebhart, Atmos. Environ. 30 (1996) 843–855.
[11] P.K. Hopke, Atmos. Environ. 22 (1988) 1777–1792.
[12] M.A. Cohen, P.B. Ryan, J.D. Spengler, Atmos. Environ. 25B (1991) 95–107.
[13] E. Swietlicki, S. Puri, H. Hansson, Atmos. Environ. 30 (1996) 2795–2809.
[14] P. Paatero, U. Tapper, Environmetrics 5 (1994) 111–126.
[15] S. Juntto, P. Paatero, Environmetrics 5 (1994) 127–144.


[16] P. Anttila, P. Paatero, U. Tapper, O. Jarvinen, Atmos. Environ. 29 (1996) 1705–1718.
[17] K. Juvela, K. Lehtinen, P. Paatero, Mon. Not. R. Astron. Soc. 280 (1995) 616–626.
[18] P. Paatero, Chemom. Intell. Lab. Syst. 37 (1997) 23–35.
[19] A.V. Polissar, P.K. Hopke, W.C. Malm, F. Sisler, Atmos. Environ. 10 (1996) 1147–1157.
[20] V. Alexandr, A.V. Polissar, P.K. Hopke, P. Paatero, J. Geophys. Res. [Atmos.] 103 (1998) 19045–19057.
[21] Y.L. Xie, P.K. Hopke, P. Paatero, L.A. Barrie, S.M. Li, J. Atmos. Sci. 56 (1999) 249–260.
[22] Air Services Group, Hong Kong Environmental Protection Department, Air Quality in Hong Kong 1986–1995 (CD-ROM), Enviro-Chem Engineering and Laboratory, 1997.
[23] Y. Qin, C.K. Chan, L.Y. Chan, Sci. Total Environ. 206 (1997) 25–27.
[24] E. Lee, C.K. Chan, P. Paatero, Atmos. Environ. 33 (1999) 3201–3212.
[25] P. Paatero, User’s Guide for Positive Matrix Factorization Program PMF2 and PMF3, University of Helsinki, 1998.
[26] D.H. Lowenthal, K. Rahn, Atmos. Environ. 9 (1987) 2005–2013.
[27] J.G. Watson, J.C. Chow, Z. Lu, E.M. Fujita, D.H. Lowenthal, D.R. Lawson, Aerosol Sci. Technol. 21 (1994) 1–36.
[28] T.R.S. Wilson, in: J.P. Riley, G. Skirrow (Eds.), Chemical Oceanography, vol. 1, 2nd edn. (1975) 365–413.
[29] P.D.E. Biggins, R.M. Harrison, Environ. Sci. Technol. 13 (1979) 558–565.