Statistical distributions of particulate matter and the error associated with sampling frequency



Atmospheric Environment 35 (2001) 2907–2920

Brian Rumburg, Richard Alldredge, Candis Claiborn *

Laboratory for Atmospheric Research, Center for Multiphase Environmental Research, Department of Civil and Environmental Engineering, Washington State University, Pullman, WA 99164-2910, USA

Program in Statistics, Washington State University, Pullman, WA 99164-3144, USA

Received 9 May 2000; accepted 21 November 2000

Abstract

The distribution of particulate matter (PM) concentrations has an impact on human health effects and the setting of PM regulations. Since PM is commonly sampled on less than daily schedules, the magnitude of sampling errors needs to be determined. Daily PM data from Spokane, Washington were resampled to simulate common sampling schedules, and the sampling error was computed for regulatory and distribution statistics. Probability density functions (pdf's) were fit to the annual daily data to determine the shape of the PM2.5 and PM10 concentration distributions, and they were also fit to the less than daily sampling to determine if pdf's could be used to predict the daily high-concentration percentiles. There is an error when using a less than daily sampling schedule for all statistics. The error, expressed as a percentage difference from the everyday sampling, was as large as 1.7, 3.4 and 7.7% for the PM2.5 mean and as great as 8.8, 18 and 67% for the 98th percentile for 1-in-2 day, 1-in-3 day and 1-in-6 day sampling, respectively. For PM10 the error in the mean was 2.5, 4.7 and 8.6% and the error in the 99th percentile was 27, 18 and 46% for 1-in-2 day, 1-in-3 day, and 1-in-6 day sampling, respectively. The PM2.5 and PM10 concentration data were best fit by a three-parameter lognormal distribution and a generalized extreme value distribution, respectively. For PM2.5 and PM10, as the annual mean increased the mode concentration increased, but for PM10 the shape of the distribution also flattened. Predicting the daily high percentiles from pdf's that were fit to the less than daily sampled data produced mixed results. For PM10, the high concentrations predicted by the pdf's were closer to the daily percentiles than the actual less-than-daily sampling percentiles, while for PM2.5 they were not. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Aerosol; Air pollution; Probability distribution; Level of accuracy; Infrequent sampling

1. Introduction

The United States Environmental Protection Agency (EPA) revised the particulate matter (PM) standard in 1997 to include a standard for PM2.5 (aerodynamic diameter below 2.5 µm). The new standard calls for a 3-year spatial average of the annual arithmetic mean concentration below 15 µg m⁻³ for PM2.5 and 50 µg m⁻³ for PM10, and a 3-year average of the annual 98th percentile below 65 µg m⁻³ for PM2.5 and of the annual 99th percentile below 150 µg m⁻³ for PM10 (Federal Register, 1997). The EPA adopted a percentile-based, high-concentration standard instead of the one-expected-annual-exceedance method previously used for PM10, to provide more consistent results over differing sampling schedules and because the high concentrations tend to be outliers (Federal Register, 1997).

The new PM2.5 standard calls for a minimum monitoring frequency of 1-in-3 day, and the PM10 standard was revised in 1997 from a minimum of 1-in-6 day sampling to 1-in-3 day sampling (Federal Register, 1997). While the EPA assumes that the concentration distribution of the sampled data is the same as that of the missing data (Federal Register, 1997), there is a possibility of a sampling error when the entire population is not sampled. The magnitude of the sampling error and possible differences in the distribution of concentrations are important for regulatory compliance and PM health effects research. The sampling error issue adds a level of uncertainty to PM health effects research into the mechanisms of action, the concentration–response relationship, and whether ambient monitoring reflects population exposures (Federal Register, 1996).

The largest risk to human health from PM exposure may not be from the infrequent high-concentration days but from the larger number of days that have low- to mid-range concentrations (Federal Register, 1996). Saltzman (1987, 1997) addressed the influence of all pollutant concentrations and their frequency on the calculation of health risk. Using a lognormal pollutant concentration distribution, Saltzman (1987, 1997) showed that changes in the standard deviation of the pollutant concentrations had a major influence on the calculated health risk. A larger lognormal standard deviation makes the distribution flatter and results in a larger overlap of the concentration–response curve and the pollutant frequency distribution. This, combined with the hypothesis that the greatest health effects are related to the low- to mid-range concentrations because these lower levels occur more frequently (Federal Register, 1997), suggests that the shape of the concentration distribution and how it changes with increasing emission levels have important health implications.

* Corresponding author. Tel.: +1-509-335-5055; fax: +1-509-335-7632. E-mail address: [email protected] (C. Claiborn).

1352-2310/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S1352-2310(00)00554-9

2. Background

Probability density functions (pdf's) have been used in the analysis of the distributions of data and for examining the frequency of high-concentration events. Larsen (1969) conducted an empirical analysis of gaseous pollutants in eight cities to show that the distributions for each pollutant fit a two-parameter lognormal model. It was later shown that the empirical method used by Larsen was an application of extreme value theory to the lognormal distribution (Singpurwalla, 1972). The Larsen model was later modified to include a threshold parameter when sampling near a point source (Larsen, 1977). Patel (1973) noted that Larsen (1969) assumed that the data were independent, and further showed that the data were autocorrelated. An analysis of the effects of autocorrelation and non-stationarity on estimating the maximum concentration of air pollutants showed that autocorrelation does not affect the estimation of the distribution properties (Horowitz and Barakat, 1979). On the other hand, non-stationary sequences of concentrations did produce errors.

Georgopoulos and Seinfeld (1982) stated that independence, while not strictly valid, could be applied for simple statistical analysis. They also showed that the mean scales linearly with emissions while the extremes do not. Ott and Mage (1976) used the diffusion equation to demonstrate that the concentration of air pollutants should fit a three-parameter lognormal distribution. A physical explanation for why a lognormal distribution fits the concentration distribution of atmospheric pollutants was given by Ott (1990) as the successive random dilution of a pollutant in the atmosphere. Ott also noted that as the number of dilutions increased, the variance of the distribution of the concentrations also increased. Bencala and Seinfeld (1976) fit other distributions to the carbon monoxide data analyzed by Larsen (1969) and showed that other distributions can also represent air pollutant concentrations. Several right-skewed distributions have also been used to fit atmospheric pollutant concentrations, with varying results when compared to the lognormal (Graedel et al., 1974; Kalpasanov and Kurchatova, 1976; Holland and Fitz-Simons, 1982; Berger et al., 1982). There have been few studies on the concentration distribution of aerosols (Savoie and Prospero, 1977; Savoie et al., 1987) and even fewer studies of urban aerosol distributions (Morel et al., 1999).

Because the PM2.5 standard was promulgated only recently, there has not been sufficient time to build up an adequate database of daily PM2.5 concentrations to evaluate the error associated with less-than-daily sampling of the concentrations. For one site in Spokane, Washington, daily PM2.5 and PM10 concentrations have been measured since January of 1995. The primary objective of this work is to determine if there is an error in the regulatory statistics when not sampling daily, and to determine the magnitude of that error using data from Spokane. The everyday PM concentration data were resampled to obtain various less-than-daily sampling data sets, which were then compared to the daily sampling. Distribution statistics are computed to determine whether the sampling schedule has an effect on the distribution of all of the concentration data. The secondary objective is to fit pdf's to the annual daily concentration data to determine the shape of the concentration distribution. The third objective is to determine if pdf's fit to the less than daily sampling data can estimate the daily high-concentration percentiles better than the actual less than daily sampling data.
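The successive random dilution argument attributed to Ott (1990) in this section can be illustrated with a short simulation. The sketch below is an illustration only, not the authors' analysis: the starting concentration, the uniform range of the dilution factors, and the names `diluted_concentration` and `log_variance` are all assumptions chosen for the example. Because the logarithm of a product of independent factors is a sum of independent terms, the log-concentrations tend toward a normal distribution, and their variance grows with the number of dilutions.

```python
import math
import random
import statistics

def diluted_concentration(c0, n_dilutions, rng):
    """Apply n_dilutions independent random dilution factors to c0."""
    c = c0
    for _ in range(n_dilutions):
        c *= rng.uniform(0.1, 1.0)  # assumed dilution-factor range (hypothetical)
    return c

def log_variance(n_dilutions, rng, n_samples=5000):
    """Sample variance of log-concentration after n_dilutions dilutions."""
    logs = [math.log(diluted_concentration(100.0, n_dilutions, rng))
            for _ in range(n_samples)]
    return statistics.variance(logs)

rng = random.Random(42)          # fixed seed so the run is repeatable
var_5 = log_variance(5, rng)     # few dilutions: narrow log-distribution
var_20 = log_variance(20, rng)   # many dilutions: wider log-distribution
print(var_5, var_20)
```

With independent, identically distributed factors, the variance on the log scale grows in proportion to the number of dilutions, so `var_20` should come out at roughly four times `var_5`.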

3. Methods

Daily PM samples were collected as part of an epidemiological study in Spokane, Washington, starting in January 1995. This ongoing study is conducted by Washington State University, the University of Washington, and the Spokane County Air Pollution Control Authority. The PM data were collected daily from January 1995 to December 1997 in a residential neighborhood, using a Versatile Air Pollutant Sampler (VAPS) (University Research Glassware; Chapel Hill, NC; Pinto et al., 1998). The VAPS has a PM10 sample inlet and a virtual impactor that allows for the collection of two PM2.5 filter samples and one coarse (PM10–PM2.5) sample. The PM2.5 sample train utilized for mass measurements included a Teflon filter installed behind a sodium-carbonate-coated denuder and a citric-acid-coated denuder. A Nuclepore filter was installed in the coarse particulate sample train. Sampling and quality assurance protocols are described by Sullivan (1997).

4. Statistical analysis

4.1. Percentile method and distribution statistics

Percentiles were determined by sorting the valid data points for each year from highest concentration to lowest. All the sorted concentrations were ranked, starting with one for the lowest concentration and ascending to the highest concentration. To determine the rank that corresponded to the given percentile, the number of valid data points was multiplied by the percentile and rounded up. The concentration of interest was found by consulting the list of sorted and ranked concentrations. In addition to the arithmetic mean and the high percentiles, the skewness and kurtosis were calculated to determine if the distribution of concentrations changed for the various sampling schedules.

4.2. Distribution estimation

Parameters for all distributions were estimated using S-Plus statistics software (MathSoft; Cambridge, MA) and EnvironmentalStats for S-Plus (Probability, Statistics & Information; Seattle, WA). The distributions chosen to fit the data are shown in Table 1, and Millard (1997) gives complete details for all of the methods. Goodness-of-fit tests used to evaluate the fit of the pdf's to the actual distribution included the chi-square and the Kolmogorov–Smirnov (K–S) test (Kolmogorov, 1933; Smirnov, 1939).

4.3. Quality assurance

The PM2.5 and PM10 daily concentrations were compared to daily averaged PM2.5 and PM10 mass measurements from tapered element oscillating microbalances (TEOM) (Rupprecht and Patashnick; Albany, NY) co-located at the sampling site and to reconstructed mass from X-ray fluorescence analysis of the filters. Measurements that were inconsistent with the TEOM measurements and the reconstructed mass were discarded. After quality assurance checks for PM2.5 there were 352, 338, and 350 samples remaining for further analysis for 1995, 1996, and 1997, respectively. For PM10, there were 332, 337, and 349 samples left after quality assurance checks for 1995, 1996, and 1997, respectively. The one-standard-deviation precision of the two different VAPS samplers, based on duplicate sampling, is 2.4 µg m⁻³ for PM2.5 and 2.8 µg m⁻³ for PM10. A positive bias in the PM2.5 mass concentration measurements was found above 7 µg m⁻³, due to contamination from glycerin used to coat the denuder tubes; it was identified by comparing the mass of Teflon filters behind the denuder tubes to the mass of Teflon filters not behind denuder tubes. Only a few measurements of the bias have been made, and since the cause is not understood and the magnitude varied, no correction was made (Finn et al., 2001). Due to the uncertainty of the sampling error and bias at somewhat higher concentrations, it is not known how these errors will affect the results presented.
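The rank rule of Section 4.1 (multiply the number of valid points by the percentile, round up, and read off the ranked concentration) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `percentile_by_rank` and the sample data are hypothetical, and the `round(..., 9)` guard against floating-point noise is an addition of this sketch.

```python
import math

def percentile_by_rank(values, percentile):
    """Concentration at `percentile` (a fraction, e.g. 0.98) using rank = ceil(n * p)."""
    if not values:
        raise ValueError("at least one valid data point is required")
    ordered = sorted(values)                  # rank 1 is the lowest concentration
    n = len(ordered)
    # n * p, rounded up; round() first so that noise such as 7.000000000000001
    # does not push the rank up by one (assumption of this sketch).
    rank = math.ceil(round(n * percentile, 9))
    rank = min(max(rank, 1), n)               # keep the rank inside 1..n
    return ordered[rank - 1]

# Hypothetical year with 350 valid days and concentrations 1, 2, ..., 350.
sample = list(range(1, 351))
p98 = percentile_by_rank(sample, 0.98)        # rank 343 of 350
print(p98)
```

Because the rank is rounded up, the selected value never falls below the nominal percentile; with few samples (for example about 60 values under 1-in-6 day sampling) the 98th percentile is simply the second-highest value.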

5. Results

5.1. Actual concentration distributions

Frequency distributions of the daily PM2.5 concentrations for 1995–1997 are shown in Fig. 1, with the corresponding annual arithmetic means indicated. The distribution of concentrations is not smooth and the frequency becomes more erratic above 15 µg m⁻³. The increase in the arithmetic mean is accompanied by an increase in the mode from year to year. The frequency distributions of the daily PM10 concentrations and the corresponding annual arithmetic means for 1995–1997 are shown in Fig. 2. The frequency distribution for PM10 is more variable than that for PM2.5, and the distribution becomes flatter from year to year and more skewed to the right. Since the meteorology for both PM2.5 and PM10 is the same, the differences in the distribution shape are most likely due to the differences in emission frequency. One of the main sources of PM10 in Spokane is dust, which tends to be seasonal and highly variable, as opposed to the combustion-related sources of PM2.5, which tend to be more constant throughout the year (Haller et al., 1999).

5.2. Percentile analysis

The population statistics and the range of values for the different sampling schedules are shown in Table 2 for PM2.5. To quantify the error between the daily statistics

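Table 1
Probability distribution functions used to fit the PM data, with the estimation methods used and references for those estimation methods (x denotes actual distribution data)

Extreme value
  Formula: $F(x;\eta,\theta) = \frac{1}{\theta}\exp\left[-\frac{x-\eta}{\theta}\right]\exp\left\{-\exp\left[-\frac{x-\eta}{\theta}\right]\right\}$
           $\eta$ = location; $\theta$ = scale
  Estimation methods (reference): MLE (Evans et al., 1993); MME (Johnson et al., 1995); MMUE (Johnson et al., 1995); PWME (Greenwood et al., 1979)

Gamma
  Formula: $F(x;\alpha,\beta) = \frac{(x/\beta)^{\alpha-1}\exp(-x/\beta)}{\beta\,\Gamma(\alpha)}$
           $\alpha$ = shape; $\beta$ = scale; $\Gamma(\cdot)$ = gamma function
  Estimation methods (reference): MLE (Evans et al., 1993); MME (Evans et al., 1993); MMUE (Evans et al., 1993)

Generalized extreme value
  Formula: $F(x;\eta,\theta,\kappa) = \exp\left\{-\left[1-\frac{\kappa(x-\eta)}{\theta}\right]^{1/\kappa}\right\}$, $\kappa \neq 0$
           $F(x;\eta,\theta,\kappa) = \exp\left\{-\exp\left[-\frac{x-\eta}{\theta}\right]\right\}$, $\kappa = 0$
           $\eta$ = location; $\theta$ = scale; $\kappa$ = shape
  Estimation methods (reference): MLE (Millard, 1997); PWME (Hosking, 1985)

Lognormal
  Formula: $F(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left\{-\frac{[\log(x)-\mu]^2}{2\sigma^2}\right\}$
           $\mu$ = mean log; $\sigma$ = standard deviation log
  Estimation methods (reference): MLE/MME (Millard, 1997); MVUE (Millard, 1997)

Three-parameter lognormal
  Formula: $F(x;\mu,\sigma,\gamma) = \frac{1}{(x-\gamma)\sigma\sqrt{2\pi}}\exp\left\{-\frac{[\log(x-\gamma)-\mu]^2}{2\sigma^2}\right\}$
           $\mu$ = mean log; $\sigma$ = standard deviation log; $\gamma$ = threshold
  Estimation methods (reference): MME (Johnson et al., 1995); MMUE (Cohen, 1988); MMME (Cohen and Whitten, 1980); Zero skew (Griffiths, 1980 and Royston, 1992)

Weibull
  Formula: $F(x;\alpha,\beta) = \frac{\alpha x^{\alpha-1}}{\beta^{\alpha}}\exp\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right]$
           $\alpha$ = shape; $\beta$ = scale
  Estimation methods (reference): MLE (Evans et al., 1993); MME (Millard, 1997); MMUE (Millard, 1997)

MLE = maximum likelihood estimator; MME = method of moments estimator; MMUE = method of moments with an unbiased estimator; PWME = probability weighted moments estimator; MVUE = minimum variance unbiased estimator; MMME = modified method of moments estimator.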


Fig. 1. Measured distributions of 24-h average PM2.5 concentrations in Spokane, WA, over a 3-year period with 100% sampling.

Fig. 2. Measured distributions of 24-h average PM10 concentrations in Spokane, WA, over a 3-year period with 100% sampling.

and the less-than-daily statistics, a percentage error method was used. The specific sampling schedule statistic was subtracted from the daily statistic, divided by the daily statistic, and expressed as a percent. To determine the magnitude of the uncertainty over all of the years for each sampling schedule, a standard deviation was calculated. The errors in the arithmetic mean for 1-in-2 day, 1-in-3 day, and 1-in-6 day sampling for different sampling strategies were within 1.7, 3.4, and 7.7%, respectively. The errors in the 98th percentiles were within 8.8, 18, and 67% as the sampling frequency decreased. As expected, the error in the 98th percentile was greater than that for the arithmetic mean due to the infrequent occurrence of the high-concentration events. The underestimation of the maximum ranged from 20% for 1-in-2 day sampling, to 29% for 1-in-3 day sampling, and to 62% for 1-in-6 day sampling. The kurtosis showed a greater error; this is to be expected since the higher moments are less stable, but it again shows that the distribution of the concentrations was not the same. For 1-in-2 day sampling it was within 31%, for 1-in-3 day, 60%, and for 1-in-6 day, 74%. The error in the skewness was as high as 22% for 1-in-2 day sampling, 33% for 1-in-3 day sampling, and 65% for 1-in-6 day sampling.

The population statistics and the range of values for the different sampling schedules for PM10 are shown in Table 2. The error in the arithmetic mean was 2.5% for 1-in-2 day sampling, 4.7% for 1-in-3 day sampling, and 8.6% for 1-in-6 day sampling. These values are higher than those for PM2.5, indicative of the higher variability of the distributions as shown in Figs. 1 and 2. The 99th percentile showed larger errors of 27, 18, and 46% for 1-in-2 day, 1-in-3 day, and 1-in-6 day sampling, respectively. The maximum concentration was underestimated by 24, 30, and 50% as the sampling frequency decreased. The error in the kurtosis was nearly the same for 1-in-2 day and 1-in-3 day sampling, at 43 and 44%, respectively, and for 1-in-6 day sampling it was 69%, which is comparable to that for PM2.5. The skewness, again, had a larger error for 1-in-2 day sampling, at 30%, than for 1-in-3 day sampling, at 28%, and the 1-in-6 day error was 69%. It is clear that even reducing the sampling frequency to 1-in-2 day can affect the distribution of concentrations.

The standard deviations of the maxima and the high percentiles from daily values for 1995–1997 for PM2.5 and PM10 are shown in Table 3. For PM2.5 the 98th percentile is reasonably stable for all of the sampling schedules. For PM10 the 99th percentile has a much higher deviation than the 98th percentile. The PM10 distribution is much more variable at the right-tail end than the PM2.5 distribution, and that is reflected in the higher values of the deviations.

Table 2
PM2.5 and PM10 1995 through 1997 population statistics and the range of sample statistics

                         Daily    1-in-2 day range   1-in-3 day range   1-in-6 day range
PM2.5
  Mean              1995  11.7     11.7–11.8          11.8–12.1          11.0–12.6
                    1996  11.8     11.7–11.9          11.5–11.9          11.0–12.4
                    1997  12.1     11.9–12.3          11.7–12.5          11.8–12.9
  98th percentile   1995  31.9     29.1–31.9          28.4–34.3          29.1–53.4
                    1996  39.2     37.4–42.0          32.0–41.6          32.0–42.0
                    1997  33.2     32.5–34.3          31.6–35.0          30.9–38.9
  Maximum           1995  67.1     53.4–67.1          47.6–67.1          25.7–67.1
                    1996  42.1     39.2–42.1          39.2–42.1          32.0–42.1
                    1997  45.9     36.7–45.9          43.8–45.9          32.5–45.9
  Kurtosis          1995  11.53    7.92–13.31         6.13–18.41         2.99–15.43
                    1996  5.44     5.06–5.55          4.49–5.51          2.78–7.42
                    1997  4.77     3.64–5.52          3.50–5.93          3.16–6.39
  Skewness          1995  2.12     1.71–2.49          1.59–2.91          0.77–2.93
                    1996  1.53     1.43–1.59          1.33–1.61          0.86–2.06
                    1997  1.30     1.09–1.47          1.07–1.48          0.94–1.72
PM10
  Mean              1995  16.3     15.9–16.7          15.6–16.7          15.5–17.7
                    1996  17.9     17.9–17.9          17.8–18.3          17.4–18.5
                    1997  19.2     19.2–19.3          18.5–20.1          18.5–20.3
  99th percentile   1995  52.4     40.5–61.7          43.1–52.4          38.2–76.5
                    1996  56.0     48.5–71.3          45.7–60.8          44.5–71.3
                    1997  47.3     45.9–48.2          46.1–47.3          37.6–49.0
  Maximum           1995  76.5     62.7–76.5          61.7–76.5          38.2–76.5
                    1996  79.5     60.8–79.5          56.0–79.5          44.5–79.5
                    1997  65.2     50.0–65.2          50.0–65.2          45.5–65.2
  Kurtosis          1995  8.67     6.35–9.18          6.33–11.29         2.71–9.34
                    1996  7.77     4.42–10.31         4.35–10.44         3.43–14.88
                    1997  4.01     3.37–4.19          2.97–4.61          2.66–4.74
  Skewness          1995  1.90     1.43–2.11          1.55–2.36          0.79–2.26
                    1996  1.76     1.24–2.20          1.26–2.22          1.08–2.98
                    1997  1.06     0.85–1.16          0.87–1.17          0.58–1.36

Mean, 98th percentile, 99th percentile, and maximum concentrations in µg m⁻³; kurtosis and skewness are unitless.

Table 3
PM2.5 and PM10 standard deviations from the daily values for the maximum and high percentiles (concentrations in µg m⁻³)

                        1-in-2 day   1-in-3 day   1-in-6 day
PM2.5
  Maximum               7.48         6.46         7.65
  99.5th percentile     7.08         4.22         5.02
  99th percentile       6.30         2.94         4.53
  98th percentile       2.84         2.06         3.12
  97th percentile       1.16         1.22         1.96
  95th percentile       1.15         0.59         0.94
  90th percentile       0.88         1.01         1.10
PM10
  Maximum               12.42        12.77        22.62
  99.5th percentile     9.85         8.73         15.02
  99th percentile       10.87        6.18         12.24
  98th percentile       3.65         1.89         5.18
  97th percentile       2.98         2.36         5.09
  95th percentile       2.58         3.11         4.02
  90th percentile       1.62         1.35         2.56
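The resampling and percentage error calculation used in Section 5.2 can be sketched as follows. This is an illustrative sketch, not the authors' procedure in detail: it assumes a complete daily record with no missing days, it enumerates every possible start day of a 1-in-k schedule, and the function names and the 12-day record are hypothetical.

```python
def subsample(daily, interval, offset):
    """Every `interval`-th daily value, starting at day index `offset`."""
    return daily[offset::interval]

def percent_error(daily_stat, sample_stat):
    """(daily statistic - schedule statistic) / daily statistic, in percent."""
    return 100.0 * (daily_stat - sample_stat) / daily_stat

def mean(values):
    return sum(values) / len(values)

def schedule_errors(daily, interval, statistic):
    """Percentage error of `statistic` for each start day of a 1-in-`interval` schedule."""
    daily_stat = statistic(daily)
    return [percent_error(daily_stat, statistic(subsample(daily, interval, k)))
            for k in range(interval)]

# Hypothetical 12-day record in µg m^-3 (not data from the study).
record = [10.0, 12.0, 8.0, 30.0, 11.0, 9.0, 14.0, 10.0, 7.0, 13.0, 12.0, 8.0]
errors = schedule_errors(record, 3, mean)   # one error per 1-in-3 day schedule
worst = max(abs(e) for e in errors)         # largest error magnitude
print(errors, worst)
```

Passing a percentile or maximum function instead of `mean` gives the corresponding errors for the high-concentration statistics; a single high day (30.0 in this record) falls in only one of the three schedules, which is why the errors differ so much between start days.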


5.3. Daily probability distribution functions

Several pdf's were evaluated for their ability to describe the PM2.5 concentration frequency distribution for the daily sampling schedules. The results of the goodness-of-fit tests by year, with the pdf's ranked by goodness of fit, are shown in Table 4 for PM2.5. Overall, the three-parameter lognormal distribution with the zero-skew estimation method (Griffiths, 1980; Royston, 1992) performed the best. Figs. 3–5 show the best-fitting pdf's and their corresponding parameter values for PM2.5 for 1995, 1996, and 1997, respectively. The distribution statistics of the three-parameter lognormal with the zero-skew estimation method show that the distribution mean increased each year while the standard deviation decreased. Recall that Saltzman (1987, 1997) showed that larger standard deviations had a major influence on the calculated health risk.

The results of the goodness-of-fit tests for the evaluated pdf's are shown in Table 5 for PM10. Figs. 6–8 show the actual distributions and the best-fitting pdf's for 1995, 1996, and 1997, respectively. The best-fitting pdf was the generalized extreme value with the MLE estimation method. The goodness-of-fit statistics for the generalized extreme value MLE distribution were fairly constant each year. For the generalized extreme value distribution with the MLE estimation method, the location and scale parameters increased each year, meaning that the distribution shifted to the right and flattened. Recall that the mode in the PM2.5 distribution shifted to the right as the arithmetic mean increased but the PM2.5 distribution retained its peakedness, in contrast to the PM10 distribution, which flattened.

5.4. Estimation of the extreme values using less-than-daily sampling

There is a possibility of missing the high-concentration days when not sampling daily; thus, pdf's were fitted to the less than daily data to estimate the daily high

Table 4
PM2.5 fitted distribution type and estimation method, goodness-of-fit statistics, and rank from best fitting to least in parentheses. Figures in italic show the two pdf's with the best fit

                             Chi-square    K–S           Chi-square    K–S           Chi-square    K–S
                             (1995)        (1995)        (1996)        (1996)        (1997)        (1997)
Extreme value
  MLE                        48.13 (14)    0.0726 (14)   55.41 (13)    0.0851 (16)   32.20 (12)    0.0605 (11)
  MME                        58.18 (18)    0.0842 (15)   67.96 (16)    0.0750 (11)   37.12 (15)    0.0663 (12)
  MMUE                       57.04 (17)    0.0847 (16)   55.41 (13)    0.0850 (15)   36.04 (14)    0.0663 (12)
  PWME                       45.71 (13)    0.0698 (12)   49.44 (11)    0.0876 (17)   37.84 (16)    0.0664 (14)
Gamma
  MLE                        23.44 (8)     0.0345 (7)    51.43 (12)    0.0782 (12)   21.88 (9)     0.0539 (9)
  MME                        39.86 (11)    0.0660 (10)   46.46 (9)     0.0659 (7)    19.30 (5)     0.0505 (8)
  MMUE                       39.47 (10)    0.0665 (11)   46.46 (9)     0.0656 (6)    19.36 (6)     0.0504 (7)
Generalized extreme value
  MLE                        31.58 (9)     0.0581 (9)    23.97 (2)     0.0295 (1)    18.40 (3)     0.0341 (3)
  PWME                       18.22 (7)     0.0335 (6)    21.36 (1)     0.0348 (2)    17.80 (2)     0.0329 (2)
Lognormal
  MVUE                       11.47 (2)     0.0256 (2)    27.82 (5)     0.0648 (5)    19.36 (6)     0.0420 (5)
  MLE/MME                    10.71 (1)     0.0256 (2)    28.32 (6)     0.0363 (3)    19.48 (8)     0.0420 (5)
Three-parameter lognormal
  MME                        12.49 (3)     0.0282 (5)    40.25 (7)     0.0667 (9)    31.24 (10)    0.0583 (10)
  MMUE                       13.26 (5)     0.0278 (4)    40.25 (7)     0.0665 (8)    31.24 (10)    0.0583 (10)
  MMME                       16.31 (6)     0.0358 (8)    26.33 (3)     0.0686 (10)   18.76 (4)     0.0344 (4)
  Zero skew                  13.13 (4)     0.0206 (1)    26.33 (3)     0.0390 (4)    16.72 (1)     0.0295 (1)
Weibull
  MLE                        44.18 (12)    0.0722 (13)   60.38 (15)    0.0940 (18)   34.36 (13)    0.0757 (18)
  MME                        50.04 (15)    0.0880 (17)   68.46 (17)    0.0820 (13)   37.96 (17)    0.0670 (16)
  MMUE                       50.67 (16)    0.0880 (17)   72.06 (18)    0.0820 (13)   39.64 (18)    0.0670 (16)
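The K–S statistics reported in Table 4 measure the largest gap between the empirical cumulative distribution and a fitted one. The sketch below shows that calculation for a lognormal fit using only the Python standard library. It is not the S-Plus/EnvironmentalStats procedure used in the paper: the threshold is treated as a known input rather than estimated (for example by the zero-skew method), the data are synthetic, and all function names are hypothetical.

```python
import math
import random
import statistics

def fit_lognormal(values, threshold=0.0):
    """MLE of (mean log, sd log) for a lognormal with a known threshold."""
    logs = [math.log(v - threshold) for v in values]
    return statistics.fmean(logs), statistics.pstdev(logs)  # MLE uses the population sd

def lognormal_cdf(x, mu, sigma, threshold=0.0):
    """Cumulative distribution function of the (shifted) lognormal."""
    if x <= threshold:
        return 0.0
    z = (math.log(x - threshold) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_statistic(values, cdf):
    """Largest distance between the empirical cdf of `values` and `cdf`."""
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # Compare against the empirical step just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Synthetic "annual" record of 350 days (hypothetical, not the Spokane data).
rng = random.Random(0)
data = [rng.lognormvariate(2.4, 0.5) for _ in range(350)]
mu, sigma = fit_lognormal(data)
d_stat = ks_statistic(data, lambda x: lognormal_cdf(x, mu, sigma))
print(mu, sigma, d_stat)
```

A smaller statistic indicates a closer fit, which is the sense in which the tables rank the candidate distributions.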


Fig. 3. 1995 measured distribution of 24-h average daily PM2.5 concentrations in Spokane, WA, and pdf's fitted to the distribution.

Fig. 4. 1996 measured distribution of 24-h average daily PM2.5 concentrations in Spokane, WA, and pdf's fitted to the distribution.

concentrations. Table 6 shows the standard deviations from the daily sampling for the less than daily sampling for PM2.5. The five best performing fitted pdf's shown were selected based on the sum of the standard deviations over the tested percentiles. The three-parameter lognormal distribution with the zero-skew estimation method had the best overall fit but did not fit the high percentiles as well as other distributions. The 1-in-2 day sample schedule had


Fig. 5. 1997 measured distribution of 24-h average daily PM2.5 concentrations in Spokane, WA, and pdf's fitted to the distribution.

Table 5
PM10 fitted distribution type and estimation method, goodness-of-fit statistics (lower numbers correspond to a better fit), and rank from best fitting to least in parentheses. Figures in italic show the two pdf's with the best fit

                             Chi-square    K–S           Chi-square    K–S           Chi-square    K–S
                             (1995)        (1995)        (1996)        (1996)        (1997)        (1997)
Extreme value
  MLE                        39.68 (13)    0.0566 (11)   32.21 (9)     0.0784 (14)   27.62 (15)    0.0528 (15)
  MME                        47.27 (15)    0.0676 (14)   47.54 (15)    0.0744 (12)   27.01 (14)    0.0455 (13)
  MMUE                       45.49 (14)    0.0680 (15)   46.05 (14)    0.0742 (11)   25.69 (12)    0.0451 (12)
  PWME                       36.77 (10)    0.0627 (13)   45.18 (13)    0.0809 (15)   25.81 (13)    0.0426 (8)
Gamma
  MLE                        25.38 (9)     0.0581 (12)   35.21 (12)    0.0776 (13)   18.95 (4)     0.0385 (4)
  MME                        39.55 (12)    0.0535 (9)    34.46 (11)    0.0695 (9)    20.64 (7)     0.0318 (2)
  MMUE                       39.42 (11)    0.0540 (10)   33.83 (10)    0.0700 (10)   21.60 (8)     0.0316 (1)
Generalized extreme value
  MLE                        15.13 (1)     0.0346 (4)    16.39 (1)     0.0323 (1)    16.06 (2)     0.0426 (8)
  PWME                       17.79 (5)     0.0315 (1)    18.88 (2)     0.0409 (5)    17.27 (3)     0.0402 (7)
Lognormal
  MVUE                       16.40 (3)     0.0357 (5)    20.00 (5)     0.0405 (3)    20.52 (5)     0.0399 (5)
  MLE/MME                    16.40 (3)     0.0357 (5)    19.13 (3)     0.0405 (3)    20.52 (5)     0.0399 (5)
Three-parameter lognormal
  MME                        19.43 (7)     0.0341 (2)    28.35 (8)     0.0510 (8)    24.49 (11)    0.0437 (11)
  MMUE                       19.43 (7)     0.0345 (3)    26.48 (6)     0.0508 (7)    24.01 (9)     0.0436 (10)
  MMME                       19.05 (6)     0.0412 (8)    27.60 (7)     0.0480 (6)    24.25 (10)    0.0479 (14)
  Zero skew                  15.64 (2)     0.0388 (7)    19.25 (4)     0.0392 (2)    14.50 (1)     0.0371 (3)
Weibull
  MLE                        56.50 (16)    0.0809 (18)   55.89 (16)    0.0839 (16)   28.94 (16)    0.0588 (18)
  MME                        64.85 (18)    0.0780 (16)   59.26 (18)    0.0890 (17)   32.79 (18)    0.0540 (16)
  MMUE                       64.09 (17)    0.0790 (17)   57.64 (17)    0.0900 (18)   32.07 (17)    0.0540 (16)


Fig. 6. 1995 measured distribution of 24-h average daily PM10 concentrations in Spokane, WA, and pdf's fitted to the distribution.

Fig. 7. 1996 measured distribution of 24-h average daily PM10 concentrations in Spokane, WA, and pdf's fitted to the distribution.

a higher deviation for actual sampling than the other sample schedules at the 99th and 99.5th percentiles. As a result of the high deviation for the 1-in-2 day sampling, the corresponding pdf's had lower overall deviations than were achieved using the actual sampling. For 1-in-3 day and 1-in-6 day sampling the actual data had lower standard deviations. The standard deviations from the daily sampling for the 1-in-2 day, 1-in-3 day, and 1-in-6 day sampling for PM10, as well as the five best performing fitted pdf's


Fig. 8. 1997 measured distribution of 24-h average daily PM10 concentrations in Spokane, WA, and pdf's fitted to the distribution.

Table 6
Sample standard deviations (µg m⁻³) from the daily concentration percentile for actual less than daily sampling and pdf's fitted to the less than daily sampling for PM2.5. Figures in italic show the PM2.5 high concentration standard percentile

                                     99.5%    99%      98%      97%      95%      90%      Sum
1-in-2 day
  Actual sampling                    7.08     6.30     2.84     1.16     1.15     0.88     19.41
  Three-parameter lognormal (MMUE)   5.40     4.08     3.74     2.39     1.32     1.91     18.84
  Three-parameter lognormal (MME)    5.44     4.05     3.75     2.37     1.33     1.93     18.87
  Gamma (MMUE)                       7.28     3.08     3.52     2.22     1.31     1.55     18.96
  Gamma (MME)                        7.36     3.08     3.54     2.20     1.32     1.57     19.07
  Extreme value (MMUE)               7.93     2.76     3.58     1.90     1.36     1.77     19.30
1-in-3 day
  Actual sampling                    4.22     2.94     2.06     1.22     0.59     1.01     12.04
  Three-parameter lognormal (MMME)   3.64     4.11     3.06     2.06     1.09     2.28     16.24
  Three-parameter lognormal (MMUE)   4.71     3.64     3.41     2.11     1.10     1.85     16.82
  Gamma (MMUE)                       6.72     2.58     3.19     1.95     1.06     1.45     16.95
  Three-parameter lognormal (MME)    4.80     3.60     3.43     2.09     1.14     1.89     16.95
  Gamma (MME)                        6.86     2.56     3.22     1.90     1.07     1.49     17.10
1-in-6 day
  Actual sampling                    5.02     4.53     3.12     1.96     0.94     1.10     16.67
  Extreme value (MMUE)               8.18     3.82     4.11     2.76     2.21     2.15     23.23
  Extreme value (MME)                8.36     3.84     4.17     2.74     2.23     2.20     23.54
  Extreme value (PWME)               9.28     3.56     4.13     2.26     2.09     2.25     23.57
  Gamma (MMUE)                       7.93     4.47     4.29     3.19     2.32     1.96     24.16
  Gamma (MME)                        8.13     4.42     4.32     3.11     2.31     2.04     24.33

Table 7
Sample standard deviations (µg m⁻³) from the daily concentration percentile for actual less than daily sampling and pdf's fitted to the less than daily sampling for PM10. Figures in italic show the PM10 high concentration standard percentile

                                          99.5%    99%      98%      97%      95%      90%      Sum
1-in-2 day
  Actual sampling                         9.85     10.87    3.65     2.98     2.58     1.62     31.55
  Extreme value (MMUE)                    10.55    5.75     3.69     3.62     3.11     2.35     29.07
  Extreme value (PWME)                    11.55    6.19     3.25     3.51     2.93     2.00     29.43
  Extreme value (MME)                     10.78    5.82     3.66     3.70     3.16     2.34     29.46
  Three-parameter lognormal (MMME)        9.55     7.23     5.10     3.78     3.01     1.99     30.66
  Gamma (MMUE)                            10.89    6.15     4.24     3.83     3.17     2.39     30.67
1-in-3 day
  Actual sampling                         8.73     6.18     1.89     2.36     3.11     1.35     23.62
  Three-parameter lognormal (MMME)        5.63     4.34     2.95     1.68     1.75     1.45     17.80
  Three-parameter lognormal (MMUE)        6.80     3.33     2.49     2.11     1.86     1.31     17.90
  Three-parameter lognormal (MME)         6.94     3.32     2.43     2.17     1.92     1.33     18.11
  Three-parameter lognormal (zero skew)   5.75     5.61     3.96     1.73     1.36     1.27     19.68
  Gamma (MMUE)                            9.65     3.97     1.76     1.90     1.66     1.57     20.51
1-in-6 day
  Actual sampling                         15.02    12.24    5.18     5.09     4.02     2.56     44.11
  Three-parameter lognormal (MMME)        7.84     5.91     3.95     2.65     2.24     1.51     24.10
  Three-parameter lognormal (MMUE)        8.55     4.91     3.59     3.09     2.49     1.54     24.17
  Three-parameter lognormal (MME)         8.62     4.90     3.56     3.09     2.49     1.54     24.20
  Extreme value (MME)                     10.59    5.02     2.47     2.80     2.40     1.79     25.07
  Gamma (MMUE)                            10.61    5.04     2.90     2.76     2.29     1.84     25.44
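The comparison summarized in Tables 6 and 7 sets two estimates of a daily high percentile side by side: the percentile read directly from the less-than-daily sample, and the percentile of a pdf fitted to that sample. The sketch below illustrates the idea with a two-parameter lognormal and synthetic data. It is an assumption-laden stand-in for the paper's S-Plus fits: the distribution choice, the MLE fit, the synthetic record, and the function names are all hypothetical.

```python
import math
import random
import statistics

def rank_percentile(values, percentile):
    """Empirical percentile: sort ascending and take rank ceil(n * p)."""
    ordered = sorted(values)
    rank = math.ceil(round(len(ordered) * percentile, 9))
    return ordered[min(max(rank, 1), len(ordered)) - 1]

def lognormal_percentile(values, percentile):
    """Percentile of a two-parameter lognormal fitted to `values` by MLE."""
    logs = [math.log(v) for v in values]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)
    z = statistics.NormalDist().inv_cdf(percentile)  # standard normal quantile
    return math.exp(mu + sigma * z)

# Synthetic daily record of 360 days (hypothetical, not the Spokane data).
rng = random.Random(1)
daily = [rng.lognormvariate(2.4, 0.5) for _ in range(360)]
target = rank_percentile(daily, 0.98)   # daily 98th percentile to be recovered

for start in range(6):                  # each possible 1-in-6 day schedule
    sample = daily[start::6]
    actual_dev = rank_percentile(sample, 0.98) - target
    fitted_dev = lognormal_percentile(sample, 0.98) - target
    print(start, round(actual_dev, 2), round(fitted_dev, 2))
```

Which estimate lands closer to the daily value depends on how well the chosen distribution describes the upper tail, consistent with the mixed results the paper reports for PM2.5 and PM10.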


based on the sum of the standard deviation over the tested percentiles are shown in Table 7. All of the fitted pdf's in Table 7 had lower standard deviations over the percentile range than the actual sampling for all of the sampling schedules. The distribution of PM10 was more variable than that of PM2.5, especially at the higher concentrations, as shown by the deviations of the actual sampling, which are higher than for PM2.5.

6. Conclusions

The distributions of PM2.5 and PM10 concentrations from 1995 to 1997 in Spokane are right skewed, and the shape of the PM2.5 distribution is more peaked than that of PM10. The PM2.5 distribution had roughly the same shape for all three years, while the PM10 distribution flattened out from year to year. Since the meteorology was the same for both PM2.5 and PM10, the change in the PM10 distribution shape is most likely due to variability in sources and emission frequency.

The assumption that data sampled on a less-than-daily schedule have the same distribution as the unsampled days is not supported by the variation in the statistics relative to daily sampling. The PM2.5 arithmetic mean had a maximum error of 1.7, 3.4 and 7.7% for 1-in-2, 1-in-3 and 1-in-6 day sampling, respectively. The PM2.5 98th percentile had a maximum error of 8.8, 18 and 67% for 1-in-2, 1-in-3 and 1-in-6 day sampling, respectively. The 98th percentile used to determine the high-concentration standard for PM2.5 has a lower standard deviation for 1-in-2 day and 1-in-3 day sampling than the 99th percentile, indicating that it is more representative of the distribution. The skewness and kurtosis varied widely for the different sampling schedules. The revision of the PM10 standard that changes the minimum sampling from 1-in-6 day to 1-in-3 day should reduce the error in the reported arithmetic mean and 99th percentile measurements. For PM10, the 99th percentile showed at least double the error of the 98th percentile for all sampling schedules, as measured by the standard deviation from the daily values.
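The resampling behind these error figures can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the error is the largest absolute percent difference from the everyday statistic over the k possible starting days of a 1-in-k schedule, it uses a linear-interpolation percentile, and it runs on synthetic right-skewed data rather than the Spokane measurements. The names `percentile` and `max_percent_error` are hypothetical.

```python
import random
import statistics


def percentile(values, q):
    """Percentile q (0-100) of the values by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def max_percent_error(daily, k, stat):
    """Largest absolute percent difference between a statistic computed on a
    1-in-k day subsample and the same statistic on the full daily record,
    taken over all k possible starting days (an assumed error definition)."""
    reference = stat(daily)
    errors = []
    for offset in range(k):
        subsample = daily[offset::k]  # every k-th day, starting at `offset`
        errors.append(abs(stat(subsample) - reference) / reference * 100.0)
    return max(errors)


# Synthetic, right-skewed "daily" concentrations (NOT the Spokane data).
rng = random.Random(0)
daily = [rng.lognormvariate(2.3, 0.6) for _ in range(365)]

for k in (2, 3, 6):
    mean_err = max_percent_error(daily, k, statistics.fmean)
    p98_err = max_percent_error(daily, k, lambda v: percentile(v, 98))
    print(f"1-in-{k} day: mean {mean_err:.1f}%, 98th percentile {p98_err:.1f}%")
```

With skewed data, the high-percentile error from a sketch like this is typically much larger than the error in the mean, because a sparse schedule can miss the few high-concentration days entirely.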
For the daily PM2.5, the best overall performing pdf was the three-parameter lognormal distribution using the zero-skew estimation method. The generalized extreme value distribution using the MLE estimation method (Millard, 1997) fit the PM10 concentration distribution best. The use of pdf's fit to the less-than-daily sampling schedules to estimate the daily high concentrations produced mixed results. For PM2.5, the pdf's estimated the daily high concentrations better than the actual sampling only when 1-in-2 day sampling was used, and that was due to the high deviation of the 1-in-2 day sampling. The pdf's did a much better job estimating the high concentrations for PM10, with all of the top five pdf's producing lower standard deviations from the daily data than the actual less-than-daily sampling. The pdf's that best fit the entire distribution of data did not fit the extreme concentrations as well as other pdf's.

It has been stated that the frequency of low- to midrange PM concentrations poses the largest health threat. Using the arithmetic mean and the 98th percentile provides only limited information about the occurrence of those concentrations. The use of histograms and pdf's can enhance the understanding of the frequency of these concentrations. When trying to detect the high concentrations, whether for compliance monitoring or for health studies, there is a significant possibility of missing the events and thus significantly underestimating the high concentration with less-than-daily sampling. Given the current level of understanding of PM concentration distributions and health effects, it seems prudent to sample every day and then possibly to decrease the sampling frequency once more is known and the implications can be better understood. Further studies need to be conducted in other areas, not only with one sampler but with multiple samplers, so that the effect of spatial averaging can also be analyzed.
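The comparison between pdf-based and direct estimates of a daily high percentile can be sketched as below. This is a hedged illustration under several assumptions: it substitutes a simple two-parameter lognormal fitted by matching the mean and standard deviation of the log-concentrations for the three-parameter lognormal and generalized extreme value fits used in the paper, it assumes the tabulated deviation is a sample standard deviation about the daily percentile with an n - 1 denominator, and it uses synthetic data. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def empirical_percentile(sample, q):
    """Percentile q (0-100) of the sample by linear interpolation."""
    ordered = sorted(sample)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def lognormal_percentile(sample, q):
    """Percentile q of a two-parameter lognormal whose log-mean and
    log-standard-deviation are matched to the sample (a stand-in for the
    paper's three-parameter and extreme value fits)."""
    logs = [math.log(x) for x in sample]
    fitted = NormalDist(mu=fmean(logs), sigma=stdev(logs))
    return math.exp(fitted.inv_cdf(q / 100.0))


def deviation_from_daily(estimates, daily_value):
    """Sample standard deviation of the estimates about the daily value
    (n - 1 denominator; an assumed form of the tabulated statistic)."""
    squared = sum((e - daily_value) ** 2 for e in estimates)
    return math.sqrt(squared / (len(estimates) - 1))


# Synthetic, right-skewed "daily" concentrations (NOT the Spokane data).
rng = random.Random(1)
daily = [rng.lognormvariate(2.3, 0.6) for _ in range(365)]

k, q = 6, 98
daily_q = empirical_percentile(daily, q)
subsamples = [daily[offset::k] for offset in range(k)]

actual = deviation_from_daily(
    [empirical_percentile(s, q) for s in subsamples], daily_q)
fitted = deviation_from_daily(
    [lognormal_percentile(s, q) for s in subsamples], daily_q)
print(f"1-in-{k} day, {q}th percentile: actual {actual:.2f}, fitted pdf {fitted:.2f}")
```

Because the synthetic data here are themselves lognormal, the fitted pdf is favored in this sketch; with real measurements the outcome can go either way, as the mixed results for PM2.5 and PM10 indicate.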

Acknowledgements

The authors would like to thank the reviewers for their comments. Thanks also go to Dennis Finn for organizing and collecting the data; to Sergey Napelenok, Tara Sullivan, Wendy Sullivan, Wes Struble, Paul Voeller, Jennifer Holmes, Lynette Haller, Chris Schmidt, and Wayne Osborne for help in collecting and analyzing the samples; and to Lee Bamesberger for helping keep all the equipment running. This research was funded by the US EPA and the Washington State Department of Ecology, but this paper has not been reviewed by either agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

References

Bencala, K.E., Seinfeld, J.H., 1976. On frequency distributions of air pollutant concentrations. Atmospheric Environment 10, 941–950.
Berger, A., Melice, J.L., Demuth, Cl., 1982. Statistical distributions of daily and high atmospheric SO2-concentrations. Atmospheric Environment 16, 2863–2877.
Cohen, A.C., 1988. Three-parameter estimation. In: Crow, E.L., Shimizu, K. (Eds.), Lognormal Distributions: Theory and Applications. Marcel Dekker, New York (Chapter 4).
Cohen, A.C., Whitten, B.J., 1980. Estimation in the three-parameter lognormal distribution. Journal of the American Statistical Association 75, 399–404.
Evans, M., Hastings, N., Peacock, B., 1993. Statistical Distributions. Wiley, New York.
Federal Register, 1996. National Ambient Air Quality Standards for Particulate Matter. 40 CFR Part 50, 61, December 13, 1996, 65637–65683.
Federal Register, 1997. National Ambient Air Quality Standards for Particulate Matter: Final Rule. 40 CFR Part 50, 62, July 18, 1997, 38651–38854.
Finn, D., Rumburg, B., Claiborn, C., Bamesberger, W., Koenig, J., Larson, T., Norris, G., 2001. Sampling artifacts from the use of denuder tubes with glycerol based coatings in the measurement of atmospheric particulate matter. Environmental Science and Technology 35, 40–44.
Georgopoulos, P.G., Seinfeld, J.H., 1982. Statistical distributions of air pollutant concentrations. Environmental Science and Technology 16, 401A–416A.
Graedel, T.E., Kleiner, B., Patterson, C.C., 1974. Measurements of extreme concentrations of tropospheric hydrogen sulfide. Journal of Geophysical Research 79, 4467–4473.
Greenwood, J.A., Landwehr, J.M., Matalas, N.C., Wallis, J.R., 1979. Probability weighted moments: definition and relation to parameters of several distributions expressible in inverse form. Water Resources Research 15 (5), 1049–1054.
Griffiths, D.A., 1980. Interval estimation for the three-parameter lognormal distribution via the likelihood function. Applied Statistics 29, 58–68.
Haller, L., Claiborn, C., Larson, T., Koenig, J., Norris, G., Edgar, R., 1999. Airborne particulate matter size distributions in an arid urban area. Journal of the Air and Waste Management Association 49, 161–168.
Holland, D.M., Fitz-Simons, T., 1982. Fitting statistical distributions to air quality data by the maximum likelihood method. Atmospheric Environment 16, 1071–1076.
Horowitz, J., Barakat, S., 1979. Statistical analysis of the maximum concentration of an air pollutant: effects of autocorrelation and non-stationarity. Atmospheric Environment 13, 811–818.
Hosking, J.R.M., 1985. Algorithm AS 215: maximum likelihood estimation of the parameters of the generalized extreme-value distribution. Applied Statistics 34, 301–310.
Johnson, N.L., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, 2nd Edition, Vol. 2. Wiley, New York (Chapter 22).
Kalpasanov, Y., Kurchatova, G., 1976. A study of the statistical distribution of chemical pollutants in air. Journal of the Air Pollution Control Association 26, 981–985.
Kolmogorov, A.N., 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari 4, 83–91.
Larsen, R.I., 1969. A new mathematical model of air pollutant concentration averaging time and frequency. Journal of the Air Pollution Control Association 19, 24–30.
Larsen, R.I., 1977. A new mathematical model of air pollutant concentration averaging time and frequency. Journal of the Air Pollution Control Association 27, 454–459.
Millard, S.P., 1997. Environmental Stats for S-PLUS Help, Version 1.1. Probability, Statistics and Information, Seattle, WA.
Morel, B., Yeh, S., Cifuentes, L., 1999. Statistical distributions for air pollution applied to the study of the particulate problem in Santiago. Atmospheric Environment 33, 2575–2585.
Ott, W., 1990. A physical explanation of the lognormality of pollutant concentrations. Journal of the Air and Waste Management Association 40, 1378–1383.
Ott, W.R., Mage, D.T., 1976. A general purpose univariate probability model for environmental data analysis. Computers and Operations Research 3, 209–216.
Patel, N.R., 1973. A new mathematical model of air pollution concentration. Journal of the Air and Waste Management Association 23, 291–292.
Pinto, J.P., Stevens, R.K., Willis, R.D., Kellog, R., Mamane, Y., Novak, J., Santroch, J., Benes, I., Lenicek, J., Bures, V., 1998. Czech air quality monitoring and receptor modeling study. Environmental Science and Technology 32, 843–854.
Royston, J.P., 1992. Estimation, reference ranges and goodness of fit for the three-parameter log-normal distribution. Statistics in Medicine 11, 897–912.
Saltzman, B.E., 1987. Lognormal model for health risk assessment of fluctuating concentrations. American Industrial Hygiene Association Journal 48, 140–149.
Saltzman, B.E., 1997. Health risk assessment of fluctuating concentrations using lognormal models. Journal of the Air and Waste Management Association 47, 1152–1160.
Savoie, D.L., Prospero, J.M., 1977. Aerosol concentration statistics for the Northern Tropical Atlantic. Journal of Geophysical Research 82, 5954–5964.
Savoie, D.L., Prospero, J.M., Nees, R.T., 1987. Frequency distribution of dust concentration in Barbados as a function of averaging time. Atmospheric Environment 21, 1659–1663.
Singpurwalla, N.D., 1972. Extreme values from a lognormal law with applications to air pollution problems. Technometrics 14, 703–711.
Smirnov, N.V., 1939. Estimate of deviation between empirical distribution functions in two independent samples. Bulletin Moscow University 2, 3–16.
Sullivan, W.W., 1997. Chemical characterization of airborne particulate matter in Spokane, Washington. Masters Thesis, Washington State University.