Application of an entropy-based Bayesian optimization technique to the redesign of an existing monitoring network for single air pollutants




Journal of Environmental Management 90 (2009) 2715–2729


B. Ainslie a, C. Reuten a, D.G. Steyn a,*, Nhu D. Le b,c, James V. Zidek c

a Atmospheric Science Programme, The University of British Columbia, 6339 Stores Road, Vancouver, B.C., Canada V6T 1Z4
b British Columbia Cancer Research Centre, Vancouver, B.C., Canada V5Z 1L3
c Department of Statistics, The University of British Columbia, Vancouver, B.C., Canada V6T 1Z4

Article history: Received 6 June 2008; received in revised form 29 January 2009; accepted 21 February 2009; available online 2 April 2009.

Abstract

We apply the entropy-based Bayesian optimizing approach of Le and Zidek to the spatial redesign of the extensive air pollution monitoring network operated by Metro Vancouver, in the Lower Fraser Valley, British Columbia. This method is chosen because of its statistical sophistication, relative to other possible approaches, and because of the very rich, two-decade long data record available from this network. The redesign analysis is applied to ozone, carbon monoxide and PM2.5 pollutants. The analysis provides guidance with regard to stations monitoring the three pollutants. For both ozone and PM2.5, the analysis indicates a need for more stations in the eastern part of the monitoring domain. A parallel analysis indicates that stations may be removed from the more central parts of the domain. An analysis of the carbon monoxide network produces results that are not nearly as clearly defined as those for the other two pollutants, presumably because carbon monoxide is a primary pollutant with many locally important sources. The work demonstrates the great utility of the analysis technique, and also provides statistically defensible guidance on the spatial redesign of this important monitoring network. © 2009 Elsevier Ltd. All rights reserved.

Keywords: air pollution; carbon monoxide; entropy-based Bayesian optimization; monitoring-network design; ozone; particulate matter; spatio-temporal analysis

1. Introduction: air pollution network reviews

Most large air pollution regulatory agencies around the world operate, as part of their legislated mandate, networks of stations at which point measurements of a wide array of air pollutants are taken. Data generated by such networks are used to achieve many of the mandated objectives of such agencies. Since such agencies serve the public by providing an understanding of ambient air quality, their operation is generally subjected to more-or-less regular review, often by outside agencies. Such reviews have objectives and mandates that may be quite narrow or, in some cases, quite wide-ranging. Narrow reviews provide little more than an overview of network operation and a review of data quality and completeness over the review period, and may be required to recommend changes in operating procedures. Wide-ranging reviews provide complete overviews of all aspects of network operation, including network objectives, structure and procedures, as well as an overview of statistical characteristics of data generated during the review period. In some cases, the review might be given

* Corresponding author. Tel.: +1 604 827 5517. E-mail address: [email protected] (D.G. Steyn). doi:10.1016/j.jenvman.2009.02.016. © 2009 Elsevier Ltd. All rights reserved.

a mandate to recommend changes to the spatial distribution of stations making up the network. Metro Vancouver (formerly known as the Greater Vancouver Regional District or GVRD) and the Fraser Valley Regional District (FVRD) operate an air quality monitoring network in the Lower Fraser Valley (LFV), British Columbia, Canada (http://www.gvrd.bc.ca/air/monitoring.htm). This network was recently subjected to its third and by far most comprehensive, wide-ranging review, which took a strategic approach to address the air quality monitoring network, its objectives and operation (RWDI, 2008). The review was deemed necessary to develop a strategy that was consistent with Metro Vancouver's and FVRD's Air Quality Management Plans (http://www.gvrd.bc.ca/air/pdfs/AQMPSeptember2005.pdf) and to ensure the network contributes effectively towards achieving community, provincial and federal goals to maintain and improve air quality in the region. The scope of the network review was broad and included a network description, establishment of network objectives and priorities, development of a strategic framework, a rigorous statistical analysis of the network and its stations, and recommendations for changes to the network. The review provided a process for continual improvement to guide the on-going enhancement of LFV air quality monitoring. Research using data from the LFV network (Ainslie and Steyn, 2007) had demonstrated that the spatio-temporal patterns of some


pollutants (most notably ozone and PM) had changed over the past two decades. This discovery prompted the regulatory agencies to call for the review to provide guidance on a possible spatial redesign of the network. It is this aspect of the network review that this paper addresses. Spatial monitoring (sampling) networks, discussed in Müller (2001), Le and Zidek (2006) and Strobl and Robillard (2008), have been designed in a wide variety of ways (Dobbie et al., 2008), commonly to meet one or more specific objectives. Unfortunately, even with a clear set of network objectives, there is no universally accepted methodology for translating such objectives into a network design, with the approaches used being as varied as the regions being managed. For instance, Ignaccolo et al. (2008) use a clustering analysis of monitoring station data for three pollutants to identify a subset of representative monitoring sites and common pollutant spatial patterns. While this method is useful for identifying redundant stations, it does not appear possible to use it to identify optimal locations for new monitors. Both Elkamel et al. (2008) and Nunes et al. (2006) use statistical methods to determine an optimal network configuration given a large set of candidate sites. For Elkamel et al. (2008), the candidate sites come from mathematical modeling of pollutant dispersion from a set of smokestacks, with inter-station correlation coefficients used to quantify site importance. For Nunes et al. (2006), candidate sites arise from data obtained during an intensive field campaign, with an analysis of geostatistical variance used for site selection. In the present analysis, we are interested in identifying both redundant sites and potential new sites within an existing monitoring network.
For this reason, and because of the extensive nature of the Metro Vancouver network with its two-decade long, nearly complete data series, we use the entropy-based approach of Le and Zidek (2006) to redesign the monitoring network using hourly ozone concentrations.

1.1. Air quality in the Lower Fraser Valley, B.C.

The LFV, a roughly triangular valley situated on the western coast of North America that spans the Canada/USA border along the 49th parallel, has a population of 2.4 million, most of whom live in the city of Vancouver and its surrounding communities. While the valley is relatively flat, it is flanked by the Coast Mountain Ranges (rising to over 2000 m ASL) to the North; the Cascade Ranges (rising to over 1200 m ASL) to the South; the Strait of Georgia to the Northwest and the Juan de Fuca Strait to the Southwest (see Fig. 1). In the East–West direction, the valley narrows from a width of 100 km at its western edge to a few kilometres at its eastern boundary some 90 km inland. Several major North–South running tributary valleys join the valley to the North. The LFV comprises three administrative regions: Metro Vancouver, the Fraser Valley Regional District and Whatcom County. Air quality in the Canadian portion of the valley is monitored by a network of 27 permanent air quality stations operated by Metro Vancouver, an intermediate level of government that consists of a legislated consortium of 27 municipal governments. Once a primarily industrial area, the LFV has largely de-industrialized since the 1950s and, as a result, air quality issues in the valley have shifted from smoke and SO2 to ozone and fine particulates (Steyn et al., 2005).
Since the 1980s, there has been a general improvement in the region's air quality: summertime hourly ozone concentrations have steadily decreased, to the point where they seldom exceed the Canadian federal government "tolerable" threshold of 153 ppb (as a maximum 1-h average) and on only a few days a year do they exceed the "acceptable" limit of 82 ppb (Vingarzan and Taylor, 2003); peak wintertime CO has dropped more than 30% since the early 1990s (GVRD, 1998).

There has also been a slight decrease in measured mean and peak inhalable PM10 since continuous measurements began in 1994. Finally, there has been little change in measured respirable PM2.5 trends since continuous measurements began in 1999 (GVRD, 2002). The improvement in local air quality has been achieved despite a 50% increase in population (BC Ministry of Water, Land and Air Protection and Environment Canada, 2003), largely as a result of significant reductions in precursor emissions: a 24% decrease in annual NOx emissions, a 35% decrease in VOC emissions and a 54% decrease in CO emissions (GVRD, 2003).

1.2. Spatial redesign of existing networks

Like many monitoring networks, the GVRD air quality network was not planned as an integrated whole. Instead, it evolved into its present form over time, and the redesign was intended to maximize its capacity to meet modern demands. Unfortunately, this network modernization faces a multiplicity of objectives. For instance, for a seemingly definite goal like monitoring to detect noncompliance with regulatory standards, a number of possible objectives arise depending on how compliance is interpreted (Chang et al., 2007). Optimum designs may then be found by optimizing an objective function that embraces them all (Zhu and Stein, 2005, 2006; Sampson, Guttorp and Holland, http://www.epa.gov/ttn/amtic/files/ambient/pm25/workshop/spatial/sampsn2.pdf). Making things even more difficult, future demands on the network cannot be foreseen, and thus a complete list of specific objectives cannot be made in advance. This is a standard problem in designing long-term monitoring networks (Zidek et al., 2000). Yet the high cost of network construction and maintenance dictates that the design be rational in some sense. One way of dealing with this demand on the one hand, and the impossibility of specifying a comprehensive list of specific objectives on the other, is to ask that the new network maximize the amount of information it will provide about the environmental field it is being asked to monitor. Equivalently, it should maximally reduce uncertainty about that field. These ideas can be formalized through the use of entropy, which quantifies uncertainty and can be used as an objective function. That idea, in a general context, goes back at least as far as Lindley (1956). However, its use specifically for designing networks was first proposed by Caselton and Husain (1980) as well as Caselton and Zidek (1984).
It has also been embraced in this context by a number of other writers (Shewry and Wynn, 1987; Sebastiani and Wynn, 2000; Bueso et al., 1998, 1999; Angulo et al., 2000; Angulo and Bueso, 2001). Caselton et al. (1992) use it to rank monitoring sites (stations) for possible elimination, an idea extended by Wu and Zidek (1992). Guttorp et al. (1993) solve the complementary problem of extending a network, while Le and Zidek (1994) extend that work to a multivariate setting. While maximization of entropy provides an objective design criterion in the absence of other criteria, additional constraints can be incorporated into the method. To make this more explicit, examples of other design objectives that could be included in the analysis are:

a) costs, for example for relocation, initial purchase, and operation;
b) placing higher priority on non-compliant areas;
c) increasing monitoring in areas with greater population densities.

In this particular network review none of these objectives was included, each for specific reasons.


Fig. 1. Topography of the LFV and monitoring station locations (triangles). Two-digit codes ‘nn’ are abbreviations for ‘Tnn’. Station ‘XX’ is an abbreviation for ‘TYXX’, which was chosen as an acronym for a time series merged from three closely spaced stations in Abbotsford. Major municipalities and the Strait of Georgia are named. The border between the United States and Canada is shown as a dashed line, the coastline in solid. The coarse-resolution land mask misses some of the smaller islands in the Strait of Georgia.

a) Zidek et al. (2000) incorporate sampling costs, although that refinement was not used here for lack of good cost information.
b) In Canada, noncompliance has no immediate legal consequences and was therefore not a guiding principle for the network review. Furthermore, the fairly long ozone time series (more than twenty years) indicates that areas exceeding the Canada-Wide Standards (CWS) are moving eastward in the LFV (Ainslie and Steyn, 2007).
c) In the LFV, population density alone is not a good surrogate for expected health impacts. For example, in the smaller communities in the eastern part of the valley the population is more vulnerable to ozone because of a high percentage of people performing strenuous outdoor work (Brauer and Brook, 1997). Furthermore, weighting the design by population density would have required assumptions about future population and workforce development.

For this network review, policy makers did not provide guidelines for additional constraints. However, policy makers may in future ask for the inclusion of additional constraints resulting from different policy scenarios, and the method presented here would permit the inclusion of such constraints.

2. Application of entropy-based Bayesian optimization to the LFV network

2.1. Entropy-based design

As noted in Section 1.2, reduction of uncertainty (entropy) can be used as a design (in reality, redesign) criterion in situations where specific objectives are hard to define or may be unforeseen. We employ the general method developed in Le and Zidek (2006) in this network redesign exercise. A brief technical summary of the method is presented in Appendix 1. Here we provide a high-level description.

2.2. Air quality data in the Lower Fraser Valley

Within the network, monitoring stations are not equipped with a standard suite of analyzers, resulting in some variability in spatial

coverage of monitoring stations for the different pollutants (see Table 1 and Fig. 1). While the 18 CO, 20 O3 and 13 PM monitoring locations are relatively uniformly distributed along the East–West direction, there is much poorer spatial coverage in the North–South direction (Fig. 1).

2.2.1. Data availability, missing data, station relocations and averaging

Network monitoring began in 1982, but only the 20-year period between 1984 and 2003 was used in the analysis. The pre-1984 readings have larger uncertainties than the later observations due to less precise instrumentation. In the 20-year study period, the Abbotsford monitoring station has been relocated twice: from a downtown location (T28), to a rural airport location (T34), to its present suburban location (T33). In the present analysis, the three separate Abbotsford observational records have been treated as a single record (called TXX) by splicing together the three time series. For the short times when there was an overlap between records, the observations were averaged. No attempt has been made to adjust the observations for location changes. This likely introduces some discontinuities in the observational time series. Analyses for each pollutant were based on single daily values for each station. Calculation of these values (which depends on pollutant) is based on the proposed CWS methodologies, and is described in more detail in the appropriate sections below. The statistical analysis proceeds by first calculating daily residuals, obtained by removing a linear regional (not site-specific) model from the observational data, with year, month, day of week, week of year and maximum daily temperature at the mid-valley Environment Canada Abbotsford meteorological station (YXX) as covariates.
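This detrending step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and toy data are ours, and all covariates are entered as linear terms (an assumption; the paper treats several of them as factors).

```python
import numpy as np

def daily_residuals(values, year, month, dow, week, tmax):
    """Remove a linear regional model (year, month, day of week,
    week of year, max daily temperature at YXX) from daily
    observations, returning the residuals used in the spatial analysis."""
    X = np.column_stack([
        np.ones_like(values),          # intercept
        year, month, dow, week, tmax,  # covariates as linear terms (our simplification)
    ])
    # least-squares fit; lstsq tolerates collinear (rank-deficient) columns
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    return values - X @ beta

# toy data: 10 days at one station, with a linear trend plus noise
rng = np.random.default_rng(0)
vals = 30 + 0.5 * np.arange(10) + rng.normal(0, 1, 10)
res = daily_residuals(vals, np.arange(10), np.ones(10),
                      np.arange(10) % 7, np.ones(10), 20 + np.arange(10))
print(float(res.mean()))  # residuals of a fit with an intercept average ~0
```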
The Abbotsford station was chosen to characterize mid-valley temperatures because it is less strongly affected by marine influences than the Environment Canada meteorological station situated at the Vancouver International Airport (YVR). Relatively high temperatures at YXX are a necessary but not sufficient condition for high ozone concentrations in the LFV. Including YXX temperatures as covariates provides a simple way of removing, for example, most rainy or very windy days. On the other hand, inclusion of YXX temperatures still permits different large-scale meteorological conditions which affect ozone


Table 1. LFV monitoring network (source: GVRD, 2004).

Stn ID  Station Name         City/Municipality  Latitude    Longitude    Elev. (m)  O3 Start  CO Start  PM2.5 Start
T1      Downtown Vancouver   Vancouver          49°16′56″   123°07′19″   56         1987      1995      –
T2      Kitsilano            Vancouver          49°15′42″   123°09′48″   63         1987      1995      –
T4      Kensington Park      Burnaby            49°16′46″   122°58′16″   133        1987      1995      2003
T6      Second Narrows       North Vancouver    49°18′06″   123°01′13″   <15        1987      1995      –
T9      Rocky Point Park     Port Moody         49°16′51″   122°50′57″   <15        1987      1995      2003
T12     Chilliwack           Chilliwack         49°09′22″   121°56′26″   11         1987      1995      1995
T13     North Delta          Delta              49°09′30″   122°54′06″   111        1987      1995      –
T14     Burnaby Mountain     Burnaby            49°16′47″   122°55′20″   360        1987      1995      –
T15     Surrey East          Surrey             49°07′58″   122°41′39″   79         1987      1995      –
T17     Richmond South       Richmond           49°08′29″   123°06′30″   <15        1987      1995      –
T18     Burnaby South        Burnaby            49°12′55″   122°59′08″   145        1990      1995      2003
T20     Pitt Meadows         Pitt Meadows       49°14′43″   122°42′32″   20         1998      1998      1999
T25     Seymour Falls        North Vancouver    49°26′24″   122°58′17″   220        1987      1995      –
T26     Mahon Park           North Vancouver    49°19′26″   123°05′01″   80         1990      1995      –
T27     Langley              Langley            49°05′44″   122°34′01″   82         1987      1995      2002
T29     Hope                 Hope               49°22′11″   121°29′58″   131        1992      1996      –
T30     Maple Ridge          Maple Ridge        49°12′54″   122°34′55″   100        1998      1998      –
T31     Vancouver Airport    Richmond           49°11′11″   123°09′08″   <15        1998      1998      1999
T32     Coquitlam            Coquitlam          49°17′17″   122°47′29″   61         2001      2000      –
T35     Horseshoe Bay        West Vancouver     49°22′05″   123°16′35″   30         –         2002      2002
T28     Abbotsford Downtown  Abbotsford         49°02′58″   122°17′33″   80         –         –         –
T33     Abbotsford           Abbotsford         49°02′34″   122°18′35″   80         –         –         –
T34     Abbotsford Airport   Abbotsford         49°01′31″   122°20′38″   58         –         –         –
TYXX    Abbotsford           Abbotsford         49°02′34″   122°18′35″   80         1992      2000      2002

concentrations to various degrees. Including year as a covariate removes long-term trends for the entire region, e.g. those caused by changes in background ozone. Month and week of year can account for some seasonal trends. Finally, day of week is a surrogate for emission variations, for example due to different commuting patterns on weekdays and weekends (Pryor and Steyn, 1995). The observational data sets are 95%–97% complete. However, application of the entropy-based design requires complete data sets for each station once it begins operation (although stations need not have concurrent start times: staircase patterns of observational data are admissible when the sites are arranged from those with the shortest monitoring histories up to those with the longest; an example is shown in Table 2). Initially, gap filling of missing data was attempted based on previous, subsequent, and various mean values. However, without adding randomness to each gap-filled datum, autocorrelations and spatial correlations were substantially increased and variances underestimated. Consequently, stations at locations with missing data were removed during the course of the network redesign. On the other hand, with

increasingly realistic variations added to the deterministic values, agreement with the known withheld values substantially deteriorated. Because of these gap-filling problems, it was decided to use only days that had valid readings from all operating stations. Finally, while networks can be designed with many objectives in mind, it was decided that this analysis would focus on designing a network capable of resolving surface pollutant fields on those days that have the potential for elevated concentrations. For each pollutant, such 'candidate' days are determined in terms of various meteorological conditions. It should be mentioned that while meteorological conditions are important determinants of local air quality, additional factors also play important roles in influencing local concentrations; as such, 'candidate' days, while tending to have higher than average mean concentrations, are not exclusively days with extreme or exceedance-level concentrations.

2.2.1.1. Ozone. Of the 27 stations that reported ozone measurements during the 20-year period from 1984 to 2003, 6 stations did not fit into the staircase pattern required by the entropy design

Table 2. Scheme of the staircase pattern of time series t1, …, t10 for stations S1–S7. Available data are denoted by '#'. The staircase pattern requires complete data underneath the staircase, indicated by the bold line. On the left, data are missing at stations S3 and S5. On the right is the cleaned-up matrix, with t8 removed and the time series for S3 shortened and moved to the right of S6.

[The schematic '#' grids of Table 2 are not reproducible in plain text; the caption describes both panels.]
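The staircase arrangement and the day-screening rule described above can be sketched as follows. This is a minimal illustration: function and variable names are ours, and the toy records are invented.

```python
import numpy as np

def staircase(data, start):
    """Arrange station columns so that start dates are monotone (most
    recent start first), giving the 'staircase' pattern required by the
    entropy design; then keep only days on which every station already
    operating reported a valid value.

    data  : dict station -> 1-D array with np.nan before start or when missing
    start : dict station -> index of the first day with data
    """
    # shortest monitoring history (latest start) first
    order = sorted(data, key=lambda s: start[s], reverse=True)
    X = np.column_stack([data[s] for s in order])
    keep = []
    for t in range(X.shape[0]):
        operating = [i for i, s in enumerate(order) if start[s] <= t]
        # a day is kept only if all operating stations have valid readings
        if operating and not np.isnan(X[t, operating]).any():
            keep.append(t)
    return order, X[keep, :]

# toy example: S1 starts on day 0 but misses day 5; S2 starts on day 3
d = {"S1": np.array([1.0, 2, 3, 4, 5, np.nan, 7, 8]),
     "S2": np.array([np.nan] * 3 + [40.0, 50, 60, 70, 80])}
order, X = staircase(d, {"S1": 0, "S2": 3})
print(order, X.shape)  # S2 (shorter record) is placed first; day 5 is dropped
```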


theory (Table 2) and had to be dropped. Station T14 on Burnaby Mountain was also dropped: because the station is roughly 310 m above the average valley bottom, its correlations with stations in the valley are much lower than correlations among valley stations. The remaining time series from 20 stations were arranged in a staircase pattern. To assess the spatial structure of the ozone footprint on days when ozone concentrations arise from local photochemical production and not from intrusions of stratospheric ozone (Bovis, 2003), candidate days were chosen on the basis of: days between April 1 and September 30; no or low daily rainfall (<2.5 mm measured at YVR); and maximum temperature at the inland mid-valley Abbotsford meteorological station (YXX) greater than 24.7 °C. This last value was chosen because Burrows et al. (1995) show, using a classification and regression tree, that high (peak ozone > 40 ppb) and low (peak ozone < 40 ppb) days in the LFV can be distinguished by this threshold. There were a total of 751 such days over the 20-year period. Following the proposed CWS ozone standard, for each candidate day a single maximum 8-h average concentration was calculated. This method involved first calculating 24 separate 8-h averages for each day (where an 8-h running mean was calculated if at least 75% of measurements were valid); then, for each day, the maximum running mean was calculated if at least 75% (18 out of 24) of the 8-h mean values were available or if the maximum of all available 8-h means exceeded 65 ppb. A total of 233 days were used in the ozone analysis, and the final data set starts in 1987.
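The daily ozone metric described above can be sketched as follows. Names are ours, and holding the day plus the first 7 h of the next day in a single 31-value array (so that all 24 running windows are defined) is our simplification.

```python
import numpy as np

def daily_max_8h(hourly):
    """CWS-style daily ozone metric: from the 24 overlapping 8-h windows
    (one starting each hour; `hourly` holds 31 values, the day plus the
    first 7 h of the next day, with np.nan for missing), compute a window
    mean only if at least 6 of its 8 h (75%) are valid; report the daily
    maximum if at least 18 of the 24 window means exist, or if any valid
    window mean exceeds 65 ppb."""
    means = []
    for h in range(24):
        w = hourly[h:h + 8]
        if np.sum(~np.isnan(w)) >= 6:   # 75% of the 8 hours valid
            means.append(float(np.nanmean(w)))
    if not means:
        return None
    if len(means) >= 18 or max(means) > 65.0:
        return max(means)
    return None

obs = np.full(31, 30.0)
obs[10:18] = 70.0         # an 8-h afternoon peak
print(daily_max_8h(obs))  # -> 70.0
```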
The ventilation index (VI), forecast by Environment Canada twice daily for the Vancouver International Airport station, is a number between 1 and 100 based on the product of model-predicted mixing depth and surface wind speed, with a calculated VI below 33 being classified as 'poor'. Archived VI values extend back only as far as 1995, and that determined the CO analysis start date. A total of 575 winter days between 1995 and 2003 had a forecast poor morning VI. Guided by the CWS CO standard, a single maximum 1-h CO measurement was calculated for each day. A maximum hourly CO concentration was calculated if at least 75% of all hours (18 out of 24 h) had a valid measurement or if the 1-h maximum exceeded 35 ppm. Only 474 days of the 575 candidate days had valid observations from all stations in operation.

2.2.1.3. PM2.5. For the PM2.5 analyses, daily 24-h average concentrations were used. Consistent with the CWS, for each day the 24-h average was calculated if at least two-thirds of all hourly measurements (16 out of 24 h) were valid or if the 24-h average exceeded 25 µg m⁻³. For example, on a day with 16 valid hourly measurements, the mean over those 16 valid hourly measurements was calculated and included in the data set. On a day with 15 valid hourly measurements, the mean was calculated over those 15 valid hourly measurements and was included in the data set only if it exceeded 25 µg m⁻³. Two separate PM2.5 analyses were undertaken: a winter and a summer analysis. Local studies show the primary seasonal maximum of PM2.5 concentrations occurs during summer and early fall (BC Ministry of Water, Land and Air Protection and Environment Canada, 2003), with typical extreme values in the range 15–20 µg m⁻³ and in rare cases exceeding the CWS of 25 µg m⁻³. This seasonal maximum is believed to be caused by secondary PM2.5, a result of photochemical processes similar to those leading to high ozone

concentrations (Pryor and Steyn, 1994, 1995). We therefore used ozone candidate days for the summer PM2.5 analysis. Locally, a secondary seasonal PM2.5 maximum is also reached in late winter and early spring (BC Ministry of Water, Land and Air Protection and Environment Canada, 2003), with slightly lower maxima (10–20 µg m⁻³), in line with PM10 observations (McKendry, 2000). There is strong evidence that high winter PM2.5 concentrations originate from local sources like wood burning (Larson et al., 2007) and therefore behave similarly to CO. For the winter PM2.5 analyses we therefore used the CO poor-VI days (VI < 33). The summer PM2.5 data set comprised 320 days from 1995 to 2003, beginning with one station in 1995 and ending in 2003 with nine stations. Three more stations measured PM2.5 during this period but were discontinued before 2003 and could therefore not be included in the analysis. The winter PM2.5 data set contained 272 days, with between 2 and 9 stations, from 1999 to 2004.

2.3. Implementation

A numerical implementation of the methods of Le and Zidek (2006)¹, written for the R statistical programming language (Hornik, 2007), was used to examine the network redesign for O3, CO and PM2.5 under the following scenarios: adding one, two, and three new stations; removing one, two and three stations; and moving one, two and three existing stations. For each pollutant, an exploration of the data was first conducted to assess the suitability of the chosen covariates and the normality of the residuals. It was found that residual normality could be improved by applying the Box–Cox transformation (Box and Cox, 1964) to the daily observations, y = (x^λ − 1)/λ, where λ was chosen to be 0.75 for O3 and PM2.5 and 0.5 for CO. For each pollutant, time series data were sorted according to start date, with data from the station with the most recent start date first.
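The Box–Cox step can be sketched as follows (the function name is ours; the λ values are those quoted above):

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transform y = (x**lam - 1)/lam for lam != 0;
    lam = 0.75 was used for O3 and PM2.5, lam = 0.5 for CO."""
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log(x)  # the lam -> 0 limit of the transform
    return (x ** lam - 1.0) / lam

print(box_cox([1.0, 16.0], 0.5))  # (16**0.5 - 1)/0.5 = 6, so [0., 6.]
```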
The unobserved data were interpolated through the hierarchical framework as described by Le and Zidek (2006), and the estimated (unconditional) spatial correlations between all monitoring (gauged) sites were calculated. Implementing that framework is simplified by transforming the geographic locations into coordinates in a new space called the "dispersion space" (D), where the environmental field's spatial covariance for responses at any two locations depends only on the distance between the locations in D and not on direction (thereby making the field stationary and isotropic). That is, using the spatial correlation matrix and site coordinates, geographic space was re-configured (transformed) into dispersion space, wherein inter-station correlations follow an isotropic model. Station re-distribution was achieved by an alternating iterative algorithm, which first tries to optimally relocate the stations using a multidimensional scaling method, followed by a fitting of the resulting D-space correlations using an exponential model (variogram). Two hundred iterations usually led to convergence, with negligible variations on the order of 10⁻⁴; iterations were stopped once changes fell below a tolerance of 10⁻⁵. After convergence, a thin-plate smoothing spline was fit between the original locations and the new locations in order to reduce the amount of folding in the coordinate transformation and avoid numerical instabilities. For all pollutants, it was found that a smoothing value of 50 (called 'lambda' in the code) was necessary. This procedure reduces the large number of unknown covariances to far fewer hyperparameters describing the transformation and exponential fit. Next, a regular grid of 30 × 10 ungauged sites was laid out, fitting within the existing east–west and north–south extent of the

¹ Source code available at: http://enviro.stat.ubc.ca/statisticalanalysis/.


monitoring network. This grid of ungauged sites was chosen with a grid spacing of approximately 6 km, smaller than the mean inter-station spacing in the entire network. The method of Sampson and Guttorp (1992) was then used to extend the spatial correlation matrix from the gauged sites to the new ungauged sites by: first estimating station correlations at the new locations using the thin-plate spline and the fitted variogram; then estimating variances at these locations; and finally combining the estimated variances with the estimated correlations. Results from the SG analysis were combined with the spatial correlations at the gauged sites to obtain hyperparameter estimates at the new ungauged sites. Hyperparameters were used to spatially interpolate pollutant mean and variance fields through the generation of 1000 realizations of the predictive pollutant distribution at the new ungauged sites. To examine network redesign, a subset of potential sites was chosen from the 30 × 10 grid of ungauged sites. These potential sites were chosen to lie along the valley floor, since the boundary-layer characteristics and the processes affecting pollutant advection, chemical transformation and removal at higher altitudes and over sloping terrain can be substantially different than at the valley floor (e.g. McKendry and Lundgren, 2000; Reuten et al., 2005), making a comparison with observations over the valley bottom difficult or even impossible. This, for example, is why air quality data from Burnaby Mountain (T14) correlate much more weakly with data from any other station in the valley than those other valley stations correlate with each other. Furthermore, operating stations outside of the valley bottom may not be technically feasible or may be too expensive. Fig. 2 gives an example of the regular grid of ungauged sites along with the subset of potential sites for ozone.
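For a Gaussian field, the entropy criterion used in the selection step reduces to a log determinant of the spatial covariance of the chosen sites. A brute-force sketch of adding stations under that criterion follows; the function name and the toy covariance values are ours.

```python
import numpy as np
from itertools import combinations

def best_additions(cov, gauged, candidates, k):
    """Among all k-subsets of candidate (ungauged) sites, return the one
    whose addition to the gauged network maximizes the log determinant of
    the joint spatial covariance, i.e. the entropy of the resulting network."""
    best, best_ld = None, -np.inf
    for subset in combinations(candidates, k):
        idx = list(gauged) + list(subset)
        sign, ld = np.linalg.slogdet(cov[np.ix_(idx, idx)])
        if sign > 0 and ld > best_ld:
            best, best_ld = subset, ld
    return best

# toy 4-site covariance: sites 0-1 are gauged; site 3 is nearly a copy of
# site 0, while site 2 is weakly correlated with everything, so adding
# site 2 contributes more new information
cov = np.array([[1.0, 0.5, 0.1, 0.9],
                [0.5, 1.0, 0.1, 0.5],
                [0.1, 0.1, 1.0, 0.1],
                [0.9, 0.5, 0.1, 1.0]])
print(best_additions(cov, gauged=[0, 1], candidates=[2, 3], k=1))  # -> (2,)
```

Removal works the same way in reverse: evaluate the log determinant of the network that remains after deleting each candidate subset and keep the configuration with the largest value.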
For a specified number of sites, the design approach in this paper seeks the subset of sites, presently monitored or unmonitored, about whose future responses uncertainty (entropy) is greatest. Monitoring those sites in future would then maximize the amount of information the network provides, by eliminating their uncertainty. Some of the existing monitoring sites might not make it into that subset. Note that the entropy associated with

Fig. 3. Entropy associated with the network that results from dropping one ozone station. Stations are sorted from left to right in order of increasing entropy. Entropy is shifted so that the station with the lowest entropy (T29) is assigned a value of 0. Dropping stations with greater entropy implies less information loss than dropping stations with smaller entropy.

a subset of site responses turns out to be a function of the log determinant of their spatial covariance matrix. To determine the optimal location of new stations, log determinants of the station covariance matrices were computed for all combinations of the new stations (one, two, or three, depending on the scenario), and the station combination with the largest log determinant was chosen as the optimal configuration. Removing existing stations and determining new locations can be combined to optimize station relocation: in a first step, stations that contribute minimal information (whose removal leaves a network with maximal entropy) are removed; in a second step, new stations are added to this reduced network at ungauged sites in such a way that the entropy of the resulting network is maximal. Effectively, this relocates the removed stations to the new optimal sites. To determine which stations to remove, the entropy with one, two, or three stations removed was calculated for all possible combinations. The combination with the largest entropy was used to identify potentially redundant stations and also to seed a repeat of the above analysis aimed at identifying optimal sites to which existing stations should be moved. As an example, Fig. 3 shows the entropy (shifted to yield zero for the lowest value) associated with

Fig. 2. Grid layout of potential station locations (circles) for ozone analysis. The region enclosed by the existing monitoring stations was divided into 300 regular grid locations. From these, mostly locations over land near sea level were selected as potential future monitoring locations.


Fig. 4. Dispersion of ozone covariance before (o) and after (+) station relocation in D-space. Also shown is the exponential model for the dispersion.

dropping each of the stations that measure ozone, sorted by increasing entropy. Entropy is greatest when dropping T15 (Surrey East), followed by T04 (Kensington Park), which implies minimal information loss from dropping these stations. The stations providing maximal information to the network are all at the edges of the network: T29 (Hope), T31 (Vancouver Airport), and T25 (Seymour Falls). This is plausible if all stations make measurements of similar quality and are reasonably well distributed; in such a situation, points near the domain edge have fewer nearby neighbours to provide information on the air pollution field.
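The drop-one ranking behind Fig. 3 reduces to computing the log determinant of the covariance of the stations that remain after each candidate removal. A sketch with an invented one-dimensional exponential covariance (not the paper's fitted model):

```python
import numpy as np

def drop_one_ranking(cov):
    """For each gauged site, compute the log determinant (entropy up to
    additive constants) of the network remaining after dropping that
    site.  Sites whose removal leaves the largest remaining entropy are
    the most redundant."""
    p = cov.shape[0]
    scores = np.empty(p)
    for i in range(p):
        keep = [j for j in range(p) if j != i]
        _, scores[i] = np.linalg.slogdet(cov[np.ix_(keep, keep)])
    return np.argsort(scores)[::-1]   # most redundant first

# Synthetic transect: site 1 sits close to sites 0 and 2; site 3 is remote.
x = np.array([0.0, 0.5, 1.0, 4.0])
cov = np.exp(-np.abs(x[:, None] - x[None, :]))
order = drop_one_ranking(cov)
print(order[0])
```

Here the crowded middle site (index 1) is flagged first, mirroring the finding that interior stations with close neighbours are the most redundant, while isolated edge sites carry the most information.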

3. Results

Fig. 4 shows the dispersion of the ozone covariance matrix after the D-space reconfiguration. Dispersion (2 − 2 × correlation) is used instead of the usual variogram to emphasize that the spatial correlation structure in geographic space may not be isotropic. Also shown is the exponential model for the dispersion. The redistribution has greatly improved the stationarity of the ozone covariance, since the points now cluster much closer to the fitted exponential model than in the corresponding geographic-space plot (not shown for brevity). The corresponding relocation for each station is given in Fig. 5, where it can be seen that the four coastal stations (T31, T17, T01


and T02) are moved in a southeast direction, while the most eastern stations (T29, T12 and TXX) are moved to the northwest. It is not surprising that the coastal stations T01, T02 and T17 were moved to similar locations: all three are strongly influenced by local vehicular traffic, especially during the morning and afternoon rush hours. However, station T31, located at the Vancouver International Airport, a site less influenced by traffic emissions and more strongly influenced by the winds along Georgia Strait (Fig. 1), is moved away from all of the other stations, suggesting its ozone time series is weakly correlated with the rest of the network. The redistribution also pulls the easternmost stations together, suggesting their ozone time series exhibit stronger statistical relationships than their physical separations imply. As an additional check on the analysis, interpolated mean fields on selected days were examined against hand-drawn fields based on observations. Fig. 6 shows contours of maximum 8-h ozone concentrations for August 13th, 2001, during the Pacific 2001 intensive field campaign (reference), a day when ozone concentrations exceeded the 1-h National Ambient Air Quality Objective (NAAQO) of 82 ppb. Because the network of ungauged sites is set up within the footprint of the existing network, no interpolation is made across the international border running through the middle of the valley. The ozone field shows a mid-valley maximum, east of the metropolitan Vancouver area. This pattern of ozone concentrations is typical of observed and modeled 1-h averaged ozone fields experienced during summer days conducive to local photochemical ozone production.
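For reference, the dispersion plotted in Fig. 4 follows directly from the inter-station correlations. The sketch below uses synthetic stations, and a simple grid search stands in for the actual variogram fitting:

```python
import numpy as np

# Dispersion d_ij = 2 - 2*rho_ij: 0 for perfectly correlated sites,
# 2 for uncorrelated ones.
def dispersion(corr):
    return 2.0 - 2.0 * corr

def fit_range(h, disp, r_grid):
    """Least-squares grid search for the range r of an exponential
    dispersion model 2*(1 - exp(-h/r)); a simple stand-in for the
    variogram fitting used in the paper."""
    sse = [np.sum((disp - 2.0 * (1.0 - np.exp(-h / r))) ** 2)
           for r in r_grid]
    return r_grid[int(np.argmin(sse))]

# Synthetic stations on a line with a true correlation range of 3.0.
x = np.linspace(0.0, 10.0, 6)
h = np.abs(x[:, None] - x[None, :])
corr = np.exp(-h / 3.0)
d = dispersion(corr)

iu = np.triu_indices(6, k=1)                     # unique station pairs
r_hat = fit_range(h[iu], d[iu], np.linspace(0.5, 6.0, 56))
print(r_hat)  # recovers a range close to the true value of 3.0
```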
For the 1-h averaged ozone fields, the location of the maximum is typically found between mid-valley and the eastern edge of the network and appears to be sensitive to: the local meso-scale wind regime (Ainslie and Steyn, 2007); the ratio of upwind VOC to NOx emissions (Ainslie, 2004); the reactivity of the VOC emissions; and local meteorological processes that facilitate the day-to-day carryover of both precursor emissions and oxidants (McKendry et al., 1997, 1998a).

3.1. Ozone network redesign

For the one-station scenario of the ozone network redesign (Fig. 7), the analysis suggests removing the suburban T15 (Surrey

Fig. 5. Redistribution of stations in D-space. Original locations (arrow tails) and isotropic locations (arrow tips). Also shown are the local coastline and the Canada/US border.


Fig. 6. Interpolated maximum 8-h ozone concentrations (top) and standard deviations (bottom) in ppb over the Canadian portion of the LFV on August 13, 2001.

East) site and moving it up-valley between the two most eastern stations, T29 (Hope) and T12 (Chilliwack). If an additional station is to be added to the network, it should also be placed between these two eastern stations. For the two-station scenario, both the suburban T15 and the urban T04 (Kensington Park) stations should be removed. The suggested new locations for these stations are mid-valley, near the town of Mission, and again between the two easternmost stations. Under this scenario, however, locations for two additional stations do not coincide with the locations of the relocated stations: if two stations are to be added, both should be placed between the two easternmost stations. For the three-station scenario, urban stations T04 and T18 as well as suburban station T15 should be removed. One of the stations should be moved mid-valley to Mission and the other two between the easternmost stations. These relocation sites are identical to the sites chosen for the addition of three new stations. The suggested removal of stations T15 (Surrey East), T04 (Kensington Park) and T18 (Burnaby South) is in line with the associated large entropies discussed earlier and shown in Fig. 3.
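The scenario searches above amount to exhaustive maximization of the log determinant over candidate subsets. A self-contained sketch, with a toy one-dimensional exponential covariance in place of the estimated Sampson-Guttorp extension:

```python
import itertools
import numpy as np

def best_additions(cov, gauged, candidates, n_add):
    """Choose n_add candidate sites whose addition to the gauged set
    maximizes the log determinant of the joint spatial covariance,
    and hence the entropy of the resulting network."""
    best_logdet, best_subset = -np.inf, None
    for subset in itertools.combinations(candidates, n_add):
        idx = list(gauged) + list(subset)
        sign, logdet = np.linalg.slogdet(cov[np.ix_(idx, idx)])
        if sign > 0 and logdet > best_logdet:
            best_logdet, best_subset = logdet, subset
    return best_subset, best_logdet

# Toy field: exponential covariance on a 1-D transect of 6 sites.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)
gauged = [0, 2]            # existing stations
candidates = [1, 3, 4, 5]  # ungauged grid sites
subset, logdet = best_additions(cov, gauged, candidates, 1)
print(subset)  # the site farthest from the gauged pair adds most entropy
```

In this configuration the most distant candidate (index 5) is selected, echoing the paper's tendency to add stations where correlations with the existing network are weakest.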

3.2. Carbon monoxide network redesign

For the winter CO analysis, it was found that an isotropic configuration of stations in D-space could not be achieved with the inclusion of the two most eastern stations, T29 and T12, revealing a possible limitation of the original Sampson-Guttorp approach, as suggested by an anonymous referee. Newer versions of the approach may overcome this limitation, but software for these was not available for this analysis. For expediency, we therefore dropped these two stations from the analysis, thereby limiting the pool of available sites in the eastern part of the valley. Examination of the estimated covariance matrix showed that both of these stations have low, and sometimes negative, correlations with the other network stations. The reduced analysis (see Fig. 8) gives mostly consistent results for moving or adding one, two, or three new stations. For all scenarios, most information about the CO field can be gained by placing new stations along the Canada/US border near the mouth of


Fig. 7. Addition (circles), removal (crossed out triangles) and relocation (pluses) of existing ozone stations (triangles) for the ozone three-station scenario.

the valley. For both the one- and two-station scenarios, locations for the addition and movement of stations are identical. In all scenarios, a site near a busy ferry terminal and a large container/coal port is identified, perhaps in response to the potentially large CO emissions arising from the loading and unloading of passenger ferries and the movement of vehicles and ships associated with the container and coal port. As with the ozone analysis, the suburban T15 site is the first to be identified for either removal or transfer, followed by the urban T18 and T04 sites.

3.3. PM2.5 network redesign

The summer PM2.5 analysis (Fig. 9) points to a combination of new sites in the western half of the airshed along the US border and sites in the eastern part of the valley, for the addition of three new

monitors. The result appears to be a mixture of the winter CO results (western locations potentially driven by local sources) and the summer O3 results (eastern locations driven by photochemistry coupled with advection). The substantial differences between moving stations and adding stations are a further indication that the current PM2.5 monitoring network is not robust, and that the period of record is too short to support this kind of analysis. This is further corroborated by the analysis's suggestion to move two stations side by side, while one of two new stations would be best located at the north-eastern corner of the region. The three stations that can be moved with the least impact, T20 (Pitt Meadows), T35 (Horseshoe Bay), and T27 (Langley), are among the four stations with the highest entropy. Notice that T35 (Horseshoe Bay) is at the edge of the gauged region, so moving it would not be an expected outcome.

Fig. 8. As Fig. 7 but for CO.


Fig. 9. As Fig. 7 but for PM2.5 on spring and summer days with conditions conducive to high ozone concentrations.

The relatively low information content of this site may be caused by its low readings, primarily of background concentrations, and a lack of substantial local sources of PM2.5. The same seems to apply to T27 (Langley).

3.3.1. Winter PM2.5

The analysis also had difficulties with wintertime PM2.5. In the LFV, high winter PM2.5 levels appear to be associated with local emission sources (Larson et al., 2007), which leads to greater spatial variability. If we use the standard 30 × 10 grid of sites lying within the existing network footprint, the analysis produces low negative correlations at several of the ungauged sites. This anomaly may well stem from the configuration of the existing network stations. For PM2.5, the most westerly station (T35) is also the most northerly

(see Fig. 10), and it is about 24 km further north than the most easterly station (T12). With fewer stations in the eastern part of the domain, the analysis had to infer concentrations at the ungauged sites in the northeast corner of our grid with little support from nearby stations. Given the weak inter-station correlations for winter PM2.5, this was not possible with the model. The analysis was therefore carried out with the grid of ungauged sites restricted to a smaller footprint within the existing network. For the three-station scenario, the analysis suggests removing the easternmost station (T12) along with two stations within the metropolitan Vancouver area (T09 and T18) and placing new stations along the coast to the south of Vancouver and along the US/Canada border. While only one location, near the region's major ferry terminal and coal shipping port, is chosen for both station

Fig. 10. As Fig. 7 but for PM2.5 on winter days with poor forecast ventilation index.


relocation and station addition, all of the new locations target the southern part of the valley close to the coast. In agreement with the entropies of the stations (not shown), dropping T09 followed by T18 removes the least amount of information from the network. Accordingly, these two stations are moved in the two- and one-station scenarios (not shown). New station locations of moved stations and of added stations agree much better in the two- and one-station scenarios than in the three-station scenario, suggesting that the former are more robust than the latter. Our interpretation is that there is not enough information in the data to confidently infer the relocation of more than two stations. The optimal new locations in the two- and one-station scenarios agree with the two southwestern "add" locations just north of the Canada/US border in the three-station scenario.

4. Discussion and conclusions

For ozone, new stations should be located in the eastern part of the valley and stations could be removed from the central and western parts of the valley. However, this analysis has not examined the need for placing stations away from the developed valley areas. Observations by McKendry et al. (1997, 1998b) in some of the region's tributary valleys show the potential for higher ozone concentrations there than in the main valley. Thus, any network design must decide whether the network monitoring objective is to resolve the region's total oxidant burden or to assess public exposure. The winter CO analysis showed that this analysis framework cannot deal with observations from poorly correlated stations. This may not be a shortcoming of the methods but perhaps a reflection of the inadequacy of the network. Of the two stations that were removed from the CO analysis, the Hope (T29) station is in a relatively undeveloped location, far from Metro Vancouver and not strongly influenced by local sources. However, the anomalous behaviour of the Chilliwack station is harder to understand, given that it is located within a few kilometres of the major freeway passing through the valley. The poor-correlation problem may also be confounded with our choice of poor-ventilation days for the wintertime CO analysis: meteorological dispersion on these strongly stable days is weak and highly variable. Extending the set of "candidate days" to include meteorological conditions with lower static stability might help in this regard. Summer PM2.5 results are similar to those for ozone: add stations up-valley and remove suburban stations that represent PM2.5 concentrations in the city of Vancouver. These placements highlight the influence of photochemical processing and advection on the regional summertime PM2.5 footprint. However, station placements are not as far east, suggesting the importance of different chemical processes and emission sources for PM2.5 formation.
The winter PM2.5 results stress the need for more stations within the developed urban areas at the valley mouth, suggesting wintertime PM2.5 concentrations are more strongly influenced by local emission sources. The placement of additional stations in the Richmond and Delta regions is consistent with the analysis of Larson et al. (2007), who find high nighttime PM2.5 "hotspots", which they attribute to residential wood-burning stoves, in the Richmond and south Surrey areas. Because of the small number of stations within the PM2.5 network, some care should be taken when interpreting the PM2.5 results. For instance, the winter three-station results do not necessarily imply that the Chilliwack (T12) station is redundant and should be removed, but rather that, given the sparse network, it contributes the second-lowest amount of new information about what appears to be a largely urban-generated PM2.5 plume.


In general, the method seems to prefer putting stations primarily along network boundaries and secondarily near concentration maxima. This appears reasonable given the algorithms used. Locations near the network boundaries tend to be surrounded by fewer stations and to be less correlated with the existing network; therefore the addition of stations at these locations tends to add more information than at locations in the centre of the network region. Locations near maxima tend to have greater absolute variances than other locations, so the addition of stations near hotspots likely adds more information to the network than stations added at locations with typically lower pollutant concentrations. We found the analysis was sensitive to gap-filling schemes. To get around this, we used only observations from days when all stations had valid readings. By doing so, we have thrown out potentially useful observations; to compensate, we have included observations over a longer period. While the removal of systematic time components (including trends) will at least partially compensate for any possible temporal changes, it is possible that the inclusion of events from too long ago will bias station placements toward locations that no longer warrant them. This concern highlights the importance of performing network reviews on a regular basis. While large amounts of the observational data have not been used, due to missing data and difficulties arising from gap-filling procedures, these omissions are unlikely to alter the station placements or relocations for two reasons. First, the analysis gives similar results when only a subset of the data is used. Second, in the coastal Lower Fraser Valley setting, elevated pollution levels occur under a relatively narrow set of meteorological conditions; this has been well documented for summertime ozone at both the synoptic scale (McKendry, 1994; Pryor et al., 1995) and the meso-scale (Taylor, 1992; Ainslie and Steyn, 2007).
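The boundary-preference argument can be checked numerically in a toy setting (isotropic exponential covariance on a transect; the station positions are synthetic):

```python
import numpy as np

def logdet_at(sites):
    """Log determinant (entropy up to constants) of an exponential
    spatial covariance evaluated at the given 1-D site positions."""
    s = np.asarray(sites)
    cov = np.exp(-np.abs(s[:, None] - s[None, :]) / 2.0)
    return np.linalg.slogdet(cov)[1]

gauged = [1.0, 2.0, 3.0, 4.0]

# Entropy gain from adding a station beyond the network edge versus
# into an interior gap between existing stations.
gain_edge = logdet_at(gauged + [5.0]) - logdet_at(gauged)
gain_interior = logdet_at(gauged + [2.5]) - logdet_at(gauged)
print(gain_edge > gain_interior)
```

The edge site, having fewer correlated neighbours, yields the larger entropy gain, consistent with the method's preference for boundary locations.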
Future analyses should try to use several pollutants at once so that the total information content is increased through cross-pollutant correlations. Care needs to be taken, however, not to increase the variance by combining pollutants that are poorly correlated. Furthermore, such an approach produces design suggestions for combined pollutant measurements, while separate pollutant analyses would likely suggest different locations for different pollutants. This analysis is based on observations taken in the past. However, future changes in emissions, the region's population, its industry, transportation and climate are likely to affect the network design. For instance, in this region there already appears to be a decade-long shift in the ozone footprint, with ozone maxima now more common over the far eastern edge of the valley than at mid-valley, as was common in the mid-1980s. Research is being undertaken to determine how the ozone footprint has been affected by changes in emission levels and locations, potential changes in the reactivity of the hydrocarbon emissions, and changes in the relative abundance of NOx to VOC emissions. Outcomes from this research will help inform the network design with respect to the future.

Acknowledgments

This work was supported by funds from the Canadian Foundation for Climate and Atmospheric Sciences and the Natural Sciences and Engineering Research Council of Canada. The entire review process (of which this work represents a small part) was managed and performed by RWDI AIR Inc. under contract. Metro Vancouver kindly made their network data available to DGS and his collaborators for research purposes. We thank the four anonymous


reviewers for their suggestions, which greatly improved the manuscript.

Appendix 1

A.1. Entropy-based network redesign

Le and Zidek (2006) describe a general theory whereby the reduction of uncertainty (entropy) is used as a criterion for the redesign of environmental monitoring networks. We present a brief description here to make our results interpretable. The entropy of a random vector Y, with f its joint probability density, measures the vector's total uncertainty and is defined as

H(Y) = E[-log(f(Y)/h(Y))]

where h(.) is a possibly non-integrable reference density (see Jaynes, 1963). Its inclusion makes H(.) invariant under transformations of the scales of response measurements. Over affine spaces it is commonly taken to be identically 1, in the units in which elements of the random vector are measured. The entropy-based design selects, from all p sites, the subset of g sites that yields the highest entropy, with Y representing pollution levels at the p sites. For example, Caselton et al. (1992) sought to optimally partition the column response vector Y so that (after re-labelling its coordinates) it could be written as (Y_f^(1)', Y_f^(2)')', where Y_f^(1) and Y_f^(2) correspond to the future sites that will be ungauged and gauged (with ' denoting vector transposition). The partition is chosen to have the highest value of H(Y_f^(2)). Subsequently, Bueso et al. (1998) described the two broad design objectives of extending or reducing a given network. Although Y is of primary interest, its probability density function, f(.) = f(.|theta), may depend on an uncertain parameter vector theta. Increasing knowledge reduces the latter's uncertainty and hence it is called "epistemic" uncertainty. In contrast, its cousin, "aleatory" uncertainty, which represents chance process variation, does not diminish with increasing knowledge. (No matter how often a coin known to be fair is tossed, uncertainty about the outcome of the next toss never changes.) Although we could use the marginal distribution of Y to compute the total uncertainty, separate interest in theta may well suggest we include it in the overall calculation of total uncertainty, i.e. H(Y, theta) (Zhu and Stein, 2006). In the next subsection, this design approach is justified through the decomposition of the total entropy. It is then shown that choosing a subset of g sites with highest entropy yields an optimal solution.

In most applications, the goal is to redesign an existing network, and available historical measurements can be incorporated to improve the optimal selection, as described here.

A.1.1. The entropy decomposition

For expository simplicity we assume here that only one pollutant (univariate) is considered, although a similar decomposition can be obtained for multiple pollutants (multivariate). Responses are measured at g "existing-gauged" sites at times j = 1, ..., n, to yield g x 1 data vectors Y_j^(2) = (Y_j^(21), ..., Y_j^(2g))'. Let D denote the observed data, D = {Y_j^(2), j = 1, ..., n}, and let Y_f = (Y_f^(1), Y_f^(2)) denote the future response vector (e.g. f = n + 1), with Y_f^(1) and Y_f^(2) corresponding to the u ungauged and g existing-gauged sites. To augment the existing network by gauging an additional u2 sites among the ungauged sites, seek an optimal partition Y_f^(1) = (Y_f^(rem)', Y_f^(add)')' after suitable re-labelling, with Y_f^(rem): u1 x 1 being the future ungauged sites (denoted by U) and Y_f^(add): u2 x 1 those to be added to the list of gauged sites. The resulting network will then yield responses in (Y_f^(add)', Y_f^(2)')' = G, of dimension (g + u2) x 1. Conditional on D we may express the total uncertainty H_{n+1}(Y_f, theta) as

TOT = PRED + MODEL + MEAS

with

PRED = E[-log f(U | G, theta, D) | D],

MODEL = E[-log f(theta | D) | D], and

MEAS = E[-log f(G | D) | D].

The first of these quantities, PRED, refers to the uncertainty remaining in the predictive distribution for the ungauged sites, conditional on the gauged sites, model parameters and data. The second, MODEL, refers to the uncertainty in the model parameters. Finally, MEAS stands for the uncertainty in the distribution of the responses at the gauged sites. Now observing G in future will eliminate all uncertainty about it, with a resulting expected reduction in uncertainty given by MEAS. Thus it will be optimal to choose Y_f^(add) to maximize MEAS. That choice will automatically minimize PRED + MODEL, since TOT is fixed. In other words, it will minimize our epistemic uncertainty about theta and the residual aleatory uncertainty about responses at the ungauged sites, after observing G.

A.1.2. The model

The distribution of Y_f and theta given the observed data D (i.e. the joint posterior distribution) is required for the application of this design approach. Obtaining the joint posterior distribution entails specifying the joint distribution for all the responses, past and future, measured and unmeasured. The number of gauged sites would change from time n (now) to time n + 1 (future). But the sampling and prior models of Le and Zidek (2006) incorporate not only this change but more generally allow for a monotone pattern of missing data due to the gauged sites coming online at different times. Conditional on the hyperparameters, the response matrix Y: p x (n + 1) is assumed to follow a Gaussian-Generalized Inverted Wishart (GIW) model. The model assumes a known autocorrelation structure for the random field over times j = 1, ..., n + 1, although this requirement can often be met by removing common (all-site) systematic temporal components as well as common autoregressive components. The hyperparameters involved in our Gaussian-GIW model can be written as:

H = {b0, j, F, d}

Here b0 denotes the prior mean of the matrix b of regression parameters that link the site responses to covariates that can represent trend, meteorology and so on. The hypercovariance kernel j in the GIW serves as the prior distribution of the spatial covariance matrix S, the latter being fixed over time. F denotes a matrix that rescales S to obtain the prior covariance matrix of b conditional on S. Finally, d denotes the vector of degrees of freedom for the GIW. Although we have considered only the univariate case (one response species), that assumption can be relaxed for the multivariate case (k pollutants), in which j is assumed to be separable: j = L (x) U, where L: p x p now represents the spatial covariance between sites while U: k x k captures the between-species covariance within


Fig. 11. Flow chart describing the analytical steps needed to perform the network analysis implemented in this work, as described in the Appendix.

sites. Thus the design approach described in this appendix for the univariate case can be readily extended to the multivariate setting. Most of the hyperparameters are estimated as type II maximum likelihood estimates using the EM algorithm in the software provided online (http://enviro.stat.ubc.ca) to accompany Le and Zidek (2006). In developing this general method embraced by the software used in our analysis, these authors adopted this empirical Bayes step for two reasons. The first is technical: to keep computation times short (although, as an anonymous referee pointed out, this is not an issue in this project), because they could obtain the entropy (below) in an explicit form and avoid the long computational times that MCMC sampling would entail for this NP-hard problem of optimal network redesign. (For NP-hard problems, no polynomial-time solution algorithm is known, so computational costs can grow rapidly with problem size.) The second is more substantive: to achieve a "matching" prior model that ensures the predictive distributions for unmeasured responses are reasonably well-calibrated (95% intervals contain 95% of the actual observations), something that is hard to achieve within a purely Bayesian hierarchical approach. As a referee pointed out, much work has been done on matching priors. However, the results depend on having noninformative priors and the justification is usually based on large samples (see for

example, Mukerjee and Reid, 1999, where agreement between Bayesian credibility sets and confidence sets is used as a way of defining "noninformative priors"). Furthermore, in the asymptotic case, Savage's principle of precise measurement would justify use of the type II MLE above. In the first stage, estimates are found for the hyperparameters associated with the gauged sites. The second stage entails the construction of a spatial covariance matrix for the entire network by an appropriate method. For example, it could be constructed empirically using geostatistical methods (the variogram), if spatial stationarity seemed an appropriate assumption. However, that approach requires isotropy and stationarity of the spatial field, properties that certainly do not hold for the Metro Vancouver ozone concentration field, as shown by Ainslie and Steyn (2007). Thus we have rejected the geostatistical approach in favor of that of Sampson and Guttorp (1992). In the case of k > 1 dimensional response site vectors, that technique is applied to the estimated L (above). The approach transforms geographic site coordinates into new coordinates for sites in an imaginary new space called "dispersion space". There, the usual assumptions of geostatistics (isotropy) apply, and variogram-like models can be fitted to the estimated inter-site covariances for gauged sites, with results of the type seen in Section 3.
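Returning to the decomposition of A.1.1: for a Gaussian field with known parameters (so that the MODEL term drops out), the identity TOT = PRED + MEAS is the chain rule for entropy and can be verified numerically. The covariance below is arbitrary, chosen only for illustration:

```python
import numpy as np

def gauss_entropy(cov):
    """Entropy of a p-variate normal: 0.5*log((2*pi*e)^p * det(cov)),
    a function of the log determinant of the covariance."""
    p = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (p * np.log(2.0 * np.pi * np.e) + logdet)

# Partition Y = (U, G): ungauged block U, gauged block G.
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])
u, g = [0], [1, 2]
Suu = cov[np.ix_(u, u)]
Sug = cov[np.ix_(u, g)]
Sgg = cov[np.ix_(g, g)]
cond = Suu - Sug @ np.linalg.solve(Sgg, Sug.T)  # Schur complement

TOT = gauss_entropy(cov)
PRED = gauss_entropy(cond)   # H(U | G): residual uncertainty
MEAS = gauss_entropy(Sgg)    # H(G): eliminated by future monitoring
print(abs(TOT - (PRED + MEAS)) < 1e-9)  # chain rule holds
```

Because TOT is fixed, choosing the gauged block G to maximize MEAS automatically minimizes the residual PRED, which is the design principle used in the paper.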


A.1.3. The predictive distribution

The posterior distribution computed under the modeling assumptions above predicts both past and future (time n + 1) observations, the latter conditional on the data that would come from the newly gauged sites once these have been specified. The posterior means offer point predictors of those responses, while the distribution as a whole allows us to assess the uncertainty of those predictors; in particular, this allows historical exposures at unmonitored sites to be hindcast from those that were monitored (Le et al., 2001). It also allows us to convolve the unknown function with impact distributions so as to incorporate that uncertainty fully in a hierarchical model. Finally, by drawing samples from that posterior, we can estimate the intractable posterior distribution of complicated metrics such as the moving 3-year average of the annual 4th-highest daily maximum 8-h average ozone concentration.

A.1.4. The entropy criterion

Following the arguments in Le and Zidek (1994), the entropy, MEAS, of the predictive distribution described in the previous subsection, conditional on the hyperparameter set H, can be represented as a sum of entropies representing the past and future responses for the unmonitored sites as well as those for the responses that were not measured at currently monitored sites. These results can now be used to compute the objective entropy criterion we need for augmenting a network. The entropy criterion is to maximize MEAS as described in A.1.1 with respect to an "add" subset of ungauged sites among all possible subsets of ungauged sites. This is equivalent to selecting a subset of "add" sites among the potential sites so that the entropy restricted to these "add" sites is maximized. The entropy criterion for augmenting the existing network is to maximize, over all prospective "add" sites, the determinant:

|Λ̂^(add)|, where Λ̂^(add) is the submatrix, corresponding to the prospective ‘‘add’’ sites, of the full matrix obtained by extending the estimated spatial covariance for the currently gauged sites using the Sampson–Guttorp or other method chosen by the designer. An analogous criterion obtains for reducing a network. In fact, it is possible to rank all sites in terms of their relative importance using this approach once the modeling is complete, using a succession of determinant maximizations like the one above. The entire process of network analysis is depicted schematically in the flow chart of Fig. 11.

References

Ainslie, B.D., 2004. A photochemical model based on a scaling analysis of ozone photochemistry. Ph.D. thesis, University of British Columbia, Vancouver, B.C., Canada, 311 pp.
Ainslie, B., Steyn, D.G., 2007. Spatio-temporal trends in episodic ozone pollution in the Lower Fraser Valley, B.C. in relation to meso-scale atmospheric circulation patterns and emissions. J. Appl. Meteorol. 46, 1631–1644.
Angulo, J.M., Bueso, M.C., Alonso, F.J., 2000. A study on sampling design for optimal prediction of space–time stochastic processes. Stoch. Environ. Res. Risk Assess. 14, 412–427.
Angulo, J.M., Bueso, M.C., 2001. Random perturbation methods applied to multivariate spatial sampling design. Environmetrics 12, 631–646.
BC Ministry of Water, Land and Air Protection, Environment Canada, 2003. Particulate Matter in British Columbia: a Report on PM10 and PM2.5 Mass Concentrations up to 2000. Tech. rep. Water, Air, and Climate Change Branch of British Columbia Ministry of Water, Land and Air Protection; and Pacific and Yukon Region of Environment Canada.
Bovis, P., 2003. Stratosphere-troposphere exchange and its influence on surface ozone concentrations in the Lower Fraser Valley, B.C. MSc thesis, The University of British Columbia, Vancouver, BC, Canada, 146 pp.
Box, G.E.P., Cox, D.R., 1964. An analysis of transformations. J. Roy. Stat. Soc. B 26, 211–246.

Brauer, M., Brook, J.R., 1997. Ozone personal exposures and health effects for selected groups residing in the Fraser Valley. Atmos. Environ. 31, 2113–2121.
Bueso, M.C., Angulo, J.M., Alonso, F.J., 1998. A state-space model approach to optimum spatial sampling design based on entropy. Environ. Ecol. Stat. 5, 29–44.
Bueso, M.C., Angulo, J.M., Cruz-Sanjulián, J., García-Aróstegui, J.L., 1999. Optimal spatial sampling design in a multivariate framework. Math. Geol. 31, 507–525.
Burrows, W.R., Benjamin, M., Lord, E.R., McCollor, D., Thomson, B., 1995. CART decision-tree statistical analysis and prediction of summer season surface ozone for the Vancouver, Montreal and Atlantic regions of Canada. J. Appl. Meteorol. 34 (8), 1848–1862.
Caselton, W.F., Husain, T., 1980. Hydrologic networks: information transmission. J. Water Resour. Plann. Manage. 106 (WR2), 503–520.
Caselton, W.F., Zidek, J.V., 1984. Optimal monitoring network designs. Stat. Prob. Lett. 2, 223–227.
Caselton, W.F., Kan, L., Zidek, J.V., 1992. Quality data networks that minimize entropy. In: Guttorp, P., Walden, A. (Eds.), Statistics in the Environmental and Earth Sciences. Griffin, London.
Chang, H., Fu, A.Q., Le, N.D., Zidek, J.V., 2007. Designing environmental monitoring networks to measure extremes. Environ. Ecol. Stat. 14, 411–431.
Dobbie, M.J., Henderson, B.L., Stevens Jr., D.L., 2008. Sparse sampling: spatial design for monitoring stream networks. Stat. Surv. 2, 113–153.
Elkamel, A., Fatehifar, E., Taheri, M., Al-Rashidi, M.S., Lohi, A., 2008. A heuristic optimization approach for air quality monitoring network design with the simultaneous consideration of multiple pollutants. J. Environ. Manage. 88, 507–516.
Greater Vancouver Regional District, 2002. 2000 Emissions Inventory for the Lower Fraser Valley Airshed. Tech. rep. GVRD Policy and Planning Department, Burnaby, B.C.
Greater Vancouver Regional District, 2003. 2000 Emissions Inventory for the Lower Fraser Valley Airshed: Detailed Listing of Results and Methodology. Tech. rep. GVRD Policy and Planning Department, Burnaby, B.C.
Greater Vancouver Regional District, 2004. Lower Fraser Valley Ambient Air Quality Report, 2004. GVRD Policy and Planning Department, Burnaby, B.C., 42 pp.
Guttorp, P., Le, N.D., Sampson, P.D., Zidek, J.V., 1993. Using entropy in the redesign of an environmental monitoring network. In: Patil, G.P., Rao, C.R., Ross, N.P. (Eds.), Multivariate Environmental Statistics. North Holland/Elsevier Science, New York, pp. 175–202.
Hornik, K., 2007. The R FAQ. http://CRAN.R-project.org/doc/FAQ/R-FAQ.html.
Ignaccolo, R., Ghigo, S., Giovenali, E., 2008. Analysis of air quality monitoring networks by functional clustering. Environmetrics 19, 672–686.
Jaynes, E.T., 1963. Information theory and statistical mechanics. In: Ford, K.W. (Ed.), Statistical Physics, 3. Benjamin, New York, pp. 102–218.
Larson, T., Su, J., Baribeau, A.-M., Buzzelli, M., Setton, E., Brauer, M., 2007. A spatial model of urban winter woodsmoke concentrations. Environ. Sci. Technol. 41, 2429–2436.
Le, N.D., Sun, L., Zidek, J.V., 2001. Spatial prediction and temporal backcasting for environmental fields having monotone data patterns. Can. J. Stat. 29, 516–529.
Le, N.D., Zidek, J.V., 1994. Network designs for monitoring multivariate random spatial fields. In: Vilaplana, J.P., Puri, M.L. (Eds.), Recent Advances in Statistics and Probability, pp. 191–206.
Le, N.D., Zidek, J.V., 2006. Statistical Analysis of Environmental Space–Time Processes. Springer, New York.
Lindley, D.V., 1956. On the measure of the information provided by an experiment. Ann. Math. Stat. 27, 968–1005.
McKendry, I.G., 1994. Synoptic circulation and summertime ground-level ozone concentrations at Vancouver, British Columbia. J. Appl. Meteorol. 33, 627–641.
McKendry, I.G., 2000. PM10 levels in the Lower Fraser Valley, British Columbia, Canada: an overview of spatiotemporal variations and meteorological controls. J. Air Waste Manage. Assoc. 50, 443–452.
McKendry, I.G., Lundgren, J., 2000. Tropospheric layering of ozone in regions of urbanized complex and/or coastal terrain: a review. Progr. Phys. Geogr. 24, 329–354.
McKendry, I.G., Steyn, D.G., Banta, R.M., Strapp, W., Anlauf, K., Pottier, J., 1998a. Daytime photochemical pollutant transport over a tributary valley lake in Southwestern British Columbia. J. Appl. Meteorol. 37, 393–404.
McKendry, I.G., Steyn, D.G., O'Kane, S., Zewar-Reza, P., Hueff, D., 1998b. Lower tropospheric ozone measurements by light aircraft equipped with chemiluminescent sonde. J. Atmos. Ocean Technol. 15, 136–143.
McKendry, I.G., Steyn, D.G., Lundgren, J., Hoff, R.M., Strapp, W., Anlauf, K., Froude, F., Martin, B.A., Banta, R.M., Olivier, L.D., 1997. Elevated pollution layers and vertical downmixing over the Lower Fraser Valley, B.C. Atmos. Environ. 31 (4), 2135–2146.
Mukerjee, R., Reid, N., 1999. On a property of probability matching priors: matching the alternative coverage probabilities. Biometrika 8, 333–340.
Müller, W.G., 2001. Collecting Spatial Data: Optimum Design of Experiments for Random Fields, second ed. Physica-Verlag, Heidelberg.
Nunes, L.M., Caeiro, S., Cunha, M.C., Ribeiro, L., 2006. Optimal estuarine sediment monitoring network design with simulated annealing. J. Environ. Manage. 78, 294–304.
Pryor, S.C., Steyn, D.G., 1994. Visibility and Ambient Aerosols in South-Western British Columbia During REVEAL. British Columbia Ministry of Environment, Land and Parks.
Pryor, S.C., Steyn, D.G., 1995. Hebdomadal and diurnal cycles in ozone time series from the Lower Fraser Valley, B.C. Atmos. Environ. 29, 1007–1019.

Pryor, S.C., McKendry, I.G., Steyn, D.G., 1995. Synoptic-scale meteorological variability and ozone concentrations in Vancouver, B.C. J. Appl. Meteorol. 34, 1824–1833.
Reuten, C., Steyn, D.G., Strawbridge, K.B., Bovis, P., 2005. Observations of the relation between upslope flows and the convective boundary layer in steep terrain. Bound.-Lay. Meteorol. 116, 37–61.
RWDI AIR Inc., 2008. Review of the Lower Fraser Valley Air Quality Monitoring Network. Consulting Report to Metro Vancouver. RWDI AIR Inc., 208 pp.
Sampson, P., Guttorp, P., 1992. Nonparametric estimation of nonstationary spatial covariance structure. J. Am. Stat. Assoc. 87, 108–119.
Sebastiani, P., Wynn, H.P., 2000. Maximum entropy sampling and optimal Bayesian experimental design. J. Roy. Stat. Soc. B 62, 145–157.
Shewry, M., Wynn, H., 1987. Maximum entropy sampling. J. Appl. Stat. 14, 165–207.
Steyn, D.G., Ainslie, B., McKendry, I., May 2005. Air pollution in the Lower Fraser Valley, B.C.: evolution, measurement, modelling and management. In: 39th Canadian Meteorological and Oceanographic Society Congress, Richmond, British Columbia, Canada.


Strobl, R.O., Robillard, P.D., 2008. Network design for water quality monitoring of surface freshwaters: a review. J. Environ. Manage. 87, 639–648.
Taylor, E., 1992. The Relationship Between Ground-Level Ozone Concentrations, Surface Pressure Gradients, and 850 mb Temperatures in the Lower Fraser Valley of British Columbia. Environment Canada, Tech. Rep. PAES-92-3, 36 pp.
Vingarzan, R., Taylor, B., 2003. Trend analysis of ground level ozone in the Greater Vancouver/Fraser Valley area of British Columbia. Atmos. Environ. 37, 2159–2171.
Wu, S., Zidek, J.V., 1992. An entropy based review of selected NADP/NTN network sites for 1983–86. Atmos. Environ. 26A, 2089–2103.
Zhu, Z., Stein, M.L., 2005. Spatial sampling design for parameter estimation of the covariance function. J. Stat. Plann. Inference 134, 583–603.
Zhu, Z., Stein, M.L., 2006. Two-step spatial sampling design for prediction with estimated parameters. J. Agric. Biol. Environ. Stat. 11, 24–44.
Zidek, J.V., Sun, W., Le, N.D., 2000. Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields. J. Roy. Stat. Soc. C 49, 63–79.