Atmospheric Environment Vol. 26A, No. 1, pp. 159 169, 1992.
0004~981/92 $3.00+0.00 © 199l Pergamon Press plc
Printed in Great Britain.
A STATISTICAL METHODOLOGY FOR EXPLORING ELEVATIONAL DIFFERENCES IN PRECIPITATION CHEMISTRY WILLIAM G. WARREN,*'~ MARGI B(SHM~ a n d DENISE LINK§ *Department of Forest Management, Oregon State University and EPA-CERL, 200 SW 35th Street, Corvallis, OR 97333, U.S.A., ~:NSITechnology Services Corporation, Western Conifers Research Cooperative, U.S. EPA Environmental Research Laboratory, 200 SW 35th Street, Corvallis, OR 97333, U.S.A. and §U.S. Environmental Protection Agency, Region VIII, 999 18th Street, Suite 500, Denver, CO 80202=2405, U.S.A. (First received 2 February 1990 and in final form 16 April 1991)
Abstraet--A statistical methodology for exploring the relationships between elevation and precipitation chemistry is outlined and illustrated. The methodology is only applicable to situations where the precipitation at two (or more) matched sites is correlated. Maximum likelihood estimates and likelihood tests are utilized, with contour ellipses of assumed bivariate log-normal distributions to assist in the interpretation. Data from 12 sites located in the southern Rocky Mountains of the U.S. were used for illustration. The results indicate differences in sulphate concentrations between airsheds, between snow and rain, and between higher and lower elevations in the Rocky Mountains. There are other approaches for investigating these issues, however, the likelihood ratio with contour ellipses is easy to apply and interpret, and we feel that it provides a greater insight than some of the more common analyses of variance or regression techniques. However, it is important to note that the superiority of one method over others depends on the criterion used. Key word index: Elevation effects, sulphate, likelihood ratio tests.
l. INTRODUCTION The western U.S. is characterized by complex mountainous terrain, with many wilderness areas that are remote and inaccessible. A recent survey of high altitude lakes across the West showed that, although none were currently acidic, 26.6% were dilute with conductance values of less than 10/iS era- 1, and the median acid neuteralizing capacity (ACN) was 119 #eq E- 1 (Landers et al., 1987). Lakes with such low conductance and alkalinity characteristics are considered sensitive to acid deposition (e.g. Landers et al., 1987). The magnitude of acid deposition to high elevations is a function of precipitation amount, type, and chemistry. It is expected that precipitation amount will increase with elevation since, as an air mass rises over an orographic obstacle, it cools, and if sufficient moisture is present, condensation will occur (Barry, 1981). Thus, a gradient in precipitation amount of 316 mm per 100 m gain in height, as has been recorded for the Sierra Nevada Mountains of California (Stohlgren and Parsons, 1987), is not surprising. However, not all high elevation sites in the Colorado Rockies receive large amounts of precipitation (Lewis et al., 1984). Consequently, it cannot be assumed that acid # Present address: Science Branch/CODE, Dept. of Fisheries and Oceans, P.O. Box 5667, St John's, NF, Canada A1C 5XI.
deposition is greater at higher elevations simply because precipitation volume is expected to increase. Precipitation is in the form of rain, snow, or ice. Precipitation type differs between low and high elevations, especially during the winter months. Higher elevations receive more snow and rime ice than lower elevation sites and since many pollutants have different solubilities in rain, snow, or ice (Scott, 1981; Finlayson-Pitts and Pitts, 1986) there may be a difference in the chemistry of precipitation falling at high vs low elevations. Furthermore, differences in precipitation chemistry may be observed if the air masses, from which the precipitation occurs, have different origins. The relationships between precipitation amount, type, and chemistry, and how these influence the magnitude and nature of acid deposition to high elevation ecosystems in the western U.S., are being investigated through data obtained from a network of wet deposition monitoring sites in Colorado and Wyoming. These sites are part of the Rocky Mountain Deposition Monitoring Project (RMDMP) which has been operational since 1986. There are five study areas, each consisting of two sites within the same airshed and separated by approximately 500 m in elevation. The higher site of each pair is located close to the Continental Divide. The purpose of matching sites is to have comparisons as free as possible from extraneous geographical variation. Each site is part of the National Atmospheric Deposition Program/Na159
160
WILLIAMG. WARRENet al.
tional Trends Network (NADP/NTN) and is equipped with an Aerochem Metrics 201 wet/dry precipitation collector and a recording Belfort rain gauge. N A D P / N T N field protocols, quality control and quality assurance are adhered to by the site operators. Preliminary analyses of R M D M P data showed strong correlations between weekly sulphate concentrations in precipitation collected at matched high and low sites. This result implies that either the origins and/or histories of the air masses from which the precipitation occurred were similar, or the weekly sampling interval allowed sufficient averaging of individual storms to mask differences in precipitation chemistry due to different air masses. It remained then to investigate the effect of precipitation type on precipitation chemistries at high and low sites in the southern Rocky Mountains. Our purpose is to demonstrate a statistical approach that we have found useful for exploring the relationships between precipitation type and chemistry, at high and low elevation sites in the Colorado and Wyoming Rockies as part of RMDMP, under conditions where the precipitation chemistries between matched sites are correlated. The analysis is based on likelihood ratio tests, of which conventional t and F tests are special cases. However, the more general approach is easy to apply, interpret, flexible and valid in this context, and provides a greater insight than some of the more common analysis of variance or regression techniques. A study area in the Colorado Rockies was chosen for a detailed illustration of the statistical methodology. The Rocky Mountain National Park (N.P.) is not one of the R M D M P study areas, but the two sites are part of the N A D P / N T N and are located in the same airshed at different elevations. This study area was chosen rather than one of the five R M D M P study areas because of the more complete data set. The illustrations will be restricted to sulphate concentrations (mgd-~), although it should be possible to handle other ion concentrations or their linear combinations (e.g. principle components) in a similar manner. Summary results for all study areas will be presented and discussed. It is important to add that, should the precipitation chemistries at matched sites be uncorrelated, the more conventional statistical methods would be preferred for investigating differences in precipitation chemistry measured at the two sites.
re
++ °+
o" 0O¢'4 t'~fq
+ +
+
+
+ e~ ID
Detailed descriptions of the sites are presented in Table 1. Weekly composite wet-only precipitation chemistry data were obtained from N A D P / N T N (National Atmospheric Deposition Program/National Trends. Network, Tape of Weekly Data, July 1978-August 1988) and were screened using the flagging system utilized by N A D P / N T N (National Atmo-
t¢~ o";t'-4
triO0 eqt'~
g
-/
Z e¢
.o t~
2. A P P R O A C H
Y~PPfPffYY
z:~z
Elevational differences in precipitating chemistry spheric Deposition Program, 1986). Samples flagged as bulk deposition, long duration, no information available, no chemical data available, presence of bird droppings, sampler malfunction, sample protocol problem or as failing N A D P quality assurance standards were discarded. No correction factor for poor collection efficiency was applied to the data. Daily precipitation data are also available from NADP/NTN. A precipitation type of snow, rain, mixed or unknown is assessed by the site operator to each daily value. The precipitation data are weekly composites, hence a predominant precipitation type was identified for each weekly sampling period (Fig. 1). The weekly precipitation types were than screened for consistency using NADP/NTN flags noting the presence or absence of water in the bucket and the calculated weekly precipitation amount. To investigate the relationships between precipitation type and chemistry at high and low elevation sites, the data were organized into a sequence of paired observations according to precipitation type. There are many possible pairs; however, we will restrict the discussion to the more commonly occurring snowzsnow2 (s/s) and rainl-rainz (r/r) combinations. The subscripts 1 and 2 refer to high and low elevation, respectively. These combinations will be referred to as
161
the snow and rain strata. Sample sizes within these strata are often small. To reinforce the snow stratum, an augmented stratum (as/as) composed of snowlsnowz together with (predominant snow)l-snow2, sno%-(predominant snow)z, and (predominant snow)~-(predominant snowh, was constructed. An augmented rain stratum (ar/ar) was formed in an analogous manner. Since only one observation fell in the r/r and ar/ar strata for the Snowy study area, conjectural strata for rain (cr/cr) and snow (cs/cs) were constructed from those precipitation types designated as 'unknown', using season as an indicator. The sequence of paired observations form a bivariate time series in which one might expect a degree of serial correlation. Strictly speaking, such serial correlation would invalidate the treatment of successive observations as independent. A seasonal cycle appeared to be the prime contributor to the observed serial correlation, with a tendency for higher sulphate concentrations in summer and lower in winter. Since snow was the dominant form of precipitation in winter, and rain in summer, stratification by precipitation type is equivalent to stratification by season. Further, because weeks exist without precipitation and, in some cases, with missing concentration data, the sequence of observations within a stratum is, in
none - 8
i
J
lenow, rain. mixed. Yes unknown| "°
unknown =
JYes
maxlsnow, rain. mixed, unknown| J
i/unkno*n"~
J~.d No:rain - 0 unknown > 0
F
snow
~
> 0
r No:snow ~ 0
unknown > 0 mixed ~ 0
~
Ye~
~,,..._(
rain - I
No
t Fig. 1. Logic diagram illustrating the procedure for selecting the weekly precipitation type from the daily data. ~(A) H,bX
162
WILLIAM G. WARREN et aL
general, far from continuous. Thus, it is conjectured that, within a stratum defined by precipitation type, the observations can be considered independent. Based on existing successive pairs within a season, the resulting serial correlations with associated p values (for a test against the alternative of positive serial correlation) are given in Table 2. The only seemingly serious violation of the null hypothesis occurs for the Snowy study area in winter. Closer examination showed that this violation stems from a single instance of two successive weeks with exceptionally high concentrations, the remainder of the pairs forming an amorphous cluster. The hypothesis also appeared to be violated for the Rocky Mountain N.P. study area for winter, but the 'significance' is conditioned by the relatively small number of points in this data set; the correlation is small and difficult to discern in associated data plots. Accordingly, it would appear that the failure to incorporate serial correlation in the analysis would have negligible impact on the conclusions.
assumption. While this does not provide a formal test, it is a commonly used criterion for assessing the relative goodness of fit (e.g. Meilke and Johnson, 1974; Pellicane, 1985). The maxima of the log likelihoods (Appendix A) for all sites are presented in Table 3 for both the basic and augmented data sets. In almost all cases, the maximum under the log-normal assumption exceeds that under the normal. The only exceptions are for rain at the Niwot and Elk Range study areas where the sample sizes are too small to permit discrimination. With the larger sample sizes the superiority of the log-normal is quite clear. This confirms that the data are better represented as a bivariate log-normal than as a bivariate normal distribution, but does not imply that the bivariate lognormal is the underlying distribution. Nevertheless, the linearity of plots of the logarithms of the order statistics against the expected values of normal order statistics confirms the viability of the log-normal assumption, and all subsequent analyses assume a bivariate log-normal model.
Effect of precipitation type 3. METHODS
Test for normality The marginal distributions of sulphate concentration, that is the distributions at the high and low sites, exhibit a degree of positive skewness. It would then seem inappropriate to assume that the observations are a realization of a bivariate normal distribution. The simplest, potentially viable alternative is the bivariate log-normal distribution. One may confirm that the bivariate log-normal gives a better representation of the data than the bivariate normal by computing the maximum of the likelihood under each
Table 2. Estimates of serial correlation by season, study area, and elevation (p values in parentheses)
Study area
Winter High Low site site
Rocky Mtn. N.P.
0.25 (0.027)
0.25 0.06 -0.01 (0.028) (0.404) (0.516)
Buffalo Pass
0.21 (0.176)
0.02 0.39 0.29 (0.461) (0.055) (0.122)
Snowy Range
0.77 (0.001)
0.56 0.12 -0.18 (0.018) (0.340) (0.735)
San Juan Mtn.
0.01 (0.491)
0.02 0.23 0.13 (0.498) (0.213) (0.326)
Niwot Ridge
0.21 (0.669)
0.16 0.14 -0.78 (0.219) (0.398) (0.974)
Elk Range
0.34 (0.286)
0.43 (0.235)
Where: NA--insullicient data; Mtn.--Mountain; N.P.--National Park.
Summer High Low site site
NA
NA
To test the hypothesis that precipitation chemistry is unaffected by precipitation type, we employ the likelihood ratio test (see e.g. Wilks, 1962). Firstly, the concentration data are pooled for the two precipitation strata within the same study area. The maximum of the log likelihood for the pooled data set is computed as if the same bivariate distribution applied to both precipitation types (i.e. the null hypothesis). This pooled maximum of the log likelihood cannot be greater than the sum of the two separate maxima. Under the null hypothesis, twice the difference between the log likelihoods is known to be asymptotically distributed as chi-squared with degrees of freedom equal to the difference in the number of estimated parameters. The bivariate log-normal has five parameters (mean and standard deviation for both sites, and the correlation between sites), thus the difference should be compared with chi-squared on: (5 . . . .
"3L 5rain) - 5pooled = 5 degrees of freedom.
For example, for Loch Vale (the Rocky Mountain N.P. high elevation site) and the augmented data sets, the maximum likelihood estimates of the means, variances, and correlation (Appendix A) are: m1 m2 s2 s2 r
snow 3.7378 3.9435 0.4586 0.4876 0.6147
rain 4.5633 4.9669 0.2831 0.3372 0.7320
pooled 3.8825 4.1222 0.5269 0.4016 0.7071.
The maxima of the log likelihoods (see Appendix A; note - n log (21t) and - n are omitted) are - 348.15 for snow, -87.73 for rain, and -446.83 for the pooled data set. Whence: 2 [( - 348.15 -- 87.73) -- ( -- 446.83) ] -- 21.9,
Elevational differences in precipitating chemistry
163
Table 3. Summary statistics for testing the log-normal and normal distribution assumptions (using log likelihood maxima) and for testing the hypotheses a~ = ere and 111=/~2 (using the likelihood ratio test) for the snow and rain strata. The results are given for both the basic (a) and augmented (b) data sets. For the Snowy Range study area, results for the conjectured snow and rain types (c) are also included. Angles formedby the major axes of the contour ellipses with the y~ axis (0) are also given for the augmented data sets (a) Snow Ho: tr~ =tr 2 Z2(p value)
Ho: #1 =#2 Z2(p value)
0
-257.5 -376.2
0.01 (0.97) 0.08 (0.78)
2.61 (0.11) 5.70 (0.02)
46.5 °
- 100.7 - 121.2
- 102.3 - 123.5
1.71 (0.19) 2.00 (0.16)
0.49 (0.45) 1.36 (0.24)
54.5°
5 8 12
-35.5 - 53.2 -81.7
-38.3 - 59.0 -91.3
1.47 (0.22) 1.98 (0.16) 0.42 (0.52)
0.63 (0.43) 0.42 (0.52) 0,54 (0.46)
50.4°
a b
9 13
-43.2 -77.5
-51.8 -86.6
1.89 (0.17) 0.51 (0.47)
15.7 (0.001) 13.1 (0.001)
47.6°
a b
22 25
- 141.7 - 163.7
- 149.8 - 171.3
1.86 (0.17) 0.19 (0.66)
9.41 (0.002) 6.67 (0.01)
42.5 °
a b
5 7
-20.7 -31.7
-22.6 -32.6
1.48 (0.022) 1.31 (0.25)
1.36 (0.24) 0.19 (0.66)
35.9 °
Data
N
Log likelihood maxima log-normal normal
Ho: a21=a~ ;(2(p value)
Ho: #1 =#2 X2(p value)
0
a b
8 11
-59.8 -87.7
-62.7 -92.7
0.84 (0.36) 0.18 (0.67)
2.92 (0.09) 7.7 (<0.01)
48.4°
Buffalo Pass
a b
10 13
-79.0 - 100.I
-83.8 - 107.5
0.27 (0.61) 0.004 (0.95)
1.79 (0.18) 2.65 (0.10)
44.6°
Snowy Range
a b c
NA NA 15
-95.8
NA NA 8.1 (<0.01)
NA NA 2.02 (0.16)
56.6°
San Juan Mtn.
a b
NA 7
- 53.2
- 54.4
NA 0.06 (0.81)
NA 0.80 (0.37)
42.5 °
Niwot Ridge
a b
3 4
- 16.3 -27.6
- 16.2 -27.1
3.25 (0.07) 2.21 (0.14)
1.09 (0.30) 0.01 (0.93)
62.9 °
a b
NA 4
-24.2
-23.6
NA 2.86 (0.09)
NA 0.66 (0.42)
35.0°
Study area
Log likelihood maxima log-normal normal
Data
N
Rocky Mtn. N.P.
a b
35 52
-236.4 -348.1
Buffalo Pass
a b
14 17
Snowy Range
a b c
San Juan Mtn. Niwot Ridge Elk Range
(b) Rain Study area Rocky Mtn. N.P.
Elk Range
-94.6
Where: NA--insufficient data.
which, when compared with chi-squared on 5 degrees of freedom, has a p-value of 0.001. The null hypothesis should clearly be rejected; it would appear that the sulphate concentrations are greater in rain than in snow. The tabled means are, however, the means of the logarithms of the concentrations and therefore, strictly speaking, the inference is applicable to the logarithms rather than the concentrations themselves. Nevertheless, the concentration means can be estimated as exp (re+s2~2). This leads to values of 55.8 and 65.6 for snow and 110.6 and 169.9 for rain. Thus, the inference is, here, applicable to the concentrations as well as their logarithms. Similar results were found for the remaining study areas (Table 4). Although the difference is not formally significant for the Buffalo Pass study area, the differences are in the same direction. In a parallel manner, one may investigate differ-
ences between study areas for a given precipitation type, i.e. instead of fixing a study area and combining precipitation types, one would fix a precipitation type and combine study areas. The results of such analyses show that there are differences between study areas (Table 5). To further investigation of these differences, the same method was employed two sites at a time (Table 6). The results of these tests indicate that some sites exhibit similarities while others clearly are different. For example, for snow, the Rocky Mountain N.P. study area appears to be more similar to Snowy Range and Niwot Ridge than to the other study areas. This implies that, generally, it would be inappropriate simply to combine all the data of a particular precipitation type for the purpose of overall analysis. In a later section, the combining of inferences will be considered as an alternative approach.
WILLIAM{~. WARRENet al.
164
Table 4. Test statistics (chi-squared on 5 d.f.) for the comparison of snow and rain concentrations by study area. The results are given for both the basic (a) and augmented (b) data sets. For the Snowy Range study area, results for the conjectured snow and rain types are also included (c) Study area
used for different precipitation type-study area combinations, the set of elliptical regions would be comparable, in the sense that each would contain the same proportion of its respective population. Unfortunately, the population parameters (#1, #2, a t , a2, p) are unknown, but their maximum likelihood estimates can be calculated (Appendix A). Substitution of these estimates for the corresponding parameters in (1) yields estimates of the ellipses for comparison. To determine the points on the contour, let:
Data
;(2
p value
Rocky Mtn. N.P.
a b
14.69 21.91
0.012 <0.001
Buffalo Pass
a b
5.26 5.58
0.385 0.350
Snowy Range
a b c
NA NA 9.04
0.110
a b
NA 20.29
Since Zo=Zmax/C, k reduces to k=2(1-r2)log(c) and (from Appendix A) the equation for the ellipse can be written:
0.001
92 - 2r~ty2 +92 - k = 0 .
Niwot Ridge
a b
20.52 13.30
0.001 0.020
Elk Range
a b
NA 7.87
If k - ( 1 - r2)~/>0, the values of~2 for a given value of Yl are given by:
0.160
Y2 = ry: _+(k - (1 - r2)~ 2) 1/2.
San Juan Mtn.
k= - 2 ( 1 -r2)log(2~sts2zo(1 - r2)1/2).
(2)
(3)
(4)
Let h =(k/(1 -r2)) 1/2. Then, the valid values of y: lie in the range -h~
Where: NA--insufficient data; Mtn.--Mountain; N.P.--National Park.
exp(mt) /[hexp(s:) ] <~Xl ~ hexp(mOexp(st ),
Interpretation A visual representation of the bivariate model, as formulated in Appendix A, can be obtained from contour ellipses in the (Yt, Y2) plane. For values of (Yl, Y2) on the contour of such an ellipse, z =f(Yl, Y2) is constant; i.e. one has a curve of constant probability density for the bivariate normal or, on change of scale, the bivariate log-normal distribution. With Q (yt, y2) as an Appendix A, the equation for such an ellipse can be written: Q(yx,y2) = _ 21og(2mr tcr2z( 1 __p2)1/2).
(1)
The permissible values of z lie in the range: 0 1, the resulting ellipse will contain a certain proportion, ~t, of the population. Indeed, c = 1/(1 -~t), (e.g. Johnson and Kotz, 1972). As c ~ 1 , Zo~Z . . . . and as c-~oo, Zo-,0 and the proportion of the population within the ellipse approaches 100%. In particular, when z o = z=,JlO, the proportion of the population within the ellipse is 90%. If the same value of c, or ct, is
where x represents the untransformed sulphate concentrations (see Appendix A). The contour ellipses (c = 10) for the as/as and ar/ar strata for the Rocky Mountain N P study area are illustrated in Fig. 2. The major axes of the ellipses appear to be almost parallel to the unit line (Yt =Y2). Indeed, the angle that the major axis makes with the Yt axis is 46.5 ° for as/as and 48.4 ° for ar/ar. As shown below, the implication is that any difference in sulphate concentration is independent of the concentration level. On the other hand, if, as the plot seems to suggest, more of the ellipse falls above the unit line than below it, there would be an elevational effect, with the higher concentrations tending to be at the lower level. Comparison of Figs 2a and 2b also illustrates the previous result that, generally, sulphate concentrations are higher in rain than in snow, although several of the weekly composite snow samples contained relatively high concentrations (probably as a consequence of rimed-snow events).
Testing hypotheses about elevational effects The first question is whether any difference in concentration is independent of the concentration level. Because we are dealing with a bivariate log (normal) distribution, there is no notion of a depend-
Table 5. Test statistics for the overall comparison of study area comparisons by precipitation type Statistic
s/s
as/as
cs/cs
r/r
ar/ar
cr/cr
Z2 d.f. p value
53.79 25 < 0.001
55.65 25 < 0.001
56.67 25 < 0.001
13.56 10 0.200
20.68 20 0.416
37.75 25 0.049
Elevational differences in precipitating chemistry
165
Table 6. Test statistics for the pairwise comparison of study area concentrations. The results are given for both the basic (a) and augmented (b) data sets. For the Snowy Range study area, results for the conjectured snow types are also included, in part a as (c) and in part b as *. Critical values of chi-squared on 5 d.f. are 10% 9.24, 5% 11.1, and 1% 15.1 (a) Snow
Rocky Mtn. N.P.
a b c
Buffalo Pass
a b c
Snowy Range
a b c
San Juan Mtn.
a b
Niwot Ridge
a b
Buffalo Pass
Snowy Range
San Juan Mtn.
Niwot Ridge
Elk Range
15.42 20.71 --
2.38 3.80 5.70
18.77 11.77 --
5.15 1.76 --
10.58 15.59 --
3.65 4.20 3.16
25.53 25.36 --
13.10 15.73 --
11.94 17.60 --
13.56 t0.96 13.53
4.46 3.71 4.18
8.92 11.21 12.19
14.27 7.52
19.76 21.10 11.19 14.91
(b) Rain
Rocky Mtn. N.P. Buffalo Pass Snowy Range
a b
Buffalo Pass
Snowy Range
San Juan Mtn.
Niwot Ridge
Elk Range
2.64 3.29
-13.81"
-6.83
11.07 6.45
-9.19
-13.83"
-3.32 -13.69"
9.99 3.50 -7.80*
a b a b a
- -
San Juan Mtn.
b
3.17
Niwot Ridge
b
6.03 8.62* - -
6.06
a
6.86
ent or independent variable. The functional relationship between the variables is then well described by orthogonal regression (sensu Kendall and Buckland, 1982) which corresponds to the major axis of the ellipse. It seems intuitive that, for the difference between the concentrations at the two elevations, Yt - Y 2 , not to depend on Yt (or Y2), in the sense of not being a function of Yt (or Y2), the major axis should have unit slope, i.e. the angle of intersection between the major axis and either coordinate axis should be 45 ° or n/4 radians. Denote the angle of intersection by 0. Then (e.g. Johnson and Kotz, 1972): tan (20) = 2pa t tr2/(~ - a 2,).
(5)
If 0 = n/4 then tan(20)= tan(n/2)= oo which implies a~ = a~. Thus the hypothesis that the elevational effect is independent of the concentration is equivalent to Ho: a~ =or22. Alternatively and more formally, the hypothesis can be stated as having the difference in concentration as independent of the mean concentration, i.e. Yt-Y2
independent of (yt+y2)/2. It is well known that, If (zt, z2) follow a bivariate normal distribution with correlation p, (I p I< 1), then zt and z2 are independent if and only if p = 0 (e.g. Fraser, 1958). Since the covariance o f y t -Y2 and Yt +Y2 is a 2-a22, it follows that the difference in concentration is independent of the mean concentration if and only if at2 =a2.2 Given that the major axis of the ellipse is parallel to the unit line, the hypothesis that there is no effect of elevation (or that the proportions of the ellipse above and below the unit line are equal) is equivalent to the center of the ellipse being on the unit line, i.e. Ho: #t =#2. Tests of these hypotheses can be performed using the log likelihoods. U n d e r the common variance assumption, the maximum likelihood estimate of this variance is sZ=(s~+s2,)/2, the maximum likelihood estimate of p is r=st2/s z, and the maximum likelihood estimates f o r / ~ and/z 2 are given in Appendix A. The maximum of the logarithm of the likelihood ( - n log (2n) and - n omitted), reduces to:
-21og(s)-(n/2)log(1 - r 2 ) - n(mt +m2).
(6)
WILLIAMG. WARRENet al.
166
.
.
.
.
.
!
(a) as/as stratum
.
.
.
.
.
.
.
.
~
J
.
.
.
.
.
.
.
~. . . . . . .
÷
..i-
•
..I+ . . . . .
++
+ . 6 '++ +.a, • ...
•
,, ,'<+
,+'++ +
4-
I....
.
.
.
.
. '
.
~+ ÷ •
.
i
.
•
+
:.
."
..I+
/
÷ /
~
," ,+ '÷
7
,,,
/;i'.
:
•
.
.
......................
d
÷
.
÷
+
i
.
~ .
.
•
•
.
' 2
.
.
-
. "'"
:
::
;
;
,
"
:
:+"
:"
~lO .
.
.
.
.
.
.
.
i
.
.
.
.
.
.
.
i
~O00
tOO
:tO
HIGH SITE SO+ (rag/l) (b) ar/ar stratum ~ooo
. . . . . .
.
........
....
i " "
":"
......
: . . . . . . +. . . .
+ .....................
i?
i ........
!. . . . . . . .
> / "
:
" " !
~;"
L-"+ : i i
O / ' /
i
/
i
10
tOO
iO00
H I G H S I T E SO+ (mg/l) Fig. 2. C o n t o u r ellipses s h o w i n g the r e l a t i o n s h i p between s u l p h a t e c o n c e n t r a t i o n s a n d elevation for (a) the a s / a s s t r a t u m a n d (b) the a r / a r s t r a t u m at t h e R o c k y M o u n t a i n N P s t u d y area; ~ =0.90.
Twice the difference between the values obtained using the reduced (Equation (6)) and unreduced (Equation (AI)) models, should be distributed as chisquared on 5 - 4 = 1 degree of freedom, where 5 and 4 are the numbers of parameters fitted without and with the hypothesis, respectively. For the Rocky Mountain study area: s/s: 2(236.3966- 236.3908)---0.0116
p=0.91,
as/as: 2(348.1903 - 348.1510)-- 0.0786
p=0.78,
therefore, the null hypothesis, namely that the differences in sulphate concentration are independent of concentration, is accepted.
Likewise the constraint ~1--/z2(=/~, say) may be imposed. The maximum likelihood estimates of the parameters become m-- (ml + m2)/2, s2 = [Y+(YlJ- m)2 + y+(Y2,- m)2 -I/2n and
r= E~,(Yu-m)(y2+-m)'l/ms ~. The form of the maximum of the logarithm of the likelihood is unchanged. Continuing the above example,
s/s:2(237.7042-236.3966)=2.6152
p=0.106,
as/as: 2(351.0426- 348.1903) = 5.7046
p--0.017.
Elevational differences in precipitating chemistry Thus, with the augmented stratum, the null hypothesis would be rejected and there is solid evidence of higher sulphate concentrations at the lower elevation. The corresponding results for the remaining study areas are given in Table 3. Table 3 also gives the angles that the major axes form with the Yl axis. While these exhibit a fair amount of scatter, the weighted averages are 46.7 ° for the as/as stratum and 49.1 ° for the ar/ar stratum.
Combining inferences across study areas There remains the question of combining inferences from different study areas, since there is clear evidence against pooling airsheds as well as precipitation types. Since the sample sizes for some categories are small, the power of the tests will be low and real differences may be declared 'non-significant'. Suppose, however, that over study areas and/or precipitation types, the concentrations at the lower elevation were consistently greater than those at the higher elevation, although, in no one case could the difference be declared 'significant' at conventional levels. A combined inference can be obtained by the following method, commonly used for this purpose (Fisher, 1932). Let pi be the probability of obtaining a more extreme value of the test statistic than that observed. Then, under the null hypothesis, the p~ would be uniformly distributed on (0, 1). It then follows that - 2~ log(pl) would be distributed as chi-squared on 2k degrees of freedom, where k is the number of cases being combined. From the p-values given in Table 3
167
and for the hypothesis t7z2-~2,2
as/as:
22; log(pl) =
16.41
p=0.17(12 d.f.),
- 2Y, l o g (Pi) = 1 2 . 9 6
p=0.37 (12 d.f.).
s/s: -
For the hypothesis/~1 =#2, s/s: - 22; log (pi) = 36.65
p = 0.0003 (12 d.f.)
as/as: - 2Y.log (Pl) = 38.58
p =0.0001 (12 d.f.).
Thus, overall, there is little or no evidence that the differences are dependent on concentration, but there is evidence that the sulphate concentrations are greater at the lower elevations. These results are demonstrated by the mean concentrations for each region, elevation, and precipitation type (Fig. 3).
4. DISCUSSION Various aspects of the methodology described require qualification. Firstly, note that the likelihood ratio tests are asymptotic. Although in some instances the sample sizes are quite small, the tests should still provide adequate approximations relative to those obtained by more exact procedures. As the databases get larger, any inaccuracy due to the asymptotic nature of the likelihood ratio tests becomes negligible. For testing #z =/~2 given cr~ = ~2, a paired difference t test would be valid but, because of the lack of independence between matched high elevation and low elevation pairs, a conventional F test for the
~.04.B-
®
•
•
~4.B-
E ~4,4" tO ~.~ 4 . 2 " C 034.0, 0 cO 03.8.
.¢ o c~
+ + +
3.6
tO 3,4' e 3.2
3.0 Ben
i Juan
I Elk
]
Ridge
ROCky
1 Ht
Niwot
i Snowy
i Buffalo
Study area
a s / a s high site a s / a s low site a r / a r high s i t s a t / o r low slts -
* + • ®
Fig. 3. Average sulphate concentrations by augmented stratum and elevation for each study area. Basic strata averages differ by less than 0.1 for samples sizes 5 and greater.
168
WILLIAMG. WARRENet al.
hypothesis tr2 = ~2 is, strictly speaking, invalid. We feel that maintaining all tests in the likelihood framework aids the interpretation, and is certainly convenient. The focus of the applications presented is restricted to sulphate concentrations. An analogous approach should apply to all ion species. Some modification will be necessary when concentrations fail to attain minimum detection. Further, it is unlikely that the concentrations of the various ion species would be independent. Accordingly, instead of statistically analysing each species separately, it may well be meaningful to consider linear combinations, in particular the principal components, of the species concentrations. Even if the concentrations of each species were log-normally distributed, the distribution of a linear combination would, in general, not be log-normal. Indeed, if the species concentrations were independent, then linear combinations thereof would tend towards being normally distributed. In actuality, the distribution of a linear combination probably lies somewhere between these two extremes (normal, log-normal) and its seems likely that Box-Cox (Box and Cox, 1964) transformations could be used to achieve approximate normality, and a procedure that parallels the above method can be then be applied.
5. CONCLUSIONS A statistical approach for investigating the effect of elevation and precipitation type on precipitation chemistry, when the data are correlated, has been presented. The application of the method to testing the differences between study areas, between precipitation types, between elevations, and also whether the effects are independent of concentration, has been illustrated. In general, the results indicate that average sulphate concentrations in snow are consistently lower than in rain, although maximum concentrations are often similar for each type. Likewise, sulphate concentrations in snow are greater at lower elevations than the higher elevations, and this difference is independent of concentration. A similar relationship for rain is not well established. In addition, there is evidence, overall, that sulphate concentrations differ between the six study areas, although the pairwise differences are not always significant. This result is not surprising as the study areas were chosen to represent different airsheds. There are other valid approaches for investigating these issues. Nevertheless, we believe we have demonstrated that the likelihood ratio test with contour ellipses is easy to apply, interpret and is flexible. Acknowledoements--We would like to thank Tony Olsen
(Battelle, Pacific Northwest Laboratories), John Sigrnon (University of Virginia), Luther Smith (NSI Technology Inc.), Timothy Haas (Colorado State University), five anonymous reviewers for their useful comments, most of which have been incorporated, and Greg Banmgardner for his assistance with
preparing the graphs. This research was supported by funds provided by the Western Conifers Research Cooperative within the joint U.S. Environmental Protection AgencyUSDA Forest Service Forest Response Program. The Forest Response Program is part of the National Acid Precipitation Assessment Program. REFERENCES
Barry R. G. (1981) Mountain Weather and Climate. Methuen, New York. Box G. E. P. and Cox. D. R. (1964) An analysis of transformations. J. R. Statist. Soc. B 26, 211-243. Finlayson-PittsJ. and Pitts J. N. (1986) Atmospheric Chemistry Fundamentals and Experimental Techniques. John Wiley, New York. Fisher R. A. (1932) Statistical Methods for Research Workers (4th edition). Oliver & Boyd, London. Fraser D. A. S. (1958) Statistics--An Introduction. John Wiley, New York. Johnson N. L. and Kotz S. (1972) Distributions in Statistics: Continuous Multivariate Distributions. John Wiley, New York. Kendall M. G. and Buckland W. G. (1982) A Dictionary of Statistical Terms (4th edition). Longmans, London. Landers D. H., Eilers J. M., Brakke D. F., Overton W. S., Kellar P. E., Silverstein M. E., Schonbrod R. D., Crowe R. E., Linthurst R. A., Omerik J. M., Teague S. A. and Meier E. P. (1987) Western lake survey phase 1. Characteristics of lakes in the western United States, Volume 1: population descriptions and physico-chemical relationships. EPA/60/3-86/054a. Lewis M. W., Grand M. C. and Saunders J. F. (1984) Chemical patterns of bulk atmospheric deposition in the State of Colorado. Wat. Resour. Res. 20, 1691-1704. Mielke P. R., Jr and Johnson E. S. (1974) Some generalized beta distributions of the second kind having desirable application features in hydrology and meteorology. Wat. Resour. Res. 10, 223-226. Pellicane P. J. (1985) Goodness-of-fit analysis for lumber data. Wood Sci. Technol. 19, 117-129. National Atmospheric Deposition Program (IR-7)/National trends network, tape of weekly data: July 1978-August 1988. Available from the Program Coordinator, NADP/NTN Coordination Office, Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, CO 80523. National Atmospheric Deposition Program, NADP/NTN annual data summary: precipitation chemistry in the United States (1986). Available from Program Coordinator, NADP/NTN Coordination Office, Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, CO 80523. Scott B. C. (1981) Sulphate washout ratios in winter storms. J. appl. Met. 20, 619-625. Stohlgren T. J. and Parsons D. J. (1987) Variation of wet deposition chemistry in Sequoia National Park, California. Atmospheric Environment 21, 1369-1374. Wilks S. S. (1962) Mathematical Statistics. John Wiley, New York. APPENDIX A. SUMMARY OF LIKELIHOOD RATIO
CALCULATIONS Let xli and x21 be the sulphate concentrations for the ith week at the high and low sites, respectively. It is assumed that yj~=log(xj~)(j=.l, 2) follow a bivariatc normal distribution with density: dF(yt,y2)= l/(21t~to~)(1-p2)t/2exp(-Q/2)dyldy2 (AI) =f(Yl,Y2)dYz dy2,
Elevational differences in precipitating chemistry where Q=[.~2-2p~1~2+~2]/(1
It is convenient to take the logarithm of the likelihood, which, together with Equations (A3)-(A6), yields a maximum of:
-/3 2)
~=(y~-u~)/% j = 1, 2,
nlog(2n)- n Elog(s0 + log(s2) ]
-
where /~=mean of yj, cry=variance of yj, p=correlation between Yl and Y2. Let there be n data points. The maximum likelihood estimates of the mean, variance and correlation are: ~j = ra t = Y~y j l / n = ~ ,
~2=s2 = Z(yj,-~j)2/n, b12=s12 = E ( y l i - ~ l ) ( y 2 1 - ~ 2 ) / n ,
(A2) (A3) (A4)
(Note divisor is n not n - 1) = r = Sx2/(sl s2).
169
(A5)
- (n/2) log(
1 -
r 2) -
n.
(A6)
The maximum of the likelihood under the log-normality assumption is -
nlog(2~)- nElog(sx) +log(s2)]
- n/210g(1 - - r 2 ) - n - n ( m l
-bm2).
(A7)
Note that the terms -nlog(2n) and - n are common to Expressions (A6) and (A7), depend solely on n, and may be omitted when the maxima of the log likelihoods are being compared.