~tater Research Vot. [l, pp. ¢~3l to 636. Pergamon Press 197". Prmted in Great Britatn.
STATISTICAL MODELS OF RIVER LOADINGS OF NITROGEN AND PHOSPHORUS IN THE LOUGH NEAGH SYSTEM R. V. SMtTH Freshwater Biological Investigation Unit, Department of Agriculture (NIL Greenmount Road, Antrim BT4I 4PX, N. Ireland and D. A. S'rEWART Biometrics Division. Department of Agriculture (NIL Computer Services Building, Stoney Road. Dundonald, Belfast BT4 3SB, N. Ireland {Received 28 October 76; in revised form March 1977)
Abstract--To establish nutrient budgets of Lough Neagh in N. Ireland, and provide strategies for the reduction of nitrogen and phosphorus it became necessary to obtain estimates of their inputs to the lake. On the six major rivers entering the lake continuous flow metering was available but chemical concentrations were only available every 8 or t5 days. To associate river chemical concentrations and loadings with flows for these observed dates, and hence predict from their relation the loads at the unobserved dates a computer program using eight possible statistical relations was used. Those methods examined included various regressions with adjustments, higher order (polynomial) fits. logarithmic and other transforms, and different rising and falling flow relations. Results produced by the various methods with discussion of their relative merits are presented. Reasons are given for the final selection of a simple log load on log flow relation for rivers derived from terrestrial catchments.
INTRODUCTION
Lough Neagh in N. Ireland has the largest surface area of a British Isles lake (383 km z) and a catchment area of 4453 km-" with a human population of about 340,000. More details of the catchment are given by Smith (1976, 1977). An apparent increasing rate of eutrophication in the lake following a recent increased human use of the lake for effluent disposal has prompted research programmes to examine both its present state and possible strategies for conservation (Wood and Gibson, 1973). In the present study a number of mathematical methods are examined in order to select the most appropriate method for estimating total annual load inputs of nutrients and also seasonal and daily load inputs. Those examined and described here are statistical or empirical models making the most direct use of the continuous flow recording data and the less frequent concentration data. Earlier attempts to estimate annual loads of nitrogen IN) and phosphorus (P) carried by the major rivers entering Lough Neagh were calculated by multiplying mean concentrations by mean flows over monthly periods. From knowledge of the amounts appearing in the lough these loads appeared to be considerable over-estimates, discrepancies being ~eatest for point or limited source derived chemicals such as soluble ortho-P. Consideration of the data showed this variation
was partly due to their extremely skewed frequency distributions and partly to the relationships between chemical concentrations and flows, which varied for the different chemicals. The positive skewness of the daily flow data was of the same magnitude as the daily rainfall data, the kurtosis slightly less. These flow distributions were considerably more skew than log normal and remained skew after log transformation of their values. Different chemicals associated with these flows were found to have different load/ flow characteristics and therefore different daily load frequency distributions. Limited source materials such as soluble ortho-P had curves of reducing slope or slope less than unity with increasing flow, and thus frequency distributions considerably less asymmetric than those of the flows. Many of these could be approximately normalised by using log transforms. According to Department of Environment (1971) such daily load distributions are often log normal. This varied to some extent with the river and the chemical involved, river profile being important as the rivers which fell more slowly in their lower reaches had depositions of material which were flushed and transported by higher flow so load increased more with flow, as discussed by Dickson (1971, 1975) who fitted quadratics of silica loads to flows for these rivers. The chemical concerned was important in that within each river particulate and soluble materials behaved differently, soluble material to some extent being produced by point sources capable of limited output so
631
632
R.V. SMITH and D. A. SrEW.agT
being diluted by higher flows, while some particulate materials were increased by both increased erosion and increased flushing at higher flows. Errors arise if m e a n concentrations are multiplied by mean flows and it was found that in establishing mean concentrations, weightings should be used for each river/year~chemical proportional to the measured flows. By not using such weightings, m e a n concentrations become biased towards those associated with the lower flows, e.g. for soluble o r t h o - P towards higher concentrations, producing considerable over-estimates of the annual loads. Separation into monthly or other periods partly compensates for this by introducing a stratification, or separation of seasonally varying periods of high and low flows but within m o n t h s bias remains. To obtain more accurate estimates of the total annual river load inputs and provide detailed variation of load with time, it was decided to model each river load/flow relationship separately for each year by curve fitting, and from this to predict, for every day from its flow record and possibly its history, the input load for the day. Estimates of daily input loads were required for establishment of treatment strategies. This statistical modelling was attempted by fitting a n u m b e r of alternative load/flow or concentration/flow curvilinear relations on the data available for the days when concentration was observed. By predicting from these relationships, loads for the other days were obtained.
METHODS Sampling The river sampling stations were located on the major rivers at convenient locations as close to Lough Neagh as possible but avoiding any backwash from the Lough. Samples were taken between 10.00-14.00 h every 15 days in 1971 and 1972 and every 8 days during the years 1973-1974. Flow data was collected from the flow gauging stations on the 6 major rivers. Mean daily flows were calculated from observations made at 8-h intervals. Chemical analyses Unfiltered and 0.45 #m membrane filtered samples were stored deep frozen prior to analysis. Soluble ortho-P was analysed after filtration, using a Technicon Autoanalyser, by the stannous chloride-acid molybdate-isobutanol extraction procedure of Henriksen (1965). Easily hydrolysed organic-P compounds are also determined as soluble ortho-P by this procedure (Rigler, 1968). For total P analysis unfiltered samples were digested with potassium persulphate using the procedure given by the American Public Health Association (19711. The digests were then analysed by the same method as for soluble ortho-P. To determine nitrate-N filtered samples were analysed using a Technicon Autoanalyser by reducing nitrate to nitrite with hydrazine in alkaline solution and then determining total nitrite by a modified Criess-llosray reaction (Chapman, Cooke and Whitehead, 1967). Total Kjeldahl-N, i.e. ammonia and organic-N, was determined on unfiltered samples using a micro-Kjeldahl apparatus following the procedure given by the American Public Health Association (1971). Selenium dioxide was used as a catalyst. After digestion the distilled ammonia was determined by Nesslerisation.
CI
o
Constanf supply
F Flow
Fig. 1. Load/flow relationship of a river during rising and falling flow. For explanation of symbols see Method 6. Total N is the sum of the total Kjeldahl-N and nitrate-N concentrations. Computer program The computer program written to analyse this data inputs the 365 (or 366 for leap year) sets of 3 (recorded for 08.00. 16.00 and 24.00 h each day) daily flow observations over the year. These flow values were stored, along with mean daily and mean midday values produced from them by the program, for use with sets of different chemical concentrations. Method 1. A preliminary estimate of total annual load was obtained by multiplying the mean of the 08.00 and 16.00h flows by concentration on the dates it was measured, converting from instantaneous rate to daily total load, and summing resulting total loads and total flows for these dates. The estimate of total annual load was then obtained by multiplying total load for days on which concentrations was measured by the ratio of total annual flow to total flow for these days. This estimate does not allow sufficiently for reduction of concentration with flow for soluble ortho-P, does not give load values for individual days, and was usually higher than the estimates obtained by the other methods. Method 2. A second estimate was obtained by fitting a fifth order polynomial regression of concentration on flow for the days concentration was measured. The use of the high order relationship produced adequate fit to curvatures or non-linearity in the data. From the resulting equation individual load estimates for every other day were obtained by predicting from the flow the concentration for the day. As Fig. l shows concentration/flow or load/ flow relation can be complex. This function was of the form, where C o n c = concentration (g m-3): Conc = A 2 + B,.Q + C2Q" + DzQ 3 + E,_Q~ + F2Q 5 Q is the flow (m -3 day-~l and the coefficients A2, B2, Ca, D2, Ez and F2 were fitted by the orthogonal polynomial subroutine in the program. Method 3. The third total annual value and estimated daily values, was obtained by attempting an adjustment to the fifth order polynomial estimates obtained by Method 2 to compensate for seasonal variation. When the predicted values for the days when concentrations were not measured were calculated, using this same function similar predicted values for all days when concentrations were measured were also calculated, so that two values, i.e. measured and estimated, were available for these 25 or 50 days/year-t. The ratios of measured/estimated were calculated by the program for all days, and a correction function of ratio with time was obtained, again a fifth order curve of ratio on days being used. This was done to adjust estimates for the possibility that the load/flow relationship did not remain stable throughout the year. Using this seasonal function of time all estimated values from Method 2 were adjusted, by fitting it to their day of year. This function was of the form: Ratio
= A3 +
B~t
+ C3t 2 + D3t 3 + E 3 t "~ +
F3t s
Nitrogen and phosphorus in the Lough Neagh system
633
where t is time in days from start of period examined. Method 4. A fourth set of daily load estimates was obtained by fitting a simple linear regression of load on flow for the days when concentration was measured. While this did not include load curvature at high flows, the overall slope reduction caused by these gave estimates lower than for the simpler Method 1. Like Methods 2 and 3 this method gave individual days estimates and the method appeared even with occasionally poor data to be robust if not ~ery accurate. The relation is:
Thus the computer program allowed prediction of eight estimates for each ri~er for each chemical each year. with daily load values predicted as well as annual estimates. The program repeated some of these estimates using 2. 3. 4 . . . days previous running means of flows, instead of actual days flows, both for obtaining the function and predicting from it.
Load = A.,. + B.,.Q
Various means were sought to test the predictions to find the most accurate and most reliable. Independent data was needed to give comparisons. This was available for 1972 for the River Main, one of the 6 major inflow rivers to the Lough. when two independent sets of soluble o r t h o - P concentration data at different 14 day intervals were obtained. Using each set it was possible to predict both overall annual input and also individual day loads for all days including the days recorded within the other data set, i.e. to use each data set to predict the other by each method. Correlations between observed and predicted values for each m e t h o d are s h o w n in Table 1. Method l was not included as it does not predict individual days. For M e t h o d s 2-5, running m e a n flows of 3 days were used as well as single day flows, which gave in most cases a slight improvement in the correlation. They were not applicable to Methods 6 and 7 which used flow history in the determination of whether flow was rising or falling on the date of observation or prediction. It was seen that M e t h o d s 5, 7 and 8 gave the consistently highest correlations between observed and predicted results. Potentially Methods 7 and 8 were the most accurate as they provided the greatest power and flexibility in the curve fitting. However, they were also the most vulnerable to "outliers" or atypical measurements in the data, and occasionally gave results completely different from the other methods. Table 2 illustrates variation in results between the methods, and shows how separate rising and falling flow and high order curve fitting methods were prone
Method 5. A fifth set of estimated daily values, and again total annual load was obtained by fitting the relationship:
Log Load = .4~ + B5 LogQ and from this predicting loads for all days. The merits of this simple method are discussed below. Method 6. It was considered that where deposition and flushing occur and when a chemical is from a limited or point source different load/flow relations are expected for rising or falling flows. History of recent flows was used as an alternative in the program by using running means of flows instead of daily flow. but this did not take into account whether the flow was rising or falling at the time of the observation or prediction. A cause of difference is illustrated in Fig. I where a constant supply of a chemical is considered shown by the line CC t. When deposition exceeds flushing at low flow between sources and measurement the load measured,is less than the supply, when flow rises at time t t, deposition reduces and flushing increases until they are equal at t_,, flushing exceeds deposition at t3 when the load exceeds supply, until deposits are exhausted and the load falls to CC ~ regardless of further increase of flow. The characteristic for falling flow follows the line FF ~ but does not return to the original point as the history of flows and loads with time never repeats. In an attempt to include this effect the program calculated different load:flow relationships for rising and falling flows. Both linear and log regressions were produced. Historical flow was used for each day to determine whether it had rising or falling flow, and the appropriate log load/ log flow relations used to predict the load for that day. The equations were: Log Load = A6R + B6R Log Q (rising flow) and Log Load = A6F- q-- B6F. Log Q (falling flow). Method 7. Some of the log load/log flow analyses and graphs produced by the program in Method 6 showed curvature of log load on log flow. To include this effect the most complex of the relations examined was introduced, which used separate log load/log flow relations for rising and falling flow as Method 6 with a third order polynomial for each relation of log load/log flow to include the curvature. There were limited numbers of observations in either rising or falling flow sets of data. making higher order fits unreliable. The equations were:
Log Load = A-rR + B7R Log Q + C-.R (log Q)2 + DvR (Log Q)3 (rising flow) and Log Load = ATr + Bvr LogQ + C T r ( l o g Q ) : + D~r (Log Q)3 (falling flow). Method 8. The eighth method used a polynomial of log load on log flow without separate equations for rising and falling flows. The equation was:
Log Load = As + Bs LogQ + Cs(LogQ): + Ds (Log Q)3 + Es (Log Q),t + Fs (Log Q)s.
RESULTS
Table 1. Correlations between observed loads of soluble ortho-P and predicted loads in the River Main employing Methods 2-8
Prediction method
No. of days running mean of flow
Correlation
2 2 3 3 4 4 5 5 6 7 8
l 3 1 3 1 3 I 3 l I 1
0.69*** 0.80*** 0.53** 0.63*** 0.54** 0.67*** 0.88*** 0.88*** 0.73*** 0.80*** 0.76***
Significance ***P < 0.001.
of
regression:
**0.001 < P < 0.01.
634
R.V. SMrrH and D. A. STEWART Table 2. Examples of range and variation in predicted loads employing the 8 Methods 1
2
3
17.9
15.5
--
175.4 244.8
--
Data Ballinderry 197l Soluble ortho-P Moyola 197l Total N minus nitrate-N Six Mile Water 1971 Solt, ble ortho-P Blackwater 1972 Soluble ortho-P Main 1972 Soluble ortho-P Blackwater 1973 Soluble ortho-P Main 1973 Soluble ortho-P Main 1973 Total P minus soluble ortho-P
13.4
14.5
14.8
7
8
t3.0
13.5
206.4 146.2 159.8 I77.5 164.8
24.6
18.2
68.4
6 4 . 3 6 3 . 8 5 8 . 7 69.3
--
--
--
51.0
46.4
5 1 . 5 4 5 . 3 48.1
--
--
--
72.9
6 9 . 8 6 4 . 7 7 5 . 7 70.8
--
--
--
56.7
5 0 . 2 4 9 . 1 5 1 . 6 47.7
--
--
--
35.0
2 7 . 4 2 8 , 2 3 3 . 6 31.9
--
--
--
to produce inconsistent results. It is seen that Method 1 estimated loads higher than other methods as it did not sufficiently allow for concentration reduction of soluble ortho-P at high flows. All methods, however, produced estimated loads about 40-50~o lower than multiplying mean concentration by mean flow. Separate rising and falling flow relations and polynomial regression methods due to the smaller number of data points available to establish each were too susceptible to outliers because their high degree of curve fitting would fit curves through uncharacteristic points. This particularly occurred following a period of low flows when a high flow was associated with a high concentration, the effects of such abnormal data being examined by omitting that measurement. An example of this is seen in Table 3, where in one of the 26 dates observed for River Main for 1971 a high flow in autumn occurred after a prolonged period of low flow and concentration was recorded on the first day of the high flow. It is likely that concentrations on the following high flow days were lower but the measures were not available. Using running mean of flows for 3 days instead of the single day flow reduced the effect of this observation and the annual load estimate appreciably, as also did leavTable 3. Effect on annual load estimate of 1971 R. Main soluble ortho-P of omitting one atypical observation from Method 5 data set
No. of observations
No. of days running mean of flow
Annual load (tonnes)
26 26 26 25 25 25
1 2 3 1 2 3
58.8 53.5 52.2 49.9 48.0 48.6
--
Method 6 4 5 itonnes- t year)
19.5
2 0 . 3 2 1 . 0 42.2
18.3
ing out this observation from the data. When it was omitted, using the 3 day mean instead of the single day value was seen to have little effect on the annual estimate, i.e. it was the 3-day mean of this value alone, using previous days" flows and associating this concentration with a lower flow, to so reduce the slope of the load/flow curves that altered the annual load estimates. It was also possible, when 50 observations/yearwere available for all rivers to separate these into two interlaced independent sets of 25 and similarly check predicted against observed. Results obtained were similar to those of Table 1 and are not reproduced here. Methods 6-8 gave potentially, and usually, the highest correlation between observed and predicted, but also the occasional anomalous result. On the grounds of reliability they were therefore rejected. Estimated annual loads of soluble ortho-P, total P, nitrate-N and total N using Method 5 on four years data, on the six rivers, with measured flows for these rivers and years, are reported by Smith (1977). DISCUSSION In the final selection of method the log load/log flow linear regression relationship (Method 5) was chosen because: (i) it gave among the highest correlations between observed and predicted as shown in Table 1; (ii) one method had to be selected for use throughout; (iii) results appeared stable and reliable; (iv) it can be reproduced by workers without recourse to such sophisticated computer facilities as polynomial curve fitting. Porter (1975) examined concentration/flow relations in N. American rivers and stated that "considerable effort was expended in deriving estimates of relia-
Nitrogen and phosphorus in the Lough Neagh s?stem bility of calculated loadings, this question seems so important yet is so often not mentioned". He showed that at given discharge rates the concentrations particularty of suspended solids were higher when flow was increasing than decreasing and attempted to improve accuracy by introducing a term to include rate of rise or tall of flow using the equation: Cone = a o 4- atQ + a,DQ Dt where flow rate of change DQ Dr is m 3 s - ~ h - t He did not show quantitatively improvements produced by this term. While various "'rate of change of flow" representations are being examined in the Lough Neagh rivers study they have not yet indicated a simple measure suitable for inclusion in the prediction equations. The simple "absolute" rate of change used by Porter does not allow adequately for dilution effects for which a proportional model should be used as percentage rate of change is more appropriate (a discharge rise from 2 to 4 units should be equated to a rise from 20 to 40, rather than from 20 to 22 as absolute rate of change implies). The U.S. Environmental Protection Agency (1974) analysis of concentration/flow relationships for each nutrient combines the results from a large number of rivers and streams, claiming the relation is -reasonably constant" from stream to stream and obtaining loads of chemicals from flows in individual rivers. It was considered in the Lough Neagh rivers analysis that river/chemical/year combinations should be kept separate, as there are differences between catchment areas of the rivers (some have much larger proportions of urban populations, i.e. point source contributions), as well as different profiles. While the log load on log flow linear regression: Log Load = a + b
Log Q
635
rived partly at least from point sources was shown by dilution with flow and lower slopes and correlations obtained from the regressions using soluble ortho-P than those using other chemicals. The question of whether the curve-fitting exercises should use concentration data fitted on flow to predict concentrations for other days. or use load data fitted on flow to predict loads for other days. was considered. If concentration has little relation with flow then load will (if the scatter is not greatl have a slope of about unity with flow and a high correlation as flow is appearing on both sides of the equation. However, for soluble ortho-P where the concentration generally reduces with tlow and the load flow slope is considerably less than unity (frequently less than 0.5) load flow relations are more applicable. In practice both were applied as .Methods 2 and 3 used concentration on flow relations while Methods 4-8 use load on flow relations. The curve fitting exercise described is very much an example of inductive modelling, rather than deductive or theoretical. The curves were not fitted to any process as theorised in Fig. 1 but data observed for each year/river/chemical combination, which are a combination of many sets of curves of this sort. Each river has a number of point sources producing different curves of this type. While the present statistical models produce apparently adequate results, a more complicated mathematical process simulation of flow related and point source derivation of chemicals, with flow related deposition and flushing, is under development as a longer term project.
REFERENCES American Public Health Association (1971) Standard
Methods for the Examination oj" Water and Wastewater, produces a straight line regression, it provides a curvilinear relalion of decreasing slope of load on flow when transformed to load on flow, the curvature depending on the value of the log regression slope b. Lines of slope less than unity, as generally occur, curve towards the horizontal at higher flow values, for example h = 0.5 produces a square root relation: Load = a QO.5. Edwards (1973b) showed that for the Yare and the Wensum. in Norfolk. soluble ortho-P was diluted as discharge increases, while nitrate concentration was positively correlated with discharge. He postulated that this was due to leaching and erosion of the soil. Increases in nitrate-N and silica and other chemical loads occurred in November probably as a result of accumulated products being flushed from the soil. In this analysis he used regressions of log concentration on log flow. Edwards (1973a) used regressions of log load on log flow to examine loads carried by the Rivers Yare, Tud and Wensum in Norfolk, and analysed load, flow characteristics of a number of chemicals. Evidence that soluble ortho-P tended to be de-
13th Ed., A.P.H.A., New York.
Chapman B., Cooke G. H. & Whitehead R. [1967l Automated analysis: the determination of ammoniacal nitrous and nitric nitrogen in river waters, sewage effluents and trade effluents. J. Inst. Wat. Pollut. Control 66, 185-18. Department of Environment (GB) (1971) Design of sampling programmes for river waters and effluents. Notes on water pollution No. 54. Dickson, E. L. (L971~ ,4 study oJ the water home silica budget of the Lough Neagh system. M.Sc. Thesis. Queen's University. Belfast. Dickson E. L. (1975) A silica budget for Lough Neagh 1970-1972. Freshwat. Biol. 5, 1-I2. Edwards A. M. C. (1973a) Dissolved load and tentative solute budgets of some Norfolk catchments. J. HydroL 18, 201-217. Edwards A. M. C. (1973b) The variation of dissolved constituents with discharge in some Norfolk rivers. J. Hydrol. 18, 219-242. Henriksen A. (1965) An automatic method for determining low level concentrations of phosphates in fresh saline waters. Analyst 90, 29-34. Porter K. S. 11975) Nitrogen and Phosphorus: Food Production, Waste, the Environment. Ann Arbor Science, Mich. U.S.A. Rigler F. H. (1968~ Further observations inconsistent with the hypothesis that the molybdenum blue method
636
R.V. SMITH and D. A. S'rEWART
measures orthophosphate in lake water. Limnol. Oceanoyr. 13, 7-1.3. Smith R. V. (1976~ Nutrient budget of the River Main, Co. Antrim. Tech. Btdl. Minist. Agric. Fish Fd. London. No. 32. 315-339. Smith R. V. (1977) Domestic and agricultural contributions to the inputs of phosphorus and nitrogen to Lough Neagh. Water Res. 11,453-459.
Stewart D. A. 11975) Mathematical modelling of the ecosystems of Louyh Neugh. Ph.D. Thesis, Queen's University, Belfast. U.S. Environmental Protection Agency (1974) National eutrophication survey methods for lakes sampled in 1972. Report No. PB 240 936. Wood R. B. & Gibson C. E. (I.973t Eutrophication and Lough Neagh. Water Res. 7, 173-187.