A multivariate comparison of the BERMS flux-tower climate observations and Canadian Coupled Global Climate Model (CGCM3) outputs

A multivariate comparison of the BERMS flux-tower climate observations and Canadian Coupled Global Climate Model (CGCM3) outputs

Journal of Hydrology 519 (2014) 1537–1550 Contents lists available at ScienceDirect Journal of Hydrology journal homepage: www.elsevier.com/locate/j...

2MB Sizes 0 Downloads 26 Views

Journal of Hydrology 519 (2014) 1537–1550

Contents lists available at ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

A multivariate comparison of the BERMS flux-tower climate observations and Canadian Coupled Global Climate Model (CGCM3) outputs K.P. Chun a,⇑, H.S. Wheater a, A.G. Barr b a b

Global Institute for Water Security, University of Saskatchewan, 11 Innovation Boulevard, Saskatoon, SK S7N 3H5, Canada Climate Research Division, Environment Canada, 11 Innovation Boulevard, Saskatoon, SK S7N 3H5, Canada

a r t i c l e

i n f o

Article history: Received 5 July 2013 Received in revised form 27 January 2014 Accepted 31 August 2014 Available online 16 September 2014 This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of K.P. Sudheer, Associate Editor Keywords: Flux towers Climate variables Boreal forest Correlation structure Climate models Multivariate analysis

s u m m a r y Multiple variables from simulated climate fields are widely used in hydrological and ecological models and climate impact assessments, yet the importance of multivariate climate relationships is not widely recognised. This study evaluates climatic outputs from the Canadian Coupled Global Climate Model (CGCM3) in the southern boreal forests of western Canada, by comparing the simulated multivariate relationships with those observed at three representative forest sites. Monthly mean data for five near-surface climate variables (net radiation RN, air temperature TA, relative humidity RH, wind speed WS and surface pressure P) are analysed and compared using visual inspection, hypothesis testing and principal component analysis. The projections of the 1st and 2nd principal components, which explain about 75% of the variation in the data, show remarkable similarities in the observations from the three forest sites (with some subtle differences between the evergreen and deciduous plant functional types), but some broad differences between the observations and model outputs. The model reproduces the observed relationships among RN, TA and P, but not between RH or WS and the other variables. In particular, RH is strongly and negatively related to TA and RN in the forest observations but independent in the model outputs; RH is negatively related to WS in the observations but positively related in the model output; and P is uncoupled from the other variables in the observations but negatively related to RH and WS in the model output. The broad scope of the differences indicates a divergence of process representation at large time and space scales. We explore possible reasons for the observed discrepancies, which indicate some limitations in using climate model outputs directly to drive hydrological models. Ó 2014 Published by Elsevier B.V.

1. Introduction The climate fields generated by global and regional climate models are widely used in climate-change impact assessments as inputs to hydrological and biophysical models. Yet the simulated climate fields are imperfect and their deficiencies may add uncertainty or bias to the derived data products (Denis et al., 2002; Solomon et al., 2007; Maraun et al., 2010). Multivariate analysis is a useful technique to evaluate simulated climate data in relation to observations and assess the suitability of modelled climate fields for use in climate-change impact assessments in hydrology and ecology. For example, evaporation estimates using GCM outputs have been found to be unreliable if the correlation structure of the climate variables is unrealistic (Chun et al., 2012). Few studies of the multivariate properties of climate data have been published. ⇑ Corresponding author. Tel.: +1 306 966 8639. E-mail address: [email protected] (K.P. Chun). http://dx.doi.org/10.1016/j.jhydrol.2014.08.059 0022-1694/Ó 2014 Published by Elsevier B.V.

This study compares the multivariate properties of the output of a Canadian climate model with high-quality climate observations above the southern boreal forests of western Canada. We note that boreal forests cover 1/10 of earth’s land surface (1.3  109 ha global terrestrial area (Schindler, 1998) based on the Olson et al. (1983) gridded ecosystem complexes) and play an important role in the global water cycle and climate system. For example, the boreal forest accounts for approximately one third of the global soil carbon pool, which is one of the largest terrestrial carbon storages (Gorham, 1991; Schindler, 1998), and it has been suggested that the albedo changes that would result from more frequent disturbance events in the boreal forest would ameliorate climate warming (e.g. Eugster et al., 2000). Although Global Climate Models (GCM) can be useful tools in evaluating land surface response to climate variability and change, GCM projections have inherent uncertainty related to the complexity of ecological and biophysical processes. The feedbacks between boreal forests and the atmosphere need to be better understood and quantified

1538

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

within GCMs (Sellers et al., 1995). Despite the rapid development of GCMs (Solomon et al., 2007) and the evolution of downscaling techniques (e.g. Wilby and Wigley, 1997; Chandler and Wheater, 2002), the dynamics, interactions and cumulative effects of climate change at fine temporal and spatial scales within the boreal forest (e.g. Schindler, 1998) are still highly uncertain. The use of climate model output as a surrogate for observations in applied studies has been widely criticised because of recognised model limitations such as unresolved fine-scale processes (Environment Canada, 2010). However, policy makers and engineers routinely use climate model outputs in practical applications (e.g. Kundzewicz and Stakhiv, 2010); the model outputs are often the most appropriate data source that is available. To resolve the discrepancy between observations and model output, greater effort needs to be given to the use of in situ observations to identify deficiencies in simulated climate fields and flag issues that may limit their usefulness in hydrological and ecological applications. The Boreal Ecosystem Research and Monitoring Sites (BERMS) in Saskatchewan, Canada provide high-quality climate observations to evaluate gridded GCM climate outputs. The sites were established during the Boreal Ecosystem-Atmosphere Study (Sellers et al., 1995) and have continued as part of the Fluxnet-Canada Research Network (Margolis et al., 2006) and the Canadian Carbon Program. The data set, beginning in 1994, provides 30-min meteorological and eddy-covariance flux measurements from a range of contrasting land cover types including mature aspen, black spruce and jack pine forests. The data period includes significant climate variability, with an extended dry spell (2001–2003) followed by several wet years. The data have been widely used for model evaluation and improvement, by Canadian ecosystem C-cycle models (e.g. Ju et al., 2006, Yuan et al., 2008), land surface process models (e.g. Bartlett et al., 2006), atmospheric models (e.g. Betts et al., 2006) and models of the cold-season snowpack (e.g. Essery et al., 2009). They have been used to assess landsurface-atmosphere coupling in atmospheric reanalysis data (such as ERA-40, Betts et al., 2006) and in the off-line evaluation of land surface schemes (e.g. Verseghy, 2000, cf. GEWEX, 2012, and GEWEX, 2013), but are underutilised for evaluating climate model outputs. This paper uses climate observations from three of the BERMS flux towers to evaluate the outputs of the CGCM3 Global Climate Model (CGCM3) (Flato and Boer, 2001), based on multivariate statistical techniques. Three complementary approaches are used: (1) visual inspection of data plots, (2) hypothesis tests and (3) principal component analysis (PCA). The paper compares the multivariate correlation structure of the model output with local observations, and evaluates the usefulness of multivariate analysis as a diagnostic tool.

poor and well drained. The proximity of the sites to each other (82, 104 and 29 km from OA to OBS, OA to OJP and OBS to OJP, respectively) means that they have a similar climate. In general, the climate is continental with a long and dry cold season and a short growing season. The three study sites represent the dominant land cover types in the boreal forests of western Canada (see, e.g., Hall et al., 1997 and Steyaert et al., 1997). The energy and water balances of the three study sites, and of OBS in particular, are typical of the southern boreal forest, as demonstrated by Barr et al. (2012) who compared gauged streamflow at the watershed scale with stand-level outflow inferred from the water balances at the flux towers (i.e., as precipitation minus evapotranspiration minus soil water storage change). The mean long-term ratio of outflow to precipitation was 21% for the watershed (from gauged streamflow), 22% at OBS and 22% at the three stands combined (from the stand-level water balances). Given the observed uniformity in the spatial fields of precipitation and radiation in the study area (Barr et al., 2012), the similarity among outflow ratios implies that the surface energy and water balances of the study sites are regionally representative. 2.2. Climatic measurements

2. Data and methodology

Climatic measurements were made above the forest from flux towers. Air temperature (TA) and relative humidity (RH) were measured at twice the canopy height, at 36 m above-ground-level at OA, 25 m at OBS, and 28 m at OJP using temperature-humidity probes (model HMP45CF, Vaisala Inc., Oy, Finland) in 12-plate Gill radiation shields. For sub-zero temperatures, the relative humidity from the HMP45CF probe, which is output with respect to the saturated vapour pressure over water, was re-expressed relative to the saturated vapour pressure over ice; this adjustment made it comparable to the CGCM3 output. Wind speed (WS) was measured at 1 m above the air temperature measurement level using a propeller anemometer (model 05103, RM Young, Traverse City, MI, USA). Atmospheric pressure (P) was measured at 1–2 m above ground level using a barometer (model SBP270, Setra Systems Inc., Boxborough, MA, USA). The net radiation flux density (RN) was calculated from component measurements of upwelling and downwelling shortwave and longwave radiation, made using paired pyranometers (model CM11, Kipp and Zonen, Delft, The Netherlands) and paired pyrgeometers (model PIR, Eppley Laboratory, Newport, RI, USA). See Barr et al. (2006, 2012) for additional details. The data contained a few outliers that were probably related to undetected measurement problems. However, as described below, the statistical analyses were not affected by the retention or removal of these outliers. The results are reported after the exclusion of a few extreme outliers: OJP surface pressure in June and July 2000; OJP wind speed in July–December 1997; and OBS wind speed in February 1998.

2.1. Sites

2.3. Climate model outputs

The BERMS study area is located near the southern edge of the boreal forest in the Boreal Plains Ecozone, just to the north of Prince Albert, Saskatchewan (Fig. 1). This study analyses climatic data from three sites: Old Aspen (OA), Old Black Spruce (OBS) and Old Jack Pine (OJP), for 1996–2009. The sites are described in Barr et al. (2012). Briefly, OA has a trembling aspen overstory (Populus tremuloides L.) and hazel understory (Corylus cornuta Marsh.) on clay-rich glacial soils. OBS is dominated by black spruce (Picea mariana Mill. BSP) with a sparse understorey of shrubs and feather moss. The soil has an approximately 20-cm deep peat layer overlying waterlogged sand. OJP is a pure stand of jack pine (Pinus banksiana Lamb.) with an understory of reindeer lichen and infrequent, scattered clumps of green alder. The sandy soil is nutrient

The Canadian Coupled Global Climate Model (CGCM3) (Flato and Boer, 2001) is the principal model for climate change studies in Canada. Simulations are available for different Intergovernmental Panel Climate Change (IPCC) emission scenarios (Nakicenovic et al., 2000) as well as reference experimental runs (20c3m and COMMIT) which are independent of the projected emission scenarios (Nakicenovic et al., 2000). The two reference experiments used here are (1) the 20c3m Run 1 T47 experiment for 1850–2000 (backcast) and (2) the COMMIT Run 1 T47 experiment for 2001– 2100 (dry run without CO2 perturbation). The CGCM3 series for the investigated period (1996–2009) was created by appending the COMMIT series to the 20c3m series at year 2000. Other T47 runs and higher resolution runs (T63) of the 20c3m and COMMIT

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

1539

Fig. 1. Locations of flux-tower sites (black diamonds): Old Aspen (OA), Old Black Spruce (OBS) and Old Jack Pine (OJP). The shaded area of the Canada map is the extent of the boreal forest.

experiments along with the A1B and B1 experiments are used to assess whether the combined CGCM3 (1996–2009) series is representative. The grid squares containing the three observational sites are dominated by boreal forest but includes some aspen parkland and cropland to the south. The grid centres are within 25 km of the OBS and OJP sites. Moreover, the sites used in this study contributed to the development of the Canadian Land Surface Scheme (CLASS) used in CGCM3 (Verseghy, 2000). This study analyzes climate data at a monthly time scale, which captures the main features of the seasonal cycle as well as inter-annual variability while removing the influence of individual synoptic events. Although the CGCM3 (1996–2009) time series

does not correspond to a specific historical period, it represents possible realisations from a dynamic climate system over a multi-year period. Therefore, instead of comparing CGCM3 outputs individually, correlation matrices between CGCM3 variables are used for multivariate analysis. Two approaches are used to spatially interpolate the CGCM3 model outputs from the centroids of adjacent model grid squares to the location of the BERMS sites: nearest neighbour and the Leith (2005) weighting scheme with a cutoff of 360 km. The interpolated outputs for the OA, OBS and OJP sites are identical for nearest-neighbour interpolation and nearly identical for the Leith scheme; consequently our analysis will use only CGCM3 outputs

1540

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

interpolated to one site – the OA site. Optimum-weighting interpolation approaches were also evaluated, however the ensuing analysis was not sensitive to the interpolation method. The issues associated with the differences in spatial scale between the model and observational sites will be discussed in Section 4.2. 2.4. Visualisation Visual inspection of data plots serves three functions: (i) preliminary indication of data structure, (ii) identification of data quality problems, and (iii) identification of plausible assumptions for subsequent analysis (e.g. Chandler and Scott, 2011). The data plots used here are scatterplot matrices, histograms and contour plots of paired-climate-variable space derived from kernel smoothing. The kernel smoothing used (Venables and Ripley, 2002) has the following expression: n x  x  X j ^f ðxÞ ¼ 1 K nb j¼1 b

where P(x, y) is the joint probability of climate variables x and y; and P(x) and P(y) are the probability density functions of climate variables x and y, where x and y have been scaled to zero mean and unit variance. In this paper, P(x), P(y) and P(x, y) are estimated by dividing the scaled x and y data (e.g. TA and RH) into 10 equally spaced bins, with the first and last bins containing values of below 2.5 and above +2.5. Although the number of bins used to discretise the data affects the magnitude of the computed MI values, it does not affect the relative comparison of MI values between the site observations and CGCM3 outputs if the same discretisation grids are used. 2.5. Hypothesis testing

i¼1

q¼1

2p2 þ 3p  1 P  k 1 1 ð6ðp þ 1Þðk  1ÞÞ i¼1 n 1  nk i

Su ¼

n ni S and Sui ¼ Si nk ni  1

The null hypothesis and asymptotic distribution for Box’s M test are the same as the first log-likelihood ratio test (Eq. (1)). Generally, the results of these two tests are similar if the samples are balanced. For estimating p-values, the two formal statistical tests are based on asymptotic results and the data are assumed to be not highly skewed. These tests have known limitations and pitfalls. One recognised problem is that two hypothesis statistics are employed but there is no p-value correction for this multi-testing. Another possible problem is that both hypothesis tests can be overly sensitive; small deviations between the two testing correlation matrices may lead to unwarranted rejection of the hypotheses (e.g. see Everitt and Hothorn, 2011). Further discussion of the limitations of hypothesis testing in the environmental sciences can be found in Chandler and Scott (2011). Although the hypothesis test results are imperfect, they are interpreted cautiously here, in conjunction with the visual inspection of data plots and the results from principal components analysis. 2.6. Principal Components Analysis (PCA)

Although there are many tests for the equality of variances, we chose the Bartlett and Box tests which are less sensitive to nonnormality than the simpler chi-square test for covariance comparison. The null hypothesis investigated here is that the common covariance matrix of the CGCM3 model outputs and the local observations (S = SCGCM3+observation) is the same as the individual covariance matrices (Si = SCGCM3 or Sobservation). The covariance matrices are estimated by ni data points of p standardised climate variables (i.e. Si = XXi0 where Si is a p by p matrix and X is an ni by p data matrix). Two hypothesis tests are proposed here. The first is based on a multivariate generalisation of a statistic in Bartlett (1937). A log-likelihood ratio (log k) test for two covariance matrices can be expressed as: k X

k   X   ðni  1Þ log S1 iu Su 

where

X Pðx; yÞ Pðx; yÞ log PðxÞPðyÞ x;y

2 log k ¼ n log jSj 

M¼q

i¼1

where ^f ðxÞ is the kernel density estimate of the value x, n is the total number of the observed sample x1, . . ., xn, and K() is the kernel function which in this study is a Gaussian distribution with a bandwidth b computed from the standard deviations of the observed samples. As a comparison index for paired-climate-variable space, mutual information (MI) is used to quantify how much the uncertainty of one climate variable can be reduced by providing another climate variable information. For the discrete case, mutual information (MI(X, Y)) (MacKay, 2003) can be expressed as:

MIðX; YÞ ¼

common covariance matrix. k is the number of Si which is 2 as only pair comparisons are performed in this study. || is a determinant operation (cf. Tsagris, 2011). The test statistic can also be viewed as a summation of the log of the pth powered geometric mean of the eigenvalues of S1 i S, and has an asymptotic chi-square distribu1 tion with degree of freedom equal to 2ðpþ1Þðk1Þ where p is number of variables in S. Under the null hypothesis, the variance of the common covariance matrix (S) is same as the ith sample covariance matrix (Si). The second test is based on Box (1949). Similar to Eq. (1), Box (1949) also generalised the statistic in Bartlett (1937) but with the bias consideration related to the unequal sample sizes of the sample covariance matrices. The statistic (M) in Box (1949) is generally referred as to Box’s M and given by:

ni log jSi j ¼

k X i¼1

    log S1 i S

ð1Þ

where n is the total sample size and ni is the sample size of the ith sample covariance matrix (Si). S is the weighted average of Si (i.e. P S ¼ n1 ki¼1 ni Si ) which is the maximum likelihood estimate of the

Principal Components Analysis (PCA) is a widely used technique to reduce the number of multivariate data dimensions by projecting data onto principal axes. It summarises a data set using an eigenvalue decomposition based on a Gaussian assumption. Wallis (1965) provided an early review of PCA applications in hydrology. Chun et al. (2010, 2012) used PCA to identify the differences in the multivariate relationships between a set of observed climate data and the corresponding GCM outputs, and investigated their implications for potential evaporation estimation. In this study, PCA is used to investigate the correlations among five standard climate variables (air temperature, relative humidity, wind speed, radiation and surface pressure), based on the monthly means and standard derivations. To quantify the uncertainty of the PCA results, a simple bootstrapping approach is used (Davison and Hinkley, 1997). By resampling the data with replacement 1000 times, the 95% confidence bounds are derived for the proportional and cumulative variances explained by PCA. To simplify interpretation, the PCA results are plotted as 2D scatter plots, using axes of the principal components to display the projected

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

correlation structure of the climate variables. Arrows are added to represent the loadings of climate variable principal components in the scatter plots of observational scores. 3. Results 3.1. Visualising the correlation patterns The annual cycles of the five climate variables show important differences between the observations and model outputs, as seen in the boxplots of monthly means (Fig. 2). Three of the variables (RN, TA and RH) have strong and consistent seasonal cycles, driven by the pronounced seasonal cycle of solar irradiance at this latitude (not shown), which directly influences RN and indirectly influences TA and RH. The annual RN cycle leads TA by one month and RH by five to six months. An annual sinusoidal relationship explains more than 93% of the temporal variation in monthly mean RN and TA at all sites and in the model outputs. The most striking contrasts between the observations and outputs are in the seasonal cycles of RH, WS and P: modelled RH has less seasonal variation and more interannual variation than observed RH; modelled WS has greater variation than observed WS both seasonally and inter-annually; and observed P lacks any clear seasonality whereas modelled P shows a distinct annual cycle with a maximum in spring and minimum in fall. Prior to comparing the model output and observations, we evaluated the impact of interpolation scheme and stationarity on the CGCM3 model outputs. The interpolation scheme used to project the modelled climate fields onto the site locations has only minimal impact on the correlation patterns of the model outputs. The two primary interpolation schemes (nearest neighbour (Fig. 3a) and Leith (not shown)) produce similar correlation patterns, as does a third scheme (optimum weighting interpolation), designed to improve the matching of the climate model outputs to the in situ measurements. The optimum weighting scheme reduces the

1541

differences between the observed (tower) and modelled (CGCM3) PCA results slightly, but the differences remain large, greatly exceeding the PCA differences among the three forest sites. The insensitivity of the model PCA to the interpolation scheme reflects the coarse scale of the CGCM3 grid (3.75° or 400  250 km at this latitude) and the uniformity of the modelled climate field among neighbouring grid squares. None of the interpolation methods introduced fine-scale process information. The scatter plots (Fig. 3) show the correlation structures of the monthly mean climate fields from the CGCM3 model output and the three observational sites, 1996–2009. In the multivariate analysis, each of the monthly mean climate time series are scaled to zero mean and unit variance, then two-dimensional smoothing is used to generate probability density surface contours. The stationarity of the CGCM3 correlation structures was investigated by comparing the patterns for two different 15-year time slices: 1850–1865, driven by the 20C3M scenario using observed greenhouse gas concentrations from the 20th century; and 2085– 2100, under the COMMIT scenario which fixes the atmospheric burdens at AD2000 levels. The correlation structures are generally similar between the two periods and scenarios, and both are similar to the modelled correlation structures shown for a different time slice in Fig. 3a. We conclude that the CGCM3 climate variable correlations do not display strong temporal variability, so that the use of a specific 15-year time slice is justified. The correlation patterns of the observations from the three forest sites are plotted in panels (b), (c) and (d) of Fig. 3. In general, the correlation structures are similar among the three observational sites, but all three sites have features that differ from the model outputs. The CGCM3 simulations and site observations share similar relationships between TA, RH and P but diverge with respect to RH and WS. For instance, WS and RN are positively related in the observed data but negatively related in the CGCM3 outputs. The correlation patterns between RH and WS also differ between the observations and the CGCM3 simulations. The observation-model

Fig. 2. Boxplots of mean monthly climate variables 1996–2009. Vertical columns are Old Aspen (OA), Old Black Spruce (OBS), Old Jack Pine (OJP) and the CGCM3 series for the BERMS study area derived from the nearest- neighbour interpolation scheme. Five climate variables are temperature (TA), relative humidity (RH), wind speed (WS), radiation (RN) and surface pressure (P).

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

(b)

0.08 0.06

0.04

0.02

1

0.1

-2 -1 0 1 2

2

(c)

1

-2 -1 0

1

-2 -1 0 1 2 0.04 0.1

0.1

0.1

0.1

0.04

0.04 0. 06

0.08

0.0 8

0.08

0.02

(d)

-2 -1 0 1 2

0.0

0.1

P

0.02 6

6 0.0

0.0 4

0.02

-2 -1 0 1 2

2

0.02

0.04

0.02

0.1

0.08

0.05

6 0.0

RN

0.0 8

0.06

0.05

-2 -1 0 1 2

OBS

2

0.04

-2 -1 0 1 2

-2 -1 0 1 2

OJP

-2 -1 0 1 2

-2 -1 0 1 2

TA

-2 -1 0 1 2

RH 0.06 0.1

-2 -1 0 1 2

0.06

0.05

0.06

0.08

2 1

1

2

6 0.0

0.04

0.08 0.02

-2 -1 0 1 2

0.1

0.04

P

0.04

0.08

8 0.0

-2 -1 0

0.02 0.02

0.04

0.1

2

0.04

6

1

6 0.0

0.05

8 0.0

0.0

-2 -1 0

0.1 0.05

6 0.0

0.0 6

4 0.0

0.02

RN

0.02

0.1

0.1

0.1

2

8

1

0.0

-2 -1 0

0.04

P

0.08

0.1

0.1

0.04

0.02

0.04

0.06

8 0.0

1 0.

0.02 0.08

0. 06

0.02

0.02

-2 -1 0

0.1

0.05

0.05

0.06

0.08

2 1 -2 -1 0

0.08

0.02

0.1

RN

0.04

0.1

0.1

0.08

-2 -1 0 1 2

2

0.06

WS

0.1

0.0 6

0.02

0.04

06 0.

-2 -1 0

0.1

2 0.1

0.1

0.1

08 0.

0.02

0.04

0.1

1

WS

0.04

0.06

0.08

RH

0.1

0.08

08 0.

0.04

0.05

0.1

2

0.1

-2 -1 0

0.02

1

-2 -1 0

1

TA

-2 -1 0 1 2

2

-2 -1 0

0.06

0.1

0

0.04

0.04

0.04

0.1

0.1

0.06

6

0.1

0.0

0.08

-2 -1

0.06

0.1

0.1 0.04

P

0.02

0.08

0.08

0.0 8

0.02

0.02

0.02

0.02

0.08

0.1

0.0 8

1 0.

0.04

-2 -1 0 1 2

0.1 0.05

RN

0.06

0.06

0.08

WS

0.02

0.0 6

-2 -1 0 1 2

0.04

0.1

0.06

12

0.02

0.1

0.02

-2 -1 0

0.04

0.08

0.1

-2 -1 0 1 2

0.1

1 0.

WS 6

08 0.

0.0

-2 -1 0 1 2

0.07

0.04

0.06

0.02

-2 -1 0 1 2

-2 -1 0 1 2

0. 06

0.06 8 0.0

RH 0.02

0.1

0.0 3

0.04

08 0.

05 0.

0.08 0.01

-2 -1 0 1 2

0.0 8

RH

0. 04

OA

TA

-2 -1 0 1 2

0.02

9 0.0

-2 -1 0

12

TA

0.02

-2 -1 0 1 2

0.08

CGCM3 -2 -1 0 1 2 -2 -1 0 1 2

-2 -1 0 1 2

0.1

(a)

0.12

1542

-2 -1 0 1 2

Fig. 3. The plot matrices of (a) the CGCM3 series for the BERMS study area derived from the nearest-neighbour interpolation scheme, (b) Old Aspen (OA), (c) Old Black Spruce (OBS), and (d) Old Jack Pine (OJP). The histograms of five climate variables: temperature (TA), relative humidity (RH), wind speed (WS), radiation (RN) and surface pressure (P) are presented in the diagonal of the plot matrices. The pair relationships of climate variables are presented in scatterplots (upper diagonal subplots) and contour of probability density plots (lower diagonal subplots).

divergence in the correlation patterns that involve RH and WS may be related to the spatial heterogeneity of these fields; in general, WS and to a lesser extent RH are more heterogeneous than RN, TA or P (Ziche and Seidling, 2010). We conclude that the spatially coarse CGCM3 simulations provide more realistic simulations of the more continuous climate fields (RN, TA and P) and less realistic simulations of the more heterogeneous fields (RH and WS). Fig. 4 further illustrates the relationships between climate fields in 3D plots. In contrast to the (RN, TA, P) scatterplots (Fig. 4a), which are similar between the CGCM3 simulations and site

observations, the (WS, TA, RA) scatterplots (Fig. 4b) are strikingly different between observations and simulations. These differences are highlighted in Fig. 5, which shows the mutual information estimates for each climate-variable pair. Although the MI estimates of many variable pairs (e.g. RH vs P) are similar between the site observations and CGCM3 outputs, four pairs (i.e. TA vs RH, TS vs WS, RH vs WS and RH vs RN) share more information in the observations than the CGCM3 outputs, consistent with the results of Figs. 3 and 4. Sections 3.2 and 3.3 will further explore the modelobservation differences.

1543

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

OA

CGCM3

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2 1 0 -1

0 -1 -2

-3

-3 2.0

Temperature

-2

-2

2 1

-2

Temperature

0 -1

0 -1

Surface Pressure

2 1

2 1

-3

Surface Pressure

3

3

4

(a)

-1.5

-1.0

-0.5

Radiation

0.0

0.5

1.0

1.5

-3 2.0

Radiation

OJP

-1.5

-0.5

0.0

0.5

1.0

1.5

Radiation

-2

-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

Temperature

1 0 -1

Surface Pressure

2

3 -1.0

0 -1

-3

-1 -2 2.0

Temperature

0

2 1

-2

2 1 0 -2 -1

2 1

- 4 -3

Surface Pressure

3

4

4

OBS

-3 2.0

Radiation

Fig. 4a. Three-dimensional scatterplots of monthly mean net radiation, temperature and surface pressure as output from the CGCM3 model and observed at the three forest sites.

3.2. Hypothesis tests As a complement to the qualitative assessment of scatterplots, two multivariate hypothesis tests were performed to determine whether the data from different sources have a common covariance matrix. Our application of these tests violates two fundamental requirements (the large number requirements for testing asymptotic relationships, and the multinomial assumption that the data are not highly skewed). Other nonparametric or robust tests may be less affected by these limitations. However, our experience shows that, despite their limitations, the two hypothesis tests are practically useful when used in consort with other approaches. We use the two selected parametric tests for relative comparisons rather than formal statistical testing. Based on the hypothesis tests, the correlation matrices show only small differences among the different CGCM3 model runs, presumably because the model runs use the same underlying model structure. The correlation matrices of the CGCM3 series (1996–2009), composed of the 20c3m Run 1 T47 (1996–2000) and COMMIT Run 1 T47 (2001–2009), are not significantly different from the correlation matrices of the A1B (2001–2015), B1 (2001–2015) and 20c3m (1985–2000) experimental runs or the

higher-resolution experimental (T63) model runs. The only significant differences among model runs are between the CGCM3 series and the COMMIT Runs 2 and 3 experiments, related to differences in the relationship between RH and WS. Overall, the hypothesis tests confirm that the correlation structures of the CGCM3 (1996–2009) model runs used in this analysis are representative of the other experimental runs, including those simulated at finer spatial resolution. In contrast to the similarities among the CGCM3 model runs, the correlation patterns of the model runs differ significantly from those of the in situ observations, as identified through the hypothesis tests (Table 1) and confirmed by visual inspection of the scatter plots (Fig. 3). Table 1 compares the correlation matrices between various paired data sets: simulated vs simulated, observed vs observed, and simulated vs observed. The CGCM3 correlation structures from the two interpolation methods and the different periods are not significantly different from one another; however, in all cases the CGCM3 correlation structures are significantly different from the local observations. Among the three forest observational sites, the correlation matrices show significant differences between the OA (aspen) site and the OJP (jack pine) and OBS (black spruce) sites (p  0.05), but remarkable similarity between OBS and OJP (p > 0.96). Two

1544

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

CGCM3

OA

-2

-2 -4

-3

-2

-1

0

1

2

0

Wind

-3 3

Temperature

-1

-1

2 1

-4 -3 -2 -1

1

0

Tempeature

2

2 1

0

Wind

1

3

2

3

4

4

(b)

0 -1 -2 -3 -3

-2

-1

0

1

2

3

Rel. Humidity

Rel. Humidity

OJP

-3

-2

-1

0

1

-2

2

Temperature

0

Wind

-1 -2

0 -1

-3

-3 -4

-1 -2

2 1

-4

1 0

Temperature

-1

2

-2

Wind

0

1

1

2

2

3

3

OBS

-3 -2

-1

Rel. Humidity

0

1

2

Rel. Humidity

0.8 0.4

0.6

CGCM3 OA OBS OJP

0.0

0.2

Mutual Information

Fig. 4b. Three-dimensional scatterplots of monthly mean relative humidity, temperature and wind speed as output from the CGCM3 model and observed at the three forest sites.

TA vs RH

TA vs WS

RH vs WS TA vs RN

RH vs RN

WS vs RN TA vs P

RH vs P

WS vs P

RN vs P

Fig. 5. Mutual information (MI) estimates are plotted against pairwise climate variables. The grey lines connect the MI of the three tower observations (OA, OBS and OJP) to contrast the MI of the CGCM3 outputs (solid diamonds). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

features may account for the differences among sites: spatial separation and plant functional type. OA is more than 80 km from OJP and OBS whereas OJP and OBS are only 29 km apart. Perhaps more significantly, OA is deciduous-broadleaf forest whereas OBS and OJP are evergreen-needleleaf forests. The primary contrast among the two plant functional types is in the seasonality of leaf area index, which in turn causes differences in the seasonal cycles of albedo (affecting RN), canopy roughness (affecting WS), and

energy partitioning (affecting TA and RH). Nevertheless, the differences between OA and OJP or OBS are much smaller than those between the CGCM3 simulations and any of the individual sites. The differences in the correlation structure between OA and OJP or OBS indicate that spatial heterogeneity can be significant over distances less than the distance between the CGCM3 grid centres. The subtle differences between OA and OJP or OBS are further illustrated in the next section.

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550 Table 1 The log-likelihood ratio test (Likelihood) and the Box M test (Box M) results comparing two independent sets of monthly climate data. The insignificant p values are in italics. (OA + OBS + OJP)/3 is the correlation matrix based on averaged climate series of the three forest sites between 1996 and 2009. CGCM3 is the correlation matrix of the model outputs between 1996 and 2009. Data

CGCM3 CGCM3 CGCM3 OA OA OBS CGCM3 CGCM3 CGCM3 CGCM3

p Value

CGCM3 (interpolation based on Leith (2005)) CGCM3 (1850–1865) CGCM3 (2085–2100) OBS OJP OJP OA OBS OJP (OA + OBS + OJP)/3

Likelihood

Box M

0.3163

0.2834

0.3768 0.7608 0.0119 0.0001 0.9721 0.0000 0.0000 0.0000 0.0000

0.3422 0.7345 0.0082 0.0000 0.9653 0.0000 0.0000 0.0000 0.0000

3.3. PCA For both observed and modelled data, the 1st and 2nd principal components explain 50–60% and 20% of the variance, respectively, (Table 2, Fig. 6), with slightly higher percentages for the observations than the model outputs. The 95% confidence bounds from resampling show distinct model – observation differences in principal components 1–3. Unlike the Hadley Centre Model (HadCM3) outputs, which inflated the correlation between climate variables (Chun et al., 2012), the CGCM3 model output analysed in this study deflated the variance in the 1st principal component. The projection of the 1st to 3rd principal components on 2D scatter plots (Figs. 7 and 8) provides a helpful visualisation of the PCA structures. The relative orientation of each pair of climate arrows on the projected space indicates their degree of interdependence; the correlation (r2) between each climate variable pair is roughly equal to cos h, where h is the angle between their respective arrows. Arrows that are orthogonal are thus uncorrelated, whereas arrows that point in the same direction are positively correlated. The PCA plots highlight the similarities and differences between the site observations and the CGCM3 outputs. They clearly demonstrate the model’s ability to reproduce the observed relationships between TA, RN and P (as seen in the size and orientation of their respective arrows); both observed and modelled data show that, at monthly time scales, TA is positively related to RN, and that RN and TA are independent of P. However, the most striking features of the PCA projections (Figs. 7 and 8) are in the broad differences between the CGCM3 outputs and the site observations, which stand out when compared to the close similarities among the three observational sites. The 1st and 2nd principal components for the RH–TA and RH–RN pairs show a strong negative correlation at all three sites but no correlation in the model outputs (Fig. 7). These three variables are all characterised by pronounced and consistent seasonal cycles (Fig. 2). The seasonal cycle of observed RH has a nearly opposite phase to RN and TA, which accounts for the negative correlations Table 2 The first (PC1) and second (PC2) principle components of the CGCM3 model outputs and in situ observations from three forested sites (OA, OBS and OJP). CGCM3

Temp R. Hum Wind Rad SurP

OA

OBS

OJP

1545

between the RH–TA and RH–RN pairs. The negative correlation between RH and TA has several physical causes: the strong dependence of the saturated vapour pressure on TA; the seasonal differences in planetary boundary-layer growth, which increases TA but reduces RH via the entrainment of warm–dry air from the free atmosphere; and the seasonal differences in the surface sensible and latent heat fluxes, which partially offset the other effects by increasing the evaporative fraction in the warm season relative to the cold season. The failure of the climate model to reproduce the observed correlations between the RH–TA and RN–TA pairs at the monthly time scale may reveal shortcomings in the model’s representation of the land–atmosphere processes that control the seasonal cycle of RH. A second disparity between model outputs and observations (Fig. 7) is that the 1st and 2nd principal components show a strong negative dependence of modelled RH and WS on P that is not present in the observations. At all three observational sites, RH and WS are unrelated to P, with one exception: a positive dependence of WS on P at OA. Given the lack of a significant seasonal cycle in P in the observations, the negative dependence of modelled RH and WS on P at the monthly time scale is surprising. A third modelobservation difference is with respect to the dependence of RH and TA on WS. The modelled 1st and 2nd principal components show RH (but not TA) to be positively correlated with WS, whereas two of the observational sites (OBS and OJP but not OA) show a positive correlation between TA and WS and a negative correlation between RH and WS. The model thus (negatively) couples RH and WS to P, and decouples RH from RN and TA, neither of which are corroborated by the observations. In contrast to the differences between the modelled and observed PCA plots (Figs. 7 and 8), the similarities in the PCA analyses among the three observational sites are remarkable. The similarities between the two evergreen-needleleaf forests (OBS and OJP) extend from the 1st to 3rd principal components (Figs. 7 and 8). Moreover, the 1st and 2nd principal components at the deciduous-broadleaf site (OA) differ from the evergreen-needleleaf sites in one respect only: the orientation of the WS arrow. This difference may be related to differences in the seasonality of leaf area index and canopy roughness: the seasonal WS cycle has a single peak in late spring at the evergreen sites but a dual peak in early spring and early autumn at the deciduous site, with a summer minimum that is associated with the leaf-on period (Fig. 2). The dual seasonal peak of WS at OA is unique; all other time series have either a single seasonal peak (WS at OBS and OJP; TA, RN an RH at all sites) or no significant seasonal cycles (P at all sites) (Fig. 2). The dual WS peak at OA effectively decouples WS from the variables that have a sinusoidal seasonal cycle – RN, TA and RH. In contrast, the seasonal WS cycle at the two evergreen needleleaf sites, although somewhat subtle, is roughly in phase with RN and TA, so that WS remains coupled to these variables. Based on these multivariate results, we conclude that the CGCM3 outputs fail to represent important features of the climate above the boreal forest. Further upscaling and downscaling studies are needed to improve the multivariate agreement between simulated and observed climate variables.

4. Discussion and conclusions 4.1. Use of multivariate statistics as a diagnostic tool

PC1

PC2

PC1

PC2

PC1

PC2

PC1

PC2

0.46 0.40 0.34 0.60 0.38

0.62 0.33 0.41 0.30 0.50

0.55 0.58 0.20 0.56 0.04

0.29 0.17 0.76 0.19 0.52

0.51 0.53 0.42 0.53 0.01

0.16 0.12 0.07 0.07 0.98

0.51 0.53 0.45 0.51 0.07

0.13 0.15 0.04 0.08 0.98

This study combines three approaches (visual inspection of scatter plots, hypothesis testing, and principal component analysis) to investigate the multivariate correlation patterns of monthly climate time series, and evaluate climate model outputs in relation to in situ observations from the boreal forests of western Canada.

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

1.0 0.9 0.8 0.7

CGCM3 Boreal data

0.5

0.6

Cumul. Explained Variance

0.5 0.1

0.2

0.3

0.4

CGCM3 Boreal data

0.0

Prop. Explained Variance

0.6

1546

1

2

3

4

1

5

2

3

4

5

PC

PC

Fig. 6. The proportional and cumulative variance in climate variables explained by principal components 1–5. Grey lines represent the CGCM3 data and black lines are for the 3 boreal forest data sets. The dark grey area is the confidence bounds of the CGCM3 output and the light grey area is the confidence bounds for the observations. Left panel is explained variance for each component (sometimes called scree plot) and right panel is for cumulative explained variance.

CGCM3

OA

0.0

0.5

-0.5

0.0

0.5

0.5

WS P

0.0

0

PC2

0.0

0

PC2

1

1

P

0.5

2

2

1.0

3

-0.5

-0.5

RN

-1.0

-2

-2

-0.5

RH WS

RN TA

-1

-1

RH

-3

TA

-1

0

1

2

3

-3

-2

-1

0

PC1 OBS -0.5

1

2

3

0.5

0

WSRN TA

-0.5 -1.0 -1.5

-2 -3

-1

PC2

-1.0 3

P

RH

-1.5

-0.5 0

PC1

0.5

1

0.5 0.0

0 -2 -3

-1

0.0

2

1.0

3 2 1

TA

-1

PC2

RN WS

RH

-2

3

1.5

0.5

P

-3

2

OJP

0.0

1.5

-0.5

1

PC1

1.0

-2

0.0

-3

-3

-2

-1

0

1

2

3

PC1

Fig. 7. Two-dimensional projection of principal components PC1 and PC2. The five climate variables are temperature (TA), relative humidity (RH), wind speed (WS), radiation (RN) and surface pressure (P). The grey points are the climate data from CGCM3 and the measurements from the boreal forest sites, and the magnitude (length) of the arrows are projection of explained variance of individual climate variables.

The three approaches give consistent results, revealing common features of agreement and disagreement between model output and observations. The 2D PCA visualisation is particularly helpful in assessing the similarities and differences among observational sites and between model outputs and observations.

However, these analytical tools have obvious limitations. Firstly, they use only linear approaches. Linear bias or error correction methods, which are commonly used in atmospheric studies (e.g. Maraun et al., 2010), cannot remediate nonlinear errors in the correlation structures of modelled climate variables. The

1547

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

CGCM3 -1.0

-0.5

0.0

OA 0.5

1.0

1.5

-2

-1

0

1

2

WS

0.0 -1.5 -1.0 -0.5

PC3

0

P

-2

-2

-1

WS

TA RH RN

-1

0

0

RN P

-1

PC3

RH

TA

0.5

1

1

1

1.0

2

1.5

2

-1.5

-3

-2

-1

0

1

2

3

-2

-1

0

PC2 OBS

OJP 3

-3

-2

-1

0

1

2

3 1.0

2

2

1

1

0

1.0

-1

0.5

-2

2

0.5

1

2

-3

1

PC2

0.0

0

-0.5

-0.5

P RH

-1

P

PC3

0.0

0

RH

TARN

-1 -3

-2

-1

0

-1.0 1

2

3

PC2

-1.0

-2

WS WS

-2

PC3

TARN

-3

-2

-1

0

1

2

3

PC2

Fig. 8. Two-dimensional projections of PC2 and PC3. The five climate variables are temperature (TA), relative humidity (RH), wind speed (WS), radiation (RN) and surface pressure (P). The grey points are the climate data from CGCM3 and the measurements from the boreal forest sites, and the magnitude (length) of the arrows are projection of explained variance of individual climate variables.

nonlinear properties of the residuals from this study merit further exploration. Nonlinear multivariate techniques have been used to provide insights into the oscillation modes of climate observations and model outputs (e.g. Hsieh, 2004; Schölkopf and Smola, 2002). Nonlinear transformation is often needed to improve the normality of climate variables (such as WS) and better satisfy the assumptions of hypothesis testing. The nonlinear height correction in Betts et al. (2006) is also worthy of further attention. However, a preliminary investigation of nonlinear methods using the data from this study did not alter the conclusions; like the linear results, the first nonlinear components were similar among CGCM3 model runs but diverged between model runs and observations. Another limitation of multivariate analysis as a diagnostic tool is its inability to attribute the shortcomings in the modelled climate fields to particular model features. Even so, some inferences about model weaknesses are possible. The multivariate analysis in this study demonstrates the CGCM3 model’s ability to reproduce the observed correlation patterns among RN, TA and P above boreal forest but not between those variables and RH or WS. The separation of RH and WS from the other climate fields may reflect the greater spatial heterogeneity of RH and WS. The CGCM3 spatial scale may be too coarse to represent some atmospheric processes at local scales, resulting in deficiencies in the more heterogeneous climate fields RH and WS. Cloud-field evolution, and the associated interplay between surface RN, TA and RH, is affected by land surface properties and processes at relatively fine scales (Giorgi and

Avissar, 1997). The local feedback mechanisms that influence cloud formation within GCMs warrant further attention. The disparities in the correlation patterns of model outputs vs observations may also result from differences in seasonality. Of the five climate fields, two (RN and TA) have strong, consistent annual cycles, and one (P) has little seasonality, at least in the observations (Fig. 2). These similarities account for the overall model-observation agreement in the multivariate patterns between RN, TA, and P. In contrast, the model does not reproduce the observed seasonal cycles of RH and WS or the correlation patterns that involve RH and WS. This shortcoming contributes to the model’s surprising failure to reproduce the observed negative correlation between RH and TA. The shortcomings in the multivariate relationships of the modelled climate fields limit their use in hydrological and ecological applications. In particular, simulated WS and RH should be used with caution until improved simulations are available. Similar limitations in climate model outputs have been noted elsewhere. For instance, the United Kingdom Climate Projection (UKCP) routinely withholds the statistical downscaled WS output because of spatial considerations (Jones et al., 2009). Many studies (e.g. Chun et al., 2010) question whether global or regional climate model outputs can be used directly for hydrological applications. For example, in UK applications, unrealistic Penman–Monteith evaporation estimates result when multiple Hadley Centre GCM and RCM series are used as surrogates for the observations (Ekström et al., 2007; Chun et al., 2012).

1548

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

4.2. Use of climate observations from flux towers to evaluate climate model outputs The use of local observations to evaluate climate model outputs has been criticised because of the differences in spatial scale and the absence of many subgrid-scale processes in climate models. In this study, the model grid (approximately 3.75° or 400  250 km at this latitude) is much larger than the area bounded by the three observational sites (which are up to 80 km apart). However, we do not believe that the scale difference poses a serious issue, for three reasons. Firstly, the observational sites were carefully chosen to represent the region. They are located in three of the region’s dominant land cover types (Hall et al., 1997; Zarco-Tejada and Miller, 1999). Secondly, in spite of the large differences in surface characteristics among sites (evergreen-needleleaf vs deciduous-broadleaf forest, forested upland vs treed peatland), the climate observations from each site share a consistent multivariate correlation pattern, which typifies the region (Figs. 7 and 8). Thirdly, the area represented by the observations, although difficult to delineate, is also large. The ‘‘climatological footprint’’ of a flux tower is several orders of magnitude larger than the eddy-covariance flux footprint. One metric of footprint size is the semivariogram, which shows strong spatial coherence at scales of tens of km and some spatial coherence to scales of hundreds of km for monthly TA above northern forests (Holdaway, 1996). The near-surface climate at the monthly time scale is dominated by the properties of the mixed layer and free atmosphere, which are controlled at regional and larger scales. We therefore conclude that the issues associated with the modelobservation mismatch in scale are small. A model’s ability to reproduce the multivariate relationships among observed climate variables at the monthly time scale is an important diagnostic of its performance. Despite the large domain of the model grid square, or perhaps because of it, the multivariate patterns of the modelled climate fields are uniform at even larger scales, with no significant differences among the nine model grid squares centred on the observational sites. This result was unexpected, given the important differences in plant functional type between the grid squares to the south (grass and cropland) and those to the north (boreal forest and peatland). Because of the broad spatial uniformity of the model outputs, the model-observation comparison was unaffected by the use of more sophisticated downscaling/interpolation approaches designed to match the spatial scale of the model to the observations. Observations made above regionally representative land covers have been used in evaluating atmospheric model outputs, particularly in forested regions where standard climate stations above short grass may be not be representative. Near-surface TA, RH and WS are sensitive to the land surface, through its influence on the surface energy balance (via e.g. albedo and canopy conductance) and aerodynamic roughness (via canopy height). We therefore recommend that the climatic observations from flux-tower networks be used more extensively in the evaluation of atmospheric models (Betts et al., 2006). This has not been the case to date, partly because of concerns about the spatial scale of local observations, and partly because of a lack of recognition of the availability and value of these data. The outcome of the analysis will depend on the time and space scales that are investigated, because the spatial and temporal variability of climatic and hydrological processes is scale dependent (Woods, 2005). Consequently, the gridded model outputs will diverge from the tower observations when the respective temporal and spatial scales capture different physical or dynamical processes (e.g. Rodríguez-Iturbe, 1986; Hogg, 1997). It is important to understand the characteristic scales for specific processes. For the temporal (monthly) and spatial (3.75° grid square, <100 km

between towers) scales in this study, the gridded climate model outputs appear to be appropriate for studies of physical processes related to TA and RN but not WS or RH. Physical processes such as evaporation that depend on WS and RH may be poorly simulated at local scales using climate model outputs; local interacting processes that are related to land surface heterogeneity, including spatial heterogeneity in the surface energy balance, planetary boundary-layer dynamics, mesoscale circulations, and cloud-field evolution, provide local feedbacks to WS and RH which are imperfectly modelled at the current climate model grid scales. 4.3. Further development of downscaling and upscaling approaches While multivariate tools are clearly useful to identify deficiencies in modelled climate fields in relation to observations, they do not provide specific details of which geophysical or land attributes contribute to the deficiencies. Consequently, they cannot be used directly to guide climate model improvement related to land surface schemes. One way to overcome this limitation is the incorporation of land surface information into nonlinear weighting or downscaling schemes. The extent of land surface information or the number of grids (kernel length) for downscaling can be decided by various machine learning techniques such as stepwise regression. Such an approach could be used to assess if the observed discrepancies in the modelled climate fields are due to limitations in the land cover scheme employed in climate models. In addition to the statistical downscaling of GCM outputs, dynamical downscaling approaches (i.e. Regional Climate Models (RCMs)) (Giorgi and Mearns, 1999) that include more detailed local processes may be useful to generate more realistic climate fields. Dynamical downscaling approaches could be used to show, e.g., whether the RH and WS discrepancies from this study result from the coarse model spatial resolution or from unresolved subgrid-scale processes. Upscaling is another promising route for generating more realistic climate fields from climate models. Local physically-based models with boundary conditions from climate models can be used to investigate local characteristic processes through upscaling. The main advantage of the upscaling approaches is that their products can be gridded which allows grid-to-grid comparisons. Different land surface schemes can be examined using local observations from flux towers for investigating the interactions between vegetative, hydrological and atmospheric processes. Although upscaling is a relatively new route for improving climate models, previous studies have investigated how the use of local physically-based models influence the feedbacks between vegetation and environmental processes. For example, Hejazi and Woodbury (2011) used the BERMS data to explore and compare the SABAE-HW soil–water– plant model (Loukili et al., 2008) and the Canadian Land Surface Scheme (Verseghy, 1991; Verseghy et al., 1993). The understanding of hydrological processes in the boreal forest and their improvement in hydrological and atmospheric models will benefit from observational programs that combine flux-tower and hydrological (e.g. groundwater and streamflow) measurements. 4.4. Final remarks Realistic multivariate relationships among climatic variables are important for simulating hydrological processes, particularly evaporation and snowmelt, which are sensitive to multiple climate drivers (RN, TA, RH and WS) (e.g. Pietroniro et al., 2007; Chun et al., 2012). This study develops a hybrid approach to investigate the multivariate relationships among climate fields. The approach is then used to evaluate simulated climate fields from the CGCM3 model by comparing its multivariate relationships with those of in situ observations above the boreal forests of western Canada. The greatest model-observation discrepancies are found in the

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

relationships that involve WS and RH, which may be related to the spatial heterogeneity of these fields. Because the simulated WS and RH fields are unrealistically decoupled from TA and RN, they should be used with caution for hydrological modelling (Wilby, 2010). Future efforts should be directed at the evaluation of the climate fields from regional climate models, investigating how their multivariate structures are affected by, e.g., model grid size, the degree of land-cover detail, and the representation of land-surface processes, boundary-layer dynamics and cloud physics. A parallel study of multivariate extremes would also be useful, particularly because of their importance in hydrology (Katz et al., 2002; Chun, 2011) and for ecological and natural disaster (drought and flooding) assessment. It has been possible to undertake the current study because of the availability of high quality and continuous flux-tower measurements. Although the rapid development of global climate models allows simulated outputs to be used as surrogate observations for various impact assessments (e.g. droughts and floods), the realism of these model outputs needs to be assessed by continuous field monitoring. Many atmospheric and hydrological processes at various temporal and spatial scales are still not well understood and more observations are needed to understand them. Moreover, emerging patterns of behaviour, which may not have been important in the past, are not likely to be included in current global climate models. Therefore, resources for continuous field measurements like the BERMS study are deemed to be crucial for studying the change (or evolution) of the non-stationary hydrological and ecological systems under uncertain climate. Acknowledgements The authors gratefully acknowledge the Canadian Centre for Climate Modelling and Analysis for the CGCM3 model outputs, and the research teams that collected the observational data, from Environment Canada, University of British Columbia (led by Andy Black) and Queen’s University, Canada (led by Harry McCaughey). The first author thanks the Global Institute for Water Security (GIWS), NSERC’s Canada Excellence Research Chair Program for post-doctoral fellowship support and his favourite Jedi. References Barr, A., Morgenstern, K., Black, T., McCaughey, J., Nesic, Z., 2006. Surface energy balance closure by the eddy-covariance method above three boreal forest stands and implications for the measurement of the CO2 flux. Agric. For. Meteorol. 140, 322–337. Barr, A.G., van der Kamp, G., Black, T.A., McCaughey, J.H., Nesic, Z., 2012. Energy balance closure at the BERMS flux towers in relation to the water balance of the White Gull Creek watershed 1999–2009. Agric. For. Meteorol.. Bartlett, M.S., 1937. Properties of sufficiency and statistical tests. Proc. R. Soc. Lond., A 160, 268–282. Bartlett, P.A., MacKay, M.D., Verseghy, D.L., 2006. Modified snow algorithms in the Canadian land surface scheme: model runs and sensitivity analysis at three boreal forest stands. Atmos. Ocean 44, 207–222. Betts, A.K., Ball, J.H., Barr, A.G., Black, T.A., McCaughey, J.H., Viterbo, P., 2006. Assessing land-surface-atmosphere coupling in the ERA-40 reanalysis with boreal forest data. Agric. For. Meteorol. 140, 365–382. Box, G.E.P., 1949. A general distribution theory for a class of likelihood criteria. Biometrika 36, 317–346. Chandler, R., Scott, M., 2011. Statistical Methods for Trend Detection and Analysis in the Environmental Sciences. Wiley. Chandler, R.E., Wheater, H.S., 2002. Analysis of rainfall variability using generalized linear models: a case study from the west of Ireland. Water Resour. Res. 38, 1192. Chun, K. P., 2011. Statistical downscaling of climate model outputs for hydrological extremes. PhD Thesis, Imperial College London. Chun, K.P., Wheater, H.S., Onof, C.J., 2010. Potential evaporation estimates for 25 stations in the UK under climate variability. In: Paper Presented at Proceedings of BHS Third International Symposium. Chun, K.P., Wheater, H., Onof, C., 2012. Projecting and hindcasting potential evaporation for the UK between 1950 and 2099. Clim. Change, 1–23. Davison, A.C., Hinkley, D.V., 1997. Bootstrap Methods and Their Application. Cambridge University Press.

1549

Denis, B., Laprise, R., Caya, D., Côté, J., 2002. Downscaling ability of one-way nested regional climate models: the Big-Brother experiment. Clim. Dyn. 18, 627–646. Ekström, M., Jones, P., Fowler, H., Lenderink, G., Buishand, T., Conway, D., 2007. Regional climate model data used within the SWURVE project? 1: Projected changes in seasonal patterns and estimation of PET. Hydrol. Earth Syst. Sci. 11, 1069–1083. Environment Canada, 2010. Canadian Centre for Climate Modelling and Analysis http://www.cccma.ec.gc.ca/data/cgcm3/cgcm3_t47_20c3m.shtml. Essery, R., Rutter, N., Pomeroy, J., Baxter, R., Stähli, M., Gustafsson, D., et al., 2009. An evaluation of forest snow process simulations. Bull. Am. Meteorol. Soc. 90, 1120–1135. Eugster, W., Rouse, W.R., Pielke Sr., R.A., Mcfadden, J.P., Baldocchi, D.D., Kittel, T.G.F., et al., 2000. Land–atmosphere energy exchange in Arctic tundra and boreal forest: available data and feedbacks to climate. Global Change Biol. 6, 84–115. Everitt, B., Hothorn, T., 2011. An Introduction to Applied Multivariate Analysis with R. Springer Verlag. Flato, G., Boer, G., 2001. Warming asymmetry in climate change simulations. Geophys. Res. Lett. 28, 195–198. GEWEX, 2012. Project Report: Twenty-Fourth Session of the GEWEX Scientific Steering Group (WCRP Report No. 14/2012), Rome, Italy. GEWEX, 2013. Newsletter: integrated land ecosystem – atmosphere process study (Issue No. 13), University of Helsinki, Finland. Giorgi, F., Avissar, R., 1997. Representation of heterogeneity effects in earth system modeling: experience from land surface modeling. Rev. Geophys. 35, 413–438. Giorgi, F., Mearns, L.O., 1999. Introduction to special section: regional climate modeling revisited. J. Geophys. Res. 104, 6335–6352. Gorham, E., 1991. Northern peatlands: role in the carbon cycle and probable responses to climatic warming. Ecol. Appl. 1, 182–195. Hall, F.G., Knapp, D.E., Huemmrich, K.F., 1997. Physically based classification and satellite mapping of biophysical characteristics in the southern boreal forest. J. Geophys. Res. 102, 29567–29580. Hejazi, A., Woodbury, A.D., 2011. Evaluation of land surface scheme SABAE-HW in simulating snow depth, soil temperature and soil moisture within the BOREAS site, Saskatchewan. Atmos. Ocean 49, 408–420. Hogg, E.H., 1997. Temporal scaling of moisture and the forest–grassland boundary in western Canada. Agric. For. Meteorol. 84, 115–122. Holdaway, M.R., 1996. Spatial modeling and interpolation of monthly temperature using kriging. Clim. Res. 6, 215–225. Hsieh, W.W., 2004. Nonlinear multivariate and time series analysis by neural network methods. Rev. Geophys. 42, RG1003. Jones, P., Kilsby, C., Harpham, C., Glenis, V., Burton, A., 2009. UK Climate Projections Science Report: Projections of Future Daily Climate for the UK from the Weather Generator. University of Newcastle, UK. Ju, W., Chen, J.M., Black, T.A., Barr, A.G., McCaughey, H., Roulet, N.T., 2006. Hydrological effects on carbon cycles of Canada’s forests and wetlands. Tellus B 58, 16–30. Katz, W.R., Parlange, M.B., Naveau, P., 2002. Statistics of extremes in hydrology. Adv. Water Resour. 25, 1287–1304. Kundzewicz, Z.W., Stakhiv, E.Z., 2010. Are climate models ‘‘ready for prime time’’ in water resources management applications, or is more research needed? Hydrol. Sci. J.–J. Sci. Hydrol. 55, 1085–1089. Leith, N. Using generalised linear models to simulate daily rainfall under scenarios of climate change. Report number: (FD2113_rpt2), 2005, University College London. Loukili, Y., Woodbury, A.D., Snelgrove, K.R., 2008. SABAE-HW: an enhanced water balance prediction in the Canadian land surface scheme compared with existing models. Vadose Zone J. 7, 865–877. MacKay, D.J.C., 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, ISBN 0-521-64298-1. Maraun, D., Wetterhall, F., Ireson, A., Chandler, R., Kendon, E., Widmann, M., et al., 2010. Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys. 48, RG3003. Margolis, H.A., Flanagan, L.B., Amiro, B.D., 2006. The Fluxnet-Canada Research Network: influence of climate and disturbance on carbon cycling in forests and peatlands. Agric. For. Meteorol. 140, 1–5. Nakicenovic, N., Alcamo, J., Davis, G., de Vries, B., Fenhann, J., Gaffin, S., et al., 2000. Special report on emissions scenarios: a special report of Working Group III of the Intergovernmental Panel on Climate Change. Olson, J.S., Watts, J.A., Allison, L.J., Carbon in live vegetation of major world ecosystems. Report ORNL-5862, 1983, Oak Ridge National Laboratory. Pietroniro, A., Fortin, V., Kouwen, N., Neal, C., Turcotte, R., Davison, B., et al., 2007. Development of the MESH modelling system for hydrological ensemble forecasting of the Laurentian Great Lakes at the regional scale. Hydrol. Earth Syst. Sci. 11, 1279–1294. Rodríguez-Iturbe, I., 1986. Scale of fluctuation of rainfall models. Water Resour. Res. 22, 15S–37S. Schindler, D.W., 1998. A dim future for boreal waters and landscapes. Bioscience 48, 157–164. Schölkopf, B., Smola, A.J., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT press. Sellers, P., Hall, F., Ranson, K.J., Margolis, H., Kelly, B., Baldocchi, D., et al., 1995. The boreal ecosystem-atmosphere study (BOREAS): an overview and early results from the 1994 field year. Bull. Am. Meteorol. Soc. 76, 1549–1577. Solomon, S., Qin, D., Manning, M., Chen, Z., Marquis, M., Averyt, K.B., et al., 2007. Climate Change 2007: The Physical Science Basis: Contribution of Working

1550

K.P. Chun et al. / Journal of Hydrology 519 (2014) 1537–1550

Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge Univ. Press. Steyaert, L., Hall, F., Loveland, T., 1997. Land cover mapping, fire regeneration, and scaling studies in the Canadian boreal forest with 1 km AVHRR and Landsat TM data. J. Geophys. Res. 102, 29581–29598. Tsagris, M.T. Multivariate data analysis in R. 1.5., Research Report, 2011, The University of British Columbia. Venables W. N. and Ripley B. D., Modern Applied Statistics with S, 2002, SpringerVerseghy, D.L., 1991. CLASS—a Canadian land surface scheme for GCMs. Int. Soil Model. Int. J. Climatol. 11, 111–133. Verseghy, D.L., 2000. The Canadian land surface scheme (CLASS): its history and future. Atmos. Ocean 38, 1–13. Verseghy, D., McFarlane, N., Lazare, M., 1993. CLASS—a Canadian land surface scheme for GCMs, II. Vegetation model and coupled runs. Int. J. Climatol. 13, 347–370. Wallis, J.R., 1965. Multivariate statistical methods in hydrology—a comparison using data of known functional relationship. Water Resour. Res. 1, 447–461.

Wilby, R., 2010. Evaluating climate model outputs for hydrological applications. Hydrol. Sci. J.–J. Sci. Hydrol. 55, 1090–1093. Wilby, R.L., Wigley, T., 1997. Downscaling general circulation model output: a review of methods and limitations. Prog. Phys. Geogr. 21, 530. Woods, R., 2005. Hydrologic concepts of variability and scale. Encyclopedia Hydrol. Sci.. Yuan, F., Arain, M.A., Barr, A.G., Black, T.A., Bourque, C.P.A., Coursolle, C., et al., 2008. Modeling analysis of primary controls on net ecosystem productivity of seven boreal and temperate coniferous forests across a continental transect. Global Change Biol. 14, 1765–1784. Zarco-Tejada, P.J., Miller, J.R., 1999. Land cover mapping at BOREAS using red edge spectral parameters from CASI imagery. J. Geophys. Res.: Atmos. (1984–2012) 104, 27921–27933. Ziche, D., Seidling, W., 2010. Homogenisation of climate time series from ICP Forests Level II monitoring sites in Germany based on interpolated climate data. Ann. For. Sci. 67 (804), 1–6.