Heavy metals in European soils: A geostatistical analysis of the FOREGS Geochemical database

Heavy metals in European soils: A geostatistical analysis of the FOREGS Geochemical database

Geoderma 148 (2008) 189–199 Contents lists available at ScienceDirect Geoderma j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c ...

3MB Sizes 0 Downloads 72 Views

Geoderma 148 (2008) 189–199

Contents lists available at ScienceDirect

Geoderma j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / g e o d e r m a

Heavy metals in European soils: A geostatistical analysis of the FOREGS Geochemical database Luis Rodríguez Lado a,⁎, Tomislav Hengl b, Hannes I. Reuter a a b

European Commission, Directorate General JRC, Institute for Environment and Sustainability, TP 280, Via E. Fermi 1, I-21020 Ispra (VA), Italy Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands

a r t i c l e

i n f o

Article history: Received 20 March 2008 Received in revised form 16 September 2008 Accepted 22 September 2008 Keywords: Soil mapping Regression-kriging MODIS Night lights image Geochemical database Pan-European monitoring

a b s t r a c t This paper presents the results of modeling the distribution of eight critical heavy metals (arsenic, cadmium, chromium, copper, mercury, nickel, lead and zinc) in topsoils using 1588 georeferenced samples from the Forum of European Geological Surveys Geochemical database (26 European countries). The concentrations were mapped using regression-kriging (RK) and accuracy of predictions evaluated using the leave-one-out cross validation method. A large number of auxiliary raster maps (topographic indexes, land cover, geology, vegetation indexes, night lights images and earth quake magnitudes) were used to improve the predictions. These were first converted to 36 principal components and then used to explain spatial distribution of heavy metals. The study revealed that this database is suitable for geostatistical analyses: the predictors explained from 21% (Cr) to 35% (Pb) of variability; the residuals showed spatial autocorrelation. The Principal Component Analysis of the mapped heavy metals revealed that the administrative units (NUTS level3) with highest overall concentrations are: (1) Liege (Arrondissement) (BE), Attiki (GR), Darlington (UK), Coventry (UK), Sunderland (UK), Kozani (GR), Grevena (GR), Hartlepool & Stockton (UK), Huy (BE), Aachen (DE) (As, Cd, Hg and Pb) and (2) central Greece and Liguria region in Italy (Cr, Cu and Ni). The evaluation of the mapping accuracy showed that the RK models for As, Ni and Pb can be considered satisfactory (prediction accuracy 45–52% of total variance), marginally satisfactory for Cr, Cu, Hg and Zn (36–41%), while the model for Cd is unsatisfactorily accurate (30%). The critical elements limiting the mapping accuracy are: (a) the problem of sporadic high values (hot-spots); and (b) relatively coarse resolution of the input maps. Automation of the geostatistical mapping and use of auxiliary spatial layers opens a possibility to develop mapping systems that can automatically update outputs by including new field observations and higher quality auxiliary maps. This approach also demonstrates the benefits of organizing standardized joint European monitoring projects, in comparison to the merging of several national monitoring projects. © 2008 Elsevier B.V. All rights reserved.

1. Introduction The generalized mobilization and dispersion of pollutants from their natural reservoirs to the atmosphere, soil and water is one of the most significant negative impacts of human activities on terrestrial and aquatic ecosystems (Salemaa et al., 2001; Lin et al., 2002; Koptsik et al., 2003). Mining, iron and steel industry, road transport, waste incineration, and the use of fertilizers and agrochemicals are identified as the main human sources of heavy metals in soils and water in the superficial ecosystems (Hutton and de Meeûs, 2001; Hansen et al., 2002). In addition, emissions from volcanoes, degassing processes in the Earth's crust, forest fires or the chemical composition of the parent material can be also important sources of heavy metals in soils (Løkke et al., 1996; Palumbo et al., 2000).

⁎ Corresponding author. Tel.: +39 0332 789977; fax: +39 0332 786394. E-mail addresses: [email protected] (L.R. Lado), [email protected] (T. Hengl), [email protected] (H.I. Reuter). 0016-7061/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.geoderma.2008.09.020

A large proportion of soils in industrialized countries contain higher levels of several elements and compounds considered as pollutants than the corresponding natural background values in a pristine situation (Hijmans et al., 2005). Pollution caused by heavy metals is especially problematic in areas where synergy with other types of polluting agents exists. This is true in the case of industrial areas receiving large inputs of acidifying compounds, which creates optimal conditions for increased mobilization, bioavailability and thus toxicity of the metals stored in soils (Salemaa et al., 2001; Lin et al., 2002; Clemente et al., 2003; Cappuyns et al., 2004; de Vries et al., 2005). In Europe, mandatory reductions on the annual emissions of cadmium, lead and mercury to avoid significant adverse effects on ecosystems have been put in the Heavy Metals Protocol (UN/ECE, 1998). This will have an increasing importance because, according to the most recent report of the Coordination Center for Effects (Posch et al., 2005), the distribution and magnitude of the deposition of these elements puts large areas of European ecosystems at risk both in 2000 and 2020. Apart from cadmium, lead and mercury, six additional

190

L.R. Lado et al. / Geoderma 148 (2008) 189–199

heavy metals (chromium, nickel, copper, zinc, arsenic and selenium) present an increasing threat for human health and the environment. Recently, the European Commission (EC) has been preparing a proposal for a framework Directive (European Communities, 2006) that sets out common principles for the protection of soils across the EC. From the aspect of soil pollution, three main questions need to be clarified: (i) which threshold values should be used to classify soil as polluted? (ii) at which locations can high natural background values of heavy metals be expected? and (iii) which methods should be used to explain the spatial distribution of these elements in soils of Europe? Threshold values for soils are difficult to evaluate since the toxicity and bioavailability of heavy metals is not only dependent on the total content in soils but also on many other environmental variables. Although some European countries (Netherlands, UK) developed their own soil quality standards for some heavy metals, at European level only threshold values related to the application of sewage sludge in agricultural soils have been defined (EU Directive 86/278/EC). The determination of natural background values is controversial because they can decrease the responsibility of human activities for the overall pollution on soils (Baize and Sterckeman, 2001). At pan-European scale, it is often difficult to determine the background values that would correspond to a pristine situation since the geochemistry of most of our ecosystems is greatly influenced by a long history of anthropic activities, and even the concept of a background value is often fuzzily defined (Reimann and de Caritat, 2005; Reimann and Garrett, 2005). To improve this situation, in 1993 the International Union of Geological Sciences and the International Association of Geochemistry and Cosmochemistry (IUGS/IAGC) initiated the Global Geochemical Baselines Programme with an aim to establish the geochemical reference baselines for a number of elements at a global scale (Salminen, 2006). The European contribution to this International Programme started in 1997 and it has been carried out by governmental institutions of 26 European countries under the advisement of the Forum of European Geological Surveys (FOREGS). The final product of this collaboration was the “Geochemical Atlas of Europe” (http://www.gsf.fi/publ/foregsatlas/) that includes a database of about three thousand samples for solid media: topsoil, floodplain sediment, stream sediment and humus (ranging from 385 samples of sub soil to 852 samples of stream sediment) in addition to a number of maps of the element contents in soils of Europe based on the former database. The term “geochemical baseline” in this context is not equal to the “background value” since it represents a measurement that does not correspond to a pristine situation. However, these geochemical baselines can provide a perspective on the present status of pollution of the Europeans soils, and serve as a model for future pan-European monitoring projects. Some recent analyses of the FOREGS database can be seen in Imrie et al. (2008). So far, several attempts were made to determine the spatial distribution of the concentrations of heavy metals in European soils. Reimann et al. (2003) created maps of heavy metal contents in soils using an Inverse Distance Weighted interpolator on about 740 samples from agricultural fields. The Geochemical Atlas of Eastern Barents region (Salminen et al., 2004) also includes interpolated maps of heavy metals from 1358 sampling sites in the northern part of Europe. Gawlik and Bidoglio (2006) produced maps for Cd, Cr, Cu, Hg, Ni, Pb and Zn in 11 European countries by using empirical relationships with soil parent material and land use. The European Environmental Agency (2006) merged the sampling points from three different soil databases to create a map of the concentration of lead in topsoils across Europe (http://dataservice.eea.europa.eu). In principle, the only official maps of heavy metals in soils of Europe are those presented in the “Geochemical Atlas of Europe”. This atlas contains maps for 85 variables for five different media: floodplain sediment, humus, soil, stream sediment and stream water, all produced using the Alkemia Smooth interpolation on a

6 km grid (de Vos and Tarvainen, 2006). Although this work represents an important contribution, we identified several points that could be considered to improve the reliability and accuracy of the final maps produced. Firstly, the authors of the FOREGS atlas did not consider merging the samples taken for different media or running more sophisticated geostatistical analyses to generate the maps. Secondly, they did not mask out areas that were not represented in a specific sample (histosols, water bodies), which limits the further uses of such maps. Moreover, the FOREGS atlas shows multiple panEuropean maps of the same variable, which might be confusing for decision makers and spatial modelers. For example, there are 10 maps of lead distribution in the book: in subsoil (two laboratory techniques), in topsoil (two laboratory techniques), in humus, in stream water, in stream sediment (two laboratory techniques) and in floodplain sediment (two laboratory techniques). All 10 maps show values of the same element (often with different patterns) over the whole of Europe. Thirdly, because no transformation has been applied to the original data (most of the heavy metal concentrations show a skewed distribution), the resulting patterns were very much influenced by locally high values, which questions the overall reliability of the produced maps (see e.g. Papritz et al., 2005; Romić et al., 2007 for discussion). Fourthly, the interpolation method used to build the maps did not provide an estimate of its uncertainty. With this background, we decided to test a geostatistical, digital soil mapping framework to interpolate the concentration of heavy metals over broad areas and to provide information about the uncertainty of the produced maps. Our assumption was that auxiliary predictors can be used to improve the detail and accuracy of the existing FOREGS atlas maps and extend the spatial analysis of the point data using state-of-the-art geostatistical techniques. We also wanted to demonstrate the benefits of producing geoinformation through a joint pan-European project and to suggest ways to design soil monitoring networks for the permanent monitoring of this soil threat. 2. Materials and methods 2.1. The FOREGS dataset The point dataset was obtained from the FOREGS website, courtesy of the Association of the Geological Surveys of The European Union. For the purpose of this study, we used only the laboratory measurements of extractable heavy metal concentrations (HMC) in top-soil and floodplains determined for As, Cd, Cr, Cu, Ni, Pb and Zn (mg kg− 1) by ICP-AES using the Aqua Regia method (ISO, 1995) and for Hg (mg kg− 1), directly on the soil solid samples using a cold vapour absorption technique within an Advanced Mercury Analyzer (AMA254, ALTEC). We did not include the stream sediment measurements in the analyses because the geochemical processes and accumulation in the water environment are different. At five locations the precision of the coordinates was not sufficient for separation of sampling locations. At these locations values for topsoil and floodplain were averaged and replaced the original value. The FOREGS database is rather extensive (85 variables measured in five media), so we have focused on mapping a selection of variables in the database. In the near future, we anticipate that also other variables from this database will be mapped with an improved spatial and thematic detail. The total number of points used in this exercise was 1588, although not all values were available at all locations, ranging from 17 missing values for As, Cd, Cu, Ni, Pb and Zn to 79 missing values for Hg. The original coordinates of the sampling points have been provided in degrees, rounded to 0.01°, which corresponds to a horizontal precision of 850 m at 40° North latitude, and transformed to European Terrestrial Reference System (see www.euref.eu). The reorganized point database and all input and derived maps shown in this paper can be accessed from the http://eusoils.jrc.it website.

L.R. Lado et al. / Geoderma 148 (2008) 189–199

2.2. The study area The area of interest corresponds to the 26 European countries that contributed their data to the IUGS/IAGC program. It includes a very high diversity of climatic, geologic, edaphic and socio-economic conditions. The Canary Islands were excluded from the mapping project for computational reasons. All the maps were projected to the Lambert Azimuthal Equal Area projection (European Terrestrial Reference System 1989) following the recommendations of INSPIRE (Annoni, 2005). Each raster map had the following grid definition: MinX: 2,636,000, MinY: 1,425,000, MaxX: 5,956,000, MaxY: 5,416,000 and a grid size of 1 km. We further masked out non-soil surfaces such as water bodies (rivers, lakes, sea etc.) and permafrost areas. A consistent European wide water mask, indicating the percentage of water area inside a 1 km pixel, was produced by using the blocksum function of the NASA SRTM (Shuttle Radar Topography Mission) V2 data set (Rabus et al., 2003), the CORINE (Coordinated Information on the European Environment) land cover classification (http://dataservice.eea.europa.eu), lakes contained in the GISCO (Geographic Information System of the European Commission) data, water reflection by the use of Image2000 dataset (de Jager et al., 2006), the Global Self-consistent, Hierarchical, High-resolution Shoreline Database (Wessel and Smith, 1996), and the Global Lakes and Wetlands Database (Lehner and Doll, 2004). Areas with permanent ice cover have been detected using the mean annual Enhanced Vegetation Index (EVI) derived from the MODIS (Moderate Resolution Imaging Spectroradiometer) 1 km images obtained for the years 2003 and 2004. In this case, ice cover, water bodies and bare rock areas were detected based on a negative EVI value. The total soil-cover area for these 26 countries was estimated to be 4,217,241 km2. 2.3. Auxiliary GIS layers Because the FOREGS data set is covering a large extent, we decided to go to the effort to prepare a large number of auxiliary raster maps. These include geological and land cover class-type maps, MODISbased vegetation indices, night lights images, earthquake magnitudes and distance to roads and railroads (Fig. 1). The map of geology was obtained from Pawlewicz et al. (2003). This map includes a generalized synthesis of the main geologic surface outcrops of bedrock in Europe as well as coarse information on the rock types and their geologic age of formation (http://pubs.usgs.gov/). The original legend was reduced to 10 classes relevant for the mapping of heavy metals: (1) Granites, rhyolites and quartzites; (2) Paleozoic schists, phyllites, gneisses and andesites; (3) Shales and sandstones; (4) Mesozoic Ultramafic, basic phyllites, schists, limestones and evaporates; (5) Jurassic, Triassic and Cretaceous calcareous rocks; (6) Cenozoic serpentinites, gabbros and sand deposits; (7) Tertiary basanites and andesites; (8) Neogene and Paleogene calcareous rocks; (9) Quaternary limestones and basaltic rocks; and (10) Other Ultramafic and undefined rocks. The CORINE Land Cover 2000 map of Europe (CLC2000), generalized to a 1 km grid, was used to represent the main classes of land cover. For Switzerland, we used the Corine Land Cover 1990 (CLC1990) since no updated information was available. The CLC1990 classes for this country were adjusted to those described in the CLC2000 and both data sets were merged together and aggregated to 1 km resolution. The original 44 classes were simplified to 8 classes: (1) urban infrastructures; (2) agriculture; (3) forest; (4) natural vegetation; (5) beaches; (6) ice bodies, (7) wetlands and (8) water bodies. Monthly averaged MODIS images of the EVI at 1 km resolution for the period 01/01/2004 to 31/12/2006 were obtained from the MODIS Terra imagery at the Earth Observing System Data Gateway (http:// redhook.gsfc.nasa.gov/~imswww/pub/imswelcome/). Seventeen single blocks covering the whole study area were mosaicked and reprojected to the Lambert Azimuthal Equal Area (ETRS89) projection.

191

We performed Principal Component analysis on 19 complete mosaics and used the first 10 principal components of the EVI images. The 1 km Digital Elevation Model (DEM) was derived from the SRTM30 V2 data set obtained from the Jet Propulsion Laboratory (http://www2.jpl.nasa.gov/srtm/). The SRTM DEM was used to derive a slope map, the Topographic Wetness Index and total incoming solar insolation (Böhner et al., 2006). The cumulative earthquake magnitude (quakem) image was calculated by using the global Seismology point database (http:// earthquake.usgs.gov/eqcenter/). In Europe, there were over 90,000 registered earthquakes in the period 1973–1994. We used the logarithmic measure of the size of an earthquake, which is related to the energy released as seismic waves at the focus of an earthquake. The 1 km density grid was generated by using a kernel smoother and search radius of 10 km. The lights at night (nlight) image for the year 2003 was obtained from the Defense Meteorological Satellite Program (http://www.ngdc. noaa.gov/dmsp/), which measures night-time light emanating from the earth's surface at 1 km resolution. The lights at night map contains the lights from cities, towns, and other sites with persistent lighting, including gas flares. Ephemeral events, such as fires have been discarded. The background noise was identified and replaced with values of zero. This map is a direct estimate of the urbanization level and is now increasingly used for quantitative estimation of global socioe-conomic parameters as well as for human population mapping (Sutton, 1997; Sutton et al., 1997; Doll et al., 2000). The map of distances to roads, airports and utility lines (dinfra) was calculated using the distance operation in ILWIS and the GIS layers from the GISCO database of the European Commission (http://eusoils. jrc.it/giscodbm/dbm/). Firstly, distance maps were derived for each layer separately and then these three layers were combined to produce average distance to infrastructure lines. Mean annual temperature and accumulated precipitation maps were obtained from the very high resolution raster layers created by Hijmans et al. (2005) on a global scale at 1 km grid resolution. The annual potential evapotranspiration (PET) was calculated from monthly temperature data using the method of Thornthwaite (1948). Runoff was calculated as the difference between annual accumulated precipitation and PET. We also used the estimated annual deposition and emission rates of cadmium, lead and mercury in Europe for year 2004 calculated within the European Monitoring Evaluation Programme (http:// webdab.emep.int). The original 50 km grids were down-scaled to 1 km grids using ordinary kriging. This gives a total of 36 maps of predictors. Many of the predictors were intercorrelated and showed interesting correlations, e.g. PC2 of EVI seems to be highly correlated with the density of traffic (dinfra) and the map of runoff; PC1 of EVI is correlated with the map of temperature and elevation (DEM) is correlated with some geological units. To minimize multicollinearity, these predictors were converted to independent principal components (raster maps) in ILWIS GIS. However, due to the heterogeneous nature of the original predictors, they were all previously rescaled to a common scale ranging from 0 to 255. Since some predictors show continuous changes and others (land cover, geological units) represent abrupt changes of the values, the final principal component shows a hybrid representation of the total environmental conditions. 2.4. Geostatistical analysis The geostatistical analysis steps used in this paper basically follow the regression-kriging framework previously described in Hengl et al. (2007) and Romić et al. (2007). For more in-depth explanation of the procedures see also Hengl (2007). The original variables in all cases showed skewed distributions. Logit transformations with physical limits set at zmin = 0 and zmax = 106 were performed to make the data suitable for regression and

192 L.R. Lado et al. / Geoderma 148 (2008) 189–199 Fig. 1. Some auxiliary raster maps used as predictors for estimation of heavy metal concentrations. Geology and CORINE land cover: simplified classes as shown in Section 2.3; MODIS EVI, distance to roads, night light image and SRTM DEM: rescaled values from 0 to 255.

L.R. Lado et al. / Geoderma 148 (2008) 189–199

Fig. 2. The biplot display of the Principal component analysis on eight HMCs (N = 1588).

193

variogram analyses (Hengl et al., 2004; Papritz et al., 2005). The transformed variables in the point samples were first correlated with an extensive set of environmental predictors (auxiliary GIS layers at 1 km grid resolution). The regression models retained were selected by stepwise regression (backward selection) using the Akaike Information Criterion (AIC). The residuals were tested for normality using the standard normality tests (Thode, 2002) and then analysed for spatial auto-correlation. The predictions were obtained by regression-kriging, which was run in the computing environment R (http://www.r-project.org/) using the gstat V0.9-39 (Pebesma and Wesseling, 1998), sp V0.9-14 (Pebesma and Bivand, 2005) and rgdal V0.5-14 packages. We used block regression-kriging to obtain spatial averaged HMC on a continental scale. All regression analyses were performed on the point data, using the information from the 1 km auxiliary grids in the same locations as ancillary data. Variogram fitting was performed on the residuals of these regression analyses. The regression equations were applied on the auxiliary variables at 1 km grid resolution.

Fig. 3. Variogram models fitted for target variables and their residuals (broken line). All fitted automatically using the default settings in gstat. See also Table 1.

194

L.R. Lado et al. / Geoderma 148 (2008) 189–199

However, for computational efficiency, the resulting models were averaged to 5 km grids and the final predictions were also generated at this latter resolution. After mapping all HMCs over the whole area of interest, we derived the principal components and the overall (mean) normalized interpolation error map. The normalized interpolation error is calculated by dividing the interpolation error (universal kriging variance in gstat) by the global variance in the point data (Hengl et al., 2004): σ RK ¼

σ RK σ

This way the interpolation error can be summed for heavy metal concentrations of different type. The map of overall normalized interpolation error can be used to locate additional point samples to improve the quality of the maps. 3. Results 3.1. Preliminary data screening An inspection of the samples shows that the observations are well spread over the area of interest, with a mean shortest distance between point pairs of 5.6 km. The points are somewhat clustered toward shorter distances, because the samples in different media have been selected in the near vicinity. This can even be beneficial for geostatistical analysis since, for the fitting of variograms, we are often more interested in the estimation of the spatial auto-correlation model at short distances. According to our analysis, the inspection density is about 0.4 observations per 1000 km2 (N = 1588 points, A = 4,217,241 km2), which corresponds to an effective scale of about 1:8M, i.e. a cell size of about 4.1 km (Hengl, 2006). We eventually decided to work with a relatively coarse grid (5 km), and to consider downscaling only if the regression models become highly significant. Note also that the FOREGS points do not represent organic soils and soils with permafrost, thus these soils were also masked out from the analyses. The Principal Component Analysis of the samples shows that several HMCs are highly (positively) correlated (Fig. 2). This is the case of Cr and Ni (r = 0.85), Cu and Ni (r = 0.75), Pb and Zn (r = 0.74), Cd and

Zn (r = 0.69) and Pb and Hg (r = 0.69). In summary, the HMCs are well represented using the two components. The second component clearly differentiates two groups of elements, As, Cd, Hg, Pb and Zn with positive correlation and Cr, Cu and Ni with negative correlation. Similar groupings were found in other studies on heavy metals (Facchinelli et al., 2001; Micó et al., 2006; Zhang, 2006; Luo et al., 2007) and it is often interpreted as indicator of the source of origin of the elements: anthropogenic for As, Cd, Hg, Pb and Zn, and geogenic for Cr and Ni. The intermediate position of Cu may indicate a joint contribution from both natural and anthropogenic sources. 3.2. Variogram modeling and regression analyses Further analysis of the spatial autocorrelation structure of all HMCs showed that they present distinct spatial autocorrelation patterns (Fig. 3). However, in all cases the nugget variation is significant, ranging from 45% of global variance for As, up to a value of 66% for Cd. Assuming that the HMCs are measured with a relatively high precision, most of the nugget variance can be attributable to the shortrange variability of the HMCs. The nugget variance remains large even after the regression modeling, so it was decided to increase the support size from 1 to 5 km grid cell size to minimize the impact of short range variability. The correlation analyses between HMCs and predictors confirmed that several environmental variables can be used to explain the distribution of heavy metal content in soils. Preliminary correlation plots showed several interesting relationships: negative correlation between Cd and mean annual biomass; Zn and Pb are positively correlated with the night light image; Hg increases with an increase of temperature etc. Single predictors explain only a small portion of variation and most of the relationships often seem to be non-linear or at least very erratic. The final results of step-wise regression (8 target HMCs versus 36 principal components derived from a whole bulk of predictors) gave eight regression models which were all significant at 0.001 probability level. The adjusted R-square ranges from 0.21 for Cr up to 0.35 for modeling of Pb (Table 1). Step-wise regression reduced initial set of predictors (36) by 35–45% of the total number. The most

Table 1 Summary input and fitted parameters, and results of cross-validation Variable

As

Cd

Cr

Cu

Hg

Ni

Pb

Zn

Samples Min Max Median SD Median⁎ Variance⁎

2.5 410 6 19.98 −12.02 0.847

0.010 23.60 0.20 1.115 −15.20 1.127

1 2340 22 85.2 −10.75 0.681

1 421 14 26.29 −11.18 0.705

0.002 4.391 0.0405 0.2052 −17.06 0.968

1 2565 16 96.47 −10.98 0.906

1.5 5200 16 174.4 −11.11 1.021

4 2832 52 155 −9.81 0.601

Fitted models Nugget Sill Range (km) Adj. R-square Dominant SPCs Nugget (residual) Sill (residual) Range (residual)

0.365 0.331 175.5 29% 4 0.380 0.232 101.5

0.655 0.483 1,036.8 24% 2, 8, 13 0.688 0.123 332.3

0.333 0.411 806.0 21% 4, 17 0.327 0.180 129.7

0.397 0.217 308.4 27% 2, 4 0.346 0.152 65.7

0.596 0.362 550.6 24% 2, 4, 13 0.613 0.178 195.1

0.383 0.680 1,244.2 32% 4, 14 0.401 0.163 178.6

0.524 0.427 771.8 35% 2, 4, 13 0.471 0.200 87.0

0.358 0.215 354.6 28% 2, 4, 13 0.348 0.098 111.3

Evaluation Map SD (OK) Map SD (RK): RMSPE (OK)⁎ RMSPE (RK)⁎ Variance explained (OK) Variance explained (RK)

4.16 4.35 0.687 0.685 44% 45%

0.112 0.129 0.908 0.885 27% 30%

14.60 14.19 0.641 0.635 39% 41%

6.48 7.21 0.683 0.664 34% 37%

0.0274 0.0294 0.794 0.787 35% 36%

20.27 19.13 0.662 0.657 51% 52%

9.76 11.33 0.756 0.737 44% 47%

21.65 24.95 0.644 0.615 31% 37%

Units of measurement (mg kg− 1 ); ⁎ — transformed variables; SPC — Soil Predictive Components; RK — regression-kriging; OK — ordinary kriging; RMSPE — Root Mean Square Prediction Error.

L.R. Lado et al. / Geoderma 148 (2008) 189–199

195

Fig. 4. Final maps of heavy metal concentrations in topsoil [mg kg− 1] interpolated using block regression-kriging (support size = 5 km). All maps described in this paper are available on-line via the http://eusoils.jrc.it website.

196

L.R. Lado et al. / Geoderma 148 (2008) 189–199

Fig. 5. Highest baseline concentrations of heavy metals in Europe represented using the factors #1 and #2 of the eight maps of interpolated HMCs: (left) factor #1 (47.8% of variance) showing the areas of high concentrations of As, Cd, Hg and Pb, mainly caused by anthropogenic factors, and (right) factor #2 (22.6% of variance) showing the areas of high concentration of Cr, Cu and Ni.

significant predictors were PC04 (limestones, elevation), PC02 (agriculture), PC13 (EVI, infrastructures) and PC14 (earthquakes). Much of variation was left unexplained by our predictors, but the residuals still showed spatial autocorrelation (Fig. 3). 3.3. Regression-kriging — maps of HMCs The final transformed maps of HMCs obtained through regressionkriging are shown in Fig. 4. The most inter-correlated maps of HMCs were Cr and Ni (r = 0.91), Pb and Zn (r = 0.89), Cd and Zn (r = 0.79) and

Hg and Pb (r = 0.75). This agrees with the relationships obtained previously in the PCA for the observations (Fig. 2). High values of Cr and/or Ni are mainly found in central Greece, northern Italy, the central Pyrenees, northern Scandinavia, Slovakia and Croatia. Our analyses show that there is a strong correlation between the contents of Ni and Cr and the magnitude of earthquakes. In some tectonically unstable areas we find spots of ultramafic rocks, polymetallic sulphide minerals as well as sedimentary rocks and lateritic deposits derived from these materials (Kane, 1977; Skarpelis, 2006; Meissner and Kern, 2007). The seismic activity is indirectly

Fig. 6. Overall normalized interpolation error. This shows areas of extrapolation in both geographical and feature space and can be used to collect additional samples. In this case, the surveyors seem to have systematically omitted urban soils, some geological units and some land cover classes.

L.R. Lado et al. / Geoderma 148 (2008) 189–199

correlated with heavy metal concentrations — such materials provide high quantities of Ni and Cr to the soils by weathering processes. For the other heavy metals, the higher concentrations are mainly found in Central Europe and are directly related to human activities. Cd, Cu, Hg, Pb, Zn present a high correlation with Agriculture (r = 0.7) and with quaternary limestones (r = 0.41), where most of the agricultural areas in Central Europe are located. The use of fertilizers, manure and agrochemicals are important sources of these elements in European soils. These heavy metals are also highly correlated with the distance to infrastructures and to components 1 and 3 of the EVI images (PCEVI1 and PCEVI3), which depict clearly urban and industrial areas. 3.4. Validation The success of the technique was evaluated using the leave-oneout cross validation method, as implemented in the krige.cv method of gstat (Pebesma and Wesseling, 1998). The algorithm works as follows: it visits a data point, predicts the value at that location by leaving out the observed value, and proceeds with the next data point. This way each individual point is assessed versus the whole data set. The results of the validation are shown in Table 1. This shows that the contrast of the maps produced using regression-kriging is in general 20–40% higher than for the ordinary kriging (see also further Fig. 7). The Root Mean Square Prediction Error (RMSPE), on the other hand, is always smaller for regression-kriging, but the relative difference does not exceed 5%. The percentage of variability explained by RK is always higher than by OK. The results of evaluation using the leave-one-out technique suggest that the models for As, Ni and Pb can be considered satisfactorily

197

accurate (prediction accuracy 45–52% of total variance), the models for Cr, Cu, Hg and Zn are only marginally satisfactory (36–41%), while the model for Cd has low accuracy (30%), i.e. it does not seem to carry much useful information about the distribution of this element. These results demonstrate that there is quite some difference in how the HMCs vary in space. 3.5. Factor analysis — the overall concentrations In all cases the HMCs are positively correlated, so the results of factor analysis on these maps can be used to depict areas where the overall values are high or low. The dominantly high/low values in all maps (factor #1) can be seen in Fig. 5. Note that, according to this map, administrative units (NUTS level3) with highest overall concentrations are: (1) Liege (Arrondissement) (BE), Attiki (GR), Darlington (UK), Coventry (UK), Sunderland (UK), Kozani (GR), Grevena (GR), Hartlepool & Stockton (UK), Huy (BE), Aachen (DE) (As, Cd, Hg and Pb) and (2) central Greece and Liguria region in Italy (Cr, Cu and Ni). In some countries (England, Belgium), high values of factor #1 can be connected with the urbanization level, i.e. population density, while in other countries (Spain, Czech republic) there is no obvious relationship between the HMCs and the environmental factors. This proves that inventory of heavy metals in soils and reconstruction of their sources is complex and can not be achieved by using only few spatial predictors. The precision of mapping the eight elements, derived as the average normalized prediction error, can be seen in Fig. 6. This map shows high uncertainty for some specific areas systematically omitted or under-sampled by the national FOREGS survey teams, especially urban soils, but also national parks, wetlands and some geological

Fig. 7. A comparison of maps of Pb produced using: (a) auxiliary environmental predictors (regression-kriging) and (b) pure geostatistical interpolation (ordinary kriging).

198

L.R. Lado et al. / Geoderma 148 (2008) 189–199

units. We assume that we would be able to produce maps with more accurate estimates of the HMCs if a model-based design method had been used instead (see e.g. Brus and Heuvelink (2007)). 4. Discussion and conclusions Many spatial features like geology or land use influence the distribution of HMCs in soils and we find it convenient to use them as an input for mapping. This is illustrated with the map in Fig. 7a that shows how the values of HMCs change due to the changes in relief (mountain chains), geological unit and density of roads. Such environmental patterns are normally completely ignored in the maps that are produced by plain geostatistical techniques such as ordinary kriging (Fig. 7b). The validation of the results shows that regression-kriging is not much more accurate than ordinary kriging and that the model for Cd has low accuracy. Although this questions the use of more complicated techniques in the first place, we still believe that there is an added value in running analysis using large number of auxiliary maps (see also Pebesma (2006) for an in-depth discussion). Auxiliary predictors can contribute to understanding if the high values are due to the urbanization, geological substrata, hydrological shape or if the distribution is random. All this allows an in-depth analysis of the processes that cause the distribution of HMCs, so that also the remediation policies can be selected appropriately. There are two possible explanations for why the overall accuracy of maps of HMCs is relatively low. Firstly, we discovered that both regression-kriging and ordinary kriging are not fit to deal with many sporadic high values (hot-spots), which are almost always over smoothed. We assume that the right approach to dealing with local hot-spots would be to always include the most complete and most detailed information that can be used to explain such features (e.g. log of the distance to locations of all possible HMC sources). Another solution to over-smoothing is to, instead of using the predicted values, use geostatistical simulations of HMCs for spatial modeling purposes. The second critical issue affecting the accuracy of mapping is the relatively coarse resolution and low accuracy of the input maps. The relatively coarse scale of the project and consequent large support size obviously limits the predictability of features that possibly happen at short-ranges. From the computational point of view, we experience the serious limitations of using regression-kriging with large data sets (detailed scales for large areas). Because the raster maps were quite large (36 predictive components, each 3991 × 3320 pixels), the first attempts to interpolate using the standard universal kriging settings in R (global model, all predictors) resulted in memory usage problems. Interpolation of the maps was still possible using the gstat stand-alone exe application, but it often took between 24 and 38 h on a standard PC. Solving the computational limitations of regression-kriging will remain a task for programmers and geostatisticians to solve (Hengl et al., 2007). Imrie et al. (2008), in their study of the FOREGS database, also reported that the overall distribution of the geochemical elements in European topsoils follows diverse patterns that can be explained by different processes occurring at different spatial scales. They conclude that the geochemical variation at short scales depends mainly on the local variation of lithology, land use, weathering processes and organic matter content, at medium scales the major parent material and the effects of the recent glaciation can be used to explain those differences, while at large scales the effects of mineralization, pollution, climate and deposition processes are the main factors influencing the variability in the spatial distribution of geochemical elements. This is in agreement with our analyses — the larger autocorrelation ranges have been found for Ni and Cr (related to mineralization processes) and for Cd, Pb and Hg (anthropogenic pollution); whereas, smaller ranges were obtained for As, Cu and Zn. This indicates that local factors have greater influence in the variability of As, Cu and Zn in soils and, consequently, these HMCs are more difficult to map over large areas.

Despite all of these difficulties, the framework presented here has a number of important advantages over the smooth interpolator (currently used by FOREGS). The geostatistical technique presented in this paper is flexible and generic. The results can be easily improved by adding new observations and/or new auxiliary information. It can produce the best linear predictions of the values over the whole area of interest, but also an estimate of the mapping error (regression-kriging variance), which can be equally important for decision making and further spatial modeling. The R script we developed is fully automated so that both step-wise selection of best regression model, fitting of the variogram, interpolated and export to a GIS environment is done without any/much human intervention and the results can be directly reproduced. This way the analyst can focus his/her resources on improving the quality of inputs and on interpreting the final outputs. Further research must be done on the effects of cross-interactions between independent variables, as well as on possible non-linear relationships. These points could facilitate a greater understanding of the causes behind the distribution of HMCs in soils. Unlike many different databases that are produced by merging different national survey protocols, our impression was that the analytical quality of the FOREGS database is rather high. There is a well-defined protocol that includes the use of standardized analytical techniques, and a systematic control of the results based on duplicate measurements. These quality standards make this database fairly attractive for geostatistical analysis at pan-European scale. However, the accuracy of the geostatistical models is greatly dependent on the quality of sampling. In the case of the FOREGS database, the sampling density corresponds to cell sizes of 1–5 km, which obviously limits its usage to general scales only. The biggest limitation of the FOREGS dataset seems to be its sampling design which obviously carries limitations: sampling points are somehow clustered and several features (certain land use features and urban soils) have been omitted by the surveyors, which might lead to under or over-estimation over the whole continent (Fig. 6). We also observed unrealistic low values for carbon content in soils of Fennoscandia, which is because organic soils have been avoided in the sampling protocol. Organic components strongly bind heavy metals in soils, so that by avoiding the organic soil areas the overall content can be under-estimated. Finally, we hope that this work will motivate the national agencies in Europe to contribute additional point data and increase the number of points in the database up to a few thousand. The Geological Survey point database of USA (http://tin.er.usgs.gov/geochem/) has about 60,000 points; the Australian mapping agency has used a data set of 150,000 points to map main soil physical and chemical properties (Henderson et al., 2005). Also in Europe, there must be thousands and thousands of high quality field-sampled environmental data that is waiting for integration and further geostatistical analysis. References Annoni, A., (Ed.), 2005. European Reference Grids. EUR 21494 EN. European Commission, DG-JRC, IES, Ispra, Italy. Baize, D., Sterckeman, T., 2001. Of the necessity of knowledge of the natural pedogeochemical background content in the evaluation of the contamination of soils by trace elements. Sci. Total Environ. 264 (1–2), 127–139. Böhner, J., McCloy, K.R., Strobl, J. (Eds.), 2006. SAGA — Analysis and Modelling Applications. Göttinger Geographische Abhandlungen, Heft, vol. 115. Verlag Erich Goltze GmbH, Gottingen, Germany. Brus, D.J., Heuvelink, G.B.M., 2007. Optimization of sample patterns for universal kriging of environmental variables. Geoderma 138 (1–2), 86–95. Cappuyns, V., Swennen, R., Verhulst, J., 2004. Assessment of acid neutralizing capacity and potential mobilisation of trace metals from land-disposed dredged sediments. Sci. Total Environ. 333 (1–3), 233–247. Clemente, R., Walker, D.J., Roig, A., Bernal, M.P., 2003. Heavy metal bioavailability in a soil affected by mineral sulphides contamination following the mine spillage at Aznalcollar (Spain). Biodegradation 14, 199–205. de Jager, A., Rimavičiūte, E., Haastrup, P., 2006. A water reference for Europe. In: Haastrup, P., Würtz, J. (Eds.), Environmental Data Exchange Network for Inland Water. Elsevier, pp. 259–286.

L.R. Lado et al. / Geoderma 148 (2008) 189–199 de Vos, W., Tarvainen, T., 2006. Geochemical atlas of Europe part 2: interpretation of geochemical maps, additional tables, figures, maps and related publications. Vol. 2 of Geochemical Atlas of Europe. EuroGeosurveys & Foregs, Espoo, Finland. de Vries, W., Schütze, G., Lofts, S., Tipping, E., Meili, M., Römkens, P.F.A.M., Groenenberg, J.E., 2005. Calculation of critical loads for cadmium, lead and mercury. Background Document to a Mapping Manual on Critical Loads of Cadmium, Lead and Mercury. Doll, C.N.H., Muller, J.P., Elvidge, C.D., 2000. Night-time imagery as a tool for global mapping of socioeconomic parameters and greenhouse gas emissions. Ambio 29 (3), 157–162. European Communities, 2006. Proposal for a directive of the European Parliament and of the Council Establishing a Framework for the Protection of Soil and Amending Directive 2004/35/EC.COM(2006) 232.2004/35/EC.COM(2006) 232. European Commission, Brussels, Belgium. Facchinelli, A., Sacchi, E., Mallen, L., 2001. Multivariate statistical and GIS-based approach to identify heavy metal sources in soils. Environ. Pollut. 114 (3), 313–324. Gawlik, B.M., Bidoglio, G., 2006. Background values in European soils and sewage sludges part II — contents of trace elements and organic matter in European soils. Results of a JRC-coordinated Study on Background Values. EUR 22265 EN. European Commission, DG-JRC, IES, Ispra, Italy. Hansen, E., Lassen, C., Stuer-Lauridsen, F., Kjølholt, J., 2002. Heavy Metals in Waste. Henderson, B.L., Bui, E.N., Moran, C.J., Simon, D.A.P., 2005. Australia-wide predictions of soil properties using decision trees. Geoderma 124 (3–4), 383–398. Hengl, T., 2006. Finding the right pixel size. Comput. Geosci. 32 (9), 1283–1298. Hengl, T., 2007. A practical guide to geostatistical mapping of environmental variables. EUR 22904 EN. Office for Official Publications of the European Communities, Luxembourg. Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 120 (1–2), 75–93. Hengl, T., Heuvelink, G.B.M., Rossiter, D.G., 2007. About regression-kriging: from equations to case studies. Comput. Geosci. Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G., Jarvis, A., 2005. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25, 1965–1978. Hutton, M., de Meeûs, C., 2001. Analysis and conclusions from Member States' Assessment of the risk to health and the environment from cadmium in fertilisers. Imrie, C.E., Korre, A., Munoz-Melendez, G., Thornton, I., Durucan, S., 2008. Application of factorial kriging to the FOREGS European topsoil geochemistry database. Sci. Total Environ. 393, 96–110. ISO, 1995. Soil quality. Extraction of Trace Elements Soluble in Aqua Regia. Kane, M., 1977. Correlation of major eastern earthquake centers with mafic/ultra-mafic bassement masses. Koptsik, S., Koptsik, G., Livantsova, S., Eruslankina, L., Zhmelkova, T., Vologdina, Z., 2003. Heavy metals in soils near the nickel smelter: chemistry, spatial variation, and impacts on plant diversity. J. Environ. Monit. 5 (3), 441–450. Lehner, B., Doll, P., 2004. Development and validation of a global database of lakes, reservoirs and wetlands. J. Hydrol. 296, 1–22. Lin, Y.P., Teng, T.P., Chang, T.K., 2002. Multivariate analysis of soil heavy metal pollution and landscape pattern in Changhua county in Taiwan. Landsc. Urban Plan. 62 (1), 19–35. Løkke, H., Bak, J., Falkengren-Grerup, U., Finlay, R.D., Ilvesniemi, H., Nygaard, P.H., Starr, M., 1996. Critical loads of acidic deposition for forest soils: is the current approach adequate? Ambio 25, 510–516. Luo, W., Wang, T., Lu, Y., Giesy, J.P., Shi, Y., Zheng, Y., Xing, Y., Wu, G., 2007. Landscape ecology of the Guanting reservoir, Beijing, China: multivariate and geostatistical analyses of metals in soils. Environ. Pollut. 146, 567–576. Meissner, R., Kern, H., 2007. Earthquakes and strength in the laminated lower crust — can they be explained by the “corset model”. Tectonophysics. Micó, C., Recatalá, L., Peris, M., Sánchez, J., 2006. Assessing heavy metal sources in agricultural soils of an European Mediterranean area by multivariate analysis. Chemosphere 65, 863–872.

199

Palumbo, B., Angelone, M., Bellanca, A., Dazzi, C., Hauser, S., Neri, R., Wilson, J., 2000. Influence of inheritance and pedogenesis on heavy metal distribution in soils of Sicily, Italy. Geoderma 95 (3–4), 247–266. Papritz, A., Herzig, C., Borer, F., Bono, R., 2005. Modelling the spatial distribution of copper in the soils around a metal smelter in northwestern Switzerland. In: Renard, P., Demougeot-Renard, H., Froidevaux, R. (Eds.), Geostatistics for Environmental Applications: Proceedings of the Fifth European Conference on Geostatistics for Environmental Applications. Springer, New York, pp. 343–354. Pawlewicz, M.J., Steinshouer, D.W., Gautier, D.L., 2003. Map showing geology, oil and gas fields, and geologic provinces of Europe including Turkey. Open-File Report 97470I. US Geological Survey, Denver, CO. Pebesma, E.J., 2006. The role of external variables and GIS databases in geostatistical analysis. Trans. GIS 10 (4), 615–632. Pebesma, E.J., Wesseling, C.G., 1998. Gstat: a program for geostatistical modelling, prediction and simulation. Comput. Geosci. 24 (1), 17–31. Pebesma, E.J., Bivand, R.S., 2005. Classes and methods for spatial data in R. R News 5 (2), 9–13. Posch, M., Slootweg, J., Hettelingh, J.P., 2005. European critical loads and dynamic modelling: CCE status report 2005. Rabus, B., Eineder, M., Roth, A., Bamler, R., 2003. The shuttle radar topography mission — a new class of digital elevation models acquired by spaceborne radar. Photogramm. Eng. Remote Sensing 57 (4), 241–262. Reimann, C., de Caritat, P., 2005. Distinguishing between natural and anthropogenic sources for elements in the environment: regional geochemical surveys versus enrichment factors. Sci. Total Environ. 337 (1–3), 91–107. Reimann, C., Garrett, R.G., 2005. Geochemical background-concept and reality. Sci. Total Environ. 350 (1–3), 12–27. Reimann, C., Siewers, U., Tarvainen, T., Bityukova, L., Eriksson, J., Giucis, A., Gregorauskiene, V., Lukashev, V.K., Matinian, N.N., Pasieczna, A., 2003. Agricultural soils in northern Europe: A geochemical atlas. Geol. Jb. Sonderhefte, vol. SD 5. Hannover, Germany. Romić, M., Hengl, T., Romíc, D., Husnjak, S., 2007. Representing soil pollution by heavy metals using continuous limitation scores. Comput. Geosci. 33, 1316–1326. Salemaa, M., Vanha-Majamaa, I., Derome, J., 2001. Understorey vegetation along a heavy-metal pollution gradient in SW Finland. Environ. Pollut. 112 (3), 339–350. Geochemical atlas of Europe part 1: background information, methodology and maps. In: Salminen, T. (Ed.), Vol. 1 of Geochemical Atlas of Europe. EuroGeosurveys & Foregs, Espoo, Finland. Salminen, R., Chekushin, V., Tenhola, M., Bogatyrev, I., Fedotova, E., Tomilina, O., Zhdanova, L., Glavatskikh, S.P., Selenok, I., Gregorauskiene, V., Kashulina, G., Niskavaara, H., Polischuok, A., Rissanen, K., 2004. Geochemical Atlas of the Eastern Barents Region. Elsevier Science, Amsterdam. Skarpelis, N., 2006. Lateritization processes of ultramafic rocks in Cretaceous times: the fossil weathering crusts of mainland Greece. J. Geochem. Explor. 88 (1–3), 325–328. Sutton, P., 1997. Modeling population density with night-time satellite imagery and GIS. Comput. Environ. Urban 21 (3), 227–244. Sutton, P., Roberts, D., Elvidge, C., Meij, H., 1997. A comparison of nighttime satellite imagery and population density for the continental United States. Photogramm. Eng. Remote Sensing 63, 1303–1313. Thode Jr., H.C., 2002. Testing for Normality, vol. 164. Stony Brook University, New York. Thornthwaite, C.E., 1948. An approach toward a rational classification of climate. Geogr. Rev. 38, 55–94. UN/ECE, 07/05/2007, 1998. Protocol to the 1979 convention on long-range transboundary air pollution on heavy metals. Wessel, P., Smith, W.H.F., 1996. A global self-consistent, hierarchical, high-resolution shoreline database. J. Geophys. Res. 101, 8741–8743. Zhang, C., 2006. Using multivariate analyses and GIS to identify pollutants and their spatial patterns in urban soils in Galway, Ireland. Environ. Pollut. 142, 501–511.