Statistical analyses of geochemical variables in soils of Ireland


Geoderma 146 (2008) 378–390


Chaosheng Zhang (a, ⁎), Deirdre Fay (b), David McGrath (b), Eamonn Grennan (c), Owen T. Carton (b)

(a) Department of Geography, National University of Ireland, Galway, Ireland
(b) Teagasc, Environmental Research Centre, Johnstown Castle, Wexford, Ireland
(c) Institute of Technology, Sligo, Ireland

ARTICLE INFO

Article history: Received 1 June 2007; received in revised form 28 May 2008; accepted 10 June 2008; available online 17 July 2008

Keywords: Soil geochemistry; Probability; Statistical analyses; Data transformation; Multivariate analyses

ABSTRACT

Geochemical variables in soils are affected by multiple factors, and different variables respond differently to these factors, causing complicated multivariate relationships between them. In this study, the relationships between a total of 45 geochemical variables in soils of Ireland, and in particular in mineral soils, were investigated using multivariate analyses, following an investigation into their probability features and data transformation, based on a total of 1310 surface soil samples. The values of the geochemical variables varied strongly. Multi-modal features in the histograms and multi-kink features in the normal quantile–quantile (Q–Q) plots were observed for many variables, implying the existence of multiple populations. Obvious outliers were identified using the normal Q–Q plots. Most of the variables did not pass a Kolmogorov–Smirnov test for either a normal or a lognormal distribution, and the Box–Cox power transformation was effective in transforming the data into a form suited to further parametric statistical analyses. There were generally good correlations between most metals, but relatively poor correlations between extractable nutrients (available P, K, and Mg) and metals. The extractable nutrients were more affected by agricultural activities, which are more spatially variable than natural factors such as geology and soil type. Cluster analysis classified the variables into groups of metals of mainly natural origin, mobile elements, and variables affected by agricultural activities and pollution. Among the three factors investigated (rock type, soil type and land use), land use was found to be the least important influencing factor for most of the geochemical variables studied. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

Soil databases that include information on the levels of heavy metals and related compounds exist for parts of Europe, such as Scotland (Reaves and Berrow, 1984), England and Wales (McGrath and Loveland, 1992), and Northern Ireland (Cruickshank, 1997). However, such information on the chemical components of Irish soils is limited. National norms for a range of soil chemical parameters are needed to provide a benchmark for the soils of a country. A national soil database project was carried out in the Republic of Ireland during 2003–2006. The resulting database contains much useful information that requires proper statistical analysis if the results are to be fully understood.

Soil geochemistry is affected by multiple factors and influences, ranging from small-scale mineral composition to regional-scale geology, soil type, and topography. Geochemical variables have been found to seldom follow a normal distribution (Reimann and Filzmoser, 2000; Zhang et al., 2005). The lognormal distribution was once widely recognised, and was even regarded as a “fundamental law of geochemistry” (Ahrens, 1954). Later, Vistelius (1960) proposed a “fundamental law of geochemical processes”, defined as the joint probability distribution function of the concentration of a minor chemical element deposited by natural chemical reactions, which has a large positive skewness. Shaw (1961) declared that no single probability function could be expected to suit all elements, and proposed five rules, including the lognormal law, which was regarded as the “best probability function to use”. Oertel (1969) proposed that, other than the lognormal, the gamma distribution was the most satisfactory for representing the frequency distributions of trace-element concentrations in mineral samples. More recently, Zhang and Selinus (1998) regarded the lognormal distribution as a special case of more general, positively skewed distributions in environmental geochemistry. Furthermore, non-normal and non-lognormal distributions are widely observed in geochemical databases (McGrath and Loveland, 1992; Reimann and Filzmoser, 2000; Zhang et al., 2005). Such non-normality and non-lognormality challenge parametric statistical analyses, which assume normally distributed data. It is therefore necessary to understand the probability features of geochemical variables before carrying out further statistical analyses.

Geochemical variables in soils are affected by multiple processes, resulting in complicated relationships between them. This issue becomes more important as a geochemical database grows: many variables analysed for many samples taken from a large area with mixed rock types, soil types, land uses, and human influences. Multivariate analyses have been widely applied in environmental studies (e.g., Howarth, 1983; Zhang and Selinus, 1998; Gallego et al., 2002; Reimann et al., 2002; Zhang and Lalor, 2002; Lee et al., 2006; Micó et al., 2006; Zhang, 2006). They provide an effective way to reveal the relationships between multiple variables and are therefore helpful for understanding the influencing factors as well as the sources of chemical components. Correlation analysis reveals the linear relationship between pairs of variables. A significant positive correlation does not imply a cause-and-effect relationship, but it does imply that the two variables vary consistently and are likely to be influenced by some common factors. Multivariate analyses have the advantage of analysing the relationships between many variables at once; for example, cluster analysis can summarise the relationships among all the variables in a single cluster tree.

In this study, the results from the national soil database project of Ireland are presented with a focus on the probability features of geochemical variables. Correlation analysis and cluster analysis were also applied to the soil data of Ireland to better understand the relationships between the geochemical variables and their influencing factors.

⁎ Corresponding author. Fax: +353 91 495505. E-mail address: [email protected] (C. Zhang).
0016-7061/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.geoderma.2008.06.013

2. Methods

2.1. Study area

The Republic of Ireland has a total land area of 71,000 km2. Ireland is traditionally an agricultural country, with little pollution from heavy industry. Therefore, except in urban and mining areas, soil geochemistry is expected to be affected mainly by natural factors such as geology and soil type. Rock types in Ireland were classified based on a rock unit map from the Geological Survey of Ireland (McConnell and Gately, 2006) and included: basalt, granite, rhyolite, impure limestone (Carboniferous only), pure limestone (Carboniferous only), sandstone (Old Red Sandstone), sandstone and shale, shale, and schist. Soil types in Ireland can be simplified into Podzols, Brown Podzolics, Grey Brown Podzolics, Acid Brown Earths, Gleys, Brown Earths, Rendzinas, Lithosols, and Peat (Gardiner and Radford, 1980).

2.2. Soil sampling and laboratory analyses

A total of 1310 soil samples were taken at fixed locations on a predetermined grid system (Fig. 1). Two samples were taken from each 10 km × 10 km square of the National Grid System of Ireland. At each sampling site, 25 soil cores were taken to a depth of 10 cm at 5 m intervals on a 20 m × 20 m grid centred on the sample location. The samples were placed in clear polythene bags and delivered to Teagasc, Johnstown Castle, within one week. All soil samples were air-dried at ambient temperature, sieved to remove stones and plant debris, and mixed thoroughly to obtain a representative sample. The samples were then rolled manually with a steel roller and sieved through a mechanically vibrating 2 mm stainless steel mesh.

Soil pH was determined by mixing 10 ml of soil with 20 ml of deionised water; after 10 min, pH was measured using a digital pH meter with a glass electrode. Available nutrients (P, K and Mg) were determined using Morgan's extracting solution (10% sodium acetate in 3% acetic acid, buffered at pH 4.8) (Peech and English, 1944). Soil organic carbon (SOC) was determined using a Leco CN-2000 dry combustion analyser, following the dry combustion method for organic C described by Wright and Bailey (2001).

For ICP analyses, all soil samples were finely ground in an agate mortar and pestle and passed through a 0.42 mm nylon mesh. A representative sub-sample (200 mg) was digested to dryness on a hot-plate with 10 ml hydrofluoric acid (HF), 5 ml perchloric acid (HClO4), 2.5 ml hydrochloric acid (HCl) and 2.5 ml nitric acid (HNO3). The salts were dissolved in 20% aqua regia and made up to 10 ml. Indium was added to this solution prior to analysis to serve as an internal standard. Al, Ba, Ca, Cr, Fe, K, Li, Mg, Mn, Na, P, S, Sr, Ti, and V were determined by introducing this solution to ICP-AES, while As, Cd, Ce, Co, Cu, Ga, Ge, La, Mo, Nb, Ni, Pb, Rb, Sb, Sc, Sn, Ta, Th, Tl, U, W, Y and Zn were determined by ICP-MS. For quality control, the standard ICP procedure included analysis of every 10th sample in duplicate and the inclusion of a reference material and a blank for every 50 samples. Reference materials included two in-house standards and certified reference materials (CRM), and the errors were generally within 10%. Both Se and Hg were measured using atomic fluorescence spectrometry. Details of the sampling and laboratory analyses are available in Fay et al. (2007).

The final database contains 1310 samples with 45 geochemical variables: pH, SOC, Avail_P, Avail_K, Avail_Mg, Al, As, Ba, Ca, Cd, Ce, Co, Cr, Cu, Fe, Ga, Ge, Hg, K, La, Li, Mg, Mn, Mo, Na, Nb, Ni, P, Pb, Rb, S, Sb, Sc, Se, Sn, Sr, Ta, Th, Ti, Tl, U, V, W, Y, and Zn.

2.3. Statistical analyses

The statistical analyses used in this study were descriptive statistics, histograms, normal quantile–quantile (Q–Q) plots, data transformation, correlation analysis and cluster analysis. The main statistical software used was SPSS (v. 14). Note that significance testing assumed simple random sampling, whereas a systematic sampling design was used in this study.

Fig. 1. Soil sampling locations in Ireland (n = 1310).

Table 1 Summary statistics for 45 geochemical variables in soils of Ireland (n = 1310)⁎

Variable | n<DL | Min | 5% | 10% | 25% | Median | 75% | 90% | 95% | 98% | Max
pH | 0 | 3.2 | 3.7 | 3.9 | 4.6 | 5.3 | 6.1 | 6.6 | 7.0 | 7.2 | 7.7
SOC | 0 | 1.40 | 2.86 | 3.56 | 4.92 | 7.00 | 14.26 | 40.82 | 48.01 | 50.81 | 55.80
Avail_P | 0 | 0.56 | 2.32 | 2.98 | 4.32 | 7.05 | 12.47 | 20.48 | 30.52 | 43.75 | 316.41
Avail_K | 0 | 4.66 | 45.52 | 55.94 | 82.51 | 124.01 | 181.95 | 266.55 | 312.77 | 386.79 | 949.23
Avail_Mg | 0 | 13.49 | 71.11 | 89.38 | 127.51 | 186.13 | 276.28 | 404.65 | 485.92 | 559.39 | 1001.97
Al | 0 | 0.06 | 0.20 | 0.56 | 2.21 | 3.48 | 4.89 | 6.03 | 6.65 | 7.38 | 9.74
As | 1 | <0.2 | 1.43 | 2.09 | 4.41 | 7.25 | 10.66 | 15.97 | 21.90 | 33.38 | 122.70
Ba | 0 | 6.6 | 21.3 | 56.0 | 141.7 | 230.2 | 305.6 | 387.3 | 454.5 | 545.5 | 1296.9
Ca | 0 | 0.026 | 0.102 | 0.149 | 0.225 | 0.358 | 0.613 | 1.495 | 2.591 | 4.900 | 26.628
Cd | 1 | <0.02 | 0.111 | 0.150 | 0.212 | 0.326 | 0.640 | 1.253 | 1.652 | 2.267 | 15.148
Ce | 0 | 0.6 | 1.9 | 5.7 | 22.3 | 34.8 | 46.6 | 56.3 | 62.3 | 70.2 | 136.4
Co | 0 | 0.2 | 0.5 | 0.9 | 3.0 | 6.2 | 9.7 | 13.0 | 15.1 | 19.0 | 58.7
Cr | 44 | <2 | 2.6 | 6.1 | 25.2 | 42.6 | 58.6 | 74.9 | 86.8 | 99.6 | 221.7
Cu | 0 | 1.1 | 3.5 | 4.8 | 9.5 | 16.2 | 24.7 | 35.1 | 45.9 | 64.1 | 272.4
Fe | 0 | 0.05 | 0.20 | 0.44 | 1.14 | 1.87 | 2.59 | 3.31 | 3.80 | 4.39 | 19.43
Ga | 10 | <0.1 | 0.60 | 1.53 | 5.66 | 8.46 | 12.47 | 15.76 | 17.76 | 20.42 | 25.16
Ge | 72 | <0.1 | <0.1 | 0.25 | 0.86 | 1.26 | 1.55 | 1.82 | 2.00 | 2.24 | 2.58
Hg | 11 | <0.02 | 0.022 | 0.035 | 0.058 | 0.086 | 0.134 | 0.185 | 0.237 | 0.295 | 3.450
K | 0 | 0.02 | 0.08 | 0.17 | 0.59 | 0.98 | 1.33 | 1.68 | 1.85 | 2.13 | 2.64
La | 12 | <0.5 | 1.1 | 3.4 | 12.7 | 20.0 | 25.4 | 30.2 | 33.1 | 36.5 | 75.2
Li | 137 | <2 | <2 | <2 | 10.7 | 19.7 | 29.1 | 41.2 | 54.2 | 72.1 | 165.7
Mg | 0 | 0.038 | 0.107 | 0.137 | 0.196 | 0.298 | 0.429 | 0.645 | 0.824 | 1.062 | 2.096
Mn | 0 | 7 | 25 | 59 | 190 | 462 | 844 | 1355 | 1903 | 3173 | 21,077
Mo | 0 | 0.07 | 0.32 | 0.42 | 0.61 | 0.91 | 1.37 | 2.17 | 3.29 | 4.77 | 21.14
Na | 0 | 0.015 | 0.053 | 0.080 | 0.205 | 0.338 | 0.545 | 0.860 | 1.091 | 1.354 | 2.254
Nb | 0 | 0.06 | 0.34 | 1.08 | 4.42 | 6.83 | 8.95 | 10.64 | 12.01 | 14.76 | 38.88
Ni | 0 | 0.8 | 1.9 | 2.8 | 9.2 | 17.5 | 28.6 | 41.6 | 50.0 | 58.6 | 176.0
P | 0 | 0.007 | 0.036 | 0.048 | 0.075 | 0.105 | 0.136 | 0.174 | 0.202 | 0.242 | 0.493
Pb | 0 | 1.1 | 11.7 | 13.6 | 18.2 | 24.8 | 33.5 | 48.0 | 61.9 | 86.2 | 2634.7
Rb | 0 | 0.6 | 2.2 | 5.8 | 29.8 | 53.5 | 75.7 | 100.5 | 117.5 | 136.8 | 222.0
S | 0 | 0.011 | 0.035 | 0.042 | 0.055 | 0.073 | 0.127 | 0.252 | 0.319 | 0.399 | 0.701
Sb | 30 | <0.05 | 0.10 | 0.18 | 0.31 | 0.53 | 0.85 | 1.24 | 1.54 | 2.02 | 5.29
Sc | 1 | <0.1 | 0.36 | 0.86 | 3.34 | 5.85 | 8.37 | 10.95 | 12.33 | 13.79 | 17.11
Se | 0 | 0.08 | 0.34 | 0.40 | 0.51 | 0.74 | 1.14 | 1.94 | 2.67 | 5.12 | 17.44
Sn | 10 | <0.2 | 0.54 | 0.73 | 1.12 | 1.68 | 2.34 | 3.35 | 4.72 | 6.79 | 17.84
Sr | 0 | 9.2 | 20.7 | 24.8 | 32.5 | 49.7 | 70.1 | 93.5 | 115.0 | 162.7 | 1252.5
Ta | 129 | <0.05 | <0.05 | 0.05 | 0.27 | 0.45 | 0.61 | 0.74 | 0.85 | 1.07 | 2.71
Th | 6 | <0.1 | 0.25 | 0.73 | 2.91 | 4.65 | 6.28 | 7.69 | 8.50 | 9.42 | 11.15
Ti | 0 | 39 | 125 | 345 | 1355 | 2133 | 2851 | 3401 | 3773 | 4468 | 8704
Tl | 72 | <0.02 | <0.02 | 0.079 | 0.292 | 0.430 | 0.568 | 0.715 | 0.818 | 1.008 | 2.664
U | 20 | <0.1 | 0.20 | 0.53 | 1.48 | 1.96 | 2.48 | 3.21 | 4.74 | 7.36 | 64.19
V | 14 | <2 | 3.9 | 9.4 | 30.8 | 52.2 | 74.2 | 94.8 | 104.8 | 123.6 | 240.3
W | 132 | <0.1 | <0.1 | <0.1 | 0.36 | 0.59 | 0.85 | 1.10 | 1.31 | 1.80 | 7.72
Y | 0 | 0.22 | 0.73 | 1.82 | 6.94 | 10.33 | 14.46 | 20.31 | 24.04 | 30.61 | 111.78
Zn | 0 | 3.6 | 15.9 | 21.4 | 35.6 | 62.6 | 90.8 | 126.1 | 144.7 | 178.9 | 1384.4

⁎ Units are mg/kg except for the available (Avail_) elements (Avail_P, Avail_K, Avail_Mg), which are in mg/l; soil organic carbon (SOC), Al, Ca, Fe, K, Mg, Na, S and P are in %; pH is in pH units. n<DL is the number of values below the detection limit; "<x" denotes a value below the detection limit x.

Fig. 2. Histograms of the raw data for selected variables of chemical components.

3. Results and discussions

3.1. Basic statistics

The basic descriptive parameters (minimum, median, maximum and percentiles) are shown in Table 1. Because ICP-MS was used for the elements with low concentrations, most of the variables had all of their values above the detection limits. A total of 17 variables had one or more values below their detection limits (As, Cd, Cr, Ga, Ge, Hg, La, Li, Sb, Sc, Sn, Ta, Th, Tl, U, V and W). Of these, Li, Ta and W had about 10% of their values below the detection limits because of their very low concentrations. For the subsequent statistical analyses, values below detection limits were arbitrarily assigned half of the detection limit. The raw data vary over several orders of magnitude, e.g. Al ranges from 0.06% to 9.74%, showing the strong diversity of soils in the study area. Because of this large variation, median values are recommended as the representative values of the soil samples.
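The half-detection-limit substitution and the percentile summary used for Table 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' SPSS workflow: values are assumed to arrive as strings in which a leading "<" marks a result below the detection limit (as printed in Table 1), and the helper names `parse_value` and `summarise` and the example values are hypothetical. NumPy's default linear percentile interpolation may differ slightly from the percentile definition used by SPSS.

```python
import numpy as np

# Percentile levels reported in Table 1.
PERCENTILES = (5, 10, 25, 50, 75, 90, 95, 98)


def parse_value(raw: str) -> float:
    """Convert a reported value to a float (hypothetical helper).

    A value written as '<DL' (below the detection limit) is replaced by
    half of the detection limit, as described in Section 3.1.
    """
    raw = raw.strip()
    if raw.startswith("<"):
        return float(raw[1:]) / 2.0
    return float(raw)


def summarise(raw_values: list[str]) -> dict:
    """Return n<DL, minimum, selected percentiles and maximum (Table 1 layout)."""
    x = np.array([parse_value(v) for v in raw_values], dtype=float)
    summary = {
        "n<DL": int(sum(v.strip().startswith("<") for v in raw_values)),
        "Min": float(x.min()),
    }
    for p in PERCENTILES:
        # NumPy's default linear interpolation between order statistics.
        summary[f"P{p}"] = float(np.percentile(x, p))
    summary["Max"] = float(x.max())
    return summary


# Hypothetical As values (mg/kg) with one non-detect at DL = 0.2.
example = ["<0.2", "1.4", "2.1", "4.4", "7.3", "16.0"]
result = summarise(example)
print(result["n<DL"], result["Min"], round(result["P50"], 3), result["Max"])
```

Substituting DL/2 is the simple convention stated in the text; with about 10% non-detects (Li, Ta, W) the lower percentiles are driven by that choice, which is why those cells are reported as "<DL" in Table 1 rather than as numbers.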

3.2. Histograms

Histograms were created for selected variables (Fig. 2). In the figures, extreme values were truncated for some variables so that the features of the majority of the data are shown clearly. Information on the truncated values can be obtained from the summary statistics table (Table 1) and the normal Q–Q plots (Fig. 3). Histograms for all 45 variables are available in Fay et al. (2007). Most of the variables showed a positively skewed distribution, except for pH and Al. pH is itself a transformation (the negative logarithm) of the H+ ion concentration. The positive skewness shows that most elements have some elevated values, due to either natural enrichment or human activities. As expected, trace elements are more skewed than major elements. Soil organic carbon showed a bimodal distribution, reflecting two distinct populations: mineral soils and organic soils (peat). Many of the elements exhibited some degree of multi-modality, generally related to the detection limits and/or organic soils. This multi-modal feature demonstrates the complexity of the soils. Another important feature of the histograms is that few of them show a normal (bell-shaped) distribution. Non-normality has been widely observed in environmental geochemistry (Aubrey, 1956; Vistelius, 1960; McGrath and Loveland, 1992; Zhang and Selinus, 1998; Reimann and Filzmoser, 2000; Zhang et al., 2005), and may be attributed to the complex combination of influencing factors.

3.3. Normal Q–Q plots for raw data

In order to obtain a better understanding of the probability features, normal quantile–quantile (Q–Q) plots were produced for the raw data of selected variables (Fig. 3). The plots for all the 45 variables are available in Fay et al. (2007). The observed values were plotted on

382

C. Zhang et al. / Geoderma 146 (2008) 378–390

Fig. 3. Normal Q–Q plots of the raw data for selected variables of chemical components.

the x-axis, and values expected for a normal distribution were plotted on the y-axis. Samples with a normal distribution cluster along a diagonal straight line. Tied values (same values for different soil samples) were assigned arbitrary expected different values and they are represented as stacked points. Deviations from normality were observed for most of the variables. Due to the positive skewness of the raw data, most of the figures showed a concave feature, with both the low values and high values located below the diagonal line. If the data followed a normal distribution, both the low and high values would have been lower. Some of the variables showed a slightly different feature, e.g. pH and SOC. As mentioned earlier, the pH values are already transformed data of H+ concentrations; while SOC is strongly affected by the presence of organic soils. Besides SOC, multiple kink features were observed for several variables (Al, Ba, Ce, Ga, Ge, La, Li, Nb, Ta, Ti, and Tl), demonstrating the existence of multiple populations in the data sets (e.g. caused by organic soils, different rock types and soil types). It should be noted that the number of kinks are not related to the potential number of populations. One of the important features of the normal Q–Q plots was that high value outliers were clearly observable (located away from the majority of samples), such as the high values for Avail_P, Avail_K, and As. It should be mentioned that two extreme values were observed for both Pb and Zn. One sample, likely from a mining site, had both extreme values of the two elements. The extreme values should be

properly dealt with (e.g. removal from the data set) in statistical analyses, as they have a strong effect on the overall feature of the data sets, and may cause biased results for statistical analyses which are sensitive to outliers, such as calculation of mean and variance, as well as Pearson's correlation analysis. 3.4. Optimal data transformation and test for normality Based on the histograms and normal Q–Q plots, the raw data generally did not follow a normal distribution, and thus appropriate data transformation was needed prior to further statistical analyses. A logarithmic transformation did produce distributions generally with lower skewness values than the original data (Table 2). However, since the logarithmic transformation was not an optimal method of transformation for the raw data, a more powerful method of Box–Cox transformation (Box and Cox, 1962; Jobson, 1991; Zhang and Zhang, 1996; Zhang and Selinus, 1998) was applied. In a Box–Cox transformation, a power parameter called “lambda” is determined to make the transformed data set closer to normality, especially improving symmetry. Outliers can make statistical results unreliable as well as having an effect on the power parameters for Box–Cox transformation, and thus they were identified using normal Q–Q plots both for the raw data and also those for the log-transformed data (not shown). For high value outliers, those distanced far away from the majority of samples on both the raw data plots and the transformed data plots were regarded as outliers. The low value outliers are those distanced far away from

C. Zhang et al. / Geoderma 146 (2008) 378–390

383

Table 2 Skewness, kurtosis and results of Kolmogorov–Smirnov test (K–S p) for the raw, log-transformed and Box–Cox transformed data (outliers removed)

pH SOC Avail_P Avail_K Avail_Mg Al As Ba Ca Cd Ce Co Cr Cu Fe Ga Ge Hg K La Li Mg Mn Mo Na Nb Ni P Pb Rb S Sb Sc Se Sn Sr Ta Th Ti Tl U V W Y Zn

Sample no. (n)

Raw data Skewness

Kurtosis

K–S p

Log-transformed data Skewness

Kurtosis

K–S p

Skewness

Kurtosis

K–S p

1310 1310 1309 1309 1310 1310 1309 1310 1310 1308 1310 1310 1310 1309 1307 1310 1310 1307 1310 1310 1310 1310 1310 1310 1310 1310 1309 1309 1308 1310 1309 1310 1309 1310 1310 1307 1310 1310 1310 1310 1309 1310 1310 1308 1308

0.01 1.68 4.22 2.24 1.60 0.08 5.49 1.32 7.74 3.32 0.31 2.06 0.62 2.78 0.71 0.30 −0.36 1.98 0.18 0.17 2.26 1.99 8.93 6.15 1.63 1.87 0.93 1.30 2.85 0.70 2.08 2.58 0.34 5.81 3.94 2.01 1.61 −0.06 0.49 1.56 7.47 0.59 4.51 1.37 1.27

−0.71 1.48 30.84 9.64 3.78 −0.49 49.70 6.20 87.22 19.03 1.20 11.76 1.83 14.94 1.90 −0.27 −0.32 6.99 −0.40 1.35 10.08 5.98 115.66 56.49 3.48 11.16 0.62 4.46 13.60 1.08 4.83 13.06 −0.37 47.38 23.43 7.07 8.38 −0.59 1.76 9.63 82.91 1.27 42.62 3.94 2.95

0.01 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.01 0.01 0.00 0.00 0.04 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.03 0.00 0.00 0.00 0.00 0.00 0.00

− 0.35 0.68 0.35 − 0.19 − 0.12 − 1.82 − 0.38 − 1.59 0.79 0.33 − 1.89 − 1.04 − 1.73 − 0.44 − 1.51 − 2.12 − 2.06 − 0.39 − 1.59 − 2.06 − 1.17 0.00 − 0.68 0.52 − 0.63 − 1.88 − 0.85 − 0.69 − 0.17 − 1.68 0.68 − 1.01 − 1.58 0.90 − 0.33 0.05 − 1.51 − 1.93 − 1.77 − 2.03 − 1.58 − 1.66 − 1.12 − 1.58 − 0.59

−0.59 −0.43 0.21 0.43 0.16 2.79 0.82 2.55 1.46 −0.06 3.06 0.50 2.61 0.18 2.20 5.04 3.63 0.38 2.10 3.92 0.68 0.04 0.49 1.43 0.21 3.24 0.05 0.91 2.66 2.32 −0.20 2.02 2.02 1.82 2.27 −0.06 1.49 3.30 2.62 3.75 4.06 2.60 0.91 2.40 0.19

0.00 0.00 0.03 0.84 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.52 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

−0.04 0.07 −0.00 0.01 0.00 −0.26 0.03 0.02 −0.05 0.00 −0.18 −0.06 −0.16 0.00 −0.08 −0.20 −0.34 0.01 −0.20 −0.19 −0.04 0.00 0.03 −0.04 −0.02 0.06 −0.10 0.03 0.02 −0.17 0.05 0.00 −0.20 −0.07 0.04 0.00 −0.05 −0.31 −0.14 0.02 0.30 −0.14 0.08 0.00 −0.02

−0.71 −0.36 0.11 0.33 0.04 −0.45 1.01 1.07 0.94 0.02 0.35 0.06 0.09 0.19 0.13 −0.31 −0.33 0.22 −0.47 0.60 0.38 0.04 0.75 1.02 −0.16 2.23 −0.62 0.58 2.25 −0.11 −0.33 0.69 −0.50 1.40 1.95 −0.08 1.16 −0.51 0.40 1.95 5.05 −0.04 1.88 0.64 −0.24

0.01 0.00 0.47 0.66 0.67 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 0.02 0.00 0.00 0.06 0.00 0.00 0.00 0.52 0.00 0.08 0.03 0.00 0.01 0.08 0.00 0.00 0.00 0.13 0.01 0.13 0.00 0.32 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.14

the majority of samples in the plots for the log-transformed data only. A total of 26 outliers were identified out of a total of 58,950 (=1310 × 45) values, which is equivalent to 0.044%. The outliers identified in this way can be regarded as “extreme values”. Due to the power of optimal data transformation, this method of lenient outlier identification is considered adequate. It was necessary to calculate the shape parameters of all the data sets and also to test their probability feature (Table 2). Skewness characterizes the degree of asymmetry of a distribution around its mean, and kurtosis measures the relative peakiness or flatness of a distribution compared with the normal distribution. Most of the raw data followed a positively skewed distribution, except for Ge and Th. The distributions of the raw data of pH, Al, and Th were quite symmetric. High kurtosis values were observed for many variables, together with their high skewness values, indicating that the majority of their values were concentrated on the low value end. It was found that the log-transformation “over-transformed” most of the variables, changing their skewness from positive to negative values. The skewness values for the Box–Cox transformed data sets were close to “0”, showing the power of this transformation in pushing data sets towards normality. However, many variables still failed to pass the test for normality (p ≤ 0.05), due to a mixture of

Box–Cox transformed data

populations, detection limits, tied values, and sample numbers (Zhang et al., 2005). Nevertheless, it was demonstrated that the Box–Cox transformation was most effective in changing the raw data sets towards normality, or becoming more symmetric. 3.5. Correlation analyses Since peat or organic soil is a substantial component of soils in Ireland, it is necessary to separate organic soils from mineral soils. Based on the frequency distribution of the SOC data, the mineral soils were arbitrarily defined as those with SOC b 15% for the purposes of this study. Another important factor that needs attention is the existence of urban areas. Soils in urban areas are heavily affected by human activities, especially traffic pollution (Zhang, 2006). Even though major built-up areas (such as Dublin) were not included in the sampling design explicitly, soils from even small urban areas might be affected by the complex human activities. Therefore, it was appropriate to separate these urban samples. The urban samples were identified based on the CORINE land cover map obtained from the Irish Environmental Protection Agency (EPA). The 1310 samples were classified into 977 mineral soils, 318 organic soils, and 15 urban samples. Prior to further analyses, outlier

384

C. Zhang et al. / Geoderma 146 (2008) 378–390

Table 3 Pearson’s correlation coefficients (lower-left) and significance levels (upper-right) between variables in mineral soils

pH SOC Avail_P Avail_K Avail_Mg Al As Ba Ca Cd Ce Co Cr Cu Fe Ga Ge Hg K La Li Mg Mn Mo Na Nb Ni P Pb Rb S Sb Sc Se Sn Sr Ta Th Ti Tl U V W Y Zn

pH

SOC

Avail_P

Avail_K

Avail_Mg

Al

As

Ba

Ca

Cd

Ce

Co

Cr

Cu

Fe

Ga

Ge

Hg

K

La

Li

Mg

1 −0.26 0.34 0.02 0.31 −0.07 0.27 −0.04 0.70 0.39 0.02 0.28 0.03 0.10 0.08 −0.10 −0.12 0.20 −0.03 0.06 0.20 0.17 0.38 0.09 −0.08 −0.16 0.33 0.17 0.20 0.02 0.02 0.23 0 0.02 0.06 0.19 −0.13 −0.01 −0.21 0.18 0.09 −0.09 −0.02 0.39 0.38

0 1 0.01 0.13 0.32 0.10 −0.05 0.05 0.15 0.03 −0.04 −0.12 0.05 0.06 0 0.13 −0.05 0.26 0.01 −0.03 0 0.09 −0.16 0.04 0.13 −0.06 −0.10 0.19 0.06 −0.03 0.82 −0.14 0.09 0.40 0.14 0.20 −0.01 −0.03 0 0.02 0.12 0.10 0.05 −0.06 −0.07

0 0.86 1 0.40 0.28 −0.21 −0.08 −0.14 0.33 0.20 −0.19 −0.09 −0.11 0.13 −0.20 −0.21 −0.16 −0.02 −0.14 −0.13 −0.08 −0.03 0 −0.01 −0.12 −0.22 0.05 0.48 0.03 −0.12 0.15 0.04 −0.15 −0.08 −0.04 0 −0.20 −0.18 −0.23 −0.07 0.04 −0.13 −0.13 0.03 0.15

0.54 0 0 1 0.32 0.11 0.10 0.02 0.06 0.05 0.06 0.13 0.14 0.21 0.14 0.11 0.06 0.01 0.09 0.05 0.15 0.13 0.14 0.20 0.10 0.06 0.09 0.45 0.11 0.13 0.18 0.03 0.11 0.02 0.09 0.07 0.08 0.09 0.07 0.04 0.12 0.16 0.10 −0.02 0.17

0 0 0 0 1 0.03 0.14 − 0.04 0.46 0.27 − 0.03 0.18 0.13 0.14 0.12 0.03 − 0.09 0.16 − 0.02 0 0.20 0.27 0.17 0.20 0.02 − 0.04 0.23 0.33 0.18 0.02 0.47 0.05 0.07 0.12 0.09 0.26 0.01 − 0.04 − 0.04 0.12 0.12 0.09 0.01 0.17 0.27

0.02 0 0 0 0.30 1 0.45 0.75 −0.05 −0.21 0.82 0.50 0.74 0.34 0.74 0.96 0.77 0.21 0.81 0.75 0.71 0.63 0.31 0.20 0.59 0.70 0.19 0.34 0.39 0.87 0.08 0.24 0.88 0.21 0.73 0.43 0.77 0.87 0.78 0.58 0.44 0.80 0.77 0.13 0.23

0 0.11 0.01 0 0 0 1 0.23 0.19 0.35 0.50 0.63 0.52 0.44 0.65 0.43 0.39 0.44 0.29 0.51 0.57 0.33 0.62 0.50 0.09 0.35 0.44 0.43 0.55 0.43 0.09 0.50 0.52 0.39 0.47 0.20 0.37 0.49 0.31 0.53 0.52 0.50 0.50 0.43 0.51

0.25 0.14 0 0.46 0.26 0 0 1 −0.02 −0.28 0.60 0.30 0.46 0.29 0.54 0.71 0.61 0.11 0.83 0.51 0.30 0.53 0.23 0.01 0.56 0.46 0.01 0.22 0.25 0.70 0.04 0.22 0.65 0.10 0.53 0.35 0.50 0.64 0.58 0.38 0.19 0.52 0.51 0.04 0.10

0 0 0 0.05 0 0.09 0 0.51 1 0.53 −0.05 0.27 0.03 0.26 0.10 −0.08 −0.32 0.26 −0.09 0.01 0.15 0.36 0.31 0.23 0.16 −0.18 0.42 0.32 0.26 −0.10 0.47 0.03 0.03 0.20 0.02 0.53 −0.17 −0.11 −0.21 0.26 0.21 −0.01 −0.14 0.50 0.46

0 0.29 0 0.11 0 0 0 0 0 1 −0.07 0.41 0.08 0.32 0.09 −0.27 −0.40 0.37 −0.37 0.04 0.21 0.05 0.40 0.54 −0.17 −0.16 0.67 0.34 0.38 −0.24 0.31 0.26 −0.03 0.37 −0.08 0.21 −0.18 −0.20 −0.29 0.41 0.43 0.05 −0.19 0.69 0.69

0.58 0.23 0 0.06 0.34 0 0 0 0.15 0.03 1 0.54 0.71 0.35 0.72 0.80 0.71 0.19 0.63 0.96 0.58 0.52 0.36 0.18 0.32 0.73 0.27 0.34 0.35 0.72 0 0.31 0.82 0.24 0.62 0.23 0.74 0.91 0.73 0.53 0.46 0.73 0.71 0.34 0.27

0 0 0.01 0 0 0 0 0 0 0 0 1 0.68 0.52 0.78 0.41 0.26 0.36 0.24 0.52 0.61 0.62 0.75 0.51 0.22 0.43 0.79 0.44 0.50 0.36 0.11 0.41 0.67 0.23 0.36 0.34 0.40 0.45 0.45 0.49 0.40 0.60 0.37 0.60 0.73

0.42 0.14 0 0 0 0 0 0 0.40 0.01 0 0 1 0.39 0.76 0.70 0.57 0.27 0.44 0.71 0.62 0.68 0.38 0.29 0.26 0.68 0.59 0.39 0.39 0.56 0.14 0.44 0.88 0.25 0.51 0.27 0.65 0.69 0.75 0.49 0.39 0.88 0.65 0.37 0.43

0 0.07 0 0 0 0 0 0 0 0 0 0 0 1 0.49 0.30 0.15 0.26 0.20 0.34 0.27 0.42 0.45 0.48 0.28 0.21 0.44 0.60 0.41 0.22 0.25 0.27 0.45 0.37 0.35 0.36 0.19 0.30 0.24 0.36 0.44 0.44 0.18 0.37 0.52

0.01 0.91 0 0 0 0 0 0 0 0.01 0 0 0 0 1 0.69 0.57 0.31 0.49 0.66 0.61 0.64 0.63 0.44 0.36 0.61 0.48 0.47 0.42 0.59 0.12 0.37 0.82 0.32 0.55 0.34 0.60 0.71 0.67 0.46 0.44 0.77 0.58 0.37 0.48

0 0 0 0 0.33 0 0 0 0.01 0 0 0 0 0 0 1 0.79 0.19 0.80 0.74 0.65 0.58 0.25 0.15 0.53 0.71 0.09 0.32 0.35 0.86 0.08 0.21 0.85 0.21 0.74 0.38 0.78 0.88 0.77 0.53 0.41 0.77 0.79 0.06 0.14

0 0.11 0 0.05 0 0 0 0 0 0 0 0 0 0 0 0 1 0.14 0.69 0.65 0.50 0.30 0.21 − 0.05 0.25 0.67 − 0.10 0.21 0.26 0.78 − 0.16 0.34 0.69 0.10 0.69 0 0.71 0.81 0.72 0.28 0.22 0.61 0.79 − 0.14 − 0.03

0 0 0.51 0.70 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.09 0.22 0.33 0.22 0.40 0.31 0.08 0.11 0.31 0.32 0.56 0.16 0.31 0.35 0.26 0.41 0.38 0.19 0.14 0.16 0.10 0.37 0.30 0.26 0.23 0.33 0.40

0.38 0.77 0 0 0.48 0 0 0 0.01 0 0 0 0 0 0 0 0 0 1 0.54 0.41 0.48 0.21 −0.06 0.54 0.51 −0.08 0.20 0.29 0.90 −0.06 0.17 0.61 0.01 0.64 0.27 0.60 0.73 0.58 0.47 0.21 0.47 0.62 −0.05 0.04

0.05 0.35 0 0.13 1 0 0 0 0.66 0.27 0 0 0 0 0 0 0 0 0 1 0.55 0.47 0.36 0.19 0.21 0.70 0.34 0.35 0.34 0.65 0.05 0.32 0.80 0.30 0.56 0.20 0.69 0.88 0.66 0.57 0.50 0.72 0.67 0.45 0.29

0 0.90 0.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.47 0.42 0.38 0.33 0.50 0.44 0.37 0.50 0.60 0.10 0.29 0.62 0.19 0.60 0.39 0.59 0.60 0.47 0.60 0.54 0.62 0.61 0.32 0.48

0 0 0.37 0 0 0 0 0 0 0.10 0 0 0 0 0 0 0 0 0 0 0 1 0.38 0.22 0.54 0.42 0.53 0.36 0.39 0.47 0.25 0.24 0.72 0.15 0.46 0.49 0.42 0.49 0.54 0.43 0.25 0.61 0.43 0.38 0.47

“0” values represent “b0.01”.

detection was carried out for the mineral soils using normal Q–Q plots. Several values of Sample No. 764 (in the original database) were found to be extremely low. This sample was removed from the data set prior to data transformation and multivariate analyses, and it requires further investigation. A total of 18 values in the remaining 976 samples were identified as outliers, i.e., 0.041% of all values (45 variables × 976 samples = 43,920 values). The Box–Cox power transformation (Box and Cox, 1962; Jobson, 1991), which selects an optimal power for each variable, was then applied to the mineral-soil data sets, and multivariate analyses were carried out on the transformed data. Pearson's correlation coefficients and their significance levels for all the studied variables in the mineral soils are shown in Table 3. As expected, pH was negatively correlated with SOC, with higher SOC contents associated with lower pH values. The relationships between pH and the other variables varied: correlations with Avail_K, Ba, Ce, Cr, K, La, Rb, S, Sc, Se, Sn, Th and W were not significant (p ≥ 0.05), whereas highly significant positive correlations (p < 0.01) were observed between pH and Avail_P, Avail_Mg, As, Ca, Cd, Co, Cu, Hg, Li, Mg, Mn, Ni, P, Pb, Sb, Sr, Tl, U, Y and Zn. Furthermore, the correlations between pH and Ga, Ge, Nb, Ta and Ti were significantly negative (p < 0.01). The relationships between SOC and the other variables were also complicated, but were generally opposite in sign to those of pH. There were generally good correlations between metals, but relatively poor correlations between the extractable nutrients (Avail_K, Avail_P, Avail_Mg) and metals.

3.6. Cluster analysis

Because of the large number of variables, the results of the correlation analysis appeared fairly complicated; the relationships among all the variables can be better analysed and visualised using cluster analysis. The cluster tree of all the variables in the mineral soils was produced using Pearson's correlation coefficient as the measure and furthest neighbour as the linkage method (Fig. 4). Such methods are an effective way to cluster variables with good correlations into the same group (Zhang, 2006). Even so, with so many variables the relationships in the tree still appeared complicated. However, taking both the cluster tree and the geochemical features of the variables into consideration, the variables could be broadly classified into four main groups, as described below.
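A minimal sketch of how a furthest-neighbour (complete-linkage) cluster tree of variables can be built from Pearson's correlation coefficient. The paper does not state which software was used; this sketch assumes the distance d = 1 − r, so that strongly positively correlated variables are close and negatively correlated ones are far apart. The function names and the toy values are hypothetical, not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def complete_linkage(data):
    """Agglomerative clustering of variables (dict name -> values) using the
    distance 1 - r and furthest-neighbour linkage; returns the merge history."""
    names = list(data)
    dist = {(a, b): 1 - pearson_r(data[a], data[b])
            for a in names for b in names if a != b}
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Furthest neighbour: the cluster distance is the LARGEST
                # pairwise distance between members of the two clusters
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), round(d, 3)))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical toy values for four variables (not data from this study)
toy = {
    "Al": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Ga": [1.1, 2.1, 2.9, 4.2, 5.0],
    "pH": [5.0, 3.0, 4.0, 1.0, 2.0],
    "Ca": [4.8, 3.1, 4.2, 1.1, 1.9],
}
merges = complete_linkage(toy)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

In this toy case Al and Ga are merged first, then pH and Ca, and the two pairs are joined last because they are negatively correlated. Library routines such as SciPy's hierarchical clustering offer the same linkage rule for real data sets.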

C. Zhang et al. / Geoderma 146 (2008) 378–390


Mn 0 0 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.46 0.13 0.19 0.49 0.49 0.45 0.34 0.05 0.41 0.41 0.26 0.32 0.21 0.20 0.32 0.17 0.42 0.33 0.34 0.27 0.47 0.60
Mo 0.01 0.16 0.68 0 0 0 0 0.70 0 0 0 0 0 0 0 0 0.14 0 0.07 0 0 0 0 1 0.13 0.14 0.50 0.38 0.41 0.09 0.18 0.30 0.28 0.44 0.19 0.40 0.12 0.12 0.09 0.50 0.51 0.42 0.13 0.44 0.55
Na 0.01 0 0 0 0.61 0 0.01 0 0 0 0 0 0 0 0 0 0 0.01 0 0 0 0 0 0 1 0.24 0.07 0.18 0.20 0.43 0.15 −0.16 0.43 0.05 0.38 0.65 0.32 0.35 0.36 0.29 0.20 0.32 0.27 0.04 0.16
Nb 0 0.07 0 0.06 0.27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.22 0.25 0.29 0.60 −0.09 0.24 0.70 0.06 0.53 0.11 0.96 0.74 0.92 0.37 0.33 0.69 0.72 0.19 0.19
Ni 0 0 0.14 0.01 0 0 0 0.70 0 0 0 0 0 0 0 0 0 0 0.02 0 0 0 0 0 0.03 0 1 0.33 0.43 0.05 0.20 0.38 0.44 0.23 0.10 0.31 0.15 0.14 0.18 0.48 0.35 0.44 0.08 0.77 0.79
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.42 0.31 0.41 0.24 0.39 0.29 0.40 0.27 0.27 0.33 0.24 0.33 0.49 0.42 0.29 0.32 0.51
Pb 0 0.05 0.41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.38 0.17 0.39 0.38 0.25 0.56 0.30 0.33 0.33 0.26 0.55 0.35 0.38 0.36 0.38 0.65
Rb 0.47 0.33 0 0 0.51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.15 0 0 1 −0.07 0.28 0.70 0.09 0.74 0.22 0.70 0.83 0.62 0.59 0.34 0.60 0.74 0.04 0.17
S 0.56 0 0 0 0 0.02 0 0.26 0 0 0.93 0 0 0 0 0.01 0 0 0.05 0.14 0 0 0.12 0 0 0.01 0 0 0 0.03 1 −0.08 0.16 0.47 0.11 0.35 −0.06 −0.04 −0.06 0.15 0.27 0.15 0 0.25 0.22
Sb 0 0 0.19 0.33 0.14 0 0 0 0.31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.02 1 0.36 0.20 0.34 −0.10 0.25 0.29 0.25 0.34 0.23 0.42 0.39 0.29 0.36
Sc 0.90 0 0 0 0.03 0 0 0 0.28 0.28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.30 0.62 0.37 0.69 0.82 0.79 0.54 0.40 0.89 0.71 0.37 0.38
Se 0.63 0 0.01 0.46 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.81 0 0 0 0 0 0.10 0.06 0 0 0 0.01 0 0 0 1 0.26 0.28 0.06 0.22 0.03 0.35 0.54 0.33 0.18 0.29 0.21
Sn 0.05 0 0.19 0 0.01 0 0 0 0.51 0.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.26 0.63 0.69 0.55 0.49 0.42 0.54 0.71 0.07 0.27
Sr 0 0 0.93 0.02 0 0 0 0 0 0 0 0 0 0 0 0 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.16 0.18 0.19 0.41 0.31 0.34 0.13 0.29 0.34
Ta 0 0.74 0 0.01 0.86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.07 0 0 0.05 0 0 1 0.78 0.89 0.42 0.38 0.67 0.78 0.14 0.17
Th 0.68 0.39 0 0.01 0.23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.27 0 0 0 0 0 0 1 0.74 0.51 0.47 0.71 0.79 0.18 0.17
Ti 0 0.90 0 0.02 0.26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.04 0 0 0.40 0 0 0 0 1 0.30 0.23 0.77 0.73 0.09 0.13
Tl 0 0.53 0.03 0.22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.58 0.56 0.45 0.55 0.53
U 0 0 0.18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.47 0.39 0.43 0.37
V 0.01 0 0 0 0.01 0 0 0 0.78 0.15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.70 0.32 0.37
W 0.49 0.10 0 0 0.81 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.01 0 0 0 0.88 0 0 0 0 0 0 0 0 0 0 0 1 0.07 0.14
Y 0 0.07 0.34 0.52 0 0 0 0.17 0 0 0 0 0 0 0 0.06 0 0 0.14 0 0 0 0 0 0.24 0 0 0 0 0.24 0 0 0 0 0.03 0 0 0 0 0 0 0 0.03 1 0.66
Zn 0 0.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.38 0 0.18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Group 1 comprised Al, Ga, K, Rb, Ba, Ge, W, Sn, Nb, Ta, Ti, Sc, V, Cr, Ce, La and Th. The relationships within this group are very strong; these elements are mainly controlled by geology and related to coarse minerals, such as feldspar in granite, and in a few cases to volcanics within the greywacke sequence. They are mainly of natural origin and are less affected by mineralization or anthropogenic pollution. Group 2 consisted of Li, Tl, U, Co, Fe, As, Mn, Cu and P. These elements are mainly enriched in areas of igneous rocks (basalt, granite and rhyolite). A few of them are widely known to be related to anthropogenic pollution (e.g., As and Cu in biocides), but the close relationship of this group with Group 1 indicates that the effects of pollution on these elements are not strong in Irish soils. Group 3 comprised Na, Sr, Mg, SOC, S and Se, i.e., alkali and alkaline earth elements and organic matter related variables. These variables are quite mobile or changeable in the environment. Group 4 consisted of Avail_P, Avail_K, pH, Ca, Avail_Mg, Ni, Zn, Cd, Y, Mo, Hg, Pb and Sb. The relationships within this group are loose, especially among the extractable nutrients (Avail_P, Avail_K, Avail_Mg). The relatively good correlation between pH and Ca may reflect limestone parent material. The heavy metals in this group appear to be more closely related to human activities, including mining, agriculture and atmospheric deposition.

3.7. Scatter plot matrix

To better visualize multiple relationships between the variables, a scatter matrix plot was produced for the 8 variables showing the best relationships in the cluster tree (Fig. 5). Each off-diagonal cell shows the scatter plot between the variables in the corresponding row and column, and each diagonal cell (where the row and column variables are the same) shows a histogram of that variable. Fairly good correlations among all 8 variables (Al, Ga, K, Rb, Ba, Ge, W and Sn) were apparent. The best correlation was between Al and Ga, where almost all of the samples cluster along a diagonal line. Relatively good correlations were observed among the 4 variables Al, Ga, K and Rb, while the scatter plots between Ba and these variables were noisy. The scatter plots between Sn and the other variables were also relatively noisy. These results were in line with the cluster tree, but the scatter matrix plot shows more detail. Another scatter matrix plot was produced for the last 8 variables in Group 4 of the cluster tree (Fig. 6). Compared with Fig. 5, the relationships between the variables in this group were clearly weaker, with a more diffuse distribution of samples. Most of the metals in this group are affected by mineralization and human pollution. The poorest correlations were between Sb and the other variables, which was also shown in the cluster tree. For Hg, some samples were stacked (horizontally in the Hg row and vertically in the Hg column) because their values were below or close to the detection limit.

3.8. Statistical tests for differences among sample groups

Because the raw data were not normally distributed, the non-parametric Kruskal–Wallis test was applied to test the differences among the soil groups classified by rock type, soil type and land use (Table 4), in order to better understand the degree of influence of these factors on the geochemical variables. For comparison, the analyses were run both with and without the organic soils and the soils from the urban area. Because a non-parametric method is robust, all the values in the raw data set, including outliers, were retained for the test. In Table 4, zero values represent “<0.01” and can be taken to indicate that the differences were “very significant”. Most of the differences were very significant, showing the strong influence of all factors, except for land use when organic soils and urban samples were excluded. Because of the large number of samples used for the statistical test, a stricter significance level (e.g., 0.01 instead of the widely used 0.05) is appropriate, since the power of a statistical test increases with sample size (Zhang et al., 2005). In general, the significance values increased (i.e., became less significant) when organic and urban soils were excluded, showing that organic matter and the urban environment had a strong influence on soil chemical composition through dilution and contamination, respectively.
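The Kruskal–Wallis test compares groups through the ranks of the pooled observations, which is why outliers could be kept. A minimal sketch of the H statistic follows, including the standard correction for tied values (relevant, for example, for samples stacked at the detection limit). The group values are hypothetical. The p-value uses the large-sample chi-square approximation with k − 1 degrees of freedom; for exactly three groups its upper tail has the closed form exp(−H/2). For real data, a library routine such as `scipy.stats.kruskal` would normally be used instead.

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal–Wallis H statistic for k independent groups, corrected for ties."""
    # Pool all observations, remembering which group each came from
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    # Sum the ranks within each group
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(
        r_sum ** 2 / len(grp) for r_sum, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    # Tie correction (undefined if every observation is identical)
    return h / (1 - tie_term / (n ** 3 - n))

# Hypothetical measurements for three soil groups (not data from this study)
groups = [[1.2, 2.0, 2.4], [3.1, 3.3, 4.0], [4.5, 5.2, 6.8]]
h = kruskal_wallis_h(groups)
# Three groups -> chi-square with 2 df, whose upper tail is exp(-H / 2)
p = math.exp(-h / 2)
print(f"H = {h:.2f}, approximate p = {p:.3f}")
```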
An important point is that, when organic soils and urban samples were excluded, the significance levels for many metals in soils classified by land use became markedly less significant, showing that of the three factors, land use is the least important factor affecting soil chemical composition. This indicates that land use need not be a focus in interpreting the soil geochemistry of the study area at the current sampling density, even though fewer land-use classes than soil-type and rock-type classes were applied. As expected, the extractable nutrients generally showed less significant differences, implying that they are more affected by other factors, such as agricultural activities.

4. Conclusions

There was strong variation in the geochemical variables in soils of Ireland due to the complicated geology and soil types, as well as the presence of peat, and median values were recommended as the representative values of the soil samples. Most of the variables followed neither a normal nor a lognormal distribution. Outliers could easily be identified using a normal Q–Q plot, and they needed to be removed prior to further statistical analyses. A logarithmic transformation was found to have “over-transformed” most of the variables, changing their skewness from positive to negative. A power (Box–Cox) transformation was effective in transforming the data sets towards normality. There were generally good correlations between metals, but relatively poor correlations between the extractable nutrients (available P, K and Mg) and metals. The extractable nutrients were more affected by agricultural activities, which were more spatially variable (“random”) than natural factors such as geology and soil type.
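The finding that a logarithm can "over-transform" a right-skewed variable, while a Box–Cox power transformation moves it towards symmetry, can be checked with a small sketch. It assumes strictly positive data and chooses the power λ by maximising the Box–Cox profile log-likelihood over a grid from −2 to 2. The paper does not state how λ was chosen, so the grid search and the sample values are hypothetical.

```python
import math

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (moment estimator, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def box_cox(values, lam):
    """Box–Cox transform of strictly positive values; lambda = 0 is the log."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def box_cox_loglik(values, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    n = len(values)
    y = box_cox(values, lam)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)

def best_lambda(values):
    """Grid search for lambda over -2.00 .. 2.00 in steps of 0.01 (an assumption)."""
    grid = [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))

# Hypothetical right-skewed sample (not data from this study)
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15]
lam = best_lambda(sample)
print(f"raw skewness:        {skewness(sample):+.2f}")
print(f"log skewness:        {skewness(box_cox(sample, 0)):+.2f}")
print(f"Box-Cox (lambda={lam:.2f}): {skewness(box_cox(sample, lam)):+.2f}")
```

Here the raw sample is positively skewed and its logarithm is negatively skewed, which is the "over-transformation" pattern described above. For data that are exactly symmetric on a log scale, the selected λ is close to 0, so the Box–Cox family contains the log transform as a special case.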
There were strong relationships within the group of Al, Ga, K, Rb, Ba, Ge, W, Sn, Nb, Ta, Ti, Sc, V, Cr, Ce, La and Th, as these elements are mainly controlled by geology and related to coarse minerals, such as feldspar in granite, and in a few cases to volcanics. Good correlations between Li, Tl, U, Co, Fe, As, Mn, Cu and P imply that their main sources are igneous rocks (basalt, granite and rhyolite). The alkali and alkaline earth elements (Na, Sr, Mg) and the organic matter related variables SOC, S and Se are quite mobile in the environment (except for SOC).

For the rest of the variables studied (Avail_P, Avail_K, pH, Ca, Avail_Mg, Ni, Zn, Cd, Y, Mo, Hg, Pb and Sb), the relationships are weak. The heavy metals in this group appear to be more closely related to human activities, including pollution. The relatively good correlation between pH and Ca may be related to limestone. Correlation analysis is effective in revealing the relationship between pairs of variables, but when the number of variables is large, the correlation coefficient matrix becomes large and the results can be complex. Cluster analysis has the advantage of summarising the multiple relationships among all the variables in a single cluster tree. A scatter matrix plot can show the details of pairwise relationships, but it is practical only for a small number of variables. A statistical test (Kruskal–Wallis) was applied to assess the influence of rock type, soil type and land use. Among the three factors, land use was found to be the least important influencing factor for the geochemical variables studied.

Acknowledgements

This study was part of a project funded by the Irish Government under the National Development Plan 2000–2006 (Project No. 2001CD/S2-M2). It was administered by the Irish Environmental Protection Agency (EPA), and the project was part-funded by Teagasc. The authors acknowledge the invaluable contribution of the members of the project steering committee: P. Loveland, E. P. Farrell, C. Campbell, and J. Brogan. The project team gratefully acknowledges the comments received from Teagasc research staff, including Drs. N. Culleton, B. Coulter, and R. Schulte. Former research staff members at Teagasc Johnstown Castle are also acknowledged for their expert opinion and discussion, particularly Drs. S. Diamond and G. Fleming. Finally, the support of the EPA Officers H. Walsh, A. Wemaere and B. Donlon is acknowledged.

References

Ahrens, L.H., 1954. The lognormal distribution of the elements (a fundamental law of geochemistry and its subsidiary). Geochimica et Cosmochimica Acta 5, 49–73.
Aubrey, K.V., 1956. Frequency distributions of elements in igneous rocks. Geochimica et Cosmochimica Acta 9, 83–89.
Box, G.E.P., Cox, D.R., 1962. An analysis of transformations. Journal of the Royal Statistical Society, Series B 26 (2), 211–252.
Cruickshank, J.G., 1997. Soil and Environment: Northern Ireland. Agricultural and Environmental Science Department, The Queen's University of Belfast, Newforge Lane, Belfast.
Fay, D., McGrath, D., Zhang, C., Carrigg, C., O'Flaherty, V., Carton, O.T., Grennan, E., 2007. EPA Report: Toward a National Soil Database (2001-CD/S2-M2). http://www.epa.ie/downloads/pubs/research/land/ (last accessed: November 16, 2007).
Gallego, J.L.R., Ordóñez, A., Loredo, J., 2002. Investigation of trace element sources from an industrialized area (Avilés, northern Spain) using multivariate statistical methods. Environment International 27 (7), 589–596.
Gardiner, M.J., Radford, T., 1980. Soil Associations of Ireland and Their Landuse Potential —Explanatory Bulletin to Soil Map of Ireland 1980. The Agricultural Institute, Dublin. 142 pp.
Howarth, R.J. (Ed.), 1983. Statistics and Data Analysis in Geochemical Prospecting. Elsevier, Amsterdam, p. 437.
Jobson, J.D., 1991. Applied Multivariate Data Analysis. Vol. I: Regression and Experimental Design. Springer-Verlag, New York.
Lee, C.S., Li, X.D., Shi, W.Z., Cheung, S.C., Thornton, I., 2006. Metal contamination in urban, suburban, and country park soils of Hong Kong: a study based on GIS and multivariate statistics. Science of The Total Environment 356 (1–3), 45–61.
McConnell, B., Gately, S., 2006. Bedrock Geological Map of Ireland. Geological Survey of Ireland, Dublin.
McGrath, S.P., Loveland, P.J., 1992. The Soil Geochemical Atlas of England and Wales (SGAEW). Blackie Academic & Professional, London.
Micó, C., Recatalá, L., Peris, M., Sánchez, J., 2006. Assessing heavy metal sources in agricultural soils of an European Mediterranean area by multivariate analysis. Chemosphere 65 (5), 863–872.
Oertel, A.C., 1969. Frequency distributions of element concentrations—I. Theoretical aspects. Geochimica et Cosmochimica Acta 33 (7), 821–831.
Peech, M., English, L., 1944. Rapid microchemical soil tests. Soil Science 57, 16.
Reaves, G.A., Berrow, M.L., 1984. Total lead concentrations in Scottish soils. Geoderma 32, 1–8.
Reimann, C., Filzmoser, P., 2000. Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data. Environmental Geology 39 (9), 1001–1014.


Fig. 4. Cluster tree of variables for mineral soils (measure: Pearson's correlation coefficient; linkage method: furthest neighbour).

Reimann, C., Filzmoser, P., Garrett, R.G., 2002. Factor analysis applied to regional geochemical data: problems and possibilities. Applied Geochemistry 17 (3), 185–206.
Shaw, D.M., 1961. Element distribution laws in geochemistry. Geochimica et Cosmochimica Acta 23 (1–2), 116–134.
Vistelius, A.B., 1960. The skew frequency distributions and the fundamental law of the geochemical processes. Journal of Geology 68, 1–22.
Wright, A.F., Bailey, J.S., 2001. Organic carbon, total carbon and total nitrogen determination in soils of variable calcium carbonate contents using a LECO CN2000 dry combustion analyser. Communications in Soil Science and Plant Analysis 32 (19–20), 3243–3258.
Zhang, C.S., Zhang, S., 1996. A robust-symmetric mean: a new way of mean calculation for environmental data. GeoJournal 40 (1–2), 209–212.
Zhang, C.S., Selinus, O., 1998. Statistics and GIS in environmental geochemistry—some problems and solutions. Journal of Geochemical Exploration 64, 339–354.
Zhang, C.S., Lalor, G., 2002. Multivariate relationship and spatial distribution of geochemical features of soils in Jamaica. Chemical Speciation and Bioavailability 14 (1), 57–65.
Zhang, C.S., Manheim, F.T., Hinde, J., Grossman, J.N., 2005. Statistical characterization of a large geochemical database and effect of sample size. Applied Geochemistry 20, 1857–1874.
Zhang, C.S., 2006. Using multivariate analyses and GIS to identify pollutants and their spatial patterns in urban soils in Galway, Ireland. Environmental Pollution 142 (3), 501–511.


Fig. 5. Scatter matrix plot for selected elements of mainly natural origin from volcanic rocks in mineral soils (Box–Cox transformed data).


Fig. 6. Scatter matrix plot for selected elements affected by mineralization and/or human pollution in mineral soils (Box–Cox transformed data).


Table 4 Significance level of Kruskal–Wallis test for the differences among soil groups⁎

Variables, in the order of the values in each row: pH SOC Avail_P Avail_K Avail_Mg Al As Ba Ca Cd Ce Co Cr Cu Fe Ga Ge Hg K La Li Mg Mn Mo Na Nb Ni P Pb Rb S Sb Sc Se Sn Sr Ta Th Ti Tl U V W Y Zn

Organic and urban soils included
Rock type: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Soil type: 0 0 0 0.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Land use: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Organic and urban soils excluded
Rock type: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.14 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Soil type: 0 0 0.03 0.04 0 0 0 0 0 0 0 0 0 0.19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Land use: 0 0 0 0 0 0.36 0 0.39 0 0.01 0.01 0 0.01 0 0.01 0.39 0.08 0 0.31 0 0 0.14 0 0.20 0.34 0.02 0 0 0.03 0.03 0 0 0.02 0 0.05 0.23 0.10 0 0.02 0.01 0 0.05 0.03 0 0

⁎ “0” values represent “<0.01”.
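The argument that land use is the least important factor can be tallied from the last of the six value rows in Table 4 (land use, organic and urban soils excluded). The sketch below copies that row, treats each printed value as the p-value itself, counts a printed "0" as p < 0.01, and calls a difference non-significant at level α when the printed p ≥ α. This threshold rule is an assumption of the sketch, not a procedure stated in the paper.

```python
# Variable order of Table 4
variables = ("pH SOC Avail_P Avail_K Avail_Mg Al As Ba Ca Cd Ce Co Cr Cu Fe Ga Ge Hg K La "
             "Li Mg Mn Mo Na Nb Ni P Pb Rb S Sb Sc Se Sn Sr Ta Th Ti Tl U V W Y Zn").split()

# Land use, organic and urban soils excluded (a printed "0" means p < 0.01)
land_use_excluded = [float(v) for v in (
    "0 0 0 0 0 0.36 0 0.39 0 0.01 0.01 0 0.01 0 0.01 0.39 0.08 0 0.31 0 0 0.14 0 0.20 "
    "0.34 0.02 0 0 0.03 0.03 0 0 0.02 0 0.05 0.23 0.10 0 0.02 0.01 0 0.05 0.03 0 0"
).split()]
assert len(variables) == len(land_use_excluded) == 45

def not_significant(p_values, alpha):
    """Variables whose printed p-value is at or above alpha."""
    return [name for name, p in zip(variables, p_values) if p >= alpha]

for alpha in (0.01, 0.05):
    names = not_significant(land_use_excluded, alpha)
    print(f"alpha = {alpha}: {len(names)} of 45 not significant: {', '.join(names)}")
```

By comparison, the rock-type and soil-type rows for the same subset contain only one and three non-zero entries, respectively, so nearly every variable still differs highly significantly among rock and soil groups.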