Measuring the change under compositional data analysis (CoDA): Insight on the dynamics of geochemical systems

Measuring the change under compositional data analysis (CoDA): Insight on the dynamics of geochemical systems

Accepted Manuscript Measuring the change under compositional data analysis (CoDA): Insight on the dynamics of geochemical systems A. Buccianti, A. Li...

1MB Sizes 0 Downloads 46 Views

Accepted Manuscript Measuring the change under compositional data analysis (CoDA): Insight on the dynamics of geochemical systems

A. Buccianti, A. Lima, S. Albanese, B. De Vivo PII: DOI: Reference:

S0375-6742(17)30365-5 doi: 10.1016/j.gexplo.2017.05.006 GEXPLO 5923

To appear in:

Journal of Geochemical Exploration

Received date: Revised date: Accepted date:

23 March 2017 13 May 2017 25 May 2017

Please cite this article as: A. Buccianti, A. Lima, S. Albanese, B. De Vivo , Measuring the change under compositional data analysis (CoDA): Insight on the dynamics of geochemical systems, Journal of Geochemical Exploration (2017), doi: 10.1016/ j.gexplo.2017.05.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Measuring the change under Compositional Data Analysis (CoDA): insight on the dynamics of geochemical systems A. Buccianti1,2*, A. Lima3, S. Albanese3, B. De Vivo3 1Department

of Earth Sciences, University of Florence, Via G. La Pira 4, 50121 Firenze (I) (Institute of Geosciences and Earth Resources), G. La Pira 4, 50121 Firenze, Italy 3Department of Earth, Environment and Resources Sciences, University of Naples Federico II, Complesso Universitario Montte Sant’Angelo, Via Cintia snc, 80125, Napoli, Italy

SC RI PT

2CNR-IGG

Abstract

PT

ED

MA

NU

Geochemical systems are complex and the analysis of the shape of the frequency distribution of elements and chemical species often informs about the variability and dispersion related to the underlying dynamics. However, when compositional data are used it is incorrect to investigate concentrations in the Euclidean space. Distribution analysis is aimed at focussing on real coordinates, which can be obtained by applying some appropriate log-ratio transformation. In particular, distributional analysis for compositional data can be applied to isometric coordinates, called balances, obtained through partition of a multi-element data set. As an alternative the same investigation can be performed on indices such as the robust Mahalanobis distance after having ranked log-ratio compositions with respect to some reference composition (i.e. average Earth crust) and having obtained a sort of “measure of change”. By considering this second new perspective, the chemical composition of topsoils of the Campania Region (Southern Italy) was investigated by analysing the concentrations of Al, As, B, Ba, Ca, Co, Cr, Cu, Fe, K, La, Mg, Mn, Mo, Na, Ni, P, Pb, Sr, Th, Ti, V and Zn, in mg/kg, determined on more than 3500 samples. Results reveal the presence of complex processes governed by interaction-dominant dynamics where self-organisation appears to play some role.

AC

1. Introduction

CE

Keywords: compositional data, power laws, multifractals, dissipative systems, distributional analysis, robust Mahalanobis distance, compositional changes

In this contribution the regionalised structure of geochemical processes will be investigated by considering the joint behaviour of several elements constituting for each sample its whole composition. The approach is based on the CoDA (Compositional Data Analysis) theory so that the proportionality features of abundance data are fully taken into account enhancing their relative multivariate behaviour (Aitchison, 1986). The work is motivated by the fact that the shape of a probability distribution of chemical variables reveals information about the dynamics governing the system under investigation (van Rooij et al., 2013). However, whenever the sample space is not the real space, some change of *

corresponding author

ACCEPTED MANUSCRIPT representation is needed (Buccianti and Magli, 2011).

Under CoDA, it is mandatory to

investigate the frequency distribution of real coordinates and, for example, one can take balances between compositional parts so that it is the behaviour of the geochemical processes that is monitored instead of that of single variables. The underlying idea is that the association of one group of variables versus another group is more adequate to intercept the dynamics of some process than by looking at single components. This perspective can permit us to obtain interesting information on the dynamics of geochemical systems particularly if spatial or

SC RI PT

temporal scales can also be considered (Seuront, 2010).

Description of the frequency distribution of the abundances of geochemical elements has been an important target of research, attracting attention for at least 100 years, starting with Clarke (1889; 1924) and continued by Goldschmidt (1933 ) and Wedepohl (1955 ). In the book “The data of Geochemistry” by Clarke (1924) there are several attempts to identify the

NU

expected values of natural reservoirs or materials, starting from the first basic step relating to the collection of significant numerical information to explore frequency histograms. Vladimir

MA

I. Vernadsky wrote and published the book "The History of natural waters" in 1933-1936 (Edmunds and Bogush, 2013) where he formulated the bases of the modern analysis of biogeochemical cycles, including the important role of the interactions among different

ED

reservoirs such as lithosphere, biosphere and hydrosphere. It is now known that the equilibrium between ground water and rocks is more complicated than the Vernadsky

PT

description. According to Kondepudi and Prigogine (1998), stationary non-equilibrium states with a high degree of order may emerge in open systems that receive energy from the

CE

external environment. In this framework the evolution of the water composition leads to the emergence of new daughter systems in bifurcation points so that each of them is more

AC

complex than the previous one. The control parameter is given by the water composition, and the thermodynamic variable is the constant of the dissolution reactions. It might be interesting to evaluate which type of imprinting these conditions could give on the frequency distribution of natural materials since water-rock equilibrium also affect soils composition. It was Ahrens (1954a, b) who focussed on the effect of skewness, for example the log-normal distribution, regarded by him as a fundamental law of geochemistry. It is a positively skewed distribution that appears as a symmetric Gaussian one after a logarithmic transformation of the measured variable (Aitchison and Brown, 1954). Processes that follow the law of proportionate effect generate this probability function, and the term ‘‘multiplicative process’’ is used to describe the underlying model. The success of this type of distribution in geochemistry is related to the wide application of the theory of successive random dilutions

ACCEPTED MANUSCRIPT (Ott, 1990; Limpert et al., 2001). A critical approach to the mathematics of probability laws in geochemistry from a CoDA point of view can be found in Mateu-Figueras and PawlowskyGlahn (2008). The key to the emergence of lognormality is the presence of multiplicative or proportional operators linking the system’s variables and represents a special case of dynamics in which the interdependencies among processes are minimised or absent. Just as the sum of many independent random variables yields a Gaussian distribution, the product of many of them

SC RI PT

yields a lognormal distribution (van Rooij et al., 2013). However, an inverse power law distribution is also associated with a positively skewed shape but it is symptomatic of selforganizing physical-chemical systems poised near a critical point (Bak, 1996; Jensen, 1998). Thus adding feedback connections or recurrent dynamics yields a more complex behaviour

NU

(Allégre and Lewin, 1995; Agterberg, 2007; Ma et al., 2014; Buccianti and Zuo, 2016).

2. Methodology

MA

2.1 The statistical approach

ED

The Pareto distribution in its original form is usually written as: 𝑘 𝛼

(𝛼 > 0, 𝑥 ≥ 𝑘 > 0),

PT

𝑃𝑟[𝑋 ≥ 𝑥] = (𝑥 )

𝑃𝑟[𝑋 ≥ 𝑥] = 𝑘𝑥 −𝛼 ,

AC

CE

that is equivalent to the way in which a power-law (fractal) distribution is often presented:

where both k, representing a minimum value for x, and  are constants determined by the dataset.

The cumulative frequency distribution corresponding to Pr[X  x] is given by: 𝑘 𝛼 𝐹𝑋 (𝑥) = 1 − ( ) 𝑥

(𝛼 > 0, 𝑥 ≥ 𝑘 > 0),

and thus 𝑙𝑜𝑔[1 − 𝐹 𝑋 (𝑥)] = 𝛼 log 𝑘 − 𝛼 log 𝑥 .

ACCEPTED MANUSCRIPT The scaling exponent  could be a fraction and is usually called the fractal dimension (Newman, 2005). In a power law distribution the tails fall according to the power  thus leading to much heavier tails than other common models. If X follows a power law distribution, then in a log-log plot of 𝑃𝑟[𝑋 ≥ 𝑥] (the complementary cumulative distribution function) its behaviour will be described by a straight line. On the other hand, a random variable X has a lognormal distribution if the random variable Y=log X has a normal (i.e.

𝑓(𝑥) =

1 √2𝜋𝜎𝑥

𝑒 −(ln 𝑥−𝜇)

SC RI PT

Gaussian) distribution: 2 ⁄2𝜎 2

(𝑥 > 0).

There are several similarities between the two distributions and if the variance of the corresponding normal distribution is large, the log-log plot will show a straight line for

NU

several orders of magnitude (Mitzenmacher, 2004).

In our case the distribution analysis has been applied on distances representing differences

MA

between compositional data. It is well known that compositions are vectors of nonnegative values that carry relative information. A particular case are those data that sum to a constant, typically one or 100%, and are used in cases where the relative abundances are of interest,

ED

and often, the total abundance is not (Aitchison, 1986). When the parts of a composition are proportions, they are not independent, and this condition represents a constraint that

PT

seriously affects their statistical analysis (Pawlowsky-Glahn and Buccianti, 2011; Buccianti, 2013; Pawlowsky-Glahn et al., 2015). Under CoDA the Aitchison distance da is the tool that can

CE

be used to track the compositional changes or, in other words, the difference between compositions (Aitchison, 1986; Aitchison et al., 2000). Data can be ranked according to some

AC

geochemical criteria and distances (difference of each composition from the subsequent one) calculated, thus obtaining a vector of positive values to be analysed as a signal of the compositional changes. As an alternative for each composition it is possible to determine the Aitchison distance with comparison to a fixed composition (difference of each composition from the same reference composition). The fixed composition could represent some reference such as the average Earth crust or, for example, the average of GEMAS (Geochemical Mapping of Agricultural and grazing land Soils, Reimann et al., 2014a and Reimann et al., 2014b) or other compositions (e.g. the baseline determined in Buccianti et al., 2015 for Campania topsoils), depending on the nature of the geological material. given by:

The Aitchison distance da is

ACCEPTED MANUSCRIPT

1

𝑥

2

𝑦

𝑖 𝑖 𝐷 𝑑𝑎 (𝐱, 𝐲) = ||𝐱 − 𝐲||𝑎 = √2𝐷 ∑𝐷 𝑖=1 ∑𝑗=1 (𝑙𝑛 𝑥 − 𝑙𝑛 𝑦 ) . 𝑗

𝑖

where x and y are two different compositions, D stands for the number of variables (parts) of the composition.

SC RI PT

By considering the features of geochemical data, often characterised by skewness and presence of outliers, the calculus of the robust Mahalanobis distance appears to be advisable. Moreover, since the analysed database is characterised by a wide regional extension, a situation that moves geology through very different chemical conditions, the robust approach could guarantee us from avoid the “average” of important differences.

The distance may be calculated after representing compositional data by in isometric log-ratio

NU

coordinates (Egozcue et al., 2003). Robust methods are developed to minimise the impact of atypical observations on mean and variance estimates (Verboven and Hubert, 2005). Robust

MA

analysis in applied geochemistry dates back to the 1980s (Garrett, 1989; Garrett et al., 1982; Smith et al., 1984) and has been subsequently refined by Filzmoser et al. (2005). The robust estimates of the compositional centre μ and scatter matrix Σ of ilr coordinates can be

ED

obtained by using the Minimum Covariance Determinant (MCD) estimator (Filzmoser and Hron, 2008; Filzmoser et al., 2011; Rousseeuw, 1984; Verboven and Hubert, 2005). The

PT

robust Mahalanobis distance is defined by:

AC

CE

−1 (𝒙 𝑅𝐷𝑖 = √(𝒙𝑖 − 𝝁𝑀𝐶𝐷 )𝑇 Σ𝑀𝐶𝐷 𝑖 − 𝝁𝑀𝐶𝐷 )

with μMCD and ΣMCD the MCD location and scatter estimates. The proposed robust distance is a robustification of the Mahalanobis one (Mahalanobis, 1936) where classical mean and empirical covariance matrix are used as estimates of location and scatter. In our case after representing data in isometric log-ratio coordinates the robust Mahalanobis distances of each composition from the average Earth crust (Faure, 1998) and mean GEMAS values (Reimann et al., 2014a; Reimann et al., 2014b) were determined. The two vectors of data representing the compositional changes from the chosen references have then been analysed by considering the information contained in their distributional shape (van Rooij et al., 2013). There is also another way to investigate the RDi values as a tracer (signal) of geochemical processes, by ranking compositions considering their element mobility in soil: Ca > Na > Mg >

ACCEPTED MANUSCRIPT K > Si > Al > Fe > Zr > Ti (Ryan, 2014). If compositions are ranked for increasing values of Ti, the difference between subsequent compositions could trace the mobility conditions and the mapping of RDi could inform us about the constraints imposed by geology as well as by other surficial factors (Sucharovà et al., 2012). Regardless of the type of choice, the computation of these changes essentially could refer to compositions ranked by an element considered to be ‘‘conservative’’ (assumed to be little affected by weathering, e.g., Sc, Ti, Zr, but Fe, Al and Li are also frequently used). Then, it is possible to calculate the differences subsequently for the

SC RI PT

ranked compositions or to compare all the compositions to those characterised by the low content for the chosen element. The options represent different possibilities to construct multivariate enrichment/depletion indices under the CoDA perspective.

NU

2.2 The database

MA

The compositional matrix for which the compositional distances have been determined is given by the chemical concentration of Al, As, B, Ba, Ca, Co, Cr, Cu, Fe, K, La, Mg, Mn, Mo, Na, Ni, P, Pb, Sr, Th, Ti, V and Zn, in mg/kg, in 3535 samples of topsoils collected in Campania

ED

region (southern Italy). The area of about 13,600 km2 was sampled through a 16 km2 grid in the suburban and agricultural areas and a 4 km2 grid in urban areas. More details on sampling

PT

procedure and experimental methods can be found in Buccianti et al. (2015). The concentrations of the chosen elements were all above analytical detection limits, making them

CE

amenable to statistical analysis without any preliminary substitutions or imputations. However, the source repository also contained determinations for several other elements (Ag,

AC

Au, Be, Bi, Cd, Ce, Cs, Ga, Ge, Hf, Hg, In, Li, Nb, Pd, Pt, Rb, Re, S, Sb, Sc, Se, Sn, Ta, Te, Tl, U, W, Y, Zr) but most of the values were below the detection limits. When the compositional data matrix contains a reasonable number of observations with values below the detection limit it is advisable to adopt the procedure proposed in Palarea-Albaladejo and Martin-Fernandez (2015). From the geological point of view the Campania region presents a very complex framework characterised by different lithologies which consist mostly of sedimentary and volcanic rocks, spanning from the Triassic to the recent age (Figure 1, Lima et al., 2003). The morphology is made up of the Apennine Mountains in the eastern sector and of two coastal plains on the west (Volturno and Sele rivers drainage basins). The Apennine chain consists of a pile of nappes formed during the Miocene, overthrusting towards N–NE, and it is structured in

ACCEPTED MANUSCRIPT several blocks in contact with one another along tectonic discontinuities (Bonardi et al., 2009). The volcanic rocks (dated from about 600 ka to present) are represented by potassic/ultrapotassic rocks (lavas and pyroclastics) of different volcanoes (De Vivo et al., 2001). The complex geological framework affects the features of soils but often the combination of climate, vegetation and anthropic activity tend to mask the bedrock nature (Costantini and Dazzi, 2013). The heterogeneous flysch complexes with clayey matrix that are diffused in the

SC RI PT

southern Apennines of Campania present active morphological dynamics with landslides and strong erosion phenomena, thus giving the formation of thin soils. The Somma–Vesuvius and Campi Flegrei complexes, near Naples, includes soils with weak differentiation as well as calcareous soils developed on pyroclastic–calcareous deposits and weakly developed soils on the lapilli from the last eruption. At Roccamonfina parental material is mainly pyroclastic with

NU

different degrees of weathering so that the most significant feature is the clay translocation

MA

(Lulli, 2007).

ED

3. Results and discussion

The composition given by the concentration of Al, As, B, Ba, Ca, Co, Cr, Cu, Fe, K, La, Mg, Mn,

PT

Mo, Na, Ni, P, Pb, Sr, Th, Ti, V and Zn, in mg/kg of the average Earth crust (Faure, 1998) and GEMAS (Al = 10,993; As = 5.48; B 2.42; Ba = 61.70; Ca = 30.35; Co = 7.46; Cr = 20.2; Cu =

CE

14.50; Fe = 17,199; K = 12.50; La = 14.35; Mg = 28.61; Mn = 445; Mo = 0.42; Na = 48; Ni = 14.72; P = 653; Pb = 15.81; Sr = 18.10; Th = 2.89; Ti = 85.70; V = 25.35; Zn = 45.05, Buccianti

AC

et al., 2105) were considered as reference values to calculate the RDi values of the topsoils composition of the Campania Region samples. In this context the RDi values can be considered a measure of the change and the investigation of their distributional shape a source of information about system dynamics (van Rooij et al., 2013). 3.1 Explorative probability plots In Figures 2 and 3 the explorative probability plots indicate that the classical models used to describe positive random variables are not fully adequate, particularly for the presence of heavy tails. Moreover, explorative maps of the RDi values, here not reported, indicate a very complex situation described by a low spatial continuity with samples whose composition is

ACCEPTED MANUSCRIPT characterised by a high difference from the crust and GEMAS average values but which are located nearby in space. In the case of the Earth crust the spatial behaviour of the differences appears to be more related to the nature of the bedrock. In fact, higher values are mainly related to outcrops of limestones, dolostones and sedimentary material, while the lower ones are related to the volcanic areas. On the other hand, the compositional differences with respect to GEMAS mean value indicate that the lowest values appear to characterise all the regional territory. However, locally, due to the nature of the background or the presence of discrimination arises.

SC RI PT

specific phenomena (also anthropic ones in urban and sub-urban areas), some important The described situation is confirmed by the low values, even if

significant (p < 0.01), of the Moran’s I index whose negative or positive values indicate tendency to clustering as well as dispersion depending on the lag (Moran, 1950; Brunsdon and Comber, 2015; Oyana and Margai, 2016).

NU

This intermittent behaviour may be the source of the complex conditions shown by the probability plots. Intermittency implies a tendency of a variable to concentrate into small-

MA

scale features of high intensity surrounded by extended areas of less fluctuations leading to a multifractal behaviour (Frisch, 1995). This condition often describes dynamic systems where dissipative processes must not be neglected and which work far from the thermodynamic

ED

equilibrium (Kondepudi and Prigogine, 1998). The behaviour of the soil system fits this pattern since it remains apart from the thermodynamic equilibrium as an open, porous

PT

system containing solids, liquids and gases in which an exchange of matter and energy with the surrounding environment takes place (Khabirov and Mikayilov, 2013). Soil formation

CE

results from an interplay of acting factors such as parent material, climate, organisms and topography that move through time by a number of processes like weathering, translocation,

AC

neo-formation of minerals and humic substances (Zehe et al., 2010). Chemical and physical processes drive the formation of micro and macroaggregates but soil organisms are a positive feedback in the formation and maintenance of soil structure (Caruso et al., 2011; Nicolodi and Gianello, 2015). If interaction-dominant dynamics describe tightly coupled processes spanning a wide range of temporal or spatial scales, with multiplicative and/or interdependent feedback transactions among processes, heavy-tailed (inverse power law) distribution may arise (Clauset et al., 2009; van Rooij et al., 2013). 3.2 Searching for Pareto distributions After considering the previous results, a more detailed investigation was performed by

ACCEPTED MANUSCRIPT analysing the complementary cumulative distribution function. We should remember that if X has a power law distribution, then in the log-log plot of Pr [X  x] the behaviour will be given by one or more straight lines. Results are reported in Figures 4 and 5 regarding the difference from the Earth crust and GEMAS average compositions, respectively. Note that the horizontal scale is drawn as linear but it is in effect logarithmic, since the applied transformation includes the log function. The cumulative curve was analysed for detecting structural changes in linear regression

SC RI PT

relationships following the approach proposed by Zeiles et al. (2003) and Zeiles (2005). The model assumes n observations of some dependent variable yi and a regressor vector xi such that the conditional distribution yi|xi follows some (quasi)-likelihood f(yi|xi,i) with kdimensional parameter i. The ordering of the observations usually corresponds to time or i.e.,

(𝑖 = 1, … , 𝑛)

MA

𝐻0 : 𝜃𝑖 = 𝜃0

NU

some ranking criteria as in our case. Thus, the hypothesis of interest is “parameter stability”,

that should be tested against the alternative that the parameter i changes over time or rank.

ED

Mathematical details are reported in the R package “strucchange” (Zeileis et al., 2002; R Development Core Team, 2016). In the case of the difference from average crust, an optimal

PT

division in five segments was obtained, corresponding to  values of 0.028, 0.196, 0.337, 0.436, 0.278, all significant for p < 0.001. The corresponding adjusted R2 values were given by

CE

0.77, 0.987, 0.996, 0.998, 0.975 with n equal to 530, 635, 1003, 821 and 530, respectively. In the case of the difference from the GEMAS mean value, an optimal division in six segments

AC

was obtained, corresponding to  values of 0.04, 0.170, 0.270, 0.309, 0.258, 0.167, all significant for p < 0.001. The corresponding adjusted R2 values were given by 0.740, 0.984, 0.989, 0.998,0.997, 0.996 with n equal to 529, 567, 851, 530, 530, 530, respectively. Results seem to indicate that self-similarity does not hold in the strict sense but does hold statistically over a finite range of scales. However this result could not correspond to the real presence of power laws. In literature the problem of the identification of Pareto distribution s.l. is highly discussed, often questioning the ubiquity of power laws in a wide range of disciplines (Stumpf et al., 2005; Stumpf and Porter, 2012), so that alternative rigorous statistical approaches have been proposed (Goldstein et al., 2004). In this framework models are fitted by using a maximum likelihood procedure conditioned to the cut-off value (xmin)

ACCEPTED MANUSCRIPT estimate, through the Kolmogorov-Smirnov statistic (Clauset et al., 2009). When power laws are used in practice, often only the tails of the distribution follow a power law, so that xmin must be estimated. The application of goodness-of-fit tests, via a bootstrapping procedure, developed in the R package “poweRlaw” (Gillespie, 2015) has confirmed the fit of empirical data by power laws. However, for differences from average Earth crust the situation appears to be more complicated and only one Pareto distribution modelling the tail for x>xmin (29.32) is not

SC RI PT

exhaustive. In the case of the differences from GEMAS the application of the KolmogorovSmirnov test (Clauset et al., 2009) is associated to p >> 0.05 for a Pareto distribution characterised by an estimated xmin equal to 11.48.

Consequently, a further deepening has concerned the ratio between skewness and variation coefficients by using the discriminant moment ratio plot proposed by Cirillo (2013). For this

NU

plot good results have already been obtained with 100 observations but if the cardinality is greater than 1000, the discrimination is strongly reliable. In our case differences from crust

MA

and GEMAS mean values fall both in the field of the Pareto distribution, also considering the partition in the different linear segments previously found (Figure 6).

ED

3.3 On the behaviour of compositional changes

PT

Summarising all the results, there are several proofs of the presence of nonlinear interactions between different scales testifying that the compositional changes of the topsoils, when

CE

compared to average GEMAS and Earth crust present intermittent/multifractal behaviour so that their distribution is no longer determined by the first and second statistical moments.

AC

High-order moments play a critical role and contribute significantly to the shape of the heavy tails of the distribution functions. Thus, from the perspective of compositional changes the analysed topsoils could be described as an open system characterised by multiplicative interactions with feedbacks involving different components (Morales and Charbonneau, 2008). Following Morato et al. (2017) this behaviour is attributable to the combination of functional variations and random fluctuations or noise whose contribution is scale dependent (Logsdon et al., 2008; Baveye and Laba, 2015). Is this behaviour peculiar to the compositional changes affecting the analysed topsoils in comparison to GEMAS and Earth crust average values or is it more general? To answer this question compositional changes were analysed by calculating the robust Mahalanobis distances also from the compositions of the Campania database containing the

ACCEPTED MANUSCRIPT lower values for some elements like Al, As, Ca, Cr, Fe, Ni and P. Results are reported in Figure 7 where the curves of the differences from average Earth crust and GEMAS are also included as well as their modelling by using a log-normal distribution (dashed black lines). The choice of the elements was motivated by the low mobility for Al and Fe, high mobility for Ca, the possible impact of human activity due to agriculture for P, and the investigation of the behaviour of some metals like As, Cr and Ni. The fact that Al and Fe may be considered as immobile or nearly immobile was also demonstrated by methods based on element ratios

SC RI PT

(Pearce, 1968; Russel et al., 1990). However, it is now known that diagrams with axes characterised by a common denominator are not Euclidean thus leading to misleading statistical analysis.

Furthermore, also the difference from the robust barycentre of the dataset as determined in Buccianti et al. (2015) was reported. It represents the most frequent multi-element vector

NU

and it might be candidate to be a compositional baseline.

As we can see, the changes from compositions characterised by the lower contents of the

MA

selected elements present the typical segmentation in different linear pieces, and the presence of heavy tail distributions is confirmed by using several tools (Cirillo, 2013). When compositional changes are analysed using the composition containing the lower value of Al

ED

and Fe, a very similar behaviour is found with a high overlapping between the curves testifying a similar participation of the elements in geochemical processes generating the

PT

topsoils. Changes with respect to the composition characterised by the lower values of Ni and Cr are located towards those obtained in comparison with GEMAS mean values with wide

CE

overlapping for increasing RDi values. A shift towards right is revealed for Ca and P indicating that changes from the compositions with the lower contents of Ca and P are higher when

AC

compared with the GEMAS ones. Changes that move from the composition with the lower As content tend towards those registered when the average Earth crust is assumed as reference, with a very similar curve even if shifted in RDi values. Under no circumstances the curves of the chosen elements follow the log-normal models obtained for the differences from average Earth crust and GEMAS values. By considering the behaviour of the differences from the robust barycentre candidate to be a baseline (Buccianti et al., 2015), a power law behaviour is again obtained (Clauset et al., 2009; Cirillo, 2013) even if for lower difference values. On the whole, for the same robust Mahalanobis distance value, the number of samples for which N >= x tends to increase from left to right giving us a clear perception of inhomogeneity increasing with the change of the reference composition. The result enforces the idea that multiplicative processes alone cannot

ACCEPTED MANUSCRIPT be at the base of the compositional changes and that feedback mechanisms are needed to move towards power law modelling (van Rooij et al., 2013). Summarising all the obtained results it appears that the presence of power law (Pareto) distributions for different scales is typical of the compositional changes affecting the analysed topsoils. The presence of the Pareto distribution indicates that the geochemical laws that have contributed to the genesis of the soils moved according to random multiplicative processes affected by feed-backs, thus generating a complex evolving structure, dominated by self-

SC RI PT

organised criticality. In geochemical systems multiplicative processes are related to the partition phenomena when elements move in the geochemical cycle and intercept different phases/reservoirs, depending on their affinity. The spontaneous self-organization of fractal structures in time and space occurs principally because this is the most efficient way for geochemical systems to dissipate energy gradients, consume free energy and produce entropy

NU

(Goncalves, 2001; Rantitsch, 2001; Kleidon, 2015).

Frink (2011) describes soil as a self-organizing system because it dissipates entropy as a

MA

result of its development or organization. The exchange of information, energy and matter with the environment occurs primarily at two spatially distinct locations. The first one is the lower boundary, where soil interfaces with parent material, bedrock, and/or groundwater

ED

systems, where entropy is exported in the form of leachates. The second is the upper boundary where soil interfaces with the atmosphere where gases, organic matter, climate,

PT

and other biochemical and biomechanical exchanges with the environment occur. Between these two boundaries soil is self-organizing.

CE

Considering the behaviour of compositional changes reported in Figure 7, elements like Ca, P, Cr e Ni appear to be more correlated with the upper boundary since compositional differences

AC

emulate those obtained by using GEMAS mean value as reference. This could imply that the geochemical behaviour of these elements has been affected by surficial phenomena reducing the whole variability and could be related to factors affecting their mobility or reactivity. It is well known that Ca and Mg are considered the predominant exchangeable cations, thus presenting a high mobility in soils able also to affect P behaviour, this last also perturbed by intensive agriculture (Manimel Wadu et al., 2013). On the other hand, the behaviour of Cr and Ni, besides their origin from mafic bedrock, is highly influenced by dispersion mechanisms that characterise sedimentary material, particularly flysch deposits. They participate in redox (with Fe and Mn) and pH-mediated adsorption/desorption reactions with their abundance conditioned by the presence of organically bound forms (Kabata-Pendias, 2011, Gurumurthy et al., 2014).

ACCEPTED MANUSCRIPT On the other hand, As could be more associated with the lower boundary, revealing a variability highly associated with the bedrock nature. Overall, As minerals and compounds are readily soluble, but As migration is greatly limited, because of strong sorption by clays, hydroxides and organic matter; the latter may have a marked influence on the measured As concentration (Reimann et al. 2003). Hydrated Fe (goethite) and Al oxides are the most active for As retention (Kabata-Pendias, 2001; Reimann et al. 2003). Although adsorbed As in soil is unlikely to be desorbed again, it may be liberated when As is combined with Fe and Al oxides

SC RI PT

by hydrolysis caused by the reduction of soil potential. Similar conditions could also describe the Al and Fe behaviour (De Vivo e t al., 2016) explaining the relative position of the curves in Figure 7. Thus, apparently, the geochemical processes have worked efficiently for Ca, P, Cr and Ni compared with As, Fe and Al. This result implies some differences in the processes controlling the solubility, mobility, bioavailability, and toxicity, particularly for the analysed

NU

metals (Sparks, 2005).

Diagrams such as that reported in Figure 7, if realised for all the elements, might contribute to

MA

understanding their geochemical behaviour and the sources of their variability. Interesting here is that Kirchner and Neal (2013) found similar results in water quality time series. They analysed differences between adjacent pairs of local averages rationed on the length of time

ED

over which those averages were calculated, thus discovering non-self-averaging behaviour.

PT

Conclusions

CE

Power law behaviour is associated with complex systems composed of processes that interact to self-organise their behaviour across multiple temporal or spatial scales. To recognise the

AC

fingerprint of this condition is one of the most important item in several fields of investigation. Complex dynamical systems can have tipping points at which a sudden shift to a contrasting dynamical regime may occur so that prediction in time or space is very difficult. Several investigations suggest that two major features are crucial for the overall response of such systems and their resilience capacity: (i) the heterogeneity of the components, (ii) their connectivity. How these properties affect the stability depends on the nature of the interactions (Scheffer et al., 2012). In this context, multimodality of frequency distribution may be caused by nonlinear responses to unobserved drivers or from the multimodality of their distribution (Scheffer et al., 2009). In this contribute the role of the analysis of the shape of frequency distribution was exploited when applied to indices related to compositional changes. To achieve this aim the robust

ACCEPTED MANUSCRIPT Mahalanobis distance was determined as a (multivariate) measure of the difference between compositions when transformed into real coordinates. Differences calculated with respect to the average Earth crust and to GEMAS mean value for the top soils of the Campania region (southern Italy) have revealed the fingerprint of more than one power law distribution, thus indicating that the “soil system” is neither completely deterministic, nor totally chaotic. Rather it is in an intermediate state where it possesses a property of multifractality in the spatial domain and, probably, intermittency in time. This behaviour is not a coincidence since

SC RI PT

it occurs also when the robust distance is calculated considering as reference the compositions with the lower abundance of Al, As, Ca, Cr, Fe, Ni and P and was found also by Kirchner and Neal (2013) in water quality time series analysing compositional differences, although without applying CoDA philosophy.

Even if more investigations are needed, developing predictive systems based on these

NU

properties poses major challenges to understand how geochemical systems work and what tools (statistical, geostatistical, time series analysis) should be used for their description. Self-

MA

organization requires a particular kind of interaction to coordinate the processes that must work together. The precise form of this interaction balances competitive and cooperative processes to create an optimally adaptive and flexible working configuration called critical

ED

state, related to a self-organized criticality where interaction-dominant dynamics are the rule. The key fact of interaction-dominant dynamics is that system components change each

PT

other's dynamics to coordinate their collective behaviour giving no sense to the univariate analysis. Thus, if in this condition the multivariate perspective is mandatory, the CoDA

AC

CE

approach appears to be essential for unbiased evaluations.

Figure Captions

Figure 1. Simplified geological map of Campania Region, Italy. From Lima et al. (2003). Figure 2. Probability plots of the difference (RDi) between the composition of topsoils of the Campania region and the average Earth crust. Dashed lines correspond to the probability models.

ACCEPTED MANUSCRIPT Figure 3. Probability plots of the difference (RDi) between the composition of topsoils of the Campania region and the GEMAS reference composition. Dashed lines correspond to the probability models. Figure 4. Log-log plot of the complementary cumulative distribution function ( 𝑃𝑟[𝑋 ≥ 𝑥]) of the robust Mahalanobis distance from the average Earth crust composition.

SC RI PT

Figure 5. Log-log plot of the complementary cumulative distribution function ( 𝑃𝑟[𝑋 ≥ 𝑥]) of the robust Mahalanobis distance from the GEMAS mean composition.

Figure 6. Discriminant moment-ratio plot. Data for variation coefficient (CV) and skewness (red point in the green area) are related to the robust Mahalanobis distance values describing

NU

the compositional changes of the Campania region topsoils from average GEMAS and Earth

MA

crust.

Figure 7. Behaviour of the robust Mahalanobis distance from the compositions containing the lower values of Al, As, Ca, Cr, Fe, P, Ni, respectively. The curve of the differences from the

ED

robust barycentre of Campania dataset (Buccianti et al., 2015), average Earth crust and GEMAS are also reported. Dashed black lines correspond to the modelling of Earth crust and

CE

PT

GEMAS differences by using a log-normal distribution.

Acknowledgments

AC

The University of Florence (I) and the GEOBASI project (BUC_BASI) are thanked for research funds for 2015 and 2016. The work was also supported by the grant from the Ministero dell'Università e della Ricerca Scientifica — Industrial Research Project “Integrated agro-industrial chains with high energy efficiency for the development of ecocompatible processes of energy and biochemicals production from renewable sources and for the land valorization (EnerbioChem)” PON01_01966, funded (Prof. B. De Vivo) in the frame of the Operative National Programme Research and Competitiveness 2007–2013 D. D. Prot. n. 01/Ric. 18.1.2010.

References Aitchison, J., 1986. The statistical analysis of compositional data. Monographs on Statistics and Applied Probability. Chapman and Hall, Ltd., London, UK (416 pp. (reprinted 2003 with additional material by The Balckburn Press)). Aitchison, J., Brown, J., 1954. On criteria for descriptions of income distribution.

ACCEPTED MANUSCRIPT Metroeconomica 6, 88–98. Aitchison, J., Barceló-Vidal, C., Martín-Fernández, J.A., and Pawlowsky-Glahn, V. (2000) Logratio analysis and compositional distance. Mathematical Geology, 32 (3), 271–275. Agterberg, F., 2007. Mixtures of multiplicative cascade models in geochemistry, Nonlinear Processes Geophys. 14, 201–209.

SC RI PT

Ahrens, L., 1954a. The lognormal distribution of the elements-II, Geochim. Cosmochim. Acta 6, 121–131. Ahrens, L., 1954b. The lognormal distribution of the elements (a fundamental law of geochemistry and its subsidiary), Geochim. Cosmochim. Acta 5, 49–73. Allegre, C., Lewin, E., 1996. Scaling laws and geochemical distributions. Earth Planet. Sci. Lett. 132, 1–13. Bak, P., 1996. How Nature works. New York, Springer-Verlag.

NU

Baveye, P.C., Laba M., 2015. Moving away from the geostatistical lamppost: Why, where, and

how does the spatial heterogeneity of soils matter? Ecological Modelling, 298, 24-38.

MA

Bonardi, G., Ciarcia, S., Di Nocera, S., Matano, F., Sgrosso, I., Torre, M., 2009. Carta delle principali unità cinematiche dell'appennino meridionale. Nota illustrativa. Ital. J. Geosci. 128 (1), 47–60.

ED

Brunsdon C., Comber, L., 2015. An introduction to R for Spatial Analysis & Mapping. SAGE Publications Ltd, Thousand Oaks, California 91320, 464 pp..

PT

Passi di: Comber, Lex. “An Introduction to R for Spatial Analysis and Mapping”. iBooks.

CE

Buccianti, A., 2013. Is compositional data analysis a way to see beyond the illusion? Comput. Geosci. 50, 165–173.

AC

Buccianti, A., Magli, R., 2011. Metric concepts and implications in describing compositional changes for world river’s chemistry. Computers and Geosciences, 37(5), 670-676. Buccianti, A., Lima, A., Albanese, S., Cannatelli, C., Esposito, R., De Vivo, B., 2015. Exploring topsoil geochemistry from the CoDA (Compositional Data Analysis) perspective: The multielement data archive of the Campania Region (Southern Italy). Journal of Geochemical Exploration, 159, 302-316. Buccianti, A., Zuo, R., 2016. Weathering reaction and isometric log-ratio coordinates: Do they speak to each other? Applied Geochemistry, 75, 189-199. Caruso, T., Barto, E.K., Siddiky, Md. R.K., Smigelski, J., Rilling, M.C., 2011. Are power laws that

estimate fractal dimension a good descriptor of soil structure and its link to soil biological properties? Soil Biology & Biochemistry 43, 359-366. Cirillo, P., 2013. Are your data really Pareto distributed? Physica A., 392, 5947-5962.

ACCEPTED MANUSCRIPT

Clarke, F.W., 1889. The relative abundance of the chemical elements, Philosophical Society of Washington Bulletin 11, 131–142. Clarke, F.W., 1924. The Data of Geochemistry, 5th ed. US Geological Survey Bulletin, 770. Washington, US Government Printing Office.

SC RI PT

Clauset, A., Shalizi, C.R., Newman, M.E.J., 2009. Power-law distributions in Empirical Data, SIAM Review, 51(4), 661-703. Costantini, E.A.C., Dazzi, C., 2013. The Soils of Italy. Springer, 354 pp.

De Vivo, B., Rolandi, G., Gans, P.B., Calvert, A., Bohrson, W.A., Spera, F.J., Belkin H.E., 2001. New constraints on the pyroclastic eruptive history of the Campanian volcanic Plain (Italy) Min. Petr., 73, 47–65

NU

De Vivo, B., Lima, A., Albanese, S., Cicchella, D., Civitillo, D., Minolfi, G., Zuzolo, D., 2016. Atlante geochimico-ambientale dei suoli della Campania. Collana Conoscenza Geochimica del Territorio (CGT), 14, Aracne Editrice-Gioacchino Onorati Editore S.r.l., Canterano (RM). ISBN: 978-88-548-9744-1, 359 pp.

MA

Edmunds, W.M., Bogush, A.A., 2013. V.I. Vernadsky – Pioneer of water-rock interaction, Procedia Earth and Planetary Science, 7, 236-239.

ED

Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barcelo-Vidal, C., 2003. Isometric logratio transformations for compositional data analysis, Mathematical Geology 35(3), 279– 300.

PT

Faure, G., 1998. Principles and Applications of Geochemistry, II edition, Prentice Hall, Upper Saddle River, New Jersey, 07458, 600 pp.

CE

Filzmoser, P., Reimann, C., Garrett, R.G., 2005. Multivariate outlier detection in exploration geochemistry. Computers & Geosciences 31, 579–587.

AC

Filzmoser, P., Hron, K., 2008. Outlier detection for compositional data using robust methods. Mathematical Geosciences 40 (3), 233–248. Filzmoser, P., Hron, K., Reimann, C., 2011. Interpretation of multivariate outliers for compositional data. Computers & Geosciences 39, 77–85. Frink, D.S., 2011. Explorations into a Dynamic Process-oriented Soil Science. Elsevier, Amsterdam, 117 pp. Frisch, U., 1995. Turbulence, The Legacy of A.N. Kolmogorov, Cambridge University Press, Cambridge. Garrett, R.G., 1989. The chi-square plot: a tool for multivariate outlier recognition. Journal of Geochemical Exploration 32, 319–341.

ACCEPTED MANUSCRIPT Garrett, R.G., Goss, T.I., Poirier, P.R., 1982. Multivariate outlier detection — an application of robust regression in the earth sciences. Am. Stat. Assoc. Ann. Mtg. (abstract 101). Gillespie, C.S., 2015. Fitting heavy tailed distributions: The poweRlaw package, Journal of Statistical Software, 64(2). Goldschmidt, V.M., 1933. Grundlagen der quantitativen geochimie, Fortschrift Mineralogie 17(2), 112–156.

SC RI PT

Goldstein, M.L. Morris, S.A., Yen, G.G., 2004. Problems with fitting to the power-law distribution. The European Physical Journal B, 41(2), 1060-1068. Goncalves, M.A., 2001. Characterization of geochemical distributions using multifractal models. Mathematical Geology, 33, 41-61. Gurumurthy, G.P., Balakrishna, K., Tripti, M., Audry, S., Riotte, J., Brraun, J.J., Udaya Shankar, H.N., 2014. Geochemical behaviour of dissolved trace elements in a monsoon-dominated tropical river basin, Southwestern India. Environ. Sci. Pollut. Res. Int., 21 (7), 5098–5120.

NU

Jensen, H.J., 1998. Self-Organised Criticality. Cambridge University Press, Cambridge.

MA

Kabata-Pendias, A., 2011. Trace Elements in Soils and Plants, Fourth edition, CRC Press, Taylor & Francis Group, Boca Radon, Florida, 505 pp.

ED

Khabirov, I., Mikayilov ,F., 2013. Non-linear thermodynamic laws application to soil processes. Eurasian Journal of Soil Science, 2, 54-58.

PT

Kirchner, J.W., Neal, C., 2013. Universal fractal scaling in stream chemistry and its implications for solute transport and water quality trend detection. Proceedings of the National Academy of Sciences, 110(3), 12213-12218.

CE

Kleidon, A., 2015. Non-equilibrium thermodynamics, maximum entropy production and Earth-system evolution. Philosophical Transactions of the Royal Society, 368, 181-196.

AC

Kondepudi, D., Prigogine, I., 1998. Moder Thermodynamics. From heat engines to dissipative structures. Wiley, 486 pp. Lima, A., De Vivo, B., Cicchella D., Cortini, M., Albanese, S., 2003. Multifractal IDW interpolation and fractal filtering method in environmental studies: an application on regional stream sediments of (Italy), Campania region. Applied Geochemistry, 18(12), 1853-1865. Limpert, E., Stahel, W.A., Abbt, M., 2001. Log-normal distributions across the sciences: keys and clues. Bioscience, 51(5), 341-352. Logsdon, S.D., Perfect, E., Tarquis, A.M., 2008. Multiscale soil investigations: physical concepts and mathematical techniques. Vadose Zone Journal, 7, 453–455. Lulli, L., 2007. Italian volcanic soils. In: Arnalds, O., Bartoli, F., Buurman, P., Oskarsson, H., Stoops, G., Garcia-Rodeja, E. (Eds.), Soils of Volcanic Regions in Europe. Springer-Verlag, pp. 51–67.

ACCEPTED MANUSCRIPT

Ma, T., Li, C., Lu, Z. (2014), Estimating the average concentration of minor and trace elements in surficial sediments using fractal methods, Journal of Geochemical Exploration 139, 207– 216. Mahalanobis, P.C., 1936. On the generalised distance in Statistics. Proceedings of the National Institute of Sciences of India, 2 (1), 49–55.

SC RI PT

Manimel Wadu, M. C.W., Michaelis, V.K., Kroeker, S., Akinremi, O.O. 2013. Exchangeable Calcium/Magnesium ratio affects Phosphorus behavior in calcareous soils. Soil Sci. Soc. Am. J. 77, 2004-2013. Mateu-Figueras, G., Pawlowsky-Glahn, V., 2008. A critical approach to probability laws in geochemistry, Mathematical Geosciences 40, 489–502. Mitzenmacher, M. 2004. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2), 226-251.

NU

Morales, L.F., Charbonneau, P., 2008. Scaling laws and frequency distributions of avalanche areas in a self-organised criticality model of solar flares. Geophysical research Letters, 35, L04108, 1-5.

MA

Moran, P.A.P., 1950. Notes on continuous stochastic phenomena. Biometrika, 37:17–23.

ED

Morato, M.C., Castellanos, M.T., Bird, N.R., Tarquis, A.M., 2017. Multifractal analysis in soil properties: spatial signal versus mass distribution. Geoderma, 287, 54-65.

PT

Newman, M.E.J., 2005. Power laws, Pareto distribution and Zipf’s law. Contemp. Phys., 46, 323-351.

CE

Nicolodi, M., Gianello, C., 2015. Understanding soil as an open system and fertility as an emergent property of the soil system. Sustainable Agriculture Research, 4(1), 94-105.

AC

Ott, W., 1990. A physical explanation of the lognormality of pollutant concentrations. Journal of Air and Waste Management Association 40(10), 1378–1383. Oyana T.J., Margai F.M., Spatial Analysis. Statistics, Visualization, and Computational Methods. CRC Press, Taylor & Francis Group, Boca Raton, FL 33487-2742, 316 pp. Palarea-Albaladejo, J., Martin-Fernandez, J.A., 2015. ZCompositions. R package for multivariate imputation of left-censored data under a compositional approach. Chemom. Intell. Lab. Syst. 143, 85–96. Pawlowsky-Glahn, V. and Buccianti, A. (2011). Compositional Data Analysis: Theory and Applications, John Wiley & Sons, Ltd, 378 p. Pawlowsky-Glahn, V., Egozcue, J.J., Tolosana-Delgado, R., 2015. Modeling and analysis of compositional data, John Wiley & Sons, LtD., 247 pp.

ACCEPTED MANUSCRIPT Pearce, T.H., 1968. A contribution to the theory of variation diagrams. Contributions to Mineralogy and Petrology 19 (2), 142-157. R Development Core Team, 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/. Rantitsch, G., 2001. The fractal properties of geochemical landscapes as an indicator of weathering and transport processes within the Eastern Alps. Journal of Geochemical Exploration, 73, 27-42.

SC RI PT

Reimann C., Siewers, U., Tarvainen, T., Bityukova, L., Eriksson, J., Gilucis, A., Gregorauskiene, V., Lukashev, V.K., Matinian, N.N., Pasieczna, A., 2003. Agricultural Soils in Northern Europe: A Geochemical Atlas. Geologisches Jahrbuch, Sonderhefte, Reihe D, Heft SD 5, Schweizerbart'sche Verlagsbuchhandlung, Stuttgart, 279 pp. Reimann, C., Birke, M., Demetriades, A., Filzmoser, P., O'Connor, P. (Editors) and GEMAS Team, 2014a. Chemistry of Europe's agricultural soils — part A: methodology and interpretation of the GEMAS data set. Geologisches Jahrbuch (Reihe B), Schweizerbarth: Hannover, (528 pp.).

MA

NU

Reimann, C., Birke, M., Demetriades, A., Filzmoser, P., O′Connor P. (Editors) and GEMAS Project Team, 2014b. Chemistry of Europe's agricultural soils — part B: general background information and further analysis of the GEMAS data set. Geologisches Jahrbuch (Reihe B), Schweizerbarth: Hannover, (352 pp.) Rousseeuw, P.J., 1984. Least median of squares regression. J. Am. Stat. Assoc. 79, 871–880.

ED

Russell, J.K., Nicholls, J., Stanley, C.R., Pearce, T.H., 1990. Pearce element ratios: a paradigm for testing hypotheses. Eos, Transactions American Geophysical Union, 71(5), DOI 10.1029/EO071i005p00234.

PT

Ryan, P., 2014. Environmental and low temperature geochemistry, Wiley Blackwell, 402 pp.

AC

CE

Scheffer, M., Carpenter S.R., Lenton, T.M., Bascompte, J., Brock, W., Dakos, V., van de Koppel, J., van de Leemput, I.A., Levin, S.A., van Nes, E.H., Pascual, M., Vandermeer, J., 2012. Anticipating Critical Transitions, Science, 338, 344-348. Seuront, L., 2010. Fractals and multifractals in Ecology and aquatic science. Boca Raton, FL, CRC Press. Smith, R.E., Campbell, N.A., Litchfield, R., 1984. Multivariate statistical techniques applied to pisolitic laterite geochemistry at Golden Grove, Western Australia. Journal of Geochemical Exploration 22 (1–3), 193–216. Sparks, D.L., 2005. Toxic metals in the Environment: the role of surfaces. Elements, 1, 193197. Stumpf, M.P.H., Porter M.A., 2012. Critical truths about power laws. Science, 335, 665-666.

ACCEPTED MANUSCRIPT Stumpf, M.P.H., Wiuf C., May R.MN., 2005. Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proc. National Acad. Sci., USA, 102, 4221. Sucharovà, J., Suchara, I., Hola, M., Marikova, S., Reimann, C., Boyd R., Filzmoser, P., Englmaier, P., 2012. Top-/bottom-soil ratios and enrichment factors: What do they really show? Applied Geochemistry, 27, 138-145.

SC RI PT

van Rooij, M.M.J.W., Nash, B.A., Rajaraman, S., Holden, J.G., 2013. A fractal approach to dynamic inference and distribution analysis. Frontiers in Physiology, 4, 1-16. Verboven, S., Hubert, M., 2005. LIBRA: aMatlab library for robust analysis. Chemom. Intell. Lab. Syst. 75, 127–136. Zehe, E., Blume, T., Bloschl, G., 2010. The principle of “maximum energy dissipation”: a novel thermodynamic perspective on rapid water flow in connected soil structures. Philosophical Transactions of the Royal Society, B, 365, 1377-1386.

NU

Zeileis, A., Leisch, F., Hornik, K., Kleiber, C., 2002. strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7(2):1–38.

MA

Zeileis, A., Kleiber, C., Krämer, W., Hornik, K., 2003. Testing and dating of structural changes in practice. Computational Statistics & Data Analysis, 44(1–2):109–123.

ED

Zeileis, A. 2005. A unified approach to structural change tests based on ML scores, F statistics, and OLS residuals. Econometric Reviews, 24(4):445–466.

AC

CE

PT

Wedepohl, H. (1955), The composition of the Earth crust, Geochim. Cosmochim. Acta 59, 1217–1239.

MA

NU

SC RI PT

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

Figure 1

SC RI PT

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

MA

NU

Figure 2

SC RI PT

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

MA

NU

Figure 3

SC RI PT

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

MA

NU

Figure 4

SC RI PT

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

MA

NU

Figure 5

SC RI PT

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

MA

NU

Figure 6

SC RI PT

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

MA

NU

Figure 7

ACCEPTED MANUSCRIPT Highlights

CE

PT

ED

MA

NU

SC RI PT

Frequency distribution informs about the underlying geochemical system dynamics. Isometric log-ratio transformation is applied to work with compositions. Robust Mahalanobis distance is an evaluable tool for measuring chemical changes. Balancing competitive and cooperative processes is testified by Pareto model. Systems coordinating collective behaviour give no sense to univariate analysis. When multivariate perspective is mandatory, the CoDA approach is essential.

AC

     