Journal of Volcanology and Geothermal Research 207 (2011) 83–92
Contents lists available at SciVerse ScienceDirect
Journal of Volcanology and Geothermal Research j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / j vo l g e o r e s
Predictive regions for geochemical compositional data of volcanic systems Brandon Chiasera, Joaquín A. Cortés ⁎ Department of Geology, University at Buffalo, 411 Cooke Hall, Buffalo, NY 14260-1350, USA
a r t i c l e
i n f o
Article history: Received 17 February 2011 Accepted 28 July 2011 Available online 18 August 2011
a b s t r a c t In this contribution, the theoretical basis to define predictive regions on multivariate compositional data is applied to, volcanological, geochemical data. The method defines a predictive region based on the calculation of the eigenvalues and eigenvectors of the covariance matrix of a logratio-transformed data set to circumvent the issue of singularity of the covariance matrix of the raw data due to the closure problem. Principal components have been used to reduce dimensionality and produce a 2-D display of the transformed data and the confidence ellipse. The method was tested in a well studied volcanic sequence from Stromboli volcano (The Vancori period) and in the tholeiitic differentiation of Thingmuli volcano represented in the AFM diagram. Results of the first example are strongly consistent with previous works, suggesting that the methodology reproduces the conventional patterns and potentially could help in the search of hidden patterns in noisy data. Results of the second example depict a useful approach for the treatment of volcanic geochemical data when plotted in a ternary diagram. © 2011 Elsevier B.V. All rights reserved.
1. Introduction Understanding geological processes often requires understanding how geochemical data vary. Equivalently, geological systems will change in response to a given process and it is accepted that geochemical data will somehow reflect these changes. Traditional statistical methods are thus used to describe and quantify such changes as proxies of the geological process. Because of the use of this statistical background, an estimation of central tendency of a geochemical data set is typically calculated using the arithmetic average under the tacit assumption that geochemical data follows a normal distribution. However, standard geochemical data are compositions, i.e. data are multivariate sets of major and trace elements in which the concentrations of major elements are in wt.% and the concentrations of trace elements are in parts per whole such as parts per million (ppm). Although the arithmetic average can be calculated (the arithmetic average of compositions is a composition), determining the variance or dispersion of such data is, on the other hand, not possible because the arithmetic average ± the variance does not produce a composition, i.e. does not close to 100%. Despite this evident problem (closure), it is usual to find geochemical data reported in that way in the literature with the authors oblivious to the fact that the “±” of an average will produce values that will be bigger than 100% or smaller than zero. In this work, we build on key
⁎ Corresponding author. E-mail address:
[email protected] (J.A. Cortés). 0377-0273/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jvolgeores.2011.07.009
concept (Chayes, 1971) that traditional statistical methodologies are not adequate when studying a geochemical data set (and any other compositional data set) due to the essential nature of compositions: 1. Compositions are constrained to positive values. The normal distribution, on the other hand, is defined for all the real numbers, which means that at least a truncated approximation should be calculated. 2. Compositions carry relative information (see below). Any statistical fit will only yield relative results unless additional ancillary data is known. Since the realization of the problems of using standard statistical methodologies on compositional data (Pearson, 1897), several attempts have been carried out to solve this problem (e.g. Chayes, 1960; Pearce, 1968; Chayes, 1971; Aitchison, 1982; Russell and Nicholls, 1988; Rollinson, 1992). In this contribution a logratio transformation was used (Aitchison, 1982; Egozcue and Pawlowsky-Glahn, 2006) to explore: 1. The application of compositional (“simplicial”) geometry (Aitchison, 1982; Tolosana-Delgado et al., 2005; Egozcue and Pawlowsky-Glahn, 2006; Pawlowsky-Glahn and Egozcue, 2006; Pawlowsky-Glahn et al., 2007) to construct a predictive region for the particular case of geochemical data (Aitchison, 1982). 2. A graphical approach to display the results in a plot based on Principal Component Analysis modified for compositions (e.g. Aitchison, 1983).
84
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
3. The performance of the proposed technique compared with traditional approaches using real examples taken from the literature.
means of the eigenvector decomposition. Both restrictions can be circumvented using a logratio transformation.
2. Methodology
2.2. Logratio transformations
2.1. The predictive region
Compositional data only carries relative information (e.g. proportions) thus additional ancillary information is needed for absolute statements on them. For example, in a MgO (wt.%) vs. SiO2 (wt.%) variation diagram, evolution from a primitive magma to a more evolved one will produce the decrease of MgO with the increase of SiO2. However, both species are actually decreasing in absolute terms because both species are consumed with mineral nucleation and growth. The former (the inverse correlation) is a relative statement, which can be interpreted in absolute terms (a positive correlation) when mass conservation and the fractionation of the olivine phase are taken into account (Cortés, 2009). On the other hand the ratio between two parts is always an absolute quantity (Chayes, 1960, 1971). It can also be proved that a logratio transformation is a linear application applied on compositions that maps such compositions in a non-constrained space in which traditional statistical methods can be applied. The interested reader is referred to Aitchison (1986), Aitchison et al. (2000), Pawlowsky-Glahn and Olea (2004), Egozcue and Pawlowsky-Glahn (2006), Pawlowsky-Glahn and Egozcue (2006), and Pawlowsky-Glahn et al. (2007) for a detailed account of how the technique is applied to different compositional problems. There are three equivalent logratio transformations. For the practical approach presented here, the additive logratio transformation (alr) has been chosen due to its relative simpler implementation and the fact that the transformed data generates a covariance matrix which is not singular (Pawlowsky-Glahn et al., 2007). The alr transformation is defined by selecting an arbitrary component in the data set and then dividing the other coordinates by this value, the logarithm of the coordinate is then taken in order to simplify further mathematical treatment (Aitchison, 1982). For instance, if x = [x1 … xm] is a composition and the m-component is chosen as divisor, the transformed vector will be equal to:
A predictive region is the zone “drawn” around the empirical mean of the population, enclosing the region where individual observations of this population should be with a given probability. It differs of a probability region in which the true mean of the population is known, and from a confidence region in which the empirical mean is used and the probability of the true mean in the region is known (Pawlowsky-Glahn et al., 2007). Different regions are drawn using different distributions. For instance, a Fisher distribution is used to calculate a confidence region while Student–Siegel is the adequate distribution to calculate a predictive region (Pawlowsky-Glahn et al., 2007). In this work we want to calculate an approximate predictive region based in a sample of n compositional vectors (the chemical analyses), in which the true mean and true variance are not known and need to be estimated from the sample. The traditional approach to calculate a predictive region is based on non-constrained (i.e. not closed and/or strictly positive) data. The usual estimators of expectation and variance in the univariate case are the arithmetic mean and the standard deviation. For multivariate data (e.g. x = [x1, …, xm]) the arithmetic average vector of n data points is calculated as: x=
n ∑i = 1 xi1 ∑n x ; …; i = 1 im : n n
ð1Þ
The estimator of the variance, is represented by the covariance matrix. For a data set of m-variables sorted as the rows of the matrix A, the covariance matrix is by definition: 2
covðx1 ; x1 Þ covðx1 ; x2 Þ 6 covðx2 ; x1 Þ covðx2 ; x2 Þ covðAÞ = 6 4 … … covðxi ; x1 Þ covðxi ; x2 Þ
3 … covðx1 ; xn Þ … covðx2 ; xn Þ 7 7: 5 … … … covðxn ; xn Þ
ð2Þ
The orientations of maximum variability for each coordinate are given by the orientations of the eigenvectors of the covariance matrix (Eq. (2)). The predictive region can be defined and calculated based on such orientations, in which the lengths of the axes are a function of the variance and can be predicted using the estimator of the variance of the (statistical) sample. For m dimensions (coordinates), the predictive region defines a “hyper-ellipse” centered in the average vector (Eq. (1)), in which each axis is defined by an eigenvector, which represents an orientation of maximum variability of the data. For an approximate predictive region, the length of each axis can be estimated based on the usual assumption that the sample is representative of the distribution (normal) of the data. In other words, 2σn − 1, calculated from the sample represents ~95% confidence if the sample is big enough. The predictive region can be defined by Eq. (3), in which α ≤ 1 can be interpreted as a data point inside such predictive region. 2 m xi ∑ ≤α: ai i=1
ð3Þ
x x y = alr ðxÞ = log 1 ; …; log m−1 : xm xm
The covariance matrix of this transformed data is not singular, allowing the calculation of a predictive region based on eigenvalues and eigenvectors. Since the transformed data are not constrained to positive values, tacit assumptions regarding the distribution of the data (e.g. that the transformed data are normally-distributed) can be also accepted with no restrictions. Providing that the closure constant (e.g. 100%) and that the “divisor coordinate” are known, it is always possible to apply the inverse of the alr logratio transformation (alr − 1). Let there be xm the divisor in the transformation. Defining: ri =
xi ; xm
ð5Þ
and a transformed vector as: y = ½r1 ; r2 ; ::::rm−1 ;
ð6Þ
it is clear that: xm expðr1 Þ + ::::: + xm expðrm−1 Þ + xm = 100:
As pointed out before, when data are strictly positive and closed, the normal distribution (and related distributions such as Student's t) cannot be used as a model unless modified. Moreover, when the data are closed, the covariance matrix (Eq. (2)) is singular thus not allowing a straightforward representation of probability regions by
ð4Þ
ð7Þ
Rearranging, xm can be found as: xm =
100 : expðr1 Þ + ::::: + expðrm−1 Þ + 1
ð8Þ
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
The compositional vector x can be then calculated as: x = alr
−1
ðyÞ = ½xm r1 ; xm r2 ; …xm rm−1 ; xm :
ð9Þ
2.3. Construction of the 2-D predictive region A graphical output can be produced directly from the calculation of the principal components of the data already used to define the predictive region. The usual procedure is to project the data onto the two eigenvectors with the two larger eigenvalues, since they represent the orientations with maximum variability. This is equivalent to a reduction of the dimensionality of the data while minimizing the sacrifice of relevant relationships in the data in terms of variance. Assuming that the (transformed) data are sorted in the matrix Xn × m, in which each row represents a data point and each column a component, if Sm × 2 is the matrix in which the two columns represent the two eigenvectors with larger eigenvalues, then the projected data Xn⋆ × 2 can be obtained as: ⋆
Xn ×
2
h i t t t = S ×X :
ð10Þ
in which the superscript “t” denotes transposition. With the projected data mapped onto a two-dimensional, nonconstrained Euclidean vector space, it is possible to apply the usual multivariate statistical methods in order to find the predictive region. In particular, the eigenvectors of the covariance matrix of a centered version of Xn⋆ × 2 will provide the orientation of the major axes, the arithmetic mean of the transformed data the center of the ellipse. In order to determine the lengths of the axes of the ellipse, we recall that the covariance matrix of the data projected on the axes defined by the eigenvectors will be diagonal with the eigenvalues as non-zero values. Because all the covariances are zero, the eigenvalues are proportional to the values of the standard deviations in the principal axes. 2.3.1. Pseudo-code of the application In this section, we describe the algorithm used to produce 2-D predictive regions. The code used in the application (“mellipse.m”) has been written in MatLab and can be found below in the Appendix A. Input Data IF Data has more than 2 columns BREAK % Data is inputed in a n × 2 matrix in which each row % is the projection. If data has got more than two components % the code stops although the algorithm can be easily generalized % for m-components adding a reduction of dimension based on PCA COMPUTE Arithmetic mean COMPUTE Data-centered % Achieved by subtracting the mean to the raw data matrix. COMPUTE CovarianceMatrix of Data-centered COMPUTE eigenvalues and eigenvectors of CovarianceMatrix % Matlab has a built-in function to calculate this COMPUTE Angle of orientation of main eigenvector % Done using simply trigonometry: tan θ = y/x COMPUTE Data projected on the eigenvectors as base COMPUTE σn − 1 of the Data projected % Since the data has been projected on the principal components, % the covariances are zero and the standard deviation of each axis % represent the spread of the data on the two principal components. COMPUTE Length of the axes of the ellipse % Since is a predictive region we assume that ~ 95% of the data % will be enclosed in a region in which the axes are equal to 2 × σn − 1 % if n (the sample size) is enough big and the data is distributed normally. COMPUTE ellipse % calculate the angle θ varying from 0 to 2π
85
% and calculate the parametric equation of the ellipse PLOT ellipse % Graphical output. The actual script has several options such as plot the data points OUTPUT eigenvectors, eigenvalues, center of ellipse, parametric ellipse end 3. Applications 3.1. Example 1: the transition Vancori–Neostromboli at Stromboli Volcano, Italy 3.1.1. Geological background The geological evolution of the Stromboli volcanic system represents an overall time span of approximately 200,000 years represented by six major volcanic periods (from older to younger: Paleostromboli I,II,III, Vancori, Neostromboli and Recent), marked by major collapses of the volcanic edifice (Hornig-Kjarsgaard et al., 1993). The eruptive products of Stromboli volcano are mainly basalts to basaltic andesites that range between high-K and shoshonitic series (Francalanci et al., 1989; Calanchi et al., 2002), in which changes in the chemical affinity seem to be cyclic (Francalanci et al., 1989; Cortés et al., 2005). In this example we will concentrate on the geochemical changes of the transition between the Vancori period and the Neostromboli period (Hornig-Kjarsgaard et al., 1993; Cortés et al., 2005), which coincide with the first major lateral collapse of the volcanic edifice. In a K2O (wt.%) vs SiO2 (wt.%) variation diagram (Fig. 1), Vancori period samples from Francalanci et al. (1989) and (Cortés et al. (2005) have been highlighted (see figure caption for details). Not including sample STR158, which is the most primitive sample from the Francalanci et al. (1989) data set, sample numbers represent their stratigraphic position in the Vancori sequence, with STR46 the older and STR59 the younger. Stratigraphy is summarized in Table 1. Following Cortés et al. (2005), the trend from the Lower to Upper Vancori period can be interpreted as a typical magma evolution, due
Fig. 1. K2O vs. SiO2 (wt.%) bulk-rock classification diagram, according to Peccerillo and Taylor (1976) for the eruptive products of the Vancori (squares: Lower, diamonds: Middle, circles: Upper) and Neostromboli (triangles) periods, modified from Cortés et al. (2005). Open symbols: analyses from Francalanci et al. (1993). Filled symbols selected samples from Cortés et al. (2005) and this study.
86
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
to fractional crystallization that generate the more evolved products, punctuated with infrequent magma recharges. However, the trend from the beginning of the Upper Vancori period suggests the arrival of a primitive recharge that changes the composition of the products back to a basaltic composition, albeit a higher potassium one. In the following section we will explore this trend using the methodology proposed. 3.2. 2-D predictive region applied to the Vancori period To define a geochemical background, the proposed method was first applied to the whole-rock geochemical data taken from Francalanci et al. (1989). The data were logratio-transformed and PCA was applied to all the geological periods (Fig. 2). Reduction to two dimensions was then performed choosing the eigenvectors that are related with the larger eigenvalues and the data were plotted in this particular projection. The data set was then divided in each of the periods of the volcano, in order to calculate the ellipses. This was done by calculating the PCA of each of the sub-data sets (now two-dimensional) and using the arithmetic mean of the data and the two pair of eigenvectors as the axes of the enveloping ellipses for each period, scaled by 2σn − 1 of the transformed coordinates. In the plot we have also highlighted the selected samples from Cortés et al. (2005) shown in Fig. 1. In Fig. 3, the methodology was performed by considering the Vancori period data for PCA and dimension reduction, which were split into Lower, Middle and Upper periods to generate the ellipses. From Figs. 2 and 3, it is clear that the methodology succeeds in reproducing the trends presented in Fig. 1. 3.3. Example 2: a predictive region in the AFM diagram (the tholeiitic differentiation of Thingmuli volcano Iceland) The following example is another key application of the methodology, namely, finding predictive regions in ternary diagrams for the particular case of volcanic products. While the general problem has been already solved (Pawlowsky-Glahn and Barceló-Vidal, 1999) and probability regions has been also calculated for sediment provenance (Weltje, 2002), we recall the particular setting of volcanological data in which a tacit assumption is that all the data point are genetically related, hence they are not strictly compositional. For the example, the classic Thingmuli, Iceland tholeiitic data (Carmichael, 1964) has been chosen because they represent the typical tholeiitic differentiation which is a typical example of the closure effect. Because of the closure (i.e. sum 100 wt.%), the incompatible behavior of FeO⋆ is mainly due to the crystallization of the olivine phase in which relatively more magnesium is taken to form this particular mineral phase. The crystallization of other mineral phases for more evolved products, such as clinopyroxene will increase the consumption of FeO⋆ thus modifying its relative behavior from incompatible to compatible. In a calc–alkaline series such an effect is not observed, because iron (mostly
Fig. 2. Principal Component Analysis Plot of the Stromboli periods. Samples taken from Francalanci et al. (1989, 1993)and Cortés et al. (2005). Confidence ellipses were calculated with the method proposed in this study; see text for more details.
ferric) will crystallize mainly as spinel, thus relatively behaving as compatible from the very beginning of the evolution of that system. In relative terms, the main effect of this changing behavior will produce a relative increase of the amount of iron with differentiation and subsequent relative decrease. However, under absolute terms, being incompatible, iron should remain constant during the tholeiitic differentiation and should decrease during a calc–alkaline differentiation. Because a logratio transformation shows the true relationship between data, this effect is attenuated when the data are transformed; in particular, kinked patterns become straight lines. The AFM ternary diagram (Kuno, 1968; Irvine and Baragar, 1971), is a specific plot in which this effect is dramatically enhanced because of the actual parameters (pseudo-components) used in the plot. The parameters
Table 1 Stratigraphy of the selected samples shown in Fig. 1 as reported in Cortés et al.(2005). Sample
Period
STR156 STR46 STR47 STR48 STR52 STR53 STR54 STR55 STR56 STR57 STR59
Lower Vancori Lower Vancori Lower Vancori Middle Vancori Middle Vancori Middle Vancori Middle Vancori Upper Vancori Upper Vancori Upper Vancori Upper Vancori
Fig. 3. Principal Component Analysis Plot of the Stromboli periods. Samples taken from Francalanci et al. (1989, 1993)and Cortés et al. (2005). Ellipses were calculated with the confidence method described in the text.
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
are: left apex A = Na2O + K2O, upper apex: F = FeO + 0.8998 × Fe2O3 + MnO, and right apex: M = MgO. Although amalgamation (i.e. adding components to create new pseudo-components) might distort the relation between samples (e.g. Egozcue and Pawlowsky-Glahn, 2005) the theoretical background of why this particular amalgamation is chosen (Kuno, 1968) strongly justify its use (i.e. that the parts behave similarly into the mineral phases). It is clear that in such a system, an incompatible FeO⋆ will imply that the products of the differentiation will first increase towards the F apex and then will decrease when iron becomes compatible. In Fig. 4 we have plotted (Carmichael, 1964) data, in which the tholeiitic nature is clearly revealed. This is a clear example in which the predictive region is not straightforward to calculate, because the data projected in the AFM diagram are strictly positive and closed. Not considering these constraints, it is possible to calculate a predictive ellipse simply projecting the compositions onto the plane x + y + z = 100. However this will render a predictive area which will include negative compositions (Fig. 5) and which does not consider the relative kink on the data, i.e. the assumption that the increase–decrease behavior of FeO⋆ is a relative effect. In order to visualize the true correlations in the data and calculate a predictive region constrained to the sample space (the Simplex S) such data is pre-processed. This can be done directly on the Simplex (e.g. Hron, 2009), or using a logratio transformation such as alr. Thus if x = [x1, x2, x3] is a point in the ternary diagram (x ∈ S) then: 3
∑ xi = 100;
i=1
ð11Þ
Fig. 5. AFM diagram (Kuno, 1968; Irvine and Baragar, 1971) of the Tholeiitic data set from Thingmuli volcano (Carmichael, 1964). The super-imposed ellipse is a predictive area calculated based on 2σn − 1 as axes lengths, not taking into account that compositional data is strictly positive (Pawlowsky-Glahn et al., 2007).
y = [y1, y2] the projection in the AFM diagram will be: x = alr
−1
ðyÞ =
100 ½y ; y ; 1: expðy1 Þ + expðy2 Þ + 1 1 2
ð13Þ
Fig. 7 shows the AFM diagram of the Thingmuli data with the addition of the calculated predictive region. Moreover, the main principal component orientation represents the ideal tholeiitic evolution as a “linear” regression on the real Carmichael (1964) data. 4. Discussion
and the alr-transformed data point will be: x x y = alr ðxÞ = log 1 ; log 2 : x3 x3
87
ð12Þ
Fig. 6 presents Carmichael (1964) data alr-transformed, centered, in which the confidence region (ellipse) has also been calculated using the methodology proposed in this study. The transformation has reduced the three parts of the data into two coordinates (Fig. 6), in which, although noisy, it is possible to observe a closer pattern to the hypothetical straight line, equivalent to constant, absolute FeO ⋆, marked by the main principal component hence the assumption of a normal distribution in the absolute trend is funded. Using a conveniently spaced grid, the axes and the ellipse can be alr −1-transformed because it is known that these points should also close to 100% and the third coordinate is the divisor in the transformation. Thus for a given point of the ellipse or axes:
Fig. 4. AFM diagram (Kuno, 1968; Irvine and Baragar, 1971) of the Tholeiitic data set from Thingmuli volcano (Carmichael, 1964). Note the classic tholeiitic “kink”.
The examples presented here from the Vancori period (Francalanci et al., 1989; Cortés et al., 2005) and the example from Thingmuli volcano (Carmichael, 1964) are two simple cases in which the potential of the technique in the particular case of volcanological geochemical data is exposed. In the first example, the methodology shows the validity of using PCA with geochemical data, which is in theory, not possible when the data are closed. It is also clear that artifacts related to the constant sum problem (e.g. (Chayes, 1960, 1971), such as loss of alkali or the addition
Fig. 6. The Thingmuli volcano data set (Carmichael, 1964) alr-transformed and centered. The predictive region ellipse has been calculated following the methodology proposed in this study. Orientations of the principal axes of the ellipse are also shown in the figure.
88
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
of FeO ⋆ with the other elements are clearly shown in the transformed data, strongly justify the use of the technique, and open the possibilities of defining “fields” in these diagrams based on robust statistical methods. There is an obvious limitation in the method regarding the existence of zero values in a data set (logarithm of zero is undefined). For this, there are techniques to replace these zeros (Martín-Fernández et al., 2000, 2003), which can partially solve the problem. To this extent, mineral chemistry data (in which zeros are “real” and not a “below detection limit”) needs to be reformulated at the time of applying the technique (e.g. joining species such as FeO⋆ + MnO). Usually, of the cases with whole-rock chemistry data zeros are rare and combining species solves the problem in most of the practical cases. Fig. 7. The AFM diagram (Kuno, 1968; Irvine and Baragar, 1971) of the Tholeiitic data set from Thingmuli volcano (Carmichael, 1964) in which the predictive ellipse and the principal axes have been alr−1 transformed.
of secondary volatile content can be minimized when a logratio transformation is applied on the geochemical data. Problems such as correlation and patterns that become obscure in noisy data, in which alteration has taken place can be potentially circumvented with a logratio transformation and using as divisor a specie that is relatively stable and conserves, such as TiO2 (Rollinson and Roberts, 1986). It remains unclear however how sensitive the methodology is with highly altered data (e.g. Ohta and Arai, 2007). The Thingmuli example shows another key application of the methodology. Transforming closed data from a ternary projection to an unconstrained space implies that the usual statistical tools can now be applied to the data without restrictions, thus generating quantitative predictive regions. In the particular example presented here, it is of course possible to generate a hypothesis test to accept or reject a set of data as a Tholeiitic differentiation based in the probability region that has been created in Fig. 7. Although the Thingmuli data set is strictly not compositional due to the ancillary assumptions of mass conservation and consanguinity, typical of volcanological geochemical data (Cortés, 2009), still the data set is closed and positive, arguing for a logratio transformation treatment. The fact that the true correlations
Appendix A. Matlab codes used in this contribution Appendix A.1. Matlab code to calculate the alr transformation
5. Conclusions In this work, we have applied the logratio technique to solve the closure problem that usual geochemical analyses have, such as the limitation at the time of using statistical tools that only ought to be applied on unconstrained data. In particular, based on this idea we have explored a predictive region methodology that can be successfully applied on volcanological geochemical data. Issues such as alteration can potentially be circumvented applying the methodology, although more work needs to be done to explore this statement.
Acknowledgements The authors would like to acknowledge the help of Dr. J.L. Palma for useful comments at the time of improving this manuscript and the careful reviews of Dr. Raimón TolosanaDelgado and Prof. Vera Pawlosky-Glahn. We particularly thank the patience put on the work of non-mathematicians dealing with mathematical problems. Thanks are also extended to the Geology Department at the University at Buffalo for the financial support necessary to complete this project as well as the donors who make the Owens fellowship possible.
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
Appendix A.2. Matlab code to calculate the inverse of the alr transformation
89
90
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
Appendix A.3 Matlab code of a 2-D confidence ellipse
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
References Aitchison, J., 1982. The statistical analysis of compositional data. Journal of the Royal Statistical Society Series B 44 (2), 139–177. Aitchison, J., 1983. Principal component analysis of compositional data. Biometrika 70 (1), 57–65. Aitchison, J., 1986. The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman and Hall, London. Aitchison, J., Barceló-Vidal, C., Martín-Fernández, J., Pawlowsky-Glahn, V., 2000. Logratio analysis and compositional distance. Mathematical Geology 32 (3), 271–275. Calanchi, N., Peccerillo, A., Tranne, C.A., Lucchini, F., Rossi, P.L., Kempton, P., Barbieri, M., Wu, T.W., 2002. Petrology and geochemistry of volcanic rocks from the island of Panarea: implications for mantle evolution beneath the Aeolian island arc (southern Tyrrhenian sea). Journal of Volcanology and Geothermal Research 115, 367–395. Carmichael, I., 1964. The petrology of Thingmuli volcano, a Tertiary volcano in eastern iceland. Journal of Petrology 5, 435–460. Chayes, F., 1960. On correlation between variables of constant sum. Journal of Geophysical Research 65 (12), 4185–4193. Chayes, F., 1971. Ratio Correlation. University of Chicago, Chicago. 99 pp.
91
Cortés, J., 2009. On the Harker variation diagrams; a comment on “The statistical analysis of compositional data. Where are we and where should we be heading” by Aitchison and Egozcue (2005). Mathematics and Geosciences 41 (7), 817–828. Cortés, J.A., Wilson, M., Condliffe, E., Francalanci, L., Chertkoff, D.G., 2005. The Evolution of the magmatic system of Stromboli Volcano during the Vancori period (26–13.8 ky). Journal of Volcanology and Geothermal Research 147 (1–2), 1–38. Egozcue, J., Pawlowsky-Glahn, V., 2006. Simplicial geometry for compositional data. In: Buccianti, A.G., Mateu-Figueras, G., Pawlowsky-Glahn, V. (Eds.), Compositional Data Analysis in the Geosciences: Geological Society, London, Special Publications, 264, pp. 145–159. Egozcue, J.J., Pawlowsky-Glahn, V., 2005. Groups of parts and their balances in compositional data analysis. Mathematical Geology 37 (7), 795–828. Francalanci, L., Manetti, P., Peccerillo, A., 1989. Volcanological and magmatological evolution of Stromboli volcano (Aeolian islands): the roles of fractional crystallisation, magma mixing, crustal contamination and source heterogeneity. Bulletin of Volcanology 51, 355–378. Francalanci, L., Manetti, P., Peccerillo, A., Keller, J., 1993. Magmatological evolution of the Stromboli volcano (Aeolian Arc, Italy): inferences from major and trace element and Sr-isotopic composition of lavas and pyroclastic rocks. Acta Vulcanologica 3, 127–151.
92
B. Chiasera, J.A. Cortés / Journal of Volcanology and Geothermal Research 207 (2011) 83–92
Hornig-Kjarsgaard, I., Keller, J., Koberski, U., Stadlbauer, E., Francalanci, L., Lenhart, R., 1993. Geology, stratigraphy and volcanological evolution of the island of Stromboli, Aeolian arc, Italy. Acta Vulcanologica 3, 21–68. Hron, K., 2009. Analytical representation of ellipses in the Aitchison geometry and its application. acta Universitatis Palackianae Olomucensis, Facultas Rerum Naturalium. Mathematica 48, 53–60. Irvine, T., Baragar, W., 1971. A guide to the chemical classification of the common volcanic rocks. Canadian Journal of Earth Sciences 8, 523–545. Kuno, H., 1968. Differentiation of basalt magmas. In: Hess, H.H., Poldervaart, A. (Eds.), The Poldervaart Treatise, on Rocks of Basaltic Composition, Vol. 2. Insterscience Publishers, pp. 623–688. Martín-Fernández, J., Barceló-Vidal, C., Pawlowsky-Glahn, V., 2000. Zero replacements in compositional data sets. In: Kiers, H., Rasson, J., Groenen, P., Shader, M. (Eds.), Studies in classification, data analysis, and knowledge organization. SpringerVerlag, Berlin, pp. 155–160. Martín-Fernández, J., Barceló-Vidal, C., Pawlowsky-Glahn, V., 2003. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology 35 (3), 253–278. Ohta, T., Arai, H., 2007. Statistical empirical index of chemicl weathering in igneous rocks: a new tool for evaluating the degree of weathering. Chemical Geology 240, 280–297. Pawlowsky-Glahn, V., Barceló-Vidal, C., 1999. Confidence regions in ternary diagrams. Terra Nostra (Schriften der Alfred Wegener Stiftung) 1, 37–47. Pawlowsky-Glahn, V., Egozcue, J.J., 2006. Compositional data and their analysis: an introduction. In: Buccianti, A.G., Mateu-Figueras, G., Pawlowsky-Glahn, V. (Eds.), Compositional data analysis in the geosciences: Geological Society, London, Special Publications, 264, pp. 1–10.
Pawlowsky-Glahn, V., Olea, R., 2004. Geostatistical analysis of compositional data. Studies in Mathematical Geology, 7. International Association For Mathematical Geology. Pawlowsky-Glahn, V., Egozcue, J., Tolosana-Delgado, R., 2007. Lecture Notes on Compositional Data Analysis. CoDa'07. http://diobma.udg.edu/handle/10256/297 2007. Pearce, T., 1968. A contribution to the theory of variation diagrams. Contributions to Mineralogy and Petrology 19, 142–157. Pearson, K., 1897. Mathematical contributions to the theory of evolution. on a form of spurious correlation which may arise when indices are used in the measurement of organs. Proceedings of the Royal Society of London LX, pp. 489–502. Peccerillo, A., Taylor, S.R., 1976. Geochemistry of Eocene calk-alkaline volcanic rocks from the Kastamonu area, northern Turkey. Contributions to Mineralogy and Petrology 58, 63–81. Rollinson, H., 1992. Another look at the constant sum problem in geochemistry. Mineralogical Magazine 56, 469–475. Rollinson, H., Roberts, C., 1986. Ratio correlations and major element mobility in altered basalts and komatiites. Contributions to Mineralogy and Petrology 93, 89–97. Russell, J., Nicholls, J., 1988. Analysis of petrologic hypotheses with pearce element ratios. Contributions to Mineralogy and Petrology 99, 25–35. Tolosana-Delgado, R., Otero, N., Pawlowsky-Glahn, V., 2005. Some basic concepts of compositional geometry. Mathematical Geology 37, 673–680. Weltje, G., 2002. Quantitative analysis of detrital modes: statistically rigorous confidence regions in ternary diagrams and their use in sedimentary petrology. Earth Sciences Reviews 57 (3–4), 211–253.