Journal Pre-proof Geodata science and geochemical mapping
Renguang Zuo, Yihui Xiong
PII: S0375-6742(19)30496-0
DOI: https://doi.org/10.1016/j.gexplo.2019.106431
Reference: GEXPLO 106431
To appear in: Journal of Geochemical Exploration
Received date: 29 August 2019
Revised date: 13 November 2019
Accepted date: 23 November 2019
Please cite this article as: R. Zuo and Y. Xiong, Geodata science and geochemical mapping, Journal of Geochemical Exploration (2018), https://doi.org/10.1016/j.gexplo.2019.106431
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2018 Published by Elsevier.
Geodata science and geochemical mapping
Renguang Zuo*, Yihui Xiong
State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences, Wuhan 430074, China
*Email: [email protected]
Abstract
Geodata science (GDS) is an interdisciplinary field in which geoscience data are mined to better understand the origin, evolution and future of our Earth and planet, with prediction and assessment of its resources and environments. The data chain of GDS involves collecting geoscience data, mining geoinformation, discovering geo-knowledge, and making spatial decisions. There are three groups of GDS methods for exploring and mining geoscience data: data statistics, data mining, and data insight and prediction. A case study on geochemical exploration data mapping was conducted to demonstrate the use of GDS. The results show that GDS is a new research paradigm for exploring the spatial association of geochemical patterns, mining elemental associations, and recognizing geochemical anomalies associated with mineralization via geo-computation and geo-visualization techniques in support of mineral exploration.
Keywords: Geodata science; Data mining; Data insight and prediction; Geochemical exploration
1. Introduction
Data science (DS) is the science of extracting information and knowledge from data with the aim of better understanding the dataset itself. The term was first introduced by Naur (1974) in Concise Survey of Computer Methods, where DS was defined as a new pattern for research on data mining. DS is related to, but distinct from, data mining: it is an interdisciplinary approach that combines statistics, many fields of computer science, and scientific methods and processes in order to mine data in automated ways, without human interaction (Hayashi, 1998). Modern data science is increasingly concerned with big data. Mattmann (2013) and Cleveland (2014) further explained and promoted DS. DS complements computational science and statistics and, with the growing importance of big data and artificial intelligence (AI), it is no longer restricted to the field of statistics. Both DS and AI techniques have gained great attention in the era of big data. AI refers to techniques that learn human skills to deal with problems, and is a broad family containing machine learning (ML) and other techniques. Deep learning (DL), a subtype of ML, is a kind of artificial neural network with multiple hidden layers. DS intersects with AI, ML and DL. In general, the workflow of DS differs from that of ML. The former involves collecting data, analyzing data, and suggesting hypotheses or actions. The latter involves collecting data, training a model, and deploying the model (https://www.deeplearning.ai/ai-for-everyone/).
Big data has attracted much attention and has stimulated research on data mining in multiple domains. Similarly, earth scientists are dedicated to unearthing potential information from big data to find solutions to problems in nature, such as climate change prediction, air pollution monitoring, predicting risks to infrastructure from natural hazards, consumption of water and mineral resources, and identifying factors of earthquakes, landslides, flooding, and volcanic eruptions (Karpatne et al., 2019). Research on the earth system is shifting from traditional patterns, such as empirical data collection, theoretical derivation, and local simulation, to exploiting and mining earth datasets to discover the interrelationships between different variables (Tansley and Tolle, 2009). Earth system research has entered a stage of data-intensive scientific discovery, which benefits from new generations of sensors, instruments and platforms with quick transmission rates to data storage facilities, publicly available datasets that give earth scientists the conditions for global research and resource sharing, and large efforts to standardize geoscience datasets to facilitate better mining of them (e.g., Baumann et al., 2016; Ma, 2018).
Geodata science (GDS) is an interdisciplinary field of science that mines geoscience data to better understand the origin, evolution, and future of our Earth and planet, with prediction and assessment of its resources and environments. Defined analogously to geophysics and geochemistry, GDS is an interdisciplinary subject combining geosciences and DS (Fig. 1). The data chain of GDS includes collecting geoscience datasets, mining geoinformation, discovering geoknowledge, and making spatial decisions (Fig. 2). Various methods are available to analyze and mine geoscience data. These methods can be classified as data statistics, data mining, and data insight and prediction (Fig. 3).
Data statistics mainly refers to traditional statistical methods that sort, filter, calculate and count the data to reveal meaningful information. Data mining refers to the discovery of unknown, potentially useful, and hidden rules from geoscience data via association analysis, clustering analysis, factor analysis and traditional AI algorithms. Data insight and prediction aim to provide insight into and prediction of geological events, and are the core application of big data for extracting geoscience features and integrating geoscience variables to support decision-making (Zuo et al., 2019). The main aim of this paper is to introduce the concept of GDS and its application to geoscience problems through a case study in the Gejiu region, Yunnan Province, China. This case study demonstrates how the three groups of GDS methods can be applied to mine geochemical exploration data in support of mineral exploration.
2. Study area and Data
The study area, located in the Gejiu region, Yunnan Province, China, is one of the largest primary Sn mineral districts in the world. It contains ore reserves of 300 Mt Sn, 300 Mt Cu, 400 Mt of Pb + Zn, and > 1000 Mt of Mn (Cheng et al., 2013a). Yu et al. (1988) introduced its geological background. Two major geological units, consisting of igneous rocks and a sequence of Paleozoic to Mesozoic sedimentary rocks (Fig. 4), are developed in the study area. The outcrops in this district mainly comprise the Middle Triassic Gejiu Formation and Falang Formation (Qin and Li, 2008). The carbonate rocks of these two formations, including the interlayered Triassic basic lavas in the Gejiu Formation, are the main ore-hosting rocks for Sn mineralization (Cheng et al., 2012, 2013b). Based on texture and mineralogy, the Gejiu granite batholith, a granitoid complex, can be subdivided into porphyritic biotite granite, fine-grained porphyritic biotite granite, porphyritic biotite granite, coarse- to medium-grained equigranular biotite granite, medium- to fine-grained leucogranite, and fine-grained equigranular granitic dyke swarms and small stocks around granite margins (Dai, 1996; Cheng and Mao, 2010; Cheng et al., 2013b). The study area is divided into eastern and western parts by the Gejiu fault. Faults and folds trend mainly N–S and E–W in the eastern part, and NE–SW or NW–SE in the western part. Sn polymetallic mineralization (such as Sn, Sn-Pb, Sn-Cu-Pb, and Sn-W) in this district mainly comprises four ore types: greisen, skarn, stratabound cassiterite-sulfide, and vein-type ore (Mao et al., 2008; Cheng et al., 2013b). Previous studies have pointed out that Sn mineralization in the study area is spatially correlated with the Gejiu Formation, the Gejiu Batholith, and fault structures (e.g., NNE–SSW and E–W trending faults) (Cheng, 2007; Mao et al., 2008; Cheng and Mao, 2010; Cheng et al., 2013b).
The stream sediment geochemical data were collected during the Chinese National Geochemical Mapping Project, as part of the Regional Geochemistry National Reconnaissance (RGNR) Project initiated in 1979 (Xie et al., 1997). This project collected stream sediment samples in this district at a density of 1 sample per 4 km², and determined the concentrations of 39 major and trace elements. Eleven of them (Bi, Cd, Co, Cu, La, Mo, Nb, Pb, Th, U, and W) were determined by inductively coupled plasma-mass spectrometry (ICP-MS), nine (Al, Cr, Fe, K, P, Si, Ti, Y, and Zr) by X-ray fluorescence (XRF), eleven (Ba, Be, Ca, Li, Mg, Mn, Na, Ni, Sr, V, and Zn) by inductively coupled plasma-atomic emission spectrometry (ICP-AES), three (Ag, B and Sn) by emission spectrometry (ES), and two (As and Sb) by hydride generation-atomic fluorescence spectrometry (HG-AFS). Finally, Au, Hg, and F were determined by graphite furnace-atomic absorption spectrometry (GF-AAS), cold vapor-atomic fluorescence spectrometry (CV-AFS), and ion selective electrode (ISE), respectively (Xie et al., 2008). The dataset was previously used for the prediction of Sn mineral deposits by Cheng (2007). More details on the sampling, detection limits, and data quality can be found in Xie et al. (1997).
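The assignment of elements to analytical methods can be written down as a small lookup table, which also makes it easy to check that the groups add up to the 39 elements reported. The sketch below only restates the paragraph above; the variable names are illustrative and not part of the original study.

```python
# Elements grouped by analytical method, as listed in the text.
METHOD_ELEMENTS = {
    "ICP-MS": ["Bi", "Cd", "Co", "Cu", "La", "Mo", "Nb", "Pb", "Th", "U", "W"],
    "XRF": ["Al", "Cr", "Fe", "K", "P", "Si", "Ti", "Y", "Zr"],
    "ICP-AES": ["Ba", "Be", "Ca", "Li", "Mg", "Mn", "Na", "Ni", "Sr", "V", "Zn"],
    "ES": ["Ag", "B", "Sn"],
    "HG-AFS": ["As", "Sb"],
    "GF-AAS": ["Au"],
    "CV-AFS": ["Hg"],
    "ISE": ["F"],
}

# Reverse lookup: element symbol -> analytical method.
ELEMENT_METHOD = {
    element: method
    for method, elements in METHOD_ELEMENTS.items()
    for element in elements
}

n_elements = len(ELEMENT_METHOD)
print(n_elements)            # 39
print(ELEMENT_METHOD["Sn"])  # ES
```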
3. Methods
3.1. Exploratory spatial data analysis
As an extension of exploratory data analysis (Tukey, 1977), exploratory spatial data analysis (ESDA) aims to examine geospatial data using various approaches, such as histograms, Voronoi maps, normal QQ plots, trend analysis, semivariogram/covariance clouds, cross-covariance clouds, and other functions (Anselin, 1999). The core functions of ESDA are to visualize and explore spatial data. Symanzik (2014) considered the British anesthesiologist John Snow the “grandfather of ESDA”. Local indicators of spatial association (LISA) are a popular and key technique for ESDA (Symanzik, 2014). In 1995, Luc Anselin proposed the local Moran's I index to assess the spatial association at a location i (Anselin, 1995). The index I can measure local instability and identify local spatial clusters and outliers. A value of I > 0 indicates that a feature has neighboring features with similarly high or low attribute values, so the feature is part of a cluster. A value of I < 0 indicates that a feature has neighboring features with dissimilar values, so the feature is an outlier. The results of local Moran's I analysis lead to four patterns: a cluster of high values (HH), a cluster of low values (LL), an outlier in which a high value is surrounded primarily by low values (HL), and an outlier in which a low value is surrounded primarily by high values (LH).
3.2. Robust principal component analysis
Principal component analysis (PCA) is a popular multivariate analysis method for reducing the dimensionality of datasets and integrating several correlated variables into a single principal component (PC) (Jolliffe, 2002). In applied geochemistry, the obtained principal component can represent a meaningful elemental association related to mineralization (Zuo, 2011). Traditional PCA is sensitive to outliers because it is based on the classical variance or covariance matrix. Robust PCA (RPCA) is instead based on the minimum covariance determinant estimator (Filzmoser et al., 2009; Zuo et al., 2013). Therefore, RPCA can overcome this shortcoming of traditional PCA and reduce the influence of outliers (Zuo, 2014). In addition, the isometric logratio (ilr) transformation (Egozcue et al., 2003) has been used in RPCA to reduce the effects of the closed data problem. However, the ilr transformation loses the one-to-one correspondence between the original variables and the transformed variables, so results generated from ilr-transformed variables are difficult to interpret (Egozcue and Pawlowsky-Glahn, 2005). To interpret results obtained from ilr-transformed data, it is sometimes necessary to back-transform the results (e.g., loadings and scores of PCA) into the centered logratio (clr) space (Filzmoser et al., 2009; Reimann et al., 2008). Combined with this back-transformation, RPCA can lead to robust and reliable results.
Most geochemical datasets consist of many variables. For instance, the National Geochemical Survey of Australia dataset has 60 variables (de Caritat et al., 2010), and the Chinese National Geochemical Mapping dataset has 39 variables (Xie et al., 1997). However, only a few elements are linked to mineralization and can guide mineral exploration. Identifying elemental associations related to a specific mineralization is a difficult task when mapping geochemical exploration data. Zuo (2018) summarized three techniques for identifying elemental associations for a specific type of mineral deposit: studying the geological characteristics of known mineral deposits, multivariate analysis, and spatial analysis. In this study, RPCA was used as the multivariate analysis. More details on RPCA can be found in Filzmoser et al. (2009).
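The chain of steps described above (ilr transformation, PCA on the ilr coordinates, back-transformation of the loadings to clr space) can be sketched as follows. This is an illustrative outline, not the authors' implementation: the orthonormal basis is one of several valid ilr bases, the function names are hypothetical, and the default estimator is the classical mean and covariance. To obtain the robust variant, `estimate` would be replaced by a minimum covariance determinant estimator, for instance one built on scikit-learn's `MinCovDet` (its `location_` and `covariance_` attributes), which is an assumption about tooling rather than something stated in the source.

```python
import numpy as np

def ilr_basis(d):
    """One orthonormal basis (d x (d-1)) whose columns are orthogonal to the ones vector."""
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j
        v[j, j - 1] = -1.0
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return v

def clr(x):
    """Centered logratio transform of strictly positive compositions (rows)."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def classical_estimate(z):
    """Classical location and covariance; swap in a robust estimator for RPCA."""
    return z.mean(axis=0), np.cov(z, rowvar=False)

def compositional_pca(x, estimate=classical_estimate):
    x = np.asarray(x, dtype=float)
    v = ilr_basis(x.shape[1])
    z = clr(x) @ v                         # ilr coordinates (opens the closed data)
    center, cov = estimate(z)
    eigval, eigvec = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = (z - center) @ eigvec         # PC scores in ilr space
    clr_loadings = v @ eigvec              # loadings back-transformed to clr space
    return scores, clr_loadings, eigval

# Synthetic closed data: 200 samples, 5 parts summing to 1 (illustration only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=(200, 5))
x = x / x.sum(axis=1, keepdims=True)
scores, clr_loadings, eigval = compositional_pca(x)
print(clr_loadings.shape)  # (5, 4): one row per original element
```

Because each row of the clr loadings corresponds to one original element, the back-transformed loadings can be read element by element, which is what makes the elemental association interpretable.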
3.3. Deep autoencoder network
ML algorithms have been widely applied in many fields. Recently, they have been employed in the mapping of exploration geochemical data (e.g., Chen et al., 2014, 2017; Zhao et al., 2016), because they can model complex and unknown multivariate geochemical distributions and extract meaningful elemental associations related to mineralization (Zuo, 2017; Zuo and Xiong, 2018). DL, as a type of ML algorithm, has a strong ability to automatically extract high-level representations from complex data, and has potential for processing geochemical exploration data (Xiong and Zuo, 2016; Zuo and Xiong, 2018; Chen et al., 2019a, 2019b; Li et al., 2019; Zuo et al., 2019).
As a typical DL model, the deep autoencoder network (DAN) proposed by Hinton and Salakhutdinov (2006) has been successfully applied to anomaly detection based on reconstruction errors (e.g., Valentine and Trampert, 2012; Fiore et al., 2013; Sakurada and Yairi, 2014; Sun et al., 2014). The principle of DAN for geochemical anomaly detection is that geochemical anomalies, which are represented by only a small number of samples, are poorly encoded because they occur with low probability and therefore contribute little to training. As a result, geochemical anomalies have high reconstruction errors in the DAN model. Based on this principle, DAN has been successfully applied to recognize geochemical anomalies (Xiong and Zuo, 2016; Zuo et al., 2019) and to map mineral prospectivity (Xiong et al., 2018). The training of DAN consists of two phases. The first phase is pretraining, in which each restricted Boltzmann machine (RBM) is trained separately to initialize the weights: once the training of one RBM is finished, another RBM is "stacked" on top of it, taking its input from the output of the previous RBM. The second phase is unrolling and fine-tuning, in which the whole deep network is fine-tuned via back-propagation to adjust all the parameters simultaneously. Thus, the weights of the multi-layer feed-forward neural network are initialized by the weights of the stacked RBMs instead of by the traditional method of random weight initialization.
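The reconstruction-error principle can be illustrated with a deliberately small autoencoder. The sketch below is not the DAN used in this study: it has a single hidden layer, uses random weight initialization instead of RBM pretraining, and is trained by plain full-batch gradient descent. The synthetic data, the layer sizes, the learning rate and all function names are assumptions chosen only to keep the example short and runnable.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(x, n_hidden=2, lr=0.1, epochs=3000, seed=0):
    """Single-hidden-layer autoencoder, random init (no RBM pretraining)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)           # encoder
        out = h @ w2 + b2                  # linear decoder
        err = out - x
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Back-propagate the gradient of 0.5 * mean squared reconstruction error.
        g_out = err / n
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_a = (g_out @ w2.T) * h * (1.0 - h)
        g_w1 = x.T @ g_a
        g_b1 = g_a.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses

def reconstruction_error(x, params):
    """Squared reconstruction error per sample, used as the anomaly index."""
    w1, b1, w2, b2 = params
    out = sigmoid(x @ w1 + b1) @ w2 + b2
    return np.sum((out - x) ** 2, axis=1)

# Synthetic "background": five variables driven by two latent factors.
rng = np.random.default_rng(1)
u, v = rng.uniform(size=200), rng.uniform(size=200)
background = np.column_stack([u, u, v, v, 0.5 * (u + v)])
background += rng.normal(0.0, 0.02, size=background.shape)
# One synthetic "anomaly" that breaks the background correlation structure.
anomaly = np.array([[1.0, 0.0, 1.0, 0.0, 0.5]])
x = np.vstack([background, anomaly])

params, losses = train_autoencoder(x)
errors = reconstruction_error(x, params)
print(errors[-1], np.median(errors[:-1]))
```

The single anomalous sample contributes little to the training loss, so the network mainly learns the background structure, and the anomalous sample ends up with a reconstruction error above the typical background error.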
4. Results and discussion
Mapping geochemical exploration data plays a critical role in mineral exploration. Various methods have been successfully applied to handle geochemical exploration data. These include traditional methods such as mean ± 2 × standard deviation (Hawkes and Webb, 1962), probability graphs (Sinclair, 1974), exploratory data analysis (Tukey, 1977), geostatistics (Matheron, 1962), the gap statistic (Miesch, 1981; Wang and Zuo, 2016), multivariate statistics (Grunsky, 2010; Zuo, 2011), fractal/multifractal models (Cheng et al., 1994; 2000; Cheng, 2007; Zuo and Wang, 2016), and ML algorithms (Chen et al., 2014, 2017; Xiong and Zuo, 2016; Zuo and Xiong, 2018; Zuo et al., 2019).
In general, in the field of exploration geochemistry, the main aims of mapping geochemical exploration data are to explore spatial geochemical patterns, reveal geochemical element associations related to mineralization, and identify geochemical anomalies associated with mineralization in support of mineral exploration. In this section, ESDA, RPCA, and DAN are employed as representative tools for data statistics, data mining, and data insight and prediction, respectively, to process geochemical exploration data and identify geochemical anomalies.
4.1. ESDA
The major ore-forming elements Sn and Cu were selected as examples for spatial autocorrelation analysis using the local Moran's I index, computed in ArcGIS™ 10.2. Following the First Law of Geography (Tobler, 1970), the weights matrix for the cluster and outlier analysis was created based on the inverse Euclidean distance between stream sediment geochemical sampling points. The distance band (threshold) was set to 20 km, so features more than 20 km from a target feature were ignored in the analysis for that feature.
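A spatial weights matrix of the kind described above, with inverse Euclidean distance inside a fixed distance band and zero weight beyond it, can be sketched as follows. The function name and the four sample coordinates (in km) are hypothetical, and details of the ArcGIS implementation, such as any row standardization, are not reproduced here.

```python
import numpy as np

def inverse_distance_weights(coords, band=20.0):
    """w_ij = 1 / d_ij for 0 < d_ij <= band, and 0 otherwise (coords and band in km)."""
    xy = np.asarray(coords, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise Euclidean distances
    w = np.zeros_like(dist)
    inside = (dist > 0) & (dist <= band)         # exclude self-pairs and far points
    w[inside] = 1.0 / dist[inside]
    return w

# Four hypothetical sampling points; the third lies more than 20 km from all others.
coords = [(0.0, 0.0), (3.0, 4.0), (30.0, 0.0), (0.0, 12.0)]
w = inverse_distance_weights(coords, band=20.0)
print(w[0, 1])      # 0.2, since the two points are 5 km apart
print(w[2].sum())   # 0.0, since no point lies within the 20 km band
```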
There were a total of 593 points. For Sn, 538 points were classified as neither outliers nor clusters, and 51 HH clusters, 3 LH outliers and 1 HL outlier were detected (Fig. 5a). Most of the known Sn polymetallic mineralization in the eastern part of the district is located around the HH clusters. For Cu, 554 points were classified as neither outliers nor clusters, and 37 HH clusters, 1 LH outlier and 1 HL outlier were detected (Fig. 5b). The cluster and outlier pattern of Cu is similar to that of Sn: more than 89% (33/37) of the Cu HH clusters were also classified as Sn HH clusters, and the Cu HL outlier is at the same location as the Sn HL outlier. This implies that Cu and Sn are spatially correlated and could be related in origin (Cheng and Mao, 2010; Cheng et al., 2012).
4.2. RPCA
The major ore-forming elements Ag, Au, As, Bi, Cd, Co, Cu, Hg, Mo, Ni, Pb, Sb, Sn, W, and Zn were selected to reveal the elemental association related to Sn polymetallic mineralization. Here, the RPCA method first opened the raw data using the ilr transformation to address the data closure problem, and then combined the multiple geochemical variables (Zuo et al., 2013). The results of RPCA on the ilr-transformed geochemical data suggest that the negative loadings of PC2, with the assemblage of Ag, Bi, Co, Cu, Ni, Pb, Sn, W, and Zn, may represent Sn polymetallic mineralization (Fig. 6a and Table 1). Areas with high PC2 scores, which show a strong spatial correlation with the Sn polymetallic mineralization, are mainly distributed in the eastern part of the study area (Fig. 6b). The areas with the highest PC2 scores that occupy 5% and 10% of the total area contain 34.1% and 54.5% of the known Sn deposits, respectively. The PC2 scores in the western part of the study area are relatively low compared to those in the eastern part. The high geochemical background in the eastern part might inhibit the recognition of relatively weak anomalies in the western part of the study area. A previous study suggested that the western part of the area has great potential for undiscovered Sn deposits (Cheng, 2007). Therefore, the elemental association anomalies based on the assemblage of Ag, Bi, Co, Cu, Ni, Pb, Sn, W, and Zn should be further studied in this area.
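Statements of the form "the top p% of the area contains q% of the known deposits" can be computed from a gridded score map as sketched below. The sketch assumes equal-area cells, breaks ties arbitrarily, and uses a made-up 20-cell example; the function name `capture_rate` and the data are hypothetical and are not the values from this study.

```python
import numpy as np

def capture_rate(scores, has_deposit, area_fraction):
    """Fraction of known deposits located in the highest-scoring share of cells."""
    scores = np.asarray(scores, dtype=float)
    has_deposit = np.asarray(has_deposit, dtype=bool)
    n_top = int(round(area_fraction * scores.size))   # number of cells in the target area
    top_cells = np.argsort(scores)[::-1][:n_top]      # indices of the highest scores
    return has_deposit[top_cells].sum() / has_deposit.sum()

# Toy map: 20 equal-area cells with scores 0..19 and four known deposits.
scores = np.arange(20)
has_deposit = np.zeros(20, dtype=bool)
has_deposit[[19, 18, 10, 3]] = True

print(capture_rate(scores, has_deposit, 0.10))  # 0.5  (2 of 4 deposits in the top 2 cells)
print(capture_rate(scores, has_deposit, 0.50))  # 0.75 (3 of 4 deposits in the top 10 cells)
```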
4.3. DAN
Based on the results of RPCA, Ag, Bi, Co, Cu, Ni, Pb, Sn, W, and Zn were selected for further detection of geochemical anomalies associated with Sn mineralization via DAN. The anomaly recognition index of DAN is the reconstruction error, which is high for geochemical anomaly samples and low for geochemical background samples. Choosing appropriate model parameters is critical to the performance of DAN. Based on a number of experiments, the number of input layer units of the DAN was set to 5, and the number of hidden layer units was set to 5, 10, 20 and 40, respectively. The number of iterations and the learning rate of the model were fixed at 200 and 0.3, respectively. The detailed procedure for selecting optimal DAN parameters is described in Xiong and Zuo (2016).
With appropriate model parameters chosen, the reconstruction error for each cell in the study area was calculated by DAN. The extracted elemental association anomalies based on the assemblage of Ag, Bi, Co, Cu, Ni, Pb, Sn, W, and Zn occur not only in the eastern part of the area, where most of the known Sn polymetallic deposits fall within the target areas, but also in a number of target areas delineated in the western part, where there have not yet been any significant discoveries (Fig. 7). One reason may be the strong ability of deep learning to automatically extract high-level features from complex geochemical data, whose distributions are neither normal nor log-normal but strongly skewed and multi-modal, owing to various complex geological processes, complex erosion processes, and the influence of the composition and distribution of regolith and bedrock (Reimann and Filzmoser, 2000; Reimann et al., 2002; Spadoni, 2006; Yousefi et al., 2013).
The Student’s t value was used to evaluate whether the anomalies obtained by DAN are spatially correlated with the locations of known Sn polymetallic mineralization. In general, a Student’s t value > 1.96 suggests a statistically significant correlation between anomalies and the known mineralization at the 95% confidence level. The larger the Student’s t value, the stronger the spatial correlation (Bonham-Carter, 1994). Fig. 8 shows that the maximum Student’s t value reaches 5.84, which is higher than 1.96, suggesting a strong spatial correlation between the obtained anomalies and the locations of known Sn polymetallic mineralization.
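The text does not spell out how the Student’s t value is computed. A common choice in this setting, assumed here, is the studentized contrast of the weights-of-evidence method (Bonham-Carter, 1994): the contrast C = W⁺ − W⁻ divided by its standard deviation, evaluated for the binary pattern obtained at a given anomaly threshold. The sketch below uses cell counts; the function name and the example counts are hypothetical, and all four cell counts must be positive.

```python
import math

def studentized_contrast(n_cells, n_anomaly, n_deposit, n_both):
    """Weights-of-evidence contrast C and its studentized value t = C / s(C)."""
    n_b_d = n_both                                        # anomalous cells with a deposit
    n_b_nd = n_anomaly - n_both                           # anomalous cells without a deposit
    n_nb_d = n_deposit - n_both                           # background cells with a deposit
    n_nb_nd = n_cells - n_anomaly - n_deposit + n_both    # background cells without a deposit
    n_no_deposit = n_cells - n_deposit

    w_plus = math.log((n_b_d / n_deposit) / (n_b_nd / n_no_deposit))
    w_minus = math.log((n_nb_d / n_deposit) / (n_nb_nd / n_no_deposit))
    contrast = w_plus - w_minus
    # Variance of the contrast is the sum of the variances of the two weights.
    s_contrast = math.sqrt(1 / n_b_d + 1 / n_b_nd + 1 / n_nb_d + 1 / n_nb_nd)
    return contrast, contrast / s_contrast

# Hypothetical counts: 1000 cells, 100 anomalous, 40 with deposits, 20 in both.
contrast, t_value = studentized_contrast(1000, 100, 40, 20)
print(round(contrast, 3), t_value > 1.96)
```

Repeating this calculation over a range of anomaly thresholds gives a curve of t values, and the threshold with the maximum t value is the one most strongly associated with the known mineralization.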
In addition, from a geological point of view, most of the areas with high values occur either within the Gejiu Formation or around the Gejiu Batholith, and other anomalous areas develop along structures (Fig. 7). These geological factors played a critical role in the formation of Sn polymetallic mineralization. The Gejiu Batholith provided fluids, heat, and part of the metals (Cheng et al., 2012), the Gejiu Formation offered depositional space for metals (Cheng et al., 2012, 2013b), and faults served as pathways for hydrothermal fluids and as space for the deposition of ore minerals, such as the six E–W trending faults in the Laochang Sn–W–Cu polymetallic deposit in this district (Sun et al., 1987; Jiang et al., 1997; Cheng et al., 2013a, 2013b). These observations indicate that the anomalies detected in this study have a close spatial association with Sn polymetallic mineralization. Therefore, the anomalous areas delineated by DAN in the western part can be considered favorable prospects for undiscovered Sn polymetallic deposits.
5. Conclusions
Geodata science is a discipline that deals with and mines geoscience data in order to derive the geoinformation and geoknowledge of interest. In this paper, a case study on mapping geochemical exploration data was reported to demonstrate the new research paradigm of GDS in geoscience. The following conclusions can be drawn:
(1) GDS is the science of studying and mining geospatial patterns, and can derive meaningful and previously unknown geoinformation and geoknowledge;
(2) GDS is a new research paradigm in geoscience and can be used to explore the spatial association of geochemical patterns, mine elemental associations, and recognize geochemical anomalies associated with mineralization via geo-computation and geo-visualization techniques in support of mineral exploration; and
(3) In processing stream sediment geochemical data in the Gejiu district, Yunnan Province, China via GDS, we found that Sn and Cu have a close spatial association with each other, and that Ag, Bi, Co, Cu, Ni, Pb, Sn, W, and Zn can be regarded as pathfinder elements for Sn polymetallic mineralization. The anomalies obtained in this study provide a clue for the next round of mineral exploration in the study area.
Acknowledgements
Thanks are due to two reviewers for their comments and suggestions, which helped us improve this study. This study was jointly supported by the National Natural Science Foundation of China (Nos. 41972303 and 41772344), and the MOST Special Fund from the State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences (MSFGPMR03-3).
References
Anselin, L., 1995. Local indicators of spatial association—LISA. Geographical Analysis 27, 93–115.
Anselin, L., 1999. Interactive techniques and exploratory spatial data analysis. Geographical Information Systems: Principles, Techniques, Management and Applications 1, 251–264.
Baumann, P., Mazzetti, P., Ungar, J., Barbera, R., Barboni, D., Beccati, A., Campalani, P., 2016. Big data analytics for earth sciences: the EarthServer approach. International Journal of Digital Earth 9, 3–29.
Bonham-Carter, G.F., 1994. Geographic information systems for geoscientists: modeling with GIS. Pergamon Press, Oxford. 398 pp.
Chen, L., Guan, Q., Xiong, Y., Liang, J., Wang, Y., Xu, Y., 2019a. A spatially constrained multi-autoencoder approach for multivariate geochemical anomaly recognition. Computers & Geosciences 125, 43–54.
Chen, L., Guan, Q., Feng, B., Yue, H., Wang, J., Zhang, F., 2019b. A multi-convolutional autoencoder approach to multivariate geochemical anomaly recognition. Minerals 9, 270.
Chen, Y., Lu, L., Li, X., 2014. Application of continuous restricted Boltzmann machine to identify multivariate geochemical anomaly. Journal of Geochemical Exploration 140, 56–63.
Chen, Y., Wu, W., 2017. Application of one-class support vector machine to quickly identify multivariate anomalies from geochemical exploration data. Geochemistry: Exploration, Environment, Analysis 17, 231–238.
Cheng, Q., 2007. Mapping singularities with stream sediment geochemical data for prediction of undiscovered mineral deposits in Gejiu, Yunnan Province, China. Ore Geology Reviews 32, 314–324.
Cheng, Q., Agterberg, F.P., Ballantyne, S.B., 1994. The separation of geochemical anomalies from background by fractal methods. Journal of Geochemical Exploration 51, 109–130.
Cheng, Q., Xu, Y., Grunsky, E., 2000. Integrated spatial and spectrum method for geochemical anomaly separation. Natural Resources Research 9, 43–52.
Cheng, Y., Mao, J., 2010. Age and geochemistry of granites in Gejiu area, Yunnan province, SW China: constraints on their petrogenesis and corresponding tectonic setting. Lithos 120, 258–276.
Cheng, Y., Mao, J., Rusk, B., Yang, Z., 2012. Geology and genesis of Kafang Cu–Sn deposit, Gejiu district, SW China. Ore Geology Reviews 48, 180–196.
Cheng, Y., Mao, J., Spandler, C., 2013a. Petrogenesis and geodynamic implications of the Gejiu igneous complex in the western Cathaysia block, South China. Lithos 175–176, 213–229.
Cheng, Y., Mao, J., Chang, Z., Pirajno, F., 2013b. The origin of the world class tin-polymetallic deposits in the Gejiu district, SW China: Constraints from metal zoning characteristics and 40Ar–39Ar geochronology. Ore Geology Reviews 53, 50–62.
Cleveland, W.S., 2014. Data science: an action plan for expanding the technical areas of the field of statistics. Statistical Analysis and Data Mining, 414–417.
Dai, F., 1996. Characteristics and evolution of rock series, lithogenesis, metallogenesis of crust-derived anatectic magma in Gejiu ore field. Geol. Yunnan 15, 330–344 (in Chinese with English abstract).
de Caritat, P., Cooper, M., Pappas, W., Thun, C., Webber, E., 2010. National Geochemical Survey of Australia: Analytical Methods Manual. Geoscience Australia Record 2010/15, 22 pp.
Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C., 2003. Isometric logratio transformations for compositional data analysis. Mathematical Geology 35, 279–300.
Egozcue, J.J., Pawlowsky-Glahn, V., 2005. Groups of parts and their balances in compositional data analysis. Mathematical Geology 37, 795–828.
Filzmoser, P., Hron, K., Reimann, C., 2009. Principal component analysis for compositional data with outliers. Environmetrics 20, 621–632.
Fiore, U., Palmieri, F., Castiglione, A., De Santis, A., 2013. Network anomaly detection with the restricted Boltzmann machine. Neurocomputing 122, 13–23.
Grunsky, E.C., 2010. The interpretation of geochemical survey data. Geochemistry: Exploration, Environment, Analysis 10, 27–74.
Hawkes, H.E., Webb, J.S., 1962. Geochemistry in mineral exploration. Harper and Row, New York, NY.
Hayashi, C., 1998. What is data science? Fundamental concepts and a heuristic example. In Data science, classification, and related methods (pp. 40–51). Springer, Tokyo.
Hinton, G.E., Salakhutdinov, R.R., 2006. Reducing the dimensionality of data with neural networks. Science 313, 504–507.
Jiang, Z.W., Nicholas, H.S.O., Teren, D.B., 1997. Numerical modeling of fault-controlled fluid flow in the genesis of tin deposits of the Malage ore field Gejiu mining district. Economic Geology 92, 228–247.
Jolliffe, I.T., 2002. Principal Component Analysis, 2nd edn. Springer, New York, NY. 487 pp.
Karpatne, A., Ebert-Uphoff, I., Ravela, S., Babaie, H.A., Kumar, V., 2019. Machine learning for the geosciences: Challenges and opportunities. IEEE Transactions on Knowledge and Data Engineering 31, 1544–1554.
Li, K., 2007. Integrated information metallogenic prediction of tin-polymetallic deposit in western Gejiu, Yunnan. A Dissertation Submitted to China University of Geosciences for the Degree of Master of Engineering (in Chinese with English abstract).
Li, S., Chen, J., Xiang, J., 2019. Applications of deep convolutional neural networks in prospecting prediction based on two-dimensional geological big data. Neural Computing and Applications. https://doi.org/10.1007/s00521-019-04341-3.
Ma, X., 2018. Data science for geoscience: leveraging mathematical geosciences with semantics and open data. In Handbook of Mathematical Geosciences, pp. 687–702. Springer, Cham.
Mao, J., Cheng, Y., Guo, C., Yang, Z., Zhao, H., 2008. Gejiu tin polymetallic ore-field: deposit model and discussion. Acta Geologica Sinica 81, 1456–1468 (in Chinese with English abstract).
Matheron, G., 1962. Traité de géostatistique appliquée. Editions Technip.
Mattmann, C.A., 2013. A vision for data science. Nature 493, 473–475.
Miesch, A.T., 1981. Estimation of the geochemical threshold and its statistical significance. Journal of Geochemical Exploration 16, 49–76.
Naur, P., 1974. Concise Survey of Computer Methods. Petrocelli Books. 397 p.
Qin, D., Li, Y., 2008. Studies on the Geology of the Gejiu Sn–Cu Deposit. Science Press, Beijing, pp. 1–180 (in Chinese with English abstract).
Reimann, C., Filzmoser, P., 2000. Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data. Environmental Geology 39, 1001–1014.
Reimann, C., Filzmoser, P., Garrett, R.G., 2002. Factor analysis applied to regional geochemical data: problems and possibilities. Applied Geochemistry 17, 185–206.
Reimann, C., Filzmoser, P., Garrett, R., Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Chichester. 362 pp.
Sakurada, M., Yairi, T., 2014. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, pp. 4.
Sinclair, A.J., 1974. Selection of threshold values in geochemical data using probability graphs. Journal of Geochemical Exploration 3, 129–149.
Spadoni, M., 2006. Geochemical mapping using a geomorphologic approach based on catchments. Journal of Geochemical Exploration 90, 183–196.
Sun, J., Jiang, Z., Lei, Y., 1987. Structure-geochemistry of Malage deposit in Gejiu district. Geochimica 4, 303–311 (in Chinese with English abstract).
Sun, J., Steinecker, A., Glocker, P., 2014. Application of deep belief networks for precision mechanism quality inspection. In: International Precision Assembly Seminar, 87–93.
Symanzik, J., 2014. Exploratory spatial data analysis. In: Fischer, M.M., Nijkamp, P. (Eds.), Handbook of Regional Science. https://doi.org/10.1007/978-3-642-36203-3_76-1.
Tansley, S., Tolle, K.M., 2009. The fourth paradigm: data-intensive scientific discovery, Vol. 1. A.J. Hey. Redmond, WA: Microsoft Research.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46, 234–240.
Tukey, J.W., 1977. Exploratory data analysis. Addison Wesley, Reading.
Valentine, A.P., Trampert, J., 2012. Data space reduction, quality assessment and searching of seismograms: autoencoder networks for waveform data. Geophysical Journal International 189, 1183–1202.
Wang, J., Zuo, R., 2016. An extended local gap statistic for identifying geochemical anomalies. Journal of Geochemical Exploration 164, 86–93.
Xie, X., Mu, X., Ren, T., 1997. Geochemical mapping in China. Journal of Geochemical Exploration 60, 99–113.
Xie, X., Wang, X., Zhang, Q., Zhou, G., Cheng, H., Liu, D., Cheng, Z., Xu, S., 2008. Multiscale geochemical mapping in China. Geochemistry: Exploration, Environment, Analysis 8, 333–341.
Xiong, Y., Zuo, R., 2016. Recognition of geochemical anomalies using a deep autoencoder network. Computers & Geosciences 86, 75–82.
Xiong, Y., Zuo, R., Carranza, E.J.M., 2018. Mapping mineral prospectivity through big data analytics and a deep learning algorithm. Ore Geology Reviews 102, 811–817.
Yousefi, M., Carranza, E.J.M., Kamkar-Rouhani, A., 2013. Weighted drainage catchment basin mapping of geochemical anomalies using stream sediment data for mineral potential modeling. Journal of Geochemical Exploration 128, 88–96.
Yu, C., Tang, Y., Shi, P., Deng, B., 1988. The dynamic system of endogenic ore formation in Gejiu tin–polymetallic ore region, Yunnan Province. China University of Geosciences Press, Wuhan, China, 394 pp. (in Chinese with English abstract).
Zhao, J., Chen, S., Zuo, R., 2016. Identifying geochemical anomalies associated with Au–Cu mineralization using multifractal and artificial neural network models in the Ningqiang district, Shaanxi, China. Journal of Geochemical Exploration 164, 54–64.
Zuo, R., 2011. Identifying geochemical anomalies associated with Cu and Pb–Zn skarn mineralization using principal component analysis and spectrum–area fractal modeling in the Gangdese belt, Tibet (China). Journal of Geochemical Exploration 111, 13–22.
Zuo, R., Xia, Q., Wang, H., 2013. Compositional data analysis in the study of integrated geochemical anomalies associated with mineralization. Applied Geochemistry 28, 202–211.
Zuo, R., 2014. Identification of geochemical anomalies associated with mineralization in the Fanshan district, Fujian, China. Journal of Geochemical Exploration 139, 170–176.
Zuo, R., Wang, J., 2016. Fractal/multifractal modeling of geochemical data: a review. Journal of Geochemical Exploration 164, 33–41.
Zuo, R., 2017. Machine learning of mineralization-related geochemical anomalies: a review of potential methods. Natural Resources Research 26, 457–464.
Zuo, R., Xiong, Y., 2018. Big data analytics of identifying geochemical anomalies supported by machine learning methods. Natural Resources Research 27, 5–13.
Zuo, R., 2018. Selection of an elemental association related to mineralization using spatial analysis. Journal of Geochemical Exploration 184, 150–157.
Zuo, R., Xiong, Y., Wang, J., Carranza, E.J.M., 2019. Deep learning and its application in geochemical mapping. Earth-Science Reviews 192, 1–14.
Figure and Table captions
Figure 1. Geodata science as an interdisciplinary subject of geoscience and data science.
Figure 2. The data chain of geodata science.
Figure 3. Geodata science tools.
Figure 4. Simplified geological map of the Gejiu region, Yunnan Province, China (after Li, 2007).
Figure 5. Cluster and outlier analysis of (a) Sn and (b) Cu.
Figure 6. Results of robust principal component analysis: (a) biplot of PC1 and PC2, and (b) the spatial distribution of PC2.
Figure 7. Geochemical anomalies detected by the deep autoencoder network.
Figure 8. (a) Plot of Student's t-values vs. geochemical anomaly, and (b) geochemical anomaly map.
Table 1. Loading values of principal component analysis.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5 (panels a and b)
Figure 6 (panels a and b)
Figure 7
Figure 8 (panels a and b; panel a axes: t-values and geochemical anomaly)
Table 1
Elements    PC1       PC2       PC3
Ag          0.022     -0.094    0.411
As          -0.275    0.327     -0.256
Au          0.082     -0.002    -0.026
Bi          -0.470    -0.227    0.012
Cd          0.164     0.058     0.512
Co          0.295     -0.238    -0.091
Cu          0.228     -0.249    -0.441
Hg          0.168     0.369     0.243
Mo          0.130     0.387     -0.086
Ni          0.370     -0.181    -0.339
Pb          -0.281    -0.145    0.123
Sb          -0.079    0.522     -0.213
Sn          -0.306    -0.232    -0.068
W           -0.314    -0.113    -0.012
Zn          0.266     -0.182    0.230
Highlights
GDS is the science of studying and mining geospatial patterns, and it can derive meaningful and previously unknown geoinformation and geoknowledge.
GDS is a new research paradigm in geoscience and can be used for geochemical mapping in support of mineral exploration.
GDS can reveal spatial associations, identify elemental associations, and recognize geochemical anomalies.