Unsupervised clustering and empirical fuzzy memberships for mineral prospectivity modelling

Accepted Manuscript

Johanna Torppa, Vesa Nykänen, Ferenc Molnár

PII: S0169-1368(18)30519-5
DOI: https://doi.org/10.1016/j.oregeorev.2019.02.007
Reference: OREGEO 2822

To appear in: Ore Geology Reviews

Received Date: 26 June 2018
Revised Date: 28 January 2019
Accepted Date: 7 February 2019

Please cite this article as: J. Torppa, V. Nykänen, F. Molnár, Unsupervised clustering and empirical fuzzy memberships for mineral prospectivity modelling, Ore Geology Reviews (2019), doi: https://doi.org/10.1016/j.oregeorev.2019.02.007

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Johanna Torppa, Vesa Nykänen and Ferenc Molnár
Geological Survey of Finland

Abstract

We propose to increase the role of empirical methods in mineral prospectivity modelling for two reasons: 1) to make use of data more effectively and 2) to decrease the effect of subjectivity included in expert interpretation. We present two approaches for using known mineral occurrences to define the relationship between observed or measured geoscientific parameters and the occurrence of mineralizations. In the first approach, we define the fuzzy memberships of each geoscientific parameter separately for fuzzy logic modelling. Our approach proves to be highly useful for investigating the quality of the data in addition to defining the membership transformation functions. In our test case, the data are somewhat scattered due to the inherent variability of ore-forming environments, and manual evaluation was required to guide the computations. For the second approach, we present a technique for delineating non-prospective regions so that more detailed prospectivity modelling can be focused on potentially prospective regions. Our study not only highlights the advantages of using computational methods in prospectivity modelling, but also emphasizes the important role of geological expertise in the modelling process.

Keywords: Mineral prospectivity, empirical, clustering, fuzzy, self-organizing maps, membership function

1 Introduction

The Central Lapland Greenstone Belt (CLGB) in northern Finland (i.e., the study area) contains a large number of ore deposits of various types (e.g., 10 orogenic Au, 6 magmatic Ni-Cu-PGE, 5 magmatic Cr-PGE, 9 magmatic Fe-Ti-V, 15 IOCG, 2 VMS and 6 BIF deposits as well as 4–5 mineral deposits with polygenetic or other hydrothermal origin) and a large number of ore occurrences. Although the belt is largely concealed by Quaternary till, there is an excellent coverage of geological, geochemical and geophysical information on the bedrock due to intense exploration and mining activities during the past decades. As such, the study area is well suited to developing prospectivity modelling methods, studying ore-forming processes and defining parameters that are applicable to regions geologically similar to the CLGB. Our study is focused on the prospectivity modelling of orogenic gold deposits and occurrences, which are especially abundant and economically important in the region, as the largest primary gold producer in Europe is located in the northern part of the CLGB (the Kittilä mine at the Suurikuusikko gold deposit; Wyche et al., 2015). According to Niiranen et al. (2014), up to 30 times the currently reported gold resources are still undiscovered within the CLGB. The prospectivity of orogenic gold in the CLGB has previously been investigated using modern GIS-based computational methods and various geoscientific data sets in several studies (Nykänen and Salmirinne, 2007; Nykänen et al., 2008b; Nykänen, 2008).

Computational prospectivity modelling has significantly improved during the past decades, along with the evolution of geographical information system (GIS) platforms, and this approach is currently able to efficiently integrate input from various data sources. An extensive overview of the large number of studies carried out in this field of research has been provided by Carranza (2017). Despite the active development of computational methods, practical prospectivity modelling still includes a considerable amount of manual work and expert interpretation, part of which could be transferred to more easily traceable computational procedures.

One of the most commonly used methods in current prospectivity modelling involves fuzzy logic, which was first applied in mineral exploration by An et al. (1991). A crucial step in fuzzy logic modelling is the definition of continuous favourability values, or memberships, of various geoscientific evidence features. Logistic functions are popular for transforming the evidence data values to membership values, usually ranging from zero to one. Function parameters are often derived from the evidence data using expert judgement (Nykänen et al.
2008a and Nykänen et al. 2017). The problem with evidence memberships constantly ranging from zero to one is that all the evidence is then expected to have an equally strong relation to the occurrence of minerals, which is not the case in reality. Yousefi and Nykänen (2016) proposed an approach that uses the range of the evidence data to define the parameters of the fuzzy membership transformation function and allows the minimum and maximum values of the membership to deviate from zero and one, respectively. However, their method does not take into account the distribution of the data within its range, and the obtained membership values are difficult to interpret as favourability. Yousefi et al. (2012) proposed an approach for geochemical data that uses factor scores, computed for a set of several elements, to define fuzzy memberships for the geochemical factors. Methods using known mineral occurrences as ground truth have also been used for defining the fuzzy memberships, but with discretized evidence features, which does not fully take advantage of the information content of the data. For example, Knox-Robinson and Wyborn (1997) and Parsa et al. (2017) computed the relative abundance of ore occurrences for categorical and discretized data using the proportion of the study area covered by each evidence feature category and the proportion of known occurrences representing this category. Cheng and Agterberg (1999), Porwal et al.
(2003) and Zhang et al. (2017) used the contrast value from the weights of evidence method to define the fuzzy value for each category of a discretized evidence feature. Mineral prospectivity modelling (MPM) is commonly conducted for an area bounded by governmental borders or environmental features, or it can simply be a square area around the region of interest. Furthermore, the availability of data restricts the area that can be mapped. Whatever the delineated area is, prospectivity scores are usually computed for the entire area, although large parts of it could be considered non-prospective. MPM approaches do not generally produce large areas with zero prospectivity, and it is not straightforward to interpret the threshold for “non-prospective” on a prospectivity map computed for the entire study region. For decision making in land-use planning, however, it would be useful to be able to clearly distinguish the non-prospective areas from possibly prospective ones. In the context of quantitative mineral resource assessment (Singer, 1993), a region defined as a permissive tract is manually outlined based on the local geology. Such an approach could also be applied in prospectivity score computation. Permissive tracts could also be delineated computationally using data clustering to delineate geologically similar areas, and by classifying certain areas as non-prospective based on, for instance, the spatial distribution of known occurrences. Clustering of geoscientific data has been applied in prospectivity modelling by Abedi et al. (2012), who trained clusters using drill-core data to generate prospectivity scores, and by Torppa et al. (2015), who used the clusters to describe the spatial distribution of evidence data values and the quantization error of clustering to define the locations of anomalies. These studies suggest that clustering of geophysical and geochemical data can be used to delineate different geological units. 
To concentrate the computation of prospectivity scores on the potentially prospective region, we propose clustering of the data prior to prospectivity score computation, and masking out clusters that have no known occurrences and that are considered non-prospective based on the data distribution within the cluster. We use two unsupervised clustering methods, self-organizing maps and k-means, to cluster the data, and train the clusters using known mineral occurrences. To compute the prospectivity scores for the masked area, we use fuzzy logic in a data-driven form, introducing an empirical approach to define fuzzy memberships that does not break the continuity of the input data.
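The masking idea outlined above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it uses a plain Lloyd's k-means with a deterministic farthest-first seeding (an assumption made here for reproducibility), applies only the first masking criterion (a cluster is kept if it contains at least one known occurrence), and omits the additional check on the data distribution within each cluster. The function names and the toy feature vectors are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_farthest(points, k):
    """Deterministic farthest-first seeding (illustrative assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def prospective_mask(labels, occurrence_cells):
    """True for cells whose cluster contains at least one known occurrence."""
    keep = {labels[i] for i in occurrence_cells}
    return [lab in keep for lab in labels]


# Toy example: nine grid cells described by two (hypothetical) evidence features.
cells = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
         (5.0, 5.1), (5.1, 5.0), (4.9, 5.0),
         (10.0, 0.0), (10.1, 0.1), (9.9, 0.2)]
labels = kmeans(cells, k=3)
mask = prospective_mask(labels, occurrence_cells=[0, 4])  # known occurrences
print(mask)
```

In this toy case the third group of cells contains no known occurrence and is therefore masked out. With real evidence grids the features would normally be standardized first, and the number of clusters would have to be chosen and evaluated separately.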

2 Terminology and structure of the paper

We use the term mineral prospectivity modelling instead of mineral prospectivity mapping. The central idea is to develop a prospectivity model that can be applied to greenfield regions. Modelling here mainly refers to defining the membership functions and parameters that can be used to produce prospectivity maps for greenfields. The model parameters can and should be refined using data from other brownfield areas.

The term input data refers to the data derived from measurements and observations, for which basic processing has already been carried out. Basic processing is considered to include the minimum processing needed to make the measurements useful (e.g., removal of instrument effects in geophysical measurements). Evidence features refer to the pre-processed input data. Pre-processing includes interpolation, gradient computation and filtering. The grids derived from the evidence features that are finally integrated to produce the prospectivity map (fuzzy integration) are here called fuzzy memberships. Training data or training points refer to known mineral occurrences. For the region for which we compute the prospectivity scores, we use the term prospective tract instead of permissive tract, which is used in the context of quantitative mineral resource assessment. A permissive tract is defined, using expert judgement, as a region that is favourable for mineral deposition based on its geology (Singer, 1993), while a prospective tract is empirically defined. The better the expert judgement in the delineation of the permissive tract is, and the better the data for the delineation of the prospective tract are, the more similar these two regions will be.

This paper first describes the geological environment of the study area, along with the measurements carried out, in Section 3. Section 4 describes the preprocessing of the data and the evidence data used for data integration. Sections 5 and 6 present the two stages of the MPM process in which novel approaches are introduced, namely the delineation of the prospective tract and computation of the prospectivity model. Results are also presented in these sections. Finally, discussion and conclusions are provided in Section 7.

3 Geological setting and input data

Although careful choice of the data analysis approach is important, the input data have the dominant effect on the prospectivity solution. Expertise on the geology of the study area and on the properties of the mineral deposit type must be used to define which geophysical and geochemical phenomena are connected with the occurrence of the mineral and the deposit type of interest. The mineral system approach and analysis (Wyborn et al., 1994; McCuaig et al., 2010; Hagemann et al., 2016; Wyman et al., 2016) was used to define those mappable critical geological and geophysical footprints of orogenic gold deposits in the Central Lapland Greenstone Belt that can be used as input parameters for prospectivity mapping.

Our study area extends from the Finnish–Swedish border 180 km to the east and 110 km in a north–south direction. It centres on the Sirkka Shear Zone (SSZ) in the Central Lapland Greenstone Belt (CLGB) and covers most of the prospective Palaeoproterozoic CLGB, with a total area of 18515 km² (Figure 1). The Palaeoproterozoic rock sequences in the CLGB show a history of episodic sedimentation and volcanism during prolonged intracontinental and continental margin rifting from 2.4 to about 2.05 Ga. Rifting finally led to the development of narrow oceanic basins between the Kola, Karelia and Norrbotten terrains of Archean age at around 1.95 Ga (Lahtinen et al., 2005; Korja et al., 2006). Accretion and collision driven by E–W- and N–S-oriented compression and associated thrusting and metamorphism took place during the Lapland–Kola and Lapland–Savo orogenies between 1.92 and 1.88 Ga, the early stages of the diachronous and complex Svecofennian orogeny (1.92–1.8 Ga). In the central part of the CLGB, the peak metamorphic conditions correspond to mid- to upper greenschist facies, whereas the metamorphic grade increases away from these central areas and in close proximity to the syn- to post-orogenic granite plutons (Hölttä et al., 2007; Hölttä and Heilimo, 2017).

According to Weihed et al. (2005), all orogenic gold deposits in the Fennoscandian Shield are structurally controlled and are located in second- to lower-order shear or fault zones within local compressional to transpressional structures at the time of mineralization, being traditional representatives of this deposit type (Groves et al., 1998; McCuaig & Kerrich, 1998; Goldfarb et al., 2001). Furthermore, Weihed et al. (2005) described the most favourable sites for orogenic gold on a local scale as being: (i) pre-gold albitized units, (ii) competent units enveloped by less competent rocks and (iii) contact zones between chemically reactive rocks having a significant competency contrast with adjacent rocks. These features are not commonly shown on regional-scale geological maps, and we therefore use derivatives of geological and geophysical maps as proxies for them.

In addition to structural control, metamorphism can be considered as one of the key elements for orogenic gold mineral systems. According to Phillips & Powell (2010), amphibolite or higher facies metamorphic domains are the sources for the fluids, ligands and metals. Greenschist facies domains are found to be the most favourable for hosting the deposits. The orogenic gold deposits and occurrences in the CLGB are mainly hosted by the mafic to ultramafic volcanic and sedimentary sequences of the Savukoski Group and Kittilä Group. Orogenic gold occurrences and deposits are mostly confined to the areas of greenschist facies metamorphism, while higher grade rocks have very few gold occurrences. Gold mineralization is most typically hosted by the mafic to ultramafic volcanic and sedimentary rocks of the Savukoski Group, which were deposited during the ca.
2.05 Ga continental margin rifting, and basaltic and sedimentary rocks of the Kittilä Group, which were accumulated in oceanic basins at around 1.95 Ga. Therefore, the spatial distribution of the Savukoski and Kittilä Group rocks and their metamorphic grade are belt-scale critical parameters controlling the distribution of gold deposits. Most of the known deposits were formed during the late to post-orogenic extensional phase (1.8–1.76 Ga), but some of them possibly during the 1.92–1.88 Ga compression phase (Sorjonen-Ward et al., 1992, 2003; Eilu et al., 2007; Molnár et al., 2017). There is no spatial relationship between the distribution of gold deposits and localization of the syn- to post-orogenic granitoid plutons, although several gold deposits of the CLGB include altered and mineralized 1.91 Ga felsic porphyry dykes, and some early gold mineralization events (e.g. at the Suurikuusikko and Kuotko deposits) have the same ages (within the errors of the dating methods). The main 1.81–1.76 Ga age of gold mineralizing events temporally overlaps the crystallization age of the post-orogenic granitoid plutons (Molnár et al., 2017; Molnár et al., 2018).

The majority of the gold occurrences and deposits are associated with the NNW–SSE-oriented Sirkka Shear Zone (SSZ), which is the main south-dipping thrust fault system along the southern boundary of the CLGB (Figure 1). Another major structure controlling the localization of known deposits is the NNE–SSW-oriented Kiistala Shear Zone (KSZ, Fig. 1). According to the results of U–Pb and Re–Os dating, these shear zones formed during the early accretional/collisional orogenic stages and channelled repeated fluid flow events during the evolution of the Svecofennian orogeny (Molnár et al., 2017; Molnár et al., 2018). There are several shear zones and thrust fault systems in and around the CLGB with an orientation similar to the KSZ and SSZ. Some of the gold deposits are located within the main shear zones and others in subsidiary fault and shear zones within 3 km from them. This peculiarity and the pattern of shear zones are considered as critical parameters defining the localization of gold deposits in the CLGB.

The hydrothermal alteration zones with and without known gold mineralizations are also controlled by the major structures in the CLGB. Although potassium metasomatic alteration (sericitization, biotitization, K-feldspar alteration), carbonatization, tourmalinization and sulfurization are common footprints of orogenic gold deposits in the CLGB (Eilu, 2007), albitization is the most consistently observed alteration type of rocks in the CLGB, according to the field and drill-core observation records in GTK’s database. Albitization is usually a pre- or syn-ore type of alteration and is a good indicator of the major zones of fluid flow. Albitization, regardless of the protolith, produces a very competent rock, and the competency difference between adjacent lithological units is an important factor in the formation of local trap conditions for gold precipitation. The competency differences support the development of dilatational zones for focused fluid flow. The fluctuation of pressure in these zones favours phase separation (degassing, effervescence, boiling) in the hydrothermal fluids, which may lead to the precipitation of gold.
Therefore, the occurrence of albitites is also considered as a mappable critical parameter in the orogenic gold mineral system model for the CLGB. Hydrothermal alteration may also significantly change the magnetic and electrical properties of host rocks. Magnetic and conductive minerals such as monoclinic pyrrhotite and magnetite are also often present in the gold-bearing veins and altered rocks of the CLGB, while these minerals are usually absent in unmineralized zones. Therefore, filtered airborne magnetic anomalies (see below) appear to be useful for outlining hydrothermal zones with potential for gold mineralization. Significant competency differences also exist between mafic–ultramafic volcanic rocks and turbiditic or sulphidic–graphitic metasedimentary units of the Savukoski and Kittilä Groups. These original or juxtaposed lithological boundaries also control the localization of gold ore bodies in several deposits in the CLGB. Although the area is largely covered by the unconsolidated sediments of the last glaciation periods, the contacts between the volcanic and sedimentary units are well expressed by the pseudogravimetric gradient maxima.

3.1 Training data

We used the known orogenic gold deposits and occurrences available in the public database of the Geological Survey of Finland (http://gtkdata.gtk.fi/mdae/index.html) as positive training points (Table 1). Each occurrence is given a single location in the database, but we manually extended the occurrence boundaries to surrounding cells. For occurrences with unknown extents, artificial training points were added so that each occurrence covered 4–8 cells, depending on the reported location with respect to the cell boundaries. This rather non-specific approach can be justified by the following: 1) the occurrences are not point-like in reality, 2) most of the data we use are interpolated, causing only small changes from each pixel to the neighbouring ones, 3) the geophysical measurements are not point-like but are affected by rock properties in a certain footprint area, and 4) the geological data (polygon boundaries and lineaments) have spatial uncertainty considering the resolution of the evidence feature grids. The extent of the active Kittilä mine (Suurikuusikko deposit) was defined on the basis of the outlines of ore bodies. We consider the expanded training data set to better represent the evidence feature values than the original occurrence data, and to enable a more reliable determination of the relationships between evidence features and occurrences/deposits.

About half of the orogenic gold occurrences in the Central Lapland Greenstone Belt can be classified as gold-only occurrences (Table 1). However, there are also gold occurrences that contain base metals, most commonly copper, nickel and cobalt, as major commodities in association with gold. These occurrences are classified as orogenic gold occurrences with atypical/anomalous metal associations (Eilu, 2015). This variety of orogenic gold occurrences mostly occurs along the Sirkka Shear Zone in the southern part of the Central Lapland Greenstone Belt (Figure 1).

3.2 Input data for evidence features

A wide selection of geophysical, geochemical and geological data sets is available for the study area. For this study, we collected all the available information that, based on the general conceptual model of the orogenic gold mineral system, could be related to the occurrences. The input data that were used for generating the evidence features consist of geophysical measurements (magnetic, electromagnetic and gravity) and geological observations and models (rock type from drill cores and outcrops, structural data, metamorphic grade). The input data are presented below in Sections 3.2.1 and 3.2.2.

Till geochemical data, although extensively covering the study area and widely used in mineral exploration in the Central Lapland Greenstone Belt, have many challenges from the point of view of mineral prospectivity modelling, such as sparse sampling, till transportation and differing sampling depths. We checked how anomalies for a few elements (Au, Co, Cu, Ni, Te, Pd) in till reflect the occurrence of deposits, but did not find them useful considering our modelling approach. A palaeostress model proposed by Nykänen et al. (2008b) was considered but was excluded for the same reason as geochemistry, based on frequency-of-occurrence computation, as described in Section 6.1.

3.2.1 Geophysical measurements

The formation of orogenic gold deposits includes the interaction of fluids with the host rocks of mineralizations. This interaction alters the original mineralogy of the host rocks and may include the removal or addition of magnetite or pyrrhotite. Moreover, orogenic gold deposits are located along faults/shear zones along which rocks with contrasting magnetic properties are in contact (e.g. the contact of sulphide-rich black schists with mafic volcanic rocks or the contact of mafic volcanic rocks with different magnetite/pyrrhotite contents). Thus, abrupt changes in magnetic properties along fault/shear zones may indicate preferred sites of orogenic gold deposits. Electromagnetic anomalies in bedrock are caused by conductive minerals (graphite, many sulphides) and fractured rock with current-carrying fluids. Although the bedrock sources cannot be uniquely defined from electromagnetic data due to the small skin depth of the ground, the response from overburden and the magnetic properties of the bedrock, electromagnetic anomalies appear to correlate with known orogenic gold occurrences. Gravity is often affected by structures that are favourable for orogenic gold deposition. The density of host rocks also tends to be higher in areas containing gold mineralizations. For more detailed information, Airo (2015) provides a good overview of the geophysical signatures of mineral deposit types, as well as an extensive list of references in the field of research.

Airborne magnetic and electromagnetic measurements were carried out during the national airborne geophysical survey programme run by the Geological Survey of Finland (GTK) in 1973–2004 (Airo, 2005), prior to the extraction of minerals from currently active mines. Data were collected with 200-m line spacing at a nominal altitude of 30–40 m using a fixed-wing aircraft with vertical coplanar coils (coaxial until 1979).
The study area in the Central Lapland Greenstone Belt consists of 25 survey areas flown during 1975–2004. From the instrumentation statistics in Table 2, it is apparent that the quality of the measurements improved in the course of the survey: the number of frequencies for the electromagnetic measurements increased and the frequency of both magnetic and electromagnetic field registration increased. Furthermore, the caesium magnetometer used after 1992 improved the accuracy and coverage of the magnetic data. A number of reductions and corrections have been made to the raw airborne measurements prior to releasing them for scientific use. These basic quality control and data processing procedures are described in Hautaniemi et al. (2005) for the measurements in general, in Korhonen (2005) for magnetics and in Suppala et al. (2005) for electromagnetics. As the initial magnetic data in this study, we use the deviation of the local magnetic field from the Definitive Geomagnetic Reference Field, derived from the magnetic measurements. By electromagnetic data, we mean the real and imaginary components of the electromagnetic response from the ground.

The airborne geophysical data have many sources of uncertainty concerning the phenomena they are assumed to reflect, including the shift in magnetic anomalies due to the geometry of the sources and possible remanent magnetization, and the effect of the overburden and susceptibility on the electromagnetic response, among others. No specific uncertainty estimates for magnetic and electromagnetic data are given in the literature, but it can be expected that the uncertainty due to the measurement system and circumstances is small compared to mineral prospectivity modelling-related issues. The reported metre-scale positional uncertainty should not be a problem in a regional-scale study such as this.

The gravity data used in this study were collected by multiple institutions. The Finnish Geospatial Research Institute (formerly the Finnish Geodetic Institute) has carried out gravity measurements across the whole of Finland with an average station distance of 5 km (Kääriäinen and Mäkinen, 1997). GTK has continuously carried out supplementary measurements in different parts of the country since 1972, and most of the gravity measurements in the Central Lapland Greenstone Belt were carried out in the 1970s and 1980s by GTK. A region in the northwestern part of the area was measured in 2005–2010, mainly by GTK, but also by the Finnish Geospatial Research Institute, Suomen Malmi Ltd and Outokumpu Mining Ltd. The spatial density of the measurements is 1–4 points/km². Measurements are bound to the First Order Gravity Net, maintained by the Finnish Geospatial Research Institute. The First Order Gravity Net is being updated based on measurements carried out in 2009–2011 using the A10 absolute gravimeter (Mäkinen et al., 2010). However, the updated First Order Gravity Net is not yet publicly available, and the gravity measurements used here are bound to the current First Order Gravity Net measured in 1962–1963 (Kiviniemi, 1964). The gravity data used in this study were Bouguer anomalies computed for each gravity measurement, following Elo (2013).
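For readers unfamiliar with the quantity, the sketch below shows a textbook simple Bouguer anomaly (free-air correction plus an infinite-slab correction). It is an illustration only and is not claimed to reproduce the procedure of Elo (2013): the normal gravity value is taken as an input rather than computed from a specific reference formula, terrain corrections are omitted, and the default slab density of 2670 kg/m³ is a conventional assumption. The function name and the numbers in the example are hypothetical.

```python
import math

G = 6.674e-11               # gravitational constant, m^3 kg^-1 s^-2
FREE_AIR_GRADIENT = 0.3086  # standard free-air gradient, mGal per metre


def simple_bouguer_anomaly(g_obs_mgal, g_normal_mgal, height_m,
                           density_kg_m3=2670.0):
    """Simple Bouguer anomaly in mGal (no terrain correction)."""
    free_air = FREE_AIR_GRADIENT * height_m
    # Attraction of an infinite slab, 2*pi*G*rho*h, converted from m/s^2 to mGal.
    slab = 2.0 * math.pi * G * density_kg_m3 * height_m * 1e5
    return g_obs_mgal - g_normal_mgal + free_air - slab


# Hypothetical station: observed and normal gravity in mGal, height in metres.
anomaly = simple_bouguer_anomaly(982000.0, 982010.0, 200.0)
print(round(anomaly, 2))
```

The two corrections partly cancel: at 200 m the free-air term adds about 61.7 mGal and the slab term removes about 22.4 mGal for the assumed density.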

3.2.2 Geological observations and interpretation

As geological data, we used the drill-core and outcrop observation databases of Outokumpu Ltd and the Geological Survey of Finland. Sources of geological interpretation were the bedrock map and the map of metamorphic facies in Finland. The DigiKP bedrock map of Finland (http://en.gtk.fi/informationservices/map_services; Lehtonen et al., 1998) was used to provide information on geological discontinuities. Structures are represented on the map as lineaments showing the locations of faults and thrusts that control the distribution of gold deposits (Figure 1).

Approximately 90% of orogenic gold deposits in Lapland are hosted by greenschist facies metamorphic rocks. Higher grade rocks surround the area of the Central Lapland Greenstone Belt with greenschist facies peak metamorphism, and the number of known gold deposits drastically decreases in the higher-grade area (Fig. 1; Table 1). The metamorphic map of Finland, along with the methods and materials used to create it, is described in Hölttä and Heilimo (2017). The map consists of three separate layers for metamorphic grade, facies and zones defined by polygon areas. We used the peak metamorphic facies layer, which categorizes rocks in the study area into greenschist facies, low amphibolite facies, middle amphibolite facies, high amphibolite facies and undefined.

Albitization of host rocks is ubiquitous in the orogenic gold deposits of the Central Lapland Greenstone Belt. This type of hydrothermal alteration affected both sedimentary and volcanic rocks prior to or during the deposition of gold (Eilu et al., 2003; 2007). Albitization created rigid rock units, regardless of the precursor lithology. The importance of albitization in the formation of orogenic gold deposits lies in the competency contrast between the rigid rocks and the differently altered, most commonly chloritized and sericitized surrounding rocks. Albitite is easy to recognize in drill cores and in the field, making the identification of albitite occurrences in rock databases reliable. The locations of albitite occurrences and non-occurrences were extracted from the drill-core and outcrop observation databases of Outokumpu Ltd and the Geological Survey of Finland. The distance between sampling points varies across the region. The largest voids are 10 km in diameter, but an extensive area is covered with observations less than half a kilometre apart. The total number of observations in the area is 821 for albitite occurrences and 65 977 for albitite non-occurrences. The total number of drill cores and outcrop samples in this area is thus 66 798, and the average sample density is 3.6 samples/km².
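The observation counts quoted above can be checked with a few lines of arithmetic. The sketch below simply recomputes the total and the mean density from the figures given in the text (821 occurrences, 65 977 non-occurrences, 18515 km² study area); the share of albitite observations is an additional derived number, and the variable names are illustrative.

```python
albitite_occurrences = 821
albitite_non_occurrences = 65_977
study_area_km2 = 18_515  # total study area quoted in Section 3

total_observations = albitite_occurrences + albitite_non_occurrences
mean_density = total_observations / study_area_km2          # samples per km^2
albitite_share = albitite_occurrences / total_observations  # fraction of all observations

print(total_observations, round(mean_density, 1), round(100 * albitite_share, 1))
# prints: 66798 3.6 1.2
```

Albitite occurrences thus make up only about 1.2% of the observations, which is worth keeping in mind when distances to occurrences and to non-occurrences are used side by side.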

4 Data preprocessing

To obtain a consistent informative set of evidence features (Table 3), we performed a number of modifications to the input data. In order to perform multivariate spatial data analysis, all the features were evaluated at the same locations. Since the geophysical input data represent continuously varying quantities and more or less cover the entire study region, a natural approach is to interpolate the data to a grid. Geological input data do not describe a continuous quantity but rather a binary or multi-class quantity, and all the geological data already cover the entire study area. Derivatives of the geophysical input data were computed to describe the change and strength of the anomalies. The aim was to find regional large-scale features, and a suitable grid resolution for the evidence feature grids was considered to be 25 px/km², which corresponds to a square pixel size of 200 m × 200 m. This resolution is a compromise between the resolutions of the different input data sets (see Section 3.2); geophysical input data could provide information for a pixel size of 100 m × 100 m, but the other input data are spatially less accurate.
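The relation between pixel size and grid resolution stated above is simple arithmetic, sketched here for convenience. The helper names are hypothetical, and the grid shape is computed for the 180 km × 110 km bounding extent given in Section 3, which is an assumption about how the grid might be laid out rather than a description of the grids actually used.

```python
import math


def pixels_per_km2(pixel_size_m):
    """Number of square pixels of the given side length in one square kilometre."""
    return (1000.0 / pixel_size_m) ** 2


def grid_shape(extent_ew_km, extent_ns_km, pixel_size_m):
    """(rows, columns) needed to cover a rectangular extent."""
    n_cols = math.ceil(extent_ew_km * 1000.0 / pixel_size_m)
    n_rows = math.ceil(extent_ns_km * 1000.0 / pixel_size_m)
    return n_rows, n_cols


print(pixels_per_km2(200))       # 25.0 px/km^2 for 200 m pixels
print(pixels_per_km2(100))       # 100.0 px/km^2 for 100 m pixels
print(grid_shape(180, 110, 200)) # (550, 900)
```

Halving the pixel size to 100 m would quadruple the number of cells, which illustrates why the coarser resolution is a reasonable compromise when the geological inputs cannot support the finer grid.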

4.1 Geological data As structural evidence data, we computed the distance to the manually interpreted structures on the geological map, and as evidence data on the occurrence of albitite, we computed the distance to both albitite occurrences and non-occurrences. The metamorphic grade input data consisted of polygons that were assigned integer values of 1 to 4, ranging from greenschist facies (1) to high amphibolite facies (4), and transformed to a grid format.
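The distance-to-feature evidence described above can be illustrated with a minimal sketch. It assumes a regular grid with 200 m cells and uses a brute-force search over feature cells; the function name `distance_grid`, the cell-index representation of structures and the toy grid are hypothetical and do not describe the GIS tools actually used in the study.

```python
import math

def distance_grid(feature_cells, n_rows, n_cols, cell_size_m=200.0):
    """Distance (in metres) from every cell centre to the nearest feature cell.

    feature_cells: list of (row, col) indices that contain a feature,
    e.g. an interpreted structure or an albitite observation (hypothetical input).
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Euclidean distance in cell units to the closest feature cell
            nearest = min(math.hypot(r - fr, c - fc) for fr, fc in feature_cells)
            row.append(nearest * cell_size_m)
        grid.append(row)
    return grid

# Hypothetical 3 x 4 grid with a single structure cell at row 0, column 0
dist = distance_grid([(0, 0)], n_rows=3, n_cols=4)
```

For large grids a dedicated distance transform would be far more efficient; the brute-force loop is used here only to keep the idea visible.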

4.2 Geophysical data 4.2.1 Airborne magnetics Several transformations of the magnetic anomaly data were carried out in this project to generate evidence features that could reveal features related to orogenic gold in the Earth’s topmost crust, where mineral deposits are within our reach. Some of the derived features represent the rate or direction of change of the magnetic field and some the strength of the field. We tested the usability of six derivatives of magnetic input data, but only two proved to be useful for prospectivity modelling of the Central Lapland Greenstone Belt. Since the transformations are much easier to perform on a uniform grid of values, interpolation was first carried out using OASIS software (Geosoft, Toronto, Canada) and the Minimum Curvature Gridding tool. The pixel size for this initial interpolation was 50 m x 50 m. The two derivatives of magnetic input data that were chosen for prospectivity modelling were the reduced-to-pole (RTP) magnetic field and the pseudogravimetric gradient maxima. Reduction to the northern pole (Grant and Dodds, 1972) is commonly carried out to model what the field would be in a vertical inducing magnetic field. On an RTP magnetic map, anomalies are located directly above the respective sources and the asymmetry of the field is entirely caused by the asymmetry of the sources, provided that the magnetization of the rocks is caused by induction. Magnetization of the basement rocks is, indeed, usually induction-dominated, but the remanent magnetization component can be significant in mineralized zones containing magnetite and/or monoclinic pyrrhotite, which makes the use of reduction to pole questionable in such areas. The magnetic field reduced to pole was also used as the starting point in all of the following transformations, all of which were carried out using Intrepid software (Intrepid Geophysics, Melbourne, Australia).
For prospectivity modelling, the magnetic field reduced to pole was further high-pass filtered using a 1-km-sized median filter. The other derivative used for prospectivity modelling was the pseudogravimetric gradient maxima, which are the horizontal gradient maxima computed using the multi-scale edge detection method (Hornby et al., 1999). The pseudogravimetric gradient maxima are able to show hidden boundaries of rocks with contrasting properties in covered areas. As the boundaries of lithological units with contrasting physical and chemical properties have significance in the localization of orogenic gold deposits, this method has been widely used in the evaluation of relationships between the crustal architecture and settings of gold deposits in orogenic belts (Roy et al., 2010). The assumed connection between the gravimetric and magnetic potential follows Poisson’s relation. Where the relation does not apply, the gravity and pseudogravimetric features do not coincide. It is thus possible to distinguish gravity sources from magnetic sources by including both pseudogravimetric and gravity gradient maxima (Section 4.2.3) in the evidence data set. Pseudogravimetric gradient maxima were computed for up to 300 m of upward continuation (Blakely, 1995) to mimic the field as it would be if measured from higher altitudes. In practice, upward continuation smooths out the high-frequency features caused by near-surface sources and emphasizes the low-frequency features caused by deeper sources. Upward continuation is computed in the wavenumber domain using the Fourier transform of the field (e.g., Blakely 1995). Spacing of the pseudogravimetric gradient maxima of different continuation levels reflects the slope of the underlying feature; the more densely the gradient maxima are located, the steeper the structure. Multi-scale edge detection also provides the amplitude of the gradient maxima, which reflects the strength of the contrast between the properties of the units on different sides of the structure. Pseudogravimetric gradient maxima below 50 m were removed to exclude random noise. As an evidence feature for prospectivity modelling, we used the distance from the gradient maxima. Four other derivatives were also computed from the magnetic field reduced to pole. Three of these (the contrast-normalized field, analytic signal and first vertical derivative) were not used due to a non-existent correlation with the known occurrences (see Section 6.1), and one (the tilt derivative) due to a strong correlation with RTP magnetic data. The contrast-normalized brightness was computed to further enhance local variations using a non-linear space-variant contrast stretch filter that suppresses long wavelengths and strongly enhances short-wavelength variation. To represent the horizontal rate of change of the magnetic field, the analytic signal was computed. The analytic signal vector (Nabighian 1984) gives the direction of the maximum change in magnetic intensity, and its length gives the magnitude of the change.
It does not provide information on the tilt of the magnetic source, but it may provide information on the locations of magnetic anomalies; maxima of the analytic signal are located at the edges of extensive magnetic sources, and directly above narrow ones. For vertical change, we computed the first vertical derivative in the wavenumber domain using the Fourier transform for upward continuation of the field (e.g., Blakely 1995). The tilt derivative is the angle between the horizontal plane and the analytic signal vector. The tilt derivative shows the magnitude of the change in the magnetic field, usually obtaining the maximum value on both sides of strong anomalies. In our case, the tilt derivative had a significant correlation with the RTP magnetic data (Spearman’s correlation coefficient 0.8) and, since the RTP magnetic data had a stronger correlation with the known deposits, the tilt derivative was omitted from fuzzy integration. As the final step of preprocessing, the geophysical feature maps were resampled to the pixel size of 200 m x 200 m using the median values of the subsequent pixels.
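The final resampling step can be sketched as a block median. This is a minimal illustration that assumes "the median values of the subsequent pixels" means the median of the 4 x 4 block of 50 m pixels falling inside each 200 m pixel; the function name `block_median` and the toy grid are hypothetical.

```python
from statistics import median

def block_median(grid, factor=4):
    """Downsample a 2-D list by taking the median of each factor x factor block.

    With 50 m input pixels and factor=4, the output pixels are 200 m wide.
    Trailing rows or columns that do not fill a whole block are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(median(block))
        out.append(row)
    return out

# Hypothetical 8 x 8 grid of 50 m pixels, resampled to a 2 x 2 grid of 200 m pixels
fine = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
coarse = block_median(fine, factor=4)
```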

4.2.2 Airborne electromagnetics The conductivity of the ground affects both components of electromagnetic data. Often, the apparent resistivity is modelled from the data using both components and certain models for the conductivity and geometry of the ground, but due to the additional sources of uncertainty arising from the inverse modelling, we directly used the real (EMRe) and imaginary (EMIm) components of the response, as well as their ratio EMRe/EMIm. Before computing the ratio, the measured values of both components were interpolated to 50 m grids using OASIS software (Geosoft, Toronto, Canada) and the Minimum Curvature Gridding tool. Values of EMIm < -100 ppm and > 2500 ppm, as well as EMRe < -1400 ppm and > 8000 ppm, were winsorized, i.e., all the values above or below the given limit were given the constant value of the limit. For EMRe/EMIm computation, values of EMIm close to zero were transformed to a constant small value to prevent the ratio from approaching infinity.
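The winsorizing and ratio computation described above can be sketched as follows. The limits are those given in the text; the threshold `eps` for "close to zero", the choice to preserve the sign of EMIm, and the order of operations (winsorize first, then take the ratio) are assumptions made for illustration only.

```python
def winsorize(value, lower, upper):
    """Clamp a value to [lower, upper]; values outside take the limit value."""
    return max(lower, min(upper, value))

def em_ratio(em_re, em_im, eps=1.0):
    """EMRe/EMIm ratio for one grid cell (values in ppm).

    eps is a hypothetical threshold: |EMIm| below eps is replaced by +/-eps
    so that the ratio cannot approach infinity.
    """
    em_re = winsorize(em_re, -1400.0, 8000.0)   # limits from the text
    em_im = winsorize(em_im, -100.0, 2500.0)    # limits from the text
    if abs(em_im) < eps:
        em_im = eps if em_im >= 0 else -eps
    return em_re / em_im
```

For example, `em_ratio(9000.0, 4000.0)` clamps both components to their upper limits before dividing.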

4.2.3 Gravity Bouguer anomaly values were first interpolated using the minimum curvature gridding algorithm and a cell size of 50 m x 50 m. The interpolated values did not even have a moderate correlation with the known occurrences and, to obtain information on the density contrasts that often occur in structures favourable for hydrothermal gold deposition, we performed multi-scale edge detection to find the gravity gradient maxima analogously with the process carried out on the magnetic input data (Section 4.2.1). Finally, distance from the gravity gradient maxima was computed with a 200-m resolution. A spatial relationship between gravity gradient maxima and world-class orogenic gold deposits has been recognized by Bierlein et al. (2006) in the Archaean Yilgarn Craton, Australia. In the Central Lapland Greenstone Belt, Lahti et al. (2014) have shown that calculated gravity gradient maxima only cover about 30% of the total area, but 70% of the known gold occurrences in orogenic gold and iron oxide copper gold type systems are located within 675 m from the gradient maxima.

5 Prospective tract Our evidence features cover a rectangular area around the Sirkka Shear Zone, but a significant part of the area is not even moderately prospective based on geological knowledge. We propose to define the prospective tract as the region from which non-prospective areas are masked out. One approach to define the prospective tract would be to generate a prospectivity map in the traditional way, using data from the entire study area, and to set a threshold for the final prospectivity map below which the prospectivity values correspond to non-prospective areas. The definition of the threshold is not necessarily unique, however. We chose to define the prospective tract by clustering the data and masked out areas with negligible prospectivity, based on the spatial distribution of known deposits in the clusters. As the permissive tract in quantitative mineral resource assessment (Singer, 1993) is defined as the favourable region, the definitions of permissive and prospective tracts are reversed. By selecting suitable data and criteria, however, they represent the same matter.

Choosing the best clustering algorithm is not straightforward, since the number and the properties of the clusters are not known in advance and the metrics for defining the “best” are not unique. We chose to perform clustering using a two-level approach proposed by Vesanto and Alhoniemi (2000), which first uses self-organizing maps (SOM; Kohonen, 2001) to generate a large number of protoclusters, and then k-means to combine the protoclusters into a smaller number of clusters. In the following, we denote this number of clusters with the symbol k. The motive for using SOM in this study for protoclustering was to save computation time. Since the number of clusters in the data is not known in advance, k-means clustering has to be systematically carried out for a selected range of k. In addition, the k-means clustering solution is sensitive to the initialization of the clusters, and it should thus always be run multiple times for a given number of clusters. SOM clustering is not as sensitive to the initialization as k-means (Vesanto and Alhoniemi, 2000; Bacao et al., 2005), and it is sufficient to carry out SOM protoclustering only once. This reduces the computation time, since there is no need to perform the k-means clustering a number of times on a range of k for the large original data set, but only on the approximately two orders of magnitude smaller set of SOM cluster code vectors. In this study, we computed k-means clusters for k = 2-25 using ten different random initializations (based on the SOM cluster code vectors). The k producing the best clustering solution was defined by computing the Davies-Bouldin index (DB) for each clustering solution:

$$DB = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} \frac{S_i + S_j}{M_{i,j}},$$

where N is the number of clusters, S_i is the scatter within cluster i, and M_{i,j} is the distance between the centroids of clusters i and j. Thus, the smaller the scatter within the clusters and the larger the distance between the cluster centroids, the smaller the index. The value of the index usually sharply decreases with an increasing number of clusters (starting from 2) and settles at an approximately constant value. We considered the best k to be the smallest k above which the Davies-Bouldin index no longer decreases. The chosen number of clusters for our data set was 11. Based on the frequency of known occurrences in each cluster (Figure 2), we divided the clusters into five prospectivity classes by visual inspection. The known occurrences are strongly concentrated in the class containing the single cluster number 5 (surrounded with a red square in the plot in Figure 2 and coloured red in the map in Figure 3), which corresponds to the proximity to albitites, slightly elevated gravity and a low metamorphic grade. The class that is coloured orange corresponds to either elevated RTP magnetic data and correspondingly decreased EMRe and EMRIR (cluster 9), or to elevated EMRe and EMRIR (cluster 2). The yellow class represents a low metamorphic grade (cluster 6), elevated EMRe, EMRIR and EMIm (cluster 11) or elevated EMIm and slightly elevated EMRe (cluster 4). No occurrences were entirely in the area covered by the blue class, but always extended to the yellow, orange or red class. Thus, the blue regions were masked out prior to fuzzy integration. The grey region on the map (Figure 3) represents two clusters with a large distance from structures (pseudogravimetric, gravimetric and manually interpreted) or a high metamorphic grade, neither of which hosted any occurrences. The prospective tract thus contains the red, orange and yellow regions in Figure 3. From the list of occurrences corresponding to each cluster (Figure 2), we see that with the data set we have, we cannot make a distinction between the deposits with gold only and those with atypical metal associations (Table 1), since both types occur in each cluster.
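The Davies-Bouldin index used above to select k can be sketched in a few lines. The sketch assumes that the scatter S_i is the mean Euclidean distance of the cluster members to their centroid, which is a common choice but is not stated in the text; the function name and the toy points are hypothetical.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index for a labelled set of points (tuples of floats)."""
    clusters = sorted(set(labels))
    centroids, scatters = {}, {}
    for k in clusters:
        members = [p for p, lab in zip(points, labels) if lab == k]
        dim = len(members[0])
        centroid = tuple(sum(p[d] for p in members) / len(members)
                         for d in range(dim))
        centroids[k] = centroid
        # Assumed scatter definition: mean distance of members to the centroid
        scatters[k] = sum(math.dist(p, centroid) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst-case (largest) similarity ratio of cluster i to any other cluster
        total += max((scatters[i] + scatters[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)

# Toy example: two compact, well-separated groups versus a poor labelling
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
db_good = davies_bouldin(pts, [0, 0, 1, 1])
db_bad = davies_bouldin(pts, [0, 1, 0, 1])
```

In the toy example, the compact labelling gives a much smaller index than the mixed one, which is the behaviour exploited when scanning a range of k.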

6 Prospectivity modelling Prospectivity modelling is here performed using the ArcGIS implementation of the fuzzy integration method (Fuzzy overlay tool). Fuzzy integration combines information on several input layers called fuzzy memberships that range over a comparable scale and represent how the corresponding evidence feature responds to the mineral occurrences. The advantages of using fuzzy logic include the ease of understanding of the method and the opportunity to provide a continuous scale for prospectivity instead of defining artificial prospectivity classes, the boundaries of which may be difficult to interpret. In mineral potential mapping (MPM), the possibility to evaluate the memberships using expert estimation only, i.e., without the locations of known occurrences being available, is also advantageous. We, however, use an approach where the membership transformation function is defined empirically using known mineral occurrences as ground truth. Fuzzy memberships were computed here for evidence features in the prospective tract, i.e., omitting areas with no significant mineral potential based on the distribution of known occurrences in clustered data (blue and grey areas in Figure 3). The prospectivity map thus shows the prospective tract with varying prospectivity and the rest of the study area as not prospective. The maximum value range of the fuzzy memberships is here from 0 to 1, so that a zero value refers to strong negative evidence and 1 to strong positive evidence. The minimum and maximum membership values of each evidence depend on the strength of their response to the occurrences. The values in the final prospectivity map do not represent absolute probabilities, but rather indicate which region is more prospective than the others, so we rank the areas based on the fuzzy membership or prospectivity value.

6.1 Fuzzy memberships Membership grids can be taken as spatial representations of the expected level of prospectivity based on a single evidence feature. To define the function for transforming the evidence features to membership values, established geological models and assumptions regarding how the feature is related to mineral occurrences are often used. In this study, we used known occurrences in the study area to empirically define the memberships by fitting a transformation function to frequency-of-occurrence (FoO) vs. evidence-feature data. For the FoO computation, we generated histograms of the evidence feature values for both the entire study area (HistStudy, left-hand plots in Figure 4) and the known occurrences (HistOcc, middle plots in Figure 4). FoO was calculated as the ratio HistOcc/HistStudy (right-hand plots in Figure 4). In histogram computation, we considered two ways of binning the data: 1) by dividing the evidence feature value evenly (value range binning) and 2) by dividing the number of data points in each bin evenly (quantile binning). The problem in value range binning appeared to be the long tails for some of the HistStudy distributions and a few of the known occurrences falling in some of the tail bins. This combination produces a spurious effect of increasing FoO towards the HistStudy tail bins. Although this problem did not occur for all the evidence features, we decided to use quantile binning for all evidence for consistency. Quantile binning extends the bin range in the small and/or large end of evidence values (histogram tails), but it was not considered a problem in this case, since we did not expect to see rapid changes in FoO at these values. Using quantile binning, HistStudy would normally be approximately flat (e.g., Figure 4, top left-hand plot), but for our winsorized data, the end bins sometimes get more counts than other bins (e.g., Figure 4, second row, left-hand plot). Since quantile binning produces small bin ranges at the most frequent values (usually at small evidence values), the corresponding FoO plot points are very densely spaced, making it difficult to detect any trends in this evidence value range. To spread the most frequent evidence values over a relatively wider range, we raised the values to the power of 1/3 prior to histogram computation.
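The FoO computation with quantile binning and the cube-root transform can be sketched as below. The function names, the number of bins and the toy values are hypothetical, and the signed cube root for negative values is an assumption, since the text does not say how negative evidence values were handled.

```python
import math

def quantile_edges(values, n_bins):
    """Bin edges that place roughly equal numbers of values in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(i * n) // n_bins] for i in range(n_bins)]
    edges.append(ordered[-1])
    return edges

def frequency_of_occurrence(study_values, occ_values, n_bins=4):
    """Return (edges, HistStudy, HistOcc, FoO) on the cube-root scale."""
    def cube_root(v):
        # Signed cube root (assumption for negative evidence values)
        return math.copysign(abs(v) ** (1.0 / 3.0), v)

    study = [cube_root(v) for v in study_values]
    occ = [cube_root(v) for v in occ_values]
    edges = quantile_edges(study, n_bins)

    def count(data):
        counts = [0] * n_bins
        for v in data:
            idx = n_bins - 1                 # default: last bin
            for b in range(n_bins):
                if v < edges[b + 1]:
                    idx = b
                    break
            counts[idx] += 1
        return counts

    hist_study, hist_occ = count(study), count(occ)
    # FoO = HistOcc / HistStudy; empty study bins are given FoO = 0
    foo = [o / s if s else 0.0 for o, s in zip(hist_occ, hist_study)]
    return edges, hist_study, hist_occ, foo

# Toy data: eight study-area cells, three of which host known occurrences
study_vals = [0.0, 1.0, 8.0, 27.0, 64.0, 125.0, 216.0, 343.0]
occ_vals = [0.0, 1.0, 8.0]
edges, hist_study, hist_occ, foo = frequency_of_occurrence(study_vals, occ_vals)
```

With quantile binning, `hist_study` is flat in this toy case, so the shape of `foo` is determined by where the occurrences fall.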
For FoO data, careful selection of the fitted function is required, since, due to the limited number of data points, noise and outliers are induced in the data by the variability in the geophysical and geological properties of the occurrences, the incompleteness of the geological data and the non-specific determination of the occurrence locations (right-hand plots in Figure 4). If the FoO plot is very scattered and there appears to be no correlation between the evidence feature values and FoO, the evidence feature should not be used in fuzzy integration. We chose to use two alternative functions to describe the membership of the evidence: logistic (Equation 1) and Gaussian (Equation 2) functions. The logistic function effectively represents the favourability of evidence, which has a continuously increasing or decreasing response to mineral occurrences. The Gaussian function is used for dependences that have a minimum or maximum response somewhere within the range of the data and the maximum or minimum, respectively, in the tails of the data range. Both logistic and Gaussian functions are implemented in commercial software, for instance in the ArcGIS (ESRI) and advangeo® (Beak Consultants, Germany) fuzzy membership tools. Since the data are noisy, some of the fitted function parameters had to be regularized for some evidence features to obtain reasonable fitted functions. Regularization was carried out by restricting selected parameters within 10% of the given initial value.

The logistic function can be defined as

$$\mu_L(x) = a_{min} + \frac{a_{max} - a_{min}}{1 + e^{-s(x-k)}} \qquad (1)$$

where amin is the minimum and amax the maximum function value, x is the value of the evidence feature, s is the spread and k the midpoint. For decreasing and increasing functions, s has to be negative and positive, respectively. The Gaussian function is defined as

$$\mu_G(x) = a + (b - a)\, e^{-\frac{(x-k)^2}{2\sigma^2}} \qquad (2)$$

where a is the value the function approaches for x → ±∞, k is the mean, σ is the standard deviation and b the function value at k. Parameter values were obtained by fitting the selected function to the FoO data using the Levenberg-Marquardt minimization method. We also tested the fitting of spline functions, but they tend to overfit the data, also fitting noise. However, spline functions may describe how the current data set responds to the known deposits better than Gaussian and logistic functions do, so their use should be studied further. In Figure 4, we present the Gaussian and logistic function fits to FoO data for all the evidence features. When interpreting the histograms, it has to be borne in mind that the raster value on the x-axis is not the original value of the preprocessed evidence data, but is raised to the power of 1/3, as explained above. For the distance from structures evidence (pseudogravimetric, gravimetric and manually interpreted), it is quite natural to choose the continuously decreasing logistic function, since the favourability for an orogenic gold occurrence is expected to increase with closer proximity to structures. This assumed behaviour is in agreement with the frequency of occurrence values. For the electromagnetic real component, EMRe, and the ratio of real and imaginary components, EMRe/EMIm, there is an expected minimum in the frequency of occurrences at a raster value of zero, which corresponds to non-conductive material. The increased frequency of occurrences at high values of EMRe and EMRe/EMIm is also in agreement with the conceptual model, suggesting elevated conductivity in orogenic gold deposits. The reason for the increased frequency of occurrences at low values of EMRe and EMRe/EMIm is not obvious. Since the data are scattered, it might simply be a spurious feature. However, it can also be thought of as representing magnetite-bearing host rocks.
The true reason for the increase is a topic of future studies, and here we chose to use a Gaussian membership transformation function that reproduces the increase. The imaginary component of the electromagnetic response, EMIm, is difficult to interpret when discussing the occurrence of orogenic gold. However, due to the clear dip in the frequency of occurrences, we chose to include the evidence in the fuzzy integration and modelled the fuzzy membership using a Gaussian transformation function. Due to scattered data and the lack of explanation for the minimum in the frequency of occurrences, EMIm was not given much weight in the fuzzy integration. The relation of the magnetic evidence data to the frequency of occurrences does not show a clear trend but, again, a minimum at a data value of -10. The large magnetic evidence values correspond to the negative EMRe values and may be related to magnetic host rocks in the occurrences. The large frequency of occurrence values at low magnetic evidence values are probably connected to structures in which the magnetism is diminished. Membership grids were further scaled within the range [0,1] by manually inspecting how clearly the minimum and maximum FoO was represented by the data, i.e., how noisy the data were. Most of the evidence features were scaled not to cover the entire range from 0 to 1. Albitite memberships were not defined using the frequency of occurrence distribution due to the spatially biased sampling; the fitting procedure would falsely give low prospectivity values to all the regions with no outcrop or drill core analyses. Albitite membership was computed as follows:
1. The distance from albitite occurrences was transformed to a membership map using the logistic transformation function (Eq. 1) with s = -3, k = 2000 m, amin = 0 and amax = 1;
2. The distance from albitite non-occurrences was transformed to a membership map using the logistic transformation function (Eq. 1) with s = 3, k = 500 m, amin = 0 and amax = 1;
3. The non-albitite membership map (step 2) was rescaled to cover a value range [0,0.5];
4. Albitite and non-albitite membership maps (steps 1 and 3) were combined with the fuzzy overlay function fOR(alb) (Section 6.2), where alb is a two-element vector containing the membership of distance from albitite occurrences (step 1) and the rescaled membership of distance from albitite non-occurrences (step 3) in each map cell.
The final albitite occurrence membership values range from 0 (non-occurrence of albitite) to 1 (albitite occurrence). Regions with no observations receive a value of around 0.5.
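The membership functions of Equations 1 and 2 and the four-step albitite membership can be sketched as follows. Several points are assumptions rather than statements from the text: the Gaussian exponent is taken to be negative (as implied by the definition of a as the limit for x → ±∞), distances are expressed in kilometres so that k = 2000 m and k = 500 m become 2.0 and 0.5 and s is read as per kilometre, and the rescaling to [0, 0.5] is done by simple multiplication. All function names are hypothetical.

```python
import math

def logistic(x, a_min, a_max, s, k):
    """Logistic membership (Eq. 1); s > 0 increasing, s < 0 decreasing."""
    z = -s * (x - k)
    if z > 700.0:            # exp(z) would overflow; the function has reached a_min
        return a_min
    return a_min + (a_max - a_min) / (1.0 + math.exp(z))

def gaussian(x, a, b, k, sigma):
    """Gaussian membership (Eq. 2): value b at x = k, approaching a in the tails."""
    return a + (b - a) * math.exp(-((x - k) ** 2) / (2.0 * sigma ** 2))

def albitite_membership(d_occ_km, d_non_km):
    """Albitite membership for one cell from distances (km, assumed unit)."""
    m_occ = logistic(d_occ_km, 0.0, 1.0, s=-3.0, k=2.0)        # step 1
    m_non = logistic(d_non_km, 0.0, 1.0, s=3.0, k=0.5)         # step 2
    m_non_rescaled = 0.5 * m_non                               # step 3 (assumed scaling)
    return max(m_occ, m_non_rescaled)                          # step 4: fuzzy OR
```

Under these assumptions, a cell far from any observation gets `albitite_membership(50.0, 50.0)` close to 0.5, whereas a cell on an albitite occurrence gets a value close to 1, in line with the behaviour described in the text.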

6.2 Fuzzy integration functions We used the following functions in each map cell to combine the membership values x:
fOR(x) = max(x) - Takes the largest value of all the input rasters for each cell.
fPROD(x) = ∏_i x_i - Calculates the product of values of all the input rasters for each cell. Tends towards small values, even if only one of the inputs has a small value. fPROD(x) = 0, even if only one of the inputs has the value of 0.
fSUM(x) = 1 - ∏_i (1 - x_i) - Calculates the product of non-favourabilities of all the input rasters and subtracts this from unity for each cell. Tends towards large values, even if only one of the inputs has a large value. fSUM(x) = 1, even if only one of the inputs has the value of 1.
fGAMMA(x, γ) = fPROD(x)^(1-γ) * fSUM(x)^γ, 0 ≤ γ ≤ 1 - Exponent-weighted product of fPROD(x) and fSUM(x).
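The four integration functions listed above translate directly into code. The sketch below operates on the list of membership values of a single map cell; the function names mirror the notation of this section but are otherwise hypothetical and are not the ArcGIS implementation.

```python
import math

def f_or(x):
    """Fuzzy OR: the largest membership value in the cell."""
    return max(x)

def f_prod(x):
    """Fuzzy product: product of all membership values."""
    return math.prod(x)

def f_sum(x):
    """Fuzzy sum: one minus the product of the non-favourabilities."""
    return 1.0 - math.prod(1.0 - xi for xi in x)

def f_gamma(x, gamma):
    """Fuzzy gamma: exponent-weighted product of f_prod and f_sum, 0 <= gamma <= 1."""
    return f_prod(x) ** (1.0 - gamma) * f_sum(x) ** gamma

# Example cell with two hypothetical membership values
cell = [0.5, 0.5]
value = f_gamma(cell, 0.5)
```

Setting `gamma` to 0 or 1 recovers the fuzzy product and the fuzzy sum, respectively, so the parameter controls the balance between the two.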

6.3 Prospectivity maps and validation Prospectivity computation by integration of the fuzzy memberships was performed for the prospective tract (Section 5), and the resulting prospectivity map displays the surrounding areas, accordingly, as non-prospective (Figure 7). The extent of the prospectivity map is constrained by the limited extent of the gravity data, since we used the fuzzy gamma integration function (Section 6.2), which cannot deal with missing data. Other commonly used fuzzy functions include the fuzzy or and fuzzy and functions, which are useful for evidence features that are considered to be conditionally dependent with respect to known occurrences. In this case, however, methods that allow more complex modelling of the relation between the evidence data and the occurrence of minerals, such as neural networks and random forests, might be more suitable. Different weights for the fuzzy product and fuzzy sum parts of the fuzzy gamma function were tested, and finally they were given equal weights based on the goodness of the solution. This choice corresponds to both the negative and positive memberships being considered equally informative and important for all the evidence features. The goodness of the solution was estimated using the idea of the receiver operating characteristics (ROC) curve and area under the ROC curve (AUC). The ROC curve is a plot of the true positive rate (TP) vs. false positive rate (FP) and AUC is ROC integrated from FP = 0 to FP = 1. AUC values range from 0 to 1, and AUC = 0.5 corresponds to the worst possible model, which has a 50% probability of being correct. AUC = 1 corresponds to a perfect model. AUC values below 0.5 should not occur, since they suggest that the prospectivity scores should be reversed, which points to a completely false setting of the problem. ROC computation is not used in prospectivity modelling as it was originally defined, as we do not really have known negative occurrences that are needed for computing FP.
One alternative for defining negative occurrences is to use deposit types other than the one of interest (Nykänen, 2008). We followed the approach of Nykänen et al. (2015), who used randomly sampled points in the study region as negative occurrences, assuming that the majority of the study region is non-prospective. In this approach, FP represents the portion of the study area covered by the prospective area, and the ROC curve describes how strongly the known deposits are concentrated on high prospectivity values; the higher the AUC value, the more restricted an area is sufficient to contain the majority of the known deposits. AUC does not depend on the number of negatives as long as there is a sufficient number of them, and they are uniformly distributed in the study area. For validation of the prospectivity map, we used all the known occurrences to compute TP. We also used all the occurrences to define the fuzzy membership transformation functions. The validation metric is therefore the success rate of the map. The AUC value for the prospectivity map (Figure 7) was 0.8. The training points that do not settle on the high prospectivity values are mostly poorly known, and some are not likely to represent orogenic gold occurrences, which probably explains the rather low AUC value for the success rate. We computed the prospectivity map using a range of values for the γ parameter in the fuzzy gamma function, and computed the AUC value for each solution. The best solution was obtained with γ = 0.5. Validation using the prediction rate instead of the success rate would be preferable, but it would require the known occurrences to be divided into training points and validation points. However, the number of known deposits is so low and the spread in the characteristics of the deposits so wide that omitting a large proportion of the training points from the membership fit procedure (Section 6.1) would only reduce the goodness of the fit to the frequency of occurrence data and allow for a wider range of solutions, making it difficult to define the membership function. Leave-one-out or leave-a-pair-out cross-validation, on the other hand, did not affect the fitted fuzzy membership transformation function parameters. The range of prospectivity values in the fuzzy integrated map, or any other prospectivity map, does not directly provide probabilities for the existence of a mineral occurrence, but indicates which area is more favourable for ore formation than the others. Using a prospectivity map along with additional geological knowledge, the most promising locations for further exploration can be selected.
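The AUC computation with randomly sampled negatives can be sketched as follows. The sketch uses the rank-based (pairwise comparison) form of the AUC, which is equivalent to integrating the ROC curve; the function names, the seed and the toy scores are hypothetical and do not reproduce the validation tool used in the study.

```python
import random

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def sample_negatives(prospectivity_grid, n, seed=0):
    """Draw n prospectivity values at random cells to act as negatives."""
    rng = random.Random(seed)
    cells = [v for row in prospectivity_grid for v in row]
    return [rng.choice(cells) for _ in range(n)]

# Toy prospectivity map and prospectivity values at hypothetical known occurrences
grid = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.3]]
negatives = sample_negatives(grid, n=50)
score = auc([0.9, 0.8], negatives)
```

Because the negatives are drawn from the whole map, the false positive rate in this setting corresponds to the proportion of the study area above a given prospectivity threshold, as described above.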

7 Discussion and Conclusions In this study, we used known mineral occurrences to empirically define the relationship between geoscientific evidence data and the occurrence of minerals. We investigated how known occurrences can be used for 1) transforming continuous geoscientific evidence data to fuzzy memberships for data integration and 2) finding non-prospective clusters in clustered geoscientific data. The aim of the paper is to boost the development and use of computational methods in prospectivity modelling to make the modelling process faster and easier to track. Moreover, some of the structures in the data are not visible to an expert eye but can be detected using the computational methods of data mining. Clustering geoscientific evidence data to delineate areas with similar bedrock properties, and training the clusters using known mineral occurrences, provided a robust method for identifying non-prospective areas, i.e., areas that are unlikely to contain deposits of the mineral of interest. The non-prospective area for orogenic gold occurrences defined in this study is in agreement with expert interpretation: the dominant data values in clusters that contain no deposits do not indicate favourability for orogenic gold deposition based on expert knowledge. A clear division into “non-prospective” and “at least somewhat prospective” is not only important for land-use planning, but it also focuses the prospectivity score computation on essential areas. A drawback when clustering the data is that some interesting anomalous features may be missed.

However, the self-organizing maps method, used in this study for data clustering, can be used to locate the anomalous clusters if these are of interest. The prospectivity model for the prospective tract was generated using fuzzy integration of membership grids. Defining the membership transformation function for the evidence features using the distribution of known deposits in the evidence data space was tested, and it can be considered advantageous to be able to guide the choice of the transformation function parameters using the ground truth. The large scatter in the frequency of occurrences (Fig. 4), however, is challenging in the interpretation of certain frequency values as outliers or as correct values caused by the varying favourability of a mineral occurrence. Overfitting the membership transformation function to the frequency of occurrence data is a risk that has to be controlled, for instance, by selecting a simple membership function and by restricting the fitted parameter values. Noise reduction of the frequency of occurrences data should be studied to be able to use more versatile fitted functions, for instance splines, since the real shape of the membership function is unlikely to be symmetrical for all the evidence features. Moreover, computational methods should be developed for the definition of the fuzzy membership ranges based on, for instance, the goodness of the membership function fit. As a conclusion, the frequency of occurrence data applied in this study cannot be used to automatically define the memberships, but can be used to guide the determination of membership function parameters. In addition to defining the membership function, investigation of frequency of occurrence vs. evidence data plots (Fig. 4) can be used to evaluate the goodness of the choice of the training data, the validity of the ore model and knowledge of how certain evidence features respond to the occurrence of the mineral. 
The plots are useful, no matter what method of data integration is applied to generate the prospectivity model. In particular, methods such as fuzzy logic, used in this study, which assume that each evidence feature has, more or less independently, a certain dominating dependence on the occurrence of the mineral, do not work if such a dependence does not exist in the chosen data set. In Section 5.1, we discuss the shape of the frequency of occurrence plots computed in this study and find relations that would not have been reproduced if the fuzzy membership transformation functions had been defined based on expert knowledge alone. For instance, the magnetic evidence data have a high frequency of occurrence at high data values, which would not have been taken into account if the membership function had been defined based on the assumption that only low magnetic values are favourable (representing structures). The validity of the interpretation that the high frequency of occurrence values for high magnetic evidence data values (or, correspondingly, for negative electromagnetic real component values) are related to magnetite-bearing ore-forming environments is a subject for future study.

The fuzzy logic approach is safe in its simplicity, and it is easy to understand what the prospectivity scores are derived from. Also, the uncertainty of the solution is easy to understand and evaluate qualitatively. When using more complicated models and machine learning-based methods, overfitting is a risk and it is challenging to find a meaning for the fitted parameters and their quantitative uncertainties. However, many different types of machine learning-based approaches exist with varying model complexity. In fact, our approach of using unsupervised clustering and classification of the clusters using known mineral occurrences is in many ways similar to machine learning approaches that are based on data classification using training data. Machine learning approaches are effective, and their use in MPM should continue to be studied.
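The classification of clusters using known occurrences, discussed above, can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function name train_clusters, the toy labels and the definition of the frequency of occurrences as occurrences per grid cell are assumptions made for illustration, and the cluster labels are taken as given from any clustering method.

```python
import numpy as np

def train_clusters(labels, occurrence_cells, n_clusters):
    """Count known occurrences per cluster and flag clusters with none.

    labels           : cluster index of every grid cell (1-D integer array)
    occurrence_cells : indices of the grid cells hosting known occurrences
    n_clusters       : total number of clusters
    """
    labels = np.asarray(labels)
    # Number of grid cells in each cluster.
    cells_per_cluster = np.bincount(labels, minlength=n_clusters)
    # Number of known occurrences falling in each cluster.
    occ_per_cluster = np.bincount(labels[np.asarray(occurrence_cells)],
                                  minlength=n_clusters)
    # Assumed definition: occurrences per cell (0 for empty clusters).
    freq = np.divide(occ_per_cluster, cells_per_cluster,
                     out=np.zeros(n_clusters), where=cells_per_cluster > 0)
    # Clusters without any known occurrence are treated as non-prospective.
    non_prospective = occ_per_cluster == 0
    return freq, non_prospective

# Hypothetical toy data: 10 grid cells in 4 clusters, 3 known occurrences.
labels = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 3])
occurrence_cells = np.array([2, 3, 9])
freq, non_prospective = train_clusters(labels, occurrence_cells, 4)
# Per-cell mask of the non-prospective area.
mask = non_prospective[labels]
```

In this sketch a single occurrence is enough to keep a cluster in the prospective tract, which mirrors the cautious division into “non-prospective” and “at least somewhat prospective” areas.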

8 Acknowledgements

The authors wish to thank Hanna Leväniemi and Jouni Lerssi for fruitful discussions on the geophysical signatures connected with orogenic gold deposits. This study was mainly supported by a joint project between the Geological Survey of Finland and the University of Oulu entitled “Mineral Systems and Mineral Prospectivity in Finnish Lapland – MINSYSPRO” (number 281670), which was funded by the Academy of Finland, and partly by a personal grant to the first author from the K.H. Renlund Foundation.


Figure captions

Figure 1: Lithological map of the study area in the Central Lapland Greenstone Belt. SSZ = Sirkka shear zone and KSZ = Kiistala shear zone. Coordinate system is EUREF-FIN.

Figure 2: The frequency of occurrences in each cluster obtained from the SOM and k-means clustering of the evidence data. Color of the circles and the cluster number below each circle correspond to the rightmost list of occurrences showing the names of the occurrences in each cluster, starting from the least prospective one. Numbers and their color above the circles represent the known occurrences, divided into gold only occurrences (red) and gold occurrences with atypical metal associations (blue). Color of the squares surrounding the plot points corresponds to the map coloring in Figure 3.

Figure 3: Clustered evidence data in the study region, colored according to prospectivity of the clusters. Clusters are color coded according to the frequency of occurrences as shown in Figure 2. The prospective tract is defined by the red, orange and yellow regions.

Figure 4: Evidence feature histograms and frequency of occurrences plots using quantile binning of evidence values. Left hand plots = histogram of evidence values over the entire study area, center plots = histogram of evidence values at known occurrences, and right hand plots = frequency of occurrences. The raster value is the evidence data value raised to the power 1/3 (see text).

Figure 5: Evidence features for the entire study area. In the metamorphic facies map, the lowest values refer to greenschist facies, the second highest to high amphibolite facies and the highest values to the undefined class. Coordinate system is EUREF-FIN.

Figure 6: Fuzzy membership grids for the prospective tract. Coordinate system is EUREF-FIN.

Figure 7: Prospectivity map for the study area. Grey color is the region defined as non-prospective based on clustering of the evidence data. Blue color is the least prospective, and red the most prospective part of the prospective tract region. Black dots are the known occurrences. White areas are regions with missing gravity data inside the prospective tract. Coordinate system is EUREF-FIN.
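The three panels described in the caption of Figure 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the code used for the figure: the function name frequency_of_occurrences, the number of bins and the definition of the frequency as the ratio of the two histograms are assumptions, and the toy values are invented. Any transformation of the evidence values, such as the cube root mentioned in the caption, would be applied before calling the function.

```python
import numpy as np

def frequency_of_occurrences(evidence, occurrence_values, n_bins=10):
    """Quantile-binned histograms and frequency of occurrences.

    evidence          : evidence values of all grid cells in the study area
    occurrence_values : evidence values at the known occurrences
    n_bins            : number of quantile bins (assumed value)
    """
    evidence = np.asarray(evidence, dtype=float)
    occurrence_values = np.asarray(occurrence_values, dtype=float)
    # Quantile bin edges: each bin holds roughly the same number of cells.
    edges = np.quantile(evidence, np.linspace(0.0, 1.0, n_bins + 1))
    # Left-hand panel: histogram over the entire study area.
    all_counts, _ = np.histogram(evidence, bins=edges)
    # Center panel: histogram at the known occurrences.
    occ_counts, _ = np.histogram(occurrence_values, bins=edges)
    # Right-hand panel (assumed definition): occurrences per cell in each bin.
    freq = np.divide(occ_counts, all_counts,
                     out=np.zeros(n_bins), where=all_counts > 0)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, all_counts, occ_counts, freq

# Hypothetical toy data: 100 cells and 4 occurrences.
evidence = np.arange(100.0)
occurrence_values = np.array([5.0, 95.0, 96.0, 97.0])
centers, all_counts, occ_counts, freq = frequency_of_occurrences(
    evidence, occurrence_values, n_bins=4)
```

With quantile binning the denominator is nearly constant across bins, so the scatter in the resulting frequencies comes mainly from the small number of occurrences per bin.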

Table captions

Table 1: List of known deposits used as training points. Associated other metals are listed for deposits with atypical metal associations. Metamorphic grade (MG) classes: 1 = greenschist, 2 = low amphibolite, 3 = middle amphibolite, 4 = high amphibolite. Clusters (SC): 1 = prospective, 2 = semi-prospective.

Table 2: Statistics of instrumentation in the national airborne geophysical survey for the magnetic and electromagnetic measurements. Nm = number of magnetometers, C/P = magnetometer type (C = caesium, P = proton), Fm = registration frequency of the magnetic field, D = coil distance, F = frequency of the primary field, M = moment, Fem = registration frequency of the electromagnetic field.

Table 3: Evidence features included in fuzzy integration.


Table 1.

Orogenic gold deposits with gold only
Deposit              MG  SC
Ahvenjärvi           1   2
Hakokodanmaa         1   1
Hirvilavanmaa        1   1
Hookana              1   2
Kaarestunturi        1   2
Kellolaki            1   2
Kiekerömaa           2   2
Kittilän Hanhilampi  1   1
Kuotko               1   2
Mustajärvi           1   2
Mäkärärova           4   2
Paha                 1   2
Pahtavaara           2   1
Palokiimaselkä       4   2
Palolaki             1   1
Pikku-Mustavaara     1   1
Rovaselkä            3   2
Ruoppapalo           3   2
Ruosselkä            3   2
Soretialehto         1   1
Soretiavuoma N       1   1
Suurikuusikko        1   1

Orogenic gold deposits with atypical metal associations
Deposit             Other elements  MG  SC
Aakenusvaara        U               1   2
Harrilompolo        Cu              1   2
Kaaresselkä         Cu              1   2
Kittilän Palovaara  Cu              1   2
Kirakka-aapa        Co-Cu-Ni        3   2
Koppelokangas       Cu              1   2
Kutuvuoma           Cu              1   1
Lammasvuoma         Ni              1   2
Loukinen            Cu-Ni           1   2
Mantovaara          Zn              1   1
Muusanlammit        Cu-Ag           1   2
Naakenavaara        Cu-Co-Ni        1   2
Pittarova B         Cu              3   2
Päivänenä           Cu              1   1
Riikonkoski         Cu              1   1
Saattopora          Cu              1   1
Sirkka kaivos       Co-Cu-Ni        1   2
Sirkka W            Cu              1   2
Sukseton            Cu              3   2
Tuongankuusikko     Cu-Ni-Co        1   1

Table 2.

Columns: Year; Magnetometers (Nm, C/P, Fm / s^-1); Electromagnetics (D / m, F / Hz, M, Fem / s^-1)

1975-77

2

P

2

25.8

3220

127

2

105

4

115/55

4/4

1978-79

25.0

1980

21.44

3222

1981–83

21.36

3113

1984-88

3

1989-91

2

1992-95

4

C

1996

3125/14368

1997-98

8

1999-2000

8/10

2001-04

10

Table 3.

Description: Membership range

Geophysics
  Bouguer anomaly gradient maxima: 0.15-0.85
  Electromagnetic real component: 0.05-0.95
  Electromagnetic imaginary component: 0.3-0.7
  Electromagnetic real - imaginary ratio: 0.05-0.95
  Distance to pseudogravimetric worms, derived from magnetic airborne input data: 0.2-0.9
  Deviation from the Definitive Magnetic Reference Field reduced to north pole: 0.3-0.7

Geology
  Distance to all structures: 0.2-0.9
  Weighted inverse distance to albite and distance to non-albite occurrences: 0.0-1.0
  Metamorphic grade: 0.0-1.0

Training data
  All orogenic gold deposits
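How membership ranges such as those in Table 3 could enter a fuzzy gamma integration is sketched below in Python. The sketch makes several assumptions that are not taken from this study: the membership range is interpreted as the lower and upper bound of a linearly rescaled membership, the gamma value of 0.8 is arbitrary, and the function names and toy grids are hypothetical. The fuzzy gamma operator itself is the standard combination of the fuzzy algebraic sum and the fuzzy algebraic product.

```python
import numpy as np

def rescale_to_range(membership, low, high):
    """Linearly map memberships from [0, 1] into [low, high] (assumed use of the range)."""
    return low + (high - low) * np.clip(membership, 0.0, 1.0)

def fuzzy_gamma(memberships, gamma):
    """Fuzzy gamma operator over a stack of membership grids (stacked along axis 0)."""
    m = np.asarray(memberships, dtype=float)
    product = np.prod(m, axis=0)                    # fuzzy algebraic product
    algebraic_sum = 1.0 - np.prod(1.0 - m, axis=0)  # fuzzy algebraic sum
    return algebraic_sum ** gamma * product ** (1.0 - gamma)

# Hypothetical toy grids of three cells each, rescaled with two ranges from Table 3.
em_real = rescale_to_range(np.array([0.0, 0.5, 1.0]), 0.05, 0.95)
structures = rescale_to_range(np.array([0.0, 0.5, 1.0]), 0.2, 0.9)
# Combined prospectivity score; gamma = 0.8 is an arbitrary illustrative choice.
score = fuzzy_gamma([em_real, structures], gamma=0.8)
```

Under this interpretation, a narrow range such as 0.3-0.7 limits how strongly a single evidence feature can raise or lower the combined score, whereas a range of 0.0-1.0 lets a feature veto a cell entirely through the product term.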

Research highlights

- Empirical fuzzy membership determination is a powerful and easily traceable method
- Delineation of the permissive tract is efficient using trained data clusters
- Albitites are strong positive evidence for orogenic gold occurrence
- Metamorphic grade is strong positive evidence for orogenic gold occurrence

[Workflow diagram labels: Clustering; Clustered data; Training points and masked data; Permissive area mask; Membership transformation; Fuzzy gamma membership integration]