Prediction of habitat suitability of Morina persica L. species using artificial intelligence techniques

Prediction of habitat suitability of Morina persica L. species using artificial intelligence techniques

Ecological Indicators 112 (2020) 106096 Contents lists available at ScienceDirect Ecological Indicators journal homepage: www.elsevier.com/locate/ec...

17MB Sizes 2 Downloads 56 Views

Ecological Indicators 112 (2020) 106096

Contents lists available at ScienceDirect

Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind

Prediction of habitat suitability of Morina persica L. species using artificial intelligence techniques

T

Fateme Ghareghan, Gholamabbas Ghanbarian , Hamid Reza Pourghasemi, Roja Safaeian ⁎

Department of Natural Resources and Environmental Engineering, School of Agriculture, Shiraz University, Shiraz, Iran

ARTICLE INFO

ABSTRACT

Keywords: Maximum entropy Support vector machine Generalized linear model Boosted regression trees Habitat suitability Morina persica

The Morina genus has 13 species in the world, out of which only M. persica L. is found to be growing wild in Iran. The aim of this research is to predict the spatial distribution and model the habitat suitability for M. persica species using four data mining models: maximum entropy (MaxEnt), support vector machine (SVM), generalized linear model (GLM), and boosted regression trees (BRT). A total of 404 M. persica locations were identified during extensive field surveys, and their geographical locations were recorded using a global positioning system (GPS) device. Furthermore, seventeen environmental predictors including topographical, geological, climatic, and edaphic factors were selected, and their thematic layers were mapped in ArcGIS. Lastly, habitat suitability was modeled using data mining techniques. The validity of the results was assessed using the area under the receiver operating characteristic curve (AUROC). Moreover, three cutoff-dependent metrics, Cohen’s kappa, sensitivity, and specificity, were used for more scrutinized performance assessment. The results revealed that the highest effects on M. persica distribution were mostly associated with edaphic factors, followed by climatic, lithological, and topographical factors. The results showed that MaxEnt with an AUROC value of 95% showed an outstanding performance in terms of prediction power and generalization capacity, followed by SVM (94.1%), GLM (87.4%), and BRT (84.7%). Comparing the AUROC values, MaxEnt was selected as the premier model with the best performance for M. persica distribution modelling across the study area. Cutoff-dependent metrics were also in line with AUROC values; however, the latter made a more discernible distinction between the performance of SVM and MaxEnt. The GLM, BRT, SVM, and MaxEnt models classified 37.37%, 27.28%, 23.31%, and 6.51% of the study area as high and very high suitable habitats for M. persica, respectively. The inferences of this research would be of interest to authorities in the natural resources sector, the research community, local stakeholders, and biodiversity conservation agencies for use in conserving and reclaiming M. persica habitats in the study area.

1. Introduction The Morina genus (Morinaceae family) has 13 species in the world, extending from southern Europe to the Himalayas in Nepal, Bhutan, China, and further in the east and north of the Qinghai – Tibetan plateau. Out of 13 species, only M. persica L. grows wild in Iran (Mozaffarian, 1996; Aghabeigi, 2009). In general, these plants grow in different habitats such as rock edges, alpine meadows, dry slopes, margins of pine forests, and even swamps at high altitudes up to 4,200 m and even higher (Bell and Donoghue, 2003; Joshi, 2013; Kumar et al., 2013). The genus Morina has been widely used in Chinese traditional medicine for the treatment of cerebral apoplexy, arthralgia, lumbago, megrim, and tumors. More applications in traditional systems of



medicine have also been reported (Joshi, 2013; Mocan et al., 2016). Owing to their sweet and astringent taste and heating potency, the stem, leaves, and flowers are used in Tibetan medicine. Moreover, Indian traditional medicine applies the roots and flowers as a poultice on boils and wounds and for unconsciousness. The use of the plant in veterinary medicine as a treatment for maggot wounds is also documented (Kumar et al., 2013). In Iran, M. persica is known as “bride’s thorn” by local villagers and for its uses in traditional medicine as well as ornamental and soil conservational values (Ghareghan, 2018). Recent conditions of climate change have taken the plants, particularly the rare ones, to the brink of extinction. Many of the plants have been destroyed before they were documented or their beneficial values were realized (Deb et al., 2017). Hence, plant conservation has become a pivotal topic. In this regard, species’ distribution modelling is an

Corresponding author. E-mail addresses: [email protected], [email protected], [email protected] (G. Ghanbarian).

https://doi.org/10.1016/j.ecolind.2020.106096 Received 23 August 2019; Received in revised form 26 November 2019; Accepted 8 January 2020 1470-160X/ © 2020 Elsevier Ltd. All rights reserved.

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

important tool for conservation endeavors (Deb et al., 2017). This branch of modeling is receiving due attention in various subfields of ecology and being widely used in many ecological studies (Kumar and Stohlgren, 2009). These models show the relationship between the occurrence of plants and the environmental and climatic factors (Kumar and Stohlgren, 2009; Hosseinzadeh et al., 2018). Recent conditions of climate change have taken the plants, particularly the rare ones, to the brink of extinction. Many of the plants have been destroyed before they were documented or their beneficial values were realized (Deb et al., 2017). Hence, plant conservation has become a pivotal topic. In this regard, species’ distribution modelling is an important tool for conservation endeavors (Deb et al., 2017). This branch of modeling is receiving due attention in various subfields of ecology and being widely used in many ecological studies (Kumar and Stohlgren, 2009). These models show the relationship between the occurrence of plants and the environmental and climatic factors (Kumar and Stohlgren, 2009; Hosseinzadeh et al., 2018). Recent research has used different environmental variables related to the ecological characteristics of the studied species. Knudson et al. (2015) used digital elevation model (DEM), groundwater well observations, topographic wetness index (TWI), topographic position index (TPI), and distance to groundwater (DTG) as the set of indicators of Platanthera praeclara habitats in North Dakota, USA. Nine environmental variables (mean annual rainfall, mean annual temperature, elevation, slope, aspect, geology, pH, sand, and organic carbon) were used in the study of Artemisia aucheri in southern Iran (Ghanbarian et al., 2019). Habitat suitability modeling of Pinus sylvestris in the Iberian Peninsula was performed using both climatic (seasonal average temperature, seasonal precipitation, annual precipitation, annual average temperature, minimum average temperature of the coldest month, and maximum average temperature of the warmest month) and topographic (slope and aspect) predictor variables (Garzon et al., 2006). In other research, 26 conditioning factors including seven biophysical variables (aspect, elevation, land cover, river presence, slope, soil classification, and soil geology) and 19 bioclimatic variables were used in predicting geographical distribution and modeling of the suitable habitats of 14 selected threatened plant species in the Philippines (Garcia et al., 2013). Moreover, in the habitat suitability assessment of Schisandra sphenanthera, 15 conditioning variables of topography (elevation, slope, aspect), climate (minimum temperature of coldest month, maximum temperature of warmest month, average temperature of growth, precipitation of growth, relative humidity, sunshine hours), and soil (topsoil texture, organic carbon, pH, total nitrogen, total phosphorus, total potassium) were chosen and applied in China (Lu et al., 2012). A variety of modeling techniques are available for predicting habitat potentiality (i.e. suitability) for different species of plants (Kumar and Stohlgren, 2009). Several modeling methods have been used to assess habitat suitability (Kumar and Stohlgren, 2009; Gogol-Prokurat, 2011; Shiroyama and Yoshimura, 2016; Deb et al., 2017; Hosseinzadeh et al., 2018), landslide susceptibility (Chen et al., 2017; Pourghasemi and Rahmati, 2018), groundwater/qanat potentiality (Naghibi et al., 2016; Naghibi et al., 2018), gully erosion susceptibility (Pourghasemi et al., 2017), and forest fire susceptibility (Pourtaghi et al., 2016). Although the feasibility of machine-learning methods in ecological and environmental modeling has been approved, researchers’ main concern today is the selection of a more efficient model from among the existing approaches (Motevalli et al., 2018). Among the varied model ranges, data mining techniques including SVM, GLM, BRT, and MaxEnt have been used in many modelling attempts because of their capability of governing scarce data and using robust pattern recognition algorithms. The SVM model is based on statistical learning theory and has been applied in environmental studies (Shruthi et al., 2014; Rahmati et al., 2019); however, the number of reports on the performance of SVM in mapping suitable habitats for a plant species is limited. Generalized linear models (GLM) comprise one developed case of classical multiple regression allowing the modeling of

non-normal response alternatives (Guisan et al., 2002; Rupprecht et al., 2011). The application of GLM methods in species distribution models (SDM) has been approved and were reviewed in the literature (Franklin, 2010; Guisan and Zimmermann, 2000; Guisan et al., 2002; Franklin, 2010). The BRT approach is a machine-learning method differs from traditional regression methods and is not frequently used in plant species distribution modeling. This approach uses the technique of boosting data exploration and analysis to combine large numbers of relatively simple tree models and optimize predictive performance and has recently been introduced into ecological and biological scientific studies (De’ath, 2007; Elith et al., 2008; Martínez-Rincón et al., 2012; Soykan et al., 2014). Among these, maximum entropy is one of the commonly used models in different environmental studies due to its high accuracy and good performance (Matin et al., 2012; Yang et al., 2013; Khosravi et al., 2016; Javed et al., 2017). Not being sensitive to the sample size, being applicable to different spatial scales (from global to local), and performing well in an area with scarce data (which is the case in the current study) (Kumar and Stohlgren, 2009; Deb et al., 2017; Hosseinzadeh et al., 2018) are other reasons for which MaxEnt was chosen in this study for determining habitat suitability for the Morina persica L. species in comparison to the SVM, GLM, and BRT data mining/machine learning techniques. The evidence was insufficient to show suitable localities or potential areas for M. persica to exist in the study area. The current research investigated which type of species distribution models and/or habitat suitability maps may better describe the suitable habitats for M. persica and which environmental variables are more relevant for such aim. To that end, the researchers aimed to (1) map the distribution of M. persica; (2) identify the environmental factors associated with the presence (i.e. observation) of this species; and (3) predict and compare suitable habitats using MaxEnt, SVM, GLM, and BRT models. 2. Materials and methods Fig. 1 depicts the methodological flowchart of the current study which entails inventory mapping and training/validation splitting, selection, and preparation of the environmental predictors, modeling process, validation of the results, and selection of the better performing model (Fig. 1). 2.1. Study area The study area extends for 1,217 km2 in the southern parts of the Zagros Mountains to the west of Shiraz, the capital of Fars Province, Iran. It lies between longitudes 52° 06′ 23″ and 52° 21′ 36 “east and latitudes 29° 37′ 33″ and 30° 00′ 32″ north. The altitude varies between 1,472 and 3,132 m above sea level (Fig. 2). The study area receives an annual average of 390 mm precipitation and has an average temperature of 18 °C. The Mediterranean climate of the study area has a rainy season that extends from October to April and a dry season extending from May to September (IRIMO, 2016). Dominant plant species consist of shrubs such as Phlomis elliptica Benth., Convolvulus leiocalycinus Boiss, Ebenus stellata Boiss., and Astragalus sp. accompanied by perennial grass and forbs including Stipa barbata Desf., Bromus tomentellus Boiss., Hordeum bulbosum L., and Gundelia tournefortii L. 2.2. Data compilation 2.2.1. Target species The genus Morina (Morinaceae) with 13 species is distributed widely in various world habitats such as the southeast of Europe, Turkey, Syria, Lebanon, Afghanistan, Pakistan, the western parts of the Himalayas from Kashmir to Kumaon and China (Zhu et al., 2009; Kumar and Varshney, 2013; Mocan et al., 2016). In Iran, the only species growing wild is Morina persica found in the Irano-Turanian region (Fig. 3), which is mainly distributed in the southern Zagros 2

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

Modelling habitat suitability of Morina persica

Environmental factors Plan curvature

Slope

Elevation

Aspect

Mean rainfall

Topographic wetness index Maximum temperature

Minimum temperature

Nitrogen

Electrical conductivity

Carbon

Acidity

Sand

Bulk density

Clay

Silt

Lithology

Application of four models to prepare habitat suitability maps: -Maximum entropy - Support vector machine -General linear model - Boosted regression trees

Presence/absence dataset Random selection

Validation dataset (30%)

Training dataset (70%)

Validating models using: -ROC curve -Sensitivity/ Specificity -Kappa value

Choose the best model

Fig. 1. Methodological flowchart adopted in this study.

mountain range in Fars and Kohkiloye/Boyerahmad provinces (Aghabeigi, 2009). It is unceasingly used in Iranian traditional medicine, and its overutilization by local people has put this species at degradation risk (Mozaffarian, 1996; Ghareghan, 2018).

Drawing on a literature review (Khanum et al., 2013; Cobben et al., 2015; Hosseini et al., 2013; Deb et al., 2017), a total of 17 environmental predictors were used in the habitat suitability modeling process (Table 1).

2.2.2. Morina species inventory mapping A total of 404 presence locations (i.e. observations) of M. persica were recorded during extensive field surveys conducted from April 2016 to September 2017 using a handheld GPS device (Garmin Map 62s). In this research, 70% of the presence data (i.e. 282 locations) was used for training the models, and the remaining 30% was kept apart for the validation phase (Pourghasemi et al., 2012; Jaafari et al., 2014). A 1:1 balance was chosen for presence. Absence data size, that is, a total of 404 absence locations, was extracted from the areas that had the least potential for species appearance (Pourghasemi et al., 2019). The latter was carried out based on preliminary desk studies, field observations, and expert knowledge. The selection of presence-absence locations is vital to the modeling process and model validation stage as it is the proxy to a varied range of validation metrics (e.g., false negative, false positive, ROC, Cohen’s kappa) that highly depend on sample size and spatial distribution (both presence and absence).

2.3. Environmental predictors Based on the growth conditions of M. persica as well as the accessibility and feasibility of data, three groups of environmental factors including 17 variables of climatic, topographic, edaphic, and lithological datasets were chosen for modeling the distribution of suitable habitats for M. persica in the study area. 2.3.1. Climatic data Rainfall and temperature maps provide the main climatic data frequently used in species’ distribution modeling (Khanum et al., 2013). In this work, thematic maps of mean annual precipitation and temperature (mean maximum/minimum) were prepared using the inverse distance weighting (IDW) approach based on 13 years of precipitation and temperature data obtained from 15 rain gauges and 18 weather stations in and adjacent to the study area (IRIMO, 2016). Based on DEM 3

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

Fig. 2. Location of the study area in Iran and Fars Province.

4

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

(a)

(b)

(c)

Fig. 3. Morina persica in natural habitat of Mashayekh area, Iran (a) plant, (b) flowers, (c) habitat landscape (Photo credit: Gholamabbas Ghanbarian, 5 September 2017).

2.3.3. Edaphic data In total, 42 soil samples were collected from depths of 0–30 cm in M. persica habitats. The most frequently used soil properties, according to the related studies (Hosseini et al., 2013), were selected and used for species’ distribution modeling in this work. Organic matter (OM), electrical conductivity (EC), acidity (pH), soil sand/ silt/ clay content, bulk density (BD), and nitrogen (N) were examined in the laboratory. The soil samples were air-dried, sieved through a 2-mm screen, and prepared for further analyses. The hydrometer method was used to determine soil texture (Bouyoucos, 1962), and pH was measured using an electric pH meter. Soil nitrogen (N) and soil organic carbon (OC) levels were determined using the Kjeldahl (Bremmer and Mulvaney, 1982) and Walkley-Black methods, respectively (Nosetto et al., 2006). The EC of soil samples was measured in a 1:1 soil–water suspension with an EC meter (Carter, 2008). Soil bulk density was measured using a soil core sampler (8 cm in diameter) (Nosetto et al., 2006). Soil properties of the final data were obtained from laboratory analyses and mapped using interpolation methods such as kriging and IDW (Fig. 4 (bd, f, j, k, n, p)). Overall, eight soil thematic maps were prepared, rasterized, and further used in the habitat suitability modeling process.

Table 1 Environmental predictors used for modeling habitat suitability for M. persica. Category

Predictor

Data scale

Topographical factors

Slope Aspect Plan curvature Elevation TWI Bulk Density Sand Clay Silt EC pH Organic Carbon Nitrogen Rainfall Temperature (min/max) Formation

Continuous Categorical (9 classes) Continuous Continuous Continuous Continuous Continuous Continuous Continuous Continuous Continuous Continuous Continuous Continuous Continuous Categorical (4 classes)

Edaphic factors

Climatic factors Geological factors

resolution, a 30-m*30-m spatial resolution was used for mapping in the current study (Fig. 4 (h, i, and m)).

2.3.4. Lithological data A lithological formation map of the study area was acquired from the Geological Survey Department of Iran (GSDI) (1997) and included 4 main groups (Table 2, Fig. 4g).

2.3.2. Topographical data The elevation, slope degree, plan curvature, and aspect maps as derivatives of the digital elevation model (DEM) (30-m × 30-m) were prepared in ArcGIS 10.2 software. The TWI map was generated from DEM in SAGA-GIS software (Fig. 4 (a, e, l, o and q)) following the equation:

TWI = ln

tan

2.4. Models description 2.4.1. Frequency ratio: A preliminary assessment The frequency ratio (FR) method was used to carry out an intercomparison among the classes of each predictor (Eq. (2)):

(1)

where α is the cumulative upslope area draining through a point (per unit contour length) and tan is the slope angle at the point (Pourghasemi et al., 2012).

FR =

5

b B a A

(2)

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

where b is the Morina number in each class, B is the total number of the Morina in the region, a is the class area, and A is the total area of the study region. Although the FR enables users to compare different classes of each predictor, intercomparison of predictors is not feasible, and the comparison is limited to each predictor’s classes. Intercomparison between predictors can be addressed only by the factor importance results derived from data mining models.

issue which it shares with other presence-only models is that the prevalence of the phenomenon remains unknown, because absence locations are not engaged in the modeling process. As with other generalpurpose models, the output probability distribution function (pdf) of MaxEnt would be just an estimation (i.e. statistically the best one) of the true target pdf (Chen et al., 2017). More details are provided by Phillips et al. (2006) and Phillips and Dudík (2008).

2.4.2. Maximum entropy Maximum entropy (MaxEnt) has seen major interest in the machine learning/data mining community because of its great performance in different spatial modeling tasks (Yang et al., 2013; Khosravi et al., 2016; Javed et al., 2017). Entropy refers to the information stored in the spatial data. Therefore, maximizing the entropy can provide more information for the model and, consequently, ease the pattern recognition of the phenomenon of interest. As a general purpose, presence-only model, MaxEnt has a number of shortcomings. The main

2.4.3. Generalized linear model Generalized linear model (GLM) is one of the most frequently applied models in many scientific branches. It was first formulated by Nelder and Wedderburn (1972) to determine the best fit to the training data by establishing a link between a dependent variable (i.e. species presence data) and independent explanatory variables (i.e. environmental predictors) (Atkinson and Massari, 1998; Devkota et al., 2013). Hence, a dichotomous dataset (i.e. zero/one) enables GLM to distinguish the pattern of presence/absence localities (Menard, 2002;

(a)

(c)

(b)

(d)

Fig. 4. Thematic maps of the study area: (a) slope aspect, (b) bulk density, (c) organic carbon, (d) clay content, (e) elevation, (f) EC, (g) lithology, (h) annual mean maximum temperature, (i) annual mean minimum temperature, (j) nitrogen, (k) pH, (l) plan curvature, (m) rain, (n) sand content, (o) slope degree, (p) silt content, and (q) TWI. 6

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

(e)

(g)

(f)

(h)

Fig. 4. (continued)

Kavzoglu, 2014). The predicted values range from 0 to 1 which indicates the probability of species presence or the suitability of the habitat. More precisely, a linear model attempts to connect a dependent variable to some predictors by multiplying each predictor with a specific value termed as the regression coefficient. Instead, the GLM generalizes the latter by substituting the regression coefficients with nonparametric functions (i.e. link functions). Therefore, the model will exhibit higher flexibility and prediction power.

the margins. There is always at least one support vector for each class (Sor et al., 2017). The kernel functions enable SVM to handle the nonlinearity in the data by transferring the problem surface into n-dimensional space. The latter also eases accommodation of the optimal hyperplane. This non-parametric method is widely applied to several fields of environmental studies such as landslide susceptibility (Chen et al., 2017; Pourghasemi and Rahmati, 2018), gully erosion (Pourghasemi et al., 2018), and groundwater potentiality assessment (Naghibi et al., 2018). In the current study, the SVM model was executed using the SVM R-package (Karatzoglou et al., 2016).

2.4.4. Support vector machine (SVM) The support vector machine (SVM) model is a supervised machine learning algorithm which can be used in ecological studies for both classification and regression tasks (Kavzoglu and Colkesen, 2009; Ballabio and Sterlacchini, 2012). Underpinned by a statistical theory proposed by Vapnik (1995), SVM distinguishes the sample data into two classes by accommodating a hyperplane which maximizes the margins between two different patterns (Qi et al., 2017). The optimal hyperplane is set by the help of some specific data points called support vectors which are a function of the training data and are located near

2.4.5. Boosted regression trees (BRT) The boosted regression trees (BRT) model is a nonparametric, robust, and easy-to-use machine learning method that has recently been developed and is used in ecological studies, particularly in the exploration and prediction of species distribution (Hutchinson et al., 2011). The BRT model combines two algorithms of regression and boosting to incorporate the advantages of both approaches (Elith et al., 2008). In this approach, there is no need for data transformation or the 7

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

(k)

(i)

(j)

(l)

Fig. 4. (continued)

elimination of outliers prior to the modeling process. In addition, the complicated nonlinear relationships among predictors are well considered by the robust mathematical functions of BRT (Pourtaghi et al., 2016). The flexible regression functions of BRT are quite beneficial in ecological modeling, because they allow the handling of different data types (i.e. continuous and categorical) and give a precise prediction of species distribution (Elith et al., 2008). The good performance and precise prediction of BRT have been reported in several ecological studies (De'ath, 2007; Pourtaghi et al., 2016). In the current study, the gbm R-package was used to implement the BRT method (Elith et al., 2008).

1997; Pearce and Ferrier, 2000). The ROC curves plotted by training and validation sets address the goodness-of-fit and prediction power of the models, respectively (Fielding and Bell, 1997; Merow et al., 2013). The ROC curve originates from the confusion matrix which examines four main components: true positive (TP), true negative (TN), false positives (FP), and false negatives (FN). The TP and TN, respectively, address the performance of the model on correctly predicting the presences (i.e. positives) and absences (i.e. negatives). Conversely, the FP and FN, respectively, show how the model has incorrectly predicted the absences as presences and vice versa. The main advantage of the ROC curve is that it measures the model performance regardless of any threshold (i.e. cutoff) value, while other cutoff-dependent metrics merely function on a predefined cutoff value. The area under ROC curve (AUROC) was used to select the premier (i.e. better performing) model among MaxEnt, GLM, SVM, and BRT models. The AUROC values ranged from 0.5 to 1, with 0.5 representing a random prediction and 1 showing a perfect prediction. According to an AUROC classification proposed by Hosmer and Lemeshow (2000), the values ranges of 0.8–0.9 and 0.9–1 represent good and excellent performances,

2.5. Model validation The ROC curve, as a cutoff-independent performance metric, was used to test the validity of the models’ results. In practice, the ROC curve plots the sensitivity (i.e. correctly predicting the presence locations) on the Y-axis against the 1-specificity (i.e. incorrectly predicting the absence locations as presences) on the X-axis (Fielding and Bell, 8

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

(m)

(o)

(n)

(p)

Fig. 4. (continued)

respectively. Overall, higher AUC values show the greater performance of the model. To scrutinize the model evaluation process, the three cutoff-dependent performance metrics, Cohen’s kappa, specificity, and sensitivity, were used. Cohen’s kappa is used to assess whether or not the model’s accuracy stems from any randomness. Kappa values close to zero indicate purely randomly-driven results, while a value of 1 represents a perfect prediction without any random chance. Sensitivity and specificity, respectively, aim to quantify the probability of correctly detecting the presence and absence locations. In this study, a 50% cutoff value—as commonly used in the literature—was chosen to calculate the abovementioned cutoff-dependent metrics.

578.62; maximum temperature < 22.4; minimum temperature < 7.64; elevation ranging between 1854 and 2062 m above sea level; slope ranging between 6.88 and 15.02%; east and north facing slopes; convex planar curvatures, TWI < 7.29; and the lithological group 3. Previous studies have shown that M. persica is a perennial forb growing mainly on thin bedded limestone and grey to green ferruginous marls, silty and marl limestone, weathered marls, minor silty shale, argillaceous limestone, conglomerate and sandstone (GSI, 1979; Ghareghan, 2018; Tables 2-3). It prefers silty soil textures with a high content of organic carbon (Table 3). The rapid aboveground biomass of M. persica contributes to litter accumulation at the end of the growth season and leads to higher levels of soil carbon storage (Ghareghan, 2018; Mousazade et al., 2019). It has been proven that M. persica prefers a middle elevation range with north-facing aspects as well as more annual mean rainfall compared to some xerophyte shrubs in the study area, such as Astragalus fasciculifolius (Mozaffarian, 1996; Mousazade et al., 2019).

3. Results and discussion An initial assessment of the FR values in Table 3 shows that the current presence pattern of M. persica species is highly correlated with the following predictor classes: bulk density < 1.08; EC > 1.5; soil textural classes of sand < 35.63, clay < 22.05, silt > 36.75%; nitrogen < 0.12; organic carbon > 3.19; pH < 7.5; rainfall > 9

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

affected by different environmental variables. Rupprecht et al. (2011) showed that maximum temperature and potential radiation are the main predictors of Juniperus oxycedrus distribution in central high Atlas, Morocco. Elevation, lime, and gravel were the most significant environmental factors on the habitat distribution of A. aucheri in Poshtkoh, Iran. In a study of Astragalus fasciculifolius, the soil factors of bulk density, electrical conductivity (EC) nitrogen, acidity (pH), and sand percentage were found to be the most significant variables affecting the spatial distribution of A. fasciculifolius in southern Iran (Mousazade et al., 2019) (See Table 7).

(q)

3.2. Application of SVM model R statistical software was used to map the habitat suitability model of M. persica for the SVM model (Fig. 8c). The heuristic test using training data was carried out to find the best kernel parameters. The analysis results showed that the hyper-parameter sigma was 0.056 with 127 support vectors, whereas the training error was 0.087. Then, the training model was extended to the whole study area. The final map, produced based on the habitat suitability index, was reclassified into four classes of low, moderate, high, and very high (Fig. 8). The percentages of these classes are shown in Fig. 9. The low class had the

Fig. 4. (continued) Table 2 Description of the formation groups of the study area. No

Class

Geo - Unit

1

Group 1

As, Ja, OMas, PEja

2

Group 2

3

Group 3

Alluvium, Lac, PIOmc, Q, Qac, Qap, Qb1, Qb2, Qc, Qcb, Qg, Qgsc, Qs, Qsc, Qsc2, Qscg, Qscs, Qsm, Qssc, Sr Mc, Mgs, Mgs-1, Mgs-2, Mgs-up, MPlaj, PEpb, Pq

4

Group 4

PIQb1c

Materials Limestone, dolomitic limestone, Massive, brown weathering, marl limestone with abundant microfauna, Uniform bedded, intensive layers, bearing cream limestone to grey ferruginous and dolomitic limestone (upper portion), dolomitic and marl limestone (lower portion) Alluvium and recent deposits; Sub recent old fan deposits, unfolded conglomerates old fine clastic Well rounded cherty mature and somewhere petromictic conglomerate; Alternating thick-bedded superficial chemical deposit and thin bedded limestone and grey to green ferruginous marls; Silty and marl limestone; chert arenite to calclithite fine-grained to coarse-grained cross bedded and somewhere ripple mark sandstone; Mainly shale and weathered marls and minor silty shale, argillaceous limestone, conglomerate and sandstone Consolidated and cemented reworked pile of fine to very coarse_ grained polymitic deposits, indicative of proximal sourcand tectonic instability

3.1. Multicollinearity and application of the GLM model

largest area percentage (49.09%), followed by the moderate (27.60%), high (14.69%), and very high habitat classes (8.62%). In previous research, the SVM model revealed a reasonable relationship between the distribution of plant/animal species and environmental variables (Martínez-Rincón et al., 2012; Rahimian Boogar et al., 2019). In the current research, both the MaxEnt and the SVM models were consistent with predictions of M. persica presence with respect to the factors used in the suitability maps (Table 8, Fig. 10).

The result of the GLM approach is shown in Fig. 8 (b). The multicollinearity among independent environmental predictors was determined using the variance inflation factor (VIF). A VIF > 5 denotes critical multicollinearity in a predictor and, consequently, it should be excluded from the modeling process (Wang et al., 2015; Ghanbarian et al. 2019). Accordingly, mean minimum temperature and soil silt content have been eliminated (Tables 4 and 5). The training data (70%) of M. persica presence was fed into the GLM to determine the relationship between species presence and environmental predictors. The Negelkerke R Square (0.572), the X2 value of the Hosmer and Lemeshow test, and the Cox & Snell R2 value were used to evaluate the effectiveness of the training data (Ghanbarian et al., 2019). As shown in Tables 4 and 5, there are positive and negative coefficients for environmental predictors which represent positive and negative correlations with the M. persica presence. As it appears, BD, EC, organic carbon, plan curvature, rain, slope, and lithology are the predictors that are positively correlated with suitability values. In other words, the suitability value increases as these variables increase and vice versa. Conversely, slope aspect, clay, elevation, maximum temperature, nitrogen, pH, and sand predictors are negatively correlated with suitability values. The pseudo – R2 value, calculated in SPSS, was higher than 0.2, which is in a satisfactory range and, in parallel, indicates an acceptable goodness-of-fit of the GLM model (Table 6). Previous research has shown that the distribution of a plant species could be

3.3. Application of the MaxEnt model The results of the MaxEnt model indicated that among all environmental variables, edaphic factors contributed highly to the modeling process of habitat suitability for M. persica, followed by climatic, lithological, and topographical factors. In particular, electrical conductivity, pH, bulk density, organic carbon, and soil silt content had the greatest effects on M. persica presence, while aspect and plan curvature showed less importance (Fig. 7). The map produced using the MaxEnt model was classified based on the natural breaks and showed that 1.90, 4.61, 14.47, and 79.02% of the study area coincided with very high, high, moderate, and low suitable habitats for M. persica, respectively (Fig. 8a and 9). 3.4. Application of boosted regression tree model In general, the results of the BRT model related to the training data 10

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

Table 3 Assessment of the relationship between M. persica presence and environmental predictors using frequency ratio. Factors

Classes

Pixel. No

%Pixel

Species. No

%Species

FR

Bulk density (g/cm3)

< 1.08 1.08 – 1.21 > 1.21

394,749 504,004 451,008

29.25 37.34 33.41

122 47 114

43.11 16.61 40.28

1.47 0.47 1.21

< 0.92 0.92 – 1.14 1.14 – 1.50 > 1.50

718,161 459,096 120,985 51,519

53.21 34.01 8.96 3.82

173 28 50 32

61.13 9.89 17.67 11.31

1.15 0.29 1.97 2.96

< 35.63 35.63 – 39.94 39.94 – 44.02 44.02 – 49.78 > 49.78

97,728 442,218 546,923 237,441 24,451

7.24 32.76 40.52 17.59 1.89

65 97 81 32 8

22.97 34.28 28.62 11.31 2.83

3.17 1.05 0.71 0.64 1.50

< 22.05 22.05 – 28.58 28.58–35.36 35.36 – 42.87 > 42.87

167,017 433,899 289,285 329,613 129,947

12.37 32.15 21.43 24.42 9.63

98 95 6 65 19

34.63 33.57 2.12 22.97 6.71

2.80 1.04 0.10 0.94 0.70

< 14.17 14.17 – 21.91 21.91 – 29.64 29.64 – 36.75 > 36.75

165,301 256,875 212,312 519,549 195,724

12.25 19.03 15.73 38.49 14.50

19 57 7 46 154

6.71 20.14 2.47 16.25 54.42

0.5 1.06 0.16 0.42 3.75

< 0.12 0.12 – 0.15 0.15 – 0.2 > 0.2

280,663 877,239 164,344 27,515

20.79 64.99 12.18 2.04

140 96 35 12

49.47 33.92 12.37 4.24

2.38 0.52 1.02 2.08

< 1.10 1.10 – 1.76 1.76 – 3.19 > 3.19

225,546 1,018,752 99,582 5881

16.71 75.48 7.38 0.44

135 122 20 6

47.70 43.11 7.07 2.12

2.58 0.57 0.96 4.87

< 7.50 7.50 – 7.88 > 7.88

60,786 146,627 1,142,348

4.50 10.86 84.63

46 21 216

16.25 7.42 76.33

3.61 0.68 0.90

< 441.28 441.28 – 487.06 487.06 – 532.84 532.84 – 578.62 > 578.62

693,060 465,033 110,045 65,111 16,512

51.35 34.45 8.15 4.82 1.22

191 45 2 35 10

67.49 15.90 0.71 12.37 3.53

1.31 0.46 0.09 2.56 2.89

< 22.40 22.40 – 24.51 > 24.51

288,416 829,752 231,593

21.37 61.47 17.16

62 175 46

21.91 61.84 16.25

1.03 1.01 0.95

< 7.64 7.64 – 9.29 9.29 – 10.94

332,045 824,541 193,175

24.60 61.09 14.31

96 148 39

33.92 52.30 13.78

1.38 0.86 0.96

< 1854 1854 – 2062 2062 – 2218 2218 – 2413 > 2413

169,437 347,636 507,626 243,095 81,967

12.55 25.76 37.61 18.01 6.07

9 181 88 5 0

3.18 63.96 31.10 1.77 0

0.25 2.48 0.83 0.10 0

< 6.88 6.88 – 15.02 15.02 – 25.66 > 25.66

651,946 379,670 240,951 77,194

48.30 28.13 17.85 5.72

73 153 53 4

25.80 54.06 8.73 1.41

0.53 1.92 1.05 0.25

Flat North North east East South east South South west West

5920 161,824 221,705 156,707 112,099 142,195 227,115 182,741

0.44 11.99 16.43 11.61 8.31 10.53 16.83 13.54

0 47 45 49 23 29 32 33

0 16.61 15.90 17.31 8.13 10.25 11.31 11.66

0 1.39 0.97 1.49 0.98 0.97 0.67 0.86

EC (dsm-1)

Sand (%)

Clay (%)

Silt (%)

Nitrogen (%)

Organic carbon (%)

pH

Annual mean rainfall (mm)

Mean max. temperature (C)

Mean min. temperature (C)

Elevation (m)

Slope (angle)

Aspect

(continued on next page) 11

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

Table 3 (continued) Factors Plan curvature (100/m)

TWI

Lithology

Classes

Pixel. No

%Pixel

Species. No

%Species

FR

North west

139,455

10.33

25

8.83

0.86

562,451 199,846 587,464

41.67 14.81 43.52

138 17 128

48.76 6.01 45.23

1.17 0.41 1.04

< 7.29 7.29 – 11.20 > 11.20

1,053,204 277,224 19,333

78.03 20.54 1.43

239 43 1

84.45 15.19 0.35

1.08 0.74 0.25

Group Group Group Group

146,999 584,003 450,400 168,359

10.89 43.27 33.37 12.47

5 65 207 6

1.77 22.97 73.14 2.12

0.16 0.53 2.19 0.17

1 2 3 4

as plan curvature, lithology, and sand had the lowest effects on occurrence of the studied plant species. According to the natural breaks classification of the map produced using the BRT model, 44.68, 28.04, 19.06, and 8.22% of the total study area coincided with low, moderate, high, and very high location suitability for M. persica distribution, respectively (Figs. 8d and 9). Naghibi et al. (2016) reported that the BRT model produced the best prediction results for groundwater mapping and that this model could be useful in generating accurate spring potential maps. Moreover, it has been shown that the BRT model shows good potential for being a useful tool for critical decision-making support in research on the global scale as well as in the study of plant traits for conservation objectives (Wyse and Dickie, 2017).

Table 4 Weights of environmental predictors in the modeling process of habitat suitability for M. persica derived from the GLM model.

a

Step 1

Aspect Aspect(1) Aspect(2) Aspect(3) Aspect(4) Aspect(5) Aspect(6) Aspect(7) Bulk density Clay DEM EC Max temperature Nitrogen Organic carbon pH Plan curvature Annual mean rainfall Sand Slope Lithology Lithology (1) Lithology (2) Lithology (3) Lithology (4) Constant

B

S.E.

Wald

df

Sig.

Exp(B)

0.514 −0.427 0.083 0.315 −0.191 −0.794 −0.888 5.895 −0.291 −0.005 2.529 −1.562

0.473 0.451 0.515 0.564 0.518 0.481 0.515 1.484 0.032 0.001 0.492 0.253

14.764 1.184 0.896 0.026 0.312 0.136 2.729 2.970 15.774 81.787 16.786 26.383 38.060

7 1 1 1 1 1 1 1 1 1 1 1 1

0.039 0.277 0.344 0.872 0.576 0.712 0.099 0.085 0.000 0.000 0.000 0.000 0.000

1.672 0.652 1.086 1.371 0.826 0.452 0.412 363.092 0.748 0.995 12.537 0.210

−19.243 0.137

4.393 0.224

19.185 0.376

1 1

0.000 0.540

0.000 1.147

−7.070 0.195

0.981 0.284

51.994 0.472

1 1

0.000 0.492

0.001 1.215

0.043

0.006

51.467

1

0.000

1.044

−0.121 0.018

0.031 0.019 40192.970 0.926 0.737 0.742 12.090

1 1 4 1 1 1 1 1

0.000 0.337 0.000 1.000 0.162 0.098 0.000 0.000

0.886 1.018

−19.836 1.295 1.221 3.020 87.175

14.793 0.921 31.659 0.000 1.953 2.745 16.581 51.994

3.5. Habitat suitability maps (HSMs) The habitat suitability maps were generated from the maximum entropy, generalized linear model, support vector machine, and boosted regression tree models. The obtained maps were classified into the four classes of low, moderate, high, and very high suitability using the natural break classification scheme in ArcGIS 10.3 (Fig. 8 (a-d)). Evidently, the most suitable areas for M. persica are located in the southernmost and westernmost points of the study area and extend toward the northwestern parts of the region. As shown in Fig. 7, in the GLM, BRT, SVM, and MaxEnt models, 37.37%, 27.28%, 23.31%, and 6.51% of the study area, respectively, were classified into highly and very highly suitable habitats for the M. persica species.

0.000 3.650 3.389 20.487 7.237E37

3.6. Validation of the habitat suitability models The ROC curves of MaxEnt, SVM, GLM, and BRT models are illustrated in Fig. 10. Some descriptive statistics regarding the ROC curves are presented in Table 8. According to Fig. 10 and Table 8, MaxEnt with the highest AUROC value of 0.95 gave the best performance, followed by SVM (0.941), GLM (0.874), and BRT (0.847). The results are interesting in different ways. Since all the models except MaxEnt were built upon the presence-absence dataset and should have learned the pattern recognition in a better way compared to presence-only models (i.e. MaxEnt), they were expected to also give better performances in terms of goodness-of-fit and predictive value. Previous studies have noted the superior performance of the MaxEnt model in plant species suitability modeling (Mousazade et al., 2019; Ghanbarian et al., 2019; Rahimian Boogar et al., 2019). However, MaxEnt outperformed its presence-absence counterparts. Although the differences in AUROC values are negligible, MaxEnt and SVM both managed to reach the performance level of excellent (Hosmer and Lemeshow, 2000). The similar performance of the SVM model to the MaxEnt model shows the high potential of SVM in species distribution modeling. As previously mentioned, MaxEnt being a presence-only model and not engaging absence locations can expose the model to a weak spatial differentiation.

a. All variable(s) entered on step 1 according to Enter method: Aspect, Bulk density, Clay, DEM, EC, Max-temperature, Nitrogen, Organic carbon, pH, Plan, Rain, Sand, Slope, Lithology

were used for modeling process are presented in Fig. 6d. The BRT method often depends on the number of trees (Naghibi et al., 2016). For instance, in this study, when the BRT model became fitting with a final fixed number of 1850 trees, it was found that the BRT model can estimate the habitat suitability with high accuracy (Fig. 5). The BRT model does not rely on the random selection of the variables. Some studies have emphasized that the evaluation quality of this model seems better than that of other machine learning models. Moreover, according to the Fig. 6, there are differences among the weight values for each factor. Based on the literature, the BRT model has several advantages, including automatically regulating between predictors, replacing missing data, and changing predictor factors (Elith et al., 2008). According to the results, silt, DEM, OC and OM had the greatest effect on the M. persica presence with the relative influence of 14.315, 11.757, 10.596, and 10.552%, respectively. On the other hand, variables such 12

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

Table 5 Multicollinearity of the environmental predictors and significance levels for the GLM model. Coefficients Model

(Constant) Aspect Bulk density Clay DEM EC Max temperature Nitrogen Organic carbon pH Plan Annual mean rainfall Sand Slope Lithology

Unstandardized Coefficients

Standardized Coefficients

B

Std. Error

Beta

13.358 −0.014 0.892 −0.036 0.000 0.196 −0.207 −2.579 0.033 −1.023 0.008 0.005 −0.022 0.000 0.047

1.134 0.008 0.195 0.003 0.000 0.057 0.025 0.616 0.034 0.107 0.046 0.001 0.004 0.002 0.028

−0.067 0.261 −0.718 −0.211 0.131 −0.513 −0.180 0.043 −0.515 0.007 0.462 −0.236 0.002 0.066

t

11.777 −1.899 4.563 −12.109 −5.055 3.428 −8.274 −4.185 0.978 −9.517 0.181 7.502 −5.271 0.063 1.646

Sig.

0.000 0.058 0.000 0.000 0.000 0.001 0.000 0.000 0.329 0.000 0.856 0.000 0.000 0.950 0.100

Collinearity Statistics Tolerance

VIF

0.958 0.365 0.340 0.682 0.818 0.310 0.643 0.609 0.408 0.922 0.315 0.593 0.822 0.735

1.044 2.736 2.943 1.466 1.223 3.223 1.556 1.643 2.453 1.085 3.178 1.686 1.217 1.361

Table 6 Statistics and accuracy coefficients of the GLM model. Step

1

−2 Log likelihood

Cox & Snell R Square

Nagelkerke R Square

Hosmer and Lemeshow Test Chisquare

df

Sig.

467.122a

0.429

0.572

10.233

8

0.249

Table 7 Relative influence of effective habitat suitability factors in BRT model.

Silt DEM OC EC BD pH Min.temp Max.temp Rain Slope N Clay Aspect Sand Lithology Plan

Variable

Relative influence (%)

Silt Digital elevation model Organic carbon Electrical conductivity Bulk density pH Mean minimum temperature Mean maximum temperature Rain Slope Nitrogen Clay Aspect Sand Lithology Plan curvature

14.3151047 11.7576791 10.5967612 10.5527697 9.5651835 7.1106823 6.0617053 4.8650909 4.7615601 4.6398073 3.8587784 3.5252008 2.7516913 2.6935686 2.2403473 0.7040695

Fig. 5. Fitting final BRT model with a fixed number of 1850 trees for M. persica.

performances in many studies. Therefore, based on these premises, the outperformance of MaxEnt and SVM is somewhat justifiable. Different studies such as Hirzel et al. (2002), Yang et al. (2013), Hosseini et al. (2013), Cobben et al. (2015), Deb et al. (2017), and Hosseinzadeh et al. (2018) also attested to the good performance of MaxEnt. Fig. 11 depicts a schematic comparison of the four main elements of the confusion matrix (TP, TN, FP, and FN) attributed to each model and are considered the source of the models’ success and shortcomings. Table 9 also details the cutoff-dependent performance metrics based on the confusion matrix variables and the formulas reported by Rahmati et al. (2019). Based on a 50% cutoff value, MaxEnt and SVM both showed identical results. Also, GLM slightly outperforms BRT in terms of higher sensitivity, specificity, and Cohen’s kappa which stem from lower FP and FN values. FP and FN, in particular, can address more inferences regarding prediction errors. In fact, FP (i.e. incorrectly predicting absences as presences) is considered as Error Type 1. The consequences can cause significant economic losses, because larger areas will be considered as danger zones and, accordingly, deprived of further financial investments due to incorrect model predictions. Moreover, FN (i.e. incorrectly predicting species presence as non-presence) can cause

Table 8 Descriptive statistics of ROC curves for each suitability model. Models

MaxEnt GLM SVM BRT

Area

0.950 0.874 0.941 0.847

Standard Error

0.013 0.022 0.015 0.025

Asymptotic 95% Confidence Interval Lower Bound

Upper Bound

0.924 0.831 0.911 0.798

0.977 0.917 0.970 0.896

Nevertheless, not engaging the absence locations can also benefit the MaxEnt model, especially when the absence data has not been recorded extensively (i.e. an incomplete data archive) or are contaminated by presence data (i.e. data pollution). On the other hand, the SVM is specifically designed for pattern recognition and has given great 13

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

Fig. 6. Partial dependence plots of the predictor variables in the BRT model for predicting the potential map of M. persica distribution. The relative contribution of each predictor is shown between brackets.

Fig. 7. Factors importance analysis. The variables abbreviations in the figure are: aspect_class (slope aspect), bd (bulk density), clay (clay), DEM (digital elevation model), EC (electrical conductivity), litho (lithology), max_t (mean maximum temperature), min_t (mean minimum temperature), nitro (nitrogen), oc (organic carbon), ph (pH), plan (plan curvature), rain (mean annual rainfall), sand (sand) and slope (slope).

14

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

(a)

(c)

(b)

(d)

Fig. 8. Habitat Suitability Maps using: (a) MaxEnt, (b) GLM, (c) SVM and (d) BRT models.

severe consequences, because incorrectly identified areas will be erroneously regarded as being safe. Therefore, further residential development in such areas may put many species habitats at risk. Therefore, the superiority of MaxEnt and SVM over GLM and BRT is apparent.

However, the cutoff-dependent measures failed to make a distinction between MaxEnt and SVM, while ROC curves were slightly successful. Overall, cutoff-dependent and independent measures were in line in terms of sieving the performance of the model.

Fig. 9. Percentages of different habitat suitability classes for SVM, BRT, GLM, and MaxEnt models. 15

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

Table 9 Cutoff-dependent performance metrics calculated for data mining models. Models

BRT GLM MaxEnt SVM

Metrics TP

TN

FP

FN

Sensitivity

Specificity

Cohen's kappa

95 97 105 105

94 95 105 105

27 26 16 16

26 24 16 16

0.7851 0.8017 0.8678 0.8678

0.7769 0.7851 0.8678 0.8678

0.5620 0.5870 0.7360 0.7360

important factor limiting species distribution subsequent to biodiversity loss, causes for reduced genetic variation, and species extinction (Sandifer et al., 2015; Ayre and Hughes, 2004). Habitat suitability modeling plays a pivotal role in comprehending, preserving, and restoring natural habitats. This work was an attempt to address the latter by employing multivariate statistical and data mining models. The results showed that among these models, MaxEnt had higher accuracy (95%), followed by the SVM, GLM, and BRT models. Interestingly, the inferences revealed that M. persica has been growing on Marl formations as well. Because the marl formation is susceptible to soil erosion, planting the M. persica species, which has a deep rooting system and basal and aerial cover, can be a biological alternative to controlling soil erosion. Among all environmental predictors, edaphic, climatic, lithological, and topographical factors were found to have the greatest to the least effect successively on the presence of M. persica. More precisely, electrical conductivity, pH, bulk density, organic carbon, and silt percentage, successively, had the greatest effects on M. persica presence. In contrast, slope aspect and plan curvature had lower effects. The

Fig. 10. ROC curves plotted for suitability models.

4. Conclusion It is important to note that based on the results of research on biodiversity, habitat degradation and fragmentation is the most

Fig. 11. Confusion components calculated at a 50% cutoff value for each data mining model: a) BRT, b) GLM, c) MaxEnt, and d) SVM; true positive (TP), true negative (TN), false positives (FP), and false negatives (FN). 16

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al.

inferences of the GLM model were somewhat in line with the data mining models, where BD, EC, organic carbon, plan curvature, rainfall, slope, and lithology were positively correlated with suitability values, as suitability values concurrently increased with the values of these predictors. Conversely, slope aspect, clay, elevation, maximum temperature, nitrogen, pH, and sand content predictors were negatively correlated with suitability values. The suitability maps produced using selected models could be used as powerful tools in reclamation projects of degraded habitats of M. persica in southern Iran and similar ecological niches worldwide. Furthermore, the inferences of this study would be of interest to authorities in the natural resources sector, the research community, local stakeholders, and biodiversity conservation agencies who work to protect and conserve the M. persica habitat in the study area.

Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Anim. Ecol. 77 (4), 802–813. Fielding, A.H., Bell, J.F., 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24 (1), 38–49. Franklin, J., 2010. Mapping species distributions: spatial inference and prediction. Cambridge University Press. Garcia, K., Lasco, R., Ines, A., Lyon, B., Pulhin, F., 2013. Predicting geographic distribution and habitat suitability due to climate change of selected threatened forest tree species in the Philippines. Appl. Geogr. 44, 12–22. Garzon, M.B., Blazek, R., Neteler, M., De Dios, R.S., Ollero, H.S., Furlanello, C., 2006. Predicting habitat suitability with machine learning models: the potential area of Pinus sylvestris L. in the Iberian Peninsula. Ecol. Model. 197 (3–4), 383–393. Ghanbarian, G., Raoufat, M.R., Pourghasemi, H.R., Safaeian, R., 2019. Habitat suitability mapping of Artemisia aucheri Boiss based on the GLM model in R. In: Spatial modeling in GIS and R for earth and environmental sciences. Elsevier, pp. 213–227. Ghareghan, F., 2018. Evaluating of the habitat suitability of Morina persica L. using machine learning models. MSc thesis. Shiraz University, Shiraz, Iran. Gogol-Prokurat, M., 2011. Predicting habitat suitability for rare plants at local spatial scales using a species distribution model. Ecol. Appl. 21 (1), 33–47. Guisan, A., Edwards, T.C., Hastie, T., 2002. Generalized linear and generalized additive models in studies of species distribution: setting the scene. Ecol. Model. 157, 89–100. Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–186. GSI, 1979. Geological map of Kazeroon, 1: 250000. Geolological Survey and Mineral Explorations of Iran Publishing, Tehran. Iran. Hirzel, A.H., Hausser, J., Chessel, D., Perrin, N., 2002. Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology 83 (7), 2027–2036. Hosmer Jr, D.W., Lemeshow, S., 2000. Applied logistic regression, second ed. Wiley, New York. Hosseini, S.Z., Kappas, M., Chahouki, M.Z., Gerold, G., Erasmi, S., Emam, A.R., 2013. Modelling potential habitats for Artemisia sieberi and Artemisia aucheri in Poshtkouh area, central Iran using the maximum entropy model and geostatistics. Ecol. Inf. 18, 61–68. Hosseinzadeh, M.S., Farhadi Qomi, M., Naimi, B., Roedder, D., Kazemi, S.M., 2018. Habitat suitability and modelling the potential distribution of the Plateau Snake Skink Ophiomorus nuchalis (Sauria: Scincidae) on the Iranian Plateau. North-Western Journal of Zoology 14 (1), 60–63. Hutchinson, R.A., Liu, L.P., Dietterich, T.G., 2011. Incorporating boosted regression trees into ecological latent variable models. Twenty-Fifth AAAI Conference on Artificial Intelligence. IRIMO (Islamic Republic of Iran Meteorological Organization), 2016. Climate data-base. http://www.weather.ir/English/.(accessed 16 August 2017). Jaafari, A., Najafi, A., Pourghasemi, H.R., Rezaeian, J., Sattarian, A., 2014. GIS-based frequency ratio and index of entropy models for landslide susceptibility assessment in the Caspian forest, northern Iran. Int. J. Environ. Sci. Technol. 11 (4), 909–926. Javed, S.M., Raj, M., Kumar, S., 2017. Predicting Potential Habitat Suitability for an Endemic Gecko Calodactylodes aureus and its Conservation Implications in India. Trop. Ecol. 58 (2). Joshi, R.K., 2013. Genus Morina: valuable aromatic plants for herbal drug. VRI Phytomedicine 1 (2), 81–84. Karatzoglou, A., Smola, A., Hornik, K., 2016. Package ‘kernlab’. pp. 108 (Date/ Publication 2016-03-29). Kavzoglu, T., Colkesen, I., 2009. A kernel functions analysis for support vector machines for land cover classification. Int. J. Appl. Earth Obs. Geoinf. 11 (5), 352–359. Kavzoglu, T., Sahin, E.K., Colkesen, I., 2014. Landslide susceptibility mapping using GISbased multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11 (3), 425–439. Khanum, R., Mumtaz, A.S., Kumar, S., 2013. Predicting impacts of climate change on medicinal asclepiads of Pakistan using Maxent modeling. Acta Oecologica 49, 23–31. Khosravi, R., Hemami, M.R., Malekian, M., Flint, A., Flint, L., 2016. Maxent modelling for predicting potential distribution of goitered gazelle in central Iran: the effect of extent and grain size on performance of the model. Turk. J. Zool. 40 (4), 574–585. Knudson, M.D., VanLooy, J.A., Hill, M.J., 2015. A Habitat Suitability Index (HSI) for the Western Prairie Fringed Orchid (Platanthera praeclara) on the Sheyenne National Grassland, North Dakota, USA. Ecol. Ind. 57, 536–545. Kumar, S., Stohlgren, T.J., 2009. Maxent modelling for predicting suitable habitat for threatened and endangered tree Canacomyrica monticola in New Caledonia. Journal of Ecology and the Natural Environment 1 (4), 094–098. Kumar, A., Varshney, V.K., Rawat, M.S., Sahrawat, S., 2013. Chemical constituents of Morina genus: a comprehensive. American Journal of Essential Oils and Natural Products 1 (3), 1–15. Lu, C.Y., Gu, W., Dai, A.H., Wei, H.Y., 2012. Assessing habitat suitability based on geographic information system (GIS) and fuzzy: a case study of Schisandra sphenanthera Rehd. et Wils. in Qinling Mountains, China. Ecological Modelling 242, 105–115. Martínez-Rincón, R.O., Ortega-García, S., Vaca-Rodríguez, J.G., 2012. Comparative performance of generalized additive models and boosted regression trees for statistical modeling of incidental catch of wahoo (Acanthocybium solandri) in the Mexican tuna purse-seine fishery. Ecol. Model. 233, 20–25. Matin, S., Chitale, V.S., Behera, M.D., Mishra, B., Roy, P.S., 2012. Fauna data integration and species distribution modelling as two major advantages of geoinformatics-based phytobiodiversity study in today’s fast changing climate. Biodivers. Conserv. 21 (5), 1229–1250. Menard, S., 2002. Applied logistic regression analysis Vol. 106 Sage. Merow, C., Smith, M.J., Silander Jr, J.A., 2013. A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography 36

Funding This work was supported by the Shiraz University [grant number 94GCU3M153031]. CRediT authorship contribution statement Fateme Ghareghan: Visualization, Investigation. Gholamabbas Ghanbarian: Conceptualization, Writing - original draft, Supervision. Hamid Reza Pourghasemi: Methodology, Software, Data curation. Roja Safaeian: Validation. Acknowledgments The authors would like to thank the editor-in-chief and two anonymous reviewers for their valuable suggestions and revisions that improved the quality of the manuscript. Appendix A. Supplementary data Supplementary data to this article can be found online at https:// doi.org/10.1016/j.ecolind.2020.106096. References Aghabeigi, F., 2009. Flora of Iran: Morinaceae. Research Institute of Forest and Rangeland Press, Tehran. Atkinson, P.M., Massari, R., 1998. Generalized linear modeling of susceptibility to landsliding in the central Apennines, Italy. Comput. Geosci. 24, 373–385. Ayre, D.J., Hughes, T.P., 2004. Climate change, genotypic diversity and gene flow in reefbuilding corals. Ecol. Lett. 7 (4), 273–278. Ballabio, C., Sterlacchini, S., 2012. Support vector machines for landslide susceptibility mapping: the Staffora river basin case study, Italy. Mathematical Geosciences 44 (1), 47–70. Bouyoucos, G.J., 1962. Hydrometer method improved for making particle size analyses of soils. Agron. J. 54 (5), 464–465. Bremmer, J.M., Mulvaney, C.S., 1982. Total N. methods of soil analysis. Agron. Monog 9, 895–926. Bell, C.D., Donoghue, M.J., 2003. Phylogeny and biogeography of Morinaceae (Dipsacales) based on nuclear and chloroplast DNA sequences. Org. Divers. Evol. 3 (3), 227–237. Carter, M.R., 2008. Soil sampling and methods of analysis. CRC Press. Chen, W., Pourghasemi, H.R., Kornejady, A., Zhang, N., 2017. Landslide spatial modelling: introducing new ensembles of ANN, MaxEnt, and SVM machine learning techniques. Geoderma 305, 314–327. Cobben, M.M.P., Van Treuren, R., Castañeda-Álvarez, N.P., Khoury, C.K., Kik, C., Van Hintum, T.J., 2015. Robustness and accuracy of maxent niche modelling for Lactuca species distributions in light of collecting expeditions. Plant Genetic Resources 13 (2), 153–161. De'Ath, G., 2007. Boosted trees for ecological modeling and prediction. Ecology 88 (1), 243–251. Deb, C.R., Jamir, N.S., Kikon, Z.P., 2017. Distribution prediction model of a rare Orchid species (Vanda bicolor Griff.) using small sample size. American. Journal of Plant Sciences 8 (06), 1388. Devkota, K.C., Regmi, A.D., Pourghasemi, H.R., Yoshida, K., Pradhan, B., Ryu, I.C., Althuwaynee, O.F., 2013. Landslide susceptibility mapping using certainty factor, index of entropy and logistic regression models in GIS and their comparison at Mugling-Narayanghat road section in Nepal Himalaya. Nat. Hazards 65 (1), 135–165.

17

Ecological Indicators 112 (2020) 106096

F. Ghareghan, et al. (10), 1058–1069. Mocan, A., Zengin, G., Uysal, A., Gunes, E., Mollica, A., Degirmenci, N.S., Aktumsek, A., 2016. Biological and chemical insights of Morina persica L.: A source of bioactive compounds with multifunctional properties. J. Funct. Foods 25, 94–109. Motevalli, A., Pourghasemi, H.R., Zabihi, M., 2018. Assessment of GIS-based machine learning algorithms for spatial modeling of landslide susceptibility: case study in Iran. In: Reference Module in Earth Systems and Environmental Sciences. Elsevier. http:// dx.doi.org/10.1016/B978-0-12-409548-9.10461-0. Mousazade, M., Ghanbarian, G., Pourghasemi, H.R., Safaeian, R., Cerdà, A., 2019. Maxent Data Mining Technique and Its Comparison with a Bivariate Statistical Model for Predicting the Potential Distribution of Astragalus fasciculifolius Boiss. in Fars, Iran. Sustainability 11 (12), 3452. Mozaffarian, V., 1996. A dictionary of Iranian plant names. Farhang Moaser Publication, Tehran, pp. 396. Naghibi, S.A., Pourghasemi, H.R., Abbaspour, K., 2018. A comparison between ten advanced and soft computing models for groundwater qanat potential assessment in Iran using R and GIS. Theor. Appl. Climatol. 131 (3–4), 967–984. Naghibi, S.A., Pourghasemi, H.R., Dixon, B., 2016. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ. Monit. Assess. 188 (1), 44. Nelder, J.A., Wedderburn, R.W., 1972. Generalized linear models. Journal of the Royal Statistical Society: Series A (General) 135 (3), 370–384. Nosetto, M.D., Jobbágy, E.G., Paruelo, J.M., 2006. Carbon sequestration in semi-arid rangelands: comparison of Pinus ponderosa plantations and grazing exclusion in NW Patagonia. J. Arid Environ. 67 (1), 142–156. Pearce, J., Ferrier, S., 2000. Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133 (3), 225–245. Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modelling of species geographic distributions. Ecol. Model. 190 (3–4), 231–259. Phillips, S.J., Dudík, M., 2008. Modelling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31 (2), 161–175. Pourghasemi, H.R., Rahmati, O., 2018. Prediction of the landslide susceptibility: which algorithm, which precision? Catena 162, 177–192. Pourghasemi, H.R., Yousefi, S., Kornejady, A., Cerdà, A., 2017. Performance assessment of individual and ensemble data-mining techniques for gully erosion modelling. Sci. Total Environ. 609, 764–775. Pourghasemi, H.R., Mohammady, M., Pradhan, B., 2012. Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin. Iran. Catena 97, 71–84. Pourghasemi, H.R., Gayen, A., Panahi, M., Rezaie, F., Blaschke, T., 2019. Multi-hazard

probability assessment and mapping in Iran. Sci. Total Environ. https://doi.org/10. 1016/j.scitotenv.2019.07.203. Pourtaghi, Z.S., Pourghasemi, H.R., Aretano, R., Semeraro, T., 2016. Investigation of general indicators influencing on forest fire and its susceptibility modelling using different data mining techniques. Ecol. Ind. 64, 72–84. Qi, X., Silvestrov, S., Nazir, T., 2017. Data classification with support vector machine and generalized support vector machine. In: AIP Conference Proceedings (Vol. 1798, No. 1, p. 020126). AIP Publishing. Rahimian Boogar, A., Salehi, H., Pourghasemi, H.R., Blaschke, T., 2019. Predicting habitat suitability and conserving Juniperus spp habitat using SVM and maximum entropy machine learning techniques. Water 11 (10), 2049. Rahmati, O., Kornejady, A., Samadi, M., Deo, R.C., Conoscenti, C., Lombardo, L., Bui, D.T., 2019. PMT: new analytical framework for automated evaluation of geo-environmental modelling approaches. Sci. Total Environ. 664, 296–311. Rupprecht, F., Oldeland, J., Finckh, M., 2011. Modelling potential distribution of the threatened tree species Juniperus oxycedrus: how to evaluate the predictions of different modelling approaches? J. Veg. Sci. 22 (4), 647–659. Sandifer, P.A., Sutton-Grier, A.E., Ward, B.P., 2015. Exploring connections among nature, biodiversity, ecosystem services, and human health and well-being: opportunities to enhance health and biodiversity conservation. Ecosyst. Serv. 12, 1–15. Shiroyama, R., Yoshimura, C., 2016. Assessing bluegill (Lepomis macrochirus) habitat suitability using partial dependence function combined with classification approaches. Ecol. Inf. 35, 9–18. Shruthi, R.B.V., Kerle, N., Jetten, V., Stein, A., 2014. Object-based gully system prediction from medium resolution imagery using Random Forests. Geomorphology 216, 283–294. Sor, R., Park, Y.S., Boets, P., Goethals, P.L., Lek, S., 2017. Effects of species prevalence on the performance of predictive models. Ecol. Model. 354, 11–19. Soykan, C.U., Eguchi, T., Kohin, S., Dewar, H., 2014. Prediction of fishing effort distributions using boosted regression trees. Ecol. Appl. 24 (1), 71–83. Vapnik, V.N., 1995. The nature of statistical learning theory. Springer, New York. Wang, L.J., Guo, M., Sawada, K., Lin, J., Zhang, J., 2015. Landslide susceptibility mapping in Mizunami City, Japan: a comparison between logistic regression, bivariate statistical analysis and multivariate adaptive regression spline models. Catena 135, 271–282. Yang, X.Q., Kushwaha, S.P.S., Saran, S., Xu, J., Roy, P.S., 2013. Maxent modelling for predicting the potential distribution of medicinal plant, Justicia adhatoda L. in Lesser Himalayan foothills. Ecol. Eng. 51, 83–87. Zhu, Y., Lü, Z.P., Xue, C.B., Wu, W.S., 2009. New triterpenoid saponins and neolignans from Morina kokonorica. Helv. Chim. Acta 92 (3), 536–545.

18