Impacts of the spatial scale of climate data on the modeled distribution probabilities of invasive tree species throughout the world
Ji-Zhong Wan, Chun-Jing Wang, Fei-Hai Yu
PII: S1574-9541(16)30093-0
DOI: doi: 10.1016/j.ecoinf.2016.10.001
Reference: ECOINF 711
To appear in: Ecological Informatics
Received date: 15 July 2016
Revised date: 1 October 2016
Accepted date: 4 October 2016
Please cite this article as: Wan, Ji-Zhong, Wang, Chun-Jing, Yu, Fei-Hai, Impacts of the spatial scale of climate data on the modeled distribution probabilities of invasive tree species throughout the world, Ecological Informatics (2016), doi: 10.1016/j.ecoinf.2016.10.001
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Impacts of the spatial scale of climate data on the modeled distribution probabilities of invasive tree species throughout the world
Ji-Zhong Wan, Chun-Jing Wang, Fei-Hai Yu*
School of Nature Conservation, Beijing Forestry University, Beijing 100083, China
* Corresponding author
E-mail: [email protected]
Tel: +86 10 62336173
Address: School of Nature Conservation, Beijing Forestry University, Beijing 100083, China
Running title: Spatial scale and distribution probability
Highlights
Climate data scale could affect predicted species distribution probabilities.
Responses of climatic variables to distribution probabilities may vary with scale.
The 5.0 arc-minute resolution was the best for modeling distributions of invasive trees.
Different numbers of presence points between scales may elevate uncertainty.
Abstract
Species distribution models (SDMs) are powerful tools to predict species distributions, and thus support invasion risk assessments for tree species at the global scale. However, SDMs may produce different species distribution probabilities depending on the spatial scale of climate data included in the model. Hence, we must understand impacts of the climate data scale on the modeled distribution probabilities of invasive tree species (ITS) throughout the world. We used nine ITS from the list of “The 100 of the World's Worst Invasive Alien Species” as our study species, and applied Maxent modeling based on presence and background points to model the distribution probabilities of these ITS across the globe using three climate data scales: 2.5, 5.0 and 10.0 arc-minutes. The average distribution probabilities of presence and background points across the nine focal ITS increased significantly from the 2.5 to the 10.0 arc-minute resolution, indicating that coarse climate data scales may increase the distribution probabilities of presence and background points for these focal species. The large gap between different climate data scales resulted in high prediction uncertainty for the distribution probabilities of ITS. We offer two suggestions for decreasing the prediction uncertainty of the distribution probabilities of ITS at the global scale due to the effects of the climate data scale when using SDMs: 1) use 5.0 arc-minute resolution as the input to SDMs when using GBIF or other specimen databases; and 2) decrease the gap between 2.5, 5.0 and 10.0 arc-minutes in the number of presence points of ITS.
Keywords: climate data scale; distribution probability; globe; species distribution model; tree invaders; Worldclim.
1. Introduction
Invasive tree species (ITS) have been suggested as a model group in plant invasion ecology at the global scale (Rejmánek and Richardson, 2013; Rejmánek, 2014). Previous studies have shown that climatic variables are the main factors shaping the global distribution patterns of ITS and may facilitate their invasion, via strong adaptation and rapid spread, into areas of high protection value (Nunez et al., 2011; Monahan et al., 2013; Aguirre-Gutiérrez et al., 2015; Hof, 2015). The invasion of ITS can affect invaded systems in several ways: 1) ITS can occupy the suitable habitat of native species, so that those native species may not survive; 2) ITS can change the ecological landscape and cause habitat fragmentation; and 3) ITS can disrupt the structure of communities and ecosystems (Nunez et al., 2011; Rejmánek and Richardson, 2013; Rejmánek, 2014; Donaldson et al., 2014; Rundel et al., 2014). Species distribution models (SDMs) are widely used to predict the global distributions of invasive plant species based on climatic variables (Thuiller et al., 2005; Donaldson et al., 2014; Mainali et al., 2015). The outputs of such modeling are used, for instance, to develop feasible recommendations for biological conservation and invasion risk control (Thuiller et al., 2005; Liang et al., 2014). Despite these important uses, there are still many technical challenges associated with SDMs, and solving them will greatly increase the prediction precision of the models and thus bolster environmental management and policymaking (Convertino et al., 2014; Mainali et al., 2015). For example, limited ecological transferability restricts the application of SDMs to the prediction of ITS distributions (Donaldson et al., 2014; Ray et al., 2016). To address this limitation, ecologists have used SDMs to project the distributions of ITS based on climate data for native and invaded ranges at the global scale (Mainali et al., 2015; Shabani and Kumar, 2015).
Species distribution patterns and determinants are known to vary with the spatial scale of climate data (Rahbek and Graves, 2001; Wang et al., 2012). Reasons for this include: 1) a scale mismatch between the large-scale ecological effects of climate change and species distributions recorded at small scales of resolution (Rahbek and Graves, 2001); and 2) with the expansion of geographical extent, the explanatory power of climate variables such as environmental energy, water availability and climatic seasonality increases, while the explanatory power of habitat heterogeneity and human activities decreases (Wang et al., 2009, 2012). Therefore, projections of species distributions using SDMs may vary with the climate data scales selected for the models. Previous studies have observed higher model performance at finer data scales (Guisan et al., 2007; Gottschalk et al., 2011; Franklin et al., 2013). Compared with fine scales, SDMs at coarse scales may result in large prediction uncertainties for potential species distributions (Franklin et al., 2013). However, coarse-grained occurrence records, for example from the Global Biodiversity Information Facility (GBIF), cannot support accurate predictions of species' distributions at fine scales of climate data (Gottschalk et al., 2011; Song et al., 2013; Beck et al., 2014). These findings suggest that the response of species occurrence probability to different climate data scales is an important consideration for modelers fitting species distribution models at the global scale.
Here, we address two questions: 1) How do different climate data scales affect projections of distributions of ITS at the global scale? and 2) How can we reduce prediction uncertainty resulting from the impacts of the climate data scale on projections of distribution probability of ITS throughout the world? To address these questions, we selected nine ITS from the list of “The 100 of the World's Worst Invasive Alien Species” compiled by the Invasive Species Specialist Group (www.issg.org; Luque et al., 2014) as our focal study species, and used Maxent modeling, a common SDM method, to project the distributions of ITS throughout the world using three climate data scales: 2.5, 5.0 and 10.0 arc-minutes.
2. Methods and Materials
2.1. Species data and climate data
Species with more than 100 occurrence records were chosen for this study to maximize the reliability of logistic SDMs (Wisz et al., 2008; Fig. 1 and Table 1). Occurrence records for the nine ITS, especially specimens or recorded sightings, were compiled from GBIF (www.gbif.org; Fig. 1).
Eight bioclimatic variables for input into the SDMs were downloaded from the WorldClim database (averaged from 1950-2000; www.worldclim.org; Table 2). These eight bioclimatic variables represent general trends (means), variation (seasonality), and limits (i.e. minimum and maximum) that are likely to influence the distribution and physiological performance of ITS (Hijmans and Graham, 2006). We used three spatial resolutions of the bioclimatic variables (2.5, 5.0 and 10.0 arc-minutes) because these resolutions are commonly used in SDMs (www.worldclim.org).
2.2. Species distribution modeling
We used Maxent (ver. 3.3.3k; http://www.cs.princeton.edu/~schapire/maxent/) to model the distribution of the nine ITS across the three scales of climate data based on maximum entropy (Phillips et al., 2006). Maxent modeling has the following advantages: 1) Maxent typically outperforms other methods in predictive accuracy based on presence points (Merow et al., 2013); 2) Maxent is nonlinear, nonparametric, and not sensitive to multi-collinearity (Evangelista et al., 2011); 3) Maxent can estimate the importance of environmental variables to species distributions based on the jackknife method (Elith et al., 2011); and 4) Maxent can have good prediction performance when the number of input species occurrence localities is low (Pearson et al., 2007; Wisz et al., 2008). Maxent produces a prediction map in a logistic output format, wherein cells with a value of 1 have the highest probability of species occurrence and those with a value of 0 the lowest. Species distribution areas were predicted based on the similarity in climatic conditions between the study region and sites where occurrence localities have already been recorded (Merow et al., 2013). Maxent modeling has possible applications in biological conservation, biological invasion and ecological restoration (Thuiller et al., 2005; Donaldson et al., 2014; Denoël and Ficetola, 2015; Gelviz-Gelvez et al., 2015).
When running Maxent, we removed duplicate presence records within the same grid cell across the different scales (Phillips et al., 2006; Elith et al., 2011). Replicate runs used cross-validation to estimate the uncertainty of the response curves and predictions (Merow et al., 2013). We used a five-fold cross-validation approach that divides the presence dataset into five approximately equal partitions, with four of the partitions used to train the model and the fifth used to generate the SDM estimate (Merow et al., 2013). We set the regularization multiplier (beta) to 2.0 to produce a smooth and general response (Radosavljevic and Anderson, 2014). The convergence threshold was set to 0.0001. The maximum number of background points was 10,000, and default features were used in the model output. Other values were kept at their defaults (after Elith et al., 2011).
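The removal of duplicate presence records per grid cell can be sketched as follows. This is a minimal illustration, not the authors' workflow: the grid origin at (-180, 90), the rule of keeping the first record in each cell, and the coordinates themselves are all assumptions made for the example. It shows how the same records yield different numbers of presence points at the 2.5, 5.0 and 10.0 arc-minute resolutions.

```python
import math


def cell_index(lon, lat, res_arcmin):
    """Return the (row, col) of the grid cell containing a lon/lat point.

    Assumes a global grid with its origin at the upper-left corner
    (-180, 90) and square cells res_arcmin arc-minutes wide.
    """
    cell_deg = res_arcmin / 60.0
    col = int(math.floor((lon + 180.0) / cell_deg))
    row = int(math.floor((90.0 - lat) / cell_deg))
    return row, col


def thin_to_grid(records, res_arcmin):
    """Keep only the first record that falls in each grid cell."""
    seen = set()
    kept = []
    for lon, lat in records:
        key = cell_index(lon, lat, res_arcmin)
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat))
    return kept


# Hypothetical occurrence records (lon, lat) in decimal degrees.
records = [
    (116.301, 40.001),
    (116.302, 40.002),   # same 2.5' cell as the first record
    (116.260, 40.001),   # new 2.5' cell, same 5.0' cell as the first
    (116.200, 40.001),   # new 5.0' cell, same 10.0' cell as the first
    (-47.900, -15.800),  # far away, always its own cell
]

# Number of presence points remaining at each resolution.
counts = {res: len(thin_to_grid(records, res)) for res in (2.5, 5.0, 10.0)}
print(counts)
```

Coarser grids merge more records into one cell, so fewer presence points survive. The difference in these counts between two resolutions corresponds to the "gap" analyzed in Section 2.3.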
We assessed the performance of the models using the area under the ROC curve (AUC). This statistic regards each value of the estimate as a possible threshold based on the corresponding sensitivity and specificity when randomly selected background points are removed from the dataset (Phillips et al., 2006). To ensure the high precision of SDM at the three spatial scales, we used SDMs with AUC values above 0.7 (Elith et al., 2011). The omission rate is the proportion of occurrence localities that fall within grid cells predicted as species absence (Phillips et al., 2006). We also used binomial probabilities, which are one-sided p-values for the null hypothesis that test points are predicted no better than a random prediction with the same fractional predicted area. The binomial probabilities were based on five common thresholds in Maxent modeling (10th percentile training presence; Equal training sensitivity and specificity; Maximum training sensitivity plus specificity; Equal test sensitivity and specificity; Maximum test sensitivity plus specificity; Anderson and Gonzalez, 2011). An omission rate lower than 17% is a necessary condition for a good model (Anderson et al., 2002; Phillips et al., 2006). These five common thresholds were also used to evaluate the Kappa statistic and the true skill statistic (TSS) for the models (Allouche et al., 2006). In our study, the prevalence is the proportion of presences in the data set at the 2.5 arc-minute resolution (Allouche et al., 2006). The Kappa statistic corrects the overall accuracy of model predictions for the accuracy expected to occur by chance (Allouche et al., 2006). TSS can explain the observed unimodal dependency of Kappa on prevalence (Allouche et al., 2006; Shabani et al., 2016). Both Kappa and TSS range from −1 to 1, where 1 indicates good model performance and values of zero or less indicate a performance no better than random, or poor model performance (Allouche et al., 2006; Shabani et al., 2016). The model performance was considered useful when the values of Kappa and TSS were over 0.3, and good when the values were over 0.4 (Faleiro et al., 2013; Guo et al., 2015).
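As an illustration of how Kappa, TSS and the omission rate relate to a thresholded prediction, the sketch below computes them from a 2 x 2 confusion matrix using the standard formulas given by Allouche et al. (2006). The cell counts are hypothetical, and treating background points as absences is an assumption of this example; the function names are ours and do not come from Maxent.

```python
def confusion_metrics(a, b, c, d):
    """Compute accuracy measures from a 2 x 2 confusion matrix.

    a: presences predicted present (true positives)
    b: absences predicted present (false positives)
    c: presences predicted absent (false negatives)
    d: absences predicted absent (true negatives)
    """
    n = a + b + c + d
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    tss = sensitivity + specificity - 1.0
    observed = (a + d) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    omission = c / (a + c)  # share of presences predicted absent
    return {"tss": tss, "kappa": kappa, "omission": omission}


def rate(value):
    """Apply the 0.3 and 0.4 cutoffs used in the text to Kappa or TSS."""
    if value > 0.4:
        return "good"
    if value > 0.3:
        return "useful"
    return "not useful"


# Hypothetical counts after applying one threshold to the logistic output.
m = confusion_metrics(a=40, b=10, c=10, d=40)
print(m, rate(m["tss"]), rate(m["kappa"]))
```

With balanced classes, as in this example, Kappa and TSS coincide. They diverge as prevalence moves away from 0.5, which is the dependency on prevalence discussed by Allouche et al. (2006).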
2.3. Impacts of climate data scales on distribution probability
First, we analyzed the variance of AUC, the omission rates (including training and test data), Kappa and TSS for the 2.5, 5.0 and 10.0 arc-minute resolutions (Gueta and Carmel, 2016). Here, paired-sample T-tests were used to compute the difference of these values based on the five common thresholds across the nine ITS.
Secondly, the jackknife test was used in Maxent to analyze the importance of different climatic variables to species distributions based on the percentage contribution (PC) and the response curves of climatic variables to species distribution probabilities across 2.5, 5.0 and 10.0 arc-minute resolutions (Merow et al., 2013). PC was used to assess the contribution of the environmental variable to the final model (where the combined variables summed to 100%; Merow et al., 2013). We considered the variable to be important if its PC was at least 15% of the models for each ITS (Oke and Thompson, 2015).
Thirdly, we explored the variance of the distribution probabilities based on presence (that is, occurrence localities) and background points along the gradient of climate data scales. We used the number of presence points available for the 2.5 arc-minute resolution, and selected 10000 background points from the map of the world for each ITS using ENMtools 1.4.4 (Warren et al., 2010; Elith et al., 2011; Fig. 1). We computed the average values of the distribution probabilities for the presence and background points of each ITS. Paired-sample T-tests were used to evaluate the difference in the average distribution probabilities of presence and background points between 2.5, 5.0 and 10.0 arc-minute resolutions for the nine ITS.
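The paired-sample comparison described above can be sketched as follows. The analyses in this study were run in JMP; the code below is only an illustration of the test statistic, and the nine pairs of average probabilities are invented for the example. It computes the paired t statistic and its degrees of freedom with the Python standard library and compares the statistic against 2.306, the two-sided 5% critical value for 8 degrees of freedom.

```python
import math
import statistics


def paired_t(x, y):
    """Return the paired-sample t statistic and its degrees of freedom.

    x and y hold one value per species, in the same species order.
    """
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical average distribution probabilities of presence points
# for nine species at two climate data resolutions.
p_2_5 = [0.41, 0.38, 0.52, 0.47, 0.35, 0.44, 0.50, 0.39, 0.46]
p_10 = [0.46, 0.41, 0.58, 0.51, 0.37, 0.51, 0.55, 0.42, 0.50]

t_stat, df = paired_t(p_2_5, p_10)
T_CRIT_DF8 = 2.306  # two-sided critical value at alpha = 0.05, df = 8
significant = abs(t_stat) > T_CRIT_DF8
print(round(t_stat, 2), df, significant)
```

An exact p-value requires the t distribution, which a statistics package provides; `scipy.stats.ttest_rel` is one such option.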
Finally, we used linear regression analysis to assess the relationship between the distribution probabilities of each ITS predicted using any two of the three scales of climatic data (2.5, 5.0 and 10.0 arc-minute resolutions) based on the presence and background points separately. Previous studies have shown that the number of presence points used as an input into Maxent modeling could affect the performance of species distribution prediction models (Pearson et al., 2007; Wisz et al., 2008; Gueta and Carmel, 2016). Hence, we tested whether the gap in the number of presence points used as an input into Maxent modeling could have a significant impact on the relationship between the distribution probabilities predicted for different climate data scales for the nine ITS. The gap (namely, the difference in the number of presence points between two climate data scales) was computed using the number of presence points used as inputs into the Maxent models based on the two climate data scales. We used linear regression analysis to analyze the relationship between the gap and the R² of the relationships between the distribution probabilities of the nine ITS predicted using any two of the three scales of climatic data. All data analysis was conducted in JMP 11.0 (SAS, USA).
3. Results
The AUC measurements of SDM accuracy were above 0.7 (Table 1), Kappa and TSS were over 0.4 (except for Kappa of Spathodea campanulata (over 0.3); Tables S1 and S2) and the training and test omission rates were low (P < 0.01; Tables S3 and S4), indicating accurate predictions (Tables 1, S1 and S2; Fig. 2). There were no significant differences in AUC and Kappa between the 2.5, 5.0 and 10.0 arc-minute resolutions (T-test: P > 0.05). However, TSS values of the 10.0 arc-minute resolution were significantly larger than the 2.5 and 5.0 arc-minute resolutions for the 10th percentile training presence (T-test: P < 0.05). The training omission rates of the Maxent models based on the 2.5 arc-minute resolution were significantly larger than those based on the 5.0 arc-minute resolution for the 10th percentile training presence (T-test: P < 0.05). The training omission rates of the 2.5 arc-minute resolution were significantly lower than those based on the 5.0 and 10.0 arc-minute resolutions for Maximum training sensitivity plus specificity (T-test: P < 0.01), and the training omission rates of the 2.5 arc-minute resolution were significantly lower than those of the 10.0 arc-minute resolution (T-test: P < 0.05; Fig. 2). The test omission rates of Maxent modeling based on the 2.5 arc-minute resolution were significantly lower than those based on the 5.0 arc-minute resolution for the 10th percentile training presence (T-test: P < 0.05), and the test omission rates based on the 2.5 arc-minute resolution were significantly lower than the 5.0 and 10.0 arc-minute resolutions for Maximum training sensitivity plus specificity (T-test: P < 0.01).
Temperature seasonality was the most important climatic variable for the modeled distribution probabilities of all studied ITS except for Prosopis glandulosa across the 2.5, 5.0 and 10.0 arc-minute resolutions at the global scale (Table 3). The importance of maximum temperature of the warmest month (for Acacia mearnsii, Cinchona pubescens, and P. glandulosa), minimum temperature of the coldest month (for Cecropia peltata, P. glandulosa, and S. campanulata), annual precipitation (for Miconia calvescens and S. campanulata) and precipitation of the driest month (for M. quinquenervia) to species distribution probabilities may differ across the different scales of climate data (Table 3). For example, maximum temperature of the warmest month was the most important variable affecting the distribution probabilities of P. glandulosa at the 5.0 and 10.0 arc-minute resolutions, but not at the 2.5 arc-minute resolution (Table 3). Minimum temperature of the coldest month had the largest contribution to the distribution probabilities of S. campanulata at the 2.5 and 10.0 arc-minute resolutions, but not at the 5.0 arc-minute resolution (Table 3). According to the response curves, as the maximum temperature of the warmest month rose from 40°C to 50°C, the distribution probabilities of P. glandulosa increased sharply and then fell at the 2.5 and 5.0 arc-minute resolutions (Figs. 3a and c), but fell sharply at the 10.0 arc-minute resolution (Fig. 3e). When the minimum temperature of the coldest month was higher than 20°C, the reduction in the distribution probabilities of S. campanulata was smaller at the 2.5 and 5.0 arc-minute resolutions than at the 10.0 arc-minute resolution (Figs. 3b, d and f).
The average distribution probabilities across the nine ITS increased significantly from the 2.5 to the 10.0 arc-minute resolution based on presence and background points (T-test: P < 0.05; Fig. 4; Table S5). However, there was no significant difference in the average distribution probabilities between the 2.5 and 5.0 arc-minute resolutions based on background points (T-test: P > 0.05). In addition, for each ITS there were significant correlations (R²) between the distribution probabilities predicted at the 2.5, 5.0 and 10.0 arc-minute resolutions, for both presence and background points (Fig. 5). Based on presence points, these relationships (R²) for the nine ITS ranged from 0.784 to 0.977 for the 2.5 and 5.0 arc-minute resolutions, from 0.564 to 0.937 for the 2.5 and 10.0 arc-minute resolutions and from 0.703 to 0.959 for the 5.0 and 10.0 arc-minute resolutions. Based on background points, the relationships (R²) ranged from 0.940 to 0.992 for the 2.5 and 5.0 arc-minute resolutions, from 0.870 to 0.981 for the 2.5 and 10.0 arc-minute resolutions and from 0.937 to 0.978 for the 5.0 and 10.0 arc-minute resolutions (Fig. 5).
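The R² values reported above come from simple linear regressions between the probabilities predicted at two resolutions for the same set of points. The sketch below illustrates that computation, together with the "gap" defined in Section 2.3. It is not the JMP analysis used in this study, and the probabilities and presence-point counts are hypothetical. For a regression with one predictor and an intercept, R² equals the squared Pearson correlation, which is what the function returns.

```python
def r_squared(x, y):
    """R² of the simple linear regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def presence_gap(n_fine, n_coarse):
    """Difference in the number of presence points between two scales."""
    return abs(n_fine - n_coarse)


# Hypothetical logistic outputs for the same five points at two scales.
prob_2_5 = [0.10, 0.35, 0.60, 0.80, 0.90]
prob_5_0 = [0.15, 0.40, 0.58, 0.85, 0.88]

r2 = r_squared(prob_2_5, prob_5_0)
gap = presence_gap(n_fine=420, n_coarse=365)  # hypothetical counts
print(round(r2, 3), gap)
```

Repeating this for each species and each pair of resolutions gives one (gap, R²) pair per species, which can then be passed to `r_squared` again to relate the gap to the agreement between scales.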
Finally, as the gap in the number of presence points between climate data scales decreased, the correlation (R²) between the average distribution probabilities at the 2.5, 5.0 and 10.0 arc-minute resolutions increased significantly across the nine ITS for analyses based on presence points (P < 0.05; Fig. 6a, c, e), but no such significant relationship was found for analyses based on background points (P > 0.05; Fig. 6b, d, f). The relationship between the gap and R² was strongest for the 2.5 and 5.0 arc-minute resolutions based on presence points (R² = 0.674; P = 0.007; Fig. 6a).
4. Discussion
We estimated the impact of the spatial scale of climate data (namely, 2.5, 5.0 and 10.0 arc-minute resolutions) on predictions of the distribution of ITS based on global presence and background points, and found that the climate data scale can significantly alter the distribution probabilities of ITS and the functional responses of species to climatic variables. Furthermore, a decreasing gap in the number of occurrence localities used as inputs into SDMs with climate data of different scales may reduce the prediction uncertainty of the global distribution probabilities of ITS. Our study provides important insights into the selection of an appropriate spatial scale of climate data as an input to SDMs, and implies that the impact of climate data scales must be assessed when considering the distribution probabilities of invasive tree species.
AUC is currently considered the standard method of assessing the accuracy of SDMs (Lobo et al., 2008), and according to AUC our Maxent modeling outputs based on the 2.5, 5.0 and 10.0 arc-minute resolutions may all be accurate (AUC values higher than 0.7; Table 1). However, previous studies have shown that AUC alone is insufficient for evaluating SDMs for invasive species that may not be at distribution equilibrium at the global scale (Lobo et al., 2008; Anderson et al., 2011). Hence, we also evaluated the models using binomial tests based on binary maps derived from the continuous output (Phillips et al., 2006; Anderson and Gonzalez, 2011). We found that the model performance based on the 2.5 arc-minute resolution was better than that based on the other two scales, according to the training and test omission rates for the 10th percentile training presence and Maximum training sensitivity plus specificity (except for the training omission rates; Fig. 2), indicating that higher resolution data can be essential for deriving accurate predictions of the distribution patterns of ITS from Maxent modeling. Even so, the training and test omission rates were lower than 17% and Kappa and TSS were over 0.4 (Tables S1~S4), indicating that our Maxent model outputs based on the 2.5, 5.0 and 10.0 arc-minute resolutions were all robust (Anderson et al., 2002). Despite this, a poor selection of the climate data scale could increase prediction uncertainty for global-scale distribution probabilities of ITS (Figs. 2 and 4).
We found that the average distribution probabilities of presence and background points across the nine ITS increased significantly between the models using the 2.5 and 10.0 arc-minute resolutions, indicating that coarse climate data scales may increase the distribution probability for the nine ITS at the global scale, and that a larger gap between different climate data scales can result in larger prediction uncertainty for Maxent modeling (Figs. 4 and 5). Previous studies have shown that the use of coarse climate data scales dramatically increases the regional-scale distribution probabilities of species as estimated using SDMs (Guisan et al., 2007; Gottschalk et al., 2011; Franklin et al., 2013). Song et al. (2013) showed that greater grain sizes of grid cells could decrease the accuracy of modeling outputs. These results are consistent with our study. Some studies have reported a significant relationship between distribution probabilities at fine and coarse scales, indicating that the selection of certain scales might lead to over- or underestimation in SDMs (Franklin et al., 2013; Bean et al., 2014). In our study, differences in the results of Maxent at different scales also suggested this pattern, particularly for presence points (Figs. 4 and 5). Some researchers use a presence/absence threshold for each individual species and produce a binary map of distributions for ITS (Anderson et al., 2002; Hof, 2015). However, Merow et al. (2013) found that this method of setting the probability threshold could produce bias in predictions, and Calabrese et al. (2014) demonstrated that threshold values typically over-predict species distributions. This prediction uncertainty of species distribution based on a threshold may be due to the impact of climate data scales on the distribution probabilities of grid cells (Anderson and Gonzalez, 2011; Merow et al., 2013; Calabrese et al., 2014). 
The fact that omission rates may increase with increasing climate data scales also suggests that prediction uncertainty may be affected by probability thresholds in SDMs (Fig. 2). Reducing this uncertainty is a challenge in predicting ITS distributions.
In this study, we used occurrence records of species from GBIF and climate data from the Worldclim database as inputs for Maxent modeling. However, GBIF has a large sampling bias in occurrence records of species, and cannot provide precise occurrence localities at a fine scale (a similar problem also exists in specimen databases around the world; Delisle et al., 2003; Beck et al., 2014). At the same time, the fine-scale climate data from the Worldclim database (such as the 2.5 arc-minute resolution) cannot support the prediction of ITS distributions using SDMs based on GBIF (Beck et al., 2014). Hence, we offer suggestions on selecting the appropriate scale of climate data to model the global distribution of ITS and minimize the scale-influenced estimation uncertainty of SDM results (Guisan et al., 2007; Gottschalk et al., 2011).
First, the results of the jackknife tests and the response curves suggested that the importance of climatic variables to species distribution probabilities may vary with different climate data scales for ITS (namely, the bold vs. non-bold values in Table 3). Furthermore, the distribution probabilities of ITS are likely to have a unimodal response to temperature at the 10.0 arc-minute resolution, indicating that coarse-scale climate data may result in an imprecise estimate of species response functions (Fig. 3; Franklin et al., 2013). Thus, the coarse-scale data may over- or under-estimate the species distribution probabilities within specific ranges of temperature or precipitation (Fig. 3), and in turn increase the prediction uncertainty of the distributions of ITS (Franklin et al., 2013). Hence, we do not suggest the use of the 10.0 arc-minute resolution as the appropriate scale for modeling species distributions to improve management strategies for ITS in the world.
Secondly, we found that the distribution probabilities across the nine ITS based on background points were not significantly different between the 2.5 and 5.0 arc-minute resolutions (Fig. 4), and that the R² of the relationship was the largest between the distribution probabilities of ITS predicted using the 2.5 and 5.0 arc-minute resolutions (Fig. 5). These results indicate that the distribution probability derived from the 5.0 arc-minute resolution was similar to that of the finer scale (2.5 arc-minute resolution). Also, the R² of the relationship between the distribution probabilities predicted using climatic data of the 5.0 and 10.0 arc-minute resolutions was significantly larger than that predicted using climatic data of the 2.5 and 10.0 arc-minute resolutions (Fig. 5). Furthermore, the 5.0 arc-minute resolution was the finest scale at which it is possible to correct the sampling bias of GBIF (Raes et al., 2013; Beck et al., 2014). Therefore, we suggest the use of the 5.0 arc-minute resolution as the appropriate spatial scale for climate data input into SDMs when modeling the distributions of ITS using GBIF or other specimen databases. Additionally, our results indicate that decreasing the gap in the number of presence points between the 2.5, 5.0 and 10.0 arc-minute resolutions could reduce the prediction uncertainty of ITS distributions at the global scale based on presence points (Fig. 6a, c and e). With a large number of occurrence records, it would be possible to correct and remove the sampling bias associated with occurrence localities based on the 2.5 arc-minute resolution in each grid cell of the 5.0 arc-minute resolution, and keep the remaining occurrence record(s) in the previous grid cell of the 5.0 arc-minute resolution (Gottschalk et al., 2011; Kramer-Schadt et al., 2013; Gueta and Carmel, 2016). This would reduce the gap in the number of presence points between the 2.5 and 5.0 arc-minute resolutions (Gueta and Carmel, 2016). 
Removing occurrence records with sampling bias in each cell of the 5.0 arc-minute resolution is the key step. Once this is done, it would be possible to use the corrected occurrence records and climate data at the 5.0 arc-minute resolution as inputs for Maxent to model the distribution probability of ITS based on presence and background points throughout the world.
The effect of the scale on the results of SDMs remains a challenge for researchers and land managers, keeping them from using data to make reasonable and accurate decisions and policies for the prevention and control of plant invasion at the global scale. The results of SDMs (namely, the distribution probabilities of species across all scales) can be used to predict the distributions of invasive plants throughout the world and to guide such policy development. We hope that future studies can expand the application of SDMs to provide feasible suggestions for risk evaluation of invasive species based on different spatial scales of climate data.
Acknowledgements
We thank the two anonymous reviewers for their valuable comments on an early version of the manuscript, and the Fundamental Research Funds for the Central Universities (2015ZCQ-BH-01), NSFC (31570413) and the National Key Research and Development Program of China (2016YFC1201100) for support.
Fig. 1. Occurrence records of the nine focal invasive tree species (ITS), as well as background points.
Fig. 2. The training and test omission rates for Maxent modeling. Different spatial scales of climate data are represented as 2.5, 5.0 and 10.0 and refer to corresponding arc-minute resolutions. 1: The 10th percentile training presence; 2: Equal training sensitivity and specificity; 3: Maximum training sensitivity plus specificity; 4: Equal test sensitivity and specificity; 5: Maximum test sensitivity plus specificity. Error bars represent standard deviation.
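The omission rates summarized in Fig. 2 can be illustrated with a short sketch. It assumes that an omission rate is the fraction of presence points whose predicted probability falls below a threshold, and it derives the 10th percentile training presence threshold with a nearest-rank rule; Maxent's internal percentile calculation may differ. The function names and the example scores are hypothetical and are not taken from the study.

```python
import math


def percentile_threshold(train_scores, pct=10.0):
    """Nearest-rank percentile of the predicted values at training presences."""
    ordered = sorted(train_scores)
    # Nearest-rank rule (an assumption; other percentile definitions exist).
    rank = max(1, math.ceil(len(ordered) * pct / 100.0))
    return ordered[rank - 1]


def omission_rate(scores, threshold):
    """Fraction of presence points predicted below the threshold."""
    missed = sum(1 for s in scores if s < threshold)
    return missed / len(scores)


# Hypothetical predicted probabilities, for illustration only.
train = [0.12, 0.35, 0.41, 0.48, 0.55, 0.62, 0.70, 0.77, 0.83, 0.91]
test = [0.05, 0.30, 0.52, 0.88]

threshold = percentile_threshold(train)  # 10th percentile of training presences
training_omission = omission_rate(train, threshold)
test_omission = omission_rate(test, threshold)
```

A test omission rate well above the training omission rate at the same threshold would suggest that the model fits the training records more closely than the held-out records.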
Fig. 3. Response curves showing how the distribution probabilities of Prosopis glandulosa (a, c and e) and Spathodea campanulata (b, d and f) vary with two climatic variables at the 2.5, 5.0 and 10.0 arc-minute resolutions. Bio5 represents the maximum temperature of the warmest month, and Bio6 represents the minimum temperature of the coldest month. Probability represents the average distribution probability of P. glandulosa and S. campanulata.
Fig. 4. The average distribution probability across the nine focal ITS for the presence and background points. Different spatial scales of climate data are represented as 2.5, 5.0 and 10.0 and refer to the corresponding arc-minute resolutions. Error bars represent standard deviation. Presence and background represent the average distribution probability across the nine ITS at the presence points and at the background points, respectively.
Fig. 5. Correlation (R²) of the distribution probabilities of presence and background points across the nine focal ITS between models built with climate data at any two of the three scales. Different spatial scales of climate data are represented as 2.5, 5.0 and 10.0 and refer to the corresponding arc-minute resolutions. The upper vertical bar represents the maximum R²; the lower vertical bar represents the minimum R²; the point in the box represents the median; the line in the box represents the mean.
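As a minimal sketch of the comparison behind Fig. 5, the snippet below computes R² between the probabilities predicted for the same points at two scales. It assumes that R² is the squared Pearson correlation coefficient, which the caption does not state explicitly; the function name and the probability values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


# Hypothetical probabilities predicted for the same five points at two scales.
prob_2_5 = [0.10, 0.35, 0.50, 0.72, 0.90]
prob_5_0 = [0.14, 0.30, 0.55, 0.70, 0.85]

r2_between_scales = r_squared(prob_2_5, prob_5_0)
```

Values close to 1 would indicate that the two scales rank and space the points almost identically, while lower values would indicate scale-dependent predictions.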
Fig. 6. The relationship between the gap in the number of presence points and R² across climate data scales. Different spatial scales of climate data are represented as 2.5, 5.0 and 10.0 and refer to the corresponding arc-minute resolutions. Presence gap represents the difference in the number of input occurrence records between two scales (namely, the 2.5, 5.0 and 10.0 arc-minute resolutions).
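The presence gap defined in the caption of Fig. 6 can be sketched as a pairwise subtraction. The snippet assumes the gap is computed as the record count at the finer scale minus the count at the coarser scale; the direction of the subtraction, the function name and the counts are assumptions made for illustration.

```python
from itertools import combinations


def presence_gaps(records_by_scale):
    """Pairwise differences in occurrence-record counts between scales."""
    gaps = {}
    # sorted() orders the resolutions from fine to coarse (2.5, 5.0, 10.0).
    for fine, coarse in combinations(sorted(records_by_scale), 2):
        gaps[(fine, coarse)] = records_by_scale[fine] - records_by_scale[coarse]
    return gaps


# Hypothetical record counts for one species, keyed by resolution in arc-minutes.
counts = {2.5: 900, 5.0: 800, 10.0: 650}
gaps = presence_gaps(counts)
```

Each species contributes three gaps, one per pair of scales, which could then be paired with the corresponding R² values.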
Tables

Table 1. The number of occurrence localities and AUC values for the nine focal invasive plant species.

Code  Name                      Record                  AUC (training)         AUC (test)
                                2.5*    5.0     10.0    2.5    5.0    10.0     2.5    5.0    10.0
1     Acacia mearnsii           1647    1104    686     0.938  0.953  0.964    0.937  0.952  0.962
2     Cecropia peltata          565     526     463     0.952  0.952  0.952    0.948  0.947  0.947
3     Cinchona pubescens        338     296     245     0.979  0.981  0.983    0.978  0.979  0.980
4     Leucaena leucocephala     1475    1349    1128    0.896  0.897  0.895    0.893  0.893  0.892
5     Melaleuca quinquenervia   1058    673     420     0.956  0.967  0.972    0.955  0.964  0.968
6     Miconia calvescens        504     445     387     0.958  0.959  0.958    0.955  0.955  0.954
7     Pinus pinaster            7248    4479    2320    0.813  0.863  0.916    0.813  0.862  0.915
8     Prosopis glandulosa       917     824     666     0.943  0.944  0.947    0.939  0.941  0.943
9     Spathodea campanulata     382     361     331     0.945  0.944  0.944    0.937  0.935  0.936
      Mean                      1570.4  1117.4  738.4   0.931  0.94   0.948    0.928  0.936  0.944
      SD                        2055.3  1233.2  612     0.047  0.035  0.026    0.046  0.034  0.026

*2.5, 5.0 and 10.0 represent Maxent modeling based on the 2.5, 5.0 and 10.0 arc-minute resolutions, respectively.
Table 2. Climate data used as inputs for Maxent modeling.

Code    Climate variable                          Unit
Bio1    Annual mean temperature                   °C*10
Bio4    Temperature seasonality                   SD*100
Bio5    Max. temperature of the warmest month     °C*10
Bio6    Min. temperature of the coldest month     °C*10
Bio12   Annual precipitation                      mm
Bio13   Precipitation of the wettest month        mm
Bio14   Precipitation of the driest month         mm
Bio15   Precipitation seasonality                 C of V

SD represents standard deviation.
Table 3. The results of jackknife test from Maxent modeling.

Species                   Scale  Bio1   Bio4   Bio5   Bio6   Bio12  Bio13  Bio14  Bio15
Acacia mearnsii           2.5*   28.4   39.6   15.7   5.4    3.1    0.2    3.3    4.4
                          5.0    32.0   38.3   15.4   4.0    4.7    1.6    3.6    0.4
                          10.0   35.9   35.2   12.0   5.3    7.6    0.3    3.0    0.7
Cecropia peltata          2.5    0.4    37.9   1.9    19.2   31.0   2.4    5.6    1.5
                          5.0    0.5    53.4   0.7    8.9    24.1   3.6    6.7    2.2
                          10.0   0.7    42.4   1.0    15.1   27.6   6.1    4.9    2.2
Cinchona pubescens        2.5    3.7    64.4   26.5   1.0    1.1    2.9    0.4    0.0
                          5.0    9.0    63.0   21.4   0.7    2.8    2.6    0.5    0.0
                          10.0   13.6   60.6   13.4   3.0    0.6    8.5    0.3    0.0
Leucaena leucocephala     2.5    0.4    45.1   8.7    34.3   0.3    9.7    1.3    0.3
                          5.0    1.0    43.5   6.2    34.6   0.2    12.5   1.2    0.8
                          10.0   1.3    46.5   4.0    29.8   0.4    12.0   1.4    4.7
Melaleuca quinquenervia   2.5    9.1    42.5   9.6    0.7    19.2   18.1   0.8    0.0
                          5.0    7.4    46.0   10.5   2.4    20.8   10.7   2.2    0.0
                          10.0   5.8    47.0   9.3    2.5    21.9   9.9    3.2    0.3
Miconia calvescens        2.5    3.5    50.2   8.0    9.2    14.3   12.6   2.1    0.1
                          5.0    5.8    48.1   6.9    8.8    15.5   12.9   1.8    0.2
                          10.0   3.0    62.0   7.2    7.3    5.5    13.4   1.4    0.1
Pinus pinaster            2.5    42.9   31.3   0.0    19.1   3.2    0.3    3.1    0.1
                          5.0    41.1   28.2   0.2    21.4   4.8    0.6    3.6    0.1
                          10.0   44.8   27.9   0.0    18.5   5.0    0.5    3.2    0.2
Prosopis glandulosa       2.5    41.5   6.3    13.2   13.0   11.7   5.4    0.3    8.7
                          5.0    34.1   6.9    16.5   13.9   9.1    10.3   0.4    8.9
                          10.0   28.3   8.5    18.1   17.9   11.0   5.1    0.6    10.5
Spathodea campanulata     2.5    1.9    45.5   5.9    21.3   17.8   0.2    4.8    2.7
                          5.0    4.0    54.9   5.7    10.5   15.1   0.2    8.1    1.4
                          10.0   2.8    48.1   5.9    19.1   14.7   0.4    7.2    1.8

*2.5, 5.0 and 10.0 represent Maxent modeling based on the 2.5, 5.0 and 10.0 arc-minute resolutions, respectively. The codes of the climatic variables are the same as in Table 2. The bold values represent the variables important to species distribution probabilities.