Global Ecology and Conservation 21 (2020) e00894
Contents lists available at ScienceDirect
Global Ecology and Conservation journal homepage: http://www.elsevier.com/locate/gecco
Original Research Article
Investigating spatial non-stationary environmental effects on the distribution of giant pandas in the Qinling Mountains, China Xinping Ye a, b, Xiaoping Yu a, b, Tiejun Wang c, * a
College of Life Sciences, Shaanxi Normal University, Xi’an, 710119, China Research Center for UAV Remote Sensing, Shaanxi Normal University, Xi’an, 710119, China c Department of Natural Resources, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, P.O. Box 217, 7500 AE, Enschede, the Netherlands b
a r t i c l e i n f o
a b s t r a c t
Article history: Received 24 August 2019 Received in revised form 20 December 2019 Accepted 20 December 2019
Analyses of species distribution have commonly been performed with global regression models by assuming species-environment interactions are spatially stationary. However, environmental variables are often spatially heterogeneous and their effects on species distribution may vary across space. Here we employed a geographically weighted logistic regression (logistic GWR) to investigate environmental effects on the distribution of giant pandas (Ailuropoda melanoleuca) in the Qinling Mountains of China. Outputs from the logistic GWR were compared with those derived from a global logistic regression in predicting panda distribution. A k-means cluster analysis was used to identify distinct zones of panda-environment relationships. We found that logistic GWR outperformed global logistic regression in terms of goodness-of-fit and predictive accuracy. Results from the logistic GWR model clearly showed both the strength and direction of the environmental effects on panda distribution changed spatially and formed distinct subareas with particular panda-environment relationships. The findings emphasize the importance of considering spatial non-stationarity in studying ecological relationships between organisms and their environments, especially for threatened species such as the giant panda with small populations in highly fragmented habitats. © 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Spatial heterogeneity Species-environment relationship Geographically weighted logistic regression Global logistic regression Giant panda
1. Introduction Understanding of species-environment relationships in heterogeneous landscapes is essential for biodiversity conservation and landscape management (Wu et al., 2000; Turner and Tjørve, 2005). Conventional regression models, such as multivariate linear regression (Radeloff et al., 1999), ordinary least-squares regression (Coppolillo, 2000), and logistic regression (Pereira and Itami, 1991; Augustin et al., 1996), are the most common statistical methods used for estimating the ecological relationships between organism and environmental covariates (Windle et al., 2009; Tavernia and Reed, 2012). These models postulate that the relationships between species’ measures and environmental covariates are spatially stationary over space, and produce constant regression coefficients through the studied area (Fotheringham
* Corresponding author. E-mail address:
[email protected] (T. Wang). https://doi.org/10.1016/j.gecco.2019.e00894 2351-9894/© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/ licenses/by-nc-nd/4.0/).
2
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
et al., 1996, 2002; Mcnew et al., 2013). However, ecological processes in a real landscape are likely to spatially heterogeneous so that spatial non-stationarity emerges (i.e., the regression coefficients vary spatially across areas), bringing about the difficulty for accurate identification of key environmental factors affecting ecological processes of interest (Mitchell et al., 2001; Miller and Hanham, 2011). In such circumstances, models that assume spatially constant relationships (i.e., global regression models) may misinterpret the true factors contributing to underlying spatial patterns of the distribution of organisms and consequently, lead to biased inference and ineffective management (Mitchell et al., 2001; Fotheringham et al., 2002; Mcnew et al., 2013). Therefore, it is critical to investigate whether the effects of environmental factors are spatially heterogeneous before making model inferences over wide areas (Fortin et al., 2006). To improve the power of models in assessing ecological relationships, a few techniques have been developed to account for unexplained spatial dependence in data by adding spatially correlated random effects in standard models, such as spatial autoregressive model (Lichstein et al., 2002; Miller, 2005), spatial spline model (Sangalli et al., 2013), and spatial gaussian fields model (Lindgren et al., 2011; also see Bakka et al., 2019; Martínez-Minaya et al., 2019). These approaches, although promising to capture spatial patterns in the covariates’ effects, can be strongly influenced by departures from the underlying distributions, while the estimated regression coefficients are still spatially invariant (i.e., ‘global’ coefficients) and may overlook potentially influential variables affecting ecological processes of interest (Loucks and Wang, 2004). More recently, a spatial statistical method termed geographically weighted regression (GWR) has been introduced as a powerful method to address spatial heterogeneity in ecological relationships (Brunsdon et al., 1996). GWR extends traditional regression models by allowing each covariate’s parameters to vary among different locations (Brunsdon et al., 1996; Fotheringham et al., 2002). Unlike non-spatial regression models that compute only one set of global coefficients, GWR calculates local coefficients for each observation by weighting neighboring observations with a decreasing function of distance (Zhang and Shi, 2004). Such location-specific coefficients can explicitly show the pattern of spatially-varying species-environment relationships over the areas of interest. GWR has been successfully applied in different studies such as evaluating spatially explicit vegetation-environment relationships (Bickford and Laffan, 2006; Kupfer and Farris, 2007), detecting spatial heterogeneity in avian nest-site selection (Mcnew et al., 2013), investigating spatial determinants of infectious disease (Hu et al., 2012), and detecting links between human leptospirosis and hydrological dynamics (VegaCorredor and Opadeyi, 2014). In this context, GWR could also be a promising tool for understanding the spatial nature of wildlife distribution and how they vary spatial-temporally in heterogeneous landscapes (Shi et al., 2006). The giant panda (Ailuropoda melanoleuca) is a threatened and iconic conservation species, and panda-habitat relationships have been the focal interest of ecological research on giant pandas (Wei et al., 2015). Previous studies have applied a variety of statistical methods (e.g., generalized linear regression (Wang et al., 2010; Kang et al., 2014), discriminate function analysis (Ye et al., 2007), and ecological niche factor analysis (Qi et al., 2012)) to estimate the environmental effects on giant pandas and found that the spatial distribution of giant pandas is regulated by topographic conditions, vegetation fragmentation, and human disturbances (Hull et al., 2014). Although technical details differ among these modeling approaches, they are aspatial and seldom consider the spatial non-stationarity in analyzing panda-habitat relationships (Yackulic and Ginsberg, 2016). If the underlying scientific assumptions about the ecological needs of giant pandas are incorrect, the resulting habitat models would misguide panda conservation and management (Swaisgood et al., 2010; Zhang et al., 2011). Considering the inability of global regression models to capture spatial variations in ecological relationships, there is an urgent need to reassess the conventional wisdom about panda’s habitat requirements. From this perspective, GWR could be an ideal analytical means to improve our understanding of giant pandas’ ecological needs in variable space. However, to our knowledge, no study has employed the GWR technique to investigate spatial non-stationarity in associations between environmental factors and ginat panda distribution and habitat use. In this study, we analyzed the relationships between panda distribution and environmental factors in the Qinling Mountains of central China, using an extension of the GWR technique called geographically weighted logistic regression (hereafter logistic GWR). Specifically, we set out to (1) compare the performances of logistic GWR with commonly-used global logistic regression in modeling the distribution of giant pandas; (2) examine spatial variability in the effects of environmental factors on panda distribution; (3) identify the distinct zones of panda-environment relationships over the studied area.
2. Material and methods 2.1. Study area The study area (32 500 - 34100 N, 106 250 - 108 500 E) extends 400e500 km in the east-west direction through the southern part of Shaanxi Province and covers the whole distribution of giant pandas in the Qinling Mountains, China (Fig. 1). Acting as a boundary between northern China and southern China, the broad, undulating southern slopes capture the warm rains and moisture from the southeastern monsoon and are dominated by evergreen broad-leaved and mixed-deciduous broadleaved forests. According to the fourth national giant panda survey, around 340 giant pandas inhabit the forests in this mountainous region, and most of them are concentrated on the southern slope of the Qinling Mountains, which are found at the northern edge of the giant panda distribution (Shaanxi Provincial Forestry Department, 2017). The forests, as well as panda habitats, are geographically isolated from the other giant panda mountain range in China (State Forestry
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
3
Fig. 1. Map of the study area in the Qinling Mountains of central China. Presence (filled triangles) and pseudo-absence (plus signs) samples of giant pandas are extracted from the Third National Giant Panda Survey conducted in 1999e2001.
Administration of China, 2006). Human density and road density vary considerably across the study area, while local residents mostly inhabit in low altitude zones.
2.2. Panda distribution data Giant panda occurrence points (n ¼ 294) were derived from the database of the Third National Giant Panda Survey carried out in 1999e2001 (State Forestry Administration of China, 2006). The survey was conducted via an exhaustive dragnet investigation throughout the study area (Loucks and Wang, 2004). Due to the lack of true absence data, we generated panda pseudo-absences by extracting 500 random points within forest areas but outside 3-km buffer zones around giant panda occurrence locations (Wang et al., 2010). To reduce spatial autocorrelation in the distribution data, both presence and pseudo-absence locations were spatially filtered using a distance of 2 km (Wang et al., 2010) using the “SDMtool box” toolkit (Brown, 2014) in ArcGIS software (ESRI, v. 10.2). The resulting data, containing 135 panda presence and 135 pseudo-absence locations, was then used as the binary response variable for further modeling of panda distribution.
2.3. Environmental variables We first developed a preliminary set of potential predictor variables (see Table A1) that have been commonly used in previous panda habitat studies in the study area (e.g., Feng et al., 2009; Wang et al., 2010; Hull et al., 2014). To reduce the multicollinearity of explanatory variables (see Fig. A1), we statistically thinned the preliminary set of environmental variables by the ‘corSelect’ function in the ‘fuzzySim’ R package (Barbosa, 2015). If the correlation coefficient of a pair of variables was >0.75 or the VIF values were >10 (Zuur et al., 2010), the variables were tested in a bivariate model and the one with a better fit was retained. The final set of the variables included topographic roughness, climate heterogeneity, mean forest patch size, distance between forest patches, distance to road, and distance to settlement (Table 1; also see Fig. A2). Topographic roughness and climate heterogeneity reflect geophysical and climatic conditions, while mean forest patch size and distance between forest patches depict the spatial patterns of forests, and distances to road and settlement measure the severity of human disturbances in the study area (Fig. A2).
2.4. Model development We applied both global logistic regression and logistic GWR models to evaluate the spatial distribution of giant pandas. The global logistic regression model (i.e., binomial GLM with a logistic link) is a non-spatial method that ignores spatial variability and produces a single, spatially stationary coefficient for each environmental variable (Real et al., 2006). In contrast, the logistic GWR model is a local form of the logistic regression and produces location-specific coefficients for each variable which may spatially vary across the studied space (Brunsdon et al., 1996; Fotheringham et al., 2002). A typical logistic GWR model takes the following form:
4
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
Table 1 Description of environmental variables used in the modeling of giant panda distribution. Variable name
Description
Source
Topographic The measure of topographic ruggedness at a panda presence/ heterogeneity pseudo-absence location.
Extracted from a 30-m spatial resolution DEM from ASTER GDEM (Hook et al., 2001; downloaded from the USGS website), and resampled to 100-m spatial resolution using the STMtoolbox in ArcGIS. Climate The measure of climatic variation at a panda presence/pseudoExtracted from 30 arc-seconds bioclimatic variables (Hijmans et al., heterogeneity absence location. 2005; downloaded from the WorldClim website), and resampled to 100-m spatial resolution using the STMtoolbox in ArcGIS, Mean forest The average area of the forest patches within a 5-km radius circle Calculated on the National LULC 2000 data with a 30-m spatial patch size around a panda presence/pseudo-absence location. resolution (Liu et al., 2002) using Fragstats v4.2 software, and resampled to 100-m spatial resolution. The nearest distance between the forest patches within a 5-km Calculated on the National LULC 2000 data with a 30-m spatial Distance radius circle around a panda presence/pseudo-absence location. resolution in Fragstats v4.2 software, and resampled to 100-m between spatial resolution forest patches Distance to road Distance to the nearest road from a panda presence/pseudoCalculated on the roadmap (1:100000) from National Geometrics absence location. Center of China, and resampled to 100-m spatial resolution in ArcGIS. Distance to Distance to the nearest human settlement from a panda presence/ Calculated on the residential map (1:100000) from National settlement pseudo-absence location. Geometrics Center of China, and resampled to 100-m spatial resolution in ArcGIS.
" log
1
y*i
y*i
# ¼ b0i þ
X
bki xki þ εi
(1)
k
where y*i is the prediction of the dependent variable y at observation i, b0i is the intercept specific to observation i, and bki is the coefficient for kth covariate at observation i, and xki and εi are the kth covariate and error at observation i, respectively. In a logistic GWR model, local variable coefficients for each observation are fitted via a distance-decay kernel weighting scheme that neighbors closer to the modeled observation are weighted more heavily than those further away (Fotheringham et al., 2002). The kernel bandwidth of the weighting function can be set to ‘fixed’ or ‘adaptive’. A fixed kernel means the bandwidth size keeps constant across space, while an adaptive kernel means the bandwidth size will adjust based on the number of neighbors around the focal observation (Fotheringham et al., 2002; Atkinson et al., 2003). In addition, the bandwidth size of the spatial kernel can affect model fitting by controlling the variance in the weighting function. Narrow bandwidths result in highly localized and varied coefficient estimates, whereas broad bandwidths lead to local coefficients that are similar to global regression coefficients (Mcnew et al., 2013). The same set of panda presence-absence data and environmental variables were used for global logistic regression and logistic GWR modeling. All environmental variables were standardized via z-transformation (Zuur et al., 2010) to facilitate direct comparison of their effects on giant panda distribution. Because the panda presence-absence samples were irregularly spaced in the study area, we adopted a Gaussian spatial weighting function with an adaptive kernel in the logistic GWR, where the optimum bandwidth size was determined by minimizing the corrected Akaike Information Criterion (AICc), as suggested by Fotheringham et al. (2002). 2.5. Model evaluation and comparison The performances of logistic GWR and global logistic regression models were examined based on AICc and adjusted R2 (Miller, 2012). In general, the models are considered to have different goodness of fit if the difference between the two models’ AICc values is greater than four (Saefuddin et al., 2012). We also compared the ability of each model to predict panda presence/absence based on the area under the receiver operating characteristic curve (AUC; Zou et al., 2007). Spatial autocorrelation of regression residuals was examined by the Moran’s I statistics (Moran, 1950). Furthermore, we calculated and compared each model’s classification accuracy (i.e., the percentage of correct predictions), the degree of model agreement (measured by Cohen’s Kappa coefficient), and the true skill statistic (TSS). All statistical analyses were conducted using the ‘gwrr’ package (Wheeler, 2007) and the ‘GWmodel’ package (Lu et al., 2013) in the R environment (R Core Team, 2018). 2.6. Mapping spatial variability in panda-environment relationships To explicitly show how environmental effects on giant pandas varied over space, we converted a set of local parameter estimates from logistic GWR, including local coefficients, local R2 values, and local residuals, to continuous surfaces via the Inverse Distance Weighting (IDW) interpolation (Windle et al., 2009). We further conducted k-means clustering on logistic
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
5
GWR-derived local coefficients to detect specific zones of distinct panda-environment relationships (Windle et al., 2009). The number of clusters (k) was set a priori to 2 to 5, and the best number of clusters was estimated based on a gap statistic (Tibshirani et al., 2001; Windle et al., 2009; Liu et al., 2019). Mean regression coefficients were calculated for each cluster. All continuous surfaces of local parameters as well as the spatial patterns of the clusters were mapped in ArcGIS software (ESRI, v10.2).
3. Results The results of our global logistic regression model suggested that the distribution of giant pandas was significantly affected by topographic roughness, climate heterogeneity, distance between forest patches, distance to road, and distance to settlement (p < 0.05; Table 2). Among them, the distance to road had the strongest positive association with giant panda presence, while the distance between forest patches had the strongest negative effect on giant pandas, followed by climate heterogeneity and topographic roughness (Table 2). However, mean forest patch size was less correlated with panda presence in the study area (p > 0.05). The range of logistic GWR-derived local coefficients for each variable was much wider than that in the global logistic regression model (Table 3 & Fig. 2). It is notable that climate heterogeneity and distance to settlement had both negative and positive coefficient values, exhibiting self-inconsistency in their effects on giant panda distribution through the study area (Table 3). Non-stationary effects of environmental variables were further revealed by the maps of their local coefficients, as shown in Fig. 3. For instance, topographic roughness had a significant negative effect on giant panda presence in the central part of the study area, but the influence became weaker in directions east and west (Fig. 3a). In contrast, the influence of climate heterogeneity was strong in the eastern part but had a decreasing trend to the west part of the study area (Fig. 3b). Distance to road was the only variable that had a significant positive effect throughout the study area (Fig. 3e), while the distance to settlement had a weak negative effect on panda distribution in both western and eastern parts of the study area (Fig. 3f). The association between giant panda presence and mean forest patch size was not significant in almost the entire study area, except for a small part in the central-southern part of the study area (Fig. 3c). The logistic GWR model showed a marked improvement over the non-spatial logistic regression model. A lower AICc value and higher adjusted R2 and AUC values for the logistic GWR model (AICc ¼ 230; local adjusted R2 ¼ 0.337e0.589; AUC ¼ 0.965) suggested that logistic GWR fitted data better than the global logistic regression model. The map of logistic GWR-derived local R2 showed that logistic GWR had greater explanatory power in the central part of the study area (Fig. 4). Moreover, the residuals of the logistic GWR model had much weaker spatial autocorrelation (Moran’s I ¼ 0.088; p ¼ 0.293) than that of the logistic regression model (Moran’s I ¼ 0.327; p ¼ 0.002). Regarding the percentage of correctly classified points, the overall percentage of success for the logistic GWR model was 89.7% with a Kappa value of 0.794, whereas the overall success for the logistic regression model was 73.8% with a much lower Kappa value of 0.475 (Table 4). The likelihood maps of giant panda presence obtained from both models show that giant pandas were more likely to occur in the central part of the study area (Fig. 5). However, the logistic regression model produced a spotty probability distribution and over-estimated the possibility of giant panda presence around the peripheral regions (Fig. 5a). In contrast, the prediction map of the logistic GWR model generally coincides with the spatial pattern of giant panda occurrences in the study area (Fig. 5b). Results of the k-means cluster analysis on logistic GWR-derived local coefficients were mapped in Fig. 6. Based on the gap statistics, the best number of clusters for panda-environment relationships in the study area was five (k ¼ 5). When k ¼ 2, a western zone (Cluster 1) was differentiated from the rest (Cluster 2) by a positive relationship with climate heterogeneity (Fig. 6a; Table 5), while Cluster 2 was further divided into two groups at k ¼ 3 because of the counter effects of mean forest patch size (in Fig. 6b; Table 5). As k changed to 4 and 5, Cluster 4 (Fig. 6c) and Cluster 5 (Fig. 6d) were further differentiated from previous Cluster 1 and Cluster 3 respectively, largely due to the different effects of climate heterogeneity and distance to settlement (Table 5).
Table 2 Coefficient estimates for the explanatory variables in the global logistic regression model of giant panda distribution. Variable
Coefficient
Std. error
z-value
p-value
VIF
Intercept Topographic roughness Climate heterogeneity Mean forest patch size Dist. between forest patches Distance to road Distance to settlement
0.098 0.466 0.545 0.406 1.683 1.192 0.359
0.160 0.175 0.188 0.210 0.499 0.197 0.171
0.616 2.658 2.898 1.936 3.371 6.038 2.103
0.538 0.008 0.004 0.053 0.001 <0.001 0.035
e 1.147 1.344 2.034 1.974 1.265 1.222
6
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
Table 3 Coefficient estimates for explanatory variables in the logistic GWR model of giant panda distribution, including the overall percentage of negative (% e) and positive (% þ) values. The optimal bandwidth was 21 (number of nearest neighbors). Variable
Minimum
Lower quartile
Median
Higher quartile
Maximum
%e
%þ
Intercept Topographic roughness Climate heterogeneity Mean forest patch size Distance between forest patches Distance to road Distance to settlement
0.661 1.006 1.992 0.835 2.451 0.805 0.351
0.090 0.674 1.418 0.468 2.003 1.152 0.094
0.489 0.489 0.973 0.306 1.741 1.395 0.309
1.138 0.410 0.143 0.232 1.400 1.443 0.486
1.924 0.330 0.375 0.104 0.935 1.626 0.761
26.3 100.0 81.8 100.0 100.0 0.0 17.8
73.7 0.0 18.2 0.0 0.0 100.0 82.2
Fig. 2. Comparison of coefficient estimates for each environmental variable in global logistic regression and logistic GWR models. For the global logistic regression model, the mean coefficients and 95% CIs are presented. For the logistic GWR model, the coefficients are shown as median, IQR, minimum, maximum, and range.
4. Discussion This study evaluated the capability of logistic GWR for characterizing the local relationships between giant panda distribution and environmental variables in the Qinling Mountains of China. Consistent with prior GWR applications (e.g., Windle et al., 2009; Li et al., 2018), we found that logistic GWR was able to characterize how local environmental factors affect the distribution of giant pandas over the study area. Unlike the global logistic regression model that ignores local variability, GWR can explicitly delineate spatial variations in the significance and direction of the panda-environment relationships across space. The global logistic regression analysis suggested that topographic roughness, climate heterogeneity, distance between forest patches, and distance to settlements were significant factors affecting giant panda distribution, which are similar to the results of previous panda research with non-spatial models (e.g., Wang et al., 2010; Kang et al., 2014). However, the GWR-derived estimates of these variables indicated that their effects were not significant in many subareas. For example, the effect of climate heterogeneity was strong in the eastern part but had a decreasing trend to the west part of the study area, indicating that giant pandas prefer more stable weather conditions as in the west part of the Qinling Mountains. We also found that giant pandas were sensitive to the distance to roads but generally insensitive to the size of forest patches throughout the study area, implying that road network was the critical factor limiting giant panda presence and more efforts should be made to reduce the impacts from road traffics. Moreover, the global model suggested that climate heterogeneity had a negative association and distance to settlement a positive association with giant panda distribution, yet the logistic GWR model indicated that they had both positive and negative relationships over the study area, a pattern that had not been reported by previous panda research. The findings may not come as a surprise since the eastern portion of the study area generally has a lower ruggedness as well as a higher density of human settlements than the western part does (Shaanxi Provincial Forestry Department, 2017). The major implication of changing coefficients across space is that the strength of environmental effects on giant panda distribution was non-stationary because of the spatial variability of the environmental variables, showing the importance of spatial heterogeneity in examining interactions between giant pandas and its environmental context. Since ecological relationships are likely to vary intrinsically over space, incorporating spatial information into modeling can help to identify dominant drivers causing spatial inconsistency in species-environment relationships and therefore, greatly improve model predictability in a complex system (Saefuddin et al., 2012). In the present study, the logistic GWR model yielded a more realistic map of giant panda’s presence in comparison with the global logistic regression model, where the latter apparently over-estimated the probability of giant panda presence around the peripheral areas which is not congruent with the actual distribution pattern of giant pandas. This can be expected because our study area covers about 50 thousand km2. In this situation, the non-spatial analytical methods that frequently used by
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
7
Fig. 3. Interpolated continuous surfaces of the logistic GWR-derived local coefficient estimates for variables (a) topographic roughness, (b) climate heterogeneity, (c) mean forest patch size, (d) distance between forest patches, (e) distance to road, and (f) distance to settlement. Filled circles denote the samples where the relationship between panda distribution and the variable were significant (p < 0.05), and unfilled circles the not significant samples.
decision-makers could not explore spatially varying relationships and induce biased inferences, and ultimately misguide practices in habitat restoration and designing effective reserve networks. Given the risks of using global parameters to model species distribution in a large heterogeneous region, we suggest future panda-habitat analysis should consider using the GWR technique to improve our understanding of the spatial ecology of giant pandas, for instance, by combining the global regression and GWR methods to improve the prediction of panda presence in heterogeneous landscapes. The present study further underscores the importance of considering spatial non-stationarity when exploring the environmental niche of species across scales. When data are collected within a large spatial extent, it is anticipated that the shapes and strengths of relationships between variables in one or more subareas are different from “global” situations. Our k-means cluster analysis on GWR-derived parameters successfully identified five distinct zones of spatial associations between giant panda distribution and environmental covariates, implying that using global logistic regression (with a single set of parameters) to model species distribution in a large region could be problematic and may lead to ineffective conservation policies and practices. However, designing broad-scale approaches to panda conservation is commonly based on range-wide analyses of species. If we fail to identify the species-environment differentiations and its underlying mechanisms, there would be a danger of mismatch between policy decisions and species’ real needs. Therefore, we need to partition the region into smaller subareas and fit the model separately in each of them to avoid such issues. The GWR technique can be used to spatial partitioning of the data by characterizing distinct areas of unique spatial associations between environmental covariates across space, as noted by Wimberly et al. (2008). This is particularly helpful for decision-makers or landscape managers to design reserve networks or functional management units (MU) for species at a landscape scale. Conventionally the MU boundaries for giant pandas were designed based on administrative boundaries and socioeconomic concerns (e.g., human-
8
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
Table 4 Summary of model performances for logistic regression and logistic GWR models of giant panda distribution. Model
n
Deviance
AICc
Pseudo R2
AUC
PSuccessa
Kappa
TSS
Global logistic regression Logistic GWR
270 270
273 192
288 230
0.270 0.337e0.589
0.832 0.965
73.8 89.7
0.475 0.794
0.478 0.795
Fig. 5. Probability of giant panda presence predicted by (a) global logistic regression and (b) logistic GWR models in the study area.
inhabited or not). Such MUs are convenient for administrative management but may to some extent lack ecological significance. When comparing the five distinct zones characterized by the k-means cluster analysis (k ¼ 5) for giant pandas with current MUs planning for The Giant Panda National Park, three different clusters of panda-environment relationships were found within the core distribution areas of the Qinling Mountains. Therefore, a comprehensive analysis of unique zones and refinement of the current MUs with consideration of non-stationarity in ecological processes would greatly benefit the regional conservation of giant pandas. In addition to the GWR method, there are a few other spatial modeling techniques that also deal with spatial dependency ez et al. (2008) compared GWR with moving windows regression and moving windows kriging and non-stationarity. Pa techniques and found that GWR performs best for interpolation. In general, kriging can incorporate systematic residual information (i.e. error autocorrelation) to get an improved predictive model, while GWR is designed to model spatially heterogeneous processes as well as locational effects. The present study demonstrated that logistic GWR not only had more
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
9
Fig. 6. Mapped results of k-means cluster analyses of the local coefficient estimates from the logistic GWR for giant panda distribution, (a) k ¼ 2, (b) k ¼ 3, (c) k ¼ 4, and (d) k ¼ 5.
Table 5 Mean logistic GWR coefficient estimates (s.d. in parentheses) for each cluster identified by the k-means cluster analyses. k Cluster n
Topographic roughness
Climate heterogeneity
Mean forest patch size
Distance between forest patches
Distance to road
Distance to settlement
2 1 2 3 1 2 3 4 1 2 3 4 5 1 2 3 4 5
0.414 0.618 0.414 0.480 0.720 0.398 0.480 0.720 0.453 0.398 0.480 0.627 0.453 0.804
0.002 (0.277) 1.259 (0.391) 0.002 (0.277) 1.567 (0.274) 1.032 (0.300) 0.087 (0.209) 1.567 (0.274) 1.032 (0.300) 0.212 (0.308) 0.087 (0.209) 1.567 (0.274) 1.005 (0.262) 0.212 (0.308) 1.058 (0.331)
0.396 0.350 0.396 0.263 0.414 0.273 0.263 0.414 0.688 0.273 0.263 0.312 0.688 0.507
1.304 1.905 1.304 1.748 2.021 1.133 1.748 2.021 1.709 1.133 1.748 1.772 1.709 2.247
1.043 1.436 1.043 1.395 1.467 0.967 1.395 1.467 1.222 0.967 1.395 1.485 1.222 1.450
0.322 (0.155) 0.251 (0.316) 0.322 (0.155) 0.559 (0.112) 0.024 (0.207) 0.401 (0.092) 0.559 (0.112) 0.024 (0.207) 0.134 (0.103) 0.401 (0.092) 0.559 (0.112) 0.128 (0.144) 0.134 (0.103) 0.161 (0.152)
91 179 91 76 103 64 76 103 27 64 76 49 27 54
(0.058) (0.174) (0.058) (0.072) (0.155) (0.036) (0.072) (0.155) (0.079) (0.036) (0.072) (0.148) (0.079) (0.106)
(0.211) (0.150) (0.211) (0.089) (0.154) (0.092) (0.089) (0.154) (0.088) (0.092) (0.089) (0.102) (0.088) (0.133)
(0.298) (0.282) (0.298) (0.143) (0.303) (0.116) (0.143) (0.303) (0.178) (0.116) (0.143) (0.243) (0.178) (0.117)
(0.163) (0.073) (0.163) (0.039) (0.078) (0.113) (0.039) (0.078) (0.118) (0.113) (0.039) (0.098) (0.118) (0.048)
explanatory power but also identified spatial non-stationarity in relationships between giant pandas and their environments. Furthermore, GWR is ‘GIS-friendly’ and can interact seamlessly with GIS platforms to spatially visualize the results such as local coefficient estimates, predictions, residuals, etc. Spatial mapping of these statistics enables us to examine geographic “hotspots” in the data that would be missed in a global analysis (Kimsey et al., 2008), which further helps explore the critical relationships generating the patterns uncovered by GWR. In these regards, the GWR would be a promising tool in the analysis of the distribution of a species in heterogeneous landscapes. While GWR performs well in spatial relationship analysis, there has been some controversy about whether GWR is appropriate for making inferences about multivariate spatial relationships (P aez et al., 2011). Fotheringham et al. (2002) noted that GWR should be used with caution because geographical coordinates are the only information required to estimate local coefficients at unobserved locations. Li et al. (2018) mentioned that this method is not suitable for predicting species’ future distribution under substantial changes in environmental conditions. Because local regression coefficients in GWR are estimated based on the neighboring observations, attention should be given to potential collinearity in local coefficients when interpreting ecological associations between species and their environment (Wheeler and Tiefelsdorf, 2005). Furthermore, sampling density can influence the performance of GWR (Chen et al., 2012; Ye et al., 2017), and sparse data points may not be sufficient to produce ecologically interpretable results because the estimation of local parameters requires large data quantities when applying GWR analysis (Fotheringham et al., 2002). Additionally, we acknowledge that the study would undoubtedly be refined with the inclusion of additional explanatory variables (e.g., food quantity and quality) affecting
10
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
giant pandas’ distribution. Nevertheless, our relatively simple analysis shows that GWR is a promising tool for taking spatial heterogeneity into account in analyzing ecological relationships, which allows us to gain deep insight into the causes and consequences of spatial non-stationarity in ecological processes. 5. Conclusions The study demonstrates that logistic GWR is preferable to global logistic regression in characterizing panda-environment relationships in the Qinling Mountains, China. Using the logistic GWR technique, we successfully delineated how the significance and direction of the relationships between giant panda distribution and environment variables varied across the study area. Specific zones with distinct panda-environment associations were also identified by the spatial clustering of logistic GWR-derived coefficients. The outputs of GWR are not just a reflection of the spatial variations of the covariates but provide important insights into the relative importance of the ecological drivers at local scales. In this regard, GWR could be a promising tool for identifying spatial-varying ecological relationships and providing useful information for species of conservation concern in a heterogeneous landscape. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgments This work was supported by the National Key Research and Development Program of China (grant number 2016YFC0503200) and the National Natural Science Foundation of China (grant number 31672310). We thank the Forestry and Grassland Administration of Shaanxi Province for providing the 3rd panda survey data. We are also grateful for the helpful comments from the editor and anonymous reviewers. Appendix A. Supplementary data Supplementary data to this article can be found online at https://doi.org/10.1016/j.gecco.2019.e00894. Appendices
Table A.1 Preliminary set of potential predictor variables for the correlation testing prior to panda-habitat analysis. Variable name
Description
Altitude (m) Elevation at a panda presence/pseudo-absence location Topographic roughness Measure of topographic ruggedness at a panda presence/ pseudo-absence location Climate heterogeneity Measure of climatic variation at a panda presence/pseudoabsence location. Largest patch area (ha) Area of the largest forest patch within a 5 km radius of a panda presence/pseudo-absence location. Number of patches Number of forest patches within a 5 km radius of a panda presence/pseudo-absence location. Proportional abundance of forest in the landscape within a Percentage of landscape occupied 5 km radius of a panda presence/pseudo-absence location. by forest (%) Mean forest patch size Mean size of forest patches within a 5 km radius of a panda (ha) presence/pseudo-absence location. Distance between Nearest distance between forest patches within a 5 km patches (km) radius of a panda presence/pseudo-absence location. Distance to road (km) Distance to the nearest public road from a panda presence/ pseudo-absence location Distance to settlement Distance to the nearest human settlement from a panda (km) presence/pseudo-absence location
Source SRTM 90 m DEM (downloaded from the USGS website) Extracted from SRTM 90 m DEM using the STMtoolbox in ArcGIS Extracted from 30 arc-seconds bioclimatic variables (downloaded from the WorldClim website) using the STMtoolbox in ArcGIS Calculated on the 30-m National LULC 2000 data in Fragstats software Calculated on the 30-m National LULC 2000 data in Fragstats software Calculated on the 30-m National LULC 2000 data in Fragstats software Calculated on the 30-m National LULC 2000 data in Fragstats software Calculated on the 30-m National LULC 2000 data in Fragstats software Calculated on the 1:100000 roadmap from National Geometrics Center of China in ArcGIS Calculated on the 1:100000 residential map from National Geometrics Center of China in ArcGIS
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
11
Fig. A.1. Coefficients of pairwise correlations for all environmental variables.
Fig. A.2. Maps of the six environmental variables used in the models. (a) topographic roughness, (b) climate heterogeneity, (c) mean forest patch size, (d) distance between forest patches, (e) distance to road, and (f) distance to settlement.
12
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
References Atkinson, P.M., German, S.E., Sear, D.A., Clark, M.J., 2003. Exploring the relations between riverbank erosion and geomorphological controls using geographically weighted logistic regression. Geogr. Anal. 35, 58e82. Augustin, N.H., Mugglestone, M.A., Buckland, S.T., 1996. An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33, 339e347. Bakka, H., Vanhatalo, J., Illian, J.B., Simpson, D., Rue, H., 2019. Non-stationary Gaussian models with physical barriers. Spatial Stat. 29, 268e288. Barbosa, A.M., 2015. fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods Ecol. Evol. 6, 853e858. Bickford, S.A., Laffan, S.W., 2006. Multi-extent analysis of the relationship between pteridophyte species richness and climate. Glob. Ecol. Biogeogr. 15, 588e601. Brown, J.L., 2014. SDMtoolbox: a python-based GIS toolkit for landscape genetic, biogeographic and species distribution model analyses. Methods Ecol. Evol. 5, 694e700. Brunsdon, C., Fotheringham, A.S., Charlton, M.E., 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr. Anal. 28, 281e298. Chen, G., Zhao, K.G., McDermid, G.J., Hay, G.J., 2012. The influence of sampling density on geographically weighted regression: a case study using forest canopy height and optical data. Int. J. Remote Sens. 33, 2909e2924. Coppolillo, P.B., 2000. The landscape ecology of pastoral herding: spatial analysis of land use and livestock production in east africa. Hum. Ecol. 28, 527e560. Feng, T.-T., Manen, F.T.v., Zhao, N.-X., Li, M., Wei, F.-W., 2009. Habitat assessment for giant pandas in the qinling mountain region of China. J. Wildl. Manag. 73, 852e858, 7. Fortin, M.J., Dale, M.R.T., Hoef, J.M.V., 2006. Spatial Analysis in Ecology. Fotheringham, A.S., MartinCharlton, ChrisBrunsdon, 1996. The geography of parameter space: an investigation of spatial non-stationarity. Int. J. Geogr. Inf. Syst. 10, 605e627. Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically Weighted Regression : the Analysis of Spatially Varying Relationships. John Wiley and Sons, West Sussex, England, United Kingdom. Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G., Jarvis, A., 2005. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25, 1965e1978. Hook, S.J., Myers, J.E.J., Thome, K.J., Fitzgerald, M., Kahle, A.B., 2001. The MODIS/ASTER airborne simulator (MASTER) - a new instrument for earth science studies. Remote Sens. Environ. 76, 93e102. Hu, M., Li, Z., Wang, J., Jia, L., Liao, Y., Lai, S., Guo, Y., Zhao, D., Yang, W., 2012. Determinants of the incidence of hand, foot and mouth disease in China using geographically weighted regression models. PLoS One 7 e38978-e38978. Hull, V., Roloff, G., Zhang, J., Liu, W., Zhou, S., Huang, J., Xu, W., Ouyang, Z., Zhang, H., Liu, J., 2014. A synthesis of giant panda habitat selection. Ursus 25, 148e162, 15. Kang, D., Wang, X., Yang, H., Duan, L., Li, J., 2014. Habitat use by giant panda in relation to man-made forest in Wanglang Nature Reserve of China. Environ. Sci. Pollut. Control Ser. 21, 13440e13445. Kimsey, M.J., Moore, J., Mcdaniel, P., 2008. A geographically weighted regression analysis of douglas-fir site index in north central Idaho. For. Sci. 54, 356e366. Kupfer, J.A., Farris, C.A., 2007. Incorporating spatial non-stationarity of regression coefficients into predictive vegetation models. Landsc. Ecol. 22, 837e852. Li, B., Cao, J., Guan, L., Mazur, M., Chen, Y., Wahle, R.A., 2018. Estimating spatial non-stationary environmental effects on the distribution of species: a case study from American lobster in the Gulf of Maine. ICES (Int. Counc. Explor. Sea) J. Mar. Sci. 75, 1473e1482. Lichstein, J.W., Simons, T.R., Shriner, S.A., Franzreb, K.E., 2002. Spatial autocorrelation and autoregressive models in ecology. Ecol. Monogr. 72, 445e463. € m, J., 2011. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential Lindgren, F., Rue, H., Lindstro equation approach. J. R. Stat. Soc. Ser. B 73, 423e498. Liu, C., Liu, J., Jiao, Y., Tang, Y., 2019. Exploring spatial nonstationary environmental effects on species distribution: a case study of Yellow Perch in Lake Erie. Peer J. 7 e27592v1. Liu, J., Liu, M., Deng, X., Zhuang, D., Zhang, Z., Di, L., 2002. The Land-use and land-cover change database and its relative studies in China. J. Geogr. Sci. 12, 275e282. Loucks, C., Wang, H., 2004. Panel Report 9.1: Assessing the Habitat and Distribution of the Giant PandaMethods and Issues, pp. 149e154. Lu, B., Harris, P., Gollini, I., Charlton, M., Brunsdon, C., 2013. GWmodel: an R Package for Exploring Spatial Heterogeneity. Martínez-Minaya, J., Conesa, D., Bakka, H., Pennino, M.G., 2019. Dealing with physical barriers in bottlenose dolphin (Tursiops truncatus) distribution. Ecol. Model. 406, 44e49. Mcnew, L.B., Gregory, A.J., Sandercock, B.K., 2013. Spatial heterogeneity in habitat selection: nest site selection by greater prairie-chickens. J. Wildl. Manag. 77, 791e801. Miller, J., 2005. Incorporating spatial dependence in predictive vegetation models: residual interpolation methods. Prof. Geogr. 57, 169e184. Miller, J.A., 2012. Species distribution models:Spatial autocorrelation and non-stationarity. Prog. Phys. Geogr.: Earth Environ. 36, 681e692. Miller, J.A., Hanham, R.Q., 2011. Spatial nonstationarity and the scale of specieseenvironment relationships in the Mojave Desert, California, USA. Int. J. Geogr. Inf. Sci. 25, 423e438. Mitchell, M.S., Lancia, R.A., Gerwin, J.A., 2001. Using landscape-level data to predict the distribution of birds on a managed forest: effects of scale. Ecol. Appl. 11, 1692e1708. Moran, P.A.P., 1950. Notes on continuous stochastic phenomena. Biometrika 37, 17e23. ez, A., Long, F., Farber, S., 2008. Moving window approaches for hedonic price estimation: an empirical comparison of modelling techniques. Urban Stud. Pa 45, 1565e1581. ez, A., Farber, S., Wheeler, D., 2011. A simulation-based study of geographically weighted regression as a method for investigating spatially varying Pa relationships. Environ. Plan. 43, 2992e3010. Pereira, J.M.C., Itami, R.M., 1991. GIS-based habitat modeling using logistic multiple regression- A study of the Mt. Graham red squirrel. Photogramm. Eng. Remote Sens. 57, 1475. Qi, D., Zhang, S., Zhang, Z., Hu, Y., Yang, X., Wang, H., Wei, F., 2012. Measures of giant panda habitat selection across multiple spatial scales for species conservation. J. Wildl. Manag. 76, 1092e1100. R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Radeloff, V.C., Pidgeon, A.M., Hostert, P., 1999. Habitat and population modelling of roe deer using an interactive geographic information system. Ecol. Model. 114, 287e304. Real, R., Barbosa, A.M., Vargas, J.M., 2006. Obtaining environmental favourability functions from logistic regression. Environ. Ecol. Stat. 13, 237e245. Saefuddin, A., Setiabudi, N.A., Fitrianto, A., 2012. On comparison between logistic regression and geographically weighted logistic regression: with application to Indonesian poverty data. World Appl. Sci. J. 19. Sangalli, L.M., Ramsay, J.O., Ramsay, T.O., 2013. Spatial spline regression models. J. R. Stat. Ser. Soc. B Stat. Methodol. 75, 681e703. Shaanxi Provincial Forestry Department, S., 2017. In: ^ (Ed.), The Pandas of Shaanxi-The 4th Survey Report on Giant Pandas in Shaanxi Province, China. Shaanxi Science & Technology Press, Xi’an.
X. Ye et al. / Global Ecology and Conservation 21 (2020) e00894
13
Shi, H., Laurent, Edward, Lebouton, J., Joseph, Racevskis, Laila, Hall, Kimberly, R., Donovan, 2006. Local spatial modeling of white-tailed deer distribution. Ecol. Model. 190, 171e189. State Forestry Administration of China, S., 2006. The Third National Survey Report on Giant Pandas in China. Science Press, Beijing. Swaisgood, R.R., Wei, F., Wildt, D.E., Kouba, A.J., Zhang, Z., 2010. Giant panda conservation science: how far we have come. Biol. Lett. 6, 143e145. Tavernia, B.G., Reed, J.M., 2012. The impact of exotic purple loosestrife (lythrum salicaria) on wetland bird abundances. Am. Midl. Nat. 168, 352e363, 12. Tibshirani, R., Walther, G., Hastie, T., 2001. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B 63, 411e423. Turner, W.R., Tjørve, E., 2005. Scale-dependence in species-area relationships. Ecography 28, 721e730. Vega-Corredor, M.C., Opadeyi, J., 2014. Hydrology and public health: linking human leptospirosis and local hydrological dynamics in Trinidad, West Indies. Earth Perspect. 1, 3. Wang, T., Ye, X., Skidmore, A.K., Toxopeus, A.G., 2010. Characterizing the spatial distribution of giant pandas (Ailuropoda melanoleuca) in fragmented forest landscapes. J. Biogeogr. 37, 865e878. Wei, F., Swaisgood, R., Hu, Y., Nie, Y., Yan, L., Zhang, Z., Qi, D., Zhu, L., 2015. Progress in the ecology and conservation of giant pandas. Conserv. Biol. 29, 1497e1507. Wheeler, D., Tiefelsdorf, M., 2005. Multicollinearity and correlation among local regression coefficients in geographically weighted regression. J. Geogr. Syst. 7, 161e187. Wheeler, D.C., 2007. Diagnostic tools and A remedial method for collinearity in geographically weighted regression. Environ. Plan. 39, 2464e2481. Wimberly, M.C., Yabsley, M.J., Baer, A.D., Dugan, V.G., Davidson, W.R., 2008. Spatial heterogeneity of climate and land-cover constraints on distributions of tick-borne pathogens. Glob. Ecol. Biogeogr. 17, 189e202. Windle, M.J.S., Rose, G.A., Devillers, R., Fortin, M.-J., 2009. Exploring spatial non-stationarity of fisheries survey data using geographically weighted regression (GWR): an example from the Northwest Atlantic. ICES (Int. Counc. Explor. Sea) J. Mar. Sci. 67, 145e154. Wu, J., Jelinski, D.E., Luck, M., Tueller, P.T., 2000. Multiscale analysis of landscape heterogeneity: scale variance and pattern metrics. Geograp. Inf. Sci. 6, 6e19. Yackulic, C.B., Ginsberg, J.R., 2016. The scaling of geographic ranges: implications for species distribution models. Landsc. Ecol. 31, 1195e1208. Ye, H., Huang, W., Huang, S., Huang, Y., Zhang, S., Dong, Y., Chen, P., 2017. Effects of different sampling densities on geographically weighted regression kriging for predicting soil organic carbon. Spatial Stat. 20, 76e91. Ye, X., Yong, Y., Yu, C., Zhang, Z., 2007. Den selection by the giant panda in foping nature reserve, China. J. Nat. Hist. 41, 2529e2536. Zhang, L., Shi, H., 2004. Local modeling of tree growth by geographically weighted regression. For. Sci. 50, 225e244. Zhang, Z., Swaisgood, R.R., Zhang, S., Nordstrom, L.A., Wang, H., Gu, X., Hu, J., Wei, F., 2011. Old-growth forest is what giant pandas really need. Biol. Lett. 7, 403. Zou, K.H., O’Malley, A.J., Mauri, L., 2007. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 115, 654e657. Zuur, A.F., Ieno, E.N., Elphick, C.S., 2010. A protocol for data exploration to avoid common statistical problems. Methods Ecol. Evol. 1, 3e14.