A Machine Learning approach to the assessment of the vulnerability of Posidonia oceanica meadows

Ecological Indicators 108 (2020) 105744 Contents lists available at ScienceDirect Ecological Indicators journal homepage: www.elsevier.com/locate/ec...

Download PDF

2MB Sizes 7 Downloads 47 Views

Report

PDF Reader
Full Text

Ecological Indicators 108 (2020) 105744

Contents lists available at ScienceDirect

Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind

A Machine Learning approach to the assessment of the vulnerability of Posidonia oceanica meadows

T

Elena Catuccia,b, , Michele Scardia,b ⁎

a b

Department of Biology, University of Rome “Tor Vergata”, via della Ricerca Scientifica, 00133 Rome, Italy CoNISMa, Piazzale Flaminio, 9, 00196 Rome, Italy

ARTICLE INFO

ABSTRACT

Keywords: Posidonia oceanica Vulnerability assessment Habitat Suitability Model Random Forest Machine Learning

Posidonia oceanica is an endemic Mediterranean seagrass that ranks among the most important and valuable species, with regard to both its ecological role and the services it provides. Despite this species is one of the main targets of conservation actions, the current regression trend of P. oceanica is alarming, underlying the urgent need for reliable methods capable of assessing meadows vulnerability. To address this need, we developed a Habitat Suitability Model (HSM) aimed at assessing the vulnerability of P. oceanica meadows in the Italian marine coastal waters using the Random Forest (RF) Machine Learning technique. Building on the current knowledge on both spatial distribution and condition of meadows in the Italian seas, the RF was used as a classifier aimed at modeling the habitat suitability for P. oceanica, rather than for predictive purposes. The assessment of the potentially most vulnerable P. oceanica meadows at increasing risk of regression was performed through the analysis of the RF output. The HSM showed a good level of accuracy, i.e. Cohen’s K = 0.685. The proposed approach provided valuable information regarding the vulnerability of P. oceanica meadows over the Italian marine coastal waters. In addition, an evaluation of the relative importance of the predictors was carried out using the permutation measure. The developed HSM can support conservation and monitoring programs regarding this species playing a crucial role in the marine ecosystems of the Mediterranean Sea.

1. Introduction Knowledge on the spatial distribution of species is crucial in conservation biology and environmental management (Hirzel et al., 2006). However, field data at large scale require massive and expensive efforts to be collected. Hence, our awareness of the species spatial distribution is usually fragmented, especially in marine systems, which are more difficult to access and monitor compared to terrestrial ones (Robinson et al., 2011). In recent years, the crucial need for understanding the spatial distribution of species resulted in an increasing application of modeling approaches (Reiss et al., 2014). Indeed, models predicting the species spatial distribution, such as Habitat Suitability Models (HSMs), Species Distribution Models (SDMs) and Ecological Niche Models (ENMs), provide useful ecological insights into the species-environment interactions (Elith and Leathwick, 2009; Guisan and Zimmermann, 2000). Due to their potentiality, those predictive spatial models have become valuable supporting tools in addressing conservation and environmental issues, such as the definition of conservation plans and

⁎

monitoring efforts (Guisan et al., 2013; Guisan and Thuiller, 2005). Despite the above-mentioned spatial predictive models and relative acronyms, i.e. HSM, SDM, ENM, are frequently indistinctly used, they may be differentiated regarding their final aim. According to Guisan et al. (2013), the SDM should be used when the focus is on spatial prediction, while ENM is more proper for niche quantification. On the other hand, the HSM acronym should be adopted when the focus is on the assessment of the suitability of a study area for target species. Those models can certainly boost our ability to understand species distribution over large regions. Accordingly, they have been used to address a wide range of ecological issues in the last decades. For instance, applications of HSM in the marine ecosystems include the definition of conservation programs (Whittock et al., 2016), the assessment of the effects of climate change on benthic species (Reiss et al., 2014), and the evaluation of the impact of human activities on vulnerable ecosystems (Anderson et al., 2016). In the terrestrial realm, HSMs have been likewise adopted to define conservation plans (Di Marco et al., 2016), to guide invasive species monitoring (Crall et al., 2013), as well as to address wildlife management (Hirzel and Le Lay, 2008).

Corresponding author. E-mail address: [email protected] (E. Catucci).

https://doi.org/10.1016/j.ecolind.2019.105744 Received 28 June 2019; Received in revised form 11 September 2019; Accepted 14 September 2019 1470-160X/ © 2019 Elsevier Ltd. All rights reserved.

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

Nevertheless, ecological systems are always complex in their structure and dynamics, presenting non-linear and high-order interactions (Dreyfus-Leon and Scardi, 2008), thus modeling these systems might be arduous. To tackle such an issue, in the last twenty years Machine Learning methods have gained popularity due their ability to handle multifaceted relationships between species and environmental variables (Thessen, 2016; Elith and Leathwick, 2009). Those Machine Learning methods, such as Random Forests (RFs), Gradient Boosted Regression Trees (GBRTs) and Artificial Neural Networks (ANNs), have become effective tools in modeling species response in ecological systems (Mattei et al., 2018; Martin et al., 2015; Evans et al., 2011). Several comparative studies have already shown how Machine Learning can outperform statistical approaches in a wide variety of ecological applications (e.g. Segurado and Araújo, 2004; Manel et al., 1999). However, direct comparison of their results can be laborious and requires careful consideration (Thessen, 2016; Fielding, 2006). In this study, we developed a HSM for Posidonia oceanica L. Delile 1813, at Italian national spatial scale using a RF. P. oceanica is an endemic Mediterranean seagrass playing a crucial ecological role over the whole basin and is vital regarding the services it provides. This species creates meadows that stretch from the surface to 40–50 m depth, depending upon the water transparency, which rank among the most valuable ecosystems on Earth (Boudouresque et al., 2012). Indeed, P. oceanica meadows are one of the most productive ecosystems playing a critical role regarding both water movement and sediment flows (Boudouresque et al., 2012). For instance, the meadows with their considerable vegetal biomass have a major role in the reduction of hydrodynamic energy, thus preventing shoreline erosion (Boudouresque et al., 2009). The meadows also promote the sedimentation of particulate matter, increasing the transparency of the coastal waters (Boudouresque et al., 2012). Over the past years, P. oceanica has been adopted as biological indicator, reflecting the quality of the environment, in the context of the European Union Water Framework Directive (WFD 2000/60/EC) and Marine Strategy Framework Directive (WFD 2000/60/EC) (e.g. Malea et al., 2019; Personnic et al., 2014; Gobert et al., 2009). From a structural viewpoint, P. oceanica forms a characteristic structure composed of intertwined rhizomes, together with roots and sediment trapped between them, called matte. When a regression of a meadow happens, the matte (hereafter “dead matte”) can persist for centuries due to the very slow decay of rhizomes (Boudouresque et al., 2009). Accordingly, the dead matte is the final condition of P. oceanica in the regression process. It is worth noting that the construction of the matte by P. oceanica, either in living or in dead form, is a unique feature among the Mediterranean seagrasses crucial in the assessment of the ecological status of a meadow (Boudouresque et al., 2009). In fact, the presence of dead matte allows to accurately detect areas where the environmental conditions were suitable for P. oceanica, but which are no longer adequate, or have not been suitable over a period of time, leading to the regression phenomena (Leriche et al., 2006). Clearly, the presence of dead matte is not a sufficient reason for inferring that a perturbation still exist, but it certainly shows that the regressed meadow was possibly more vulnerable than others. The regression of P. oceanica meadows is well-documented over the whole Mediterranean based (Short et al., 2011; Boudouresque et al., 2009; Waycott et al., 2009; Orth et al., 2006; Hemminga and Duarte, 2000). In Italy the estimated loss of known meadows has been 25% in the last 30 years, basing on the available historical data (Telesca et al., 2015). Over the last years, both the distribution and the condition, i.e. living vs. dead matte, of the Italian meadows have been entirely mapped. This means that the available spatial data on P. oceanica included all the Italian meadows either in living or in regressed condition. The occurrence data on living meadows in combination with those on regressed ones are crucial in modeling the habitat suitability for P. oceanica. In fact, the latter permit to assess with certainty the past

occurrence of a living meadow, testifying the past existence of a potential habitat suitability (Boudouresque et al., 2009; Leriche et al., 2006). Accordingly, we exploited the available information on the Italian meadows, proposing a Machine Learning approach aimed at assessing the vulnerability of P. oceanica in the Italian seas. Precisely, we used a RF as a classification tool to assess the level of habitat suitability for P. oceanica, rather than as a strictly predictive tool. In fact, the latter purpose would sound redundant given our comprehensive knowledge on the spatial distribution of meadows. Rather, we analyzed the RF output to assess the vulnerability of P. oceanica meadows over the Italian seas. 2. Materials and methods 2.1. P. oceanica spatial data and environmental variables Our data set included spatial data on the occurrence of P. oceanica meadows in the Italian seas and 35 environmental variables. The latter were acquired from the sources reported in the MEDISEH project (Giannoulaki et al., 2013), which were used as predictors in order to develop the HSM (Table 1). The complete list of the references of the environmental variables can be found in the Supplementary Material Table 1. The predictors included both environmental and anthropogenic variables. While most of the former are based on in situ measurements, e.g. depth and pH, the latter are modelled, e.g. human impact and population pressure to the marine ecosystems. All the predictors were available as raster layers presenting the same spatial resolution, i.e. 0.004166 decimal degrees (about 400 m) and coordinate system, i.e. WGS84. It is worth noting that the raster layers of all the predictors included only pixels whose average depth was less or equal to 50 m, or pixels with at least one neighboring pixel whose depth was shallower than 50 m, i.e. only those pixels in which P. oceanica potentially occurred (Scardi et al., 2013). The spatial distribution of P. oceanica in the Italian marine coastal waters is entirely known as a result of a national mapping program carried out from 1995 to 2004, and occurrence data included areas where the meadows were either in living or in regressed conditions. P. oceanica spatial data are available as shapefiles at https://www. Table 1 Environmental variables used as predictors for HSM development and available at Mediterranean spatial scale. Predictors 1 Artisanal fishing

19

2 3 4 5 6

Aspect of the seafloor Bottom salinity Bottom temperature Bottom type Calcite concentration

20 21 22 23 24

7

25 26

Pollutants (inorganic)

27

Pollutants (organic)

10 11 12 13

Chlorophyll a concentration (annual range) Chlorophyll a concentration (mean) Climate change (sea surface temperature) Climate change (UV) Depth Diffuse attenuation coefficient Dissolved oxygen concentration

Human impact to marine ecosystems Nitrate concentration Nutrient input (fertilizers) Ocean acidification pH Photosynthetically available radiation Phosphate concentration

28 29 30 31

14 15 16 17 18

Distance to 200 m isobaths Distance to coast Distance to ports Distance to river mouths Euphotic depth

Pollution (ocean-based) Population pressure Salinity Sea surface temperature (annual range) Sea surface temperature (mean) Shipping intensity Silicate concentration Slope of the seafloor

8 9

2

32 33 34 35

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

naturaitalia.it/cartografiaPrateriePosidonia.do. These shapefiles were converted to raster in QGIS (QGIS Development Team, 2018. QGIS Geographic Information System. O3.0pen Source Geospatial Foundation http://qgis.osgeo.org), using the function ‘Rasterize’ in the toolbox SAGA v. 2.1.4. The latter toolbox allowed carrying out a rasterization using the conservative approach, i.e. turning on any pixel intersecting a P. oceanica polygon. The use of conservative rasterization allowed to identify areas where both living and regressed meadows occurred. We defined those areas of co-occurrence of the two states of the meadows as “mixed conditions”. Following the conservative rasterization, our data set included 266,634 records among which (i) 24,545 referred to living meadows, (ii) 5,168 to mixed condition, (iii) 3,379 to regressed meadows, while (iv) 233,542 were absence records, i.e. 9.20%, 1.94%, 1.27% and 87.59% respectively. For modeling purposes, the data set was partitioned into two subsets, i.e. into a training and a test set. While the former, which included about 90% of the total records, was used for developing the model, the latter, including the remaining records, was adopted for the optimization and the evaluation of the HSM. The data set was partitioned using a 3 × 3 degree grid, i.e. a mesh size of about 333 km in latitude. For each cell of the grid, we firstly computed the number of records belonging to each condition of the P. oceanica meadows, i.e. living, mixed and regressed. Then we randomly selected, in each cell, 10% of records of each type. Thus, we obtained a test subset that was as much representative as possible of the spatial distribution of P. oceanica meadows in the Italian seas. It is worth noting that potential spatial autocorrelation did not represent an issue in our work, since the assessment of the vulnerability of P. oceanica was carried out only for the study area, i.e. Italian seas, and it was not intended to be applied elsewhere. In other words, our model aimed at being accurate, not at being general. Furthermore, it is crucial to highlight that while the differentiation between the three different conditions of meadows was considered for the data partitioning, no specific information regarding the status of P. oceanica was taken into account in the modeling procedure. In fact, all the areas where the seagrass has been occurring, either forming living, mixed or regressed meadows, were considered as suitable areas, i.e. as presence records, in order to develop the HSM. The rationale behind this approach is that also the spatial data on the regressed meadows are crucial to achieve our aim, which is the assessment of the vulnerability of the meadows by modeling the habitat suitability for P. oceanica in the Italian seas. As previously stated, the occurrence of a regressed condition allowed to detect with certainty the former presence of a living meadow, hence, in a wider perspective, the former, and potentially the present, existence of suitable habitat conditions for P. oceanica.

algorithm. As the RF selects only variables that are relevant to the optimization of its predictions, a specific procedure to select the most important predictors before the RF training is not required. The mathematical explanation of the latter assumption can be found in Breiman et al. (1984) and Louppe et al. (2013). Briefly, in the tree-building process only a random subset of predictors are available at each split for the partitioning. Among the predictors in the random subset, the algorithm selects the one that makes the child nodes as pure as possible. Subsequently, the predictors that are less relevant are hardly selected because it is highly probable that in the random subset there are some other predictors, which make the decrease in impurity larger. Those features make the RF a powerful and effective technique to model the species response considering the complexity of the modelled ecological relationships. Over the last years, the RF was used for several ecological applications such as classification and regression problems, survival analysis, missing value imputation, cluster analysis and multidimensional scaling (Cutler et al., 2007; Prasad et al., 2006; Liaw and Wiener, 2002). In this study, we used the original Breiman’s Fortran 77 code, (available at https://www.stat.berkeley.edu/~breiman/RandomForests/ cc_software.htm), which was ported to Fortran 90, improved in file I/O sections (source code is available upon request) and compiled using Gfortran 4.10. 2.3. Tuning parameters of RF Three main parameters can be tuned to optimize RF accuracy: (i) the number of randomly selected candidate predictors at each split, called mtry0 in Breiman’s code, (ii) the minimum number of records contained in leaf to stop the splitting, ndsize, and (iii) the number of trees in the RF, jbt. Breiman (2001) suggested to set mtry0 to q = √p for classification problems, where p is the total number of predictors, and to tune this parameter in a range between q/2 and 2q. The mtry0 parameter controls how much randomization is injected into the splitting procedure. Briefly, if q = p the splitting predictor is always chosen among all the others, which means that the tree construction is determinist and the randomness is entirely obtained through the bootstrap process, which affects the selection of the training patterns (Scornet, 2017). However, from a methodological perspective it is crucial to base the randomness on both the splitting and the bootstrapping procedure. A detailed explanation of the effect of mtry0 can be found in Boulesteix et al., (2012) and Scornet (2017). Regarding ndsize, Breiman (2001) suggested to fully grow the trees by setting this parameter to 1. However, Segal and Xiao (2011) demonstrated that growing very large trees in a RF may cause overfitting, which was not observed by Breiman (2001) probably because the benchmark data sets he used shared properties that made large trees nearly optimal (Cutler et al., 2012). In Kruppa et al. (2013), the authors suggested to tune this parameter considering the data set properties to reach the optimal ndsize value. The effect of this parameter was discussed in detail in Díaz-Uriarte and De Andres (2006) who claimed that setting ndsize to 1 could not be optimal. In fact, if the trees are grown to purity, i.e. ndsize = 1, the predictions can only be 1 or 0, and thus a large number of trees might be required to obtain consistent predictions (Dankowski and Ziegler, 2016). It is worth noting that growing a forest with very large number of trees is time-consuming and it increases the computation costs, especially in the case of large amount of data (Cutler et al., 2012). Accordingly, the optimal values of both ndsize and jbt, which is the total number of trees in the forest, should be empirically reached, taking into account the trade-off between the model accuracy and the computation costs (Biau and Scornet, 2016). Since our data set included 266,634 records and 35 predictors, we tested the mtry0 parameter in the [4,12] range, while the ndsize was set to 9 different values, namely 1, 50, 100, 250, 500, 1000, 2000, 5000

2.2. Random Forest RF is an ensemble Machine Learning technique based on a set of classification trees that are trained on partly independent data subsets. In case of classification problems, the output is obtained at run time by majority voting of the trees, while the average output of all the trees is considered in regression problems (Breiman, 2001). For each tree in the forest, the algorithm selects a bootstrap sample of n cases with replacement out of the original N cases, where n < N (Breiman, 2001). Cases that do not occur in the current bootstrap sample are called Out-Of-Bag (OOB) records. For each bootstrap sample, a tree is grown, by default, to the maximum depth and left unpruned. At each split in the tree, only a randomly selected subset of q predictors is available for the binary partitioning out of the p predictors in the whole data set, i.e. q < p (Breiman, 2001). It is worth stressing that RF is robust to the inclusion of a large number of predictors, which might be more or less relevant in modeling the response variable. In fact, the inclusion of less relevant predictors does not influence the way a RF works, due to the properties of the 3

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

and 10000. Accordingly, we obtained 81 different RF configurations, which were trained setting the jbt parameter to 250. We kept the latter constant since we performed several empirical tests which showed that 250 trees were sufficient to obtain consistent results over multiple training procedures.

for each HSM. Once the optimal threshold values was identified, we evaluated the RFs accuracy using the K metric. In order to select the final RF configuration, we adopted the following approach. For each mtry0 value, we analyzed the relationship between K values and ndsize, which is expected to be inverse and monotonic, by means of the so-called “elbow” method, i.e. looking for the ndsize whose K value was the farthest from the line passing through the minimum and then maximum ndsize values. While the outcome of this procedure returned the best compromise between accuracy (with smaller ndsize) and generality (with larger ndsize), mtry0 values also played a role. Therefore, we selected the one whose optimum ndsize was associated to the largest K value.

2.4. Measures of predictors relative importance The Breiman’s RF implementation incorporates two measures of importance of the predictors, called permutation and Gini importance (Breiman, 2004). Both these measures have been widely used to assess the relative importance of the predictors (e.g. Fox et al., 2017; Nicodemus, 2011; Strobl et al., 2008). In Breiman (2004), the author stressed that, although the Gini measure could be the fastest option, the permutation one is more reliable. Accordingly, we adopted the latter to evaluate the relative importance of the predictors. The permutation measure is computed during the RF training and it is based on the OOB records, which are the data that are not included in the bootstrap sample. As discussed in Section 2.2, each tree in the forest is grown using the records included in the bootstrap sample, while the OOB records are passed down to that tree afterwards. The permutation importance estimates are obtained as follows: (1) the OOB records are passed down to the tree previously grown with the bootstrap sample data, obtaining predictions for the OOB observations; (2) the OOB records are passed down to the same tree once more during which the values of each predictor, one at a time, are randomly permuted, while those of the other predictors are left unchanged, obtaining new predictions for the modified OOB observations; (3) the predictions for both the original and the modified OOB records are aggregated tree by tree as the forest is constructed; (4) the overall misclassification error between the original and the modified OOB records is computed and is considered as a measure of the relative importance of the predictors (Boulesteix et al., 2012; Cutler et al., 2007; Liaw and Wiener, 2002).

2.6. Assessment of the vulnerability of P. oceanica meadows As previously stated, the aim of our study was to assess the vulnerability of P. oceanica meadows in the Italian marine coastal waters using a Machine Learning approach, and we pursued this goal by means of the analysis of the RF output. To explain how we carried out the assessment of the vulnerability of meadows, it is firstly crucial to note the RFs were trained using the P. oceanica occurrence data included in the training set considering as ‘presence’ all the records in which the seagrass has been observed, regardless of the meadows conditions, as discussed in Section 2.1. In fact, the RFs were trained as classifiers (e.g. Gislason et al., 2006) to estimate the occurrence of P. oceanica, i.e. presence and absence, rather than for predicting the condition of the meadows. In a methodological perspective, using this approach the RFs learned the greatest possible extent on the spatial distribution of the meadows and they were able to reproduce the distribution of P. oceanica over the Italian marine coastal waters. The RF output is a value ranging in the [0,1] interval, corresponding to the proportion of trees voting for meadow presence. The minimum value corresponding to a predicted presence was computed by means of the ROC curve analysis. Clearly, the predictions that showed values larger than the optimal threshold are those predicted as presence instances. In a wider perspective, the areas in which the RF provided ‘presence’ predictions are those that showed environmental conditions that are assumed to be suitable for P. oceanica. Therefore, the continuous output value provided by our RF can be interpreted as the “level” of the suitability for P. oceanica of a specific area. In the view of the above, the assessment of the P. oceanica vulnerability is based on the rationale that meadows located in areas presenting low RF output value, and therefore low predicted habitat suitability, are those where the potential regression risk is higher. In detail, we analyzed the distribution of the RF output with respect to the observed data about the conditions of meadows. In practice, we compared the medians of the RF output values for cases in which the meadow condition was known to be living, mixed or regressed. We expected that the RF output values were lower in cases of marginally suitable habitat conditions, i.e. where regressed meadows occurred.

2.5. Model evaluation The accuracy of presence-absence models that exhibit output values ranging in the [0,1] interval can be evaluated using two different metrics, which can be regarded as threshold-independent or thresholddependent measures (Guisan et al., 2017). Using the former the accuracy is computed with the default threshold value for discriminating between instances, i.e. 0.5, while the use of the latter metrics foresees the selection of an optimal value of the threshold. The selection of the optimal threshold is critical in the case of imbalanced frequencies in the data set, since the prevalence, i.e. frequency of occurrence, affects the model accuracy (Manel et al., 2001). The analysis of the Receiver Operating Characteristic (ROC) curve (Zweig and Campbell, 1993) is a common procedure for dealing with imbalanced data, by means of which the value of the optimal threshold is obtained. The latter is associated to the point on the curve at which the sum of sensitivity and specificity is maximized (Hirzel et al., 2006; Manel et al., 2001). Furthermore, the ROC curve provides the value of the Area Under Curve (AUC) of the ROC, which is a threshold-independent metric. Our data set falls in the above-mentioned case of unbalanced frequencies since the absence records accounted for more than 85% of the total ones. Accordingly, we evaluated the RF accuracy using the Kappa statistic (K) (Cohen, 1960), a threshold-dependent measure. The computation of K is based on the confusion matrix, which strictly depends on the optimal threshold selected (Guisan et al., 2017). The K values range in the [0,1] interval, where 0 corresponds to a random agreement between the predictions and the observed data, while 1 indicates a perfect classification (after Landis and Koch, 1977). Hence, once the training procedure was completed, for each RF configuration the data included in the test set were analyzed using the ROC curve, which returned the optimal threshold and the AUC values

3. Results & discussion 3.1. Modeling procedure The optimization of the 81 RFs trained with the different combinations of mtry0 and ndsize values were carried out using only the data included in the test set, which were never used for the RF training. We optimized the RFs output on the basis of ROC curve analysis, which allowed to define the optimal threshold value for discriminating between P. oceanica presence and absence, then we computed the K values for evaluating the RFs accuracy. We determined the final RF configuration by means of the “elbow” method for the mtry0 values that allowed obtaining the largest K value at the optimum ndsize, as shown in Fig. 1. The gray square in the plot 4

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

the level of habitat suitability for P. oceanica, hence it is possible to interpret the ‘good’ ability of our HSM in distinguishing between presence and absence records as a more general capability in discriminating between suitable and unsuitable habitat conditions for P. oceanica all over the Italian coastal waters. The areas showing prediction values lower than the optimal threshold, i.e. RF output < 0.057, are those that can be regarded as unsuitable for P. oceanica, while the ones associated to RF output values larger than 0.057 are those that can be regarded as suitable. Afterwards, for the specific purpose of assessing the meadows vulnerability, we performed a second ROC analysis taking into account the records that were expected to be associated to suitable habitat conditions for P. oceanica., i.e. only the records associated to RF output values larger than 0.057. In this second ROC analysis we looked for an optimal separation between records with living meadows (either entirely living or in mixed conditions) and those with regressed meadows only. The optimal threshold value suggested by the secondo ROC curve analysis was 0.759. Since pixel area in our P. oceanica distribution raster was rather large (about 160,000 m2), we considered records of entirely living meadow and those containing both living and regressed meadow, i.e. mixed condition records, as a sole condition, in opposition to ones containing dead matte only. This solution was adopted because in mixed areas living shoots still existed, thus testifying that environmental conditions are not, or have not been, as unsuitable as to cause complete meadow regression over a rather large area. In fact, co-occurrence of living and dead patches within a P. oceanica meadow may be a natural feature, as a certain percentage of dead matte constitutes a characteristic of each meadow (Boudouresque et al., 2006). For instance, small patches of dead matte may occur in shallow waters where the meadow approaches the sea-level at which the environmental conditions are no longer suitable for the survival of P. oceanica or they may depend on processes caused by the natural senescence of the seagrass. According to Boudouresque et al. (2006), a meadow can be regarded as a cluster of individuals at different stages of maturity (ramet) and shoots in ramet may share nutrients with each other through the rhizome (Libes and Boudouresque, 1987), and this constitutes an insurance for any given shoot in need for nutrients, as they can be supplied by others. However, when an orthotropic rhizome exceeds a certain length, this process becomes inefficient (Marbà et al., 2002), making shoot survival precarious. Consequently, small patches of dead matte in a healthy P. oceanica meadow may result merely from the death of such vulnerable shoots (Boudouresque et al., 2009).

Fig. 1. The “elbow” plot for selecting the final RF configuration. The “elbow” is shown by the gray square, which is the farthest point from the dashed line passing through the two points showing the extreme combinations of K and ndsize values. While a different curve was obtained for each mtry0 value in the [4,12] range, here is shown the one that allowed finding the largest deviation from a linear decrease in K values with increasing ndsize. The resulting optimum, i.e. ndsize = 500 and mtry0 = 6, was associated to the largest optimized K value, i.e. 0.685.

showed the resulting combination of mtry0 and ndsize values, i.e. 6 and 500, respectively. Hence, our final RF was based on 250 trees whose leaves contained at least 500 cases, using 6 randomly selected predictors at each split. It is worth noting that since the ndsize parameter regulates the generalization ability (Dankowski and Ziegler, 2016; Kruppa et al., 2013), by using the “elbow” method we were able to select the RF showing the best trade-off between accuracy, i.e. low ndsize, and generality, i.e. large ndsize. In Fig. 2, we showed the ROC curve for the final RF providing the optimal threshold value and the AUC, which was 0.982. The optimal threshold value was very low, i.e. 0.057, as a consequence of the unbalanced frequencies of presence and absence records in the training set, which affected the way the majority of trees voted. In practice, since the absence records were prevalent, the RF learned more frequently from those data, providing predictions biased towards absence. The final RF showed a ‘good’ level of accuracy (Landis and Koch, 1977), i.e. K = 0.685, which testifies that our HSM exhibited a good ability in discriminating between presence and absence records of P. oceanica. The HSM is available as raster file in the Supplementary Material. As previously discussed in Section 2.6, the RF output values reflects

3.2. Using RF for the assessment of the vulnerability of P. oceanica meadows Following the selection of the final RF configuration, we analyzed the distribution of the RF output values with respect to the observed meadow conditions. We observed that the former showed differences in their range when grouped by meadow condition (Fig. 3). In detail, RF output values related to living meadows showed the largest median value, i.e. 0.839, while the mixed meadow conditions were associated to an intermediate median value, i.e. 0.739, which in turn was larger than the one related to the regressed meadows, i.e. 0.616. All the RF output values for areas that are environmentally suitable for P. oceanica at present, as living or mixed meadows, or that have been suitable in the past, as now regressed meadows, were much larger, except for a few outliers, than those for the ones where P. oceanica does not occur and is not known to have occurred previously. In fact, RF output values for absence records showed the lowest values, with median very close to zero. The dashed line in the plot represents the optimal threshold value aimed at distinguishing between presence and absence records of P. oceanica, i.e. 0.057, hence for discriminating between suitable (except for transient or recent disturbances) and unsuitable habitat condition, as extensively discussed so far (Fig. 3). On the other hand, the dotted

Fig. 2. The ROC curve for the final RF configuration. The ROC curve allowed to define the optimal threshold value (0.057), for discriminating between P. oceanica presence and absence predictions. The AUC (0.982) is a thresholdindependent measure for evaluating the overall accuracy of the HSM. 5

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

Fig. 3. Distribution of RF output values grouped by condition of P. oceanica meadows, including absence. The boxplots showed that the RF output values varied according to meadows conditions: the living meadows were associated to the largest values, while intermediate values were related to mixed condition meadows and the still lower ones to regressed meadows. The lowest RF output values were obtained for absence records. The dashed line referred to the optimal threshold value for discriminating between P. oceanica presence and absence (0.057), while the dotted one represented the value for distinguishing between living and regressed condition of meadows (0.759).

line represents the optimal threshold value provided by the second ROC curve analysis that we used for distinguishing between the living and the regressed condition, i.e. 0.759 We carried out a Kruskal-Wallis test based on the null hypothesis of equal medians in the three meadow conditions. As the null hypothesis was rejected (p < 0.0001), we concluded that the different conditions of the meadows were actually associated to different median values in the corresponding RF output. We also applied the Mann-Whitney test in order to perform a posteriori pairwise comparisons, testing the null hypothesis of equal medians for each couple of meadow conditions. All the comparisons showed significant differences (p < 0.0001), using the appropriate Bonferroni correction, suggesting that there was strong evidence for difference in RF output values. These results pointed out that RF output tended to show more frequently low values in areas where regressed meadows actually occur, while large RF output values were usually associated to living meadows areas and intermediate ones with areas where mixed conditions were observed. For instance, as showed in Fig. 4, the RF tended to provide in output lower values for the areas in which the regressed meadows were

actually observed, while it returned more frequently larger values for the areas where living meadows actually occurred. Furthermore, the suitability tended to be more limited towards the lower limit of the meadows, underlying an increased vulnerability of that portion of meadows, which is consistent with our knowledge of the regression phenomena at the lower limit of P. oceanica meadows (e.g. Montefalcone, 2009; Leriche et al., 2006; Montefalcone et al., 2006). It is worth stressing that information about the status of the meadows was not passed to the RF during its training, thus the agreement between RF output values and the actual conditions of the meadows is remarkable. Clearly, the matching is not perfect everywhere, nevertheless the general trend of our RF is to provide more frequently lower and intermediated values, i.e. lower than 0.759, for the areas of regressed meadows, while larger ones for the living conditions (Fig. 4). These results support the possibility of using our HSM as a tool for assessing the vulnerability of the meadows and, as a direct consequence, for identifying the areas in the Italian marine coastal waters that should be prioritized for conservation efforts. Indeed, since the RF output reflected the level of suitability for the presence of P. oceanica, the meadows that occurred in areas in which the environmental Fig. 4. RF output vs. observed conditions of P. oceanica meadows. RF output values shown in three shades of gray, changing in accordance with the condition of meadows. Intermediate gray, which represents areas predicted as marginally suitable for P. oceanica, (RF output values: [0.057, 0.759)), corresponded to regressed meadows, to some areas where P. oceanica is not present and to parts of the living meadows that are supposed to be at risk of regression. Most of the currently living meadows, on the other hand, are in darker gray, i.e. in habitat conditions that are fully suitable for P. oceanica (RF output values: [0.759, 1]). The light gray in the lower left corner represents areas predicted as unsuitable for the presence of P. oceanica (RF output values: [0, 0.057)).

6

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

Fig. 5. Vulnerability assessment of a P. oceanica meadow around Gallipoli (Apulia, Southern Italy). The three shades show RF output values reflecting the level of habitat suitability for P. oceanica: areas that are too deep for P. oceanica are shown in white; those predicted as unsuitable in light gray (RF output [0, 0.057)); those predicted as marginally suitable and therefore vulnerable in intermediate gray (RF output [0.057, 0.759)); those expected to be fully suitable in dark gray (RF output [0.759, 1]). The meadow that is located east of the small island in the middle of the map is expected to be potentially vulnerable as well as other smaller portions of the existing meadow. The level of habitat suitability is obviously very diverse in the intermediate gray regions and is reflected by the exact RF output value. In fact, P. oceanica is actually absent in large intermediate gray regions.

oceanica meadows that are expected to be at risk of regression, thus considerably reducing survey costs. Despite our results cannot shed direct light on the causes of P. oceanica vulnerability, we observed that most of the highly vulnerable meadows are located along the western shore of the Tyrrhenian Sea, where coastal waters are certainly more affected by anthropogenic impacts than around the main islands and along the southern Italian coasts, because of population density, agriculture, larger rivers, etc. Since P. oceanica requires oligotrophic waters presenting a sedimentation rate compatible with the growth of the matte (Boudouresque et al., 2009), the vulnerability of meadows may increase in areas characterized by an excess of nutrient inputs. The latter could enhance the growth of the epibiota of P. oceanica leaf, thus increasing the competition for light and favoring the herbivore grazing (Ruiz et al., 2001). Furthermore, an increase in turbidity following the enrichment in organic and inorganic matters, leading to anomalous phytoplankton blooms due to eutrophic conditions, reduces the waters transparency and the penetration of light, and clearly might result in an increase of the vulnerability of P. oceanica.

conditions are predicted as marginally suitable, i.e. with very low RF output values, are expected to be the most vulnerable. On the other hand, the meadows located in areas showing suitable conditions for the presence of P. oceanica, i.e. with high RF output values, can be regarded as the least vulnerable. As shown in Fig. 5, a rather large portion of the meadow, stretching east of the small island in the middle of the map, is located in a region where habitat conditions for P. oceanica are predicted as marginally suitable (i.e. shown in intermediate gray, with RF output in the [0.057, 0.759) range). Therefore, we expect this portion of meadow to be among the most vulnerable, hence at risk of regression. As the lower limit of that part of the meadow is very close to the 50 m isobath, i.e. to the maximum depth at which P. oceanica can be found, the actual risk can be even higher than expected. On the other hand, the portion of meadow located in areas showing appropriate habitat suitability for P. oceanica, i.e. in darker gray (RF output values in the [0.759, 1] range), were assumed to be the least vulnerable. While the less vulnerable areas could be targeted for mitigation measures and monitoring programs aimed at maintaining or improving the present level of the habitat suitability, the most vulnerable ones are good examples of meadow sections that absolutely deserve appropriate conservation actions in order to control the regression risk, possibly avoiding further loss of P. oceanica. Focusing on a meadow located in Central Italy, off San Felice Circeo (Latium), it is interesting to note a very close match between our vulnerability estimates and the actual condition of P. oceanica based on recent observations, carried out several years after our data were collected (Fig. 6). In fact, the portion of meadow that we identified as potentially vulnerable (intermediate gray in Fig. 6a and b), hence at risk of regression, includes the deepest part of the area identified as “degraded meadow” by Ardizzone et al. (2018) (Fig. 6b). That portion of meadow is rapidly deteriorating, meaning that both living and dead matte are present. This process is especially due to direct anthropogenic impacts in the westernmost part of the meadow, which is regressing since several decades, but in the southernmost part of the meadow, which is also the deepest, it is probably caused by diffuse pressures (e.g. increasing water turbidity). In fact, the latter may deeply alter the habitat suitability for P. oceanica and lead to an increase in meadows vulnerability. This example, as well as other not shown here, supports the possibility of using our HSM to plan targeted monitoring programs for P.

3.3. Potential applications The use of spatial models opened new possibilities for ecosystem management and conservation strategies, since they are straightforward and easy to communicate (Reiss et al., 2014). Modeling the habitat suitability is extremely useful in supporting the definition of environmental and conservation programs, as well as in defining ecosystem-based management (Vacchi et al., 2014). In this study, we focused on P. oceanica, which is the most widespread seagrass in the Mediterranean Sea, playing a crucial ecological role over the whole basin, e.g. in primary production, in controlling the sedimentary flows, in mitigating hydrodynamic stress on the shores, thus protecting the coastline from erosion (Boudouresque et al., 2012; Boudouresque et al., 2009). Despite all the conservation efforts, this species is regressing at alarming rate over the whole basin (Telesca et al., 2015). The current regression trend of P. oceanica underlines the necessity of reliable methods for assessing the vulnerability of the meadows (Duarte, 2002). In this respect, our HSM represents an innovative and effective approach for addressing conservation and monitoring programs regarding P. oceanica. By modeling the habitat suitability for P. oceanica, 7

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

Table 2 Estimates of relative importance of predictors assessed using the permutation measure, which referred to the final selected RF. The results are normalized on the basis of the ‘depth’ that was the one showing the largest relative importance.

Fig. 6. Vulnerability assessment of P. oceanica meadow located off San Felice Circeo (Latium, Central Italy). The RF output values, shown in three shades of gray, show a clear decreasing gradient from the shallower to the deeper parts of the meadow and are exactly the same in panel a) and b). However, panel a) shows the structure of the meadow as obtained from the maps that were used for model development, while panel b) is based on a more recent map, showing also degraded regions of the meadow (modified from Ardizzone et al., 2018). The southernmost part of the meadow was in good conditions, although according to the model it was potentially vulnerable (i.e. in intermediate gray). Vulnerability, however, was not only potential, as the most recent data show that a large section of that part of the meadow is now degraded and therefore very close to definitive regression.

we were able to assess the vulnerability of the meadows over large spatial scale on the basis of the current knowledge on its spatial distribution, while information about meadows condition was only used to validate our approach. However, when interpreting our results, it is important to consider the biases inherent to the available data and to the modelled process. As a matter of fact, as the predictors were only available at a relative coarse spatial resolution, i.e. ¼ of minute, about 400x400 m, the vulnerability estimates we provided are mostly aimed at large scale. For instance, across large bays rather than within their coves. On the other hand, it is worth stressing that the RF is capable of detecting the multifaceted relationships between the available variables without relying upon specific assumptions, such as, for instance, linear dependence. In other words, the RF, as well as other Machine Learning methods, does not require explicit hypotheses about the relationships between responses and predictors, as it learns directly from the data. This is a crucial feature of our approach and it implies that the vulnerability estimates we obtained go beyond the subjective evaluation of the conditions of P. oceanica, as they are based only on the available data. This further means that our RF could be updated, and most probably improved, as soon as new data on P. oceanica meadows become available or are updated at finer spatial resolution.

We would like to stress that the output of our HSM is available in the Supplementary Materials as a raster file, which can be downloaded and imported in any Geographical Information System (GIS). It can be used as a supporting tool in environmental impact assessment and decision makers could take into account the vulnerability estimates it provides in order to evaluate the feasibility of marine projects that could involve alterations of suitability of the environmental conditions for P. oceanica. Our HSM can offer valuable insights into the status of P. oceanica meadows, possibly in association with other relevant properties, such as the analysis of shoot-density and productivity, thus providing baseline information in the assessment of the ecological status of meadows. 3.4. Relative importance of predictors As discussed in Section 2.4, the relative importance of predictors is routinely computed during the RF training, hence a dedicated 8

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

procedure, e.g. sensitivity analysis, is not required. The relative importance of the predictors was assessed using the permutation measure. In Table 2, we reported the results referred to the final RF configuration, and we normalized the estimates basing on the depth, which was – not surprisingly - the predictor showing the largest relative importance. Among the 10 most relevant predictors we found variables that are mainly based on direct measurements, such as nutrient input and pH, and modelled variables, such as cumulative human impacts on marine ecosystems, as estimated by Halpern et al. (2008). However, it is worth noting that, setting aside depth, other predictors were less relevant, showing a relative importance that ranged in the [0.02, 0.33] interval, and one of them, i.e. shipping intensity, was not even selected by our final RF. These results seem consistent with the current ecological knowledge as depth is certainly the variable that mainly influences the distribution of P. oceanica meadows, since it is strictly related to an environmental variable as fundamental as light. However, it is crucial to stress that RF is a Machine Learning technique which is able to handle high-order interactions between the response variable and the predictors. Hence, the relative importance of predictors does not necessarily reflect the role they play in complex causal relationships. As a matter of fact, a RF exploits the resemblance between the spatial distribution of P. oceanica and those of predictors (Scardi et al., 2013), relying on correlation between their shapes rather than on causality. Therefore, the assessment of the relevance of the predictive variables is based on the multifaceted interactions that the RF handled during the tree-building procedure.

Appendix A. Supplementary data Supplementary data to this article can be found online at https:// doi.org/10.1016/j.ecolind.2019.105744. References Anderson, O.F., Guinotte, J.M., Rowden, A.A., Clark, M.R., Mormede, S., Davies, A.J., Bowden, D.A., 2016. Field validation of habitat suitability models for vulnerable marine ecosystems in the South Pacific Ocean: implications for the use of broad-scale models in fisheries management. Ocean Coast. Manage. 120, 110–126. https://doi. org/10.1016/j.ocecoaman.2015.11.025. Ardizzone, G.D., Belluscio, A., Criscoli, A., 2018. Atlante degli Habitat dei Fondali Marini del Lazio. Sapienza Università Editrice. Biau, G., Scornet, E., 2016. A random forest guided tour. Test 25, 197–227. Boudouresque, C.F., Mayot, N., Pergent, G., 2006. The outstanding traits of the functioning of the Posidonia oceanica seagrass ecosystem. Biol. Mar. Medit. 13, 109–113. Boudouresque, C.F., Bernard, G., Bonhomme, P., Charbonnel, E., Diviacco, G., Meinesz, A., Pergent, G., Pergent-Martini, C., Ruitton, S., Tunesi, L., 2012. Protection and Conservation of Posidonia oceanica Meadows. RAMOGE and RAC/SPA publisher, Tunis, pp. 1–202. Boudouresque, C.F., Bernard, G., Pergent, G., Shili, A., Verlaque, M., 2009. Regression of Mediterranean seagrasses caused by natural processes and anthropogenic disturbances and stress: a critical review. Bot. Mar. 52. https://doi.org/10.1515/BOT. 2009.057. Boulesteix, A.-L., Janitza, S., Kruppa, J., König, I.R., 2012. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip. Rev.: Data Min. Knowl. Discovery 2, 493–507. Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32. Breiman, L., Friedman, J., Olshen, R., Stone, C., 1984. Classification and regression trees. Wadsworth Int. Group 37, 237–251. Breiman, L., 2004. Consistency for a simple model of random forests. Cohen, J., 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Measur. 20, 37–46. https://doi.org/10.1177/001316446002000104. Crall, A.W., Jarnevich, C.S., Panke, B., Young, N., Renz, M., Morisette, J., 2013. Using habitat suitability models to target invasive plant species surveys. Ecol. Appl. 23, 60–72. Cutler, A., Cutler, D.R., Stevens, J.R., 2012. Random Forests. In: Zhang, C., Ma, Y. (Eds.), Ensemble Machine Learning. Springer US, Boston, MA, pp. 157–175. https://doi.org/ 10.1007/978-1-4419-9326-7_5. Cutler, D.R., Edwards, T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.J., 2007. Random forests for classification in ecology. Ecology 88, 2783–2792. Dankowski, T., Ziegler, A., 2016. Calibrating random forests for probability estimation. Stat. Med. 35, 3949–3960. Di Marco, M., Santini, L., Visconti, P., Mortelliti, A., Boitani, L., Rondinini, C., 2016. Using habitat suitability models to scale up population persistence targets for global species conservation. Hystrix – Ital. J. Mammal. https://doi.org/10.4404/hystrix-27.111660. Díaz-Uriarte, R., De Andres, S.A., 2006. Gene selection and classification of microarray data using random forest. BMC Bioinf. 7, 3. Dreyfus-Leon, M.J., Scardi, M., 2008. Application of ecological informatics. In: Encyclopaedia of Ecology. Elsevier, pp. 222–227. https://doi.org/10.1016/B978008045405-4.00156-7. Duarte, C.M., 2002. The future of seagrass meadows. Environ. Conserv. 29. https://doi. org/10.1017/S0376892902000127. Elith, J., Leathwick, J.R., 2009. Species distribution models: ecological explanation and prediction across space and time. Annu. Rev. Ecol. Evol. Syst. 40, 677–697. https:// doi.org/10.1146/annurev.ecolsys.110308.120159. Evans, J.S., Murphy, M.A., Holden, Z.A., Cushman, S.A., 2011. Modeling species distribution and change using random forest. In: Drew, C.A., Wiersma, Y.F., Huettmann, F. (Eds.), Predictive Species and Habitat Modeling in Landscape Ecology. Springer New York, New York, NY, pp. 139–159. https://doi.org/10.1007/978-1-4419-73900_8. Fielding, A.H., 2006. In: Cluster and Classification Techniques for the Biosciences. Cambridge University Press, Cambridge. https://doi.org/10.1017/ CBO9780511607493. Fox, E.W., Hill, R.A., Leibowitz, S.G., Olsen, A.R., Thornbrugh, D.J., Weber, M.H., 2017. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology. Environ. Monit. Assess. 189, 316. Giannoulaki, M., Belluscio, A., Colloca, F., Fraschetti, S., Scardi, M., Smith, C., Panayotidis, P., Valavanis, V., Spedicato, M.T. 2013 Mediterranean Sensitive Habitats. DG MARE Specific Contract SI2.600741, Final Report, 107–143 pp. Gislason, P.O., Benediktsson, J.A., Sveinsson, J.R., 2006. Random Forests for land cover classification. Pattern Recogn. Lett. 27, 294–300. https://doi.org/10.1016/j.patrec. 2005.08.011. Gobert, S., Sartoretto, S., Rico-Raimondino, V., Andral, B., Chery, A., Lejeune, P., Boissery, P., 2009. Assessment of the ecological status of Mediterranean French coastal waters as required by the Water Framework Directive using the Posidonia oceanica Rapid Easy Index: PREI. Mar. Pollut. Bull. 58, 1727–1733. https://doi.org/ 10.1016/j.marpolbul.2009.06.012. Guisan, A., Thuiller, W., 2005. Predicting species distribution: offering more than simple habitat models. Ecol. Lett. 8, 993–1009. https://doi.org/10.1111/j.1461-0248.2005. 00792.x. Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology.

4. Conclusions This study focused on the assessment of the vulnerability of P. oceanica meadows in the Italian marine coastal waters using a Machine Learning approach. The P. oceanica meadows rank among the most valuable ecosystems in term of services they provide and no real compensation actions can be planned to effectively counterbalance their loss (Boudouresque et al., 2009). Accordingly, methods aimed at identifying the potential regression-risk areas are nowadays crucial to establish appropriate ecosystem management measures for preserving the natural capital and the goods and services it delivers. To our knowledge, this study represents the first Machine Learning approach aimed at assessing the vulnerability of P. oceanica meadows. We used a RF as a classification tool to develop a HSM aimed at modeling the habitat suitability for P. oceanica, rather than for predictive purposes, as both the spatial distribution and the condition of meadows are entirely known in Italian waters. The HSM we developed showed a good level of accuracy in determining the suitability for P. oceanica. We observed that the RF output values varied according to the meadows conditions, allowing to exploit the HSM in order to assess the vulnerability of P. oceanica over the Italian coastal waters. The assessment of the vulnerability of meadows was based on the rationale that the RF output actually reflected the level of habitat suitability for the presence of P. oceanica. The identification of potential regression-risk areas based on our results could be crucial in order to support environmental impact assessment, to identify meadows that should be prioritized for conservation efforts, to optimize monitoring programs, as well as to select sites where suitable habitat condition may favor restoration activities and to define ecosystem management plans aimed at preserving natural capital and ecosystem services associated to P. oceanica. Acknowledgements This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We sincerely thank the two anonymous Reviewers for their valuable suggestions and comments that improved the quality of the manuscript. 9

Ecological Indicators 108 (2020) 105744

E. Catucci and M. Scardi

Nicodemus, K.K., 2011. Letter to the editor: on the stability and ranking of predictors from random forest variable importance measures. Briefings Bioinf. 12, 369–373. Orth, R.J., Carruthers, T.J.B., Dennison, W.C., Duarte, C.M., Fourqurean, J.W., Heck, K.L., Hughes, A.R., Kendrick, G.A., Kenworthy, W.J., Olyarnik, S., Short, F.T., Waycott, M., Williams, S.L., 2006. A global crisis for seagrass ecosystems. Bioscience 56, 987. https://doi.org/10.1641/0006-3568(2006) 56[987:AGCFSE]2.0.CO;2. Personnic, S., Boudouresque, C.F., Astruch, P., Ballesteros, E., Blouet, S., Bellan-Santini, D., Bonhomme, P., Thibault-Botha, D., Feunteun, E., Harmelin-Vivien, M., Pergent, G., Pergent-Martini, C., Pastor, J., Poggiale, J.-C., Renaud, F., Thibaut, T., Ruitton, S., 2014. An ecosystem-based approach to assess the status of a mediterranean ecosystem, the Posidonia oceanica Seagrass Meadow. PLoS ONE 9, e98994. https://doi. org/10.1371/journal.pone.0098994. Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199. Reiss, H., Birchenough, S., Borja, A., Buhl-Mortensen, L., Craeymeersch, J., Dannheim, J., Darr, A., Galparsoro, I., Gogina, M., Neumann, H., Populus, J., Rengstorf, A.M., Valle, M., van Hoey, G., Zettler, M.L., Degraer, S., 2014. Benthos distribution modelling and its relevance for marine ecosystem management. ICES J. Mar. Sci. 72, 297–315. https://doi.org/10.1093/icesjms/fsu107. Robinson, L.M., Elith, J., Hobday, A.J., Pearson, R.G., Kendall, B.E., Possingham, H.P., Richardson, A.J., 2011. Pushing the limits in marine species distribution modelling: lessons from the land present challenges and opportunities: marine species distribution models. Glob. Ecol. Biogeogr. 20, 789–802. https://doi.org/10.1111/j. 1466-8238.2010.00636.x. Ruiz, J.M., Pérez, M., Romero, J., 2001. Effects of fish farm loadings on seagrass (Posidonia oceanica) distribution, growth and photosynthesis. Mar. Pollut. Bull. 42, 749–760. Scardi, M., Martin, C.S., Valavanis, V., Fraschetti, S., Belluscio, A., Gristina, M., Salomidi, M., Punzo, E., Panayotidis, P., Giannoulaki, M., 2013. Modeling of protected habitats using predictor variables. Mediterranean Sensitive Habitats (MEDISEH) Final Report. DG MARE Specific Contract SI2.600741. Scornet, E., 2017. Tuning parameters in random forests. In: ESAIM: Proceedings and Surveys, pp. 144–162. Segal, M., Xiao, Y., 2011. Multivariate random forests. Wiley Interdiscipl. Rev.: Data Min. Knowl. Discovery 1, 80–87. Segurado, P., Araújo, M.B., 2004. An evaluation of methods for modelling species distributions: methods for modelling species distributions. J. Biogeogr. 31, 1555–1568. https://doi.org/10.1111/j.1365-2699.2004.01076.x. Short, F.T., Polidoro, B., Livingstone, S.R., Carpenter, K.E., Bandeira, S., Bujang, J.S., Calumpong, H.P., Carruthers, T.J., Coles, R.G., Dennison, W.C., 2011. Extinction risk assessment of the world’s seagrass species. Biol. Conserv. 144, 1961–1971. Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., Zeileis, A., 2008. Conditional variable importance for random forests. BMC Bioinf. 9, 307. Telesca, L., Belluscio, A., Criscoli, A., Ardizzone, G., Apostolaki, E.T., Fraschetti, S., Gristina, M., Knittweis, L., Martin, C.S., Pergent, G., Alagna, A., Badalamenti, F., Garofalo, G., Gerakaris, V., Louise Pace, M., Pergent-Martini, C., Salomidi, M., 2015. Seagrass meadows (Posidonia oceanica) distribution and trajectories of change. Sci. Rep. 5. https://doi.org/10.1038/srep12505. Thessen, A., 2016. Adoption of machine learning techniques in ecology and earth science. One Ecosyst. 1, e8621. https://doi.org/10.3897/oneeco.1.e8621. Vacchi, M., Montefalcone, M., Schiaffino, C.F., Parravicini, V., Bianchi, C.N., Morri, C., Ferrari, M., 2014. Towards a predictive model to assess the natural position of the Posidonia oceanica seagrass meadows upper limit. Mar. Pollut. Bull. 83, 458–466. https://doi.org/10.1016/j.marpolbul.2013.09.038. Waycott, M., Duarte, C.M., Carruthers, T.J.B., Orth, R.J., Dennison, W.C., Olyarnik, S., Calladine, A., Fourqurean, J.W., Heck, K.L., Hughes, A.R., Kendrick, G.A., Kenworthy, W.J., Short, F.T., Williams, S.L., 2009. Accelerating loss of seagrasses across the globe threatens coastal ecosystems. Proc. Natl. Acad. Sci. 106, 12377–12381. https://doi.org/10.1073/pnas.0905620106. Whittock, P.A., Pendoley, K.L., Hamann, M., 2016. Using habitat suitability models in an industrial setting: the case for internesting flatback turtles. Ecosphere 7, e01551. https://doi.org/10.1002/ecs2.1551. Zweig, M.H., Campbell, G., 1993. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin. Chem. 39, 561–577.

Ecol. Model. 135, 147–186. https://doi.org/10.1016/S0304-3800(00)00354-9. Guisan, A., Tingley, R., Baumgartner, J.B., Naujokaitis-Lewis, I., Sutcliffe, P.R., Tulloch, A.I.T., Regan, T.J., Brotons, L., McDonald-Madden, E., Mantyka-Pringle, C., Martin, T.G., Rhodes, J.R., Maggini, R., Setterfield, S.A., Elith, J., Schwartz, M.W., Wintle, B.A., Broennimann, O., Austin, M., Ferrier, S., Kearney, M.R., Possingham, H.P., Buckley, Y.M., 2013. Predicting species distributions for conservation decisions. Ecol. Lett. 16, 1424–1435. https://doi.org/10.1111/ele.12189. Guisan, A., Thuiller, W., Zimmermann, N.E., 2017. In: Habitat Suitability and Distribution Models: With Applications in R. Cambridge University Press, Cambridge. https://doi. org/10.1017/9781139028271. Halpern, B.S., Walbridge, S., Selkoe, K.A., Kappel, C.V., Micheli, F., D’Agrosa, C., Bruno, J.F., Casey, K.S., Ebert, C., Fox, H.E., Fujita, R., Heinemann, D., Lenihan, H.S., Madin, E.M.P., Perry, M.T., Selig, E.R., Spalding, M., Steneck, R., Watson, R., 2008. A global map of human impact on marine ecosystems. Science 319, 948–952. https://doi.org/ 10.1126/science.1149345. Hemminga, M.A., Duarte, C.M., 2000. Seagrass Ecology. Cambridge University Press, Cambridge, UK, New York, NY. Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., Guisan, A., 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecol. Model. 199, 142–152. https://doi.org/10.1016/j.ecolmodel.2006.05.017. Hirzel, A.H., Le Lay, G., 2008. Habitat suitability modelling and niche theory. J. Appl. Ecol. 45, 1372–1381. https://doi.org/10.1111/j.1365-2664.2008.01524.x. Kruppa, J., Schwarz, A., Arminger, G., Ziegler, A., 2013. Consumer credit risk: individual probability estimates using machine learning. Expert Syst. Appl. 40, 5125–5131. Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 159–174. Leriche, A., Pasqualini, V., Boudouresque, C.-F., Bernard, G., Bonhomme, P., Clabaut, P., Denis, J., 2006. Spatial, temporal and structural variations of a Posidonia oceanica seagrass meadow facing human activities. Aquat. Bot. 84, 287–293. https://doi.org/ 10.1016/j.aquabot.2005.10.001. Liaw, A., Wiener, M., 2002. In: Classification and Regression by randomForest, pp. 5. Libes, M., Boudouresque, C.-F., 1987. Uptake and long-distance transport of carbon in the marine phanerogam Posidonia oceanica. Mar. Ecol. Prog. Ser. 177–186. Louppe, G., Wehenkel, L., Sutera, A., Geurts, P., 2013. Understanding variable importances in forests of randomized trees. Adv. Neural Inf. Process. Syst. 431–439. Malea, P., Mylona, Z., Kevrekidis, T., 2019. Improving the utility of the seagrass Posidonia oceanica as a biological indicator of past trace element contamination. Ecol. Ind. 107, 105596. Manel, S., Dias, J.-M., Ormerod, S.J., 1999. Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–347. https://doi.org/10.1016/S03043800(99)00113-1. Manel, S., Williams, H.C., Ormerod, S.J., 2001. Evaluating presence-absence models in ecology: the need to account for prevalence: presence-absence modelling. J. Appl. Ecol. 38, 921–931. https://doi.org/10.1046/j.1365-2664.2001.00647.x. Marbà, N., Hemminga, M.A., Mateo, M.A., Duarte, C.M., Mass, Y.E., Terrados, J., Gacia, E., 2002. Carbon and nitrogen translocation between seagrass ramets. Mar. Ecol. Prog. Ser. 226, 287–300. Martin, C.S., Giannoulaki, M., De Leo, F., Scardi, M., Salomidi, M., Knittweis, L., Pace, M.L., Garofalo, G., Gristina, M., Ballesteros, E., Bavestrello, G., Belluscio, A., Cebrian, E., Gerakaris, V., Pergent, G., Pergent-Martini, C., Schembri, P.J., Terribile, K., Rizzo, L., Ben Souissi, J., Bonacorsi, M., Guarnieri, G., Krzelj, M., Macic, V., Punzo, E., Valavanis, V., Fraschetti, S., 2015. Coralligenous and maërl habitats: predictive modelling to identify their spatial distributions across the Mediterranean Sea. Sci. Rep. 4. https://doi.org/10.1038/srep05073. Mattei, F., Franceschini, S., Scardi, M., 2018. A depth-resolved artificial neural network model of marine phytoplankton primary production. Ecol. Model. 382, 51–62. https://doi.org/10.1016/j.ecolmodel.2018.05.003. Montefalcone, M., 2009. Ecosystem health assessment using the Mediterranean seagrass Posidonia oceanica: a review. Ecol. Ind. 9, 595–604. https://doi.org/10.1016/j. ecolind.2008.09.013. Montefalcone, M., Albertelli, G., Nike Bianchi, C., Mariani, M., Morri, C., 2006. A new synthetic index and a protocol for monitoring the status ofPosidonia oceanica meadows: a case study at Sanremo (Ligurian Sea, NW Mediterranean). Aquat. Conserv. Mar. Freshwater Ecosyst. 16, 29–42. https://doi.org/10.1002/aqc.688.

10

A Machine Learning approach to the assessment of the vulnerability of Posidonia oceanica meadows

A Machine Learning approach to the assessment of the vulnerability of Posidonia oceanica meadows

Recommend Documents