Spatial Statistics 30 (2019) 27–38
Contents lists available at ScienceDirect
Spatial Statistics journal homepage: www.elsevier.com/locate/spasta
Design-based estimation of mark variograms in forest ecosystem surveys ∗
A. Marcelli a , , P. Corona b , L. Fattorini c a
Department for Innovation in Biological, Agro-food and Forest systems, University of Tuscia, Via San Camillo de Lellis s.n.c., 01100 Viterbo, Italy b CREA Research Centre for Forestry and Wood, viale Santa Margherita 80, 52100 Arezzo, Italy c Department of Economics and Statistics, University of Siena, Piazza San Francesco 8, 53100 Siena, Italy
article
info
Article history: Received 22 November 2018 Accepted 14 February 2019 Available online 21 February 2019 Keywords: Horvitz–Thompson estimation Jackknife variance estimation Plot sampling Replicated plots Simulation study Tree spatial interaction
a b s t r a c t Mark variograms are widely adopted in forest ecosystem studies for analyzing spatial interaction among trees. Inference on mark variograms can be performed under a model-dependent perspective (ergodic variograms) or under a deterministic perspective (non-ergodic variograms). A simple and workable definition of non-ergodic mark variogram is introduced based on distances between pairs of trees distinguished by distance classes. Designbased estimators of mark variogram values are proposed as the ratio of two Horvitz–Thompson estimators using replicated plots randomly located on the study area and making use of the inclusion probabilities of the pairs of trees. Jackknife estimators of their variances are considered. A simulation study is performed on a real forest stand to check the statistical performance of the proposed strategy. © 2019 Elsevier B.V. All rights reserved.
1. Introduction Any rational decision related to the conservation, maintenance and enhancement of the multiple functions provided by forest ecosystems needs to be based on objective, reliable information (Corona, 2018). Under this perspective, over the last decades a large number of studies have been performed to quantify forest ecosystem structure and diversity in a spatially explicit way: results ∗ Corresponding author. E-mail addresses:
[email protected] (A. Marcelli),
[email protected] (P. Corona),
[email protected] (L. Fattorini). https://doi.org/10.1016/j.spasta.2019.02.002 2211-6753/© 2019 Elsevier B.V. All rights reserved.
28
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
from these studies have led to the safe conclusion that spatially explicit measures provide relevant information on biodiversity patterns in forest stands and, hence, should be important elements of forest applied ecology and sustainable management (Pommerening, 2002; Gadow and Hui, 2002; Pommerening and Sánchez Meador, 2018). The spatial pattern of trees and of their attributes throughout forest stands has been usually analyzed under a model-dependent approach, viewing tree locations and attributes as a realization of a marked point process. The analysis of these processes can be performed by simple spatially explicit indexes such as quadrat counts, distance-based methods and angle-based methods (see e.g. Cressie, 1991; Illian et al., 2008), thus revealing features of the spatial structure at the nearest neighbor scale. On the other hand, modern analysis of marked point processes exploits functions depending on distances between trees in the stand that can be used to simultaneously detect features at different spatial scale (Szmyt, 2014). Among the functional statistics adopted for model-dependent inference on marked point processes, the mark variogram has been widely adopted in forest studies (Pommerening and Särkkä, 2013 and references therein). Isaaks and Srivastava (1988) refer to variograms arising from secondorder stationary spatial processes as ergodic variograms, where expectation is done with respect to all the possible realizations of the process. The effectiveness of mark variogram in analyzing spatial interaction among trees clearly emerges from several studies. If the distribution of trees does not show cluster or trended patterns and if marks do not show a dependence with tree locations, then mark variograms are just similar to horizontal lines. On the other hand, if trees of similar sizes are arranged in clusters, the expected variogram is an increasing curve having the variance of marks as asymptote, just like geostatistical variograms from random fields (Biondi et al., 1994; Garcia, 2006). In some cases, variograms turn out to be decreasing for small distances, a feature caused by pairs of plants with different sizes at close proximity, usually referred to as small-scale size diversity. Explanations of this unusual (from a geostatistical point of view) behavior have been attempted in literature (e.g. Reed and Burkhart, 1985; Stoyan and Wälder, 2000; Kint et al., 2003; Suzuki et al., 2008; Pommerening and Särkkä, 2013). Man-made disturbances (e.g. thinnings) constitute possible causes: large mature trees are usually surrounded by clusters of many small trees. If most of these small trees are removed, then pairs of large and small trees at short distances become dominant, in such a way that the resulting variograms have larger values for small distances than for large ones. Regarding the estimation of ergodic mark variograms from field data, sampling protocols adopted for data collection are sometimes neglected and exhaustive mapping and callipering of forest stand are tacitly presumed. For example, Cressie (1991, Section 8.7.2) claims the knowledge of locations and of the corresponding marks for the events in a bounded study region. Similarly, Pommerening and Särkkä (2013, p. 66) claim a recording ‘‘throughout the observation windows’’. Unfortunately, mapping and callipering a whole forest stand is unsuitable and unusual in forest study and is completely prohibitive especially for large stands. On the other hand, much work has been done in developing efficient sampling strategies to estimate ergodic variograms (for a complete review see de Gruijter et al., 2006 chapter 9). Regrettably, most of the sampling schemes proposed are suitable for sampling locations on a continuous support, i.e. to estimate geostatistical variograms, but are difficult to be applied for estimating mark variograms, where discrete populations of points have to be sampled. From a deterministic perspective, in which populations are considered fixed rather than realizations of spatial processes, Isaaks and Srivastava (1988) give a rigorous definition of variograms, referring to them as non-ergodic variograms. In this setting, the only source of uncertainty stems from the schemes adopted for sampling locations from which variograms are estimated. Brus and de Gruijter (1994) proposed several sampling schemes for performing non-ergodic variogram estimation. However, also in this case, most schemes are suitable for sampling locations on a continuous support, but are difficult to be applied for sampling a discrete set of points. Moreover, a further difficulty is that the non-ergodic variogram introduced following Isaaks and Srivastava (1988) is not suitable for working with finite population of points, because it may remain undefined in most cases. Practically speaking, for both model dependent and deterministic approaches, mark variograms estimation is problematic. The purpose of this paper is to propose a more simple, naïve, but workable definition of nonergodic mark variogram based on distance classes. The resulting mark variogram is suitable to be
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
29
estimated in a design-based setting using replicated plots randomly located on the study area and making use of the inclusion probabilities of the pairs of trees. The paper is organized as it follows. In Section 2, the non-ergodic mark variogram from a finite population of marked points is defined for classes of distance rather than for single distance values, owing to the possible scarcity or the absence of pairs having such distances. Subsequently, in Section 3, the mark-variogram values are estimated under any sampling scheme inducing known first- and second-order inclusion probabilities as the ratio of two Horvitz–Thompson (HT) estimators. General considerations about unbiasedness, variance and variance estimation of the resulting estimators are given. Then, in Section 4, the proposed estimators are applied when the selected trees are those contained within several plots randomly located on the forest stand. Approximate unbiasedness of the resulting estimators is highlighted and a jackknife estimator of their variances is proposed. Finally, in Section 5, a simulation study is performed on a real forest stand to empirically check the performance of the strategy. Some final considerations are given in Section 6. 2. Variograms in finite populations of marked points Let A be the study area in which a finite population U = {1, . . . , N } of N units (e.g. trees) is located on a set of as many spatial locations D = {p1 , . . . , pN }. Denote by y(pj ) the value of the interest variable (marks) at location pj and by djh the distance between locations pj and ph . Traditionally, correlation between marks is analyzed in a model-dependent framework, i.e. it is assumed that the locations of events and their corresponding marks are realizations of a stochastic process of the form {y(p), p ∈ D}, where both y(·) and D are random (see e.g. Cressie, section 8.7). In this framework, under the further assumption of stationarity and isotropy of the underlying marked point process, the ergodic mark variogram is defined to be
{ } v (r) = Em [y(p) − y(p + r)]2 |p, p + r ∈ D where Em {·|p, p + r ∈ D} denotes model-based expectation conditional on the event that there are two units located at p and p + r, and r > 0 denotes the distance between p and p + r. On the other hand, in a deterministic framework, where the N locations and marks are viewed as fixed constants, Brus and de Gruijter (1994), in analogy with Isaaks and Srivastava (1988), define the non-ergodic mark-variogram as
ν (r) =
1 N(r)
∑
[
y(pj ) − y(ph )
]2
(1)
(pj ,ph )∈C (r)
where C (r) = (pj , ph ): pj ∈ D ∩ D−r ; ph ∈ D ∩ D+r
{
}
N(r) is the number of pairs in C (r), D−r = {p1 − r, . . . , pN − r} and D+r = {p1 + r, . . . , pN + r} are translations of D. Practically speaking, for a given r, only those locations that fall within D ∩ D−r and that can be paired with another location of Dr away can be considered in Eq. (1). Owing to the discrete nature of D, it is unlikely to find pairs of vectors belonging to C (r), i.e. C (r) is likely to be empty, leaving (1) undefined for most r. To bypass the issue, an alternative, more simple definition of non-ergodic mark variogram should be adopted. Obviously, a non-ergodic mark variogram cannot be defined for any distance r > 0, because distances between all the N(N − 1)/2 pairs of locations in D are likely to differ each other. Therefore, it is convenient to arbitrarily choose K classes of distance d0 − d1 , d1 − d2 , . . . , dK −1 − dK where d0 = 0 and dk = ∞, in such a way that Ck = (pj , ph ): h > j ∈ U, dk−1 ≤ djh < dk
{
}
, k = 1, . . . , K
30
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
Fig. 1. An example of graphic representation of mark variogram in a finite population with 5 distance classes. The horizontal lines represent the mark variogram values for each class.
is the set of pairs of units having distances belonging to the class k. On the basis of such classes, the non-ergodic mark variogram can be defined as
vk =
1
]2
∑ [
y(pj ) − y(ph )
Nk
(2)
(pj ,ph )∈Ck
where Nk is the number of pairs in Ck . Practically speaking, whereas the units in U are individual trees, the units in Ck are pairs of trees that are classified on the basis of their separation distances. Once the K classes are fixed, the K variogram values v1 , . . . , vK give insights on the level of variability of marks at the various distance classes (see e.g. Fig. 1 where the variability between marks increases with distances). Even if arbitrary, the K distance classes should be chosen in order to be not too large, for avoiding the masking of the variability pattern as distance varies, but also not too small for avoiding to have few units within the classes. However, whenever the classes are, knowing the variogram values v1 , . . . , vK involves knowing locations and marks for all units in the population (e.g. locations and basal areas or volumes of all the trees in a forest stand), that is unfeasible in most cases. Therefore, v1 , . . . , vK constitute unknown parameter to be estimated from a sample survey. 3. Design-based estimation of mark variogram values Let S be a sample of n < N units selected from the population U by means of a sampling scheme with first-order inclusion probabilities πj > 0 for any j ∈ U, and second-order inclusion probabilities πjh for each pair of units h > j ∈ U. To estimate vk from S it should be pointed out that Eq. (2) is a ratio between two totals, where the numerator is the total of the squared differences (yj − yh )2 of the pairs of units belonging to Ck , while the denominator is the cardinality of Ck , i.e. the total of a variable equal to 1 for each pair in Ck . Then, in accordance with the well-known HT criterion for estimating totals, vk can be estimated by the ratio of the HT estimators of the two totals, i.e. 2
∑ vˆ k =
(j,h)∈Sk
[y(pj )−y(ph )] πjh
1 (j,h)∈Sk πjh
∑
(3)
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
31
where Sk is the set of the pairs of S with distances in the class k. Obviously, Eq. (3) is undefined when Sk = ∅, i.e. when no pair in the sample S has distance in the class k. Totals in Eq. (2) both refer to a population of pairs. Therefore, if there are pairs that cannot enter the sample, both the HT estimators in Eq. (3) are negatively biased. To avoid the drawback, a sampling scheme ensuring invariably positive second-order inclusion probabilities should be adopted. However, even if the unbiasedness of the two estimators involved in (3) is ensured by the sampling scheme, nothing ensures that the ratio of the twos is unbiased. In this case, as is customary in sampling theory, approximate results can be achieved using the Taylor series approximation of (3) up to the first order, achieving approximate unbiasedness and an approximate variance expression that can be unbiasedly estimated from the sample (see e.g. Särndal et al., 1992, section 5.5). Unfortunately, in this case the analytical expression of the approximate variance and of its estimator, subsequently, would be much more complex because we are dealing with populations of pairs, in such a way that third- and fourth-order inclusion probabilities would be involved in those expressions. Fortunately, the problem of estimating the variance of (3) can be readily bypassed in forest surveys when, as usual, replicated plots are adopted for sampling trees. 4. Design-based estimation of mark variogram values from replicated plots Plot sampling is probably the most widely applied scheme in forest surveys (see e.g. Gregoire and Valentine, 2008, Chapter 7). Plot sampling is usually performed by: (i) locating a circular plot of a pre-fixed size a at random onto a region A∗ constituted by the study area A enlarged by a buffer of width greater than or equal to the radius of the plot; (ii) including in the sample S all the trees that are located in the study area A (see Fig. 2). The enlargement of the study area onto which plots are randomly located is performed to handle edge effects. Indeed in this way the inclusion probability of each tree invariably turns out to be πj = a/|A∗| for each j ∈ U, where |A∗| denotes the size of A∗, and it can be previously determined without no field measurement (see Gregoire and Valentine, 2008, section 7.5). Evidently, a single plot of few-meter radius is not sufficient for adequately sampling a study area of some hectares or even hundreds of hectares. Usually R plots are randomly and independently located onto the enlarged area (see Fig. 2). In statistical literature this scheme is usually referred to as uniform random sampling (URS) (see e.g. Barabesi and Franceschi, 2011; Barabesi et al., 2012). Obviously, the R plots give rise to as many samples of trees S1 , . . . , SR . Because the plots are randomly located, some of them may overlap, in such a way that a tree can be sampled more times. Therefore, it seems suitable to consider the final sample of final random size n(R) as the union of R
the R samples, say S(R) = ∪ Si , in such a way that a tree is sampled if it is sampled by at least one i=1
plot (see Fig. 2). In this way, the inclusion probability of any tree j ∈ U is given by
( )R a πj(R) = 1 − 1 − ,j∈U |A∗|
(4)
Regarding the second order inclusion probabilities, from the basic probability laws, it follows that
( ) πjh(R) = Pr j, h ∈ S(R) ( ) ( ) ( ) = 1 − Pr j ∈ / S(R) − Pr h ∈ / S(R) + Pr j ∈ / S(R) , h ∈ / S(R) ( )R ( )R a ajh a + 1−2 + , h>j∈U =1−2 1− |A∗| |A∗| |A∗|
(5)
where ajh denotes the size of the intersection of the two plots of size a centered at pj and ph . From Eq. (5), it follows that the second-order inclusion probabilities of those pairs of trees that cannot be selected in the same plot because their distance is greater than the plot radius, i.e. for those pairs of trees such that ajh = 0, reduce to
(
πjh(R) = 1 − 2 · 1 −
a
|A∗|
)R
( + 1−2
a
|A∗|
)R (6)
32
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
Fig. 2. An Example of replicated plot sampling: the R = 5 plots are randomly located onto area A* (red rectangle) achieved enlarging the study area (black rectangle) by a buffer of width equal to the plot radius. Only the trees within the study area A (black points) are sampled, while trees outside (white points) are disregarded. The blue points are the random centers of plots, the points highlighted in red are the sampled trees. The final sample is constituted by the whole set of points highlighted in red. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Obviously, for any pair of units with distance smaller than the plot radius, the second order inclusion probabilities vary from (6) to (4). Eq. (6) gives the lower bound of these probabilities for distances smaller than, but approaching, the plot radius, while (4) gives their upper bound for distances approaching 0. Using the second-order inclusion probabilities into Eq. (3), the estimator of the mark variogram values from replicated plot sampling turns out to be 2
∑ vˆ (R)k =
(j,h)∈S(R)k
[y(pj )−y(ph )] πjh(R)
1 (j,h)∈S(R)k πjh(R)
∑
, k = 1, . . . , K
(7)
where S(R)k is the set of the pairs of S(R) with distances in the class k. For classes of distances greater than the plot radius, the second-order inclusion probabilities are constant. Therefore, the estimator (7) simply reduces to the sample mean of the squared distances, i.e.
vˆ (R)k =
1 n(R)k
∑ [
]2
y(pj ) − y(ph )
(j,h)∈S(R)k
where n(R)k is the number of pairs in S(R)k . Because the second-order inclusion probabilities of type (5) are invariably positive for each pair of trees in the study area, the estimator (7) is the ratio of two unbiased estimators. Therefore, it is unbiased up to the first-order approximation, with approximate variance involving inclusion probabilities of higher order than the second. However, because the estimator (7) can be viewed as a function of the R replicated samples S1 , . . . , SR , its variance can be estimated by using the jackknife technique, deleting one sample at a time.
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
33
To this purpose, let vˆ (−i)k be the estimator of vk achieved, mutatis mutandis, from Eq. (6) deleting the i-th sample Si . Then, let
v (R)k,jack =
R 1∑
R
(i) vˆ (R)k
i=1
(i)
be the average of the pseudovalues, i.e. vˆ (R)k = Rvˆ (R)k − (R − 1)vˆ (−i)k . The jackknife variance estimator of (7) is given by Vˆ ar(R)k,jack =
1 R(R − 1)
R [ ]2 ∑ (i) vˆ (R)k − v (R)k,jack
(8)
i=1
However, because Eq. (7) cannot be written as a function of the arithmetic mean of R statistics of type ti = τ (Si ) (i = 1, . . . , R) arising from the R samples, the familiar asymptotic results (R → ∞) about the consistency of the jackknife variance estimator (see e.g. Shao and Tu, 1995, section 2.1.1) cannot be claimed in this case. 5. Simulation study A simulation study was performed on a real forest stand to check the statistical properties of the variogram estimator (7) and of its variance estimator (8) based on replicated plots. 5.1. Population The population considered for the simulation study was a collection U of N = 387 trees within a rectangular region of size 70 × 140 = 9,800 m2 located in the Natural Reserve of Bosco Fontana (Northern Italy). This forest is one of the last remaining floodplain forests in the Northern Italy and one of the most endangered ecosystems in Europe (Fardusi et al., 2017) belonging to the Sub-Atlantic and medio-European oak-hornbeam forests of the Carpinionbetuli (code 9160, Habitats Directive1992/43/CEE), European Forest Type 5.1 (Pedunculate oak-hornbeam forests; (Barbati et al., 2014). The high level of diversity and naturalness of the stands makes this forest an ideal site for studying structure and dynamics in mixed floodplain forests. The location of each tree in the population (CA1 permanent plot, see (Fardusi et al., 2017) was recorded (see Fig. 3) together with its height and basal area expressed in m and cm2 , respectively. The dataset is available at https://d ata.mendeley.com/datasets/n8827ssnv7/2. 5.2. Mark variogram values Five distance classes were adopted to partition the collection of 74,691 pairs of trees in the population: the first class C1 was constituted by the 16,704 pairs (22%) having distances <30 m, the second class C2 was constituted by the 13,942 pairs (19%) having distances of 30–45 m, the third class C3 was constituted by the 13,576 pairs (18%) having distances of 45–60 m, the fourth class C4 was constituted by the 14,213 pairs (19%) having distances of 60–80 m, the fifth class C5 was constituted by the 16,256 pairs (22%) having distances ≥ 80 m. The variogram values of these classes were computed from Eq. (2) for heights and basal areas. The resulting values of vk for the two variables and for the five classes are reported in Table 1. Because variogram values are squared quantities they are expressed in m2 for heights and in cm4 for basal area. Table 1 also reports in brackets the square roots of the variogram values that are expressed in m and cm, respectively, and can be more easily interpreted as the average variation of heights and basal areas within the classes. From Table 1 it is apparent that the mark variogram values were very similar for both the attributes, suggesting in both cases a variogram approaching a horizontal line, a shape arising under random or regular distributions of trees onto the stand and marks independently assigned to trees, irrespective of their positions.
34
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38 Table 1 Mark variogram values for the five distance classes and for heights and basal area. Values in brackets are the square roots of the mark variogram values. Distance class
vk (height)
vk (basal area)
< 30 m
55.00 59.51 59.27 60.00 60.35
48.11 (6.9) 51.73 (7.2) 48.96 (7.0) 48.54 (7.0) 47.29(6.9)
30–45 m 45–60 m 60–80 m > 80 m
(7.4) (7.7) (7.7) (7.7) (7.8)
5.3. Sampling and estimation The sampling scheme adopted in the simulation was the replicated plot sampling with plots of 13 m radius, corresponding to a size of 530.93 m2 , as customary in many forest surveys. In order to avoid edge effects and to ensure that all the trees in the stand had the same first-order inclusion probabilities, the region onto which plots were randomly selected was enlarged by a buffer of width 13 m, achieving a surface of |A∗ | = 15,936 m2 (see Fig. 3), from which the inclusion probability for any tree to be included into a single plot turned out to be πj = 530.93/15, 936 = 0.0333. Sampling was performed adopting R = 10,15,20,30 replicated plots. Therefore, from Eq. (4), each tree in the stand had a probability to enter the final sample S(R) equal to 0.29, 0.40, 0.49, 0.64, respectively. Regarding the second-order inclusion probabilities there were 70,907 pairs out of 74,691 pairs in the population (95%) having distances greater than 13 m. For these pairs the second-order inclusion probabilities were invariably equal to 0.077, 0.152, 0.236, 0.402, respectively. For the remaining 3,784 pairs (5%) with distances smaller than 13 m, their second-order inclusion probabilities ranged from 0.077, 0.152, 0.236, 0.402 to 0.29, 0.40, 0.49, 0.64, respectively. An example of the final sample achieve with R = 10 plots is reported in Fig. 3. Once the R plots were randomly located on the enlarged area and the final sample S(R) was obtained as the set of trees sampled at least by one plot, the estimates υˆ (R)k were achieved by means of Eq. (7) for k = 1, . . . , 5 and for both the considered forest attributes, while the variance estimates were achieved by means of Eq. (8). Subsequently, an estimate of the relative standard ˆ (R)k = Vˆ ar 1/2 /v(R)k . error was achieved by the ratio RSE (R)k,jack 5.4. Performance indexes For each R = 10,15,20,30 the selection of the whole sample S(R) was simulated M = 10,000 times. At each simulation run, the final sample size n(R),i , the mark variogram estimates vˆ (R)k,i and ˆ (R)k,i were achieved for the considered the corresponding estimates of the relative standard error RSE attributes, for each k = 1, . . . , 5 and each i = 1, . . . , M, obtaining the Monte Carlo distribution of these quantities. From Monte Carlo distributions, the expectation of the final sample size (EFSS) E(n(R) ) =
M 1 ∑
M
n(R),i
i=1
together with the expectation and the mean squared error of the estimator (7) E(vˆ (R)k ) =
M 1 ∑
M
vˆ (R)k,i , k = 1, . . . , 5
i=1
and MSE(vˆ (R)k ) =
M 1 ∑ (vˆ (R)k,i − v(R)k )2 , k = 1, . . . , 5 M i=1
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
35
Fig. 3. Map of the locations of the 387 trees of the population (black points) adopted in the simulation study. The black rectangle delineates the study area, the red rectangle delineates the region onto which plots are randomly located to avoid edge effects. The blue points are the random centers of R = 10 replicated plots, the points highlighted in red are the trees selected in the final sample. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
were empirically derived, from which the relative bias (RB)
( ) ( ) E vˆ (R)k − v(R)k RB vˆ (R)k = , k = 1, . . . , 5 v(R)k and the relative root mean squared error (RRMSE)
√ ( ) RRMSE vˆ (R)k =
MSE vˆ (R)k
(
v(R)k
) , k = 1, . . . , 5
36
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
Table 2 Percent values of relative bias (RB), relative root mean squared error (RRMSE) and expectation of the relative standard error estimator (ERSEE) of the mark variograms values for height and basal area attributes within K = 5 distance classes from a forest stand of N = 387 trees in the Natural Reserve of Bosco Fontana (Northern Italy). Attribute
Height
Basal area
Distance class
R
RB
RRMSE
ERSEE
RB
RRMSE
ERSEE
< 30 m
10 15 20 30
−0.2% −0.1% 0.0% 0.0%
10.3% 7.7% 6.0% 4.3%
13.7% 10.8% 9.1% 7.6%
1.1% 0.7% 0.8% 0.2%
25.0% 18.4% 15.0% 10.8%
28.9% 24.8% 21.5% 18.4%
10 15 20 30
0.7% 0.4% 0.4% 0.1%
13.4% 10.0% 7.8% 5.7%
19.1% 13.9% 11.7% 9.7%
0.5%
30–45 m
−0.1%
28.8% 19.2% 15.4% 10.8%
32.7% 25.0% 21.3% 17.9%
45–60 m
10 15 20 30
1.3% 0.6% 0.5% 0.2%
13.9% 9.7% 7.6% 5.1%
18.9% 13.5% 11.1% 8.9%
2.3% 0.7% 0.6% 0.1%
32.9% 22.0% 17.4% 12.1%
36.2% 28.1% 23.8% 20.1%
60–80 m
10 15 20 30
0.6% 0.3% 0.2% 0.0%
13.3% 9.1% 7.0% 4.7%
18.1% 12.8% 10.4% 8.3%
1.5% 0.5% 0.4% −0.1%
36.5% 24.6% 19.2% 13.4%
39.3% 30.4% 25.8% 21.9%
0.6%
> 80 m
10 15 20 30
−0.2% −0.1% −0.1%
12.8% 8.8% 7.0% 5.0%
18.8% 13.0% 10.6% 8.6%
0.1% 0.1% 0.1% −0.1%
41.3% 30.1% 23.5% 16.6%
41.1% 32.4% 28.8% 25.4%
−0.2% 0.3%
were achieved. Moreover, the expectation of the relative standard error estimator (ERSEE) was empirically obtained by
ˆ (R)k ) = E(RSE
M 1 ∑
M
ˆ (R)k,i , k = 1, . . . , 5 RSE
i=1
For each attribute, each distance class k = 1, . . . , 5 and each R = 10, 15, 20, 30, Table 2 reports the percent values of RB, RRMSE and ERSEE, while for R = 10, 15, 20, 30 the EFSS values were 111, 154, 190 and 247, respectively, corresponding to sampling fractions of 29%, 40%, 49% and 64% . 5.5. Results Even if no theoretical result can ensure unbiasedness of estimator (7), it turned out to be approximately unbiased, with RB rarely greater than one percentage point. The RB became negligible as the number of plots increased. The precision of estimator (7) was satisfactory when estimating the variogram values of heights, with RRMSEs invariably smaller than 10% for R ≥ 15 (sampling fractions ≥ 40%). On the other hand the precision of (7) deteriorated when estimating the variogram values of basal areas, with RRMSEs invariably greater than 10% even for a large number of plots (R = 30 and sampling fraction of 64%). For both the attributes, RRMSEs quickly decreased with R, suggesting the consistency of the strategy. Regarding the estimation of the precision, the jackknife variance estimator tended to overestimate the true variance, therefore giving rise to overestimation of the RSE. In some cases, precision was slightly overestimated, while in other cases the procedure revealed highly conservative, masking the actual precision of the strategy. 6. Discussion Mark variograms constitute effective tools for assessing spatial forest ecosystem structures. Ergodic mark variograms, arising in model-dependent frameworks, lack effective estimation strategies
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
37
that require the impractical exhaustive recording of forest stands or the use of sampling schemes suitable for continuous support but difficult to be used with discrete populations such as forests. On the other hand, from a deterministic view of tree populations, non-ergodic mark variograms as proposed by Isaaks and Srivastava (1988) are theoretically sound, ensuring symmetry and relationship with the deterministic covariance function, but are likely to be undefined for most translation vectors. Therefore, a more simple definition of non-ergodic mark variograms is introduced, that is suitable to be estimated when trees are sampled by replicated plots randomly located on the study area. The use of sample plots in forest surveys is a well-investigated and well-experienced technique. Guidelines for the effective implementation of plot sampling in real forests are available in many textbooks on environmental and forest sampling (e.g. Schreuder et al., 1993, section 7.11, Gregoire and Valentine, 2008, chapter 7). In accordance with these considerations, the paper provides a design-based, straightforward estimation criterion of mark variograms based on a sampling scheme of practical importance, being one of the most widely adopted in forest investigations. From the simulation results, the proposed strategy gives rise to a mark variogram estimator that proves to be approximately unbiased and consistent as the number of sample plots increases. A jackknife estimator of the variance of such estimator has been proven to give conservative estimates of precision. Such findings provide substantial matter, grounded on transparent and robust methodology, that contributes to consolidate new paradigms in monitoring and assessment of forest ecosystems (Corona, 2016). References Barabesi, L., Franceschi, S., 2011. Sampling properties of spatial total estimators under tessellation stratified designs. Environmetrics 22, 271–278. Barabesi, L., Franceschi, S., Marcheselli, M., 2012. Properties of design-based estimation under stratified spatial sampling. Ann. Appl. Stat. 6, 210–228. Barbati, A., Marchetti, M., Chirici, G., Corona, P., 2014. European forest types and forest europe SFM indicators: tools for monitoring progress on forest biodiversity conservation. Forest Ecol. Manag. 321, 145–157. Biondi, F., Myers, D.E., Avery, C.C., 1994. Geostatistically modelling stem size and increment in an old-growth forest. Can. J. Forest Res. 24, 1354–1368. Brus, D.J., de Gruijter, J.J., 1994. Estimation of non-ergodic variograms and their sampling variance by design-based sampling strategies. Math. Geol. 26, 437–454. Corona, P., 2016. Consolidating new paradigms in large-scale monitoring and assessment of forest ecosystems. Environ. Res. 144, 8–14. Corona, P., 2018. Communicating facts, findings and thinking to support evidence-based strategies and decisions. A. S. R. 42, 1–2. Cressie, N.A.C., 1991. Statistics for Spatial Data. Wiley, New York. de Gruijter, J.J., Brus, D.J., Bierkens, M.F.P., Knotters, M., 2006. Sampling for Natural Resource Monitoring. Springer, Berlin. Fardusi, M.J., Castaldi, C., Chianucci, F., Corona, P., Mason, F., Minari, E., Puletti, N., 2017. A Spatio-Temporal Dataset of Forest Mensuration for the Analysis of Tree Species Structure and Diversity in Semi-Natural Mixed Floodplain Forests [Dataset]. Mendeley Data, v2. Gadow, K.v., Hui, G., 2002. Characterising forest spatial structure and diversity. In: Bjoerk, L. (Eds.), Sustainable Forestry in Temperate Regions. Proceedings IUFRO Int. Workshop, Lund, Sweden, pp. 20–30. Garcia, O., 2006. Scale and spatial structure effects on tree size distribution: implications for growth and yield modelling. Can. J. Forest Res. 36, 2983–2993. Gregoire, T.G., Valentine, H.T., 2008. Sampling Strategies for Natural Resources and the Environment. Chapman & Hall, Boca Raton (FL), USA. Illian, J., Penttinen, A., Stoyan, H., Stoyan, G., 2008. Statistical Analysis and Modelling of Spatial Point Patterns. Wiley, Chichester. Isaaks, E.H., Srivastava, R.M., 1988. Spatial continuity measures for probabilistic and deterministic geostatistics. Math. Geol. 20, 313–341. Kint, V., Van Meirvenne, M., Nachergale, L., Geudens, G., Lust, N., 2003. Spatial methods for quantifying forest stand development: a comparison between nearest neighbor indices and variogram analysis. For. Sci. 49, 36–49. Pommerening, A., 2002. Approaches to quantifying forest structures. Forestry 75, 305–324. Pommerening, A., Sánchez Meador, A.J., 2018. Tamm review: Tree interactions between myth and reality. Forest Ecol. Manag. 424, 164–176. Pommerening, A., Särkkä, A., 2013. What Mark Variogram tell about spatial plant interactions. Ecol. Model. 251, 64–72. Reed, D.D., Burkhart, H.E., 1985. Spatial autocorrelation of individual tree characteristics in loblolly pine stands. For. Sci. 31, 575–587. Särndal, C.E., Swensson, B., Wretman, J., 1992. Model Assisted Survey Sampling. Springer-Verlag, New York. Schreuder, H.T., Gregoire, T.G., Wood, G.B., 1993. Sampling Methods for Multiresource Forest Inventory. Wiley, New York.
38
A. Marcelli, P. Corona and L. Fattorini / Spatial Statistics 30 (2019) 27–38
Shao, J., Tu, D., 1995. The Jackknife and Bootstrap. Springer-Verlag, New York. Stoyan, D., Wälder, O., 2000. On variograms in point process statistics. II: models of marking and ecological interpretation. Biom. J. 42, 171–187. Suzuki, S.N., Kachi, N., Suzuki, J., 2008. Development of a local size-hierarchy causes regular spacing trees in an even-aged Albies forest: analyses using spatial autocorrelation and the mark correlation function. Ann. Bot. 102, 435–441. Szmyt, J., 2014. Spatial statistics in ecological analysis: from indices to functions. Silva Fenn. 48 (1).