Comparing sampling strategies in forest monitoring programs


Forest Ecology and Management 82 (1996) 231-238

Sucharita Ghosh *, John L. Innes

Swiss Federal Institute for Forest, Snow and Landscape Research (WSL), 8903 Birmensdorf, Switzerland

Accepted 21 August 1995

Abstract

Some sampling strategies are compared using Monte Carlo simulations. In particular, the sampling distributions of the sample means of defoliation and DBH are compared using a 100 m × 100 m forest site of mature Norway Spruce (Picea abies). The analysis considers possible spatial correlations. The results show that for this data set, a 10 m × 20 m plot design is almost as statistically efficient as two 10 m × 10 m randomly selected plots.

Keywords: Sampling design; Tree health

* Corresponding author.

0378-1127/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0378-1127(95)03674-1

1. Introduction

The periodic assessment of crown health has formed an important component of forest health monitoring since the mid-1980s. Numerous reports have appeared, often based on very different sampling designs. The greatest difference appears to be between plots with fixed areas (such as in Switzerland) and plots with a fixed number of trees (for example in the European forest health monitoring programme). The fixed number of trees method normally only includes a portion of the tree population, usually the emergents, dominants and co-dominants, with suppressed trees being excluded. In the fixed area method, suppressed trees are often included, but there may be a size limitation on the trees, with stems under a specified threshold being excluded.

The problem of sample size selection and assumptions about the underlying probability distribution are also of concern. Forests are complex ecosystems, often comprising a mosaic of spatially autocorrelated individuals of a number of different species. This creates problems for traditional sampling methods based on assumptions of the uniform distribution of a single variable, for example when a particular variable is rare and patchily distributed (e.g. Birnbaum and Sirken, 1965; Kalton and Anderson, 1986).

The variation in the condition of trees in individual stands has received some attention as it might potentially affect the sampling design used in monitoring studies (e.g. Innes and Boswell, 1990). However, most studies of within-stand variation, such as those by Naslund (1944), Sture (1987) and Salas Gonzalez et al. (1993), have been concerned primarily with growth variations in stands. Other comprehensive studies include that of Mandallaz et al. (1987), where the authors compare the efficiencies of cluster sampling and single point sampling. A particularly interesting approach is adaptive cluster sampling (Thompson, 1990; Thompson, 1991; Roesch, 1993), which may be useful in studying localised problems such as pest outbreaks. This adaptive design is based on an initial sampling of the population followed by a more detailed sampling around points of interest.

With recent increased interest in long-term intensive monitoring plots in forest ecosystems, the sampling design used in forest inventories has come under renewed scrutiny. In particular, considerable uncertainty surrounds the area of forest that is required to adequately characterise forest condition within a stand. The 24 trees used in many European surveys is clearly inadequate (Innes and Boswell, 1990), but there have been no studies undertaken to assess the extent of variation over larger areas. In most European areas, the size of individual stands is small (< 2 ha) and the probability of a sample plot lying in more than one stand increases rapidly as the sample area increases. In this study, the variation of tree condition over one hectare has been determined, with all trees with a diameter at breast height (DBH) of greater than 12 cm being included in the survey.


Fig. 1. Distribution of total defoliation and DBH of single trees in all plots. The figures show normal quantile-quantile plots and histograms.

2. The initial data set

The sample Area (A), consisting of mature Norway Spruce (Picea abies (L.) Karst.), is located on the Uetliberg between Zurich and Birmensdorf, Switzerland. Every tree in this area of 100 m × 100 m was assessed for, among other factors, total defoliation and DBH. Defoliation was assessed in 5% classes by comparison of tree crowns to the standard photographs of Mueller and Stierlin (1990). DBH was assessed with callipers, the recorded diameter being the mean of two perpendicular measurements. Table 1 summarizes the basic data on these two variables. The quantile-quantile plots for the normal distribution for defoliation and DBH on single trees over all plots in forest Area A (Fig. 1) indicate that the marginal distributions of these two variables are highly non-normal (Chambers et al., 1983).

Table 1
Distribution of defoliation (on a scale of 0% to 100% in steps of 5%) and DBH (to the nearest centimeter) on single trees in all plots in the forest Area A. The table shows the mean, the standard deviation (SD), the total number of trees (Freq.) and some selected quantiles (minimum, 25th percentile, median, 75th percentile, maximum). Only trees with non-missing values for defoliation and DBH were taken.

         Mean     SD   Freq.   Min.   25th   Med.   75th   Max.
Defol    14.9   13.1     416      0      5     10     20    100
DBH      31.8   13.0     425     12     21     31     42     98

For the analysis, forest Area A was partitioned into ten rows and ten columns, giving rise to one hundred 10 m × 10 m subplots. Since the subplots were of fixed areas, the number of trees within a subplot was variable. Denoting the variable of interest by Y, the distribution of the sample mean (T) of Y in samples of trees drawn from Area A under various sampling designs was examined. In particular, the closeness of the mean of T to the population mean M, the closeness of the standard deviation of T to zero and the shape of the distribution of T were examined.

The strategies for selecting the plots described below need not be optimal, although they are simple in that they do not require complete enumeration of all possible combinations of the plots. For example, T may not be an unbiased estimator of M, although the expected value of T will approach M with an increasing number of plots sampled. Moreover, in some of these strategies, all possible plot combinations need not be selected with equal probabilities, and the plots located at the edges of the forest Area A will tend to be selected with probabilities lower than the remaining ones. These properties can be established easily by constructing artificial numerical examples.

In what follows, two subplots p and q will be said to be adjacent if a side of p coincides with a side of q. Also, the indices I and J will denote the row number and the column number of a subplot. Some sampling strategies and the corresponding estimators are described below.


Fig. 2. Sampling Strategies 1 and 2. a-c show Sampling Strategy 1 (T-adj). c shows Sampling Strategy 2 (T-plus). 1, First stage subplot; 2, Second stage subplot.

2.1. Sampling Strategy 1

2.1.1. Sampling two adjacent plots in a two stage sampling scheme

First, one subplot is selected at random out of one hundred. Then a second subplot is selected at random from the ones that are adjacent to the first selected subplot. In particular, there may be two (Fig. 2a), three (Fig. 2b) or four (Fig. 2c) subplots adjacent to the first one, depending on whether the first subplot is at one of the four corners of A, along one of the edges of A or inside A, respectively.
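The two-stage draw described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `adjacent_subplots` and `sample_adjacent_pair`, the 1-to-10 row and column indexing, and the fixed seed are hypothetical choices, not part of the original study (which used S-PLUS).

```python
import random

N_ROWS = N_COLS = 10  # Area A is partitioned into 10 x 10 subplots


def adjacent_subplots(i, j):
    """Return the subplots sharing a side with (i, j); indices run from 1 to 10."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates
            if 1 <= a <= N_ROWS and 1 <= b <= N_COLS]


def sample_adjacent_pair(rng):
    """Two-stage draw: a random first subplot, then a random neighbour of it."""
    first = (rng.randint(1, N_ROWS), rng.randint(1, N_COLS))
    second = rng.choice(adjacent_subplots(*first))
    return first, second


rng = random.Random(0)  # arbitrary seed, used only for reproducibility
print(sample_adjacent_pair(rng))
```

A corner subplot such as (1, 1) has two neighbours, an edge subplot such as (1, 5) has three, and an interior subplot such as (5, 5) has four, matching the three cases of Fig. 2.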

Table 2
Number of subplots (k) and Monte Carlo estimates of the mean and the SD of T-wor(k) (for k = 1, 2, 3, 4, 5, 10, 20, 30, 40, 50), T-plus (k = 5) and T-adj (k = 2) in 1000 simulations for defoliation and DBH. The estimators T-wor(k), T-plus and T-adj are defined by taking the sample mean of defoliation or DBH over all trees in the sampled plots when Sampling Strategies 3, 2 and 1, respectively, are used.

                   Defoliation          DBH
Estimator    k     Mean     SD      Mean     SD
T-wor        1    14.24   6.13     32.46   7.06
T-wor        2    14.60   4.50     31.92   4.60
T-wor        3    14.47   3.54     31.88   3.42
T-wor        4    14.92   3.18     31.98   2.94
T-wor        5    14.84   2.84     31.69   2.48
T-wor       10    14.82   1.93     31.77   1.76
T-wor       20    14.84   1.32     31.78   1.18
T-wor       30    14.86   1.00     31.77   0.91
T-wor       40    14.93   0.81     31.82   0.72
T-wor       50    14.91   0.68     31.76   0.58
T-plus       5    14.86   3.14     31.77   2.92
T-adj        2    14.90   4.61     32.22   4.72


2.2. Sampling Strategy 2

2.2.1. Sampling five adjacent subplots so that they form a plus (+) sign

First, since a plus formation is sought, the four corner subplots are omitted. Since the center subplot determines the set of subplots forming the plus sign, at the first stage the center is chosen by selecting at random from one of the 64 subplots that are completely inside A. Thus the center subplot (I, J) is such that 1 < I, J < 10. Then the second stage subplots have the indices (I - 1, J), (I + 1, J), (I, J - 1), (I, J + 1) (Fig. 2c).
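As a rough illustration of the plus-sign design, the sketch below picks an interior center and returns the five subplots. The function name `sample_plus` and the default grid size argument are hypothetical, introduced here only for the example.

```python
import random


def sample_plus(rng, size=10):
    """Pick an interior centre (1 < I, J < size) and return the five plus-sign subplots."""
    i = rng.randint(2, size - 1)
    j = rng.randint(2, size - 1)
    return [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]


# The candidate centres are the 8 x 8 = 64 subplots completely inside Area A.
interior = [(i, j) for i in range(2, 10) for j in range(2, 10)]
print(len(interior))

print(sample_plus(random.Random(0)))  # arbitrary seed for reproducibility
```

Because the center is never on the boundary, all four arms of the plus always fall inside the 10 × 10 grid, so no boundary check is needed in this design.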


2.3. Sampling Strategy 3

2.3.1. Sampling k plots at random and without replacement for various values of k

This leads to k-stage sampling where in stage i (i = 1, 2, ..., k), one subplot is selected at random from the remaining 100 - i + 1 subplots. For illustration, we chose k = 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100.

The estimators T-adj, T-plus and T-wor(k) are defined to be the sample mean of the random variable Y over all trees in the selected subplots in Sampling Strategies 1, 2 and 3 respectively. The estimator of interest (T) is the sample mean of Y over all trees in the selected subplots using one of the Sampling Strategies 1 to 3.

Without loss of generality, the problem can be described as follows. Suppose that in a subplot, m is the sample mean of Y over all trees and n is the total number of trees. Then for the 100 subplots in A, the observations $\{(m_1, n_1), (m_2, n_2), \ldots, (m_{100}, n_{100})\}$ on the vector random variable (m, n) are assumed to have been generated by a random mechanism and are assumed to have a joint probability distribution G on $\mathbb{R}^2$. It is possible that (spatial) correlation may exist between any two pairs $(m_i, n_i)$ and $(m_j, n_j)$, i, j = 1, 2, ..., 100. No special assumption is made on the shape of the probability distribution function G, and all inferences made will be conditional on $\{(m_1, n_1), (m_2, n_2), \ldots, (m_{100}, n_{100})\}$.

Fig. 3. Monte Carlo estimates from 1000 simulations of the mean of T-wor(k) plotted against k, for total defoliation and DBH. The horizontal line shows the mean over all trees.

Fig. 4. Monte Carlo estimates from 1000 simulations of the standard deviation of T-wor(k) plotted against the reciprocal of the square root of k, with a cubic polynomial through zero for total defoliation and a quadratic polynomial through zero for DBH drawn through the plotted points.
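The k-stage draw of Sampling Strategy 3 can be written out stage by stage, as in the sketch below. The function name `sample_without_replacement` and the labelling of subplots by the integers 1 to 100 are assumptions made for illustration only.

```python
import random


def sample_without_replacement(k, rng, n_subplots=100):
    """k-stage draw: at stage i, pick one of the remaining n_subplots - i + 1 subplots."""
    remaining = list(range(1, n_subplots + 1))  # hypothetical labels 1..100
    chosen = []
    for _ in range(k):
        pick = rng.randrange(len(remaining))  # uniform over what is left
        chosen.append(remaining.pop(pick))
    return chosen


rng = random.Random(0)  # arbitrary seed for reproducibility
print(sample_without_replacement(5, rng))
```

In practice the standard library call `random.sample(population, k)` performs the same kind of draw; the explicit loop is shown only because it mirrors the stage-wise description in the text. With k = 100 every subplot is selected, which is why that case reproduces the population mean.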

The sample mean T of Y over all trees in the selected subplots can be defined as follows. In particular, let $\{(m_{(1)}, n_{(1)}), (m_{(2)}, n_{(2)}), \ldots, (m_{(k)}, n_{(k)})\}$ be the observations from the k subplots chosen using one of the sampling schemes. Then define T to be a function from $\mathbb{R}^2 \times \mathbb{R}^2 \times \cdots \times \mathbb{R}^2$ (k times) to $\mathbb{R}$ as:

$$T = \frac{\sum m_{(i)} n_{(i)}}{\sum n_{(i)}}$$

where the summation is over i = 1, 2, ..., k. The exact distribution of T depends on G and, given $\{(m_1, n_1), (m_2, n_2), \ldots, (m_{100}, n_{100})\}$, the conditional distribution of T may be approximated by a normal distribution (using the central limit theorem) under suitable regularity conditions when the sum $\sum n_{(i)}$ is large. In general, however, the exact distribution of T becomes complicated and asymptotic approaches such as an Edgeworth type expansion (Babu and Singh, 1985) become necessary, although even then, convergence to the asymptotic limit would normally require, among others, a large number of plots to be sampled. In the current context, our interest lies in making no assumption about G, and in particular, we allow for possible spatial correlations in observations between and within subplots. In principle it is possible to compute directly the distributions of T-adj, T-plus and T-wor(k), k = 1, 2. For others, simulation is the only reasonable way. For simplicity, we used the Monte Carlo method (e.g. Thisted, 1991) for all estimators.

Fig. 5. Normal quantile-quantile plots and histograms for T-wor(k) for k = 1, 2, 3, 4 from 1000 simulations, for total defoliation and DBH.

The Monte Carlo simulation simply calculates the value of T a large number (N) of times in independent (pseudo-random) simulations. During the rth simulation (r = 1, 2, ..., N), k subplots are selected using the sampling scheme of interest. Let the observations in these subplots be $\{(m_{r(1)}, n_{r(1)}), (m_{r(2)}, n_{r(2)}), \ldots, (m_{r(k)}, n_{r(k)})\}$; then T is calculated as:

$$T_r = \frac{\sum m_{r(i)} n_{r(i)}}{\sum n_{r(i)}}, \quad r = 1, 2, \ldots, N$$
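A minimal sketch of this Monte Carlo loop, for sampling without replacement, is given below. The data are synthetic stand-ins generated on the spot, not the Uetliberg measurements, and the names `pooled_mean` and `monte_carlo_T` are hypothetical. The sketch also assumes every subplot holds at least one tree, since a draw consisting only of empty subplots would leave T undefined.

```python
import random
import statistics


def pooled_mean(subplots):
    """T = sum(m_i * n_i) / sum(n_i): the mean of Y over all trees in the chosen subplots."""
    total_trees = sum(n for _, n in subplots)
    return sum(m * n for m, n in subplots) / total_trees


def monte_carlo_T(data, k, n_sim, rng):
    """Return n_sim simulated values of T when k subplots are drawn without replacement."""
    return [pooled_mean(rng.sample(data, k)) for _ in range(n_sim)]


rng = random.Random(42)  # arbitrary seed for reproducibility
# Synthetic (m, n) pairs for 100 subplots: subplot mean and tree count (n >= 1 by assumption).
data = [(rng.uniform(5, 25), rng.randint(1, 8)) for _ in range(100)]

draws = monte_carlo_T(data, k=5, n_sim=1000, rng=rng)
print(statistics.mean(draws), statistics.stdev(draws))
```

Note that T weights each subplot mean by its tree count, so it is the mean over trees rather than the mean of subplot means. For example, subplots (m, n) = (10, 1) and (20, 3) give T = 70 / 4 = 17.5, not 15.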

where the summation is over i = 1, 2, ..., k. The (pseudo) random observations $\{T_1, T_2, \ldots, T_N\}$ on T are conditionally independent and are then used to study the empirical cumulative distribution function (ecdf) as defined by

$$F_N(t) = \sum I\{T_r \le t\} / N$$

where I{·} is the indicator function and the summation is over r = 1, 2, ..., N. Note that for every fixed t, $F_N(t)$ is a random variable taking values between 0 and 1. In random sampling from a population, properties of the ecdf are well known (see e.g. Serfling, 1980). In particular, for every fixed t, it is at least unbiased in small samples and strongly consistent (so that any continuous functional of this ecdf would also be a strongly consistent estimator of the corresponding population counterpart) in large samples. When the random samples from the population are generated using pseudo-random numbers (such as in Monte Carlo), the ecdf so created may be called a Monte Carlo ecdf. Due to randomness, different sets of simulations will produce different Monte Carlo distributions, although the precision may be increased with increasing N. We chose N = 1000. The pseudo-random numbers were generated using S-PLUS version 3.1 software (StatSci, 1992).
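The ecdf defined above amounts to counting how many simulated values fall at or below t. A small Python sketch follows; the function name `ecdf` and the toy input values are illustrative assumptions.

```python
from bisect import bisect_right


def ecdf(values):
    """Return the function F_N with F_N(t) = #{r : T_r <= t} / N."""
    ordered = sorted(values)
    n = len(ordered)

    def F(t):
        # bisect_right gives the number of sorted observations that are <= t
        return bisect_right(ordered, t) / n

    return F


F = ecdf([3.0, 1.0, 2.0, 2.0])  # toy values standing in for T_1, ..., T_N
print(F(0.5), F(2.0), F(3.0))   # 0.0 0.75 1.0
```

Sorting once and using binary search keeps each evaluation of F_N(t) cheap, which is convenient when the ecdf is evaluated on a fine grid of t values for plotting.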

3. Results

Table 2 gives the Monte Carlo estimates of the means and the standard deviations of T-adj (based on two subplots), T-plus (based on five subplots) and T-wor(k) (based on k subplots, k = 1, ..., 5, 10, ..., 50), for defoliation and DBH. As expected, with increasing number of subplots, the standard deviations decrease so that the means fluctuate more closely around the target values (14.9 for defoliation and 31.8 for DBH). For T-wor(k), for defoliation, the target value is reached from below whereas for DBH, it is reached from above. The above features can also be seen in Figs. 3 and 4, where Monte Carlo estimates of, respectively, the means and the standard deviations of T-wor(k) are given.

As an eye-estimate of how the standard deviations of T-wor(k) change with varying k, in Fig. 4, polynomials were fitted through the points $(1/\sqrt{k}, s_k)$, where k is the number of subplots sampled and $s_k$ is the Monte Carlo estimate of the standard deviation of T-wor(k). The fitting methods were largely exploratory, in that the fitted equations were only estimates and valid confidence bands were not worked out. In order to obtain valid confidence bands for the curves, further simulation work would be needed since the usual least squares formulas for regression limits cannot be applied here. This, being beyond the scope of the present work, was not pursued. The results indicate that for large values of k, the standard deviation of T-wor(k) converges to zero at a rate that is proportional to $1/\sqrt{k}$. For small values of k, other correction terms involving higher powers of $1/\sqrt{k}$, in particular up to cubic power for defoliation and up to quadratic power for DBH, may be necessary. In-depth examination of this particular property, however, needs to be carried out in future research.

Fig. 5 examines how fast the distributions of T-wor(k) converge to normality. This is done by plotting normal quantile-quantile plots and histograms for the Monte Carlo distributions of T-wor(k). Only up to k = 4 is shown for both defoliation and DBH. The plots show that compared to the raw data distribution (Fig. 1), the distributions of T-wor(k) converge to normality rather fast as k increases. In other words, for moderately large k, the distribution of T-wor(k) may be reasonably approximated by a normal distribution. Fig. 6 gives boxplots of the Monte Carlo distributions of T-wor(k) for varying k. k = 100 gives the target value, namely the mean of the population (in this case, the raw data set).

Fig. 6. Boxplots for T-wor(k) for k = 1, 2, 3, 4, 5, 10, 20, 30, 40, 50 and 100 in the forest Area A. k = 100 corresponds to taking the sample mean over all trees.
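The large-k behaviour can be checked roughly against the reported DBH standard deviations of T-wor(k) for k = 10 to 50 (7th to 10th rows of the T-wor block in Table 2, plus k = 10). The sketch below fits only a single term, s_k ≈ c / √k, by least squares through the origin. It is a simplification for illustration and not the cubic or quadratic polynomial shown in Fig. 4; the variable names are hypothetical.

```python
import math

# Monte Carlo SDs of T-wor(k) for DBH as reported in Table 2 (k >= 10 only).
sd_dbh = {10: 1.76, 20: 1.18, 30: 0.91, 40: 0.72, 50: 0.58}

x = [1 / math.sqrt(k) for k in sd_dbh]  # reciprocal square root of k
y = list(sd_dbh.values())

# Least-squares slope of a line through the origin: c = sum(x*y) / sum(x*x).
c = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Compare reported SDs with the one-term approximation c / sqrt(k).
for k, s in sd_dbh.items():
    print(k, s, round(c / math.sqrt(k), 2))
```

The one-term fit tends to overshoot the reported SD at the largest k, which is expected under sampling without replacement from a finite set of 100 subplots: the SD must fall to zero at k = 100, faster than 1/√k alone would predict.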

Fig. 7. Empirical quantile-quantile plots for (i) T-wor(2) and T-adj and (ii) T-wor(5) and T-plus from 1000 simulations for total defoliation and DBH.

In terms of the amount of work involved in collecting the data, the sampling design for T-wor(2) is comparable to that of T-adj, since in both cases only two plots are sampled. Similarly, T-plus and T-wor(5) are comparable, since five plots are sampled in either case. Fig. 7 compares the distributions of these estimators via empirical quantile-quantile plots with robust lines drawn through them. In these plots, the order statistics from one distribution are plotted against the corresponding order statistics of the other distribution. If these distributions were the same, one would expect to see a straight line pattern in the plotted points. It is interesting to note that the distributions of T-wor(2) and T-adj are essentially the same (except perhaps near the extreme tails). The plotted points lie almost exactly on the line y = x. The differences in the tails of the two distributions are reflected particularly in the means (see Table 2). This implies that practically no information is lost by sampling two plots that are adjacent to each other as compared to sampling them randomly from the population. The probability distributions of T-plus and T-wor(5), on the other hand, differ more (Fig. 7). This indicates that appropriate caution may have to be taken regarding the assumptions about the probability distribution of an estimator that is based on a moderately large number of adjacent plots.
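The construction of an empirical quantile-quantile plot, pairing the order statistics of two sets of simulated values, can be sketched as follows. The function name `empirical_qq` is hypothetical, the two samples are synthetic normal draws rather than simulated T values, and the sketch assumes equal sample sizes (as with N = 1000 simulations per estimator).

```python
import random


def empirical_qq(sample_a, sample_b):
    """Pair the order statistics of two equal-sized samples for a Q-Q plot."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(sample_a), sorted(sample_b)))


rng = random.Random(7)  # arbitrary seed for reproducibility
# Synthetic stand-ins for two sets of 1000 simulated estimator values.
a = [rng.gauss(30, 4) for _ in range(1000)]
b = [rng.gauss(30, 4) for _ in range(1000)]

pairs = empirical_qq(a, b)
print(pairs[0], pairs[len(pairs) // 2], pairs[-1])
```

When the two underlying distributions agree, the middle pairs lie close to the line y = x while the first and last pairs scatter more, which is the pattern described for T-wor(2) and T-adj.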

4. Conclusions

A number of sampling designs are possible when looking at forest health within a stand. The standard European assessment, involving a fixed sample of 24 predominant, dominant and co-dominant trees, may be a valid compromise between cost and scientific validity for regional scale investigations of crown condition, but a more appropriate sampling design is required for plots where crown condition is being investigated in detail. Such a sampling design will clearly be dependent on the nature of the forest being sampled, and more research is required to see whether the results presented here are applicable to other forest types. In this study, considerable variation was present over a 1 ha area, and different sampling designs were therefore variable in their efficiency in predicting the mean crown transparency and DBH. The results suggest that in order to obtain a better picture of stand condition, it may be as valid to enlarge sample plots as to randomly allocate extra plots within a stand. The latter obviously will involve greater unit costs because of the time taken to transfer between sample plots.

Acknowledgements

We gratefully acknowledge the comments of the referees that led to significant improvements of the presentation of the paper. We also thank our colleagues from WSL, namely, Hans Item for computer and software support, Andreas Schwyzer for data handling, Erik Rösler and Johann Wey for further help and information related to the data, Michèle Kaennel for help with some of the references, and Christian Hug and Hans-Ruedi Spengler for undertaking the field assessments. In addition, we wish to thank Hans-Ruedi Kuensch (Seminar fuer Statistik, Zuerich) for helpful remarks on an earlier version of the paper.

References

Babu, G.J. and Singh, K., 1985. Edgeworth expansions for sampling without replacement from finite population. J. Mult. Anal., 17: 261-278.

Birnbaum, Z.W. and Sirken, M.G., 1965. Design of sample surveys to estimate the prevalence of rare diseases: Three unbiased estimates. Public Health Serv. Publ. No. 1000, Series 2, No. 11.

Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A., 1983. Graphical methods for data analysis. Wadsworth and Brooks, Pacific Grove, CA, 395 pp.

Innes, J.L. and Boswell, R.C., 1990. Reliability, presentation, and relationships among data from inventories of forest condition. Can. J. For. Res., 20: 790-799.

Kalton, G. and Anderson, D.W., 1986. Sampling rare populations. J. R. Stat. Soc. A, 149: 65-82.

Mandallaz, D., Schlaepfer, S. and Arnould, J., 1987. Dépérissement des forêts: Echantillonage simple ou par satellites? Schweizerische Zeitschrift fuer Forstwesen, 138: 277-292.

Mueller, E. and Stierlin, H.R., 1990. Sanasilva tree crown photos with percentages of foliage loss. Swiss Federal Institute for Forest, Snow and Landscape Research, Birmensdorf, Switzerland.

Naslund, M., 1944. Antalet provtrad och kubikmassans noggrannhet vid stamrakning av skog. Medd. Skogforsokanst., 34: 285-308.

Roesch, F.A., 1993. Adaptive cluster sampling for forest inventories. For. Sci., 39: 655-669.

Salas Gonzalez, R., Houllier, F., Lemoine, B. and Pierrat, J.C., 1993. Représentativité locale des placettes d'inventaire en vue de l'estimation de variables dendrométriques de peuplement. Ann. Sci. For., 50: 469-485.

Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, NY, 371 pp.

StatSci, 1992. S-Plus version 3.1. Statistical Sciences, Inc., Seattle, WA.

Sture, S., 1987. Precision of volume and volume increment estimates. Scand. J. For. Res., 2: 379-387.

Thisted, R.A., 1991. Elements of Statistical Computing. Chapman and Hall, NY, 427 pp.

Thompson, S.K., 1990. Adaptive cluster sampling. J. Amer. Stat. Assoc., 85: 1050-1059.

Thompson, S.K., 1991. Adaptive cluster sampling: designs with primary and secondary units. Biometrics, 47: 1103-1115.