Double sampling with importance sampling to eliminate bias in tree volume estimation of the centroid method


Forest Ecology and Management 104 (1998) 77–88

Michael S. Williams a,*, Harry V. Wiant Jr. b

a Multiresource Inventory Techniques, Rocky Mountain Forest and Range Experiment Station, USDA Forest Service, 240 W. Prospect Road, Fort Collins, CO 80526-2098, USA
b 113 Scenery Drive, Morgantown, WV 26505, USA

Received 12 December 1996; accepted 18 August 1997

Abstract

The bias associated with the centroid method for estimating tree volume was removed by double sampling using the centroid method in the first stage of estimation and importance sampling in the second. The bias and efficiency of various sampling methods were investigated in a number of different simulation studies. Three data sets, two of ponderosa pine and the other of mixed species, were used to evaluate the double sampling estimator. We assumed that for small area estimates, such as timber sales, every tree in the population could be visited in the first stage and an estimate of bole volume generated using the centroid method. Both simple random and 3P sampling were studied as methods for selecting the second-stage sample trees. For the estimation of volume over a large area, we chose point-3P sampling for applying the double sampling estimator. The bias associated with using the centroid method alone was reduced in every case. No consistent trends for the three data sets were found when efficiency was measured in terms of the simulation variance. The double sampling estimator was the most efficient in some cases and least efficient in others. © 1998 Elsevier Science B.V.

Keywords: Efficiency; 3P; Point-3P; Simple random sampling

* Corresponding author. Tel.: +1-970-498-1335; fax: +1-970-498-1010; e-mail: [email protected].

0378-1127/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0378-1127(97)00257-0

1. Introduction

An important process underpinning most forest inventories is estimation of tree volume. Common methods of estimation include the use of volume tables and functions. A major drawback with these is the need to validate the methods of estimation whenever they are applied to data which differ from those used to generate the volume tables or functions. An example would be the application of a regional volume table or taper function to a specific timber sale. An alternative is to divide the tree into short sections and sum the measured volumes of all the sections. This requires the labor-intensive task of felling the tree or measuring the bole with an optical instrument such as the Barr and Stroud optical dendrometer, Relaskop, Telerelaskop or laser dendrometer.

Horizontal importance sampling, introduced by Gregoire et al. (1986), provides an unbiased estimate of the volume of a tree bole based on a proxy taper function $\hat{A}(h)$, which is used to estimate bole volume between the heights $H_l$ and $H_u$ ($H_l < H_u$) on the stem. The bole volume derived from the proxy function is given by $P = \int_{H_l}^{H_u} \hat{A}(h)\,dh$. Sample heights $h_k$, $k = 1, 2, \ldots, m$, are selected randomly proportional to the estimated distribution of bole volume as determined by the proxy taper function, i.e.,

$$u_k = \int_{H_l}^{h_k} \frac{\hat{A}(h)}{P}\,dh,$$

where $u_k \sim \mathrm{Uniform}[0,1]$. The proxy taper function estimate is then adjusted by the ratio of the true cross-sectional areas $A(h_k)$ at randomly chosen heights $h_k$ to the estimated cross-sectional area $\hat{A}(h_k)$ at the same height as predicted by the proxy function, i.e.,

$$\hat{y} = \frac{1}{m}\sum_{k=1}^{m}\frac{A(h_k)}{\hat{A}(h_k)}\,P.$$

The true sampling variance for importance sampling is:

$$\sigma^2 = \frac{1}{m}\int_{H_l}^{H_u}\frac{\hat{A}(h)}{P}\left(\frac{A(h)}{\hat{A}(h)}\,P - y\right)^2 dh,$$

where $y$ is the true volume between $H_l$ and $H_u$. A sample-based estimator is:

$$\hat{\sigma}^2 = \frac{1}{m(m-1)}\sum_{k=1}^{m}\left(\frac{A(h_k)}{\hat{A}(h_k)}\,P - \hat{y}\right)^2.$$

For the estimation of bole volume on standing trees, the number of upper stem measurements has generally been $m = 1$ or 2 (Valentine et al., 1992; Wiant et al., 1996). The antithetic point, $u_2 = 1 - u_1$, has been suggested (Van Deusen and Lynch, 1987; Kleinn, 1993; Gregoire et al., 1993) when $m \geq 2$. While the use of the antithetic point can greatly increase the precision of estimated volume, it ensures that diameters above the centroid of the bole must be measured on every tree. While this is of little concern when trees are felled before measurement, the remote measurement of upper stem diameters can greatly increase the cost of the survey.

Wiant et al. (1989) compared the efficiency of volume estimates derived by importance sampling with volumes derived from intensive dendrometry for an Australian radiata pine (Pinus radiata D. Don) stand. Volume estimates based on trees selected by 3P sampling (Grosenbaugh, 1964) were comparable for the two methods despite a 96% reduction in dendrometry for importance sampling.

An alternative to importance sampling is the centroid method originally proposed by Wood et al. (1990). This estimator is identical to importance sampling with $m = 1$, but the selection of the measurement point at random is eschewed. Instead the height $h$, at which the cross-sectional area is measured, is constrained to be the estimated centroid, i.e., the height that divides the predicted bole volume in half. This yields the volume estimate:

$$x = \frac{A(h_C)}{\hat{A}(h_C)}\,P,$$

where $h_C$ is such that

$$0.5 = \int_{H_l}^{h_C}\frac{\hat{A}(h)}{P}\,dh.$$
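The two single-tree estimators can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes a proxy cross-sectional area that declines linearly from the area at $H_l$ to zero at the tree tip, and the function names, the cone-shaped "true" bole and all numeric values are hypothetical. Because the proxy area is zero at the tip, the sketch keeps $H_u$ below total height.

```python
import math
import random


def proxy_area(h, area_low, h_low, h_total):
    """Proxy cross-sectional area at height h (assumed linear, zero at the tip)."""
    return area_low * (h_total - h) / (h_total - h_low)


def proxy_volume(area_low, h_low, h_up, h_total):
    """Integral of proxy_area from h_low to h_up, i.e. the proxy volume P."""
    slope = area_low / (h_total - h_low)
    return 0.5 * slope * ((h_total - h_low) ** 2 - (h_total - h_up) ** 2)


def height_at_fraction(u, h_low, h_up, h_total):
    """Height below which a fraction u of the proxy volume lies (inverse CDF)."""
    g = (h_total - h_low) ** 2 - (h_total - h_up) ** 2
    return h_total - math.sqrt((h_total - h_low) ** 2 - u * g)


def importance_estimate(true_area, area_low, h_low, h_up, h_total, m, rng):
    """Importance sampling estimate of bole volume from m random heights."""
    p = proxy_volume(area_low, h_low, h_up, h_total)
    ratios = []
    for _ in range(m):
        h = height_at_fraction(rng.random(), h_low, h_up, h_total)
        ratios.append(true_area(h) / proxy_area(h, area_low, h_low, h_total))
    return p * sum(ratios) / m


def centroid_estimate(true_area, area_low, h_low, h_up, h_total):
    """Centroid method: one measurement at the height that halves the proxy volume."""
    p = proxy_volume(area_low, h_low, h_up, h_total)
    h_c = height_at_fraction(0.5, h_low, h_up, h_total)
    return p * true_area(h_c) / proxy_area(h_c, area_low, h_low, h_total)


# Hypothetical tree: a cone-shaped bole, 20 m tall, estimated between 1 m and 15 m.
H_TOTAL, H_LOW, H_UP, AREA_LOW = 20.0, 1.0, 15.0, 0.07


def cone_area(h):
    """Stand-in for the measured cross-sectional area A(h) (m^2)."""
    return AREA_LOW * ((H_TOTAL - h) / (H_TOTAL - H_LOW)) ** 2


rng = random.Random(1)
y_hat = importance_estimate(cone_area, AREA_LOW, H_LOW, H_UP, H_TOTAL, m=1, rng=rng)
x = centroid_estimate(cone_area, AREA_LOW, H_LOW, H_UP, H_TOTAL)
print(f"importance sampling (m=1): {y_hat:.4f} m^3, centroid method: {x:.4f} m^3")
```

With one random height the importance sampling estimate varies from draw to draw but is unbiased, whereas the centroid estimate is fixed and differs from the true volume whenever the proxy centroid and the actual centroid disagree.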

This method was originally referred to as centroid sampling, but because no random sampling of the measurement height occurs, we refer to this estimation technique as the centroid method. Two consequences of the nonrandom measurement height ($h_C$) are: (i) estimates derived from the centroid method are unbiased only when the estimated height of the centroid and the actual height of the centroid agree, which may not be true in a given case; (ii) the variance associated with $x$ is zero, with the bias, $\delta = y - x$, being the only contributing factor to the mean squared error of $x$. Empirical tests of the centroid method indicate that this bias tends to be small. In a study of four large data sets, Wiant et al. (1996) found the bias of the centroid method to be small (-0.2 to -4.1% of actual volume) but statistically significant.

The centroid method has two significant advantages over importance sampling, namely: (i) estimates of tree volume derived from the centroid method tend to be more precise than volume estimates derived from importance sampling when $m = 1$ (Wood and Wiant, 1992; Wiant et al., 1996); (ii) the point of the cross-sectional area measurement is usually accessible and can often be measured from the ground, while the importance sampling points can fall in the upper crown of the tree.

We assume that the relationship between the importance sampling and centroid method estimates of tree volume satisfies the model

$$\hat{y} = \beta x + \varepsilon, \qquad (1)$$

where $\varepsilon$ has mean zero and variance proportional to $x^{\lambda}$. Under such conditions, the ratio estimator is model-unbiased, with the estimator being a best linear unbiased estimator when $\lambda = 1$ (Section 6.7 of Cochran, 1977). We speculated that double sampling for ratio estimation (cf. Section 12.9 of Cochran, 1977; Section 14.1 of Thompson, 1992), where the centroid method and importance sampling are used to estimate bole volumes in the first and second stages, respectively, may be a useful tool for reducing both the bias and variability of volume estimates.

Thus, the focus of this paper was to propose and test methods of reducing the bias associated with the centroid method using double sampling for ratio estimation. We studied the performance of the double sampling estimator compared with estimators based solely on true volumes, or estimated volumes derived from the centroid method or importance sampling. Estimator bias, standard deviation and mean square error were used to compare the various estimation methods.

2. Estimation of total volume with importance sampling and the centroid method

A primary goal of many forest inventories is to estimate the total volume of $N$ boles based on a sample of size $n$. When importance sampling is used to estimate the volume of each of the sample trees, the resulting estimator is multistaged, with the final stage being the importance sampling estimation of bole volume. For the centroid method, no additional stage is added to the estimator, but the resulting estimator is augmented by the error between the estimated and actual volume. Estimators for some common sampling designs are given in Appendices A–C and will be used in conjunction with the double sampling estimator later in the paper.

3. Data description

Three data sets were used for the testing, two of which contained only ponderosa pine (Pinus ponderosa Laws.) and one containing a mixture of species including aspen (Populus tremuloides Michx.), black poplar (Populus balsamifera L.), white spruce (Picea glauca Voss), lodgepole pine (Pinus contorta Dougl.), and balsam fir (Abies balsamea Mill.). These data sets are referred to as POND1, POND2 and MIX. Further description of the measurement process for the MIX data set is given in Alberta Environmental Protection, Land and Forest Service (1988). Diameter measurements outside bark were derived from standing trees using a Barr and Stroud FP-15 dendrometer for POND1, and inside bark on felled trees for POND2 and MIX. Gross volumes were generated by summing the volumes of each section using Smalian's formula (Husch et al., 1982).

The gross volumes for the proxy bole were also calculated for each tree for comparison with the true values. The estimated total volume, derived from the proxy function, was compared to the true total volume. The differences between the proxy bole and true volumes were -4, 27 and 26% for the POND1, POND2 and MIX data sets, respectively.

Cross-sectional areas at any point along the bole were needed for the simulation study. These were estimated by finding the section that contained the given height and interpolating between the two end measurements. The true importance sampling variance, $\sigma^2$, was derived for each tree using numerical integration.

In one of the simulation studies, variable radius point sampling was employed. This method requires trees to be assigned a fixed location within a given sample area. Because none of the available data sets comprised mapped populations, trees were assigned random coordinates within a square boundary. The areas covered by the simulated mapped population were 1, 1, and 4 ha for the POND1, POND2 and MIX data sets, respectively.
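The gross-volume and interpolation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: Smalian's formula is taken as the mean of the two end areas times section length, the interpolation is assumed to be linear in cross-sectional area (the text does not say whether area or diameter was interpolated), and the function names and example measurements are hypothetical.

```python
from bisect import bisect_right


def smalian_volume(heights, areas):
    """Sum of section volumes; each section is mean end area times length."""
    return sum(
        0.5 * (areas[i] + areas[i + 1]) * (heights[i + 1] - heights[i])
        for i in range(len(heights) - 1)
    )


def interpolated_area(h, heights, areas):
    """Cross-sectional area at height h, interpolated within its section.

    Assumes heights are strictly increasing and that area varies linearly
    between the two end measurements of a section.
    """
    if not heights[0] <= h <= heights[-1]:
        raise ValueError("height lies outside the measured bole")
    j = min(bisect_right(heights, h), len(heights) - 1)
    h0, h1 = heights[j - 1], heights[j]
    w = (h - h0) / (h1 - h0)
    return (1.0 - w) * areas[j - 1] + w * areas[j]


# Hypothetical sectioned bole: measurement heights (m) and areas (m^2).
heights = [0.0, 2.0, 4.0]
areas = [0.10, 0.08, 0.04]
print(smalian_volume(heights, areas), interpolated_area(1.0, heights, areas))
```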


Descriptive statistics for all three data sets are listed in Table 1.

Table 1
Summary statistics for the three data sets

Data set  Source      Number of trees  Number of sections  DBH (cm)     H (m)       Gross volume (m³)  Proxy volume (m³)
POND1     USFS        364              2506                30.1 (10.6)  16.0 (5.3)  0.76 (0.79)        0.75 (0.73)
POND2     USFS        185              2010                31.4 (8.4)   17.8 (4.1)  0.59 (0.55)        0.73 (0.75)
MIX       Alberta FS  704              6995                22.4 (9.6)   19.7 (5.3)  0.41 (0.38)        0.51 (0.49)

Standard deviations are listed in parentheses.

4. Methods used to test the double sampling estimator

The most general form of the double sampling ratio estimator is given by

$$\hat{V}_R = \frac{\sum_{i=1}^{n_2}\hat{y}_i/(\pi_{1i}\pi_{2i})}{\sum_{i=1}^{n_2}x_i/(\pi_{1i}\pi_{2i})}\,\sum_{i=1}^{n_1}\frac{x_i}{\pi_{1i}}, \qquad (2)$$

where $n_1$ and $n_2$ are the first- and second-stage sample sizes, respectively, $\pi_{1i}$ and $\pi_{2i}$ are the probabilities of selection for the first and second stage, respectively, and $x_i$ and $\hat{y}_i$ are the volume estimates for the centroid method and importance sampling, respectively.

Methods for selecting the first- and second-stage samples vary depending on the type of survey being conducted. For smaller, high-value timber stands, each tree may be visited. We refer to this group of sampling designs as 100% tally methods. For surveys conducted over a larger area, variable radius or fixed area sampling may be used for the first stage, with second-stage samples drawn using either unequal or equal probability methods, such as 3P or simple random sampling.

We studied 100% tally sampling designs and large area surveys using a combination of simulation and analytical results. The simulation results were used to validate the analytical results listed in Appendices A–C and to ensure that the ratio estimator produced truly unbiased estimates of total volume, i.e., that the model assumption given in Eq. (1) was correct. A number of alternative estimators based on true volumes and volumes derived using the centroid method or importance sampling were also included in the study for comparison.

For 100% tally surveys, volume estimates derived by the centroid method were made for all trees ($n_1 = N$, $\pi_{1i} = 1$). Importance sampling volume estimates were generated for a subsample of size $n_2$. Simple random and 3P sampling were employed to select the second-stage sample, yielding probabilities of selection $\pi_{2i} = n_2/N$ and $\pi_{2i} = n_e D_i^2 H_i/\sum_{j=1}^{N} D_j^2 H_j$, respectively. The efficiency and bias of the double sampling estimator, $\hat{V}_R$, were compared with those of true volume ($\hat{V}$), the centroid method ($\hat{V}_C$), and importance sampling ($\hat{V}_I$), all of which were derived solely from the second-stage information. The estimator used for the comparison of the 3P samples was the adjusted 3P estimator (cf. Schreuder et al., 1968), i.e.,

$$\hat{V} = \frac{n_e}{n_2}\sum_{i=1}^{n_2}\frac{y_i}{\pi_{2i}}, \qquad \hat{V}_C = \frac{n_e}{n_2}\sum_{i=1}^{n_2}\frac{x_i}{\pi_{2i}} \qquad \text{and} \qquad \hat{V}_I = \frac{n_e}{n_2}\sum_{i=1}^{n_2}\frac{\hat{y}_i}{\pi_{2i}}.$$

The variances of $\hat{V}$, $\hat{V}_I$ and $\hat{V}_C$ (Appendices A and B) were compared to the variance of the double sampling estimator, which is

$$\mathrm{Var}[\hat{V}_R] \approx \frac{N(N-n)}{n}\sum_{i=1}^{N}\frac{(y_i - R x_i)^2}{N-1} + \frac{N}{n}\sum_{i=1}^{N}\sigma_i^2$$

for simple random sampling and

$$\mathrm{Var}[\hat{V}_R] \approx \sum_{i=1}^{N}\frac{(y_i - R x_i)^2}{\pi_i} - \sum_{i=1}^{N}(y_i - R x_i)^2 + \sum_{i=1}^{N}\frac{\sigma_i^2}{\pi_i}$$

for 3P sampling.
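As a concrete reading of the double sampling ratio estimator in Eq. (2), the following sketch computes it for a 100% tally design with a simple random second-stage sample. It is illustrative only: the function name, the six hypothetical centroid volumes and the choice of second-stage trees are assumptions, and the importance sampling estimates are set to exactly 1.05 times the centroid estimates so that the result is easy to check by hand.

```python
def double_sampling_ratio_total(x_first, pi1_first, x_second, y_hat_second,
                                pi1_second, pi2_second):
    """Double sampling ratio estimator of total volume.

    x_first, pi1_first: centroid volumes and first-stage selection
        probabilities for every first-stage tree.
    x_second, y_hat_second: centroid and importance sampling volumes for the
        second-stage trees.
    pi1_second, pi2_second: first- and second-stage selection probabilities
        of the second-stage trees.
    """
    first_stage_total = sum(x / p for x, p in zip(x_first, pi1_first))
    weights = [1.0 / (p1 * p2) for p1, p2 in zip(pi1_second, pi2_second)]
    numerator = sum(w * y for w, y in zip(weights, y_hat_second))
    denominator = sum(w * x for w, x in zip(weights, x_second))
    return numerator / denominator * first_stage_total


# Hypothetical 100% tally: all N = 6 trees get a centroid estimate (pi_1 = 1).
x_all = [0.5, 0.8, 1.2, 0.3, 0.9, 1.1]
N = len(x_all)

# Hypothetical simple random subsample of n_2 = 2 trees (pi_2 = n_2 / N).
subsample = [1, 4]
n2 = len(subsample)
x_sub = [x_all[i] for i in subsample]
y_hat_sub = [1.05 * x for x in x_sub]  # importance sampling estimates (made up)

v_r = double_sampling_ratio_total(
    x_all, [1.0] * N, x_sub, y_hat_sub, [1.0] * n2, [n2 / N] * n2
)
print(v_r)
```

Under equal second-stage probabilities the weights cancel, so the estimator reduces to the subsample ratio of importance sampling to centroid volume multiplied by the tallied centroid total.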


Table 2
Formulae used for estimating volumes using a single upper stem diameter measurement

Estimator                        Formula                                                      Source
Proxy volume ($P$)               $P = (aH_u + 0.5bH_u^2) - (aH_l + 0.5bH_l^2)$                Williams and Wiant, 1994
Importance sampling ($\hat{y}$)  $\frac{1}{m}\sum_{k=1}^{m}\frac{A(h_k)}{\hat{A}(h_k)}\,P$    Gregoire et al., 1986
Centroid method ($x$)            $\frac{A(h_C)}{\hat{A}(h_C)}\,P$                             Wood et al., 1990

$a = A(H_l) - bH_l$.
$A(H_l)$ = cross-sectional area at $H_l$.
$H_l$ = lower height.
$b = -A(H_l)/(H - H_l)$, so that the proxy area falls to zero at total height.
$H$ = total height.
$H_u$ = upper height.
$m$ = the number of upper stem measurements for importance sampling.
$h_k = H - ((H - H_l)^2 - ug)^{0.5}$, the importance sampling heights, $k = 1, \ldots, m$.
$A(h_k)$ = the cross-sectional area at importance sampling height $k$.
$u \sim \mathrm{Uniform}[0,1]$.
$g = 2H(H_u - H_l) + (H_l^2 - H_u^2)$.
$\hat{A}(h_k) = a + bh_k$.
$A(h_C)$ = cross-sectional area at the centroid method height $h_C$.
$h_C = H - ((H - H_l)^2 - 0.5g)^{0.5}$.
$\hat{A}(h_C) = a + bh_C$.

For the large area survey situation, we chose the point-3P sampling design of Grosenbaugh (1971, 1979), where the first-stage sample was selected using variable radius point sampling and the second-stage subsample was selected by 3P sampling. The second-stage probabilities of selection were derived using the volume estimates from the centroid method ($\pi_{2i} = n_e x_i/\sum_{j=1}^{n_1} x_j$). The point-3P double sampling estimator was compared with the point-3P estimators for the true volume, centroid method and importance sampling, respectively, i.e.,

$$\hat{V} = \sum_{i=1}^{n_2}\frac{y_i}{\pi_{1i}\pi_{2i}}, \qquad \hat{V}_C = \sum_{i=1}^{n_2}\frac{x_i}{\pi_{1i}\pi_{2i}} \qquad \text{and} \qquad \hat{V}_I = \sum_{i=1}^{n_2}\frac{\hat{y}_i}{\pi_{1i}\pi_{2i}}.$$

A disadvantage of point sampling is the need to calculate joint probabilities of selection in order to determine the approximate variance of the estimator. For this reason, we used Monte Carlo simulation to determine the true variance of all point-3P estimators.

For each data set and survey method, 250,000 samples were drawn, from which estimates of total volume were made using $\hat{V}$, $\hat{V}_R$, $\hat{V}_I$ and $\hat{V}_C$. The bias, standard deviation and mean square error were computed for each estimator. For the 100% tally methods, the simple random sample size was $n_2 = 20$ and the desired 3P sample size was $n_e = 20$. For the large-scale survey, 20 variable radius plots were selected with a desired second-stage 3P sample size of $n_2 = 25$. In every case, the number of upper stem measurements for importance sampling was $m = 1$. The proxy function used for the centroid method and importance sampling is listed in Table 2.

5. Results and discussion

For both the 100% tally and large survey methods, the double sampling procedure ($\hat{V}_R$) essentially eliminated the bias of the centroid method (Tables 3–5). When 3P sampling was used in the 100% tally simulations, there was little difference between the efficiency of the double sampling and importance sampling estimators ($\hat{V}_R$ and $\hat{V}_I$), but the former was slightly less efficient for estimating total volume for all three data sets (Table 3). In every case, the estimator based on the centroid method ($\hat{V}_C$) was substantially more efficient than both the double and importance sampling estimators. The improved efficiency of $\hat{V}_C$ in relation to $\hat{V}_R$ is explained by the difference in efficiency between the adjusted and unadjusted 3P sampling estimators. Analytical results, given in Appendix D, indicate that it is unlikely the double sampling estimator ($\hat{V}_R$) will produce standard errors smaller than the centroid method estimator when only one upper stem measurement ($m = 1$) is used for the importance sampling estimator.

Table 3
Results for the 100% tally methods using 3P sampling to select trees at the second stage

Data set  Statistic           $\hat{V}_R$  $\hat{V}_I$  $\hat{V}_C$  $\hat{V}$
POND1     Bias                0.0          -0.1         -2.6         0.0
          Standard deviation  29.6         29.3         3.7          3.1
          Mean square error   29.6         29.3         4.5          3.1
POND2     Bias                0.4          0.4          2.3          0.0
          Standard deviation  30.4         30.2         3.6          2.4
          Mean square error   30.4         30.2         4.2          2.4
MIX       Bias                0.0          0.0          3.6          0.0
          Standard deviation  6.0          5.7          3.4          2.2
          Mean square error   6.0          5.7          5.0          2.2

All results are expressed as a percentage of the total volume.

When simple random sampling was used, the results were substantially different (Table 4). For the MIX data, the double sampling estimator was substantially more efficient than both the importance and the centroid estimators, its standard deviation being about 36% of the other methods. For the POND data sets, the double sampling estimator was the second most efficient after that of the centroid method.

Table 4
Simulation results for the 100% tally methods using simple random sampling to select trees at the second stage

Data set  Statistic           $\hat{V}_R$  $\hat{V}_I$  $\hat{V}_C$  $\hat{V}$
POND1     Bias                0.2          -0.1         -2.6         0.0
          Standard deviation  41.7         49.6         22.6         22.5
          Mean square error   41.7         49.6         22.8         22.5
POND2     Bias                0.6          0.1          2.1          -0.1
          Standard deviation  16.7         25.3         22.2         19.9
          Mean square error   16.7         25.3         22.3         19.9
MIX       Bias                0.0          -0.1         3.6          0.0
          Standard deviation  7.7          21.9         21.2         20.8
          Mean square error   7.7          21.9         21.5         20.8

All results are expressed as a percentage of the total volume.

In the large survey setting involving point-3P sampling, the double sampling estimator ($\hat{V}_R$) was superior to the true volume ($\hat{V}$) and importance sampling ($\hat{V}_I$) estimators for every data set (Table 5). For the POND1 data set, the double sampling estimator was less efficient than the point-3P estimator based on the centroid volume estimates. For the POND2 and MIX data sets, the double sampling estimator was substantially more efficient than the centroid method estimator.

Table 5
Simulation results for the large-scale survey method using point-3P sampling

Data set  Statistic           $\hat{V}_R$  $\hat{V}_I$  $\hat{V}_C$  $\hat{V}$
POND1     Bias                -0.2         0.0          -1.0         -0.1
          Standard deviation  66.0         85.1         55.0         66.9
          Mean square error   66.0         85.2         55.4         66.9
POND2     Bias                -0.1         0.1          2.1          -0.1
          Standard deviation  25.1         38.7         35.8         36.5
          Mean square error   25.1         38.7         35.9         36.5
MIX       Bias                0.0          0.1          3.7          0.0
          Standard deviation  31.8         83.0         82.2         80.5
          Mean square error   31.9         83.0         82.9         80.5

All results are expressed as a percentage of the total volume.

Of particular interest was the performance of the centroid method estimator under point-3P sampling. This estimator outperformed the true volume estimator for both POND data sets. This result is due to a strong negative correlation between the error in estimation, $\delta$, and true bole volume, $y$, under the point-3P sampling design. For the 100% tally methods, the correlation was not as strong or, in many cases, was actually positive.

The appropriate allocation of resources to the various stages of sampling ($n_1$, $n_2$, $m$) is substantially more complicated when importance sampling is employed at the final stage of estimation. For fixed $m$, using simple random sampling at both stages yields

$$n_2 = n_1\sqrt{\frac{C_1(\sigma_r^2 + \sigma_I^2)}{C_2(\sigma_V^2 - \sigma_r^2)}},$$

where $C_1$ and $C_2$ are the cost of measuring trees in each stage, $\sigma_I^2 = \sum_{i=1}^{N}\sigma_i^2/N$ is the mean importance sampling variance, $\sigma_V^2$ is the finite population variance for the true volumes and $\sigma_r^2$ is the population variance for the ratio estimator. This relationship assumes an equal number of upper stem measurements ($m$) for importance sampling, and is in no way an optimum allocation of resources.

Fig. 1. Variance of the importance sampling estimate vs. total volume.

Fig. 1 shows the relationship between $\sigma_i^2$ and the centroid method estimate of volume for each tree in the MIX data set with $m = 1$. With some simplifying assumptions, it can be shown that $\sigma_i^2$ increases approximately with the square of the bole volume. This result indicates that the number of upper stem measurements for importance sampling should increase with tree size to achieve more efficient estimates of volume for a minimal cost. The drawback is that simple closed-form solutions to estimate optimal sample allocation no longer exist.
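The allocation relationship for fixed $m$ can be evaluated numerically. The sketch below reads that expression as $n_2 = n_1\sqrt{C_1(\sigma_r^2 + \sigma_I^2)/(C_2(\sigma_V^2 - \sigma_r^2))}$; the square root is an assumption about the original typesetting, and the function name, costs and variances are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def second_stage_fraction(cost_first, cost_second, var_ratio, var_importance,
                          var_volume):
    """Ratio n_2 / n_1 implied by the allocation relationship for fixed m.

    cost_first, cost_second: per-tree measurement costs C_1 and C_2.
    var_ratio: population variance for the ratio estimator (sigma_r^2).
    var_importance: mean importance sampling variance (sigma_I^2).
    var_volume: finite population variance of true volumes (sigma_V^2).
    """
    if var_volume <= var_ratio:
        raise ValueError("sigma_V^2 must exceed sigma_r^2")
    return math.sqrt(
        cost_first * (var_ratio + var_importance)
        / (cost_second * (var_volume - var_ratio))
    )


# Hypothetical inputs: a second-stage tree costs 16 times a first-stage tree.
fraction = second_stage_fraction(1.0, 16.0, 0.03, 0.01, 0.19)
print(fraction)  # share of first-stage trees to carry into the second stage
```

With these made-up inputs, one tree in eight would be remeasured by importance sampling; a larger importance sampling variance pushes the fraction up, and a costlier second stage pushes it down.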

Appendix A. Derivations for total volume estimators: simple random sampling

When importance sampling is used to estimate the volume of each of $n$ trees selected by simple random sampling, the resulting estimator of total volume is two-stage, with the first stage being the selection of the sample tree and the second being the estimation of bole volume. The two-stage estimator of total volume is given by:

$$\hat{V}_I = \frac{N}{n}\sum_{i=1}^{n}\frac{1}{m}\sum_{k=1}^{m}\frac{A(h_k)}{\hat{A}(h_k)}\,P = \frac{N}{n}\sum_{i=1}^{n}\hat{y}_i.$$

The corresponding variance for this estimator is

$$\mathrm{Var}[\hat{V}_I] = \mathrm{Var}[\hat{V}] + \frac{N}{n}\sum_{i=1}^{N}\sigma_i^2,$$

where

$$\mathrm{Var}[\hat{V}] = \frac{N(N-n)}{n(N-1)}\sum_{i=1}^{N}(y_i - \bar{Y})^2$$

is the population between-tree variance. An unbiased variance estimator is:

$$\mathrm{var}[\hat{V}_I] = \frac{N(N-n)}{n(n-1)}\sum_{i=1}^{n}\left(\hat{y}_i - \frac{\hat{V}_I}{N}\right)^2 + \frac{N}{n}\sum_{i=1}^{n}\hat{\sigma}_i^2,$$

when $m \geq 2$. The estimate of total volume for the centroid method is

$$\hat{V}_C = \frac{N}{n}\sum_{i=1}^{n}\frac{A(h_C)}{\hat{A}(h_C)}\,P = \frac{N}{n}\sum_{i=1}^{n}x_i,$$

with corresponding variance

$$\mathrm{Var}[\hat{V}_C] = \frac{N(N-n)}{n(N-1)}\sum_{i=1}^{N}(x_i - \bar{X})^2,$$

where $\bar{X} = \sum_{i=1}^{N}x_i/N$. The sample estimator of $\mathrm{Var}[\hat{V}_C]$ is

$$\mathrm{var}[\hat{V}_C] = \frac{N(N-n)}{n(n-1)}\sum_{i=1}^{n}\left(x_i - \frac{\hat{V}_C}{N}\right)^2.$$

A more illuminating version of the variance is

$$\mathrm{Var}[\hat{V}_C] = \mathrm{Var}[\hat{V}] + \mathrm{Var}[\delta] + 2\,\mathrm{Cov}[\hat{V}, \delta],$$

where $\delta_i = y_i - x_i$ and the second and third terms are the variance of the errors and the covariance between the error and true volume. Note that when the covariance between the true volume and the error is negative, the variance of the centroid method can be smaller than both the variance of the importance sampling estimator and the single-stage estimator based on true volumes.

Appendix B. Derivations for total volume estimators: 3P sampling

Grosenbaugh (1964, 1965) introduced 3P sampling as an efficient method for forest inventory. When 3P sampling is used for selecting trees in the first stage and importance sampling is used to estimate volumes in the second stage, the resulting estimator of total volume is

$$\hat{V}_I = \frac{n_e}{n}\sum_{i=1}^{n}\frac{1}{\pi_i}\left(\frac{1}{m}\sum_{k=1}^{m}\frac{A(h_k)}{\hat{A}(h_k)}\,P\right) = \frac{n_e}{n}\sum_{i=1}^{n}\frac{\hat{y}_i}{\pi_i}.$$

In this study we used

$$\pi_i = \frac{n_e d_i^2 h_i}{\sum_{j=1}^{N} d_j^2 h_j} = \frac{n_e d_i^2 h_i}{D^2 H}$$

as the probability of selection, where $d$ and $h$ were the diameter at breast height and total tree height, respectively.

Grosenbaugh (1965) suggested approximating the variance of the 3P estimator using the variance formula associated with probability proportional to size (PPS) sampling with fixed sample size and replacement. Numerous additional approximations for the variance of the single-stage 3P estimator exist (cf. Grosenbaugh, 1965, 1976; Schreuder et al., 1968, 1971). These approximations tend to add adjustment factors or slight variations to the PPS with replacement estimator. We took a different approach to deriving the variance when 3P and importance sampling were used in the first and second stage of sampling, respectively. Reformulate the estimator

$$\hat{V}_I = \frac{n_e}{n}\sum_{i=1}^{n}\frac{\hat{y}_i}{\pi_i} = \frac{\sum_{i=1}^{n}\hat{y}_i/\pi_i}{n/n_e} = \frac{\hat{V}_{IU}}{n/n_e},$$

where $\hat{V}_{IU}$ is the unadjusted 3P sampling estimator (cf. Schreuder et al., 1968). The variance of the resulting ratio estimator can be expressed as

$$\mathrm{Var}[\hat{V}_I] \approx \mathrm{Var}[\hat{V}_{IU}] + \frac{V^2}{n_e^2}\,\mathrm{Var}[n] - \frac{2V}{n_e}\,\mathrm{Cov}[\hat{V}_{IU}, n],$$

where $V$ is the total volume. Using the conditional variance formula

$$\mathrm{Var}[\hat{V}_{IU}] = V_1E_2[\hat{V}_{IU}] + E_1V_2[\hat{V}_{IU}]$$

yields

$$\mathrm{Var}[\hat{V}_{IU}] = \sum_{i=1}^{N}y_i^2\,\frac{(1 - \pi_i)}{\pi_i} + \sum_{i=1}^{N}\frac{\sigma_i^2}{\pi_i}.$$

Employing the conditional covariance expression,

$$\mathrm{Cov}[\hat{V}_{IU}, n] = E_1E_2[n\hat{V}_{IU}] - E_1E_2[\hat{V}_{IU}]\,E_1E_2[n],$$

yields

$$\mathrm{Cov}[\hat{V}_{IU}, n] = \sum_{i=1}^{N}y_i(1 - \pi_i).$$

The variance of the achieved sample size is

$$\mathrm{Var}[n] = \sum_{i=1}^{N}\pi_i(1 - \pi_i).$$

The results of Cochran's Theorem 11.2 (Cochran, 1977) and Van Deusen (1987) were used to suggest the following two approximate variance estimators for $\mathrm{Var}[\hat{V}_I]$:

$$\mathrm{var}^{(1)}[\hat{V}_I] \approx \sum_{i=1}^{n}\frac{\hat{y}_i^2(1 - \pi_i)}{\pi_i^2} + \frac{\hat{V}_I^2}{n_e^2}\sum_{i=1}^{n}(1 - \pi_i) - \frac{2\hat{V}_I}{n_e}\sum_{i=1}^{n}\frac{\hat{y}_i(1 - \pi_i)}{\pi_i} + \sum_{i=1}^{n}\frac{\hat{\sigma}_i^2}{\pi_i}$$

and

$$\mathrm{var}^{(2)}[\hat{V}_I] \approx \frac{1}{n_e(n-1)}\sum_{i=1}^{n}\left(\frac{n_e\hat{y}_i}{\pi_i} - \hat{V}_I\right)^2 + \sum_{i=1}^{n}\frac{\hat{\sigma}_i^2}{\pi_i}.$$

For the centroid method the estimator for total volume is

$$\hat{V}_C = \frac{n_e}{n}\sum_{i=1}^{n}\frac{x_i}{\pi_i},$$

with approximate variance

$$\mathrm{Var}[\hat{V}_C] \approx \left(\sum_{i=1}^{N}\frac{x_i^2}{\pi_i} - \frac{\left(\sum_{i=1}^{N}x_i\right)^2}{n_e}\right)\times\left(1 + \frac{\mathrm{Var}[n]}{n_e^2}\right).$$

As with simple random sampling,

$$\mathrm{Var}[\hat{V}_C] = \mathrm{Var}[\hat{V}] + \mathrm{Var}[\delta] + 2\,\mathrm{Cov}[\hat{V}, \delta],$$

where

$$\mathrm{Cov}[\hat{V}, \delta] = \frac{1}{n_e}\left(\sum_{i=1}^{N}\frac{(n_e - \pi_i)}{\pi_i}\,y_i\delta_i - 2\sum_{i=1}^{N}\sum_{j>i}y_i\delta_j\right).$$

Appendix C. Derivations for total volume estimators: point-3P sampling

Point-3P sampling (Grosenbaugh, 1971, 1979) has been employed as an efficient method of estimating volume over larger areas. The total volume estimators for importance sampling and the centroid method are

$$\hat{V}_I = \sum_{i=1}^{n_2}\frac{\hat{y}_i}{\pi_{1i}\pi_{2i}} \qquad \text{and} \qquad \hat{V}_C = \sum_{i=1}^{n_2}\frac{x_i}{\pi_{1i}\pi_{2i}},$$

respectively. The variance for the centroid method is

$$\mathrm{Var}[\hat{V}_C] = \mathrm{Var}[\hat{V}] + \mathrm{Var}[\delta] + 2\,\mathrm{Cov}[\hat{V}, \delta],$$

as was the case with the 100% tally methods. The variance for $\hat{V}_I$ can be derived from

$$\mathrm{Var}[\hat{V}_I] = V_1E_2E_3[\hat{V}_I] + E_1V_2E_3[\hat{V}_I] + E_1E_2V_3[\hat{V}_I],$$

where the first two terms reduce to the variance of the usual point-3P estimator because of the unbiased properties of importance sampling. The third term is the variance of the importance sampling volume estimates averaged over all possible first- and second-stage samples. Schreuder et al. (1984) concluded that an exact variance under point-3P sampling is unlikely to exist and studied the performance of numerous approximations to the variance. These approximations assume a fixed second-stage sample size. To the same degree of approximation,

$$E_1E_2V_3[\hat{V}_I] = \sum_{i=1}^{N}\frac{\sigma_i^2}{\pi_{1i}\pi_{2i}}.$$

Schreuder et al. (1992) suggested a simple bootstrap procedure for estimating the variance under point-3P sampling. They also emphasize that any resampling procedure must mimic the original sampling design. We are concerned that a bootstrap estimator may not perform well when the number of upper stem measurements for importance sampling is small. We suggest using Grosenbaugh's approximate variance estimator (cf. Schreuder et al., 1992) with the additional term

$$\sum_{i=1}^{n_2}\frac{\hat{\sigma}_i^2}{\pi_{1i}\pi_{2i}}.$$

We did not test the performance of any variance approximation techniques as we feel it is beyond the scope of this paper. A disadvantage of point sampling is the need to calculate joint probabilities of selection to determine an approximate variance of the estimator. For this reason, we used Monte Carlo simulation to determine the true variance of all point-3P estimators.
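The population-level quantities used for 3P sampling in Appendix B can be evaluated directly when every tree's volume and selection probability are known, as in a simulation study. The sketch below follows the expressions as read here (Var[n], Cov with n, the unadjusted-estimator variance and the ratio-type approximation for the adjusted estimator); the function name and the three-tree population are hypothetical. The example sets volume exactly proportional to the selection probability and the importance sampling variances to zero, a case in which the approximation for the adjusted estimator collapses to zero while the unadjusted variance does not.

```python
def three_p_variance_terms(y, sigma2, pi, n_e):
    """Approximate variance components for 3P sampling of a known population.

    y: true volumes; sigma2: per-tree importance sampling variances;
    pi: selection probabilities (summing to the expected sample size n_e).
    """
    total = sum(y)
    var_unadjusted = (
        sum(yi ** 2 * (1.0 - p) / p for yi, p in zip(y, pi))
        + sum(s / p for s, p in zip(sigma2, pi))
    )
    cov_with_n = sum(yi * (1.0 - p) for yi, p in zip(y, pi))
    var_n = sum(p * (1.0 - p) for p in pi)
    var_adjusted = (
        var_unadjusted
        + total ** 2 / n_e ** 2 * var_n
        - 2.0 * total / n_e * cov_with_n
    )
    return {
        "var_unadjusted": var_unadjusted,
        "cov_with_n": cov_with_n,
        "var_n": var_n,
        "var_adjusted": var_adjusted,
    }


# Hypothetical population: volume exactly proportional to selection probability.
pi = [0.2, 0.3, 0.5]
y = [2.0, 3.0, 5.0]
terms = three_p_variance_terms(y, [0.0, 0.0, 0.0], pi, n_e=sum(pi))
print(terms)
```

In this idealized case all of the unadjusted estimator's variance comes from the random sample size, which is the effect the sample-size adjustment removes.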

Appendix D. Efficiency of VˆR under 3P sampling The goal is to determine the conditions under which the variance of the double sampling estimator is smaller than that of the centroid sampling estimator when 3P sampling is used to select the secondstage sample. Let x i and yˆi denote the volume estimates derived from the centroid method and importance sampling, respectively. Schreuder et al. Ž1968. define the unadjusted and adjusted 3P sampling estimators as: n

VˆCU s

xi

Ý

pi

is1

s

ne n

, VˆC s

ne n

n

VˆCU , VˆIU s

Ý is1

yˆi

pi

, and VˆI

s s V2CU

, Var VˆC

s s V2C , Var

VˆIU

We define q to be the ratio of the centroid method variances for adjusted and unadjusted 3P estimators, i.e., qs

s V2CU

k s

s V2IU s V2CU

,

with k 2 ) 0. We assume that k 2 will generally be greater than one whenever the number of upper stem measurements for importance sampling is small, although this was not the case for one of our data sets. Using the analytical results of Appendix B and m s 1 upper stem measurement per tree, we found k 2 values of 3.53, 1.43 and 0.98 for the POND1, POND2 and MIX data sets, respectively. The desired relationship between the variance of the double sampling and centroid method 3P estimator is Var w VR x F s V2C , where

.

Williams et al. Ž1997. show that it is possible for q G 1 when x i A p i for all but a few units in a population for which p i is much smaller than expected for a given x i , but in general q - 1 with the variance of the adjusted estimator commonly ranging from 0.5 to 0.08 times that of the unadjusted estimator Žcf. Schreuder et al., 1968.. We found q values of 0.030, 0.031, and 0.023 for the POND1, POND2 and MIX data sets, respectively.

VˆI

Var w VR x sVar

=

s s V2IU , and Var VˆI s s V2I .

s V2C

2

Yˆu ,

with corresponding variances Var VˆCU

The relationship between the 3P unadjusted variances for the centroid method and importance sampling is defined as

ž

VˆC

XsVar

s V2IU E w VIU x

2

q

VˆIU

2

Xf X =

VˆCU

s V2CU E w VCU x

2

y2 r

E VˆIU

2

E VˆCU

2

s V CU s V IU E w VCU x E w VIU x

/

.

The key feature to note is that the sample size adjustment term, $n_e/n$, cancels and the unadjusted 3P estimators ($\hat{V}_{IU}$ and $\hat{V}_{CU}$) remain. By assuming that the bias of the centroid method is small, we make the simplifying assumption $E[\hat{V}_{CU}] \approx E[\hat{V}_{IU}] = Y$, where $Y$ is the total volume. Thus, taking $X \approx Y$ as well, $\mathrm{Var}[V_R] \approx \sigma^2_{V_{IU}} + \sigma^2_{V_{CU}} - 2\rho\,\sigma_{V_{IU}}\sigma_{V_{CU}}$, where $\rho$ is the correlation between $\hat{V}_{IU}$ and $\hat{V}_{CU}$. Substituting $\sigma_{V_{IU}} = k\,\sigma_{V_{CU}}$ gives the relationship $\mathrm{Var}[V_R] \approx \sigma^2_{V_{CU}}(k^2 + 1 - 2\rho k) \le \sigma^2_{V_C} = q\,\sigma^2_{V_{CU}}$. Note that $-1 < \rho < 1$ provides a constraint which can be used to define situations where $\mathrm{Var}[V_R] \le \sigma^2_{V_C}$, leading to the final approximate relationship

$$\frac{k^2 + 1 - q}{2k} \le \rho,$$

with $\rho \le 1$.
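As a worked consequence of this bound (a sketch that uses only the final relationship, $k > 0$, and the constraint $\rho \le 1$; it is not part of the original derivation), a correlation satisfying the relationship can exist only if the lower bound itself does not exceed one:

$$\frac{k^2 + 1 - q}{2k} \le 1 \iff k^2 - 2k + 1 \le q \iff (k - 1)^2 \le q \iff 1 - \sqrt{q} \le k \le 1 + \sqrt{q}.$$

For an illustrative value of $q = 0.03$, $\sqrt{q} \approx 0.173$, so $k$ would have to lie between roughly 0.83 and 1.17, that is, $k^2$ between roughly 0.68 and 1.38.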


Fig. 2. Contour plot of the correlation coefficient for different values of k and q.

Fig. 2 shows contour plots of $\rho$ over a range of possible $k$ and $q$ values. Note that values of $\rho$ less than one are only found when the centroid method is no more than twice as efficient as importance sampling and when the difference between the efficiency of the unadjusted and adjusted estimators is small. The area enclosed by the dashed lines denotes where $k$ and $q$ values are likely to occur.
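The lower bound on the correlation can also be evaluated directly for the $k^2$ and $q$ values reported above. The following is a minimal sketch, not the authors' code: the function and variable names (`rho_lower_bound`, `DATA_SETS`, `summarize`) are hypothetical, and the inputs are the rounded values quoted in the text.

```python
import math


def rho_lower_bound(k_sq, q):
    """Return (k^2 + 1 - q) / (2k), the approximate lower bound on rho.

    k_sq is the ratio of the unadjusted importance sampling variance to the
    unadjusted centroid variance; q is the ratio of the adjusted to the
    unadjusted centroid variance.
    """
    if k_sq <= 0:
        raise ValueError("k^2 must be positive")
    k = math.sqrt(k_sq)
    return (k_sq + 1.0 - q) / (2.0 * k)


# (k^2, q) pairs as reported in the text (rounded values).
DATA_SETS = {
    "POND1": (3.53, 0.030),
    "POND2": (1.43, 0.031),
    "MIX": (0.98, 0.023),
}


def summarize(data_sets):
    """Map each data set name to (bound, whether the bound is <= 1)."""
    results = {}
    for name, (k_sq, q) in data_sets.items():
        bound = rho_lower_bound(k_sq, q)
        # A correlation can only satisfy the relationship if the bound <= 1.
        results[name] = (bound, bound <= 1.0)
    return results


if __name__ == "__main__":
    for name, (bound, attainable) in summarize(DATA_SETS).items():
        print(f"{name}: rho >= {bound:.4f}, attainable: {attainable}")
```

Taken at face value with these rounded inputs, the bound is about 1.20 for POND1, about 1.003 for POND2, and about 0.988 for MIX, so only the MIX values admit a correlation below one under this approximation.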

References

Alberta Environmental Protection, Land and Forest Service, 1988. Alberta Stage 3 Forest Inventory: Tree Sectioning Manual. Pub. No. 168. Edmonton, AB, 103 pp.

Cochran, W.G., 1977. Sampling Techniques, 3rd edn. Wiley, New York, 428 pp.

Gregoire, T.G., Valentine, H.T., Furnival, G.M., 1986. Estimation of bole volume by importance sampling. Can. J. For. Res. 16, 554–557.

Gregoire, T.G., Valentine, H.T., Furnival, G.M., 1993. Estimation of bole surface area and bark volume with Monte Carlo methods. Biometrics 49, 653–660.

Grosenbaugh, L.R., 1964. Some suggestions for better sample-tree measurement. Proc. Soc. Am. For. Boston, MA, 1963, pp. 36–42.

Grosenbaugh, L.R., 1965. Three-pee sampling theory and program ‘THRO’ for computer generation of selection criteria. USDA For. Serv., Pac. S. W. For. and Range Expt. Sta. Res. Pap. PSW-13, 76 pp.

Grosenbaugh, L.R., 1971. STX 1-11-71 for dendrometry of multistage 3P samples. USDA For. Serv. Pub. FS-277, 63 pp.

Grosenbaugh, L.R., 1976. Approximate sampling variance of adjusted 3-P estimates. For. Sci. 22, 173–176.

Grosenbaugh, L.R., 1979. 3P sampling theory, examples and rationale. USDA BLM Tech. Note 331, 18 pp.

Husch, B., Miller, C.I., Beers, T.W., 1982. Forest Mensuration, 3rd edn. Wiley, New York, 402 pp.

Kleinn, C., 1993. Single tree volume estimation with multiple measurements using importance sampling and control-variate sampling. In: Wood, G.B., Wiant Jr., H.V. (Eds.), Modern Methods for Estimating Tree and Log Volume. Proc. IUFRO Conf., 14–16 June 1993, Div. For., W.VA. Univ., Morgantown, WV, pp. 96–104.

Schreuder, H.T., Sedransk, J., Ware, K.D., 1968. 3-P sampling and some alternatives, I. For. Sci. 14, 429–454.

Schreuder et al., 1971.

Schreuder, H.T., Brink, G.E., Wilson, R.L., 1984. Alternative estimators for point-poisson sampling. For. Sci. 30, 803–812.

Schreuder, H.T., Ouyang, Z., Williams, M., 1992. Point-poisson, point-pps, and modified point-pps sampling: efficiency and variance estimation. Can. J. For. Res. 22, 1071–1078.

Thompson, W.G., 1992. Sampling. Wiley, New York, 343 pp.

Valentine et al., 1992.

Van Deusen, P.C., 1987. 3-P sampling and design versus model-based estimates. Can. J. For. Res. 17, 115–117.

Van Deusen, P.C., Lynch, T.B., 1987. Efficient unbiased tree-volume estimation. For. Sci. 33, 583–590.

Wiant, H.V. Jr., Wood, G.B., Miles, J.A., 1989. Estimating the volume of a radiata pine stand using importance sampling. Aust. For. 52, 286–292.

Wiant, H.V. Jr., Wood, G.B., Williams, M., 1996. Comparison of three modern methods for estimating volume of sample trees using one or two diameter measurements. For. Ecol. Manage. 83, 13–16.

Williams, T.B., Wiant, H.V. Jr., 1994. Evaluation of nine taper systems for four Appalachian hardwoods. North. J. Appl. For. 11, 24–26.

Williams, M.S., Schreuder, H.T., Terrazas, G.H., 1997. Poisson sampling: the adjusted and unadjusted estimator revisited. USDA For. Serv., Rocky Mtn. For. and Range Expt. Sta. Res. Pap. (in press).

Wood, G.B., Wiant, H.V. Jr., 1992. Test of application of centroid and importance sampling in a point-3P forest inventory. For. Ecol. Manage. 53, 107–115.

Wood, G.B., Wiant, H.V. Jr., Loy, R.J., Miles, J.A., 1990. Centroid sampling: a variant of importance sampling for estimating the volume of sample trees of radiata pine. For. Ecol. Manage. 36, 233–243.