THEORETICAL POPULATION BIOLOGY 24, 302-312 (1983)

Diversity Indices with Gamma Distributed Importance Values
N. I. LYONS AND KERMIT HUTCHESON Department of Statistics and Computer Science, University of Georgia, Athens, Georgia 30602 Received June 9, 1983
The mean and variance of Simpson's index and the information index of diversity are derived assuming a two-parameter gamma distribution of importance values. The results assume quadrat sampling and do not require counts. Both indices are asymptotically normal except in the equiprobable case. The result holds even if some species do not occur in all quadrats. The information index and its standard error are compared with those obtained by the methods of Pielou and Odum, using data on the effect of time of plowing on the diversity of vegetation. A test for comparing the diversities of two populations is suggested.
1. INTRODUCTION

The simplest measure of the species diversity of an ecological community is $s$, the number of species in a sample. Most diversity indices take into account the frequencies $n_i$, $i = 1, 2, \ldots, s$, or counts of individuals of each species. However, it is often impractical to count individuals in a sample. In many grasses, for example, it is impossible to separate individual plants. In some insect populations, counting individuals is tedious and is not economically feasible unless the data can be used in other studies. A diversity index based on biomass, which can be measured from the combined individuals of each species, or at least per sampling unit, would be preferable. Such an adaptation of Simpson's and Shannon's indices for use with biomass is examined in this paper. Quadrat sampling is assumed. The total biomass per quadrat of each species is assumed to follow an underlying two-parameter gamma distribution. A well-known invariance property of this distribution allows exact moments of the indices to be obtained. Large-sample results are also obtained. Although biomass is used as an example throughout, any importance value may be used.
0040-5809/83 $3.00
Copyright © 1983 by Academic Press, Inc. All rights of reproduction in any form reserved.
DIVERSITY INDICES WITHOUT COUNTS
2. PREVIOUS RESULTS

Pielou (1966) and Odum (1971) use biomass units instead of counts with the information index $h = -\sum_i (n_i/N)\ln(n_i/N)$, substituting the biomass of species $i$ for $n_i$, so that $N = \sum_i n_i$ is replaced by the total biomass of the sample. Odum uses the average and standard deviation of the values of $h$ obtained on several subsamples to obtain an estimate of diversity and its standard error. The estimate of standard deviation is on a subsample basis. It is therefore important that subsamples be representative of the entire population sampled, since clumping of species could severely bias the results if the species switch ranks of importance from quadrat to quadrat. This problem is alleviated by Pielou's method. Pielou uses the weighted differences

$$H_k = \frac{N_k h_k - N_{k-1} h_{k-1}}{N_k - N_{k-1}}, \qquad k = 2, 3, 4, \ldots,$$

where $h_k$ is the diversity, computed using Brillouin's exact formula, of the first $k$ subsamples pooled, whose total biomass is $N_k$. She uses the average and standard deviation of the $H_k$'s as estimates, starting $k$ at a value $k_0$ at which the graph of $h_k$ versus $k$ levels off. However, as Pielou points out, successive values of $h_k$ are not independent. The covariance of such values is derived by Lyons and Hutcheson (1977). In addition, this method may require an excessive number of quadrats. Similar methods could be applied to Simpson's index.

Simpson's index of concentration is defined as $\lambda = \sum_i (n_i/N)^2$, where $N$ is a fixed number of individuals from a population with $s$ species. Pielou (1969) suggests $d = 1 - \lambda$ as a possible measure of diversity, noting that $d_u = [N/(N-1)]\,d$ is an unbiased estimator of the corresponding population diversity in this sampling scheme. The mean and variance of $\lambda$ are derived by Simpson, and the skewness and kurtosis by Lyons and Hutcheson (1979). Since these results assume a fixed value of $N$, the moments of the biased index can be derived from them. If the proportion of biomass of species $i$ is substituted for $n_i/N$ in the expressions for the variance, the expression obtained is not independent of the units of measurement of biomass. Thus comparison of estimates in different populations could give conflicting results if the units of measurement are changed. In the next section the mean, variance, skewness, and kurtosis of Simpson's index and the mean and variance of the information index are derived when the measurements are continuous. This requires an assumption concerning the underlying distribution of biomass for each species.
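As an illustration, Pielou's pooled-quadrat procedure can be sketched in Python. This is a minimal sketch, not Pielou's own program: the function names are hypothetical, and Brillouin's index $(1/N)\ln(N!/\prod_i n_i!)$ is evaluated with the log-gamma function ($\ln x! = \ln\Gamma(x+1)$) so that biomass values need not be integers, which is our assumption about how the formula is applied to biomass.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def brillouin(amounts: Sequence[float]) -> float:
    """Brillouin's index (1/N) ln(N! / prod_i n_i!), with x! taken as Gamma(x + 1)."""
    total = sum(amounts)
    if total <= 0:
        raise ValueError("the pooled sample must have a positive total")
    log_ratio = math.lgamma(total + 1) - sum(math.lgamma(x + 1) for x in amounts)
    return log_ratio / total


def pielou_increments(quadrats: Sequence[Sequence[float]], k0: int) -> List[float]:
    """H_k = (N_k h_k - N_{k-1} h_{k-1}) / (N_k - N_{k-1}) for k = k0, ..., q.

    quadrats[j][i] is the amount of species i in quadrat j; h_k is Brillouin's
    index of the first k quadrats pooled and N_k is their total.
    """
    pooled = [0.0] * len(quadrats[0])
    n_cum: List[float] = []
    h_cum: List[float] = []
    for quadrat in quadrats:
        pooled = [p + x for p, x in zip(pooled, quadrat)]  # cumulate quadrats 1..k
        n_cum.append(sum(pooled))
        h_cum.append(brillouin(pooled))
    increments = []
    for k in range(max(k0, 2), len(quadrats) + 1):  # k counts quadrats from 1
        n_k, n_prev = n_cum[k - 1], n_cum[k - 2]
        h_k, h_prev = h_cum[k - 1], h_cum[k - 2]
        # An empty quadrat makes N_k == N_{k-1} and raises ZeroDivisionError.
        increments.append((n_k * h_k - n_prev * h_prev) / (n_k - n_prev))
    return increments


def pielou_estimate(quadrats: Sequence[Sequence[float]], k0: int) -> Tuple[float, float]:
    """Mean and sample standard deviation of the H_k's (needs at least two of them)."""
    hk = pielou_increments(quadrats, k0)
    return statistics.mean(hk), statistics.stdev(hk)
```

The standard deviation returned by `pielou_estimate` treats the $H_k$'s as independent, which, as noted above, they are not; $k_0$ must still be chosen by inspecting the $h_k$ curve.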
LYONS AND HUTCHESON
3. ASSUMPTIONS AND NOTATION
Suppose $q$ quadrats are randomly selected. For each quadrat let $X_{ij}$ represent the total biomass of all individuals of species $i$ in quadrat $j$. Assume that $X_{i1}, X_{i2}, \ldots, X_{iq}$ are independent, identically distributed, two-parameter gamma random variables with shape $\alpha_i$ and scale $\beta$, i.e., $X_{ij} \sim \Gamma(\alpha_i, \beta)$, $j = 1, 2, \ldots, q$; $i = 1, 2, \ldots, s$. The one-parameter version of this distribution was used by Kempton and Wedderburn (1978), Simpson (1949), and Bailey (1972) in diversity studies. By the reproductive property of the gamma distribution, $X_{i\cdot} = \sum_j X_{ij} \sim \Gamma(q\alpha_i, \beta)$ and $T = \sum_i X_{i\cdot} \sim \Gamma(q\sum_i \alpha_i, \beta)$ are the biomass of each species and of the combined sample, respectively. Any scale-free function of independent gamma random variables with a common scale parameter is independent of $T$ (Pitman, 1937). Since $h$ and $d$ are examples of such functions, this result is used to derive their moments.

3.1. Moments of d
Stiteler and Patil (1969) point out that the joint distribution of $X_{i\cdot}/T$, $i = 1, 2, \ldots, s$, given $T$, is Dirichlet with parameters $q\alpha_1, \ldots, q\alpha_s$. By Pitman's result, the expected value of $d$ is equal to its conditional expectation given $T$. Therefore,
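$$E(d) = 1 - \frac{\sum_i \alpha_i(q\alpha_i + 1)}{\left(\sum_j \alpha_j\right)\left(q\sum_j \alpha_j + 1\right)}.$$

Therefore $d$ is an asymptotically unbiased estimator (as $q \to \infty$) of $1 - \sum_{i=1}^s \left(\alpha_i/\sum_j \alpha_j\right)^2$. In general,

$$E\left[\prod_{i=1}^s \left(\frac{X_{i\cdot}}{T}\right)^{r_i}\right] = \frac{\prod_{i=1}^s (q\alpha_i)^{[r_i]}}{\left(q\sum_i \alpha_i\right)^{[k]}}, \qquad k = \sum_i r_i,$$

where $(q\alpha_i)^{[r]} = q\alpha_i(q\alpha_i + 1)\cdots(q\alpha_i + r - 1)$ is the $r$th ascending factorial of $q\alpha_i$. The second, third, and fourth moments about zero of $\lambda = 1 - d = \sum_i (X_{i\cdot}/T)^2$ are

$$E(\lambda^2) = \frac{1}{\left(q\sum_i \alpha_i\right)^{[4]}}\left\{\sum_i (q\alpha_i)^{[4]} + \sum\sum_{i \ne j} (q\alpha_i)^{[2]}(q\alpha_j)^{[2]}\right\},$$

$$E(\lambda^3) = \frac{1}{\left(q\sum_i \alpha_i\right)^{[6]}}\left\{\sum_i (q\alpha_i)^{[6]} + 3\sum\sum_{i \ne j} (q\alpha_i)^{[4]}(q\alpha_j)^{[2]} + \sum\sum\sum_{i \ne j \ne k} (q\alpha_i)^{[2]}(q\alpha_j)^{[2]}(q\alpha_k)^{[2]}\right\},$$

$$E(\lambda^4) = \frac{1}{\left(q\sum_i \alpha_i\right)^{[8]}}\left\{\sum_i (q\alpha_i)^{[8]} + 4\sum\sum_{i \ne j} (q\alpha_i)^{[6]}(q\alpha_j)^{[2]} + 3\sum\sum_{i \ne j} (q\alpha_i)^{[4]}(q\alpha_j)^{[4]} + 6\sum\sum\sum_{i \ne j \ne k} (q\alpha_i)^{[4]}(q\alpha_j)^{[2]}(q\alpha_k)^{[2]} + \sum\sum\sum\sum_{i \ne j \ne k \ne l} (q\alpha_i)^{[2]}(q\alpha_j)^{[2]}(q\alpha_k)^{[2]}(q\alpha_l)^{[2]}\right\},$$

where the multiple sums run over pairwise distinct indices.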
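The general moment formula can be checked numerically. The sketch below (pure Python, hypothetical helper names) takes $\lambda = \sum_i (X_{i\cdot}/T)^2 = 1 - d$, expands $\lambda^m$ over ordered $m$-tuples of species, and applies the Dirichlet moment formula with parameters $q\alpha_i$ to each term, which reproduces the multiple sums for $m \le 4$; from the first four raw moments it returns the exact mean, variance, skewness, and kurtosis of $d$.

```python
import itertools
from collections import Counter
from typing import Sequence, Tuple


def rising(x: float, r: int) -> float:
    """Ascending factorial x^[r] = x (x + 1) ... (x + r - 1), with x^[0] = 1."""
    out = 1.0
    for k in range(r):
        out *= x + k
    return out


def lambda_raw_moment(alpha: Sequence[float], q: int, m: int) -> float:
    """E(lambda^m) for lambda = sum_i (X_i./T)^2 under the gamma model of Section 3.

    (sum_i pi_i^2)^m is expanded over ordered m-tuples of species; a tuple in
    which species i appears c_i times contributes prod_i (q alpha_i)^[2 c_i],
    and the whole sum is divided by (q sum alpha)^[2m].  Cost grows like s**m.
    """
    shapes = [q * a for a in alpha]
    total = 0.0
    for tup in itertools.product(range(len(shapes)), repeat=m):
        term = 1.0
        for i, c in Counter(tup).items():
            term *= rising(shapes[i], 2 * c)
        total += term
    return total / rising(sum(shapes), 2 * m)


def d_summary(alpha: Sequence[float], q: int) -> Tuple[float, float, float, float]:
    """Exact mean, variance, skewness and kurtosis of d = 1 - lambda."""
    e1, e2, e3, e4 = (lambda_raw_moment(alpha, q, m) for m in (1, 2, 3, 4))
    var = e2 - e1 ** 2
    mu3 = e3 - 3 * e1 * e2 + 2 * e1 ** 3                       # third central moment
    mu4 = e4 - 4 * e1 * e3 + 6 * e1 ** 2 * e2 - 3 * e1 ** 4    # fourth central moment
    # d = 1 - lambda: same variance and kurtosis, skewness changes sign.
    return 1 - e1, var, -mu3 / var ** 1.5, mu4 / var ** 2
```

Enumerating ordered tuples is simple rather than efficient; for many species the grouped sums over distinct indices would be cheaper.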
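Note that $\beta$, which represents a scale parameter in the gamma distribution, does not appear in the formulas. In the equiprobable case ($\alpha_i = \alpha$, $i = 1, 2, \ldots, s$) the moments reduce to (with $\lambda = 1 - d$)

$$E(d) = 1 - \frac{q\alpha + 1}{qs\alpha + 1},$$

$$E(\lambda^2) = \frac{1}{(qs\alpha)^{[4]}}\left\{s(q\alpha)^{[4]} + s(s-1)\left[(q\alpha)^{[2]}\right]^2\right\},$$

$$E(\lambda^3) = \frac{1}{(qs\alpha)^{[6]}}\left\{s(q\alpha)^{[6]} + 3s(s-1)(q\alpha)^{[4]}(q\alpha)^{[2]} + s(s-1)(s-2)\left[(q\alpha)^{[2]}\right]^3\right\},$$

$$E(\lambda^4) = \frac{1}{(qs\alpha)^{[8]}}\left\{s(q\alpha)^{[8]} + 4s(s-1)(q\alpha)^{[6]}(q\alpha)^{[2]} + 3s(s-1)\left[(q\alpha)^{[4]}\right]^2 + 6s(s-1)(s-2)(q\alpha)^{[4]}\left[(q\alpha)^{[2]}\right]^2 + s(s-1)(s-2)(s-3)\left[(q\alpha)^{[2]}\right]^4\right\}.$$

From these the variance, skewness, and kurtosis of $d$ can be found; $d$ has the same variance and kurtosis as $\lambda$, and a skewness of opposite sign.

3.2. Moments of h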
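The first two moments of the information index $h = -\sum_{i=1}^s (X_{i\cdot}/T)\ln(X_{i\cdot}/T)$ can be obtained in terms of Euler $\Psi$ (digamma) functions. Writing $A = \sum_j \alpha_j$,

$$E(h) = \sum_{i=1}^s \frac{\alpha_i}{A}\left[\Psi(qA + 1) - \Psi(q\alpha_i + 1)\right],$$

$$E(h^2) = \sum_{i=1}^s \frac{\alpha_i(q\alpha_i + 1)}{A(qA + 1)}\left\{\left[\Psi(q\alpha_i + 2) - \Psi(qA + 2)\right]^2 + \Psi'(q\alpha_i + 2) - \Psi'(qA + 2)\right\} + \sum\sum_{i \ne j} \frac{q\alpha_i\alpha_j}{A(qA + 1)}\left\{\left[\Psi(q\alpha_i + 1) - \Psi(qA + 2)\right]\left[\Psi(q\alpha_j + 1) - \Psi(qA + 2)\right] - \Psi'(qA + 2)\right\}.$$

Higher moments are too complex to obtain for $h$. Series expressions are available for evaluating $\Psi$ and $\Psi'$, the derivative of the Euler function.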
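Because only $\Psi$ and $\Psi'$ are needed, the first two moments of $h$ can be evaluated in a few lines. The sketch below uses the recurrences $\Psi(x) = \Psi(x+1) - 1/x$ and $\Psi'(x) = \Psi'(x+1) + 1/x^2$ followed by the standard asymptotic series, as one example of such series expressions; the helper names are hypothetical, and the Monte Carlo routine (which draws $X_{i\cdot} \sim \Gamma(q\alpha_i, 1)$, taking $\beta = 1$ since it cancels) is included only as a cross-check of the moment formulas for $E(h)$ and $E(h^2)$.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def digamma(x: float) -> float:
    """Psi(x), x > 0: shift with Psi(x) = Psi(x + 1) - 1/x, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    tail = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    return acc + math.log(x) - 0.5 / x - tail


def trigamma(x: float) -> float:
    """Psi'(x), x > 0: shift with Psi'(x) = Psi'(x + 1) + 1/x^2, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = inv * (1 + inv * (0.5 + inv * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))))
    return acc + tail


def h_mean_var(alpha: Sequence[float], q: int) -> Tuple[float, float]:
    """Exact E(h) and Var(h) = E(h^2) - E(h)^2 under the gamma model (no zero quadrats)."""
    a = [q * ai for ai in alpha]  # Dirichlet parameters q * alpha_i
    a0 = sum(a)
    mean = sum(ai / a0 * (digamma(a0 + 1) - digamma(ai + 1)) for ai in a)
    d0, t0 = digamma(a0 + 2), trigamma(a0 + 2)
    second = 0.0
    for i, ai in enumerate(a):
        diag = digamma(ai + 2) - d0
        second += ai * (ai + 1) / (a0 * (a0 + 1)) * (diag ** 2 + trigamma(ai + 2) - t0)
        for j, aj in enumerate(a):
            if j != i:  # cross terms E(pi_i pi_j ln pi_i ln pi_j)
                cross = (digamma(ai + 1) - d0) * (digamma(aj + 1) - d0) - t0
                second += ai * aj / (a0 * (a0 + 1)) * cross
    return mean, second - mean ** 2


def h_monte_carlo(alpha: Sequence[float], q: int, reps: int = 40000, seed: int = 3) -> Tuple[float, float]:
    """Sample mean and variance of h with X_i. ~ Gamma(q alpha_i, 1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        x = [rng.gammavariate(q * ai, 1.0) for ai in alpha]
        t = sum(x)
        values.append(-sum(v / t * math.log(v / t) for v in x if v > 0))
    return statistics.fmean(values), statistics.pvariance(values)
```

For example, `h_mean_var([0.5, 1.0, 2.0], 5)` and `h_monte_carlo([0.5, 1.0, 2.0], 5)` should agree to within Monte Carlo error.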
4. ZERO FREQUENCIES
Since $X_{ij}$ is a continuous random variable, theoretically an observed value of $X_{ij}$ can never be equal to zero. However, in practice, some quadrats may contain no members of a species which occurs in other quadrats. To account for this, let $p_i$ represent the probability that a quadrat contains species $i$, so that $1 - p_i$ is the probability that species $i$ does not appear. Then the density function of $X_{ij}$ is

$$f(x_{ij}) = \begin{cases} 1 - p_i, & x_{ij} = 0,\\[4pt] \dfrac{p_i}{\Gamma(\alpha_i)\,\beta^{\alpha_i}}\, e^{-x_{ij}/\beta}\, x_{ij}^{\alpha_i - 1}, & x_{ij} > 0. \end{cases}$$

Let $n_i$ denote the number of quadrats containing species $i$. The joint density of the $X_{i\cdot}$'s, $i = 1, 2, \ldots, s$, can be expressed as

$$f(x_1, x_2, \ldots, x_s) = \frac{1}{C}\prod_{i=1}^s \sum_{n_i=1}^q \binom{q}{n_i} p_i^{n_i}(1 - p_i)^{q - n_i}\, \frac{e^{-x_i/\beta}\, x_i^{n_i\alpha_i - 1}}{\Gamma(n_i\alpha_i)\,\beta^{n_i\alpha_i}}, \qquad 0 < x_i < \infty,$$

where $C = \prod_{i=1}^s [1 - (1 - p_i)^q]$ is the probability that every species is present in at least one quadrat. The exact moments of $d$ and $h$ are not obtainable in closed form because the Pitman conditions are no longer satisfied. In the next section the asymptotic moments and distribution of $d$ and $h$ are derived. Substituting 1 for $p_i$ for every species gives asymptotic results for the assumptions of Section 3.
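The zero-inflated model can be simulated directly. In this sketch (Python standard library only; all names hypothetical) each $X_{ij}$ is zero with probability $1 - p_i$ and otherwise drawn from a gamma distribution with shape $\alpha_i$ and scale $\beta$, with species simulated independently. It computes $C$, estimates it by simulation, and shows that $d$ and $h$ are unchanged when every biomass value is rescaled (for example, grams to milligrams), i.e., that both indices are scale-free.

```python
import math
import random
from typing import List, Sequence


def prob_all_present(p: Sequence[float], q: int) -> float:
    """C = prod_i [1 - (1 - p_i)^q]: every species occurs in at least one of q quadrats."""
    c = 1.0
    for p_i in p:
        c *= 1.0 - (1.0 - p_i) ** q
    return c


def simulate_quadrats(alpha: Sequence[float], beta: float, p: Sequence[float],
                      q: int, rng: random.Random) -> List[List[float]]:
    """X[i][j]: 0 with probability 1 - p_i, else Gamma(shape alpha_i, scale beta)."""
    return [[rng.gammavariate(a_i, beta) if rng.random() < p_i else 0.0 for _ in range(q)]
            for a_i, p_i in zip(alpha, p)]


def simpson_d(totals: Sequence[float]) -> float:
    """d = 1 - sum_i (X_i./T)^2 from species totals."""
    t = sum(totals)
    return 1.0 - sum((x / t) ** 2 for x in totals)


def info_h(totals: Sequence[float]) -> float:
    """h = -sum_i (X_i./T) ln(X_i./T), using 0 ln 0 = 0."""
    t = sum(totals)
    return -sum(x / t * math.log(x / t) for x in totals if x > 0)


def fraction_all_present(alpha: Sequence[float], beta: float, p: Sequence[float],
                         q: int, reps: int = 20000, seed: int = 7) -> float:
    """Monte Carlo estimate of C."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = simulate_quadrats(alpha, beta, p, q, rng)
        hits += all(any(v > 0 for v in row) for row in x)
    return hits / reps
```

Conditioning the simulated samples on every species being present corresponds to the division by $C$ in the joint density.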
5. LARGE SAMPLE RESULTS
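The mean and variance of the mean biomass per quadrat, $\bar X_{i\cdot} = X_{i\cdot}/q$, are $E(\bar X_{i\cdot}) = p_i\alpha_i\beta$ and $\mathrm{Var}(\bar X_{i\cdot}) = \alpha_i\beta^2 p_i[1 + \alpha_i(1 - p_i)]/q$. The covariance between $\bar X_{i\cdot}$ and $\bar X_{j\cdot}$ ($i \ne j$) is zero. By the multivariate central limit theorem, the variables $Y_i = \bar X_{i\cdot} - p_i\alpha_i\beta$ are independent, and the $\sqrt{q}\,Y_i$ are asymptotically normally distributed. Both diversity indices can be represented as functions of the $Y_i$'s by

$$G(Y_1, \ldots, Y_s) = \sum_{i=1}^s f\left(\frac{Y_i + p_i\alpha_i\beta}{\sum_j Y_j + \sum_j p_j\alpha_j\beta}\right)$$

for proper choice of the function $f$. Then $G(Y_1, \ldots, Y_s) = \sum_i f\left(X_{i\cdot}/\sum_j X_{j\cdot}\right)$ is asymptotically normally distributed with mean $\mu = G(0, \ldots, 0) = \sum_{i=1}^s f\left(p_i\alpha_i/\sum_j p_j\alpha_j\right)$ and variance $\sigma^2 = g'Vg$, where $g' = (g_1, \ldots, g_s)$ and $g_i = \partial G(0, \ldots, 0)/\partial Y_i$, provided not all the $g_i$'s are zero. The matrix $V$ is the diagonal matrix of variances of the $Y_i$'s, with $i$th diagonal element equal to $p_i\alpha_i\beta^2[1 + \alpha_i(1 - p_i)]/q$. Now

$$g_i = \frac{1}{\beta\sum_j p_j\alpha_j}\left[f'(\pi_i) - \sum_k \pi_k f'(\pi_k)\right], \qquad \pi_i = \frac{p_i\alpha_i}{\sum_j p_j\alpha_j}.$$

For Simpson's index, $f(x) = x^2$ (with $d = 1 - G$), and for large $q$

$$\mu_d = 1 - \sum_{i=1}^s \left(\frac{p_i\alpha_i}{\sum_j p_j\alpha_j}\right)^2,$$

$$\sigma_d^2 = \frac{4}{q\left(\sum_j \alpha_j p_j\right)^6}\sum_{i=1}^s \left[\alpha_i p_i \sum_j \alpha_j p_j - \sum_j \alpha_j^2 p_j^2\right]^2 p_i\alpha_i[1 + \alpha_i(1 - p_i)].$$

If $p_i = 1$ the asymptotic mean and variance are

$$\mu_d = 1 - \sum_{i=1}^s \left(\frac{\alpha_i}{\sum_j \alpha_j}\right)^2, \qquad \sigma_d^2 = \frac{4}{q\left(\sum_j \alpha_j\right)^4}\sum_{i=1}^s \alpha_i\left(\alpha_i - \frac{\sum_j \alpha_j^2}{\sum_j \alpha_j}\right)^2.$$

For the information index, $f(x) = -x\ln x$, with

$$\mu_h = -\sum_{i=1}^s \frac{p_i\alpha_i}{\sum_j p_j\alpha_j}\ln\frac{p_i\alpha_i}{\sum_j p_j\alpha_j},$$

$$\sigma_h^2 = \frac{1}{q\left(\sum_j p_j\alpha_j\right)^2}\sum_{i=1}^s \left[\ln\frac{p_i\alpha_i}{\sum_j p_j\alpha_j} + \mu_h\right]^2 p_i\alpha_i[1 + \alpha_i(1 - p_i)].$$

Substituting $p_i = 1$, the results become

$$\mu_h = -\sum_{i=1}^s \frac{\alpha_i}{\sum_j \alpha_j}\ln\frac{\alpha_i}{\sum_j \alpha_j}, \qquad \sigma_h^2 = \frac{1}{q\sum_j \alpha_j}\sum_{i=1}^s \frac{\alpha_i}{\sum_j \alpha_j}\left[\ln\frac{\alpha_i}{\sum_j \alpha_j} + \mu_h\right]^2.$$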
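The delta-method recipe can be written generically in terms of $f$ and $f'$. In this sketch (hypothetical names) $\beta$ is fixed at 1 because it cancels from every result, $V$ is diagonal with elements $p_i\alpha_i[1 + \alpha_i(1 - p_i)]/q$, and $g_i = [f'(\pi_i) - \sum_k \pi_k f'(\pi_k)]/\sum_j p_j\alpha_j$ with $\pi_i = p_i\alpha_i/\sum_j p_j\alpha_j$; the generic result is compared with the closed forms for $d$ and $h$ obtained from that gradient.

```python
import math
from typing import Callable, Sequence, Tuple


def delta_mean_var(f: Callable[[float], float], fprime: Callable[[float], float],
                   alpha: Sequence[float], p: Sequence[float], q: int) -> Tuple[float, float]:
    """Large-q mean G(0,...,0) and variance g'Vg of G = sum_i f(X_i./T), with beta = 1."""
    m = [p_i * a_i for a_i, p_i in zip(alpha, p)]      # E(Xbar_i.) when beta = 1
    total = sum(m)
    pi = [m_i / total for m_i in m]
    mu = sum(f(x) for x in pi)
    centre = sum(x * fprime(x) for x in pi)
    g = [(fprime(x) - centre) / total for x in pi]     # dG/dY_i at Y = 0
    v = [m_i * (1.0 + a_i * (1.0 - p_i)) / q           # Var(Y_i), diagonal of V
         for m_i, a_i, p_i in zip(m, alpha, p)]
    return mu, sum(g_i * g_i * v_i for g_i, v_i in zip(g, v))


def simpson_asymptotic(alpha: Sequence[float], p: Sequence[float], q: int) -> Tuple[float, float]:
    """Closed-form large-q mean and variance of d."""
    A = sum(a * pp for a, pp in zip(alpha, p))
    s2 = sum((a * pp) ** 2 for a, pp in zip(alpha, p))
    mu = 1.0 - s2 / A ** 2
    var = 4.0 / (q * A ** 6) * sum((a * pp * A - s2) ** 2 * pp * a * (1.0 + a * (1.0 - pp))
                                  for a, pp in zip(alpha, p))
    return mu, var


def info_asymptotic(alpha: Sequence[float], p: Sequence[float], q: int) -> Tuple[float, float]:
    """Closed-form large-q mean and variance of h."""
    A = sum(a * pp for a, pp in zip(alpha, p))
    pis = [a * pp / A for a, pp in zip(alpha, p)]
    mu = -sum(x * math.log(x) for x in pis)
    var = sum((math.log(x) + mu) ** 2 * pp * a * (1.0 + a * (1.0 - pp))
              for x, a, pp in zip(pis, alpha, p)) / (q * A ** 2)
    return mu, var
```

In the equiprobable case ($p_i\alpha_i$ all equal) every $g_i$ is zero and the returned variance is zero, the degenerate case in which the normal approximation does not apply.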
All results are independent of $\beta$. Bailey (1972) states that Simpson's index approaches a Pearson Type III distribution in the equiprobable case.

6. ESTIMATION OF α

There are several methods of estimating the parameters $\alpha$ in the gamma distribution (see Johnson and Kotz, 1969). The method of approximating the maximum likelihood estimator due to Greenwood and Durand (1960) is used as an example. The parameter $\alpha$ is estimated in two ways depending upon the value of $y = \ln(\text{arithmetic mean of the } X_{ij}\text{'s} / \text{geometric mean of the } X_{ij}\text{'s})$. In particular, if $0 < y \le 0.5772$, then

$$\hat\alpha = \frac{1}{y}\left(0.5000876 + 0.1648852\,y - 0.0544274\,y^2\right), \qquad (1)$$

and if $0.5772 < y \le 17$, then

$$\hat\alpha = \frac{8.898919 + 9.059950\,y + 0.9775373\,y^2}{y\left(17.79728 + 11.968477\,y + y^2\right)}. \qquad (2)$$
Bowman and Shenton (1968) state that the error in approximating the maximum likelihood estimate does not exceed 0.0088% for (1) and 0.0054% for (2). The maximum likelihood estimator for $\beta$ is

(3)

For the case of zero frequencies, $\hat p_i = n_i/q$, and the conditional maximum likelihood estimator of $\alpha_i$ can be obtained using (1) and (2) for those species occurring in more than one quadrat ($n_i > 1$). Then $\hat\beta$ can be obtained from (3) using only those species. For the remaining species, for which $n_i = 1$, the authors use $\hat\alpha_i = X_{i\cdot}/\hat\beta$. An outside estimate of these $\alpha_i$'s may also be used. The accuracy of the asymptotic moments and of the normal approximation was checked by computer by Mai (1982) for Simpson's index in the non-zero-frequency case, for several sets of $\alpha_i$'s with $s = 2, 3, 5, 10$, and 15. The normal approximation appears to be better for more skewed distributions of the $\alpha_i$'s. For $q = 10$ the skewness and kurtosis were 0.8 and 2.8, respectively, for the most skewed cases considered. With the above results it is possible to estimate and compare population diversities based on either index in the usual manner (see Lyons, 1981).
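The Greenwood and Durand approximation is straightforward to code. This is a sketch under stated assumptions: the constants below are the standard published Greenwood-Durand values as we know them and should be checked against Greenwood and Durand (1960) before use; the function names are hypothetical; and dropping zero quadrats before computing $y$ is our reading of the conditional estimator for species with $n_i > 1$.

```python
import math
from typing import Sequence


def greenwood_durand_alpha(y: float) -> float:
    """Approximate gamma-shape MLE from y = ln(arithmetic mean / geometric mean)."""
    if y <= 0.0:
        raise ValueError("y must be positive; the values cannot all be equal")
    if y <= 0.5772:  # approximation (1)
        return (0.5000876 + 0.1648852 * y - 0.0544274 * y * y) / y
    if y <= 17.0:    # approximation (2)
        return (8.898919 + 9.059950 * y + 0.9775373 * y * y) / (
            y * (17.79728 + 11.968477 * y + y * y))
    raise ValueError("approximation (2) is only given for y <= 17")


def alpha_hat(quadrat_biomass: Sequence[float]) -> float:
    """Estimate alpha_i for one species from its biomass in the quadrats where it occurs.

    Zero quadrats are dropped (an assumption about the conditional estimator);
    at least two occupied quadrats are required.
    """
    xs = [x for x in quadrat_biomass if x > 0.0]
    if len(xs) < 2:
        raise ValueError("need n_i > 1 occupied quadrats")
    log_arith = math.log(sum(xs) / len(xs))
    log_geo = sum(math.log(x) for x in xs) / len(xs)
    return greenwood_durand_alpha(log_arith - log_geo)
```

As a check, the exact MLE satisfies $\ln\hat\alpha - \Psi(\hat\alpha) = y$. For $\alpha = 2$ this gives $y = \ln 2 - 1 + \gamma$ (Euler's constant $\gamma \approx 0.5772$), and the approximation returns a value within about 0.01% of 2; for $\alpha = 0.5$, $y = \ln 2 + \gamma$ falls in the range of (2).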
TABLE I
Biomass of Vegetation by Genus in Each of Five Plots Plowed in Five Different Months at Horseshoe Bend Experimental Area, Athens, Georgia, 1977
Category; plowing date: April, May, June, July, August
Litter Trifolium Geranium Solidago Campsis Bromus Ambrosia Lactuca Solarium Oenothera Rumex Vicia Allium Ipomea Heterotheca Specularia Oxalis Lolium Cynodon Veronica Lepidium Digitaria Aster Ranunculus Cerastium Standing Dead Daucus Poa Hordium Cassia Privet Mollugo Cyperus Similex Plantago Duchesnia Rubus Quercus Aster
254.95 2.88 41.98 21.30 5.19 20.20 12.81 10.88 6.17 5.27 33.37 18.95
166.52 105.35 14.11 11.82 29.49 63.26 10.95 3.49 12.72 1.71 0.27 49.89 8.27 0.17 15.60 41.86 36.26 13.70 0.45 7.99 2.55 43.78 0.36 3.85 0.12 0.95 -
171.16 163.13 5.89 19.58 99.52 278.85 4.54 1.83 5.71 4.08 3.34 11.98 1.16 0.42 2.59 0.52 61.24 2.24 1.45 3.32 0.72 127.18 10.32 0.85 0.06 0.79 0.37 3.41 0.17 1.52
320.09 58.47 28.65 54.99 123.82 163.53 5.37 7.07 5.57 20.56 0.03 11.64 5.27 0.45 3.46 0.86 8.20 15.87 0.14 6.12 40.28 110.53 154.33 0.60 0.09 19.44 0.09 -
165.12 60.47 16.66 109.23 20.65 489.01 0.41 16.21 0.45 147.13 165.41 27.37 9.34 0.35 0.10 5.30 0.44 14.24 1.29 0.91 4.19 5.39 0.51 -
0.76 26.37 1.04 25.87 74.31 10.25 0.27 44.48 6.55 124.56 6.76 0.30 1.08 0.09 0.17 0.10 2.50 -
7. APPLICATION

An experiment was performed to assess the effect of the length of time after plowing on the diversity of plant communities at the Horseshoe Bend experimental area, Athens, Georgia (Odum, 1975). One experimental plot was plowed in each of five successive months (April-August 1977). Ten 1 a m2 quadrats were sampled in each plot the following spring. Vegetation was harvested at ground level, identified, dried, and weighed. The biomass diversities of the plots are compared using the methods above. Since, with only rare exceptions, each genus had only one representative in the study area, species are listed only by genus. For those exceptions the species were lumped together, so that the comparison is actually one of genus diversity. There were a total of 39 categories in all five samples. A listing of the data appears in Table I.

Using Simpson's index there was a significant difference between April and August plowing (z = 2.16). With the information index, April plowing was significantly different from both June and August plowing (z = 2.44 and z = 2.97, respectively). The 39 categories included two labelled "litter" and "standing dead." The standing dead category did not appear in the August sample. If these categories are removed, April plowing was different from both June and August using both indices.

The Pielou and Odum methods were used with the information index to compare the estimates of diversity obtained and their standard errors with those obtained using the authors' method (gamma method). The results appear in Table II; the litter and standing dead categories were omitted. Odum's method appears to severely underestimate the information diversity. His estimate of standard error is fairly consistent in magnitude across samples, as might be expected. The gamma method should produce the best point estimate of diversity, since the data from the entire sample are used. The Pielou and gamma methods result in estimates which are close in value; however, the estimates of standard error obtained by these two methods are more variable than Odum's. Ten quadrats appear sufficient in this example for Pielou's method, although the choice of $k_0$ was somewhat arbitrary ($k_0$ = 3, 3, 8, 4, and 7, respectively, for the five samples). If the assumption of a gamma distribution is appropriate and the number of quadrats is adequate, the authors believe the gamma method is preferable to that of Pielou or Odum, owing to the clear superiority of its point estimate and to there being no need to choose a $k_0$ value, which for sufficiently large $q$ could make the variance of the estimate deceptively small.

TABLE II
Estimate (Standard Error) of the Information Diversity for Five Plowing Dates Using the Pielou, Odum, and Gamma Methods

                           Plowing date
Method   April             May               June              July               August
Pielou   2.153 (0.1946)    2.550 (0.1746)    1.770 (0.2233)    2.341 (0.07624)    2.169 (0.4062)
Odum     2.064 (0.06694)   1.623 (0.08346)   1.386 (0.09482)   1.492 (0.1085)     1.238 (0.1546)
Gamma    2.636 (0.06529)   2.437 (0.3199)    1.790 (0.1171)    2.219 (0.3728)     1.799 (0.1510)
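A minimal sketch of a two-sample comparison, assuming the usual large-sample statistic $z = (\hat D_1 - \hat D_2)/\sqrt{\mathrm{se}_1^2 + \mathrm{se}_2^2}$ for independent samples with asymptotically normal estimates (our reading of "the usual manner"; Lyons (1981) should be consulted for the exact procedure). It is applied to the gamma-method entries of Table II, which exclude litter and standing dead, so the resulting z values are illustrations and not the values quoted in the text; the names are hypothetical.

```python
import math
from typing import Dict, Tuple


def diversity_z(est1: float, se1: float, est2: float, se2: float) -> float:
    """z statistic for H0: equal diversities, independent samples."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, P(|Z| > |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Gamma-method estimate and standard error of h from Table II.
GAMMA_TABLE_II: Dict[str, Tuple[float, float]] = {
    "April": (2.636, 0.06529),
    "May": (2.437, 0.3199),
    "June": (1.790, 0.1171),
    "July": (2.219, 0.3728),
    "August": (1.799, 0.1510),
}

if __name__ == "__main__":
    for month in ("May", "June", "July", "August"):
        z = diversity_z(*GAMMA_TABLE_II["April"], *GAMMA_TABLE_II[month])
        print(f"April vs {month}: z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

Treating the plots as independent samples is itself an assumption; several pairwise tests on the same data would also call for some adjustment of the significance level.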
ACKNOWLEDGMENTS

This research was supported by National Science Foundation Grant DEB 82-01564 through the Institute of Ecology at the University of Georgia. The authors thank E. P. Odum and J. Pechmann for the use of their data.
REFERENCES

BAILEY, R. C. 1972. "A Montage of Diversity," Ph.D. Dissertation, Emory University, Atlanta, Ga.
GREENWOOD, J. A., AND DURAND, D. 1960. Aids for fitting the gamma distribution by maximum likelihood, Technometrics 2, 55-65.
JOHNSON, N. L., AND KOTZ, S. 1969. "Distributions in Statistics: Continuous Univariate Distributions," Houghton Mifflin, Boston.
KEMPTON, R. A., AND WEDDERBURN, R. W. M. 1978. A comparison of three measures of species diversity, Biometrics 34, 25-37.
LYONS, N. I. 1981. Comparing diversity indices based on counts weighted by biomass or other importance values, American Naturalist 118, 438-442.
LYONS, N. I., AND HUTCHESON, K. 1977. "Species Diversity: A Stopping Rule and FORTRAN Program," Tech. Rep. 124, Department of Statistics and Computer Science, University of Georgia, Athens, Ga.
LYONS, N. I., AND HUTCHESON, K. 1979. Distributional properties of Simpson's index of diversity, Comm. Statist. A-Theory Methods 8, 569-574.
MAI, N. M. 1982. "The Index of Diversity Based on Variable Biomass Weights," MAMS Thesis, Department of Statistics and Computer Science, University of Georgia, Athens, Ga.
ODUM, E. P. 1971. "Ecology," Holt, Rinehart & Winston, New York.
ODUM, E. P. 1975. "The Subsidy-Stress Gradient: Assessment of Experimental Perturbations on the Basis of Ecosystem-Level Properties," NSF Grant DEB 7513962.
PIELOU, E. C. 1966. The measurement of diversity in different types of biological collections, J. Theor. Biol. 13, 131-144.
PIELOU, E. C. 1969. "An Introduction to Mathematical Ecology," Wiley-Interscience, New York.
PITMAN, E. J. G. 1937. The "closest" estimates of statistical parameters, Proc. Camb. Phil. Soc. 33, 212-222.
SHENTON, L. R., AND BOWMAN, K. O. 1977. "Maximum Likelihood Estimation," Macmillan Co., New York.
SIMPSON, E. H. 1949. Measurement of diversity, Nature 163, 688.
STITELER, W. M., AND PATIL, G. P. 1969. Variance-to-mean ratio and Morisita's index as measures of spatial patterns in ecological populations, in "Statistical Ecology: Spatial Patterns and Statistical Distributions," Pennsylvania State Univ. Press, University Park, Pa.
WILHM, J. L. 1967. Use of biomass units in Shannon's formula, Ecology 49, 153-156.