Simulation of counts of aphids over two hectares of Brussels sprout plants


Computers and Electronics in Agriculture 21 (1998) 33 – 51

Joe N. Perry a,*, W.E. Parker b, Lynda Alderson a, Sam Korie c, J.A. Blood-Smyth d, R. McKinlay e, S.A. Ellis f

a Department of Entomology and Nematology, IACR-Rothamsted Experimental Station, Harpenden, Herts AL5 2JQ, UK
b ADAS Wolverhampton, Woodthorne, Wergs Road, Wolverhampton WV6 8TQ, UK
c Department of Statistics, IACR-Rothamsted Experimental Station, Harpenden, Herts AL5 2JQ, UK
d ADAS Arthur Rickwood, Mepal, Ely, Cambridgeshire CB6 2BA, UK
e Scottish Agricultural College, The King’s Buildings, Edinburgh EH9 3JG, UK
f ADAS High Mowthorpe, Duggleby, Malton, N. Yorks YO17 8BP, UK

Received 13 May 1997; received in revised form 14 May 1998; accepted 1 June 1998

Abstract

Populations of the mealy cabbage aphid, Brevicoryne brassicae (L.), on Brussels sprout plants, Brassica oleracea (L.), were sampled fortnightly throughout the early season of 1996 in two 1 ha blocks. Sampling was done on two spatial scales: firstly over 25 sample areas of 5 × 5 m, and then on the 20 plants within each of these areas. On each occasion, the occurrence of aphids was recorded on all 500 plants sampled per block (one in 16 plants), and the number of aphids per plant was counted on a selected five plants per area, yielding information from 125 plants per block (one in 64 plants). The data were analysed to study various aspects of the aphids’ frequency distribution at each of the two scales, particularly the relationship between variability and population density, and the relationship between the proportion of plants infested and density. Patterns of the incidence of infestation differed radically between the two blocks. Also, for a given aphid density, more plants were infested as the season progressed, and at that density there were therefore, on average, fewer aphids per plant at later dates. Methodology is outlined to simulate realistic aphid counts, conforming to the relationships found for each of the two scales around the time of the achievement of a threshold incidence of 10% infestation. This was used to generate a count for each of the 8000 plants per block, for each of the blocks, representing a 64-fold increase in information. It was verified that these simulations provided realistic sets of counts that could be used subsequently to study the efficiency of aphid sampling schemes and to replace costly extensive surveys for the development of control strategies. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Simulation; Heterogeneity; Spatial scale; Counts; Variance-mean relationship; Incidence; Spatial pattern

* Corresponding author. Tel.: +44 158 2763133; fax: +44 158 2760981.
0168-1699/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved. PII S0168-1699(98)00026-X

1. Introduction

This paper develops simulation methodology to generate frequency distributions typical of aphid populations on Brussels sprout plants. Such simulated data is required as the basis for the assessment of likely yield losses and for the development of efficient sampling schemes (Perry, 1994). Currently, to provide a fair test of competing schemes, or to properly validate existing schemes, very intensive sampling is required, so that, by adequate replication, stochastic effects may be minimised (Perry, 1989). It would be much cheaper to sample just sufficient units to characterize the aphid populations accurately, and then to simulate large quantities of further data that matched closely their observed characteristics (Perry, 1997a). Then, various schemes could be tested on this simulated data, experiments performed, ‘what-if’ scenarios played out, and probabilities of exceeding various thresholds estimated numerically, at a fraction of the cost previously incurred.

The approach described in this paper has four phases. Firstly, the data is analysed to reveal the most important features of the aphid frequency distribution, and how these depend crucially on the density of population; the emphasis is on selection of the simplest statistical relationships adequate to describe the data, using methods such as those in Perry (1982). Secondly, simulation methodology is outlined to generate data that conforms to the fitted relationships found in the analysis phase. Thirdly, a large set of such data is generated. Fourthly, the simulated data is examined to verify that it conforms to the fitted relationships and to compare it with features of the original observed data. Note that the primary aim of the simulations is to match the fitted relationships, and not to reproduce the original data itself.
This is because the analysis phase seeks to allow for the inevitable stochastic variability in the data, so that the generation phase may reproduce the signal and reflect the correct degree of stochastic variation, rather than reproducing slavishly the noise inevitably present in the observed data. The observed data is by definition sparse, relative to that generated; hence the data generated may be verified, but cannot sensibly be validated against that observed.

Data concerning insect pests (Perry, 1997b) is collected usually in one of two forms. It may be a count, for example, the number of aphids per plant. Alternatively, it may be a record of the presence or absence of an insect irrespective of its density in that sampling unit; this is termed its incidence, for example, the number of infested plants out of twenty examined. Incidence is used particularly when to count accurately would be too laborious to be cost-effective. Both the statistical form of the frequency distribution and the part of this distribution that governs the incidence, i.e. the proportion of counts that are zero, are known to depend separately on population density. Efficient sampling schemes must allow simultaneously for both these sources of variation (e.g. Blackshaw and Perry, 1994).

Here, this density-dependence is modelled by two well-known relationships that have a central role in economic entomology. The variance-mean power law for counts, discovered by Taylor (1961), remains a citation classic (see review by Taylor, 1984). The complementary log-log incidence-mean relationship, described first by Kono and Sugino (1958), has likewise been used (Perry, 1987) for a very wide range of pests (see review by Binns and Nyrop, 1992). In addition, simple empirical indicators of distribution such as minimum, maximum and range will be adopted to describe the observed counts.

The data was collected in an intensive survey during 1996, for a project (Parker et al., 1997) that aims to develop an efficient sampling scheme for the mealy cabbage aphid, Brevicoryne brassicae (L.), on Brussels sprout plants, Brassica oleracea (L.). The current early-season threshold for intervention to control cabbage aphids on Brussels sprouts is an incidence of 10%; this paper aims to generate data with that incidence. The numeric information in frequency distributions with which this study is concerned is independent of the spatial information in observed samples that relates to the location of the data and their arrangement with respect to one another; this aspect of simulation (Perry, 1996) is not addressed here. However, since the project sampled on two spatial scales, and since the numeric relationships studied are known to differ between scales (Bliss, 1941; Greig-Smith, 1952; Clark and Perry, 1994; Perry, 1995), this is allowed for fully.

Subsequent sections of this paper describe the observed data, define notation, give the methods of analysis and present the results, detail the methodology for the generation of counts contingent on those results, and illustrate its use to simulate 16 000 integer counts.

2. Materials and methods

2.1. Experimental sites

Field assessment of cabbage aphid populations was done in a commercial field of Brussels sprouts at Mepal, Cambridgeshire; the crop remained entirely untreated for aphids throughout the season. Work was done in two 1 ha square blocks in the field, one (A) located in the south-west corner of the field, the other (B) in the north-east corner. Each block contained 25 sampling areas of 5 × 5 m (large-scale sample units), each of which comprised a 4 × 5 block of 20 plants (small-scale sample units) (Fig. 1). On each assessment date, the incidence of aphids was recorded on each of the 20 plants within each sample area, but a full count was made on only five of these. Aphid recording commenced on 16 July, approximately 1 week after planting, and continued for ten further occasions until 20 November.

Since only 25 out of 400 possible 5 × 5 m sample areas were sampled from each 1 ha block, a 16-fold increase in the incidence data observed would be required to generate data to represent each possible plant in the block. Furthermore, since aphids were only counted on one in every four plants examined, a 64-fold increase in the data, from 125 to 8000 counts, would be required to generate data with maximal coverage. The aim of this paper is to generate 8000 such counts for each of the sampled blocks, to typify the frequency distributions and fitted relationships found for each of the 400 possible sampling areas, and to represent the period around the time that the threshold is achieved.

2.2. Notation and analyses

On any given occasion, occurrence of infestation was indicated by P = 1 if aphids were present on the plant and P = 0 otherwise, so occurrence for the kth plant (at the small scale), k = 1, ..., 20, in the jth (large-scale) sample area was denoted as $P_{jk}$. For the jth large-scale sample area, incidence was defined as the proportion of plants infested, $p_j = \sum_k P_{jk}/20$. Incidence was also calculated over the entire block, as $P = \sum_j \sum_k P_{jk}/500$. The count for the jth area, for those five values of k for which it was recorded, was denoted $C_{jk}$. The total count for a sample area over the five plants sampled was denoted as $T_j = \sum_k C_{jk}$. Each of the two blocks within a site was treated separately, unless analysis showed that they could be combined.

For each scale, the parameters $\log_{10} a$ and $b$ of Taylor’s power law (Taylor, 1961; Perry and Woiwod, 1992; Clark and Perry, 1994) were estimated. The dependence of sample variance, $s^2$, on sample mean, $m$, was fitted as the simple, least-squares, linear regression model (Perry and Woiwod, 1992), using the usual form of logarithms to base 10:

$$E[\log_{10} s^2] = \log_{10} a + b \log_{10} m \qquad (1)$$

where E[ ] denotes expectation. At the small scale, the regression was calculated, for each occasion separately, from the j = 1, ..., 25 points comprising the pairs of logarithmically transformed (base 10) sample means, $m_j$, and variances, $s_j^2$, computed over the five counts, $C_{jk}$, within the jth sample area. The regressions for each occasion were compared to see whether their intercepts and/or slopes were equal (Perry, 1982). At the large scale, the regression was calculated from the 11 points, one for each occasion over the entire season, comprising the transformed mean-variance pairs, computed over the 25 totals, $T_j$.

For each scale, the parameters $\ln(c)$ and $d$ of the complementary log-log incidence-mean relationship (Perry, 1987) were estimated. The dependence of incidence, $P$, on sample mean, $m$, was fitted as the usual simple, least-squares, linear regression model using natural logarithms:

$$E[\ln(-\ln(1 - P))] = \ln(c) + d \ln m \qquad (2)$$

On almost all occasions there was at least one occupied plant per sample area, so analysis was feasible only at the small scale. Two regression models were studied. In the first, the regression was calculated from the 11 points, one for each occasion over the entire season, comprising the transformed incidence over the entire block, $P$, and the corresponding sample mean over the entire block. This was used to provide a rough estimate of the mean density pertaining when the 10% incidence threshold was achieved. In the second, the regression was calculated, for each occasion separately, from the j = 1, ..., 25 points comprising the pairs of transformed sample means, $m_j$, and incidences, $p_j$, computed within the jth sample area. Again, the regressions for each occasion were compared to see whether they were coincident or parallel.

Fig. 1. Schematic representation of the square hectare block A, after Parker et al. (1997), not to scale. The small squares shown represent 5 × 5 m sample areas. There were 25 of these, at what is termed the large scale. The layout of the other block, B, was identical, except that the sample areas formed a mirror-image of those for A reflected in the south-west corner.
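The two regressions of Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, NumPy's `polyfit` stands in for whatever least-squares routine was actually used, and the demonstration data are generated exactly from known parameters rather than taken from the survey. Points with a zero mean, a zero variance, or an incidence of exactly 0 or 1 would have to be excluded before taking logarithms.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

def fit_taylor(means, variances):
    """Return (log10 a, b) of Eq. (1), using base-10 logarithms."""
    return fit_line(np.log10(means), np.log10(variances))

def fit_cloglog(incidences, means):
    """Return (ln c, d) of Eq. (2), using natural logarithms."""
    p = np.asarray(incidences, dtype=float)
    return fit_line(np.log(means), np.log(-np.log(1.0 - p)))

# Illustrative data built exactly from known parameters (not the survey data).
m = np.array([0.5, 2.0, 8.0, 32.0, 128.0])
s2 = 10**0.775 * m**1.896                      # variance following the power law
p = 1.0 - np.exp(-np.exp(-2.824) * m**0.759)   # incidence following Eq. (2)

log10_a, b = fit_taylor(m, s2)
ln_c, d = fit_cloglog(p, m)
```

Because the demonstration data lie exactly on the two lines, the fits simply return the parameters that were used to build them; with field data the same calls would return estimates with sampling error.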

3. Results of analyses

Aphid invasion appeared to occur earlier in block A, although infestations were recorded in a few sample areas of both blocks on 16 July (occasion 1) when the overall sample mean density of aphids in blocks A and B was, respectively, 6.2 and 0.2. By 30 July (occasion 3), almost all sample areas in both blocks had at least one infested plant. Aphid density continued to increase virtually monotonically in each block, until, when sampling ceased on 20 November (occasion 11), it had reached 314.4 and 108.6, respectively, in the two blocks.

Unusually for insect data, the small-scale power-law analysis fitted to each occasion separately showed that the estimated intercepts and slopes, although typical of those for other insect species (Taylor et al., 1978), changed systematically (P < 0.001) throughout the season, in a broadly consistent fashion for both blocks. The estimated intercepts declined sharply as the season progressed (Table 1) to a minimum in October, before increasing slightly by the end of the season in late November; the estimated slopes remained around 2.0 for the early part of the season, before increasing markedly. This was due to similar temporal variability in the incidence-mean relationship, described below.

Table 1
Systematic variability in estimated Taylor’s power law parameters throughout the season, for selected sample occasions

Date          Occasion   Intercept, log10 a_s(t)   Slope, b_s(t)
                         Block A    Block B        Block A    Block B
16 July       1           0.699      0.540         2.000      2.205
23 July       2           0.581*     0.699         1.974*     2.005
30 July       3           0.493      0.629*        1.964      1.850*
7 August      4           0.261      0.457         2.033      1.927
1 October     8          −0.955     −0.453         2.406      2.348
20 November   11         −0.413     −0.346         2.216      2.204

*, Estimates that pertain on the threshold occasion, T, used subsequently in the simulation of counts. S.E. of all intercepts were between 0.35 and 0.65, and of all slopes were between 0.15 and 0.4.

The large-scale power-law analysis showed that the estimated intercept and slope did not differ significantly between the blocks (Fig. 2) and that all the data was well fitted by the line with $\log_{10} a = 0.775$ (S.E. 0.142) and $b = 1.896$ (S.E. 0.053).

The full-season incidence-mean analysis, over each entire block, showed that the fitted lines for the two blocks differed significantly ($F_{2,18} = 4.44$). For block A, the estimated intercept, ln(c), was −6.238 (S.E. 0.766) and slope, d, was 1.187 (S.E. 0.142), while for block B, the estimated intercept was −2.824 (S.E. 0.375) and slope was 0.759 (S.E. 0.098). As expected, in both blocks the incidence of infestation increased both with increasing density and as the season progressed; this increase was greater for block A. For an identical density of aphids observed in the two blocks, a greater incidence was observed for block B (Fig. 3), i.e. the same number of aphids infested more plants (but with fewer per plant on average) in block B than in block A.
For block A, interpolation gave an estimated threshold date of 24 July for 10% incidence; we later approximated this for simulation purposes by 23 July (occasion 2) and used relationships pertaining to this occasion, when 43 of the 500 sampled plants were infested (P = 0.086) with a maximum count of 1500. Similarly, for block B, interpolation gave an estimated threshold date of 29 July; this we later approximated by 30 July and used relationships pertaining to occasion 3, when 79 plants were infested (P = 0.158) with a maximum count of 54.

Fig. 2. Taylor’s power law, single fitted linear regression at the large scale over full season. Observed sample variance, logarithmically transformed (base 10), versus observed mean density transformed similarly. Data represented by symbols: ‘1’ and ‘3’ refer to block A; ‘2’ and ‘4’ to block B; the different symbols within blocks are not relevant in this context. The line where variance equals mean is shown for reference.

Fig. 3. Fitted linear regressions of observed incidence transformed by a complementary log-log transformation, on observed mean density transformed to natural logarithms, at the small scale for block A (symbols ‘a’), and block B (symbols ‘b’), for the 11 sampling occasions. The upper line is the equality line, shown for reference; a value on the y-axis of −0.366 corresponds to an incidence of 50%, a value of zero to an incidence of 63%; a value on the x-axis of zero corresponds to a density of unity.

There was an important difference in the pattern of incidence within sample areas between the two blocks on these occasions. Given the overall incidence, P, over the block, if aphids had infested each sample area with equal probability, the number of infested plants, r, out of the 20 sampled in a sample area would follow a binomial distribution, from which the expected number of sample areas with each value of r could be calculated and the goodness-of-fit tested easily. For block B, the values of r observed were much as expected ($\chi^2_9 = 9.3$, P > 0.05); for example there was one sample area with no plants infested (r = 0) compared with 0.80 expected and four sample areas with one infested plant (r = 1) compared with 3.29 expected. By contrast, for block A there were 11 sample areas with r = 0, many more than the 4.15 expected from the binomial distribution, plus two areas with six and one with seven plants infested, where only 0.22 and 0.04 would be expected by chance.

The source of this large extra-binomial heterogeneity in block A ($\chi^2_7 = 57.8$, P < 0.001) is unknown, but one feasible scenario, employed for simulation below, concerns a two-stage colonisation process. In the first stage, aphids may invade a proportion, say 1 − u, of the available sample areas (but do not necessarily infest all of them), but must of necessity leave a proportion u completely uninfested. In the second stage, in those sample areas where there is some infestation, those aphids present distribute themselves over the 20 plants, infesting a proportion, $p_j$, dependent on their density, $m_j$, in that sample area.

Perhaps the most striking and unusual result concerned the systematic changes (P < 0.001) over time in the incidence-mean analysis, done within sample areas, for separate occasions. For both blocks, the best-fitting model was a set of parallel lines with common positive slope, d, constant over the occasions, and intercepts, ln(c), that increased systematically with time (Table 2). In both blocks, incidence increased with time, and this was not only an effect of the increasing density, but also an effect that was manifest at a given density. Hence, for a given aphid density, more plants were infested as the season progressed, although for that density there would then be, on average, fewer aphids per plant at later dates. For example, for an observed density of five aphids per plant, the predicted infestation incidence rose steadily over the period from 16 July (occasion 1) to 17 September (occasion 7) from 0.073 to 0.820 in block A, and from about 0.04 to 0.843 in block B. As a further illustration of the magnitude and variability of this effect, consider for block A all the instances of sample areas that fell within the observed narrow range of mean densities of between 4.0 and 7.0 aphids per plant, and how the number of infested plants out of 20 recorded for each such area increased as the season progressed (Table 3).
From Tables 2 and 3 it is clear that the effect of the increase in incidence that was due to changes in time greatly exceeded in magnitude the effect due to changes in density; further biological experimentation is required to explain the mechanisms responsible.
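The binomial comparison used above for the within-area pattern of incidence can be sketched as follows. This is a minimal illustration, not the authors' calculation: the function name is hypothetical, and the block incidences are taken as 43/500 and 79/500 from the text, so the expected frequencies may differ slightly from the published ones if those rested on unrounded or differently pooled values.

```python
from math import comb

def expected_area_counts(p, n_plants=20, n_areas=25):
    """Expected number of sample areas with r = 0..n_plants infested plants,
    if every plant is infested independently with probability p."""
    return [
        n_areas * comb(n_plants, r) * p**r * (1 - p) ** (n_plants - r)
        for r in range(n_plants + 1)
    ]

exp_a = expected_area_counts(43 / 500)   # block A, occasion 2
exp_b = expected_area_counts(79 / 500)   # block B, occasion 3

# Expected numbers of completely uninfested areas (r = 0) in each block,
# to set against the 11 (block A) and 1 (block B) actually observed.
uninfested_a, uninfested_b = exp_a[0], exp_b[0]
```

Comparing these expectations with the observed numbers of areas for each r, after pooling sparse classes, gives the goodness-of-fit statistic; the pooling rule is not stated in the text, so it is left out of the sketch.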

Table 2
Systematic variability in estimated incidence-mean parameters throughout the early season

Date           Occasion   Intercept, ln(c(t))
                          Block A    Block B
16 July        1          −2.954     -
23 July        2          −2.225*    −2.653
30 July        3          −1.681     −1.826*
7 August       4          −0.983     −1.408
20 August      5          −0.364     −0.774
3 September    6          −0.029     −0.604
17 September   7           0.181     −0.090
1 October      8           0.551      0.313

Slope, d(t) (common to all occasions within a block): Block A, 0.226* (S.E. 0.071); Block B, 0.188* (S.E. 0.083).

*, Estimates that pertain on the threshold occasion, T, used subsequently in the simulation of counts.
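The parallel-lines model of Table 2 can be turned into a predicted incidence at a fixed density by inverting Eq. (2), $P = 1 - \exp(-c\,m^d)$. The sketch below does this for block A at five aphids per plant; the function and variable names are hypothetical, and the parameter values are simply those tabulated above.

```python
import math

def predicted_incidence(ln_c, d, density):
    """Invert Eq. (2): P = 1 - exp(-c * m**d), with c given as ln(c)."""
    return 1.0 - math.exp(-math.exp(ln_c + d * math.log(density)))

D_A = 0.226  # common slope for block A (Table 2)
INTERCEPTS_A = {1: -2.954, 2: -2.225, 3: -1.681, 4: -0.983,
                5: -0.364, 6: -0.029, 7: 0.181, 8: 0.551}

# Predicted incidence at a density of five aphids per plant, by occasion.
pred = {occ: predicted_incidence(ln_c, D_A, 5.0)
        for occ, ln_c in INTERCEPTS_A.items()}
```

With these inputs the predictions for occasions 1 and 7 come out close to the values of 0.073 and 0.820 quoted in the text for block A, and they increase with every occasion because only the intercept changes.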


Table 3
The increase in infestation incidence with time in block A

Date           Occasion   Aphid density   Number of infested plants
23 July        2          4.8             7
                          5.0             1
                          6.0             2
30 July        3          4.6             6
                          5.4             1
7 August       4          6.0             8
                          6.8             7
20 August      5          6.4             14
17 September   7          6.4             19
                          6.6             15

The number of infested plants out of 20 for all sample areas that recorded mean densities of between 4.0 and 7.0 aphids per plant.

4. Simulation methodology

Symbols relating to statistics of generated counts are denoted by * to distinguish them from those relating to the observed data. All random number generation was done using the package Genstat (Payne and Members of the Genstat 5 Committee, 1993). The methodology will be illustrated solely by reference to block A, although the formulae are completely analogous for both blocks.

The overall target mean, $m^*$, for the generated counts was calculated by substitution of the target value, $P^* = 0.1$, into Eq. (2), with the parameter estimates reported above of ln(c) = −6.238 and d = 1.187, from the full-season analysis for block A, yielding $m^* = 28.77$. This small-scale mean was equivalent to a large-scale total per five plants sampled of $T^* = 143.85$. From this value, the expected variance of the totals over the sample areas was calculated from Eq. (1), using the large-scale power-law parameter estimates reported above of $\log_{10} a = 0.775$ and $b = 1.896$, as $s^{2*}(T^*) = 73445.6$. Simulated large-scale totals were generated for each of the 400 target sample areas from a simple two-parameter log-normal distribution with mean 143.85 and variance 73445.6. It may be shown that if Z is a standard normal random variable then:

$$\exp\left\{ \left( \ln\left[ 1 + s^{2*}(T^*)/T^{*2} \right] \right)^{1/2} Z + \left[ \ln(T^{*4}) - \ln\left( s^{2*}(T^*) + T^{*2} \right) \right]/2 \right\}$$

has a log-normal distribution with the desired mean, $T^*$, and variance, $s^{2*}(T^*)$, so a uniform random variable on the interval (0,1) may be converted easily, first to a standard normal random variable, and then to the required log-normal random variable. Before simulation of the small-scale counts, each of the j = 1, ..., 400 simulated large-scale totals per sample area was transformed back to a small-scale mean density per plant, $m_j^*$, by dividing by 5.
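The large-scale step described above can be sketched as follows. It is an illustration under assumptions, not the Genstat code used by the authors: the function names are hypothetical, NumPy's generator supplies standard normal deviates directly rather than by transforming uniform deviates, and the seed is arbitrary. Because the published parameter estimates are rounded, the variance computed here agrees with the quoted 73445.6 only to within about 0.1%.

```python
import numpy as np

def threshold_mean(p_target, ln_c, d):
    """Solve Eq. (2) for the mean density giving incidence p_target."""
    return float(np.exp((np.log(-np.log(1.0 - p_target)) - ln_c) / d))

def taylor_variance(mean, log10_a, b):
    """Variance predicted by Taylor's power law, Eq. (1)."""
    return float(10.0 ** (log10_a + b * np.log10(mean)))

def lognormal_totals(mean, var, size, rng):
    """Log-normal deviates with the requested arithmetic mean and variance."""
    sigma = np.sqrt(np.log(1.0 + var / mean**2))
    mu = (np.log(mean**4) - np.log(var + mean**2)) / 2.0
    z = rng.standard_normal(size)
    return np.exp(sigma * z + mu)

rng = np.random.default_rng(1)                     # arbitrary seed
m_star = threshold_mean(0.1, -6.238, 1.187)        # block A, full-season fit
t_star = 5 * m_star                                # total over five plants
var_t = taylor_variance(t_star, 0.775, 1.896)      # large-scale power law
totals = lognormal_totals(t_star, var_t, 400, rng) # one total per sample area
area_means = totals / 5                            # back to density per plant
```

The array `area_means` plays the role of the 400 values of $m_j^*$ that feed the small-scale simulation.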


For the k = 1, ..., 20 plants within each sample area, we simulated counts from a negative binomial distribution (Clark and Perry, 1989), which is known to provide a good fit to a wide range of insect data, adjusted to generate extra zeroes. The unadjusted distribution had mean, $m_j^*$, and shape parameter, $k_j$ (defined as in Taylor et al., 1979), where:

$$k_j = \frac{m_j^*}{a\, m_j^{*(b-1)} - 1} \qquad (3)$$

with $\log_{10} a = 0.581$ and $b = 1.974$ (Table 1), which ensured that the counts generated conformed to the observed small-scale power-law relationship corresponding to occasion two. The adjustment for extra zeroes was required because this distribution would be unlikely by chance to generate a sufficient overall proportion of zero counts to match the overall value required, 0.9, or to follow the within sample area incidence-mean relationship for the specific occasion two. The observed small-scale variance-mean relationship was maintained through the result that if t counts have a sample mean of u and a sample variance of v, and if t − r of them are zero, then the sample mean of the subset of r non-zero counts is $tu/r$ and the sample variance of this subset is:

$$\frac{(t-1)\,v}{r-1} - \frac{(t-r)\,t\,u^2}{r\,(r-1)}$$

The result was used to adjust the values of $m_j^*$, $s_j^{2*}$ and hence $k_j$, for each sample area, as required.

For block A, the two-stage colonisation process to simulate zeroes was adopted, as outlined above. To match roughly the proportion of 0.44 completely uninfested sample areas observed on occasion two, the value of u required for an overall incidence of P = 0.1 was calculated to be 0.318. In the first stage, a binomial (n = 1, P = 0.318) random variable was independently simulated for each of the 400 sample areas, to assign, with probability u, some sample areas as being completely uninfested. (This part of the process was not followed for block B, i.e. u was set to zero.) Now, let the j = 1, ..., 400 quantities

$$f_j = \exp\{-\exp[\ln(c) + d \ln(m_j^*)]\}$$

where ln(c) = −2.225 and d = 0.226 are the estimated within sample area incidence-mean parameters for block A on occasion two (see Table 2), and let the j = 1, ..., 400 quantities

$$c_j = \frac{400\,(0.9 - u)\,f_j}{(1 - u)\sum_j f_j}$$

be defined. In the second stage, attention focuses on just those sample areas that were not assigned as being completely uninfested in the first stage. Within these, each of the k = 1, ..., 20 plants is, independently, declared uninfested with probability $c_j$. The number uninfested within each such sample area is simulated from the random variable $R_j = \mathrm{int}[(X_j/5) + 0.5]$, where $X_j$ is a binomial (n = 100, P = $c_j$) random variable.

Simulations from the negative binomial distribution were made by using the fact that it may be represented as a Poisson distribution, the parameter of which varies as a gamma (e.g. Johnson and Kotz, 1969). If for any sample area the adjusted shape parameter was negative, then simulations were taken instead from a Poisson distribution with the same adjusted mean.
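The small-scale step can be pulled together in a short sketch. It follows the description above for block A but fills several gaps with assumptions that are not in the text: $c_j$ is clipped to 1, an area left with a single infested plant is drawn from a Poisson with the adjusted mean, infested plants are placed at random positions, and zero draws from the negative binomial are left as they fall. All names are hypothetical, and the input means are illustrative log-normal values rather than the authors' generated totals.

```python
import numpy as np

A, B = 10**0.581, 1.974      # small-scale power law, block A, occasion 2 (Table 1)
LN_C, D = -2.225, 0.226      # within-area incidence-mean fit, block A (Table 2)
U, N_PLANTS = 0.318, 20

def nonzero_moments(t, r, mean, var):
    """Mean and variance of the r non-zero counts among t counts
    that have the given overall sample mean and variance."""
    sub_mean = t * mean / r
    sub_var = (t - 1) * var / (r - 1) - (t - r) * t * mean**2 / (r * (r - 1))
    return sub_mean, sub_var

def neg_binomial(mean, k, size, rng):
    """Negative binomial draws as a Poisson with a gamma-distributed parameter."""
    lam = rng.gamma(shape=k, scale=mean / k, size=size)
    return rng.poisson(lam)

def simulate_block(area_means, rng):
    area_means = np.asarray(area_means, dtype=float)
    n = len(area_means)
    f = np.exp(-np.exp(LN_C + D * np.log(area_means)))
    # Probability that a plant is uninfested; the clip to [0, 1] is an assumption.
    c = np.clip(n * (0.9 - U) * f / ((1 - U) * f.sum()), 0.0, 1.0)
    counts = np.zeros((n, N_PLANTS), dtype=np.int64)
    for j, m in enumerate(area_means):
        if rng.random() < U:                 # stage 1: area completely uninfested
            continue
        n_zero = int(rng.binomial(100, c[j]) / 5 + 0.5)   # R_j
        r = N_PLANTS - n_zero
        if r == 0:
            continue
        if r > 1:
            sub_mean, sub_var = nonzero_moments(N_PLANTS, r, m, A * m**B)
        else:                                # assumption: variance undefined for r = 1
            sub_mean, sub_var = N_PLANTS * m, 0.0
        if sub_var > sub_mean:               # positive shape parameter
            k = sub_mean**2 / (sub_var - sub_mean)
            draws = neg_binomial(sub_mean, k, r, rng)
        else:                                # negative shape: Poisson fallback
            draws = rng.poisson(sub_mean, r)
        plants = rng.permutation(N_PLANTS)[:r]   # assumption: random positions
        counts[j, plants] = draws
    return counts

rng = np.random.default_rng(2)                               # arbitrary seed
demo_means = rng.lognormal(mean=2.6, sigma=1.23, size=400)   # illustrative only
counts = simulate_block(demo_means, rng)
```

The shape parameter is computed as mean²/(variance − mean), which is the same quantity as Eq. (3) when the variance follows the power law, and it turns negative exactly when the adjusted variance falls below the adjusted mean.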

44

J.N. Perry et al. / Computers and Electronics in Agriculture 21 (1998) 33–51

5. Verification of simulations

The properties of the two sets of 8000 generated counts, C*, were checked to verify whether they matched satisfactorily with the target overall mean, target overall incidence, and the relationships fitted in Section 3. For example, for block A, the overall mean generated was 35.86 (target 28.77) and the generated incidence was 0.1012.

It was of some interest also to compare informally the frequency distribution of the data generated with that observed on the dates when the threshold was approximately achieved, but care is required in such a comparison. For example, for block A, the value of the largest count generated was 6477, but this cannot be compared directly with the observed maximum because the sample sizes differ. The 8000 generated counts would be expected to have more extreme values and to cover a larger range than the 125 counts observed on the specific occasion two. Further, the frequency distribution of the observed counts comprised so few non-zero values that it contained relatively little information. In any event, since the estimated date of achievement of the threshold was 1 day different from the nearest occasion when data was observed for both blocks, and since it is the fitted relationships and not the observed data that the generated data were intended to mimic, any comparison between observed and generated data should be informal only. However, given all these caveats, the generated and observed frequency distributions (Table 4) did show good concordance and the histogram for generated counts was typical of those for insect distributions (Perry and Taylor, 1988).

A fairer comparison was achieved by taking random samples from the generated counts with the same structure as that of the observed, i.e. by considering 25 randomly chosen sample areas of the 400 generated, with incidence measured on all plants and counts measured on a randomly chosen subset of five plants, measuring various statistics on this set of 500 plants, and then repeating the process many times. Two hundred sets of such samples gave a mean sample mean of 36.48 (compare with target value of 28.77 and note informally that the corresponding value observed on occasion two was 17.01), a mean sample variance of 43560 (note informally that the corresponding value observed on occasion two was 19193), and a mean maximum value of 1380 (note informally that the corresponding value observed on occasion two was 1500). Two hundred power-law regressions were also done from random samples of the generated counts; these gave a mean estimated value of log10 a of 0.732, with a mean S.E. of 0.183 (compare with target value of 0.581 with S.E. of 0.530 from fitted relationship) and a mean estimated value of b of 1.968 with mean S.E. of 0.107 (compare with target value of 1.974 with S.E. of 0.343 from fitted relationship). The fit of the generated data from the first of these 200 regressions is shown, for visual comparison with that observed for occasion two, in Fig. 4. The same number of incidence-mean regressions gave a mean estimated value of ln(c) of −2.128, with a mean S.E. of 0.144 (compare with target value of −2.219 with S.E. 0.191 from fitted relationship) and a mean estimated value of d of 0.137 with mean S.E. of 0.045 (compare with target value of 0.226 with S.E. of 0.071 from fitted relationship). In all of the above cases the

Table 4
Histograms of 125 observed counts sampled 1 day prior to estimated date of achievement of 10% incidence threshold (on 23 July for block A and 30 July for block B), and histograms of 8000 simulated counts generated to match relationships found at 10% incidence

Block A

Class limits (lower–upper)   Observed counts   Simulated counts
0–0                          112               7190
1–5                          2                 22
6–10                         1                 23
11–20                        3                 18
21–30                        3                 17
31–40                        2                 19
41–50                        0                 21
51–60                        0                 19
61–70                        0                 15
71–80                        0                 20
81–90                        0                 26
91–100                       0                 34
101–110                      0                 27
111–120                      0                 26
121–130                      0                 17
131–140                      0                 19
141–150                      0                 19
151–160                      0                 19
161–170                      0                 21
171–180                      0                 16
181–190                      0                 14
191–200                      0                 18
201–210                      0                 21
211–220                      0                 12
221–230                      0                 15
231–240                      0                 18
241–250                      0                 11
251–275                      0                 22
271–300                      0                 20
301–325                      0                 21
326–350                      0                 27
351–375                      0                 23
376–400                      1                 16
401–425                      0                 18
426–450                      0                 11
451–500                      0                 14
501–600                      0                 18
601–700                      0                 21
701–800                      0                 16
801–900                      0                 4
901–1000                     0                 9
1001–1250                    0                 25
1251–1500                    1                 11
1501–2000                    0                 6
2001–3000                    0                 10
3001–4000                    0                 7
4001–5000                    0                 3
5001–6000                    0                 0
6001–7000                    0                 1
>7001                        0                 0

Block B

Class limits (lower–upper)   Observed counts   Simulated counts
0–0                          104               7197
1–1                          1                 8
2–2                          2                 25
3–3                          4                 17
4–4                          1                 28
5–5                          1                 39
6–6                          3                 36
7–7                          2                 35
8–8                          0                 41
9–9                          0                 32
10–10                        0                 36
11–11                        0                 24
12–12                        0                 25
13–13                        0                 30
14–14                        1                 21
15–15                        0                 33
16–16                        0                 20
17–17                        0                 17
18–18                        2                 17
19–19                        0                 24
20–20                        1                 16
21–21                        0                 14
22–22                        0                 13
23–23                        0                 13
24–24                        0                 11
25–25                        0                 13
26–26                        0                 17
27–27                        0                 13
28–28                        0                 12
29–29                        0                 10
30–30                        0                 6
31–35                        1                 33
36–40                        0                 20
41–45                        0                 23
46–50                        1                 20
51–60                        1                 23
61–70                        0                 9
71–80                        0                 5
81–90                        0                 4
91–100                       0                 5
101–150                      0                 8
151–200                      0                 4
201–250                      0                 1
251–300                      0                 1
301–400                      0                 1
400–500                      0                 0
>500                         0                 0

generated value is no more than one and a quarter S.E.s away from its target value and the degree of variability of those values is comparable with that of the observed data. For block B, the overall mean generated was 2.217 (target 2.129) and the generated incidence was 0.1004. Again given the above caveats, the generated and observed frequency distributions (Table 4) showed good concordance. One hundred sets of random samples from the generated data gave a mean sample mean of 2.307 (compare with target value of 2.129 and note informally that the corresponding value observed on occasion three was 2.104), a mean sample variance of 128.1 (note informally that the corresponding value observed on occasion three was 59.35), and a mean maximum value of 75.14 (note informally that the corresponding value observed on occasion three was 54). Two hundred power-law regressions were done from random samples of the generated counts; these gave a mean estimated value of log10 a of 0.708, with a mean S.E. of 0.068 (compare with target value of 0.629 with S.E. 0.542 from fitted relationship) and a mean estimated value of b of 1.989 with mean S.E. of 0.102 (compare with target value of 1.850 with S.E. of 0.094 from fitted relationship). The same number of incidence-mean regressions gave a

Fig. 4. Taylor’s power–law relationship at the small scale for observed data from occasion two and for a single random sample from the generated data, for block A. The observed data (*) and their associated fitted solid line extend over a larger range, particularly at smaller densities, than the generated data (open circles) and their fitted solid line, but there is otherwise little difference between the two sets. The dashed line where variance equals mean is shown for reference.

48

J.N. Perry et al. / Computers and Electronics in Agriculture 21 (1998) 33–51

Fig. 5. Incidence–mean relationship for observed data from occasion three and for a single random sample from the generated data, for block B. The observed data are shown as * and their associated fitted line is the top solid line; the generated data are shown as open circles and their fitted line is the lower solid line. The dashed line where transformed incidence equals transformed mean is shown for reference.

mean estimated value of log10 c of− 2.171, with a mean S.E. of 0.113 (compare with target value of− 1.826 with S.E. 0.297 from fitted relationship) and a mean estimated value of d of 0.145 with mean S.E. of 0.058 (compare with target value of 0.188 with S.E. of 0.083 from fitted relationship). The fit of the generated data from the first of these 200 regressions is shown, for visual comparison with that observed for occasion three, in Fig. 5. In all of the above cases the generated value is no more than one and a half S.E.s away from its target value and the degree of variability of those values is comparable with that of the observed data. In summary, visually and numerically, the generated data matches closely with the observed data and the relationships fitted to the observed data.
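The "within so many S.E.s" statements for the two blocks can be checked by simple arithmetic on the figures quoted in the text. The sketch below is only an illustration: it assumes that the distance is measured in units of the S.E. of the target value from the fitted relationship (the text does not say explicitly which S.E. is meant), and the names BLOCK_A, BLOCK_B, standardized_gaps and largest_gap are hypothetical.

```python
# Each entry: (mean generated estimate, target value, S.E. of target),
# all copied from the figures quoted in the text for the regressions.
BLOCK_A = {
    "log10 a": (0.732, 0.581, 0.530),
    "b": (1.968, 1.974, 0.343),
    "log10 c": (-2.128, -2.219, 0.191),
    "d": (0.137, 0.226, 0.071),
}
BLOCK_B = {
    "log10 a": (0.708, 0.629, 0.542),
    "b": (1.989, 1.850, 0.094),
    "log10 c": (-2.171, -1.826, 0.297),
    "d": (0.145, 0.188, 0.083),
}


def standardized_gaps(block):
    """Return |generated - target| / S.E.(target) for every parameter.

    Assumption: the S.E. of the target (fitted relationship) is the yardstick.
    """
    return {
        name: abs(generated - target) / se
        for name, (generated, target, se) in block.items()
    }


def largest_gap(block):
    """Return the parameter furthest from its target and its distance in S.E.s."""
    gaps = standardized_gaps(block)
    worst = max(gaps, key=gaps.get)
    return worst, gaps[worst]


if __name__ == "__main__":
    for label, block in (("A", BLOCK_A), ("B", BLOCK_B)):
        for name, gap in standardized_gaps(block).items():
            print(f"block {label}: {name:8s} {gap:.2f} S.E.s from target")
```

Under this assumption the largest discrepancy is for d in block A (about one and a quarter S.E.s) and for b in block B (just under one and a half S.E.s), which agrees with the bounds stated in the text.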

6. Discussion

The verification of the simulations demonstrated satisfactorily that the methodology outlined can generate realistic counts that closely represent observed counts from the field. The method presented here allowed for two spatial scales, within each of which there may be different relationships between variance and mean and between incidence and mean, and for the excessively skew data expected for insect counts; further strata in the hierarchy of spatial scales could be added as necessary. This methodology to simulate counts of insects in a field may be extended easily to weeds or disease incidence. Also, application of the methodology is not restricted to agronomy, but may be of use in ecology, especially for population–dynamic modelling, where simulations of metapopulations may require the generation of initial counts for subpopulations at particular stages of a season.

It must always be realised that the price for realism in simulation is a degree of built-in imprecision. A simulation may be accurate, in the statistical sense of being unbiased and generating a value with expectation equal to the quantity being simulated. However, the essence of simulation in this context is that it should incorporate some stochastic element, so that there may be a range of outcomes; the simulation is of no use if it always produces the same result. Given that, although the simulation is accurate it may be imprecise, in the statistical sense that the variance of the value generated may be such that individual values may be far from the expected value. This may cause difficulties of assessment for situations, like those of insect counts studied here, where quantities have naturally large variability. Then, it may be difficult to match observed data from individual replicates or occasions, because the variability inherent in those observed data needs first to be accounted for and removed. Further, a large number of simulations may be required before assessment of highly-variable simulations is possible at all.
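The distinction between accuracy and precision can be illustrated with a toy example. The sketch below is not the generation method of this paper: the count distribution (90% of plants empty, colony sizes on infested plants following a geometric-like law) and all names and constants in it are hypothetical, chosen only because the true mean is known exactly. It draws many samples of 125 plants and compares the average of the sample means with the spread of the individual sample means.

```python
import math
import random
import statistics

P_INFESTED = 0.10  # hypothetical incidence, not a value fitted in the paper
RATE = 1 / 20      # hypothetical rate controlling colony size
N_PLANTS = 125     # plants counted per sample
N_REPS = 2000      # number of repeated simulated samples


def toy_count(rng):
    """Count on one plant: 0 with probability 0.9, else a colony of size >= 1."""
    if rng.random() >= P_INFESTED:
        return 0
    # Truncating an exponential variate gives a geometric variate on 0, 1, 2, ...
    return int(rng.expovariate(RATE)) + 1


def true_mean():
    """Exact expectation of toy_count: incidence times mean colony size."""
    return P_INFESTED / (1 - math.exp(-RATE))


def simulate_sample_means(seed=1):
    """Return the sample mean of each of N_REPS independent samples."""
    rng = random.Random(seed)
    return [
        statistics.fmean([toy_count(rng) for _ in range(N_PLANTS)])
        for _ in range(N_REPS)
    ]


if __name__ == "__main__":
    means = simulate_sample_means()
    print("true mean            :", round(true_mean(), 3))
    print("mean of sample means :", round(statistics.fmean(means), 3))
    print("S.D. of sample means :", round(statistics.stdev(means), 3))
    print("range of sample means:", min(means), "to", max(means))
```

In this toy setting the average over all replicates settles close to the true mean (accuracy), while any single replicate may lie far from it (imprecision), which is why many replicates are needed before a highly variable simulation can be assessed.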
Finally, although technically it is relatively straightforward to produce simulations that are accurate in the sense of being unbiased, the problem of achieving the correct variance may often be mathematically intractable, and require an approach of trial and error.

From the point of view of the biological results, the incidence–mean relationship over time seems to point to an important feature of aphid spatial behaviour that, if consistent over different sites and years, must be allowed for in any proposed sampling scheme. Clearly, the shape of the frequency distribution of counts changed over time, becoming less excessively J-shaped or skew. Such a change of form over a season is not unusual (Perry and Taylor, 1988; Duchateau et al., 1993; Blackshaw and Perry, 1994), but occurs usually as a response to a change of density. Any explanation must in this case allow for the effect to be manifest at the same densities. One possible explanation for this result might be that the cue for aphids to leave a host plant because of overcrowding, following the establishment of a successful colony, occurs earlier in the development of that colony as the season progresses. Little is known of aphid plant-to-plant movement within the crop and this must be studied further in the field before a more definite explanation can be given.

The differences found between the two blocks, and the extra-binomial heterogeneity found for one of the two blocks (A), cast doubt on the ability to generate, semi-automatically, large amounts of data, without an initial phase of fairly detailed analysis and without considerable thought being given to the simulation process. However, now that a realistic set of counts has been generated for each block, they may be arranged to conform to the observed spatial pattern within the blocks. This may be done, as argued in Section 1, as an exercise independent of the generation process, i.e. conditional on the counts generated. It may require the algorithm for arrangement of a set of counts with a desired degree of aggregation (Perry, 1996), modified to allow for the effects of pattern at two spatial scales; this arrangement step will be described elsewhere.

Acknowledgements

This work was funded by the UK Ministry of Agriculture, Fisheries and Food. IACR-Rothamsted receives grant-aided support from the Biotechnology and Biological Sciences Research Council of the United Kingdom. Sam Korie was supported by a Rothamsted International Staff Fellowship. We gratefully acknowledge the ADAS Arthur Rickwood staff whose hard work provided the data from the Mepal site: Jo Fitzpatrick and Jackie Town.

References

Binns, M.R., Nyrop, J.P., 1992. Sampling insect populations for the purpose of IPM decision making. Ann. Rev. Entomol. 37, 427–453.

Blackshaw, R.P., Perry, J.N., 1994. Predicting leatherjacket population frequencies in Northern Ireland. Ann. Appl. Biol. 124, 213–219.

Bliss, C.I., 1941. Statistical problems in estimating populations of Japanese beetle larvae. J. Econ. Entomol. 34, 221–232.

Clark, S.J., Perry, J.N., 1989. Estimation of the negative binomial parameter k by maximum quasi-likelihood. Biometrics 45, 309–316.

Clark, S.J., Perry, J.N., 1994. Small scale estimation for Taylor’s power law. Environ. Ecol. Stat. 1, 287–302.

Duchateau, L., Ross, G.L.S., Perry, J.N., 1993. Parameter estimation and hypothesis testing for Adès distributions applied to tse-tse fly data. Biométrie Praximétrie 33, 101–111.

Greig-Smith, P., 1952. The use of random and contiguous quadrats in the study of the structure of plant communities. Ann. Bot. N.S. 16, 293–316.

Johnson, N.L., Kotz, S., 1969. Discrete Distributions. Houghton Mifflin, Boston.

Kono, T., Sugino, T., 1958. On the estimation of the density of rice stems infested by the rice stem borer. Jpn. J. Appl. Zool. 2, 184–188.

Parker, W.E., Turner, S.T.D., Perry, J.N., Blood-Smyth, J.A., Ellis, S.A., McKinlay, R.G., 1997. Development of a G.I.S.-based tool for testing field sampling plans by modelling the within-field distribution of mealy cabbage aphid in Brussels Sprouts. In: Stafford, J.V. (Ed.), Precision Agriculture Proceedings of the First European Conference. BIOS Scientific Publishers, Oxford, pp. 811–819.

Payne, R.W., Members of the Genstat 5 Committee, 1993. Genstat 5 Release 3 Reference Manual. Oxford University Press, Oxford.

Perry, J.N., 1982. Fitting split-lines to ecological data. Ecol. Entomol. 7, 421–435.

Perry, J.N., 1987. Host-parasitoid models of intermediate complexity. Am. Nat. 130, 955–957.

Perry, J.N., 1989. Review: population variation in Entomology: 1935–1950: I. sampling. Entomologist 108, 184–198.

Perry, J.N., 1994. Sampling and applied statistics for pests and diseases. Aspects Appl. Biol. 37, 1–14.

Perry, J.N., 1995. Spatial aspects of animal and plant distribution in patchy farmland habitats. In: Glen, D., Greaves, M., Anderson, H. (Eds.), Ecology and Integrated Arable Farming Systems. Wiley, Chichester, pp. 221–242.

Perry, J.N., 1996. Simulating spatial patterns of counts in agriculture and ecology. Comput. Electron. Agric. 15, 93–109.

Perry, J.N., 1997a. Statistical aspects of field experiments. In: Dent, D.R., Walton, M.P. (Eds.), Methods in Ecological and Agricultural Entomology, chapter 7. CABI, Wallingford, pp. 171–201.

Perry, J.N., 1997. ‘Inchworm, inchworm, measuring the marigolds…’: Biometry in action. Inaugural Lecture Series of the University of Greenwich, ISBN: 1 86166 049 9, p. 30.

Perry, J.N., Taylor, L.R., 1988. Families of distributions for repeated samples of animal counts. Biometrics 44, 881–890.

Perry, J.N., Woiwod, I.P., 1992. Fitting Taylor’s power law. Oikos 65, 538–542.

Taylor, L.R., 1961. Aggregation, variance and the mean. Nature Lond. 189, 732–735.

Taylor, L.R., 1984. Assessing and interpreting the spatial distributions of insect populations. Ann. Rev. Entomol. 29, 321–357.

Taylor, L.R., Woiwod, I.P., Perry, J.N., 1978. The density-dependence of spatial behaviour and the rarity of randomness. J. Anim. Ecol. 47, 383–406.

Taylor, L.R., Woiwod, I.P., Perry, J.N., 1979. The negative binomial as a dynamic ecological model for aggregation and the density dependence of k. J. Anim. Ecol. 48, 289–304.