‘Catch me if you can’: Evaluating the population size in the presence of a spatial pattern


G Model ECOCOM 638 No. of Pages 11

Ecological Complexity xxx (2017) xxx–xxx

Contents lists available at ScienceDirect

Ecological Complexity journal homepage: www.elsevier.com/locate/ecocom

Original Research Article

N.B. Petrovskaya
School of Mathematics, University of Birmingham, Birmingham B15 2TT, UK

A R T I C L E I N F O

Article history: Received 10 December 2016; Received in revised form 9 March 2017; Accepted 17 March 2017; Available online xxx

Keywords: Population size; Population density; Spatial density distribution; Sampling plan; Sampling grid; Coarse grid; Sparse data

A B S T R A C T

Many biological and ecological problems require accurate evaluation of the total population size. We discuss a sampling procedure used for evaluation of the population abundance from information collected on a grid of spatial sampling locations. It is shown in the paper how insufficient information about the spatial population density obtained on a coarse sampling grid affects the accuracy of evaluation. The insufficient information is collected because of inadequate spatial resolution of the population density on coarse grids and this is especially true when a heterogeneous spatial population is sampled. It is argued in the paper that the evaluation error is a random variable on coarse sampling grids because of the uncertainty in sampling spatial data and a probabilistic approach should be employed in the evaluation procedure. We also show that there exists a threshold number of sampling locations on a regular sampling grid where we can guarantee desired accuracy of evaluation. Information about the threshold number of sampling locations allows one to reconcile the probabilistic approach based on the assumption about randomness of sampling data with the deterministic approach based on the requirement that spatial data are collected only once as the sampling procedure cannot be repeated under the same conditions. © 2017 Elsevier B.V. All rights reserved.

1. Introduction

Many biological and ecological problems require evaluation of the total population size. This evaluation should be accurate, as insufficient information about the total population size may have an undesirable impact on the ecosystem, e.g. unnecessary application of pesticide if the pest abundance is not correctly evaluated (Jepson and Thacker, 1990). Inaccurate evaluation of the total population size may also result in a wrong conclusion about the presence or absence of some important ecological traits, e.g. synchronization between population fluctuations in different habitats (Petrovskaya and Petrovskii, 2017). A standard approach to evaluation of the total population size is to consider a simple estimate of the spatial population density integral (Davis, 1994; Snedecor and Cochran, 1980) where a sampling grid is used to collect data related to the spatial density distribution. The definition of a sampling grid depends on the ecological problem where sampling is required. For an estimate of the total population size to be accurate the sampling grid must capture sufficient information to adequately represent the true


population size. One important consideration is the sampling plan, that is, the prescribed locations at which samples are to be taken. Comparisons of various spatial arrangements have been made in order to make recommendations (Alexander et al., 2005) but in many cases sampling locations are defined as nodes of a regular grid (Ferguson et al., 2000; Holland et al., 1999). Another important factor is the total number of locations where samples are taken. In some cases this number is derived based on theoretical recommendations (Taylor et al., 1978), while in many other cases this property of the sampling grid is decided ad hoc (Boag et al., 2010). For example, a widely used sampling technique is trapping where trap counts provide information about the population density at the position of the traps (Byers et al., 1989; Raworth and Choi, 2001). In a trapping procedure applied in routine insect pest monitoring programs the number N of traps rarely exceeds twenty per a typical agricultural field with a linear size of several hundred meters (Mayor and Davies, 1976) and in some cases N can be as small as one or a few traps per field (Northing, 2009). Sampling protocols do not usually allow one to make extensive repetition of sampling. Sometimes a pre-sample (or series of them) can be used to obtain a sample mean and sample variance from which an estimate of the number of sample units needed to

http://dx.doi.org/10.1016/j.ecocom.2017.03.003 1476-945X/© 2017 Elsevier B.V. All rights reserved.

Please cite this article in press as: N.B. Petrovskaya, ‘Catch me if you can’: Evaluating the population size in the presence of a spatial pattern, Ecol. Complex. (2017), http://dx.doi.org/10.1016/j.ecocom.2017.03.003


achieve a specified accuracy can be calculated; e.g. see Binns et al., 2000; Dent, 2000; Pedigo and Rice, 2009. However, there is always a trade-off between the number of sample units needed to achieve sufficient accuracy and the number that can be practically afforded. Moreover, even when practitioners are not happy with the accuracy of the sampling procedure, repeated sampling with an increased number of samples is not available in most ecological applications because the same sampling conditions in the environment (the air temperature, the soil moisture, etc.) cannot be reproduced. The above restrictions on a sampling protocol lead to a question about the accuracy of evaluation of the total population size on coarse sampling grids. It has been shown in our previous work (Embleton and Petrovskaya, 2013; Petrovskaya and Embleton, 2013, 2014) that the standard evaluation technique does not work when coarse grids are used for evaluation because of the insufficient information (uncertainty) in data collected on such grids. It was suggested in Petrovskaya and Embleton (2013) that the total population size on coarse sampling grids is a random variable and it has to be handled by using probabilistic techniques. However, while a probabilistic approach implies that multiple realisations of a random variable are available in the evaluation task, the sampling procedure is ‘deterministic’ because it has to deal with a single realisation of the total population size based on data collected only once. How, then, can practitioners know that a single realisation of the random variable gives them a sufficiently reliable result? In the present paper we discuss this issue and introduce a reliability criterion required to ‘reconcile’ the probabilistic and deterministic approaches when the total population size is evaluated. The paper is organized as follows.
In Section 2 we briefly revisit the problem of evaluation of the total population size and introduce the concept of uncertainty in sampling data related to a spatial pattern of the density distribution. We discuss how insufficient information about the spatial pattern can be re-formulated in terms of the location of grid nodes on a coarse sampling grid. We then consider the evaluation error as a random variable and calculate probabilistic characteristics of the total population size in Section 3. We discuss the probability of accurate evaluation of the population size on a sampling grid with a given number of grid nodes in Section 4. It will be argued in Section 4 that probabilistic characteristics of the evaluation error become insignificant when a sampling grid is fine enough to guarantee that the desired accuracy can be achieved with the probability equal to one. Thus the transition from a probabilistic problem to a deterministic problem can be formulated in terms of a certain threshold number of grid nodes on a sampling grid. Several numerical examples are provided to illustrate the above argument. Finally, in Section 5 we summarise our experience with the problem of accurate evaluation of the total population size.

2. Evaluation of the total population size on coarse sampling grids

In the present paper we do not refer to any particular sampling technique, yet we assume that samples are collected at nodes of a regular spatial grid. This assumption can be readily linked to sampling procedures used in real-life ecological applications such as pitfall trapping of invertebrates (Woodcock, 2008). Furthermore, we assume that sampling brings us reliable information about the population density u(x, y), i.e. we have exact values $u(x_i, y_i)$ of the population density at sampling locations $(x_i, y_i)$, $i = 1, \ldots, N$. Further discussion of the latter assumption can be found in Bearup et al. (2015).

If the population density function u(x, y) were known at any point (x, y) of the domain D then the total population size I would be given by

\[ I = \iint_D u(x, y)\,dx\,dy. \tag{1} \]

However, we have a discrete density distribution as a result of sampling and we therefore have to replace integral (1) with a weighted sum of density values. The following formula is widely used in practical ecological applications, e.g. see Davis (1994) and Snedecor and Cochran (1980),

\[ I \approx I_a(N) = \frac{A}{N}\sum_{i=1}^{N} u_i, \tag{2} \]

where $I_a$ is the approximate value of the total population size I and A is the area of the domain. It is important to note that the result of evaluation $I_a$ depends explicitly on the number N of sampling locations, $I_a = I_a(N)$. The evaluation error has to be introduced since the exact population size is replaced by some approximation $I_a$. The relative evaluation error is defined as

\[ e(N) = \frac{|I - I_a(N)|}{I}, \tag{3} \]

where we assume that the exact population size is I > 0. Conclusions about accuracy of evaluation can then be made based on the following requirement

\[ e(N) \le \tau, \tag{4} \]

where $\tau$ is a specified tolerance. In ecological applications the accuracy requirement (4) is not very demanding, as a tolerance $0.2 < \tau < 0.5$ is already considered acceptable (Pascual and Kareiva, 1996; Sherratt and Smith, 2008). However, we will see later in the paper that even this relatively high tolerance cannot always be met. One important feature of approximation (2) is the convergence of $I_a$ to the exact value I when the hypothetical number of sampling locations is very large. For the absolute error and the relative error of evaluation we have

\[ |I - I_a(N)| \to 0 \ \text{as} \ N \to \infty, \qquad e(N) \to 0 \ \text{as} \ N \to \infty. \tag{5} \]
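Formulas (2)-(4) can be illustrated with a minimal Python sketch. The density `u`, the unit-square domain, the choice of sampling nodes at cell midpoints and the value `tau = 0.25` are assumptions made only for this illustration; the function names are hypothetical.

```python
import numpy as np

def estimate_population(u_samples, area):
    """Formula (2): I_a(N) = (A / N) * sum of the N sampled density values."""
    u_samples = np.asarray(u_samples, dtype=float)
    return area * u_samples.mean()

def relative_error(I_exact, I_approx):
    """Formula (3): e(N) = |I - I_a(N)| / I, assuming I > 0."""
    return abs(I_exact - I_approx) / I_exact

# Hypothetical density on the unit square; its exact integral is 2.
def u(x, y):
    return 1.0 + x + y

N1 = N2 = 4                                # N = N1 * N2 = 16 sampling locations
x = (np.arange(N1) + 0.5) / N1             # nodes placed at cell midpoints (assumption)
y = (np.arange(N2) + 0.5) / N2
X, Y = np.meshgrid(x, y)

I_exact = 2.0
I_a = estimate_population(u(X, Y).ravel(), area=1.0)
e = relative_error(I_exact, I_a)

tau = 0.25                                 # illustrative tolerance for requirement (4)
within_tolerance = e <= tau
print(I_a, e, within_tolerance)
```

For this linear density the sample mean at symmetric nodes reproduces the integral, so the example only checks the bookkeeping of (2)-(4); it says nothing about accuracy for heterogeneous densities.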

At first glance, the choice of method (2) may not be optimal for evaluation of the total population size (1). From a numerical integration viewpoint the method (2) can be loosely interpreted as the midpoint rule of integration where the whole domain is subdivided into N subdomains with the area of each subdomain given by A/N and the density function u(x, y) is approximated by a constant $u_i$, $i = 1, 2, \ldots, N$, in each of those subdomains. Convergence properties (5) of the midpoint rule are well studied and it is also well known (e.g. see Davis and Rabinowitz, 1975) that its convergence rate is inferior to that of more advanced methods of numerical integration, e.g. the Simpson method on regular grids or Gauss quadratures (Petrovskaya et al., 2012). However, the aim of this paper is to study the accuracy of the total population size evaluation on coarse sampling grids where the number N of sampling locations is small. It will be shown later in the paper that accuracy of evaluation on coarse grids is not related to the convergence rate of the method when N is large. Hence methods with a higher convergence rate cannot be immediately recommended for the purpose of our study and, for the sake of clarity, we choose to consider the simple procedure (2) instead of an advanced method of numerical integration in order to introduce our approach to accuracy evaluation on coarse grids. Coarse sampling grids are typical in many sampling protocols because financial and labor resources available for sampling are


very often limited. While there are certain economic constraints on the number of sampling locations, it seems obvious that a small number of samples may bring wrong information about the population density u(x, y) which, in turn, may result in inaccurate evaluation of the total population size. However, the above statement is not always true. It has been argued in Petrovskaya and Embleton (2013) that we may occasionally have very accurate results even on very coarse sampling grids. Let us further illustrate this problem with the following example. Consider the square domain D shown in Fig. 1 and let us generate two distributions of the spatial population density in the domain. It is worth noting here that there are plenty of theoretical frequency distributions, e.g. the Poisson distribution and the negative binomial distribution, that are used to describe sampling data collected for different species and under different ecological conditions (Bolker, 2008; Young and Young, 1998). However, it was noticed in Bolker (2008) and previously in Pielou (1977) that theoretical frequency distributions do not completely cover the wealth of spatial density patterns arising in nature. Hence, instead of generating data from a frequency distribution with certain properties, we obtain density distributions from a generic mathematical model describing spatiotemporal population dynamics. The model is based on a system of coupled diffusion-reaction equations (e.g. see Murray, 1989) where it is possible to generate different ecologically meaningful density distributions by varying the system parameters. Two examples of spatial density distributions numerically generated from the model are shown in Fig. 1. It is convenient to label the density function u(x, y) shown in Fig. 1a as a ‘continuous front’ density distribution, while we refer to the function in Fig. 1b as a ‘highly aggregated’ density distribution.
Let us note that both types of the spatial density distribution in Fig. 1 can be found in nature. In particular, highly aggregated spatial distributions often appear in ecological systems (Ferguson et al., 2000; Holland et al., 1999) and their monitoring is a difficult task because the location of a non-zero density subdomain is not known a priori. Moreover, that location constantly changes with time and, once a patch of non-zero density has been detected, we cannot expect it to be found at the same place in the next round of monitoring. Since the density function u(x, y) has been obtained from the numerical solution of a system of partial differential equations we assume that u(x, y) is a continuous function and is available to us at any point of the domain D. We also assume that we know the exact value of the total population size from (1).


We now simulate a sampling procedure in domain D where in both cases the values of the density function are taken at nodes of a regular sampling grid. Taking samples at nodes of a regular grid is a common case in ecological applications (Ferguson et al., 2000; Holland et al., 1999). A regular grid in a square domain with linear size L is generated as follows. We consider a set of points $x_i$, $i = 1, \ldots, N_1$, on the interval [0, L], where we require that $x_1 = a > 0$, $x_{i+1} = x_i + h_1$, $i = 1, \ldots, N_1 - 1$, $x_{N_1} = b < L$ and the grid step size $h_1$ is $h_1 = (b - a)/N_1$. We then consider a set of points $y_j$, $j = 1, \ldots, N_2$, on the interval [0, L] and generate a one-dimensional grid in the y-direction by the same rule as above, where the grid step size is defined as $h_2 = (d - c)/N_2$ for some 0 < c < d < L. The grid point coordinates $(x_i, y_j)$ are then determined by taking points of the respective one-dimensional grids in the x and y directions. We require that the corner points of the sampling grid are sufficiently close to the corner points of domain D (i.e. the sampling grid covers most of the domain; cf. Fig. 1). Let point A = (a, c) be the bottom left corner of a sampling grid shown in Fig. 1. Application of the formula (2) to the density distribution in Fig. 1a provides a very accurate approximation $I_a$ of the exact population size. It has been confirmed in our computation that the evaluation error (3) is $e \sim 10^{-2}$ when the evaluation procedure (2) is used for the continuous front density distribution of Fig. 1a. Meanwhile, evaluation of the density distribution in Fig. 1b made on the same sampling grid results in an extremely inaccurate answer with the error $e \sim 1$. It becomes immediately clear from Fig. 1b that such a large error appears because every non-zero value of the density function is missed when u(x, y) is considered at grid points only. Let us now move the bottom left corner of the grid from point A to point $A' = (a', c')$. A visual examination of Fig. 1a reveals that the values of function u(x, y) will change slightly if the distance between $A'$ and A is not large. A slight change in the function values will result in a slight change in the evaluation error. This conclusion is further confirmed by direct computation of the error $e \sim 10^{-2}$ on the new grid. However, when we apply the formula (2) to the density function on the grid defined by point $A'$ in Fig. 1b, the evaluation error decreases dramatically from $e \sim 1$ to $e \sim 10^{-3}$. The explanation of this very accurate estimate of the total population size is that we now have high values of the density u(x, y) at grid points $B'$ and $C'$ (shown as open circles in Fig. 1b) and when those values are cast

Fig. 1. (a) A ‘continuous front’ spatial density distribution. (b) A ‘highly aggregated’ spatial density distribution. A regular grid (a grid of solid lines in the figure) is used for data sampling in square domain D where the domain boundary is shown as a bold solid line in the figure. Different information about density values at grid points will be collected when the bottom left corner of the grid is shifted from point A to point $A'$ and a new sampling grid (a grid of dashed lines) is generated.


into the formula (2) their averaging with the zero function values at the other grid points will occasionally give us an accurate answer $I_a$. When a sampling protocol is designed we know neither the location of the front line in Fig. 1a nor the location of a sub-domain of the non-zero density in Fig. 1b. In fact, we often do not even know whether the population distribution is relatively smooth (as in Fig. 1a) or highly aggregated (as in Fig. 1b). That is why our original choice of point A as the bottom left corner of the grid has no advantage over a choice of point $A'$. Hence we have uncertainty in our sampling procedure as we do not know the optimal position of grid nodes and, as we could see in our discussion, this uncertainty may seriously affect the results of evaluation. Any change in the location of grid points would result in a different value of $I_a$. Let us emphasize again that we cannot decide about the optimal location of grid nodes in the domain, as we do not have a priori information about the spatial density distribution in the domain, and therefore our choice of point A is random. A standard sampling protocol does not allow for changing the number N of sampling locations or moving any sampling location to a different position. In other words, in real-life applications evaluation formula (2) is applied just once and the results cannot be recomputed for different N or different sampling locations. Once grid points (i.e. sampling locations) have been stationed, our evaluation of the total population size should be considered as a single realisation of random variable $I_a$. Consequently the evaluation error is a random variable too. On the other hand, the uncertainty related to the choice of sampling locations must disappear on a sampling grid with a very large number of grid points because of convergence property (5). Hence our next aim is to investigate how the uncertainty in evaluation depends on the number of grid points on a sampling grid.
That will be done in the next section.
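The sensitivity to the position of the grid corner can be reproduced with a small Python sketch. It does not use the diffusion-reaction model behind Fig. 1; instead it assumes a hypothetical Gaussian patch on the unit square, a 4 x 4 grid with step L/n, and illustrative corner positions, all chosen only for this example.

```python
import numpy as np

L = 1.0                        # side of the square domain (hypothetical)
xc, yc, w = 0.37, 0.62, 0.03   # patch centre and width (hypothetical)

def u(x, y):
    """Hypothetical 'highly aggregated' density: a single narrow Gaussian patch."""
    return np.exp(-((x - xc) ** 2 + (y - yc) ** 2) / (2.0 * w ** 2))

# The patch is more than ten widths away from the boundary, so the integral
# over the square is taken to equal the whole-plane value 2*pi*w^2.
I_exact = 2.0 * np.pi * w ** 2

def estimate(corner, n=4):
    """Formula (2) on an n x n regular grid whose bottom left node is `corner`."""
    h = L / n
    x = corner[0] + h * np.arange(n)
    y = corner[1] + h * np.arange(n)
    X, Y = np.meshgrid(x, y)
    return L * L * u(X, Y).mean()

def rel_error(corner, n=4):
    """Relative error (3) for the grid defined by `corner`."""
    return abs(I_exact - estimate(corner, n)) / I_exact

# Scan corner positions over one grid cell and record the error for each shift.
h = L / 4
shifts = np.linspace(0.0, h, 25, endpoint=False)
errs = np.array([[rel_error((sx, sy)) for sx in shifts] for sy in shifts])

print("corner (0.05, 0.05):", rel_error((0.05, 0.05)))
print("min / max error over all shifts:", errs.min(), errs.max())
```

In this sketch the same grid, shifted by a fraction of its step, can miss the patch almost entirely, overweight it when a node lands on the peak, or land close to the true value, which is the behaviour described for Fig. 1b.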

3. Probabilistic characteristics of the total population size

For the rest of this paper we restrict our consideration to a one-dimensional (1d) spatial density distribution u(x). The 1d case corresponds to a sampling procedure where samples are collected along a line. While the 1d case presents a relatively simple mathematical problem in comparison with the 2d case, it has also been shown in Alexander et al. (2005) that choosing sampling locations along a line can be an efficient sampling strategy and this sampling technique is used in some real-life ecological applications. For example, installing pitfall traps along a linear transect is a recommended sampling technique for the identification of the effects of environmental gradients on invertebrate communities (Woodcock, 2008) (see also references therein). Consider domain $D: x \in [a, b]$ of size $L = b - a$. In our further discussion we assume that we know the true population size I given by

\[ I = \int_a^b u(x)\,dx, \tag{6} \]

where u(x) is the population density. Let us now assume that the first sampling location $x_1$ can be stationed anywhere in the sub-domain $[a, a + h]$, where $h = (b - a)/N$ is the grid step size on a grid of N nodes. Namely, we assume that $x_1$ is a random variable with the uniform distribution over the interval $[a, a + h]$, so it has the probability density function given by

\[ p(x_1) = 1/h. \tag{7} \]

A regular 1d sampling grid is generated as $x_{i+1} = x_i + h$, $i = 1, \ldots, N - 1$, where we have $x_N = b$ for $x_1 = a + h$ and $x_N = b - h$ for $x_1 = a$. The location of each grid node is therefore related to the location of the first grid point $x_1$ as $x_{i+1} = x_1 + ih$. Hence sampling data $u_i \equiv u(x_i)$ become random on a sampling grid with a fixed number N of grid points, as we have $u(x_i) = u(x_1 + (i - 1)h)$ at any grid point $i = 1, \ldots, N$. The evaluation of the total population size based on random data will result in random variable $I_a$ defined as

\[ I_a(x_1) = \frac{L}{N}\sum_{i=1}^{N} u(x_1 + (i - 1)h). \tag{8} \]

Consider evaluation error (3). Since the error depends on a random location of point $x_1$, it becomes a random variable too,

\[ e(x_1) = \frac{|I_a(x_1) - I|}{I}. \tag{9} \]
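Definitions (7)-(9) translate directly into code. The following Python sketch is one possible reading of them; the function names and the linear test density `u_lin` on [0, 1] are hypothetical and serve only to make the example checkable by hand.

```python
import numpy as np

def grid_nodes(a, b, N, x1):
    """Regular 1d grid x_i = x1 + (i - 1) h, i = 1..N, with h = (b - a) / N."""
    h = (b - a) / N
    if not (a <= x1 <= a + h):
        raise ValueError("x1 must lie in [a, a + h]")
    return x1 + h * np.arange(N)

def approx_size(u, a, b, N, x1):
    """Formula (8): I_a(x1) = (L / N) * sum_i u(x1 + (i - 1) h), with L = b - a."""
    return (b - a) / N * np.sum(u(grid_nodes(a, b, N, x1)))

def relative_error(u, a, b, N, x1, I_exact):
    """Formula (9): e(x1) = |I_a(x1) - I| / I."""
    return abs(approx_size(u, a, b, N, x1) - I_exact) / I_exact

# Hypothetical test density u(x) = x on [0, 1]; the exact population size is 1/2.
def u_lin(x):
    return x

a, b, N, I_exact = 0.0, 1.0, 4, 0.5
h = (b - a) / N

# One realisation: x1 drawn from the uniform distribution (7) on [a, a + h].
rng = np.random.default_rng(0)
x1 = rng.uniform(a, a + h)
print(x1, approx_size(u_lin, a, b, N, x1), relative_error(u_lin, a, b, N, x1, I_exact))
```

For this test density $I_a(x_1) = x_1 + 0.375$, so the error vanishes at $x_1 = h/2$ and reaches 0.25 at either end of $[a, a + h]$: even for a very smooth density the result on N = 4 nodes depends on where the first node happens to fall.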

Let us notice that $I_a(x_1)$ and $e(x_1)$ are continuous random variables because $x_1$ is continuous over the interval $[a, a + h]$. A standard approach to handling random variables $I_a(x_1)$ and $e(x_1)$ is to calculate their mean and standard deviation on a grid of N points. Since integral $I_a$ and error e are continuous random variables, the mean $\mu$ and the standard deviation $\sigma$ are given by the following expressions:

\[ \mu_i = \int_{y_1^i}^{y_2^i} p(y_i)\,y_i\,dy_i, \tag{10} \]

and

\[ \sigma_i = \sqrt{\int_{y_1^i}^{y_2^i} p(y_i)\,(y_i - \mu_i)^2\,dy_i}, \tag{11} \]

where $y_i$ is a random variable (evaluation $I_a$ of the total population size or evaluation error e), $p(y_i)$ is the probability density function for random variable $y_i$, and the subscript i can be $I_a$ or e depending on which random variable is considered. The expressions (10) and (11) need to be calculated on a grid with a fixed number N of grid points. We note that changing the value of N will result in a different answer for $\mu_i$ and $\sigma_i$, so we have $\mu_i = \mu_i(N)$ and $\sigma_i = \sigma_i(N)$. It is known (e.g. see Davis, 1994) that

\[ |I - \mu_{I_a}(N)| \to 0 \ \text{as} \ N \to \infty, \qquad \mu_e \to 0 \ \text{as} \ N \to \infty, \tag{12} \]

and

\[ \sigma_{I_a} \to 0 \ \text{as} \ N \to \infty, \qquad \sigma_e \to 0 \ \text{as} \ N \to \infty. \tag{13} \]
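The first limit in (12) can be made concrete for the uniform shift model. The short calculation below is a sketch that relies only on assumptions (7) and (8) with $h = L/N$, and writes the mean as an average over $x_1$ rather than over $y_i$ as in (10); it is not taken from the cited literature.

\[
\mu_{I_a}(N) = \int_a^{a+h} p(x_1)\, I_a(x_1)\, dx_1
= \frac{1}{h}\cdot\frac{L}{N}\sum_{i=1}^{N}\int_a^{a+h} u\big(x_1 + (i-1)h\big)\, dx_1
= \sum_{i=1}^{N}\int_{a+(i-1)h}^{a+ih} u(x)\, dx
= \int_a^b u(x)\, dx = I .
\]

Under these assumptions the mean of $I_a$ coincides with I on any grid, so on coarse grids the uncertainty would be carried by the spread $\sigma_{I_a}(N)$ and by the error e, whose mean $\mu_e(N)$ is the average of $|I_a - I|/I$ and is in general positive for finite N.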

The expressions (12) and (13) give us a clear answer about accuracy in the asymptotic limit $N \to \infty$ based on integral characteristics (10) and (11). A real-life evaluation procedure, however, does not supply us with this information as we have to deal with a single realisation of random variable $I_a$. Moreover, a single realisation of $I_a$ is obtained on a sampling grid where N is relatively small and it is unclear how close a particular single realisation of $I_a$ is to the mean value. The difference between the mean and a single realisation obtained as a result of the sampling procedure is further illustrated by two numerical examples. We consider two ‘extreme’ test cases that present 1d counterparts of the continuous front and the highly aggregated spatial density distributions shown in Fig. 1a and b respectively. In the first test case the density function u(x) is given by

\[ u(x) = \frac{1}{1 + \sin^2 x}, \qquad x \in [0, \pi/2]. \tag{14} \]
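For readers who wish to experiment with density (14), the following Python sketch evaluates its integral over $[0, \pi/2]$ in closed form and by a fine midpoint rule, and then scans the first node $x_1$ over $[0, h]$ to obtain the range of $I_a/I$ from formula (8) for N = 3 and N = 33. The number of scan points and the fine-grid size are arbitrary choices made for this illustration.

```python
import numpy as np

def u(x):
    """Density (14): u(x) = 1 / (1 + sin^2 x) on [0, pi/2]."""
    return 1.0 / (1.0 + np.sin(x) ** 2)

a, b = 0.0, np.pi / 2

# Closed form of the integral of (14) over [0, pi/2].
I_closed = np.sqrt(2.0) * np.pi / 4.0

# Independent check with a composite midpoint rule on a very fine grid.
M = 200_000
xm = a + (b - a) * (np.arange(M) + 0.5) / M
I_numeric = (b - a) / M * np.sum(u(xm))

def ratio_range(N, n_shifts=1001):
    """Smallest and largest value of I_a / I when x1 is scanned over [a, a + h]."""
    h = (b - a) / N
    x1 = np.linspace(a, a + h, n_shifts)
    nodes = x1[:, None] + h * np.arange(N)[None, :]   # row k holds the grid for x1[k]
    Ia = (b - a) / N * u(nodes).sum(axis=1)           # formula (8) for every shift
    return Ia.min() / I_closed, Ia.max() / I_closed

print("closed form vs numeric:", I_closed, I_numeric)
for N in (3, 33):
    print(N, ratio_range(N))
```

The printed ranges can be compared with the spread of $I_a/I$ discussed for the coarse and the fine grid in Fig. 2.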

The function u(x) shown in Fig. 2a can be thought of as a snapshot of a spatio-temporal density distribution when the population starts spreading from the left boundary of the domain that originally has zero population density. The true population size in the domain is $I = \sqrt{2}\,(\pi/4)$. Let us fix the number N of grid points and vary the location $x_1$ of the first grid node from $x_1 = 0$ to $x_1 = h$, where h is the grid step size. The


Fig. 2. (a) The spatial population density distribution (14). The domain is $x \in [0, \pi/2]$. (b) The function $I_a(x_1)$ generated on a regular grid of N = 3 nodes. The frequency $f_i$ of having the value $I_i$ of random variable $I_a/I$ when (c) a coarse grid of N = 3 nodes and (d) a fine grid of N = 33 nodes is used for evaluation of the total population size. Variable $I_a$ is scaled with the value I of the true population size, which is shown as the dashed line $I_a/I = 1$ in the figure.

total population size $I_a$ given by (8) becomes a continuous function of the variable $x_1$ when $x_1$ rises monotonically from 0 to h. The range of variable $I_a$ when the value of $x_1$ changes in the range $x_1 \in [0, h]$ is shown in Fig. 2b. We have $I_a(x_1) \in [I_{min}, I_{max}]$ where $I_{min} = I_a(h)$ and $I_{max} = I_a(0)$ for the population density distribution (14). Any random location of point $x_1$ will then generate a random value taken from $[I_{min}, I_{max}]$ according to (8). It is also important to note that the function $I_a(x_1)$ shown in Fig. 2b has been generated on a grid of N = 3 points and we will have another range of $I_a$ on a grid with a different number of nodes. The frequency $f_i$ of having the value $I_i$ of the variable $I_a/I$ is shown in Fig. 2c. This figure corresponds to Fig. 2b as the variable (8) is computed on a grid of N = 3 points when the location of the first grid point is randomly taken from uniform distribution (7). It can be seen from the figure that all realisations of random variable $I_a/I$ are close to the true population size (dashed line $I_a/I = 1$ in the figure) and any single realisation of random variable $I_a$ obtained as a result of the sampling procedure can therefore be considered as a good approximation to the true population size. It can also be concluded from Fig. 2c that all realisations of random variable $I_a$ occur with approximately equal probability. The probability distribution in Fig. 2 depends on the number N of grid points in formula (8). As we increase the number of grid points the range of $I_a$ gets smaller because requirement (5) holds for any single realisation of $I_a$. This conclusion is illustrated in Fig. 2d where the probability distribution is shown for calculation (8) made on the grid of N = 33 points. All values of $I_a/I$ in Fig. 2d are within 1.5% of the exact value $I_a/I = 1$, while the range in Fig. 2c is within 15%. Consider now the 1d counterpart of highly aggregated spatial distributions discussed in the previous section (cf. Fig. 1). The

mathematical expression for a 1d ‘single peak’ density distribution is given by

\[ u(x) = A \exp\left(-\frac{(x - x^*)^2}{2\delta^2}\right), \qquad x \in [0, L], \tag{15} \]

where L, A, $\delta$ and $x^*$ are parameters and the total population size is $I = A\delta\sqrt{\pi/2}\;\mathrm{erf}\big((L - x^*)/(\sqrt{2}\,\delta)\big)$. The function (15) is shown in Fig. 3a. The range of variable $I_a$ produced for spatial distribution (15) when the value of $x_1$ changes continuously from $x_1 = 0$ to $x_1 = h$ is shown in Fig. 3b where the function $I_a(x_1)$ has been computed on a grid of N = 3 points. The range and the shape of $I_a(x_1)$ are very different from those in Fig. 2b and $I_a$ is no longer a monotone function of $x_1$. This difference is further illustrated in Fig. 3c where the frequency $f_i$ of having the value $I_i$ of random variable $I_a/I$ is computed on a grid of N = 3 points. First, the range $[I_{min}, I_{max}]$ of random variable $I_a$ is very large and only some realisations can be considered as a good approximation to the true population abundance $I_a/I = 1$ (dashed line in the figure). Also, the probability distribution in Fig. 3c is not quasi-uniform (cf. Fig. 2c) and therefore inaccurate realisations of random variable $I_a$ are more likely to appear as a result of the sampling procedure. As we increase the number of grid points the range of $I_a$ gets smaller, but not as significantly as for the population density distribution (14) in the previous test case. The probability distribution computed on a grid of N = 33 points still has the range of $I_a$ within approximately 20% of the true population size $I_a/I = 1$ (see Fig. 3d). This range is greater than the range of $I_a$ computed for the population density distribution (14) on the grid of only N = 3 nodes. We conclude from our inspection of Fig. 3 that information provided by a single realisation of random variable can be


Fig. 3. (a) The spatial population density distribution (15). The problem parameters are A = 15.0, $\delta$ = 3.0, $x^*$ = 89.3 and the domain is $x \in [0, 300]$. (b) The function $I_a(x_1)$ generated on a regular grid of N = 3 nodes. The frequency $f_i$ of having the value $I_i$ of random variable $I_a/I$ when (c) a coarse grid of N = 3 nodes and (d) a fine grid of N = 33 nodes is used for evaluation of the total population size. The true population size appears as the dashed line $I_a/I = 1$ in the figure.

misleading on coarse grids. This conclusion is further confirmed by calculation of the mean value and the standard deviation of a random variable. Consider variable y in (10) and (11) where the subscript ‘i’ is omitted for the sake of convenience and let the range of y be $y \in [y_1, y_2]$. We divide the interval $[y_1, y_2]$ into M subintervals or ‘bins’ and set the size of each bin as $h_y = (y_2 - y_1)/M$. The integrals (10) and (11) are then approximated by the midpoint rule. Namely, we consider the following approximation of the integral (10)

\[ \int_{y_1}^{y_2} p(y)\,y\,dy \approx \sum_{m=1}^{M} p(\tilde{y}_m)\,\tilde{y}_m\,h_y, \tag{16} \]

where $y_{m+1} = y_m + h_y$ for $m = 1, 2, \ldots, M - 1$ and $\tilde{y}_m = (y_m + y_{m+1})/2$ is the midpoint of the subinterval $[y_m, y_{m+1}]$. The probability density function $p(y)$ is given by $p(y) = dP(y)/dy$, where $P(y)$ is the cumulative probability defined for continuous variable $y$. Hence $p(\tilde{y}_m)$ is approximated as

$$p(\tilde{y}_m) \approx \frac{P(y_{m+1}) - P(y_m)}{y_{m+1} - y_m} = \frac{P(y_{m+1}) - P(y_m)}{h_y}.$$

The cumulative probability can be computed as

$$P(y_J) = \sum_{j=1}^{J-1} \frac{k_j}{N_r}, \qquad J = 2, \ldots, M, \qquad (P(y_1) = 0),$$

where $k_j$ is the number of realisations of random variable $y$ in the subinterval $[y_j, y_{j+1}]$ and $N_r$ is the total number of realisations of $y$ we use in our computation. Hence we have

$$P(y_{m+1}) - P(y_m) = \frac{k_m}{N_r},$$

and substituting the above values into (16) we arrive at the following formula for the mean (the index $i$ is omitted):

$$\mu \approx \sum_{m=1}^{M} \frac{k_m}{N_r}\,\frac{y_m + y_{m+1}}{2}. \tag{17}$$
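The binned estimate (17) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in the paper: it relies on NumPy, takes the range $[y_1, y_2]$ to be the minimum and maximum of the sample, and the function name `binned_mean_std` is hypothetical. It also returns the standard deviation built from the same bin counts and midpoints, which is the analogous midpoint-rule estimate for the integral (11).

```python
import numpy as np


def binned_mean_std(samples, M=32):
    """Midpoint-rule estimates of the mean and standard deviation.

    samples : realisations of the random variable y (N_r values)
    M       : number of equal-width bins on [min(samples), max(samples)]
    """
    samples = np.asarray(samples, dtype=float)
    # counts[m] is k_m; edges holds the M + 1 bin boundaries
    counts, edges = np.histogram(samples, bins=M)
    midpoints = 0.5 * (edges[:-1] + edges[1:])   # (y_m + y_{m+1}) / 2
    weights = counts / samples.size              # k_m / N_r
    mu = np.sum(weights * midpoints)             # binned mean
    sigma = np.sqrt(np.sum(weights * (midpoints - mu) ** 2))
    return mu, sigma


# Example: compare with the plain arithmetic mean of the same sample.
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 2.0, size=10_000)
print(binned_mean_std(y, M=32), y.mean(), y.std())
```

Because every realisation lies within half a bin width of its bin midpoint, the binned mean cannot differ from the arithmetic mean by more than $h_y/2$; any larger gap would point to a bug in the binning rather than to a property of the distribution.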

Similarly, the midpoint rule of integration employed to approximate the formula (11) results in

$$\sigma = \sqrt{\int_{y_1}^{y_2} p(y)\,(y - \mu)^2\,dy} \approx \sqrt{\sum_{m=1}^{M} \frac{k_m}{N_r}\left(\frac{y_m + y_{m+1}}{2} - \mu\right)^2}, \tag{18}$$

where $\mu$ is given by (17). The formulae (17) and (18) give us approximations of the mean and the standard deviation for variables $y = I_a$ and $y = e$. Since the mean and the standard deviation are functions of the number $N$ of grid nodes, we compute them on a sequence of regular grids and compare the upper bound $\mu_e(N) + \sigma_e(N)$ with the tolerance $\tau$.

The graph $\mu_e(N)$ obtained for the ‘continuous front’ density distribution (14), along with the lower error bound $\mu_e(N) - \sigma_e(N)$ and the upper error bound $\mu_e(N) + \sigma_e(N)$, is shown in Fig. 4a, where a logarithmic scale is used for the y-axis. In our computation we use $\tau = 0.25$, $N_r = 10{,}000$ and the range of variable $e$ is split into $M = 32$ subintervals. The results of Fig. 4a confirm good accuracy of the sampling procedure, as even on a very coarse grid of $N = 2$ points the upper error bound is below the tolerance $\tau$.

Let us check how close the probability density function for the evaluation error $e$ is to the probability density function of a uniformly distributed error. Namely, we calculate the mean and the standard deviation of the error under the assumption of $p(y) = \mathrm{const}$ in formulae (10) and (11). The integral approximation

Fig. 4. (a)–(b) Analysis of the evaluation error for the ‘continuous front’ spatial density distribution (14): (a) The functions $\mu_e(N)$ (curve I, open circles), $\mu_e(N) - \sigma_e(N)$ (curve II, closed squares), and $\mu_e(N) + \sigma_e(N)$ (curve III, closed triangles) computed on a sequence of regular grids. For any fixed number $N$ of grid nodes the values $\mu_e$ and $\sigma_e$ are computed from (17) and (18) respectively. (b) The difference $|\mu_e(N) - \mu_e^a(N)|$ (curve I, open diamonds) between the mean (17) and the arithmetic mean (19), and the difference $|\sigma_e(N) - \sigma_e^a(N)|$ (curve II, closed diamonds) between the standard deviation (18) and the arithmetic standard deviation (20). (c)–(d) Analysis of the evaluation error for the highly aggregated spatial density distribution (15): (c) the functions $\mu_e(N)$ and $\mu_e(N) \pm \sigma_e(N)$; (d) the functions $|\mu_e(N) - \mu_e^a(N)|$ and $|\sigma_e(N) - \sigma_e^a(N)|$. The figure legends in Fig. 4c and d are the same as in Fig. 4a and b respectively. A logarithmic scale is used for all functions of variable $N$ in the figure.

then results in the arithmetic mean

$$\mu_e^a \approx \frac{1}{N_r}\sum_{n=1}^{N_r} e_n, \tag{19}$$

where $N_r$ is the total number of realisations of the random evaluation error. The expression for the standard deviation becomes

$$\sigma_e^a \approx \sqrt{\frac{1}{N_r}\sum_{n=1}^{N_r} \left(e_n - \mu_e^a\right)^2}. \tag{20}$$

The difference $|\mu_e - \mu_e^a|$ is shown for the density distribution (14) in Fig. 4b. In the figure we also show the difference $|\sigma_e - \sigma_e^a|$. In both cases the values obtained in (17) and (18) are very close to the arithmetic mean (19) and standard deviation (20). We therefore conclude that the probability density function for the distribution (14) can be thought of as a uniform distribution and any realisation of the random variable $e$ is within the given accuracy. Hence any realisation of $I_a$ gives us a good approximation of the total population size and this conclusion always holds, no matter how small the number $N$ of sampling locations is.

We now repeat the computation of Fig. 4a and b for the highly aggregated density distribution (15). The graph $\mu_e(N)$ obtained for density distribution (15), along with graphs for the lower error bound $\mu_e(N) - \sigma_e(N)$ and the upper error bound $\mu_e(N) + \sigma_e(N)$, is shown in Fig. 4c, where a logarithmic scale is used for the y-axis. Let

us note here that the standard deviation $\sigma_e(N)$ is so large on grids with $N < 5$ that the variable $\mu_e(N) - \sigma_e(N)$ is negative on those grids and is therefore not shown on the graph. It can be seen from the graphs that we need intense grid refinement in order to approach the desired accuracy, as the requirement

$$\mu_e(N) + \sigma_e(N) \le \tau \tag{21}$$

first holds on a grid of $N = 33$ points. Furthermore, the differences $|\mu_e - \mu_e^a|$ and $|\sigma_e - \sigma_e^a|$ are significant on coarse grids when the function (15) is considered (see Fig. 4d). This result indicates that the probability density function for the evaluation error presents a strongly nonuniform distribution of the random variable $e$ on coarse grids. Consequently a significant number of realisations of random variable $e$ may still have $e(N) > \tau$ when the upper bound $\mu_e(N) + \sigma_e(N)$ is close to tolerance $\tau$ on a grid of $N$ nodes. Since we have to deal with a single realisation of random variable $e$, we want to take the above feature of $e$ into account and, following our previous work (Petrovskaya and Embleton, 2013), we suggest another criterion of accurate evaluation. This criterion will be discussed in the next section.

4. The probability of accurate evaluation and its properties

We consider the probability $p(N)$ of the event that the error is $e \le \tau$ on a grid of $N$ nodes. The aim of probability computation is to

find threshold number $N_t$ of grid nodes such that

$$p(N) = 1 \tag{22}$$

for any $N \ge N_t$. In other words, we require that the probability of obtaining the desired accuracy of evaluation is always $p = 1$ once we reach threshold $N_t$, no matter where the first grid node $x_1$ is located. Information about threshold number $N_t$ allows one to reconcile the probabilistic approach based on the assumption about randomness of calculation with the deterministic approach based on the assumption that a single realisation obtained as a result of the sampling procedure does bring us reliable information about the total population size. Indeed, if the criterion (22) holds then the accuracy requirement (4) holds for any single realisation of the error obtained in the sampling procedure. The error range therefore becomes insignificant and any single realisation of the total population size can be considered a correct answer. Hence if (22) holds we may think of any single realisation of the total population size as a deterministic variable. On the contrary, if the requirement (22) does not hold, then the results of sampling must be handled from a probabilistic viewpoint. In this case risk assessment should be incorporated in the sampling protocol, as a single realisation of the total population size obtained as a result of sampling may give us a very inaccurate answer.

We use statistical simulation to calculate the probability function $p(N)$,

$$p(N) = \frac{\hat{N}_r}{N_r}, \tag{23}$$

where $N_r$ is the total number of realisations of random variable $e$, $\hat{N}_r$ is the number of realisations for which the random error is $e \le \tau$, and $\tau$ is the given tolerance. Random variable $x_1$ is taken from the uniform distribution (see discussion in Section 3). In our numerical experiments we select $N_r = 10{,}000$ and $\tau = 0.25$.

We first compute probability (23) for spatial density distribution (14). The results of our computation show that the probability $p(N) = 1$ on any grid with $N \ge 2$. Hence, no matter how many sampling locations we use, the sampling procedure will bring us a reliable answer for the total population size if the accuracy of evaluation has to be within 25%.

Let us now compute probability (23) for spatial density distribution (15). The probability $p(N)$ is shown in Table 1 as a function of the number $N$ of grid points. In the table we also show the mean $\mu_e(N)$ and the upper error bound $\mu_e(N) + \sigma_e(N)$. The probability $p(N)$ remains very small on coarse grids and the condition $p = 1$ is only achieved when a grid is intensively refined. The threshold number of grid points is $N_t = 33$. It can be seen from the table that the requirement (21) holds on grids with $N \ge N_t$ points. However, the condition (21) does not work properly on grids with $N < N_t$ points. In particular, we have $\mu_e(N) + \sigma_e(N) = 0.2502 \approx \tau$ on a grid with $N = 32$ nodes, while the probability of accurate evaluation (4) is less than 80% on that grid. Hence even if the requirement (21) holds, at least approximately, we may still occasionally have $\tilde{e} > \tau$ for a single realisation $\tilde{e}$ of error $e$ obtained in

the sampling procedure. On a relatively fine grid of $N = 30$ points we only have a 50% chance of getting an accurate answer as $p \approx 0.5$, while on the coarsest grids this chance is extremely low ($p(N) \approx 0.01$–$0.02$ for $N \le 4$; see Table 1). Moreover, as the probability density function for random variable $e$ is not constant (cf. Fig. 3d), the risk of an inaccurate evaluation remains high even when the number of grid nodes is close to threshold value $N_t$.

The above results have been obtained for prescribed tolerance $\tau = 0.25$. More generally, the probability $p$ is a function of $\tau$, as shown in Fig. 5a where the function $p(N)$ is computed for spatial density distribution (15) with $d = 4.0$ and varying tolerance $\tau$. It is seen from the figure that the probability of accurate evaluation increases on any grid with fixed number $N$ of nodes as we increase the tolerance from $\tau = 0.1$ (curve 1 in Fig. 5a) to $\tau = 0.4$ (curve 3). In other words, fewer grid nodes are needed if we accept a lower accuracy (a larger tolerance $\tau$) when evaluating the same spatial peak. One interesting observation, however, is that the threshold number $N_t$ of grid points required to achieve the critical value $p = 1$ does not change significantly when the tolerance changes, as we have $N_t = 22$ for $\tau = 0.4$ and $N_t = 30$ for $\tau = 0.1$. Meanwhile in the former case we have a rather unreliable estimate, as the error (9) is only guaranteed to be within 40%, while in the latter case the condition (22) guarantees that the error will be within 10% on a regular grid of $N_t = 30$ points, no matter how the density peak is located with respect to grid nodes.

It follows from the above analysis that the probability $p$ of the event $e \le \tau$ depends on the shape of density function $u(x)$. In particular it was discussed in Petrovskaya and Embleton (2013) that, when single peak (15) is considered, threshold number $N_t$ of grid nodes required to achieve $p(N_t) = 1$ can be evaluated as

$$N_t \approx \frac{a}{w}, \tag{24}$$

where $w$ is the peak width and parameter $a = a(\tau)$ depends on the peak shape, the chosen tolerance, and the domain size. This result is illustrated by probability graphs in Fig. 5b. The probability has been calculated for single peak (15), where parameter $d$ varies from 2.0 to 8.0 and peak width $w$ can be evaluated as $w \approx 6d$. For the narrowest peak with $d_1 = 2.0$ the number of grid points $N_t = 49$ required to achieve $p = 1$ is approximately four times higher than for the widest peak with $d_3 = 4d_1$, where we have $N_t = 13$.

It is important to note that, while the estimate (24) holds for a single density peak, the probability function $p(N)$ exhibits more complex behaviour when a heterogeneous density distribution with several peaks is considered. We model the 1d counterpart of a patchy density distribution as

$$u(x) = 4\pi^2 x \sin(A\pi x)\cos(B\pi x) + C, \tag{25}$$

where in our computation we choose $x \in [0, 1]$, $A = 20.0$, $B = 2.0$ and $C = 50.0$. The function $u(x)$ is shown in Fig. 6a. The range of variable $I_a$ when the value of $x_1$ changes continuously from $x_1 = 0$ to $x_1 = h$ is shown in Fig. 6b for several coarse grids (cf. Figs. 2b and 3b). It can be seen from the figure that on grids of $N = 2$ and $N = 4$ nodes function $I_a(x_1)$ has a similar shape, while its shape is different on a grid of $N = 3$ nodes. Hence the

Table 1. The probability $p(N)$, the mean $\mu_e(N)$ and the upper error bound $\mu_e(N) + \sigma_e(N)$ computed on a sequence of regular grids for spatial density distribution (15). $N$ is the number of grid nodes on a sampling grid. The problem parameters for distribution (15) are the same as in Fig. 3.

N                            2        3        4        ...   30       31       32       33       34
$p(N)$                       0.0083   0.014    0.0203   ...   0.529    0.6254   0.7859   1        1
$\mu_e(N)$                   1.7105   1.6173   1.5768   ...   0.2154   0.1916   0.1687   0.1483   0.1293
$\mu_e(N) + \sigma_e(N)$     4.957    3.8879   3.4374   ...   0.3194   0.2832   0.2502   0.2201   0.1925
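The statistical simulation (23) behind Table 1 can be sketched as follows. The definitions of distribution (15), of the evaluation method and of the error lie outside this part of the text, so the sketch rests on explicit assumptions: (15) is taken to be a Gaussian peak $u(x) = A\exp(-(x - x^*)^2/(2d^2))$ with the parameters of Fig. 3, the estimate is taken to be $I_a = h\sum_i u(x_i)$ on the regular grid $x_i = x_1 + (i - 1)h$ with $h = L/N$ and $x_1$ uniform on $[0, h]$, and the error is taken to be $e = |I_a/I - 1|$. All function names are hypothetical.

```python
from math import erf, pi, sqrt

import numpy as np

# Assumed parameters, taken from the caption of Fig. 3.
A, D, X_STAR, L = 15.0, 3.0, 89.3, 300.0


def peak_density(x):
    """Assumed form of distribution (15): a Gaussian peak with width parameter d."""
    return A * np.exp(-((x - X_STAR) ** 2) / (2.0 * D ** 2))


def exact_total():
    """Integral of the assumed Gaussian peak over [0, L]."""
    s = sqrt(2.0) * D
    return A * D * sqrt(pi / 2.0) * (erf((L - X_STAR) / s) + erf(X_STAR / s))


def relative_errors(n_nodes, n_real=10_000, seed=0):
    """Errors e = |I_a / I - 1| for n_real random positions of the first node."""
    rng = np.random.default_rng(seed)
    h = L / n_nodes
    x1 = rng.uniform(0.0, h, size=n_real)           # random grid offset
    nodes = x1[:, None] + h * np.arange(n_nodes)    # shape (n_real, n_nodes)
    i_a = h * peak_density(nodes).sum(axis=1)       # assumed estimate I_a
    return np.abs(i_a / exact_total() - 1.0)


def probability_of_accuracy(n_nodes, tau=0.25, **kwargs):
    """Fraction of realisations with e <= tau, as in (23)."""
    e = relative_errors(n_nodes, **kwargs)
    return np.count_nonzero(e <= tau) / e.size


for n in (2, 3, 4, 30, 31, 32, 33, 34):
    print(n, probability_of_accuracy(n))
```

Under these assumptions the sketch should give $p(33) = 1$ and a value below 1 at $N = 32$, in line with the threshold $N_t = 33$ reported in Table 1. The seed is fixed only to make runs repeatable.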

Fig. 5. (a) Graphs of the probability function $p(N)$ for $d = 4.0$ in (15) and various values of the tolerance: $\tau = 0.1$ (curve 1), $\tau = 0.25$ (curve 2), $\tau = 0.4$ (curve 3). (b) Graphs of the probability function $p(N)$ for various values of parameter $d$ in (15): $d_1 = 2.0$ (curve 1), $d_2 = 4.0$ (curve 2), $d_3 = 8.0$ (curve 3). The tolerance is fixed as $\tau = 0.25$.
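As a quick consistency check of the estimate (24), the thresholds quoted in the text for tolerance 0.25 ($N_t = 49$ for $d = 2.0$, $N_t = 33$ for $d = 3.0$ from Table 1, and $N_t = 13$ for $d = 8.0$) can be multiplied by the peak width $w \approx 6d$. If (24) holds, the product $a = N_t w$ should be roughly the same for all three peaks. The helper name `implied_a` is hypothetical, and the check uses only numbers already stated in the text.

```python
# (d, N_t) pairs quoted in the text for tolerance tau = 0.25
reported = {2.0: 49, 3.0: 33, 8.0: 13}


def implied_a(d, n_t, width_factor=6.0):
    """Return a = N_t * w, with the peak width taken as w = width_factor * d."""
    return n_t * width_factor * d


a_values = {d: implied_a(d, n_t) for d, n_t in reported.items()}
spread = max(a_values.values()) / min(a_values.values())

for d, a in a_values.items():
    print(f"d = {d}: a = N_t * w = {a:.0f}")
print(f"max/min ratio of a: {spread:.3f}")
```

Some scatter is expected because $N_t$ is an integer, so the product can only match (24) approximately.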

Fig. 6. (a) Spatial density distribution (25). (b) Function $I_a(x_1)$ generated on a regular grid of $N = 2$ (solid line), $N = 3$ (dashed line), and $N = 4$ (dash-dotted line) nodes. (c) Functions $\sigma_e(N)$ (dashed line) and $\mu_e(N) + \sigma_e(N)$ (solid line). (d) Probability $p(N)$ for tolerance $\tau = 0.25$.

probability density function for random variable $I_a$ will be different on coarse grids with an odd and an even number of points, and we assume that this difference will be reflected in the values of the standard deviation $\sigma_e(N)$ and the upper error bound $\mu_e(N) + \sigma_e(N)$. The graphs of $\sigma_e(N)$ and $\mu_e(N) + \sigma_e(N)$ are shown in Fig. 6c for tolerance $\tau = 0.25$. Both graphs have oscillatory behaviour on coarse grids, where increasing the number of grid nodes does not necessarily result in a decrease of the standard deviation of the error. In other words, test case (25) demonstrates that increasing the number of grid nodes on coarse grids does not always guarantee more accurate evaluation of the total population size when patchy density distributions are considered.

The above conclusion is further confirmed by consideration of probability function $p(N)$ shown in Fig. 6d. While $p(N)$ was a monotone function in our previous test cases, it now has several occasional jumps before it finally approaches the required value $p(N) = 1$ for $N \ge N_t = 13$. The graph of $p(N)$ is well synchronised with the graphs of $\sigma_e(N)$ and $\mu_e(N) + \sigma_e(N)$, as peaks of function $p(N)$ clearly correspond to troughs in the graph of $\mu_e(N) + \sigma_e(N)$.

We have shown that probability $p(N)$ of an accurate answer depends on tolerance $\tau$ prescribed in the evaluation procedure. We

5. Conclusions

Fig. 7. Spatial density distribution (25). Graphs of the probability function $p(N)$ for tolerance $\tau = 0.05$ (solid line), $\tau = 0.2$ (dash-dotted line), and $\tau = 0.4$ (dotted line).

therefore can expect different behaviour of the probability function depending on the level of accuracy we require in the problem. Graphs of $p(N)$ are shown in Fig. 7 for several values of tolerance $\tau$. If we want the error to be within 40% (i.e. $\tau = 0.4$) then the probability is $p(N) = 1$ for any $N \ge 2$ considered in the test case. The probability function $p(N)$ is, however, different when the accuracy of evaluation is increased. When the tolerance $\tau = 0.2$ is required in the problem, $p(N)$ is not a monotone function and it has several occasional jumps before it finally approaches the required value $p(N) = 1$ for $N \ge N_t = 12$ (cf. Fig. 6d). The oscillatory behaviour of probability $p(N)$ becomes more pronounced when the required accuracy is further increased: function $p(N)$ shows strong oscillations on coarse grids with $N < N_t$, where $N_t = 14$ for $\tau = 0.05$. Hence we conclude that if ‘patchy’ density distribution (25) is sampled on coarse grids in order to evaluate the total population size within the tolerance of 40%, then the evaluation procedure is entirely deterministic, as we always meet the required accuracy on any coarse grid. Meanwhile a fine grid of $N \ge 14$ nodes has to be used in the same sampling procedure in order to guarantee high accuracy of 5%.

Oscillating probability $p(N)$ on coarse grids appears as a result of the choice of spatial density pattern (25). However, while this phenomenon cannot be generalised to a wider class of spatial density distributions (cf. results for Eqs. (14) and (15), where the probability is a monotone function), we have seen similar spikes in graphs of $p(N)$ for various oscillating density distributions in numerical experiments that have not been included in the paper for the sake of brevity. Apart from the obvious observation about poor resolution of spatial density distributions on coarse grids, the nature of those spikes remains unclear and this topic should be investigated in future work.
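Because $p(N)$ is not monotone for the patchy distribution (25), the threshold $N_t$ cannot be read off as the first $N$ with $p(N) = 1$; it has to be the smallest $N$ beyond which $p$ stays at 1. The sketch below illustrates this under stated assumptions: the estimate is again taken to be $I_a = h\sum_i u(x_i)$ with $h = 1/N$ and $x_1$ uniform on $[0, h]$, the error is taken to be $e = |I_a/I - 1|$, the scan covers only a finite range of $N$, and all function names are hypothetical.

```python
import numpy as np


def patchy_density(x, A=20.0, B=2.0, C=50.0):
    """Distribution (25) on the unit interval."""
    return 4.0 * np.pi ** 2 * x * np.sin(A * np.pi * x) * np.cos(B * np.pi * x) + C


def exact_total(A=20.0, B=2.0, C=50.0):
    """Closed-form integral of (25) over [0, 1]; requires A != B."""
    def int_x_sin(k):  # integral of x * sin(k * pi * x) over [0, 1]
        w = k * np.pi
        return np.sin(w) / w ** 2 - np.cos(w) / w
    # sin(a)cos(b) = (sin(a + b) + sin(a - b)) / 2
    return 2.0 * np.pi ** 2 * (int_x_sin(A + B) + int_x_sin(A - B)) + C


def probability_of_accuracy(n_nodes, tau, n_real=10_000, seed=0):
    """Fraction of random grid offsets x_1 for which e <= tau."""
    rng = np.random.default_rng(seed)
    h = 1.0 / n_nodes
    x1 = rng.uniform(0.0, h, size=n_real)
    nodes = x1[:, None] + h * np.arange(n_nodes)
    i_a = h * patchy_density(nodes).sum(axis=1)     # assumed estimate I_a
    e = np.abs(i_a / exact_total() - 1.0)
    return np.count_nonzero(e <= tau) / e.size


def threshold_number(p_by_n):
    """Smallest scanned N such that p = 1 for it and for every larger scanned N."""
    n_t = None
    for n in sorted(p_by_n, reverse=True):
        if p_by_n[n] < 1.0:
            break
        n_t = n
    return n_t


p_by_n = {n: probability_of_accuracy(n, tau=0.25) for n in range(2, 31)}
print(threshold_number(p_by_n))
```

Scanning from the largest grid downwards means an isolated coarse grid with $p = 1$ is not mistaken for the threshold. The result is only valid within the scanned range of $N$.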
One interesting observation about the problem is that the threshold number of grid points estimated above strongly disagrees with our previous estimate made in Petrovskaya and Petrovskii (2010). It was suggested in Petrovskaya and Petrovskii (2010) that the grid resolution should be ‘three points per a single peak’. That estimate results in a grid of $N_t \approx 40$ points for peak width $w \approx 0.05$ evaluated for spatial density distribution (25). The strong discrepancy between the two estimates of $N_t$ arises because in Petrovskaya and Petrovskii (2010) a single realisation of the random variable $I_a$ was considered and handled as a deterministic value. Namely, in Petrovskaya and Petrovskii (2010) we only considered the case when the location of $x_1$ is always chosen as the left boundary of the domain (i.e. $x_1 = a$). Using the probabilistic approach leaves us with a more realistic estimate of the threshold number $N_t$.
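One way to read the ‘three points per peak’ rule, offered here only as an illustrative assumption, is that three nodes span a peak of width $w$ when the grid step is $h = w/2$. On the unit domain of distribution (25) this gives

$$h = \frac{w}{2}, \qquad N \approx \frac{1}{h} = \frac{2}{w} \approx \frac{2}{0.05} = 40,$$

which can be compared with the threshold $N_t = 13$ obtained from condition (22) for tolerance 0.25.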

Knowledge of the total population size is essential in many ecological applications. The total population size is often used as input information for a number of ecological problems, such as assessing the condition of a given population and making correct management decisions. Hence accurate evaluation of the total population size is important for obtaining correct conclusions about the ecological problem. In our paper we studied a sampling procedure used for evaluation of the population abundance and discussed how insufficient sample size affects the results of evaluation.

A standard approach to evaluation of the total population size is to consider a simple estimate of density integral (1). This approach can be thought of as ‘deterministic’ because data used for evaluation are collected just once. It was shown in the paper that this standard evaluation technique does not always work when a coarse sampling grid is used. Evaluation of the population abundance on coarse grids cannot provide the prescribed accuracy because of the insufficient information about the spatial density distribution. As a result of uncertainty in data, the evaluation error becomes a random variable and an alternative ‘probabilistic’ approach should be employed in the evaluation procedure. However, the probabilistic approach requires calculation of the mean and standard deviation, and this calculation is impossible because data collection cannot be repeated in the sampling procedure. The result of sampling can therefore be considered as a single realisation of a random variable, and the question arises of how different this realisation is from the true population size.

It has been shown in our paper that for density distributions slowly changing in space any single realisation of the sampling procedure provides an accurate estimate of the true population size.
This result can be interpreted as the mean value being close to the true population size and the standard deviation of the evaluation error being small. Hence any single realisation can be considered a good approximation to the true value of the total population size. However, the above conclusion is wrong when heterogeneous density distributions are considered. It has been demonstrated in the paper that a single realisation obtained as the result of a sampling procedure can strongly differ from the true population size. The probability density function for the evaluation error presents a strongly nonuniform distribution on coarse grids and therefore our ‘lucky chance’ of having an accurate estimate of the population size is small. Moreover, the accuracy of evaluation (or the probability of obtaining an estimate within given tolerance) can become worse when the number of nodes on a coarse sampling grid is increased; e.g. see Figs. 6c, 6d and 7.

On the other hand, we know that when the number of sampling locations is increased we will end up with accurate evaluation, as the estimate converges to the exact answer in the asymptotic limit of an infinite number of nodes on a sampling grid. Hence there is a certain threshold number $N_t$ of sampling locations in the problem that guarantees the desired accuracy. It was shown in the paper that threshold number $N_t$ can be found from the condition $p(N_t) = 1$, where $p(N)$ is the probability of the event $e \le \tau$ on a grid of $N$ points; cf. (22). It has been discussed in Section 4 of the paper that introduction of threshold number $N_t$ in the problem allows one to reconcile the probabilistic approach based on the assumption about randomness of calculation with the deterministic approach based on the assumption that a single realisation obtained as a result of the sampling procedure does bring us reliable information about the total population size.

The results of our paper require one to revisit the definition of a ‘coarse’ or ‘fine’ grid. Consider a grid of $N$ sampling locations. We will say that the grid is coarse if $N < N_t$, where the number $N_t$ has

been obtained for a particular spatial density distribution. Thus ‘grid coarseness’ is entirely defined by a spatial pattern of the density function: what is considered a ‘fine’ grid (i.e. $N > N_t$) for a quasi-homogeneous density function can be a very coarse grid (i.e. $N \ll N_t$) for a highly aggregated density distribution.

One important conclusion that can be made from our results is that properties of evaluation methods on coarse grids are very different from their asymptotic properties considered for $N \to \infty$. While the convergence rate is a reliable and conventional way of assessing any method of numerical integration in the limit of large $N$, the asymptotic error estimates do not hold on coarse grids. It has been discussed in Petrovskaya and Embleton (2013) that, if we want to compare two evaluation methods on coarse grids, they should be compared based on the probability of accurate evaluation of the total population size rather than their convergence rate, because the evaluation error is a random variable on coarse grids. In other words, the probability of making an evaluation with prescribed accuracy should be calculated for each method, and the evaluation method with a higher probability is then a better method on coarse grids even if it has a slower convergence rate. First steps in developing methodology for such comparison have been made in Petrovskaya and Embleton (2013), and in the present paper we have discussed in more detail the probability calculation for evaluation method (2). Clearly the same approach can be applied if any ‘advanced’ method (i.e. a method with a higher convergence rate) is considered on coarse sampling grids.

Another conclusion drawn from our work is about spatial arrangement of a sampling grid. There have been a number of ecological studies on how to select sampling locations, where various alternatives to a regular spatial grid (e.g. random spatial layouts, cluster sampling, multi-level sampling) have been considered (Greenwood and Robinson, 1996; Woodcock, 2008). Recommendations of those studies, however, do not take into account poor resolution of a heterogeneous spatial density pattern on coarse sampling grids. Meanwhile it readily follows from the study in this paper that insufficient information about the spatial pattern is an inherent feature of the problem if a coarse grid is used for sampling, no matter what spatial arrangement of sampling locations has been selected. Hence, in our opinion, any comparison between two spatial arrangements, e.g. between a random sampling grid and a regular grid with the same number $N$ of sampling locations, should take into account the uncertainty of evaluation on coarse grids. If the number $N$ of sampling locations is prescribed in the sampling protocol then, among the other features of spatial grid geometry, the probability of accurate evaluation of the total population size has to be compared in order to decide which of the two spatial arrangements is more advantageous for given $N$.

Our work leaves several difficult open questions, evaluation of threshold number $N_t$ being one of them. It is clear from our previous discussion that the knowledge of a spatial pattern of the density function is crucial for accurate evaluation of the total population size and any information about it must be used to its fullest extent. Let us emphasise again that the uncertainty in data on coarse grids is an intrinsic property of the problem, as in many cases we do not know where essential features of the spatial density distribution are positioned, i.e. we may not know the location of a density front, of non-zero density patches, or of a strong density gradient in the domain. Thus our random choice of the first grid node on a sampling grid just reflects our lack of information about those spatial patterns. One way to deal with this

uncertainty is to determine threshold number $N_t$ for certain classes of spatial distributions that are seen in nature (e.g. one or several ‘humps’ or ‘patches’ of the nonzero density). That should become a topic of future work.

References

Alexander, C.J., Holland, J.M., Winder, L., et al., 2005. Performance of sampling strategies in the presence of known spatial patterns. Ann. Appl. Biol. 146, 361–370.
Bearup, D., Petrovskaya, N.B., Petrovskii, S.V., 2015. Some analytical and numerical approaches to understanding trap counts resulting from pest insect immigration. Math. Biosci. 263, 143–160.
Binns, M.R., Nyrop, J.P., Van Der Werf, W., 2000. Sampling and Monitoring in Crop Protection: The Theoretical Basis for Designing Practical Decision Guides. CABI Publishing, Wallingford.
Boag, B., Mackenzie, K., McNicol, J.W., Neilson, R., 2010. Sampling for the New Zealand flatworm. Proceedings Crop Protection in Northern Britain 2010, 45–50.
Bolker, B.M., 2008. Ecological Models and Data in R. Princeton University Press, Princeton.
Byers, J.A., Anderbrant, O., Löfqvist, J., 1989. Effective attraction radius: a method for comparing species attractants and determining densities of flying insects. J. Chem. Ecol. 15, 749–765.
Davis, P.M., 1994. Statistics for describing populations. In: Pedigo, L.P., Buntin, G.D. (Eds.), Handbook of Sampling Methods for Arthropods in Agriculture. CRC Press, Boca Raton, USA, pp. 33–54.
Davis, P.J., Rabinowitz, P., 1975. Methods of Numerical Integration. Academic Press, New York.
Dent, D., 2000. Insect Pest Management. CABI Publishing, Wallingford.
Embleton, N.L., Petrovskaya, N.B., 2013. On numerical uncertainty in evaluation of pest population size. Ecol. Complex. 14, 117–131.
Ferguson, A.W., Klukowski, Z., Walczak, B., et al., 2000. The spatio-temporal distribution of adult Ceutorhynchus assimilis in a crop of winter oilseed rape in relation to the distribution of their larvae and that of the parasitoid Trichomalus perfectus. Entomol. Exp. Appl. 95, 161–171.
Greenwood, J.J.D., Robinson, R.A., 1996. Principles of sampling. In: Sutherland, W.J. (Ed.), Ecological Census Techniques: A Handbook. Cambridge University Press, pp. 11–86.
Holland, J.M., Perry, J.N., Winder, L., 1999. The within-field spatial and temporal distribution of arthropods in winter wheat. Bull. Entomol. Res. 89, 499–513.
Jepson, P.C., Thacker, J.R.M., 1990. Analysis of the spatial component of pesticide side-effects on non-target invertebrate populations and its relevance to hazard analysis. Funct. Ecol. 4, 349–355.
Mayor, J.G., Davies, M.H., 1976. A survey of leatherjacket populations in south-west England, 1963–1974. Plant Pathol. 25, 121–128.
Murray, J.D., 1989. Mathematical Biology. Springer, Berlin.
Northing, P., 2009. Extensive field based aphid monitoring as an information tool for the UK seed potato industry. Asp. Appl. Biol. 94, 31–34.
Pascual, M.A., Kareiva, P., 1996. Predicting the outcome of competition using experimental data: maximum likelihood and Bayesian approaches. Ecology 77, 337–349.
Pedigo, L.P., Rice, M.E., 2009. Entomology and Pest Management. Pearson Prentice Hall, New Jersey.
Petrovskaya, N.B., Embleton, N.L., 2013. Evaluation of peak functions on ultra-coarse grids. Proc. R. Soc. A 469, 20120665. doi:http://dx.doi.org/10.1098/rspa.2012.0665.
Petrovskaya, N.B., Embleton, N.L., 2014. Computational methods for accurate evaluation of pest insect population size. In: Godoy, W.A.C., Ferreira, C.P. (Eds.), Ecological Modelling Applied to Entomology. Springer-Verlag, Berlin Heidelberg, pp. 171–218.
Petrovskaya, N.B., Petrovskii, S.V., 2010. The coarse-grid problem in ecological monitoring. Proc. R. Soc. A 466, 2933–2953.
Petrovskaya, N.B., Petrovskii, S.V., 2017. Catching ghosts with a coarse net: use and abuse of spatial sampling data in detecting synchronization. J. R. Soc. Interface 14, 20160885.
Pielou, E., 1977. Mathematical Ecology. Wiley, New York.
Raworth, D.A., Choi, M.J., 2001. Determining numbers of active carabid beetles per unit area from pitfall-trap data. Ent. Exp. Appl. 98, 95–108.
Sherratt, J.A., Smith, M., 2008. Periodic travelling waves in cyclic populations: field studies and reaction-diffusion models. J. R. Soc. Interface 5, 483–505.
Snedecor, G.W., Cochran, W.G., 1980. Statistical Methods. The Iowa State University Press, Ames.
Taylor, L.R., Woiwod, I.P., Perry, J.N., 1978. The density-dependence of spatial behaviour and the rarity of randomness. J. Anim. Ecol. 47, 383–406.
Woodcock, B.A., 2008. Pitfall trapping in ecological studies. In: Leather, S.R. (Ed.), Insect Sampling in Forest Ecosystems. Blackwell Publishing, pp. 37–57.
Young, L.J., Young, J., 1998. Statistical Ecology. Springer, Berlin.
