Preventive Veterinary Medicine 71 (2005) 173–182 www.elsevier.com/locate/prevetmed
Exploratory spatial relative risk mapping Olaf Berke a,b,* a
Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ont., Canada N1G 2W1 b Department of Biometry, Epidemiology and Information Processing, School of Veterinary Medicine Hannover, Bu¨nteweg 2, D-30559 Hannover, Germany
Abstract The many faces of disease mapping include maps of disease case locations, regional counts of cases, and disease risk. Another approach is that of mapping the relative risk. Previous methods to map the relative risk were based on regression models of relative risk, given information about geographical locations and established risk factors. However, spatial epidemiological investigations are often exploratory with limited knowledge about the putative risk factors. Indeed, often the primary motivation for the analysis is to identify unknown geographically varying risk factors. An exploratory approach to mapping the spatial relative risk is to scale the risk map using the background risk in the unexposed (or less-exposed) population. Exposure to unknown spatial risk factors is defined via specific cluster analysis. Identification of spatial disease clusters separates the population into those inside and those outside high risk areas (the exposed and unexposed populations). This exploratory approach to relative risk mapping gives the investigator an impression about the importance and geographical distribution of the unknown spatial risk factors. Two examples illustrate the exploratory relative risk mapping approach using a spatial point data set on pseudorabies in pig-herds and a regional count data set on small fox tapeworm infections in red foxes. # 2005 Elsevier B.V. All rights reserved. Keywords: Disease mapping; Risk mapping; Spatial epidemiology; Spatial statistics; Relative risk
* Tel.: +1 519 824 4120x58924; fax: +1 519 763 8621. E-mail address:
[email protected]. 0167-5877/$ – see front matter # 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.prevetmed.2005.07.003
174
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
1. Introduction Spatial statistics remains a new frontier in veterinary epidemiologic research (Martin, 2004). However, mapping of disease to investigate the geographical pattern of disease occurrence, or to stimulate the formulation of causal hypotheses has a long history. Early examples are dot maps of outbreaks of Yellow Fever in New York in 1796 by Seaman, and of Cholera in London, in 1854, by Snow (Howe, 1989). Later, diffusion maps were used with arrows and isolines indicating the direction of epidemic spread and dates of first occurrence; for example, the Newcastle disease epidemic of 1970–1971 in England and Wales (see Thrusfield, 1997, p. 120). As interest changed from presenting the distribution of a disease to explaining its geographical variation in the mid 20th century (Bithell, 1998), choropleth maps became more important. A reason for this is that explanatory variables are seldom available at the individual level but could be obtained from census information at administrative levels of regional aggregation. Unfortunately, there are certain problems associated with choropleth mapping. One of these is the visual bias due to the uneven shape and size of the various regions under study. This means that large areas may visually dominate the map, even though their importance may be limited with respect to the corresponding population size; small densely populated urban regions are easily overlooked on a map when surrounded by large rural areas. To overcome the visual bias and for a more natural representation of the continuous spatial variation in risk, recent developments promote isopleth maps to help visualize the underlying (relative) risk of disease using geostatistical methods (Berke, 2004; Kelsall and Wakefield, 2002; Oliver et al., 1992; Carrat and Valleron, 1992). Early developments in relative risk mapping were based on knowledge of the exposure status of the individuals (Bithell, 1990). These disease mapping methods provide local (point or regional) estimates of the risk in the exposed population relative to the local risk in the unexposed, or lesser exposed, population. The resulting relative risk map quantifies the impact of the factor on the risk of disease and helps to demonstrate its geographical variation. When the risk factors are unknown, new mapping ideas are needed to help identify spatially varying risk factors. One approach that leads to exploratory relative risk mapping follows from the concept of high risk areas. If there is spatial variation in risk, due to the presence or absence (or variation of intensity) of unknown risk factors, then the disease will tend to cluster. Thus, a cluster is an expression of elevated risk and locates a particularly exposed subpopulation on the map. By extension, the population outside the cluster could be viewed as a less or even unexposed population, and this supports the calculation of the background risk. Thus, point or regional count data can easily be scaled using the background risk to obtain the exploratory relative risk map.
2. Material and methods This section introduces two example data sets and the proposed approach to exploratory spatial relative risk mapping. This includes using the spatial scan statistic, to detect disease
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
175
Fig. 1. Choropleth map of regional prevalence of tapeworm infection among red foxes in Lower Saxony, 1991–1997.
clusters and their locations, a brief review of kernel smoothing and kriging methods for the purpose of isopleth risk mapping, and a formal description of the relative risk mapping method. 2.1. Regional count data on small fox tapeworm infections in red foxes The data set consists of 706 cases of Echinococcus multilocularis among 5365 foxes from 43 regions of Lower Saxony sampled (repeatedly and cross-sectionally) during the six winter seasons from November 1991 to March 1997 (Berke, 2001). The overall prevalence was approximately 13%. The choropleth map of the regional period prevalences is shown in Fig. 1. 2.2. Spatial point data on pseudorabies seropositive pig herds The data consists of a total of 482 pig farms of which 186 were declared positive for pseudorabies based on serological test results from sampled pigs (Berke and Grosse Beilage, 2003) (Fig. 2). The overall herd prevalence in this area of Germany was approximately 39%. 2.3. Cluster detection General geographical cluster methods are used to investigate whether a disease is clustering, and specific methods to identify where the cluster(s) is/are located (Besag and Newell, 1991). Among the many proposed methods for cluster investigations (Carpenter, 2001) the scan test is distinguished for its good performance in comparative power studies (Tango, 1999) and its wide range of applications including spatial point data as well as regional count data. The spatial scan test (Kulldorff, 1997; Kulldorff and Nagarwalla, 1995) was proposed as a specific cluster detection method and starts by imposing ‘‘virtual’’ circular scanning windows centred on certain locations across the investigation area. The radius of the circle
176
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
Fig. 2. Point map of 482 farm locations, in Germany, indicating the pseudorabies positive and negative farms (*, positive; *, negative). The co-ordinates are in kilometres.
varies such that from zero to an upper limit of less than 50% of the total population is included in the window. Each window is called a zone (z), and includes the set of all individuals or subpopulations whose coordinates are inside the scanning window. Thus, a set Z, of zones, is created with each element being a candidate for delineating a cluster. With individual or spatial point data a binomial probability model applies, whereas with regional count data a Poisson probability model is used to calculate the likelihood. The scan test statistic is based on the maximum likelihood ratio statistic for the zone whose likelihood gives the maximum over all zones z 2 Z. Let p denote the ‘‘success probability’’ or risk within the zone, and q is the risk outside the zone, the spatial scan test evaluates the null hypothesis H0: p = q of equal risk against the alternative H1: p > q of elevated risk within the zone. The p-value is obtained via Monte-Carlo hypothesis testing, by comparing the rank of the maximum likelihood for the real data set with those from random data sets. Doherr et al. (2002) recently applied the spatial scan test; the locations of Swiss bovine spongiform encephalopathy (BSE) cases born after the feed ban were tested for clustering against a sample of control locations of BSE cases born before the feed ban (case-control approach for spatial point data). Furthermore, the BSE cases were tested for clustering within the population at the community level (regional count data approach). Both approaches showed significant clustering and the cluster locations detected were congruent. 2.4. Isopleth risk mapping An isopleth map is a continuous surface mapping of the process under study and is especially well suited to help the viewer see the geographical distribution of latent risks. However, spatial point data and regional counts are generally irregularly and discretely georeferenced in space and therefore require spatial interpolation for isopleth mapping. For spatial point data, the interpolation method of choice is kernel density estimation (Diggle, 2003). This method is also applicable for regional data (Lawson, 2001, pp. 131–132), as is the
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
177
geostatistical kriging method (Carrat and Valleron, 1992). Prior to kriging, the regional estimates of risk need spatial smoothing, as will be described subsequently. Kernel density estimation is a weighted moving average method that can be used to estimate the intensity, or mean function, for point processes. In the spatial setting the average is calculated per unit area over all observations in a (generally circular) window that is moved over the entire study region. The weights depend on the specific kernel function, which can be chosen to be any bivariate and symmetrical density function. However, the choice of an appropriate kernel function is less important than the specification of the size of the window (the radius of circular subregions over which the estimates will be calculated). The larger the radius, the smoother is the surface map (the estimate for the spatial mean function or intensity l(s) of the spatial point process where s is any site in the study area A). Diverse methods have been proposed to adjust the window size appropriately to the study area and population density. Bailey and Gatrell (1995, pp. 84–87) give a more detailed description of kernel density estimation or smoothing. Generating a risk map with spatial point data (locations of cases and non-cases) is based on the ratio of two intensities RðsÞ ¼
l1 ðsÞ ; lþ ðsÞ
s 2 A;
where l1(s) denotes the intensity for the case process and l+(s) is the intensity for the ‘‘population at risk’’ process. Kriging is a geostatistical method that allows optimal spatially continuous prediction of the latent risk surface from observed regional risk estimates. Unlike kernel density estimation, kriging is based on a parametric spatial model. The model is specified by a spatial dependence function such as the semivariogram, and also may include a linear model for a spatial trend. The kriging predictor is a weighted average, but in contradiction to kernel density estimation, the average is calculated from the whole sample with weights depending on the spatial dependence structure. The weights are constructed to give regional risk estimates more influence on the prediction the closer they are to the prediction sites and to downplay a cluster of points that contains largely redundant information. A critical assumption in kriging is the ergodicity of the disease phenomenon under study. This includes variance homogeneity for the regional risk estimates, an assumption that does not hold when risks are estimated from varying regional sample sizes (Lawson, 2001, p. 131). Kriging regional estimates of risk to generate an isopleth risk map R(s) is described by Carrat and Valleron (1992) and Oliver et al. (1992). More details about kriging and spatial modelling are given in Cressie (1993) and Berke (1998). Linear Bayes estimation prior to kriging has been proposed to overcome the problem of variance heterogeneity (Berke, 2004). The idea behind empirical Bayesian smoothing is to improve an estimate for a subsample by using information from the entire sample. This is achieved by calculating a weighted average from the regional risk plus the global risk estimates for each region in turn. The larger the regional sample size is, the more reliable is the regional risk estimate and so the empirical Bayes estimator gives more weight to it. On the other hand, small regional sample sizes result in unstable regional risk estimates and thus more weight is given to the global risk. In effect, empirical Bayes estimates have a
178
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
smoothing effect, shrinking the raw regional estimates toward the global risk estimate. Empirical Bayes estimation also reduces the variance heterogeneity across the regional estimates. Berke (2001) explains empirical Bayesian smoothing of regional data more formally and applies the method to the fox–tapeworm example data. 2.5. Exploratory relative risk mapping Maps can indicate the geographical variation in the relative risk and give detailed insight into the disease process. Lawson (2001) gives an overview of approaches to relative risk mapping for spatial point data as well as for regional count data. When spatially varying risk factors are unknown and the aim of the study is to identify relevant spatial risk factors, exploratory relative risk mapping may provide useful information for hypothesis generating. The approach proceeds in four steps as follows. First, search for disease clusters, if any, in the study area using the spatial scan statistic. The sub-regions where clusters are located constitute high risk areas. Second, calculate the background risk (R0) from the population at risk outside the high risk areas. Third, predict the risk surface map R(s). As described in Section 2.4, this can be done using kernel density estimation with point data or kriging the empirical Bayesian smoothed regional risk estimates from regional count data. In the fourth and last step, the risk surface R(s) is scaled using the background risk R0 to obtain the exploratory relative risk map RRðsÞ ¼
RðsÞ ; R0
s 2 A:
With prevalence data, the spatial relative risk map will be approximated by the spatial prevalence ratio.
3. Results 3.1. Regional count data example The spatial scan test detected one cluster of small fox tapeworm infections among red foxes in 43 sub regions of Lower Saxony. The cluster comprised 5 out of the 43 regions, 704 out of 5365 foxes, and 297 of the 706 cases. Therefore, the background risk in foxes was R0 = 409/4661 = 0.09, (the number infected over the population at risk outside the cluster). The regional risks were smoothed using linear empirical Bayes estimation, and then universal kriging with a quadratic trend model in the north-south direction was applied. The spatial dependence was modelled through a spherical semivariogram model without nugget effect and a range of 5.3 units in the co-ordinate system. The resulting isopleth risk map is given in Fig. 3. The map then was scaled by the respective background risk to obtain the relative risk map (Fig. 4). More details can be found in Berke (2001) and Berke and Becher (2004).
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
179
Fig. 3. Isopleth risk map of the prevalence of Echinococcus multilocularis infections in red Foxes in Lower Saxony (from choropleth map in Fig. 1). Isolines are at the 5%, 50% and 95% quantiles of the distribution of the empirical Bayesian smoothed regional prevalences.
3.2. Spatial point process example In the pseudorabies example, the spatial scan test detected two clusters of pseudorabies seropositive herds among 482 pig farms in an animal intensive region. The two clusters consisted of 56 and 11 seropositive herds out of 77 and 11 herds under risk, respectively. Thus, the unexposed population was 394 herds of which 119 tested positive, giving a background risk estimate of R0 = 119/394 = 0.3. The kernel density estimation was applied once to the case data and once to the whole data set, using a quartic kernel function (Bailey and Gatrell, 1995) with a bandwidth of 4 km. The ratio of the two intensity maps is the risk
Fig. 4. Relative risk (prevalence ratio) map of Echinococcus multilocularis infections in red Foxes sampled in Lower Saxony. Hatched lines indicate the disease cluster.
180
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
Fig. 5. Isolines of prevalence in percent of pseudorabies positive farms per km2. Cluster locations are indicated; polygons around the case locations show the size and location of the clusters.
Fig. 6. Relative risk (prevalence ratio) map of pseudorabies infection. The co-ordinates are in km.
map as shown in Fig. 5. The resulting explorative relative risk map is given in Fig. 6; more details on the pseudorabies example are given in Berke and Grosse Beilage (2003).
4. Discussion With respect to the concept of the epidemiologic triangle, knowledge about host, agent and environment is needed to gain some insight into the aetiology of diseases and to develop and promote preventive health as well as disease control measures. This means spatial epidemiological studies have to find out if a disease is clustering, where the
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
181
clustered subpopulations are located and how the risk is varying across the study area. All this can be presented by the exploratory relative risk map, which makes it a useful tool for spatial epidemiological analysis. Within the context of spatial epidemiology, the main interest is in identification and quantification of geographically varying risk factors. Considerable spatial variation in risk results in areas of higher and lower risk. Thus, spatial variation in risk is a characterization of the study region (Diggle, 2003). Spatial variation in risk factors also results in clustering of the disease, and the disease clusters reflect the specific subpopulation within the study areas that are under elevated risk, hence exposed to a spatially varying risk factor. Thus the concepts of clustering and cluster for agent and host, respectively, as well as variation in risk for the environment are the spatial equivalent to the epidemiological triad, and are summarized in the exploratory relative risk map. In the statistical literature the term exploratory is reserved for methods based on few assumptions that should be robust to data anomalies. Here it is used to indicate that there is no knowledge about the identity of any risk factors. The only assumption is that the risk is continuously varying in space. In epidemiology, calculation of the relative risk is usually based on cohort data. Often, however, investigators just have access to prevalence data from cross-sectional studies or disease registries. In this instance, the prevalence ratio can be used as an approximation to the relative risk. Prevalence studies are generally less suitable for causal inferences because the temporal relationship between exposure and disease occurrence is not monitored (Noordhuizen et al., 2001). Detecting a potential disease cluster is an initial step in the analysis. In this stage, the spatial scan statistic uses a circular window for cluster detection and this window-shape may be inappropriate. For example, when airborne transmission from point sources is suspected, a plume model might be more powerful. Mapping the convex hull of the cluster population may serve as a tool to check the geometric form of clustering (see Fig. 4). The next step is to verify that the cluster is not due to chance or bias. When it is concluded that the cluster is caused by unknown spatial risk factors, the next step in the assessment of the exploratory risk map is to decide whether the relative risk is high enough to further investigate the cause for the elevation in risk. In this regard, the risk map shows the geographical variation and range of risk, whereas the exploratory relative risk map shows the relative risk and allows quantification of the potential importance of unknown causes. Unlike kernel density estimation, kriging is based on a parametric model that offers interpretations for regional count data. For example, the range parameter of the semivariogram allows inference about the distance for the spread of an infectious disease within the time window of the study. However, in the spatial epidemiological context, it is also argued that kriging is inappropriate, because variogram estimation assumes homogeneous data. This assumption does not hold with regional estimates of risk, based on varying sample sizes. Therefore, empirical Bayesian estimation of regional risks is proposed as a method for variance stabilization by ‘‘strengthening from the ensemble’’ before kriging is applied. In summary, this paper presents an exploratory relative risk mapping approach to identify and investigate the relevance of unknown spatially varying risk factors and to
182
O. Berke / Preventive Veterinary Medicine 71 (2005) 173–182
facilitate hypothesis generation about putative risk factors. Explicitly assuming that the risk factor in question is spatially varying, the population at risk is spatially separated into exposed and unexposed (or lesser exposed) groups by use of a specific cluster method – the spatial scan test. Risk mapping is accomplished by use of kernel density estimation with spatial point data and by use of kriging the Bayesian smoothed regional risks from tract count data. The relative risk map follows from scaling the risk map by the background risk of the unexposed population.
References Bailey, T., Gatrell, A., 1995. SatialDataAnalysis. Longman, Harlow. Berke, O., 1998. Estimation and prediction in the spatial linear model. Water Air Soil Pollut. 110, 215–237. Berke, O., 2001. Choropleth mapping of regional count data of Echinococcus multilocularis among red foxes in Lower Saxony. Prev. Vet. Med. 52, 119–131. Berke, O., 2004. Exploratory disease mapping: kriging the spatial risk function from empirical Bayesian estimates of regional counts. Int. J. Hlth. Geo. 3, 18. Berke, O., Becher, H., 2004. A case study in geographical risk assessment using the spatial relative risk function: an exposure map for Echinococcus multilocularis in Lower Saxony (in German). Informatik, Biometrie und Epidemiologie in Medizin und Biologie 35, 195–206. Berke, O., Grosse Beilage, E., 2003. Spatial relative risk mapping of pseudorabies-seropositive pig herds in an animal dense region. J. Vet. Med. B 50, 322–325. Besag, J., Newell, J., 1991. The detection of clusters in rare disease. J. Roy. Stat. Soc. A 154, 143–155. Bithell, J., 1990. An application of density estimation in geographical epidemiology. Stat. Med. 9, 691–701. Bithell, J., 1998. Geographical analysis. In: Armitage, P., Colton, T. (Eds.), Encyclopedia of Biostatistics. Wiley, Chichester, pp. 1701–1716. Carpenter, T., 2001. Methods to investigate spatial and temporal clustering in veterinary epidemiology. Prev. Vet. Med. 52, 227–249. Carrat, F., Valleron, A.J., 1992. Epidemiological mapping using the kriging method: application to an influenzalike illness epidemic in France. Am. J. Epidemiol. 135, 1293–1300. Cressie, N., 1993. Statistics for Spatial Data, rev. ed. Wiley, New York. Diggle, P., 2003. Statistical Analysis of Spatial Point Pattern, 2nd ed. Arnold, London. Doherr, M.G., Hett, A.R., Ru¨fenacht, J., Zurbriggen, A., Heim, D., 2002. Geographical clustering of cases of bovine spongiform encephalopathy (BSE) born in Switzerland after the feed ban. Vet. Rec. 151, 467–472. Howe, G.M., 1989. Historical evaluation of disease mapping in general and specifically of cancer mapping. In: Boyle, P., Muir, C.S., Grundmann, E. (Eds.), Cancer Mapping. Springer, Berlin, pp. 1–21. Kelsall, J., Wakefield, J., 2002. Modelling spatial variation in disease risk: a geostatistical approach. J. Am. Stat. Assoc. 97, 692–701. Kulldorff, M., 1997. A spatial scan statistic. Commun. Stat. Theor. M. 1481–1496. Kulldorff, M., Nagarwalla, N., 1995. Spatial disease clusters: detection and inference. Stat. Med. 14, 799–810. Lawson, A., 2001. Statistical Methods in Spatial Epidemiology. Wiley, New York. Martin, S.W., 2004. Measuring health and disease: progress in analytical approaches. Prev. Vet. Med. 62, 165–175. Noordhuizen, J.P.T.M., Thrusfield, M.V., Frankena, K., Graat, E.A.M., 2001. Application of Quantitative Methods in Veterinary Epidemiology, second ed. Wageningen Pers, Wageningen. Oliver, M.A., Muir, K.R., Webster, R., Parkes, S.E., Cameron, A.H., Stevens, M.C.G., Mann, J.R., 1992. A geostatistical approach to the analysis of pattern in rare disease. J. Pub. Hlth. Med. 14, 280–289. Tango, T., 1999. Comparison of general tests for spatial clustering. In: Lawson, A., Biggeri, A., Bo¨hning, D., Lessaffre, E., Viel, J.-F., Bertollini, R. (Eds.), Disease Mapping and Risk Assessment in Public Health. Wiley, New York, pp. 111–117. Thrusfield, M., 1997. Veterinary Epidemiology, rev. second ed. Blackwell Science, Oxford.