Soc. Sci. Med. Vol. 34, No. 7, pp. 769-777, 1992. Printed in Great Britain.
0277-9536/92 $5.00 + 0.00 Pergamon Press plc
DISTANCE AND RISK MEASURES FOR THE ANALYSIS OF SPATIAL DATA: A STUDY OF CHILDHOOD CANCERS
S. SELVIN, J. SCHULMAN and D. W. MERRILL
Department of Biomedical and Environmental Health Sciences, University of California, Berkeley, CA 94720, U.S.A.; California Birth Defects Monitoring Program, March of Dimes Birth Defects Foundation, 5900 Hollis Street, Emeryville, CA 94608, U.S.A.; and Information and Computing Sciences Division, Lawrence Berkeley Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, U.S.A.

Abstract-Three statistical approaches, used to detect spatial clusters of disease associated with a point source exposure, are applied to childhood cancer data for the city of San Francisco (1973-88). The distributions of incident cases of leukemia (51 cases), brain cancer (35 cases), and lymphatic cancer (37 cases) among individuals less than 21 years of age are described using three measures of clustering: distance on a geopolitical map, distance on a density equalized transformed map, and relative risk. The point source of exposure investigated is a large microwave tower located southwest of the center of the city (Sutro Tower). The three analytic approaches indicate that the patterns of the major childhood cancers are essentially random with respect to the point source. These results and a statistical model for spatial clustering are used to explore distance and risk measures in the analysis of spatial data. Both types of measures of spatial clustering are shown to perform similarly when a specific area of exposure can be defined.

Key words-distance, relative risk, spatial data, childhood cancer, spatial analysis, brain cancer, lymphatic cancer and leukemia
ANALYSIS

Introduction
The geographic pattern of human disease is influenced by the spatial distribution of the population-at-risk. Cases of disease plotted on a geopolitical map provide a description of the location but do not accurately reflect risk. In other words, humans tend to concentrate in specific areas, causing a high frequency of disease in those areas regardless of other factors. The confounding of any spatial pattern with the underlying distribution of the population-at-risk is a fundamental issue in the study of human disease clustering. To display a pattern of disease ‘free’ of this confounding influence, a geopolitical map can be transformed. When a map is transformed so that the population under investigation is uniformly distributed, then the spatial pattern of disease will also be uniformly distributed in the absence of clustering. Such a map more clearly reflects influences from factors of environmental or epidemiologic importance, when they exist. The construction of maps transformed to equalize the density of individuals at risk is discussed elsewhere [1, 2].

A geopolitical map of San Francisco city/county is shown in Fig. 1 (left) divided into 148 census tracts. A transformation of this map so that each census tract has the same density of individuals under the age of 21 is also shown in Fig. 1 (right). When a disease occurs with equal probability among individuals under 21 years old, the distribution of cases will have no systematic pattern when displayed on this transformed map.
A unique source of environmental exposure potentially associated with occurrence of childhood cancer is the electromagnetic radiation emitted from a large microwave tower located to the southwest of the center of San Francisco (Sutro Tower, displayed as a star in Figs 1, 2(a) and 2(b), is the only such tower in the City/County of San Francisco). Some speculation exists that such exposures are a health hazard (e.g. Ref. [3]). Three statistical approaches are explored to evaluate the association between this point source of exposure and site-specific childhood cancer incidence. Two methods are based on distance and the third on traditional relative risk arguments. These approaches do not require specialized data but employ the type of information routinely collected by registries and surveillance programs. For example, these three methods do not require specific information on exposure. However, if exposure measures are available, the approaches described are certainly not adequate substitutes for methods using actual exposure measurements.

For the spatial analysis of cancer incidence using rates, individuals at risk can be grouped by administrative or census-defined boundaries [4]. Alternatively, to apply Poisson-based procedures, individuals can be divided into a series of geographic areas where the population-at-risk has an approximately uniform distribution [5]. Both approaches lack sensitivity and work well only when a large number of observations is available. Since a limited number of site-specific cases occurred in the San Francisco data, approaches employing rates calculated for a series of geographic
sub-units are not practical. A recent paper relevant to the analysis of a rare phenomenon in the vicinity of a prespecified point, based strictly on spatial data, has been authored by Diggle [6]. Additionally, studies based exclusively on spatial distances associated with cases of disease are clearly not definitive since important questions concerning the role of other possibly relevant variables such as smoking status, occupation, and socioeconomic status are not addressed.

Fig. 1. San Francisco city/county by census tract displayed on a geopolitical map (left) and on a density-transformed map (right). [Panels: San Francisco - Geopolitical; San Francisco - Transformed. Axes: kilometers x 100.]
[Fig. 2(a) and Fig. 2(b): four panels each (Hodgkins Disease, Non-Hodgkins Lymphomas, Leukemia, Brain Cancer). Axes: kilometers x 100.]

Fig. 2. (a) Cases of lymphatic cancer, leukemia and brain cancer in individuals under 21 years of age located on a geopolitical map (coincident cases are slightly displaced for the purposes of display). (b) Cases of lymphatic cancer, leukemia and brain cancer in individuals under 21 years of age located on a density-transformed map (coincident cases are slightly displaced for the purposes of display).

This work illustrates and contrasts two types of statistical techniques useful for the analysis of routinely collected spatial data when the number of cases of disease is small. Although the results do not directly generalize to all situations, the principles discussed and the results observed are likely to provide background for the investigation of other possible clusters of disease.

The data
A total of 123 cases of cancer among 50,686 white individuals at risk under the age of 21 were abstracted from the Surveillance, Epidemiology and End Results (SEER) cancer registry for the years 1973-88. Incident cases of leukemia (International Classification of Diseases for Oncology [ICD-O] code 169.1; 51 cases), brain cancer (ICD-O code 191-2; 35 cases) and lymphatic cancer (ICD-O code 196; 37 cases) are plotted on both geopolitical and transformed maps, shown in Figs 2(a) and (b) respectively. Since exact geographic information on location was not available, each case is plotted at the centroid of the census tract of residence. This approximation to the location of a case produces an essentially unbiased estimate of distance and negligibly increases the variability among the observed distances, since the census tracts are small and compact so that little distortion occurs from assigning cases to centroids. The lymphomas were further classified into Hodgkins and non-Hodgkins cases based on histology codes. The 123 site-specific cases comprise about 50% of the total childhood cancers occurring over the 16 year period.

The age distribution of the 123 cases of childhood cancers shows, as expected, the younger ages of leukemia patients and the relatively older ages of the lymphoma cases (Table 1). Brain cancer and non-Hodgkins lymphomas occurred more or less uniformly among all ages, with a mean age of occurrence of 9.6 and 13.7 years, respectively. The years in which cases were diagnosed are not evenly distributed over the 16 years of data, with fewer cases reported in the years 1985
Table 1. Age (years) distribution

Age                      0-2   3-5   6-8   9-11   12-14   15-17   18-20   Total   Mean
Leukemia                  16    11     6      1       6       4       7      51    7.7
Brain                      5     7     6      5       2       5       5      35    9.6
Lymphomas                  0     1     3      6       5       5      17      37   15.1
  Hodgkins                 0     0     2      4       3       4      13      26   15.7
  Non-Hodgkins             0     1     1      2       2       1       4      11   13.7
Lymphomas and leukemia    16    12     9      7      11       9      24      88   10.8
Table 2. Year of diagnosis distribution

Years                    73-74   75-76   77-78   79-80   81-82   83-84   85-86   87-88   Total
Leukemia                     9       5      10       5       6      10       3       3      51
Brain                        5       5       7       4       1       3       4       6      35
Lymphomas                    8       3       7       5       4       5       3       2      37
  Hodgkins                   5       0       5       4       4       5       1       2      26
  Non-Hodgkins               3       3       2       1       0       0       2       0      11
Lymphomas and leukemia      17       8      17      10      10      15       6       5      88
through 1988 (Table 2). Case ascertainment has not been completed for these years. The annual site-specific rates (number of cases divided by the approximate person-years at risk) are reported in Table 3 for each childhood cancer.

Distance measured on a geopolitical map

A basic variable in the study of a cluster of disease cases is the distance from a person at risk to a point source of possible exposure, represented as D. Under the hypothesis that no spatial pattern exists, the approximate expected distance (ED) from a person at risk to a specific point $(x_0, y_0)$ on a geopolitical map divided into k sub-areas is

$$\text{expected mean distance} = \mathrm{ED} = \frac{\sum_{i=1}^{k} n_i d_i}{\sum_{i=1}^{k} n_i} \tag{1}$$

where $n_i$ is the number of individuals at risk in the i-th sub-area (e.g. census tract) and $d_i$ is the distance from the centroid of the sub-area to the point $(x_0, y_0)$. The variance of the distribution of the distances between a person at risk and a specific point is estimated by

$$\text{variance}(D) = \frac{\sum_{i=1}^{k} n_i (d_i - \mathrm{ED})^2}{\sum_{i=1}^{k} n_i} \tag{2}$$

again derived under the assumption that no spatial pattern of disease exists. From expressions (1) and (2), the expected mean distance from a person at risk to Sutro Tower is ED = 3.718 km with a variance (D) = 2.080 km².

Table 3. Annual cancer rates for San Francisco (1973-88)

Cancer                    Cases   Rate/100,000
Leukemia                     51       6.29
Brain                        35       4.31
Lymphomas                    37       4.56
  Hodgkins                   26       3.21
  Non-Hodgkins               11       1.36
Lymphomas and leukemia       88      10.85
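Expressions (1) and (2) amount to a population-weighted mean and variance of the tract-to-source distances. The following Python sketch illustrates the calculation; the function names and the three-tract data are hypothetical and are not the San Francisco census data.

```python
from math import fsum


def expected_distance(populations, distances):
    """Expression (1): population-weighted mean of tract-to-source distances."""
    total = fsum(populations)
    return fsum(n * d for n, d in zip(populations, distances)) / total


def distance_variance(populations, distances):
    """Expression (2): population-weighted variance of the distances about ED."""
    ed = expected_distance(populations, distances)
    total = fsum(populations)
    return fsum(n * (d - ed) ** 2 for n, d in zip(populations, distances)) / total


# Hypothetical three-tract example (illustration only).
tract_pop = [100, 300, 600]        # persons at risk per tract, n_i
tract_dist_km = [1.0, 3.0, 5.0]    # centroid-to-source distance, d_i

ed = expected_distance(tract_pop, tract_dist_km)
var = distance_variance(tract_pop, tract_dist_km)
print(ed, var)
```

With the actual 148 tract populations and centroid distances in place of the hypothetical lists, the same two functions would be expected to give the ED and variance (D) values quoted in the text.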
The average observed distance from a set of m cases to the point $(x_0, y_0)$ is calculated in the usual way. That is, the average distance for m cases is

$$\bar{d} = \frac{\sum_{j=1}^{m} d_j}{m} \quad \text{where} \quad d_j = \sqrt{(x_j - x_0)^2 + (y_j - y_0)^2} \tag{3}$$

and $(x_j, y_j)$ is the location of the j-th case. The observed mean values are given in Table 4 for six sets of cancer incidence data. The hypothesis of no association between the point $(x_0, y_0)$ and the location of cases can be evaluated by comparing the observed mean with the expected mean distance, or

$$z = \frac{\bar{d} - \mathrm{ED}}{\sqrt{\text{variance}(D)/m}} \tag{4}$$
where z has an approximate standard normal distribution (mean = 0 and variance = 1) when disease is randomly distributed among the population-at-risk. Implicit in this test-statistic is the assumption that $\bar{d}$ has an approximately normal distribution. Values of $\bar{d}$ substantially smaller than ED lead to large negative values of z, implying that the observed spatial pattern of cases is not likely a result of chance variation. The test-statistic z is an accurate assessment of the observed spatial pattern when the number of sub-areas k is large and more than 10 or so cases are under investigation. The results of this z-test approach applied to the childhood cancer data are given in Table 4.

To assess the accuracy of the p-values obtained using this z-test analysis, a series of distances was simulated under the condition that cases of disease occur randomly among San Francisco residents under the age of 21. Five thousand mean values, each based on m cases, were calculated for the six cancer categories. The proportion of simulated mean values less than each of the observed $\bar{d}$-values (given in Table 4) constitutes an empirical ‘p-value’. These estimates are also included in Table 4. The z-statistic and the simulation results are essentially identical. The distribution of childhood cancer
Table 4. Mean distance on a geopolitical map, summary results from the z-test statistic and simulation analyses

                          Data                 z-Test               Simulation
Site                      $\bar{d}$ (km)   m        z      p-value    ‘p-value’
Leukemia                  3.783      51     0.321    0.625     3130/5000 = 0.626
Brain                     3.435      35    -1.160    0.123      615/5000 = 0.123
Lymphomas                 3.525      37    -0.814    0.208     1075/5000 = 0.215
  Hodgkins                3.631      26    -0.308    0.379     1875/5000 = 0.375
  Non-Hodgkins            3.274      11    -1.020    0.154      750/5000 = 0.150
Lymphomas and leukemia    3.674      88    -0.284    0.388     1955/5000 = 0.391
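The z-test and simulation columns of Table 4 can be sketched in Python as follows. The leukemia row is used as a check on expression (4); the p-value is taken as the lower-tail normal probability, which is consistent with the tabulated values. The simulation function assumes cases can be drawn with replacement from tracts with probability proportional to tract population, which is an approximation introduced here for illustration (the tract lists passed to it would be the user's own data, and all function names are hypothetical).

```python
import random
from math import sqrt
from statistics import NormalDist


def z_statistic(mean_obs, ed, var_d, m):
    """Expression (4): standardized difference between observed and expected mean."""
    return (mean_obs - ed) / sqrt(var_d / m)


def lower_tail_p(z):
    """Probability that a standard normal variate is less than z."""
    return NormalDist().cdf(z)


def simulated_p(mean_obs, populations, distances, m, n_sim=5000, seed=0):
    """Empirical 'p-value': share of simulated means below the observed mean.

    Each simulated case is assigned to a tract with probability proportional
    to the tract population (sampling with replacement, an assumption).
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(n_sim):
        sample = rng.choices(distances, weights=populations, k=m)
        if sum(sample) / m < mean_obs:
            below += 1
    return below / n_sim


# Leukemia row of Table 4: observed mean 3.783 km, m = 51,
# ED = 3.718 km and variance (D) = 2.080 km^2 from expressions (1) and (2).
z_leuk = z_statistic(3.783, 3.718, 2.080, 51)
p_leuk = lower_tail_p(z_leuk)
print(round(z_leuk, 2), round(p_leuk, 2))
```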
cases shows no systematic spatial variation with respect to the point source.

Distance measured on a transformed map

A similar analysis of the distribution of the cancer cases can be performed using a transformed map. As before, the expected distance is estimated under the hypothesis that the disease occurs at random among the population-at-risk and is compared to the value calculated from the observed data. For the analysis of the transformed maps, the squared distance is used as a measure of clustering since the expected value $\mathrm{ED}_t^2$ (t denotes values that refer to the transformed map) is known exactly [1] and the use of the distance as a measure of clustering would involve an additional approximation. Simulation techniques based on either squared distance or distance will produce identical results in terms of a statistical test, and the z-scores differ only because exact expressions do not exist for the expectation and variance of distance. Also, on a transformed map, distance is not interpretable in units of kilometers.*

Using standard but rather involved statistical arguments, the expectation and variance of squared distances measured on a density equalized map are developed elsewhere [1] and are easily calculated for the San Francisco transformed map using the expressions given in Ref. [1]. The expected squared distance is $\mathrm{ED}_t^2$ = 20.381 and the variance associated with the distribution of these squared distances is variance $(D_t^2)$ = 191.993. The mean values of the squared distances ($\overline{d_t^2}$) computed from the cancer incidence cases (plotted on transformed maps in Fig. 2b) are given in Table 5. These mean values are calculated by applying expression (3) to the $d_t^2$-values for the m observed locations for each cancer category. The z-statistic (expression 4) can also be used to compare the expected and observed values from a transformed map using the mean of the squared distances as a measure of clustering. Six sets of 5000 simulated mean values were again calculated, to provide a nonparametric alternative method of analysis that does not depend on the normality assumption required with the z-test approach. The results of the two analyses are compared in Table 5.

The significance probabilities derived from analyzing the cancer cases located on transformed maps are similar to those from the analyses based on geopolitical maps (Table 4). The z-test values again agree closely with the simulation results, indicating that the normal approximation is also useful when applied to the squared distance measure derived from a transformed map.

*Unit area on a geopolitical and transformed map is proportional to the square of kilometers and to population, respectively. Accordingly, the distance measure on the transformed map is actually the square root of the population. The ‘kilometers’ notation used in Figs 1 and 2 reflects the fact that the geopolitical and transformed maps are normalized to have the same total area.

Table 5. Mean squared distance on a transformed map, summary results from the z-test and simulation analyses

                          Data                    z-Test               Simulation
Site                      $\overline{d_t^2}$     m        z      p-value    ‘p-value’
Leukemia                  20.726     51     0.178    0.571     2847/5000 = 0.569
Brain                     18.623     35    -0.746    0.228     1134/5000 = 0.227
Lymphomas                 17.898     37    -1.090    0.138      691/5000 = 0.138
  Hodgkins                18.719     26    -0.612    0.270     1390/5000 = 0.278
  Non-Hodgkins            15.958     11    -1.059    0.145      686/5000 = 0.137
Lymphomas and leukemia    19.537     88    -0.571    0.284     [illegible]/5000 = 0.291

Relative risk analysis
Yet another approach to evaluating data associated with a point source exposure is to divide the population-at-risk into an exposed group and an unexposed group. A relative risk statistic is then used to compare groups. Formally, relative risk (rr) is the ratio of the probability of disease among exposed individuals (p) to the same probability among those unexposed ($p_0$). To apply a relative risk analysis to the San Francisco data, individuals under the age of 21 must be divided into exposed and unexposed groups. ‘Exposed’ individuals are defined here as all those residing within 3.5 km of the microwave tower. The choice of 3.5 km will be justified in the next section. A result of choosing 3.5 km is that 44.8% of the population under the age of 21 is ‘exposed’. An observed relative risk can be compared
Table 6. Estimated relative risk: z-test and simulation analysis

                                                    z-Test     Simulation
Site                      E*     Ē*     rr         p-value    ‘p-value’
Leukemia                  19     32     0.731      0.861      [illegible]/5000 = 0.821
Brain                     17     18     1.162      0.328      1337/5000 = 0.267
Lymphomas                 18     19     1.166      0.320      1344/5000 = 0.269
  Hodgkins                13     13     1.231      0.298      1200/5000 = 0.240
  Non-Hodgkins             5      6     1.026      0.483      1796/5000 = 0.359
Lymphomas and leukemia    37     51     0.893      0.700      3333/5000 = 0.667

*E = exposed cases (cases ≤ 3.5 km), Ē = unexposed cases (cases > 3.5 km). Persons-at-risk are: exposed = 50,686(0.448) = 22,707, and unexposed = 50,686(0.552) = 27,979, from U.S. census data 1980.
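The relative risk and z-test columns of Table 6 can be approximated with a short Python sketch. It uses the usual large-sample standard error of the logarithm of a relative risk and a one-sided upper-tail p-value; both are assumptions made here for illustration (the text refers to textbook treatments [7, 8] without spelling out the formula), although they reproduce the leukemia row closely. The function name is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def log_rr_test(cases_exp, n_exp, cases_unexp, n_unexp):
    """Return (rr, z, p) for the null hypothesis rr = 1.

    Assumes the standard large-sample variance of log(rr):
    1/a - 1/n1 + 1/b - 1/n0, and a one-sided test for excess risk.
    """
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se = sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = log(rr) / se
    p = 1.0 - NormalDist().cdf(z)  # upper tail: evidence of rr > 1
    return rr, z, p


# Leukemia row of Table 6: 19 exposed cases among 22,707 persons,
# 32 unexposed cases among 27,979 persons.
rr_leuk, z_leuk_rr, p_leuk_rr = log_rr_test(19, 22707, 32, 27979)
print(round(rr_leuk, 2), round(p_leuk_rr, 2))
```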
to a relative risk of 1.0 (no difference in disease probability among exposed and unexposed individuals; $p_0 = p$), again using a z-statistic analogous to expression (4). The details of such a test based on the logarithm of the estimated relative risk are described in a variety of textbooks (e.g. Refs [7, 8]). The data and the calculations for the San Francisco cancer incidence data are given in Table 6, where the p-values come from the test of the null hypothesis rr = 1. The relative risk calculations produce p-values that differ somewhat from the calculations based on distance (Tables 4 and 5) but lead to similar conclusions (Table 6).

As in the previous two cases, 5000 samples were generated to produce empirically the distribution of the relative risk measure for each cancer category, under the assumption that no pattern of spatial distribution exists. These simulated distributions were used, as before, to verify the use of the normal distribution approximation. The simulation results (‘p-values’, Table 6) show that the ‘normal-assumption’ produces consistent but somewhat inflated significance probabilities. All six significance probabilities are increased over the distance-measure significance levels and, therefore, also show no evidence of an increased risk of childhood cancer among those individuals living within 3.5 km of the microwave tower.

A MODEL
To explore the statistical properties of the two distance measures and the relative risk measure, it is necessary to postulate a statistical model. A statistical model serves as a way to study the distance and risk measures when cases of disease display a systematic pattern of aggregation (i.e. the alternative hypothesis is true). The following is a simple and useful model of the pattern of cases of disease distributed over an area of interest on a geopolitical map. We define the ‘background’ probability of disease as $p_0$ for a population-at-risk made up of N individuals, the symbol p as the probability of disease among exposed individuals, and then assume that

(i) $Np_0$ ‘background’ cases are distributed spatially at random (every person at risk is equally likely to be a case) over a defined area;

(ii) the number of cases associated with the exposed region is $NPp$, where P is the proportion of the population-at-risk exposed, so that the number of excess cases of disease associated with a point source is approximately (when Np is large)

$$\text{number of excess cases} = NpP - Np_0P = Np_0P(rr - 1); \tag{5}$$

(iii) the distribution of the locations of these excess cases is described by a circular pattern defined by the bivariate normal density function,

$$f(x, y) = \frac{1}{2\pi r^2} \exp\left[-\frac{1}{2}\,\frac{(x - x_0)^2 + (y - y_0)^2}{r^2}\right]. \tag{6}$$
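Parts (ii) and (iii) of the model can be illustrated with a small Python sketch: expression (5) gives the expected number of excess cases, and expression (6) is sampled by drawing x and y independently from normal distributions centred on the source with standard deviation r. The function names, the choice of origin for the source, and the rounding of the expected count to a whole number of simulated cases are assumptions made only for illustration. The background component of part (i) is not simulated here.

```python
import random


def expected_case_counts(n_background, prop_exposed, rr):
    """Return (background, excess) expected case counts; excess is expression (5)."""
    excess = n_background * prop_exposed * (rr - 1.0)
    return n_background, excess


def sample_excess_locations(n_cases, x0, y0, r, seed=0):
    """Draw excess-case locations from the circular bivariate normal of expression (6)."""
    rng = random.Random(seed)
    return [(rng.gauss(x0, r), rng.gauss(y0, r)) for _ in range(n_cases)]


# Values used for the power curves in the text: Np0 = 30, P = 0.448;
# rr = 2 and r = 1.75 km are chosen here as one example.
background, excess = expected_case_counts(30, 0.448, 2.0)
points = sample_excess_locations(round(excess), 0.0, 0.0, 1.75)
print(background, excess, len(points))
```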
For a given value of the radius of exposure r (standard deviation of the bivariate normally distributed values x and y), the location of 95.5% of all excess cases will be within a distance 2r of the point $(x_0, y_0)$. Geometrically, the model describes a uniform distribution of $Np_0$ cases distributed among the San Francisco population-at-risk with a circularly symmetric pattern of increased risk for $Np_0P(rr - 1)$ excess cases, centered at the point $(x_0, y_0)$. The bivariate normal distribution is also used by Cuzick [9] to model the spatial pattern of cases of disease, in another context.

An important consequence of this model is shown in Fig. 3. As the assumed radius of exposure (r) increases, the probability of identifying a cluster (statistical power) increases to a maximum and then decreases. The pattern is generally the same for relative risks of 2, 3, 4 and 5. These power curves are based on the San Francisco disease data and population distribution (i.e. hypothetical number of cases = $Np_0$ = 30, P = 0.448, significance level = $\alpha$ = 0.05) and, clearly, will differ for other values of these parameters and other statistical models. The power curves in Fig. 3 do not increase and decrease in a smooth pattern since the San Francisco census tract populations used to calculate the power increase non-uniformly with increasing values of r. The maximum value occurs at about r = 1.75 km. Viewed from an exposure point of view, the maximum power occurs when 95% of the excess cases occur within 3.5 km of the point source (those within a radius of 2r). Therefore, under the postulated model a procedure that lacks statistical power when
Fig. 3. Power to detect clustering of disease as a function of assumed exposure radius for relative risks of 2, 3, 4 and 5. [Plot titled ‘Power Curves’; x-axis: radius of exposure (r); y-axis: power.]
the exposure limit is defined by r = 1.75 km, will have even less power for all other choices of r. For the distance of r = 1.75 km, the analyses reported in Table 6 are ‘optimum’ in terms of statistical power. Of course, a radius of exposure of 1.75 km yields the most powerful test of relative risk only when the model is an accurate description of the population sampled.

The statistical structure provides a description of the relationship between distance and risk. Under the conditions of the model, distances calculated from a set of collected data can be viewed as made up of two components: (a) distances $D_0$ among the ‘background’ cases and (b) distances $D'$ among the excess cases associated with the point source. Viewing the data set as a mixture of distances yields the following weighted average relating expected distance and relative risk:

$$\mathrm{ED} = \frac{\mathrm{ED}_0 + P(rr - 1)\,\mathrm{ED}'}{1 + P(rr - 1)}. \tag{7}$$

As relative risk increases (rr > 1), ED decreases from $\mathrm{ED}_0$ to $\mathrm{ED}'$ as long as $\mathrm{ED}'$ is less than $\mathrm{ED}_0$. Similarly the variance of the estimated mean distance $\bar{d}$, calculated from a sample containing both types of distances, $D_0$ and $D'$, is

$$\text{variance}(\bar{d}) = \frac{\text{variance}(D_0) + P(rr - 1)\,\text{variance}(D')}{Np_0[1 + P(rr - 1)]^2}. \tag{8}$$
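Expressions (7) and (8) can be evaluated directly. The Python sketch below does so for the San Francisco values quoted in the text ($\mathrm{ED}_0$ = 3.718 km, variance $(D_0)$ = 2.080 km², $\mathrm{ED}'$ = 2.217 km, variance $(D')$ = 1.332 km², P = 0.448, $Np_0$ = 30); the function names and the choice of rr = 2 as the example are hypothetical.

```python
def mixture_mean(ed0, ed_excess, prop_exposed, rr):
    """Expression (7): expected distance for a mix of background and excess cases."""
    w = prop_exposed * (rr - 1.0)  # expected excess cases per background case
    return (ed0 + w * ed_excess) / (1.0 + w)


def mixture_mean_variance(var0, var_excess, prop_exposed, rr, n_background):
    """Expression (8): variance of the mean distance for the mixed sample."""
    w = prop_exposed * (rr - 1.0)
    return (var0 + w * var_excess) / (n_background * (1.0 + w) ** 2)


# Values quoted in the text, evaluated at a relative risk of 2.
ed_rr2 = mixture_mean(3.718, 2.217, 0.448, 2.0)
var_rr2 = mixture_mean_variance(2.080, 1.332, 0.448, 2.0, 30)
print(round(ed_rr2, 3), round(var_rr2, 4))
```

At rr = 1 the weight on the excess cases is zero, so the two functions reduce to $\mathrm{ED}_0$ and variance $(D_0)/Np_0$, the null-hypothesis values.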
It should be noted that expression (8) implicitly contains the expected sample sizes as part of the variance; that is, the expected number of ‘background’ cases is $Np_0$ and the expected number of excess cases is $Np_0P(rr - 1)$, so that the total expected number of cases is $Np_0(1 + P[rr - 1])$. In other words, the mean distance can be envisioned as a weighted average of two types of distances and expression (8) is then no more than a weighted average of two variances, one associated with the ‘background’ cases and one associated with the excess cases.

The postulated model also allows the calculation of statistical power for the distance approach to analyzing spatial data. Using the expectation and variance given by expressions (7) and (8), the approximate power of the z-test used to derive the significance probabilities given in Table 4 can be calculated as a function of increasing risk. For a significance level of $\alpha$, where $z_{1-\alpha}$ is the $(1 - \alpha)$-percentile of a standard normal distribution, then

$$z_{1-\beta} = \frac{-z_{1-\alpha}\,\sigma_0 + \mathrm{ED}_0 - \mathrm{ED}}{\sigma} \tag{9}$$

and the probability that a standard normal variate is less than $z_{1-\beta}$ is the power of the procedure for a given level of risk [10] (i.e. power = $1 - \beta$). The symbol $\sigma_0$ represents the standard deviation generated under the null hypothesis (i.e. no spatial pattern) and $\sigma$ represents the standard deviation under the alternative hypothesis (i.e. a spatial pattern exists). Specifically, $\mathrm{ED}_0$ = 3.718 km with a variance $(D_0)$ = 2.080 km² and $\mathrm{ED}'$ = 2.217 km with a variance $(D')$ = 1.332 km² for the San Francisco geopolitical map. The first two values were obtained from expressions (1) and (2) given earlier, and the last two values were calculated using properties of the bivariate normal distribution (expression 6). Applying expressions (8) and (9) using these specific
Fig. 4. Power to detect clustering of disease for distance and relative risk analyses of spatial data on a geopolitical map (radius of exposure = 1.75 km). [Plot: x-axis: risk of disease (1 to 5); y-axis: power; $\alpha$ = 0.05.]
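The distance power calculation behind Fig. 4 can be sketched in Python from expressions (7) to (9). This is an illustration under stated assumptions rather than a reproduction of the plotted curve: the test is taken to be one-sided in the lower tail (small mean distances indicate clustering), the null standard deviation is taken as the square root of variance $(D_0)/Np_0$, and the function name and default arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def distance_power(rr, ed0=3.718, var0=2.080, ed_excess=2.217, var_excess=1.332,
                   prop_exposed=0.448, n_background=30, alpha=0.05):
    """Approximate power of the lower-tail z-test on mean distance."""
    w = prop_exposed * (rr - 1.0)
    ed = (ed0 + w * ed_excess) / (1.0 + w)                    # expression (7)
    sd_alt = sqrt((var0 + w * var_excess)
                  / (n_background * (1.0 + w) ** 2))          # from expression (8)
    sd_null = sqrt(var0 / n_background)                       # assumed null s.d.
    z_crit = NormalDist().inv_cdf(1.0 - alpha)                # (1 - alpha)-percentile
    z_beta = (-z_crit * sd_null + ed0 - ed) / sd_alt          # expression (9)
    return NormalDist().cdf(z_beta)


powers = {rr: distance_power(rr) for rr in (1.0, 2.0, 3.0, 4.0, 5.0)}
for rr, power in powers.items():
    print(rr, round(power, 3))
```

Under these assumptions the power equals the significance level when rr = 1 and rises as the relative risk increases.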
values produces the power curve (solid line) given in Fig. 4 for $Np_0$ = 30 cases of disease and $\alpha$ = 0.05. The analogous power calculation carried out for the relative risk approach (Table 6) yields the power curve also depicted in Fig. 4 (dotted line). The x-coordinate of the plot in Fig. 4 is the relative risk associated with the population under study. An expression of the same form as expression (9) can be developed from the expectation and variance of the relative risk measure of association [8] and the power calculated, again based on a normal distribution approximation. Note that the relative risk power curve describes the power of the test-statistic to detect excess risk. The approximate power is roughly the same for both distance and risk procedures.

DISCUSSION
If a point source of exposure is associated with a disease, then the average distance between a case and a point source is likely to be smaller than a distance calculated under the hypothesis that no spatial pattern exists. Clustering of disease about a point source of infection often produces such a pattern of infectious disease cases. The hypothesis of no association can be stated two ways: (a) all individuals are at equal risk regardless of their location on a geopolitical map, or (b) cases of disease occur randomly on a density equalized map. In the first case, the analysis is not complicated, but cases displayed on a geopolitical map are confounded by the spatial distribution of the population-at-risk. In the second case, clustering may be apparent from cases displayed on a transformed map, but the mechanics of creating and analyzing transformed maps are complicated.

Distance is an efficient measure of clustering when used directly in an analysis for which actual measurements on exposure are unavailable. Classifying individuals as exposed or unexposed reduces efficiency since information is lost by dichotomizing a continuous variable. However, cases occurring far from a point source of exposure diminish the power of an analysis based on distance (measured on either a geopolitical or transformed map) to detect spatial clustering. A risk approach, on the other hand, is relatively unaffected by a few noninformative extreme distances. The respective advantages and disadvantages of these two approaches (distance vs risk) for analyzing spatial data appear to balance out and produce statistical power that is not very different. This similarity in performance only occurs when the ‘exposure’ is defined in such a way that the power associated with relative risk is at its maximum. This lack of a definitive difference in power between distance and risk measures is observed for a specific model applied to the San Francisco analysis, but might differ under other conditions [11].

The postulated model provides some justification for defining an ‘exposed’ individual as a person under 21 years old living within 3.5 km of Sutro Tower. This definition was postulated on the basis of a statistical structure; if the model reflects the data, this choice has optimal properties. That is, in this example the relative risk approach has maximum power to detect non-random spatial patterns when the radius of exposure is r = 1.75 km.

The distance-based analyses of the childhood cancer spatial distributions do not require a model of the exposure pattern. However, to compare the relative risk and distance approaches, exposure from a point source based on a circular pattern (bivariate normal distribution) of exposure is incorporated into the statistical structure in a simple way. If the pattern of exposure is known, more specific and potentially more useful results could emerge from incorporating this additional information into the analysis. For example, the influence of electromagnetic radiation could be modeled using a decay function such as $1/r^2$ as a reflection of exposure. Furthermore, San Francisco is not a smooth homogeneous surface and the effects of geographic variation such as hills and valleys are also an issue when constructing an exposure model.
Additionally, if other ways of evaluating exposure are available in substantial numbers, the power of an analysis would be increased by including these variables in the analysis. For example, if time of ascertainment of the incident case is accurately known, a time/space analysis would certainly be appropriate.

None of the three analytic approaches indicates evidence of clustering of childhood cancers associated with Sutro Tower. The p-values are greater than 0.1 for each cancer site and, as discussed, the distance and relative risk approaches do not differ substantially in their power to detect non-random spatial patterns when they exist.

This study, like most studies based exclusively on the spatial distribution of disease, does not address the question of confounding bias caused by variables not included in the analysis. Although the childhood cancer data cover a somewhat restricted age range, other variables such as race, gender and socioeconomic status are not controlled. The conclusions are based on exploring a pairwise association between distance and disease case location and are subject to the same type of bias inherent in most observational data.

This work explores the properties of three approaches to evaluating spatial distributions associated with a point source exposure when minimal information is available. In other words, these techniques provide a starting point for the analysis of spatial distributions employing data of the type typically collected by registries or surveillance programs.

Acknowledgements-This work was partially supported by the Director, Office of Energy Research, Office of Health and Environmental Research, Human Health and Assessments Division, of the U.S. Department of Energy under contract No. DE-AC03-76SF00098.

REFERENCES

1. Schulman J., Selvin S. and Merrill D. W. Density equalized map projections: a method for analyzing clustering around a fixed point. Stat. Med. 7, 491-505, 1988.
2. Selvin S., Merrill D. W., Schulman J., Sacks S., Bedell L. and Wong L. Transformations of maps to investigate clusters of disease. Soc. Sci. Med. 26, 215-221, 1988.
3. Brodeur P. Currents of Death: Power Lines, Computer Terminals and the Attempt to Cover Up their Threat to your Health. Simon and Schuster, New York, 1989.
4. Mason T. J., McKay F. W., Hoover R., Blot W. J. and Fraumeni J. F. Atlas of Cancer Mortality for U.S. Counties: 1950-1969. U.S. DHEW Publ. No. (NIH) 75-780, U.S. GPO, Washington D.C., 1975.
5. Whittemore A. S., Friend D., Brown B. and Holly E. A. A test to detect clusters of disease. Biometrika 74, 631-635, 1987.
6. Diggle P. J. A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. J. R. Statist. Soc. A 153, 349-362, 1990.
7. Kahn H. A. and Sempos C. T. Statistical Methods in Epidemiology. Oxford University Press, New York, 1989.
8. Kleinbaum D. G., Kupper L. L. and Morgenstern H. Epidemiologic Research: Principles and Methods. Van Nostrand Reinhold, New York, 1982.
9. Cuzick J. and Edwards R. Spatial clustering for inhomogeneous populations. J. R. Statist. Soc. B 52, 73-104, 1990.
10. Snedecor G. W. and Cochran W. G. Statistical Methods. The University of Iowa State Press, Ames, Iowa, 1974.
11. Schulman J., Selvin S. and Merrill D. W. Detection of excess disease near an exposure point: a case study. Arch. Environ. Hlth 45, 168-174, 1990.