CHAPTER 13
Biased Sentinel Hospital Area Disease Estimator
Jinfeng Wang*, Maogui Hu*, Qiao Sun†, Yilan Liao*, Chuchu Ye†
*Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
†Shanghai Pudong New Area Center for Disease Control and Prevention, Shanghai, China
13.1 MEANS OF SURFACE WITH NONHOMOGENEITY: MSN METHOD
In epidemiological investigation, disease prevalence is usually estimated from a sample. This can be done by simple summation over a random sample, by stratified sampling, or by the Kriging method. Sampling methods require the samples to be randomly distributed in space, but in reality this is often not the case. The Kriging method can generate an unbiased estimate for a homogeneous population; however, it is not applicable to strata with nonhomogeneity, i.e., with significantly different covariances among strata. Therefore, by combining spatial stratified heterogeneity with Kriging-based optimization, we developed the best unbiased method for estimating the mean of a stratified surface with nonhomogeneity, and verified the theory using several real and simulated data sets. The theoretical verification and calculation results indicate that, when a disease's spatial distribution exhibits both spatial correlation and stratified nonhomogeneity, the means of surface with nonhomogeneity (MSN) method estimates area prevalence and incidence more precisely than other commonly used estimation methods. The MSN software can be downloaded free of charge at www.sssampling.org/MSN.
In spatial epidemiology, a surface represents the spatial distribution of an attribute, for example, area prevalence, incidence, environmental pollution, or population density. The true mean of a surface is the integral of the attribute over the area, divided by the area. In actual investigations, the true mean of the surface is estimated by the sample mean from simple random or stratified sampling. If the samples are randomly selected, the estimated mean is unbiased for both nonspatial and spatial data. Nevertheless, because geographically distributed events are commonly correlated, the variance of the estimate changes and needs adjustment.
In reality, the two hypotheses, i.e., random sampling and homogeneity of the attribute, are often not met. Multiple factors contribute, including economic constraints on sampling surveys, heterogeneity of attribute characteristics, and variation in the importance of different areas. For example, there are more hospitals and clinics in densely populated areas than in sparsely populated areas. Small sample size, nonrandom sampling, and nonhomogeneity of attributes often result in bias and large variance of a sample estimate, with both conventional statistics and geostatistics. If the target attribute is independent and identically distributed, conventional sampling and statistics are appropriate. If the attribute exhibits spatial stratified heterogeneity, i.e., the within-strata variance is smaller than the between-strata variance (Wang et al., 2016), stratified sampling (Cochran, 1977) and the sandwich estimator (Wang et al., 2013a) are appropriate. If the attribute is spatially autocorrelated, Kriging (Matheron, 1963) is appropriate for sampling and estimation. If the attribute exhibits both spatial stratified heterogeneity and spatial autocorrelation, then: when the sample is stratified (each stratum has at least two sample units), MSN (Wang et al., 2009; Hu and Wang, 2011) is appropriate for sampling and estimation; when the sample is biased (some strata contain no sample units), the biased sentinel hospital based area disease estimator (B-SHADE) (Wang et al., 2011a,b; Xu et al., 2013) is appropriate for estimation; and when only one sample unit is available, SPA (Wang et al., 2013b) is appropriate for estimation, provided that a covariate is available.
Early Warning for Infectious Disease Outbreak. http://dx.doi.org/10.1016/B978-0-12-812343-0.00013-8 © 2017 Elsevier Inc. All rights reserved.
PART 3 Exploratory Research on Early Warning Technology
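The selection rules in the paragraph above can be read as a small decision procedure. The following is a minimal sketch, not part of the MSN or B-SHADE software: the function `choose_estimator` and its flags are hypothetical, and "independent and identically distributed" is approximated as "neither stratified heterogeneity nor autocorrelation".

```python
def choose_estimator(stratified_heterogeneity: bool,
                     autocorrelated: bool,
                     min_units_per_stratum: int,
                     n_units: int,
                     has_covariate: bool = False) -> str:
    """Map the attribute/sample properties listed in the text to a method.

    Hypothetical helper. min_units_per_stratum is the smallest number of
    sample units found in any stratum (0 means some stratum has no sample,
    i.e., the sample is biased).
    """
    if not stratified_heterogeneity and not autocorrelated:
        return "conventional sampling and statistics"
    if stratified_heterogeneity and not autocorrelated:
        return "stratified sampling / sandwich estimator"
    if autocorrelated and not stratified_heterogeneity:
        return "Kriging"
    # From here on: both stratified heterogeneity and spatial autocorrelation.
    if n_units == 1:
        return "SPA" if has_covariate else "not covered by the text"
    if min_units_per_stratum >= 2:
        return "MSN"
    if min_units_per_stratum == 0:
        return "B-SHADE"
    return "not covered by the text"
```

Cases the text does not address (for example, a stratum with exactly one unit while other strata have more) are returned as "not covered" rather than guessed.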
13.1.1 MSN Theory
Assume that the study area ℜ can be divided into H strata {ℜh, h = 1, …, H} and that the random field y(s) is homogeneous within each stratum, i.e., the mathematical expectation E[y(s) | s ∈ ℜh] is constant within ℜh, where s denotes a population unit or a spatial location. Stratification can usually be determined from expert knowledge or an understanding of covariates. Writing |ℜ| and |ℜh| for the areas of ℜ and ℜh, the true mean of the study area ℜ is:

$$\bar{Y}_{\Re} = |\Re|^{-1}\sum_{h=1}^{H}|\Re_h|\,\bar{Y}_h, \qquad \bar{Y}_h = |\Re_h|^{-1}\int_{\Re_h} y(s)\,ds$$
Computing this formula requires the values of all population units, which is hardly possible in practice; the mean is therefore estimated by a sampling survey. If there are nh sampling units in stratum ℜh and n sampling units in the entire study area, the true mean of the study area can be estimated by the weighted mean:

$$\bar{y}_{\Re} = \sum_{h=1}^{H}\sum_{i=1}^{n_h} a_h w_{hi} y_{hi}$$
where yhi is the value of sample unit i in stratum ℜh, whi is the weight of that sample unit, and ah = |ℜh| |ℜ|^{-1} is the weight of stratum h. For ȳℜ to be an unbiased estimate of Ȳℜ, the weights of the sampling units in each stratum must sum to 1; among such weights, we seek those that minimize the variance of the difference between the sample mean ȳℜ and the population mean Ȳℜ. The sample weights whi are obtained by solving the following system (Wang et al., 2009):

$$\begin{cases} a_p\displaystyle\sum_{h=1}^{H}\sum_{i=1}^{n_h} a_h w_{hi}\,\mathrm{cov}(y_{hi},y_{pj}) + \mu_p = a_p|\Re|^{-1}\int_{\Re}\mathrm{cov}(y_{pj},y(s))\,ds, & p = 1,\dots,H;\ j = 1,\dots,n_p\\[2ex] \displaystyle\sum_{i=1}^{n_p} w_{pi} = 1, & p = 1,\dots,H \end{cases}$$
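The weight system above is linear in the n weights and the H multipliers, so once a covariance model is chosen it can be solved directly. The following NumPy sketch rests on assumptions that are not from the source: the study area is discretized into equal-area grid cells, so every |ℜ|^{-1}∫ds becomes a mean over cells; a single exponential covariance model (`exp_cov`, with hypothetical sill and range) is used for all pairs, rather than the stratum-specific variants MSN(1) to MSN(3) described later; and `msn_weights` is an illustrative name, not the interface of the MSN software. It also returns the error variance σ²ℜ given in this section.

```python
import numpy as np


def exp_cov(A, B, sill=1.0, range_=2.0):
    """Assumed exponential covariance model: C(d) = sill * exp(-d / range_)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def msn_weights(samples, sample_strata, grid, grid_strata, cov, H):
    """Solve the MSN system on a discretized study area (illustrative sketch).

    samples: (n, 2) sample coordinates; sample_strata: (n,) ints in 0..H-1.
    grid: (M, 2) centres of equal-area cells covering R; grid_strata: (M,) ints.
    cov(A, B): covariance matrix between two sets of locations.
    """
    sample_strata = np.asarray(sample_strata)
    n = len(samples)
    # a_h = |R_h| / |R|: share of equal-area cells falling in stratum h
    a = np.array([np.mean(grid_strata == h) for h in range(H)])
    a_s = a[sample_strata]                     # stratum weight of each sample unit
    C = cov(samples, samples)                  # cov(y_hi, y_pj)
    c_bar = cov(samples, grid).mean(axis=1)    # |R|^-1 * integral of cov(y_pj, y(s)) ds

    K = np.zeros((n + H, n + H))
    K[:n, :n] = np.outer(a_s, a_s) * C         # a_p * a_h * cov(y_hi, y_pj)
    K[np.arange(n), n + sample_strata] = 1.0   # + mu_p
    K[n + sample_strata, np.arange(n)] = 1.0   # sum_i w_pi = 1 for each stratum p
    rhs = np.concatenate([a_s * c_bar, np.ones(H)])

    sol = np.linalg.solve(K, rhs)
    w, mu = sol[:n], sol[n:]
    # sigma^2 = |R|^-2 (double integral of cov) - |R|^-1 (integral of
    #           sum_h sum_i a_h w_hi cov(y_hi, y(s))) - sum_h mu_h
    var = cov(grid, grid).mean() - np.sum(a_s * w * c_bar) - mu.sum()
    return w, mu, var, a_s
```

Given observed sample values `y` (same order as `samples`), the mean estimate ȳℜ is then `np.sum(a_s * w * y)`. If every grid cell is sampled, the solution reduces to equal weights 1/nh within each stratum and a zero error variance, which is a useful sanity check.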
In these equations, μp is the Lagrange multiplier for stratum p; ah = |ℜh| |ℜ|^{-1}; ap = |ℜp| |ℜ|^{-1}; and cov(yhi, ypj) is the covariance between sample unit i in stratum h and sample unit j in stratum p. The variance of the resulting mean estimate is

$$\sigma^2_{\Re} = |\Re|^{-2}\int_{\Re}\int_{\Re}\mathrm{cov}(y(s),y(s'))\,ds\,ds' - |\Re|^{-1}\int_{\Re}\sum_{h=1}^{H}\sum_{i=1}^{n_h} a_h w_{hi}\,\mathrm{cov}(y_{hi},y(s))\,ds - \sum_{h=1}^{H}\mu_h$$

13.1.2 Verification Through Test
DATA SETS
Three data sets, two real and one simulated, are used to compare the precision of the various methods for estimating population means; each is stratified according to expert knowledge.
Data set 1: Agricultural acreage in Shandong Province in 2000 (Fig. 13.1A). This is a rasterized vector data set obtained by image interpretation; the pixel size is 1 km × 1 km, and each pixel value is the agricultural acreage within that pixel. The ratio of surface average area of agricultural land is 659.419%.
Data set 2: MODIS land surface temperatures (Fig. 13.1B). The data are average land surface temperatures for Jan.–Aug. 2005 at a spatial resolution of 927 m × 927 m; the mean is 29.015°C.
Data set 3: Simulated temperatures (Fig. 13.1C). Random images are generated from a multivariate normal distribution with a chosen expected mean and covariance matrix. The overall mean of the simulated temperatures is 16.018°C.
FIG. 13.1 Sample data sets and stratification: (A) agricultural acreage in Shandong Province (legend: area, km², 0–1000); (B) MODIS land surface temperatures (legend: 13.53–39.41°C); and (C) simulated temperatures (legend: 11.935–19.177°C). Adapted from Wang, J.F., Christakos, G., Hu, M.G., 2009. Modeling spatial means of surfaces with stratified non-homogeneity. IEEE Trans. Geosci. Remote Sens. 47(12), 4167–4174.
SAMPLE MEANS
There are several ways to calculate the sample mean.
MSN(1): The covariances are calculated as follows. If h = p, each cov(yhi, ypj) in the system above is determined by the spatial covariance model of that stratum; if h ≠ p, cov(yhi, ypj) is determined by the global spatial covariance model fitted on all strata.
MSN(2): The covariance between sample points in different strata is set to 0. If h = p, each cov(yhi, ypj) is calculated from the spatial covariance model of that stratum; if h ≠ p, cov(yhi, ypj) = 0.
MSN(3): Calculation method 3, proposed by Wang et al. (2009). If h = p, each cov(yhi, ypj) is determined by the spatial covariance model fitted to the samples in that stratum; if h ≠ p, cov(yhi, ypj) is calculated as

$$\mathrm{cov}(y_{hi},y_{pj}) = \left[\mathrm{cov}(y_{hi},y_{pj})_{\Sigma} - \mathrm{cov}(y_{hi},y_{pj})_{h} - \mathrm{cov}(y_{hi},y_{pj})_{p}\right]/2$$

where cov(yhi, ypj)h and cov(yhi, ypj)p are covariances calculated from the spatial covariance models of stratum h and stratum p, respectively, and cov(yhi, ypj)Σ is calculated from the spatial covariance model fitted to all samples in strata h and p.
Other estimation methods compared include universal Kriging, block Kriging, ordinary Kriging, spatial random sampling, and simple random sampling.
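The three MSN variants differ only in how the covariance block between two different strata is filled. The sketch below makes that explicit; the function name and the dictionary layout of the fitted covariance models are hypothetical, and the MSN(3) line uses the bracketed form [cov_Σ − cov_h − cov_p]/2. The operators in that expression were recovered from a damaged layout, so it is an assumption to check against Wang et al. (2009) before relying on it.

```python
import numpy as np


def msn_cov_block(variant, A, B, h, p, stratum_cov, global_cov, pair_cov):
    """Covariance block between samples A (stratum h) and samples B (stratum p).

    stratum_cov[h]: covariance model fitted on stratum h (callable A, B -> matrix)
    global_cov: model fitted on all strata (used by MSN(1))
    pair_cov[(h, p)] with h < p: model fitted on the samples of strata h and p jointly
    """
    if h == p:
        return stratum_cov[h](A, B)            # all variants: within-stratum model
    if variant == 1:
        return global_cov(A, B)                # MSN(1): global model across strata
    if variant == 2:
        return np.zeros((len(A), len(B)))      # MSN(2): zero cross-strata covariance
    if variant == 3:
        key = (min(h, p), max(h, p))
        # MSN(3), assumed form: [cov_Sigma - cov_h - cov_p] / 2
        return (pair_cov[key](A, B) - stratum_cov[h](A, B) - stratum_cov[p](A, B)) / 2.0
    raise ValueError("variant must be 1, 2 or 3")
```

The resulting blocks would be assembled into the matrix of cov(yhi, ypj) values used in the MSN weight system.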
CALCULATION RESULT COMPARISON
The mean estimate variances calculated by the different methods for the three data sets are shown in Figs. 13.2–13.4, respectively. The results show that MSN achieves higher precision than the other methods. The advantage is most evident for the simulated temperature data set (Fig. 13.4), where stratified heterogeneity is significant, and when the sample is small. The accuracy of all methods improves as the number of samples increases. The errors of the estimated means arise from the mismatch between each method's assumptions and the properties of the target surface.
13.1.3 Selection of Sentinel Hospitals
The Shanghai EXPO 2010 was held in Pudong New District, Shanghai, and a set of sentinel hospitals was selected for infectious disease syndromic surveillance during this major event. Case data from all hospitals in Pudong New District were analyzed to obtain the spatial autocorrelation of daily clinical visits for infectious disease at each hospital. These parameters were entered into the MSN model to obtain a theoretical best sampling scheme, i.e., one that uses fewer sentinel hospitals to estimate area disease prevalence with higher precision. As Fig. 13.5 shows, the number of surveillance hospitals selected affects the error of the estimated total. We found that once the number of selected hospitals exceeded 21, the error changed very little. Therefore, 21 of the 34 hospitals were selected as sentinel hospitals for syndromic surveillance during the Shanghai EXPO 2010.
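The stopping rule described above ("beyond a certain number of hospitals, the error changes very little") can be made explicit with a tolerance. A minimal sketch: the function name, the tolerance, and the error values in the test are invented for illustration and are not the Pudong data behind Fig. 13.5.

```python
def smallest_stable_size(errors: dict[int, float], tol: float) -> int:
    """Smallest sample size n such that, over n and every larger size evaluated,
    the estimation error varies by less than tol (hypothetical stopping rule)."""
    sizes = sorted(errors)
    for k, n in enumerate(sizes):
        later = [errors[m] for m in sizes[k:]]
        if max(later) - min(later) < tol:
            return n
    return sizes[-1]
```

In practice the error for each candidate size would come from the MSN variance of the corresponding best sampling scheme; the tolerance is a judgment call that balances surveillance cost against precision.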
FIG. 13.2 Comparison of mean estimate variances and absolute errors based on agricultural acreage data set (panels: variance of mean surface estimation error and absolute error versus number of samples, for MSN methods 1–3, universal Kriging, block Kriging, ordinary Kriging, spatial random sampling, and simple random sampling). Adapted from Wang, J.F., Christakos, G., Hu, M.G., 2009. Modeling spatial means of surfaces with stratified non-homogeneity. IEEE Trans. Geosci. Remote Sens. 47(12), 4167–4174.
FIG. 13.3 Comparison of mean estimate variances and absolute errors based on MODIS land surface temperature data set (panels: variance of mean surface estimation error and absolute error versus number of samples, for MSN methods 1–3, universal Kriging, block Kriging, ordinary Kriging, spatial random sampling, and simple random sampling). Adapted from Wang, J.F., Christakos, G., Hu, M.G., 2009. Modeling spatial means of surfaces with stratified non-homogeneity. IEEE Trans. Geosci. Remote Sens. 47(12), 4167–4174.
FIG. 13.4 Comparison of mean estimate variances and absolute errors based on simulated data set (panels: variance of population mean estimator and absolute error versus number of samples, for MSN methods 1–3, universal Kriging, block Kriging, ordinary Kriging, spatial random sampling, and simple random sampling). Adapted from Wang, J.F., Christakos, G., Hu, M.G., 2009. Modeling spatial means of surfaces with stratified nonhomogeneity. IEEE Trans. Geosci. Remote Sens. 47(12), 4167–4174.
FIG. 13.5 Curve of sample size versus absolute error of the sample estimate (absolute error of the total estimate against the number of hospitals selected by MSN, 0–30).
In practice, however, several principles govern site selection: (1) the sentinel hospitals should be as close to the Expo park as possible, while surveillance throughout Pudong should also be considered; (2) hospitals with a high level of electronic informatization, stable clinic visits, and well-kept clinic attendance records are preferred; and (3) visitor flows and the health insurance system should be taken into account, since level 3A hospitals receive many more clinic visits than lower-level hospitals. Based on these principles, we modified the theoretical best sampling scheme. Four level 1 hospitals in coastal areas of Pudong far from the Expo park (hospitals no. 21, 22, 24, and 25 in Table 13.1) were removed from the surveillance scheme, and four hospitals adjacent to the Expo park (hospitals no. 2, 3, 4, and 18 in Table 13.1) were added, giving the final 21 sentinel hospitals responsible for syndromic surveillance. See Fig. 13.6 for the locations of the hospitals selected by the theoretical best protocol and those actually selected.

Table 13.1 Table of Syndromic Surveillance Hospitals
No. | Level | No. | Level
1 | 2 | 17 | 1
5 | 3 | 19 | 1
6 | 3 | 20 | 2
7 | 2 | 23 | 2
8 | 1 | 21 | 1
9 | 1 | 22 | 1
10 | 1 | 24 | 1
11 | 1 | 25 | 1
12 | 1 | 2 | 1
13 | 1 | 3 | 1
14 | 1 | 4 | 1
15 | 2 | 18 | 1
16 | 1 | |

FIG. 13.6 Spatial distribution of syndromic surveillance hospitals.

The theoretical best sampling scheme based on the MSN method and the modified scheme were compared with a chi-square test on their daily numbers of cases, and no statistically significant difference was found. Moreover, with the theoretical best sampling scheme and the MSN estimator, the estimated total number of syndromic cases across all 34 hospitals in Pudong, based on the 21 sentinel hospitals, was 97,520, an absolute error of 266; with the modified sampling scheme and the MSN estimator, the estimate was 97,494, an absolute error of 292 (Table 13.2). The actual sampling scheme was thus slightly inferior to the theoretical best protocol, and the limited adjustment to the theoretical best sampling scheme was acceptable.
Table 13.2 Statistical Inferences and Absolute Errors of Total Syndromic Cases at 21 Sentinel Hospitals Versus the Whole Surveillance Network (34 Hospitals in Total)
Sampling method | MSN unbiased best statistics: total | MSN unbiased best statistics: absolute error | Simple summation: total | Simple summation: absolute error | All 34 surveillance hospitals: true value
MSN unbiased best sampling | 97,520 | 266 | 94,268 | 3518 | 97,786
MSN + actual minor adjustment | 97,494 | 292 | 95,390 | 2396 | 97,786

13.2 ESTIMATING AREA DISEASE PREVALENCE BASED ON SENTINEL SURVEILLANCE DATA: B-SHADE METHOD
The MSN method described in the previous section can help select an unbiased, optimal set of sentinel hospitals. However, owing to various factors, the selection of sentinel hospitals may be biased, i.e., the mathematical expectation of the simple arithmetic mean of the values monitored at the sentinel hospitals does not equal the true area disease prevalence. When the sentinel hospitals are biased, most mainstream epidemiological statistical methods cannot produce unbiased, minimum-variance estimates of area disease prevalence. This problem is overcome by the B-SHADE method, biased sentinel hospital based area disease estimation (Wang et al., 2011a,b; Hu et al., 2013; Xu et al., 2013). B-SHADE combines the optimal estimation of block Kriging with the sampling-bias correction of ratio estimation while overcoming the shortcomings of both: block Kriging cannot correct sampling bias or spatial clustering, and ratio estimation does not minimize the estimation error. B-SHADE reduces to block Kriging when there is no sampling bias, and to ratio estimation when there is no correlation among hospitals. If there is neither bias nor spatiotemporal correlation among hospital records, B-SHADE reduces to simple random sampling statistics. The B-SHADE software can be downloaded free of charge at www.sssampling.org/B-SHADE.
13.2.1 B-SHADE Theory
B-SHADE theory estimates the total number of cases in an area from the case reports of sentinel hospitals. Assuming weekly reports, the actual total number of cases in the entire study area in a week is the sum of all cases reported by the hospitals in the area:

$$Y = \sum_{i=1}^{N} y_i$$

where yi is the number of cases reported by hospital i in the week, and n sentinel hospitals are included among the N (> n) hospitals. The true value of Y is usually estimated from the records available at the sentinel hospitals (yi, i = 1, …, n).
Because n < N, and taking the correlations among hospitals into account, possible bias can be corrected by assigning appropriate weights to the sentinel hospital reports:

$$y(w) = \sum_{i=1}^{n} w_i y_i$$

where y(w) is the estimate of the true Y and wi is the weight of sentinel hospital i. The estimator should have two important properties: (1) no bias, E[y(w)] = E(Y); and (2) minimum estimate variance, $\min_{w}\sigma^2_{y(w)} = E[y(w) - Y]^2$.
13.2.2 Correlations Among Hospitals
The numbers of cases reported by hospitals are among the most important inputs for estimating incidence in an area. The estimation is based on the (direct or indirect) associations between the area total and the sentinel hospitals; behind these associations lies a social interaction network of hospitals and individuals. In Fig. 13.7, bi is the ratio between the number of cases monitored by sentinel hospital i (yi) and the total number of cases (Y), and Cij = C(yi, yj) represents the correlation between the records yi and yj of hospitals i and j.
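The quantities bi and C(yi, yj) are estimated from historical records. A minimal sketch, assuming a complete historical table of weekly counts for all N hospitals and using the ordinary sample covariance (`np.cov`) as the estimate of C(yi, yj); the function name `historical_parameters` is illustrative. It also returns the two covariance sums over all N hospitals that appear in the B-SHADE weight system.

```python
import numpy as np


def historical_parameters(history, sentinels):
    """Estimate B-SHADE inputs from a historical table (illustrative sketch).

    history:   (T, N) array, weekly case counts for all N hospitals over T weeks
    sentinels: indices of the n sentinel hospitals
    """
    history = np.asarray(history, dtype=float)
    s = np.asarray(sentinels)
    total = history.sum(axis=1)                     # Y(t) for each historical week
    b = history[:, s].mean(axis=0) / total.mean()   # b_i = E(y_i) / E(Y)
    C_all = np.cov(history, rowvar=False)           # C(y_i, y_j) for all N hospitals
    C_ss = C_all[np.ix_(s, s)]                      # block among sentinel hospitals
    c_sY = C_all[s, :].sum(axis=1)                  # sum over j = 1..N of C(y_i, y_j)
    var_Y = C_all.sum()                             # double sum over all N hospitals
    return b, C_ss, c_sY, var_Y
```

The double sum `var_Y` equals the sample variance of the weekly area totals, which is why the estimated covariances among all hospitals also determine the variability of Y.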
13.2.3 System Modeling
As indicated above, the objective is the best linear unbiased estimation (BLUE) of the number of cases based on the biased records of sentinel hospitals.
FIG. 13.7 Correlations between hospitals and correlations between the cases in sentinel hospitals and those across all hospitals in the area. Adapted from Wang, J.F., Reis, B.Y., Hu, M.G., et al., 2011b. Area disease estimation based on sentinel hospital records. PLoS One 6(8), 23428.
This objective is consistent with that of block Kriging for estimating spatially varying phenomena. Block Kriging describes the disease distribution as a homogeneous random field whose covariance is a function of spatial distance. However, these hypotheses are rarely satisfied in the real world, because sentinel hospitals are often subject to sample selection bias, and the covariance among hospitals is not necessarily a function of their geographical distance. In view of this, the B-SHADE method adopts a non-Euclidean measure appropriate to the specific study objective, and it can correct the bias in the total estimate caused by sampling bias. The estimator y(w) of the number of cases in an area (Y) should satisfy two conditions: it should be an unbiased estimate of Y, and it should minimize the estimate variance. The first condition is expressed mathematically as

$$\sum_{i=1}^{n} w_i b_i = 1$$
where bi = E(yi)/E(Y) is the ratio between the number of cases at hospital i and the total number of cases across all hospitals, and the weight wi is the contribution of hospital i to the total case estimate, reflecting the correlation of cases among hospitals. Since E[y(w)] = Σ wi E(yi) = E(Y) Σ wi bi, this condition makes y(w) unbiased. The weights wi and the Lagrange multiplier μ are obtained by minimizing the estimate variance subject to the no-bias constraint, which yields the linear system

$$\begin{bmatrix} C(y_1,y_1) & \cdots & C(y_1,y_n) & b_1\\ \vdots & \ddots & \vdots & \vdots\\ C(y_n,y_1) & \cdots & C(y_n,y_n) & b_n\\ b_1 & \cdots & b_n & 0 \end{bmatrix} \begin{bmatrix} w_1\\ \vdots\\ w_n\\ \mu \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{N} C(y_1,y_j)\\ \vdots\\ \sum_{j=1}^{N} C(y_n,y_j)\\ 1 \end{bmatrix}$$

where μ is the Lagrange multiplier. The minimum estimate variance can be expressed as

$$\sigma^2_{y(w)} = (r_n - 1)\sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j C(y_i,y_j) - 2\mu$$

where

$$r_n = \sum_{i=1}^{N}\sum_{j=1}^{N} C(y_i,y_j) \Big/ \sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j C(y_i,y_j)$$

Here bi expresses the contribution of cases at sentinel hospital i to the area total; rn is the ratio of the summed covariances among all N hospitals to the weighted sum over the sentinel hospitals; and ΣiΣj wi wj C(yi, yj) (i, j = 1, …, n) represents the current spatiotemporal correlation among sentinel hospitals. C(yi, yj), bi, and rn are estimated from historical data.
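Given estimates of C(yi, yj), bi, the sums ΣjC(yi, yj) over all N hospitals, and the total double sum, the bordered system above can be solved directly. A minimal NumPy sketch; `bshade_weights` and its argument layout are illustrative and are not the interface of the B-SHADE software at www.sssampling.org/B-SHADE. The returned variance uses σ² = (rn − 1) ΣΣ wi wj C(yi, yj) − 2μ.

```python
import numpy as np


def bshade_weights(C_ss, b, c_sY, var_Y):
    """Solve the bordered B-SHADE system (illustrative sketch).

    C_ss:  (n, n) covariances C(y_i, y_j) among the n sentinel hospitals
    b:     (n,)   ratios b_i = E(y_i) / E(Y)
    c_sY:  (n,)   sums over all N hospitals, sum_j C(y_i, y_j)
    var_Y: scalar, sum_i sum_j C(y_i, y_j) over all N hospitals
    Returns the weights w, the Lagrange multiplier mu and the minimum variance.
    """
    C_ss = np.asarray(C_ss, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = C_ss
    K[:n, n] = b          # column multiplying mu
    K[n, :n] = b          # unbiasedness row: sum_i w_i b_i = 1
    rhs = np.append(np.asarray(c_sY, dtype=float), 1.0)
    sol = np.linalg.solve(K, rhs)
    w, mu = sol[:n], sol[n]
    quad = w @ C_ss @ w                  # sum_i sum_j w_i w_j C(y_i, y_j)
    r_n = var_Y / quad
    sigma2 = (r_n - 1.0) * quad - 2.0 * mu
    return w, mu, sigma2
```

At the solution, this closed form equals the direct expression wᵀCw − 2 Σi wi ΣjC(yi, yj) + ΣΣ C over all hospitals, so it can be checked numerically. If every hospital is a sentinel, all weights become 1 and the variance is 0.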
13.2.4 Case Study
Take hand, foot and mouth disease (HFMD) surveillance in an area as an example. Daily numbers of HFMD cases were collected from 53 hospitals from Jan. 1, 2009 to Sep. 9, 2010, and these data were used to compare B-SHADE with other methods. The three methods compared are: (1) the B-SHADE method, y(w)(t); (2) the ratio estimation method,

$$y_{ratio}(t) = \sum_{i=1}^{n} y_i(t)\left[\sum_{i=1}^{N} y_i(t') \Big/ \sum_{i=1}^{n} y_i(t')\right]$$

where t′ denotes an earlier (historical) time; and (3) the simple random estimation method,

$$y_s(t) = \frac{N}{n}\sum_{i=1}^{n} y_i(t)$$

Table 13.3 lists the mean absolute errors (AEs) of the estimated number of cases in the entire area for the three methods, based on the weekly numbers of cases at up to nine sentinel hospitals during weeks 3–34. Fig. 13.8 shows the locations and levels of the 53 hospitals and the distribution of the 9 sentinel hospitals. Fig. 13.9 shows the average weekly numbers of cases in the area estimated by the three methods and the corresponding standard deviations. Fig. 13.10 shows the mean AEs of the weekly numbers of cases over the same period, obtained by applying the three methods to the same sentinel hospital data set.
The estimate variance of the B-SHADE method is consistently the smallest. Table 13.3 shows that B-SHADE has the smallest AE for every number of hospitals, and that ratio estimation has a smaller AE than simple estimation for 2–8 hospitals; in this case, B-SHADE performs best among the three methods. As Fig. 13.9 illustrates, the average numbers of cases obtained by the three methods are close to each other, but the standard deviation of B-SHADE is smaller. In addition, B-SHADE needs fewer sentinel hospitals than the other two methods to reach the same accuracy: if a horizontal line is drawn at AE = 1.25 in Fig. 13.10, B-SHADE, ratio estimation, and simple random estimation require 5, 6, and 8 sentinel hospitals, respectively.
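The three estimators compared in the case study, plus the mean absolute error used to score them, fit in a few lines. A minimal sketch: the function names and argument layout are illustrative, t′ is treated as a historical reference period for the ratio estimator, and the B-SHADE weights are assumed to have been obtained beforehand from the weight system in Section 13.2.3.

```python
import numpy as np


def bshade_estimate(w, y_t):
    """(1) B-SHADE: y(w)(t) = sum_i w_i y_i(t) over the n sentinel hospitals."""
    return float(np.dot(w, y_t))


def ratio_estimate(y_t, all_ref, sentinel_ref):
    """(2) Ratio estimation: sentinel total at t scaled by the all/sentinel ratio at t'."""
    return float(np.sum(y_t) * np.sum(all_ref) / np.sum(sentinel_ref))


def simple_estimate(y_t, N):
    """(3) Simple random estimation: (N / n) * sum_i y_i(t)."""
    return float(N / len(y_t) * np.sum(y_t))


def mean_absolute_error(estimates, truths):
    """Mean absolute error over periods (e.g., weeks)."""
    return float(np.mean(np.abs(np.asarray(estimates) - np.asarray(truths))))
```

The simple estimator assumes each sentinel represents an average hospital; the ratio estimator corrects that with the historical share of cases seen by the sentinels; B-SHADE additionally uses the covariances among hospitals to choose individual weights.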
Table 13.3 Mean Absolute Errors of the Numbers of HFMD Cases in Pudong During Weeks 3–34 in 2009 by Using the Three Methods
Number of hospitals | B-SHADE | Ratio estimation | Simple estimation
2 | 8.64 | 10.63 | 49.24
3 | 5.89 | 10.65 | 48.38
4 | 1.71 | 2.00 | 6.32
5 | 1.33 | 1.48 | 5.06
6 | 1.10 | 1.58 | 4.65
7 | 0.68 | 1.09 | 2.85
8 | 0.52 | 1.03 | 2.53
9 | 0.37 | 0.69 | 0.65
Adapted from Wang, J.F., Reis, B.Y., Hu, M.G., et al., 2011b. Area disease estimation based on sentinel hospital records. PLoS One 6(8), 23428.
FIG. 13.8 Study area and sentinel hospitals (map of Pudong, Shanghai, about 121°0′E–122°0′E and 31°0′N–31°30′N; legend: hospitals selected by the Chinese CDC and by B-SHADE, by level; unselected hospitals; population density classes of 399–1135, 1136–2015, 2016–4738, 4739–17,350, and 17,351–28,015 people/km²). Adapted from Wang, J.F., Reis, B.Y., Hu, M.G., et al., 2011b. Area disease estimation based on sentinel hospital records. PLoS One 6(8), 23428.
FIG. 13.9 Estimation of the number of cases and the standard deviations (panels: B-SHADE, ratio estimator, and simple estimator; number of cases versus number of hospitals, 4–9). Adapted from Wang, J.F., Reis, B.Y., Hu, M.G., et al., 2011b. Area disease estimation based on sentinel hospital records. PLoS One 6(8), 23428.
FIG. 13.10 Comparison of mean absolute errors of the estimated numbers of cases (average absolute error of cases versus number of hospitals, 3–10, for the B-SHADE, ratio, and simple estimators). Adapted from Wang, J.F., Reis, B.Y., Hu, M.G., et al., 2011b. Area disease estimation based on sentinel hospital records. PLoS One 6(8), 23428.
References
Cochran, W.G., 1977. Sampling Techniques. Wiley, New York.
Hu, M.G., Wang, J.F., 2011. A meteorological network optimization package using MSN theory. Environ. Model. Softw. 26, 546–548.
Hu, M.G., Wang, J.F., Zhao, Y., Jia, L., 2013. A B-SHADE based best linear unbiased estimation tool for biased samples. Environ. Model. Softw. 48, 93–97.
Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58 (8), 1246–1266.
Wang, J.F., Christakos, G., Hu, M.G., 2009. Modeling spatial means of surfaces with stratified nonhomogeneity. IEEE Trans. Geosci. Remote Sens. 47 (12), 4167–4174.
Wang, J.F., Guo, Y.S., Christakos, G., et al., 2011a. Hand, foot and mouth disease: spatiotemporal transmission and climate. Int. J. Health Geogr. 10 (1), 25.
Wang, J.F., Reis, B.Y., Hu, M.G., et al., 2011b. Area disease estimation based on sentinel hospital records. PLoS One 6 (8), 23428.
Wang, J.F., Haining, R., Liu, T.J., et al., 2013a. Sandwich spatial estimation for multi-unit reporting on a stratified heterogeneous surface. Environ. Plan. A 45 (10), 2515–2534.
Wang, J.F., Hu, M.G., Xu, C.D., Christakos, G., Zhao, Y., 2013b. Estimation of citywide air pollution in Beijing. PLoS One 8 (1), e53400.
Wang, J., Zhang, T., Fu, B., 2016. A measure of spatial stratified heterogeneity. Ecol. Indic. 67, 250–256.
Xu, C.D., Wang, J.F., Hu, M.G., Li, Q.X., 2013. Interpolation of missing temperature data at meteorological stations using P-BSHADE. J. Clim. 26 (19), 7452–7463.
Further Reading
Christakos, G., 1985a. Modern statistical analysis and optimal estimation of geotechnical data. Eng. Geol. 22 (2), 175–200.
Christakos, G., 1985b. Recursive parameter estimation with applications in earth sciences. Math. Geol. 17 (5), 489–515.
Christakos, G., 2000. Modern Spatiotemporal Geostatistics. Oxford University Press, Oxford.
Christakos, G., 1992. Random Field Models in Earth Sciences. Academic Press, San Diego, CA.
Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote Sens. Environ. 80 (1), 185–201.
Gething, P.W., Noor, A.M., Gikandi, P.W., et al., 2006. Improving imperfect data from health management information systems in Africa using space–time geostatistics. PLoS Med. 3 (6).
Griffith, D.A., 2005. Effective geographic sample size in the presence of spatial autocorrelation. Ann. Assoc. Am. Geogr. 95 (4), 740–760.
Haining, R., 1988. Estimating spatial means with an application to remote sensing data. Commun. Stat. Theory Methods 17 (2), 537–597.
Harris, P., Brunsdon, C., Fotheringham, A.S., 2011. Links, comparisons and extensions of the geographically weighted regression model when used as a spatial predictor. Stoch. Environ. Res. Risk Assess. 25 (2), 123–138.
Heckman, J.J., 1979. Sample selection bias as a specification error. Econometrica 47 (1), 151–161.
Hoffman, J., 2010. Deception by numbers. Nature 467, 1043–1044.
Kolovos, A., Eskupin, A., Jerrett, M., et al., 2010. Multi-perspective analysis and spatiotemporal mapping of air pollution monitoring data. Environ. Sci. Technol. 44 (17), 6738–6744.
Li, L.F., Wang, J.F., Cao, Z.D., et al., 2008. An information-fusion method to regionalize spatial heterogeneity for improving the accuracy of spatial sampling estimation. Stoch. Environ. Res. Risk Assess. 22 (6), 689–704.
Lloyd, C., Atkinson, P.M., 2002. Non-stationary approaches for mapping terrain and assessing prediction uncertainty. Trans. GIS 6 (1), 17–30.
Olea, R.A., 1999. Geostatistics for Engineers and Earth Scientists. Kluwer, Boston, MA.
Panlilio, A., Orelien, J., Srivastava, P., et al., 2004. Estimate of the annual number of percutaneous injuries among hospital-based healthcare workers in the United States, 1997–1998. Infect. Control Hosp. Epidemiol. 25 (7), 556–562.
Reis, B.Y., Kohane, I.S., Mandl, K.D., 2007. An epidemiological network model for disease outbreak detection. PLoS Med. 4 (6), 210.
Stehman, S., Sohl, T., Loveland, T., 2003. Statistical sampling to characterize recent United States land cover change. Remote Sens. Environ. 86 (4), 517–529.
Törner, A., Duberg, A., Dickman, P., et al., 2010. A proposed method to adjust for selection bias in cohort studies. Am. J. Epidemiol. 171 (5), 602–608.
Wang, J.F., Haining, R., Cao, Z.D., 2010a. Sample surveying to estimate the mean of a heterogeneous surface: reducing the error variance through zoning. Int. J. Geogr. Inf. Sci. 24 (4), 523–543.
Wang, J.F., Jiang, C.S., Li, L.F., et al., 2010b. Spatial Sampling and Statistical Inference. Science Press, Beijing.