Applied Geography 32 (2012) 546–555
Small-area estimation of land cover statistics by post-stratification of a national area frame survey
Geir-Harald Strand*, Linda Aune-Lundberg
Norwegian Forest and Landscape Institute, P.O. Box 115, N-1431 Ås, Norway
Keywords: Land cover; Agriculture; Sheep; Grazing; Area frame survey; Small-area estimation; Post-stratification; GIS; Mountain; Norway

Abstract
The objective of this paper is to examine a method for estimation of land cover statistics for local environments from available area frame surveys of larger, surrounding areas. The method is a simple version of the small-area estimation methodology. The starting point is a national area frame survey of land cover. This survey is post-stratified using a coarse land cover map based on topographic maps and segmentation of satellite images. The approach is to describe the land cover composition of each stratum and subsequently use the results to calculate land cover statistics for a smaller area where the relative distribution of the strata is known. The method was applied to a mountain environment in Gausdal in Eastern Norway and the result was compared to reference data from a complete in situ land cover map of the study area. The overall correlation (Pearson’s rho) between the observed and the estimated land cover figures was r = 0.95. The method does not produce a map of the target area and the estimation error was large for a few of the land cover classes. The overall conclusion is, however, that the method is applicable when the objective is to produce land cover statistics and the interest is the general composition of land cover classes, not the precise estimate of each class. The method will be applied in outfield pasture management in Norway, where it offers a cost-efficient way to screen the management units and identify local areas with a land cover composition suitable for grazing. The limited resources available for in situ land cover mapping can then be allocated efficiently to in-depth studies of the areas with the highest grazing potential. It is also expected that the method can be used to compile land cover statistics for other purposes, provided that the motivation is to describe the overall land cover composition and not to provide exact estimates for the individual land cover classes. © 2011 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +47 64 94 96 99, +47 41 50 16 40 (mobile); fax: +47 64 94 80 01. E-mail addresses: [email protected], ghs@skogoglandskap.no (G.-H. Strand), [email protected] (L. Aune-Lundberg).
0143-6228/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.apgeog.2011.06.006

Introduction

The motivation for this study came from user requirements linked to the management of outfield pasture resources in the mountain areas of Norway. Detailed mapping of land cover is carried out in selected areas in order to calculate the available grazing resources and produce management plans for the outfield pastures. The pasture cooperatives (Norwegian: Beitelag) are using these plans in order to increase the revenue for their members through improved management of the outfield pastures. The outfield surveys are expensive, and it is important to spend available funds in areas with high grazing potential. It was therefore an objective to develop a screening mechanism where existing national surveys can be used to provide meaningful, albeit uncertain, information about the production potential of the pasture in smaller areas. This information can, together with data about the actual usage in terms of number of animals and slaughter weight, be used to prioritize management areas for more detailed in situ surveys of land cover and grazing resources.

Background

The best, most precise and most reliable mapping of land cover is obtained by carefully designed and executed field surveys. Undertaking these surveys is both time-consuming and expensive, and remote sensing from aerial photographs or satellite images is therefore used as a practical alternative to field surveys (Aplin, 2004; Lucas, Rowlands, Brown, Keyworth, & Bunting, 2007). The suitability of remotely sensed spectral data and the related classification algorithms for thematic mapping have, however, both been questioned (Estes et al., 1999; Wilkinson, 1996). Another disadvantage of remote sensing is that the technology is associated with
high uncertainty and may lead to statistical bias (Czaplewski, 1992; Foody, 2002).

An alternative for users not requiring complete map coverage is to carry out in situ mapping on a limited number of sampling plots at a cost much lower than a complete survey. Examples of this approach are the European Land Use and Land Cover Area Frame Survey (LUCAS) (Eurostat, 2003; Gallego & Delincé, 2010) and the Norwegian AR18X18 survey of the outfields (Strand & Rekdal, 2006), as well as a number of surveys conducted by the National Agricultural Statistics Service in the US (Cotter, Davies, Nealon, & Roberts, 2010; Cotter & Tomczak, 1994).

The purpose of an in situ sampling survey is to provide reliable statistics for a particular region, often a whole country or at least a large district, and the design will reflect this purpose. The results of such a survey cannot easily be downscaled and are therefore of limited value to a user mainly interested in the situation in a smaller locality. The local user may, however, not have the resources required to implement a separate, more detailed survey of the area of interest. Therefore, a need exists for methods allowing results from wide-area surveys to be downscaled and applied locally.

Sampling points in the wide-area survey falling within the local area can be used on their own, as a separate sample, in order to compute local statistics. Typically, however, the number of such sampling points is small and the statistical support is therefore weak. The inevitable result is high uncertainty. The sample size can be increased with sample points from the surrounding region, provided that the user is confident that the local area of interest has a composition resembling this surrounding region. The assumption of similarity between the local and the regional situation does, on the other hand, introduce new uncertainty.

A viable alternative to the simple approach described above is to apply small-area estimation (Ghosh & Rao, 1994; Rao, 2003).
Small-area estimation is a family of statistical techniques where auxiliary data are used in order to downscale sample results to smaller populations where the sample itself is unable to provide sufficient statistical support.

Previous work

The bulk of applied studies using small-area estimation address socioeconomic and, to some extent, health issues (Gambino & Dick, 2000; Li et al., 2009; Longford, 2005; Longva, Thomsen, & Severeide, 1998). Small-area estimation is also employed, though infrequently, in agricultural surveys. One example is the Italian agricultural point frame survey AGRIT, where stratification based on satellite images is used together with a field survey to estimate areas and yield of main crop types (Bendetti & Filipponi, 2010). A general overview of applications of small-area estimation in agriculture, with examples from India, the US Census of Agriculture and regional surveys in Iowa and Kansas, is provided by Rao (2010).

Small-area estimation has also been used in other land resource surveys besides agriculture. Examples are watershed erosion assessment (Opsomer, Botts, & Kim, 2003), forest area estimates (Finley, Banerjee, & McRoberts, 2008), forest stand structure (Reich & Aguirre-Bravo, 2009) and the downscaling of land cover and land use data (Flores & Martinez, 2000). Furthermore, Gallego and Bamps (2008) describe an application where the European CORINE Land Cover map was used for post-stratification of data from the Eurostat LUCAS survey of land cover and land use in Europe.

We used small-area estimation in order to compile land cover statistics for a study area in the Norwegian mountains based on data from a national area frame survey of land cover. Our implementation of the method was based on stratification of the study
area and the surrounding region into a small set of land cover classes. This method is one of several small-area estimators reviewed by Gallego (2004). Variants known as “synthetic estimators” of land cover area statistics (Gonzalez & Hoza, 1978; Särndal, 1984) or the closely related “apportionment method” (Longford, 2005) can be used to estimate the proportion of a land cover class in a small region when an auxiliary classification is available for a larger, surrounding region where a sampling survey has been carried out. More sophisticated methods for small-area estimation are certainly available (Jiang, 2010; Longford, 2005; Rao, 2003), but the selected approach was chosen due to its practical simplicity.

Description of the study

Our study area was located in Gausdal in Eastern Norway and is described in the second section below. The input data consisted of a national land resource map where the mountain areas are mapped using satellite remote sensing. This land resource map resembles a land cover map, but the classes represent a mixture of land cover, land use and land capability. The map was used for post-stratification of data from the Norwegian area frame survey of land cover in the outfields. These data sources are also described in the second section below. The method, described in the third section, was to apply GIS techniques in order to describe the land cover composition of each stratum using data from the area frame survey. This information was subsequently used in a small-area estimate of land cover statistics for the study area. The estimate was compared to the results from a complete in situ inventory of the area. The results are presented and discussed in section four and the conclusion is drawn in section five below.

Study area and data

Study area

The study area covered a total of 152 km² composed of three physically separate spatial compartments, all located in Gausdal in Eastern Norway (Fig. 1).
The approximate centre of the area is located at 9°35′40″ East and 61°14′39″ North. The area is mainly covered by mountainous forest and open tundra at elevations between 850 and 1500 m above sea level. The region has a continental climate with cold winters and warm summers, while the annual precipitation is between 650 and 750 mm. The geology is mainly sandstone covered by a thick glacial moraine (Rekdal, 2002).

The land resource map AR50

AR50 is the Norwegian national land resource map for use at scale 1:50,000. The land resource map resembles a land cover map, but the classes represent a mixture of land cover, land use and land capability. The map is digital and can be inspected on, and downloaded from, the Internet (NIFL, 2011). AR50 is produced by generalization from more detailed land resource data (series AR5) at scale 1:5000 (Bjørdal & Bjørkelo, 2006) in areas where this dataset is available, which is mainly below the tree line. AR50 in mountain areas and other areas without AR5 coverage is compiled using a combination of the satellite-based land cover map AR-FJELL (Gjertsen, Angeloff, & Strand, 2011) and data from national topographic maps for scale 1:50,000 produced by the Norwegian Mapping Authority. The minimal mapping unit used in AR50 is 1.5 ha and the geometrical accuracy is 20 m. The AR50 nomenclature has eight major land use/land cover classes. Three of these classes are further
Fig. 1. Location of the study area (Base map N50 © Norge digitalt).
subdivided, providing 18 classes in total, listed in Table 1. These 18 classes constituted the stratification in our exercise.

The AR18X18 area frame survey

AR18X18 (Strand & Rekdal, 2005, 2006) is an area frame survey (Stehman & Czaplewski, 1998) of land cover in Norway, emphasizing the outfields. The methodology is closely related to the first version of the European LUCAS survey (Eurostat, 2003). This first version of LUCAS was carried out in the EU in the period 2001–2003, with the main objective of providing agricultural statistics. AR18X18 is a variant carried out in Norway, which is not a member of the EU, and is adapted to national conditions and needs. The area frame is an 18 by 18 km grid shown in Fig. 1 (where the plots surveyed before 2010, and therefore used in this study, are shown with larger symbols than the remaining plots). A survey plot
Table 1. The Norwegian land resource map AR50 nomenclature consists of eight major classes (left column). Three of the classes are further subdivided, resulting in a total of 18 classes. These classes were used for post-stratification.

Major class             No.  Subclass
Built-up land           1
Agriculture             2    Harvested land
                        3    Infield pasture
                        4    Unspecified agricultural land
Forest                  5    Conifer forest
                        6    Deciduous forest
                        7    Mixed forest
                        8    Unspecified forest
Open land               9    No vegetation
                        10   Scattered and meagre vegetation
                        11   Lichen
                        12   Continuous dry or intermediate vegetation
                        13   Lush vegetation
                        14   Unspecified coverage
Peat and wetland        15
Perennial ice and snow  16
Inland water bodies     17
Ocean                   18
(1500 × 600 m = 0.9 km²) is located around the centre of each grid cell. The land cover of this survey plot is mapped in situ in accordance with the Norwegian Forest and Landscape Institute handbook for vegetation and land cover mapping at scale 1:20,000 (Rekdal & Larsson, 2005). The handbook is written in Norwegian, but an English language account of the system is available in Bryn (2006). The mapping units are polygons and the minimal mapping unit is 0.5 ha, although smaller features (down to 0.1 ha)
are occasionally mapped. More detailed vegetation information is also collected at ten sample points inside the survey plot, using a classification system described in Fremstad (1997). An example of a survey plot with a land cover map and sample points is shown in Fig. 2, where the land cover map has been masked into a topographic map of the surrounding area in order to show the map in a wider context. The example chosen for Fig. 2 is a survey plot close to the study area in Gausdal.

Four different subsamples of AR18X18 sample plots were used in this study (Table 2). The first subsample (SampALL) consisted of all available AR18X18 sample plots at the time of the study. The AR18X18 programme will be completed in 2015, but data from 347 plots (shown in Fig. 1 as larger, blue dots) were already available when this study was carried out in 2010. The second subsample (SampSOUTH) consisted of the available plots in Southern Norway. The third subsample (Samp50) consisted of the sample plots within a 50 km buffer around the study area, and the fourth subsample (Samp20) of the sample plots within a 20 km buffer around the study area.

The choice of the buffer sizes (20 and 50 km) is not entirely arbitrary. A previous study of forest health in Norway documented autocorrelation functions related to vegetation with a range of 20 km. This was assumed to be a function of the topography of valleys separated by highland areas (Strand, 1994). The autocorrelation of the tree line in Norway has, on the other hand, been shown to have a range of 50 km (Strand, 1998). These results are only indicative, and further studies of the spatial autocorrelation function of Norwegian vegetation types are needed in order to make better-informed decisions about the appropriate buffer size.
Fig. 2. Land cover map of survey plot 1720 located close to the study area in Gausdal (Base map N50 © Norge digitalt).
Table 2. Four subsamples of field plots from the Norwegian area frame survey of land cover (AR18X18) were tested as input for estimation of the land cover statistics in the study area in Gausdal.

Subsample  Selection of AR18X18 sample plots                                                                    No. of AR18X18 plots
SampALL    All available AR18X18 sample plots in Norway                                                         347
SampSOUTH  All available AR18X18 sample plots in southern Norway                                                261
Samp50     All available AR18X18 sample plots within a distance of 50 km from the study area in Gausdal         37
Samp20     All available AR18X18 sample plots within a distance of 20 km from the study area in Gausdal         10
Reference data

A complete (wall-to-wall) field survey of land cover was carried out in the study area in Gausdal in 2001 (Rekdal, 2002). This survey shared the field methodology of the AR18X18 survey and the two datasets are easily compared. Land cover in the mountain region does not change rapidly and the 2001 land cover map of the study area was therefore used as the reference data set.

Calculation

Data from the area frame survey, the AR50 land resource map and the reference data set were integrated and prepared for analysis using ArcGIS software. Post-stratification of the area frame survey was done by intersecting the maps of the survey plots with the AR50 map and compiling statistics for the land cover classes within each AR50 class. Similarly, the distribution of AR50 classes in the study area was obtained by intersecting the AR50 map with the border of the study area.

Land cover statistics for the study area were estimated by a two-step procedure involving data from both AR50 and the area frame survey AR18X18. Let $k$ represent the number of land cover classes (AR18X18 classes) and $m$ the number of strata (AR50 classes). In our general case there were $k = 45$ land cover classes and $m = 18$ strata (although fewer land cover classes and strata were present inside the study area and in some subsets of the area frame sample). Let also $z_{ij}$ be the proportion of land cover class $i$ in stratum $j$ in the sampling survey, such that

$$\sum_{i=1}^{k} z_{ij} = 1.0$$

for each stratum $j$. Finally, presume that the total area $a_j$ of each stratum $j$ in the study area is known. It is now straightforward to calculate the estimated area $b_i$ of each land cover class $i$ in the study area as

$$b_i = \sum_{j=1}^{m} a_j z_{ij}$$

This is a small-area estimate using the “apportionment method” (Longford, 2005). The method was applied to the four different subsamples of the AR18X18 area frame survey described above. In addition, a direct downscaling from the subsample consisting of plots within the 50 km buffer (Samp50) was carried out as an additional reference. This direct downscaling, where the relative distribution of land cover classes in the sampling survey is applied directly, without any form of stratification, can be explained as a special case of the apportionment method where the number of strata is $m = 1$.
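The apportionment estimate defined above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS workflow used in the study: the function name `apportion`, the dictionary layout, and the two strata with their areas and proportions are all made-up assumptions chosen so that the arithmetic is easy to follow.

```python
def apportion(stratum_area, composition, tol=1e-6):
    """Estimate the area b_i of each land cover class i in a target area.

    stratum_area: {stratum j: area a_j of that stratum in the target area}
    composition:  {stratum j: {class i: proportion z_ij}}, taken from the
                  sample survey; the proportions of each stratum sum to 1.
    """
    estimate = {}
    for stratum, area in stratum_area.items():
        shares = composition[stratum]
        if abs(sum(shares.values()) - 1.0) > tol:
            raise ValueError(f"proportions in stratum {stratum!r} do not sum to 1")
        for land_cover, share in shares.items():
            # b_i accumulates a_j * z_ij over all strata j
            estimate[land_cover] = estimate.get(land_cover, 0.0) + area * share
    return estimate


# Hypothetical input (not data from the study): two strata, areas in km2.
stratum_area = {"open land": 100.0, "forest": 52.0}
composition = {
    "open land": {"heath": 0.7, "fen": 0.3},
    "forest": {"birch forest": 0.5, "heath": 0.25, "fen": 0.25},
}
estimate = apportion(stratum_area, composition)
```

In this toy input, heath receives 100 × 0.7 + 52 × 0.25 km² and the class estimates add up to the total area of the strata. Direct downscaling corresponds to calling the same function with a single stratum that covers the whole target area.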
Both methods, small-area estimation and direct downscaling, rest on the assumption that the relative distribution of land cover within each stratum is stationary (Cressie, 1993), i.e. the distribution is independent of location. Under this assumption, the composition of land cover classes in a stratum should be the same inside and outside the study area. This assumption was tested using a Wilcoxon signed ranks test (Wilcoxon, 1945), a non-parametric alternative to a t-test of two related samples. The land cover classes were used as the entities of the Wilcoxon signed ranks test and ranked according to relative acreage. The test implies a null-hypothesis of “no difference” between the populations underlying the estimated and the observed land cover composition. Rejection of the null-hypothesis implies that the estimated and the observed land cover composition are drawn from different populations. Rejection was interpreted as an indication that the assumption of stationarity was violated.

Descriptive statistics were obtained for estimates based on the subsamples that passed the Wilcoxon signed ranks test. The correlation between estimated and observed land cover statistics was calculated using Pearson’s rho. The data do not follow a normal distribution, but a scattergram was used to investigate the possible influence of outliers on the results (Fig. 3 shows the scattergram for Samp50 with post-stratification). The residuals from a linear regression model were also inspected and found to approximate a normal distribution, indicating homoscedasticity. A Fisher r-to-Z transformation (Fisher, 1921) was used to compare correlation coefficients. Finally, the distributions of observed and estimated land cover were compared visually using block diagrams and the differences explored using Student’s t.

Results and discussion

Small-area estimation of the land cover distribution in the study area was compiled from each of the four subsamples of AR18X18 plots.
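The two test statistics named in the Calculation section can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the software used in the study: the Wilcoxon statistic uses the large-sample normal approximation without a tie correction in the variance, the function names are invented for this example, and the sample size n = 31 passed to the Fisher comparison in the usage note is a hypothetical value, since the text does not state the n that was used.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wilcoxon_signed_ranks(x, y):
    """Return (W_plus, W_minus, z) for two related samples.

    Zero differences are dropped and tied absolute differences share
    their average rank. z uses the normal approximation based on W_plus.
    Ties are detected by exact equality, so round the input first if it
    carries floating-point noise.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda idx: abs(d[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    w_minus = sum(r for r, diff in zip(ranks, d) if diff < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return w_plus, w_minus, (w_plus - mean) / sd


def fisher_z_difference(r1, n1, r2, n2):
    """Z statistic for the difference between two independent correlations."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / standard_error


# Toy paired sample (made-up numbers): differences are 2, -1, 0 and 4.
w_plus, w_minus, z = wilcoxon_signed_ranks([10, 12, 9, 15], [8, 13, 9, 11])
```

As a usage check, `two_sided_p(2.204)` rounds to the p-value listed for SampALL in Table 3, which suggests that the tabulated p-values are two-sided normal-approximation values. With the assumed n = 31 for both correlations, `fisher_z_difference(0.95, 31, 0.75, 31)` comes out close to the Z reported for the comparison of Samp50 with and without post-stratification.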
The four estimates of the land cover distribution were compared to the distribution obtained from the reference data using the Wilcoxon signed ranks test. The results are shown in Table 3. The null-hypothesis of no difference could be rejected with p < 0.05 for SampALL, and with p < 0.1 for SampSOUTH. The two rejections indicate that the assumption of stationarity was violated in these two cases, although the rejection of SampSOUTH is based on weaker evidence (p < 0.1 instead of p < 0.05) and can be disputed. Based on this result, SampALL and SampSOUTH were found unsuitable as input data for small-area estimation. These two samples cover very large areas with highly variable topography, geology and climate and considerable internal variation, even within each stratum. The result is not surprising because the detailed content of the AR50 classes in terms of land cover has previously been shown to vary over longer distances (Aune-Lundberg & Strand, 2011).

The null-hypothesis could not be rejected for estimates based on the two local selections of AR18X18 sample plots: Samp50 and Samp20 (within 50 and 20 km from the study area). The failure to reject the null-hypothesis in these two tests indicates that the assumption of stationarity is valid for land cover distribution over distances up to at least 50 km in this region.

An additional estimate based on Samp50, but without post-stratification, was also carried out. This estimate was included in order to investigate the difference between estimation with and without post-stratification. The three estimates based on Samp20 (with post-stratification) and Samp50 (both with and without post-stratification) were subsequently compared to the reference data. Descriptive statistics for the difference between estimates and reference data are listed in Table 4. In Table 4, “Min” shows the largest underestimation (in percentage) of a land cover class, while
Fig. 3. Land cover classes in the study area in Gausdal. Observed coverage is the proportion of the area as observed from in situ field survey, while estimated coverage is the proportion of the area estimated from post-stratification of Samp50.
“Max” shows the largest overestimation. For the small-area estimate based on Samp50, the differences range from -5.04% (underestimation of the land cover class Fen (9c)) to 5.16% (overestimation of the land cover class Dwarf shrub heath (2e)). “Mean” and “St.dev” list the mean and standard deviation of the difference across all land cover classes, while “Rho” is the correlation (Pearson’s rho) between the estimate and the reference data. The estimate based on Samp50 with post-stratification turned out to give the smallest absolute differences, and also the smallest mean difference (0.016) and associated standard deviation (1.88).

Pearson’s rho for the correlation between in situ data and estimates (also shown in Table 4) was r = 0.95 (p < 0.01) when Samp50 with post-stratification was used and r = 0.91 (p < 0.01) when Samp20 with post-stratification was used. When Samp50 without post-stratification was used for direct estimation of land cover, r = 0.75 (p < 0.01). The difference between the correlation obtained with Samp50 with post-stratification (0.95) and without post-stratification (0.75) was significant (Z = 3.21, p < 0.01), showing that post-stratification does improve the estimate. The difference between the correlation obtained with Samp50 with post-stratification (0.95) and Samp20 with post-stratification (0.91)
Table 3. Results from the Wilcoxon signed ranks test (Z) comparing the reference data and the estimates based on the four subsamples listed in Table 2. Samp50 is included twice (both with and without post-stratification).

Dataset compared with in situ data                       Z      p
Estimate based on SampALL post-stratified                2.204  0.027
Estimate based on SampSOUTH post-stratified              1.842  0.066
Estimate based on Samp50 post-stratified                 0.681  0.496
Estimate based on Samp20 post-stratified                 0.365  0.715
Estimate based on Samp50 without post-stratification     0.241  0.809
was not significant (Z = 1.02, p = 0.31), providing no evidence of differences between the 50 km buffer and the 20 km buffer as a basis for estimation. The preliminary conclusion was that small-area estimation provided better results than the direct downscaling but that the question of buffer size was unresolved. The estimate based on sample points from the 50 km buffer (Samp50) was still chosen for the remaining part of the study due to the larger sample size, better statistical support and smaller differences between observed and estimated values (Table 4).

The detailed estimates based on post-stratification of Samp50 are shown in Table 5. This table is a list of all land cover classes present either on the area frame survey plots in Samp50 or in the reference area. The table lists the observed coverage (in percent) of each land cover type based on the reference map, along with the estimated coverage based on post-stratification of the area frame survey. The differences (estimated minus observed coverage) are also listed in a separate column.

Observed land cover statistics and land cover statistics estimated using small-area estimation based on Samp50 are compared in Fig. 3. The relationship is approximately linear and not seriously influenced by outliers. The dotted line represents no difference between the observed and the estimated coverage. The relationship
Table 4. Descriptive statistics of the difference between reference data and estimated land cover statistics. See text for explanation.

Difference between reference data and estimates based on:  Min     Max   Mean   St.dev  Pearson’s Rho
Samp50 without post-stratification                         -10.5   7.08  0.147  3.45    0.75
Samp50 post-stratified                                     -5.04   5.16  0.016  1.88    0.95
Samp20 post-stratified                                     -7.52   6.94  0.137  2.78    0.91
Table 5. Observed and estimated (using Samp50 with post-stratification) land cover statistics for the study area in Gausdal in Eastern Norway. Difference is estimated minus observed coverage.

ID   Land cover class                             Observed (%)  Estimated (%)  Difference (%)
1a   Moss snow bed                                0.04          0.18           0.14
1b   Sedge and grass snow bed                     0.73          1.04           0.31
1c   Polygon stone land                           0.02          0.00           -0.02
2b   Dry grass heath                              2.22          0.10           -2.12
2c   Lichen heath                                 7.61          3.93           -3.68
2e   Dwarf shrub heath                            21.62         26.78          5.16
2ex  Dwarf shrub heath with high lichen content   8.55          8.46           -0.09
2g   Alpine heather heath                         0.44          0.00           -0.44
3a   Low herb meadow                              0.44          0.00           -0.44
3b   Tall forb meadow                             2.90          4.49           1.59
4a   Lichen and heather birch forest              0.84          2.44           1.60
4b   Blueberry birch forest                       14.96         17.23          2.27
4c   Meadow birch forest                          4.12          2.99           -1.13
4e   Alder forest                                 0.00          0.45           0.45
4g   Pasture land forest                          0.01          0.01           0.00
6a   Lichen and heather pine forest               1.61          2.28           0.67
6b   Blueberry pine forest                        0.73          0.54           -0.19
6c   Meadow pine forest                           0.00          0.02           0.02
7a   Lichen and heather spruce forest             0.76          1.61           0.85
7b   Blueberry spruce forest                      7.69          4.52           -3.17
7c   Meadow spruce forest                         0.59          2.56           1.97
8a   Damp forest                                  0.32          0.00           -0.32
8b   Bog forest                                   0.11          0.17           0.06
8c   Poor swamp forest                            0.37          0.34           -0.03
8d   Rich swamp forest                            0.14          0.68           0.54
9a   Bog                                          4.29          3.77           -0.52
9b   Deer-grass fen                               0.09          0.04           -0.05
9c   Fen                                          14.96         9.92           -5.04
9d   Mud-bottom bog                               0.48          0.67           0.19
9e   Sedge marsh                                  0.31          0.00           -0.31
10e  Moist meadows                                0.00          0.00           0.00
11a  Cultivated land                              0.00          0.21           0.21
11b  Infield Pasture                              0.31          0.39           0.08
12b  Boulder field                                0.09          1.41           1.32
12c  Exposed bedrock                              0.28          0.10           -0.18
12e  Scattered housing                            0.00          0.08           0.08
12f  Artificial impediment                        0.02          0.07           0.05
12g  Glaciers and perpetual snow                  0.00          0.00           0.00
13   Water                                        2.37          2.55           0.18
     Total                                        100.02        100.03         0.01
is further explored in Fig. 4, showing observed coverage (above the abscissa) and estimated coverage (below the abscissa) for each land cover class. (The estimated values are artificially assigned negative values in order to show both distributions in the same graph.) This graph should ideally be symmetrical around the abscissa. Lack of symmetry is a sign of estimation errors. These errors are illustrated in Fig. 5, showing the difference between estimated and observed coverage for each land cover class.

Fig. 5 shows that the study overestimates Dwarf shrub heath (2e) and underestimates Dry grass heath (2b), Lichen heath (2c), Blueberry birch forest (4b) and Fen (9c). A simple explanation is that this is an artefact of the sampling method. Samp50 has only 37 sample plots, and although each plot has a size of 0.9 km² and cuts across several strata, some strata may still be poorly covered by the area frame survey. The solution is to increase the buffer size, but this may lead to systematic spatial variation within each stratum and eventually violate the assumption of stationarity. As pointed out above, further research is needed in order to determine the optimal buffer size, and the conclusion could even be that different buffer sizes should be used for different strata in order to secure sufficient statistical support for the estimation exercise.

Other explanations are also possible. An alternative explanation is that the field surveyors who participated in the in situ mapping of the study area and those who carried out data collection on the sample plots had a different understanding of the classification scheme. This is less likely, because the field crew is experienced and considerable effort is made to harmonize their perception of the land cover classes. The most likely explanation regarding Lichen heath (2c) is the effect of outfield pasture.
Gausdal and the surrounding mountain areas have a large herd of reindeer and much of the lichen cover in the mountains is depleted by overgrazing and trampling (Y. Rekdal, pers. comm.). There is no study of the reindeer grazing in Gausdal, but the effect has been well documented in studies of similar mountain environments in other parts of Scandinavia (Skogland, 1990; Suominen & Olofsson, 2000). These areas may simply not have been identified as lichen heath in the satellite images used as a basis for stratification, but rather interpreted as intermediate or meagre vegetation. This is a stratum with less lichen heath (2c) and more dwarf shrub heath (2e), which may cause (2e) to be overrepresented in the estimate.
Fig. 4. Relative coverage of each land cover class observed in situ (above the abscissa) and estimated by post-stratification of Samp50 (below the abscissa).
Fig. 5. Difference between estimated (from post-stratification of Samp50) and observed coverage of land cover classes in the Gausdal study area. Land cover classes where the area is overestimated are represented as positive numbers while land cover classes where the area is underestimated are represented as negative numbers.
Fig. 5 also shows an overestimation of Tall forb meadow (3b) and Mountain birch forests (4a and 4b) together with the underestimation of Blueberry spruce forest (7b) and Fen (9c). The explanation may be that the strata have a different composition of vegetation inside and outside the study area and that this is linked to the size of the subsample as discussed above. The differences are, in any case, small, except for Fen (9c), which is underestimated by approximately 5%. This land cover class is hard to delineate in aerial photographs and stratum 15 Peat and wetland (see Table 1) is therefore probably underrepresented in the map used for stratification. Errors of this kind must be expected and should be accepted when land cover statistics are estimated from sample surveys rather than mapped in situ.

The differences shown in Table 5 and Fig. 5 can furthermore be analysed statistically. For this purpose, we squared the differences in order to avoid the effect of mixing positive and negative numbers. The mean squared difference for the 29 land cover classes that were both observed and estimated in the study area was 3.42 (%²) with a standard deviation of 7.02 and a standard error of 1.3. A t-test of the mean against the null-hypothesis of no difference (mean squared difference is zero) gives t = 2.63 with 28 degrees of
Table 6. The five most common land cover classes estimated using different samples and approaches (rank 1 is the most common class). Notice that the classes (4b) and (9c) have the same size in the reference data set.

Rank | Field observation | With post-stratification, Samp20 | With post-stratification, Samp50 | Direct estimate using Samp50
1 | 2e Dwarf shrub heath | 2e Dwarf shrub heath | 2e Dwarf shrub heath | 4b Blueberry birch forest
2 | 4b Blueberry birch forest | 9c Fen | 4b Blueberry birch forest | 7b Blueberry spruce forest
3 | 9c Fen | 4b Blueberry birch forest | 9c Fen | 2e Dwarf shrub heath
4 | 2ex Dwarf shrub heath with lichen | 12b Boulder field | 2ex Dwarf shrub heath with lichen | 9c Fen
5 | 7b Blueberry spruce forest | 2ex Dwarf shrub heath with lichen | 7b Blueberry spruce forest | 7c Meadow spruce forest
freedom, and it is reasonable to reject the null hypothesis (p < 0.01). This implies that the differences are not just random fluctuations, but a result of actual differences between Samp50 and the study area, even when post-stratification is applied. The critical value (p < 0.01) for the squared difference for individual observations is 17.3. Three land cover classes exhibit a squared difference exceeding this value: Lichen heath (2c), Dwarf shrub heath (2e) and Fen (9c). It is for these three land cover classes in particular that the method performs poorly in the study area, as discussed above. Table 6 lists the top five land cover classes in the study area, both observed and from each of the three estimations. The top five land cover classes in the estimate using Samp50 with post-stratification are identical to the top five land cover classes found in the field survey. The only difference is that Fen (9c) and Blueberry birch forest (4b) occupy separate ranks (2nd and 3rd) in the estimate but share the second rank in the observed dataset. Samp20 with post-stratification introduces Boulder field (12b) among the top five land cover classes, at the expense of Blueberry spruce forest (7b). This is probably because the range of Samp20 is too small, so that the sample becomes sensitive to local anomalies in the composition of the strata. These local anomalies become less influential in the larger Samp50. Samp50 without post-stratification, on the other hand, introduces Meadow spruce forest (7c) at the expense of Dwarf shrub heath with lichen (2ex) and also leads to a considerable reshuffle of the ranking order. This is probably because Samp50 includes substantial areas in the surrounding valleys where spruce forest is widespread.
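The significance test reported above can be checked from the summary statistics alone. The following is a minimal sketch, not the authors' own computation: the function names are hypothetical, and the one-sided critical t-value of 2.467 for 28 degrees of freedom is taken from a standard t table as an assumption. Multiplying that value by the standard deviation is one reading that reproduces the reported per-class threshold of 17.3. With the unrounded standard error the statistic comes out marginally below the reported t = 2.63, which corresponds to using the rounded standard error of 1.3.

```python
import math

# Summary statistics reported in the text for the squared differences.
N_CLASSES = 29        # classes both observed and estimated
MEAN_SQ_DIFF = 3.42   # mean squared difference (%^2)
SD_SQ_DIFF = 7.02     # standard deviation of the squared differences

# Assumption: one-sided critical t for p = 0.01 with 28 degrees of
# freedom, read from a standard t table (not stated in the paper).
T_CRIT_28_DF = 2.467


def standard_error(sd, n):
    """Standard error of the mean for n observations."""
    return sd / math.sqrt(n)


def t_statistic(mean, sd, n, null_mean=0.0):
    """One-sample t statistic against a hypothesised mean."""
    return (mean - null_mean) / standard_error(sd, n)


std_error = standard_error(SD_SQ_DIFF, N_CLASSES)
t_stat = t_statistic(MEAN_SQ_DIFF, SD_SQ_DIFF, N_CLASSES)

# Hypothetical reading of the per-class threshold: critical t times
# the standard deviation of the squared differences.
per_class_threshold = T_CRIT_28_DF * SD_SQ_DIFF

print(f"standard error      = {std_error:.2f}")
print(f"t statistic         = {t_stat:.2f}")
print(f"per-class threshold = {per_class_threshold:.1f}")
print("reject null hypothesis:", t_stat > T_CRIT_28_DF)
```

Any class whose squared difference exceeds the threshold would be flagged in the same way as Lichen heath (2c), Dwarf shrub heath (2e) and Fen (9c) in the text.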
The amount of spruce forest is corrected when the sample is post-stratified because the study area has only limited forest land, but this adjustment does not take place when Samp50 is used in a direct estimate without post-stratification.

Conclusion

Small-area estimation of land cover statistics using field samples from an area frame survey, with interpolation assisted by a simple stratification, has proved workable within certain constraints. The final implementation used field
samples collected from a 50 km buffer zone around the study area. Comprehensive calibration of the size of the buffer zone was not carried out, but a smaller buffer zone of 20 km resulted in weaker correlation between estimated and observed values and slightly larger estimation errors. Experiments using all available in situ sample plots for the whole country or for large parts of the country did not satisfy the basic assumption of stationarity and could not be used. It was also demonstrated that small-area estimation using the apportionment method involving post-stratification resulted in a statistically significant improvement of the estimate, compared to unstratified "direct estimation". The correlation between the best small-area estimates of land cover and the reference data was high. Still, the magnitude of the estimation error is large for a few of the individual land cover classes. Fen (9c), which constitutes 15% of the study area, is underestimated by a third of its area. Several of the smaller classes also suffer from high relative differences between observed and estimated coverage. Dry grass heath (2b) constitutes 2% of the study area, but the estimate is only 0.1%. Rich swamp forest (8d), on the other hand, covers only 0.14% of the area but the estimate is 0.68%. Clearly, these are serious errors if the accuracy of small classes is critical. This is, however, not the case for the main objective of this study, the screening and selection of areas for more detailed mapping of pasture resources. The results obtained in this test were encouraging, although this does not guarantee similar results for other study areas. The overall conclusion is that the method is applicable when the objective is to produce overall land cover statistics and the interest is the general composition of land cover classes, not the precise estimate of each class.
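The contrast between the apportionment method and the unstratified direct estimate can be illustrated with a minimal sketch. Everything in it is assumed for illustration: the plot records, the stratum labels "forest" and "open", the stratum shares of the target area and the function names are all hypothetical, and the code is not the authors' implementation. It only shows the principle that the land cover composition of each stratum, taken from the surrounding sample, is weighted by the known stratum shares of the target area.

```python
from collections import defaultdict

# Hypothetical sample plots from the buffer zone:
# (stratum from the coarse map, land cover class observed in situ, area)
plots = [
    ("forest", "7b", 8.0),
    ("forest", "2e", 2.0),
    ("open", "2e", 6.0),
    ("open", "9c", 4.0),
]

# Hypothetical stratum shares of the target area, known from the coarse map.
target_stratum_share = {"forest": 0.1, "open": 0.9}


def stratum_composition(records):
    """Return {stratum: {land cover class: share of the stratum's sampled area}}."""
    totals = defaultdict(float)
    by_class = defaultdict(lambda: defaultdict(float))
    for stratum, cover, area in records:
        totals[stratum] += area
        by_class[stratum][cover] += area
    return {
        s: {c: a / totals[s] for c, a in classes.items()}
        for s, classes in by_class.items()
    }


def post_stratified_estimate(records, stratum_share):
    """Weight each stratum's composition by its share of the target area."""
    composition = stratum_composition(records)
    estimate = defaultdict(float)
    for stratum, share in stratum_share.items():
        # A stratum missing from the sample contributes nothing here;
        # a real application would have to handle that case explicitly.
        for cover, fraction in composition.get(stratum, {}).items():
            estimate[cover] += share * fraction
    return dict(estimate)


def direct_estimate(records):
    """Pooled class shares of the sample, ignoring the strata."""
    total = sum(area for _, _, area in records)
    estimate = defaultdict(float)
    for _, cover, area in records:
        estimate[cover] += area / total
    return dict(estimate)


post = post_stratified_estimate(plots, target_stratum_share)
direct = direct_estimate(plots)
for cover in sorted(direct):
    print(cover, f"direct={direct[cover]:.2f}", f"post-stratified={post[cover]:.2f}")
```

In this toy case the sample contains far more forest than the target area, so the direct estimate overstates the forest class, while the post-stratified estimate scales it down to match the small forest share of the target area.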
The method is sufficiently precise for the purpose of screening areas in order to identify regions with high grazing potential that should be subject to subsequent wall-to-wall mapping. Supplementary tests from other areas with available in situ data should be carried out before the method is implemented in an operational context for purposes other than pasture management, e.g. official land cover statistics for administrative units. The method can also be considered for compilation of land cover statistics for national parks and other protected areas. The challenge is, also in this case, the prohibitive cost of complete wall-to-wall inventories. The potential with respect to land cover statistics for protected areas is, however, equivocal. Areas protected because they have outstanding natural qualities are usually dissimilar to their surroundings and small-area estimation of land cover does not apply in these cases. Areas protected because they are representative of the regional landscape or biodiversity can, on the other hand, be expected to have qualities similar to the surrounding region. Small-area estimation of land cover may be used to compile land cover statistics for these protected areas when funds for complete (wall-to-wall) inventories are insufficient. Land cover mapping is rarely an end in itself. The motivation for the present study was to improve the efficiency of outfield pasture management in Norway. The pasture cooperatives and their advisors use land cover maps to calculate the available grazing resources, produce management plans for the outfield pastures and provide farmers with advice on how to distribute and manage their herds throughout the summer season. The objective is to increase the revenue through improved management of the outfield pastures. The complete land cover surveys are expensive, and it is imperative to spend available funds efficiently by allocating them to areas with high grazing potential.
It is therefore important to have a screening mechanism where data from the existing, national sampling survey can be used to provide basic information about the overall pasture quality in smaller areas. This information will, together with additional information about the number and
weight of animals, be used to select management areas for more detailed in situ surveys. The small-area estimation methodology has proved sufficiently precise and efficient and will be applied for this purpose.

Acknowledgements

This study was carried out with financial support from the Norwegian Space Centre (grant JOP.10.09.2) and from the Norwegian Research Council (grant 194052/i30). The authors also want to thank the anonymous reviewers for their time and valuable comments and suggestions.

References

Aplin, P. (2004). Remote sensing: land cover. Progress in Physical Geography, 28, 283–293.
Aune-Lundberg, L., & Strand, G. H. (2011). Land resource classification in mountain areas. Examination of the classification system used in land resource mapping of Norwegian mountain areas. Report 01/2011. Ås, Norway: Norwegian Forest and Landscape Institute.
Benedetti, R., & Filipponi, D. (2010). Estimation of land cover parameters when some covariates are missing. In R. Benedetti, M. Bee, G. Espa, & F. Piersimoni (Eds.), Agricultural survey methods (pp. 213–230). John Wiley & Sons.
Bjørdal, I., & Bjørkelo, K. (2006). AR5 klassifikasjonssystem. Handbook 01/2006. Ås, Norway: Norwegian Forest and Landscape Institute. (in Norwegian).
Bryn, A. (2006). Vegetation mapping in Norway and a scenario for vegetation changes in a Norwegian mountain district. Geographica Polonica, 79, 23–37.
Cotter, J., Davies, C., Nealon, J., & Roberts, R. (2010). Area frame design for agricultural surveys. In R. Benedetti, M. Bee, G. Espa, & F. Piersimoni (Eds.), Agricultural survey methods (pp. 169–192). John Wiley & Sons.
Cotter, J. J., & Tomczak, C. M. (1994). An image analysis system to develop area sampling frames for agricultural surveys. Photogrammetric Engineering & Remote Sensing, 60, 299–306.
Cressie, N. A. C. (1993). Statistics for spatial data. John Wiley & Sons.
Czaplewski, R. L. (1992). Misclassification bias in areal estimates. Photogrammetric Engineering & Remote Sensing, 58, 189–192.
Estes, J., Belward, A., Loveland, T., Scepan, J., Strahler, A., Townshend, J., et al. (1999). The way forward. Photogrammetric Engineering & Remote Sensing, 65, 1089–1093.
Eurostat. (2003). The Lucas survey. European statisticians monitor territory. Luxembourg: Office for Official Publications of the European Communities.
Finley, A. O., Banerjee, S., & McRoberts, R. E. (2008). A Bayesian approach to multisource forest area estimation. Environmental and Ecological Statistics, 15, 241–258.
Fisher, R. A. (1921). On the "Probable Error" of a coefficient of correlation deduced from a small sample. Metron, 1(4), 3–32.
Flores, L. A., & Martinez, L. I. (2000). Land cover estimation in small areas using ground survey and remote sensing. Remote Sensing of Environment, 74, 240–248.
Foody, G. M. (2002). Status of land cover classification accuracy assessment. Remote Sensing of Environment, 80, 185–201.
Fremstad, E. (1997). Vegetasjonstyper i Norge. NINA Temahefte: 12. Trondheim, Norway: Norwegian Institute for Nature Research. (in Norwegian).
Gallego, F. J. (2004). Remote sensing and land cover area estimation. International Journal of Remote Sensing, 25, 3019–3047.
Gallego, F. J., & Bamps, C. (2008). Using CORINE land cover and the point survey LUCAS for area estimation. International Journal of Applied Earth Observation and Geoinformation, 10, 467–475.
Gallego, F. J., & Delincé, J. (2010). The European land use and cover area-frame statistical survey. In R. Benedetti, M. Bee, G. Espa, & F. Piersimoni (Eds.), Agricultural survey methods (pp. 151–168). John Wiley & Sons.
Gambino, J., & Dick, P. (2000). Small area estimation practice at Statistics Canada. Statistics in Transition, 4, 597–610.
Ghosh, M., & Rao, J. N. K. (1994). Small area estimation: an appraisal. Statistical Science, 9, 55–76.
Gjertsen, A. K., Angeloff, M., & Strand, G. H. (2011). Arealressurskart over fjellområdene. Kart og Plan, 71, 45–51. (in Norwegian).
Gonzalez, M. E., & Hoza, C. (1978). Small-area estimation with application to unemployment and housing estimates. Journal of the American Statistical Association, 73, 7–15.
Jiang, J. (2010). Large sample techniques for statistics, Springer texts in statistics. Springer.
Li, W., Kelsey, J. L., Zhang, Z., Lemon, S. C., Mezgebu, S., Boddie-Willis, C., et al. (2009). Small-area estimation and prioritizing communities for obesity control in Massachusetts. American Journal of Public Health, 99, 511–519.
Longford, N. T. (2005). Missing data and small-area estimation. Springer.
Longva, S., Thomsen, I., & Severeide, P. I. (1998). Reducing costs of censuses in Norway through use of administrative registers. International Statistical Review, 66, 223–234.
Lucas, R., Rowlands, A., Brown, A., Keyworth, S., & Bunting, P. (2007). Rule-based classification of multi-temporal satellite imagery for habitat and agricultural land cover mapping. ISPRS Journal of Photogrammetry & Remote Sensing, 62, 165–185.
NIFL. (2011). Norwegian Forest and Landscape Institute map browser and free download service at http://kilden.skogoglandskap.no Accessed 07.06.11.
Opsomer, J. D., Botts, C., & Kim, J. Y. (2003). Small area estimation in a watershed erosion assessment survey. Journal of Agricultural, Biological and Environmental Statistics, 8, 139–152.
Rao, J. N. K. (2003). Small area estimation. John Wiley & Sons.
Rao, J. N. K. (2010). Small-area estimation with applications to agriculture. In R. Benedetti, M. Bee, G. Espa, & F. Piersimoni (Eds.), Agricultural survey methods (pp. 139–147). John Wiley & Sons.
Reich, R. M., & Aguirre-Bravo, C. (2009). Small-area estimation of forest stand structure in Jalisco, Mexico. Journal of Forestry Research, 20, 285–292.
Rekdal, Y. (2002). Vegetasjon og beite i Gausdal vestfjell – Revsjø/Liumseterhamna, Dokklihamna, Tverrlihamna og Ormtjernkampen nasjonalpark med foreslåtte utvidingsområder. NIJOS Report 7/2002. Ås, Norway: Norwegian Institute of Land Inventory. (in Norwegian).
Rekdal, Y., & Larsson, J. Y. (2005). Veiledning i vegetasjonskartlegging, M 1:20 000 – 1:50 000. NIJOS Rapport 5/2005. Ås, Norway: Norwegian Institute of Land Inventory. (in Norwegian).
Särndal, C. E. (1984). Design-consistent versus model-dependent estimation for small domains. Journal of the American Statistical Association, 79, 624–631.
Skogland, T. (1990). Density dependence in a fluctuating wild reindeer herd; maternal vs. offspring effects. Oecologia, 84, 442–450.
Stehman, S. V., & Czaplewski, R. L. (1998). Design and analysis for thematic map accuracy assessment: fundamental principles. Remote Sensing of Environment, 64, 331–344.
Strand, G. H. (1994). A geographical study of vitality changes in Norwegian conifer forests. Doctor Scientiarum Theses 1994:1. Ås, Norway: Department of Surveying, Agricultural University of Norway.
Strand, G. H. (1998). Kriging the potential tree level in Norway. Norwegian Journal of Geography, 52, 17–25.
Strand, G. H., & Rekdal, Y. (2005). Nasjonalt arealrekneskap – utprøving i fjellet i Hedmark. Kart og Plan, 65, 236–243. (in Norwegian).
Strand, G. H., & Rekdal, Y. (2006). Area frame survey of land resources, AR18X18 system description. NIJOS Report 3/2006. Ås, Norway: Norwegian Institute of Land Inventory.
Suominen, O., & Olofsson, J. (2000). Impacts of semi-domesticated reindeer on structure of tundra and forest communities in Fennoscandia: a review. Annales Zoologici Fennici, 37, 233–249.
Wilcoxon, F. (1945). Individual comparisons by ranking method. Biometrics, 1, 80–83.
Wilkinson, G. G. (1996). Classification algorithms – where next? In E. Brivio, P. A. Brivio, & A. Rampini (Eds.), Soft computing in remote sensing data analysis (pp. 93–99). Singapore: World Scientific.