Semantic enrichment of building data with volunteered geographic information to improve mappings of dwelling units and population



Computers, Environment and Urban Systems xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Computers, Environment and Urban Systems journal homepage: www.elsevier.com/locate/compenvurbsys

Carola Kunze, Robert Hecht ⇑

Leibniz Institute of Ecological Urban and Regional Development Dresden (IOER), Weberplatz 1, 01217 Dresden, Germany

Article info

Article history: Available online xxxx

Keywords: Building model; Disaggregation; Census data; Cadastre data; Land use; Geographic information system; Population mapping; Dasymetric mapping; Volunteered geographic information

Abstract

Small-scale data on dwellings and population density are required for precise geospatial urban modelling. Further, knowledge of building usage is necessary to model socio-economic aspects such as the distribution of dwellings and population. In an effort to limit costs and resourcing efforts, users and institutes in research and spatial planning are developing strategies to extract such information from existing geographic base data. Currently, land-use information from official datasets merely distinguishes residential from non-residential building usage, but cannot identify areas of non-residential usage inside residential buildings. Additional data sources are therefore needed to fill this gap. In this paper we propose an approach to process semantic information from user-generated OpenStreetMap (OSM) data to specify non-residential usage in residential buildings. This estimation is based on OSM attributes, so-called tags, which are used to define the extent of non-residential usage. Our objective is to identify the potentials and reveal the limitations of integrating semantic OSM data for the evaluation of building usage. Official statistical data on dwellings and population is used to validate results. Thereby we prove the benefit of integrating OpenStreetMap semantic data to refine the estimation of non-residential floor area in the study area of the German City of Dresden, Saxony.

© 2015 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel./fax: +49 (0)3514679248. E-mail addresses: [email protected] (C. Kunze), [email protected] (R. Hecht).

http://dx.doi.org/10.1016/j.compenvurbsys.2015.04.002
0198-9715/© 2015 Elsevier Ltd. All rights reserved.

Please cite this article in press as: Kunze, C., & Hecht, R. Semantic enrichment of building data with volunteered geographic information to improve mappings of dwelling units and population. Computers, Environment and Urban Systems (2015), http://dx.doi.org/10.1016/j.compenvurbsys.2015.04.002

1. Introduction

The fields of urban and regional planning as well as ecological surveys and disaster management require detailed information about the number and location of dwellings and their inhabitants (see Ahola, Virrantaus, Krisp, & Hunter, 2007; Aubrecht, Steinnocher, Hollaus, & Wagner, 2009; BBK, 2010). Usually this data can be taken from census datasets or regional surveys. For example, new population statistics for Germany were recently released as part of the register-based census for the year 2011 (https://www.zensus2011.de). Demographic and socioeconomic data for European countries and regions are provided by Eurostat. However, census data has a number of drawbacks, such as lengthy periods between data acquisition, aggregation to large spatial units and the fact that administrative borders can change over time. In some countries census population data can also be of poor spatial resolution (ward level, state level) or non-existent (Alahmadi, Atkinson, & Martin, 2013).

For these reasons, methods have been developed to disaggregate socio-economic information such as population and dwellings from various data sources, whether digital topographic databases or remote sensing imagery (Lo, 1995; Wu, Qiu, & Wang, 2005; Wurm & Taubenböck, 2010). In this way it is possible to determine population distribution with the help of detailed information at the level of buildings. Although remotely sensed data can be used for automatic building extraction and the identification of land use classes (residential or non-residential building), this approach is prone to errors (Wurm & Taubenböck, 2010). Current topographic databases usually contain single building footprints and usage information, in which distinctions are made between residential, industrial/commercial and other uses for leisure, sports, etc. However, data on building usage generally refers to an entire building or a building block, and does not provide any information on non-residential usage within residential buildings (ground-floor shops, offices, etc.). According to Meinel, Hecht, and Herold (2009), this lack of data on non-residential usage affects population mapping and the analysis of settlement structure, leading to over- or underestimations of the number of dwelling units or residents, especially in the city centre.

Other data sources may remedy the deficit of small-scale information on building use. Commercial registers or workplace statistics are possible input data to estimate the floor area used for non-residential purposes, often called non-residential floor space. However, most of these sources are not readily available or are


prohibitively expensive for a majority of potential users. One promising alternative approach is to exploit Volunteered Geographic Information (VGI) (Goodchild, 2007), such as the free and internationally available OpenStreetMap (OSM). In addition to extensive geometric data, OSM offers a variety of semantic information, some of which describes the building stock.

In this research paper we discuss the potential of semantic OSM data to assist in the estimation of non-residential floor area by identifying non-residential usage within residential buildings. To this end we have developed an estimation model based on official building polygons and OSM point, line and polygon features. This non-residential floor space estimation model is validated by comparing calculated data on dwellings and population to a reference dataset.

2. State of the art

Models of the urban environment require precise data on land use. Several studies in geospatial science have, for example, undertaken micro-spatial analyses of the urban structure and examined principles of geometrical and semantic data integration. In order to give an overview of this research and subsequently to validate our results, previous works on dwelling and population estimation are presented in the following. In Section 2.3 we give a brief overview of the principles of VGI data collection.

2.1. Micro-spatial analysis of the urban structure

Detailed information on the functional, morphological, and socio-economic structure of the built environment is required for urban modelling. Clearly the building stock is the most important component as it directly affects urban structures such as urban form, building density, or the distribution of dwellings and population. Although some major cities provide 3D-city models, there is a lack of fine-scale data at the level of individual buildings (e.g. information on height, the number of floors, building type, building functions), particularly for smaller towns and rural municipalities. Homogeneous datasets are also needed to enable comparative regional or national studies. Therefore, during the last few years, various methods have been developed to classify and describe urban structures based on small-scale settlement indicators.

One way to capture urban structure is to make use of remote sensing data such as very high resolution (VHR) satellite imagery (Bauer & Steinnocher, 2006; Dogruso & Aksoy, 2007; Geiß et al., 2011; Herold, Liu, & Clarke, 2003; Walde, Hese, Berger, & Schmullius, 2013) or aerial imagery (Banzhaf & Höfer, 2008). More recently, laser-scanner data (LiDAR) has been increasingly applied to urban modelling as it permits the construction of 3D models of buildings (Wurm, Taubenböck, Roth, & Dech, 2009). Meinel et al. (2009) and Hecht, Herold, Meinel, and Buchroithner (2013) have described a method to derive urban structure types from scanned topographic maps at scale 1:25 k.

Today topographic vector data provided by National Mapping and Cadastral Agencies (NMCA) is widely used to determine urban structure. Yet while these geo-topographic databases usually contain digital 2D building footprints, they often lack information on building types. There have been several attempts to classify building footprints automatically according to a building typology (Colaninno, Cladera, & Pfeffer, 2011; Henn, Römer, Gröger, & Plümer, 2012; Lüscher, Weibel, & Burghardt, 2009; Steiniger, Lange, Burghardt, & Weibel, 2008). Address point data has also been used to enhance the quality of building classification (Orford & Radcliffe, 2007; Smith & Crooks, 2010).

2.2. Dwelling and population mapping at the level of individual buildings

GIS applications such as disaster management, risk and vulnerability assessment, facility management, public health and planning require precise spatial analysis of dwelling units and population at the scale of individual buildings. Due to concerns over privacy, census data at the building level is not available for public usage. Depending on the specific national policy, such data is aggregated to administrative units (municipalities, districts, federal states) or census tracts. In Germany, national statistics on population are available at municipal level only, whereas other European countries such as Austria, Denmark, Estonia, Finland, Ireland, Norway or Slovenia already provide population data in raster format with cell sizes as small as 100 or 125 m (Eurostat, 2012).

For many applications, aggregated data at administrative level does not adequately represent the underlying population distribution. To resolve this problem, techniques of dasymetric mapping are used to transform census population data into finer map units by means of ancillary data (Mennis, 2003). The technique refers to the process of disaggregation, discussed in detail by Eicher and Brewer (2001) and Maantay, Maroko, and Herrmann (2007). The simplest approach to disaggregation is the areal weighted transformation of population from a source unit to a target unit, assuming that the population is proportionate to area (Maantay et al., 2007). However, since population is typically non-uniformly distributed within a unit due to different urban densities, auxiliary data is introduced to distinguish inhabited from uninhabited land, excluding the latter from the population mapping process. Data on land use/land cover or remote sensing imagery has been employed as such auxiliary sources of data.
For example, Gallego, Batista, Rocha, and Mubareka (2011) distribute the population to a 100 m density grid by means of CORINE Land Cover data.

Many micro-spatial analyses, such as for local disaster management and urban planning, require reliable population data at the level of urban block, plot or building. In the past, GIS expert systems have been developed to disaggregate census data using cadastral data (Maantay et al., 2007) or digital topographic maps (Meinel et al., 2009). These approaches rely on the fact that population is closely linked to the characteristics of the built environment such as the function, size, and height of buildings.

First attempts to estimate the population of individual buildings have been undertaken by Lwin and Murayama (2009), using residential building footprints as the target unit for disaggregation. They introduced an areametric and a volumetric method, employing the building footprint area and the number of floors as input information, respectively. Ural, Hussain, and Shan (2011) have modified these areametric and volumetric models by implementing a weighting scheme that differentiates between "houses" and "apartments". A more detailed analysis of building types has been carried out by Meinel, Hecht, and Herold (2008) and Meinel et al. (2009), in which building footprints are extracted from topographic maps; average dwelling and population densities for building footprints are empirically determined for eight different residential building types using census data at block level.

The characterization of buildings in terms of their function, size, and number of floors plays a key role in population mapping. Previous studies have looked at the impact of non-residential usage within residential buildings in distorting the mapping of dwelling units and population (Lwin & Murayama, 2009; Meinel et al., 2009).
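The areametric and volumetric ideas mentioned above can be illustrated with a small sketch. It assumes a simple proportional allocation of a census total to buildings, weighted either by footprint area or by footprint area times number of floors; the published formulations may differ in detail, and the class name `Building`, the function names and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Building:
    footprint_area: float  # footprint area in square metres
    floors: int            # number of floors


def areametric(total_population, buildings):
    """Allocate a census total in proportion to footprint area (assumed rule)."""
    total_area = sum(b.footprint_area for b in buildings)
    return [total_population * b.footprint_area / total_area for b in buildings]


def volumetric(total_population, buildings):
    """Allocate a census total in proportion to footprint area x floors (assumed rule)."""
    total_floor_area = sum(b.footprint_area * b.floors for b in buildings)
    return [
        total_population * b.footprint_area * b.floors / total_floor_area
        for b in buildings
    ]


# Two hypothetical buildings with equal footprints but different heights.
buildings = [Building(100.0, 1), Building(100.0, 3)]
area_estimate = areametric(400, buildings)
volume_estimate = volumetric(400, buildings)
print(area_estimate, volume_estimate)
```

With equal footprints the areametric variant splits the total evenly, whereas the volumetric variant shifts population towards the taller building. Neither variant accounts for non-residential floors, which is the gap addressed in this paper.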
In particular, populations are overestimated for buildings in the central districts of urban agglomerations and along main shopping streets (Wurm & Taubenböck, 2010). In such areas


the ground floors of buildings are generally given over to commercial rather than residential use. Since comprehensive datasets that could increase data accuracy, such as commercial registries, are often unavailable, it is necessary to develop methods to estimate the extent of non-residential usage inside primarily residential buildings. One way is to apply adjustment factors for mixed-use buildings to model the non-residential usage (Schiller & Bräuer, 2013). Bakillah, Liang, Mobasheri, Arsanjani, and Zipf (2014) show the potential of ancillary data to refine population estimates. Our study looks at the integration of VGI to enhance dwelling and population mapping at the level of individual buildings. POI, polygon and line information from OpenStreetMap is used to determine the usage at floor level and the residential floor area as a basis for dwelling and population mapping.

2.3. VGI as an additional data source

At one time the gathering and updating of geospatial data was the responsibility of official land surveying agencies or commercial companies. Nowadays, the prevalence of new technology (handheld GPS, smartphones, etc.), online cartographic applications and services as well as social networks has created the basis for crowd-sourced mapping. In projects such as Foursquare, Google Maps, Twitter, OpenStreetMap and others, users are encouraged to provide personal data, for example their geographical location, or contribute geospatial information, for example points of interest (POI) or street lines. In OpenStreetMap, currently the most successful VGI platform, a worldwide community of volunteers collects geospatial content that is subsequently stored in a spatial database.

There are numerous ways of contributing information to OSM, for example the tracking of geometries in the field using GPS-enabled devices such as smartphones or tablet computers, the digitizing of data from aerial imagery, or the outlining and characterizing of spatial features on so-called 'walking papers'. OSM users get assistance on how to collect and attribute the geospatial data from the official OpenStreetMap Wiki (http://wiki.openstreetmap.org/wiki/Main_Page) or during mapping parties. Researchers and professionals from industry or government as well as users from different disciplines have grasped the significance of crowd-sourced projects. Today information from OSM is used in various applications such as the rendering of map tiles, navigation services, the generation of 3D models, or disaster management (see crisis mapping for Haiti 2010 or the Philippines 2013, http://wiki.openstreetmap.org/wiki/Humanitarian_OSM_Team).

Of course, one issue when working with user-generated OSM data is the quality of data; in particular, (geometric and semantic) data completeness and positional accuracy must be reviewed. Several research studies have investigated the quality of OSM features such as streets, POIs or buildings (Girres & Touya, 2010; Haklay, 2010; Hecht, Kunze, & Hahmann, 2013; Strunck, 2010; Zielstra & Zipf, 2010). Similar results have been found concerning completeness: (i) in most countries urban areas are more complete than rural regions (Zielstra & Zipf, 2010); (ii) the completeness within agglomerations decreases with increasing distance from the city centre (Hecht et al., 2013); (iii) OSM shows heterogeneities in geometrical accuracy and the attribution of features, affecting usability for research studies and practical applications (Girres & Touya, 2010). In addition to these investigations on geometrical quality, other studies have focussed on semantic accuracy and the annotation process (e.g. Girres & Touya, 2010; Mooney & Corcoran, 2012) as well as applying semantic web strategies for data integration and structuring (Ballatore & Bertolotto, 2011; Codescu, Horsinka, Kutz, Mossakowski, & Rau, 2011; Ramos, Vandecasteele, & Devillers, 2013; Stadler, Lehmann, Höffner, & Auer, 2012). However, the integration of VGI data is still a challenging process in view of such factors as data heterogeneity, redundant information, topicality or the absence of information on scaling (Sester, Arsanjani, Klammer, Burghardt, & Haunert, 2014).

A first method to enrich 3D buildings with semantic information from freely available sources of Web 2.0 has been elaborated by Smart, Quinn, and Jones (2011). The authors integrated thematic information from geo-referenced locations (or buildings) in Wikipedia, OpenStreetMap and the web gazetteer Geonames. The main barriers to data integration are the absence of address information as well as positional inaccuracy of POI reference points resulting in data conflation.

Here, we propose a method to integrate VGI data for a specific application: the enhancement of population estimation models. The work focuses on the integration of OSM and topographic data, the utilization of the gained semantic information for dwelling and population mapping as well as a model validation based on a comparison of estimated values with reference data.

3. Input data

In our study we use two different sources of data. Firstly, a 3D building model containing building footprints and information on building height and building type is used. Secondly, OSM data serves as the basis to enrich the 3D buildings with additional information on non-residential uses. These datasets are presented in the following two sections.

3.1. 3D building model

Detailed urban modelling requires data on individual buildings, their size, shape and arrangement as well as the underlying usage.
According to the Statistical Office of the European Commission (Eurostat), a building is defined as a "roofed construction which: can be used separately; has been built for permanent purposes; can be entered by persons; is suitable or intended for protecting persons, animals or objects" (Eurostat, 2015). The 2D geometry of buildings (i.e. footprints) and their functional attributes are usually provided by national mapping and cadastral agencies or, in large cities, by survey offices. Building representation in authoritative databases is well-defined through feature catalogues. The geometry is captured either by traditional surveying techniques, by the digitization of large-scale maps or the analysis of aerial imagery.

In Germany, all cadastral building polygons are compiled in a homogeneous dataset called Amtliche Hausumringe Deutschland (HU-DE). The federal surveying and mapping authorities of the German Federal States are responsible for the production, whereas the dataset is made available nationwide by the Zentrale Stelle für Hauskoordinaten, Hausumringe und 3D-Gebäudemodelle (ZSHH). However, these building polygons do not provide any attribute information, and thus additional sources of data are required. With the help of land use information from datasets such as the ATKIS Base-DLM, the building stock can be further classified according to the functional usage, whether residential, commercial or public use. This information on usage refers to entire buildings and does not take account of mixed-use structures within an individual building.

To enhance dwelling and population mapping, building polygons can be enriched with additional information on height by combining 2D geometry with, for example, airborne laser scanning data. While large cities often provide such 3D modelling of buildings at different levels of detail (LoD), similar datasets are not always available or accessible for large study areas. LoD is a concept of CityGML, a common information model for the representation, storage and exchange of virtual 3D city and landscape models, defined by the Open Geospatial Consortium (OGC). LoD ranges from a prismatic building model with flat roofs (LoD 1) up to a fully textured building model containing walls and roofs with interior structures (LoD 4).

In population and dwelling mapping, it is also useful to differentiate residential buildings according to building type (Meinel et al., 2009). When this information is unavailable, data on building type can be obtained using an automated classification process developed by Hecht (2014), which allows the classification of residential buildings into various building types. Table 1 shows nine residential building types that can be classified in this way. Thus building footprints can be supplemented with the following information: building area, building type and building height. For each building type, various assumptions can be made regarding the characteristics of buildings (e.g. storey height, dwelling unit size), which provides the foundation for the dwelling and population estimates discussed later in this contribution (Section 4.2.3).

Table 1
Building types of the building model.

Abbr.     Building type

Residential usage
MFH-C     Multi-family house in closed block development (e.g. Wilhelminian style)
MFH-O     Multi-family house in open block development
MFH-TR    Traditional row house
MFH-IR    Industrial row house
MFH-HR    High-rise building with a height greater than 22 m
SFH-D     Single/two family house (detached)
SFH-T     Single/two family house (terraced)
SFH-SD    Single/two family house (semidetached)
SFH-R     Rural house

Non-residential usage
IC        Industrial/commercial building
SF        Specific functional area (e.g. public, education, health)

3.2. Non-residential information from VGI (OpenStreetMap)

Official data on buildings does not usually contain any information on mixed usage such as a partial non-residential usage within a residential building. OSM data, in particular the tags (key-value pairs) of points of interest (POI), indicates the spatial location of bakeries (shop = bakery) or medical practices (amenity = doctors), to name two examples. In contrast to the restricted availability of official spatial data, the OSM dataset is freely available, and provides a large amount of information that can be used to describe a building in terms of usage. The model introduced in this paper integrates data taken from point, line and polygon features into the 3D building model.
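To make the role of key-value pairs concrete, the following sketch reads a tiny, made-up OSM XML fragment with Python's standard library and keeps only nodes that carry one of the two example tags named above. The sample data, the lookup set `NON_RESIDENTIAL_TAGS` and the function name are illustrative assumptions, not part of the model described in this paper.

```python
import xml.etree.ElementTree as ET

# Made-up OSM XML fragment; ids, coordinates and names are purely illustrative.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="51.0500" lon="13.7400">
    <tag k="shop" v="bakery"/>
    <tag k="name" v="Example Bakery"/>
  </node>
  <node id="2" lat="51.0510" lon="13.7410">
    <tag k="amenity" v="doctors"/>
  </node>
  <node id="3" lat="51.0520" lon="13.7420"/>
</osm>"""

# Hypothetical lookup holding only the two examples from the text.
NON_RESIDENTIAL_TAGS = {("shop", "bakery"), ("amenity", "doctors")}


def non_residential_pois(osm_xml):
    """Return nodes whose tags contain at least one non-residential key-value pair."""
    root = ET.fromstring(osm_xml)
    pois = []
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        usage = [kv for kv in tags.items() if kv in NON_RESIDENTIAL_TAGS]
        if usage:
            pois.append({
                "id": node.get("id"),
                "lat": float(node.get("lat")),
                "lon": float(node.get("lon")),
                "usage": usage,
            })
    return pois


pois = non_residential_pois(SAMPLE_OSM)
for poi in pois:
    print(poi["id"], poi["usage"])
```

A real extract would be far larger and would also contain ways and relations; the sketch only shows how tags attached to point features can be filtered by usage.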
The model to estimate non-residential floor space makes use of data in the form of OSM points of interest (POI), polygons tagged as buildings with additional use information and streets tagged as pedestrian zones.

4. Method

In this section we introduce a method for the semantic enrichment of individual building footprints using VGI on non-residential usage taken from OpenStreetMap. Fig. 1 illustrates the workflow of the procedure and its validation based on reference data. In our approach the input data consists of the initial building model, VGI data and a reference dataset. The building model includes the 2D geometry and attributive information on building area, height and type. VGI data from the OSM project contains suitable geometric and semantic information to model non-residential usage within each building footprint. Small-scale statistical data on population and dwelling units serve as reference data to validate the model results.

[Figure: workflow diagram. Inputs: VGI dataset, building model, reference data. Intermediate result: enriched building model. Steps: (1) Data integration: pre-processing, semantic integration, geometric integration. (2) Modelling non-residential floor space: typology, parameter, calculations. (3) Dwelling unit and population estimation: determination of building-based parameters, calculation of dwelling and population numbers. (4) Model validation: aggregation of model results at block level, comparison with reference dataset, calculation of validation parameters.]

Fig. 1. Workflow of model to estimate non-residential floor area.

4.1. Data integration

When working with more than one spatial dataset, it is vital that data be correctly integrated into the GIS environment. Thus aspects such as data type, coverage, the reference system and spatial accuracy have to be considered (Flowerdew, 1991). Data integration is driven by the aim either to derive new knowledge from the combined output dataset or to mutually exchange attributes of the two fused datasets (Goesseln & Sester, 2003). It also plays an important role in the harmonization of data across national borders (Gedrange, Neubert, & Röhnert, 2011). In order to integrate two datasets it is necessary to know the ways in which they represent geometric and semantic data. Official geo-topographical base data and user-generated spatial data in the form of VGI differ in their geometric and semantic modelling, and thus careful data integration is essential. The following steps must be observed when integrating OSM data.

4.1.1. OSM data import

In a first step VGI data must be converted into a form suitable for processing in a GIS environment. OSM extracts (countries, states) reflecting the complete dataset at a certain point in time


can be downloaded from data providers such as Geofabrik (see footnote 4). Available data formats are ESRI Shapefile, the OSM XML file (*.osm) and the Protocolbuffer Binary Format (PBF) (*.osm.pbf). Smaller extracts such as cities can be created using, for example, the Java application Osmosis. There are several import options to load OSM data into a GIS, depending on the data format, the chosen GIS environment and the size of the data extract. For a comprehensive description of the various data formats and import possibilities see Ramm, Topf, and Chilton (2010). For an extensive study such as the one presented in this paper, which requires OSM point, polygon and line data covering the entire range of attributes (tags), the raw OSM XML format is best suited and was therefore chosen.

4.1.2. Semantic integration of OSM information

To allow semantic integration, semantic models from each of the datasets must first be compared to reveal any disparities. Since the official dataset is the target dataset, all relevant OSM objects are assigned to one of the land use classes specified by the building model. In order to integrate the OSM objects by means of this classification, the semantic meaning from OSM, stored in the particular key-value combinations (tags), must be analysed. The most commonly used OSM tags are listed as Map Features in the OSM Wiki and can be used as a basis for the selection of all relevant OSM tags. The most relevant tags are those describing any usage within a building, such as the key-value pairs "amenity = bar", "building = hospital" or "landuse = residential". Afterwards, these key-value pairs are assigned to either residential or non-residential usage. Table 2 gives a short extract of the OSM-tag classification. A complete list of key-value pairs describing non-residential usages relevant for the further processing can be found in the Appendix.

4.1.3. Geometric integration of OSM information

Geometric data integration presupposes that data is available in a uniform reference system. In a first step the OSM dataset (WGS84 with geographic coordinates) must be transformed into the reference system specified by the official building dataset (usually European Terrestrial Reference System 1989 with UTM coordinates). In a second step the geometric features are integrated according to defined matching methods. Spatial intersection is a common approach in GIS and can be used to match point features to polygons. Positional shifting due to varying acquisition accuracies can result in mismatching allocations. Another problem occurs when polygon features overlap: more than one feature can then correspond to a target object, which prevents a unique mapping. Four different types of relevant OSM features are integrated here:

• Points of interest (point)
• Street (line)
• Building (polygon)
• Land use (polygon)
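The point-to-polygon matching and the effect of positional shifts can be sketched in plain Python. The example assumes projected coordinates in metres, simple polygons without holes, and a hypothetical tolerance parameter `tol` that also accepts points lying slightly outside a footprint; all function names and coordinates are illustrative.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices without a closing duplicate."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(x, y, x1, y1, x2, y2):
    """Shortest distance from point (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def matches(x, y, ring, tol):
    """True if the point lies inside the ring or within tol metres of its boundary."""
    if point_in_polygon(x, y, ring):
        return True
    n = len(ring)
    return any(
        distance_to_segment(x, y, *ring[i], *ring[(i + 1) % n]) <= tol
        for i in range(n)
    )


def match_points(points, footprints, tol):
    """Map each point id to the list of footprint ids it matches."""
    return {
        pid: [fid for fid, ring in footprints.items() if matches(x, y, ring, tol)]
        for pid, (x, y) in points.items()
    }


# Hypothetical footprints (two neighbouring buildings) and POI positions.
footprints = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(12, 0), (22, 0), (22, 10), (12, 10)],
}
points = {"p1": (5, 5), "p2": (10.5, 5), "p3": (11, 20)}

result = match_points(points, footprints, tol=1.0)
print(result)
```

Point `p2` lies just outside footprint A and is only matched because of the tolerance. Raising `tol` to 2.0 would assign it to both A and B, which illustrates how positional inaccuracy can produce either missing or duplicate allocations.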

The OSM points of interest (POI) such as shops, businesses, etc. are the primary source of data on non-residential usage. By means of a semantic query only those POIs were selected that describe non-residential usage. For OSM polygons, two different feature classes were created according to the classification on usage and their key information: non-residential buildings and land use polygons. Street line features do not directly indicate non-residential space, but contain information to help pinpoint non-residential usage, e.g. along pedestrian zones.

Footnote 4: http://download.geofabrik.de/.

Table 2
Classification of OSM tags by usage (OSM key-value combinations).

Residential usage
Building = apartments, house, residential, terrace, dormitory
Landuse  = residential, ...

Non-residential usage
Building = hotel, shop, clothes, bag, optician, furniture, ...
Building = school, university
Amenity  = college, school, university, driving_school, ...

Depending on their geometry type, features are processed differently. Points of interest (POI) can be matched through a

point-polygon intersection, although a buffer needs to be applied to overcome spatial inaccuracies in the OSM POI dataset (Strunck, 2010). An appropriate buffer size to rectify duplicate and missing point assignments is 2 m (Kunze, 2013). The intersection and transfer of POI attributes to the building footprints can be performed by means of the "Spatial Join" function of ESRI ArcGIS.

A polygon-to-polygon matching process is needed to match OSM building tags to the corresponding official building footprint. An object-based method is best suited for this kind of attribute transfer from one dataset to another: OSM polygon features are reduced to their centroids, which are subsequently intersected with the official building footprints (Hecht et al., 2013). In the case of land use polygons (e.g. tagged with landuse = commercial), the centroid of the building footprint is used to transfer the tag information. Land use tags are particularly helpful when buildings are only tagged "building = yes" and no specific usage information is available.

Line features with the OSM tag highway = pedestrian were used to identify non-residential buildings primarily in commercial city centres. All building footprints larger than 800 m² and less than 20 m from a pedestrian line element that were initially classified as residential were reclassified as industrial/commercial buildings.

4.2. Modelling of non-residential floor area

4.2.1. Extent of non-residential usage

To determine the non-residential floor area (also called non-residential floor space) within buildings it is essential to describe the extent of the non-residential areas within the building. A fashion store (shop = clothes), a bakery (shop = bakery) and a school (amenity = school) are all common non-residential uses, but clearly differ in size. The dimension of a non-residential space can be specified by its floor area, the number of floors or a percent ratio.
Due to the absence of reliable data on typical floor areas of shops, businesses, etc., non-residential parts are modelled on the assumed number of storeys. Thus a parameter called non-residential floor value (nrfv) is set for each relevant OSM key-value pair, given an integer value between 0 and x, where x is less than or equal to the total number of storeys (sn). For buildings that are entirely residential the value is 0, whereas for a building completely given over to non-residential usage the non-residential floor value equals the storey number (nrfv = sn).

- nrfv = 0: No non-residential space
- nrfv = 1, 2, ..., x: Partly non-residential space
- nrfv = sn: Full non-residential space
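To illustrate how such a parameter could be attached to key-value pairs, the following is a minimal sketch and not the assignment table actually used. The names `NRFV_TABLE`, `FULL` and `resolve_nrfv` are hypothetical; the four entries mirror examples named in this paper, and treating unknown tags as purely residential is an assumption.

```python
from typing import Dict, Tuple, Union

# Sentinel meaning "all storeys of the building" (nrfv = sn).
FULL = "sn"

# Hypothetical excerpt of an assignment table: (key, value) -> nrfv.
# The complete table is given in the Appendix of the paper.
NRFV_TABLE: Dict[Tuple[str, str], Union[int, str]] = {
    ("building", "house"): 0,
    ("amenity", "bar"): 1,
    ("shop", "bakery"): 1,
    ("building", "university"): FULL,
}


def resolve_nrfv(key: str, value: str, sn: int) -> int:
    """Return the non-residential floor value of one tag, bounded by 0 and sn."""
    # Assumption: tags missing from the table carry no non-residential use.
    raw = NRFV_TABLE.get((key, value), 0)
    nrfv = sn if raw == FULL else int(raw)
    return max(0, min(nrfv, sn))
```

For a four-storey building, `resolve_nrfv("building", "university", 4)` yields the full storey number, whereas `resolve_nrfv("amenity", "bar", 4)` yields a single storey.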

Please cite this article in press as: Kunze, C., & Hecht, R. Semantic enrichment of building data with volunteered geographic information to improve mappings of dwelling units and population. Computers, Environment and Urban Systems (2015), http://dx.doi.org/10.1016/j.compenvurbsys.2015.04.002


C. Kunze, R. Hecht / Computers, Environment and Urban Systems xxx (2015) xxx–xxx

In addition to the extent of the non-residential usage (expressed in the number of floors) we specify another parameter describing the vertical position of the storey(s) with non-residential use. We distinguish between businesses mostly found on the ground floor (i.e. those with walk-in customers) and those which are also found on an upper floor. This step is essential if more than one non-residential use object exists per building polygon. Therefore we introduce the parameter category (cat):

- cat = 0: Non-residential usage in the whole building
- cat = 1: Non-residential usage only on the ground floor
- cat = 2: Non-residential usage also on an upper floor

Both parameters are defined for each OSM feature depending on their semantic information (key-value pairs). An assignment table containing all possible tags with usage information was used to match the non-residential floor area parameters to the corresponding OSM objects (see Appendix). The figures in Table 3 illustrate the various forms of non-residential usage within buildings.

4.2.2. Calculation of non-residential floor area

Areas of non-residential use are calculated for all buildings classified as residential according to Table 1. All industrial/commercial buildings (IC) or buildings with specific functional usage (SF) are omitted from further analysis. As OSM point-objects are spatially linked to the official building footprints, an algorithm is required to calculate the actual non-residential floor area for the whole building considering that there can be more than one non-residential object within the official polygon. Using the official building footprint ID, corresponding non-residential objects were selected and processed according to their non-residential parameters. The algorithm uses the non-residential floor value (nrfv) to calculate the non-residential floor area using the category value to indicate whether the nrfv has to be summed up or not.
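The per-building aggregation can be sketched in Python as follows. This is one reading of the simplified pseudo-code in Table 4, not the original implementation. The function name `building_nrfa` is hypothetical, and capping the result at the storey number is an added assumption.

```python
from typing import Iterable, Tuple


def building_nrfa(objects: Iterable[Tuple[float, int]], sn: int) -> float:
    """Aggregate the (nrfv, cat) pairs of one building into its nrfa (in storeys)."""
    first = 0      # becomes 1 once any ground-floor use (cat = 1) is found
    upper = 0.0    # summed storeys of uses that may lie on upper floors (cat = 2)
    for nrfv, cat in objects:
        if cat == 0:
            # Whole-building use: remaining objects need not be processed.
            return sn
        if cat == 1:
            # Several ground-floor uses still occupy only one storey.
            first = 1
        elif cat == 2:
            upper += nrfv
    # Assumption: the result cannot exceed the number of storeys.
    return min(first + upper, sn)


# The two examples of Table 4, here for a building with sn = 4 storeys.
example_1 = building_nrfa([(1, 1), (4, 0), (1, 1)], sn=4)   # bakery, hotel, hair stylist
example_2 = building_nrfa([(1, 1), (1, 1), (1, 2)], sn=4)   # bakery, hair stylist, lawyer
```

Dividing the returned value by `sn` gives the relative non-residential floor area of the building.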
If the ground floor of the building contains at least one non-residential object (cat = 1 for e.g. shop = bakery), the whole ground floor is assumed to be non-residential space. In this case, possible residential space on the ground floor neighbouring the bakery is not considered. Two examples of how the algorithm specifies the non-residential floor area are shown as simplified pseudo-code in Table 4. Three additional variables were introduced into the algorithm. The default values for the ground floor (first) and upper floors (upper) as well as the non-residential floor area (nrfa) were set at the beginning of the calculation. The calculation was finished as soon as a loop ran through all official buildings by means of their identification number and their nrfa was calculated. In OSM multiple tags with the same meaning can appear. A building may thus be tagged as “building = hotel” and “tourism = hotel”. In addition, several tags can describe a non-residential

usage with different levels of detail. A POI, for example, can be tagged as “historic = castle”, “tourism = museum” and “shop = ticket” at the same time. Therefore a priority value (rank) is introduced defining the precedence of a certain key over another (see Appendix). The higher the rank the more specific is the usage information. If, for example, a key with a priority value of 1 is available for a building, all other keys with lower priorities are disregarded in the processing since they do not provide any further concretisation. The above-mentioned algorithm is applied to all point objects (POIs and the derived building polygon centroids) to calculate the non-residential floor area for each building. OSM line features and land use polygons are treated differently. Those buildings primarily classified as residential buildings are reclassified as buildings with solely non-residential usage if they match predefined criteria (e.g. distance to pedestrian streets; see Section 4.1.3). In a final step the relative non-residential floor area is calculated for each official building polygon.

4.2.3. Estimation of dwelling units and population

Some assumptions must be made in order to estimate the number of dwelling units and population of a building. These are derived from typical building parameters as described in the literature. Table 5 shows the parameter settings of the nine building types. Based on the building height (bh), building area (ba) and the assumed storey height (sh), it is possible to calculate the number of stories (st) and the floor area (fa) for each building, which function as basic building parameters to determine the population and number of dwellings.
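The calculation chain of this section (formulas as listed in Table 6) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `TypeParams` and `estimate_building` are hypothetical, the storey number is not rounded, and the non-residential floor area (nrfa, in storeys) is simply subtracted from the storey number before the living space is derived.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TypeParams:
    sh: float  # assumed storey height in metres
    cf: float  # conversion factor applied to the floor area
    ds: float  # average dwelling unit size in square metres
    p: float   # average number of persons per dwelling unit


def estimate_building(bh: float, ba: float, params: TypeParams,
                      nrfa: float = 0.0) -> Tuple[float, float]:
    """Return (dwelling units, inhabitants) for one building."""
    st = bh / params.sh                   # number of stories
    residential_st = max(st - nrfa, 0.0)  # assumption: nrfa reduces the stories
    fa = residential_st * ba              # floor area
    ls = fa / params.cf                   # living space
    nd = ls / params.ds                   # number of dwelling units
    ni = nd * params.p                    # number of inhabitants
    return nd, ni


# Parameter values of type MFH-C as given in Table 5; the building itself is invented.
mfh_c = TypeParams(sh=3.5, cf=1.25, ds=70.0, p=1.8)
without_osm = estimate_building(bh=14.0, ba=250.0, params=mfh_c)
with_osm = estimate_building(bh=14.0, ba=250.0, params=mfh_c, nrfa=1.0)
```

Running the function once without and once with `nrfa` corresponds to the two calculation passes used for validation.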
The estimation of the number of dwelling units and population is based on the derived floor area with residential use (also called living space), which is determined by means of the building area multiplied by the calculated number of stories and reduced by a conversion factor (cf) to take account of stairwell and hall areas. In order to calculate the number of dwelling units within the building, the living space is divided by the average dwelling unit size (ds). This statistical parameter varies according to building type, ranging from 60 m² to 120 m² (see Table 5). The final step is to multiply the calculated number of dwelling units (nd) by the parameter describing the average number of inhabitants per dwelling unit (p). The descriptions and calculations of all building parameters used for the estimation of dwelling units and population are displayed in Table 6.

4.3. Model validation

As no appropriate reference data on non-residential floor area are available, the estimated non-residential floor area has to be validated indirectly. The validation compares estimated building populations and numbers of dwelling units with official statistical data at city block level. A

Table 3 Forms of non-residential usage within buildings.

| Description | Non-residential floor value (nrfv) | Category (cat) | Example |
|---|---|---|---|
| Non-residential usage in the whole building | sn | 0 | building = university (nrfv = sn; cat = 0) |
| Non-residential usage in several storeys | 1 ... sn | 2 | office = it (nrfv = 1.5; cat = 2) |
| Non-residential usage on the ground floor | 1 | 1 | amenity = bar (nrfv = 1; cat = 1) |
| No non-residential usage (residential building) | 0 | – | building = house (nrfv = 0) |


Table 4 Two examples of determining the non-residential floor area (nrfa) for an entire building.

Input: Official building footprints; several non-residential objects
Output: Official building footprints with one non-residential floor area value (nrfa)

Example 1
Non-residential objects from OSM:
    Bakery: nrfv = 1; cat = 1
    Hotel: nrfv = sn; cat = 0
    Hair stylist: nrfv = 1; cat = 1
Initialize:
    first = 0; upper = 0; nrfa = 0
Pseudo code:
    # 1st object
    if cat == 1 and first == 0
        first = 1
    # 2nd object
    if cat == 0
        nrfa = sn
        break
    # 3rd object is not processed
Results:
    nrfa = sn

Example 2
Non-residential objects from OSM:
    Bakery: nrfv = 1; cat = 1
    Hair stylist: nrfv = 1; cat = 1
    Lawyer: nrfv = 1; cat = 2
Initialize:
    first = 0; upper = 0; nrfa = 0
Pseudo code:
    # 1st object
    if cat == 1 and first == 0
        first = 1
    # 2nd object
    if cat == 2
        upper = nrfv + upper
        upper = 1
    # 3rd object
    if cat == 1 and first <> 0
        first = 1
    # outer loop
    if nrfa == 0 then
        nrfa = first + upper
Results:
    nrfa = 2

Table 6 Building parameters for the estimation of dwelling units and population.

| Building parameter | Description | Calculation/Source |
|---|---|---|
| Building height (bh) | Height of the building in metres | 3D-building model |
| Building area (ba) | Size of the building area in square metres | 3D-building model (building footprints) |

Table 5 Building parameters for each of the building types.

| Code | Shortcut | Storey height (m) (a) | Conversion factor (b) | Dwelling unit size (m²) (c) | Number of persons per dwelling unit (d) |
|---|---|---|---|---|---|
| 11 | MFH-C | 3.5 | 1.25 | 70 | 1.8 |
| 12 | MFH-O | 3.5 | 1.25 | 70 | 1.8 |
| 21 | MFH-TR | 3.0 | 1.25 | 60 | 1.8 |
| 22 | MFH-IR | 3.0 | 1.25 | 60 | 1.8 |
| 23 | MFH-HR | 3.0 | 1.25 | 60 | 1.8 |
| 31 | SFH-D | 3.0 | 1.25 | 120 | 2.4 |
| 32 | SFH-T | 3.0 | 1.25 | 100 | 2.4 |
| 33 | SFH-R | 3.0 | 1.40 | 100 | 2.4 |
| 34 | SFH-SD | 3.0 | 1.25 | 100 | 2.4 |

MFH = multi-family house. SFH = single/two family house. Taken from: (a) Meinel et al. (2008), Gruhler, Böhm, Deilmann, & Schiller (2002). (b) Müller & Korda (1999), Schiller & Bräuer (2013). (c) Meinel et al. (2008). (d) Destatis (2012).

city block, sometimes referred to as an urban block or street block, is usually defined as a group of plots bounded by street lines (Conzen, 1960) whose geometry can be derived by a decomposition of the street network (Jiang & Liu, 2012) or through a delineation process from topographic maps (Muhs, Herold, Meinel, & Burghardt, 2013). To determine the benefit of the semantic enrichment with VGI, the calculation is carried out twice – initially with the original number of stories and then with the reduced number

Table 6 (continued)

| Building parameter | Description | Calculation/Source |
|---|---|---|
| Storey height (sh) | Average height of a storey (depending on building type) | Meinel et al. (2008) and Gruhler et al. (2002) |
| Conversion factor (cf) | The ratio between living space and total floor area | Müller and Korda (1999) |
| Dwelling unit size (ds) | Average size of a dwelling unit in square metres (depending on building type) | Meinel et al. (2008) |
| Number of persons per dwelling unit (p) | Average number of persons living in a dwelling unit (depending on building type) | Taken from statistical surveys |
| Number of stories (st) | Number of stories of a building | st = bh/sh |
| Floor area (fa) | Usable area within the whole building | fa = st × ba |
| Living space (ls) | Residential space within the whole building | ls = fa/cf |
| Number of dwelling units (nd) | Number of dwelling units within the whole building | nd = ls/ds |
| Number of inhabitants (ni) | Number of inhabitants within the whole building | ni = nd × p |

of stories resulting from the OSM non-residential floor area. This produces a dataset on city blocks that can be used to determine all sorts of useful parameters such as population and dwelling unit densities as well as variations in building quality. In our analysis we focused on two indicators to compare the population and dwelling unit values without and with OSM data on usage. The first indicator is the relative difference between the calculated results and the validation data, interpreted as an over- or underestimation of the modelling results. Another way to compare the findings is to look at the block-based densities of population and dwelling units. As absolute values, these density parameters allow easy comparison of calculated data with the reference dataset. Using these two indicators it is possible to visualize the results of the present study, thereby aiding discussion. Another useful indicator for evaluating modelling results is the building type. Through an aggregation process (Hecht et al., 2013), the dominant building type can be determined for each city block. This information can be used to realize a more differentiated validation of estimates on population and number of dwelling units according to building type.
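The two indicators can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, the sign convention (positive values mean overestimation) is an assumption, and the area unit of the density is left to the caller because it is not specified here.

```python
def relative_difference(estimate: float, reference: float) -> float:
    """Signed deviation of an estimate from the reference value, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    # Assumed convention: positive = overestimation, negative = underestimation.
    return (estimate - reference) / reference * 100.0


def density(count: float, block_area: float) -> float:
    """Inhabitants or dwelling units per unit of city block area."""
    if block_area <= 0:
        raise ValueError("block_area must be positive")
    return count / block_area


# Invented block: 125 estimated versus 100 reference inhabitants on 2.0 area units.
block_error = relative_difference(125.0, 100.0)
block_density = density(125.0, 2.0)
```

Blocks without reference values (for example anonymized blocks) would have to be skipped before calling `relative_difference`.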

5. Study area and employed data sets

The area investigated is the city of Dresden, located in the east of Germany (see Fig. 2). Dresden is the capital city of the Free State of Saxony and has a population of around 530,000. In our study we distinguish between the city centre and the urban outskirts (city area minus the city centre).

5.1. Creation of 3D building model

For this study a classified 3D building model (LoD1) (see Section 3) was developed using official building polygons (HU-DE from 31 March 2011), a 3D-City model of Dresden and land use information from the Digital Basic Landscape Model (Basic DLM)


Fig. 2. The study area of Dresden and its location in Germany.

of the Authoritative Topographic-Cartographic Information System (ATKIS®, 2011). Since the official building polygons do not contain any attributes, usage information from ATKIS® Base-DLM and building heights from the 3D-City model of Dresden were attached to the building footprints. Four main classes of use are distinguished within the ATKIS dataset: Residential Area (class 2111), Industrial, Commercial Area (class 2112), Mixed Use Area (class 2113) and Specific Functional Area (class 2114). Based on this data, the entire stock of buildings was classified according to the typology described in Section 3. Table 7 gives an idea of the city’s built structure and mean values of selected building parameters used for further modelling.

Table 7 Statistics of the 3D building model. City of Dresden.

| Building types | No. of buildings | Mean building area (m²) | Mean floor area (m²) | Mean number of storeys |
|---|---|---|---|---|
| MFH-C | 4285 | 246.3 | 1124.1 | 4.5 |
| MFH-O | 15,478 | 210.9 | 641.3 | 2.8 |
| MFH-TR | 7126 | 187.4 | 864.8 | 4.5 |
| MFH-IR | 4186 | 195.6 | 1226.3 | 6.1 |
| MFH-HR | 67 | 712.6 | 9311.1 | 13.7 |
| SFH-D | 13,967 | 100.6 | 215.4 | 2.1 |
| SFH-T | 4213 | 64.0 | 160.9 | 2.6 |
| SFH-R | 3923 | 164.9 | 363.4 | 2.1 |
| SFH-SD | 5902 | 75.4 | 164.5 | 2.2 |
| IC | 27,345 | 309.2 | n.a. | n.a. |
| SF | 1376 | 913.7 | n.a. | n.a. |
| Total | 87,868 | | | |

5.2. OpenStreetMap data

An OSM extract for the Free State of Saxony was downloaded on 18 June 2013 from the Geofabrik website to be used as input data. After pre-processing regarding non-residential use, OSM data for the study area of Dresden consisted of 3650 POIs, 1757 non-residential buildings, 785 land use polygons and 158 lines tagged as pedestrian highways.

5.3. Reference data

For the purpose of model validation, data on population and dwelling units for 5790 city blocks in Dresden were obtained from the local statistics department (Statistikstelle der Landeshauptstadt Dresden). These datasets are as of 31 December 2012. It must be pointed out that the dwelling unit and population numbers are incomplete because some city blocks are anonymized (387 blocks with no data on population and 823 blocks with no number of dwelling units). Those city blocks were ignored in the following analyses.

6. Results

6.1. Descriptive results of non-residential floor area

There are a total of 87,885 reference buildings within the city of Dresden, of which 58,227 contain residents, i.e. are classified as either residential or mixed use buildings. The outcome dataset contains all buildings along with their building information, such as the number of storeys, as well as the non-residential floor area as a total number of storeys or a relative value. In 1749 buildings (3% of all residential/mixed use buildings), the relative proportions


of non-residential floor areas ranged between 6% and 100%. These are generally instances of non-residential usages with a floor area value of one storey. The following diagram (Fig. 3) visualizes the proportion of non-residential floor area in a histogram. It can be seen that the proportion of non-residential usage in buildings lies either in the range 10% to 30% or is equal to 100% (whole building). The map extract (Fig. 4) shows all residential buildings colour coded by their relative non-residential floor area within the Altstadt area of Dresden. Non-residential POIs from OSM are also visualized. Most of the darker polygons displaying up to 100% non-residential floor area are large in size and rectangular in shape. These polygons seem to be non-residential buildings for which non-residential space has been (largely) correctly identified. Residential row houses in the south-west of Dresden’s Altstadt do not contain any non-residential OSM objects and therefore have no value for non-residential floor area.

Fig. 3. Distribution of relative non-residential floor area (histogram; x-axis: proportion of non-residential floor area in %; y-axis: number of buildings).


6.2. General results on population and number of dwellings

As described in Section 4.3, dwelling and population values (with and without OSM) were calculated and compared to the reference data in order to evaluate the determined non-residential space. The following two bar charts in Fig. 5 show the population values and number of dwelling units for the whole dataset, containing the reference values and the calculated results without and with integrated OSM data. There is a significant difference between the reference values and the determined results for the study area of Dresden; both of the investigated parameters (number of dwellings and inhabitants) are overestimated by around 25%. A reduction in the calculated figures for dwelling units and population by incorporating non-residential information from OSM is also identifiable, though less pronounced. The use of OSM data helped reduce the level of error within the whole dataset by approx. 12%.

6.3. Dwelling units and population at city block level

A detailed analysis of the number of dwelling units and inhabitants was undertaken for the 11,539 city blocks of Dresden. At this level the dwelling and population densities as well as disparities between the calculated results and the reference data can be accurately investigated and visualized. The model results are aggregated for city blocks so that all required parameters such as the number of dwelling units, inhabitants or stories have to be summed. The following choropleth map (Fig. 6) shows the study area of Dresden with population density per city block as calculated from reference data. The most densely populated city blocks are situated at a distance of approx. 5 km from the city centre. It can be

Fig. 4. Proportions of non-residential floor area of buildings in Dresden’s Altstadt.


assumed that the overestimated values for population and dwelling units are largely to be found in this zone, because most of the commercial use is also located here. Improvements in the approach of using OSM to identify non-residential usage can be analysed by calculating the differences between population densities in each city block without and with non-residential floor area. Fig. 7 shows the absolute difference between the population density of the reference data and the results after OSM data integration. A blown-up section of the commercial city centre reveals large differences (highlighted in red) in calculated values for individual blocks. To better understand existing inaccuracies within particular city blocks and to find ways to improve the estimation model, it is useful to consider building types in greater detail. The bar charts in Fig. 8 show the total number of dwelling units and population for each building type within city blocks (anonymized city blocks

are left out) for the entire study area. Next to the reference values (dark grey) we see the calculated results without (light grey) and with OpenStreetMap data (grey). It is apparent that the most common type of dwelling in city blocks featuring residential buildings in Dresden is open multi-family houses (MFH-O) followed by traditional row multi-family houses (MFH-TR). Rural houses (SFH-R) and semidetached single/two family houses (SFH-SD) are less frequently the dominant building type in the study area. Although the figures for population are based on the estimated number of dwelling units, some building types show clear differences between their reference and calculated values. For example the dominant building type of the high rise building is characterized by an underestimation of dwelling units, but a slight overestimation of inhabitants. A detailed study of the differences between the number of dwelling units and population for each building type could reveal the constraints of the population estimation model

Fig. 5. Comparison of numbers of dwelling units and inhabitants (bar charts). Dwelling units: reference 291,658; without OSM 358,640.63; with OSM 348,037.01. Population: reference 529,922; without OSM 671,150.83; with OSM 651,844.32.

Fig. 6. Population density in city blocks for the study area of Dresden.


Fig. 7. Absolute difference between population densities derived from reference data and from calculations with OSM integration.

Fig. 8. Distribution of (a) dwelling units and (b) population for all building types within the whole study area (bar charts per building type MFH-C, MFH-O, MFH-TR, MFH-IR, MFH-HR, SFH-D, SFH-T, SFH-R and SFH-SD; series: reference, estimated without using OSM, estimated using OSM).

and its underlying parameters. These characteristics were previously discussed in Section 5.

6.4. Comparing the city centre with the outskirts

In view of the different building structures and population density between the city centre (inner city) and the urban outskirts (see Fig. 6), further investigations were undertaken to analyse both areas separately. The boundary of the inner city zone was

borrowed from a planning concept of the city of Dresden regarding inner city development (City of Dresden, 2008). The city centre area can be seen highlighted in the previous figure of the study area (Fig. 2). Typical housing forms in the city centre are multi-family houses, mostly closed, traditional or industrial row houses, as well as high-rise buildings. In contrast, the rest of the city is made up of a large number of blocks with open multi-family houses and detached/semidetached or terraced single/two family houses.


An analysis was carried out for these two areas on the dominant building types presented in the previous section. For the most common building types of the inner city, a significant overestimation in calculated values was found in contrast to the reference data. The decrease in population density as a result of the integration of OSM data also varies between the individual building types. For instance, in city blocks dominated by closed multi-family houses, the reduction in estimated population by considering non-residential use areas was larger than for other multi-family houses. As expected, single/two family houses (SFH) are generally located outside the city centre. From the data in Fig. 9, it is apparent that the calculated population within single/two family houses (except for rural buildings) is largely the same as the reference value. Furthermore, the differences between the reference values and the calculated results in all of the other building types are smaller than in the city centre. In fact an underestimation in calculated population can be seen in the case of two of the multi-family house classes (open and industrial row). Error reduction of the estimated population by means of integrated OSM objects is significantly higher in the city centre than in the outskirts. In particular, the disparity was reduced by more than 50% within the closed multi-family house.

Fig. 9. Population per building type (a) within the city centre and (b) in the outskirts (bar charts; y-axis: number of inhabitants; series: reference, estimated without using OSM, estimated using OSM).

6.5. Errors within particular building types

As shown in Fig. 9, the disparities between the reference data and the modelling results vary according to building type. Taking a closer look at the error ratio for each type, it becomes clear that closed multi-family houses (MFH-C) and traditional row houses (MFH-TR) are the main sources of error. The pie charts in Fig. 10 illustrate all building types with the corresponding proportion of total error within the estimated dataset without and with OSM integration. Three building types are particularly prominent due to high error ratios. Comparing the results before and after the integration of OSM objects and non-residential floor area estimation, it can be seen that the error within the closed block development (MFH-C) was significantly reduced. At the same time the relative contribution to total error of other classes such as rural houses (SFH-RH) or multi-family houses in open block development (MFH-O) became larger.

Fig. 10. Building types and corresponding proportions of the total error by estimating population without using OSM (left) and using OSM (right) (pie charts with percentage shares per building type).

7. Discussion

In contrast to earlier, more general studies, the current research shows a concrete case in which VGI, specifically OpenStreetMap


information, is used to improve the mapping of dwellings and population by developing building-based data sets that incorporate the number of non-residential floors. In contrast to other approaches, including an estimation of non-residential space for the whole building stock, the presented method considers the varying building structure and non-residential floor space at the level of buildings. Overestimations of dwelling units and inhabitants per building can be identified for multi-family houses and especially within the city centre. These can be largely attributed to undetected non-residential space due to patchy OSM data. As discussed in other VGI studies (see Section 2.3), the incompleteness and inconsistency of the OSM data set are currently the main barriers to working with this data in the field of research and for large-scale applications. Although we have not investigated the completeness of point, polygon or line elements to describe non-residential use, we can assume that overestimations, for example within traditional row houses (multi-family houses), are the result of unidentified shops and businesses. Instead we investigated a pattern within non-residential use information supplied by OSM that can be explained by data acquisition procedures largely carried out on-site using handheld GPS devices or on paper, as well as by digitizing aerial imagery. As copying from other data sources such as Google Maps is prohibited, only objects that are freely available or visible to OSM volunteers can be mapped. This means that shops and businesses that depend on walk-in customers are more ‘visible’ to mappers than, for example, doctors’ surgeries. Moreover, semantic information is only completed for OSM POI or polygon features that are rendered by popular mapping applications. This means that objects and information not visible in OSM maps are neglected by OSM mappers.
On the other hand, some existing semantic information cannot be used due to contradicting tag combinations or typing errors. Tagging errors may also appear due to different interpretations of OSM tags by various OSM mappers or due to incorrect spelling. In our research we focus on the most commonly used OSM tags listed in the “Map Features” on the OSM Wiki website, which act as an informal standard. These key-value pairs are approved and frequently used. According to the OSM taxonomy new tags can be established or old tags used in a new way. A solution to this may be the modification of the assignment table. This uncertainty of possible misinterpretations of several tags cannot be identified easily and was not considered in our calculations. It may be possible to verify a selection of OSM objects like hotels or malls, based on average building parameters like size or height. The final decision about tagging a building as e.g. a hotel or a hostel is in the hands of the OSM mapper and cannot be determined. Another source of error in the estimation of non-residential floor area is the varying size of shops and businesses that cannot be distinguished by the currently used OSM tag combinations. That is to say, a clothing store (shop = clothes) can in fact be a small boutique or a large multi-storey clothes store. Hence, the determination of areas of non-residential use within buildings is, in the case of large department stores and warehouses, prone to error. The applied non-residential tags do not provide any further information on the size of the shop or business that could help refine the modelling process. One way to compensate for some of the problems of extracting and utilizing the OSM non-residential use data is to apply semantic strategies, for example to develop or adapt an ontology containing information on building usage. One task in this respect could be to analyse the name=* tag to identify common department store chains that are clearly large in size.
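As a rough illustration of the suggested name=* analysis, a simple substring match against a list of known chains might look as follows. This sketch was not part of the study; the chain names are placeholders, and `LARGE_CHAINS` and `is_large_store` are hypothetical names.

```python
from typing import Dict

# Placeholder chain names; a real list would have to be compiled for the study region.
LARGE_CHAINS = {"example department store", "sample warehouse"}


def is_large_store(tags: Dict[str, str]) -> bool:
    """Return True if the name tag of an OSM object contains a listed chain name."""
    name = tags.get("name", "").strip().lower()
    return any(chain in name for chain in LARGE_CHAINS)


boutique = {"shop": "clothes", "name": "Small Boutique"}
chain_store = {"shop": "clothes", "name": "Example Department Store Dresden"}
flags = (is_large_store(boutique), is_large_store(chain_store))
```

An object flagged in this way could then be given a larger non-residential floor value than the default for its shop tag.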
Looking at the overall results before and after integrating OSM data, a relatively minor improvement can be observed. In order to bring the estimates of dwelling units and population closer to the reference data, official statistical data (usually available at municipality level) can be used to constrain them. Here, the building parameters living space (ls) and number of dwelling units (nd) serve as target units in the disaggregation process. This would certainly improve the model in terms of total errors at the city level, but would also affect the validation of results. In order to obtain realistic (unbiased) error estimates, official statistical data were not introduced into our research before the model validation phase. Nevertheless, this final step is still essential in the practical application of the model. In addition, further fine-tuning can be achieved by adjusting the initial building parameters.

The method has been tested and validated on one city in Germany. Further research on the transferability to other cities with comparable official data (e.g. 3D building model) is needed. Application to cities outside Germany, or even outside Europe, would require some adaptations due to regional differences in urban structure. In this case, building types and model parameters (esp. building parameters and the non-residential spaces of different usages) need to be reviewed and revised according to the local conditions. Because OSM is a worldwide project, the proposed procedure of integrating semantic information can theoretically be applied to any city. However, one of the biggest challenges is the spatially varying quality of data. OSM usually provides better coverage in urban areas, but completeness may vary between world regions due to different levels of contributor activity (Neis, Zielstra, & Zipf, 2013). Current developments of intrinsic quality measures (Barron, Neis, & Zipf, 2014) may help to identify areas with potentially low data quality. Another aspect is that the meanings of tags can change over time (Ballatore, Bertolotto, & Wilson, 2013). This requires keeping the corresponding assignment table up to date with regard to current developments of the OSM project.
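The constraint step mentioned earlier in this section, in which official municipality-level totals are used to adjust the building-level estimates, can be sketched in a minimal form. The sketch assumes simple proportional scaling of all buildings by one factor; the text does not specify the exact adjustment, and the function name and the example figures are hypothetical.

```python
def constrain_to_total(estimates, official_total):
    """Scale building-level estimates so that they sum to an official total.

    estimates: dict mapping a building id to an estimated value
               (e.g. number of dwelling units or inhabitants).
    official_total: the municipality-level figure from official statistics.
    """
    modelled_total = sum(estimates.values())
    if modelled_total <= 0:
        raise ValueError("modelled total must be positive")
    factor = official_total / modelled_total
    # Every building is scaled by the same factor, so the relative
    # distribution between buildings is preserved.
    return {bid: value * factor for bid, value in estimates.items()}


# Hypothetical example: the model overestimates the official total of 18.
modelled = {"a": 6.0, "b": 4.0, "c": 10.0}
constrained = constrain_to_total(modelled, 18.0)
print(constrained)
```

Because such scaling forces the city-level error to zero, it would have to be applied only after validation, which is consistent with the order of steps described above.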

8. Conclusion and further research

In this paper we have introduced a method to enrich official building footprints with semantic VGI information in order to refine the estimation of the number of dwelling units and population using building-based dasymetric mapping techniques. Information on usage was extracted from OpenStreetMap data, and then applied to determine non-residential space for each building. This information is the basis for a more accurate modelling of dwelling units and population. Parameters describing the non-residential floor area were developed and extracted from OSM information. Medium-sized cities such as the study area selected here offer a large amount of business and commercial information for cartographic presentation as point, polygon and line features. The results of the model validation for numbers of dwelling units and population show the usefulness of this method to estimate the non-residential floor area for individual buildings, indicated here by a decrease in the calculated number of dwelling units and inhabitants per building. Moreover, the validation for single-family houses illustrates the suitability of the dwelling unit and population estimation approach by revealing only a minor deviation from reference data. By applying information on usage extracted from OSM, it was possible to reduce the overestimation of numbers of dwelling units and population by 12% for the whole study area. A comparison of the inner city with peripheral zones showed that overestimations were larger in the city centre. Differences in the size of the overestimation were also apparent between the particular building types. The error reduction was most successful for closed multi-family houses. Further studies could assess the application of data sets on business or land use other than OpenStreetMap to enhance the estimation of socio-economic parameters.
If alternative geospatial data sets are introduced, then attention must be paid to the semantic specification of data. Assuming that attributes are described in the same way as in OSM, the presented model can be applied. Otherwise it will be necessary to process the source data to fit the discussed method. To further improve the accuracy of the model, residential vacancy rates at district level could be introduced to reduce the population overestimation. Another line of research could be to investigate the influence of the building parameters adopted for the estimation of the number of dwelling units and for population. These parameters can be reviewed and adjusted in the light of the findings concerning errors within the dominant building types as well as the disparities between the outskirts and the inner city.

Acknowledgement

The findings presented here were in part the subject of a diploma research project carried out at the Leibniz Institute of Ecological Urban and Regional Development (IOER) with the assistance of the Institute for Cartography of Dresden University of Technology. All official spatial base data mentioned were made available to the IOER for research purposes. The authors would like to thank the Federal Agency for Cartography and Geodesy (Bundesamt für Kartographie und Geodäsie, BKG) and the Statistics Department of the City of Dresden (Statistikstelle der Landeshauptstadt Dresden) for providing this data.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compenvurbsys.2015.04.002.

References

Ahola, T., Virrantaus, K., Krisp, J. M., & Hunter, G. J. (2007). A spatio-temporal population model to support risk assessment and damage analysis for decision-making. International Journal of Geographical Information Science (IJGIS), 21(8), 935–953.
Alahmadi, M., Atkinson, P., & Martin, D. (2013). Estimating the spatial distribution of the population of Riyadh, Saudi Arabia using remotely sensed built land cover and height data. Computers, Environment and Urban Systems, 41, 167–176.
Aubrecht, C., Steinnocher, K., Hollaus, M., & Wagner, W. (2009). Integrating earth observation and GIScience for high resolution spatial and functional modeling of urban land use. Computers, Environment and Urban Systems, 33(1), 15–25.
Bakillah, M., Liang, S., Mobasheri, A., Arsanjani, J. J., & Zipf, A. (2014). Fine-resolution population mapping using OpenStreetMap points-of-interest. International Journal of Geographical Information Science (IJGIS). http://dx.doi.org/10.1080/13658816.2014.909045.
Ballatore, A., & Bertolotto, M. (2011). Semantically enriching VGI in support of implicit feedback analysis. In Web and wireless geographical information systems. Lecture notes in computer science (Vol. 6574, pp. 78–93).
Ballatore, A., Bertolotto, M., & Wilson, D. C. (2013). Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowledge and Information Systems, 37(1), 61–81.
Banzhaf, E., & Höfer, R. (2008). Monitoring urban structure types as spatial indicators with CIR aerial photographs for a more effective urban environmental management. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 1(2), 129–138.
Barron, C., Neis, P., & Zipf, A. (2014). A comprehensive framework for intrinsic OpenStreetMap quality analysis. Transactions in GIS, 18, 877–895. http://dx.doi.org/10.1111/tgis.12073.
Bauer, T., & Steinnocher, K. (2006). Per-parcel land use classification in urban areas applying a rule-based technique. GeoBITGIS, 6, 24–27.
BBK – Bundesamt für Bevölkerungsschutz und Katastrophenhilfe (Eds.) (2010). Abschätzung der Verwundbarkeit gegenüber Hochwasserereignissen auf kommunaler Ebene. Schriftenreihe Praxis im Bevölkerungsschutz, Band 4, Bundesamt für Bevölkerungsschutz und Katastrophenhilfe.
City of Dresden (2008). Living history – Urban townscape – Dresden – Inner city planning strategy 2008. City of Dresden. Municipal Department for City Development. Accessed 17.06.14.
Codescu, M., Horsinka, G., Kutz, O., Mossakowski, T., & Rau, R. (2011). OSMonto – An ontology of OpenStreetMap tags. In Conference proceedings of the state of the map Europe (SOTM-EU).
Colaninno, N., Cladera, J. R., & Pfeffer, K. (2011). An automatic classification of urban texture: Form and compactness of morphological homogeneous structures in Barcelona. In 51st European congress of the regional science association international.
Conzen, M. R. G. (1960). Alnwick, Northumberland: A study in town-plan analysis. Transactions and Papers (Institute of British Geographers), 27, 3–122. http://dx.doi.org/10.2307/621094.
Destatis (2012). Fachserie 5, Heft 1. Bauen und Wohnen. Mikrozensus – Zusatzerhebung 2010. Bestand und Struktur der Wohneinheiten. Wohnsituation der Haushalte. Wiesbaden: Statistisches Bundesamt. Accessed 17.06.14.
Dogruso, E., & Aksoy, S. (2007). Modeling urban structures using graph-based spatial patterns. In IEEE international geoscience and remote sensing symposium (IGARSS 2007) (pp. 4826–4829).
Eicher, C. L., & Brewer, C. A. (2001). Dasymetric mapping and areal interpolation: Implementation and evaluation. Cartography and Geographic Information Science, 28(2), 125–138.
Eurostat (2012). GEOSTAT 1A – Representing Census data in a European population grid, final report Eurostat 2012. Accessed 17.02.14.
Eurostat (2015). Glossary: Building. Accessed 17.02.15.
Flowerdew, R. (1991). Spatial data integration. In D. J. Maguire, M. F. Goodchild, & D. W. Rhind (Eds.), Geographical information systems: Principles and applications (Vol. 1, pp. 375–387). Harlow: Longman.
Gallego, F. J., Batista, F., Rocha, C., & Mubareka, S. (2011). Disaggregating population density of the European Union with CORINE land cover. International Journal of Geographical Information Science, 25(12), 2051–2069. http://dx.doi.org/10.1080/13658816.2011.583653.
Gedrange, C., Neubert, M., & Röhnert, S. (2011). Cross-border harmonisation of spatial base data between Germany and the Czech Republic. International Journal of Spatial Data Infrastructures Research, 6(2011), 53–72. Accessed 17.06.14.
Geiß, C., Taubenböck, H., Wurm, M., Esch, T., Nast, M., Schillings, C., et al. (2011). Remote sensing-based characterization of settlement structures for assessing local potential of district heat. Remote Sensing, 3(7), 1447–1471. http://dx.doi.org/10.3390/rs3071447.
Girres, J. F., & Touya, G. (2010). Quality assessment of the French OpenStreetMap dataset. Transactions in GIS, 14(4), 435–459. http://dx.doi.org/10.1111/j.1467-9671.2010.01203.x.
Goesseln, G., & Sester, M. (2003). Semantic and geometric integration of geoscientific data sets with ATKIS – Applied to geo-objects from geology and soil science. In Proceedings of ISPRS commission IV joint workshop, Stuttgart, Germany, September 8–9.
Goodchild, M. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221. http://dx.doi.org/10.1007/s10708-007-9111-y.
Gruhler, K., Böhm, R., Deilmann, C., & Schiller, G. (2002). Stofflich-energetische Gebäudesteckbriefe – Gebäudevergleiche und Hochrechnungen für Bebauungsstrukturen, Band 38, Dresden: IÖR-Schriften.
Haklay, M. (2010). How good is volunteered geographical information? A comparative study of OpenStreetMap and ordnance survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703. http://dx.doi.org/10.1068/b35097.
Hecht, R. (2014). Automatische Klassifizierung von Gebäudegrundrissen – Ein Beitrag zur kleinräumigen Beschreibung der Siedlungsstruktur. Berlin: Rhombos-Verlag.
Hecht, R., Herold, H., Meinel, G., & Buchroithner, M. F. (2013). Automatic derivation of urban structure types from topographic maps by means of image analysis and machine learning. In M. Buchroithner et al. (Eds.), 26th International cartographic conference – Proceedings: International cartographic association. Accessed 17.06.14.
Hecht, R., Kunze, C., & Hahmann, S. (2013). Measuring completeness of building footprints in OpenStreetMap over space and time. ISPRS International Journal of Geo-Information, 2(4), 1066–1091. http://dx.doi.org/10.3390/ijgi2041066.
Henn, A., Römer, C., Gröger, G., & Plümer, L. (2012). Automatic classification of building types in 3D city models using SVMs for semantic enrichment of low resolution building data. GeoInformatica, 16(2), 281–306. http://dx.doi.org/10.1007/s10707-011-0131-x.
Herold, M., Liu, X. H., & Clarke, K. C. (2003). Spatial metrics and image texture for mapping urban land use. Photogrammetric Engineering & Remote Sensing, 69(9), 991–1001.
Jiang, B., & Liu, X. (2012). Scaling of geographic space from the perspective of city and field blocks and using volunteered geographic information. International Journal of Geographical Information Science, 26(2), 215–229. http://dx.doi.org/10.1080/13658816.2011.575074.
Kunze, C. (2013). Nutzung semantischer Informationen aus OSM zur Beschreibung des Nichtwohnnutzungsanteils in Gebäudebeständen. Diploma Thesis, TU Dresden. Accessed 17.06.14.
Lo, C. (1995). Automated population and dwelling unit estimation from high resolution satellite images: A GIS approach. International Journal of Remote Sensing, 16(1), 17–34.
Lüscher, P., Weibel, R., & Burghardt, D. (2009). Integrating ontological modelling and Bayesian inference for pattern classification in topographic vector data. Computers, Environment and Urban Systems, 33(5), 363–374. http://dx.doi.org/10.1016/j.compenvurbsys.2009.07.005.

Lwin, K., & Murayama, Y. (2009). A GIS approach to estimation of building population for micro-spatial analysis. Transactions in GIS, 13(4), 401–414. http://dx.doi.org/10.1111/j.1467-9671.2009.01171.x.
Maantay, J. A., Maroko, A. R., & Herrmann, C. (2007). Mapping population distribution in the urban environment: The cadastral-based expert dasymetric system (CEDS). Cartography and Geographic Information Science, 34(2), 77–102. http://dx.doi.org/10.1559/152304007781002190.
Meinel, G., Hecht, R., & Herold, H. (2008). Automatische Ableitung von stadtstrukturellen Grundlagendaten und Integration in einem Geographischen Informationssystem. Berlin, Bonn: BMVBS; BBR (Forschungen, Bundesministerium für Verkehr, Bau und Stadtentwicklung, Bundesamt für Bauwesen und Raumordnung).
Meinel, G., Hecht, R., & Herold, H. (2009). Analyzing building stock using topographic maps and GIS. Building Research & Information, 37(5–6), 468–482. http://dx.doi.org/10.1080/09613210903159833.
Mennis, J. (2003). Generating surface models of population using dasymetric mapping. The Professional Geographer, 55(1), 31–42. http://dx.doi.org/10.1111/0033-0124.10042.
Mooney, P., & Corcoran, P. (2012). The annotation process in OpenStreetMap. Transactions in GIS, 16(4), 561–579. http://dx.doi.org/10.1111/j.1467-9671.2012.01306.x.
Muhs, S., Herold, H., Meinel, G., & Burghardt, D. (2013). Automatic delineation of urban blocks from topographic maps. In M. Buchroithner et al. (Eds.), 26th International cartographic conference – Proceedings, international cartographic association. Accessed 25.03.15.
Müller, W., & Korda, M. (1999). Städtebau. Teubner.
Neis, P., Zielstra, D., & Zipf, A. (2013). Comparison of volunteered geographic information data contributions and community development for selected world regions. Future Internet, 5(2), 282–300. http://dx.doi.org/10.3390/fi5020282.
Orford, S., & Radcliffe, J. (2007). Modelling UK residential dwelling types using OS Mastermap data: A comparison to the 2001 census. Computers, Environment and Urban Systems, 31(2), 206–227. http://dx.doi.org/10.1016/j.compenvurbsys.2006.08.003.
Ramm, F., Topf, J., & Chilton, S. (2010). OpenStreetMap: Using and Enhancing the Free Map of the World. UIT Cambridge.
Ramos, J. M., Vandecasteele, A., & Devillers, R. (2013). Semantic integration of authoritative and volunteered geographic information (VGI) using ontologies. In Proceedings of 16th AGILE conference on geographic information science, Leuven, Belgium, 14–17 May. Accessed 17.06.14.
Schiller, G., & Bräuer, A. (2013). GIS-basierte kleinräumige Schätzung von Planungsparametern zur Unterstützung der strategischen Siedlungs- und Infrastrukturplanung. In J. Strobl, T. Blaschke, G. Griesebner, & B. Zagel (Eds.), Angewandte Geoinformatik 2013, Beiträge zum 25. AGIT-Symposium Salzburg (pp. 628–637). Berlin: Wichmann.
Sester, M., Arsanjani, J. J., Klammer, R., Burghardt, D., & Haunert, J.-H. (2014). Integrating and generalising volunteered geographic information. In D. Burghardt, C. Duchêne, & W. Mackaness (Eds.), Abstracting geographic information in a data rich world – Methodologies and applications of map generalisation (pp. 119–155). Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-00203-3.
Smart, P. D., Quinn, J. A., & Jones, C. B. (2011). City model enrichment. ISPRS Journal of Photogrammetry and Remote Sensing, 66(2), 223–234. http://dx.doi.org/10.1016/j.isprsjprs.2010.12.004.
Smith, D., & Crooks, A. (2010). From buildings to cities: Techniques for the multiscale analysis of urban form and function. CASA Working Papers, Vol. 155, Centre for advanced spatial analysis (UCL). London. Accessed 17.06.14.
Stadler, C., Lehmann, J., Höffner, K., & Auer, S. (2012). LinkedGeoData: A core for a web of spatial open data. Semantic Web Journal, 3(4), 333–354. http://dx.doi.org/10.3233/SW-2011-0052.
Steiniger, S., Lange, T., Burghardt, D., & Weibel, R. (2008). An approach for the classification of urban building structures based on discriminant analysis techniques. Transactions in GIS, 12(1), 31–59. http://dx.doi.org/10.1111/j.1467-9671.2008.01085.x.
Strunck, A. (2010). Raumzeitliche Qualitätsuntersuchungen von OpenStreetMap. Bonn: Rheinische Friedrich-Wilhelms-Universität, Geographisches Institut (unpublished diploma thesis).
Ural, S., Hussain, E., & Shan, J. (2011). Building population mapping with aerial imagery and GIS data. International Journal of Applied Earth Observation and Geoinformation, 13(6), 841–852. http://dx.doi.org/10.1016/j.jag.2011.06.004.
Walde, I., Hese, S., Berger, C., & Schmullius, C. (2013). Graph-based mapping of urban structure types from high-resolution satellite image objects – Case study of the German cities Rostock and Erfurt. IEEE Geoscience and Remote Sensing Letters, 10(4), 932–936. http://dx.doi.org/10.1109/LGRS.2013.2252323.
Wu, S. S., Qiu, X., & Wang, L. (2005). Population estimation methods in GIS and remote sensing: A review. GIScience & Remote Sensing, 42(1), 80–96.
Wurm, M., Taubenböck, H., Roth, A., & Dech, S. (2009). Urban structuring using multisensoral remote sensing data: By the example of the German cities Cologne and Dresden. In 2009 Joint urban remote sensing event. Shanghai, China (pp. 1–8). http://dx.doi.org/10.1109/URS.2009.5137555.
Wurm, M., & Taubenböck, H. (2010). Abschätzung der Bevölkerungsverteilung mit Methoden der Fernerkundung. In H. Taubenböck & S. Dech (Eds.), Fernerkundung im urbanen Raum – Erdbeobachtung auf dem Weg zur Planungspraxis (pp. 143–152). Darmstadt: Wissenschaftliche Buchgesellschaft.
Zielstra, D., & Zipf, A. (2010). A comparative study of proprietary geodata and volunteered geographic information for Germany. In Proceedings of 13th AGILE international conference on geographic information science. Guimarães (Portugal), 11th – 14th May.
