Available online at www.sciencedirect.com
ScienceDirect Transportation Research Procedia 19 (2016) 119 – 134
International Scientific Conference on Mobility and Transport Transforming Urban Mobility, mobil.TUM 2016, 6-7 June 2016, Munich, Germany
Determination of Attributes Reflecting Household Preferences in Location Choice Models
B. Heldt *, K. Gade, D. Heinrichs
Institute of Transport Research, Rutherfordstr. 2, 12489 Berlin, Germany
Abstract
Challenges arising from changing mobility behaviour require planning decisions that mitigate negative effects. Land-use and transport interaction models can provide valuable decision support for this purpose, but they require tremendous effort in terms of model design as well as data collection and preparation. We introduce a methodology and procedures that aim to minimise the modelling work, with particular attention to selecting the model segmentation and model parameters in a structured and efficient way. The methodology combines literature and statistical analyses for model design. The paper outlines the methodology and presents its application to the design of a location choice model for the city of Berlin, Germany. We demonstrate how household types exhibiting specific location patterns and related accessibility parameters can be identified from the literature, and how standard deviation maps and correlation analysis can be used to detect these households and test hypotheses. The results suggest that the methodology is capable of identifying segmentations and parameters for use in choice models, such as models of location decisions, in an efficient way.
© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of mobil.TUM 2016.
Keywords: residential location choice; land-use and transport interaction model; accessibility; census data; Berlin
* Corresponding author. Tel.: +49-30-670 557 971; fax: +49-30-670 552 83. E-mail address: [email protected]
2352-1465 © 2016 The Authors. Published by Elsevier B.V. doi:10.1016/j.trpro.2016.12.073
1. Introduction
Where and how we move from one place to another every day depends on the purpose of activities and on the distribution of locations related to these activities, also referred to as land-use. Since land-use patterns, travel patterns, and the associated infrastructures keep changing, understanding the interaction of both phenomena is crucial for comprehensive planning processes that mitigate the resulting negative effects on the environment. Quantitative models are an instrument to support decision-making by predicting future changes in both the composition of land-use and the corresponding travel patterns. The purpose of these models is to reduce the complex reality in such a way that general phenomena can be forecast. Because land-use and transport interact, models need to predict both simultaneously and consequently integrate land-use pattern analysis and transport analysis in so-called land-use and transport interaction (LUTI) models. A common assumption of such modelling approaches is that land-use and travel patterns are the results of household decisions based on choice alternatives that are subject to constraints, e.g. resources or regulation of supply (De la Barra, 1989; McFadden, 1978; Straszheim, 1987; Wegener, 2014). While land-use patterns result from location decisions, travel patterns are the outcome of decisions about transport mode, destination, route etc. Discrete-choice models explain these decisions by analysing preferences from patterns discovered in observed data. In order to reduce the complexity of reality, discrete-choice based LUTI models group similar households into segments (de Dios Ortuzar & Willumsen, 2011), where similarity refers to location patterns and corresponding decisions. In the location choice part of LUTI models, which is tailored to predict land-use, complexity is reduced by applying aggregate spatial concepts: choices are analysed for real estate units in zones that are intended to represent neighbourhoods. Location choice models thus consider a large number of different attributes that aim to reflect or approximate the real factors influencing the choice of a real estate unit and neighbourhood.
By incorporating accessibility measures as one group of these factors, they establish the crucial link to transport models (de Dios Ortuzar & Willumsen, 2011; Francisco J. Martinez, 1995; Wegener, 2014). Alongside accessibility, location factors include attributes related to the neighbourhood, but also to the dwelling, and characteristics of the households themselves (Hurtubia, 2012; Hurtubia, Gallay, & Bierlaire, 2010; Schirmer, Van Eggermond, & Axhausen, 2014). The relevance of these attributes varies by household type or segment, since different households exhibit different location preferences. Designing a model to predict location choice requires two main decisions: first, how to segment households that exhibit similar choice behaviour (and patterns), and second, which attributes to include in the explanation and prediction of location choice. Given the many possibilities for designing discrete-choice models, misspecification is likely, and the high complexity associated with model design challenges validity and efficiency (D. B. Lee, 1994; Lee Jr, 1973). Using simple methods to determine eligible household segments and location attributes facilitates and improves the development of these models and saves time and therefore resources. In this article, we present a comprehensive analysis framework that combines methods to determine household types with similar location preferences with methods to identify location attributes that are associated with the distribution of these household types. We describe this methodology in the first part of the paper. In the second part, we apply it to a case study that involves the development of a location choice model for Berlin, Germany, called “SALSA” (SimulAting Location Demand and Supply in Urban Agglomerations).
We outline the framework underlying SALSA as the foundation for the subsequent analysis, in which we apply the methodology introduced in the first part. Finally, we turn to the results of our analyses, i.e. the literature review and the statistical analyses, and show the efficiency of the proposed framework.
2. Methodology
The approach combines a literature review with descriptive statistical and geovisual analysis. The steps are shown in Fig. 1.
[Fig. 1: flow diagram of the methodology with the following elements]
Literature review
• Identification of household attributes to describe household types
• Determining attributes that reflect location preferences and assumptions about their household type-specific relevance
Household types
• Clustered by attributes that reflect resources and preferences
• Similar within-group patterns, aggregation if same distribution
Hypotheses
• Which attributes describing a location are associated with which household types?
• Redefinition if mapping results do not correspond
Standard deviation maps
• Maps of standardised household type proportions
• Identification of similar distributions
• Checking whether distributions correspond to hypotheses
Correlation analysis
• Correlation of household type proportions and densities with location attributes at zone level
Model parameters
Fig. 1. Methodological framework.
The first step is a comprehensive literature review. The analysis of existing theories associated with residential location choice provides the basis for defining characteristics to describe households. Empirical studies give insights into appropriate location attributes and their relevance for the household attributes found. In a second step, based on analysis of secondary data and accounting for data availability and predictability, we combine the identified household attributes to form household types or segments that behave differently regarding their location choice. A type refers to the combination of several classes (Marradi, 1990). Thus, types result from categorising the household characteristics found in step 1 and intersecting these categories. The third step is the formulation of hypotheses on the likely relation between location attributes and the identified household types, particularly regarding their spatial distribution. We use the literature from step 1 for this purpose. The fourth and fifth steps form the core of the analysis. They evaluate the hypotheses by analysing empirical data at the spatially aggregate level of zones using standard deviation maps and correlation analysis. The fourth step involves the preparation of choropleth maps of standardised values (Jenks & Caspall, 1971); in this case, the values reflect location probabilities operationalised as proportions of household types. In this way we identify which household segments show which distributions. This visual investigation allows us to confirm or reject the hypotheses about the spatial distribution of households defined in step 3. Additionally, the analysis reveals zones with case-specific patterns, indicated by proportions that lie far above or below the average of all zonal values across the study area.
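The standardisation behind step 4 can be illustrated with a minimal Python sketch. It is not the authors' R implementation, which is not reproduced in the paper. The zone identifiers, the household counts, the use of the population standard deviation, and the threshold of one standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def type_proportions(zone_counts, household_type):
    """Share of one household type among all households in each zone."""
    shares = {}
    for zone, counts in zone_counts.items():
        total = sum(counts.values())
        if total > 0:
            shares[zone] = counts.get(household_type, 0) / total
    return shares

def standardise(shares):
    """z-score of each zonal share relative to the mean over all zones."""
    values = list(shares.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all zones have the same share; z-scores are undefined")
    return {zone: (value - mu) / sigma for zone, value in shares.items()}

def flag_extremes(z_scores, threshold=1.0):
    """Zones deviating by more than `threshold` standard deviations (hypothetical cut-off)."""
    return {zone: z for zone, z in z_scores.items() if abs(z) > threshold}

# Hypothetical household counts per zone and household type.
zone_counts = {
    "Z1": {"C1": 60, "C2": 40},
    "Z2": {"C1": 50, "C2": 50},
    "Z3": {"C1": 40, "C2": 60},
    "Z4": {"C1": 10, "C2": 90},
}
shares = type_proportions(zone_counts, "C1")
z_scores = standardise(shares)
extremes = flag_extremes(z_scores)
```

The z-scores would then be binned into colour classes for the choropleth map; the class breaks are a design choice of the map and are left out here.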
In addition, closer investigation of standard deviation maps uncovers atypical location patterns that require special consideration in designing location models because they may not be explainable by attributes intended to capture the general pattern. Such zones are adjacent to each other but have different household proportions than surrounding neighbourhoods. This is particularly relevant if the directions of the deviations differ, i.e. one or more zones feature above-average proportions of a household type while other zones nearby exhibit a below-average proportion of the same type. As a further consequence, household types with similar spatial distributions are candidates for merging, resulting in a new household segmentation and thus a jump back to step 2. The consequence of this would be a revision and likely redefinition of hypotheses, hence a new iteration of step 3, yielding new spatial distributions in step 4, and so on.
Fifth, correlating location probabilities of household types with aggregated zonal attributes indicates which attributes are likely relevant for which household groups, namely those for which correlation coefficients are high. Such attributes describe the average accessibility, real estate and activity location structure of the neighbourhood. Correlation analysis and standard deviation maps alike may be used to redesign the household segmentation. If household types show similar correlations, particularly if signs and coefficients are the same, aggregation of these segments should be considered, i.e. going back to steps 2 to 4 and repeating step 5. The result of this framework is a preselection of household types and of model parameters that are likely associated with their spatial distribution, which can be used for the specification of a location choice model. Since performing all these steps manually would not increase model design efficiency, the code for defining household types and analysing their spatial distributions and correlations with location attributes was implemented in R, which makes it easy to test different segmentations and to merge segments where needed.
3. Case study: determining the parameters for a location choice model for Berlin, Germany
The methodological steps described in the previous section are now applied to a case study with the objective of designing a location choice model for the city of Berlin. First, we describe the basis of our simulation framework SALSA, which sets the conditions for the analyses; we then present the data used and briefly describe how we processed demographic data and accessibility measures. Finally, we turn to the main results of the case study.
In order to reduce complexity, we focus on a few household characteristics on the one hand and on attributes describing accessibility, the crucial link between land-use and transport, on the other.
3.1. Simulation model
Our analysis prepares the application of an existing simulation model to the city of Berlin. This model, ‘MUSSA II’, is based on bid-choice theory, which assumes that an equilibrium state can be attained between the demand for and supply of residential locations (Martínez & Donoso, 2010; F. J. Martinez, 1992; Francisco J. Martinez, 1996; Francisco J. Martinez & Henriquez, 2007). In short, households maximise their utility and developers their profits. Based on this assumption, households move into locations for which they are the highest bidders. At the same time, suppliers decide whether to offer a location, which they only do if their gain is sufficient, taking the highest bids as rents and deducting real estate costs. In this way the model connects the discrete-choice approach with the bid-rent approach. As a macroscopic discrete-choice model, it relies on relative associations between household preferences to predict location probabilities. The implication is that analyses should focus on relations between ratios and distributions rather than on absolute numbers.
3.2. Data
Recalling from the introduction that discrete choice-based location models assume that decisions are reflected in observed location patterns, the required data should connect disaggregate household and dwelling information with geolocation and, ideally, with mobility options and travel behaviour (Heldt, Gade, & Heinrichs, 2014). The sources used in the following analysis therefore comprise person- and household-level data from the population and building census of 2011, supplemented with geodata used to describe activity locations (cp. Table 1).
Table 1. Data sources.
Purpose | Name | Source
Observations of households in dwellings and traffic analysis zones | Zensus 2011 Gesamtdatenbestand (population census and building and dwelling census) | Amt für Statistik Berlin-Brandenburg (2015)
Spatial characteristics and activities | Grünanlagenbestand Berlin (Parks 2016) | Geoportal Berlin
Spatial characteristics and activities | Einzelhandelsbestandsdaten 2013 (Retail census 2013) | Senate Department for Urban Development and the Environment Berlin (2014)
Network | OpenStreetMap | OpenStreetMap contributors, CC-BY-SA (2015)
The main data source is the microdata of Zensus 2011 for the city of Berlin, consisting of databases that describe persons, households, and dwellings and that include geolocation at several levels of detail. Zensus 2011 is a so-called register-based census, i.e. the main information was derived from the German population register and combined with a new full census of dwellings and buildings as well as information from the registers of the federal job agencies. Due to data privacy protection, micro-level census data are subject to strict conditions of use. For descriptive analyses, only frequencies of categories with more than two cases can be published; methods that analyse distributions rather than frequencies therefore have advantages. This is in line with discrete choice, which analyses relative patterns and not absolute ones. The geodata comprise a full retail survey by the Senate Department for Urban Development and the Environment of Berlin. Shapefiles of parks and green space were obtained using the Senate’s Open Data Web Feature Service. OpenStreetMap network data are crucial for computing accessibility measures (see below and Appendix A).
3.3. Data processing
In SALSA, we assume that location patterns are the result of households’ decisions on where to locate. Hence, deriving the corresponding parameters presupposes a database that includes households and dwellings. Households may be characterised by aggregated person attributes and/or the features of one household representative. The same is true for the relation between dwellings and buildings. For this purpose, identifiers for persons, households, dwellings, and buildings allow us to split the data sources, apply aggregation procedures, and combine the results into a single database describing location observations. In order to further qualify the observed locations, we add accessibility measures at zone level using the zonal identifier.
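The linkage described above can be sketched as follows. This is a simplified illustration in Python, not the authors' procedure: all field names, identifiers, and values are hypothetical, buildings are omitted for brevity, and the definition of a child as a person below 18 follows the household typology used later in the paper.

```python
from collections import defaultdict

# Hypothetical records; the field names are assumptions, not census variable names.
persons = [
    {"person_id": 1, "household_id": "H1", "age": 34},
    {"person_id": 2, "household_id": "H1", "age": 5},
    {"person_id": 3, "household_id": "H2", "age": 70},
]
households = [
    {"household_id": "H1", "dwelling_id": "D1", "head_person_id": 1},
    {"household_id": "H2", "dwelling_id": "D2", "head_person_id": 3},
]
dwellings = [
    {"dwelling_id": "D1", "zone_id": "Z1", "floor_space_m2": 85},
    {"dwelling_id": "D2", "zone_id": "Z2", "floor_space_m2": 50},
]
zone_accessibility = {
    "Z1": {"tt_grocery_foot_min": 6.0},
    "Z2": {"tt_grocery_foot_min": 14.0},
}

def build_location_observations(persons, households, dwellings, zone_accessibility):
    """Aggregate persons to households, attach dwellings, then add zonal attributes."""
    members = defaultdict(list)
    for person in persons:
        members[person["household_id"]].append(person)
    dwelling_by_id = {d["dwelling_id"]: d for d in dwellings}

    observations = []
    for hh in households:
        people = members[hh["household_id"]]
        head = next(p for p in people if p["person_id"] == hh["head_person_id"])
        dwelling = dwelling_by_id[hh["dwelling_id"]]
        obs = {
            "household_id": hh["household_id"],
            "size": len(people),
            "n_children": sum(1 for p in people if p["age"] < 18),
            "head_age": head["age"],
            "zone_id": dwelling["zone_id"],
            "floor_space_m2": dwelling["floor_space_m2"],
        }
        # Zone-level accessibility is joined via the zonal identifier.
        obs.update(zone_accessibility.get(dwelling["zone_id"], {}))
        observations.append(obs)
    return observations

observations = build_location_observations(persons, households, dwellings, zone_accessibility)
```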
The accessibility measures considered mainly include generalised cost measures and opportunity-based measures, among others (El-Geneidy & Levinson, 2006; Wachs & Kumagai, 1973). They were obtained using a tool that computes accessibility measures for different modes of transport, including intermodal combinations (Krajzewicz & Heinrichs, 2016). Applying the tool, we calculated cumulative and average opportunity measures as well as generalised travel times and times to the closest activity, as listed in Table 2. For a detailed description of the data processing of the census and of the accessibility tool, please refer to Appendix A.
Table 2. Accessibility measures.
Code | Description
1a | Travel time (average / generalized) to other zones (car)
1b | Travel time (average / generalized) to other zones (foot)
2 | Travel time to closest commercial center (foot)
3a | Travel time to closest grocery store (>= 1,000 m²) (car)
3b | Travel time to closest grocery store (>= 200 m²) (foot)
4a | Average floor space of grocery stores within 10 min (car)
4b | Average floor space of grocery stores within 10 min (foot)
5a | Travel time to closest large park (>= 50,000 m²) (car)
5b | Travel time to closest small park (>= 10,000 m²) (foot)
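To make the definitions in Table 2 concrete, the following Python sketch computes two of the measure types for a single zone: the travel time to the closest store above a size threshold (as in 3a and 3b) and the average floor space of stores within a time budget (as in 4a and 4b). It assumes that travel times from the zone to each store have already been computed on the network; it does not reproduce the accessibility tool by Krajzewicz and Heinrichs. Store identifiers, floor spaces, and travel times are hypothetical, and treating the 10-minute limit as inclusive is an assumption.

```python
def time_to_closest(travel_times, floor_space, min_size_m2):
    """Travel time (minutes) to the nearest store with at least `min_size_m2` floor space."""
    eligible = [t for store, t in travel_times.items() if floor_space[store] >= min_size_m2]
    return min(eligible) if eligible else None

def average_floor_space_within(travel_times, floor_space, max_minutes=10.0):
    """Mean floor space of all stores reachable within `max_minutes` (inclusive)."""
    reachable = [floor_space[store] for store, t in travel_times.items() if t <= max_minutes]
    return sum(reachable) / len(reachable) if reachable else None

# Hypothetical stores and precomputed travel times from one zone, in minutes.
floor_space = {"S1": 150, "S2": 400, "S3": 1200}
times_foot = {"S1": 4.0, "S2": 8.0, "S3": 15.0}
times_car = {"S1": 2.0, "S2": 3.0, "S3": 6.0}

measure_3a = time_to_closest(times_car, floor_space, min_size_m2=1000)
measure_3b = time_to_closest(times_foot, floor_space, min_size_m2=200)
measure_4a = average_floor_space_within(times_car, floor_space)
measure_4b = average_floor_space_within(times_foot, floor_space)
```

Both functions return None when no store qualifies, so that zones without any reachable opportunity can be treated as missing rather than as zero.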
The complete dataset serves as input to parameter estimation for the location choice model and is thus the basis for the following analyses.
4. Results
In this section we apply the methodological steps introduced earlier to the requirements of the bid-choice model and the data for Berlin, going into detail for each of the steps 1 to 5.
4.1. Literature review
Theoretical and empirical studies on residential location choice that consider different household attributes provide the basis for deriving the household segmentation and the hypotheses on the spatial distribution of household types and their association with accessibility.
4.1.1. Household size
Theoretical considerations by Rossi and Alonso on the relationship between household size and centrality reveal a generally negative association in terms of accessibility: the larger the household, the more decentrally it is located and the lower its level of accessibility. Rossi found in his 1955 study that the life cycle and composition of households, operationalised by household size, are related to where and how people live (Rossi, 1955). According to his study, households of different sizes have different location preferences; in particular, the larger the household, the higher the demand for a large dwelling in terms of living area or number of rooms. This has been confirmed by later empirical studies (B. H. Y. Lee & Waddell, 2010a, 2010b), which for instance affirm the preference of single-person households for multi-family homes. Combining these findings with Alonso’s bid-rent theory (Alonso, 1960), which states that land-uses are allocated spatially according to the owner’s willingness to pay, we can assume that a household faces a trade-off between centrality (or accessibility) and neighbourhood characteristics on the one hand and the size of the dwelling on the other. As a consequence, larger households requiring larger dwellings are expected to be located decentrally, and smaller households centrally.
4.1.2. Presence of children
Similar to household size, the presence of a child in the household tends to indicate a decentral location with low accessibility levels, as some studies have found. The presence of children indicates a certain stage in life and comes with specific needs; families demand access to educational and recreational facilities, as Kim, Pagliara, and Preston concluded from their 2005 study (Kim, Pagliara, & Preston, 2005). In line with the assumption that larger households prefer larger dwellings, Lee and Waddell showed in their 2010 studies for the Seattle Puget Sound metropolitan area that households with children prefer single-family homes (B. H. Y. Lee & Waddell, 2010a, 2010b).
4.1.3. Householder age
Several studies imply that younger households tend to locate centrally and cluster together, while households with older heads locate in more decentral locations associated with low levels of accessibility. Clark and Dieleman add the age of the household head to the previous considerations as an indicator of the household’s stage of life, which is relevant for residential mobility (Clark & Dieleman, 1996). A number of studies confirmed that householder age has a significant effect on location decisions and influences location patterns and house prices (De Palma, Motamedi, Picard, & Waddell, 2005; De Palma, Picard, & Waddell, 2007). For instance, older people have different preferences than younger ones regarding residential locations, i.e. dwelling and neighbourhood characteristics (Duncombe, Robbins, & Wolf, 2003; Somenahalli & Shipton, 2013).
4.1.4. Household resources
According to basic microeconomic theory, the location options considered in residential location decisions are furthermore constrained by household resources. Variables describing such resources include net household income, level of education of the household head, number of employed persons, etc. The latter variable also proxies car availability, which is an important attribute related to accessibility. Due to data constraints, however, we cannot explore this empirically. Nevertheless, household income is correlated with other household attributes we consider in our analysis.
4.2. Household types
As a consequence of the literature review, in our case study we consider household size combined with the number of children as one main attribute, and the age of the household head as the other (see Table 3).
Categories for household size comprise households with one, two, three, and four or more persons. Number of children, where a child is defined as a person below 18 years, is split into two categories: either at least one child is present, or none. Combining both attributes yields the first typology, of which only a few typical categories enter the subsequent analyses. These are: single-person households (denoted as C1), two-person households without children (C2), three-person households with one child (C3), and households with four or more persons without children (C4). These categories represent the most prevalent household types in our study area, including special cases such as multi-person adult households. For the second segmentation, we categorised households by age of householder into young households (younger than 31), medium-aged households (31 to 64), and senior households (65 and older), each of which reflects a different phase in the household life cycle. For instance, income and household structure likely differ by age: young adults below 31 tend to have low incomes and no children, medium-aged households mainly represent traditional families with children, and in households headed by mostly retired seniors over 64 the children have usually left.
Table 3. Household segmentation.
Household composition (number of adults and children) | | Householder age |
Code | Description | Code | Description
C1 | Single-person households | A1 | Households whose head is 30 or younger
C2 | Two-person households without children | A2 | Households whose head is between 31 and 64
C3 | Three-person households with one child | A3 | Households whose head is older than 64
C4 | Households with four or more persons without children | |
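The two segmentations in Table 3 amount to simple classification rules, sketched below in Python. The function names are hypothetical, and households that fall outside the four selected composition categories are returned as None because the paper does not analyse them further.

```python
def composition_type(size, n_children):
    """Map household size and number of children (persons below 18) to codes C1 to C4."""
    if size == 1:
        return "C1"  # single-person household
    if size == 2 and n_children == 0:
        return "C2"  # two persons, no children
    if size == 3 and n_children == 1:
        return "C3"  # three persons, one child
    if size >= 4 and n_children == 0:
        return "C4"  # four or more persons, no children
    return None      # not among the categories selected for analysis

def age_type(head_age):
    """Map the age of the household head to codes A1 to A3."""
    if head_age <= 30:
        return "A1"  # 30 or younger
    if head_age <= 64:
        return "A2"  # 31 to 64
    return "A3"      # older than 64

examples = [
    (composition_type(1, 0), age_type(25)),
    (composition_type(3, 1), age_type(40)),
    (composition_type(5, 0), age_type(70)),
]
```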
4.3. Hypotheses
From the literature review, we derived two general hypotheses on the relevance of accessibility for our household types:
Hypothesis 1: Assuming that household size and dwelling size correlate and that centrality is expressed as accessibility, we hypothesise that smaller households are more likely to locate in zones where accessibility is generally higher, and larger households in zones where it is lower. Based on the literature on the presence of children, we expect families with children to be decentrally located as well.
Hypothesis 2: Young households tend to locate more centrally, where activities cluster. We therefore assume that they, like small households, are more likely to be found in zones with high accessibility, while a high probability of older households is associated with low accessibility.
4.4. Standard deviation maps and correlation analysis
For the statistical analysis, we operationalise our main attributes and the corresponding hypotheses in order to test them quantitatively. The first group of attributes comprises location probabilities, which we treat as the proportion of a household type (C1 to C4 or A1 to A3) among all households within a zone. The second group comprises accessibility attributes aggregated to zonal level as described in Table 2, which operationalise centrality. Accessibility is often expressed as travel time. High centrality means high levels of accessibility but low travel times, and low centrality the opposite. In the following we detail the geovisual analysis and the correlation analysis separately for the two household segmentations: household composition and householder age.
4.4.1. Household composition
The maps in Fig. 2a-d show the distribution of household type proportions across the city of Berlin by household composition. They visualise standardised values: saturation symbolises the magnitude of the deviation from the mean of all zones, positive (above-average) deviations are shown in red, and negative (below-average) ones in blue.
Zones in saturated colours that depart from the pattern of neighbouring zones require particular attention when explaining the related locations, particularly if the direction of the deviation differs.
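The visual check for neighbouring zones with opposite deviations could also be screened programmatically. The following Python sketch is only one possible way to do so and is not described in the paper: it assumes standardised proportions per zone, a hypothetical adjacency list of neighbouring zones, and a cut-off of one standard deviation for what counts as a strong deviation.

```python
def contrasting_neighbours(z_scores, adjacency, threshold=1.0):
    """Pairs of adjacent zones whose strong deviations point in opposite directions."""
    pairs = set()
    for zone, neighbours in adjacency.items():
        for other in neighbours:
            # Zones without a value (e.g. excluded as missing) are skipped.
            if zone not in z_scores or other not in z_scores:
                continue
            a, b = z_scores[zone], z_scores[other]
            if abs(a) > threshold and abs(b) > threshold and a * b < 0:
                pairs.add(tuple(sorted((zone, other))))
    return sorted(pairs)

# Hypothetical standardised proportions of one household type and zone adjacency.
z_by_zone = {"Z1": 1.4, "Z2": -1.3, "Z3": 0.2, "Z4": 1.1}
adjacency = {
    "Z1": ["Z2", "Z3"],
    "Z2": ["Z1", "Z4"],
    "Z3": ["Z1"],
    "Z4": ["Z2", "Z5"],  # Z5 has no value and is ignored
}
flagged_pairs = contrasting_neighbours(z_by_zone, adjacency)
```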
[Fig. 2a-d: four choropleth maps of Berlin traffic analysis zones, panels a, b, c, and d]
Legend: Standardised household type proportions by zone
• lower than -1.96 (lower than -1.00 standard deviation (std))
• -1.96 to -0.98 (-1.00 to -0.50 std)
• -0.98 to -0.49 (-0.50 to -0.25 std)
• -0.49 to 0.49 (-0.25 to 0.25 std)
• 0.49 to 0.98 (0.25 to 0.50 std)
• 0.98 to 1.96 (0.50 to 1.00 std)
• greater than 1.96 (greater than 1.00 std)
• Missing
Fig. 2a-d. Standardised household type proportions by traffic analysis zones in Berlin, for the following household types (from top left to bottom right): single households (C1), two-person households (C2), three-person households with one child (C3), four-person households without children (C4); cases with a count lower than 50 were excluded from the analysis and are represented as missing. Sources: Household census data: Amt für Statistik Berlin-Brandenburg 2016 (state of data: 2011), traffic analysis zones: Amt für Statistik Berlin-Brandenburg 2014.
The maps above show that spatial household distributions indeed vary by composition, which confirms our Hypothesis 1. Single-person households (C1) appear with above-average proportions in the inner city, while two-person households without children (C2) tend to locate on the outskirts of Berlin, as do household types C3 and C4. In other words, small households can be found in the city centre, while the probability for large households is higher on average on the outskirts. The spatial distribution further implies that single-person households exhibit a different location pattern than the other composition groups, and it is therefore appropriate to represent them as one segment in a location choice model. For groups C2, C3, and C4, however, the patterns look rather similar, although C3 shows more zones with values close to the average (represented in white). These groups may thus be collapsed into one.
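Collapsing segments amounts to summing their zonal counts before recomputing proportions. The Python sketch below illustrates this under stated assumptions: the counts are invented, the merged label "C2-4" is hypothetical, and the category "other" stands for all households outside the four selected types, which remain in the denominator because proportions refer to all households in a zone.

```python
def merge_types(zone_counts, members, merged_name):
    """Replace the household types in `members` by one merged type in every zone."""
    merged = {}
    for zone, counts in zone_counts.items():
        new_counts = {t: n for t, n in counts.items() if t not in members}
        new_counts[merged_name] = sum(counts.get(t, 0) for t in members)
        merged[zone] = new_counts
    return merged

def proportion(counts, household_type):
    """Share of one household type among all households of a zone."""
    total = sum(counts.values())
    return counts.get(household_type, 0) / total if total else None

# Hypothetical household counts per zone.
counts_by_zone = {
    "Z1": {"C1": 50, "C2": 20, "C3": 10, "C4": 5, "other": 15},
    "Z2": {"C1": 20, "C2": 40, "C3": 20, "C4": 10, "other": 10},
}
merged_counts = merge_types(counts_by_zone, members={"C2", "C3", "C4"}, merged_name="C2-4")
merged_share_z1 = proportion(merged_counts["Z1"], "C2-4")
merged_share_z2 = proportion(merged_counts["Z2"], "C2-4")
```

After merging, the standardisation and correlation steps would be repeated for the new segmentation, in line with the iteration described in the methodology.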
[Fig. 3: correlation matrix with three parts: correlation coefficients between household groups, correlations between groups and attributes, and correlations between attributes; the upper section shows the significance levels of the correlations]
Significance levels of correlations: NS = not significant; * p < 0.05; ** p < 0.01; *** p < 0.001
Fig. 3. Pearson correlation coefficients between proportions of household type segments C1 to C4 (cp. Table 3) and accessibility measures 1a to 5b (Table 2). Sources: as indicated in the text.
Having found geovisual support for our Hypothesis 1, we now turn to correlation analysis. Standard deviation maps only capture the distribution of one attribute, i.e. the probability of one household type; in correlation analysis, we introduce accessibility as a second attribute with which we correlate household type proportions. Fig. 3 shows the matrix of Pearson correlation coefficients with their significance levels for the segmentation of households by household composition. The lower left section depicts correlation coefficients; positive ones are shown in red and negative ones in blue. The saturation and brightness of the colour visualise the magnitude of the coefficient. Significance levels, which indicate whether correlation coefficients differ from zero at a significance level of 5%, 1%, and 0.1% respectively, are depicted in the upper section of the figure. Both sections of the chart consist of three parts: in the coefficients section, the upper left part shows correlations among household type proportions, the central part marked by a black rectangle depicts correlations between household type proportions and accessibility measures at zonal level, and the bottom part includes coefficients among accessibility measures. This structure is mirrored in the significance section of the figure.
Correlation analysis confirms to a large extent the pattern hypothesised from the literature review and supported by the geovisual analysis: proportions of single households (C1) correlate negatively with generalised travel times (1a and 1b), while the opposite is true for larger households (C2-C4). Hence, assuming that good accessibility is reflected by low travel times, accessibility and household type proportions mostly correlate as expected. Like the standard deviation maps, the results also indicate that household segments C2 to C4 show somewhat similar patterns and may be subject to aggregation. Finally, generalised cost accessibilities 1a and 1b and grocery-store-related measures 3b and 4a show high coefficients with absolute values above 0.45 and should therefore be included in parameter estimation, while correlations for average park area accessible (5a and 5b) are rather low; this attribute should be redefined or excluded as a model parameter.
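The screening of attributes by correlation strength can be illustrated with a short Python sketch. The data are invented for illustration, the function names are hypothetical, and the cut-off of 0.45 is taken from the text above; significance testing, which the paper reports alongside the coefficients, is not included here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def select_attributes(type_shares, attributes, min_abs_r=0.45):
    """Keep attributes whose absolute correlation with the shares exceeds the cut-off."""
    selected = {}
    for name, values in attributes.items():
        r = pearson(type_shares, values)
        if abs(r) > min_abs_r:
            selected[name] = r
    return selected

# Hypothetical zonal shares of single households and two zonal attributes.
share_c1 = [0.6, 0.5, 0.4, 0.1]
zonal_attributes = {
    "1a": [10.0, 12.0, 15.0, 25.0],  # generalised travel time by car
    "5a": [5.0, 9.0, 4.0, 7.0],      # travel time to the closest large park
}
selected = select_attributes(share_c1, zonal_attributes)
```

In this invented example only measure 1a passes the cut-off, with a strongly negative coefficient, which mirrors the direction reported for single households and generalised travel times.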
4.4.2. Householder age
Standard deviation maps for zonal proportions by householder age (Fig. 4a-c) to some extent feature a pattern similar to that of household composition, which is in line with Hypothesis 2. Young households tend to locate in the inner city, while older households tend to cluster on the city’s edges. What differs is that the classification by age also indicates variation from east to west. This particularly applies to medium-aged households and may be a consequence of the historically different demographic development in the two parts of the city, which may in turn be due to their different political systems. The analysis furthermore reveals zones whose values differ markedly from those of their neighbours and that deserve particular attention in model estimation.
[Fig. 4, panels a to c: maps of standardised household type proportions by zone. Legend: classes in standard deviation units, plus a class for missing values.]
Fig. 4a-c. Standardised household type proportions by traffic analysis zone in Berlin for the following household types (from top left to bottom left): households with a householder below 31 (A1), households with a householder between 31 and 64 (A2), and households with a householder above 64 (A3). Cases with a count lower than 50 were excluded from the analysis and are shown as missing. Sources: household census data: Amt für Statistik Berlin-Brandenburg 2016 (state of data: 2011); traffic analysis zones: Amt für Statistik Berlin-Brandenburg 2014.
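The standardisation behind such maps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: the class limits in `BREAKS` are hypothetical, the threshold of 50 is assumed to refer to the total number of households in a zone, the population standard deviation is used, and the zone identifiers and counts are invented.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumed class limits in standard deviation units (hypothetical).
BREAKS = (-1.96, -0.98, -0.49, 0.49, 0.98, 1.96)
# Assumed to apply to the total household count of a zone.
MIN_COUNT = 50


def standardise(zones, min_count=MIN_COUNT):
    """Map zone id -> z-score of the household type share, or None if missing.

    zones: {zone_id: (households_of_type, households_total)}
    """
    shares = {zone: k / n for zone, (k, n) in zones.items() if n >= min_count}
    values = list(shares.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all zones have the same share")
    return {
        zone: (shares[zone] - mu) / sd if zone in shares else None
        for zone in zones
    }


def class_index(z, breaks=BREAKS):
    """Index of the map class (0 = lowest) or None for missing zones."""
    return None if z is None else bisect_right(breaks, z)


# Hypothetical zones: (households with a young householder, all households).
zones = {"a": (10, 100), "b": (20, 100), "c": (30, 100), "d": (5, 20)}
z_scores = standardise(zones)
classes = {zone: class_index(z) for zone, z in z_scores.items()}
print(classes)
```

Zone "d" falls below the count threshold and is therefore reported as missing rather than being classified.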
The correlation analysis (cf. Fig. 5) again confirms our assumptions for young and senior households, which seem to exhibit the same pattern as household composition. For the middle-age group A2, however, we find almost no significant correlation, which is in line with the results of the geovisual analysis. Either accessibility at an aggregate level is not relevant for this group, or the very large group of 31- to 64-year-olds is too heterogeneous and thus distributed across the whole city. Overall, segmentation by age of householder nevertheless seems to describe accessibility patterns well, with generalised cost measures 1a and 1b exhibiting the strongest correlations, with absolute coefficients above 0.3.
[Fig. 5: matrix of Pearson correlation coefficients and significance markers (***, **, *, NS) for household type segments A1, A2 and A3 and the accessibility measures.]
Fig. 5. Pearson correlation coefficients between proportions of household type segments A1 to A3 (see Table 3) and accessibility measures 1a to 5b (see Table 2). Sources: as indicated in the text.
4.5. Case study lessons

The application of the methodological steps to the Berlin case study produces several useful results. First, household segmentation should include household composition and householder age. Second, not all categories need to be included separately in the model, as some display similar location patterns and correlation results. Third, generalised accessibility measures and accessibility to the closest supermarket seem to be relevant parameters for location modelling. Finally, standard deviation maps reveal local patterns that cannot be explained by general considerations but should be addressed in the location model. Overall, the case study application suggests that the methodological framework produces household types clustered by their spatial distribution or location pattern. Likewise, it identifies location attributes that are likely to explain these patterns.

5. Conclusions

Land-use and transport interaction models can provide valuable decision support. At the same time, they require tremendous effort in terms of model design as well as data collection and preparation. The paper explores the question of how to support an efficient and structured selection of model segmentation and model parameters. It introduces and tests a set of methodologies and procedures with the aim of minimising the magnitude of modelling work. The framework combines three commonly applied methods: literature analysis, descriptive statistics and geovisualisation techniques. The application of the methodological steps to the case of Berlin identifies which household characteristics are appropriate to form household types that theoretically exhibit similar location patterns. The analysis of empirical data
for the city confirms or collapses (i.e. merges) the segments previously identified. Standard deviation maps show that household segmentation should include household composition and householder age; however, some categories, such as family households with one child and multi-person adult households, can be aggregated into one type. Correlation analysis suggests that generalised accessibility measures and accessibility to the closest supermarket are relevant parameters for location modelling.

There are some limitations to the applicability of this methodology. First, the framework is tailored to models that focus on relative distributions, such as macroscopic discrete-choice models, and it may be less applicable to microscopic models. Furthermore, in the case study, household segmentation considers only two separate attributes. Combining these attributes into one household type may yield better results. Moreover, the case study does not include single-parent households, which have specific preferences and constraints and, with the diversification of lifestyles, will make up a considerable proportion of the future population. Finally, the segmentation considered here is based on theoretical assumptions only; applying more sophisticated statistical analyses, such as clustering, could help to better define household types and account for correlations between household characteristics. Future work should extend the approach introduced in this article by applying cluster analysis to empirical data in order to group segments according to location patterns that are similar for households within the same segment and different from others.

Appendix A. Data processing

Data processing involves two main steps: processing of the census data and computation of accessibility attributes. We therefore first describe the preparation of census data and then explain how our accessibility tool ‘UrMoAC’ works.
A.1. Census data

To prepare microdata from Zensus 2011, we first define categories of person-level characteristics, such as age, cultural background, and employment status, along which persons are aggregated to household level using the household identifier. This aggregation involves calculating the number of persons within the previously defined categories, e.g. the number of workers per household. Other household attributes are characteristics of the householder, who is assumed to take the location decisions. We define the householder as the oldest employed person in the household or, if no household member is employed, the oldest person. In the end, each household is represented by quantitative person-level characteristics and features of the householder. Processing the census similarly requires aggregating and merging building and dwelling data. Some dwelling-level characteristics of the building, such as the number of dwellings, are aggregated to the building level using the building identifier. Other attributes are copied from the building directly, and all building-level information is attached to dwellings using the dwelling and building identifiers. As a final step, the resulting household and dwelling characteristics are merged back into a single data source using the household or dwelling identifier.

A.2. Accessibility tool UrMoAC – “UrMo Accessibility Computer”

The accessibility tool first reads a set of sink locations and a set of source locations from a database, as well as a road network. The sources and sinks usually describe single real estate units with their respective function (shop, home location, etc.) or positions of certain activity places, such as parks or public transport stops. The objects read can be filtered by certain values so that, for example, only shops larger than a given threshold or only specific types of shops (e.g. groceries) are read.
In addition, the tool reads a further value for every source or sink that is used to weight the influence of this source when aggregating the measures found (see below) or to count activities, e.g. for obtaining the number of jobs in the dwellings found. As network input we currently use a pre-processed road network from OpenStreetMap (OpenStreetMap contributors, 2016) and the public transport network and schedules based on the General Transit Feed Specification (Google, 2016). In a first processing step, sources and sinks are connected to the road network by choosing the closest road that allows the investigated mode of transport, including public transport stops. The computation of accessibility measures uses a plain Dijkstra routing algorithm (Dijkstra, 1959), and all sinks allocated to the accessed roads are counted. The tool can optionally load boundaries (polygons) that define how measures are to be aggregated, e.g. by zones. In this case, the measures found for all sources within a zone are aggregated and averaged. Using previously defined limits, the search performed by the Dijkstra algorithm can be aborted if a certain distance, time limit or number of (weighted) source locations is reached, or as soon as a first sink has been found. In this way, the tool allows the calculation of accessibility indicators such as cumulative and average opportunity measures, generalised travel times, and travel times to the nearest activity location.

References

Alonso, W. (1960). A theory of the urban land market. Papers and Proceedings of the Regional Science Association, 6, 149-157.
Clark, W. A. V., & Dieleman, F. (1996). Households and housing. Choice and outcomes in the housing market. New Brunswick: Center for Urban Policy Research.
de Dios Ortuzar, J., & Willumsen, L. G. (2011). Modelling Transport. Chichester, West Sussex, UK: John Wiley & Sons, Ltd.
De la Barra, T. (1989). Integrated land use and transport modelling. Decision chains and hierarchies.
De Palma, A., Motamedi, K., Picard, N., & Waddell, P. (2005).
A model of residential location choice with endogenous housing prices and traffic for the Paris region. European Transport | Trasporti Europei, 31, 67-82.
De Palma, A., Picard, N., & Waddell, P. (2007). Discrete choice models with capacity constraints: An empirical analysis of the housing market of the greater Paris region. Journal of Urban Economics, 62(2), 204-230.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269-271.
Duncombe, W., Robbins, M., & Wolf, D. A. (2003). Place characteristics and residential location choice among the retirement-age population. The Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 58(4), S244-252.
El-Geneidy, A. M., & Levinson, D. M. (2006). Access to destinations: Development of accessibility measures.
Google. (2016). General Transit Feed Specification pages. Retrieved 25th of April, 2016, from https://developers.google.com/transit/gtfs/
Heldt, B., Gade, K., & Heinrichs, D. (2014). Challenges of Data Requirements for Modelling Residential Location Choice: the Case of Berlin, Germany.
Hurtubia, R. (2012). Discrete choice and microsimulation methods for agent-based land use modeling. École Polytechnique Fédérale de Lausanne, Lausanne.
Hurtubia, R., Gallay, O., & Bierlaire, M. (2010). Attributes of household, locations and real-estate markets for land use modeling, SustainCity Deliverable 2.7. Lausanne: EPFL Lausanne.
Jenks, G. F., & Caspall, F. C. (1971). Error on choroplethic maps: Definition, measurement, reduction. Annals of the Association of American Geographers, 61(2), 217-244.
Kim, J.-H., Pagliara, F., & Preston, J. (2005). The intention to move and residential location choice behaviour. Urban Studies, 42(9), 195-199.
Krajzewicz, D., & Heinrichs, D. (2016). UrMo Accessibility Computer – A tool for computing contour accessibility measures. Proceedings of the SIMUL 2016 conference.
Lee, B. H. Y., & Waddell, P. A. (2010a). Reexamining the influence of work and nonwork accessibility on residential location choices with a microanalytic framework. Environment and Planning A, 42(4), 913-930.
Lee, B. H. Y., & Waddell, P. A. (2010b). Residential mobility and location choice: a nested logit model with sampling of alternatives. Transportation, 37(4), 587-601.
Lee, D. B. (1994). Retrospective on large-scale urban models. Journal of the American Planning Association, 60(1), 35-40.
Lee Jr, D. B. (1973). Requiem for large-scale models. Journal of the American Institute of Planners, 39(3), 163-178.
Marradi, A. (1990). Classification, typology, taxonomy. Quality & Quantity, 24(2), 129-157.
Martínez, F., & Donoso, P. (2010). The MUSSA II land use auction equilibrium model. In Residential Location Choice (pp. 99-113). Springer.
Martinez, F. J. (1992). The bid-choice land-use model: An integrated economic framework. Environment and Planning A, 24, 871-885.
Martinez, F. J. (1995). Access: The transport-land use economic link. Transportation Research Part B, 29(6), 457-470.
Martinez, F. J. (1996). MUSSA: Land use model for Santiago City. Transportation Research Record: Journal of the Transportation Research Board, 1552(1), 126-134.
Martinez, F. J., & Henriquez, R. (2007). A random bidding and supply land use equilibrium model. Transportation Research Part B: Methodological, 41(6), 632-651.
McFadden, D. (1978). Modelling the choice of residential location. In A. Karlqvist, L. Lundqvist, F. Snickars & J. Weibull (Eds.), Spatial interaction theory and planning models (pp. 75-96). Amsterdam: North Holland.
OpenStreetMap contributors. (2016). OpenStreetMap. Retrieved 23rd of April, 2016, from http://www.openstreetmap.org/
Rossi, P. H. (1955). Why Families Move: A Study in the Social Psychology of Urban Residential Mobility. London: Free Press.
Schirmer, P., Van Eggermond, M. A. B., & Axhausen, K. W. (2014). The role of location in residential location choice models: a review of literature. Journal of Transport and Land Use, 7(2), 3-21.
Somenahalli, S., & Shipton, M. (2013). Examining the Distribution of the Elderly and Accessibility to Essential Services. Procedia - Social and Behavioral Sciences, 104, 942-951.
Straszheim, M. (1987). The theory of urban residential location. In E. S. Mills (Ed.), Handbook of Regional and Urban Economics (Vol. 11): Elsevier Science Publishers B.V.
Wachs, M., & Kumagai, T. G. (1973). Physical accessibility as a social indicator. Socio-Economic Planning Sciences, 7(5), 437-456.
Wegener, M. (2014). Land-Use Transport Interaction Models. In M. M. Fischer & P. Nijkamp (Eds.), Handbook of Regional Science (pp. 741-758). Berlin, Heidelberg: Springer Berlin Heidelberg.
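
Supplementary sketch for Appendix A.2. The bounded search described there (a Dijkstra search that counts weighted sinks and can stop at a time limit or at the first sink) can be illustrated with a few lines of Python. This is a simplified sketch and not the UrMoAC implementation: the function name `accessibility`, the node-based graph, the sink weights and the travel times are all hypothetical, and sources and sinks are assumed to sit directly on network nodes rather than being mapped to the closest road.

```python
import heapq


def accessibility(graph, source, sinks, max_time=None, stop_at_first=False):
    """Bounded Dijkstra search from one source.

    graph: {node: [(neighbour, travel_time), ...]}
    sinks: {node: weight}, e.g. number of opportunities at that node
    Returns (sum of weights of reached sinks, travel time to nearest sink).
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    reached = 0.0
    nearest = None
    while heap:
        time, node = heapq.heappop(heap)
        if node in settled:
            continue  # outdated heap entry
        if max_time is not None and time > max_time:
            break  # every remaining node is farther away than the limit
        settled.add(node)
        if node in sinks:
            reached += sinks[node]
            if nearest is None:
                nearest = time
            if stop_at_first:
                break
        for neighbour, cost in graph.get(node, ()):
            candidate = time + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return reached, nearest


# Hypothetical network with travel times in minutes.
graph = {
    "home": [("a", 4.0), ("b", 10.0)],
    "a": [("shop1", 3.0), ("b", 2.0)],
    "b": [("shop2", 5.0)],
}
sinks = {"shop1": 1.0, "shop2": 2.0}

print(accessibility(graph, "home", sinks))                      # all sinks
print(accessibility(graph, "home", sinks, max_time=10.0))       # cumulative measure
print(accessibility(graph, "home", sinks, stop_at_first=True))  # nearest sink only
```

Averaging the returned values over all sources inside a zone would give a zonal measure of the kind described in the appendix; that aggregation step is left out here to keep the sketch short.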