Applied Geography 42 (2013) 108–115
Quantifying the uncertainty of regional demographic forecasts

Tom Wilson*

Queensland Centre for Population Research, School of Geography, Planning and Environmental Management, Chamberlain Building, The University of Queensland, Brisbane, Qld 4072, Australia
Keywords: Probabilistic; Population; Households; Sydney; Australia

Abstract
Population forecasts are inherently uncertain, and as a general rule the smaller the population, the greater the uncertainty surrounding its demographic future. Over the last two decades demographers have refined probabilistic forecasting models to produce estimates of uncertainty associated with national demographic forecasts. Since the mid-1960s geographers have progressively developed multiregional models to produce regional demographic forecasts. However, these two streams of research have remained largely separate. This paper draws on ideas from both literatures. It introduces a probabilistic model which is suitable for large subnational regions and which produces both population and household forecasts. It was created with a view to informing metropolitan planning, and includes a number of simplifications to reduce input data requirements and run-times relative to ‘standard’ probabilistic models. It is illustrated with an application to the Greater Sydney region for the period 2011–51. The paper concludes by arguing that instead of assuming there to be one inevitable future demographic trajectory, regional planning should consider the plausible envelope of demographic futures, and plan desired futures within it. © 2013 Elsevier Ltd. All rights reserved.
Introduction

The period from the mid-1960s to the late 1980s witnessed considerable advances in the modelling of spatial population systems, much of it contributed by geographers who brought a spatial outlook to demographic projections. Rogers (1966) extended the single region cohort-component projection model to deal with many regions, thus creating the multi-regional model. Rogers later linked it to the life table (Rogers, 1975) whilst Rees and Wilson (1977) developed multi-regional models from a population accounting perspective, focussing particularly on the use of census migration data. This stream of research conceptualised regional populations as an interacting spatial system, constantly evolving through migration exchanges with other regions and countries and varying fertility and mortality regimes. A whole series of refinements and extensions was developed, especially during the 1970s and 1980s, as reviewed in detail by Willekens (1990), Rees (1997) and Wilson and Rees (2005). Multi-regional models now form part of the standard toolbox of models applied by national and regional statistical offices across the world to produce subnational population forecasts (Kupiszewski & Kupiszewska, 2003).
* Tel.: +61 07 3365 6515. E-mail address: [email protected].
0143-6228/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.apgeog.2013.05.006
Without doubt research on multi-regional modelling has made an immense contribution to the study of regional population systems and the practice of subnational population forecasting, but it has not yet tackled the issue of uncertainty. The demographic future is inherently uncertain, especially at the subnational scale. Population forecasts always turn out to be in error to a greater or lesser extent due to factors such as an incomplete understanding of demographic processes, imperfect demographic data, and unpredictable immigration policy changes. Whilst researchers strive to refine their methods to give more accurate population forecasts, it is equally important to provide users of forecasts with estimates of likely error. As the famous demographer Nathan Keyfitz wrote, “Demographers can no more be held responsible for inaccuracy in population forecasting 20 years ahead than geologists, meteorologists or economists when they fail to announce earthquakes, cold winters, or depressions 20 years ahead. What we can be held responsible for is warning one another and our public what the error of our estimates is likely to be” (Keyfitz, 1981, p. 579). In attempting to address the issue of demographic uncertainty statistical offices often publish high and low variants in addition to a principal or middle series. Unfortunately, this high-low approach is beset with problems (Lee, 1999). High-low ranges are generally not accompanied by any statement on the probability of the future population lying within the range. More seriously, they are statistically inconsistent. Compared to past forecast errors, a high-low
range can be implausibly small for one variable (such as the proportion of the population aged 65 and above) but very wide for another (such as the annual number of births). Similarly, the high-low range can often be very narrow in the first few years of a projection but wider further into the future.

A better approach is to provide prediction intervals surrounding a principal projection which indicate the likelihood of future population lying within them. Usually the intervals are presented in the form of a ‘fan diagram’ with progressively wider intervals marked out either side of the principal forecast (e.g. 50%, 67%, 80%, 95%). These intervals can be created in a number of ways (Tayman, 2011).

Some researchers have explored the creation of empirical prediction intervals based on past forecast errors. For example, Stoto (1983) estimated empirical prediction intervals for forecasts of the US total population; Smith and Sincich (1988) generated prediction intervals for State populations in the US; Rayer, Smith, and Tayman (2009) took the approach to a finer geographical scale by creating prediction intervals for county populations in the US; and Tayman, Schafer, and Carter (1998) estimated intervals for very small sub-county areas. The advantages of this approach include its relative simplicity and applicability to regional and local populations where the input data and past forecast errors required for fully probabilistic models are not available. However, a limitation is that usually it is only possible to create prediction intervals for total populations and not for specific age groups or derived variables such as dependency ratios.

Another way in which prediction intervals can be created is through the use of time series models.
For example, Pflaumer (1992) applied an ARIMA model to estimate prediction intervals around forecasts of the total US population, Ahlburg (1987) did so for South Pacific countries, whilst Tayman, Smith, and Lin (2007) similarly investigated various ARIMA models for forecasting US State populations. Again, these were for total populations only. The limitations of ARIMA models for directly forecasting population include the long base periods of data required and the mixed results of some applications.

The most sophisticated approach to creating prediction intervals is through fully probabilistic forecasts created by a cohort-component model. They are prepared by running a forecasting model thousands of times using different fertility, mortality and migration rates in each run. The last two decades have witnessed considerable methodological development in national probabilistic demographic forecasting with applications to an increasing number of countries. Contributions over the last few years include those for Australia (Bell, Wilson, & Charles-Edwards, 2011; Hyndman & Booth, 2008), China (Li, Reuser, Kraus, & Alho, 2009), Germany (Härdle & Mysickova, 2009), and Japan (Okita, Pfau, & Giang, 2009). Whilst these studies consider countries individually, some researchers have created coordinated probabilistic forecasts for several countries in which correlations in fertility, mortality and migration trends between countries are taken into account (e.g. Alho, 2008; Lutz, Sanderson, & Scherbov, 2001; Raftery, Li, Sevcikova, Gerland, & Heilig, 2012; Scherbov, Mamolo, & Lutz, 2008; Statistics Netherlands, 2005).

Nearly all the research in probabilistic demographic forecasting to date has concerned countries and major world regions. Subnational regions and other disaggregations of national populations rarely feature in such work. This is unfortunate because the need for probabilistic forecasts is even greater for sub-populations than national ones.
Numerous studies of past population forecast errors clearly demonstrate an inverse relationship with population size (e.g. Rayer, 2008; Wilson & Rowe, 2011). The literature which does exist on probabilistic forecasts for sub-populations is very limited. Examples focussing on subnational populations include Lingaraj and Runte (1975), Lee, Miller, and Edwards (2003) for California, Cameron and
Poot (2011) for regions of New Zealand, and Hunsinger (2010) for Alaska. Whilst important contributions, all these forecasts are based on net migration rates or flows. Examples of probabilistic models employing directional migration include Rees and Turton (1998) for European Union regions and Wilson and Bell (2007) for Queensland and the rest of Australia. In addition, very few population forecasts disaggregated by living arrangement or household type have been produced. The few exceptions include Alho and Keilman (2010), Jiang and O’Neill (2004) and De Beer and Alders (1999). All these, however, are national-level household forecasting models.

This paper draws on research from the largely separate streams of literature on multi-regional demography (developed primarily by population geographers) and national probabilistic forecasting (contributed by demographers). It introduces a new model which produces probabilistic population and household forecasts for a large subnational region, illustrated with an application to the Greater Sydney region. The new model makes three main contributions. First, it is one of the few examples of subnational probabilistic demographic forecasting, and one of the very few to employ the more conceptually satisfactory directional, rather than net, migration flows. Second, it produces probabilistic living arrangement and household forecasts as well as population forecasts. Third, it is simplified in several ways with a view to improving its practicality for State/provincial government demography sections.
It does this by (a) using five year age groups and projection intervals to reduce dimensionality, thus hugely reducing input data requirements, assumption-setting and run times, (b) creating predictive distributions of regional fertility, mortality and overseas migration by linking them to national predictive distributions for which data and past forecast errors are more plentiful, and (c) simplifying the model’s operation through an Excel front-end.

The paper continues by presenting the new forecasting model (Section The PROBREG forecasting model) followed by a description of input data and assumptions for the example application (Section Input data and assumptions). Results are the focus of Section Results whilst the concluding section summarises the key points, discusses limitations and outlines possibilities for further work.

The PROBREG forecasting model

Overview

Population and household forecasts presented in this paper were produced by the PROBREG (PROBabilistic REGional) forecasting system created by the author. This consists of a bi-regional cohort-component population projection model linked to a form of extended headship rate household projection model. The bi-regional description refers to a two-part division of the national space into the region of interest and the rest of the country in which directional migration flows between regions are modelled explicitly (as opposed to net internal migration). Immigration and emigration are also modelled separately (rather than as net international migration) in both regions. The population component of the system makes use of five-year interval transition migration data from the census and is based on Rees’s (2002) modified transition accounts-based model. The forecasting system is operationalised in a Fortran 95 program.¹ The household component of the system comprises a simplified version of a sequential propensity model (Wilson, 2013). This is a form of extended headship rate model which takes population
1 As pointed out by a referee, other programming languages could have been used. Fortran 95 was chosen because of this author’s extensive knowledge of it and because of its ability to handle large computational tasks very well.
projections by age (but not sex) and divides them into different living arrangements in several steps. The process is summarised in Fig. 1. For each age group the resident population is first divided into those living in non-private dwellings (institutional accommodation) and those in private dwellings; then residents of private dwellings are divided into those living alone and those with others; people living with others are then disaggregated into group and family households. Persons by living arrangement are then converted to numbers of households via average household sizes (which are empirically derived for group and family households, and 1.0 by definition for lone person households). This model was inspired by the conditional shares method of Alho and Keilman (2010). The computational strength of this approach lies in the ability to set just one assumption in each living arrangement pair, which is especially helpful when creating randomly varying proportions.

A key objective guiding the design of the forecasting system was to keep it as simple as possible and suitable for application by State Government demography sections. Many probabilistic models are highly complex and data-intensive, incur long computer run times, and are therefore not especially user-friendly or practical. To achieve this objective five year age groups and projection intervals were chosen. The lower level of dimensionality significantly reduces computing time relative to a single year age group model to the extent that 5000 population and household simulations over a 50 year horizon take about 1½ minutes to complete on a standard desktop computer. In addition, an Excel workbook front-end was developed to simplify data inputting and assumption-setting. A screen shot of the top portion of the workbook is shown in Fig. 2. Input data and assumptions are entered in the green cells or selected via pull-down menus.
The top right button executes a VBA macro which calls the Fortran code and transfers all input data from the Excel worksheet.

The probabilistic aspect of the forecasts is obtained by running the cohort-component and simplified sequential propensity models several thousand times with varying sample paths of the summary indicators of fertility, mortality, migration, living arrangements and average household sizes. These summary indicators comprise the Total Fertility Rate (TFR), life expectancy at birth, Gross Migraproduction Probabilities (calculated from five year interval transition migration data), the Gross Rate of Living in a Non-Private Dwelling, the Gross Rate of Living Alone, the Gross Rate of Living in a Group Household, average group household size and average family household size. The Gross Migraproduction Probability (GMP) is analogous to the TFR: it is the sum of all age-specific probabilities multiplied by the width of the age group in years (5). The living arrangement Gross Rates are similarly defined. For example, the Gross Rate of Living in a Non-Private Dwelling is the sum of all age-specific proportions of the population living in a Non-Private Dwelling multiplied by the width of the age group in years (5). With 21 age groups (0–4 to 95–99, 100+) the maximum Gross Rate is 21 × 5 = 105.

Fig. 1. Summary of the simplified sequential propensity model for projecting living arrangements. The resident population divides into residents of non-private dwellings and residents of private dwellings; residents of private dwellings divide into persons living alone and persons living with others; persons living with others divide into persons in group households and persons in family households. Note: Output categories are shown by the shaded boxes.

Fertility, mortality and migration sample paths

The approach to the generation of regional sample paths for fertility, mortality and international migration is via national-level forecasts. Doing so takes advantage of the more detailed and longer time series of data at the national level, as well as a larger number of past forecasts from which empirical error distributions can be calculated. In addition, it is often the case that trends in fertility, mortality and overseas migration for large subnational regions move largely in line with national trends (e.g. Tromans, Natamba, Jefferies, & Norman, 2008; Ueffing & Wilson, 2013). Sample paths for national Total Fertility Rate and life expectancy at birth by sex are produced first with random walk models subject to ceiling and floor limits, whilst total immigration and emigration numbers are produced by ARIMA(1,0,0) models. These time series models were chosen on the basis of past error patterns, and for their simplicity. Regional fertility, life expectancy, immigration and emigration sample paths, which are closely correlated with those at the national scale, are then created as a random fraction of each national sample path via ARIMA(1,0,0) models. For example, the national TFR for sample path z is modelled as a random walk constrained to the main deterministic TFR assumption:

$$\mathrm{TFR}^{\mathrm{Aus}}_{z}(t) = \mathrm{TFR}^{\mathrm{Aus}}_{z}(t-5) + \mathrm{TFR}^{\mathrm{Aus}}_{\mathrm{main}}(t) - \mathrm{TFR}^{\mathrm{Aus}}_{\mathrm{main}}(t-5) + \varepsilon_{\mathrm{TFR},z}(t)$$

The TFR for the subnational region of interest (reg) is then calculated as the national TFR multiplied by a random scaling factor (s):

$$\mathrm{TFR}^{\mathrm{reg}}_{z}(t) = \mathrm{TFR}^{\mathrm{Aus}}_{z}(t)\, s^{\mathrm{reg}}_{z}(t)$$

where the scaling factor is assumed to be an AR(1) process constrained to the scaling factor assumption. It is calculated as:

$$s^{\mathrm{reg}}_{z}(t) = \phi_{s}\left[ s^{\mathrm{reg}}_{z}(t-5) - s^{\mathrm{reg}}_{\mathrm{main}}(t) \right] + s^{\mathrm{reg}}_{\mathrm{main}}(t) + \varepsilon^{\mathrm{reg}}_{s,z}(t).$$

Scaling factor errors for both the region of interest and the rest of the country are correlated via Cholesky decomposition (Press, Teukolsky, Vetterling, & Flannery, 2001). Life expectancy at birth is modelled similarly to the TFR, except that correlations between both sex and region are incorporated.

National immigration and emigration totals are modelled with ARIMA(1,0,0) models. These totals are then allocated to the two regions via a random proportion, p:

$$I^{\mathrm{reg}}_{z}(t) = I^{\mathrm{Aus}}_{z}(t)\, p^{\mathrm{reg}}_{z}(t)$$

where the proportion is assumed to follow an AR(1) process using the same equation as for the regional TFR scaling factor.

For internal migration, sex-specific Gross Migraproduction Probabilities for migration between the two regions are modelled. There is no national scale forecast in this case. GMPs are assumed to follow an AR(1) process constrained to set assumptions. For example, the GMP for migration from the region of interest to the rest of the country is:

$$\mathrm{GMP}^{\mathrm{reg}}_{z}(t) = \phi_{\mathrm{GMP}}\left[ \mathrm{GMP}^{\mathrm{reg}}_{z}(t-5) - \mathrm{GMP}^{\mathrm{reg}}_{\mathrm{main}}(t) \right] + \mathrm{GMP}^{\mathrm{reg}}_{\mathrm{main}}(t) + \varepsilon^{\mathrm{reg}}_{\mathrm{GMP},z}(t)$$
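As an illustration of the sample path equations above, the following Python sketch (not the author's Fortran 95 implementation) generates national TFR random walks and Cholesky-correlated AR(1) scaling factors for two regions. The main TFR of 1.89, the Greater Sydney scaling factor of 0.946 and the error correlation of 0.7 are taken from the assumptions reported in Section Input data and assumptions; the rest-of-country scaling factor, the error standard deviations, the autoregressive parameter for the scaling factor, the floor and ceiling values, and the use of clipping to impose them are hypothetical choices made only for this example.

```python
import numpy as np


def simulate_regional_tfr(n_paths=5000, n_steps=8, seed=0):
    """Simulate national TFR random walks and regional AR(1) scaling factors.

    n_steps=8 five-year steps covers 2011 to 2051.
    Returns (tfr_aus, s, tfr_reg); tfr_reg is for the region of interest.
    """
    rng = np.random.default_rng(seed)

    tfr_main = np.full(n_steps + 1, 1.89)      # main TFR assumption (from the text)
    s_main = np.array([0.946, 1.02])           # region (from the text), rest of country (hypothetical)
    sd_tfr, sd_s, phi_s = 0.08, 0.01, 0.95     # hypothetical error SDs and AR(1) parameter
    floor, ceiling = 1.0, 3.0                  # hypothetical limits on the national TFR

    # Error correlation between the two regions' scaling factors (0.7 in the text).
    corr = np.array([[1.0, 0.7], [0.7, 1.0]])
    chol = np.linalg.cholesky(corr)            # lower-triangular L with L @ L.T == corr

    tfr_aus = np.empty((n_paths, n_steps + 1))
    s = np.empty((n_paths, n_steps + 1, 2))
    tfr_aus[:, 0] = tfr_main[0]
    s[:, 0, :] = s_main

    for t in range(1, n_steps + 1):
        # Random walk constrained to the main assumption, then clipped (assumed
        # way of applying the floor and ceiling limits).
        eps_tfr = rng.normal(0.0, sd_tfr, n_paths)
        step = tfr_aus[:, t - 1] + (tfr_main[t] - tfr_main[t - 1]) + eps_tfr
        tfr_aus[:, t] = np.clip(step, floor, ceiling)

        # Correlated errors: independent standard normals times L transposed.
        eps_s = sd_s * (rng.standard_normal((n_paths, 2)) @ chol.T)
        # AR(1) around the main scaling factor assumption (constant here).
        s[:, t, :] = phi_s * (s[:, t - 1, :] - s_main) + s_main + eps_s

    tfr_reg = tfr_aus * s[:, :, 0]             # regional TFR = national TFR x scaling factor
    return tfr_aus, s, tfr_reg
```

Percentiles taken across the first axis of `tfr_reg` at each time step would then give the kind of prediction intervals discussed in this paper.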
Fig. 2. Part of the Excel workbook front-end to PROBREG.
Errors in the GMP sample paths are correlated between sexes and between the two directional migration flows.

Living arrangement and average household size sample paths

All living arrangement sample paths are modelled as random walks constrained to the main deterministic assumptions. For example, the Gross Rate for Living in a Non-Private Dwelling (GRNPD) is calculated as:

$$\mathrm{GRNPD}^{\mathrm{Reg}}_{z}(t) = \mathrm{GRNPD}^{\mathrm{Reg}}_{z}(t-5) + \mathrm{GRNPD}^{\mathrm{Reg}}_{\mathrm{main}}(t) - \mathrm{GRNPD}^{\mathrm{Reg}}_{\mathrm{main}}(t-5) + \varepsilon^{\mathrm{Reg}}_{\mathrm{GRNPD},z}(t).$$

A helpful feature of the household model’s tree structure is that only one summary indicator needs to be calculated in each step. Once the GRNPD has been modelled the Gross Rate for Living in a Private Dwelling is derived automatically as 105 − GRNPD. Average household sizes for group households and family households are similarly modelled as random walks constrained to deterministic assumptions.
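The tree structure and the Gross Rate definition can be sketched in a few lines of Python. This is a minimal illustration rather than the PROBREG code: the function names are hypothetical, and the proportions at each step are assumed to be conditional on reaching that branch of the tree (for example, the share living alone among residents of private dwellings), which the paper does not state explicitly.

```python
AGE_WIDTH = 5  # width of each age group in years


def gross_rate(age_specific_proportions):
    """Sum of age-specific proportions multiplied by the age group width."""
    return AGE_WIDTH * sum(age_specific_proportions)


def complement_gross_rate(rate, n_age_groups=21):
    """Gross Rate of the other branch of a pair, e.g. 105 - GRNPD for 21 age groups."""
    return n_age_groups * AGE_WIDTH - rate


def split_living_arrangements(pop, p_npd, p_alone, p_group):
    """Split population by age group into four living arrangements.

    pop: persons by age group; p_npd, p_alone, p_group: proportions by age
    group, each assumed conditional on the previous step of the tree.
    """
    totals = {"non_private": 0.0, "alone": 0.0, "group": 0.0, "family": 0.0}
    for n, a, b, c in zip(pop, p_npd, p_alone, p_group):
        non_private = n * a
        private = n - non_private          # only one proportion needed per pair
        alone = private * b
        with_others = private - alone
        group = with_others * c
        family = with_others - group
        totals["non_private"] += non_private
        totals["alone"] += alone
        totals["group"] += group
        totals["family"] += family
    return totals


def households(persons, avg_group_size, avg_family_size):
    """Convert persons to households via average household sizes."""
    return {
        "lone": persons["alone"] / 1.0,    # 1.0 by definition
        "group": persons["group"] / avg_group_size,
        "family": persons["family"] / avg_family_size,
    }
```

With three illustrative age groups, `split_living_arrangements([1000, 2000, 500], [0.01, 0.02, 0.20], [0.0, 0.10, 0.30], [0.0, 0.05, 0.02])` allocates every person to exactly one of the four categories, so the category totals sum to the input population of 3500.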
Aligning model-based and empirical prediction intervals

In the application reported in this paper, model-generated prediction intervals were aligned with those obtained from an analysis of past forecast errors for Greater Sydney (described in Section Ex-post error analysis). Strictly, this adjustment is not part of the model; it occurs manually and is implemented in an iterative manner. Nonetheless, it is an important aspect of the final outputs and is thus included in this section on modelling. The adjustment was applied in this case as follows.

(1) PROBREG is run.
(2) The 80% half-width of the total population predictive distribution at 5, 10 and 15 years into the forecast horizon is examined.
(3) This is compared with the width of the 80% Absolute Percentage Error distribution from the ex-post error analysis at the same points of the forecast horizon.
(4) If the two sets of predictive distributions are close then the probabilistic forecasts are accepted and the process stops. If not, it continues to step 5.
(5) Parameters of the internal and overseas migration time series models are adjusted to obtain predictive distributions which align with past errors. In practice this may be achieved by slightly altering the correlations between random errors in inward and outward migration flows, thereby adjusting the predictive distributions of net migration. The process then returns to step (1).

The prediction intervals are therefore conditional on future uncertainty being similar to that of the past. Despite the impossibility of knowing this for sure, this approach is believed to be better than relying solely on statistical models or expert judgement.
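Steps (2) to (4) of the alignment procedure amount to a simple comparison, sketched below in Python. The target values of 1.6%, 3.0% and 4.3% are the smoothed 80% Absolute Percentage Errors reported in Section Ex-post error analysis. The definition of the half-width as half the distance between the 10th and 90th percentiles expressed as a percentage of the median, the tolerance of 0.3 percentage points, and the function names are assumptions made for this illustration; in the paper the comparison and the subsequent parameter adjustment are carried out manually.

```python
import numpy as np

# Smoothed 80% Absolute Percentage Errors from the ex-post analysis,
# keyed by years into the forecast horizon (values from the text).
APE_TARGETS = {5: 1.6, 10: 3.0, 15: 4.3}


def half_width_pct(sims):
    """80% half-width of simulated totals, as a percentage of the median.

    Assumed definition: (90th percentile - 10th percentile) / 2 / median * 100.
    """
    q10, q50, q90 = np.percentile(sims, [10, 50, 90])
    return 100.0 * (q90 - q10) / (2.0 * q50)


def alignment_report(sims_by_horizon, tolerance=0.3):
    """Compare model half-widths with past errors at each horizon.

    sims_by_horizon maps years ahead (5, 10, 15) to an array of simulated
    total populations. Returns {horizon: (half_width, target, aligned)}.
    The tolerance (in percentage points) is a hypothetical acceptance rule.
    """
    report = {}
    for horizon, target in APE_TARGETS.items():
        hw = half_width_pct(sims_by_horizon[horizon])
        report[horizon] = (hw, target, abs(hw - target) <= tolerance)
    return report
```

If any horizon is flagged as not aligned, the analyst would adjust the migration error correlations and rerun the model, as in step (5).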
Input data and assumptions

Ex-post error analysis

An assessment of the forecast accuracy of past population projections for Greater Sydney prepared by the ABS and the New South Wales Government was undertaken. Data limitations restricted the analysis. Only 14 sets of projections could be obtained, and due to the limited extent of data published it was only possible to calculate errors for projected total populations. Despite being a tiny sample, the errors of these projections nonetheless contribute useful information about the predictability of Sydney’s population. Fig. 3 reveals the Absolute Percentage Errors covering 80% of past projections by number of years into the projection horizon. The dashed line depicts a ‘smoothed’ Absolute Percentage Error trend, which is about 1.6%, 3.0% and 4.3% after 5, 10 and 15 years respectively. These values were used in aligning the model-based predictive distributions with past errors (Section Aligning model-based and empirical prediction intervals).

Population projection assumptions
Estimated Resident Populations (ERPs) by sex and five year age groups from 0–4 to 95–99 and 100+ for 30th June 2011 were used as jump-off populations (ABS, 2012). These estimates are based on the 2011 Census but make allowance for net underenumeration and residents temporarily overseas on census night.

The national TFR was fixed at 1.89 for the whole forecast horizon, the annual average recorded over the 2006–11 period. The standard deviation of the random error variable was set to give a predictive distribution which aligned approximately with past five-year average TFR errors from ABS projections produced in the below-replacement fertility era. Greater Sydney’s TFR was assumed to remain at 94.6% of the national level (as observed over 2006–11) which translates to a forecast TFR of 1.79. TFR scaling factor errors for the two regions were assumed to have a correlation of 0.7.

National life expectancy at birth by 2046–51 was assumed to reach 90.3 years for females and 87.9 years for males (up from 84.0 and 79.5 years respectively in 2006–11). Like fertility, standard deviations of the random error variables were set to approximately align with past five-year average errors from ABS projections since 1981. Greater Sydney’s life expectancy was assumed to be slightly higher than the national trend (a ratio of 1.008 for females and 1.011 for males, based on observations over the last five years). Regional life expectancy scaling factor errors were assumed to have a correlation of 0.7.

Assumptions for internal migration were set in terms of Gross Migraproduction Probabilities (GMPs) by sex and migration direction (i.e. migration into, and migration out of, Sydney). All GMPs were assumed to gently decline over time, reflecting greater economic opportunities in the two subnational regions as they grow and therefore a reduced need for labour market-related migration between them.
All autoregressive parameters φ were set to 0.95 on the basis of judgement, as were correlations between opposite flows of internal migration which were assumed to be 0.35 (these being final values after adjusting to align the total population predictive distribution with past errors). These settings result in net internal (transition) migration of approximately 85,000 per five year interval, very close to the intercensal average recorded over the 1976–2011 period.

National overseas migration assumptions were set as five year transition totals for both immigration and emigration, and assumed to gradually increase over time. Both immigration and emigration autoregressive parameters φ were set to 0.95 and the correlation between immigration and emigration error was set to 0.80 (this being the final value after adjusting to total population past errors). The main deterministic assumption for Greater Sydney’s proportion of both immigration and emigration was assumed to be 0.251 (the value for immigration recorded by the census for 2006–11) and the correlation between the errors of the random proportions of immigration and emigration was set to 0.85. These assumptions generate net (transition) overseas migration of roughly 725,000 in 2011–16, increasing to 833,000 by 2046–51.
Household projection assumptions

Household projections were based on the assumption that 2011 Census age-specific living arrangements would hold into the future. The 2011 Gross Rate of Living in a Non-Private Dwelling was thus held at 9.9, the Gross Rate of Living Alone was set to 18.6 and the Gross Rate of Living in a Group Household was fixed at 3.9. All three Gross Rates are forecast using random walk models constrained to these main deterministic assumptions. Standard deviations of the random errors were set based on judgement due to the lack of past errors and estimates of living arrangements and households.

Results

Population forecasts

The forecast population of Greater Sydney out to 2051 is shown in Fig. 4. From an estimated population of 4.6 million in 2011 the median of the forecast distribution passes through 5.8 million in 2031, reaching 6.9 million by the end of the forecast horizon. This is marginally higher than the New South Wales Department of Planning’s 2008 release projections which anticipate Sydney’s population to pass through 5.7 million in 2031 (NSW Department of Planning, 2008). However, in the wider scheme of things this is a minor issue. The emphasis should really be placed on the likely envelope of potential population futures rather than point forecasts. Following Lutz, Sanderson, and Scherbov (2004), 80% prediction intervals are reported here because they cover the majority of possible outcomes but avoid the greater uncertainty of predictive distributions at their extreme ends. Thus, 80% of likely futures place Greater Sydney’s total population between 5.4 and 6.1 million by 2031, and between 6.0 and 7.9 million by 2051. Even the lowest plausible growth scenarios are likely to result in population increase, with the lower bound of the 95% prediction interval reaching 5.5 million by 2051. The latest ABS population projections also anticipate a total population by 2031 of 5.7 million according to the Series B variant,
Fig. 3. 80% Absolute Percentage Error interval for past projections of the total population of Greater Sydney. Note: The dashed line depicts a smoothed 80% interval. Source: Calculated using past ABS and New South Wales Government population projections and ABS ERPs.
Fig. 4. The past and forecast population of Greater Sydney, 1976–2051. Source: ABS ERPs; author’s forecasts
which is widely interpreted as the principal projection (ABS, 2008). In its publications the ABS also emphasises two other projection series, A and C, though it refrains from assigning any specific meaning to them. Nonetheless, in the absence of any explicit description, many users tend to interpret series A, B and C as high, medium and low respectively. By 2031 Series A puts Sydney’s population at 5.8 million, whilst Series C places it at 5.6 million. The interval between these two covers only 30% of the predictive distribution for 2031 shown in Fig. 4. The conclusion is that alternative deterministic projection series should not be interpreted as a range within which the population is likely to fall. As in this case, such ranges are often much too narrow.

Population ageing

Whilst Sydney’s population is likely to experience considerable growth, it will also undergo significant population ageing in line with the continuing age structural transition being experienced by Australia as a whole (Wilson, 2012). Comprising just 12.7% of the population in 2011, the proportion of the population aged 65 years and above is likely to increase to between 18.0 and 20.1% by 2031 (80% interval) and to between 21.0 and 25.8% by 2051 (80% interval). The numbers aged 65+ are forecast to increase from 0.58 million in 2011 to between 1.05 and 1.14 million by 2031 (80% interval) and to between 1.48 and 1.73 million by 2051 (80% interval). Therefore population ageing will definitely continue and can be forecast with a fairly high degree of confidence for several decades into the future.

The 65+ population is far from homogeneous, of course. The fastest growing parts of this age group are at the very highest ages thanks to increases in birth cohorts many decades ago, net migration additions, and considerable falls in elderly mortality rates. Fig. 5 illustrates the forecast growth and uncertainty of Sydney’s very elderly population, defined here as those aged 85 years and above.
For the next 15–20 years the growth of this population is likely to remain roughly in line with that of the last two decades. Growth will accelerate from the beginning of the 2030s as the large baby boom cohorts (born 1946–65) begin to join the very elderly ages. What is certain is that under any plausible demographic parameters, the very elderly population will increase in number to a very large degree. The 80% prediction interval in 2051 extends from 318,000 to 424,000.
Fig. 5. The past and forecast population aged 85+ in Greater Sydney, 1991–2051. Source: ABS ERPs; author’s forecasts
Household forecasts

According to the median of the predictive distribution the future number of households in Sydney is expected to increase from 1.7 million in 2011 to 2.2 million in 2031 and to 2.6 million by 2051. The New South Wales Government’s Draft Metropolitan Strategy for Sydney to 2031 is based on “545,000 additional dwellings needed over the period 2011 to 2031” (NSW Government, 2013, p. 12). Applying the ABS definition of one dwelling per household, this is higher than the 474,000 increase in households implied by the median of the forecasts presented here. However, the most important message from the probabilistic forecasts is that fertility, mortality, migration and living arrangement trends could well deliver household increases between 2011 and 2031 considerably below or above these numbers. Fig. 6 illustrates the predictive distribution of household increases over the first two decades of the forecasts. The 80% prediction interval spans 307,000 to 671,000 households. The implication for metropolitan planning is to put in place sufficiently flexible plans so that enough greenfield land and infill sites can be made available if necessary to meet the upper end of this interval. At the same time it is not necessary to plan for outcomes right at the highest end of the distribution, such as the upper 99% interval boundary, which is very unlikely (an increase of nearly one million households).

Relative uncertainty of demographic variables

Conventional population projections with high and low variants give misleadingly narrow high-low ranges for some
Fig. 6. Predictive distribution of the increase in the number of households in Greater Sydney, 2011e31. Source: author’s forecast
demographic variables whilst for others the ranges are too wide. Probabilistic forecasts avoid this problem because a predictive distribution can be calculated for any demographic variable output by the model. The advantage of this feature is that the analyst can see which aspects of the demographic future can be forecast with high or low levels of confidence and how they vary over time.

Table 1 summarises how uncertainty differs between variables and over time by presenting the Relative Interdecile Range (RIDR) for a selection of output variables. The RIDR is the difference between the ninth and first deciles of a distribution (i.e. the width of the 80% prediction interval) divided by the median (Lutz et al., 2004). For total population the RIDR increases almost linearly over time, reaching 0.282 by 2051, i.e. the width of the 80% prediction interval is 28.2% of the median value. Forecasts for age groups 50–64 and 65–84 are more certain than those for the population as a whole, whilst those for age groups under 50 are less certain, especially 0–19 year olds after 2031. The explanation for these differences lies in the relative uncertainty of fertility, mortality, internal migration and overseas migration (lower panel of Table 1) and varying intensities by age group. Forecasts for the young adult ages, the most migratory age group, are affected by high levels of uncertainty surrounding both internal and overseas migration. Uncertainty over fertility and, after about 20 years into the forecasts, the size of the childbearing age population creates relatively high and continually increasing uncertainty over the size of the 0–19 year old age group. Age groups 50–64 and 65–84 are generally subject to much lower interregional migration rates and still fairly low mortality rates (except at the higher end of this age range), resulting in more certain forecasts. This largely explains the lower uncertainty in forecasts of lone person households because the proportions of people living alone are high at these older ages. Uncertainty surrounding forecasts for the 85+ age group is primarily due to uncertainty over mortality rates.

Table 1 The Relative Interdecile Range^a of selected forecast variables.

Variable                   2021     2031     2041     2051
Total population           0.058    0.127    0.203    0.282
Ages 0–19                  0.104    0.240    0.365    0.479
Ages 20–34                 0.100    0.181    0.257    0.359
Ages 35–49                 0.059    0.156    0.257    0.328
Ages 50–64                 0.026    0.070    0.151    0.247
Ages 65–84                 0.042    0.068    0.097    0.145
Ages 85+                   0.146    0.205    0.245    0.291
Ages 65+                   0.055    0.084    0.118    0.161
% Aged 65+                 0.061    0.110    0.162    0.205
Total no. of households    0.100    0.166    0.235    0.308
Lone persons households    0.054    0.101    0.148    0.207
Group households           0.175    0.266    0.353    0.438
Family households          0.131    0.215    0.295    0.378

Variable                   2016–21  2026–31  2036–41  2046–51
TFR                        0.233    0.315    0.378    0.437
Female e0                  0.042    0.045    0.051    0.052
Net internal migration     1.033    1.377    1.667    1.950
Net overseas migration     0.704    0.914    1.025    1.095
Births                     0.241    0.366    0.476    0.599
Deaths                     0.269    0.306    0.324    0.327

^a Difference between the ninth and first deciles divided by the median.

Conclusions
This paper has presented a probabilistic population and household projection model, PROBREG, for a subnational region, illustrating it with an application to the Greater Sydney region. However, the model is geographically flexible and could potentially be applied to any region for which five-year fixed-interval migration data are available. The strengths of the model include its:
• focus on a subnational region
• explicit handling of demographic interaction with the rest of Australia and the rest of the world through directional rather than net migration flows
• linkage to national-level fertility, mortality and overseas migration predictive distributions
• creation of consistent population and household forecasts
• ability to use deterministic projection assumptions as median forecasts, thus enabling it to be an enhancement to, rather than a complete replacement for, conventional deterministic forecasts
• production of outputs for the rest of the country region, and by addition for the country as a whole
• incorporation of correlations in demographic variables across sex and region
• calibration against the distribution of past forecast errors
• lower input data requirements and faster run-times than an equivalent single year of age probabilistic model.

Notwithstanding these features the PROBREG model possesses several limitations, some of which could be addressed by further research and development. First, alignment with past forecast errors ideally requires a large number of past population forecasts, and preferably far more than the 14 used in the example here. One approach might be to evaluate past forecasts for a large number and variety of regions and estimate a relationship between error distributions and characteristics of regions, such as population size and recent growth rates (e.g. Rayer et al., 2009). Of course, using past error distributions in this way assumes the future will be as hard to forecast as the past. However, this is a reasonable assumption for which there is some evidence (e.g. Smith & Sincich, 1988) and one which the author believes is better than relying solely on statistical models or expert judgement. Second, the model does not explicitly account for correlations between internal and overseas migration, a feature which has been observed in the case of Sydney’s migration system (e.g. Burnley, 1996). Possible errors in jump-off populations are also not yet included in the model. Third, there was an insufficient number of past household forecasts and subsequently published household estimates to undertake a forecast error evaluation and use these findings to fine-tune the household predictive distributions. Finally, the cost of simplification through the use of five-year age groups and forecast intervals is, of course, a lack of single year age and time detail. Nonetheless the model contains many strengths and is one of the first to offer a means of quantifying the demographic uncertainty faced by subnational areas.

It is widely accepted that demographic forecasts always turn out to be erroneous to a greater or lesser extent. Given that uncertainty generally increases as population size decreases, probabilistic methods are particularly relevant for regions and local areas. Probabilistic regional forecasts, like those for Greater Sydney, provide planners and policymakers with an indication of the plausible range of demographic futures. Too often metropolitan plans include population, household and dwelling forecasts for many decades ahead with a high level of certainty implied (Bunker, 2012). There is almost a ‘demographic determinism’ in which the demographic future is seen to be predetermined, with planning consisting of “forecasting first and then making plans to accommodate the numbers” (Isserman, 2007, p. 197). As an alternative, metropolitan planning could instead focus on the creation of desired futures set within the broad envelope of plausible demographic futures, where ‘plausible’ might be defined by 80% prediction intervals.
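As an illustration of how the summary measures used in this paper relate to simulation output, the following Python sketch computes an 80% prediction interval and the Relative Interdecile Range (RIDR) from a set of simulated outcomes. It is not the PROBREG implementation. The function names are hypothetical, the simulated sample is synthetic (a lognormal draw chosen purely for illustration), and the final lines re-use the household figures reported earlier, treating the 474,000 increase implied by the median forecasts as an approximation to the median of the distribution of increases.

```python
import math
import random
import statistics


def decile_summary(samples):
    """Return (first decile, median, ninth decile) of simulated outcomes."""
    # n=10 yields the nine decile cut points; "inclusive" interpolates
    # between the observed minimum and maximum of the sample.
    deciles = statistics.quantiles(samples, n=10, method="inclusive")
    return deciles[0], statistics.median(samples), deciles[8]


def relative_interdecile_range(samples):
    """RIDR: width of the 80% prediction interval divided by the median."""
    first, median, ninth = decile_summary(samples)
    return (ninth - first) / median


# Synthetic stand-in for simulated household increases, 2011-31.
# ASSUMPTION: the lognormal shape and its parameters are illustrative only
# and are not taken from the model described in the paper.
rng = random.Random(0)
simulated = [rng.lognormvariate(math.log(474_000), 0.3) for _ in range(5000)]

low, mid, high = decile_summary(simulated)
print(f"80% prediction interval: {low:,.0f} to {high:,.0f} (median {mid:,.0f})")
print(f"RIDR of synthetic sample: {relative_interdecile_range(simulated):.3f}")

# Arithmetic on the published figures: interval 307,000 to 671,000,
# with 474,000 used as an approximate median (see the note above).
published_ridr = (671_000 - 307_000) / 474_000
print(f"RIDR implied by published interval: {published_ridr:.3f}")
```

The published interval for the household increase has a width of 364,000. This is roughly what the RIDR of 0.166 for the total number of households in 2031 (Table 1) implies when applied to a median of about 2.2 million households, which is expected if the 2011 jump-off number of households is treated as known.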
References
ABS (Australian Bureau of Statistics). (2008). Population projections, Australia, 2006 to 2101. Catalogue No. 3222.0. Canberra: ABS.
ABS (Australian Bureau of Statistics). (2012). Population by age and sex, regions of Australia, 2011. Catalogue No. 3235.0. Canberra: ABS.
Ahlburg, D. A. (1987). Population forecasts for South Pacific nations using autoregressive models. Journal of the Australian Population Association, 4(2), 157–167.
Alho, J. (2008). Aggregation across countries in stochastic population forecasts. International Journal of Forecasting, 24(3), 343–353.
Alho, J., & Keilman, N. (2010). On future household structure. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173(1), 117–143.
Bell, M., Wilson, T., & Charles-Edwards, E. (2011). Australia’s population future: probabilistic forecasts incorporating expert judgement. Geographical Research, 49(3), 261–275.
Bunker, R. (2012). Reviewing the path dependency in Australian metropolitan planning. Urban Policy and Research, 30(4), 443–452.
Burnley, I. H. (1996). Associations between overseas, intra-urban and internal migration and internal migration dynamics in Sydney, 1976–91. Journal of the Australian Population Association, 13(1), 47–65.
Cameron, M. P., & Poot, J. (2011). Lessons from stochastic small-area population projections: the case of Waikato subregions in New Zealand. Journal of Population Research, 28(2–3), 245–265.
De Beer, J., & Alders, M. (1999). Probabilistic population and household forecasts for the Netherlands. In Paper prepared for the Joint ECE-EUROSTAT work Session on demographic projections, Perugia, Italy, May 3–7, 1999.
Härdle, W., & Mysickova, A. (2009). Stochastic population forecast for Germany and its consequence for the German pension system. Berlin: School of Business and Economics, Humboldt University.
Hunsinger, E. (2010). An expert-based stochastic population forecast for Alaska, using autoregressive models with random coefficients. Juneau, Alaska: Alaska Department of Labor and Workforce Development.
Hyndman, R., & Booth, H. (2008). Stochastic population forecasts using functional data models for mortality, fertility and migration. International Journal of Forecasting, 24, 323–342.
Isserman, A. M. (2007). Forecasting to learn how the world can work. In L. D. Hopkins, & M. Zapata (Eds.), Engaging the future: Forecasts, scenarios, plans, and projects (pp. 175–197). Cambridge, MA: Lincoln Institute of Land Policy.
Jiang, L., & O’Neill, B. C. (2004). Toward a new model for probabilistic household forecasts. International Statistical Review, 72(1), 51–64.
Keyfitz, N. (1981). The limits of population forecasting. Population and Development Review, 7(4), 579–593.
Kupiszewski, M., & Kupiszewska, D. (2003). Internal migration component in subnational population projections in member states of the European Union. Working Paper 2/2003. Warsaw: Central European Forum for Migration Research.
Lee, R. D. (1999). Probabilistic approaches to population forecasting. In W. Lutz, J. Vaupel, & D. A. Ahlburg (Eds.), Frontiers of population forecasting (pp. 156–190). New York: Population Council.
Lee, R., Miller, T., & Edwards, R. D. (2003). The growth and aging of California’s population: Demographic and fiscal projections, characteristics and service needs. UC Berkeley: Center for the Economics and Demography of Aging.
Lingaraj, B. P., & Runte, R. A. (1975). Simulation of regional population trends. Computers and Urban Society, 1(1), 121–129.
Li, Q., Reuser, M., Kraus, C., & Alho, J. (2009). Ageing of a giant: a stochastic population forecast for China, 2006–2060. Journal of Population Research, 26, 21–50.
Lutz, W., Sanderson, W., & Scherbov, S. (2001). The end of world population growth. Nature, 412, 543–545.
Lutz, W., Sanderson, W. C., & Scherbov, S. (2004). The end of world population growth in the 21st Century: New Challenges for Human Capital Formation and Sustainable development. London: Earthscan.
NSW Department of Planning. (2008). New South Wales state and regional population projections, 2006–2036. Sydney: Department of Planning.
NSW Government. (2013). Draft metropolitan plan for Sydney 2031. Sydney: NSW Department of Planning & Infrastructure.
Okita, Y., Pfau, W. D., & Giang, T. L. (2009). A stochastic forecast model for Japan’s population. Discussion Paper 09-06. Tokyo: National Graduate Institute for Policy Studies.
Pflaumer, P. (1992). Forecasting US population totals with the Box-Jenkins approach. International Journal of Forecasting, 8(3), 329–338.
Press, W., Teukolsky, S., Vetterling, W., & Flannery, B. (2001). Numerical Recipes in Fortran 77: The art of scientific computing. New York: Cambridge University Press.
Raftery, A. E., Li, N., Sevcikova, H., Gerland, P., & Heilig, G. K. (2012). Bayesian probabilistic population projections for all countries. Proceedings of the National Academy of Sciences, 109(35), 13915–13921.
Rayer, S. (2008). Population forecast errors: a primer for planners. Journal of Planning Education and Research, 27(4), 417–430.
Rayer, S., Smith, S. K., & Tayman, J. (2009). Empirical prediction intervals for county population forecasts. Population Research and Policy Review, 28(6), 773–793.
Rees, P. (1997). Problems and solutions in forecasting geographical populations. Journal of the Australian Population Association, 14(2), 145–166.
Rees, P. (2002). New models for projecting UK ethnic group populations at national and subnational scales. In J. Haskey (Ed.), Population projections by ethnic group: A feasibility study (pp. 27–51). London: Stationery Office.
Rees, P., & Turton, I. (1998). Investigation of the effects of input uncertainty on population forecasting. In Paper prepared for the GeoComputation 98 Conference, Bristol, UK, 17–19 September 1998.
Rees, P., & Wilson, A. (1977). Spatial population analysis. London: Edward Arnold.
Rogers, A. (1966). Matrix methods of population analysis. Journal of the American Institute of Planners, 32(1), 40–44.
Rogers, A. (1975). Introduction to multiregional mathematical demography. New York: John Wiley.
Scherbov, S., Mamolo, M., & Lutz, W. (2008). Probabilistic population projections for the 27 EU member states based on Eurostat assumptions. In European demographic research papers 2/2008. Vienna: Vienna Institute for Demography.
Smith, S. K., & Sincich, T. (1988). Stability over time in the distribution of population forecast errors. Demography, 25(3), 461–474.
Statistics Netherlands. (2005). Changing population of Europe: Uncertain future. The Hague: Statistics Netherlands.
Stoto, M. A. (1983). The accuracy of population projections. Journal of the American Statistical Association, 78(381), 13–20.
Tayman, J. (2011). Assessing uncertainty in small area forecasts: state of the practice and implementation strategy. Population Research and Policy Review, 30(5), 781–800.
Tayman, J., Schafer, E., & Carter, L. (1998). The role of population size in the determination and prediction of population forecast errors: an evaluation using confidence intervals for subcounty areas. Population Research and Policy Review, 17(1), 1–20.
Tayman, J., Smith, S. K., & Lin, J. (2007). Precision, bias, and uncertainty for state population forecasts: an exploratory analysis of time series models. Population Research and Policy Review, 26(3), 347–369.
Tromans, N., Natamba, E., Jefferies, J., & Norman, P. (2008). Have national trends in fertility between 1986 and 2006 occurred evenly across England and Wales? Population Trends, 133, 7–19.
Ueffing, P., & Wilson, T. (2013). Estimating historical total fertility rates for Australia and its States. Working Paper. Brisbane: Queensland Centre for Population Research, The University of Queensland.
Willekens, F. J. (1990). Demographic forecasting: state-of-the-art and research needs. In C. A. Hazeu, & C. A. B. Frinking (Eds.), Emerging issues in demographic research (pp. 9–66). Amsterdam: Elsevier Science.
Wilson, T. (2012). Forecast accuracy and uncertainty of Australian Bureau of Statistics state and territory population projections. International Journal of Population Research, 2012. http://www.hindawi.com/journals/ijpr/2012/419824/.
Wilson, T. (2013). The sequential propensity household projection model. Demographic Research, 28(24), 681–712. http://www.demographic-research.org/volumes/vol28/24/28-24.pdf.
Wilson, T., & Bell, M. (2007). Probabilistic regional population forecasts: the example of Queensland, Australia. Geographical Analysis, 39(1), 1–25.
Wilson, T., & Rees, P. (2005). Recent developments in population projection methodology: a review. Population, Space and Place, 11(5), 337–360.
Wilson, T., & Rowe, F. (2011). The forecast accuracy of local government area population projections: a case study of Queensland. Australasian Journal of Regional Studies, 17(2), 204–243.