Transportation Research Part B 35 (2001) 667±676
www.elsevier.com/locate/trb
Inference for origin±destination matrices: estimation, prediction and reconstruction Martin L. Hazelton * Department of Mathematics and Statistics, University of Western Australia, Nedlands, WA 6907, Australia Received 7 July 1999; received in revised form 7 December 1999; accepted 21 December 1999
Abstract This paper concerns inference about an origin±destination (O±D) matrix from a single observation on a set of network link ¯ows. Two problems in this area have received attention; ®rst, the problem of reconstructing the actual number of O±D trips, and second, estimation of mean O±D trip rates. Little distinction has been drawn between these in the literature; indeed, it has sometimes been implicitly suggested that their a posteriori most probable solutions are identical. We show that this is not the case. An example is provided to demonstrate that the dissimilarity between the solutions to the two problems is potentially unbounded. The relative merits of reconstructed versus estimated mean trip count vectors are discussed with particular reference to their use as predictors of future trac ¯ows. Ó 2001 Elsevier Science Ltd. All rights reserved. Keywords: Bayesian statistics; Link ¯ows; Posterior density; Predictive distribution; Trip rates
1. Introduction The aim of this paper is to clarify some fundamental theoretical aspects of statistical inference for the origin±destination (O±D) matrix of a transport network. We assume that a single observed set of link ¯ows is available, and that prior information about the O±D matrix is at hand (via an out-of-date estimate, for instance). O±D matrix inference under these conditions has been widely studied; see Cascetta and Nguyen (1988) (and the many references therein), Bell (1991), Lo et al. (1996) and Tebaldi and West (1998) for instances. Research in this area has addressed two subtly dierent problems. Many authors have investigated methods for estimating or reconstructing (as we shall call it) the actual number of O±D trips that gave rise to the link ¯ows observed. For example, Bell (1983) was interested in ®nding, ``the most probable set of O±D movements that are *
Tel.: +618-9380-3460; fax: +618-9380-1028. E-mail address:
[email protected] (M.L. Hazelton).
0191-2615/01/$ - see front matter Ó 2001 Elsevier Science Ltd. All rights reserved. PII: S 0 1 9 1 - 2 6 1 5 ( 0 0 ) 0 0 0 0 9 - 6
668
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
consistent with a set of link ¯ows.'' This phrase neatly summarises many papers (such as Van Zuylen and Willumsen, 1980; Van Vliet and Willumsen, 1981) in which the elements of the estimated O±D matrix are constrained in terms of the observed link ¯ows by a deterministic assignment equation. More recently, some authors have sought to estimate the expected (or mean) number of O±D trips. For instance, Lo et al. (1996) express this aim in terms of estimation of the, ``population trip matrix,'' while Vardi (1996) considers estimating trac intensities (mean number of O±D trips per unit time). Till date, researchers do not seem to have paid any particular attention to the distinction between the two inferential problems. Indeed, some authors implicitly suggest that the two problems are equivalent. Spiess (1987), for instance, states that the aim of his work is, ``to estimate the parameters . . . of the underlying Poisson process,'' (i.e., the expected numbers of O±D trips), yet his estimated O±D trip matrix is constrained by the observed link ¯ows through an assignment equation (i.e., in practice he seeks the actual set of O±D trips that gave rise to the observed link ¯ows). Lo et al. (1996) do recognise the distinction between the mean estimation problem (the problem of principal interest to them) and the O±D matrix reconstruction problem, but they do not delve into the matter in any detail. Certainly, it does not seem to be widely appreciated that not only are the two problems non-equivalent but that (within a Bayesian statistical framework) the a posteriori most probable solutions to both problems can be arbitrarily dissimilar, as our work demonstrates. This paper is structured as follows. In Section 2, we introduce necessary notation and develop a statistical model for O±D matrix inference. A detailed comparison of the problems of O±D ¯ow reconstruction and of mean O±D trip count estimation is given in Section 3, including an example demonstrating the potentially unbounded dissimilarity between solutions to the two problems. We also describe situations under which the two problems are more or less equivalent. In Section 4, the issue of prediction of future trac ¯ows is discussed. It is shown that the estimated mean trip count vector is an optimal predictor within a certain theoretical framework. Nevertheless, a numerical example demonstrates that the reconstructed trip count vector may produce better predictions in practice if the prior information about the O±D matrix is of poor quality.
2. Statistical modelling ~ directed links. Suppose that trac counts are Consider a transport network with n nodes and m available on a set of m of the network links. Let x
x1 ; . . . ; xm T denote a single observation on the vector of trac counts (over a given period) on the monitored links, where superscript T denotes a transpose. Let y
y1 ; . . . ; yc T be the (unobserved) vector of trac counts on all feasible routes (ordered in some arbitrary fashion). Then x Ay; where A is the path-link incidence matrix for the monitored links only, whose
i; jth element is de®ned by 1 link i forms part of route j; aij 0 otherwise:
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
669
The vector, z, of O±D trac counts can be computed by z By; where the
i; jth element of B is de®ned by 1 route j connects O±D pair i; bij 0 otherwise: Working within a standard Bayesian statistical framework (so as to allow prior information to be incorporated in a natural fashion), x, y and z are all random vectors. Let h Eyjh. Then Ah Exjh is the vector of expected link counts, and l Bh is the mean O±D trip vector (from which mean O±D trip rates can P be obtained by simply dividing by the duration of the observational period). Also, pij bij hj =
j bij hj is the probability that a traveller between O±D pair i utilizes route j. We let p
h denote the prior density for h. Under the commonly used assumption (see Lo et al., 1996, for instance) that y1 ; . . . ; yc are independent observations on Poisson random variables with means h1 ; . . . ; hc respectively, the joint distribution of x can be precisely calculated. However, its form is cumbersome, so the following multivariate normal (MVN) approximations (from Hazelton, 2000) for the densities of x and y (both given h) are useful f
y j h Nc
h; H; where H diag
h, and f
xjh Nm
Ah; AHAT ; where the notation Nm
m; N denotes an m-variate normal distribution with mean vector m and dispersion matrix N. Note that the discrepancy between this approximation and the exact joint distribution will become arbitrarily small in relative terms as the elements of Ah become large (since under this limiting process a version of the Central Limit Theorem takes eect). In practice, this means that the approximation will typically be good when dealing with trac counts collected over long periods of time or during shorter intervals when the ¯ow levels are high (e.g., during peak hour). 3. Estimation and reconstruction Having covered the necessary background in the previous section, we are now in a position to formally state the two estimation problems that concern us. O±D matrix reconstruction. The aim of O±D matrix reconstruction is to estimate z. That is, to try and pinpoint the actual number of trips between each O±D pair that occurred during the observational period. Henceforth we shall sometimes refer to this simply as the `reconstruction' problem. Mean O±D trip count estimation. The aim of mean O±D trip count estimation is to estimate the expected number of O±D trips, l Bh Ezjh. Henceforth we shall sometimes refer to this simply as the `estimation' problem. In deriving the Bayesian inferential theory for these problems, it is necessary to consider the corresponding theory for y and h Eyjh. The quantities y and h are, in many senses, more fundamental than z and l. The former pair de®ne not only mean number of O±D trips, but also
670
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
route choice probabilities (the pij ). Sometimes these probabilities are of direct interest in themselves, but even when not, they constitute nuisance parameters whose values have to be estimated. (Many authors have assumed these to be known; this type of situation can be represented within our modelling framework by use of a suitably informative prior distribution on h.) For both types of problems, a fundamental inferential tool (within the Bayesian statistical setting) is the posterior distribution of h, given by p
hjx
f
xjhp
h ; h
x
R where the normalising constant h
x f
xjhp
h dh is simply the marginal density of the link ¯ow vector. Once p
hjx has been found, the posterior density of l can be calculated as Z p
hjx dh;
1 p
ljx S
l
where S
l fh : l Bhg; i.e., the set of all mean route trac counts consistent with mean O±D trac count l. The a posteriori most probable vector of expected O±D counts (a natural solution to the estimation problem) is the mode of Eq. (1), b l arg max p
ljx: l
We note that the posterior mean, Eljx, also constitutes a legitimate solution to the estimation problem. Indeed, posterior mode and mean will be close to each other when the posterior is unimodal (as will generally be the case). The reconstruction problem is based upon the posterior distribution of z. In order to derive this, we note that the posterior density of y is given by Z p
yjx f
yjx; hp
hjx dh: The fact that x is a deterministic function of y allows this posterior to be expressed in the simple form f
y; x ; h
x h
yIx Ay ; h
x R where h
y f
yjhp
h dh is the marginal density of y and IE is the indicator of an event E (taking the value one if E occurs and zero otherwise). We can now write the posterior density of z as X h
yIx Ay p
zjx :
2 h
x y2S
z p
yjx
Note that the summation in Eq. (2) will be replaced by an integral when using a continuous (e.g., multivariate normal) approximation to the Poisson model. The a posteriori most probable O±D count vector (a natural solution of the reconstruction problem) is the mode of this density, bz arg max p
zjx: z
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
671
The mere fact that p
ljx and p
zjx are not identical demonstrates that the estimation and reconstruction problems are dierent. However, the forms of both these posterior densities are complicated, and it is dicult to come to any further general conclusions. Nevertheless, greater insight into the matter can be gained by considering two special cases: inference under ®xed routing and inference with unique route counts. 3.1. Inference under ®xed routing Under ®xed routing precisely one (known) route is used between each O±D pair, so the matrix B is the identity and hence y z and h l. To develop matters further we will apply the MVN approximation developed at the end of the previous section, and assume the following prior density for h p
h Nc
h0 ; R0 :
3
The posterior density of h then becomes p
hjx /
AHAT
1=2
exp f
h
T
h0 R0 1
h
h0
x
T
1
Ah
AHAT
x
Ahg=2:
4
We note in passing that Eq. (4) is reminiscent of the posterior distribution given by Maher (1983) (whose work implicitly covers ®xed routing as a special case). However, there is an important dierence. Whilst Maher assumed (a little unintuitively) that the dispersion matrix of y is independent of h, the reverse is true under our Poisson model. A signi®cant consequence is that for Maher the log-posterior for h was simply a quadratic form, while for us the log-posterior is rather more complicated. The posterior density of z is intractable even under ®xed routing and with the MVN approximation to the Poisson model. Speci®cally, it does not seem possible to simplify the integral form Z 1=2 T T expf
z h H 1
z h=2
h h0 R0 1
h h0 =2g dh: p
zjx / Ix Ay jHj However, a ®rst-order approximation to this (which will be particularly good in the presence of a highly informative prior) is p
zjx / expf
z
h0 T
R0 H0 1
z
h0 =2gIx Ay;
5
where H0 diag
h0 . Under this approximation the a posteriori most probable z is simply the O±D trip vector, consistent with the observed link ¯ows, which is closest to the prior mean h0 , where the metric is the canonical one de®ned by the dispersion matrix R0 H0 . This provides a nice link between the formalism presented here and the early heuristic approaches to O±D matrix estimation. Any methodology which concentrates on the set of O±D trip vectors reproducing the observed link ¯ows and then selects from within this set the element closest to a `target vector', can be interpreted as an attempt to solve the reconstruction problem within a Bayesian framework. The conclusions that have been drawn in the ®xed routing case can be generalized to situations where the proportion of travellers taking each feasible route is known without error. This is because with perfect knowledge of assignment proportions, the posterior distribution for z (or l) is identical to that of y (or h), as in the ®xed routing case. While several authors writing on O±D
672
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
matrix estimation have assumed that the assignment proportions are known (see Van Zuylen and Willumsen, 1980; Spiess, 1987 and for instances) this assumption is highly unrealistic and so we devote little space to its consequences. It seems to us perfectly reasonable to assume some degree of prior knowledge about the assignment probabilities (the pij ), but the assignment proportions are often observable functions of y. For a speci®c example, consider a single O±D pair connected by two parallel links. We may know almost certainly a priori that travellers are equally likely to select either route, but even conditional on a total O±D count of 100 units and both assignment probabilities being one half, the actual number of travellers using the ®rst link cannot be assumed to be 50. Rather, it will be an observation on a binomial random variable. 3.2. Inference with unique route counts We now consider a situation in which there is only a single route count vector, yy , satisfying Ay x. The analysis is split depending on the strength of the prior information available about h. As with the ®xed routing case, we will use MVN approximations, and assume the form of prior in Eq. (3). When the prior information about h is vague the eigenvectors of R0 will be large, and the posterior density for h will be well approximated by c Y 1 p exp
yiy hi 2 =
2hi : p
hjx / hi i1 Using elementary calculus, the maximizer of this function (the a posterior most probable h) is b h yy 2 1=2 1, where 1 is a vector (length c) of ones. Hence the solution to the estimation problem is b l Byy 2 1=2 B1. Turning to the reconstruction problem, the posterior distribution of z is degenerate, placing probability one on the single point Byy . It follows that with vague prior information the a posteriori most probable solutions to the estimation and reconstruction problems are very close. When there is very precise prior information about h, the posterior distribution will be very similar to p
h. In particular, the posterior mode can be made arbitrarily close to the prior mode, h0 , by choosing an appropriate R0 with suciently small eigenvalues (indicating high prior precision). To see this, note that if R0 is diagonal then the derivative of the log-posterior distribution with respect to rth component of h is o log p
hjx
hr ohr
h0r r0r1
o log f
xjh; ohr
6
where h0r is the rth component of h0 and r0r is the rth diagonal element of R0 . Now if the eigenvalues of R0 are suciently small (and hence r0r1 large) then the partial derivative in Eq. (6) h h0 and so the can only be zero when hr is within a given small distance of h0r . It follows that b estimation problem is solved by b l Bh0 in this case. Nevertheless, the only a posteriori possible O±D count vector reconstruction remains Byy . It follows that the solutions to the two problems can be arbitrarily dissimilar if prior information and observed link counts are highly contradictory. Such extreme dissimilarity between optimal solutions to estimation and reconstruction problems is by no means restricted to this rather speci®c example. It can occur under quite general
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
673
circumstances. To appreciate this, note that the sum of the elements of bz will be constrained to equal the sum of the trac counts on all links emanating from origin nodes (so long as we observe all these links). On the other hand, b l can be made arbitrarily close to the prior mean l0 Bh0 (the sum of whose elements is entirely arbitrary) in the presence of precise prior information. The distance between bz and b l is therefore potentially unlimited. 4. Prediction In the context of transport planning it is not the estimation or reconstruction problems which are of direct interest, but rather whether the solutions to these problems provide good predictions of future O±D trac ¯ows. Within a formal Bayesian framework optimal prediction is based upon the posterior distribution of h rather than that of y, indicating that from a theoretical perspective it makes sense to concentrate on the estimation problem for the purposes of prediction. To take a speci®c (idealized) example, suppose that we collect link ¯ow counts, xold , today and wish to predict the vector of route trips, ynew , tomorrow. Now, the posterior predictive density of ynew can be expressed as Z f
ynew jxold f
ynew jh; xold p
hjxold dh Z f
ynew jhp
hjxold dh:
7 In order to minimise the expected squared error of our prediction we must use the mean of this posterior predictive density as our point prediction. This posterior predictive mean, ypred , is given by Z Z ynew f
ynew jh dynew p
hjxold dh ypred Z p
hjxold h dh;
8 i.e., the posterior mean of h. It follows that, in a certain theoretical sense, optimal prediction is achieved via solution of the estimation, rather than reconstruction, problem. At a practical level it is important to recognise that theoretically optimal Bayesian prediction is dependent on the particular prior distribution chosen for h. We illustrate this fact using a numerical example based on the network displayed in Fig. 1, and with true matrix of mean O±D trip counts given in Table 1. We assume the trac counts are available on m 8 of the links; speci®cally, those numbered 1, 2, 5, 6, 7, 8, 11, and 12. To tie in with one of the particular cases discussed in detail in Section 3 we assume that ®xed routes are used between each O±D pair, so that y z and h l. Six test scenarios are investigated, each speci®ed by the form of prior, p
h, employed. Prior 1 is centred at the true value of the mean trac count vector, htrue , and has small variance, while prior 2 is centred at the same point but has variance 36 times as large. Priors 3 and 4 are both centred at h0 htrue =2, with the former having small variance and the latter large. Priors 5 and 6 are centered at h0 h 100
10; 1; 0; 5; 5; 4; 6; 1; 7; 8; 2; 5T and have small and large variances, respectively. In
674
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
Fig. 1. Network topology for numerical example. Double circle nodes represent zone centroids (origins or destinations).
Table 1 Mean origin-destination ¯ows for numerical example Destination
Origin 1 3 4 6
1
3
4
6
0 800 600 800
800 0 400 200
400 200 0 200
200 200 400 0
each test case, we generated a vector of today's link trac counts xold , found both b h and b y , and then used both of these vectors as predictors of ynew (a randomly generated vector of tomorrow's route trac counts). By repeating this prediction process over 500 simulated `today±tomorrow' sets of data, the (estimated) root mean square prediction error, v u 500 h i uX
i
i jjy ynew jj2 RMSPE
y t pred
i1
pred
was computed, where ypred is either b h or b y and bracketed superscript i indicates simulation number. The results, displayed in Table 2, can be interpreted in a very intuitive fashion. The posterior distribution, p
hjx, is an amalgam of the prior information and the information provided by the data (through the likelihood f
xjh). If the prior accurately re¯ects the location of htrue we can therefore expect the posterior to do likewise. In that case the posterior predictive density is based upon averaging over values of h near the true one (see Eq. (7)) and so the theoretical optimality of b h will be apparent. This is precisely what has happened with priors 1 and 2, where b h outperforms b b y as a predictor. Note that h has a slightly smaller prediction error with prior 1 than with prior 2,
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
675
Table 2 Root mean square prediction errors using solutions to estimation and reconstruction problems as predictors RMSPE
b h RMSPE
b y Prior Prior Prior Prior Prior Prior
1 2 3 4 5 6
71.7 74.8 790.4 567.5 815.2 816.6
94.1 96.1 190.0 190.0 818.7 818.8
since the former prior has higher precision (lower variance) and so leads to a posterior which is a little more concentrated close to htrue . When the prior does not re¯ect the location of htrue , the posterior distribution will be shifted away from this true value. This will clearly be detrimental to the performance of b h as a predictor, but the eect on the predictive ability of b y is more subtle. As noted in Section 3, the fact that b y is de®ned to be completely consistent with the link count data implicitly places a constraint on the total level of trac. Now priors 3 and 4 both imply a far lower level of trac ¯ow than is actually the case. As a result, the predictions made using b h are quite poor (since this vector constantly underestimates tomorrow's route ¯ows). The predictions for prior 4 are somewhat less bad than those for prior 3 because the higher precision of the latter prior results in the posterior being y , performs almost as well as for shifted further from htrue . The reconstructed trac count vector, b priors 1 and 2, because of the implicit ¯ow level constraint just mentioned. Moving on to the ®nal pair of test scenarios, priors 5 and 6 are centred at a mean route count vector, h , for which the overall level of ¯ow is very similar to htrue . However, htrue and h imply very dierent distributions of this ¯ow between O±D pairs. As a consequence the implicit ¯ow level constraint is of no help to b y , and both it and b h perform poorly as predictors of ynew . Overall these results provide a mixed message. The theoretical advantages of b h appear in the b presence of high quality prior information, but y can be a more robust predictor when the prior information is misleading. This is potentially unwelcome news for practitioners since the quality of prior O±D matrices can be hard to judge. Nevertheless, this diculty can be viewed as an artefact of the particular framework within which we have worked. The in¯uence of the prior distribution on the posterior diminishes as the number of observations increases. We have assumed (in common with the vast majority of authors on O±D matrix estimation) that trac counts are observed over a single time period only, so the marked dependence of the results on the prior distribution is hardly surprising. However, in reality the option usually exists to collect data over a sequence of time periods when the in¯uence of the prior distribution will be reduced. Vardi (1996) and Hazelton (2000) look at the estimation problem for such data. Finally, we note that the tenor of this paper has been largely theoretical, and practical issues regarding maximization of posterior distributions (and computation of Bayesian credible intervals) have been ignored. Nonetheless, the posterior distributions in Eqs. (1) and (2) are the basic tools for Bayesian inference for O±D matrices and practical methods for their exploration are of considerable interest. We suggest that, in practice, properties of these can be computed using Markov chain Monte Carlo methods (see Gamerman, 1997, for example). In particular, the Metropolis-Hastings algorithm (Hastings, 1970) has considerable potential for exploration of the
676
M.L. Hazelton / Transportation Research Part B 35 (2001) 667±676
posterior for h since the normalizing constant for that density will have no closed form, except in special cases. Acknowledgements The author acknowledges the helpful comments of an anonymous referee. References Bell, M.G.H., 1983. The estimation of an origin±destination matrix from trac counts. Transportation Science 10, 198±217. Bell, M.G.H., 1991. The estimation of origin±destination matrices by constrained generalized least squares. Transportation Research B 25, 13±22. Cascetta, E., Nguyen, S., 1988. A uni®ed framework for estimating or updating origin/destination matrices from trac counts. Transportation Research B 22, 437±455. Gamerman, D., 1997. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall, London. Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97±109. Hazelton, M.L., 2000. Estimation of O±D matrices from link ¯ows on uncongested networks. Transportation Research B 34, 549±566. Lo, H.P., Zhang, N., Lam, W.H.K., 1996. Estimation of an origin±destination matrix with random link choice proportions: a statistical approach. Transportation Research B 30, 309±324. Maher, M.J., 1983. Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transportation Research B 20, 435±447. Spiess, H., 1987. A maximum likelihood model for estimating origin±destination matrices. Transportation Research B 21, 395±412. Tebaldi, C., West, M., 1998. Bayesian inference on network trac using link count data (with discussion). Journal of the American Statistical Association 93, 557±576. Van Vliet, D., Willumsen, L.G., 1981. Validation of the ME2 model for estimating trip matrices from trac counts. In: Proceedings of the Eighth International Symposium on Transportation and Trac Theory. Van Zuylen, J.H., Willumsen, L.G., 1980. The most likely trip matrix estimated from trac counts. Transportation Research B 14, 281±293. Vardi, Y., 1996. Network tomography: estimating source-destination trac intensities from link data. Journal of the American Statistical Association 91, 365±377.