Computational Statistics and Data Analysis 70 (2014) 328–344
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
A cluster analysis of vote transitions✩
Xavier Puig ∗, Josep Ginebra
Department of Statistics, Technical University of Catalonia, Avgda. Diagonal 647, 6a Planta, 08028 Barcelona, Spain
Article info
Article history: Received 26 November 2012; received in revised form 30 September 2013; accepted 7 October 2013; available online 18 October 2013
Keywords: Bayesian model checking; Bayesian hierarchical model; Ecological inference; Election data; Spatial data
Abstract

To help settle the debate triggered the day after any election about the origin and destination of the votes of winners and losers, a Bayesian analysis of the results of a pair of consecutive elections is proposed. It is based on a model that simultaneously carries out a cluster analysis of the areas into which the results are broken down and links the results of the two elections for areas in a given cluster through a vote switch matrix. The number of clusters is chosen both through predictive checks and by testing whether the residuals are spatially correlated. The analysis is applied to the results in Barcelona of a pair of consecutive elections held just four months apart, in 2003 for the Catalan parliament and in 2004 for the Spanish parliament. The proposed approach, which reconstructs individual behavior from aggregated data, can be exported to any ecological inference problem where one cannot assume, as other ecological inference methods typically do, that all the areas are exchangeable. © 2013 Elsevier B.V. All rights reserved.
1. Introduction

The day after any election, a debate is always triggered around the way voters switched their vote, or switched to (from) not voting from (to) voting for this or that option, relative to previous elections. That debate is especially poignant in Catalonia, an autonomous region in north-east Spain, because its voters split across a national allegiance divide on top of the usual ideological divide, which leads to many options to choose from and to individuals voting very differently depending on the kind of election at hand.

To help assess how voting age individuals change their vote, a Bayesian analysis is proposed based on a model for the results of a pair of consecutive elections broken down into small areas. The model simultaneously carries out an s-cluster analysis of the areas, assuming that both the average voting behavior and the way in which individuals switch their vote are similar in areas of the same cluster, and it estimates s vote switch matrices, each ruling the way in which individuals in an area of a given cluster change their vote between the first and the second election. The number of clusters is chosen by checking whether the corresponding models capture the levels, the dispersion and the spatial dependence both in the election results and in the way these results change.

Disregarding the fact that the voting age population in the areas considered changes slightly between the first and the second election, one can consider the first election results of each area to be the row totals and the second election results of that area to be the column totals of a $k_1 \times k_2$ contingency table. Posed in these terms, the goal of the analysis is the estimation of the $k_1 k_2$ cells of these tables, which is the canonical formulation of the ecological inference problem in the social sciences, where one aims to extract information about individual behavior starting from information reported only at an aggregate level.
Good overviews of the approaches considered for that problem can be found in King (1997), Freedman (2001), Freedman et al. (1991, 1998) and Glynn and Wakefield (2010).
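To make the identification problem concrete, the sketch below builds two hypothetical 2 × 2 vote transition tables for a single area that share exactly the same row totals (first-election results) and column totals (second-election results), so the margins alone cannot tell them apart. It also computes the deterministic bounds that the margins place on each cell. The table values and function names are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def margins(table):
    """Row totals (first-election results) and column totals (second-election results)."""
    t = np.asarray(table)
    return t.sum(axis=1), t.sum(axis=0)

def cell_bounds(row_totals, col_totals):
    """Bounds on every cell implied by the margins alone:
    max(0, r_j + c_l - N) <= n_jl <= min(r_j, c_l)."""
    r = np.asarray(row_totals)[:, None]
    c = np.asarray(col_totals)[None, :]
    total = r.sum()
    lower = np.maximum(0, r + c - total)
    upper = np.minimum(r, c)
    return lower, upper

# Two hypothetical 2 x 2 vote transition tables for one area
# (rows: option chosen in election 1, columns: option chosen in election 2).
table_a = np.array([[80, 20],
                    [10, 90]])
table_b = np.array([[60, 40],
                    [30, 70]])

rows_a, cols_a = margins(table_a)
rows_b, cols_b = margins(table_b)
lo, hi = cell_bounds(rows_a, cols_a)
print("same margins:", np.array_equal(rows_a, rows_b) and np.array_equal(cols_a, cols_b))
print("cell lower bounds:\n", lo)
print("cell upper bounds:\n", hi)
```

Any table whose cells respect these margins is equally compatible with the aggregate data, which is why ecological inference methods need modeling assumptions to choose among them.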
✩ Supplementary materials are available online expanding on Sections 2 and 4.
∗ Corresponding author. Tel.: +34 93 4016737. E-mail addresses: [email protected] (X. Puig), [email protected] (J. Ginebra).
0167-9473/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.csda.2013.10.006
X. Puig, J. Ginebra / Computational Statistics and Data Analysis 70 (2014) 328–344
Fig. 1. Map of Barcelona divided into ten districts and 248 smaller areas.
The most serious drawback of the basic methods proposed for that problem, mostly based on ecological regression models with or without random coefficients, is their reliance on the assumption of similarity across tables, and hence on all the areas being exchangeable. That ‘constancy assumption’, imposed by forcing the coefficients of the ecological regression model either to be the same or to share the same distribution for all areas, is often inappropriate due to the existence of contextual effects. In our case, for example, one expects the vote pattern and the vote switch pattern of an area to depend on its demographic composition, and hence to be strongly associated with the location of that area; areas will therefore not be exchangeable. One way around that drawback is to allow the coefficients of the ecological regression models to depend on area level covariates, but that requires one to know which contextual characteristics are relevant and to be able to measure them, which most of the time is not feasible. Embedding an ecological regression model into a cluster model, as advocated here, is an alternative way around the failure of that ‘constancy assumption’. In our setting the parameters of the ecological inference part of the model will be constant (or share the same distribution) only among areas in the same cluster. That allows one to estimate vote switch patterns by pooling information only from areas that are similar. The clusters found will typically be determined by the contextual characteristics that set areas apart, without having to make these covariates explicit and measure them. In fact, the special one-cluster version of our model has a lot in common with the basic ecological regression models first considered by Goodman (1953, 1959).
By considering the more general s-cluster version of the model and letting the data choose whether the one-cluster model is appropriate or not (in our example it clearly is not), one is in effect checking whether the usual ‘constancy assumption’ holds.

The article is organized as follows. Section 2 describes the results of a pair of recent elections in Barcelona that will be used as a showcase example. Section 3 presents the model and briefly relates it to the models used in the social sciences literature on ecological inference. Section 4 describes how one can decide on the number of clusters and check the final model, first through predictive checks comparing the levels and the variability of the actual election results with simulations from the models, and second by testing whether the residuals of the models are spatially correlated. Special care is devoted to picking the statistics, graphics and residuals that best fit the main objective of the analysis. Section 5 presents the results of the analysis of the pair of elections in Barcelona, where it is found that individuals vote and switch their vote according to four different voting patterns and vote switch patterns, depending on the area where they live. Even though the model is blind with regard to the location of the areas, the four-cluster structure uncovered has a rather strong spatial structure, which simplifies the interpretation of the results. In settings like the one of the example, where individuals vote and switch their vote very differently depending on where they live, using models that do not take into account the cluster structure of the data can lead to different and potentially misleading conclusions. Section 6 discusses variations of the analysis and ponders the limitations inherent to ecological inference methods.

2. Description of the example

To illustrate our approach, we will consider the results of a pair of consecutive elections in Barcelona, which is the capital of Catalonia and holds about 20% of its population. Politics in Catalonia are unusual because of the existence of two cleavages, due to voters splitting across a national allegiance divide on top of the usual ideological divide. As a consequence, the results of an election change a lot depending on whether it is for the Catalan parliament or for the Spanish parliament. The first election of the pair considered here was held on November 16th, 2003, for the Catalan parliament, while the second was held on March 14th, 2004, for the Spanish parliament. Barcelona is organized in the 10 districts and 248 areas shown in Fig. 1. The results of these elections in these areas are grouped into 7 categories. The first five correspond to the parties or coalitions of parties with seats in both parliaments, labeled CIU, PSOE, PP, ERC and ICV. All the votes for parties obtaining less than 1% of the vote, null votes, and votes cast for no one are combined into the category others, and the voting age individuals in each area that abstain from voting are counted under abs. Table 1 partially presents the two 248 × 7 tables with these results.

Table 1. Part of the results in the 2003 and 2004 elections in Barcelona.

Catalan parliament 2003
| District | Area | CIU | PSOE | PP | ERC | ICV | others | abs | N |
| 1 | 1 | 195 | 375 | 76 | 86 | 58 | 19 | 701 | 1 510 |
| 1 | 2 | 208 | 333 | 75 | 97 | 70 | 26 | 790 | 1 599 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10 | 248 | 441 | 1 535 | 592 | 245 | 229 | 82 | 2 202 | 5 326 |
| Total | | 227 783 | 249 020 | 123 163 | 126 626 | 69 234 | 19 295 | 407 294 | 1 222 415 |

Spanish parliament 2004
| District | Area | CIU | PSOE | PP | ERC | ICV | others | abs | N |
| 1 | 1 | 141 | 488 | 127 | 156 | 52 | 28 | 496 | 1 488 |
| 1 | 2 | 154 | 498 | 110 | 183 | 57 | 25 | 564 | 1 591 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10 | 248 | 375 | 2 037 | 814 | 282 | 267 | 125 | 1 372 | 5 272 |
| Total | | 188 386 | 359 254 | 171 102 | 138 762 | 65 001 | 24 489 | 268 393 | 1 215 387 |

The results of the i-th area in the first election, with $N^1_i$ voting age individuals and $k_1$ voting options, will be denoted by $y^1_i = (y^1_{i1}, \ldots, y^1_{ik_1})$, and the results of that area in the second election, with $N^2_i$ individuals and $k_2$ options, will be denoted by $y^2_i = (y^2_{i1}, \ldots, y^2_{ik_2})$. The whole set of results of a pair of elections will be denoted by $y = (y^1, y^2)$, with $y^1 = (y^1_1, \ldots, y^1_n)$
and $y^2 = (y^2_1, \ldots, y^2_n)$. In the 2003–2004 pair, $k_1$ and $k_2$ are 7, and n is 248.

On the conservative versus progressive scale, one can order the main parties as PP, CIU, PSOE, ERC and ICV. On top of this ideological divide one needs to consider the divide that separates the options that push for more self-rule for Catalonia from the options satisfied with the current level of self-government. On this national allegiance scale, parties can be ordered from more to less inclined to seceding from Spain as ERC, CIU, ICV, PSOE and PP. In fact, PSOE and PP can be voted for everywhere in Spain, where they are the main parties, and ICV is a steady partner of the third party in Spain, while CIU and ERC basically only contend for the vote in Catalonia. Before 2003 the parties pushing for more self-rule for Catalonia, CIU and ERC, always did a lot better in Catalan elections than in Spanish elections, while the opposite happened to PSOE and PP. That is usually explained by many voters for PSOE and PP in Spanish elections either abstaining or voting CIU, ERC or ICV in Catalan elections, and by voters for CIU, ERC and ICV in Catalan elections either abstaining or voting PP or PSOE in Spanish elections. The main goal of the analysis is to assess the extent to which these phenomena took place between 2003 and 2004.

For a given pair of elections like this one, the data are two sets of seven-dimensional categorical observations, $y = (y^1, y^2)$, that are ordered in time and located in space. In order to check whether it is plausible that these results could have been simulated by the models, one needs to summarize them through lower-dimensional statistics capturing the features considered to be most relevant. One can focus on features in the results of either election separately, and on features in the differences between these results.
One way to summarize the results is by mapping the proportion of the vote in 2003 and 2004 for each of the categories, $p^l_j = (p^l_{1j}, \ldots, p^l_{nj})$ for $l = 1, 2$ and $j = 1, \ldots, 7$, where $p^l_{ij} = y^l_{ij}/N^l_i$, as in the first two columns of Fig. 2. Another summary that helps appreciate the differences across districts even better is given by the first two columns of Fig. 3, which present the value of these proportions with areas grouped by district to take into account their location in the map. These statistics indicate that there is a strong spatial dependency in the results, which is very similar in the two elections, with CIU doing best in districts 2, 4, 5 and 6, PSOE doing best in districts 7, 8, 9 and 10, PP doing best in districts 4 and 5, ERC doing best in district 6, and abstention being highest in districts 1 and 8.
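As a small illustration of these summaries, the following sketch computes the proportions $p^l_{ij}$ and the log ratios $\log(p^2_{ij}/p^1_{ij})$ (denoted $D^{jj}_i(y)$ in the text) for the three areas whose full rows appear in Table 1. Only those counts are taken from the table; the variable and function names are illustrative.

```python
import numpy as np

CATEGORIES = ["CIU", "PSOE", "PP", "ERC", "ICV", "others", "abs"]

# Counts for areas 1, 2 and 248 as reported in Table 1 (columns follow CATEGORIES).
y1 = np.array([[195, 375, 76, 86, 58, 19, 701],
               [208, 333, 75, 97, 70, 26, 790],
               [441, 1535, 592, 245, 229, 82, 2202]])   # Catalan parliament, 2003
y2 = np.array([[141, 488, 127, 156, 52, 28, 496],
               [154, 498, 110, 183, 57, 25, 564],
               [375, 2037, 814, 282, 267, 125, 1372]])  # Spanish parliament, 2004

def proportions(y):
    """p_ij = y_ij / N_i, where N_i (the row total) includes abstainers."""
    return y / y.sum(axis=1, keepdims=True)

p1, p2 = proportions(y1), proportions(y2)
D = np.log(p2 / p1)          # D^{jj}_i(y) = log(p^2_ij / p^1_ij)

for area, row in zip([1, 2, 248], D):
    print(area, {c: round(float(v), 2) for c, v in zip(CATEGORIES, row)})
```

Even in these three areas the signs match the patterns described in the text: the log ratios are negative for CIU and abstention and positive for PSOE and PP.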
The third column in Figs. 2 and 3 presents $D^{jj}(y) = (D^{jj}_1(y), \ldots, D^{jj}_n(y))$ for $j = 1, \ldots, 7$, where $D^{jj}_i(y) = \log(p^2_{ij}/p^1_{ij})$ is the natural logarithm of the ratio between the proportions of the vote for category j in 2004 and in 2003. The fact that the level of these statistics changes with district indicates that individuals switch their vote differently depending on where they live, and hence that the ‘constancy assumption’ on which ecological inference models are typically based does not hold. This seems to be especially the case for CIU, PP and ERC voters and for abstainers. Note also that in 2004 abstention and the vote for CIU are smaller everywhere than in 2003, the vote for ICV is mostly smaller, the vote for ERC is larger everywhere except in districts 4 and 5, and the vote for PSOE and PP is a lot larger everywhere than in 2003. To further summarize this pair of election results we have also considered fourteen statistics of the form $p^l_a = (p^l_{1a}, \ldots, p^l_{na})$ for $l = 1, 2$ and seven statistics of the form $D^{aa}(y) = (D^{aa}_1(y), \ldots, D^{aa}_n(y))$, obtained by replacing categories, j, by combinations of categories, a, as well as twenty-one statistics of the form $D^{ab}(y) = (D^{ab}_1(y), \ldots, D^{ab}_n(y))$, where $D^{ab}_i(y) = \log(p^2_{ia}/p^1_{ib})$ is the logarithm of the ratio between the proportion of the vote for combination a in 2004 and the proportion for combination b in 2003. Combinations of categories have been carefully chosen to summarize the vote and the difference in vote across the national divide, across the ideological divide, and across the vote versus abstention divide. All these statistics,
Fig. 2. Maps of the proportion of the vote for each of the categories considered in the 2003 and 2004 elections in each area, $(p^1_{ij}, p^2_{ij})$, and of the logarithm of their ratio, $D^{jj}_i(y)$, all categorized according to their quartiles to emphasize the spatial dependency.
Fig. 3. Proportion of the vote for each of the categories in the 2003 election for the Catalan parliament and in the 2004 election for the Spanish parliament, $(p^1_{ij}, p^2_{ij})$, and natural logarithm of the ratio of these proportions, $D^{jj}_i(y)$, with areas grouped by district.
presented in the supplementary materials file (see Appendix A), and the residuals associated with them will be used in Section 4 to check models and choose the number of clusters.

3. Description of the model

Here a model is proposed for the simultaneous analysis of the results in two consecutive elections. The model carries out a cluster analysis grouping the areas into a small number of clusters, each with a distinctive vote pattern and vote switch pattern, and it links the two elections through vote switch matrices determining the average voting behavior of an area in the second election starting from its first election result. The general idea is to embed an ecological regression model into a cluster model and fit a different regression model in each cluster; in that way the ‘constancy assumption’, that areas be exchangeable, is only imposed among areas in the same cluster. Even though there is a single model for the results in both elections, $y = (y^1, y^2)$, with areas being allocated into clusters based both on the results in the first election and on the way in which the results change in the second one, the model is described next in two stages, separating the cluster analysis part from the ecological inference part.

3.1. On the cluster analysis part of the model

If one assumes that the $N^1_i$ voting age individuals of the i-th area have independently chosen one of the $k_1$ options available in the first election according to the probability distribution $\theta^1_i = (\theta^1_{i1}, \ldots, \theta^1_{ik_1})$, with
$\sum_{j=1}^{k_1} \theta^1_{ij} = 1$, one can model
the results in that area, $y^1_i = (y^1_{i1}, \ldots, y^1_{ik_1})$, through a multinomial model, $\mathrm{Mult}(N^1_i, \theta^1_i)$, where $\theta^1_{ij}$ is the probability that an individual in the i-th area chooses the j-th voting option in that election. If one further assumes that, conditional on the parameter values, the results in different areas are independent, then all the results of the first election, $y^1 = (y^1_1, \ldots, y^1_n)$, are $\prod_{i=1}^{n} \mathrm{Mult}(N^1_i, \theta^1_i)$ distributed. In practice the average voting behavior of the i-th area, $\theta^1_i$, will be related to that of other areas with similar demographic composition. To model that, one could group areas into a small number of clusters, with all areas in the same cluster sharing the same average voting behavior, but that might not be realistic. Modeling $\theta^1_i$ through area level covariates is often not feasible because covariates are not available. Instead, one might group areas into a small number of clusters and let $\theta^1_i$ change from area to area within a cluster while still sharing a common distribution, using a hierarchical model. Here it is assumed that the $\theta^1_i$ for all the areas in
the r-th cluster are independent realizations of a $\mathrm{Dirichlet}(\tau_r(\mu_{r1}, \ldots, \mu_{rk_1}))$ distribution, where $0 < \mu_{rj} < 1$, $\sum_{j=1}^{k_1} \mu_{rj} = 1$ and $\tau_r > 0$. This parametrization allows one to model separately $\mu_r$, representing the expected value of the average voting behavior $\theta^1_i$ of the areas in the r-th cluster, $E[\theta^1_i \mid \mu_r, \tau_r] = (\mu_{r1}, \ldots, \mu_{rk_1})$, and $\tau_r$, determining the heterogeneity of $\theta^1_i$, because

$$\mathrm{Var}(\theta^1_{ij} \mid \mu_r, \tau_r) = \frac{\mu_{rj}(1 - \mu_{rj})}{\tau_r + 1}, \tag{3.1}$$
and the larger $\tau_r$, the more similar all the $\theta^1_i$ of areas in the r-th cluster tend to be. A consequence of this assumption is that the results, $y^1_i = (y^1_{i1}, \ldots, y^1_{ik_1})$, of the areas in the r-th cluster can be seen as a realization of a continuous mixture of multinomial distributions with a Dirichlet mixing distribution, labeled as $\mathrm{DirMult}(N^1_i, \tau_r\mu_r)$; in the limit, when $\tau_r$ tends to infinity, the distribution of $y^1_i$ becomes multinomial. From the standpoint of the first election, the goal is to estimate $(\mu_r, \tau_r)$ and allocate areas into clusters, not to estimate the $\theta^1_i$, and one can pose this part of the model ‘non-hierarchically’ by stating that $y^1 = (y^1_1, \ldots, y^1_n)$ is a realization of a finite mixture of s Dirichlet-multinomial distributions,

$$p(y^1 \mid \omega, \mu_1, \ldots, \mu_s, \tau_1, \ldots, \tau_s) = \prod_{i=1}^{n} \sum_{r=1}^{s} \omega_r\, \mathrm{DirMult}(N^1_i, \tau_r\mu_r), \tag{3.2}$$
where $\omega = (\omega_1, \ldots, \omega_s)$ is a set of weights determining the proportion of areas in each cluster. To allocate areas into clusters one needs to introduce a vector of unobserved categorical variables, $\zeta = (\zeta_1, \ldots, \zeta_{248})$, such that $\zeta_i = r$ whenever the i-th area belongs to the r-th cluster, and that are conditionally independent with $\pi(\zeta_i = r \mid \omega) = \omega_r$. Then

$$p(y^1, \zeta \mid \omega, \mu_1, \ldots, \mu_s, \tau_1, \ldots, \tau_s) = \prod_{i=1}^{n} \omega_{\zeta_i}\, \mathrm{DirMult}(N^1_i, \tau_{\zeta_i}\mu_{\zeta_i}), \tag{3.3}$$
which makes the computation of the posterior distribution a lot easier than with (3.2). To link this part of the model with the part for the results of the second election, though, it is helpful to return to the hierarchical formulation in terms of $\theta^1_i$ and $\theta^2_i$, as in Box 1. According to this first part of the model, all voting age individuals in the same area are exchangeable, but they are not exchangeable with individuals in other areas of the same cluster. The average voting behaviors, $\theta^1_i$, of all the areas in the same cluster are exchangeable, but they are not exchangeable with those of areas in other clusters.
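The Dirichlet-multinomial mixture (3.2) and the allocation of areas to clusters can be sketched numerically as follows, assuming NumPy and SciPy are available. The function returns the log of (3.2) and, for each area, the probabilities $P(\zeta_i = r \mid y^1_i, \omega, \mu, \tau) \propto \omega_r\,\mathrm{DirMult}(y^1_i \mid N^1_i, \tau_r\mu_r)$, which would be the full conditional of $\zeta_i$ if only the first election were modeled; in the full model of Box 1 the second-election term also enters. Parameter values and names are hypothetical, and this is not the authors' WinBUGS code.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def dirmult_logpmf(y, alpha):
    """log DirMult(y | N, alpha), with N = sum(y); y has shape (..., k), alpha shape (k,)."""
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n = y.sum(axis=-1)
    a0 = alpha.sum()
    return (gammaln(n + 1) + gammaln(a0) - gammaln(n + a0)
            + np.sum(gammaln(y + alpha) - gammaln(y + 1) - gammaln(alpha), axis=-1))

def mixture_loglik_and_alloc(y, omega, mu, tau):
    """Log of the finite mixture (3.2) and, for each area i, the probabilities
    P(zeta_i = r | y_i) proportional to omega_r * DirMult(y_i | N_i, tau_r mu_r).

    y: (n, k) counts; omega: (s,) weights; mu: (s, k) simplex rows; tau: (s,) > 0.
    """
    log_terms = np.stack(
        [np.log(omega[r]) + dirmult_logpmf(y, tau[r] * mu[r]) for r in range(len(omega))],
        axis=1,
    )                                               # shape (n, s)
    log_lik_i = logsumexp(log_terms, axis=1)        # log of the inner sum, one per area
    alloc = np.exp(log_terms - log_lik_i[:, None])  # each row sums to one
    return log_lik_i.sum(), alloc

# Hypothetical two-cluster example with three voting options.
omega = np.array([0.5, 0.5])
mu = np.array([[0.6, 0.3, 0.1],
               [0.1, 0.3, 0.6]])
tau = np.array([50.0, 50.0])
y = np.array([[60, 30, 10],
              [12, 28, 60]])
loglik, alloc = mixture_loglik_and_alloc(y, omega, mu, tau)
print(loglik)
print(np.round(alloc, 4))
```

Working on the log scale with `logsumexp` avoids underflow, which matters in practice because area totals of thousands of voters make individual Dirichlet-multinomial probabilities extremely small.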
3.2. On the ecological inference part of the model

If one assumes that in the second election the $N^2_i$ voting age individuals of the i-th area have independently chosen one of the $k_2$ options available according to the probability distribution $\theta^2_i = (\theta^2_{i1}, \ldots, \theta^2_{ik_2})$, with $\sum_{j=1}^{k_2} \theta^2_{ij} = 1$, the results of
that election in the i-th area, $y^2_i = (y^2_{i1}, \ldots, y^2_{ik_2})$, are $\mathrm{Mult}(N^2_i, \theta^2_i)$ distributed. In the second election we let the i-th area belong to the same $\zeta_i$-cluster as in the first election. By using the same cluster labeling variable ζ in both parts of the model and by updating both parts simultaneously, the allocation of areas into clusters is based on the results in both elections. Typically, the cluster structure found will be mostly explained by area level covariates; hence, as long as the values of these covariates do not change much between elections, it will be safe to assume, as we do, that clusters do not change between them. In our example that holds, because the elections are very close in time and the population in a given area barely changes. To relate the set of probabilities of the i-th area for the second election, $\theta^2_i$, to its results and probabilities for the first election, $y^1_i$ and $\theta^1_i$, one defines $\gamma^r_{jl}$ to be the conditional probability that an individual in an area of cluster r who has chosen the j-th option in the first election chooses the l-th option in the second election. That leads one to consider the $k_1 \times k_2$ matrix

$$\Gamma_r = \begin{pmatrix} \gamma^r_{1,1} & \gamma^r_{1,2} & \cdots & \gamma^r_{1,k_2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma^r_{k_1,1} & \gamma^r_{k_1,2} & \cdots & \gamma^r_{k_1,k_2} \end{pmatrix}, \tag{3.4}$$
that will be called the vote switch matrix of cluster r, and is such that all its rows, $\gamma^r_j = (\gamma^r_{j,1}, \ldots, \gamma^r_{j,k_2})$, add up to one because they are the conditional distribution of the second election choices given that one chose j in the first election and is in the r-th cluster. Researchers familiar with the terminology of the ecological inference literature will recognize $\gamma^r_{jl}$ to be the propensity of the individuals in the r-th cluster who voted for the j-th option in the first election to vote for the l-th option in the second election. In our case, for example, $\gamma^r_{1,1}$ is the probability that someone in the r-th cluster who voted CIU in the first election votes CIU again in the second, and hence it measures the fidelity to CIU in that cluster, while $\gamma^r_{1,2}$ is the probability that someone voting CIU first switches to voting PSOE second. Given that abstention is an option, this matrix includes the probabilities that one switches from not voting to voting for each one of the other options, or vice versa. If the i-th area belongs to the $\zeta_i$-th cluster, one can compute the $\theta^2_i = (\theta^2_{i1}, \ldots, \theta^2_{ik_2})$ determining its average voting behavior in the second election either:

1. by using the total probability theorem, which indicates that $\theta^2_{il} = \sum_{j=1}^{k_1} \theta^1_{ij}\gamma^{\zeta_i}_{jl}$ for $l = 1, \ldots, k_2$, and hence through
$$\theta^2_i = \theta^1_i\,\Gamma_{\zeta_i} \quad \text{for } i = 1, \ldots, n, \tag{3.5}$$
2. or by using that $N^1_i \simeq N^2_i$ and that $E[y^2_{il} \mid \theta^2_i] = N^2_i\theta^2_{il}$, while $E[y^2_{il} \mid y^1_i, \Gamma_{\zeta_i}] = \sum_{j=1}^{k_1} y^1_{ij}\gamma^{\zeta_i}_{jl}$ for $l = 1, \ldots, k_2$, and imposing that these expectations be equal, which leads to
$$\theta^2_i = p^1_i\,\Gamma_{\zeta_i} \quad \text{for } i = 1, \ldots, n, \tag{3.6}$$
where $p^1_i = (y^1_{i1}/N^1_i, \ldots, y^1_{ik_1}/N^1_i)$.
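Equation (3.6) and its link to Goodman's ecological regression can be illustrated with simulated data for a single cluster. The sketch below assumes a hypothetical 3 × 3 vote switch matrix, hypothetical first-election proportions and $N^1_i = N^2_i$. It maps first-election proportions to second-election probabilities through (3.6), simulates multinomial second-election counts, and then recovers Γ by pooled least squares without an intercept. That estimator is the basic Goodman-type regression, not the Bayesian model proposed here.

```python
import numpy as np

def second_election_probs(p1, gamma):
    """Eq. (3.6) for one cluster: theta^2_i = p^1_i Gamma, one row per area."""
    return p1 @ gamma

def goodman_estimate(p1, p2):
    """Goodman-style ecological regression: least-squares fit of p2 ~ p1 @ Gamma
    with no intercept, pooling all areas assumed to share one Gamma.
    Nothing forces the estimate to be row-stochastic or to lie in [0, 1]."""
    gamma_hat, *_ = np.linalg.lstsq(p1, p2, rcond=None)
    return gamma_hat

rng = np.random.default_rng(2014)
# Hypothetical 3 x 3 vote switch matrix for a single cluster (rows sum to one).
gamma = np.array([[0.80, 0.15, 0.05],
                  [0.10, 0.85, 0.05],
                  [0.20, 0.10, 0.70]])
n_areas, n_voters = 200, 1500                       # hypothetical sizes
p1 = rng.dirichlet([3.0, 3.0, 3.0], size=n_areas)   # first-election proportions
theta2 = second_election_probs(p1, gamma)
y2 = np.array([rng.multinomial(n_voters, t) for t in theta2])
p2 = y2 / n_voters                                  # second-election proportions
gamma_hat = goodman_estimate(p1, p2)
print(np.round(gamma_hat, 3))
```

When all pooled areas really share one matrix, the least-squares estimate lands close to the true Γ. Pooling areas that follow different matrices is exactly the failure of the ‘constancy assumption’ that the cluster model is meant to address.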
We consider that (3.6) reproduces the way in which individuals actually switch their voting preferences more faithfully than (3.5), because when voting in the second election one benefits from remembering what one actually voted in the first election. Furthermore, (3.6) is more similar to the models used in the ecological inference literature, and we find the predictions of the second election results based on (3.6) to be better, because they take advantage of available information that is not used by (3.5). Hence (3.6) is the formulation adopted here, although other ecological inference problems could be better modeled through (3.5). In fact, note that (3.6) just amounts to the deterministic part of the usual set of $k_1$ ecological regression models first considered in Goodman (1953, 1959) for ecological inference problems, but here applied only to the areas included in the $\zeta_i$-cluster instead of to all the areas under study. Hence, the special one-cluster version of our model reduces to a new version of the ecological regression models widely used in the political sciences literature. By considering as many matrices as clusters, as done here, the vote switch patterns are allowed to differ across clusters. By using the same labeling variable, ζ, to classify both the first election voting patterns and heterogeneity parameters, $\mu_{\zeta_i}$ and $\tau_{\zeta_i}$, and the transition matrices, $\Gamma_{\zeta_i}$, and by simultaneously updating the cluster analysis part and the ecological inference part of the model, both the results of the first election and the way individuals switch their vote contribute to the determination of the size, shape and number of these clusters. At this point, one might think about modeling the transition matrix rows in each cluster hierarchically, by letting them change from area to area in a given cluster while still sharing a common distribution.
That would be akin to using a different random coefficients ecological regression model in each cluster, and in that case, the special one-cluster version of our model would be similar to the models proposed by Hawkes (1969), Brown and Payne (1986), Rosen et al. (2001), Wakefield (2004)
and Greiner and Quinn (2009). Embedding hierarchical versions of the ecological regression model into the cluster model in this manner, though, complicates the update of the Bayesian model a lot due to model identifiability problems, and we have not implemented it in our example.

3.3. On the choice of the prior and its update

As a prior distribution for $(\mu_r, \tau_r, \omega_r, \Gamma_r)$ it is assumed here that the average voting patterns, $\mu_r$, are $\mathrm{Dirichlet}(m^r_1, \ldots, m^r_{k_1})$, that the heterogeneity parameters, $\tau_r$, are $\mathrm{Gamma}(c_r, d_r)$, that the cluster weights, ω, are $\mathrm{Dirichlet}(b_1, \ldots, b_s)$, and that the j-th row of $\Gamma_r$, $\gamma^r_j = (\gamma^r_{j,1}, \ldots, \gamma^r_{j,k_2})$, is $\mathrm{Dirichlet}(g^r_{j1}, \ldots, g^r_{jk_2})$. All these distributions are also assumed to be independent. Depending on the values chosen for their parameters, this prior can go from being very subjective to reflecting very vague information, and we have thoroughly explored the impact of these choices on the posterior distribution.

In particular, the prior expected value of $\mu_r$ is $(m^r_1, \ldots, m^r_{k_1})/\sum_{j=1}^{k_1} m^r_j$; one can choose the $m^r_j$ to reflect that some voting options are more likely than others, and the larger $\sum_{j=1}^{k_1} m^r_j$, the smaller the variances of the $\mu_{rj}$ and the more informative the prior on $\mu_r$. In the implementation that follows, all the $(m^r_1, \ldots, m^r_{k_1})$ are set equal to (1, …, 1), which is a uniform distribution on the corresponding simplex. A similar argument has led us to set $(b_1, \ldots, b_s)$ to (1, …, 1), but sometimes one might want to discourage small clusters by setting $(b_1, \ldots, b_s)$ to (b, …, b) with b larger than 1, hence concentrating the prior probability of the cluster weights, $\omega = (\omega_1, \ldots, \omega_s)$, away from the simplex boundaries. Information about $\tau_r$, determining the heterogeneity of the $\theta^1_i$ in the r-th cluster, is harder to come by, which is why all the $c_r$ have been set to 1 and all the $d_r$ to 0.001, a vague prior for the $\tau_r$.

The choice of the $(g^r_{j1}, \ldots, g^r_{jk_2})$ determining the prior distribution for the rows of the vote switch matrices is more delicate, especially when the data are too sparse to allow for a reliable estimation of the joint distribution of two election results based on the marginal results alone. Information can be incorporated in the prior by taking advantage of the fact that some vote switches are known to be very unlikely, and hence that the corresponding $\gamma^r_{jl}$ are close to zero, and that the probability of repeating the vote is typically larger than the probability of switching to any one of the other options, and hence that $\gamma^r_{j,j}$ is larger than $\gamma^r_{j,l}$ for $j \neq l$. Our data are not sparse, and hence we can resort to vague priors for $\gamma^r_j$. In the implementation that follows, $g^r_{jj}$ is set equal to 4.5 and $g^r_{jl}$ is set to 0.3 for $j \neq l$, which, with $k_2 = 7$, corresponds to assuming that the prior expected value of $\gamma^r_{j,j}$ is $4.5/(4.5 + 6 \times 0.3) \approx .71$.
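The prior mean of .71 is the Dirichlet mean $g^r_{jj}/\sum_l g^r_{jl}$. The short sketch below computes that mean, together with the prior standard deviation given by the same formula as (3.1), for one row of $\Gamma_r$ with $k_2 = 7$; the function name and the choice of row are illustrative.

```python
import numpy as np

def dirichlet_mean_var(g):
    """Componentwise mean and variance of a Dirichlet(g) random vector.
    The variance has the same form as (3.1): mean * (1 - mean) / (sum(g) + 1)."""
    g = np.asarray(g, dtype=float)
    g0 = g.sum()
    mean = g / g0
    var = mean * (1.0 - mean) / (g0 + 1.0)
    return mean, var

k2 = 7                        # CIU, PSOE, PP, ERC, ICV, others, abs
j = 0                         # row of Gamma_r for first-election CIU voters (illustrative choice)
g_row = np.full(k2, 0.3)      # g^r_{jl} = 0.3 for l != j
g_row[j] = 4.5                # g^r_{jj} = 4.5
mean, var = dirichlet_mean_var(g_row)
print(f"prior mean of gamma_jj: {mean[j]:.3f}, prior sd: {np.sqrt(var[j]):.3f}")
print(f"prior mean of each off-diagonal gamma_jl: {mean[1]:.3f}")
```

The prior standard deviation is large compared with the spread of the posterior one expects from areas with thousands of voters, which is consistent with describing this prior as vague.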
We also tried varying the values of $g^r_{jl}$, using the fact that some vote switches are more unlikely than others, but in the final presentation we let the data speak in this regard. By varying the prior parameter values around the ones indicated above, it is found that the posterior distribution is only mildly affected by our prior choice. That happens because in our example the number of areas and the number of counts in each area are large, but one should not expect that to be the case when analyzing sparse data.

Finite mixture models have a cluster label identifiability problem that can lead to labels switching in the middle of MCMC chains. One partial remedy proposed for that is the introduction of order restrictions on ω or on a component of $\mu_r$, but that complicates the MCMC simulations and does not fully solve the problem. In our example the clusters are very different, and we did not observe any label switching in our MCMC chains.

To update the model and simulate from it, the WinBUGS MCMC implementation has been used (see Lunn et al., 2000). The code can be found in a supplementary material file (see Appendix A). The convergence of the chains has been assessed through visual inspection of the sample traces and the use of various diagnostic tools, like their sample autocorrelations and the $\hat{R}$ diagnostic measure in Gelman and Rubin (1992). For each model, four parallel MCMC chains with different initial values have been run until all of their ergodic means converged to the same values. The diagnostic tools used have led us to discard the first 100 000 iterations of each chain. Only one out of every 50 of the remaining iterations was kept, to save on storage, and the final analysis has been based on 10 000 realizations, 2500 from each chain.
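The burn-in, thinning and $\hat{R}$ computations described above can be sketched as below. The chains are synthetic stand-ins (independent normal draws, plus a deliberately shifted copy) rather than WinBUGS output. The 225 000 iterations per chain are an assumption, chosen as the smallest length that keeps 2500 draws after discarding 100 000 and thinning by 50. The $\hat{R}$ shown is the basic form, without the degrees-of-freedom correction of Gelman and Rubin (1992).

```python
import numpy as np

def burn_and_thin(chains, burn_in, thin):
    """Drop the first burn_in draws of every chain, then keep every thin-th draw."""
    return chains[:, burn_in::thin]

def gelman_rubin_rhat(chains):
    """Basic potential scale reduction factor (Gelman and Rubin, 1992),
    without the degrees-of-freedom correction.

    chains: array of shape (m, n), m parallel chains of n draws of one scalar.
    """
    _, n = chains.shape
    b = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled posterior variance estimate
    return float(np.sqrt(var_hat / w))

rng = np.random.default_rng(1992)
# Stand-in for raw MCMC output: 4 chains of 225 000 draws of one scalar parameter.
raw = rng.normal(size=(4, 225_000))
kept = burn_and_thin(raw, burn_in=100_000, thin=50)
print(kept.shape, kept.size)
print("R-hat, well-mixed chains:", round(gelman_rubin_rhat(kept), 4))

# A chain stuck in a different region inflates R-hat well above 1.
stuck = kept + np.array([0.0, 0.0, 0.0, 3.0])[:, None]
print("R-hat, one stuck chain:", round(gelman_rubin_rhat(stuck), 4))
```

In practice $\hat{R}$ would be computed for every monitored parameter, and values close to 1 for all of them are taken as one piece of evidence, alongside trace plots and autocorrelations, that the chains have converged.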
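$$
\begin{aligned}
(y^1_1, \ldots, y^1_n) \mid (\theta^1_1, \ldots, \theta^1_n) &\sim \textstyle\prod_{i=1}^{n} \mathrm{Mult}(N^1_i, \theta^1_i)\\
(\theta^1_1, \ldots, \theta^1_n) \mid (\mu_1, \ldots, \mu_s), (\tau_1, \ldots, \tau_s), \zeta &\sim \textstyle\prod_{i=1}^{n} \mathrm{Dirichlet}(\tau_{\zeta_i}(\mu_{\zeta_i 1}, \ldots, \mu_{\zeta_i k_1}))\\
(\mu_1, \ldots, \mu_s) &\sim \textstyle\prod_{r=1}^{s} \mathrm{Dirichlet}(m^r_1, \ldots, m^r_{k_1})\\
(\tau_1, \ldots, \tau_s) &\sim \textstyle\prod_{r=1}^{s} \mathrm{Gamma}(c_r, d_r)\\
(y^2_1, \ldots, y^2_n) \mid (\theta^2_1, \ldots, \theta^2_n) &\sim \textstyle\prod_{i=1}^{n} \mathrm{Mult}(N^2_i, \theta^2_i)\\
\theta^2_i &= p^1_i\,\Gamma_{\zeta_i}, \quad i = 1, \ldots, n\\
(\gamma^1_1, \ldots, \gamma^1_{k_1}) &\sim \textstyle\prod_{j=1}^{k_1} \mathrm{Dirichlet}(g^1_{j1}, \ldots, g^1_{jk_2})\\
&\;\;\vdots\\
(\gamma^s_1, \ldots, \gamma^s_{k_1}) &\sim \textstyle\prod_{j=1}^{k_1} \mathrm{Dirichlet}(g^s_{j1}, \ldots, g^s_{jk_2})\\
\pi(\zeta_1, \ldots, \zeta_n \mid \omega) &= \textstyle\prod_{i=1}^{n} \omega_{\zeta_i}\\
(\omega_1, \ldots, \omega_s) &\sim \mathrm{Dirichlet}(b_1, \ldots, b_s)
\end{aligned}
$$
Box 1. Bayesian Dirichlet-multinomial s-cluster model with vote switch matrices.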
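To make the generative structure of Box 1 concrete, the sketch below draws one pair of result tables $(y^1, y^2)$ given fixed cluster labels and parameter values. It follows the lines of the box from $\theta^1_i$ to $y^1_i$, then computes $\theta^2_i = p^1_i\Gamma_{\zeta_i}$ with $p^1_i$ taken from the simulated first-election counts, then draws $y^2_i$. All parameter values (two clusters, three options, ten areas) are hypothetical, and the priors on the parameters are not sampled here.

```python
import numpy as np

def simulate_pair(n1, n2, zeta, mu, tau, gammas, rng):
    """Draw (y^1, y^2) from the sampling part of Box 1 for fixed parameter values.

    n1, n2: (n,) voting-age totals; zeta: (n,) cluster labels in 0..s-1;
    mu: (s, k1) simplex rows; tau: (s,) > 0; gammas: (s, k1, k2) row-stochastic.
    """
    n, k1 = len(n1), mu.shape[1]
    k2 = gammas.shape[2]
    y1 = np.empty((n, k1), dtype=np.int64)
    y2 = np.empty((n, k2), dtype=np.int64)
    for i in range(n):
        r = zeta[i]
        theta1 = rng.dirichlet(tau[r] * mu[r])   # area-level voting behavior
        y1[i] = rng.multinomial(n1[i], theta1)   # first-election counts
        p1 = y1[i] / n1[i]                       # first-election proportions
        theta2 = p1 @ gammas[r]                  # eq. (3.6)
        y2[i] = rng.multinomial(n2[i], theta2)   # second-election counts
    return y1, y2

# Hypothetical parameters: s = 2 clusters, k1 = k2 = 3 options, 10 areas.
rng = np.random.default_rng(0)
mu = np.array([[0.5, 0.3, 0.2],
               [0.2, 0.3, 0.5]])
tau = np.array([100.0, 100.0])
gammas = np.stack([0.7 * np.eye(3) + 0.1,    # diagonal 0.8, off-diagonal 0.1
                   0.4 * np.eye(3) + 0.2])   # diagonal 0.6, off-diagonal 0.2
zeta = rng.integers(0, 2, size=10)
n1 = np.full(10, 1500)
n2 = np.full(10, 1480)
y1, y2 = simulate_pair(n1, n2, zeta, mu, tau, gammas, rng)
print(y1[:3])
print(y2[:3])
```

Feeding posterior draws of the parameters into a routine like this one is the basic ingredient of simulation-based model checks such as those of Section 4.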
4. Choice of the number of clusters, s

The only reliable way to build a useful model is through the iterative use of model checking tools that help discover aspects of reality not adequately captured by the models and suggest ways of improving them. Building a Bayesian model is like building a data simulation model, and hence models can be assessed and chosen based on whether it is plausible that they could simulate data like the data observed in reality. Here choosing a model means choosing the number of clusters, s, and it will be done:
1. first, by looking for the smallest s that makes it plausible that the model could simulate data with levels and variability similar to those of the actual election results, and
2. second, by checking whether the model captures most of the spatial dependency in the actual results, testing whether its residuals are spatially independent.

The emphasis will be on assessing the models based on the features in the data and in the residuals most directly related to the way in which individuals switch their vote. All of this will be illustrated through the pair of elections in Barcelona presented in Section 2. There are clustering methods that simultaneously estimate the number of clusters, the parameters describing the clusters and the cluster allocations (see, e.g., Richardson and Green, 1997). Note, though, that implementing them in our ecological inference setting would be a lot more demanding computationally than in any of the settings where these techniques are being used; and even if one did that, one would still need to check whether the final model is appropriate, so one would not be spared the need to carry out model diagnostic checks like the ones considered next.

4.1. Choice of s based on mixed predictive checks

Following the lead of Gelman et al.
(2004), the focus here is on comparing the pair of 248 × 7 election result tables, $y = (y^1, y^2)$, with pairs of 248 × 7 tables simulated from the predictive distribution of Dirichlet-multinomial s-cluster models with transition matrices, $y^{rep} = (y^{rep1}, y^{rep2})$. These replicates are obtained by:
1. first simulating $(\tilde\mu_1,\ldots,\tilde\mu_s,\tilde\tau_1,\ldots,\tilde\tau_s,\tilde\Gamma_1,\ldots,\tilde\Gamma_s,\tilde\zeta)$ from their joint posterior,
2. then simulating $(\tilde\theta^1_1,\ldots,\tilde\theta^1_{248})$ from $\prod_{i=1}^{248}\mathrm{Dirichlet}(\tilde\tau_{\tilde\zeta_i}\tilde\mu_{\tilde\zeta_i})$,
3. then simulating $y^{rep1} = (y^{rep1}_1,\ldots,y^{rep1}_{248})$ from $\prod_{i=1}^{248}\mathrm{Mult}(N^1_i,\tilde\theta^1_i)$,
4. then computing $\tilde\theta^2_i = p^1_i\,\tilde\Gamma_{\tilde\zeta_i}$ for $i = 1,\ldots,248$, where $p^1_i = (y^1_{i1}/N^1_i,\ldots,y^1_{ik_1}/N^1_i)$,
5. and finally simulating $y^{rep2} = (y^{rep2}_1,\ldots,y^{rep2}_{248})$ from $\prod_{i=1}^{248}\mathrm{Mult}(N^2_i,\tilde\theta^2_i)$,
in what, in the terminology of Gelman et al. (1996) and Marshall and Spiegelhalter (2003), would be a mixed predictive check. When the row totals in each area and the number of areas in each cluster are large enough, the parameters are well estimated, and this is basically like assuming that the values of $\mu_{\zeta_i}$, $\tau_{\zeta_i}$ and $\Gamma_{\zeta_i}$ that rule $(y^1_i, y^2_i)$ and those that rule $(y^{rep1}_i, y^{rep2}_i)$ are almost the same, but that the values of $(\theta^1_i,\theta^2_i)$ that rule $(y^1_i,y^2_i)$ differ from those that rule $(y^{rep1}_i,y^{rep2}_i)$. This is the closest one can get to an ideal leave-one-out cross-validation posterior predictive check, which is unfeasible here because the models do not include any spatial term that would allow one to estimate $\zeta_i$ without $(y^1_i,y^2_i)$.

The distributions of statistics of the type $p^1_a = (p^1_{1a},\ldots,p^1_{248a})$, where a is either a category, j, or a combination of categories, depend mainly on $(\mu_r,\tau_r)$ and $\zeta$, and hence they are useful to check the part of the model describing the results of the first election. Instead, the distributions of $D^{ab}(y) = (D^{ab}_1(y),\ldots,D^{ab}_{248}(y))$, with $D^{ab}_i(y) = \log(p^2_{ia}/p^1_{ib})$, and of $p^2_a = (p^2_{1a},\ldots,p^2_{248a})$ depend both on $(\mu_r,\tau_r)$ and $\zeta$, which rule the distribution of $p^1_a$, and on $\Gamma_r$, which is our main object of interest. Hence the number of clusters, s, will be chosen mostly by comparing $D^{ab}(y)$ and $p^2_a(y)$ with $D^{ab}(y^{rep})$ and $p^2_a(y^{rep})$.

Fig. 4, for example, assesses these models by comparing how they fare when replicating the statistic $(p^{03}_{PSOE}, p^{04}_{CIU}, \log(p^{04}_{CIU}/p^{03}_{PSOE}))$. Note that these models predict $p^{04}_{CIU}$ much more accurately than $p^{03}_{PSOE}$ because under them one benefits from the results
in 2003 to predict the results in 2004. According to this figure, the one- and two-cluster models are clearly unable to reproduce the levels of $p^{03}_{PSOE}$ and of $\log(p^{04}_{CIU}/p^{03}_{PSOE})$. The simulations from the three- and four-cluster models fare much better, but it is not easy to decide whether either of them is good enough based only on figures like this one. To refine the analysis and help decide on the smallest number of clusters that fulfils what one expects from an adequate model, the sample medians of the values taken by fourteen $p^2_a$ statistics and thirty-five $D^{ab}(y)$ statistics in each district were compared with 90% mixed predictive credible intervals for them. A model can be considered adequate if its 90% predictive credible intervals for the sample medians contain the observed sample medians with probability .9, and hence on about nine out of ten occasions. When too many sample medians fall outside these intervals and/or a median falls very far from its interval, the model fails to capture that feature adequately. Fig. 5 presents these model diagnostic checks for the $D^{jj}(y)$ statistics in Fig. 3; all the plots that correspond to the same statistic (i.e., all the ones in a given row) share the same sample median dots, but their credible intervals change with the number of clusters. According to that figure, in order to capture the way in which the proportion of the vote for CIU, PP, ERC
Fig. 4. The top panel is the observed value of $(p^{03}_{PSOE}, p^{04}_{CIU}, \log(p^{04}_{CIU}/p^{03}_{PSOE}))$. Below, replicates from the mixed predictive distribution of the Dirichlet-multinomial s-cluster model with vote switch matrices and s = 1, 2, 3, 4, with areas grouped by district.
and the abstention changes in all districts, one needs the four-cluster model, because models with fewer than four clusters lead to three or more credible intervals missing the sample median, sometimes by a wide margin. To replicate the change in vote for PSOE
Fig. 5. Dots are the sample medians of the values taken by $D^{jj}_i(y) = \log(p^2_{ij}/p^1_{ij})$, for $j = 1,\ldots,7$, in each district. Segments represent the 90% mixed predictive credible intervals for them under Dirichlet-multinomial s-cluster models with vote switch matrices.
one needs at least a three-cluster model, because the one- and two-cluster models clearly fail in districts 5 and 8. On the other hand, the one-cluster model is good enough to capture the change in vote for ICV and for Others. The supplementary materials file (see Appendix A) presents similar checks based on a sample of seven $D^{ab}(y)$ statistics. Overall, the credible intervals under the four-cluster model include the observed medians of all but one or two districts for almost all of the fourteen $p^2_a$ statistics and thirty-five $D^{ab}(y)$ statistics considered. The only statistic for which the intervals under the four-cluster model fail to include the medians of more than three districts is $\log(p^{04}_{PSOE}/p^{03}_{ERC+ICV})$, which indicates that this feature is not well captured by that model. Note also that most of the discrepancies between sample medians and their credible intervals under the four-cluster model occur in district 8, which indicates that this model is unable to reproduce a few features of the results of that district accurately.

4.2. Choice of s based on the spatial dependence of residuals

So far it has been checked that the four-cluster model is quite good at replicating both the levels and the variability of the actual election results. In our context, a model will be especially useful as a summary of reality if it has a small number of clusters and yet captures most of the spatial dependence in the two election results, $p^1 = (p^1_1,\ldots,p^1_{248})$ and $p^2 = (p^2_1,\ldots,p^2_{248})$, where $p^l_i = (y^l_{i1}/N^l_i,\ldots,y^l_{i7}/N^l_i)$, and in $D^{ab}(y) = (D^{ab}_1(y),\ldots,D^{ab}_{248}(y))$ for all relevant combinations of categories a and b. That the spatial dependence in these statistics is strong is clear from Figs. 2 and 3.
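The statistics used in these checks can be computed directly from the count tables. The sketch below, in Python with NumPy (all function names are hypothetical), computes $p^l_i$, the log-ratios $D^{ab}_i(y) = \log(p^2_{ia}/p^1_{ib})$ with a and b allowed to be combinations of categories, the district medians, and the check of Section 4.1 of whether each observed district median lies inside the central 90% interval of the medians obtained from replicated tables. It assumes no zero counts in the categories involved, since the log-ratio is otherwise undefined.

```python
import numpy as np


def proportions(y):
    """p^l_i = y^l_i / N^l_i for each area (rows of an (n, k) count table)."""
    return y / y.sum(axis=1, keepdims=True)


def log_ratio(y1, y2, a, b):
    """D^{ab}_i(y) = log(p^2_{ia} / p^1_{ib}); a and b are lists of column
    indices, so combined categories (e.g. ERC+ICV) are summed first.
    Assumes the summed proportions are strictly positive."""
    p1, p2 = proportions(y1), proportions(y2)
    return np.log(p2[:, a].sum(axis=1) / p1[:, b].sum(axis=1))


def district_medians(stat, district):
    """Sample median of an area-level statistic within each district."""
    labels = np.unique(district)
    return labels, np.array([np.median(stat[district == d]) for d in labels])


def median_check(stat_obs, stat_reps, district, level=0.90):
    """For each district, True if the observed median falls inside the
    central `level` interval of the medians of the replicated statistics."""
    _, obs = district_medians(stat_obs, district)
    reps = np.array([district_medians(s, district)[1] for s in stat_reps])
    lo, hi = np.quantile(reps, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return (obs >= lo) & (obs <= hi)
```

Here `stat_reps` would hold the same statistic computed on each replicated pair $(y^{rep1}, y^{rep2})$.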
Because the spatial dependence in these statistics is strong, models with different numbers of clusters will also be assessed and compared here by testing whether their residuals are spatially correlated, in a way analogous to how models are typically checked and chosen in time series analysis by testing whether their residuals are correlated in time. For a discussion of the connection between the correlation of the error term of the ecological regression model and the validity of the assumptions made by that model in the one-cluster case, see Gelman et al. (2001). Given that here the observations in the i-th area are two seven-dimensional vectors, $p^1_i = (p^1_{i1},\ldots,p^1_{i7})$ and $p^2_i = (p^2_{i1},\ldots,p^2_{i7})$, as well as many statistics of the form $D^{ab}_i = \log(p^2_{ia}/p^1_{ib})$, there are many ways of defining the residual of the i-th area under the Dirichlet-multinomial s-cluster models with transition matrices considered. Hence, one needs to tailor these residuals to the objective of the analysis. If one were mainly interested in summarizing the voting behavior in the first election, a natural candidate for the residual of the i-th area would be the posterior expectation of the $\chi^2$ discrepancy between the results in that area and their expected value under the Dirichlet-multinomial s-cluster model. Here the main goal is to check whether the vote transition matrices capture the way individuals switch their vote, and hence the analysis is based on the Pearson residuals of 14 statistics $p^2_a = (p^2_{1a},\ldots,p^2_{248a})$, denoted and defined as

$$p^{P2}_{ia} = \frac{p^2_{ia} - E[p^2_{ia}\mid y]}{\sqrt{\mathrm{Var}[p^2_{ia}\mid y]}}, \qquad (4.1)$$
as well as on the Pearson residuals of 35 statistics $D^{ab}(y) = (D^{ab}_1(y),\ldots,D^{ab}_{248}(y))$, denoted by $D^{Pab}(y)$ and defined analogously. The distribution of $D^{Pab}_i(y)$ depends both on $\Gamma_{\zeta_i}$ and on $(\mu_{\zeta_i},\tau_{\zeta_i})$, while the distribution of $p^{P2}_{ia}$ depends mostly on $\Gamma_{\zeta_i}$; therefore $p^{P2}_a = (p^{P2}_{1a},\ldots,p^{P2}_{248a})$ will be more helpful than $D^{Pab}(y) = (D^{Pab}_1(y),\ldots,D^{Pab}_{248}(y))$ to check the validity of the estimated vote transition matrices. Even though the spatial dependence left in both types of residuals was checked, only the analysis using residuals of the form $p^{P2}_a$ is reported here. To measure the spatial dependence left in $p^{P2}_a$, the following Moran index is used:
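$$I_M(p^{P2}_a) = \frac{248}{\sum_{i=1}^{248}\sum_{j=1}^{248}\lambda_{ij}}\;\frac{\sum_{i=1}^{248}\sum_{j=1}^{248}\lambda_{ij}\,(p^{P2}_{ia}-\bar p^{P2}_a)(p^{P2}_{ja}-\bar p^{P2}_a)}{\sum_{i=1}^{248}(p^{P2}_{ia}-\bar p^{P2}_a)^2}, \qquad (4.2)$$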
where $\bar p^{P2}_a$ is the average of the $p^{P2}_{ia}$, and $\lambda_{ij}$ is 1 if the i-th and j-th areas are in contact and 0 if they are not. To decide whether these residuals are spatially correlated, a permutation test is implemented. These tests rely on the fact that, if the residuals were spatially independent, randomly shuffling their values across the areas (while keeping the map and its adjacency structure fixed) and recomputing $I_M(p^{P2}_a)$ would yield values similar to the observed one. When the observed value is significantly larger, the residuals are spatially correlated (see, e.g., Bivand et al., 2008). Fig. 6 presents the permutation distributions of $I_M(p^{P2}_a)$ for the $p^2_a$ statistics in Fig. 3, computed from 5000 random permutations of their residuals. Note that most of the Pearson residuals under the one-cluster model are strongly correlated, and the same holds to a lesser extent under the two- and three-cluster models. In particular, under the three-cluster model the strongest spatial dependence left is in the Pearson residuals of PSOE and of abstention, even though Fig. 4 indicated that three clusters were good enough to capture the level of the vote for PSOE. This means that checking the model through the correlation of its residuals complements the predictive checks of Section 4.1, which focus on matching observed and simulated levels.
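A minimal sketch of this residual analysis in Python with NumPy follows. It assumes that the posterior moments in (4.1) are approximated by Monte Carlo averages over simulated proportions (for instance, replicated $p^2_{ia}$ from the procedure of Section 4.1; the text does not state how the moments were computed), and that $\lambda$ is a symmetric 0/1 contact matrix with zeros on the diagonal. The one-sided p-value with one added to numerator and denominator is a common permutation-test convention, not necessarily the one used here; function names are hypothetical.

```python
import numpy as np


def pearson_residuals(p_obs, p_draws):
    """(p_ia - E[p_ia|y]) / sqrt(Var[p_ia|y]), with the moments approximated
    by Monte Carlo draws. p_obs: (n,); p_draws: (m, n) with m >= 2."""
    return (p_obs - p_draws.mean(axis=0)) / p_draws.std(axis=0, ddof=1)


def morans_i(x, lam):
    """Moran's index (4.2) of x under a symmetric 0/1 adjacency matrix lam."""
    n = len(x)
    z = x - x.mean()
    return (n / lam.sum()) * (z @ lam @ z) / (z @ z)


def moran_permutation_test(x, lam, n_perm=5000, rng=None):
    """Shuffle the residuals over the areas, keep the adjacency fixed, and
    compare the permuted indices with the observed one (one-sided)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = morans_i(x, lam)
    perm = np.array([morans_i(rng.permutation(x), lam) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, perm, p_value
```

On a path of four areas, a linear trend (1, 2, 3, 4) gives an index of 1/3, while the alternating pattern (1, −1, 1, −1) gives −1, illustrating positive and negative spatial correlation.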
Fig. 6. Permutation distributions of $I_M(p^{P2}_j)$ and the value it takes in the data. They allow one to check the spatial dependence left in the Pearson residuals of the proportion of the vote for each category in 2004, $p^2_{ij}$, under the Dirichlet-multinomial s-cluster model with vote transition matrices.
Fig. 6 shows that these Pearson residuals under the four-cluster model are either weakly correlated or uncorrelated, and adding more clusters barely changes the dependence left in them. Similar conclusions are reached through the permutation distributions of the statistics $I_M(p^{P2}_a)$ presented in the supplementary material file (see Appendix A), and through the permutation distributions of the thirty-five statistics $I_M(D^{Pab}(y))$ considered in this paper. Hence we settle on a four-cluster model.
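With s fixed, each area is assigned to the cluster with the highest posterior probability, as done for the map in Fig. 7, and clusters are labeled in increasing order of their expected PSOE share, as described in Section 5. The Python sketch below does both from MCMC output. It assumes that cluster labels do not switch across draws (consistent with the remark in Section 6 that all chains converge to the same configuration, but still an assumption here) and that labels are coded 0 to s - 1; the function names are hypothetical.

```python
import numpy as np


def posterior_mode_clusters(zeta_draws, s):
    """zeta_draws: (m, n) MCMC draws of the cluster labels, coded 0..s-1.
    Returns the estimated pi(zeta_i = r | y) as an (n, s) array and the
    label with the highest estimated probability for each area."""
    n = zeta_draws.shape[1]
    probs = np.zeros((n, s))
    for r in range(s):
        probs[:, r] = (zeta_draws == r).mean(axis=0)
    return probs, probs.argmax(axis=1)


def relabel_by_share(labels, mu_mean, col):
    """Rename clusters so that labels increase with the expected share in
    column `col` of the posterior mean of mu_r (e.g. the PSOE column)."""
    order = np.argsort(mu_mean[:, col])      # old labels, by increasing share
    new_of_old = np.empty_like(order)
    new_of_old[order] = np.arange(len(order))
    return new_of_old[labels]
```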
Fig. 7. Classification of the 248 areas into the four clusters using the mode of $\pi(\zeta_i\mid y)$, and posterior expectation of the vote switch matrices, $100\,\Gamma_r$, from the 2003 election for the Catalan parliament to the 2004 election for the Spanish parliament. The rows are the distributions of the vote in 2004 given the choice in 2003.
5. Presentation of the results of the analysis

This section describes the four clusters and the four vote switch matrices uncovered by the analysis of the 2003–2004 elections in Barcelona. Table 2 presents the posterior expectations of $\mu_r = E[\theta^1_i\mid\zeta_i = r]$ and of $\tau_r$, which determine the vote pattern and the degree of heterogeneity of the clusters in 2003; the posterior expectation of $E[\theta^2_i\mid\zeta_i = r]$, which determines their vote pattern in 2004; and the posterior expectations of $\omega_r$ and % Pop, which determine the relative cluster sizes in terms of number of areas and of voting age individuals. Cluster labels have been chosen so that they increase with the expected percentage of the vote for PSOE, which was the winner in Barcelona in both elections. Assigning the i-th area to cluster r whenever the mode of $\pi(\zeta_i\mid y)$ is r leads to the cluster map in Fig. 7. Even though the model does not use the spatial location of the areas, the clusters uncovered have a strong spatial structure that is closely related to area level covariates. In particular, the average income and the proportion of Catalan speakers and of
Table 2. Posterior expected values of $\mu_r = E[\theta^1_i\mid\zeta_i = r]$ and of $E[\theta^2_i\mid\zeta_i = r]$, determining the voting patterns of the four clusters, and of $\omega_r$, % Pop and $\tau_r$, determining their relative size in terms of number of areas and of voting age individuals, and their heterogeneity.

Elect  Cluster  CIU    PSOE   PP     ERC    ICV    Others  abs    ω      % Pop  τ
2003   1        0.301  0.120  0.180  0.087  0.038  0.017   0.257  0.116  10.6   262.13
2003   2        0.223  0.175  0.090  0.135  0.064  0.016   0.298  0.337  38.9   428.23
2003   3        0.120  0.199  0.068  0.070  0.054  0.019   0.471  0.185   7.5   145.40
2003   4        0.132  0.250  0.094  0.082  0.060  0.019   0.363  0.361  43.0   155.05
2004   1        0.279  0.173  0.239  0.082  0.033  0.018   0.176  0.116  10.5   –
2004   2        0.189  0.262  0.124  0.145  0.058  0.018   0.204  0.337  38.9   –
2004   3        0.086  0.289  0.103  0.090  0.058  0.020   0.354  0.185   7.5   –
2004   4        0.105  0.355  0.135  0.098  0.056  0.023   0.228  0.361  43.1   –
Fig. 8. The rows of the first matrix are the overall distributions of the vote in 2004 of all the individuals with a given choice in 2003. The columns of the second matrix are the overall distributions of the vote in 2003 of all the individuals with a given choice in 2004.
Catalonia natives decrease as the cluster label increases. The cluster structure obtained from each of the elections separately is extremely similar to the one obtained here, which supports the assumption that clusters do not change much between elections. Table 2 indicates that Cluster 1 is the conservative stronghold in Barcelona, with CIU always winning and PP always coming in second, and the map in Fig. 7 indicates that this cluster is formed mainly by the areas in district 5 and half of the areas in district 4, which are the wealthiest parts of Barcelona. In Cluster 2 the winner in 2003 is CIU while the winner in 2004 is PSOE, and it is where ERC does best. This cluster is the most homogeneous one (it has the largest $\tau_r$), and it is formed by almost all the areas in districts 2 and 6, the remainder of district 4 and a few neighboring areas. In both Clusters 1 and 2 the Catalan language and sense of identity are dominant. What characterizes the areas in Cluster 3 is their high level of abstention and the fact that PSOE always wins there, while Cluster 4 is the stronghold of PSOE. Cluster 3 is the smallest in terms of population, and it includes most of district 1 and a few areas elsewhere, while Cluster 4 is the largest one and includes most of districts 7, 8, 9 and 10. In Clusters 3 and 4 the options pushing for more Catalan self-rule have about half of the support they have in Clusters 1 and 2, whereas the support for the options that do not push for it is more evenly spread across all four clusters. Fig. 7 also presents the posterior expectation of the four vote switch matrices. On top of the arguments given in Section 4 in favor of the four-cluster model, the need for at least four clusters is also backed by the fact that each of these four matrices has at least one important feature that distinguishes it from the other three. As a summary of the way in which people in Barcelona switch their vote, Fig. 8 presents the global vote switch matrix, where each row is the distribution of the vote in 2004 of all the people with a given choice in 2003. That matrix is obtained by first translating the matrices for each cluster into absolute numbers of votes for each combination of categories and each cluster, and then adding these votes and estimating the global conditional distributions. The rows of these matrices indicate that everywhere in Barcelona fidelity is highest among voters for PP, followed by voters for PSOE, which was to be expected given that these were the parties competing for the Spanish government in 2004. As for the 2003 abstainers, the largest category in that election, Fig. 7 indicates that somewhere between 16% and 27% of them, depending on the cluster, switched to voting PSOE and that, except in Cluster 2, between 7% and 9% of them switched to voting PP. Overall, Fig. 8 indicates that 25% of the 2003 abstainers, about 102,000 individuals, switched to voting PSOE, and 6% of them switched to voting PP. There are almost no switches from voting in 2003 to abstaining in 2004. On the other side of the national divide, the matrices in Fig. 7 indicate that the individuals who voted CIU in 2003 in Clusters 1 and 2 mostly repeated their vote in 2004, with only about 11% of them switching to PP, but in Clusters 3
and 4 the fidelity to CIU was much lower, with 27% and 17% of its voters defecting to ERC, and 8% and 5% defecting to PSOE. Overall, Fig. 8 indicates that in 2004 CIU lost about 8% of its 2003 voters to PP, about 7% to ERC and about 3% to PSOE. Even though ERC received more votes in 2004 than in 2003, some of its 2003 voters did not repeat their vote; in Clusters 1 and 2 those who switched mostly voted for PSOE, but in Clusters 3 and 4 they mostly voted for CIU. Overall, ERC lost about 6% of its voters to CIU and 6% to PSOE. Finally, the 2003 ICV voters who switched their vote mostly went for ERC in Clusters 1 and 2, but mostly went for PSOE in Cluster 4; overall, 14% of them voted for PSOE and 10% for ERC. The columns of the second matrix in Fig. 8, each adding up to one hundred, summarize the distributions of the vote in 2003 of all the people with a given choice in 2004. These columns are estimated from the absolute vote count matrix obtained by adding the estimated vote counts for each combination of categories and each cluster. According to this matrix, the largest contributor to the big win by PSOE in 2004, besides its own 2003 voters, was the group of 2003 abstainers, who contributed close to 28% of its vote, followed by 2003 voters for ICV, ERC and CIU, each contributing a bit more than 2% of PSOE's 2004 vote. Besides its own 2003 voters, the main contributors to the 2004 vote for PP were also 2003 abstainers, contributing about 13% of that vote, followed by 2003 voters for CIU, contributing about 10% of it. On the other side of the national divide, almost all the 2004 vote for CIU came from people who had also voted CIU in 2003, and only 4% of it came from people who had voted ERC.
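The aggregation behind Fig. 8 can be sketched as follows: each cluster matrix is turned into expected vote counts by weighting its rows with the first-election totals of that cluster, the counts are added over clusters, and the sum is normalized by rows (2004 vote given the 2003 choice) or by columns (2003 vote given the 2004 choice). This plug-in version uses a single matrix per cluster, such as the posterior expectation; averaging the same computation over posterior draws would be an alternative. The function name and array layout are illustrative assumptions.

```python
import numpy as np


def global_switch_matrices(first_totals, Gammas):
    """first_totals: (s, k1) first-election counts per category, summed over
    the areas of each cluster; Gammas: (s, k1, k2) row-stochastic matrices.
    Returns the count matrix and the row- and column-conditional
    global matrices, both in percent."""
    # Expected number of individuals for each (first choice j, second choice k),
    # summed over clusters r: counts[j, k] = sum_r first_totals[r, j] * Gammas[r, j, k]
    counts = np.einsum("rj,rjk->jk", first_totals, Gammas)
    rows = 100 * counts / counts.sum(axis=1, keepdims=True)  # second vote given first
    cols = 100 * counts / counts.sum(axis=0, keepdims=True)  # first vote given second
    return counts, rows, cols
```

For example, two clusters with first-election totals (100, 50) and (20, 30) and matrices [[0.9, 0.1], [0.2, 0.8]] and [[0.5, 0.5], [0, 1]] give the count matrix [[100, 20], [10, 70]], whose second row-conditional distribution is (12.5%, 87.5%).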
In 2004, for the first time ever, ERC received more votes in the election for the Spanish parliament than in the previous election for the Catalan parliament; it turns out that 78% of the people voting for ERC in 2004 had already voted for it in 2003, about 11% had voted for CIU, 5% had abstained and about 5% had voted ICV.

6. Extensions and limitations of the analysis

The s-cluster models proposed here represent a compromise between the one-cluster constant coefficients ecological regression models, which estimate a single matrix ruling the way in which individuals everywhere switch their vote, and the one-cluster random coefficients ecological regression models, which estimate a different vote switch matrix for each area. The appeal of this compromise is that it improves the estimation of the vote switch patterns by pooling information only across areas that are indeed similar, without the need to assume that all areas are exchangeable. Some might be concerned that the joint distributions of the results of the two elections are estimated based only on the information in the marginals of the tables (each holding the results of a single election), which carry limited information about these joint distributions, and that the number of clusters is selected only through diagnostic checks that match the level, variability and spatial dependence of these marginals. But in practice there is almost never any way around that, because reliable individual level data are lacking. In our case, though, there is also evidence in favor of the “truth” of our model and our inferences: all four MCMC chains converge to the same cluster configuration, and an independent analysis of similar pairs of elections leads to area cluster allocations and estimated transition matrices similar to the ones reported here. Furthermore, note that our diagnostic checks are sensitive enough to discard the one-, two- and three-cluster models.
Fitting the model considered here can be quite challenging due to the large number of parameters involved, and there will be instances where one will need to estimate vote switch patterns through simpler models. For example, when comparing vote transition matrices across different pairs of elections, it might be convenient to base the comparison on a fixed cluster structure common to all pairs of elections, perhaps obtained from a cluster analysis of the results of a single election, thus separating the cluster analysis stage from the ecological inference stage. One could also do the clustering based only on the way in which individuals switch their vote, disregarding the first election results, but we did not find that approach useful when we tried it in our case. A second modeling shortcut would be to use a non-hierarchical multinomial s-cluster model for the first part of the model. Even though that will not capture all the variability in the first election results, it will lead to cluster structures and vote switch matrices very similar to the ones obtained with the model used here, at a much lower computational cost. An implementation of this simplified approach can be found in Puig and Ginebra (2012). The careful determination of the number of clusters is only crucial if one is interested in learning how people vote differently at a local level. When one is not interested in the cluster structure and cluster matrices in Fig. 7, but only in the overall transition matrices in Fig. 8, one could decide a priori on a number of clusters, $s_0$, that is neither so small that relevant clusters are missed nor so large that it hinders the estimation of their vote transition matrices. In that case one can also resort to less parsimonious single-cluster ecological regression models with random coefficients, like the ones considered in Hawkes (1969), Brown and Payne (1986), Rosen et al. (2001), Wakefield (2004) and Greiner and Quinn (2009). Using these alternative approaches in our example we find an overall vote switch matrix that is similar to the one in Fig. 8, but one misses the four-cluster structure identified in Fig. 7, which is very useful to political strategists and scientists because it can be related to and interpreted through the demographic composition of the underlying areas. An alternative modeling approach would be to add spatial terms to the model to account for the dependence in the data, in a way analogous to the one used in hierarchical models for disease mapping (see, e.g., Wakefield, 2004). Note, though, that our clustering approach is more flexible because it allows one to capture aggregation in the data that is not spatial, together with the aggregation that is spatial, which is a big advantage when dealing with election results in places like Catalonia, with an extremely fragmented and diverse demography criss-crossed by two deep cleavages. That extra flexibility, though,
might not be needed to analyze election results in places with a more homogeneous and smoothly changing demography. Spatial models will typically provide sound estimates of the overall transition matrix, but with them it will also be more difficult to interpret the vote transition estimates at a local level, which might make them less useful. No approach to the ecological inference problem can claim to have good statistical properties under all circumstances, and if the data are sparse enough, all approaches will fail. More research is needed to help assess when ecological inference methods perform adequately. In our setting reliable individual level data are difficult to come by, but when they are available they should be used to help improve the transition matrix estimates. Furthermore, in those instances where it is safe to assume that transition matrices will not change much across pairs of elections, and/or where one can estimate these matrices adequately ahead of the second election through survey sample data, one could adapt our model to help predict the results of the second election starting from the results of the first one.

Acknowledgments

This work was funded in part by Grant # MTM2010-14887 of the Ministerio de Ciencia e Innovación of Spain. We are very grateful to Miguel Angel Martínez for triggering our interest in ecological inference problems by showing us an implementation of a model similar to the one-cluster version of the model in Section 3. We are also grateful to the city council of Barcelona for providing us with the data and the maps, and to the Associate Editor and two referees for their constructive comments and suggestions.

Appendix A. Supplementary material

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.csda.2013.10.006.

References

Bivand, R.S., Pebesma, E.J., Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Springer Verlag, New York.
Brown, P.J., Payne, C.D., 1986. Aggregate data, ecological regression, and voting transitions. J. Amer. Statist. Assoc. 81, 452–460.
Freedman, D.A., 2001. Ecological inference and the ecological fallacy. In: Smelser, N., Baltes, P. (Eds.), International Encyclopedia of the Social and Behavioural Sciences, vol. 6. Elsevier, New York, pp. 4027–4030.
Freedman, D.A., Klein, S.P., Ostland, M., Roberts, M.R., 1998. Review of a solution to the ecological inference problem (by G. King). J. Amer. Statist. Assoc. 93, 1518–1522.
Freedman, D.A., Klein, S.P., Sacks, J., Smyth, C.A., Everett, C.G., 1991. Ecological regression and voting rights (with discussion). Evaluation Review 15, 673–816.
Gelman, A., Carlin, J.B., Stern, H., Rubin, D.B., 2004. Bayesian Data Analysis, second ed. Chapman and Hall, New York.
Gelman, A., Meng, X., Stern, H., 1996. Posterior predictive assessment of model fitness via realized discrepancies. Statist. Sinica 6, 733–807.
Gelman, A., Park, D.K., Ansolabehere, S., Price, P.N., Minnite, L.C., 2001. Models, assumptions and model checking in ecological regressions. J. R. Stat. Soc. Ser. A 164, 101–118.
Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences (with discussion). Statist. Sci. 7, 457–511.
Glynn, A.N., Wakefield, J., 2010. Ecological inference in the social sciences. Stat. Methodol. 7, 307–322.
Goodman, L., 1953. Ecological regressions and the behavior of individuals. Amer. Sociol. Rev. 18, 663–666.
Goodman, L., 1959. Some alternatives to ecological correlation. Amer. J. Sociol. 64, 610–625.
Greiner, D.J., Quinn, K.M., 2009. R × C ecological inference: bounds, correlations, flexibility and transparency of assumptions. J. R. Stat. Soc. Ser. A 172, 67–81.
Hawkes, A.G., 1969. An approach to the analysis of electoral swing. J. R. Stat. Soc. Ser. A 132, 68–79.
King, G., 1997. A Solution to the Ecological Inference Problem. Princeton University Press, Princeton.
Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D.J., 2000. WinBUGS—A Bayesian modelling framework: concepts, structure and extensibility. Stat. Comput. 10, 325–337.
Marshall, E.C., Spiegelhalter, D.J., 2003. Approximate cross-validatory predictive checks in disease mapping models. Stat. Med. 22, 1649–1660.
Puig, X., Ginebra, J., 2012. On vote switching, location and national divide in Catalonia. Technical Report 2012-08, Department of Statistics and O.R., UPC.
Richardson, S., Green, P.J., 1997. On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. Ser. B Stat. Methodol. 59, 731–792.
Rosen, O., Jiang, W., King, G., Tanner, M.A., 2001. Bayesian and frequentist inference for ecological inference: the R × C case. Stat. Neerl. 55, 134–156.
Wakefield, J., 2004. Ecological inference for 2 × 2 tables (with discussion). J. R. Stat. Soc. Ser. A 167, 385–445.