Economics Letters 148 (2016) 87–90
Contents lists available at ScienceDirect
Economics Letters journal homepage: www.elsevier.com/locate/ecolet
Composite marginal likelihood estimation of spatial autoregressive probit models feasible in very large samples Pavlo Mozharovskyi a , Jan Vogler b,∗ a
Agrocampus Ouest, 65 rue de Saint-Brieuc, 35000 Rennes, France
b
Institute of Econometrics and Statistics, University of Cologne, Universitätsstr. 22a, D-50937 Cologne, Germany
highlights • • • •
We implement Composite Marginal Likelihood (CML) for spatial probit models. Existing CML implementations are infeasible in large samples. We achieve computational feasibility using sparse matrix techniques. We illustrate feasibility of our CML implementation through a Monte-Carlo study.
article
info
Article history: Received 5 July 2016 Received in revised form 16 September 2016 Accepted 19 September 2016 Available online 1 October 2016
abstract Composite Marginal Likelihood (CML) has become a popular approach for estimating spatial probit models. However, for spatial autoregressive specifications the existing brute-force implementations are infeasible in large samples as they rely on inverting the high-dimensional precision matrix of the latent state variable. The contribution of this paper is to provide a CML implementation that circumvents inversion of that matrix and therefore can also be applied to very large sample sizes. © 2016 Elsevier B.V. All rights reserved.
JEL classification: C21 C25 C63 C87 Keywords: Spatial probit models Sparse matrices Composite marginal likelihood Partial maximum likelihood Spatial econometrics
1. Introduction Composite Marginal Likelihood (CML) has become a popular technique for estimating spatial probit models. It has been introduced to the analysis of spatial binary data by Heagerty and Lele (1998) in the field of geostatistics while Bhat et al. (2010) were the first to apply CML in the area of spatial econometrics. The Partial Maximum Likelihood approach of Wang et al. (2013)
∗ Correspondence to: Institut für Ökonometrie und Statistik, Universität zu Köln, Universitätsstr. 22a, D-50937 Köln, Germany. Fax: +49 0 221 470 5074. E-mail addresses:
[email protected] (P. Mozharovskyi),
[email protected] (J. Vogler). http://dx.doi.org/10.1016/j.econlet.2016.09.022 0165-1765/© 2016 Elsevier B.V. All rights reserved.
is equivalent to CML. For a comprehensive overview on CML see Bhat (2014). Relative to standard Maximum Likelihood (ML) the CML approach has the advantage of avoiding high-dimensional numerical integration. However, for spatial autoregressive specifications the existing brute-force implementations of CML rely on inverting the precision matrix of the latent state variable. Since that matrix has dimensions equal to the number of observations n brute-force CML is infeasible in large samples (Billé, 2013). The contribution of this paper is to provide a CML implementation that avoids computation of the full inverse matrix. Instead we compute only those elements of the inverse matrix that are actually required by CML. We do so by exploiting a recursion formula derived by Takahashi et al. (1973). Sparse matrix techniques allow us to apply it even for large n with low computational costs so that
88
P. Mozharovskyi, J. Vogler / Economics Letters 148 (2016) 87–90
CML becomes feasible in large samples. Originally the recursion has been introduced to the Bayesian spatial statistic context by Rue and Martino (2007). However, its potential for CML has not been recognized so far by the spatial econometrics literature.
Spatial probit models are defined as follows. Let yi denote a binary dependent variable observed for spatial unit i:
1, 0,
if y∗i ≥ 0 , else
i = 1, . . . , n.
(1)
∗
The factor yi denotes a latent state variable following a linear spatial model of the form y∗ = m + u,
u ∼ N(0, H −1 ),
(2)
where y = (y1 , . . . , yn ) denotes the state vector with mean m = (m1 , . . . , mn )′ and u = (u1 , . . . , un )′ is a vector of Gaussian errors with precision matrix H. For spatial autoregressive specifications ∗
∗
∗ ′
H = (I − ρ W )′ (I − ρ W ),
likelihoods associated to each pair: LCML =
G
P (z(g ) ≤ ν(g ) ) =
g =1
2. Spatial probit models
yi =
ν). The CML approach maximizes the product over the marginal
(3)
where I labels an (n × n) identity matrix, ρ denotes a correlation parameter and W = (wij ) is an (n × n) matrix of spatial weights wij . In typical applications W consists of a large proportion of zero entries so that W and H are sparse matrices. We consider two specifications for m. The first leads to the Spatial Autoregressive Lagged dependent variable (SAL) model, the second to the Spatial Autoregressive Error (SAE) model. Thus, the mean is given as SAL: m = (I − ρ W )−1 X β
(4)
SAE: m = X β,
(5)
where the (n × ℓ) matrix X denotes exogenous variables and β is an ℓ-dimensional vector of slope parameters. 3. Composite marginal likelihood Evaluating the likelihood of spatial probit models amounts to numerical integration over the n-dimensional interdependent state vector u, which is computationally demanding for large n. The idea underlying CML is to avoid high-dimensional integration by dividing the spatial units into groups and then to account for dependence within each group but to ignore dependence between groups. Typically these groups are pairs of units. The objective function to be maximized by CML then results as a product of bivariate Gaussian cdfs which are amenable to standard numerical integration. Larger group sizes of three or four spatial units increase statistical efficiency but come at the cost of increased computational burden. In fact the pairwise approach has proven to be a good balance between computational and statistical efficiency (see also Bhat, 2014). In total there are n(n − 1)/2 possibilities to create pairs from n observations. Since spatial dependence decreases rapidly with rising distance most correlation can be captured by accounting only for pairs including nearby observations. Two spatial units i and j are classified as nearby observations if the spatial weight wij exceeds a threshold value. For further discussions including computation of standard errors as well as efficiency and robustness of CML compared to ML see Bhat (2014). The pairwise CML implementation divides the spatial units into pairs indexed by g = 1, . . . , G with the members of each pair g denoted as g1 and g2. Furthermore define z = Qu and ν = −Qm, where Q is a diagonal matrix with entries 1 − 2yi for i = 1, . . . , n. The likelihood function is obtained as the joint probability P (z ≤
G
Φ2 (ν(g ) ; 0, Σg ),
(6)
g =1
where the (2 × 1) vector z(g ) = (zg1 , zg2 )′ collects the elements from z associated to pair g and ν(g ) = (νg1 , νg2 )′ denotes the corresponding upper bound. Thus z(g ) is bivariate Gaussian with cdf Φ2 (·) and covariance matrix Σg . 4. Computational issues In principle Σg can be computed by extracting the corresponding elements from Σ = Cov(z ) = QH −1 Q ′ . However, inversion of the (n × n) precision matrix H becomes prohibitively costly for large samples (n = 50,000+) so that the existing brute-force implementations of CML are infeasible. In order to overcome this problem Billé (2013) proposes the use of sparse matrix methods to implement a computationally efficient power series expansion of H −1 . Nevertheless this approach is just an approximation and furthermore is still costly for large n. Alternatively the sparse conjugate gradient method Smirnov (2010) suggested for pseudomaximum likelihood estimation of a spatial random utility choice model1 may be considered. Fortunately, inverting H is actually not necessary since only a subset of the elements in Σ is required. According to Takahashi et al. (1973) single elements of the inverse of a positive-definite matrix can be computed by using the following recursion. Let L denote a lower triangular Cholesky factor such that LL′ = Σ −1 = Q ′ HQ . Then the element in row i and column j of Σ is given by:
Σij =
δij L2ii
−
n 1
Lii k=i+1
Lki Σkj ,
where j ≥ i, i = n → 1
(7)
and δij = 1 if i = j and δij = 0 otherwise.2 For computational efficiency it is critical that L is obtained as a sparse matrix. In this case most of the terms in the sum are zero and can therefore be ignored which greatly accelerates computation time making the recursion feasible even in very large samples. Although the precision matrix Q ′ HQ is sparse this does not translate directly into sparsity of its Cholesky factor L. However, reordering the spatial units i = 1, . . . , n according to a symmetric approximate minimum degree permutation of the rows and columns of Q ′ HQ reduces the ‘fill-in’ occurring during computations resulting in a sparse L (see Rue and Martino, 2007 and Ch. 4 in LeSage and Pace, 2009). 5. Monte-Carlo study In the following, we illustrate our computational efficient CML implementation by conducting a Monte-Carlo study. We generate data sets with n = 50,000 spatial units. The spatial weight matrix W is constructed by simulating a pair of coordinates from a uniform (0, 1) distribution for each spatial unit. We construct a row-standardized spatial weight matrix by means of the MATLAB function fasymneighbors2 from the spatial statistics toolbox by Robert Kelley Pace (http://www.spatial-statistics.com), which – exploiting Delaunay triangulation – assigns exactly 6 neighbors to each spatial unit.
1 In this model the error term of each state variable follows a type 1 extreme value distribution. 2 The study by Takahashi et al. (1973) may be difficult to access. For a detailed discussion of recursion (7) we refer to Rue and Martino (2007).
P. Mozharovskyi, J. Vogler / Economics Letters 148 (2016) 87–90
89
Table 1 Computation times for varying samples sizes in seconds. Sample size n
500
1000
5000
15 000
50 000
100 000
Recursion CML1 Recursion CML2
0.0038 0.0038
0.0094 0.0103
0.0922 0.0966
0.5070 0.5120
2.9746 3.0280
8.5825 8.6024
Inverse LCML1 function
0.0068 0.0094
0.0328 0.0180
2.4963 0.1187
31.1403 0.5967
1.2282 ×104 3.2887
– 9.1151
LCML2 function
0.4957
0.9641
4.7812
15.1356
59.0531
136.3239
Table 2 Monte-Carlo results SAL variant. Parameter
True value
ρ
0.7500
β1
−1.5000
β2
3.0000
Estimates CML1
CML2
0.7498 (0.0013) [0.0013] −1.5008 (0.0269) [0.0270] 3.0055 (0.0476) [0.0479]
0.7498 (0.0013) [0.0013] −1.5012 (0.0279) [0.0279] 3.0061 (0.0491) [0.0495]
True value
Estimates CML1
CML2
0.8500
0.8500 (0.0012) [0.0012] −1.4993 (0.0263) [0.0263] 3.0009 (0.0517) [0.0517]
0.8500 (0.0011) [0.0011] −1.4993 (0.0259) [0.0259] 3.0009 (0.0510) [0.0510]
−1.5000
3.0000
Table 3 Monte-Carlo results SAE variant. Parameter
True value
ρ
0.7500
β1
−1.5000
β2
3.0000
Estimates
True value
CML1
CML2
0.7444 (0.0340) [0.0344] −1.4915 (0.0875) [0.0879] 2.9921 (0.1716) [0.1718]
0.7472 (0.0172) [0.0175] −1.4903 (0.0563) [0.0571] 2.9902 (0.0997) [0.1002]
The regression equation for spatial unit i is given by x′i β = β0 + β1 xi ,
(8)
where the xi ’s are uniformly i.i.d. on the interval (0, 1) and (β0 , β1 ) = (−1.5, 3). With respect to the strength of spatial dependence we consider two different scenarios: ρ = {0.75, 0.85}. For each setting we simulate 50 data sets. We implement CML with group sizes of 1 and 2 denoted as CML1 and CML2 . For CML2 , two observations form a group if the according spatial weight wij is different from zero. Thus, each spatial unit appears in 6 groups so that the groups are overlapping. The LCML function is maximized by using the BFGS optimizer. A full MATLAB code is available from the authors upon request. Table 1 summarizes computation times for the recursion (7), the full inverse Σ = H −1 and the LCML function in (6) at the true parameter values for varying n in the SAL case with ρ = 0.75. Since the CML1 approach does not require covariances it involves fewer elements to be computed during the recursion than its CML2 counterpart. Therefore, we report computation times for both cases. Clearly implementing CML on a computation of the full inverse becomes infeasible for n = 50,000+, where it takes more than three hours to obtain a result while running the recursion just needs 3 seconds. For n = 100,000 computing the inverse was even impossible because of memory constraints. Furthermore, for CML2 the recursion requires only a small fraction of overall computing time needed to evaluate the LCML function (3.03 s vs. 59.05 s for n = 50,000). This reflects the high computational effort necessary to compute the bivariate cdfs in (6). In contrast evaluation of univariate cdfs is computationally much
0.8500
−1.5000
3.0000
Estimates CML1
CML2
0.8461 (0.0147) [0.0152] −1.4816 (0.0856) [0.0876] 2.9763 (0.1633) [0.1650]
0.8482 (0.0080) [0.0082] −1.4887 (0.0622) [0.0632] 2.9889 (0.0953) [0.0960]
cheaper so that computation time drops significantly when moving from CML2 to CML1 . Tables 2 and 3 report mean, standard deviation (in parentheses) and root mean squared error [in brackets] over the 50 different parameter estimates. The results show that CML1 as well as CML2 estimates are essentially unbiased. Surprisingly, CML1 outperforms CML2 for ρ = 0.75. However, as pointed out by Wang et al. (2013, pp. 82–83), there is no theoretical result that establishes superiority of CML2 over CML1 in terms of statistical efficiency. For ρ = 0.85, CML2 has a slight advantage over the CML1 approach in the SAL model while it is clearly superior in the SAE case. 6. Conclusion In the current literature application of CML to probit models where the state variable exhibits a spatial autoregressive process is considered to be computationally infeasible due to the need to invert the (n × n) precision matrix. In this paper we have shown that this inversion can be circumvented by computing the covariance matrix elements CML requires through a recursion that exploits the sparsity of the spatial weights matrix W . Doing so the CML approach can be applied even in large samples (n = 50 000+). As pointed out by Liesenfeld et al. (forthcoming) spatial probit models are subsumed by a very general class of spatial models, that also includes specifications for censored and truncated variables, sample selection and count models. The recursion applied in this paper can be straightforwardly utilized to implement computationally feasible CML estimation of this broad class of models. Actually, we did so for a spatial Poisson specification but do not discuss this here due to space constraints.
90
P. Mozharovskyi, J. Vogler / Economics Letters 148 (2016) 87–90
Acknowledgments The authors thank Roman Liesenfeld and an anonymous referee for their insightful comments and helpful suggestions. They also thank Albrecht Mengel and Julian Schröder for providing access to the grid computing facilities of the Institute of Statistics and Econometrics at the University of Kiel. The authors are grateful to the Deutsche Forschungsgemeinschaft who supported the work of Jan Vogler [grant LI 901/3-1] and to the Lebesgue Centre of Mathematics who supported the work of Pavlo Mozharovskyi [program PIA-ANR-11-LABX-0020-01]. References Bhat, C.R., 2014. The composite marginal likelihood (cml) inference approach with applications to discrete and mixed dependent variable models. Found. Trends Econom. 7, 1–117. Bhat, C.R., Sener, I.N., Eluru, N., 2010. A flexible spatially dependent discrete choice model: Formulation and application to teenagers weekday recreational activity participation. Transp. Res. B 44, 903–921.
Billé, A.G., 2013. Computational issues in the estimation of the spatial probit model: A comparison of various estimators. Rev. Reg. Stud. 43, 131–154. Heagerty, P.J., Lele, S.R., 1998. A composite likelihood approach to binary spatial data. J. Amer. Statist. Assoc. 93, 1099–1111. LeSage, J.P., Pace, R.K., 2009. Introduction to Spatial Econometrics. CRC Press, Taylor and Francis Group. Liesenfeld, R., Richard, J.F., Vogler, J., 2016. Likelihood evaluation of highdimensional spatial latent gaussian models with non-gaussian response variables. In: Baltagi, B.H., LeSage, J.P., Pace, R.K. (Eds.), Spatial and Spatiotemporal Econometrics. In: Adv. Econometrics, Emerald, Bingley, UK, (forthcoming). Rue, H., Martino, S., 2007. Approximate Bayesian inference for hierarchical gaussian markov random field models. J. Statist. Plann. Inference 137, 3177–3192. Smirnov, O.A., 2010. Modeling spatial discrete choice. Reg. Sci. Urban Econ. 40, 292–298. Takahashi, K., Fagan, J., Chen, M.S., 1973. Formation of a sparse-bus impedance matrix and its application to short circuit study. In: Eighth PICA Conference Proceedings. IEEE Power Engineering Society. Papers Presented at the 1973 Power Industry Computer Application Conference in Minneapolis, MN, pp. 63–69. Wang, H., Iglesias, E.M., Wooldridge, J.M., 2013. Partial maximum likelihood estimation of spatial probit models. J. Econometrics 172, 77–89.