Exact computation of Censored Least Absolute Deviations estimator


Journal of Econometrics 212 (2019) 584–606



Yannis Bilias a,∗, Kostas Florios a,b, Spyros Skouras a

a Department of International and European Economic Studies, Athens University of Economics and Business, 76 Patission Str, GR-10434, Athens, Greece
b Laboratory of Industrial and Energy Economics, School of Chemical Engineering, National Technical University of Athens, Zografou Campus, GR-15780, Athens, Greece

Article history: Received 14 January 2014; Received in revised form 6 December 2018; Accepted 29 May 2019; Available online 2 July 2019.

JEL classification: C13; C14; C24; C44; C61.

Abstract

We show that exact computation of the censored least absolute deviations (CLAD) estimator proposed by Powell (1984) may be achieved by formulating the estimator as a linear Mixed Integer Programming (MIP) problem with disjunctive constraints. We apply our approach to three previously studied datasets and find that widely used approximate optimization algorithms can lead to erroneous conclusions. Extensive simulations confirm that MIP-based computation using available solvers is effective for datasets typically encountered in econometric applications and that, despite the proliferation of competitors, CLAD remains a useful estimator.

© 2019 Elsevier B.V. All rights reserved.

Keywords: CLAD estimator; Censored regression models; Mixed Integer Programming; Disjunctive constraints

1. Introduction

Powell (1984) introduced median regression for dependent variables subject to fixed known censoring. In econometric applications, his so-called censored least absolute deviations (CLAD) estimator is usually formalized as follows:

    min_β  ∑_{i=1}^{N} |yi − max(x′i β, 0)|,    (1)

where {(yi, x′i); i = 1, . . . , N} are N measurements on the dependent variable y and the (1 × p) vector of regressors x′, and β is the (p × 1) vector of unknown coefficients. The dependent variable is left-censored at 0 in the sense that its measurements are assumed to be generated as y = max(x′β + u, 0), with u an unobserved stochastic error term. The objective function is easily adapted to the cases of right censoring usually met in the field of biostatistics by switching the signs of all regressors. In the case of uncensored data, global minimization of the sum of least absolute deviations can be achieved by mapping the problem to a linear program (LP). As a result, the least absolute deviations (LAD) estimator can be

∗ Corresponding author. E-mail addresses: [email protected] (Y. Bilias), [email protected] (K. Florios), [email protected] (S. Skouras).
https://doi.org/10.1016/j.jeconom.2019.05.016


computed exactly and efficiently.1 On the other hand, in the case of censored data the objective function is non-convex, so its minimizer is much harder to compute. The IPOL algorithm of Fitzenberger (1997b) is, to the best of our knowledge, the only available exact algorithm for CLAD estimation. However, because IPOL involves complete enumeration of potential (interpolated) solutions, it has exponential complexity in the number of parameters: it requires the solution of C(N, p) linear p × p systems, and it is therefore impractical in many empirical applications of even moderate size.

We will show that the CLAD minimization problem (1) can be reformulated as a linear mixed integer programming (MIP) problem with appropriately constructed disjunctive constraints. As a consequence, an exact solution to the MIP problem is also an exact computation of the CLAD estimator. Moreover, widely available MIP solvers such as SCIP (Achterberg, 2009) can provide exact computation for problems of much larger scale than is practical with IPOL.

The difficulty in computing CLAD estimators is widely appreciated. Applied use of CLAD estimators typically involves approximate optimization algorithms (i.e., algorithms that are intuitively sensible but do not guarantee convergence to the true solutions in finite time) based on linear programming approximations, usually one of BRCENS (Fitzenberger, 1994, 1997a) or ILPA (Buchinsky, 1994). Additionally, approximate optimization algorithms have been proposed by Koenker and Park (1996), Fitzenberger and Winker (2007), and earlier by Womersley (1986). Fitzenberger and Winker (2007) offer an informative account of the current state of research on computing CLAD estimators. If exact optima are not absolutely essential (e.g., in forecasting applications where imperfect parameter estimates can be consistent with satisfactory properties of forecast loss), then approximate optimization algorithms can be very useful.
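The combinatorial burden of complete enumeration mentioned above is easy to quantify; the following back-of-the-envelope count (ours, not from the paper) shows how quickly C(N, p) grows:

```python
# Back-of-the-envelope illustration (not from the paper): full enumeration
# a la IPOL must solve C(N, p) linear p x p systems, a count that grows
# combinatorially in the sample size N and number of parameters p.
from math import comb

print(comb(100, 3))   # 161700 systems: still feasible
print(comb(500, 5))   # 255244687600 systems (~2.6e11): already impractical
```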
However, as emphasized by Andrews (1997), approximate optimization algorithms are in general problematic: the statistical properties of such procedures will typically differ from those of exact estimates, so computing estimators approximately and then treating them as if they were exactly computed will typically lead to invalid inferences. Nevertheless, CLAD estimates computed using approximate optimization algorithms are widely used for inference in applied work. In our Online Appendix, we provide a long list of applied research in which such estimates are reported, in the hope that they correspond to global optima. Such empirical results should be approached with caution, especially in the light of evidence we present in Tables 3–5, where we demonstrate that published CLAD estimates computed with approximate optimization algorithms can lead to misleading conclusions about the economics of the empirical analysis.

One response to the difficulty of computing CLAD estimators exactly has been to sidestep the problem by using other estimators that are easier to compute but have asymptotically similar theoretical properties. Notable such estimators include those developed by Buchinsky and Hahn (1998), Chernozhukov and Hong (2002, 2003), Khan and Powell (2001), Portnoy (2003), Peng and Huang (2008), Tang et al. (2012), and Chen (2018).2

The key difficulty in computing CLAD estimators is that the objective function contains many non-convexities, kinks and flat regions, since it is the sum of N functions, each of which contains a non-convexity, a kink and a flat region. We tackle the kink due to the absolute value operator by adopting the standard linear programming approach used in LAD estimation. To tackle the two other irregularities, which are caused by the max operator, we replace it with a set of disjunctive constraints, a standard approach in MIP modeling.
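To fix ideas, here is a minimal sketch (ours, not the authors' code) of objective (1); in the intercept-only example below, every β ≤ 0 yields the same value, illustrating the flat region contributed by max(·, 0):

```python
# Minimal sketch (illustrative, not the authors' code) of Powell's CLAD
# objective (1). Each summand |y_i - max(x_i'beta, 0)| contributes a kink,
# a flat region and a non-convexity, which is what makes global
# minimization hard.
import numpy as np

def clad_objective(beta, y, X):
    """Sum of absolute deviations from the censored regression function."""
    return np.abs(y - np.maximum(X @ beta, 0.0)).sum()

# Intercept-only example: the objective is flat wherever x_i'beta <= 0 for all i.
y = np.array([0.0, 1.0, 1.0, 2.0])
X = np.ones((4, 1))
print(clad_objective(np.array([-5.0]), y, X))   # 4.0 (any beta <= 0 gives 4.0)
print(clad_objective(np.array([1.0]), y, X))    # 2.0 (the minimum here)
```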
In terms of antecedents to this work, it is worth noting that a MIP model has been used by Florios and Skouras (2008) to compute the max score estimator of Manski (1975), but the MIP model employed there was constructed to deal with the step function in the score objective as opposed to the max operator in the CLAD objective (this model has been refined and applied by Chen and Lee (2018) and Kitagawa and Tetenov (2018) among others).

Our approach also has similarities with the threshold accepting algorithm in Fitzenberger and Winker (2007), notably because both approaches view CLAD as a combinatorial optimization problem, while attempting to avoid the curse of dimensionality associated with complete enumeration by structuring the search path appropriately. However, there are also important differences: most importantly, the threshold accepting algorithm is approximate, so it is subject to the previously discussed pros and cons of approximate optimization algorithms.

The rest of our paper is structured as follows. In Section 2, we present our MIP representation of the CLAD estimator (1). In Section 3, our approach is applied to three previously used datasets. First, we apply it in the context of the classic 'Psychology Today' and 'Redbook' datasets (Fair, 1978) on the determinants of extramarital affairs and obtain plausible estimates, whereas BRCENS and ILPA clearly fail. This means that standard computation of the CLAD estimator would lead to erroneous conclusions. We next re-compute the estimates reported in Fafchamps et al. (2000) in a study of African manufacturing firms and find that the published estimates (computed using the ILPA approximate optimization algorithm) are not the global optima and that therefore the reported parameters are not accurate CLAD estimates, while our MIP estimates significantly modify the economic conclusions of the empirical analysis.
In Section 4.1, the computational

1 A modification of the simplex algorithm, proposed by Barrodale and Roberts (1973), is now routinely available in most statistical packages for median estimation.
2 Estimators with computational advantages often have their own drawbacks. Some of these estimators are not even defined for all samples (a situation which arises in many samples in our simulations) and often involve additional assumptions beyond those typically invoked when using CLAD estimators. As an example of additional assumptions required, the estimator of Portnoy (2003) was developed for randomly censored data as a regression generalization of the Kaplan–Meier estimator under the restrictive assumption that all regression quantiles are linear (in parameters) simultaneously; see Lin et al. (2012) for an extension of this estimator to doubly censored data. This global linearity requirement should be contrasted with a restriction only on the quantile of interest in Powell (1986).


Table 1
CLAD computation and other estimators in Fair (1978) Psychology Today (PT) data. Powell's CLAD objective is minimized using different algorithms to estimate the effect of the determinants on the median of the left-censored dependent variable. Five alternatives to CLAD estimators are also used to estimate the same effect.
Dependent variable: no. of extramarital sexual intercourses during the past year. 601 obs, 151 non-zero obs (75% censoring).
Each row reports estimates in the order: Marital happiness; Age; No. of years married; Degree of religiosity; Constant; followed by |error|.c

CLAD computation algorithms:a
MIP:          −6.00 (1.64); −3.00 (0.74); 5.70 (1.28); −3.00 (1.36); 37.50 (17.63); |error| 827.00
MIP Ltd:b     −6.00 (1.64); −3.00 (0.74); 5.70 (1.28); −3.00 (1.36); 37.50 (17.63); |error| 827.00
MIP Ltd BRC:  −6.00 (1.64); −3.00 (0.74); 5.70 (1.28); −3.00 (1.36); 37.50 (17.63); |error| 827.00
ILP:d         0.00 (0.00); 0.00 (0.00); 0.00 (0.00); 0.00 (0.00); 0.00 (0.00); |error| 875.00
BRC:d         0.00 (0.00); 0.00 (0.00); 0.00 (0.00); 0.00 (0.00); 0.00 (0.00); |error| 875.00
mBRC:         −6.00 (3.99); −0.45 (0.44); 1.66 (1.16); −3.25 (3.08); 16.83 (15.51); |error| 850.55
TA:           −6.00 (1.66); −3.00 (0.74); 5.70 (1.27); −3.00 (1.36); 37.50 (17.67); |error| 827.00

Alternative estimators:a
CH LT:        −6.69 (0.79); −3.62 (0.43); 9.31 (2.38); −3.19 (0.57); 4.55 (34.55); |error| 828.39
CH 3ST:       −4.13 [−7.6, 0]; −0.40 [−.9, .1]; 1.38 [0, 2.28]; −3.00 [−7.2, 0]; 15.43 [0, 35.7]; |error| 856.25
POR:          −2.40 (3.70); −0.12 (0.13); 0.19 (0.28); −1.20 (1.72); 11.15 (11.41); |error| 864.07
PH:           −7.69 (7.05); −0.20 (1.00); −0.13 (1.64); −1.00 (7.57); 23.96 (33.04); |error| 881.80
TOB:          −2.27 (0.41); −0.16 (0.08); 0.54 (0.13); −1.72 (0.40); 9.08 (2.66); |error| 860.97

a The acronyms of algorithms and estimators, and implementation details, are explained in Appendix C.
b It runs with a one hour budget.
c |error| is the value of Powell's CLAD objective function when evaluated at the corresponding estimates.
d ILP and BRC algorithms return the starting value of 0.

Table 2
CLAD computation and other estimators in Fair (1978) Redbook (RB) data. Powell's CLAD objective is minimized using different algorithms to estimate the effect of the determinants on the median of the left-censored dependent variable. Five alternatives to CLAD estimators are also used to estimate the same effect.
Dependent variable: measure of time spent by women in extramarital affairs during the past year. 6366 obs, 2053 non-zero obs (68% censoring).
Each row reports estimates in the order: Occupation; Education; Husband occupation; Marital happiness; Age; Years; Children; Religiosity; Constant; followed by |error|.c

CLAD computation algorithms:a
MIP:b         na
MIP Ltd:d     0.05 (0.01); −0.02 (0.00); 0.04 (0.01); −0.16 (0.03); 0.00 (0.00); 0.02 (0.00); 0.03 (0.01); −0.06 (0.01); 0.35 (0.06); |error| 4459.07
MIP Ltd BRC:  0.12 (0.02); −0.03 (0.01); 0.03 (0.02); −0.32 (0.04); −0.03 (0.01); 0.04 (0.01); −0.02 (0.02); −0.17 (0.04); 1.98 (0.34); |error| 4410.90
ILP:e         0.00 (0.05); 0.00 (0.01); 0.00 (0.00); 0.00 (0.05); 0.00 (0.02); 0.00 (0.02); 0.00 (0.02); 0.00 (0.07); 0.00 (0.82); |error| 4490.41
BRC:e         0.00 (0.05); 0.00 (0.01); 0.00 (0.00); 0.00 (0.05); 0.00 (0.02); 0.00 (0.02); 0.00 (0.02); 0.00 (0.07); 0.00 (0.82); |error| 4490.41
mBRC:         0.12 (0.03); −0.03 (0.01); 0.03 (0.02); −0.32 (0.04); −0.03 (0.01); 0.04 (0.01); −0.02 (0.02); −0.18 (0.04); 1.97 (0.32); |error| 4410.90
TA:           0.12 (0.01); −0.03 (0.00); 0.03 (0.00); −0.32 (0.03); −0.03 (0.00); 0.04 (0.00); −0.02 (0.00); −0.17 (0.02); 1.96 (0.18); |error| 4410.92

Alternative estimators:a
CH LT:        0.13 (0.04); −0.04 (0.02); 0.04 (0.03); −0.39 (0.06); −0.04 (0.01); 0.05 (0.02); −0.04 (0.05); −0.24 (0.07); 2.41 (0.47); |error| 4413.27
CH 3ST:       0.02 [0, .09]; 0.00 [−.02, 0]; 0.00 [−.01, .02]; −0.06 [−.24, 0]; 0.00 [−.03, 0]; 0.01 [0, .03]; 0.00 [−.02, .01]; −0.03 [−.15, 0]; 0.43 [0, 1.65]; |error| 4477.83
POR:          0.09 (0.03); −0.05 (0.01); 0.02 (0.04); −0.46 (0.03); −0.05 (0.02); 0.09 (0.02); 0.03 (0.02); −0.25 (0.03); 2.76 (0.24); |error| 4449.46
PH:           0.11 (0.05); −0.06 (0.03); 0.02 (0.03); −0.48 (0.05); −0.05 (0.03); 0.09 (0.02); 0.02 (0.04); −0.26 (0.05); 2.80 (0.64); |error| 4451.44
TOB:          0.31 (0.08); −0.09 (0.04); 0.01 (0.06); −1.53 (0.00); −0.11 (0.02); 0.13 (0.03); −0.03 (0.08); −0.94 (0.08); 7.84 (0.71); |error| 4528.56

a The acronyms of algorithms and estimators, and implementation details, are explained in Appendix C.
b Estimates are not available ('na'); the MIP algorithm does not deliver an exact solution within a reasonable amount of time.
c |error| is the value of Powell's CLAD objective function when evaluated at the corresponding estimates.
d It runs with a one hour budget.
e ILP and BRC algorithms return the starting value of 0.

performance of MIP against IPOL is evaluated in simulations for re-sampled data from Fair’s Psychology Today dataset. Moreover, censored DGPs previously used in simulation studies of Fitzenberger and Winker (2007), Khan and Powell


Table 3
CLAD computation and other estimators in Fafchamps et al. (2000) data — delayed deliveries.
Dependent variable: measure of delayed deliveries. 165 obs, 92 non-zero obs (44% censoring).
Each row reports estimates in the order: Predicted sales; Food sector dummy; Wood & furniture sector dummy; Textile sector dummy; Metal sector dummy; Leather sector dummy; Startup year; Share of inputs imported; Share main input from main supplier; Located in Bulawayo; Located in small town; Constant; followed by |error|.

CLAD computation algorithms:
MIP Ltd:      0.00 (0.00); 0.38 (0.59); 0.72 (0.62); −0.90 (0.36); −1.40 (0.38); −1.63 (0.93); −0.04 (0.01); 0.01 (0.01); −0.01 (0.01); 0.07 (0.20); −2.78 (0.12); 70.44 (15.98); |error| 176.98
MIP Ltd BRC:  0.00 (0.00); 0.37 (0.65); 0.70 (0.78); −0.91 (0.63); −1.41 (0.71); −1.65 (0.82); −0.04 (0.01); 0.01 (0.01); −0.01 (0.01); 0.09 (0.55); −2.79 (0.00); 70.95 (24.68); |error| 176.98
ILP pub:a     0.00 (0.00); 0.43 (0.62); 0.43 (0.79); −0.64 (0.52); −0.83 (0.54); −1.01 (0.87); −0.03 (0.01); 0.01 (0.01); −0.01 (0.01); −0.17 (0.39); −1.18 (0.75); 58.95 (23.76); |error| 182.94
BRC:          0.00 (0.00); 0.76 (0.65); −0.16 (0.73); −0.87 (0.66); −1.39 (0.59); −1.48 (0.75); −0.03 (0.01); 0.01 (0.01); −0.01 (0.01); 0.02 (0.45); −1.65 (0.85); 68.97 (23.12); |error| 177.82
mBRC:         0.00 (0.00); 0.76 (0.68); −0.16 (0.74); −0.87 (0.63); −1.39 (0.60); −1.48 (0.73); −0.03 (0.01); 0.01 (0.01); −0.01 (0.01); 0.02 (0.47); −1.65 (0.84); 68.97 (23.70); |error| 177.82
TA:           0.00 (0.00); 0.71 (0.60); −0.33 (0.60); −0.92 (0.34); −1.40 (0.36); −1.66 (0.85); −0.04 (0.01); 0.01 (0.00); 0.00 (0.00); 0.10 (0.20); −3.14 (0.30); 71.34 (14.41); |error| 176.98

Alternative estimators:
CH LT:        0.00 (0.00); 0.45 (0.51); −0.02 (0.66); −0.74 (0.38); −2.38 (1.24); −2.58 (1.42); −0.04 (0.00); 0.02 (0.01); −0.01 (0.01); −0.31 (0.39); −4.30 (1.92); 78.42 (3.11); |error| 180.98
CH 3ST:       0.00 [0, 0]; 0.39 [−1.06, 1.53]; 0.48 [−1.43, 1.47]; −0.84 [−2.4, .75]; −1.36 [−2.55, .50]; −0.92 [−2.98, .53]; −0.02 [−.05, 0]; 0.01 [−.01, .03]; −0.01 [−.03, .01]; 0.00 [−1.15, .86]; −1.19 [−3.91, .54]; 48.22 [2.12, 101.13]; |error| 178.87
POR:          0.00 (0.00); 0.38 (0.80); 0.05 (1.82); −0.86 (0.72); −1.25 (1.27); −1.72 (1.36); −0.04 (0.02); 0.02 (0.02); −0.01 (0.01); −0.55 (0.91); −1.27 (0.54); 72.92 (49.56); |error| 180.45
PH:           0.00 (0.00); 0.21 (0.72); 0.07 (0.87); −1.45 (0.70); −1.15 (1.15); −2.40 (1.31); −0.05 (0.02); 0.01 (0.02); −0.02 (0.01); −0.52 (0.87); −1.51 (0.48); 94.62 (43.84); |error| 182.37
TOB:          0.00 (0.00); 0.43 (0.66); 0.34 (0.68); −0.77 (0.72); −0.86 (0.62); −1.72 (0.98); −0.04 (0.01); 0.01 (0.01); −0.01 (0.01); −0.50 (0.49); −1.47 (0.66); 79.06 (26.33); |error| 184.43

a These are the published estimates in Fafchamps et al. (2000). Other implementation details are as in Tables 1 and 2.

(2001) and Chernozhukov and Hong (2003) are utilized in Section 4.2 to compare the computational performance of MIP against competing algorithms and to study the statistical properties (bias and RMSE) of the exact CLAD estimator against four competing censored regression median estimators. In Section 5 we briefly conclude. In Appendix A we provide a toy CLAD estimation example and sketch a solution algorithm that can be used to solve our MIP formulation of the problem. In Appendix B we show how our model can be extended to LAD estimators for doubly censored data and to censored quantile regression (CQR) estimators (Powell, 1986). The simulated DGPs are described in detail in Appendix C. A list of empirical applications using CLAD estimates computed with approximate optimization algorithms, together with detailed results from our simulations on the statistical performance of estimators, is provided in our Online Appendix. Finally, we provide companion open source code for computing CLAD estimators.3

2. MIP approach to median estimation for censored data

2.1. Formulation of CLAD program as linear MIP with disjunctions

Linear mixed integer programming (MIP) refers to a linear optimization problem in which some choice variables are integer. As is well known, the uncensored LAD problem can be solved as a regular linear programming problem. Censoring introduces the complication of the max() operator in the objective function (1) to be minimized, which we propose to model using integer, in particular binary, choice variables. It is the max operator that makes CLAD particularly difficult to optimize, pushing the optimization into the territory of mixed integer rather than simple linear programming. It is well known (at least since Barrodale and Roberts, 1973) that the '|·|' function in the objective function (1) can be modeled by introducing the split variables sm (slack minus) and sp (slack plus). For example, the uncensored LAD problem:

    min_β  ∑_{i=1}^{N} |yi − x′i β|

3 This is available at https://sites.google.com/site/kflorios/clad.


Table 4
CLAD computation and other estimators in Fafchamps et al. (2000) data — overdraft reserves.
Dependent variable: measure of overdraft reserves. 312 obs, 158 non-zero obs (49% censoring).
Each row reports estimates in the order: Log incidence delayed deliveries; Log incidence of delayed payments; Predicted sales; Food sector dummy; Wood and furniture sector dummy; Textile sector dummy; Metal sector dummy; Leather sector dummy; Dummy for 1995; Constant; followed by |error|.

CLAD computation algorithms:
MIP Ltd:      0.37 (0.10); 0.39 (0.09); 0.00 (0.00); 0.08 (1.37); −0.19 (1.86); −0.58 (3.14); 0.11 (2.94); 0.50 (1.34); −0.04 (0.29); −1.96 (1.21); |error| 8.34
MIP Ltd BRC:  0.79 (0.44); 0.92 (0.54); 0.00 (0.00); −0.19 (0.69); −0.39 (0.50); −1.43 (1.10); 0.25 (0.59); 1.12 (0.85); −0.05 (0.06); −4.29 (2.32); |error| 8.27
ILP pub:a     0.01 (0.00); 0.00 (0.00); 0.00 (0.00); 0.00 (0.01); 0.00 (0.01); 0.00 (0.02); 0.00 (0.01); 0.01 (0.01); −0.01 (0.01); −0.01 (0.01); |error| 8.77
BRC:          0.01 (0.00); 0.00 (0.00); 0.00 (0.00); −0.01 (0.01); 0.00 (0.01); 0.00 (0.02); 0.00 (0.02); 0.01 (0.01); −0.01 (0.01); −0.01 (0.01); |error| 8.73
mBRC:         0.01 (0.00); 0.00 (0.00); 0.00 (0.00); 0.08 (0.07); 0.08 (0.07); 0.09 (0.07); 0.07 (0.07); 0.09 (0.07); 0.00 (0.01); −0.11 (0.08); |error| 8.71
TA:           0.01 (0.00); 0.00 (0.00); 0.00 (0.00); 0.08 (0.15); 0.08 (0.15); 0.09 (0.15); 0.07 (0.16); 0.09 (0.16); 0.00 (0.00); −0.11 (0.15); |error| 8.71

Alternative estimators:
CH LT:        4.12 (3.49); −0.77 (4.39); −5.92 (0.91); 3.53 (3.19); −1.68 (3.09); −11.19 (3.94); −5.65 (6.10); −8.93 (3.75); 13.65 (2.70); 10.62 (5.35); |error| 9.39
CH 3ST:       0.01 [.00, .02]; 0.00 [.00, .01]; 0.00 [.00, .00]; 0.02 [−.02, .04]; 0.02 [−.02, .04]; 0.01 [−.04, .04]; 0.02 [−.02, .03]; 0.03 [−.01, .05]; 0.00 [−.02, .01]; −0.02 [−.06, .01]; |error| 8.85
POR:          0.01 (0.00); 0.00 (0.00); 0.00 (0.00); 0.00 (0.02); 0.01 (0.02); 0.01 (0.02); 0.01 (0.02); 0.02 (0.02); −0.01 (0.01); −0.03 (0.02); |error| 8.78
PH:           0.01 (0.00); 0.00 (0.00); 0.00 (0.00); 0.00 (0.03); 0.01 (0.03); 0.01 (0.02); 0.01 (0.03); 0.02 (0.02); −0.01 (0.01); −0.03 (0.03); |error| 8.79
TOB:          0.02 (0.00); 0.01 (0.00); 0.00 (0.00); 0.05 (0.02); 0.00 (0.02); 0.04 (0.02); 0.03 (0.02); 0.03 (0.02); 0.00 (0.01); −0.09 (0.02); |error| 9.36

a These are the published estimates in Fafchamps et al. (2000). Other implementation details are as in Tables 1 and 2.

Table 5
CLAD computation and other estimators in Fafchamps et al. (2000) data — overdraft credit.
Dependent variable: measure of overdraft credit. 474 obs, 225 non-zero obs (53% censoring).
Each row reports estimates in the order: Log incidence delayed deliveries; Log incidence of delayed payments; Predicted sales; Food sector dummy; Wood and furniture sector dummy; Textile sector dummy; Metal sector dummy; Leather sector dummy; Dummy for 1994; Dummy for 1995; Constant; followed by |error|.

CLAD computation algorithms:
MIP Ltd:      0.04 (0.01); 0.03 (0.01); 0.00 (0.00); −0.11 (0.03); −0.07 (0.02); −0.17 (0.72); −0.04 (0.01); −0.09 (1.05); −0.02 (0.01); −0.35 (1.02); −0.12 (0.04); |error| 21.21
MIP Ltd BRC:  0.04 (0.01); 0.03 (0.01); 0.00 (0.00); −0.11 (0.05); −0.07 (0.03); −0.17 (0.08); −0.04 (0.03); −0.09 (0.14); −0.02 (0.02); −0.35 (0.16); −0.12 (0.05); |error| 21.21
ILP pub:a     0.01 (0.01); 0.01 (0.01); 0.00 (0.00); −0.03 (0.02); −0.02 (0.02); −0.02 (0.03); 0.00 (0.02); 0.00 (0.05); 0.00 (0.01); −0.01 (0.01); −0.03 (0.03); |error| 21.81
BRC:          0.02 (0.01); 0.02 (0.01); 0.00 (0.00); −0.07 (0.04); −0.05 (0.03); −0.11 (0.05); 0.00 (0.03); −0.06 (0.09); −0.01 (0.02); −0.01 (0.02); −0.08 (0.05); |error| 21.54
mBRC:         0.04 (0.01); 0.03 (0.01); 0.00 (0.00); −0.10 (0.06); −0.08 (0.03); −0.16 (0.10); −0.04 (0.03); −0.07 (0.14); −0.01 (0.02); −0.17 (0.10); −0.13 (0.06); |error| 21.26
TA:           0.04 (0.01); 0.03 (0.01); 0.00 (0.00); −0.11 (0.04); −0.07 (0.02); −0.16 (0.17); −0.04 (0.02); −0.09 (0.33); −1.71 (0.14); −0.35 (0.12); −0.12 (0.03); |error| 21.21

Alternative estimators:
CH LT:        4.31 (2.24); 2.84 (2.00); −1.92 (0.82); 8.07 (3.82); −9.52 (3.79); −1.27 (2.83); −3.03 (5.73); −13.15 (5.14); −2.88 (3.73); 10.64 (2.70); −4.28 (2.54); |error| 23.66
CH 3ST:       0.01 [.00, .02]; 0.01 [.00, .02]; 0.00 [.00, .00]; −0.02 [−.08, .01]; −0.02 [−.06, .00]; −0.05 [−.12, .01]; 0.00 [−.04, .02]; −0.03 [−.07, .10]; 0.00 [−.03, .01]; −0.01 [−.05, .01]; 0.00 [−.05, .01]; |error| 22.24
POR:          0.01 (0.00); 0.01 (0.00); 0.00 (0.00); −0.01 (0.01); −0.02 (0.01); −0.01 (0.02); −0.02 (0.01); 0.00 (0.04); 0.00 (0.01); 0.00 (0.01); −0.02 (0.01); |error| 22.11
PH:           0.01 (0.00); 0.01 (0.00); 0.00 (0.00); −0.01 (0.02); −0.02 (0.01); −0.01 (0.06); −0.02 (0.01); 0.00 (0.03); 0.00 (0.01); 0.00 (0.01); −0.02 (0.01); |error| 22.11
TOB:          0.03 (0.01); 0.02 (0.00); 0.00 (0.00); −0.05 (0.02); −0.05 (0.03); −0.07 (0.03); −0.07 (0.03); 0.01 (0.03); 0.01 (0.02); −0.02 (0.02); −0.07 (0.02); |error| 22.72

a These are the published estimates in Fafchamps et al. (2000). Other implementation details are as in Tables 1 and 2.


is equivalent to the following linear programming problem (2)–(4):

    min_{β, (smi, spi; i=1,...,N)}  ∑_{i=1}^{N} (smi + spi)    (2)

    s.t.  yi − x′i β + smi − spi = 0,    i = 1, . . . , N    (3)

          smi ≥ 0,  spi ≥ 0,    i = 1, . . . , N    (4)
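The split-variable program (2)–(4) can be handed directly to any LP solver. As a minimal illustration (ours, not the paper's implementation; the function name lad_lp is our own), using SciPy's HiGHS backend:

```python
# Hedged sketch of the split-variable LP (2)-(4) for uncensored LAD, using
# SciPy's linprog (HiGHS backend). Decision variables are stacked as
# [beta, sm, sp]; the helper name lad_lp is ours, not from the paper.
import numpy as np
from scipy.optimize import linprog

def lad_lp(y, X):
    N, p = X.shape
    c = np.r_[np.zeros(p), np.ones(2 * N)]          # objective (2): sum(sm + sp)
    # (3): y_i - x_i'beta + sm_i - sp_i = 0  <=>  X beta - sm + sp = y
    A_eq = np.hstack([X, -np.eye(N), np.eye(N)])
    # (4): sm, sp >= 0; beta unrestricted in sign
    bounds = [(None, None)] * p + [(0, None)] * (2 * N)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p], res.fun

# Intercept-only check: the LAD fit of a constant is the sample median.
y = np.array([1.0, 2.0, 7.0])
X = np.ones((3, 1))
beta, sae = lad_lp(y, X)
print(beta, sae)   # beta close to [2.], sum of absolute errors 6.0
```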

where the p elements of vector β are unrestricted in sign, and smi and spi are additional choice variables, interpreted as under- and over-prediction of yi, respectively. Next, following Nemhauser and Wolsey (1999, p. 12), we note that the maximum of two variables, φ = max(c1, c2), can be equivalently defined by the two standard linear inequalities {φ ≥ c1 AND φ ≥ c2} subject to the disjunctive constraint {φ ≤ c1 OR φ ≤ c2}. The disjunctive constraint introduces a complication, but it can be represented exactly by the following linear inequalities:

    φ ≤ c1 + ω · (1 − γ)
    φ ≤ c2 + ω · γ
    γ ∈ {0, 1}

Here, γ is introduced as a new binary choice variable that must be chosen so that both inequalities are simultaneously satisfied. Meanwhile, ω is an arbitrary parameter that must be large enough to ensure that a solution to the system of inequalities exists. In other words, the inequalities represent the max operator exactly only if ω is large enough, specifically if ω > |c1 − c2|. Combining the two modeling devices above, the CLAD minimization problem (1) can be reformulated as the following MIP problem (5)–(12):

    min_{β, (γi, φi, smi, spi; i=1,...,N)}  ∑_{i=1}^{N} (smi + spi)    (5)

    s.t.  yi − φi + smi − spi = 0,    i = 1, . . . , N    (6)

          φi ≥ x′i β,    i = 1, . . . , N    (7)

          φi ≥ 0,    i = 1, . . . , N    (8)

          φi ≤ x′i β + ωi · (1 − γi),    i = 1, . . . , N    (9)

          φi ≤ 0 + ωi · γi,    i = 1, . . . , N    (10)

          smi ≥ 0,  spi ≥ 0,    i = 1, . . . , N    (11)

          γi ∈ {0, 1},    i = 1, . . . , N    (12)



The auxiliary choice variables smi and spi are the under-achievement and over-achievement positive slack variables for the CLAD model, analogously to the uncensored LAD model. The positive variable φi is equal to max(x′i β, 0). The optimization is with respect to the binary choice variables γi and the continuous choice variables smi, spi, φi and β.4 It is because we mix continuous and integer decision variables in a linear objective with linear constraints that this optimization is properly referred to as a linear Mixed Integer Program.

A key reason for the efficiency of MIP solution algorithms is that they relax constraint (12), so that the γi's are continuous in [0, 1] and the problem becomes a standard linear program. Solving these easy problems, as in the uncensored LAD case, quickly yields lower bounds on the objective attainable at feasible γi's. These lower bounds typically prove that certain regions of the parameter space need not be searched, since they contain parameter values that are suboptimal relative to parameters already evaluated, thereby very significantly reducing the searchable parameter space. For example, as we shall see in the context of our empirical application, the MIP approach involves solving only about 0.03% of the number of linear p × p systems required by a full enumeration IPOL approach, i.e., several orders of magnitude fewer linear programs. Cutting-edge MIP solvers (we use CPLEX) incorporate various tricks, some of which may be proprietary, that have been found to be effective for structuring the search over parameters, including starting values and search direction (though without formal proof of any specific optimality properties). To provide more specifics, in Appendix A we discuss how the MIP formulation of CLAD would be solved with the popular 'Branch-and-Bound' solution algorithm (Floudas, 1995, pp. 101–103; Papadimitriou and Steiglitz, 1998, p. 438) and illustrate the solution algorithm in the context of a specific small data sample.

4 In our applications, we used large values of ωi which ensure accurate representation of the max operator by our linear inequality disjunctive constraints, at the cost of greater computational burden. More specifically, to ensure full accuracy we suggest setting ωi = ∑_{j=1}^{p} dj · |xij|, where dj is larger than the absolute value of a plausible bound on the range of βj, j = 1, . . . , p. In our empirical analysis of Table 1 we set dj = 15 in a normalized sample for which all x's are standardized to have zero mean and unit variance.
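For illustration, the MIP (5)–(12) can be written almost line-by-line with an off-the-shelf solver interface. The following hedged sketch uses SciPy's open-source HiGHS backend rather than CPLEX, with a single scalar ω in place of the observation-specific ωi for simplicity; the function name clad_mip is ours:

```python
# Hedged sketch (ours, not the authors' CPLEX implementation) of the MIP
# (5)-(12) using SciPy's milp (HiGHS). Decision variables are stacked as
# [beta (p), sm (N), sp (N), phi (N), gamma (N)]; a scalar omega replaces
# the observation-specific omega_i for simplicity.
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def clad_mip(y, X, omega):
    N, p = X.shape
    Z = np.zeros((N, p))
    I = np.eye(N)
    O = np.zeros((N, N))

    # Objective (5): minimize sum(sm + sp)
    c = np.concatenate([np.zeros(p), np.ones(2 * N), np.zeros(2 * N)])
    cons = [
        # (6): y_i - phi_i + sm_i - sp_i = 0
        LinearConstraint(np.hstack([Z, I, -I, -I, O]), -y, -y),
        # (7): phi_i >= x_i'beta
        LinearConstraint(np.hstack([-X, O, O, I, O]), 0, np.inf),
        # (9): phi_i <= x_i'beta + omega*(1 - gamma_i)
        LinearConstraint(np.hstack([-X, O, O, I, omega * I]), -np.inf, omega),
        # (10): phi_i <= omega*gamma_i
        LinearConstraint(np.hstack([Z, O, O, I, -omega * I]), -np.inf, 0),
    ]
    # (8), (11): sm, sp, phi >= 0; beta free in sign; (12): gamma binary
    lb = np.concatenate([np.full(p, -np.inf), np.zeros(4 * N)])
    ub = np.concatenate([np.full(p + 3 * N, np.inf), np.ones(N)])
    integrality = np.concatenate([np.zeros(p + 3 * N), np.ones(N)])

    res = milp(c, constraints=cons, bounds=Bounds(lb, ub),
               integrality=integrality)
    return res.x[:p], res.fun

# Small synthetic demonstration: left-censored data from y = max(x'b + u, 0).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(8), rng.normal(size=8)])
y = np.maximum(X @ np.array([0.5, 1.0]) + rng.normal(size=8), 0.0)
beta_hat, sae = clad_mip(y, X, omega=100.0)
```

At the solver's optimum, the returned objective value coincides with a direct evaluation of (1) at the returned β, confirming that the disjunctive constraints reproduce the max operator; for production use, the paper's observation-specific ωi = ∑j dj·|xij| would replace the scalar ω.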


We note that exact MIP solution algorithms typically update a potential global optimum until it is proved that no further improvement is possible, thereby establishing that the potential global optimum is indeed the actual global optimum. It is usually much faster to find the solution that is in fact the global optimum than to prove that it is one. This distinction is worth emphasizing as it may be unfamiliar: MIP solvers implement a search for a global optimum which at each step keeps track of the 'currently best' solution among all those searched (a potential global optimum). At any non-final step, this potential optimum may or may not be the actual optimum. While stepping through a solution algorithm, a potential optimum is eventually found that is the actual optimum, but many additional computations are typically required to eliminate the possibility that a better solution exists and therefore to prove that the potential optimum is the actual optimum.

For any MIP optimization, it is therefore sensible to split the time it took to obtain an actual optimum into the time it took the algorithm to identify the actual optimum as a potential optimum and the additional time it took to prove that this potential optimum was indeed the actual optimum. In practice, as we show in Section 4.1 and its Table 7, we have found that for difficult computation problems the second component is typically large relative to the first. This motivates us to introduce an approximate computation algorithm for CLAD which uses a MIP solver constrained by a limited time budget, augmented with a BRCENS optimization whose starting values are given by the potential solution provided by the MIP solver when its time budget is exhausted.5 Because MIP approaches the global optimum quickly (even if it takes much longer to prove optimality), the potential optimum after a limited amount of computation time may be at or very close to the actual optimum.
Augmenting it with a BRCENS optimization is sensible because MIP solution algorithms are not designed to deliver intermediate solutions that are local optima while they search for the global optimum. Therefore, any intermediate solution may be easily improvable with BRCENS, which will deliver the local optimum in the region of the intermediate solution, and will do so very quickly. At the very least, this BRCENS augmentation guarantees actual optimality in a neighborhood of the solution obtained.

Our model extends very easily to cases where the censoring point is non-zero and possibly varies across observations (as is the case with some of the simulated data used in Section 4.2); this simply involves replacing the zero in restrictions (8) and (10) above with the observation-specific censoring point. In Appendix B, we show how to apply our MIP approach to two other important cases: median estimation with doubly censored data and censored quantile regression estimators.

2.2. Comparison with approximate computation algorithms and alternative estimators

In the sections that follow, we will evaluate the usefulness of our MIP formulation of the CLAD objective, primarily as a way of obtaining an exact computation algorithm for CLAD and, secondarily, as an approximate computation algorithm when a binding time budget applies. We will therefore report parameter estimates and various performance measurements, in the context of simulations, for CLAD estimates obtained with our indefinite time budget exact MIP approach and with a limited time budget MIP approach augmented with a BRCENS post-stop optimization, as motivated in the previous subsection. Where this is informative and computationally feasible, we also report results from IPOL computations. For comparison purposes, we also report results obtained with various approximate computation algorithms.
Specifically, we use standard widely available software implementations of ILPA and BRCENS (with no modification of default parameters), a threshold accepting (TA) algorithm in the spirit of Fitzenberger and Winker (2007), and a ‘warm start’ multiple BRCENS implementation, whereby we re-run BRCENS from multiple plausible starting points and choose the solution with the least absolute deviations across all runs. The starting values we use are motivated by a referee’s observation that Tobit or censored or uncensored quantile regressions in the less censored part of the data can provide useful guidance for the region where a CLAD optimum might be found. This algorithm is implemented using the crq function of the quantreg package for R (Koenker, 2012), with the following 39 starting values: (i) the Tobit parameters, (ii) censored quantile regression estimates for each quantile in {0.05, 0.10, . . . , 0.90, 0.95} calculated using BRCENS and (iii) uncensored quantile regression estimates for each quantile in {0.05, 0.10, . . . , 0.90, 0.95}, with the intercept of each of these properly adjusted so that the residuals of the adjusted model have zero median.6 In addition, we report results obtained from alternative estimators such as Tobit and some recent estimators for the censored regression median, including the Laplace type (LT) estimator of Chernozhukov and Hong (2003) with a CLAD criterion function, the three-step estimator of Chernozhukov and Hong (2002), and the estimators of Portnoy (2003) and Peng and Huang (2008), which are regression generalizations of the Kaplan–Meier and Aalen cumulative distribution function estimators, respectively.
Comparisons with alternative estimators are interesting both because it is tempting to interpret the alternatives as approximate computation algorithms for CLAD (since they are expected to converge asymptotically to the same parameters as CLAD) and because it is useful in itself to understand the behavior of CLAD estimates relative to other estimation methods and whether this is related to the performance of CLAD optimization algorithms. We collect a full list of the twelve computation algorithms and estimators used, together with their implementation details, in Appendix C.

5 We thank an anonymous referee for suggesting this algorithm.
6 We thank an anonymous referee for suggesting an approximate optimization algorithm in this spirit.
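To make the disjunctive-constraints MIP idea concrete, the following is a small illustrative sketch of computing CLAD exactly as a mixed integer program. It is our own toy re-implementation using generic big-M constraints and the open-source HiGHS solver bundled with SciPy, not the authors' GAMS/CPLEX formulation; in particular, the box bound on the coefficients and the big-M constant are assumptions added here, and the constraints only mimic the role of the paper's restrictions (8)–(10).

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def clad_mip(y, X, beta_bound=10.0):
    """Toy big-M MIP for CLAD: min_beta sum_i |y_i - max(0, x_i'beta)|
    over beta in the box [-beta_bound, beta_bound]^p (an added assumption).
    Variables: beta (p), u_i, v_i >= 0 (positive/negative residual parts)
    and binaries d_i, with d_i = 1 when the fitted index x_i'beta >= 0."""
    N, p = X.shape
    M = 2.0 * (np.abs(y).max() + beta_bound * np.abs(X).sum(axis=1).max())
    nvar = p + 3 * N
    iB = slice(0, p)                    # beta block
    iU, iV, iD = p, p + N, p + 2 * N    # offsets of u, v, d blocks

    c = np.zeros(nvar)
    c[iU:iU + 2 * N] = 1.0              # objective: sum_i (u_i + v_i)

    rows, lo, hi = [], [], []
    def add(row, lb, ub):
        rows.append(row); lo.append(lb); hi.append(ub)

    for i in range(N):
        # d_i = 1  <=>  x_i'beta >= 0:   -M <= x_i'beta - M d_i <= 0
        r = np.zeros(nvar); r[iB] = X[i]; r[iD + i] = -M
        add(r, -M, 0.0)
        # if d_i = 1:  u_i - v_i = y_i - x_i'beta  (relaxed when d_i = 0)
        r = np.zeros(nvar); r[iU + i], r[iV + i] = 1.0, -1.0
        r[iB] = X[i]; r[iD + i] = M
        add(r, -np.inf, y[i] + M)
        r = r.copy(); r[iD + i] = -M
        add(r, y[i] - M, np.inf)
        # if d_i = 0:  u_i - v_i = y_i             (relaxed when d_i = 1)
        r = np.zeros(nvar); r[iU + i], r[iV + i] = 1.0, -1.0
        r[iD + i] = -M
        add(r, -np.inf, y[i])
        r = r.copy(); r[iD + i] = M
        add(r, y[i], np.inf)

    lb = np.zeros(nvar); ub = np.full(nvar, np.inf)
    lb[iB], ub[iB] = -beta_bound, beta_bound
    ub[iD:] = 1.0                       # d_i binary via bounds + integrality
    integrality = np.zeros(nvar); integrality[iD:] = 1

    res = milp(c, constraints=LinearConstraint(np.array(rows), lo, hi),
               integrality=integrality, bounds=Bounds(lb, ub))
    return res.x[iB], res.fun
```

Because minimizing u_i + v_i with u_i − v_i pinned to the active-branch residual yields the absolute deviation, the MIP value coincides with the CLAD criterion at the reported coefficients; for an observation-specific censoring point c_i, the second pair of branch constraints would pin u_i − v_i to y_i − c_i instead.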

Y. Bilias, K. Florios and S. Skouras / Journal of Econometrics 212 (2019) 584–606


3. Empirical applications

We next apply our MIP approach to CLAD estimation in the context of three empirical applications. The first two involve revisiting the analysis of the two datasets on extramarital affairs from Fair (1978), in both of which the dependent variable is heavily censored. For the first, the Psychology Today (PT) dataset on extramarital affairs, we present exact CLAD estimates using MIP. This dataset is moderately sized and we are able both to compute exact estimates and to prove their exactness in a reasonable amount of time with our method; furthermore, we can show that the BRCENS approximate optimization algorithm delivers wrong and highly problematic estimates. The second, the Redbook (RB) dataset, provides a challenging setting in which our MIP approach cannot deliver an exact solution within a reasonable amount of time, but in which our limited-time-budget MIP approach can be compared to other computation algorithms and estimators. The third empirical application relates to the analysis of African manufacturing in Fafchamps et al. (2000), in the context of which we report results for three related datasets. This application has features similar to the RB data in that the data is heavily censored and our MIP approach cannot prove exact optimality in a reasonable amount of time, but it is especially interesting because here we can compare our estimates to published estimates as a way to gauge the value added of our approach (it turns out our parameter estimates are economically different and yield a value of the CLAD objective that is much smaller than that of the published estimates).

3.1. MIP analysis of extramarital affairs (Fair, 1978)

3.1.1. Psychology Today (PT) data: Exact computation of CLAD with MIP

We test MIP-based CLAD estimation in the PT data including only the five predictors which are statistically significant in Fair’s Tobit regressions.
As discussed in the introduction, the only known alternative to MIP-based exact CLAD computation is the IPOL algorithm. In this problem, with N = 601 observations and p = 5 regressors, the IPOL algorithm requires the solution of more than 642 billion 5 × 5 linear systems of equations. This is clearly impractical for most purposes — e.g., on an X86-64 AMD Opteron 265 CPU running Fortran compiled with Intel Fortran 10.1, we found it would take approximately 2252 h (≈ 94 days) to complete on a single thread. By contrast, for MIP to obtain and prove global optimality, only 192,291,263 linear programs had to be solved, which required approximately 81 h on a single thread. Moreover, returning to the distinction of Section 2 between the time required to identify the actual global optimum as a potential optimum and the additional time required to prove that this was indeed the actual optimum, the former took only 15.75 s and involved the solution of only approximately 10,000 linear programs. In other words, computation of the global optimum is very fast and, while proof that it is the global optimum can be more expensive, it is still very significantly cheaper than IPOL.7 Our results are given in Table 1. Evidently, our exact MIP, MIP with limited time budget (MIPLtd), and BRCENS with MIPLtd as starting values (MIPLtdBRC) estimates coincide, as do the threshold accepting (TA) estimates, while ILPA and BRCENS do not provide a sensible solution for this data with default parameters and standard software. The multiple-starting-value BRCENS (mBRC) algorithm provides economically sensible parameter estimates which are, however, far from being global minimizers. In all cases, there is agreement in the sign of the predictors’ coefficients. However, the exact estimates, and therefore the economic interpretation, can differ substantially from the approximate optimization algorithm estimates.
Moreover, there are differences in the standard errors (estimated using the resampling method of Bilias et al. (2000), the validity of which requires that the CLAD estimate is computed exactly) and consequently different conclusions are reached about the statistical significance of each variable.8 An econometrician using multiple BRCENS would erroneously conclude that all explanatory variables are statistically insignificant and would arrive at point estimates that are economically very different from the exact estimates. For example, exact MIP computation suggests that each year of age reduces the median number of extramarital intercourses in the last year by three with a standard error of 0.74, whereas multiple BRCENS calculates the reduction as 0.45 intercourses with a standard error of 0.44. For some parameters (years married and age), the multiple-BRCENS estimate is clearly outside the confidence interval based on the MIP estimate.9 The convergence of our MIP implementation as a function of CPU time is illustrated in Fig. 1. Comparing CLAD to alternative estimators, it is striking that only LT and Tobit have any statistically significant parameters, whereas the exact MIP CLAD estimates are all significant, so the choice of estimator seems critical to any conclusion drawn from this data. It is also worth noting that the LT estimates and the corresponding value of Powell’s CLAD objective function are reasonably close to those of the exact MIP CLAD estimator.

7 Using the newer CPLEX v12.6 solver, the computation time of 81 h needed to prove optimality with CPLEX v12.2 has dropped to 5 min on the same hardware.
8 Problems associated with using BRCENS for bootstrap-based standard errors are discussed in Fitzenberger (1997a, p. 411).
9 In unreported analyses we conducted a regression using four regressors (dropping one) and found almost identical results (though global optimality was proved much faster, in 18 min 58 s).
For this problem, the IPOL algorithm is also feasible, though much slower — on the same hardware, it delivered the identical result in 16 h 33 min 29 s.
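The IPOL enumeration discussed above can be sketched as follows. This is a toy Python illustration, not the authors' Fortran implementation: it only interpolates p-observation subsets and skips rank-deficient ones, whereas the production algorithm also handles the censoring kinks and avoids the redundant work this naive version repeats.

```python
import numpy as np
from itertools import combinations

def clad_obj(beta, y, X):
    """Powell's CLAD criterion with censoring at zero."""
    return np.abs(y - np.maximum(0.0, X @ beta)).sum()

def clad_ipol_sketch(y, X):
    """Enumeration in the spirit of IPOL: fit the regression plane through
    every subset of p observations and keep the candidate with the smallest
    CLAD criterion.  Simplified illustration only."""
    N, p = X.shape
    best_beta, best_f = None, np.inf
    for idx in map(list, combinations(range(N), p)):
        A, b = X[idx], y[idx]
        if np.linalg.matrix_rank(A) < p:
            continue                      # rank-deficient subset: no plane
        beta = np.linalg.solve(A, b)      # plane interpolating the p points
        f = clad_obj(beta, y, X)
        if f < best_f:
            best_beta, best_f = beta, f
    return best_beta, best_f
```

The cost of this loop is exactly the C(N, p) count driving the timings quoted above, which is why the approach becomes impractical so quickly as N and p grow.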


Fig. 1. Convergence of the MIP approach applied to the Psychology Today data. The dashed line represents a lower bound on the CLAD objective and the solid line represents the minimized value of the CLAD objective as a function of CPU time. In total, 192,291,263 nodes of the Branch and Cut tree are solved and account for 291,519 CPU sec. The run was executed on an Intel core i5 processor running Windows 7 using GAMS CPLEX software.

3.1.2. Redbook (RB) data: Approximate computation of CLAD with time-budget-constrained MIP augmented with BRCENS

While our approach greatly expands the scope for exact CLAD estimates beyond what is practically feasible with IPOL, there are still applications in which it will be impractical to let our MIP approach run until it delivers a final estimate. In this section, we analyze the RB data (N = 6366, p = 9). We report the performance of the MIP algorithm after 28,800 s, which, for this empirical application, is far from sufficient to prove that the potential optimum obtained in this time is also actually optimal. Therefore, the most relevant estimates from our MIP approach are those obtained with a limited time budget augmented with BRCENS (MIPLtdBRC). The analysis is reported in Table 2 and includes the same comparison estimates as Table 1. In terms of the improvements offered by BRCENS after running MIP with a limited time budget, we find that the value of the CLAD objective function becomes significantly smaller and the parameters become substantially different, in one case switching sign (children). As was the case in the previous subsection, ILPA and BRCENS blindly applied with default values of standard software fail to produce sensible results, but the multiple-start BRCENS and the TA algorithms produce estimates very similar to the ones we propose with our MIPLtdBRC approach. As in the previous subsection, alternative estimators in some cases deliver very different economic conclusions from those derived from CLAD estimates, though the Chernozhukov and Hong (2003) LT estimator comes very close to our best MIP estimates.

3.2. MIP analysis of manufacturing in Zimbabwe (Fafchamps et al., 2000): Comparison to published estimates

To further examine whether MIP computation is likely to lead researchers to more reliable conclusions, we re-estimated six regressions presented in Fafchamps et al.
(2000) in order to compare our exact results with their results based on ILPA computations. They study contractual risk, short-term financing and inventories of manufacturing firms in Zimbabwe using a panel dataset of annual observations on 203 firms for the years 1991–1995. They present (pooled) CLAD estimates computed by ILPA. We focus on the contractual risk and short-term financing regressions, which involve straightforward CLAD estimations (we do not attempt to replicate their CLAD version of 2SLS). While these regressions have too many parameters and observations for us to prove global optimality in a reasonable amount of time, we obtained a potential global optimum within 180 min that did not change after an additional 24 h of further computations towards proving optimality. In several cases our MIP estimates were very different from the published estimates, and in all cases where there were differences the MIP computation delivered significantly better objective function values. In the next section we present simulations in which we find that potential optima that do not change after several hours of MIP calculations are almost always global optima. This provides a practical justification of our approach, even when – as is the case for the Fafchamps et al. (2000) data – proof of global optimality is infeasible. Our results on the ‘supplier credit’, ‘overdraft ceiling’ and ‘delayed payment’ regressions are very similar to the published ones and we do not report them. On the other hand, MIP-based estimates for the regressions on ‘delayed deliveries by suppliers’, ‘overdraft reserves’ and ‘overdraft credit’ are very different from the published ones and are reported in Tables 3–5.


Table 6
Number of objective function evaluations required by the IPOL algorithm to find the CLAD estimates with sample size N and p regressors. Where these exceed one billion, we report approximate numbers, accurate within an order of magnitude.

       Number of regressors p
N      p=3           p=4           p=5          p=6
100    161,700       3,921,225     75,287,520   ~10^9
250    2,573,000     158,882,750   ~10^10       ~10^11
500    20,708,500    ~10^9         ~10^11       ~10^13

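The entries of Table 6 are simply binomial coefficients C(N, p) — the number of p-observation subsets IPOL must enumerate in a sample of size N — so the exact cells can be reproduced with a quick check (our own verification, not code from the paper):

```python
from math import comb

# Table 6 cells: C(N, p) subsets that IPOL must consider.
assert comb(100, 3) == 161_700
assert comb(250, 3) == 2_573_000
assert comb(500, 3) == 20_708_500
assert comb(100, 4) == 3_921_225
assert comb(250, 4) == 158_882_750
assert comb(100, 5) == 75_287_520
# larger cells are reported to an order of magnitude, e.g. C(500, 6) ~ 10^13
assert 1e13 <= comb(500, 6) < 1e14
```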
The economic interpretation of the results varies significantly when we use the MIP approach relative to the published estimates. For example, our MIP estimates reveal that delivery delays to small towns are significantly smaller and that overdraft credit tends to be smaller in the textile sector compared to any other sector. Confidence intervals of published CLAD estimates are also very misleading, as was the case in the PT data discussed earlier. As with the previous applications, the interpretation of the regression analysis can vary depending on the estimator used. While the TA and multiple-BRCENS algorithms are sometimes reasonable approximations to the estimates obtained with our proposed MIPLtdBRC approach, this is not the case in the overdraft reserves analysis of Table 4.

3.3. Key conclusions from empirical analysis

In sum, our empirical analysis suggests that using standard approximate optimization algorithms (multistart BRCENS, simple BRCENS or ILPA) for CLAD optimization can lead to spurious conclusions and cannot be trusted; even published empirical studies reporting estimates computed with such approximate optimization algorithms should be treated with great caution and may deliver significantly different conclusions if repeated with exact optimization methods. Competing estimators for censored regression, developed partly as a response to difficulties involved in the computation of the CLAD estimator and partly from the need to accommodate different types of data, deliver significantly different estimates. In the next section we contribute to the study of the finite-sample statistical properties of the several alternative estimators, offering guidance to the empirical analyst. We also found that where exact optimization is not feasible even with our MIP approach, a MIP calculation with a limited time budget augmented with a BRCENS optimization typically delivers an excellent approximate optimization algorithm.
Our analysis suggests that TA in many cases quickly gives values for parameter estimates and for the objective function which are similar to those of MIP-based approaches, though TA has the limitation that it can never prove exact optimality of any solution it delivers. In the next section, we report a simulation analysis which provides more comprehensive evidence on the range of applications in which our MIP approach is likely to be useful.

4. Simulation analysis

Having shown in the context of significant empirical applications that commonly used approximate optimization algorithms can lead to CLAD estimates that are very misleading, we now turn to an analysis aiming to reveal the range of problems in which our MIP approach is useful. In the first subsection we study the range of applications in which exact MIP is feasible and significantly faster than IPOL. In the second subsection, we investigate the accuracy of our MIP approach when there are realistic bounds on the amount of computation time available to find a solution.

4.1. Performance evaluation of exact MIP in samples drawn from the Psychology Today dataset

We now investigate the range of feasibility of exact MIP for various sample sizes and various numbers of regressors in the context of a real-world dataset. Specifically, we generate ten samples of N observations randomly drawn, without replacement, from the PT dataset. We repeat this for N = 100, 250 and 500 and use p = 3, 4, 5, 6 regressors (as a sixth predictor beyond those used in Table 1 we included Occupation), generating a total of 10 × 3 × 4 = 120 datasets. Table 6 displays the number of linear systems of equations the IPOL algorithm must solve in each parameterization we consider. The table confirms that IPOL computations become much slower than MIP in problems of even moderate size, such as applications with 250 observations and four regressors. In Table 7, we report statistics associated with the computation time required for our optimizations.
We emphasize three effects we believe are particularly striking. First and foremost, average MIP performance scales much better with problem size than IPOL does, consistent with the conclusions drawn from Table 6. In small estimation problems MIP and IPOL are both viable alternatives — IPOL may even be a little faster, though the difference is of no practical significance. More importantly, IPOL scales very badly with the size of the estimation problem, becoming impractical for most purposes in large problems. For example, when N = 500 and p = 3, IPOL is 100 times slower than MIP, and when N = 100 and p = 6, IPOL is 1000 times slower than MIP.


Table 7
CPU time in seconds required by IPOL and MIP algorithms in estimation problems with simulated samples of size N from PT survey data and p regressors. We report both the time to find the global optimum (t1) and the total time required to prove this optimum is indeed global (t1 + t2), which requires additional time after finding the optimum (t2). The shaded lower right part of the table reports results for parameter combinations where t2 is too large and proof of optimality is infeasible. In this part, for IPOL t1 is the time to find the best optimum among all those considered within a time budget limited to three hours; for MIP we report the time it took to find a best optimum with the same or better performance than the three-hour IPOL best optimum. Hardware and software details are given in Appendix C.

Second, the time MIP spends on proving that a known potential optimum is the actual optimum is much greater than the time it takes to find this potential optimum. To see this, note that, for instance, in the case of N = 500, p = 4, the average time to find the optimum (measured by t1) was 198 s, while it took 38 (≈ 7531/198) times longer to also prove (the additional time to achieve this proof is measured by t2) that this potential optimum was indeed the actual optimum. By contrast, IPOL took 15,928 s to find the optimum on average and 45,387 s to prove this was the optimum (‘only’ three times longer).10 This suggests that even when the econometrician’s computation budget is insufficient to prove optimality of a MIP estimate, he is nevertheless likely to obtain the optimum even if he cannot afford to prove that the estimate is indeed optimal. For very large problems, we therefore suggest running MIP optimizations until the potential optimum has not changed for a couple of hours. In our moderately sized simulations, we confirmed that this rule will indeed almost always deliver a global optimum within a few hours. Third, MIP performance can vary significantly among datasets, whereas IPOL performance is practically unchanged across estimation problems of the same size (the ratio of the standard deviation of computation time to its average is much smaller for IPOL than for MIP). This highlights the fact that the MIP algorithm exploits sample-specific features of each optimization to avoid unnecessary computations, whereas IPOL requires a number of computations that is in general roughly invariant across problems of the same size (computations are data dependent only because, as IPOL iterates

10 Beyond looking at averages across simulations, we also verified that the time-for-proof is far larger than time-for-optimum in individual runs as well, for moderately sized samples.


Table 8
Computation failure and estimator failure. The fraction of samples for which a computation algorithm or an estimator did not deliver a solution — due to computational intractability or to the nature of the sample. The first three columns are CLAD computation algorithms; the remaining columns are alternative estimators.a

Panel A: Fitzenberger and Winker (2007) DGPs
           IPOL     MIPLtd  MIPLtdBRC  ILP    BRC    mBRC   TA     CHLT   CH3ST   POR     PH      TOB
k=2  All   0.75%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  22.64%  4.20%   3.37%   0.00%
     C=1   0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  35.59%  0.00%   0.00%   0.00%
     C=0   2.20%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  11.61%  12.60%  10.10%  0.00%
k=3  All   0.04%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  14.35%  3.79%   2.76%   0.00%
     C=1   0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  26.78%  0.00%   0.00%   0.00%
     C=0   0.13%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  4.55%   11.34%  8.28%   0.00%
k=4  All   0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  10.38%  2.67%   2.08%   0.00%
     C=1   0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  22.00%  0.00%   0.00%   0.00%
     C=0   0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  1.75%   8.00%   6.25%   0.00%

Panel B: Khan and Powell (2001) DGPs
N=50       0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  0.04%   8.26%   25.63%  0.00%
N=100      14.29%b  0.00%   0.00%      0.00%  0.02%  0.00%  0.00%  0.00%  0.00%   11.45%  13.45%  0.00%
N=200      0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  0.00%   19.45%  6.93%   0.00%
N=400      0.00%    0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  0.00%   18.50%  3.06%   0.00%

Panel C: Chernozhukov and Hong (2003) DGP
N=400      100.00%  0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  0.00%   8.00%   100.00% 0.00%
N=1600     100.00%  0.00%   0.00%      0.00%  0.00%  0.00%  0.00%  0.00%  0.00%   45.00%  100.00% 0.00%

a The acronyms of algorithms and estimators, DGPs, and implementation details are explained in Appendix C. k = number of regressors, N = sample size, C = degree of censoring.
b The 14.29% reflects the infeasibility of IPOL for the seventh DGP with k = 5 regressors; this DGP is not included in the N = 50, 200, 400 cases.

all C(N, p) combinations of data points, a good implementation will skip calculating a regression line when it encounters a combination for which the matrix of covariate vectors does not have full rank).11

4.2. Performance evaluation in a simulated-data estimation problem

We next turn our attention to an analysis of realistically limited-time-budget MIP estimates in the context of simulations from known DGPs that have been used in the prior CLAD literature. We aim to investigate the role of factors such as the degree of censoring, heteroskedasticity, sample size and the number of estimated parameters on both the computational performance of our MIP approach and the statistical performance of exact CLAD estimates. Our simulation analysis spans many simulation designs appearing in Fitzenberger and Winker (2007), Chernozhukov and Hong (2003) and Khan and Powell (2001). More specifically, we use the 72 DGPs described in Fitzenberger and Winker (2007, p. 96, Table 1), which include eight regression models with three levels of censoring and two, three or four regressors. As in their study, all samples have 100 observations. These DGPs are especially useful for exploring the effect of different levels of censoring; while our results are averaged across regression models, we provide some additional results for each regression model in our Online Appendix. Using the simulation design of Khan and Powell (2001), we study the effect of exponential heteroskedasticity, averaging across seven regression models as sample size varies, while using the design of Chernozhukov and Hong (2003) we study the effect of multiplicative heteroskedasticity in a regression model with four regressors as sample size varies in the context of a single DGP. We summarize the details of the simulation designs we have borrowed in Appendix C. Note that, for various reasons, computation algorithms do not produce a solution for all samples, and not all estimators are defined for all samples.
In Table 8 we report the fraction of samples for which each algorithm or estimator delivered results. The analysis of each algorithm or estimator is based on the samples for which there are estimates, which penalizes variants like our MIP approach that always deliver a solution. It is worth emphasizing that some estimators considered to be strong competitors to CLAD are in fact not even defined for many of the samples from the simulation designs we have borrowed from the literature. For a given dataset, the empirical researcher may have no option for the regression median other than CLAD (or the LT estimator with a CLAD criterion function).

11 Since generating these results, we have found that the latest version 12.6 of the CPLEX solver generates massive improvements relative to our CPLEX 12.2 analysis. For example, we found that some estimation problems that took several days can now be computed in less than five minutes.


Table 9
Global optimization. The percentage of samples for which the CLAD optimum was calculated within a relative tolerance of 0.1% on the objective function value by CLAD computation algorithms and alternative estimators. The true optimum is taken to be that of IPOL in all cases except Panel C, where IPOL is intractable, in which case we use the estimate with the lowest absolute error across all computation algorithms and alternative estimators as if it were the global optimum CLAD estimate. The first three columns are CLAD computation algorithms; the remaining columns are alternative estimators.a

Panel A: Fitzenberger and Winker (2007) DGPs
           IPOL     MIPLtdb  MIPLtdBRC  ILP     BRC      mBRC     TA       CHLT     CH3ST   POR     PH      TOB
k=2  All   100.00%  99.89%   99.97%     76.28%  90.06%   98.63%   98.57%   99.38%   5.94%   33.27%  46.46%  5.31%
     C=1   100.00%  100.00%  100.00%    90.54%  97.20%   100.00%  100.00%  100.00%  3.18%   47.24%  61.96%  7.58%
     C=0   100.00%  99.67%   99.92%     55.73%  78.59%   95.93%   95.79%   98.16%   10.41%  17.23%  26.90%  3.06%
k=3  All   100.00%  96.36%   96.97%     70.99%  82.26%   93.83%   95.17%   90.43%   11.44%  30.84%  47.25%  1.19%
     C=1   100.00%  100.00%  100.00%    89.88%  95.70%   99.99%   100.00%  100.00%  9.64%   44.45%  65.51%  1.68%
     C=0   100.00%  90.11%   91.48%     46.48%  63.35%   82.79%   86.52%   77.50%   13.97%  15.44%  25.84%  0.70%
k=4  All   100.00%  96.00%   96.67%     63.25%  74.58%   92.00%   94.75%   83.71%   6.42%   23.67%  39.97%  0.00%
     C=1   100.00%  100.00%  100.00%    85.88%  91.38%   100.00%  100.00%  99.88%   4.25%   37.50%  58.00%  0.00%
     C=0   100.00%  89.50%   90.88%     37.88%  54.88%   78.25%   85.63%   64.75%   9.26%   9.88%   20.91%  0.00%

Panel B: Khan and Powell (2001) DGPs
N=50       100.00%  99.54%   99.69%     73.78%  95.64%   99.64%   98.83%   97.56%   50.71%  58.88%  65.16%  10.41%
N=100      100.00%  98.80%   99.54%     76.14%  88.14%   96.98%   98.75%   98.29%   55.37%  55.45%  66.13%  16.71%
N=200      100.00%  99.60%   100.00%    96.00%  99.58%   100.00%  99.98%   100.00%  76.88%  72.52%  88.52%  31.27%
N=400      100.00%  91.12%   99.96%     98.94%  100.00%  100.00%  100.00%  100.00%  85.06%  76.25%  95.52%  39.89%

Panel C: Chernozhukov and Hong (2003) DGP
N=400      NA       100.00%  100.00%    0.00%   17.00%   100.00%  97.00%   100.00%  41.00%  20.65%  NA      0.00%
N=1600     NA       100.00%  100.00%    0.00%   13.00%   100.00%  100.00%  100.00%  96.00%  85.45%  NA      0.00%

a The acronyms of algorithms and estimators, DGPs, and implementation details are explained in Appendix C. k = number of regressors, N = sample size, C = degree of censoring.
b MIPLtd uses a baseline time budget of 200 s. Baseline values were re-calibrated in challenging DGPs, with MIPLtd being allowed up to 3600 s.

4.2.1. Computational performance

Our results on the computational performance of the algorithms and estimators we study are reported in Tables 9 and 10. In Table 9 we show that our MIP approach with a limited time budget augmented with a BRCENS run outperforms all optimization algorithms in terms of the frequency with which the optimum is found, though TA and multiple BRCENS are strong competitors (in that order in terms of accuracy), while simple ILPA and BRCENS implementations are not. Note that because, under general conditions, LT estimates and exact CLAD estimates converge in probability to the same values, while at the same time MIP costs increase quite steeply with sample size, we would expect LT estimates to be especially competitive as approximations to CLAD estimates when samples are large. Indeed, Table 10 suggests that in settings with large sample sizes, such as the case with 1600 observations, MIP computation becomes slow and the LT estimator computed using MCMC may be a good way to approximate CLAD estimates, especially if the time budget is very limited. However, as the degree of censoring and the number of estimated parameters increase, the effectiveness of the LT estimator as a computation device decreases, and this is in line with the mixed evidence observed in the empirical analysis of Tables 1–5, where the LT approach delivers a value of the CLAD objective function that is sometimes, but not always, similar to that of CLAD. In sum, the LT estimator produces estimates which in large-sample settings can be sensibly interpreted as approximations to the CLAD estimate, a perhaps under-explored use of this estimator. The other estimators deliver poor values for the CLAD objective and therefore cannot be sensibly interpreted as approximations to the CLAD estimator. In Table 10, we study the average speed with which solutions are found.
Heteroskedasticity does not seem to pose a serious challenge for the MIP approach, with the number of estimated parameters and the sample size being the key determinants of speed. Not surprisingly, the degree of censoring also increases the time necessary to calculate CLAD estimates. The effects of censoring are also reflected in Table 9, where higher censoring is associated with more failures to find the global optimum for all algorithms, and in Table 8, which shows that Portnoy’s and Peng–Huang’s estimators cannot even be applied in many empirical datasets when there is high censoring.12 Finally, while we observed that TA and multistart BRCENS are close runners-up to our MIP approach in terms of accuracy in Table 9, Table 10 shows that in terms of speed, TA can in some cases offer a moderate improvement, while multistart BRCENS can offer very significant improvements.

12 This is not surprising: just as censored observations above the largest uncensored observation cause the Kaplan–Meier estimator to become defective in a simple sample, excessive censoring in the regression setting prevents identification of the upper conditional quantiles.
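A threshold accepting scheme of the kind compared above can be sketched as follows. This is a toy illustration only in the spirit of Fitzenberger and Winker (2007): the Gaussian move rule and the linearly shrinking threshold schedule are our own illustrative choices, not theirs, and the best-so-far solution is tracked so the routine can never return something worse than its starting point.

```python
import numpy as np

def clad_obj(beta, y, X):
    """Powell's CLAD criterion with censoring at zero."""
    return np.abs(y - np.maximum(0.0, X @ beta)).sum()

def clad_ta(y, X, beta0, n_rounds=25, n_steps=200, step=0.25, seed=0):
    """Toy threshold accepting for CLAD.  Early rounds accept mild
    deteriorations of the objective (up to the current threshold tau);
    the final rounds, with tau = 0, perform pure descent."""
    rng = np.random.default_rng(seed)
    beta = beta0.astype(float).copy()
    f = clad_obj(beta, y, X)
    best_beta, best_f = beta.copy(), f
    # deterministic, linearly shrinking threshold sequence ending at zero
    for tau in np.linspace(0.1 * (f + 1.0), 0.0, n_rounds):
        for _ in range(n_steps):
            cand = beta + step * rng.standard_normal(beta.size)
            fc = clad_obj(cand, y, X)
            if fc - f <= tau:            # accept within the threshold
                beta, f = cand, fc
                if f < best_f:           # keep the best solution ever seen
                    best_beta, best_f = beta.copy(), f
    return best_beta, best_f
```

As the text notes, a scheme like this can be fast and often lands near the MIP solution, but it can never certify that the point it returns is a global optimum.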


Table 10
Computation Time. We report the CPU time (seconds) required to calculate various CLAD computation algorithms and alternative estimators across a range of simulation designs. The first three columns are CLAD computation algorithms; the remaining columns are alternative estimators.a

Panel A: Fitzenberger and Winker (2007) DGPs
           IPOL    MIPLtd   MIPLtdBRC  ILP    BRC    mBRC   TA      CHLT     CH3ST  POR    PH     TOB
k=2  All   0.02    5.93     5.98       0.06   0.00   0.17   15.68   30.90    0.06   0.33   0.47   0.05
     C=1   0.02    5.84     5.89       0.05   0.01   0.19   15.71   30.90    0.03   0.47   0.62   0.08
     C=0   0.02    5.54     5.59       0.07   0.00   0.14   15.77   30.83    0.10   0.17   0.27   0.03
k=3  All   0.42    51.19    51.25      0.05   0.01   0.21   33.04   65.71    0.11   0.31   0.47   0.01
     C=1   0.42    17.34    17.40      0.05   0.01   0.24   32.53   65.75    0.10   0.44   0.66   0.02
     C=0   0.42    102.37   102.42     0.06   0.01   0.18   33.02   65.79    0.14   0.15   0.26   0.01
k=4  All   12.24   110.54   110.59     0.05   0.01   0.24   33.51   188.71   0.06   0.24   0.40   0.00
     C=1   12.29   60.47    60.53      0.05   0.01   0.27   33.53   187.26   0.04   0.38   0.58   0.00
     C=0   12.08   180.71   180.76     0.06   0.01   0.21   33.51   190.93   0.09   0.10   0.21   0.00

Panel B: Khan and Powell (2001) DGPs
N=50       0.00    1.02     1.04       0.03   0.00   0.11   15.21   26.32    0.51   0.59   0.65   0.10
N=100      0.02    60.77    60.79      0.03   0.00   0.15   17.93   30.18    0.55   0.55   0.66   0.17
N=200      0.09    354.43   354.45     0.04   0.01   0.23   15.54   31.32    0.77   0.73   0.89   0.31
N=400      0.72    1220.17  1220.21    0.05   0.02   0.61   16.05   36.22    0.85   0.76   0.96   0.40

Panel C: Chernozhukov and Hong (2003) DGP
N=400      NA      7.67     7.68       0.01   0.02   0.27   35.95   53.00    0.41   0.21   NA     0.00
N=1600     NA      2957.29  2957.90    0.04   0.20   5.06   42.78   548.00   0.96   0.85   NA     0.00

a The acronyms of algorithms and estimators, and DGPs are explained in Appendix C. k = number of regressors, N = sample size, C = degree of censoring. Implementation details are as in Tables 8 and 9.

4.2.2. Statistical performance

In recent years, several authors have suggested that the CLAD estimator may not always perform well in a statistical sense, even if it is computed exactly (see e.g., Fitzenberger and Winker, 2007, Khan and Powell, 2001, Koenker, 2008, and Portnoy, 2010). We revisit this issue in the context of our simulation, in Tables 11-13, which report respectively the RMSE, the interquartile range (IQR) and the average bias of estimates for the first slope parameter of each simulated regression model. In our Online Appendix, we provide disaggregated results for each regression model used, to facilitate detailed comparisons with previous studies using these simulation designs.

Our results are broadly consistent with the previous literature: certain DGPs under heavy censoring can generate samples in which the CLAD RMSE is alarmingly high. We also confirm the finding of Fitzenberger and Winker (2007) that this is primarily driven by a few very large outliers, as the effect is much more muted when the focus is on the interquartile range of the absolute error across samples. Note also that several samples used by Fitzenberger and Winker (2007) are designed to be very difficult for CLAD, in the sense that the corresponding DGPs (such as their DGPs (A) and (B) in the cases with heavy censoring) challenge the CLAD identification assumptions (specifically Assumption R.1 of Powell, 1984).

The detailed simulation results reported in our Online Appendix show, in general, good statistical performance of the exact CLAD estimator in terms of RMSE under moderate and light censoring. The degree of censoring appears to be a decisive factor for the sampling behavior of the exact CLAD estimator. Under moderate censoring, the RMSE of the CLAD estimator computed by MIP with a limited time budget augmented by BRCENS tracks the statistical behavior of exact CLAD reasonably well. Under light censoring, this is true even for MIP with a limited time budget without further augmentation.
Our simulation evidence suggests that exact CLAD suffers from a small bias which increases under heavy censoring and in small samples. We also note that, relative to competing estimators, CLAD has the important advantage of being defined for all samples, whereas some of the other estimators are not, leading to the significant failure rates reported in Table 8. Approximate algorithms for CLAD in some cases have better or more stable statistical performance than exact CLAD (as reported also by Fitzenberger and Winker, 2007), and a superficial interpretation would suggest this is a reason to prefer approximate over exact CLAD. However, in samples for which exact and approximate CLAD algorithms disagree, estimates based on the latter do not have any tractable theoretical properties, and inference based on them will lead to invalid conclusions. As observed also by Fitzenberger and Winker (2007), the relatively good performance of approximate algorithms such as ILPA and BRCENS in simulations may be due to the fact that their built-in starting values happen to be close to the true parameters of the regression models used in the simulations.


Table 11
RMSE. RMSE of the first non-constant coefficient obtained for various CLAD computation algorithms and alternative estimators averaged over degrees of censoring and sample sizes. CLAD computation algorithms: IPOL, MIPLtd, MIPLtdBRC. Alternative estimators: ILP, BRC, mBRC, TA, CHLT, CH3ST, POR, PH, TOB.

            IPOL  MIPLtd  MIPLtdBRC   ILP    BRC   mBRC     TA   CHLT  CH3ST   POR    PH   TOB

Panel A: Fitzenberger and Winker (2007) DGPs
k=2 All     0.91    0.71    0.74    0.10   1.07   0.67   0.52   1.29   4.63  0.30  0.18  0.10
k=2 C=1     0.09    0.09    0.09    0.09   0.09   0.09   0.09   0.09  10.88  0.27  0.08  0.07
k=2 C=0     2.53    1.89    2.01    0.12   3.02   1.77   1.35   3.37   1.71  0.36  0.39  0.16
k=3 All    22.61    2.70    3.78    0.11   2.35   4.75  10.02   1.73  14.84  0.30   inf  0.16
k=3 C=1     0.09    0.09    0.09    0.08   0.09   0.09   0.09   0.09  32.12  0.26  0.08  0.07
k=3 C=0    59.82    7.72   10.27    0.16   6.86  13.74  28.09   4.48   2.41  0.36   inf  0.35
k=4 All   426.83    1.87    3.17    0.09   1.67   2.06   8.76   1.14  49.82  0.29  0.14  0.29
k=4 C=1     0.08    0.08    0.08    0.08   0.08   0.08   0.08   0.08  10.38  0.25  0.08  0.06
k=4 C=0  1265.18    5.38    8.36    0.11   4.82   5.99  25.70   3.03   0.18  0.35  0.25  0.75

Panel B: Khan and Powell (2001) DGPs
N=50       12.48    2.56    3.64    0.51   8.99   9.24   3.94   2.78   0.82  0.75  0.47  1.40
N=100       3.09    1.18    2.10    0.41   1.64   2.81  18.35   1.70   0.58  0.65  0.49  0.52
N=200       0.40    0.91    0.40    0.25   0.31   0.40   0.31   0.40   0.36  0.47  0.22  0.17
N=400       0.23    3.38    5.87    0.19   0.23   0.23   0.23   0.23   0.28  0.40  0.16  0.11

Panel C: Chernozhukov and Hong (2003) DGP
N=400         NA    0.43    0.43    3.00   2.71   0.43   0.42   0.43   0.47  0.65    NA  0.80
N=1600        NA    0.14    0.14    3.00   2.80   0.14   0.14   0.14   0.15  0.37    NA  0.44

Note: The acronyms of algorithms and estimators, and DGPs are explained in Appendix C. Implementation details are as in Tables 8 and 9. Detailed results for each DGP are given in our Online Appendix.

Table 12
Interquartile range (IQR). IQR of the first non-constant coefficient obtained for various CLAD computation algorithms and alternative estimators averaged over degrees of censoring and sample sizes. CLAD computation algorithms: IPOL, MIPLtd, MIPLtdBRC. Alternative estimators: ILP, BRC, mBRC, TA, CHLT, CH3ST, POR, PH, TOB.

            IPOL  MIPLtd  MIPLtdBRC   ILP   BRC  mBRC    TA  CHLT  CH3ST   POR    PH   TOB

Panel A: Fitzenberger and Winker (2007) DGPs
k=2 All     0.20    0.20    0.20    0.12  0.14  0.17  0.17  0.20   0.53  0.12  0.13  0.09
k=2 C=1     0.12    0.12    0.12    0.11  0.12  0.12  0.12  0.12   0.86  0.11  0.11  0.09
k=2 C=0     0.36    0.33    0.35    0.13  0.17  0.25  0.24  0.36   0.25  0.14  0.16  0.10
k=3 All    11.22    2.56    1.79    0.12  0.13  0.25  2.42  1.05   0.70  0.12  0.12  0.09
k=3 C=1     0.11    0.11    0.11    0.11  0.11  0.11  0.11  0.11   1.57  0.11  0.11  0.09
k=3 C=0    33.42    7.44    5.12    0.12  0.15  0.53  7.02  2.91   0.17  0.13  0.14  0.10
k=4 All     2.84    1.76    1.18    0.11  0.12  0.17  0.64  0.56   0.23  0.11  0.12  0.08
k=4 C=1     0.10    0.10    0.10    0.10  0.10  0.10  0.10  0.10   0.34  0.10  0.10  0.07
k=4 C=0     8.31    5.06    3.32    0.11  0.14  0.28  1.69  1.45   0.14  0.11  0.14  0.09

Panel B: Khan and Powell (2001) DGPs
N=50        0.79    0.78    0.79    0.37  0.75  0.78  0.77  0.79   0.90  0.52  0.52  0.42
N=100       0.54    0.68    0.71    0.41  0.60  0.66  0.67  0.70   0.73  0.45  0.51  0.29
N=200       0.36    0.36    0.36    0.26  0.34  0.36  0.36  0.36   0.46  0.20  0.24  0.14
N=400       0.27    0.36    0.27    0.22  0.26  0.27  0.27  0.27   0.35  0.14  0.17  0.10

Panel C: Chernozhukov and Hong (2003) DGP
N=400         NA    0.47    0.47    0.00  0.00  0.47  0.47  0.49   0.46  0.46    NA  0.88
N=1600        NA    0.17    0.17    0.00  0.00  0.17  0.17  0.18   0.20  0.16    NA  0.68

Note: The acronyms of algorithms and estimators, and DGPs are explained in Appendix C. Implementation details are as in Tables 8 and 9. Detailed results for each DGP are given in our Online Appendix.

Turning to the performance of competing estimators, it is worth noting that the statistical performance of the LT estimator tracks that of exact CLAD, which is to be expected, since the LT estimator can be viewed as an approximate optimization algorithm for CLAD. From the detailed simulation results (see our Online Appendix), it appears that the estimator of Peng and Huang


Table 13
Bias. Absolute bias of the first non-constant coefficient obtained for various CLAD computation algorithms and alternative estimators averaged over degrees of censoring and sample sizes. CLAD computation algorithms: IPOL, MIPLtd, MIPLtdBRC. Alternative estimators: ILP, BRC, mBRC, TA, CHLT, CH3ST, POR, PH, TOB.

            IPOL  MIPLtd  MIPLtdBRC   ILP   BRC  mBRC    TA  CHLT  CH3ST   POR    PH   TOB

Panel A: Fitzenberger and Winker (2007) DGPs
k=2 All     0.08    0.06    0.09    0.01  0.06  0.05  0.04  0.15   0.19  0.01  0.01  0.00
k=2 C=1     0.01    0.01    0.01    0.00  0.00  0.01  0.01  0.01   0.50  0.00  0.00  0.00
k=2 C=0     0.21    0.15    0.24    0.01  0.18  0.12  0.09  0.42   0.03  0.01  0.03  0.00
k=3 All     0.67    0.36    0.27    0.00  0.34  0.11  0.96  0.36   0.52  0.00   inf  0.00
k=3 C=1     0.01    0.01    0.01    0.00  0.00  0.01  0.01  0.01   0.95  0.00  0.00  0.00
k=3 C=0     1.76    1.04    0.79    0.01  1.00  0.31  2.77  1.04   0.10  0.01   inf  0.00
k=4 All    44.59    0.23    0.45    0.01  0.31  0.24  0.75  0.22   5.83  0.01  0.01  0.00
k=4 C=1     0.01    0.01    0.01    0.01  0.01  0.01  0.01  0.01   1.85  0.01  0.01  0.00
k=4 C=0   132.23    0.65    1.14    0.01  0.93  0.70  2.16  0.61   0.03  0.02  0.01  0.00

Panel B: Khan and Powell (2001) DGPs
N=50        1.86    0.68    0.80    0.10  0.80  1.25  0.69  0.74   0.21  0.10  0.12  0.45
N=100       0.40    0.29    0.46    0.08  0.24  0.38  0.92  0.38   0.12  0.09  0.15  0.39
N=200       0.09    0.11    0.09    0.04  0.07  0.09  0.08  0.09   0.06  0.10  0.06  0.44
N=400       0.06    0.61    0.27    0.02  0.06  0.06  0.06  0.06   0.06  0.10  0.06  0.44

Panel C: Chernozhukov and Hong (2003) DGP
N=400         NA    0.13    0.13    3.00  2.38  0.13  0.13  0.14   0.14  0.10   NaN  0.91
N=1600        NA    0.00    0.00    3.00  2.61  0.00  0.00  0.00   0.00  0.01   NaN  0.79

Note: The acronyms of algorithms and estimators, and DGPs are explained in Appendix C. Implementation details are as in Tables 8 and 9. Detailed results for each DGP are given in our Online Appendix.

(2008) has a slightly smaller RMSE than exact CLAD, while this is not the case for the estimator of Portnoy (2003), which in many DGPs exhibits a larger RMSE. The three-step estimator of Chernozhukov and Hong (2002) also appears to have a high RMSE, exceeding that of CLAD in some DGPs with medium or light censoring.13

13 The Tobit estimator has very good RMSE in most DGPs, which may be attributed to the normally distributed errors, but, as expected, it has relatively large bias in DGPs with heteroskedastic errors.

4.3. Key conclusions from simulation analysis

Summarizing, our MIP approach is a compelling alternative to IPOL, the leading approach to exact evaluation of CLAD estimators. While we should be wary of all CLAD estimates that are not based on exact computation, there will always be interesting applications in which even our MIP approach is not feasible, so we also suggest augmenting a limited time budget MIP approach with the BRCENS algorithm, as a new approximate optimization algorithm for estimation, which performs very well in that it very frequently finds the exact CLAD estimate and does so quickly. These results echo similar conclusions in the empirical section.

Comparing exact CLAD estimation to competing estimators for censored regression, we suggest that the prudent applied econometrician would be well advised to use several estimators, as results can be sensitive to the choice of estimator and theory offers little guidance on how to choose among them. Based on statistical performance, exact CLAD is a good estimator to include in such an arsenal, especially when there is limited censoring or when competing estimators are not defined for a particular sample; but if CLAD estimates are implausibly extreme, they should be disregarded.

5. Concluding remarks

The aim of this paper is to propose and evaluate a mixed integer programming (MIP) approach to exact computation of the censored least absolute deviations (CLAD) estimator. This MIP approach relies on disjunctive constraint modeling of the censoring (max) operator in the CLAD objective function, which is what complicates its optimization.

Empirical and simulation results reveal that the frequently employed approximate optimization algorithms for computing CLAD estimates are not as reliable as researchers might have hoped. Our approach allows researchers to prove they have obtained global optima in medium sized samples (with 100-500 observations and 2-5 parameters). For larger problems, we still recommend our approach: global optima will typically be obtained very quickly even if proof of optimality is too slow to be of practical use given the


current state of computing. Furthermore, even in estimation contexts that are 'too large', our approach can be augmented with a follow-up local optimization using the BRCENS algorithm to deliver effective approximate computation. For smaller problems, with just a very small number of parameters and observations, IPOL can also be used for global optimization, but such empirical applications are rare.

The evidence from our empirical analysis of several datasets and our simulation studies supports continued use of CLAD in the face of competition from several more recent estimators. We report sensitivity of empirical conclusions to estimator choice and offer some guidance on which estimators to put more faith in, depending on the degree of censoring, the presence of heteroskedasticity and the need for computational exactness. We note these results are relevant for a large body of empirical research using approximate optimization algorithms to compute CLAD estimates.

We provide an open source code implementation of our computation approach online, which should be useful for most applications of CLAD estimators. Our MIP modeling approach can be used to compute several related estimators appearing in the literature on semiparametric limited dependent variable models, but we leave a study of the MIP approach in other estimation settings for future research.

Acknowledgments

We thank Hanghui Zhang for useful discussions, Marcel Fafchamps for providing the dataset used in Section 3.2, and Bernd Fitzenberger for sharing one of his datasets, which we used to test our MIP algorithm. We thank Roger Koenker for numerous helpful discussions on the computation of several estimators implemented in his R package quantreg. We thank Peter Winker for providing his code for the TA algorithm, Enrico Schumann for help with his R package NMOF, and Blaise Melly for suggesting his R package Counterfactual.
We benefited from comments of participants of the Boneyard Econometrics Conference 2015 at the University of Illinois at Urbana-Champaign. We thank two anonymous reviewers and an associate editor for comments which led to significant improvements in our paper. Kostas Florios acknowledges financial support from the AUEB Research Center and the 'Action Two 2012-2013 Reinforcement of Post-Doctoral Researchers for the promotion of research' program. Yannis Bilias acknowledges funding from the AUEB Research Center under the program for 'Original Academic Publications'. Dr. Alexandros Louka provided excellent research assistance during the second revision.

Appendix A. Solution of our MIP representation of the CLAD estimator using a Branch and Bound approach

There exist several approaches to solving MIP exactly, but one of the most popular, which has been carefully engineered into widely used optimization software packages, is the 'Branch and Bound' approach.14 Note that we refer to this as an 'approach' rather than as an algorithm, since there are various details which are implemented differently across software packages, without there being any universal results on how these details should be determined. Here we will discuss a vanilla implementation of the Branch and Bound algorithm, based on the presentation of Papadimitriou and Steiglitz (1998, pp. 438) and Floudas (1995, pp. 95-107). In the context of our problem, the Branch and Bound method is an approach to implement a computationally efficient optimization of the MIP program (5)-(12) by a sequential partitioning of the solution space, which follows a 'branching' logic across the values of the N binary variables γi.

14 Other approaches include cutting plane methods, decomposition methods and logic-based methods.

Specifically, one begins with a binary tree representation of the MIP-CLAD problem with N levels (the first level corresponds to γ1 and so on), where each node represents a constraint on the original MIP-CLAD, in that it fixes a specific combination of the values of the γ's at its own level and the levels above it. At the bottom, Nth, level of the tree there are 2^N nodes which fully specify all the γ's. Note that a full enumeration of the CLAD estimator can be achieved by solving the 2^N 'easy' linear programs (LPs) corresponding to each node at the bottom level of the tree. The Branch and Bound method achieves efficiency because it is able to determine that the optimal solution cannot lie on some branches of the tree; hence in practice far fewer than 2^N linear programs need to be solved, though there are no general results on how many will be required in any given application.

Generic Branch and Bound approach to solving MIP

We provide our own matlab implementation of this algorithm on https://github.com/kflorios/clad-estimator-mip-bnb. Following Floudas (1995, pp. 101-103), a generic Branch and Bound algorithm for solving our MIP formulation of CLAD can be described in six steps as follows. (To fix ideas, we specify various implementation details which are to some degree arbitrary, but mention in footnotes other approaches that are also widely used.)

Step 1 — Initialization: The MIP program (5)-(12) is chosen as the current candidate subproblem (CS) and as the unique element in a list of candidate subproblems (LCS). Denote the value of the 'incumbent solution' by ζ∗ and initialize it at ζ∗ = +∞; ζ∗ will be compared to the objective function values of the subsequent CSs and updated accordingly.15 Create a binary tree as described earlier, with levels sorted by the value of yi, so that the root node contains the observation with the largest yi.

15 Performance can be improved by using prior knowledge on what constitutes a sensible upper bound on the CLAD objective. The threshold parameter may be initialized at such an upper bound.

Step 2 — Termination: If the LCS is empty, then terminate; the incumbent solution ζ∗ is optimal.16

Step 3 — Selection of candidate subproblem: If the CS has a child in the LCS with γ = 1, set this as the new CS. Otherwise, if the CS has a child in the LCS with γ = 0, set this as the new CS.17 Otherwise, 'backtrack' towards the root node until a node in the LCS is encountered that has a child that has not been evaluated; set this child as the new CS and remove all nodes that have been visited from the LCS.18

Step 4 — Relaxation: Define the relaxed candidate subproblem (RCS) as the CS in which the γi's that have not been fixed to either zero or one, but are choice variables in {0, 1}, are instead 'relaxed' to vary continuously in 0 ≤ γi ≤ 1.

Step 5 — Fathoming: If the RCS is infeasible, so is the CS; therefore go to Step 2 and remove this CS from the LCS without adding any of its children to the LCS. Otherwise, if zRCS ≥ ζ∗, then the current CS has no feasible solution better than the incumbent, so go to Step 2 and remove this CS from the LCS without adding any of its children to the LCS. If zRCS < ζ∗ and the optimal solution of the RCS is feasible for the CS, then record this solution as the new incumbent, set ζ∗ = zRCS, go to Step 2, and remove this CS from the LCS without adding any of its children to the LCS. Otherwise proceed.19

Step 6 — Separation: If the CS has children, add its children to the LCS. Go to Step 2.

Example

For the sake of illustration, consider a very simple example.20 Assume that we wish to compute the CLAD estimate in the model y = max(0, β0 + β1 x + ε) with the sample {(xi, yi)}³i=1 = {(4, 3), (5, 5), (2, 0)}.
The MIP formulation of the CLAD estimator as developed in (5)-(12), with ωi specified as in footnote 4 and parameter bounds set at d1 = d2 = 10, is:

min {sm1 + sm2 + sm3 + sp1 + sp2 + sp3}   s.t.                                        (A.1)

3 − φ1 + sm1 − sp1 = 0,  5 − φ2 + sm2 − sp2 = 0,  0 − φ3 + sm3 − sp3 = 0,
φ1 ≥ β0 + 4β1,  φ2 ≥ β0 + 5β1,  φ3 ≥ β0 + 2β1,
φ1 ≥ 0,  φ2 ≥ 0,  φ3 ≥ 0,
φ1 ≤ β0 + 4β1 + 50(1 − γ1),  φ2 ≤ β0 + 5β1 + 60(1 − γ2),  φ3 ≤ β0 + 2β1 + 30(1 − γ3),
φ1 ≤ 0 + 50γ1,  φ2 ≤ 0 + 60γ2,  φ3 ≤ 0 + 30γ3,
−10 ≤ β0 ≤ 10,  −10 ≤ β1 ≤ 10,
sm1, sm2, sm3 ≥ 0,  sp1, sp2, sp3 ≥ 0,
γ1, γ2, γ3 ∈ {0, 1}.
Applying the Steps outlined above, we have:

Iteration 1: [Step 1] The binary tree is created as displayed in Fig. 2 and the LCS is initialized to include only the MIP formulation of CLAD (A.1). We set ζ∗ = +∞.
Iteration 2: [Step 2] Since the LCS is not empty, we proceed to Step 3.
Iteration 3: [Step 3] The candidate subproblem is defined by the only element of the LCS, i.e., (A.1).
Iteration 4: [Step 4] Solve the relaxed candidate subproblem defined by the above CS but with the restrictions on the γi's replaced by γ1, γ2, γ3 ∈ [0, 1]. This is a standard LP with solution [β0, β1, γ1, γ2, γ3] = [−10.00, −8.00, 0.1000, 0.0833, 0.0000] and objective zRCS = 0.000; LP1 in Fig. 2.
Iteration 5: [Step 5] Since zRCS < ζ∗ and the solution to the RCS is infeasible for the CS, we proceed to Step 6.
Iteration 6: [Step 2] Since the LCS is not empty, we proceed to Step 3.
Iteration 7: [Step 3] We set the new CS as the child of the root node with γ1 = 1. More formally, the CS is program (A.1) with γ1 = 1 and γ2, γ3 ∈ {0, 1}. In words, γ1 is fixed at γ1 = 1 while γ2 and γ3 are free, depicted in Fig. 2 as node 1FF.

16 Note that termination in the case of our MIP problem is simplified by the fact that, by construction, a solution to the MIP problem always exists, since the MIP is equivalent to the CLAD optimization, which always has a solution (not necessarily unique).
17 Obviously, the order with which child nodes are searched is arbitrary but needs to be specified.
18 This is an example of 'LIFO search' or 'depth-first search with backtracking'. Other approaches include breadth-first and best-bound search.
19 This is how Branch and Bound achieves efficiency by avoiding complete enumeration: in this step there are three scenarios in which many potential solutions are simultaneously excluded and do not need to be enumerated.
20 Because of the simplicity of the example we do not employ sorting by the values of y.


Fig. 2. Branch and Bound technique for Example (A.1) with depth-first search (DFS) with backtracking; branching priority is first 1, then 0, in DFS (note: F denotes a free variable).

Iteration 8: [Step 4] Solve the RCS defined by the above CS but with γ2, γ3 ∈ {0, 1} replaced by γ2, γ3 ∈ [0, 1]. This is a standard LP with solution [β0, β1, γ1, γ2, γ3] = [−5.00, 2.00, 1.000, 0.0833, 0.0000] and objective zRCS = 0.000; LP2 in Fig. 2.
Iteration 9: [Step 5] Since this value of γ2 is neither 0 nor 1, this solution is not a feasible solution to the above CS, so we go to Step 2.
Iteration 10: [Step 2] Since the LCS is not empty, we proceed to Step 3.
Iteration 11: [Step 3] We set the new CS as the child of node 1FF with γ2 = 1. The CS is now at node 11F and is defined as program (A.1) with γ1 = 1, γ2 = 1, γ3 ∈ {0, 1}.
Iteration 12: [Step 4] The RCS is now program (A.1) with γ1 = 1, γ2 = 1, γ3 ∈ [0, 1] and is solved by [β0, β1, γ1, γ2, γ3] = [−5.00, 2.00, 1, 1, 0] with objective zRCS = 0.000; LP3 in Fig. 2.
Iteration 13: [Step 5] Note that this RCS solution is feasible for the CS and is therefore optimal for it. Since zRCS < ζ∗, we set ζ∗ = zRCS = 0, record [−5.00, 2.00, 1, 1, 0] as the incumbent solution, and proceed to Step 2.
Iteration 14: [Step 2] Since the LCS is not empty, we proceed to Step 3.
Iteration 15: [Steps 3, 4] Backtracking leads us to node 10F, i.e., program (A.1) with γ1 = 1, γ2 = 0, γ3 ∈ [0, 1], with solution [β0, β1, γ1, γ2, γ3] = [0.00, 0.00, 1, 0, 0] and zRCS = 8; LP4 in Fig. 2.
Iteration 16: [Step 5] Since zRCS > ζ∗, we go to Step 2 without adding the child nodes of 10F to the LCS. This is indicated by the horizontal line beneath node 10F in Fig. 2.
Iteration 17: [Step 2] Since the LCS is not empty, we proceed to Step 3.
Iteration 18: [Steps 3, 4] Backtracking leads us to node 0FF, i.e., program (A.1) with γ1 = 0, γ2 ∈ [0, 1], γ3 ∈ [0, 1], and [β0, β1, γ1, γ2, γ3] = [−10.00, −8.00, 0, 0.0833, 0] with zRCS = 3; LP5 in Fig. 2.
Iteration 19: [Step 5] Since zRCS > ζ∗, we go to Step 2 without adding the child nodes of 0FF to the LCS. This is indicated by the horizontal line beneath node 0FF in Fig. 2.
Iteration 20: [Step 2] The LCS is empty, hence the CLAD estimate can be derived from the incumbent solution as [−5.00, 2.00].

This example demonstrates that Branch and Bound was able to find the solution by solving just 5 instead of the 2^3 = 8 linear programs which a naive complete enumeration would require. As discussed in the main text, the MIP formulation benefits from comparing incumbent solutions during the search with infeasible (relaxed) solutions that bound the values of several feasible parameters. When this bound is above an incumbent solution, the feasible parameters themselves need not be evaluated (such parameters are referred to as fathomed in the MIP literature and in Table 14). This happened quite dramatically in Iteration 19, where the superiority of the incumbent solution over the relaxed objective eliminated the need to solve all the additional LPs of the nodes further down Iteration 19's branch.

Appendix B. MIP formulations for two other estimators

In this Appendix, we present our MIP formulations for the Fitzenberger (1997a) model for doubly censored data and the Powell (1986) model for censored regression quantiles.


Table 14
Summary of trace for Branch and Bound technique with depth first search (DFS) with backtracking for Example 1; branching priority is first 1, then 0, in DFS.

Step no.   Node   [β0, β1, γ1, γ2, γ3]                 Objective function value   Comment
0 (root)   F-F-F  [−10.00, −8.00, 0.100, 0.0833, 0]    0                          Relaxed
1          1-F-F  [−5.00, 2.00, 1, 0.0833, 0]          0                          Relaxed
2          1-1-F  [−5.00, 2.00, 1, 1, 0]               0.000*                     Integer
3          1-0-F  [0.00, 0.00, 1, 0, 0]                8                          Fathomed
4          0-F-F  [−10.00, −8.00, 0, 0.0833, 0]        3                          Fathomed
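The depth-first search with fathoming summarized in Table 14 follows the generic six-step control flow of this appendix. As an illustration of that control flow only, here is a minimal Python sketch of depth-first Branch and Bound on a toy 0/1 knapsack problem, where a greedy fractional bound plays the role of the LP relaxation (this is our own illustrative example; it is not the CLAD LP relaxation, which requires an LP solver):

```python
from itertools import product

def bb_knapsack(values, weights, capacity):
    """Depth-first Branch and Bound with fathoming (cf. Steps 1-6),
    maximizing the value of a 0/1 knapsack; the relaxation is the
    greedy fractional bound over the remaining items."""
    n = len(values)
    # branch on items in decreasing value/weight ratio
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def bound(level, value, room):
        # Step 4 (Relaxation): remaining items may be taken fractionally
        b = value
        for i in order[level:]:
            if weights[i] <= room:
                room -= weights[i]
                b += values[i]
            else:
                b += values[i] * room / weights[i]
                break
        return b

    best = [0]  # Step 1 (Initialization): incumbent value

    def dfs(level, value, room):
        if room < 0:                      # Step 5 (Fathoming): infeasible node
            return
        if level == n:                    # leaf: integer-feasible solution
            best[0] = max(best[0], value)
            return
        if bound(level, value, room) <= best[0]:
            return                        # Step 5 (Fathoming): bound not better
        i = order[level]
        dfs(level + 1, value + values[i], room - weights[i])  # branch: take item
        dfs(level + 1, value, room)                           # branch: skip item

    dfs(0, 0, capacity)                   # Steps 2, 3 and 6 via the recursion
    return best[0]

def enumerate_knapsack(values, weights, capacity):
    """Complete enumeration of all 2^n take/skip patterns, for comparison."""
    return max(
        sum(v for v, take in zip(values, bits) if take)
        for bits in product([0, 1], repeat=len(values))
        if sum(w for w, take in zip(weights, bits) if take) <= capacity
    )
```

As in the CLAD example, the bound lets the search skip most of the 2^n leaves: both routines return the same optimum, but the Branch and Bound version visits only a fraction of the tree.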

B.1. MIP model for CLAD estimator with doubly censored data

Consider the case of the linear regression model with doubly censored data, where Li and Ui, the left and right censoring values for observation i respectively, are fixed and known in advance:

yi = max{min(x′iβ + ui, Ui), Li},   i = 1, . . . , N.

The Doubly Censored LAD (DCLAD) estimator for the (p × 1) parameter β in this model is defined by the minimization

min_β  ∑_{i=1}^{N} |yi − max{min(x′iβ, Ui), Li}|        (B.1)

For the MIP formulation of (B.1), we introduce the 2N continuous decision variables

φi^(h) = min(x′iβ, Ui),   φi = max{min(x′iβ, Ui), Li},   i = 1, . . . , N.

The MIP optimization program is:

min_{β, (γi, δi, φi, φi^(h), smi, spi; i = 1, . . . , N)}  ∑_{i=1}^{N} (smi + spi)   s.t.

yi − φi + smi − spi = 0,          i = 1, . . . , N
φi ≥ φi^(h),                      i = 1, . . . , N
φi ≥ Li,                          i = 1, . . . , N
φi ≤ φi^(h) + ωi · γi,            i = 1, . . . , N
φi ≤ Li + ωi · (1 − γi),          i = 1, . . . , N
φi^(h) ≤ x′iβ,                    i = 1, . . . , N
φi^(h) ≤ Ui,                      i = 1, . . . , N
φi^(h) + ωi · δi ≥ x′iβ,          i = 1, . . . , N
φi^(h) + ωi · (1 − δi) ≥ Ui,      i = 1, . . . , N
γi + δi ≤ 1,                      i = 1, . . . , N
smi ≥ 0, spi ≥ 0,                 i = 1, . . . , N
γi, δi ∈ {0, 1},                  i = 1, . . . , N

It is easily verified that (i) when γi = δi = 0, then φi = x′iβ; (ii) when γi = 1 and δi = 0, then φi = Li; (iii) when γi = 0 and δi = 1, then φi = Ui; and (iv) γi = 1 and δi = 1 cannot both hold, due to the constraint γi + δi ≤ 1.
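The case analysis (i)-(iv) can be verified numerically by propagating the big-M inequalities above for a single observation; a small Python sketch (the helper name and the numeric values are ours, for illustration, and ωi is assumed large enough to be a valid big-M):

```python
def resolve_phi(gamma, delta, xb, L, U, omega):
    """Resolve phi_h = min(x'b, U) and phi = max(phi_h, L) as pinned down
    by the disjunctive constraints of the DCLAD MIP for one observation."""
    # phi_h <= x'b, phi_h <= U, phi_h + omega*delta >= x'b,
    # phi_h + omega*(1 - delta) >= U
    ub_h = min(xb, U)
    lb_h = max(xb - omega * delta, U - omega * (1 - delta))
    assert lb_h == ub_h, "phi_h not pinned down for this (gamma, delta)"
    phi_h = ub_h
    # phi >= phi_h, phi >= L, phi <= phi_h + omega*gamma,
    # phi <= L + omega*(1 - gamma)
    lb = max(phi_h, L)
    ub = min(phi_h + omega * gamma, L + omega * (1 - gamma))
    assert lb == ub, "phi not pinned down for this gamma"
    return phi_h, lb

# with L = 0, U = 10 and big-M omega = 100:
print(resolve_phi(0, 0, 4, 0, 10, 100))   # case (i):   interior, phi = x'b
print(resolve_phi(1, 0, -3, 0, 10, 100))  # case (ii):  left censored, phi = L
print(resolve_phi(0, 1, 15, 0, 10, 100))  # case (iii): right censored, phi = U
```

Each call confirms that, once (γi, δi) are fixed, the inequalities leave φi^(h) and φi no freedom, which is exactly what allows the binary variables to encode the nested max/min operators.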

B.2. MIP model for Censored Quantile Regression (CQR) estimator

Consider the case of the linear regression model with left-censored data, where Li, the left censoring value for observation i, is fixed and known in advance. For given τ, Powell's CQR estimator for the (p × 1) parameter β is defined by the minimization

min_β  ∑_{i=1}^{N} ρτ(yi − max(x′iβ, Li)),        (B.2)

where the check function ρτ(·) is defined as ρτ(z) = (τ − 1(z ≤ 0))z.


The CQR-MIP optimization program is:

min_{β, (γi, φi, smi, spi; i = 1, . . . , N)}  ∑_{i=1}^{N} ((1 − τ) · smi + τ · spi)   s.t.

yi − φi + smi − spi = 0,        i = 1, . . . , N
φi ≥ x′iβ,                      i = 1, . . . , N
φi ≥ Li,                        i = 1, . . . , N
φi ≤ x′iβ + ωi · (1 − γi),      i = 1, . . . , N
φi ≤ Li + ωi · γi,              i = 1, . . . , N
smi ≥ 0, spi ≥ 0,               i = 1, . . . , N
γi ∈ {0, 1},                    i = 1, . . . , N
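At the optimum, smi and spi act as the negative and positive parts of the residual yi − φi, so the weighted objective (1 − τ) · smi + τ · spi reproduces the check function ρτ. A quick Python sketch of this identity (function names are ours, for illustration):

```python
def rho(tau, z):
    # Koenker-Bassett check function: rho_tau(z) = (tau - 1(z <= 0)) * z
    return (tau - (1.0 if z <= 0 else 0.0)) * z

def weighted_slacks(tau, y, phi):
    # sm, sp >= 0 with y - phi + sm - sp = 0, taken as the negative and
    # positive parts of the residual y - phi
    sm = max(phi - y, 0.0)
    sp = max(y - phi, 0.0)
    return (1.0 - tau) * sm + tau * sp

for tau in (0.25, 0.5, 0.75):
    for y, phi in ((5.0, 3.0), (3.0, 5.0), (2.0, 2.0)):
        assert abs(rho(tau, y - phi) - weighted_slacks(tau, y, phi)) < 1e-12
```

Note that at τ = 0.5 the objective reduces, up to a factor of 2, to the absolute deviations of the CLAD program of Section B.1.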

Appendix C. Estimation details and simulation designs

C.1. Computation algorithms and estimators

Algorithms for finding the minimizer of the Powell (1984) CLAD objective function and alternative estimators are discussed in Section 2 and presented in the tables with the following acronyms:

(i) MIP is our proposed algorithm with an indefinite time budget.
(ii) MIPLtd is MIP with a limited time budget.
(iii) BRC is the BRCENS algorithm for censored quantile regression of Fitzenberger (1994).
(iv) MIPLtdBRC uses the MIPLtd solution as a starting point for BRCENS.
(v) ILP is the iterative linear programming algorithm of Buchinsky (1994).
(vi) mBRC is BRCENS with multiple starting points, as described in Section 2.2.
(vii) TA is a threshold accepting algorithm proposed by Fitzenberger and Winker (2007) for CLAD estimation.
(viii) CHLT is the Laplace-type estimator of Chernozhukov and Hong (2003).
(ix) CH3ST is the three-step estimator of Chernozhukov and Hong (2002).
(x) POR is the estimator of Portnoy (2003).
(xi) PH is the estimator of Peng and Huang (2008).
(xii) TOB is the Tobit estimator.

C.2. Computation details

All runs in Tables 8-13 were conducted on a computer with an Intel Core 2 Quad CPU Q9550 at 2.83 GHz with 16 GB RAM running Windows Server 2008 R2 Enterprise 64 bit. MIP runs and IPOL runs in Table 7 were executed on a notebook with an i5 processor at 2.40 GHz and 3 GB RAM running Windows 7. IPOL runs were executed using the Intel Visual Fortran Compiler 11.1; MIP runs were executed using GAMS CPLEX software. No parallelization was used in any calculation.

BRC, POR and PH use the respective implementations in the quantreg package v4.79 for R, with default parameters, except that to obtain POR and PH estimates for the median we modified the default grid to have a step size of 0.01 in [0, 1]. In the case of BRC, the default starting value is the regression median estimate treating every observation as uncensored, which is often zero when there is heavy censoring.

Implementation of TA used the function TAopt from the NMOF package v1.1-0 for R (Schumann, 2017). CHLT estimates and standard errors are produced using Gallant and Tauchen's EMM software package. Our baseline CHLT implementation uses default values everywhere and zero as the starting value, with Markov Chains that have 10 strides with 100,000 iterations per stride.

In the empirical applications, ILP and CH3ST estimates and confidence intervals were computed in STATA using the contributed commands clad and cqiv respectively; confidence intervals are based on the bootstrap with 100 replications. In the simulation exercises, ILP and CH3ST were coded in R to accommodate the case of variable censoring; the CH3ST code was based on an extract from the Counterfactual R package and used its default specifications. In the empirical applications, standard errors for all MIP-based CLAD calculations are based on our implementation of the Bilias et al. (2000) resampling method with 100 bootstrap replications, while POR and PH standard errors are computed using 1000 replications and default quantreg options.


Table 15
Data generating processes (DGPs) used in the simulation study of Fitzenberger and Winker (2007).

DGP   yci              (β1, β2, β3, β4)          Regressors
(A)   C                (0, 0, 0, 0)              xi2, xi3 ∼ N(0, 1)
(B)   C                (0, 0, 0, 0)              xi2 = −9.9 + 0.2i, xi3 = xi2²
(C)   C + 0.5          (0.5, 0.5, −0.5, 0.5)     xi2, xi3 ∼ N(0, 1)
(D)   C + 0.5          (0.5, 0.5, −0.5, 0.5)     xi2 = −9.9 + 0.2i, xi3 = xi2²
(E)   N(C, 1)          (0, 0, 0, 0)              xi2, xi3 ∼ N(0, 1)
(F)   N(C, 1)          (0, 0, 0, 0)              xi2 = −9.9 + 0.2i, xi3 = xi2²
(G)   N(C + 0.5, 1)    (0.5, 0.5, −0.5, 0.5)     xi2, xi3 ∼ N(0, 1)
(H)   N(C + 0.5, 1)    (0.5, 0.5, −0.5, 0.5)     xi2 = −9.9 + 0.2i, xi3 = xi2²

Notes: C takes values in {0, 0.5, 1}. In all designs xi4 ∼ i.i.d. N(0, 1) and εi ∼ N(0, 1).
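As an illustration, the designs in Table 15 can be generated with a few lines of code. The Python sketch below (the paper's own implementations are in R, Fortran and GAMS) assumes a sample size n and that the deterministic grid index i runs from 1 to n; both are assumptions for illustration rather than details taken from the paper.

```python
import numpy as np

def simulate_fw(dgp, C=0.0, n=100, seed=0):
    """Draw one sample from designs (A)-(H) of Table 15 with k = 4.

    Right-censored model: y_i = min{y_i^c, b1 + b2*x_i2 + b3*x_i3 + b4*x_i4 + e_i}.
    Sketch of the published designs; n, seed and the grid indexing i = 1..n
    are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n + 1)
    if dgp in ("A", "C", "E", "G"):          # random regressors
        x2 = rng.normal(size=n)
        x3 = rng.normal(size=n)
    else:                                     # deterministic grid (B, D, F, H)
        x2 = -9.9 + 0.2 * i
        x3 = x2 ** 2
    x4 = rng.normal(size=n)
    beta = (0.5, 0.5, -0.5, 0.5) if dgp in ("C", "D", "G", "H") else (0.0,) * 4
    shift = 0.5 if dgp in ("C", "D", "G", "H") else 0.0
    noise = rng.normal(size=n) if dgp in ("E", "F", "G", "H") else np.zeros(n)
    yc = C + shift + noise                    # censoring point per observation
    ystar = beta[0] + beta[1] * x2 + beta[2] * x3 + beta[3] * x4 + rng.normal(size=n)
    y = np.minimum(yc, ystar)                 # right censoring
    return y, np.column_stack([x2, x3, x4]), yc
```

For example, under design (A) with C = 0 the latent outcome is pure N(0, 1) noise censored at zero, so roughly half of the simulated observations are censored, consistent with the approximate censoring degrees reported below.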

C.3. Data generating processes (DGPs)

In our simulations we adopted DGPs that have previously been used in three influential studies in the censored quantile regression literature.

Fitzenberger and Winker (2007) DGPs: These DGPs are based on the right-censored regression model:

yi = min{yci, β1 + ∑(j=2..k) βj xij + εi}.   (C.1)

The DGPs (A)–(H) described in Table 15 (which reproduces Table 1 from Fitzenberger and Winker, 2007) specify the parameters, the distributions of the random variables, and the censoring values in the above equation for k = 4. The choice of the censoring parameter C determines the degree of censoring in the simulated samples: when C = 0 the approximate degree of censoring is 50%, while for C = 0.5 and C = 1 the percentage of censored observations is approximately 35% and 25%, respectively. The DGPs for dimensions k = 3 and k = 2 are derived by removing the extra parameters and regressors. By varying C and k together we simulate a total of 72 DGPs.

Khan and Powell (2001) DGPs: Six of their DGPs are based on the left-censored regression model:

yi = max{0, β1 + β2 xi + σi εi}.   (C.2)

The intercept β1 is adjusted in each case to generate a censoring level of 50%, while β2 = 1 in all six DGPs. In the first three designs the single regressor xi follows a Uniform[−√3, √3] distribution. In the other three designs xi follows a mixture of the two normal distributions N(2, 0.25²) and N(−2, 0.25²) with mixing probability 0.5. The error εi is generated as i.i.d. N(0, 1). The scale of the error term takes three different forms: either σi = 1, or σi = c·exp(0.75 β2 xi), or σi = c·exp(−0.75 β2 xi), where the constant c is set so that the average value of the scale is 1. To study the effect of increased dimensionality, a seventh design, borrowed from Buchinsky and Hahn (1998), is employed with five regressors i.i.d. N(0, 1), εi ∼ N(0, 5²) and σi = 1. The intercept is set to 1, the slope parameters are (1, 0.5, −1, −0.5, 0.25), and the censoring value is −0.75.

Chernozhukov and Hong (2003) DGP: Their left-censored simulated model is:

yi = max{0, β1 + β2 xi2 + β3 xi3 + β4 xi4 + σi εi},  εi ∼ N(0, 1),  σi = (xi2)²,   (C.3)

with xi2, xi3, xi4 ∼ N(0, 1) and (β1, β2, β3, β4) = (−6, 3, 3, 3).

Appendix D. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jeconom.2019.05.016.

References

Achterberg, T., 2009. SCIP: solving constraint integer programs. Math. Program. Comput. 1 (1), 1–41.
Andrews, D., 1997. A stopping rule for the computation of generalized method of moments estimators. Econometrica 65 (4), 913–931.
Barrodale, I., Roberts, F.D.K., 1973. An improved algorithm for discrete L1 linear approximation. SIAM J. Numer. Anal. 10 (5), 839–848.
Bilias, Y., Chen, S., Ying, Z., 2000. Simple resampling methods for censored regression quantiles. J. Econometrics 99 (2), 373–386.
Buchinsky, M., 1994. Changes in the U.S. wage structure 1963–1987: Application of quantile regression. Econometrica 62 (2), 405–458.
Buchinsky, M., Hahn, J., 1998. An alternative estimator for the censored quantile regression model. Econometrica 66 (3), 653–671.
Chen, S., 2018. Sequential estimation of censored quantile regression models. J. Econometrics 207 (1), 30–52.
Chen, L.-Y., Lee, S., 2018. Best subset binary prediction. J. Econometrics 206 (1), 39–56.
Chernozhukov, V., Hong, H., 2002. Three-step censored quantile regression and extramarital affairs. J. Amer. Statist. Assoc. 97 (459), 872–882.
Chernozhukov, V., Hong, H., 2003. An MCMC approach to classical estimation. J. Econometrics 115 (2), 293–346.
Fafchamps, M., Gunning, J.W., Oostendorp, R., 2000. Inventories and risk in African manufacturing. Econ. J. 110 (466), 861–893.
Fair, R.C., 1978. A theory of extramarital affairs. J. Political Econ. 86 (1), 45–61.


Fitzenberger, B., 1994. A Note on Estimating Censored Quantile Regressions. Discussion Paper, Center for International Labor Economics (CILE), University of Konstanz.
Fitzenberger, B., 1997a. A guide to censored quantile regressions. In: Maddala, G.S., Rao, C.R. (Eds.), Robust Inference. Handbook of Statistics, Vol. 15. North-Holland, Amsterdam, pp. 405–437.
Fitzenberger, B., 1997b. Computational aspects of censored quantile regression. In: Dodge, Y. (Ed.), Proceedings of the 3rd International Conference on Statistical Data Analysis Based on the L1-Norm and Related Methods. IMS Lecture Notes Series, Vol. 31. Hayward, CA, pp. 171–186.
Fitzenberger, B., Winker, P., 2007. Improving the computation of censored quantile regressions. Comput. Statist. Data Anal. 52 (1), 88–108.
Florios, K., Skouras, S., 2008. Exact computation of max weighted score estimators. J. Econometrics 146 (1), 86–91.
Floudas, C.A., 1995. Nonlinear and Mixed-Integer Optimization: Fundamentals and Applications. Oxford University Press.
Khan, S., Powell, J., 2001. Two-step estimation of semiparametric censored regression models. J. Econometrics 103 (1–2), 73–110.
Kitagawa, T., Tetenov, A., 2018. Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica 86 (2), 591–616.
Koenker, R., 2008. Censored quantile regression redux. J. Stat. Softw. 27 (6), 1–25.
Koenker, R., 2012. quantreg: Quantile Regression. R package version 4.79.
Koenker, R., Park, B.J., 1996. An interior point algorithm for nonlinear quantile regression. J. Econometrics 71 (1–2), 265–283.
Lin, G., He, X., Portnoy, S., 2012. Quantile regression with doubly censored data. Comput. Statist. Data Anal. 56 (4), 797–812.
Manski, C., 1975. Maximum score estimation of the stochastic utility model of choice. J. Econometrics 3 (3), 205–228.
Nemhauser, G.L., Wolsey, L.A., 1999. Integer and Combinatorial Optimization. Wiley.
Papadimitriou, C.H., Steiglitz, K., 1998. Combinatorial Optimization: Algorithms and Complexity. Dover.
Peng, L., Huang, Y., 2008. Survival analysis with quantile regression models. J. Amer. Statist. Assoc. 103 (482), 637–649.
Portnoy, S., 2003. Censored regression quantiles. J. Amer. Statist. Assoc. 98 (464), 1001–1012.
Portnoy, S., 2010. Is ignorance bliss: Fixed vs. random censoring. In: Antoch, J., Huskova, M., Sen, P.K. (Eds.), Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in Honor of Professor Jana Jureckova, Vol. 7. IMS, pp. 215–223.
Powell, J., 1984. Least absolute deviations estimation for the censored regression model. J. Econometrics 25 (3), 303–325.
Powell, J., 1986. Censored regression quantiles. J. Econometrics 32 (1), 143–155.
Schumann, E., 2017. NMOF: Numerical Methods and Optimization in Finance. R package version 1.1-0.
Tang, Y., Wang, H., He, X., Zhu, Z., 2012. An informative subset-based estimator for censored quantile regression. Test 21 (4), 635–655.
Womersley, R.S., 1986. Censored discrete linear l1 approximation. SIAM J. Sci. Stat. Comput. 7 (1), 105–122.