Journal of Econometrics 166 (2012) 33–48
Contents lists available at SciVerse ScienceDirect
Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom
IV models of ordered choice Andrew Chesher ∗ , Konrad Smolinski CeMMAP & UCL, United Kingdom
article
info
Article history: Available online 26 June 2011 JEL classification: C10 C14 C50 C51 Keywords: Endogeneity Incomplete models Instrumental variables Ordered choice Ordered probit Set identification Threshold crossing models
abstract This paper studies single equation instrumental variable models of ordered choice in which explanatory variables may be endogenous. The models are weakly restrictive, leaving unspecified the mechanism that generates endogenous variables. These incomplete models are set, not point, identifying for parametrically (e.g. ordered probit) or nonparametrically specified structural functions. The paper gives results on the properties of the identified set for the case in which potentially endogenous explanatory variables are discrete. The results are used as the basis for calculations showing the rate of shrinkage of identified sets as the number of classes in which the outcome is categorised increases. © 2011 Elsevier B.V. All rights reserved.
1. Introduction This paper studies single equation instrumental variable models for ordered outcomes in which explanatory variables may be endogenous. These models arise in structural econometric analysis of individuals’ choices amongst ordered alternatives, or of individuals’ attitudes arranged on an ordinal scale and they arise in many other settings in which data are interval censored continuous outcomes. A common ploy when dealing with endogenous variation in a discrete response situation is to presume that the discrete response is generated in a recursive, triangular system along with the endogenous variable. Then, calling on some further restrictions, a control function method is used as the basis for identification and estimation. See for example Smith and Blundell (1986), Rivers and Vuong (1988), Blundell and Powell (2003, 2004) and Chesher (2003).1 Unfortunately this control function strategy does not generally work when endogenous variables are discrete.2 And, as explained
∗
Corresponding author. E-mail address:
[email protected] (A. Chesher).
1 The control function approach is used quite widely in applied econometric practice. STATA, Statacorp (2007) and LIMDEP, Greene (2007), are examples of widely used proprietary software suites armed with commands to conduct control function estimation of models of binary responses. 2 Chesher (2005) gives partial identification results for a control function model with discrete endogenous variables. 0304-4076/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2011.06.004
in Chesher (2009), the triangular model imposes strong restrictions on the process generating the endogenous variables, restrictions which may not be found plausible in many econometric settings. By contrast here we work with a limited information, single equation model which is far less restrictive in this regard. The model requires that a scalar ordered outcome Y , with M ≥ 2 points of support, is determined by a structural function h(X , U ) which is weakly monotone in scalar unobserved continuously distributed U. The observed vector of explanatory variables, X , and U may not be independently distributed. However, the model requires that U is distributed independently of instruments, Z . We call the model a Single Equation Instrumental Variable (SEIV) model. The SEIV model requires that the support of X does not depend on the value of U but places no other restrictions on the process generating the endogenous variables, X , and in this respect it is incomplete. Thinking about Manski’s ‘‘Law of Decreasing Credibility’’, Manski (2003), encourages us to take this approach. It allows one to see what is lost by relaxing the strong restrictions of the triangular control function model. It turns out that what is lost is point identification because the SEIV model is generally set not point identifying. This paper focusses on models with discrete endogenous variables, having K points of support, {x1 , . . . , xK }, and explores the identified sets the SEIV model delivers. The main results are now summarised. Since the structural functions of an SEIV model are monotone in scalar U, there is a threshold crossing representation in which U is
34
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
normalised marginally uniformly distributed on the unit interval.
1 2 h(X , U ) ≡ . . .
M
, , .. .
,
0 ≤ U ≤ h1 (X ) h1 (X ) < U ≤ h2 (X )
.. .
hM −1 (X ) < U ≤ 1.
In the discrete endogenous variable case a nonparametrically specified structural function, h, is characterised by N = K ×(M − 1) parameters, denoted γ , which are the values of the M − 1 threshold functions at the K values of X . Let H 0 (Z) denote the set of values of γ identified by the SEIV 0 model given FYX |Z , a probability distribution for Y and X conditional on Z , when Z takes values in a set Z. Each structural function is characterised by a point in the unit N-cube and H 0 (Z) is a subset of that space. The identified set delivered by a nonparametric SEIV model is shown to be a union of convex sets each defined by a system of linear equalities and inequalities. The number of sets involved can be enormous in what seem to be small scale problems. For example, when M = K = 5 there may be over 300 billion component sets. The result is generally not a convex set unless instruments are strong. We give examples in which the identified set is not convex and, indeed, not connected. Shape restrictions (e.g. monotonicity) or parametric restrictions can bring substantial simplification. A system of inequalities given in Chesher (2010) defines an outer set, C 0 (Z), within which the SEIV model’s identified set lies. We develop expressions for these inequalities for the M outcome, discrete endogenous variable case. We propose a second system of inequalities defining a set of values of γ , D 0 (Z), and show that the identified set resides in the intersection C˜ 0 (Z) ≡ C 0 (Z) ∩ D 0 (Z). When the outcome Y is binary C 0 (Z) is a subset of D 0 (Z) and, as shown in Chesher (2010), in that case C 0 (Z) is the identified set H 0 (Z). Here we show that when the endogenous variable is binary C˜ 0 (Z) is the identified set, however many categories there are for Y . The nonparametric SEIV model is falsifiable as the example in Appendix C demonstrates. We examine the impact of response discreteness on the identified sets. The discrete response model studied here is a nonadditive error model and the results for such models for continuous outcomes given in Chernozhukov and Hansen (2005) show that there can be point identification in SEIV models when observed responses are continuous. So it is to be expected that as the number of categories indicated by Y increases there is reduction in ambiguity and an approach to point identification. We investigate this in the context of a model with parametrically specified structural functions such as arise in ordered probit models. We find that in the cases considered identified sets for a parameter which is a coefficient in a linear index shrink at a rate approximately equal to the inverse of the square of the number of classes in which the outcome is categorised. In the example, when Y is categorised into 10 or more classes, the SEIV model delivers identified sets which are very small indeed. The paper is organised as follows. Section 2 gives a formal definition of the SEIV model and defines its identified set of structural functions. Section 3 develops the main results for nonparametrically specified structural functions with discrete endogenous variables. In Section 3.1, a piecewise uniform system of conditional distributions of U given X and Z is introduced and conditions under which a structural function lies in the identified set are stated. The geometry of the identified set for nonparametrically specified structural functions is discussed in Section 3.2, the impact of shape restrictions on the complexity of the identified set is considered in Section 3.3, and systems of inequalities obeyed by values of these functions that lie in the identified set are set out in Section 3.4.
Section 4 illustrates the results using a parametrically specified model which, in the absence of endogeneity, would be a conventional ordered probit model. This section gives results on the rate of shrinkage of identified sets as the number of categories of the discrete outcome increases. It also demonstrates the impact of instrument support on the identified set. Section 5 concludes. 2. An IV model for ordered outcomes In the SEIV model a scalar ordered outcome Y is determined by observable X , which may be a vector, and unobserved scalar U. Restriction 1 defines admissible structural functions. Restriction 1. Y is determined by a structural function as follows:
1 2 Y = h(X , U ) ≡ . . .
M
, , .. .
,
h0 (X ) ≤ U ≤ h1 (X ) h1 ( X ) < U ≤ h2 ( X )
.. .
hM −1 (X ) < U ≤ hM (X )
with, for all x, h0 (x) = 0 and hM (x) = 1 and for all x and m, hm (x) > hm−1 (x). U is continuously distributed and normalised to have a marginal uniform distribution on [0, 1]. The support of X conditional on U is independent of U. Specifying the values of Y to be the first M integers is an innocuous normalisation because Y is an ordered outcome. U and X are not required to be independently distributed so the model allows elements of X to be endogenous. However, U is required to be distributed independently of instrumental variables, Z , as set out in Restriction 2. Restriction 2. There are instrumental variables Z which take values in some set Z. U and Z are independently distributed in the sense that the conditional distribution function of U given Z = z satisfies FU |Z (u|z ) = u for all u ∈ [0, 1] and z ∈ Z. Restriction 1 excludes the instrumental variables from the structural function. Now consider the identifying power of this model. 0 Let FYX |Z denote some distribution function of Y and X conditional on Z . Imagine a situation in which data are informative about this distribution for values of Z that lie in a set Z. If this distribution function is compatible with the SEIV model, then there exists (i) a structural function h0 with threshold functions {h0m }M m=1 0 and (ii) a distribution function FUX |Z , both admitted by the SEIV model and such that the following condition holds when h = h0 0 and FUX |Z = FUX |Z . 0 FYX |Z (m, x|z ) = FUX |Z (hm (x), x|z ),
for all: z ∈ Z, m and x.
(1)
This relationship must hold because the threshold crossing Restriction 1 requires that Y is less than or equal to m when X = x if and only if U is less than or equal to the mth threshold function hm (x). There may be more than one admissible structure (h, FUX |Z ) satisfying (1) because it may be possible to compensate for variations in the x-sensitivity of the threshold functions {hm }M m=1 by adjusting the u- and x-sensitivity of FUX |Z leaving the left-hand side of (1) unchanged while respecting the independence Restriction 2. So the model is partially identifying. 0 0 For a distribution FYX |Z let S (Z) denote the set of structures identified by the model comprising Restrictions 1 and 2. This is the set of structures admitted by the SEIV model and satisfying (1). The set of structural functions identified by the model, denoted H 0 (Z),
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
is the set of structural functions h which are elements of structures lying in the identified set.
H (Z) ≡ {h : ∃ admissible FUX |Z s.t. (h, FUX |Z ) ∈ S (Z)}. 0
0
The set H 0 (Z) is a projection of the set S 0 (Z). This set is difficult to characterise and compute when X is continuously distributed because determining whether there exists a distribution function FUX |Z that can accommodate a particular structural function may require searching across an infinite-dimensional space of functions. However, Chesher (2010) shows that when Y is binary the identified set is determined by a system of inequalities in which the distribution function FUX |Z does not appear. One of the contributions of this paper is a similar result for the case in which a scalar endogenous explanatory variable X is binary and Y takes any number of values. When X is discrete, say with K points of support, the distribution function FUX |Z can be characterised by a finite number of parameters for each value of Z and the identified set can be computed when M and K are not too large. The remainder of the paper studies the case in which the explanatory variable, X , is discrete. 3. Identified sets with discrete endogenous variables 3.1. Identification When X is discrete and K -valued with X ∈ {xi }Ki=1 , the threshold functions are characterised by the parameters
γmi ≡ hm (xi ),
m ∈ {0, . . . , M }, i ∈ {1, . . . , K }
(2)
of which N ≡ (M − 1)K are free, that is not restricted to be K zero or one. Define γ i ≡ {γmi }M m=0 and γ ≡ {γi }i=1 with, for all i ∈ {1, . . . , K }, γ0i ≡ 0, γMi ≡ 1. In the discrete X case considered here an identified set of structural functions is a set of values of γ , comprising a subset of the unit N-cube. When determining whether a structural function characterised by a value of γ lies in the identified set it is sufficient to search across distribution functions which, at each value z of the instrumental variables are characterised by the following parameters.
βmij (z ) ≡ FU |XZ (γmi |xj , z ), m ∈ {0, 1, . . . , M }, (i, j) ∈ {1, . . . , K }. Let β(z ) denote the list of values βmij (z ), m ∈ {0, . . . , M }, (i, j) ∈ {1, . . . , K } for some value z. For all (i, j) ∈ {1, . . . , K } define β0ij (z ) ≡ 0 and βMij (z ) ≡ 1. Let β(Z) denote the list of values of β(z ) generated as z varies across Z. Values βmij (z ) with i = j are relevant because observational equivalence requires that if γ lies in the identified set then for each z ∈ Z, m and i the equality FU |XZ (γmi |xi , z ) = FY0|XZ (m|xi , z )
(3)
must hold. This follows directly from (1) on replacing x by xi and hm (xi ) by γmi as in (2). The conditional distribution FX0|Z is identified so (3) is effectively the observational equivalence condition (1). Values of βmij (z ) with i ̸= j are also relevant because the independence Restriction 2 together with the uniform distribution normalisation of the marginal distribution of U requires that for each m, i and z the following condition holds.
K − j =1
FU |XZ (γmi |xj , z ) Pr0 [X = xj |Z = z ] = γmi
Here EX0|Z =z indicates expectation taken with respect to the distribution FX0|Z with the conditioning variable Z taking the value z. So, for each point xj in the support of X the values of the conditional distribution functions, FU |XZ (u|xj , z ), at each value of u ∈ γ are relevant when determining whether γ is in the identified set. Other values of u are not relevant because they play no role in the fulfilment of the observational equivalence condition (3) or the independence condition (4). If γmi and γm′ i′ are adjacent3 values of the threshold parameters then the definition of FU |XZ for any values, xj and z of the conditioning variables can be completed by connecting FU |XZ (γmi |xj , z ) and FU |XZ (γm′ i′ |xj , z ) with straight line segments delivering histogramlike piecewise uniform conditional probability density functions.4 Let
δi (z ) ≡ Pr[X = xi |Z = z ]
i ∈ {1, . . . , K }
be conditional probabilities of X given Z . Let Pr0 denote probabilities calculated using a particular 0 distribution function FYX |Z so that
δi0 (z ) ≡ Pr0 [X = xi |Z = z ] i ∈ {1, . . . , K } and define δ0 (z ) ≡ {δi0 (z )}Ki=1 . Define conditional probabilities (α ) and cumulative probabilities (α¯ ) of the outcome: 0 αmi (z ) ≡ Pr0 [Y = m|X = xi , Z = z ], m ∈ {0, . . . , M }, i ∈ {1, . . . , K } m − 0 α¯ mi (z ) ≡ αni0 (z ), m ∈ {0, . . . , M }, i ∈ {1, . . . , K } n =0
with α (z ) ≡ 0 for all i and z, and define lists of conditional probabilities as follows. 0 0i
0 α0i (z ) ≡ {αmi (z )}M m=0
α0 (z ) ≡ {α0i (z )}Ki=1
0 α¯ 0i (z ) ≡ {α¯ mi (z )}M m=0
α¯ 0 (z ) ≡ {α¯ 0i (z )}Ki=1
Consider a structure characterised by 1. γ : a list of values of threshold functions, 2. β(Z): a list of values of conditional distribution functions of U given X and Z obtained as Z takes values in Z, and, 3. δ(Z): a list of values of conditional probabilities of X given Z = z, δ(z ) = {δi (z )}Ki=1 where δi (z ) ≡ Pr[X = xi |Z = z ], obtained as z varies across Z. Proposition 1. A structure, {γ, β(Z), δ(Z)} lies in the set identified by the SEIV model associated with probabilities α0 (z ) and δ0 (z ) and a set of instrumental values Z if and only if the following three conditions hold for all z ∈ Z. I1. Observational equivalence. For m ∈ {1, . . . , M } and i ∈ {1, . . . , K } 0 βmii (z ) = α¯ mi (z )
δi (z ) = δi0 (z )
I2. Independence. For m ∈ {1, . . . , M } and i ∈ {1, . . . , K } K −
δj (z )βmij (z ) = γmi .
(5)
j =1
I3. Proper conditional distributions. For (m, n) ∈ {1, . . . , M } and (i, j, k) ∈ {1, . . . , K } if γmi ≤ γnj then βmik (z ) ≤ βnjk (z ).
3 If there is no γ ∈ γ such that γ < γ < γ ′ ′ then γ and γ ′ ′ are adjacent. st mi st mi mi mi 4 Using straight line segments ensures that the independence condition:
EX0|Z =z [FU |XZ (γmi |X , z )]
≡
35
(4)
EX0|Z =z [FU |XZ (u|X , z )] = u holds for all u ∈ (0, 1) and z ∈ Z.
36
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
Proof of Proposition 1. The SEIV model only admits distributions satisfying the monotonicity and independence conditions I3 and I2 so all structures in the set identified by the SEIV model must satisfy I2 and I3. Considering the set of structures that satisfy I2 and I3, it 0 is precisely those that deliver the probability distribution FYX |Z for all values of Z in Z that comprise the identified set. Eq. (5) arises in I2 because on the left-hand side is the value of Pr[U ≤ γmi |Z = z ] associated with the distribution of U given X and Z characterised by β(z ) which the independence condition of the SEIV model requires to be equal to Pr[U ≤ γmi ]. This probability is equal to γmi because the marginal distribution of U is normalised Unif (0, 1) in the SEIV model. In the observational equivalence condition I1, βmii (z ) is the value of Pr[U ≤ γmi |X = xi , Z = z ] associated with the distribution of U given X and Z characterised by β(z ). All structures and only structures that have this value equal to Pr0 [Y ≤ m|X = xi , Z = zi ] and δi (z ) = Pr0 [X = xi |Z = z ] deliver the probability 0 distribution function FYX |Z . In Section 3.4, the identified set of values of γ is characterised as a set of values satisfying a system of inequalities. First, some aspects of the geometry of the identified set are examined.
When determining whether a particular value of γ lies in the identified set, the ordering of the elements of γ is crucial in determining whether there exist distribution functions which satisfy condition I3. There are (6)
admissible orderings of the N elements of γ which are not restricted to be zero or one.5 For example, when M = 3 and K = 2, there are 6 of the possible 24 orderings that are admissible. The 18 inadmissible orderings have γ11 > γ21 or γ12 > γ22 or both. Let l index the admissible orderings of γ . For each l ∈ {1, . . . , L} define sets Sl0 (z ) and Hl0 (z ) as follows.
Sl0 (z ) ≡ {(γ, β(z ), δ(z )) : γ is in order l and
(γ, β(z ), δ(z )) respects I1–I3} Hl0 (z ) ≡ {γ : γ is in order l and ∃ (β(z ), δ(z )) s.t. (γ, β(z ), δ(z )) ∈ Sl0 (z )}. The set Sl0 (z ) is the set comprising those structures admitted by the SEIV model that have γ in order l and deliver the distribution 0 0 FYX |Z for Z = z. The set Hl (z ) is the projection of this set onto the component γ , that is onto the structural function. Since for any ordering, l, conditions I1–I3 comprise a system of linear equalities and inequalities, each set Sl0 (z ) is convex, or empty. It follows, from consideration of the Fourier–Motzkin elimination algorithm6 , that the set Hl0 (z ) is also defined by a system of linear equalities and inequalities, so it is also convex or empty. The identified set of values of γ in order l obtained as z takes all values in the set of instrumental values Z, denoted Hl0 (Z), is the following intersection of the sets Hl0 (z ):
Hl0 (Z) ≡
H 0 (Z) =
L
Hl0 (Z).
l=1
Thus the identified set of values of γ , that is the identified set of structural functions, is a union of convex sets but that union may not itself be convex, indeed it may not even be connected. In the example in Section 4 there are a number of cases in which the identified set is disconnected. When instruments are strong or there are highly informative additional restrictions (for example parametric restrictions) the sets Hl (Z) may be empty for all but one value of l and then the identified set is convex. Otherwise, the identified set may be very irregular and complex, composed of the union of a very large number of convex sets. With M and K as low as 4 the value of L is 369, 600 and as M or K increases the value of L quickly becomes astronomical. Shape restrictions bring simplification as shown now. 3.3. Shape restrictions 3.3.1. Complete separation In this case, there is the following restriction.
3.2. Geometry of the identified set
L ≡ (K (M − 1))!/((M − 1)!)K
which is convex or empty. The identified set of values of γ of all orders is the union of the sets Hl0 (Z), as follows.
Hl0 (z )
z ∈Z
min{hm (x1 ), . . . , hm (xK )} ≥ max{hm−1 (x1 ), . . . , hm−1 (xK )} for all m. Such completely separated threshold functions must arise when X has no effect on the structural function and will arise when the effect of X is weak. There are K ! arrangements of the elements of γ associated with each threshold function and M − 1 threshold functions so there are L = (K !)M −1 admissible orderings of the elements of γ under the complete separation restrictions. This result presumes that no elements in γ are equal. The value is an upper bound on the number of admissible orderings once ties are allowed. Henceforth, counts of orderings are conducted assuming no ties. 3.3.2. Monotonicity Now suppose the threshold functions are all restricted to be monotone in a scalar explanatory variable X , with a common direction of dependence. This restriction would arise in a typical parametric ordered probit model. We consider the case in which the direction of dependence is specified.7 The problem of finding the number of admissible orderings of γ under this restriction can be recast as the problem of finding the number of ways of filling a (M − 1) × K matrix with the integers {1, 2, . . . , N } such that the array increases both across columns and across rows. With K = 2 this is the Catalan number 2(M −1) 1 . The restriction of monotonicity with respect to X M (M −1) brings an M-fold reduction in the number of admissible orderings in this case. These row and column ascending matrices are rectangular Young Tableaux and the hook length formula of Frame et al. (1954) delivers the following expression for the number of admissible orderings of γ under the monotonicity restriction. L=
(K (M − 1))! ∏ ∏ (M + K − m − k)
M −1 K
(7)
m=1 k=1
5 There are (K (M − 1))! permutations of the free elements of γ . Amongst these only 1 in each (M − 1)! have a sequence γ i in ascending order and there are K such sequences to be considered so only 1 in each ((M − 1)!)K have all these sequences in ascending order. 6 See Ziegler (2007).
Table 1 shows the number of admissible orderings (L) for values of M and K up to 5 with no shape restriction (None), the
7 Leaving this unspecified doubles the number of admissible arrangements.
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48 Table 1 Number of admissible orderings of γ with no shape restrictions (None), monotonicity (M) with respect to scalar X and complete separation (CS). K M 2
3
4
5
Restriction None M CS None M CS None M CS None M CS
2
3
4
5
2 1 2 6 2 4 20 5 8 70 14 16
6 1 6 90 5 36 1,680 42 216 34,650 462 1,296
24 1 24 2,520 14 576 369,600 462 13,824 6,306,300 24,024 331,776
120 1 120 113,400 42 14,400 168,168,000 6,006 1,728,000 305,540,235,000 1,662,804 207,360,000
monotonicity restriction (M) and the complete separation restriction (CS). The restrictions bring substantial reductions in the number of admissible orderings but there remain very large numbers for only moderately large values of M and K . 3.3.3. Single- and twin-peakedness Now suppose Y is binary so there is just one threshold function, h1 . Let X be scalar and consider restrictions that limit the number of turning points of h1 (x) as x passes through its ordered points of support denoted by x[1] < x[2] < · · · < x[K ] .
37
definitions designate all single-peaked permutations twin-peaked. Here are examples of twin-peaked permutations when K = 5:
γ[1] γ[1] γ[1] γ[1]
< γ[2] < γ[2] > γ[2] < γ[2]
< γ[3] > γ[3] > γ[3] > γ[3]
< γ[4] > γ[4] < γ[4] < γ[4]
< γ[5] > γ[5] < γ[5] > γ[5]
and here is a permutation for K = 5 that is not twin-peaked
γ[1] > γ[2] < γ[3] > γ[4] < γ[5] there being 3 peaks, at γ[1] , γ[3] and γ[5] . Here are two propositions with proofs giving the number of single- and twin-peaked permutations of the first K integers. Proposition 2. The number of single-peaked permutations of {1, 2, . . . , K } is c1 (K ) = 2(K −1) . Proof of Proposition 2. 9 First observe that whatever the value of c1 (K − 1), the value of c1 (K ) is 2c1 (K − 1). This is so because for each single-peaked permutation of length {1, 2, . . . , K − 1} there are exactly two single-peaked permutations of {1, 2, . . . , K } constructed by placing the integer K to the left or to the right of the peak in the permutation with K − 1 elements. Placing the value K anywhere else in the permutation with K − 1 elements would introduce a second peak. Since c1 (1) = 1, it follows that c1 (K ) = 2K −1 . Proposition 3. The number of at most twin-peaked permutations of {1, 2, . . . , K } is
Let γ[i] denote h1 (x[i] ). When h1 (x) is unrestricted there are K ! admissible orderings of elements of γ .8 When h1 (x) is restricted to be monotone in x there are 2 admissible orderings. Monotone orderings have
c2 (K ) = 2(K −1) × (1 + 2K −2 − K /2).
γ[1] < γ[2] < · · · < γ[K ]
c2 (K ) = (K − 4) × 2(K −2) + 4 × c2 (K − 1),
or
γ[1] > γ[2] > · · · > γ[K ] . We now consider numbers of admissible orderings of elements of γ when h1 is restricted to be (i) single-peaked and (ii) at most twin-peaked. Denote these respectively c1 (K ) and c2 (K ), and note that these are equal to the number of permutations of the integers {1, . . . , K } which are respectively single- and at most twin-peaked. Definition 1. A single-peaked permutation of {1, 2, . . . , K } is a permutation in which either (i) elements are increasing, or (ii) elements are decreasing, or (iii) elements are first increasing and then decreasing. Note that permutations that are monotone are classified as single-peaked, their peak is at one or the other end of the permutation. Definition 2. An at most twin-peaked permutation of {1, 2, . . . , K } is a permutation that is either single-peaked or has elements that: either (i) first decrease and then rise with at most one subsequent decrease, or (ii) first rise and then fall with at most one subsequent increase. We drop the qualifier ‘‘at most’’ in most of the discussion that follows. One or both of the peaks in a twin-peaked permutation may appear at the start or the end of the permutation. Note that the
Proof of Proposition 3. We first show that there is the following recursion. (8)
Consider permutations of {1, 2, . . . , K − 1}. Amongst these, by Proposition 2, there are 2(K −2) that are single-peaked. Each of these can have the term K added at any one of K positions to deliver an at most twin-peaked sequence. So, associated with the single-peaked permutations of {1, 2, . . . , K − 1} there are c21 (K ) = K × 2(K −2) at most twin-peaked permutations of {1, 2, . . . , K }. There remain to be considered c2 (K − 1) − 2(K −2) twin-peaked but not single-peaked permutations of {1, 2, . . . , K − 1}. Each of these can have the term K added at any one of 4 positions, immediately before or after each of the two peaks. Any other placement would produce a third peak. So, associated with the twin-peaked but not single-peaked permutations of {1, 2, . . . , K − 1} there are c22 (K ) = 4 ×(c2 (K − 1)− 2(K −2) ) at most twin-peaked permutations of {1, 2, . . . , K }. Inserting the term K into a permutation of {1, 2, . . . , K − 1} that is not at most twin-peaked cannot deliver a twin-peaked permutation of {1, 2, . . . , K }. The recursion is then obtained on adding c21 (K ) and c22 (K ) and simplifying. On repeated substitution in (8) there is c2 (K ) = 4K −1 + 2K −2
K −2 −
(K − 4 − s)2s
s=0
and the result follows on using the following. n − s=0
8 We work supposing that all elements of γ are distinct. The counts of admissible arrangements reported are upper bounds on counts when there are ties.
c2 (1) ≡ 1.
2s = 2n+1 − 1
n −
s2s = 2(1 − 2n + n2n )
s =0
9 We are grateful to Richard Spady for suggesting an inductive approach.
38
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
Table 2 Number of admissible orderings of γ in a model for binary Y with no restriction and with monotone, single- and twin-peakedness restrictions. Restriction K
Monotone
Single-peaked
Twin-peaked
None
2 3 4 5 6 7 8 9 10
2 2 2 2 2 2 2 2 2
2 4 8 16 32 64 128 256 512
2 6 24 104 448 1,888 7,808 31,872 129,024
2 6 24 120 720 5,040 40,320 362,880 3,628,800
Table 2 shows the number of admissible orderings of γ in a single threshold model without restrictions and under the monotone, single- and twin-peakedness conditions. Relaxing a monotonicity restriction to allow single-peakedness increases the number of admissible orderings by a factor 2K −2 . Further relaxation allowing twin peaks increases the number of admissible orderings by the same order of magnitude. When K is large even the relatively weak twin-peakedness restriction brings substantial reductions. 3.3.4. Two or more endogenous variables Staying with the single threshold case, now suppose that there are two endogenous variables, X1 and X2 , with respectively K1 and K2 points of support. We will assume throughout that each value of X1 can arise paired with each value of X2 . There are then (K1 K2 )! admissible orderings of γ . If the threshold function h1 (X1 , X2 ) is restricted to be monotone (say increasing) in one of the variables, say X1 , then there are (K1 K2 )!/(K1 !)K2 admissible orderings, obtained from (6) on replacing M − 1 by K1 . If the threshold function is restricted to be monotone in both variables (with directions of dependence specified) then on applying the hook length formula as above there are L=
(K1 K2 )! K1
K2
∏ ∏
(K1 + K2 + 1 − k1 − k2 )
k1 =1 k2 =1
admissible orderings of γ . Finally suppose that an index restriction is imposed, for example that the single threshold function has the form h1 (X1 + X2 ). Now we must consider how many elements are there in γ . Denoting this by N and staying with the single threshold case there is just one admissible ordering if monotonicity is imposed, ∑N N ! admissible orderings with no restriction on h1 and i=1 (N − i + 1)i−1 under a single-peakedness condition. Some subtleties arise on considering the value of N. Consider the case in which both variables have K points of support. The support of X1 + X2 contains at most K 2 distinct values but it can contain less. For example, if the support of the two variables are identical the support of X1 + X2 contains at most K (K + 1)/2 distinct values and if the support of each variable is a sequence of K consecutive integers then X1 + X2 takes 2K − 1 distinct values. The impact of index restrictions on the complexity of the identified set depends critically on the nature of the support of the variables that contribute to the index. 3.4. Characterisation of the identified set In this section a set of values of γ , C˜ 0 (Z), is defined as the intersection of two sets, C0 (Z) and D0 (Z). These sets are defined below. There then follow three propositions which set out the relationship between C˜ 0 (Z) and the identified set, H 0 (Z).
Proposition 4 states that for all M and K , C˜ 0 (Z) is an outer region for the identified set, that is, H 0 (Z) ⊆ C˜ 0 (Z). Proposition 5 states that C˜ 0 (Z) is precisely the identified set when the endogenous variable is binary (K = 2). The relatively long proof is contained in an Annex. Proposition 6 states that when Y is binary (M = 2), C0 (Z) = C˜ 0 (Z). This reconciles the results of this paper with those in Chesher (2010) where it is shown that when Y is binary C0 (Z) is precisely the identified set. Here are the definitions of the sets C0 (Z) and D0 (Z).
C (Z) ≡
γ : max
0
K M −1 − −
z ∈Z
≤ γls ≤ min
0 δi0 (z )αmi (z )1(γmi ≤ γls )
i=1 m=1 K − M −
z ∈Z
i=1 m=1
0 δi0 (z )αmi (z )1(γm−1,i < γls ),
l ∈ {1, . . . , M − 1}, s ∈ {1, . . . , K } 0 0 D 0 (Z) ≡ {γ : γni − γmi ≥ max(δi0 (z )(α¯ ni (z ) − α¯ mi (z ))), z ∈Z
n > m, (n, m) ∈ {1, . . . , M − 1}, i ∈ {1, . . . , K }} Proposition 4. C˜ 0 (Z) ≡ C 0 (Z) ∩ D 0 (Z) is an outer region for the identified set, H 0 (Z). Proof of Proposition 4. It is required to show that H 0 (Z) ⊆ C˜ 0 (Z). This is done by showing in part (a) that all γ in the identified set lie in C 0 (Z) and then in part (b) that all γ in the identified set lie in D 0 (Z).
(a).H 0 (Z) ⊆ C 0 (Z). Chesher (2010) shows that all structural functions, h, in the set identified by the SEIV model associated with a conditional 0 distribution function FYX |Z and a set of instrumental values Z satisfy the following inequalities for all u ∈ (0, 1) and z ∈ Z. Pr0 [Y < h(X , u)|Z = z ] < u ≤ Pr0 [Y ≤ h(X , u)|Z = z ]. In terms of threshold functions these inequalities are as follows. M −
Pr0 [(Y = m) ∧ (hm (x) < u)|Z = z ] < u
m=1
≤
M −
Pr0 [(Y = m) ∧ (hm−1 (x) < u)|Z = z ].
m=1
For the discrete endogenous variable case, there is the following representation. K M −1 − −
0 δi0 (z )αmi (z )1(γmi < u) < u
i=1 m=1
≤
K − M −
0 δi0 (z )αmi (z )1(γm−1,i < u)
(9)
i=1 m=1
Consider some arrangement of the elements of γ in which two elements, γkr < γls are adjacent so that there is no element γqt ∈ γ satisfying γkr < γqt < γls . Consider u ∈ (γkr , γls ] and the righthand inequality in (9). This inequality must hold for all u in (γkr , γls ] and so must hold at the supremum of the interval which is its maximal value, γls , and so there is the following.
γls ≤
K − M − i=1 m=1
0 δi0 (z )αmi (z )1(γm−1,i < γls ).
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
Since this inequality must hold for all z ∈ Z, the minimal value achieved on the right-hand side as z varies in Z is all that matters.10 This produces the upper bounding function in the definition of C 0 (Z ). Now consider some arrangement of the elements of γ in which two elements, γls < γpr are adjacent so that there is no element γqt satisfying γls < γqt < γpr . Consider u ∈ (γls , γpr ] and the lefthand side of (9). This inequality must hold for all u in (γls , γpr ] and so must hold as in (9) with strong inequalities at every value of u in the interval and so with weak inequalities at the infimum of the interval which is γls and so there is the following inequality. K M −1 − −
0 δi0 (z )αmi (z )1(γmi ≤ γls ) ≤ γls .
i=1 m=1
Since this must hold for all z ∈ Z, the maximal value achieved on the left-hand side as z varies in Z is all that matters. This delivers the lower bounding function in the definition of C 0 (Z ). This concludes the demonstration that H 0 (Z) ⊆ C 0 (Z).
39
and the inequalities defining C 0 (Z) require that i −
δj0 (z )α1j0 (z ) ≤ γ1i ≤ 1 +
j =1
i ∈ {1, . . . , K }.
K −
δj0 (z )(1 − α1j0 (z ))
j =i
(11)
It is clear that (10) is satisfied if (11) is satisfied, and therefore when Y is binary C 0 (Z) ⊆ D 0 (Z). Appendix B sets out the 50 inequalities that arise in a 3 outcome case with a binary endogenous variable. The SEIV model generally set identifies structural functions. When values of endogenous variables can be predicted with high accuracy identified sets can be small. The model is falsifiable as is demonstrated in Appendix C via a simple example involving a binary outcome. The next section studies sets identified by an ordered outcome model with a binary endogenous variable. The effect on the identified set of varying the discreteness of the outcome and the strength of the instrumental variables is investigated.
(b). H 0 (Z) ⊆ D 0 (Z). For each γ in the identified set, there exists for each z ∈ Z a distribution function characterised by β(z ) satisfying conditions I1–I3. Conditions I1 and I2 imply that
γni = δ (z )α¯ (z ) + 0 i
0 ni
−
δ (z )βnij (z ) 0 j
j̸=i 0 γmi = δi0 (z )α¯ mi (z ) +
−
δj0 (z )βmij (z )
j̸=i
and on subtracting there is the following. 0 γni − γmi = δi0 (z )(α¯ ni0 (z ) − α¯ mi (z )) − 0 + δj (z )(βnij (z ) − βmij (z )). j̸=i
The properness condition I3 ensures that, since n > m, for each i and j, the distribution function characterised by β(z ) has βnij (z ) ≥ βmij (z ) and so 0 γni − γmi ≥ δi0 (z )(α¯ ni0 (z ) − α¯ mi (z )).
Since this inequality must hold for all z ∈ Z, the maximal value achieved on the right-hand side as z varies in Z is all that matters. This delivers the lower bounding function in the definition of D 0 (Z ). Proposition 5. When X is binary H 0 (Z) = C˜ 0 (Z) no matter how many points of support Y has. This is proved by showing that for every γ in C˜ 0 (Z) and for each z ∈ Z there exists a proper distribution of U given X and Z = z characterised by β(z ) such that Restrictions I1–I3 are respected. The proof is in Appendix A. Proposition 6. When Y is binary C˜ 0 (Z) = C 0 (Z). Proof of Proposition 6. The result follows once it is shown that when Y is binary, C 0 (Z) ⊆ D 0 (Z). When Y is binary the free elements in γ are γ1i , i ∈ {1, . . . , K }. The inequalities defining D 0 (Z) reduce to
δi0 (z )α1i0 (z ) ≤ γ1i ≤ 1 + δi0 (z )(1 − α1i0 (z )) i ∈ {1, . . . , K }
(10)
10 The element in Z which delivers this minimal value may vary with l and s.
4. Discreteness and identified sets in a parametric ordered probit model The complexity of identified sets for nonparametrically specified threshold functions increases rapidly as the number of points of support of the outcome, M, becomes large. In this section we study cases with M as large as 130 when the identified set can in principle have over 2 × 1076 convex components. Cases like this are computationally challenging, and there arises the question of representing and processing the information in the complex identified set. We tackle this problem by focusing on parametrically specified structural functions choosing, by way of example, the form that arises in conventional ordered probit analysis. We explain how identified sets in the parameter space defined by the parametric model can be calculated using our results on nonparametric identification but without encountering computationally intractable problems. For particular cases, we show the form taken by the identified sets and the effects on the sets of changing the discreteness of the outcome and the support of the instrumental variable. The ordered probit structural function is interesting because with exogenous explanatory variables it is widely used in econometric practice and for applied workers a first attempt at addressing potential endogeneity in explanatory variables could start with this form. Given the complexity of the identified set for nonparametrically specified structural functions it is likely that in applications some relatively low-dimensional parameterisation would be used. 4.1. Models We now investigate the nature of the identified sets delivered by a parametric ordered probit model with a binary endogenous variable. In this model the structural function is parametrically specified, as follows. 1 , 0 ≤ U ≤ Φ (s−1 (c1 − a0 − a1 X )) 2 , Φ (s−1 (c1 − a0 − a1 X )) < U ≤ Φ (s−1 (c2 − a0 − a1 X )) Y = . (12) .. .. .. . . M , Φ (s−1 (cM −1 − a0 − a1 X )) < U ≤ 1 Here Φ denotes the standard normal distribution function, the constants c1 , . . . , cM −1 are threshold values defining cells within which a latent normal random variable is classified, and a0 , a1 and s are constant parameters. Throughout X is binary with support
40
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
{−1, +1}, There is the independence restriction: U ⊥ Z , U is normalised Unif (0, 1). In one portfolio of illustrations (A) the model specifies the values of the threshold parameters c1 , . . . , cM −1 as known, and s as known and normalised to one. This leaves just two unrestricted parameters, a0 and a1 , and it is easy to display the identified sets graphically. In these illustrations M, the number of levels of the outcome, is varied from 2 to 130. In econometric practice with interval censored data it would be normal to know the values of the thresholds but the parameter s would typically be unknown. The methods employed here could be used with s unknown. In another illustration (B) M is held fixed at 3 and the model specifies the thresholds, c1 and c2 , along with the slope coefficient, a1 , as unknown parameters. In these illustrations, the values of a0 and s are normalised to respectively 0 and 1 because their values cannot be identified once the thresholds are unrestricted. A similar exercise could be conducted with larger values of M but the graphical display of results would be problematic. In econometric practice using attitudinal responses it would be normal to have unknown thresholds as in this example. In all cases the instrumental variable takes equally spaced values in an interval [−ζ , ζ ]. In two of the illustrations the effect on the identified sets of varying the support of the instrument is studied. 4.2. Calculation procedures The calculation of an identified set of parameter values for a 0 particular distribution FYX |Z and a set of instrumental values Z proceeds as follows.11 A fine grid of values of the parameters (e.g. a0 and a1 in the illustrations in set A) is defined. A value, (a∗0 , a∗1 ), is selected from the grid and the value of γ , say γ ∗ , determined by (a∗0 , a∗1 ) is calculated. Recall that γ is a list of values of the threshold functions defined by a model at the points of support of the discrete endogenous variable. For the selected value γ ∗ the ordering of its elements, say l∗ , is determined and the linear equalities and inequalities defining the convex set C˜ l0∗ (Z) are calculated, minimising and maximising expressions over z ∈ Z as set out in the definitions of the sets C0 (Z) and D0 (Z) at the start of Section 3.4. In all the illustrations, because X is binary, Hl0∗ (Z) = C˜ l0∗ (Z). If ∗ γ falls in this set then (a∗0 , a∗1 ) is in the identified set of parameter values, otherwise it is not. Passing across the grid the identified set is computed. Care is required because the set may not be connected and sometimes component connected subsets of the identified set can be small. To avoid missing component subsets, dense grids of values are used in the calculations presented here. 4.3. Illustration A1 The probability distributions used in this illustration are generated by triangular Gaussian structures with structural equations as follows. Y ∗ = α1 X + W X ∗ = 0.5Z 1 2 Y = .
. .
M
+V , −∞ ≤ Y ∗ ≤ c1 , c1 < Y ∗ ≤ c2 .. .. . . ,
cM −1 < Y ∗ ≤ +∞
11 All the computation in the paper was carried out using R (Ihaka and Gentleman (1996)).
Table 3 Illustration A1: threshold values. M
Threshold values (ci )
Shading in Fig. 1
2 4 6 8 10 12 14
{0.0} {±0.1, 0.0} {±0.3, ±0.1, 0.0} {±0.7, ±0.3, ±0.1, 0.0} {±1.1, ±0.7, ±0.3, ±0.1, 0.0} {±1.5, ±1.1, ±0.7, ±0.3, ±0.1, 0.0} {±1.8, ±1.5, ±1.1, ±0.7, ±0.3, ±0.1, 0.0}
Red Blue Red Blue Red Green Black
X =
−1 +1
, ,
−∞ ≤ X ∗ ≤ 0 0 < X ∗ ≤ +∞.
The value of α1 in this illustration is 1 and the distribution of (W , V ) is specified to be Gaussian and independent of Z .
[ ]
W |Z ∼ N2 V
[ ] [ 0 1.0 , 0 0.5
0.5 1.0
]
.
These structures are closely related to a special case of the parametric Gaussian models of discrete outcomes studied in Heckman (1978). Expressed in terms of a random variable U which is uniformly distributed on the unit interval the structural functions are as follows.
1 2 h( X , U ) = . . .
M
, , .. . ,
0 ≤ U ≤ Φ (c1 + X ) Φ (c1 + X ) < U ≤ Φ (c2 + X ) .. . Φ (cM −1 + X ) < U ≤ 1.
Two sets of instrumental values, Z1 and Z2 , are considered each containing just 10 values evenly spaced in [−1, 1] and in [−2, 2] respectively.
Z1 = {±1.0, ±0.777, ±0.555, ±0.333, ±0.111} Z2 = {±2.0, ±1.556, ±1.111, ±0.667, ±0.222}. In the structures used to generate the probability distributions applied in this and the next illustration, the effect of doubling of the values in the support of Z is the same as the effect of doubling the coefficient on Z in the equation for the latent variable X ∗ . The effect on the identified sets of such changes in the support or strength of the instrument are discussed shortly. In this illustration the number of classes in which Y is observed is increased from 2 through 14 with threshold values as set out in Table 3. Identified sets for the two parameters, (a0 , a1 ), are drawn in Fig. 1. The upper pane in the figure is obtained using Z1 and the lower pane is obtained using Z2 . The sets are rhombuses arranged with edges parallel to 45° and 225° lines. Identified sets are superimposed one upon another. Using the set of instrumental values Z2 with broader support delivers substantially smaller identified sets. In this case (but not in Illustration A2) the structure of the identified sets is very similar using instrumental values Z1 and Z2 , and the following discussion applies to the upper and the lower pane in Fig. 1. The largest rhombus drawn in Fig. 1 is the identified set with M = 2. Since the outcome is binary, this is the set C 0 (Z). The identified set with M = 4 is the rhombus comprising the lowest blue chevron and what lies above it but excluding a very narrow strip on the edge of the two upper boundaries. This strip (coloured green) is the set C 0 (Z) ∩ D 0 (Z ). Notice that this does not extend all the way along the upper edges of the set because for the case M = 2, C˜ 0 (Z) = C 0 (Z) ⊆ D 0 (Z).
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
Fig. 1. Illustration A1. Outer sets and identified sets in a binary endogenous variable SEIV model with a parametric ordered probit structural function with threshold functions of the form Φ (ci − a0 − a1 x) as the number of categories of the outcome varies from 2 to 10. The upper (lower) pane represents identified sets when the support of the instrumental variable, Z , is Z1 (Z2 ). The dark blue strip at the upper margin of the rhombuses is not part of the identified sets. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
41
Fig. 2. Illustration A2. Outer sets and identified sets delivered by a binary endogenous variable SEIV model with a parametric ordered probit structural function, intercept a0 and slope a1 . The upper (lower) pane represents identified sets when the support of the instrumental variable, Z , is Z1 (Z2 ). Number of categories of the outcome, M: 2 (red), 4 (blue) and 6 (green). The dark blue strip at the upper margin is not in the identified sets. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
4.4. Illustration A2 The identified set with M = 6 (respectively 8) is the rhombus comprising the second lowest red (respectively blue) chevron and all that lies above it apart from the narrow dark blue shaded strip on the edge of the two upper boundaries. The identified set with M = 10 is disconnected and comprises the two small red shaded rhombuses in the upper part of the picture. The identified set when M = 12 is the small yellow shaded rhombus in the centre of the picture and the identified set when M = 14 is the tiny black shaded rhombus at the intersection of the horizontal and vertical dashed lines. Further increases in numbers of classes deliver sets which are barely distinguishable from points at the scale chosen for Fig. 1. As the number of classes rises the extent of the identified sets falls rapidly but the passage towards point identification is complex and even when the sets are quite small they can be disconnected.
In this illustration the class of structures generating probability distributions is as in Illustration A1 and, as there, α1 = 1. Also as in Illustration A1 two sets of instrumental values are employed, Z1 and Z2 . In this illustration the number of classes is varied through the following sequence. M ∈ {2, 4, 6, 8, 10, 12, 14, 16, 18, 25, 50, 75}. Threshold values (ci ) are chosen to do a good job of covering the main probability mass of the distribution of Y marginal with respect to X and Z . They are chosen as quantiles of an N (0, (2.4)2 ) distribution associated with equally spaced probabilities in [0, 1], e.g. {1/2} for M = 2, {1/3, 2/3} for M = 3, and so on. The identified sets are drawn in Figs. 2–5. In each case the upper (lower) pane gives identified sets obtained using the instrumental values Z1 (Z2 ).
42
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
Fig. 3. Illustration A2. Identified sets delivered by a binary endogenous variable SEIV model with a parametric ordered probit structural function, intercept a0 and slope a1 . The upper (lower) pane represents identified sets when the support of the instrumental variable, Z , is Z1 (Z2 ). Number of categories of the outcome, M: 8 (red), 10 (blue) and 12 (green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4. Illustration A2. Identified sets delivered by a binary endogenous variable SEIV model with a parametric ordered probit structural function, intercept a0 and slope a1 . The upper (lower) pane represents identified sets when the support of the instrumental variable, Z , is Z1 (Z2 ). Number of categories of the outcome, M: 14 (red), 16 (blue) and 18 (green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
First, consider the upper panes in each of Figs. 2–5 which are produced using the more restricted set of instrumental values Z1 . Fig. 2 shows identified sets for M = 2 (red), M = 4 (blue) and M = 6 (green). Notice that in the latter two cases the identified sets are disconnected comprising two rhombuses. On the upper edges of the upper rhombus in the case M = 4 is a narrow dark blue strip marking the intersection C 0 (Z) ∩ D 0 (Z ) which does not lie in the identified set. This intersection is empty in the other cases that are shown in this figure and in Figs. 3–5. Fig. 3 shows identified sets for M = 8 (red), M = 10 (blue) and M = 12 (green). The identified set for M = 10 is disconnected. Notice that the scale is greatly expanded in this figure, the identified sets are rapidly decreasing in size as the number of classes observed for Y increases. The outline unshaded rhombus in Fig. 3 is the identified set for M = 6 copied across from Fig. 2. Boxes formed by the dashed lines in Fig. 2 show the region focussed on in Fig. 3.
Fig. 4 shows identified sets for M = 14 (red), M = 16 (blue) and M = 18 (green). Again the scale is greatly expanded relative to the previous figure. The outline unshaded rhombus is the identified set for M = 12 copied across from Fig. 3. Fig. 5 shows identified sets for M = 25 (red), M = 50 (blue) and M = 75 (green). Yet again the scale is greatly expanded relative to the previous figure. The lower part of the identified set for M = 18 is drawn in outline. All the identified sets are connected and of very small extent. The situation is now very close to point identification. The identified set at M = 100 is not distinguishable from a point at the chosen scale. Now consider the lower panes in Figs. 2–5 which are obtained using the set of instrumental values Z2 . The scales of the graphs in the upper and lower panes are identical. In each case the doubled range of support results in substantially smaller identified sets. These identified sets have many features in common with the sets obtained with more limited instrument support. They are
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
43
Table 4 Predictive probabilities for X at values of z in two sets of instrumental values.
Z1
Z2
z
Pr [X = 1|Z = z ]
z
Pr [X = 1|Z = z ]
−1.00 −0.78 −0.56 −0.33 −0.11 +0.11 +0.33 +0.56 +0.78 +1.00
.31 .35 .39 .43 .48 .52 .57 .61 .65 .69
−2.00 −1.56 −1.11 −0.67 −0.22 +0.22 +0.67 +1.11 +1.56 +2.00
.16 .22 .29 .37 .46 .54 .63 .71 .78 .84
If Z included the values +∞ and −∞ then, given the linear specification in the parametric model for the structural function, there would be point identification of the slope parameter a1 . But the structure generating the probability distributions employed here is quite special. In econometric practice point identification ‘‘at infinity’’ is unlikely to be achievable. The upper and lower panes in Fig. 6 plot logarithm (base e) of the lengths of identified intervals for respectively a0 and a1 against the logarithm of the number of classes in which Y is observed. Results are shown for the weak and strong sets of instrumental values, respectively Z1 and Z2 . The lengths of the identified intervals are always smaller using Z2 which is the set of instrumental values with wider support. Fig. 7 plots the logarithm of the area of the identified set for a0 and a1 against the logarithm of the number of classes. The areas of the identified sets are always smaller using Z2 . In each case the points are quite tightly scattered around a negatively sloped linear relationships suggesting approach to point identification at a rate proportional to a power of the number of classes12 . OLS estimates indicate that the lengths of the sets for a0 and a1 both fall at a rate proportional to M −2.1 , and that the area of the identified set for a0 and a1 falls at a rate proportional to M −3.6 . The details of this approach towards point identification and the geometry of the identified sets depend on fine details of the specification of the structures generating the probability distributions such as the precise spacing of the thresholds. 4.5. Illustration B1 Fig. 5. Illustration A2. Identified sets delivered by a binary endogenous variable SEIV model with a parametric ordered probit structural function, intercept a0 and slope a1 . The upper (lower) pane represents identified sets when the support of the instrumental variable, Z , is Z1 (Z2 ). Number of categories of the outcome, M: 25 (red), 50 (blue) and 75 (green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
disconnected in some cases but when the instrument has greater support this phenomenon disappears sooner as the discreteness of the outcome is reduced. In Chesher (2010) it is shown that the values of threshold functions hm (x∗ ) are point identified at any value x∗ for which there exists an instrumental value z ∗ such that Pr0 [X = x∗ |Z = z ∗ ] = 1. In the light of this result these probabilities are a good indicator of the ‘‘strength’’ of an instrument. Table 4 shows the values of these probabilities for the structures that generate the probability distributions used in this illustration and for the two sets of instrumental values Z1 and Z2 . The largest probability is 0.84 using Z2 compared with 0.69 using Z1 . In the structures used to generate the probability distributions used in this illustration Pr0 [X = +1|Z = +∞] = 1 = Pr0 [X = −1|Z = −∞].
The class of structures generating probability distributions is as in Illustration A1. In this illustration we use the instrumental values in Z1 In this illustration there are M = 3 classes throughout. The values of a0 and s are normalised to respectively zero and one. The unknown parameters are the thresholds c1 and c2 and the slope coefficient a1 . This is the sort of setup one finds when modelling attitudinal data where threshold values are unknown parameters of considerable interest. In the structure generating the probability distributions the values of the thresholds are as follows
(c1 , c2 ) = (−0.667, +0.667) and α1 = 1. The identified set resides in a three-dimensional square prism of infinite extent: R × (0, 1)2 . Figs. 8–10 show slices taken through this at a sequence of values of a1 showing at each chosen value of a1 the associated identified set for (c1 , c2 ). In all cases this is
12 Where sets are disconnected the lengths of the identified sets for individual parameters are calculated as the sum of the lengths of disjoint intervals, and the area of the sets for a pair of parameters is calculated as the sum of the areas of the connected component sets.
44
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
Fig. 7. Illustration A2. Shrinkage of identified sets as the number of outcome categories increases. Logarithm of the area of the identified set plotted against the logarithm of the number of categories of the outcome, Y . In green (red) the identified sets computed with the support of Z equal to Z1 (Z2 ). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Illustration A2. Shrinkage of identified sets as the number of outcome categories increases: (upper pane) logarithm of the length of the identified interval for a0 plotted against the logarithm of the number of categories of the outcome, Y , (lower pane) logarithm of the length of the identified interval for a1 plotted against the logarithm of the number of categories of the outcome, Y . In green (red) the identified sets are computed with the support of Z equal to Z1 (Z2 ). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
connected and resides above the 45° line in the unit square because the restriction c2 > c1 has been imposed. In each case the rectangular regions (shaded red and green) indicate combinations of (c1 , c2 ) which at the chosen value of a1 lie in the set C 0 (Z). The green shaded regions indicate combinations of (c1 , c2 ) that at the chosen value of a1 are in the intersection C 0 (Z) ∩ D 0 (Z ). These combinations of (a1 , c1 , c2 ) do not lie in the identified set. The red shaded regions indicate combinations of (c1 , c2 ) that at the chosen value of a1 are in the intersection C˜ 0 (Z) = C 0 (Z) ∩ D 0 (Z). These combinations of (a1 , c1 , c2 ) are in the identified set. The extent of the regions in the c1 × c2 plane grows as a1 falls towards the value 1.0 and then shrinks as a1 falls further. 5. Concluding remarks Single equation instrumental variable (SEIV) models for ordered discrete outcomes generally set identify structural functions or,
if there are parametric restrictions, parameter values. Complete models, for example the triangular control function model, can be point identifying, but in applied econometric practice there may be no good reason to choose one point identifying model over another. For any particular distribution of observable variables the sets delivered by the SEIV model give information about the variety of structural functions or parameter values that would be delivered by one or another of the point identifying models which are restricted versions of the SEIV model. For the nonparametric case we have developed a system of equalities and inequalities that bound the identified sets of structural functions delivered by an SEIV model in the case when endogenous variables are discrete. When either the outcome or the endogenous variable is binary the inequalities sharply define the identified set. The inequalities involve only probabilities about which data is informative and the identified sets can be estimated and inferences drawn using the methods set out in Chernozhukov et al. (2009). Some illustrative calculations for the binary outcome case are given in Chesher (2009). Calculations in a parametric model suggest that the degree of ambiguity attendant on using the SEIV model reduces rapidly as the discreteness of the outcome is reduced. Research to determine the extent to which this is true in less restricted settings is one of a number of topics of current research. Acknowledgements We thank Martin Cripps and Adam Rosen for helpful discussions. Some of the results in this paper were presented at the conference Identification and Decisions held May 8–9, 2009 at Northwestern University. We gratefully acknowledge the financial support of the UK Economic and Social Research Council through a grant (RES-589-28-0001) to the ESRC Centre for Microdata Methods and Practice (CeMMAP). Appendix A. Proof of Proposition 5 Consider candidate structural functions, that is, values of γm1 and γm2 , m ∈ {1, . . . , M − 1}. Define β(Z) so that conditions I1 and I2 are satisfied for all z ∈ Z. There is only one way to do this: for each m and each z ∈ Z, to satisfy Condition I1: 0 βm11 (z ) = α¯ m1 (z )
0 βm22 (z ) = α¯ m2 (z )
(A1.1)
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
45
Fig. 8. Illustration B1. Three class ordered probit model with unknown threshold parameters c1 and c2 and slope coefficient a1 . Cross-section of the identified set (red) and outer set (red and green) for c1 , c2 and a1 at selected values of a1 . (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
A1. i = 1, j = 1. In this case t = s + 1 because γs1 < γt1 are adjacent. Properness requires that βs11 ≤ βs+1,11 but (A1.1) 0 ensures that this holds because βs11 = α¯ s1 (z ) ≤ α¯ s0+1,1 (z ) = βs+1,11 .
and to satisfy Condition I2:
δ (z )βm11 (z ) + δ (z )βm12 (z ) = γm1 0 1 0 1
0 2 0 2
δ (z )βm21 (z ) + δ (z )βm22 (z ) = γm2 and, on combining these results, for m ∈ {1, . . . , M } there are the following expressions 0 γm1 − δ10 (z )α¯ m1 (z ) βm12 (z ) = 0 δ2 (z )
βm21 (z ) =
0 (z ) γm2 − δ20 (z )α¯ m2 δ10 (z )
0 α¯ s1 (z ) ≤
and there are four possibilities to consider as follows.
0 γt2 − δ20 (z )α¯ t2 (z ) 0 δ1 (z )
which is written as follows. (A1.2)
It is now shown that for every γ ∈ C˜ 0 (Z) the value of β(Z) defined by (A1.1) and (A1.2) as z varies across Z satisfies the properness condition I3. It follows that C˜ 0 (Z) ⊆ H 0 (Z ) and Proposition 4 states that H 0 (Z ) ⊆ C˜ 0 (Z), so it must be that H 0 (Z ) = C˜ 0 (Z) in this binary endogenous variable case. To proceed, consider the distribution function characterised by βmj1 (z ) for m ∈ {1, . . . , M − 1} and j ∈ {1, 2} and any z ∈ Z. Here conditioning is on X = x1 and Z = z. The argument when conditioning is on X = x2 goes on similar lines and can be worked through by exchange of indices in what follows. Condition I3 is satisfied if for every adjacent pair of values γsi < γtj :
βsi1 (z ) ≤ βtj1 (z )
A2. i = 1, j = 2. Properness requires that βs11 ≤ βt21 which, on using (A1.1) and (A1.2), requires that
0 0 δ10 (z )α¯ s1 (z ) + δ20 (z )α¯ t2 (z ) ≤ γt2 .
(A1.3)
If γ ∈ C 0 (Z) then there is, on replacing γls by γt2 in the lower bounding inequality in the definition of C 0 (Z) and considering a particular value of z ∈ Z K M −1 − −
0 δi0 (z )αmi (z )1(γmi ≤ γt2 ) ≤ γt2
(A1.4)
i=1 m=1
and since γs1 < γt2 and the values are adjacent the left-hand side of (A1.4) as follows:
δ10 (z )
s −
0 αm1 (z ) + δ20 (z )
m=1
t −
0 αm2 (z )
m=1
0 0 = δ10 (z )α¯ s1 (z ) + δ20 (z )α¯ t2 (z )
and so (A1.3) holds.
46
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
Fig. 9. Illustration B1. Three class ordered probit model with unknown threshold parameters c1 and c2 and slope coefficient a1 . Cross-section of the identified set (red) and outer set (red and green) for c1 , c2 and a1 at selected values of a1 . (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
A3. i = 2, j = 1. Properness requires that βs21 ≤ βt11 which, on using (A1.1) and (A1.2), requires that 0 γs2 − δ20 (z )α¯ s2 (z ) 0 ≤ α¯ t1 (z ) 0 δ1 ( z )
(A1.5)
If γ ∈ C (Z) then there is, on replacing γls by γs2 in the upper bounding inequality in the definition of C 0 (Z) and considering a particular value of z ∈ Z: 0
γs2 ≤
K − M −
δ (z )α (z )1(γm−1,i < γs2 ) 0 i
0 mi
δ20 (z )αs0+1,2 (z ) ≤ γs+1,2 − γs2 .
(A1.7)
If γ ∈ D 0 (Z) then replacing γni and γmi by respectively γs+1,2 and γs2 in the definition of D 0 (Z) and considering a particular value z ∈ Z:
which is written as follows. 0 0 γs2 ≤ δ10 (z )α¯ t1 (z ) + δ20 (z )α¯ s2 (z ).
which is written as follows.
(A1.6)
0 γs+1,2 − γs2 ≥ δ20 (z )(α¯ s0+1,2 (z ) − α¯ s2 (z )) = δ20 (z )αs0+1,2 (z )
and so (A1.7) holds. We can think of the set C˜ 0 (Z) as an intersection of sets, one obtained at each value of z ∈ Z , thus.
i=1 m=1
and since γs2 < γt1 and the values are adjacent the right-hand side of (A1.6) is as follows:
δ10 (z )
t − m=1
0 αm1 (z ) + δ20 (z )
2 −
0 αm2 (z )
m=1
0 0 = δ10 (z )α¯ t1 (z ) + δ20 (z )α¯ s2 (z )
and so (A1.5) holds. A4. i = 2, j = 2. It must be that t = s + 1 because γs2 < γt2 are adjacent. Properness requires that βs21 ≤ βs+1,21 which, on using (A1.2), requires that 0 γs+1,2 − δ20 (z )α¯ s0+1,2 (z ) γs2 − δ20 (z )α¯ s2 (z ) ≤ δ10 (z ) δ10 (z )
C˜ 0 (Z) =
C˜ 0 (z ).
z ∈Z
It has been shown that for any z ∈ Z and for all γ ∈ C˜ 0 (z ) there are conditional distribution functions characterised by β(z ) defined as in (A1.1) and (A1.2) such that conditions I1, I2 and I3 hold. Let β(Z) be the conditional distribution functions generated using definitions (A1.1) and (A1.2) as z varies within Z. Values γ ∈ C˜ 0 (Z) lie in every set C˜ 0 (z ), and so for each such value of γ there are conditional distribution functions in β(Z) such that conditions I1, I2 and I3 are satisfied. It follows that C˜ 0 (Z) ⊆ H 0 (Z) and since H 0 (Z) ⊆ C˜ 0 (Z), it follows that H 0 (Z ) = C˜ 0 (Z). This completes the proof of Proposition 5.
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
47
Fig. 10. Illustration B1. Three class ordered probit model with unknown threshold parameters c1 and c2 and slope coefficient a1 . Cross-section of the identified set (red) and outer set (red and green) for c1 , c2 and a1 at selected values of a1 . (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Appendix B. Inequalities defining the identified set for the case M = 3, K = 2
(A2) 0 0 0 δ10 α11 < γ11 ≤ δ10 α11 + δ20 α12 0 0 0 0 0 δ10 α11 + δ20 α12 < γ12 ≤ δ10 (α11 + α21 ) + δ20 α12
When M = 3 and K = 2 there are six arrangements of the set of values of γ , of which three (B1–B3) can be obtained by the exchange of indexes labelling elements of support of X in arrangements A1–A3. Dependence of probabilities on z is suppressed in notation. The inequalities excluding (13) deliver set C 0 (z ). Intersection with D 0 (z ) introduces the additional inequality (13).
(A1) : γ11 < γ12 < γ21 < γ22 (A2) : γ11 < γ12 < γ22 < γ21 (A3) : γ11 < γ21 < γ12 < γ22 (B1) : γ12 < γ11 < γ21 < γ22 (B2) : γ12 < γ11 < γ22 < γ21 (B3) : γ12 < γ22 < γ11 < γ21
0 0 0 0 0 δ10 α11 + δ20 (α12 + α22 ) < γ22 ≤ δ10 (α11 + α21 ) 0 0 0 + δ2 (α12 + α22 ) 0 0 0 0 δ10 (α11 + α21 ) + δ20 (α12 + α22 ) < γ21 0 0 ≤ δ10 (α11 + α21 ) + δ20 0 δ20 α22 < γ22 − γ12
(13)
(A3) 0 0 0 δ10 α11 < γ11 ≤ δ10 α11 + δ20 α12 0 0 0 0 0 δ10 (α11 + α21 ) < γ21 ≤ δ10 (α11 + α21 ) + δ20 α12 0 0 0 0 δ10 (α11 + α21 ) + δ20 α12 < γ12 ≤ δ10 + δ20 α12 0 0 0 0 0 0 δ10 (α11 + α21 ) + δ20 (α11 + α21 ) < γ22 ≤ δ10 + δ20 (α11 + α21 )
(A1) 0 0 0 δ10 α11 < γ11 ≤ δ10 α11 + δ20 α12
δ α +δ α 0 1 0 1
0 11 0 11
0 2
0 12
Appendix C. Falsifiability of the SEIV model
< γ12 ≤ δ (α + α ) + δ α 0 1
0 11
0 0 21 2 0 0 1 11 0 0 12 22
0 12 0 21
0 δ (α + α ) + δ20 α12 < γ21 ≤ δ (α + α ) + δ20 (α + α ) 0 21
0 0 0 0 0 0 + α22 ) δ10 (α11 + α21 ) + δ20 (α12 + α22 ) < γ22 ≤ δ10 + δ20 (α12
This Annex provides a simple example of probability distributions for Y and X given Z which cannot be supported by an SEIV model. Faced with these probability distributions the SEIV model delivers a system of inequalities that define an empty set. The existence of such cases ensures that the SEIV model is falsifiable. In the
48
A. Chesher, K. Smolinski / Journal of Econometrics 166 (2012) 33–48
example, Y , X and Z are all binary. It is easy to extend to more complex cases and there are many choices of values of the probabilities that deliver an empty identified set. When Y is binary the structural function has a single threshold function taking values γ11 and γ12 at the two values of X . There are two orderings: l = 1: γ11 ≤ γ12 and l = 2 : γ12 ≤ γ11 . The identified sets associated with zi , i ∈ {1, 2} are, for each ordering:
H1 (zi ) = {γ : δ1 (zi )α1 (zi ) ≤ γ11 ≤ δ1 (zi )α1 (zi ) + δ2 (zi )α2 (zi )
≤ γ12 ≤ δ1 (zi ) + δ2 (zi )α2 (zi )}
H2 (zi ) = {γ : δ2 (zi )α2 (zi ) ≤ γ12 ≤ δ1 (zi )α1 (zi ) + δ2 (zi )α2 (zi )
≤ γ11 ≤ δ2 (zi ) + δ1 (zi )α1 (zi )}
and the identified set is as follows.
H (Z) = (H1 (z1 ) ∩ H1 (z2 )) ∪ (H2 (z1 ) ∩ H2 (z2 )) Here is an example with proper probabilities such that this set is empty.
δ1 (z1 ) = δ1 (z2 ) = δ2 (z1 ) = δ2 (z2 ) = 0.5 α1 (z1 ) = α2 (z1 ) = 0.2 α1 (z2 ) = α2 (z2 ) = 0.8.
(A2.1) (A2.2)
The components of the identified sets are:
H1 (z1 ) = {γ : 0.1 ≤ γ11 ≤ 0.2 ≤ γ12 ≤ 0.6} H1 (z2 ) = {γ : 0.4 ≤ γ11 ≤ 0.8 ≤ γ12 ≤ 0.9} H2 (z1 ) = {γ : 0.1 ≤ γ12 ≤ 0.2 ≤ γ11 ≤ 0.6} H2 (z2 ) = {γ : 0.4 ≤ γ12 ≤ 0.8 ≤ γ11 ≤ 0.9} and the identified set, H (Z) = φ , the empty set, because
H1 (z1 ) ∩ H1 (z2 ) = H2 (z1 ) ∩ H2 (z2 ) = φ. There is no nonparametric SEIV model that supports the probability distribution that delivers (A2.1) and (A2.2).
References Blundell, R.W., Powell, J.L., 2003. In: Dewatripont, M., Hansen, L.P., Turnovsky, S.J. (Eds.), Endogeneity in Nonparametric and Semiparametric Regression Models. In: Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress, vol. II. Cambridge University Press, Cambridge. Blundell, R.W., Powell, J.L., 2004. Endogeneity in semiparametric binary response models. Review of Economic Studies 71, 655–679. Chernozhukov, V., Hansen, C., 2005. An IV model of quantile treatment effects. Econometrica 73, 245–261. Chernozhukov, V., Lee, S., Rosen, A., 2009. Intersection Bounds: Estimation and Inference, CeMMAP Working Paper 19/09. Chesher, A.D., 2003. Identification in nonseparable models. Econometrica 71, 1405–1441. Chesher, A.D., 2005. Nonparametric identification under discrete variation. Econometrica 73, 1525–1550. Chesher, A.D., 2009. Single Equation Endogenous Binary Response Models, CeMMAP Working Paper 23/09. Chesher, A.D., 2010. Instrumental variable models for discrete outcomes. Econometrica 78, 575–601. Frame, J.S., Robinson, G.de B., Thrall, R.M., 1954. The hook graphs of the symmetric groups. Canadian Journal of Mathematics 6, 316–324. Greene, W., 2007. LIMDEP 9.0 reference guide. Econometric Software, Inc., New York. Heckman, J.J., 1978. Dummy endogenous variables in a simultaneous equations system. Econometrica 46, 931–959. Ihaka, R., Gentleman, R., 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5, 299–314. Manski, C.F., 2003. Partial Identification of Probability Distributions. SpringerVerlag, New York. Rivers, Douglas, Vuong, Quang, 1988. Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics 39, 347–366. Smith, R.J., Blundell, R.W., 1986. An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica 54, 679–685. Statacorp,, 2007. Stata Statistical Software: Release 10. College Station, TX: StataCorp LP. Ziegler, G.M., 2007. Lectures on Polytopes. Springer Science, New York, (Updated seventh printing of the first edition).