Identification of complete information games


Journal of Econometrics 189 (2015) 117–131

Contents lists available at ScienceDirect

Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom

Identification of complete information games Brendan Kline Department of Economics, University of Texas at Austin, Austin, TX 78712, United States

Article info

Article history: Received 11 March 2013. Received in revised form 24 January 2015. Accepted 29 June 2015. Available online 20 July 2015.

JEL classification: C14, C31, C57, C72, L13

Abstract

This paper establishes sufficient conditions for point identification of the utility functions in generalized complete information game models. These models allow generalized interaction structures and generalized behavioral assumptions. The generalized interaction structures allow that the dependence of an agent’s utility function on the other agents’ actions can itself depend on characteristics of the agents, including an endogenous network of connections among the agents. The generalized behavioral assumptions relax the solution concept from Nash equilibrium play to weaker solution concepts like rationalizability. The results allow a non-parametric specification of the unobservables. © 2015 Elsevier B.V. All rights reserved.

Keywords: Identification, Strategic interaction, Entry game, Social interactions

1. Introduction

1.1. Overview

The complete information game framework is a way to model a choice problem involving interactions among economic agents: special cases include entry games, technology adoption with network effects, and social interactions. In any such model there is a group of agents that interact with each other, and the ‘‘choices’’ or ‘‘actions’’ of the agents are the outcomes of the game. The interaction arises because each agent has a utility function that depends on its own action and the actions of the other agents. This paper is concerned with providing sufficient conditions for point identification of the utility functions in a class of generalized complete information game models. These models allow generalized interaction structures and generalized behavioral assumptions. The generalized interaction structures concern the modeling of the interaction effects, which characterize the dependence of each agent’s utility function on the other agents’ actions. The generalized behavioral assumptions concern the solution concept.

E-mail address: [email protected]. http://dx.doi.org/10.1016/j.jeconom.2015.06.023 0304-4076/© 2015 Elsevier B.V. All rights reserved.

The first main contribution of this paper is the generalized interaction structure, which allows the interaction effects to depend on characteristics of the agents. For example, in a model of smoking with social interactions, the effect of the smoking behavior of an individual on the utility another individual gets from smoking can depend on the similarity of those individuals (e.g., in terms of demographics). The interaction effects can also depend on an endogenous network of connections among the agents. For example, the network can be a network of friendships: a pair of individuals are connected in this network if and only if they are friends. The network determines whether an agent’s utility function has any dependence on another agent’s action according to whether those agents are connected to each other in the network. This allows that ‘‘friends’’ have a potentially non-zero effect on each other’s utility functions, whereas ‘‘non-friends’’ have necessarily no effect on each other’s utility functions.1 The network can be endogenous, in

1 Similar issues have been addressed by Bramoullé et al. (2009) and De Giorgi et al. (2010) in the context of the linear-in-means model of social interactions. See also Manski (1993). Other approaches to social interactions include: the ‘‘treatment-response’’ or ‘‘best response’’ approaches of Kline and Tamer (2012), Manski (2013), and Lazzati (2015), and the ‘‘timing game’’ approaches of de Paula (2009) and Honoré and de Paula (2010).


B. Kline / Journal of Econometrics 189 (2015) 117–131

the sense that it can be related to the unobservable (and/or observable) components of utility. In models of social interactions, this corresponds to homophily, the empirical regularity that people tend to be friends with people that are similar to themselves (e.g., McPherson et al. (2001)). The fact that it is possible to point identify the model despite allowing an endogenous network of friendships means that it is possible to distinguish between the causal effect of the behavior of friends (the interaction effect) from the fact that friendships are not randomly assigned. This introduces a qualitatively different identification problem than exists in the standard model of a complete information game. Prior work (e.g., Tamer (2003)) maintained stronger homogeneity assumptions concerning the interaction effects, ruling out the possibility that interaction effects can depend on observed or unobserved characteristics of the agents, and in particular ruling out the dependence on the network. Much of the prior work on networks has treated the network as the outcome. In particular, as summarized for example in Kolaczyk (2009), prior work has focused on developing models of network formation, understanding the properties of the connections in networks, and related topics. More recently, perhaps especially in economics, a literature has developed that treats the network as an explanatory variable in models of other outcomes. For example, Bramoullé et al. (2009) and De Giorgi et al. (2010) study the role of networks in the context of identification and estimation of the linear-in-means model of continuous outcomes in social interactions, and the literature on games on networks (e.g., Galeotti et al. (2010) and Jackson (2010)) studies the economic theory of how games can be ‘‘built on’’ underlying network structures. 
In contrast to the prior economic theory work on games on networks, the current paper takes an econometrics approach and shows identification of a model that treats the network as an explanatory variable in a game theoretic model, for a ‘‘discrete choice with interactions’’ problem faced by the agents. This further enriches the literature on networks by providing another avenue for treating the network as an explanatory variable rather than the outcome.

The second main contribution of this paper concerns generalized behavioral assumptions. The identification result establishes that level-2 rationality is sufficient for point identification. Roughly, level-2 rationality corresponds to the assumption that all agents are rational and know that all agents are rational. The assumption of level-2 rationality is a weaker assumption on the solution concept than Nash equilibrium or rationalizability (e.g., Bernheim (1984), Pearce (1984), Tan and da Costa Werlang (1988), Fudenberg and Tirole (1991, Section 2.1.3)), and therefore assuming Nash equilibrium or rationalizability is a sufficient condition for the identification result. Most prior work, like Bresnahan and Reiss (1990, 1991), Berry (1992), Tamer (2003), and Bajari et al. (2010b), has focused on the stronger assumption of Nash equilibrium. Prior work in Aradillas-Lopez and Tamer (2008) and Kline and Tamer (2012) has also studied identification under level-k rationality and rationalizability. However, those papers consider much different models than considered here, substantially altering the identification problem.
Indeed, those papers generically find partial identification.2 Note two related papers: whereas this paper assumes the solution concept is known by the econometrician (and is not an object of interest), Kline (2015b) investigates the possibility of identifying which solution concepts players use, in the context of experimental data where the utility functions are known; and note that Kline (2015a) studies point identification under Nash equilibrium, relaxing the

2 Aradillas-Lopez and Tamer (2008) do report results concerning point identification in incomplete information games, in contrast to the complete information games studied in this paper.

assumption of large support used in this paper and prior identification results.

Also, the identification results allow a non-parametric specification of the unobservable components of utility, allowing that the unobservables may be correlated across agents in the game and allowing heteroskedasticity. The main assumption on the unobservables is median independence of the unobservables from the explanatory variables. Prior work tended to rely on additional ‘‘independence’’ and/or ‘‘distributional’’ assumptions.3

Identification in models of games is difficult, and presents unique challenges not present in most other econometric models, especially because there is not a unique (‘‘equilibrium’’) outcome for all specifications of the utility functions, due to either multiple equilibria or mixed strategies. This problem is exacerbated if the solution concept is weakened from Nash equilibrium toward rationalizability and related solution concepts, since weaker solution concepts have more equilibrium potential outcomes.4 The other new features of the identification results in this paper further complicate the identification problem. For example, the potential endogeneity of the network adds a qualitatively different identification problem, discussed above.

1.2. Outline of the paper

Section 2 introduces the model. Identification of the utility functions is discussed in Section 3. Section 4 illustrates the features of the model in the context of social interactions. Partial identification under weaker assumptions is discussed in Section 5. Section 6 discusses estimation. Section 7 concludes. Appendix A collects the proofs. Identification of the distribution of the unobservables is discussed in an online supplement (see Appendix B).

2. Setup of the model

2.1. Overview and utility functions

In the game of complete information, each of the N agents simultaneously chooses an action from S = {0, 1}.
In a generic instance m of the game, the payoff to agent i from choosing action $y_{im}$ when the other agents take actions $y_{(-i)m}$ is $u_{im}(y_{im}, y_{(-i)m})$,5 where $u_{im}(0, y_{(-i)m}) = 0$ and

$$u_{im}(1, y_{(-i)m}) = x_{im1} + \tilde{u}_i(x_{im(-1)}, \theta_i) + \sum_{j \neq i} g^w_{ijm} y_{jm} + \epsilon_{im}. \tag{1}$$

3 For example, correlation in social interactions allows an unobserved ‘‘classroom fixed effect’’ that affects the utility of all students in a classroom. In models of social interactions, this corresponds roughly to the ‘‘correlated effects’’ described by Manski (1993). Some prior work has used relatively stronger assumptions on the unobservables even in the standard model. Tamer (2003) assumes that the unobservables are in a known parametric family and are independent from the explanatory variables. Bajari et al. (2010b) assumes that the distribution of the unobservables is known, has independent realizations across agents, and is independent from the explanatory variables. In particular, in terms of allowing correlation among the unobservables, it is important to note that although this is not the first paper to allow correlation, nevertheless it is a non-trivial feature that correlation is allowed. 4 These are slightly different issues than those that arise in incomplete information models, or other models without ‘‘standard’’ complete information, like Brock and Durlauf (2001, 2007), Aradillas-Lopez (2010), Bajari et al. (2010a), de Paula and Tang (2012), or Grieco (2014). 5 The utility from taking action 0 is 0, for any actions of the other agents. Similar to single-agent discrete choice models, this normalization (or equivalent) is necessary for point identification, because only differences in utility are relevant to decision making in games.


Except for the $\sum_{j \neq i} g^w_{ijm} y_{jm}$ term, the utility function for agent i is a standard ‘‘partially linear’’ utility function, with $x_m$ the observed payoff shifters and $\epsilon_m$ the unobserved payoff shifters. (Since the underlying game involves complete information, the agents themselves have common knowledge of the utility functions. The distinction between ‘‘observed’’ and ‘‘unobserved’’ is from the perspective of the econometrician.) The utility function is ‘‘partially linear’’ in the sense that the first explanatory variable of agent i (i.e., $x_{im1}$) appears additively in the utility function for agent i. The unit coefficient on $x_{im1}$ is the corresponding scale normalization. The remaining explanatory variables of agent i (i.e., $x_{im(-1)}$) appear as arguments to the function $\tilde{u}_i(x_{im(-1)}, \theta_i)$. The econometrician knows the functional form of $\tilde{u}_i(\cdot)$, but does not know $\theta_i$. The parameter $\theta_i$ is the parametrization of the possible utility functions, but need not be finite-dimensional.6

The utility function reflects the interactions among the agents, because of the $\sum_{j \neq i} g^w_{ijm} y_{jm}$ term. The model in this paper generalizes the standard model7 of the interaction structure to

$$g^w_{ijm} = s_i(g_m)\,\bigl(f_{ij}(z_{ijm}, \delta_{ij}) + \nu_{ijm}\bigr)\, g_{ijm}. \tag{3}$$

The interaction structure can be represented by the weighted adjacency matrix $g^w_m = (g^w_{ijm})_{i,j}$: a weighted graph on N vertices, whose vertices are the agents, and whose weighted edges are the interaction effects among the agents. An example of this graphical representation is considered in Section 4. By construction in Eq. (3), the interaction structure has a functional dependence on many terms, described next. It is important to note that the interaction structure is subscripted by m, and therefore can be different in different instances of the game. The following describes the terms appearing in Eq. (3): (1) $g_{ijm}$ is an element of $g_m = (g_{ijm})_{i,j}$, which is an adjacency matrix that describes a network of binary connections among the agents, like friendships among individuals in the case of social interactions. A connection exists from agent im to agent jm if $g_{ijm} = 1$. The model does not require that $g_m$ is symmetric. By Eq. (3), the existence of the binary connections among the agents determines whether an agent’s action affects the utility of another agent, because $g^w_{ijm} \neq 0$ only if $g_{ijm} = 1$. For example, in social interactions, im’s utility function might depend on jm’s action only if im considers jm to be a friend. The connections in the network can be endogenous, in the sense that $g_{ijm}$ can be related to the unobservable (and/or observable) parts of utility: in particular, $g_{ijm}$ can depend on the ‘‘similarity’’ of $(x_{im}, \epsilon_{im})$ and $(x_{jm}, \epsilon_{jm})$. For example, in social interactions this allows for homophily, the empirical regularity that people tend to be friends with people that are similar to themselves. Identification with an endogenous

6 If $\theta_i$ is finite-dimensional and conformable to $x_{im(-1)}$, and $\tilde{u}_i(x_{im(-1)}, \theta_i) = x'_{im(-1)} \theta_i$, then the utility function has a standard linear specification, with parameter $\theta_i$. More generally, the econometrician can assume some class of functions $U_i$ such that the econometrician assumes that the utility function has the form $u_{im}(1, y_{(-i)m}) = x_{im1} + \tilde{\tilde{u}}_i(x_{im(-1)}) + \sum_{j \neq i} g^w_{ijm} y_{jm} + \epsilon_{im}$ for some $\tilde{\tilde{u}}_i(\cdot) \in U_i$. In that case, it can be taken that $\tilde{\tilde{u}}_i(x_{im(-1)}) = \theta_i(x_{im(-1)})$ because there is a bijective mapping between specifications of $\theta_i$ and elements of $U_i$. Consequently, the identification strategy will not rely on finite-dimensional parametric assumptions on the utility functions. Related ‘‘partially linear’’ specifications appear in the economic theory literature, for example Jackson (2010).

7 The standard model of a complete information game (e.g., Tamer (2003)) specifies that the interaction structure has the form

$$g^w_{ijm} \equiv \Delta_{ij}, \tag{2}$$

for an unknown non-random parameter $\Delta_{ij}$. At most, the only sort of heterogeneity allowed by this specification is that it may be that $\Delta_{ij} \neq \Delta_{kl}$.


network requires distinguishing the causal effect of the behavior of friends (the interaction effect) from the fact that friendships may not be randomly assigned. (2) $f_{ij}(z_{ijm}, \delta_{ij}) + \nu_{ijm}$ is the part of the interaction effect of jm on im, when indeed $g_{ijm} = 1$, that can depend on the characteristics of those agents. The econometrician knows the functional form of $f_{ij}(\cdot)$, but does not know the possibly infinite-dimensional $\delta_{ij}$. The observed explanatory variables relevant to the interaction effect are $z_{ijm}$, and unobservables relevant to the interaction effect are $\nu_{ijm}$. There can be overlap between the variables in $z_{ijm}$ and the variables in $x_m$. For example, the competitive effect of entry on profits in an entry game could depend on the substitutability of the products/services produced by the firms. (3) Finally, $s_i(\cdot)$ is a known ‘‘weight’’ function defined on the space of networks, where $0 < s_i(\cdot) \leq 1$. For example, it could be that $s_i(g_m) \equiv 1$ or $s_i(g_m) \equiv \bigl(\sum_j g_{ijm}\bigr)^{-1}$. In the former case utility depends on the sum of the actions of the other agents, whereas in the latter case utility depends on the average action of the other agents.8
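As an illustration of Eqs. (1) and (3), the following Python sketch computes the weighted adjacency matrix and the utility from action 1 in a small hypothetical game with three agents. The function names, the numerical values, and the convention that si(g) = 1 for an agent with no connections are assumptions made for this example only; they are not part of the paper. The argument `u_tilde` holds already-evaluated values of the function ũi, so no functional form is assumed.

```python
def s_sum(g, i):
    """Weight s_i(g) = 1: utility depends on the sum of connected agents' actions."""
    return 1.0


def s_average(g, i):
    """Weight s_i(g) = 1 / (number of connections of i): utility depends on the average.

    Assumption for this sketch: an agent with no connections gets weight 1,
    which is harmless because every g_ij is then zero.
    """
    degree = sum(g[i][j] for j in range(len(g)) if j != i)
    return 1.0 / degree if degree > 0 else 1.0


def weighted_adjacency(g, s, f, nu):
    """Eq. (3): g^w_ij = s_i(g) * (f_ij + nu_ij) * g_ij for j != i, and 0 on the diagonal."""
    n = len(g)
    return [
        [s(g, i) * (f[i][j] + nu[i][j]) * g[i][j] if j != i else 0.0 for j in range(n)]
        for i in range(n)
    ]


def utility_of_action_one(i, y, x1, u_tilde, gw, eps):
    """Eq. (1): utility to agent i from action 1, given the action profile y."""
    interaction = sum(gw[i][j] * y[j] for j in range(len(y)) if j != i)
    return x1[i] + u_tilde[i] + interaction + eps[i]


# Hypothetical instance: agent 0 is connected to agents 1 and 2, agent 1 to agent 0.
g = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
f = [[-1.0] * 3 for _ in range(3)]   # observed part of each interaction effect
nu = [[0.0] * 3 for _ in range(3)]   # unobserved part of each interaction effect
y = [0, 1, 1]                        # actions of the agents
x1, u_tilde, eps = [0.5, 0.0, 0.0], [0.2, 0.0, 0.0], [0.1, 0.0, 0.0]

gw_sum = weighted_adjacency(g, s_sum, f, nu)
gw_avg = weighted_adjacency(g, s_average, f, nu)
u_sum = utility_of_action_one(0, y, x1, u_tilde, gw_sum, eps)
u_avg = utility_of_action_one(0, y, x1, u_tilde, gw_avg, eps)
print(u_sum, u_avg)
```

In this instance agent 0 has two connected agents who both take action 1, so the interaction term is −2 under the sum weight and −1 under the average weight. Agent 2 has no connections, so its row of the weighted adjacency matrix is zero under either weight.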



In the setup of the model, the subscripting of the parameters by i refers to ‘‘agent role’’ i (or just ‘‘role’’ i) across instances of the game. There is one agent in role i per instance of the game. In an entry game the subscripts could represent the ‘‘large firm’’ and ‘‘small firm’’ in each market, or in social interactions the subscripts could order individuals by relative age in each classroom, for example. Consequently the model allows for asymmetric agent roles. The ‘‘equal parameters’’ model in which the parameters of the utility functions are the same across roles (i.e., θi ≡ θ ≡ θj and δij ≡ δ ≡ δkl ) is a special case, and therefore is covered by the identification results that allow for asymmetric agent roles. In the case of the equal parameters model, the indexing of the agents within each instance of the game is simply for the purposes of accounting, similar to how observations are indexed by i in cross-sectional data models, even though the parameters of cross-sectional models are shared by all observations.9

A subtle but important point is that the ‘‘equal parameters’’ model concerns assumptions on the parameters of the utility functions, but does not imply any further restrictions, and in particular does not imply exchangeability (or symmetry) in the explanatory variables across agents. So, for example, the ‘‘equal parameters’’ model is compatible with different agents having different realized values of the explanatory variables. Further, the ‘‘equal parameters’’ model does not place any additional restrictions on the network: it can happen that gijm = 1 in instance m of the game while gijm′ = 0 in instance m′ of the game. And, the ‘‘equal parameters’’ model does not require that the network is symmetric.

The identification results hold for an arbitrary, but fixed, number of agents ‘‘N’’ in the game.
If the number of agents varies across instances of the game in the data, then it is possible to apply the identification results to the subsets of the data defined by the number of agents ‘‘N’’. For example, applying the results to instances of

8 Often, the literature on social interactions has supposed that peer effects operate through averages (e.g., the linear-in-means model of Manski (1993)), suggesting the latter specification, whereas in the case of entry games it might be more natural to assume that the competitive effects operate through the sum of the number of competitors in the market, suggesting the former specification. Roughly analogous specifications of the ‘‘weight’’ function have been considered in Galeotti et al. (2010) from the perspective of the economic theory of games played on networks. 9 And so, the indexing of the agents could be ‘‘imposed’’ by the econometrician, as in indexing individuals by relative age in social interactions, or simply could be ‘‘randomly assigned’’ by the survey process. In any case, the ‘‘equal parameters’’ model is the assumption that the utility functions for agents with different indices are the same.


an entry game with N = 2 it would be possible to identify the parameters for the ‘‘large firm out of 2 firms’’ and ‘‘small firm out of 2 firms,’’ and applying the results to instances of an entry game with N = 3 it would be possible to identify the parameters for the ‘‘large firm out of 3 firms’’ and ‘‘medium firm out of 3 firms’’ and ‘‘small firm out of 3 firms.’’ The data consists of observations of independent instances of the game.

2.2. Solution concept

The solution concept describes the behavior of the agents given the utility functions. The identification result is based on level-2 rationality, a solution concept weaker than Nash equilibrium, rationalizability, and level-k rationality for k ≥ 3. Therefore, any of those solution concepts is also sufficient for the identification result. Prior work on point identification assumed Nash equilibrium. The definition of the level-k rational strategies is provided in Definition 1. The level-k rational strategies for agent i are $R_i^k$ and the rationalizable strategies for agent i are $R_i^\infty$ (cf., Bernheim (1984) and Pearce (1984)). The notation $\sigma_i \in BR(\sigma_{(-i)})$ means that the strategy $\sigma_i$ is a best response to the strategy profile $\sigma_{(-i)}$. See also Fudenberg and Tirole (1991, Section 2.1.3). Recall that the action space is $S_i = S = \{0, 1\}$.

Definition 1 (Level-k Rationality). Let $R_i^0 = \Delta(S_i)$, $R_i^k = \{\sigma_i \in \Delta(S_i) : \exists\, \sigma_{(-i)} \in \prod_{j \neq i} \Delta(R_j^{k-1}) \text{ s.t. } \sigma_i \in BR(\sigma_{(-i)})\}$, and $R_i^\infty = \cap_k R_i^k$.

This definition shows that level-k rationality is based on the idea that agents use a strategy that is a best response to ‘‘reasonable’’ conjectures about the strategies used by the other agents. The ‘‘reasonable’’ conjectures are (mixtures of) the level-(k − 1) strategies. In particular, a strategy profile is level-2 rational if each agent i uses a strategy $\sigma_i$ that is a best response to some strategy profile $\sigma_{-i}$ of the other agents.
And, each agent j ≠ i might use its component of the strategy profile $\sigma_{-i}$, because its component is a (mixture of) best response(s) to yet some other strategy profile(s) $\sigma'_{-j}$ of the agents other than j. In contrast, a Nash equilibrium strategy profile requires that each agent uses a strategy that is a best response to the actual strategies used by the other agents, which essentially requires a level of coordination that is not implied by rationality alone (e.g., Aumann and Brandenburger (1995)). The existence of rationalizable (and thus level-2 rational) strategies is implied by the existence of a mixed strategy Nash equilibrium in any finite game. In other setups there may not always exist an equilibrium according to the maintained solution concept, in which case the model is incoherent. The implications of incoherence have been studied, for example, by Chesher and Rosen (2012).

The level-k rational strategies can also be characterized epistemically, as in Tan and da Costa Werlang (1988). The level-k rational strategies, for k ≥ 2, are those strategies that can be used by a rational agent that knows everyone $(\text{knows everyone})^{k-2}$ is rational and knows everyone $(\text{knows everyone})^{k-2}$ acts independently. So, essentially, level-2 rationality is equivalent to the assumption that all agents are rational, and know that all agents are rational, and know that all agents act independently. Identification based on rationalizability (or level-k rationality) is useful because it is plausible that rationalizable (or level-k rational) but not necessarily Nash equilibrium strategy profiles are used in the data generating process. Such examples are easy to construct.10

10 Suppose that there are N = 2 players and $u_i(1, y_{-i}) = \theta_i + \Delta_i y_{-i}$, where $\Delta_i > 0$ and $\theta_i < 0 < \theta_i + \Delta_i$. Then any outcome (i.e., $S^2$) is rationalizable. This is because the conjecture that agent −i uses action 1 rationalizes using action 1, and the conjecture that agent −i uses action 0 rationalizes using action 0. However, the profiles (0, 1) and (1, 0) are not pure strategy Nash equilibria.
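The two-player example in footnote 10 can be checked numerically. The Python sketch below is a minimal illustration, not the paper's procedure: it tracks only the pure actions that survive each round of Definition 1. That simplification is an assumption that is adequate for two players with binary actions, because expected utility is linear in the conjectured probability that the opponent plays 1, so it suffices to check the extreme conjectures. The function names and the parameter values θi = −1, Δi = 2 are hypothetical.

```python
from itertools import product


def best_response_actions(theta, delta, opponent_actions):
    """Actions that are a best response to some conjecture about the opponent.

    Utility of action 1 is theta + delta * p, where p is the conjectured
    probability that the opponent plays 1; action 0 always yields 0.
    Utility is linear in p, so only the extreme conjectures need checking.
    """
    actions = set()
    for p in opponent_actions:
        u1 = theta + delta * p
        if u1 >= 0:
            actions.add(1)
        if u1 <= 0:
            actions.add(0)
    return actions


def level_k_actions(theta, delta, k):
    """Pure actions in the support of level-k rational strategies, for each of 2 agents."""
    surviving = [{0, 1}, {0, 1}]  # level 0: every action is allowed
    for _ in range(k):
        surviving = [
            best_response_actions(theta[i], delta[i], surviving[1 - i])
            for i in range(2)
        ]
    return surviving


def pure_nash_equilibria(theta, delta):
    """Action profiles in which each agent best responds to the other's actual action."""
    equilibria = []
    for y in product((0, 1), repeat=2):
        stable = True
        for i in range(2):
            u1 = theta[i] + delta[i] * y[1 - i]
            chosen, alternative = (u1, 0.0) if y[i] == 1 else (0.0, u1)
            if chosen < alternative:
                stable = False
        if stable:
            equilibria.append(y)
    return equilibria


theta, delta = (-1.0, -1.0), (2.0, 2.0)    # theta_i < 0 < theta_i + delta_i
print(level_k_actions(theta, delta, 2))    # both actions survive for both agents
print(pure_nash_equilibria(theta, delta))  # only (0, 0) and (1, 1)
```

With these values every action profile is level-2 rational, while only (0, 0) and (1, 1) are pure strategy Nash equilibria, matching the footnote. Changing agent 1's intercept to θ1 = 1 makes action 1 dominant for that agent, and level-2 rationality then pins down the profile (1, 1).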

3. Identification of the utility functions

This section establishes the sufficient conditions for point identification of the parameters $\theta = (\theta_i)_i$ and $\delta = (\delta_{ij})_{ij}$.

Assumption 3.1 (Median Independence of the Unobservables). The following two median independence conditions hold:
(1) For each role i, $\epsilon_i \mid x$ has zero median for all x in the support. Moreover, it holds that the cumulative distribution function $F_{\epsilon_i \mid x}(\cdot)$ is strictly increasing in a neighborhood of zero.
(2) For all roles i and j, $(\epsilon_i + s_i(g)\nu_{ij}) \mid (x, z, g_{ij} = 1, s_i(g))$ has zero median for all $(x, z, g_{ij} = 1, s_i(g))$ in the support. Moreover, it holds that the cumulative distribution function $F_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, g_{ij} = 1, s_i(g)}(\cdot)$ is strictly increasing in a neighborhood of zero.

Assumption 3.2 (Continuous Unobservables). For each role i, the distribution of $\epsilon_i \mid x$ is continuous for all x in the support. Also, for all roles i and j, the distribution of $(\epsilon_i + s_i(g)\nu_{ij}) \mid (x, z, g_{ij} = 1, s_i(g))$ is continuous for all $(x, z, g_{ij} = 1, s_i(g))$ in the support.

Part 1 of Assumption 3.1 is similar to Manski (1975, 1985) and Horowitz (1992) for single-agent discrete choice models, and concerns $\epsilon_i$.11 The unobservable component of the utility that agent i gets from taking action 1, when all other agents take action 0, has zero conditional median. The assumption that the median is zero is a normalization, because the ‘‘true’’ median will be absorbed by the intercept terms. Part 2 of Assumption 3.1 additionally concerns the unobservables appearing in the interaction structure: the unobservable component of the utility that agent i gets from taking action 1, when a connected agent j takes action 1 and all other agents take action 0, has zero conditional median. As discussed further below, the assumption that the median is zero can also be viewed as a normalization, because the ‘‘true’’ median will be absorbed by the interaction effect terms.
Part 2 of Assumption 3.1 can roughly be interpreted to require that both ϵi and νij have zero median, conditional on (x, z , gij = 1, si (g )), except for the fact that unlike expectations, medians are not additive in general. One sufficient condition for part 2, and a useful heuristic way to interpret part 2, is that ϵi and νij are both conditionally symmetric about zero, and conditionally independent: see Lemma A.2 for the details. Another sufficient condition is that (ϵi , νij )|(x, z , gij = 1, si (g )) is jointly normal with mean zero but possibly non-zero covariance. Under this interpretation, the assumption that the median of νij is zero can be viewed as a normalization, because the ‘‘true’’ median will be absorbed by the fij (zij , δij ) term appearing in the interaction effect, similar to how the intercept term absorbs the ‘‘true’’ median of ϵi . Remark 3.1 (Assumption 3.1 Relative to Endogeneity of the Network). Part 2 of Assumption 3.1 should be interpreted in relation to the possible endogeneity of the network. In order to discuss this interpretation, consider the alternative ‘‘symmetric unobservables assumption’’ that is sufficient for part 2 of Assumption 3.1, and consider first the assumption that ϵi |(x, z , gij = 1, si (g )) is symmetric about zero. This assumption concerns only the marginal distribution of ϵi , not the joint distribution of ϵ . Recall that homophily translates to the condition that the existence of a friendship between i and j (i.e., gij ) depends on the ‘‘similarity’’ of (xi , ϵi ) and (xj , ϵj ). This assumption allows for homophily because it allows that friends tend to have similar unobservable components of utility

11 Recall that Manski (1988) establishes limits on identifiability of binary response models with mean independence assumptions.


(i.e., (ϵi , ϵj )|(x, z , gij , si (g )) can depend on gij ). Also, the model allows that friends tend to have similar observable components of utility. Because this assumption concerns only the marginal distribution of ϵi , this assumption should be interpreted as the statement that gij is an exogenous explanatory variable in the ‘‘single-agent model’’ for individual i. It could be that gij is endogenous in the ‘‘single-agent model’’ if the mere fact that i has a friendship with j implies that i also has high (or low) ϵi , regardless of the value of ϵj . For example, that might happen if ϵi is related to the overall ‘‘friendliness’’ of i. This assumption does not entail the specification of a model for network formation. Rather, it is an assumption about the ‘‘reduced form’’ implications of network formation.12 Nevertheless, it is instructive to have a plausible model of network formation that is consistent with this assumption. Suppose that si (·) ≡ 1, and that (ϵi , ϵj )|(x, z ) is symmetric about zero. This is not necessary for the intuition, but simplifies the argument. Then, suppose that gij = 1 if and only if |ϵi − ϵj | ≤ K (xi , zi , xj , zj ), where K (·) is some (non-random, for simplicity) function. Thus, i and j are connected (i.e., are friends) if their unobservables are similar enough, compared to the function K (xi , zi , xj , zj ) of their observables, so this is a model of homophily.13 After some algebra, the density of ϵi |(x, z , gij = 1) is symmetric about zero.14 The intuition is: even though gij depends on ϵi and ϵj , the fact that i and j are connected is ‘‘equally consistent’’ with high ϵi and ϵj , and with low ϵi and ϵj . So, ϵi |(x, z , gij = 1) is symmetric about zero given that ϵi |(x, z ) is symmetric about zero. This model for g does imply correlation of (ϵi , ϵj )|(x, z , gij = 1), since pairs of agents are connected or not according to whether they have similar unobservables. 
So, in general, $\epsilon_i \mid (x, z, g_{ij} = 1, \epsilon_j)$ is not symmetric about zero. An alternative model of network formation shows how this assumption can fail, if $\epsilon_i$ is related to the overall ‘‘friendliness’’ of i. Suppose now that $g_{ij} = 1$ if and only if $|\epsilon_i - \epsilon_j| \leq K(x_i, z_i, x_j, z_j) + \epsilon_i + \epsilon_j$. So, as before, one feature of network formation is that i and j are connected if their unobservables are similar enough compared to the function $K(x_i, z_i, x_j, z_j)$, but now the existence of a connection is also adjusted directly by $\epsilon_i$ and $\epsilon_j$. Consequently, $\epsilon_i$ and $\epsilon_j$ can be viewed as the ‘‘friendliness’’ of the individuals, as large $\epsilon_i$ and/or large $\epsilon_j$ can counteract the ‘‘dissimilarity’’ of the individuals to still result in a connection. The absolute value inequality holds if and only if both $\epsilon_i - \epsilon_j \leq K + \epsilon_i + \epsilon_j$ and $\epsilon_j - \epsilon_i \leq K + \epsilon_i + \epsilon_j$ hold. After some algebra, now it holds that $g_{ij} = 1$ if and only if both $\epsilon_i \geq -\frac{1}{2} K(x_i, z_i, x_j, z_j)$ and $\epsilon_j \geq -\frac{1}{2} K(x_i, z_i, x_j, z_j)$. And so, now, clearly the symmetry assumption fails, since $g_{ij} = 1$ implies a truncation of the support of $\epsilon_i$ and $\epsilon_j$. Although the truncation might not happen in all models in which $\epsilon_i$ captures ‘‘friendliness,’’ this example suggests the general statement that it is important that $\epsilon_i$ does not capture the ‘‘friendliness’’ of i in order to maintain the ‘‘symmetric unobservables assumption.’’ This seems to be a reasonable assumption, because $\epsilon_i$ is the unobservable component of the utility from taking a certain action, which in general need not have any relationship with ‘‘friendliness.’’
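The contrast between the two network formation rules described above can be seen in a small simulation. The Python sketch below rests on stated assumptions rather than on anything in the paper: the unobservables are drawn as independent standard normals (one distribution that is symmetric about zero), K is held fixed at 1 (as if conditioning on the observables), and the function name, sample size, and seed are hypothetical.

```python
import random
import statistics


def simulate_connected_eps(n_draws, K, friendliness, seed=0):
    """Draw (eps_i, eps_j) pairs and return eps_i for the pairs with g_ij = 1.

    Homophily rule:    g_ij = 1  iff  |eps_i - eps_j| <= K
    Friendliness rule: g_ij = 1  iff  |eps_i - eps_j| <= K + eps_i + eps_j
    Assumption for this sketch: eps_i and eps_j are independent N(0, 1).
    """
    rng = random.Random(seed)
    connected = []
    for _ in range(n_draws):
        eps_i = rng.gauss(0.0, 1.0)
        eps_j = rng.gauss(0.0, 1.0)
        threshold = K + (eps_i + eps_j if friendliness else 0.0)
        if abs(eps_i - eps_j) <= threshold:
            connected.append(eps_i)
    return connected


K = 1.0
homophily = simulate_connected_eps(20000, K, friendliness=False)
friendly = simulate_connected_eps(20000, K, friendliness=True)

# Under the homophily rule the conditional median of eps_i stays near zero.
print(statistics.median(homophily))
# Under the friendliness rule eps_i is truncated below at -K/2,
# so its conditional median moves above zero.
print(min(friendly), statistics.median(friendly))
```

Under the homophily rule the simulated median of ϵi among connected pairs should be close to zero, consistent with the symmetry argument. Under the friendliness rule no connected agent has ϵi below −K/2 (up to floating-point rounding), so the zero-median normalization fails.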

12 In contrast, for example Christakis et al. (2010) is a model focused on network formation.

13 One plausible specification is $K(x_i, z_i, x_j, z_j) = C - \sum_k \gamma_{xk} |x_{ik} - x_{jk}| - \sum_k \gamma_{zk} |z_{ik} - z_{jk}|$, with $\gamma \geq 0$, so that i and j are friends if they are sufficiently similar in terms of observables and unobservables. The form of $K(\cdot)$ is not particularly relevant for this argument, however, as the analysis is conditional on x and z.

14 Consider: $f_{\epsilon_i}(e_i \mid x, z, -K + \epsilon_j \leq \epsilon_i \leq K + \epsilon_j)$ and $f_{\epsilon_i}(-e_i \mid x, z, -K + \epsilon_j \leq \epsilon_i \leq K + \epsilon_j)$. The condition that $-K + \epsilon_j \leq \epsilon_i \leq K + \epsilon_j$ is equivalent to the condition that $g_{ij} = 1$. The density at $e_i$ is $\int_{-K + e_i}^{K + e_i} f(e_i, \epsilon_j \mid x, z)\, d\epsilon_j$ divided by $P(-K + \epsilon_j \leq \epsilon_i \leq K + \epsilon_j \mid x, z)$. The density at $-e_i$ is $\int_{-K - e_i}^{K - e_i} f(-e_i, \epsilon_j \mid x, z)\, d\epsilon_j$


The second part of the ‘‘symmetric unobservables assumption’’ is that νij |(x, z, gij = 1, si(g)) is symmetric about zero. This assumption is equivalent to the correct specification assumption that g^w_ij |(x, z, gij = 1, si(g)) is symmetric about si(g)fij(zij, δij). So, si(g)fij(zij, δij) is the part of g^w_ij when gij = 1 that can be explained by z, and νij is the ‘‘regression error.’’ It need not be assumed that νij |(x, z, gij = 0, si(g)) is symmetric about zero, so connected agents can tend to have larger (or smaller) values of νij, compared to non-connected agents. If ν ≡ 0, as in the standard model in Eq. (2), this part of the assumption is automatically satisfied. Assumption 3.2 rules out mass points in the unobservables.

Assumption 3.3 (Sufficient Variation of the Explanatory Variables). For each role i, if ti ≠ θi, then P(u˜i(·, ti) = u˜i(·, θi)) < 1. For all roles i and j, if dij ≠ δij, then P(fij(·, dij) = fij(·, δij)) < 1.

The first part of Assumption 3.3 amounts to assuming that u˜i(·, θi) are different functions for different θi, relative to the support of the argument. In case it is assumed that u˜i(xi(−1), θi) = αi + xi(−1)βi(−1), then a sufficient condition for the first part of Assumption 3.3 is the usual full rank assumption on xi(−1) appended with a constant to account for the intercept. Similarly, the second part of Assumption 3.3 amounts to assuming that fij(·, δij) are different functions for different δij, relative to the support of the argument. This assumption is clearly necessary for point identification, because otherwise different parameter specifications would trivially be observationally equivalent.

Assumption 3.4 (Large Support Regressor). For any x·(−1) in the support, x·1 |x·(−1) has everywhere positive density with support on R^N. Also, for any (x·(−1), z) in the support, x·1 |(x·(−1), z) has everywhere positive density with support on R^N.
Assumption 3.4 is a standard ‘‘large support’’ assumption. Note that the ‘‘large support’’ is required to hold conditionally on the other explanatory variables, as standard. The notation x·1 , with a dot in place of the index for the agent, refers to the first explanatory variable of all agents. Other uses of a ‘‘dot’’ are similar: for example, the notation x·(−1) refers to all but the first explanatory variable of all agents. A discussion of partial identification, relaxing the large support assumption, is provided in Section 5. The next assumption concerns the observation of the network g. In instance m of the game, the econometrician observes gˆm , rather than gm . The introduction of gˆm allows for the possibility that the econometrician observes in the data only a ‘‘subset’’ of the connections in gm . The notation is that the observed network in instance m of the game is gˆm , which is an N × N adjacency matrix on the same nodes as gm . The edges of gˆm are a subset of the edges of gm . For example, in social interactions gˆm reflects the friendships that are actually reported in the survey. By construction, the econometrician observes that i and j are connected when gˆijm = 1, and does not observe them to be connected when gˆijm = 0. Of course, a special case is gˆm ≡ gm in all instances m of the game, so that the network gm actually is observed by the econometrician. Because edges of gˆm are a subset of the edges of gm , it follows that if gijm = 0 then gˆijm = 0; however, if gijm = 1, then it could be that gˆijm = 0 or it could be that gˆijm = 1.15 Therefore, the model allows that the econometrician ‘‘misses’’ some connections, but rules out the possibility that the econometrician ‘‘erroneously’’ observes connections that do not actually exist. For example, in the case of social interactions, individuals might not be able to report all of their friends because the survey limits the number

divided by $P(-K + \epsilon_j \le \epsilon_i \le K + \epsilon_j \mid x, z)$. Then by a change of variables, $\int_{-K-e_i}^{K-e_i} f(-e_i, \epsilon_j \mid x, z)\, d\epsilon_j = \int_{-K+e_i}^{K+e_i} f(-e_i, -\epsilon_j \mid x, z)\, d\epsilon_j$. Since the conditional density is symmetric, $\int_{-K+e_i}^{K+e_i} f(-e_i, -\epsilon_j \mid x, z)\, d\epsilon_j = \int_{-K+e_i}^{K+e_i} f(e_i, \epsilon_j \mid x, z)\, d\epsilon_j$. And so, as claimed, the conditional density of $\epsilon_i$ is symmetric.

15 Equivalently, if $\hat{g}_{ijm} = 1$ then $g_{ijm} = 1$; however, if $\hat{g}_{ijm} = 0$, then it could be that $g_{ijm} = 0$ or it could be that $g_{ijm} = 1$.


of reportable friends, or might not be able to (or be willing to expend the effort to) perfectly recall all of their friends at the time of the survey, but seem unlikely to erroneously report someone as a friend if that person actually is not a friend. Another ‘‘intuitive’’ interpretation is that the data generating process results in a ‘‘random sample’’ of the connections in the network. Previously, the literature on networks spanning disciplines has recognized the importance of accounting for issues relating to measurement of the network. An incomplete listing of a few useful resources on measurement includes Holland and Leinhardt (1973), Marsden (1990, 2005), Kossinets (2006), Chandrasekhar and Lewis (2011), and Wang et al. (2012). Important aspects of that literature include: results on ‘‘sensitivity’’/‘‘robustness’’ of estimates of network statistics as a function of measurement error, the study of data collection methods, in particular to reduce measurement error, and the study of imputation methods. More generally, in terms of identification strategies dealing with measurement of the network, viewing the binary connections in the network as binary explanatory variables implies that measurement error cannot be classical measurement error. The literature on measurement error of binary explanatory variables generally finds partial identification (e.g., Bollinger (1996), Black et al. (2000), Mahajan (2006), Lewbel (2007), Hu (2008), Molinari (2008)) without further assumptions (or extra data). This paper achieves point identification by using the condition that the measurement error is ‘‘one-sided’’, in the sense that if gˆijm = 1 then the econometrician knows there is not measurement error, whereas if gˆijm = 0 then the econometrician does not know if there is measurement error. 
In the absence of this condition (and in the absence of alternative conditions), the network would effectively be unobserved by the econometrician, because in that case there would be no conditions relating the observed network to the true network. In that case, per the results on measurement error of binary explanatory variables, the parameters of the model would evidently be partially identified. An incomplete observation of the network complicates identification, because it means that when gˆij = 0, the econometrician does not know whether the effect of yj on agent i’s utility from action 1 is 0 or si(g)(fij(zij, δij) + νij). The next assumption requires that the observation of connections in the network is ‘‘at random,’’ in the sense that the unobservables are independent of the observed network of connections, after conditioning on the true network of connections. The true network can depend on the unobservables, as discussed in Remark 3.1.

Assumption 3.5 (Observed at Random Connections). It holds that (ϵ, ν)|(x, z, g, {si(g)}i) ∼ (ϵ, ν)|(x, z, gˆ, g, {si(g)}i) and, for all roles i and j, (ϵ, ν)|(x, z, gij = 1, si(g)) ∼ (ϵ, ν)|(x, z, gˆij = 1, gij = 1, si(g)).

Assumption 3.5 does not assume away the complications that arise because of an incomplete observation of the network, because an incomplete observation is still a sort of ‘‘mismeasured regressor’’ problem that could lead to attenuation bias if not dealt with appropriately even if the incompleteness is ‘‘at random.’’ In particular, Assumption 3.5 is satisfied if it is assumed that gˆ ≡ g, so that actually the true network is always observed, since in that case conditioning on gˆ is redundant. Assumption 3.5 is more than required for the identification result. Appendix A essentially shows that it is enough for all assumptions in this section to hold conditionally on gˆ instead of g.
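The one-sided structure of the observed network can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the paper: each true link is reported independently with a fixed probability, which is one simple reporting rule that does not depend on the unobservables, and all function names and numerical values are hypothetical.

```python
import random

def true_network(n, link_prob, rng):
    """Symmetric adjacency matrix g with no self-links (illustrative only)."""
    g = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            g[i][j] = g[j][i] = int(rng.random() < link_prob)
    return g

def observed_network(g, report_prob, rng):
    """Keep each true link with probability report_prob; never add a link."""
    n = len(g)
    g_hat = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if g[i][j] == 1 and rng.random() < report_prob:
                g_hat[i][j] = 1
    return g_hat

def is_one_sided(g, g_hat):
    """True when g_hat_ij = 1 implies g_ij = 1 for every pair (i, j)."""
    n = len(g)
    return all(g_hat[i][j] <= g[i][j] for i in range(n) for j in range(n))

rng = random.Random(1)
g = true_network(15, 0.3, rng)
g_hat = observed_network(g, 0.7, rng)
missed = sum(g[i][j] - g_hat[i][j] for i in range(15) for j in range(15))
print("one-sided:", is_one_sided(g, g_hat), "| missed directed links:", missed)
```

Every observed link is a true link, but an unobserved pair may or may not be connected, which is why the econometrician cannot treat gˆij = 0 as evidence of no interaction effect.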
The assumptions are stated as conditional on g because assumptions on model primitives may be more transparent than equivalent assumptions on observables, when the observables differ from the model primitives. As a special case of this model, the econometrician can know g rather than observe it in the data. For example, the econometrician

might specify that gij = 1 for all i and j, to impose the assumption that all agents affect all other agents, as in the standard model. In that case, many of the assumptions on g become vacuous. The next two assumptions concern the interaction structure. They are mild regularity conditions that would be automatically satisfied in the standard model. Assumption 3.6 (Existence of Observed Connections). For all roles i and j, there is a non-random constant sij > 0 such that for any (x, z ) in the support, P (ˆgij = 1, si (g ) = sij |x, z ) > 0. Assumption 3.6 requires that there is positive probability of instances of the game where i and j are observed to be connected according to gˆ (and therefore actually are connected according to g) and si (g ) = sij , conditional on x and z. For the model without a network (e.g., si (·) ≡ 1 and gˆij ≡ 1, but still allowing that the interaction effects depend on z and ν ), Assumption 3.6 is automatically satisfied. This assumption guarantees the existence of instances of the game such that si (g ) ̸= 0 and gij = 1, which implies that the ‘‘leftmost term’’ and ‘‘rightmost term’’ of the interaction effect in Eq. (3) is non-zero. However, the ‘‘middle term’’ of the interaction effect is unrestricted by this assumption. Indeed, this assumption is used to recover the ‘‘middle term’’. In particular, therefore, even in the instances of the game guaranteed by this assumption, it is possible for there to be zero interaction effect, since the ‘‘middle term’’ can be zero. Also, note that since the connections in the network are in general dependent on each other, per the (unspecified) model of network formation, Assumption 3.6 does not imply the stronger condition that the observation of any given gˆ (or g) necessarily has positive probability.16 For the next assumption, use the notation that for matrix A and real number b, A ̸≤ b means that at least one component of A is strictly greater than b. 
And, |A| is the component-wise absolute value of A. Assumption 3.7 (Uniformly Bounded Interaction Structure). For any δ < 1, there is Gδ > 0 such that, for any (x, z , g , {si (g )}i ) in the support, P (|g w | ̸≤ Gδ |x, z , g , {si (g )}i ) ≤ 1 − δ . Assumption 3.7 requires that the components of the interaction structure (i.e., the interaction effects) cannot be arbitrarily large in magnitude with high probability. This guarantees that, at least in the majority of instances of the game, it is possible for the ‘‘non-interaction’’ part of the utility functions to dominate any possible value of the interaction effects, and thereby ‘‘cause’’ the agents to take particular actions. However, from the perspective of the agents in the game, within any instance of the game, the interaction effects are non-random, as the game involves complete information. And so this assumption concerns the uncertainty of the econometrician about unobservable components of the utility function, not the uncertainty of the agents. Since the functional form in Eq. (3) implies that |g w | ≤ maxij |fij (zij , δij )| + maxij |νij |, it is sufficient that the observable and unobservable components of the interaction structure are bounded with high probability. If there are homogeneous interaction effects, as in the standard model (i.e., Eq. (2)), or even if gijw = ∆ij gij , then this assumption is automatically satisfied. Finally, the following two technical assumptions are used to relax, and are implied by, the assumption that the unobservables are (conditionally) independent of the first explanatory variables x·1 , which of course is itself implied by the assumption that the unobservables are independent of the explanatory variables, as has been assumed in prior identification results. The economic intuition for both of these technical assumptions is discussed after the statement of the assumptions.

16 For example, if N = 3, it would be enough that each pair of agents are connected with positive probability, but not required that all three agents are simultaneously connected with positive probability.


Assumption 3.8 (Conditional Tail Behavior of the Unobservables). For all roles i and j, for any non-random function $\tau_i(\cdot)$ of x of the form $\tau_i(x) \equiv a x_{i1} + \tilde{\tau}_i(x_{i(-1)})$, where $a = 1$ or $a = -1$, and any sequence of x that varies only in $x_{\cdot 1}$ such that $\tau_i(x) \to \infty$, it holds that $F_{\epsilon_i \mid x}(\tau_i(x)) \to 1$ and $F_{\epsilon_i \mid x, z, g_{ij} = 1, s_i(g)}(\tau_i(x)) \to 1$ and $F_{\epsilon_i \mid x, z, g, \{s_i(g)\}_i}(\tau_i(x)) \to 1$; and, for any sequence of x that varies only in $x_{\cdot 1}$ such that $\tau_i(x) \to -\infty$, it holds that $F_{\epsilon_i \mid x}(\tau_i(x)) \to 0$ and $F_{\epsilon_i \mid x, z, g_{ij} = 1, s_i(g)}(\tau_i(x)) \to 0$ and $F_{\epsilon_i \mid x, z, g, \{s_i(g)\}_i}(\tau_i(x)) \to 0$.

Assumption 3.8 is used to guarantee the intuition that as the observable part of utility becomes very positive or very negative as a ‘‘function’’ of $x_{\cdot 1}$, then it should happen that the overall utility also becomes very positive or very negative, respectively, with high probability. Roughly, the function $\tau_i(x)$ stands in for the part of the utility function in Eq. (1) that depends on the observable explanatory variables, where the fact that the utility function is ‘‘partially linear’’ is reflected by the fact that $x_{i1}$ enters $\tau_i(x)$ additively, and $\tilde{\tau}_i(x_{i(-1)})$ stands in for the ‘‘rest’’ of the utility function in Eq. (1). Since the coefficient on $x_{i1}$ in this assumption might either be positive or negative, this assumption concerns the behavior as $x_{i1}$ becomes very positive or very negative. Essentially, it rules out the possibility that when the observable part of utility is very positive (respectively, very negative), the unobservable component of utility is very negative (respectively, very positive) with high probability.

Assumption 3.9 (Non-Flat Unobservables). The following two conditions hold:
(1) For each role i, and each $(x_{(-i)(-1)}, x_i)$ in the support, and $t \ne 0$, it holds that $\limsup_{x_{(-i)1}} F_{\epsilon_i \mid x}(t) \ne \frac{1}{2}$ and $\liminf_{x_{(-i)1}} F_{\epsilon_i \mid x}(t) \ne \frac{1}{2}$, where the limits are along any sequence of $x_{(-i)1}$ such that, for each $j \ne i$, either $x_{j1} \to \infty$ or $x_{j1} \to -\infty$.
(2) For all roles i and j, and each $(x_{(-i)(-1)}, x_i)$ and z and $s_i(g)$ in the support, and $t \ne 0$, it holds that $\limsup_{x_{(-i)1}} F_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, g_{ij} = 1, s_i(g)}(t) \ne \frac{1}{2}$ and $\liminf_{x_{(-i)1}} F_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, g_{ij} = 1, s_i(g)}(t) \ne \frac{1}{2}$, where the limits are along any sequence of $x_{(-i)1}$ such that, for each $k \ne i$, either $x_{k1} \to \infty$ or $x_{k1} \to -\infty$.

The first part of Assumption 3.9 relates to the dependence of $\epsilon_i \mid x$ on $x_{(-i)1}$. It is used to imply that if $F_{\epsilon_i \mid x}(t) < \frac{1}{2}$, for fixed $(x_{(-i)(-1)}, x_i)$ and all $x_{(-i)1}$, then it cannot be that along a sequence of $x_{(-i)1}$ it holds that $F_{\epsilon_i \mid x}(t) \to \frac{1}{2}$, and similarly when $F_{\epsilon_i \mid x}(t) > \frac{1}{2}$. A sufficient but not necessary condition, given Assumption 3.1, is $\epsilon_i \mid x \sim \epsilon_i \mid (x_{(-i)(-1)}, x_i)$. The second part of Assumption 3.9 is similar. Although very reminiscent of previous assumptions, Assumption 3.9 does entail additional conditions not already implied by previous assumptions. Specifically, considering the first part of Assumption 3.9 for illustration, since the conditional median of $\epsilon_i \mid x$ is zero, and the cumulative distribution function for $\epsilon_i \mid x$ is strictly increasing in a neighborhood of zero, by Assumption 3.1, it is true, for example, that $F_{\epsilon_i \mid x}(t) < \frac{1}{2}$ for all x and all $t < 0$. However, that only implies that a limit of that inequality (along a sequence of x) would hold with weak inequality, whereas strict inequality is required by Assumption 3.9. The possible violation of Assumption 3.9 would be that the distributions of $\epsilon_i \mid x$ become increasingly ‘‘flat’’ in a neighborhood of zero, along some sequence of x, so that it happens that actually ‘‘in the limit’’ the distribution of $\epsilon_i \mid x$ is flat in a neighborhood of zero. However, in terms of economic intuition, Assumption 3.9 does not require much beyond the conditions from previous assumptions. The following theorem shows that under the above assumptions, the utility functions are point identified. Identification is


based on the population distribution of independent instances of the game, which is P((yi, xi, si(g))i, (zij)i,j, gˆ).17

Theorem 3.1. Suppose that the model of the utility functions is given in Eqs. (1) and (3), and suppose that there is level-2 rational play, allowing for mixed strategies. Under Assumptions 3.1–3.9, θ and δ are point identified.

Remark 3.2 (Intuition for Identification Strategy). Because of Assumptions 3.4 and 3.7, there are ‘‘extreme’’ values of x implying that, with very high probability, each agent j with j ≠ i has negative payoff from taking action 1, regardless of the actions of the other agents. This implies that taking action 0 is the only level-1 rational strategy for agents j ≠ i, so agent i best responds to the conjecture that all agents j ≠ i take action 0, since agent i is level-2 rational. This implies, essentially, that agent i gets utility 0 from action 0 and utility xi1 + u˜i(xi(−1), θi) + ϵi from action 1. Because of Assumptions 3.3 and 3.4, if ti ≠ θi, then there are xi such that xi1 + u˜i(xi(−1), θi) has the opposite sign of xi1 + u˜i(xi(−1), ti). And so, by Assumption 3.1, the best response to the conjecture that all other agents take action 0 is most likely to be to take action 1 when the parameters are ti but is most likely to be to take action 0 when the parameters are θi, or vice versa, where the uncertainty is from the perspective of the econometrician, due to ϵi. Therefore, θi is point identified. Similarly, because of Assumptions 3.4 and 3.7, there are ‘‘extreme’’ values of x implying that, with very high probability, agent j with j ≠ i has positive payoff from taking action 1, regardless of the actions of the other agents, and each agent k with k ≠ i, j has negative payoff from taking action 1, regardless of the actions of the other agents.
This implies that taking action 1 is the only level-1 rational strategy for agent j ̸= i, and taking action 0 is the only level-1 rational strategy for agents k ̸= i, j, so agent i best responds to the conjecture that the other agents behave this way, since agent i is level-2 rational. This implies, essentially, that agent i gets utility 0 from action 0 and utility xi1 + u˜ i (xi(−1) , θi ) + si (g )fij (zij , δij ) + si (g )νij + ϵi from action 1, when i and j are connected from Assumption 3.6. Because of Assumptions 3.3 and 3.4, if dij ̸= δij , then there are xi and zij such that xi1 + u˜ i (xi(−1) , θi ) + si (g )fij (zij , δij ) has the opposite sign of xi1 + u˜ i (xi(−1) , θi ) + si (g )fij (zij , dij ). And so, by Assumption 3.1, the best response to the above conjecture is most likely to be to take action 1 when the parameters are dij but is most likely to be to take action 0 when the parameters are δij , or vice versa, where the uncertainty is from the perspective of the econometrician, due to ϵi and νij . Therefore, δij is point identified, given that θi is point identified. In models of social interactions, this intuition implicitly deals with distinguishing between the causal effect of the actions of friends (the interaction effect) from the fact that friendships may not be randomly assigned. The interaction effects can be identified by inspecting how the actions of agents whose own actions are driven primarily by their own explanatory variables affect the actions of other agents. Even if friendships are endogenous, this ‘‘additional’’ source of variation in the actions of agents other than i is exogenous from the perspective of agent i, so can be used to identify just the effect of the actions of agents other than i on agent i’s utility function, apart from the ‘‘effect’’ of the endogenous network.
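The first step of the identification argument can be mimicked in a simulation. The sketch below is an illustration under simplifying assumptions that are not in the paper: two agents, standard normal unobservables, u˜i reduced to a hypothetical intercept, and a fixed hypothetical interaction effect. It pushes xj1 to an extreme negative value so that action 0 is the only level-1 rational action for agent j, and then checks on which side of 1/2 agent i's choice probability falls.

```python
import random

ALPHA_I = 0.4   # hypothetical intercept standing in for u~_i(x_i(-1), theta_i)
ALPHA_J = -0.2  # hypothetical intercept standing in for u~_j(x_j(-1), theta_j)
DELTA = 1.5     # hypothetical interaction effect, held fixed for simplicity

def action_of_i(x_i1, x_j1, rng):
    """Agent i's action in one instance of a two-agent binary game.

    Action 0 gives utility 0; action 1 gives x_k1 + alpha_k + DELTA * y_other + eps_k.
    """
    base_i = x_i1 + ALPHA_I + rng.gauss(0.0, 1.0)  # eps_i has median zero
    base_j = x_j1 + ALPHA_J + rng.gauss(0.0, 1.0)
    # Level-1 rational actions of j: best responses to some action of i.
    j_actions = {int(base_j + DELTA * y_i > 0) for y_i in (0, 1)}
    # Level-2 rational actions of i: best responses to a level-1 rational action of j.
    i_actions = {int(base_i + DELTA * y_j > 0) for y_j in j_actions}
    # If both actions survive, the solution concept selects neither; pick arbitrarily.
    return rng.choice(sorted(i_actions))

def choice_probability(x_i1, x_j1, n=50_000, seed=0):
    """Simulated frequency with which agent i takes action 1."""
    rng = random.Random(seed)
    return sum(action_of_i(x_i1, x_j1, rng) for _ in range(n)) / n

def sign_consistent(t_i, x_i1, prob_action_1):
    """Does candidate intercept t_i predict the observed side of 1/2 at x_i1?"""
    return (x_i1 + t_i > 0) == (prob_action_1 > 0.5)

X_J1_EXTREME = -50.0  # makes action 1 dominated for agent j with near certainty
p_high = choice_probability(-0.1, X_J1_EXTREME)  # true index -0.1 + 0.4 > 0
p_low = choice_probability(-0.7, X_J1_EXTREME)   # true index -0.7 + 0.4 < 0
print("P(y_i = 1):", p_high, p_low)
print("true intercept consistent:", sign_consistent(ALPHA_I, -0.7, p_low))
print("candidate 1.0 consistent: ", sign_consistent(1.0, -0.7, p_low))
```

A candidate intercept whose index has the opposite sign of the true index at some xi1 predicts the wrong side of 1/2 and is therefore rejected, which is the sense in which θi is pinned down in the remark.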

17 At the expense of more complicated notation, it is possible to allow that $s_i(g)$ is not observed for all agents, as long as it is observed for some agents, and all assumptions that involve conditioning on $s_i(g)$ hold in addition when conditioning on the observation of $s_i(g)$. Note also that it is possible to observe $s_i(g)$ even if g is only incompletely observed. First, some functional forms like $s_i(g) \equiv 1$ guarantee that $s_i(g)$ is observed. Second, even for functional forms like $s_i(g) \equiv \left(\sum_j g_{ij}\right)^{-1}$, it is possible for datasets to reveal the ‘‘number of friends’’ (by asking for number of friends) but not the exact identity of those friends.




Fig. 1. Example of the interaction structure.

4. Example application of the model: Social interactions This section describes a stylized example of the model related to social interactions and the Add Health dataset. The Add Health dataset is a longitudinal survey of adolescent students, sampling multiple students per school, and including network data concerning friendships amongst the students. Suppose that the game models the choice made by students of whether to drink alcohol, and that action 1 is the decision to drink and action 0 is the decision to not drink. The Add Health dataset includes multiple questions concerning drinking alcohol. Fig. 1 shows a stylized graphical representation of an example of the interaction structure defined in Eq. (3). Each node (i.e., circle containing a number) represents one of the fifteen students in a particular classroom (i.e., a particular instance of the game). The edges (i.e., lines connecting the nodes, regardless of being dashed or not) represent the true friendships among the students. So, for example, student 1 is friends with student 2, but student 1 is not friends with student 3. Equivalently, g12 = 1 but g13 = 0. This means that student 1’s utility function depends on the drinking behavior of student 2, but not on the drinking behavior of student 3. The solid edges but not the dashed edges (i.e., solid lines but not dashed lines connecting the nodes) represent the friendships that the econometrician observes. So, for example, g17 = 1 but gˆ17 = 0. In the Add Health data, individuals are allowed to report up to 10 friends, and so individuals with more than 10 friends will necessarily not be able to report all friends. Moreover, in any network dataset, it is possible that individuals do not report all of their friends, even if they have the opportunity. The weight on an edge between student i and student j (i.e., the number in the middle of a line connecting two nodes) is the value of gijw = gjiw . 
In this example it is assumed that $g_{ij} = g_{ji}$ and $g^w_{ij} = g^w_{ji}$, so that both friendships and the interaction effects are symmetric, but the model allows asymmetries in both. So, for example, student 2 drinking increases the utility that student 1 gets from drinking by $g^w_{12} = 0.2$ utils, and vice versa. These effects are different for different pairs of students. For example, student 5 drinking increases the utility that student 1 gets from drinking by $g^w_{15} = 0.3$ utils rather than 0.2 utils. Finally, for example, since student 1 is not friends with student 3, so $g^w_{13} = 0$, student 3 drinking has no effect on the utility that student 1 gets from drinking.

Eq. (3) specifies that $g^w_{ijm} = s_i(g_m)\left(f_{ij}(z_{ijm}, \delta_{ij}) + \nu_{ijm}\right) g_{ijm}$, with unknown parameter $\delta_{ij}$. In the context of the example, this describes how the drinking of student j affects the utility student i gets from drinking. Since this can depend on the characteristics of those students, it can be used to answer empirically relevant questions that could not be answered if $g^w_{ijm} = \Delta_{ij}$ as in the standard model. Consider the following two examples of this functional form. First, suppose that $z_{ijm} = z_{jm}$ is a binary variable indicating whether student jm (i.e., student j in classroom m) has some characteristic that the econometrician suspects might make student jm influential (e.g., participation in a sport, or beliefs about being ‘‘part of’’ the school, both of which are asked in the Add Health dataset). Further suppose that $f_{ij}(z_{ijm}, \delta) = \delta_1 z_{jm} + \delta_0 (1 - z_{jm})$, so that the effect of a ‘‘popular’’ student drinking is $\delta_1$, while the effect of an ‘‘un-popular’’ student drinking is $\delta_0$. Thus, the model can be used to investigate whether ‘‘popular’’ students are more influential than ‘‘un-popular’’ students. Second, alternatively, suppose that $z_{ijm} = (z_{im}, z_{jm})$ where $z_{im}$ (resp., $z_{jm}$) are demographic characteristics of student im (resp., jm). The Add Health dataset, for example, includes characteristics like age and race/ethnicity. Further suppose that $f_{ij}(z_{ijm}, \delta) = \frac{\Delta}{\sqrt{1 + (z_{im} - z_{jm})' \Omega (z_{im} - z_{jm})}}$, where $\delta = (\Delta, \Omega)$, where $\Delta$ is a scalar and $\Omega$ is a positive semidefinite matrix. If the students are observationally equivalent according to the demographics (i.e., $z_{im} = z_{jm}$), then $f_{ij}(z_{ijm}, \delta) = \Delta$, but if not then $f_{ij}(z_{ijm}, \delta)$ equals $\Delta$ divided by some number weakly greater than one, so that $f_{ij}(z_{ijm}, \delta)$ is weakly closer to zero than is $\Delta$. Thus, in absolute magnitude, more similar students have weakly greater effect on each other. The matrix $\Omega$ determines exactly how the difference in the demographic characteristics of the students determines the effects they have on each other. Thus, the model can be used to investigate whether students that are more similar demographically are more influential on each other.

One of the main conditions for the identification result is that there exists an explanatory variable with large support, or at least ‘‘large enough’’ support for informative partial identification in Section 5. Intuitively, these explanatory variables should be variables that have a significant influence on drinking behavior. The Add Health dataset includes, for example, variables relating to the income and intensity of drinking of the parents, both of which arguably satisfy the condition of having at least ‘‘large enough’’ support if not (for practical purposes) ‘‘large support’’ after re-centering. Of course, by a suitable transformation, these variables can essentially be forced to literally have large support. And even if these variables do not have large support, the partial identification result in Section 5 extends the point identification result to show that ‘‘large enough’’ support results in informative partial identification with a small identified set. See also the large literature in general based on ‘‘large support’’ regressors, for further empirical examples in other settings. For example, in entry games, significant cost shifters or demand shifters can be argued to satisfy the large support condition.
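The two functional forms for fij described in this section are simple enough to write out directly. The following sketch is illustrative only: the function names and all numerical values are hypothetical, and Ω is passed as a plain nested list that is assumed to be positive semidefinite.

```python
import math

def f_popular(z_j, delta1, delta0):
    """First form: delta1 if student j has the characteristic (z_j = 1), else delta0."""
    return delta1 * z_j + delta0 * (1 - z_j)

def f_similarity(z_i, z_j, Delta, Omega):
    """Second form: Delta / sqrt(1 + (z_i - z_j)' Omega (z_i - z_j)).

    Omega is assumed positive semidefinite, so the quadratic form is
    non-negative and the denominator is at least one.
    """
    d = [a - b for a, b in zip(z_i, z_j)]
    quad = sum(d[r] * Omega[r][c] * d[c]
               for r in range(len(d)) for c in range(len(d)))
    return Delta / math.sqrt(1.0 + quad)

identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical choice of Omega
same = f_similarity([16.0, 1.0], [16.0, 1.0], Delta=0.5, Omega=identity)
different = f_similarity([16.0, 1.0], [13.0, 1.0], Delta=0.5, Omega=identity)
print("identical demographics:", same)       # equals Delta
print("different demographics:", different)  # closer to zero than Delta
print("popular vs un-popular:", f_popular(1, 0.3, 0.1), f_popular(0, 0.3, 0.1))
```

With identical demographics the second form returns Δ itself, and any difference in demographics weakly shrinks the effect toward zero, as stated in the text.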
Since the network is allowed to be endogenous, the identification results can answer this question: if people whose friends drink are more likely to drink themselves, is that because the utility from drinking increases when friends drink, or because people choose to be friends on the basis of similar (unobserved?) propensities to drink? In contrast to Fig. 1, the graphical representation of the standard complete information game model of this interaction would have edges between every pair of students, and the weight on each edge would be a constant parameter ∆. Consequently, the standard model cannot reveal how the characteristics of the students affects how the drinking of friends affects the utility from drinking. Indeed, the standard model does not allow for friendships, even if those friendships are exogenous.
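To make the contrast with the standard model concrete, the stylized classroom can be written as matrices. This sketch uses only the values quoted in the text (g12 = 1, g13 = 0, g17 = 1 with gˆ17 = 0, and weights 0.2 and 0.3 on the links to students 2 and 5); the weight on the link to student 7, the observation status of the links to students 2 and 5, the drinking profile, and the standard-model parameter are hypothetical.

```python
N = 15  # students in the classroom, numbered 1..15 (index 0 is unused)

def empty_matrix():
    return [[0.0] * (N + 1) for _ in range(N + 1)]

g, g_hat, g_w = empty_matrix(), empty_matrix(), empty_matrix()

def add_friendship(i, j, weight, observed):
    """Record a symmetric friendship with interaction effect g^w_ij = g^w_ji."""
    for a, b in ((i, j), (j, i)):
        g[a][b] = 1.0
        g_w[a][b] = weight
        g_hat[a][b] = 1.0 if observed else 0.0

add_friendship(1, 2, 0.2, observed=True)   # weight from the text; observed is assumed
add_friendship(1, 5, 0.3, observed=True)   # weight from the text; observed is assumed
add_friendship(1, 7, 0.1, observed=False)  # unobserved link; weight 0.1 is hypothetical

def interaction_utility(i, y, weights):
    """Sum over j != i of weights[i][j] * y[j]."""
    return sum(weights[i][j] * y[j] for j in range(1, N + 1) if j != i)

y = [0] * (N + 1)
y[2] = y[3] = y[7] = 1  # hypothetical profile: students 2, 3 and 7 drink

# Generalized model: only true friends matter, each with its own weight.
generalized = interaction_utility(1, y, g_w)

# Standard model: every other student matters, with a common parameter.
DELTA_STANDARD = 0.2  # hypothetical
standard = empty_matrix()
for a in range(1, N + 1):
    for b in range(1, N + 1):
        standard[a][b] = DELTA_STANDARD if a != b else 0.0
standard_effect = interaction_utility(1, y, standard)

print("generalized model:", generalized)
print("standard model:   ", standard_effect)
```

Student 3's drinking enters student 1's utility only in the standard model, while student 7's drinking enters in the generalized model even though the econometrician does not observe that friendship.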

5. Extension: Partial identification without large support

The point identification result for the parameters θ = (θi)i and δ = (δij)ij is based on a large support assumption, Assumption 3.4. This section shows that there is informative partial identification even without large support.18 The partial identification result shows that the identified set for the parameters shrinks with the size of the support of the explanatory variables. In the limit, a large support explanatory variable is sufficient for point identification. Consequently, the partial identification result justifies the interpretation of the large support assumption as an ‘‘idealized approximation’’ to a situation with bounded but ‘‘sufficiently large’’ support. It is not necessarily guaranteed that a point identification result based on large support assumptions will have this property, particularly those based on assumptions on expected values of the unobservables, rather than the medians of the unobservables, as discussed for example in Magnac and Maurin (2007) or Khan and Tamer (2010).19 Indeed, overall in econometrics, issues relating to ‘‘non-robustness’’ to the assumptions are evidently a main reason for skepticism about identification results based on a large support assumption. The following partial identification result establishes that the identification strategy in this paper is robust to the assumption of large support. The following additional assumption on the unobservables is used to establish the partial identification result.

Assumption 5.1 (Median Preserving Spread). For each role i, and for all x in the support, there is a cumulative distribution function $\tilde{F}_{\epsilon_i \mid x}(\cdot)$ that is known by the econometrician such that:
(1) $F_{\epsilon_i \mid x}(t) \le \tilde{F}_{\epsilon_i \mid x}(t)$ for $t \le 0$.
(2) $F_{\epsilon_i \mid x}(t) \ge \tilde{F}_{\epsilon_i \mid x}(t)$ for $t \ge 0$.
(3) $\tilde{F}_{\epsilon_i \mid x}(0) = \frac{1}{2}$.

For all roles i, j, and k (where k can be equal to i or j), and for all $(x, z, \hat{g}_{ij} = 1, s_i(g))$ in the support, there are cumulative distribution functions $\tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(\cdot)$ and $\tilde{F}_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(\cdot)$ that are known by the econometrician such that:
(1) $F_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t) \le \tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t)$ for $t \le 0$.
(2) $F_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t) \ge \tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t)$ for $t \ge 0$.
(3) $\tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(0) = \frac{1}{2}$.
(4) $F_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t) \le \tilde{F}_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t)$ for $t \le 0$.
(5) $F_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t) \ge \tilde{F}_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(t)$ for $t \ge 0$.
(6) $\tilde{F}_{\epsilon_i + s_i(g)\nu_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g)}(0) = \frac{1}{2}$.
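The three conditions that define a median preserving spread can be checked numerically for a candidate pair of distributions. The sketch below is an illustration, not the paper's procedure: it assumes that the unknown distribution is standard normal and that the econometrician's known spread is normal with standard deviation 2, and it checks the inequalities only on a finite grid. The function names are hypothetical.

```python
import math

def normal_cdf(t, scale=1.0):
    """CDF of a mean-zero normal distribution with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(t / (scale * math.sqrt(2.0))))

def is_median_preserving_spread(F, F_tilde, grid, tol=1e-12):
    """Check conditions (1)-(3) of the first part of Assumption 5.1 on a grid.

    (1) F(t) <= F_tilde(t) for t <= 0
    (2) F(t) >= F_tilde(t) for t >= 0
    (3) F_tilde(0) = 1/2
    A grid check is only suggestive; it is not a proof for all t.
    """
    lower = all(F(t) <= F_tilde(t) + tol for t in grid if t <= 0)
    upper = all(F(t) >= F_tilde(t) - tol for t in grid if t >= 0)
    median = abs(F_tilde(0.0) - 0.5) <= tol
    return lower and upper and median

def F(t):
    return normal_cdf(t, scale=1.0)   # "true" distribution, unknown in practice

def F_tilde(t):
    return normal_cdf(t, scale=2.0)   # known spread, an illustrative assumption

grid = [k / 10.0 for k in range(-60, 61)]
print("F_tilde is a median preserving spread of F:",
      is_median_preserving_spread(F, F_tilde, grid))
print("F is a median preserving spread of F_tilde:",
      is_median_preserving_spread(F_tilde, F, grid))

a = -3.0  # lower end of a hypothetical bounded support
print("known bound F_tilde(a) on the unknown F(a):", F_tilde(a), ">=", F(a))
```

The last line shows how the assumption is used: even though F is unknown, its value at the edge of a bounded support is bounded by the known value of the spread at that point.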

Assumption 5.1 requires that there are known median preserving spreads of the distributions of the unobservables, which implies that the ‘‘tail thickness’’ of the distributions of the unobservables, ϵi |x or ϵk |(x, z , gˆij = 1, si (g )) or (ϵi + si (g )νij )|(x, z , gˆij = 1, si (g )), is no greater than that of F˜ϵi |x or F˜ϵk |x,z ,ˆgij =1,si (g ) or F˜ϵi +si (g )νij |x,z ,ˆgij =1,si (g ) , respectively. Such an assumption can be motivated, heuristically, as similar to a compactness assumption on the parametrization of the variance of the unobservables. The identification strategy uses the fact that the probability that the agents take certain actions can be bounded below by expressions involving only the marginal distributions of the unobservables, evaluated at certain functions (related to the functional form

18 Another possibility is to find an identification strategy that results in point identification even without large support. Kline (2015a) shows that point identification is possible without large support if: there is pure strategy Nash equilibrium play, the unobservables are independent from the explanatory variables, and there is ‘‘minimal’’ heterogeneity in the interaction effects, among other conditions. Consequently, because the model in this paper has none of those features, that identification strategy does not apply here, but does apply to the standard complete information game model. Thus, there is a tradeoff between assumptions on the support of the explanatory variables, and other assumptions on the model. 19 The expected value functional is arbitrarily sensitive to even very small perturbations in the tails of a distribution, whereas the median functional is insensitive to sufficiently small perturbations in the tails of a distribution.


of the utility functions) of the explanatory variables. As a consequence, for any fixed specification for the distribution of the unobservables, the probability that the agents take those actions can be made arbitrarily close to 1 by evaluating those bounds either at very positive or very negative values of the explanatory variables, as appropriate (see the proof for the details). Therefore, with large support, the explanatory variables can be used as an ‘‘instrument’’ to ‘‘cause’’ the agents to take certain actions. Intuitively, then, it should be possible to make the probability that the agents take those actions ‘‘somewhat’’ close to 1 by evaluating those bounds at very positive or very negative values of the explanatory variables, even for explanatory variables with bounded support. However, since the distribution of the unobservables is not known, it is always possible that for any bounded support of the explanatory variables that the distribution of the unobservables has sufficiently thick tails so that the probability of taking those actions cannot be made close to 1. Essentially, the point identification result uses the fact that for any cumulative distribution function F , it holds that for t sufficiently negative, F (t ) ≈ 0, and for t sufficiently positive, F (t ) ≈ 1. However, if t must be in a bounded set, say t ∈ [a, b], which corresponds to the case of bounded support of the explanatory variables, it is not necessarily true that F (a) ≈ 0, since F could have arbitrarily thick tails so that F (a) is not close to 0, and similarly it is not necessarily true that F (b) ≈ 1. Assumption 5.1 guarantees that it is possible to make those probabilities ‘‘somewhat’’ close to 0 or 1 even with bounded support. Theorem 5.1. Suppose that the model of the utility functions is given in Eqs. (1) and (3), and suppose that there is level-2 rational play, allowing for mixed strategies. 
Under Assumptions 3.1, 3.2, 3.5, 3.7 and 5.1, the specification of the parameters $t = (t_i)_i$ and $d = (d_{ij})_{ij}$ is not in the identified set for $\theta$ and $\Delta$ if:

(1) For some role $i$, there is a set of explanatory variables $x$ with positive probability such that there is $0 < \delta < 1$ such that
\[
\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i)) + 2\Big((1-\delta) + \sum_{j \neq i} \big(1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta)\big)\Big) < \tfrac{1}{2}
\]
while
\[
\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i)) - 2\Big((1-\delta) + \sum_{j \neq i} \big(1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) - (N-1)G_\delta)\big)\Big) > \tfrac{1}{2}.
\]

(2) Or, for some role $i$, there is a set of explanatory variables $x$ with positive probability such that there is $0 < \delta < 1$ such that
\[
\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i)) - 2\Big((1-\delta) + \sum_{j \neq i} \big(1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta)\big)\Big) > \tfrac{1}{2}
\]
while
\[
\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i)) + 2\Big((1-\delta) + \sum_{j \neq i} \big(1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) - (N-1)G_\delta)\big)\Big) < \tfrac{1}{2}.
\]

(3) Or, for some roles $i$ and $j$, there is a set of explanatory variables $(x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$ with positive probability such that there is $0 < \delta < 1$ such that
\[
\tilde{F}_{\epsilon_i + s_{ij}\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij})) + 2\Big((1-\delta) + \sum_{k \neq i,j} \big(1 - \tilde{F}_{\epsilon_k|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta)\big) + \tilde{F}_{\epsilon_j|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta)\Big) < \tfrac{1}{2}
\]
while
\[
\tilde{F}_{\epsilon_i + s_{ij}\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) - s_{ij} f_{ij}(z_{ij}, d_{ij})) - 2\Big((1-\delta) + \sum_{k \neq i,j} \big(1 - \tilde{F}_{\epsilon_k|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, t_k) - (N-1)G_\delta)\big) + \tilde{F}_{\epsilon_j|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) + (N-1)G_\delta)\Big) > \tfrac{1}{2}.
\]

(4) Or, for some roles $i$ and $j$, there is a set of explanatory variables $(x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$ with positive probability such that there is $0 < \delta < 1$ such that
\[
\tilde{F}_{\epsilon_i + s_{ij}\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij})) - 2\Big((1-\delta) + \sum_{k \neq i,j} \big(1 - \tilde{F}_{\epsilon_k|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta)\big) + \tilde{F}_{\epsilon_j|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta)\Big) > \tfrac{1}{2}
\]
while
\[
\tilde{F}_{\epsilon_i + s_{ij}\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) - s_{ij} f_{ij}(z_{ij}, d_{ij})) + 2\Big((1-\delta) + \sum_{k \neq i,j} \big(1 - \tilde{F}_{\epsilon_k|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, t_k) - (N-1)G_\delta)\big) + \tilde{F}_{\epsilon_j|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) + (N-1)G_\delta)\Big) < \tfrac{1}{2}.
\]

The same results obtain replacing:

(5) In part 1, $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i)) + 2\big((1-\delta) + \sum_{j \neq i} (1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta))\big)$ with $P(y = (0, \ldots, 0) | x)$.
(6) In part 2, $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i)) - 2\big((1-\delta) + \sum_{j \neq i} (1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta))\big)$ with $P(y = (0, \ldots, 0) | x)$.
(7) In part 3, $\tilde{F}_{\epsilon_i + s_{ij}\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij})) + 2\big((1-\delta) + \sum_{k \neq i,j} (1 - \tilde{F}_{\epsilon_k|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta)) + \tilde{F}_{\epsilon_j|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta)\big)$ with $P(y_j = 1, y_{-j} = 0 | x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$.
(8) In part 4, $\tilde{F}_{\epsilon_i + s_{ij}\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij})) - 2\big((1-\delta) + \sum_{k \neq i,j} (1 - \tilde{F}_{\epsilon_k|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta)) + \tilde{F}_{\epsilon_j|x,z,\hat{g}_{ij}=1,s_i(g)=s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta)\big)$ with $P(y_j = 1, y_{-j} = 0 | x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$.

If the ‘‘game’’ involved only one agent, then conditions 1 and 2, using the substitutions from parts 5 and 6, would essentially amount to the statement that $P(y_i = 0 | x) < \tfrac{1}{2}$ (or, $P(y_i = 0 | x) > \tfrac{1}{2}$, respectively) while $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i)) > \tfrac{1}{2}$ (or, $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i)) < \tfrac{1}{2}$, respectively), or equivalently $-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) > 0$ (or, $-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) < 0$, respectively). That is essentially the identification result in single-agent discrete choice models with median restrictions from Manski (1975, 1985). The other terms in the conditions in Theorem 5.1 represent a ‘‘wedge’’ introduced by the need to also use the explanatory variables as an ‘‘instrument’’ to ‘‘cause’’ the other agents to take certain actions. Conditions 3 and 4 concern pairs of agents, so would not arise at all in a ‘‘game’’ that involved only one agent.

The purpose of Theorem 5.1 is to demonstrate how the identified set shrinks with the size of the support of the explanatory variables.20 Suppose for example that $t_i \neq \theta_i$. Then, under Assumption 3.3 (which is not required for the statement of Theorem 5.1), $\tilde{u}_i(x_{i(-1)}, \theta_i) \neq \tilde{u}_i(x_{i(-1)}, t_i)$ with positive probability. If the support of $x_{i1}$ is ‘‘sufficiently large’’ compared to how different $\theta_i$ and $t_i$ are (and therefore how different $\tilde{u}_i(x_{i(-1)}, \theta_i)$ and $\tilde{u}_i(x_{i(-1)}, t_i)$ are), there will be $x_{i1}$ such that $x_{i1}$ is between $-\tilde{u}_i(x_{i(-1)}, \theta_i)$ and $-\tilde{u}_i(x_{i(-1)}, t_i)$. In the limit of a ‘‘large support’’ explanatory variable, this is necessarily true, whereas with bounded support, this may or may not be true depending on the value of $t_i$ compared to $\theta_i$. Since the median of $\epsilon_i | x$ is zero by Assumption 3.1 (and Assumption 5.1), this implies that $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i))$ is on the opposite side of $\tfrac{1}{2}$ from $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i))$. In a single-agent model of discrete choice, this would be the end of the identification argument.
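The single-agent step can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's procedure: the index $\tilde{u}_i(x_{i(-1)}, \theta_i) = \theta_i x_{i2}$ is a hypothetical linear form, the logistic CDF is only a stand-in for a median-zero $\tilde{F}$, and all function names are illustrative.

```python
import math

def logistic_cdf(v):
    """Median-zero CDF used as a stand-in for F-tilde (an assumption)."""
    return 1.0 / (1.0 + math.exp(-v))

def u_tilde(x2, theta):
    """Hypothetical linear index; the paper leaves the functional form general."""
    return theta * x2

def straddles_half(x1, x2, theta, t, cdf=logistic_cdf):
    """True if the CDF values under theta and t lie strictly on opposite sides of 1/2."""
    f_theta = cdf(-x1 - u_tilde(x2, theta))
    f_t = cdf(-x1 - u_tilde(x2, t))
    return (f_theta - 0.5) * (f_t - 0.5) < 0

def distinguishable_on_support(x2, theta, t, lo, hi):
    """True if some x1 in [lo, hi] (with lo < hi) lies strictly between
    -u_tilde(x2, theta) and -u_tilde(x2, t)."""
    a, b = sorted((-u_tilde(x2, theta), -u_tilde(x2, t)))
    return max(a, lo) < min(b, hi)

# With theta = 1.0, t = 1.5, x2 = 2.0 the thresholds are -2.0 and -3.0.
print(straddles_half(-2.5, 2.0, 1.0, 1.5))                    # x1 between the thresholds
print(distinguishable_on_support(2.0, 1.0, 1.5, -5.0, 5.0))   # wide support
print(distinguishable_on_support(2.0, 1.0, 1.5, -1.0, 1.0))   # narrow support
```

In this sketch a wide support for $x_{i1}$ contains points between the two thresholds, while the narrow support does not, which mirrors the dependence on the size of the support described above.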

20 Consequently, this is not a characterization of the sharp identified set, because an explicit characterization of the sharp identified set that also demonstrates the property that the identified set shrinks does not seem feasible given the complexity of the model. See for example Beresteanu et al. (2011) or Galichon and Henry (2011) for more on identified sets in games.

However, because of the interaction, it is also necessary to use the explanatory variables as ‘‘instruments’’ to ‘‘cause’’ the agents other than $i$ to take certain actions, in order to identify the utility function of agent $i$ given the actions of the other agents are ‘‘fixed’’. If it is possible to set $\delta$ to be close enough to 1 and $x_{j1}$ to be sufficiently negative, then it will hold that $2\big((1-\delta) + \sum_{j \neq i} (1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta))\big)$ and $2\big((1-\delta) + \sum_{j \neq i} (1 - \tilde{F}_{\epsilon_j|x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) - (N-1)G_\delta))\big)$ are close to zero. The existence of the $G_\delta$ term comes from Assumption 3.7. In the limit of a ‘‘large support’’ explanatory variable, those terms can be made arbitrarily small, whereas with bounded support, those terms can only be made ‘‘somewhat’’ small. As long as those terms are sufficiently small, relative to the values of $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i))$ and $\tilde{F}_{\epsilon_i|x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i))$ (which in turn depend on the values of the two specifications of the parameters), either condition 1 or 2 in the theorem obtains, establishing that the specification $t = (t_i)_i$ and $d = (d_{ij})_{ij}$ is not in the identified set. By similar arguments, if $d_{ij} \neq \delta_{ij}$, condition 3 or 4 in the theorem can obtain depending on the size of the support relative to the difference between $d_{ij}$ and $\delta_{ij}$.

This result most likely would not be directly implemented as, for example, characterizing a set of moment inequalities to be estimated. Rather, suppose that the econometrician estimates the model by some likelihood method that allows partial identification. Then, Theorem 5.1 implies that the resulting identified set (which will automatically be the set of parameters that maximize the likelihood) will be ‘‘small’’ as long as the support of the explanatory variables is ‘‘large enough’’.

6. Estimation and inference

The most straightforward approach to estimation is to complete the model by introducing a selection mechanism. Using the law of total probability:
\[
P(y | x, z, g) = \sum_{k=1}^{K} P(y | x, z, g, (\epsilon, \nu) \in R_k(\cdot)) \times P((\epsilon, \nu) \in R_k(\cdot) | x, z, g),
\]
where $y = (y_1, \ldots, y_N)$ is a generic outcome of the game. The $R_k(\cdot)$ terms are functions of $(x, z, g, \theta, \delta)$, and are $K$ subsets of $(\epsilon, \nu)$-space that partition the space of utility functions into subsets that give rise to the same set of equilibria.21 Then, many existing estimation and inference methods apply to this likelihood. Note similar likelihood structures are shared by other models of games, so existing game estimation methods apply to this model. Automatically, the ‘‘sharp identified set’’ is simply the set of maximizers of the likelihood. If the conclusions of the main point identification theorem obtain, then $\theta$ and $\delta$ are point identified. If further the conclusions of the theorem in the online supplement on point identification of the distribution of the unobservables are satisfied (see Appendix B), then $P((\epsilon, \nu) \in R_k(x, z, g, \theta, \delta) | x, z, g)$ is point

21 In the case of a ‘‘standard’’ two player game, K = 5, where R1–R4 would characterize the set of utility functions resulting in a unique equilibrium outcome, and R5 would be the ‘‘box’’ of multiple equilibria. More generally, one way to construct the Rk sets is the following. Let S (y, x, z , g , θ, δ) be the set of all (ϵ, ν) consistent with y being an outcome, for that x, z, g, and parameters. Then, define the equivalence relation that characterizes Rk (·) as (ϵ, ν) ∼x,z ,g ,θ,δ (ϵ ′ , ν ′ ) defined by the condition that: for all logically possible outcomes y, (ϵ, ν) ∈ S (y, x, z , θ, δ) if and only if (ϵ ′ , ν ′ ) ∈ S (y, x, z , θ, δ). Let the resulting partition be {Rk (·)}k . So, if (ϵ, ν), (ϵ ′ , ν ′ ) ∈ Rk (·), then the set of potential outcomes when the unobservables are (ϵ, ν) is the same as the set of potential outcomes when the unobservables are (ϵ ′ , ν ′ ). For example, one such Rk might characterize the utility functions for which (1, 1) and (0, 0) are both potential outcomes, because of multiple equilibria. The selection mechanism P (y|x, z , g , Rk (x, z , g , θ, δ)) is the distribution describing which of these multiple potential outcomes actually obtains.
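The partition construction in this footnote can be sketched for the ‘‘standard’’ two player case. The sketch below makes several assumptions that are not from the paper: it uses pure strategy Nash equilibrium (rather than level-2 rationality), a hypothetical payoff of `a[i] + delta[i] * y_other + eps[i]` for action 1 and 0 for action 0, negative interaction effects, and a coarse grid of unobservables chosen so that no agent is ever indifferent. It groups grid points by their set of equilibrium outcomes, which is the equivalence relation described above.

```python
from itertools import product

def pure_nash_outcomes(eps, a=(0.0, 0.0), delta=(-1.0, -1.0)):
    """Set of pure-strategy Nash outcomes of a 2-player binary game.

    Payoff of action 1 for player i is a[i] + delta[i] * y_other + eps[i];
    action 0 pays 0. This parametrization is an illustrative assumption.
    """
    outcomes = set()
    for y in product((0, 1), repeat=2):
        stable = True
        for i in (0, 1):
            gain = a[i] + delta[i] * y[1 - i] + eps[i]  # payoff(1) - payoff(0)
            # A chosen action must not be strictly worse than the alternative.
            if (y[i] == 1 and gain < 0) or (y[i] == 0 and gain > 0):
                stable = False
        if stable:
            outcomes.add(y)
    return frozenset(outcomes)

def partition_regions(grid):
    """Group grid points of (eps_1, eps_2) by their set of equilibrium outcomes."""
    regions = {}
    for eps in product(grid, repeat=2):
        regions.setdefault(pure_nash_outcomes(eps), []).append(eps)
    return regions

# Grid values are chosen so that no payoff difference is exactly zero.
regions = partition_regions([-0.5, 0.5, 1.5])
for outcome_set, points in regions.items():
    print(sorted(outcome_set), len(points))
```

On this grid the grouping yields five distinct outcome sets: four with a unique outcome and one with two outcomes, which corresponds to the ‘‘box’’ of multiple equilibria.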


identified. The ‘‘selection mechanism’’ terms $P(y | x, z, g, (\epsilon, \nu) \in R_k(\cdot))$ may not be point identified without further assumptions, like exclusion restrictions: see Bajari et al. (2011).22 If the econometrician is willing to make finite-dimensional parametric assumptions, as common in applied work, Liu and Shao (2003) establish the asymptotic sampling distribution of the likelihood ratio statistic, even if the model is not point identified. Chen et al. (2011) establish the asymptotic sampling distribution of a profile sieve likelihood ratio statistic, for the possibly partially identified finite-dimensional object of interest, that profiles out the possibly partially identified infinite-dimensional nuisance parameters. Chen et al. (2011) have a discussion of the applicability to models of games. In both sets of results, under partial identification, the resulting asymptotic sampling distribution is very complicated. However, if the model is actually point identified, then the asymptotic sampling distribution has a much simpler $\chi^2$ form.

The above likelihood assumes that the network $g$ is observed, but the identification result for $\theta$ and $\delta$ does not require that. If $g$ is incompletely observed, then the statistical likelihood could be further augmented by a ‘‘network imputation mechanism’’ by the law of total probability:
\[
\begin{aligned}
P(y | x, z, \hat{g}) &= \sum_{g} P(y | x, z, \hat{g}, g) P(g | x, z, \hat{g}) \\
&= \sum_{g} \left[ \sum_{k=1}^{K} P(y | x, z, g, \hat{g}, (\epsilon, \nu) \in R_k(\cdot)) \times P((\epsilon, \nu) \in R_k(\cdot) | x, z, g, \hat{g}) \right] P(g | x, z, \hat{g}) \\
&= \sum_{g} \left[ \sum_{k=1}^{K} P(y | x, z, g, (\epsilon, \nu) \in R_k(\cdot)) \times P((\epsilon, \nu) \in R_k(\cdot) | x, z, g) \right] P(g | x, z, \hat{g}),
\end{aligned}
\]
where one conditioning on $\hat{g}$ can be dropped by an assumption like Assumption 3.5, and plausibly the selection mechanism is also conditionally independent of $\hat{g}$. Evidently the nuisance parameter $P(g | x, z, \hat{g})$ is partially identified. Since identification of the interaction effect parameters between agents $i$ and $j$ relies on instances of the game with observed connected agents $i$ and $j$, the statistical precision of the estimates will decrease as the number of connections that are observed decreases.

The partial identification literature also provides other options. For example, another approach is the ‘‘minimum distance’’ approach discussed in the context of a more general approach to Bayesian inference in partially identified models in Kline and Tamer (2015). Kline and Tamer (2015) provide an illustration to a game model.

7. Conclusions and discussion

This paper provides identification results for a class of complete information game models that generalize the specification of the

22 An interesting but ‘‘opposite’’ question asks about the probability that a specified action profile can be generated in a Nash equilibrium; this has been addressed by Aradillas-Lopez (2011). Another approach is to assume that the econometrician knows the selection mechanism. Of course, the remainder of the model parameters must still be shown to be identified, which is not necessarily obvious ex ante even if the selection mechanism is known. For example, Krauth (2006), Soetevent and Kooreman (2007), and Fox and Lazzati (2012), among other contributions of their papers, explore different important variations on such an approach, under various specifications for the selection mechanism (and various other conditions).


standard model. The issues addressed in this paper are applicable to special cases of the complete information game framework including entry games, technology adoption games with network effects, and social interactions. In particular, Section 4 illustrates the features of the model in the context of social interactions, and shows how the model can be used to answer empirically relevant questions that cannot be answered in the standard model. This class of models generalizes the standard model of complete information games used in prior econometric work in two main areas. First, this class of models has a generalized specification of the interaction structure, which characterizes how the utility functions depend on the actions of other agents. The model allows that the interaction effects can depend on observable or unobservable characteristics of the agents. In particular, the model allows that there is a network of endogenous connections among the agents. This network determines whether an agent’s utility function has any dependence on another agent’s action according to whether those agents are connected to each other in the network. Second, this class of models generalizes the behavioral assumptions, which corresponds to using a solution concept that is weaker than Nash equilibrium, in particular level-2 rationality. Also, the identification results allow a non-parametric treatment of the unobservables, which shows that identification comes from the ‘‘theory model’’ rather than distributional assumptions. Under stronger assumptions, provided in the online supplement (see Appendix B), the distribution of the unobservables are shown to be point identified. The identification strategy proposed in this paper fully exploits the binary nature of the action space. Specifically, because the action space is binary, the identification strategy can use the fact that the explanatory variables can be used as ‘‘instruments’’ for the actions taken by the agents. 
For example, ‘‘very positive’’ values of the explanatory variables are used to ‘‘cause’’ agents to take action 1, and ‘‘very negative’’ values of the explanatory variables are used to ‘‘cause’’ agents to take action 0. If the action space were not binary, the connection between actions and the explanatory variables would be more subtle, depending on the functional form of the utility function. One possible way to extend this identification strategy to richer action spaces would be to set up a utility function such that: ‘‘very positive’’ values of the explanatory variables can be used to ‘‘cause’’ agents to take the largest action in the action space, and ‘‘very negative’’ values of the explanatory variables can be used to ‘‘cause’’ agents to take the smallest action in the action space. Evidently, that could be achieved by an ‘‘ordered choice’’—like utility function. Then, following similar arguments to those used in the paper, it might be possible to point identify the utility function of a particular agent, evaluated at the cases where each of the other agents either takes the largest or smallest action in the action space. If the action space is binary, as in this paper, this fully identifies the utility function, but if the action space is not binary, then it would be necessary to somehow ‘‘extrapolate’’ from that to the utility functions at intermediate actions, presumably based on functional form assumptions about the utility function. In the case of a non-binary action space it does not seem readily apparent how to use the explanatory variables as ‘‘instruments’’ to ‘‘cause’’ agents to take intermediate actions, resulting in the need for ‘‘extrapolation’’. Acknowledgments This paper is a revised and extended version of the first chapter of my dissertation that was previously distributed as ‘‘Identification of semiparametric games on networks with application to social interactions’’ in 2011/2012. 
Many thanks go to Joel Horowitz, Charles Manski, and Elie Tamer for many helpful comments and discussions. Also thanks go to the editor, associate editor, and two


anonymous referees for comments and suggestions that have improved the paper. Finally, thanks go to seminar audiences at Johns Hopkins University, New York University, Northwestern University, Purdue University, University of California—Los Angeles, University of Southern California, and University of Texas at Austin for helpful comments. Any errors are mine. Appendix A. Extra results and proofs The following are ‘‘extra’’ assumptions that are implied by the assumptions above, essentially related to dropping conditioning on some variables, or conditioning on gˆ rather than g. These ‘‘extra’’ assumptions are more convenient to use in the proofs. Lemma A.1 collects how the assumptions above imply these ‘‘extra’’ assumptions. Assumption A.1 (Median Independence of the Unobservables on gˆ ). The following two median independence conditions hold: (1) For each role i, ϵi |x has zero median for all x in the support. Moreover, it holds that the cumulative distribution function Fϵi |x (·) is strictly increasing in a neighborhood of zero. (2) For all roles i and j, (ϵi + si (g )νij )|(x, z , gˆij = 1, si (g )) has zero median for all (x, z , gˆij = 1, si (g )) in the support. Moreover, it holds that the cumulative distribution function Fϵi +si (g )νij |x,z ,ˆgij =1,si (g ) (·) is strictly increasing in a neighborhood of zero. Assumption A.2 (Continuous Unobservables on gˆ ). For each role i, the distribution of ϵi |x is continuous for all x in the support. Also, for all roles i and j, the distribution of (ϵi + si (g )νij )|(x, z , gˆij = 1, si (g )) is continuous for all (x, z , gˆij = 1, si (g )) in the support. Assumption A.3 (Uniformly Bounded Interaction Structure on gˆ ). For any δ < 1, there is Gδ > 0 such that, for any (x, z , gˆ , {si (g )}i ) in the support, P (|g w | ̸≤ Gδ |x, z , gˆ , {si (g )}i ) ≤ 1 − δ . Assumption A.4 (Uniformly Bounded Interaction Structure). For any

$\delta < 1$, there is $G_\delta > 0$ such that, for any $x$ in the support, $P(|g^w| \not\le G_\delta | x) \le 1 - \delta$.

Assumption A.5 (Conditional Tail Behavior of the Unobservables on $\hat{g}$). For all roles $i$ and $j$, for any non-random function $\tau_i(\cdot)$ of $x$ of the form that $\tau_i(x) \equiv a x_{i1} + \tilde{\tau}_i(x_{i(-1)})$, where $a = 1$ or $a = -1$, and any sequence of $x$ that varies only in $x_{\cdot 1}$ such that $\tau_i(x) \to \infty$, it holds that $F_{\epsilon_i|x}(\tau_i(x)) \to 1$ and $F_{\epsilon_i|x,z,\hat{g}_{ij}=1,s_i(g)}(\tau_i(x)) \to 1$ and $F_{\epsilon_i|x,z,\hat{g},\{s_i(g)\}_i}(\tau_i(x)) \to 1$; and, for any sequence of $x$ that varies only in $x_{\cdot 1}$ such that $\tau_i(x) \to -\infty$, it holds that $F_{\epsilon_i|x}(\tau_i(x)) \to 0$ and $F_{\epsilon_i|x,z,\hat{g}_{ij}=1,s_i(g)}(\tau_i(x)) \to 0$ and $F_{\epsilon_i|x,z,\hat{g},\{s_i(g)\}_i}(\tau_i(x)) \to 0$.

Assumption A.6 (Non-Flat Unobservables on $\hat{g}$). The following two conditions hold:
(1) For each role $i$, and each $(x_{(-i)(-1)}, x_i)$ in the support, and $t \neq 0$, it holds that $\limsup_{x_{(-i)1}} F_{\epsilon_i|x}(t) \neq \tfrac{1}{2}$ and $\liminf_{x_{(-i)1}} F_{\epsilon_i|x}(t) \neq \tfrac{1}{2}$, where the limits are along any sequence of $x_{(-i)1}$ such that, for each $j \neq i$, either $x_{j1} \to \infty$ or $x_{j1} \to -\infty$.
(2) For all roles $i$ and $j$, and for any $(x_{(-i)(-1)}, x_i)$ and $z$ and $s_i(g)$ in the support, and $t \neq 0$, it holds that $\limsup_{x_{(-i)1}} F_{\epsilon_i + s_i(g)\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)}(t) \neq \tfrac{1}{2}$ and $\liminf_{x_{(-i)1}} F_{\epsilon_i + s_i(g)\nu_{ij}|x,z,\hat{g}_{ij}=1,s_i(g)}(t) \neq \tfrac{1}{2}$, where the limits are along any sequence of $x_{(-i)1}$ such that, for each $k \neq i$, either $x_{k1} \to \infty$ or $x_{k1} \to -\infty$.

Lemma A.1. The following holds: (1) Assumptions 3.1 and 3.5 imply Assumption A.1. (2) Assumptions 3.2 and 3.5 imply Assumption A.2. (3) Assumptions 3.5 and 3.7 imply Assumption A.3. (4) Assumption A.3 implies Assumption A.4. (5) Assumptions 3.5 and 3.8 imply Assumption A.5. (6) Assumptions 3.5 and 3.9 imply Assumption A.6.

Proof. For 1, $(\epsilon_i + s_i(g)\nu_{ij}) | (x, z, \hat{g}_{ij} = 1, s_i(g)) \sim (\epsilon_i + s_i(g)\nu_{ij}) | (x, z, \hat{g}_{ij} = 1, g_{ij} = 1, s_i(g)) \sim (\epsilon_i + s_i(g)\nu_{ij}) | (x, z, g_{ij} = 1, s_i(g))$. The first distributional equivalence follows because $\hat{g}_{ij} = 1$ implies that $g_{ij} = 1$. The second distributional equivalence follows from Assumption 3.5. Therefore the second part of Assumption A.1 follows from the second part of Assumption 3.1. The first parts of these assumptions are the same.

For 2, the proof is basically the same as the proof of 1, because the arguments of the proof of 1 show distributional equivalences relating the distributions in Assumption 3.2 to the distributions in Assumption A.2, so assumptions on the distributions in Assumption 3.2 imply the same assumptions on the corresponding distributions in Assumption A.2.

For 3, note that
\[
\begin{aligned}
P(|g^w| \not\le G_\delta | x, z, \hat{g}, \{s_i(g)\}_i) &= \int P(|g^w| \not\le G_\delta | x, z, \hat{g}, g, \{s_i(g)\}_i) \, dP(g | x, z, \hat{g}, \{s_i(g)\}_i) \\
&= \int P(|g^w| \not\le G_\delta | x, z, g, \{s_i(g)\}_i) \, dP(g | x, z, \hat{g}, \{s_i(g)\}_i) \\
&\le 1 - \delta,
\end{aligned}
\]
where the second equality follows from Assumption 3.5 since the random part of $g^w$ is conditionally independent of $\hat{g}$, and where the inequality follows from Assumption 3.7 since there is $G_\delta$ such that each term of the integrand is less than or equal to $1 - \delta$.

For 4, by Assumption A.3, for any $\delta < 1$, there is $G_\delta > 0$ such that $P(|g^w| \not\le G_\delta | x, z, \hat{g}, \{s_i(g)\}_i) \le 1 - \delta$. Therefore, by the law of total probability, $P(|g^w| \not\le G_\delta | x) \le 1 - \delta$.

For 5, note that the first and fourth limits in Assumptions 3.8 and A.5 are the same. For the second and fifth limits, note that similarly to the proof of part 1, $\epsilon_i | (x, z, g_{ij} = 1, s_i(g)) \sim \epsilon_i | (x, z, \hat{g}_{ij} = 1, s_i(g))$ under Assumption 3.5. For the third limit, note similarly that Assumption 3.5 implies that
\[
\begin{aligned}
F_{\epsilon_i|x,z,\hat{g},\{s_i(g)\}_i}(\tau_i(x)) &= \int F_{\epsilon_i|x,z,\hat{g},g,\{s_i(g)\}_i}(\tau_i(x)) \, dP(g | x, z, \hat{g}, \{s_i(g)\}_i) \\
&= \int F_{\epsilon_i|x,z,g,\{s_i(g)\}_i}(\tau_i(x)) \, dP(g | x, z, \hat{g}, \{s_i(g)\}_i) \\
&\ge \inf_{g} F_{\epsilon_i|x,z,g,\{s_i(g)\}_i}(\tau_i(x)) \to 1,
\end{aligned}
\]
where the first equality follows from the law of total probability, the second equality follows from Assumption 3.5, and the limit follows from the third limit in Assumption 3.8 since $g$ can only take on finitely many values. The sixth limit is proved similarly to the third limit.

For 6, the proof is basically the same as the proof of 1, for the same reason as discussed in the proof of 2.

The following lemma is useful.

Lemma A.2. Suppose that $\epsilon \perp \nu$, and that both $\epsilon$ and $\nu$ are symmetric about zero. Then, $\epsilon + \nu$ is symmetric about zero, and thus has zero median.23

23 The statement that replaces the assumption of symmetric about zero with having a median equal to zero is not true, since the median is not additive even for independent random variables. The assumption that ϵ and ν are independent cannot be relaxed, without replacing it with some other condition, because any mean zero random variable can be expressed as the sum of two random variables that are symmetric about zero (see Rubin and Sellke (1986)). Under stronger shape restrictions on (ϵ, ν) like joint normality, ϵ + ν can be symmetric about zero even if ϵ ̸⊥ ν .
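The contrast between Lemma A.2 and the first remark in this footnote can be checked with exact arithmetic. The sketch below is illustrative only: the discrete distributions and function names are assumptions chosen for the example, and discreteness is used purely so that the probabilities can be computed exactly.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(p, q):
    """Exact distribution of X + Y for independent discrete X ~ p and Y ~ q."""
    out = defaultdict(Fraction)
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] += pa * qb
    return dict(out)

def has_zero_median(p):
    """Zero is a median iff P(X <= 0) >= 1/2 and P(X >= 0) >= 1/2."""
    half = Fraction(1, 2)
    below = sum(w for v, w in p.items() if v <= 0)
    above = sum(w for v, w in p.items() if v >= 0)
    return below >= half and above >= half

def is_symmetric(p):
    """True if the probability mass at v equals the mass at -v for every v."""
    return all(p.get(-v, 0) == w for v, w in p.items())

# Two independent variables that are symmetric about zero.
sym_x = {-1: Fraction(3, 10), 0: Fraction(4, 10), 1: Fraction(3, 10)}
sym_y = {-2: Fraction(1, 2), 2: Fraction(1, 2)}
sym_sum = convolve(sym_x, sym_y)

# A variable with zero median that is not symmetric, summed with an independent copy.
skew = {-1: Fraction(49, 100), 0: Fraction(2, 100), 3: Fraction(49, 100)}
skew_sum = convolve(skew, skew)

print(is_symmetric(sym_sum), has_zero_median(sym_sum))
print(has_zero_median(skew), has_zero_median(skew_sum))
```

In this example the sum of the two symmetric variables is again symmetric with zero median, whereas the sum of the two independent median-zero but asymmetric variables places probability $0.2601$ on values at or below zero, so zero is not a median of the sum.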


Proof of Theorem 3.1. All of the ‘‘extra’’ assumptions may be used by Lemma A.1. First, consider identification of $\theta$. Suppose that for some role $i$ it holds that $t_i \neq \theta_i$. The first claim is that there exists either a set of $x_i$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, t_i) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i)$ or a set of $x_i$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, t_i)$. These events are equivalent to $-\tilde{u}_i(x_{i(-1)}, \theta_i) < x_{i1} < -\tilde{u}_i(x_{i(-1)}, t_i)$ and $-\tilde{u}_i(x_{i(-1)}, t_i) < x_{i1} < -\tilde{u}_i(x_{i(-1)}, \theta_i)$, respectively. By Assumption 3.3, $P(\tilde{u}_i(x_{i(-1)}, \theta_i) \neq \tilde{u}_i(x_{i(-1)}, t_i)) > 0$. So, the claim holds by Assumption 3.4. These inequalities hold for any $x_{(-i)}$.

The next claim, established in many steps, essentially is that $P(y = (0, \ldots, 0) | x) \approx P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) | x)$ for certain extreme values of $x$ related to the large support assumption. By Assumption A.4, for any $\delta < 1$, there is $G_\delta > 0$ such that $P(|g^w| \not\le G_\delta | x) \le 1 - \delta$ uniformly in $x$. Let $A_{i,\delta}$ be the event $\epsilon_i < -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - (N-1)G_\delta$ and $C_{i,\delta} = \cap_{j \neq i} A_{j,\delta}$. By the law of total probability, $P(y = (0, \ldots, 0) | x) = P(y = (0, \ldots, 0), C_{i,\delta}, |g^w| \le G_\delta | x) + P(y = (0, \ldots, 0), C_{i,\delta}^C, |g^w| \le G_\delta | x) + P(y = (0, \ldots, 0), |g^w| \not\le G_\delta | x)$. By choice of $G_\delta$, $P(y = (0, \ldots, 0), |g^w| \not\le G_\delta | x) \le 1 - \delta$. Also,
\[
P(y = (0, \ldots, 0), C_{i,\delta}^C, |g^w| \le G_\delta | x) \le P(C_{i,\delta}^C | x) = P(\cup_{j \neq i} A_{j,\delta}^C | x) \le \sum_{j \neq i} P(\epsilon_j \ge -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta | x) \to 0,
\]
as long as $-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j)$ is a sequence diverging to $+\infty$ by Assumption A.5. As $x_{j1} \to -\infty$ for all $j \neq i$, by Assumption 3.4, it therefore holds that $P(y = (0, \ldots, 0), C_{i,\delta}^C, |g^w| \le G_\delta | x) \to 0$.

Also, when $C_{i,\delta}$ and $|g^w| \le G_\delta$, the only level-1 strategy for any agent $j$ with $j \neq i$ is to use action 0. This is because an upper bound on the possible payoff from action 1 is $x_{j1} + \tilde{u}_j(x_{j(-1)}, \theta_j) + \sum_{k: g^w_{jk} \ge 0} g^w_{jk} + \epsilon_j \le x_{j1} + \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta + \epsilon_j$. This is negative if $\epsilon_j < -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta$. Thus, when $C_{i,\delta}$ and $|g^w| \le G_\delta$ obtain, the payoff to each agent $j$ with $j \neq i$ from using action 1 is negative for any $y_{(-j)}$, so the only level-1 strategy for any agent $j$ with $j \neq i$ is to use action 0. That implies that the event that $y = (0, \ldots, 0)$, $C_{i,\delta}$, and $|g^w| \le G_\delta$ is equivalent to the event that $\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i)$, $C_{i,\delta}$, and $|g^w| \le G_\delta$. This is because when $C_{i,\delta}$ and $|g^w| \le G_\delta$ obtain the only level-2 conjecture for agent $i$ is that each other agent uses action 0, since that is their only level-1 strategy. Thus, agent $i$ may either use action 0 and get a payoff of 0 or use action 1 and get a payoff of $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + \epsilon_i$. Assumption A.2 implies that agent $i$ is indifferent between actions 0 and 1 with probability zero.

By the law of total probability, $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), C_{i,\delta}, |g^w| \le G_\delta | x) = P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) | x) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), C_{i,\delta}^C, |g^w| \le G_\delta | x) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), |g^w| \not\le G_\delta | x)$. Similar to the previous arguments, $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), C_{i,\delta}^C, |g^w| \le G_\delta | x) \to 0$ and $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), |g^w| \not\le G_\delta | x) \le 1 - \delta$. This implies that
\[
\begin{aligned}
|P(y = (0, \ldots, 0) | x) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) | x)| \le{} & P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), C_{i,\delta}^C, |g^w| \le G_\delta | x) \\
& + P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), |g^w| \not\le G_\delta | x) \\
& + P(y = (0, \ldots, 0), C_{i,\delta}^C, |g^w| \le G_\delta | x) \\
& + P(y = (0, \ldots, 0), |g^w| \not\le G_\delta | x).
\end{aligned}
\]
Now, first choose $\delta < 1$, and then by the previous arguments note that for any $\xi > 0$ it is possible to find a set of sufficiently negative $x_{j1}$ for all $j \neq i$, that depends on $\delta$, so that the right-hand side of this inequality is at most $\xi + 1 - \delta + \xi + 1 - \delta$, which can be made arbitrarily close to 0 by choice of $\delta$ and then $\xi$.

The third and final main claim, essentially, establishes identification based on the single-agent $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) | x)$. Since the true distribution of the unobservables is not known, let $\tilde{P}$ be an arbitrary (possibly not true) distribution of the


unobservables satisfying the assumptions. Further, let $P_{\theta'}$ be the distribution of the data satisfying the assumptions, based on $\tilde{P}$, with $t_i$ the parameters in place of $\theta_i$. From before there exists either a set of $x_i$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, t_i) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i)$ or a set of $x_i$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, t_i)$. In the former case, and under Assumption A.1, there is a set of $x_i$ with positive probability such that $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x) < \frac{1}{2} < \tilde{P}(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) \mid x)$. Similarly, in the latter case, there is a set of $x_i$ with positive probability such that $\tilde{P}(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) \mid x) < \frac{1}{2} < P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x)$. In either case, $P(y = (0, \ldots, 0) \mid x) = P(y = (0, \ldots, 0) \mid x) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x) + P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x)$. So in the former case, $\limsup P(y = (0, \ldots, 0) \mid x) = \limsup P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x) < \frac{1}{2}$, and $\liminf P_{\theta'}(y = (0, \ldots, 0) \mid x) = \liminf \tilde{P}(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) \mid x) > \frac{1}{2}$, by Assumption A.6, and the limits are along the sequences of $x$ used above. Similarly, in the latter case, $\liminf P(y = (0, \ldots, 0) \mid x) = \liminf P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x) > \frac{1}{2}$, and $\limsup P_{\theta'}(y = (0, \ldots, 0) \mid x) = \limsup \tilde{P}(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) \mid x) < \frac{1}{2}$. In either case, this shows that $t$ can be distinguished observationally from $\theta$.

Second, consider identification of $\delta$. Suppose that for some roles $i$ and $j$ it holds that $d_{ij} \ne \delta_{ij}$. Let $s_{ij}$ be given by Assumption 3.6. The first claim is that there exists either a set of $x_i$ and $z_{ij}$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, \delta_{ij}) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, d_{ij})$ or a set of $x_i$ and $z_{ij}$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, d_{ij}) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, \delta_{ij})$. These events are equivalent to $-\tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, d_{ij}) < x_{i1} < -\tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij})$ and $-\tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) < x_{i1} < -\tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, d_{ij})$, respectively. The endpoints of these intervals are equal if and only if $f_{ij}(z_{ij}, d_{ij}) = f_{ij}(z_{ij}, \delta_{ij})$. By Assumption 3.3, $P(f_{ij}(z_{ij}, d_{ij}) \ne f_{ij}(z_{ij}, \delta_{ij})) > 0$. So, the claim holds by Assumption 3.4.

The next claim, established in many steps, essentially is that $P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \approx P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$ for certain extreme values of $x$ related to the large support assumption. The notation $(0, 1, 0, \ldots, 0)$ has the 1 entry corresponding to role $j$. By Assumption A.3, for any $\delta < 1$, there is a $G_\delta$ such that $P(|g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij}, s_i(g)) \le 1 - \delta$. Let $A_{i,\delta}$ be the event $\epsilon_i < -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - (N-1)G_\delta$. Let $B_{i,\delta}$ be the event $\epsilon_i > -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) + (N-1)G_\delta$. Let $D_{ij,\delta} = \bigcap_{k \ne i,j} A_{k,\delta} \cap B_{j,\delta}$. By the law of total probability, $P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = P(y = (0, 1, 0, \ldots, 0), D_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(y = (0, 1, 0, \ldots, 0), D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(y = (0, 1, 0, \ldots, 0), |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$. By choice of $G_\delta$, it follows that $P(y = (0, 1, 0, \ldots, 0), |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \le 1 - \delta$. Also, it holds that $P(y = (0, 1, 0, \ldots, 0), D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \le P(D^C_{ij,\delta} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = P(\bigcup_{k \ne i,j} A^C_{k,\delta} \cup B^C_{j,\delta} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \le \sum_{k \ne i,j} P(\epsilon_k \ge -x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(\epsilon_j \le -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \to 0$, as long as $-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k)$ is a sequence diverging to $+\infty$ for $k \ne i, j$, and $-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j)$ is a sequence diverging to $-\infty$, by Assumption A.5. As $x_{k1} \to -\infty$ for all $k \ne i, j$ and as $x_{j1} \to \infty$, by Assumption 3.4, $P(y = (0, 1, 0, \ldots, 0), D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \to 0$. Also, when $D_{ij,\delta}$ and $|g^w| \le G_\delta$, the only level-1 strategy for any agent $k$ with $k \ne i, j$ is to use action 0 and the only level-1 strategy for agent $j$ is to use action 1. This is because when the event $D_{ij,\delta}$ obtains, and thus $A_{k,\delta}$ obtains for each agent $k$ with $k \ne i, j$, the payoff to each agent $k$ with $k \ne i, j$ from using action 1 is negative


for any actions of the other agents. Similarly, a lower bound on the possible payoff to agent $j$ from action 1 is $x_{j1} + \tilde{u}_j(x_{j(-1)}, \theta_j) + \sum_{k : g^w_{jk} < 0} g^w_{jk} + \epsilon_j \ge x_{j1} + \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta + \epsilon_j$. This is positive if $\epsilon_j > -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta$. Thus, when $D_{ij,\delta}$ and $|g^w| \le G_\delta$, the only level-1 strategy for any agent $k$ with $k \ne i, j$ is to use action 0 and for agent $j$ is to use action 1. That implies that the event that $y = (0, 1, 0, \ldots, 0)$, $D_{ij,\delta}$, and $|g^w| \le G_\delta$ is equivalent to the event that $\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}$, $D_{ij,\delta}$, and $|g^w| \le G_\delta$. This is because, when $D_{ij,\delta}$ and $|g^w| \le G_\delta$ obtain, the only level-2 conjecture for agent $i$ is that each agent $k$ with $k \ne i, j$ uses action 0, and agent $j$ uses action 1, since that is their only level-1 strategy. Thus, agent $i$ may either use action 0 and get a payoff of 0 or use action 1 and get a payoff of $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + g^w_{ij} + \epsilon_i$. Assumption A.2 implies that agent $i$ is indifferent between actions 0 and 1 with probability zero. By the law of total probability, $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, D_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$. Similar to the previous arguments, $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \to 0$ and $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \le 1 - \delta$. These arguments imply $|P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})| \le P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(y = (0, 1, 0, \ldots, 0), D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(y = (0, 1, 0, \ldots, 0), |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$. Now, first choose $\delta < 1$, and then by the previous arguments note that for given $\xi > 0$ it is possible to find a set of sufficiently negative $x_{k1}$ for all $k \ne i, j$ and sufficiently positive $x_{j1}$, that depends possibly on $\delta$, so that the right hand side of this inequality is $\le \xi + 1 - \delta + \xi + 1 - \delta$, which can be made arbitrarily close to 0 by choice of $\delta$ and then $\xi$. The third and final main claim, essentially, establishes identification based on the single-agent $P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$. From before there exists either a set of $x_i$ and $z_{ij}$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, \delta_{ij}) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, d_{ij})$ or a set of $x_i$ and $z_{ij}$ with positive probability such that $x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, d_{ij}) < 0 < x_{i1} + \tilde{u}_i(x_{i(-1)}, \theta_i) + s_{ij} f_{ij}(z_{ij}, \delta_{ij})$.
In the former case, and under Assumption A.1, there is a set of $(x, z)$ with positive probability such that $\tilde{P}(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, d_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) < \frac{1}{2} < P(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$, where again $\tilde{P}$ is a generic distribution of the unobservables satisfying the assumptions. This follows from Assumption A.1, since $(\epsilon_i + s_{ij}\nu_{ij}) \mid (x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$ has unique zero median. Similarly, in the latter case, there is a set of $(x, z)$ with positive probability such that $P(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) < \frac{1}{2} < \tilde{P}(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, d_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$. In either case, $P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) - P(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$. So, in the former case, $\liminf P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = \liminf P(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) > \frac{1}{2}$, and $\limsup P_{\theta'}(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = \limsup P_{\theta'}(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, d_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) < \frac{1}{2}$, where again $P_{\theta'}$ is another distribution of the data satisfying the assumptions, based on $\tilde{P}$, with $d_{ij}$ the parameters in place of $\delta_{ij}$, by Assumption A.6. Similarly, in the latter case, $\limsup P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = \limsup P(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) < \frac{1}{2}$, and $\liminf P_{\theta'}(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) = \liminf P_{\theta'}(\epsilon_i + s_{ij}\nu_{ij} \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, d_{ij}) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) > \frac{1}{2}$. In either case, this shows that $d_{ij}$ can be distinguished observationally from $\delta_{ij}$. □
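The limiting argument used for the outcome $y = (0, \ldots, 0)$ in the proof of Theorem 3.1 can be illustrated numerically. What follows is a minimal sketch and not part of the proof. It rests on assumptions made only for illustration: $N = 2$, $\tilde{u}_i = 0$, independent standard normal unobservables, and an interaction effect bounded by $G$ with probability one (so $\delta = 1$). The function names and the selection rule `pick_zero_when_ambiguous`, which decides the outcome whenever agent $j$'s action is not pinned down, are hypothetical.

```python
import math
import random


def normal_cdf(t):
    """Standard normal CDF, the assumed distribution of each epsilon."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def tail_bound(x_j1, G):
    """P(eps_j >= -x_j1 - G): action 1 is not ruled out for agent j."""
    return 1.0 - normal_cdf(-x_j1 - G)


def simulate_p00(x_i1, x_j1, G, pick_zero_when_ambiguous, draws=50_000, seed=0):
    """Monte Carlo estimate of P(y = (0, 0) | x) in a two-agent binary game."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        eps_i = rng.gauss(0.0, 1.0)
        eps_j = rng.gauss(0.0, 1.0)
        if eps_j < -x_j1 - G:
            # Action 1 gives agent j a negative payoff whatever agent i does,
            # so y_j = 0 and agent i compares 0 with x_i1 + eps_i.
            outcome_is_zero = x_i1 + eps_i <= 0.0
        else:
            # Agent j's action is not pinned down by this argument; an
            # arbitrary (hypothetical) rule decides whether y = (0, 0).
            outcome_is_zero = pick_zero_when_ambiguous
        hits += outcome_is_zero
    return hits / draws


if __name__ == "__main__":
    target = normal_cdf(-0.5)  # single-agent probability P(eps_i <= -x_i1)
    for x_j1 in (0.0, -2.0, -4.0, -8.0):
        low = simulate_p00(0.5, x_j1, 1.0, False)
        high = simulate_p00(0.5, x_j1, 1.0, True)
        print(x_j1, round(low, 3), round(high, 3), round(target, 3))
```

Under these assumptions the two extreme selection rules bracket $P(y = (0, 0) \mid x)$, and the gap between them is governed by `tail_bound(x_j1, G)`, which vanishes as $x_{j1} \to -\infty$. Both estimates then approach the single-agent probability, up to simulation error.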

Proof of Theorem 5.1. The proof of Theorem 3.1 establishes that $\max\{P(y = (0, \ldots, 0) \mid x) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x), -P(y = (0, \ldots, 0) \mid x) + P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x)\} \le |P(y = (0, \ldots, 0) \mid x) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x)| \le P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), C^C_{i,\delta}, |g^w| \le G_\delta \mid x) + P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i), |g^w| \not\le G_\delta \mid x) + P(y = (0, \ldots, 0), C^C_{i,\delta}, |g^w| \le G_\delta \mid x) + P(y = (0, \ldots, 0), |g^w| \not\le G_\delta \mid x) \le 2\sum_{j \ne i} P(\epsilon_j \ge -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta \mid x) + 2(1 - \delta)$ under the assumed conditions. Suppose that condition 1 holds. Then for such $x$, it must be that $-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta > 0$ for all $j \ne i$, since otherwise there would be $j \ne i$ such that $\tilde{F}_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta) \le \frac{1}{2}$, in which case the condition clearly could not hold. Similarly, for such $x$, it must be that $-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) - (N-1)G_\delta > 0$ for all $j \ne i$, since otherwise there would be $j \ne i$ such that $\tilde{F}_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) - (N-1)G_\delta) \le \frac{1}{2}$, in which case the condition clearly could not hold. Similarly, for such $x$, it must be that $-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) < 0$ and $-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) > 0$. Therefore, by Assumption 5.1, it holds that $2\sum_{j \ne i} P(\epsilon_j \ge -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta \mid x) + 2(1 - \delta) = 2\sum_{j \ne i} (1 - F_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta \mid x)) + 2(1 - \delta) \le 2\sum_{j \ne i} (1 - \tilde{F}_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta \mid x)) + 2(1 - \delta)$. Moreover, the similar inequality holds for the other specification of the model. Therefore, at the true specification of the model, $P(y = (0, \ldots, 0) \mid x) \le P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x) + 2\sum_{j \ne i} (1 - \tilde{F}_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta \mid x)) + 2(1 - \delta) \le \tilde{F}_{\epsilon_i \mid x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) \mid x) + 2\sum_{j \ne i} (1 - \tilde{F}_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) - (N-1)G_\delta \mid x)) + 2(1 - \delta) < \frac{1}{2}$. Conversely, at the other specification of the model, $P(y = (0, \ldots, 0) \mid x) \ge P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) \mid x) - 2\sum_{j \ne i} (1 - \tilde{F}_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) - (N-1)G_\delta \mid x)) - 2(1 - \delta) \ge \tilde{F}_{\epsilon_i \mid x}(-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) \mid x) - 2\sum_{j \ne i} (1 - \tilde{F}_{\epsilon_j \mid x}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) - (N-1)G_\delta \mid x)) - 2(1 - \delta) > \frac{1}{2}$. Consequently, the other specification

of the model can be observationally distinguished from the true specification of the model. It is similar if condition 2 holds. The same proofs establish claims 5 and 6.

Moreover, the proof of Theorem 3.1 establishes that $\max\{P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}), -P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})\} \le |P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) - P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})| \le P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij}, |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(y = (0, 1, 0, \ldots, 0), D^C_{ij,\delta}, |g^w| \le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(y = (0, 1, 0, \ldots, 0), |g^w| \not\le G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \le 2(\sum_{k \ne i,j} P(\epsilon_k \ge -x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(\epsilon_j \le -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})) + 2(1 - \delta)$ under the assumed conditions, where the 1 in $(0, 1, 0, \ldots, 0)$ corresponds to role $j$.
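As a numerical aside on the comparison made above for the outcome $y = (0, \ldots, 0)$ under condition 1, the two chains of inequalities can be evaluated directly. This is a minimal sketch under stated assumptions, not the paper's procedure: the bounding distribution $\tilde{F}$ is replaced by a standard normal CDF for every agent, and the helper names (`upper_bound_p00`, `lower_bound_p00`, `separates`) are hypothetical. The argument `index_i` stands for $x_{i1} + \tilde{u}_i(x_{i(-1)}, \cdot)$ evaluated at the relevant parameter value.

```python
import math


def normal_cdf(t):
    """Standard normal CDF, a hypothetical stand-in for the bounding CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def _tail_sum(indices_others, G):
    """Sum over j != i of 1 - F(-index_j - (N - 1) * G)."""
    n = len(indices_others) + 1
    return sum(1.0 - normal_cdf(-v - (n - 1) * G) for v in indices_others)


def upper_bound_p00(index_i, indices_others, G, delta):
    """F(-index_i) + 2 * tail sum + 2 * (1 - delta)."""
    tail = _tail_sum(indices_others, G)
    return normal_cdf(-index_i) + 2.0 * tail + 2.0 * (1.0 - delta)


def lower_bound_p00(index_i, indices_others, G, delta):
    """F(-index_i) - 2 * tail sum - 2 * (1 - delta)."""
    tail = _tail_sum(indices_others, G)
    return normal_cdf(-index_i) - 2.0 * tail - 2.0 * (1.0 - delta)


def separates(true_i, true_others, alt_i, alt_others, G, delta):
    """True if the upper bound at the true parameters is below 1/2 while the
    lower bound at the alternative parameters is above 1/2."""
    upper = upper_bound_p00(true_i, true_others, G, delta)
    lower = lower_bound_p00(alt_i, alt_others, G, delta)
    return upper < 0.5 < lower


if __name__ == "__main__":
    # Three agents; the other agents' indices are strongly negative.
    print(separates(1.0, [-6.0, -6.0], -1.0, [-6.0, -6.0], G=1.0, delta=0.99))
    # Same index for agent i, but the other agents' indices are not extreme.
    print(separates(1.0, [0.0, 0.0], -1.0, [0.0, 0.0], G=1.0, delta=0.99))
```

The sketch shows why the regressors of the other agents need to be sufficiently negative even when they are bounded: unless the tail sum and $2(1 - \delta)$ are small, the upper and lower bounds cannot fall on opposite sides of $\frac{1}{2}$.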


Suppose that condition 3 holds. Then for such $(x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$, it must be that $-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta > 0$ for all $k \ne i, j$, since otherwise there would be $k \ne i, j$ such that $\tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta) \le \frac{1}{2}$, in which case the condition clearly could not hold. And, similarly, it must be that $-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta < 0$. Similarly, for such $(x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$, it must be that $-x_{k1} - \tilde{u}_k(x_{k(-1)}, t_k) - (N-1)G_\delta > 0$ for all $k \ne i, j$, since otherwise there would be $k \ne i, j$ such that $\tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, t_k) - (N-1)G_\delta) \le \frac{1}{2}$, in which case the condition clearly could not hold. And, similarly, it must be that $-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) + (N-1)G_\delta < 0$. Similarly, for such $(x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})$, it must be that $-x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - s_{ij} f_{ij}(z_{ij}, \delta_{ij}) < 0$ and $-x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) - s_{ij} f_{ij}(z_{ij}, d_{ij}) > 0$. Therefore, by Assumption 5.1, $2(\sum_{k \ne i,j} P(\epsilon_k \ge -x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + P(\epsilon_j \le -x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij})) + 2(1 - \delta) \le 2(\sum_{k \ne i,j} (1 - \tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta)) + \tilde{F}_{\epsilon_j \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta)) + 2(1 - \delta)$. Moreover, the similar inequality holds for the other specification of the model. Therefore, at the true specification of the model, $P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \le P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, \theta_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) + 2(\sum_{k \ne i,j} (1 - \tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, \theta_k) - (N-1)G_\delta)) + \tilde{F}_{\epsilon_j \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, \theta_j) + (N-1)G_\delta)) + 2(1 - \delta) < \frac{1}{2}$. Conversely, at the other specification of the model, $P(y = (0, 1, 0, \ldots, 0) \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) \ge P(\epsilon_i \le -x_{i1} - \tilde{u}_i(x_{i(-1)}, t_i) - g^w_{ij} \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}) - 2(\sum_{k \ne i,j} (1 - \tilde{F}_{\epsilon_k \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{k1} - \tilde{u}_k(x_{k(-1)}, t_k) - (N-1)G_\delta)) + \tilde{F}_{\epsilon_j \mid x, z, \hat{g}_{ij} = 1, s_i(g) = s_{ij}}(-x_{j1} - \tilde{u}_j(x_{j(-1)}, t_j) + (N-1)G_\delta)) - 2(1 - \delta) > \frac{1}{2}$. Consequently, the other specification of the model can

be observationally distinguished from the true specification of the model. It is similar if condition 4 holds. The same proofs establish claims 7 and 8. □

Appendix B. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.jeconom.2015.06.023.