Bonferroni-based size-correction for nonstandard testing problems


PII: S0304-4076(17)30055-6
DOI: http://dx.doi.org/10.1016/j.jeconom.2017.05.001
Reference: ECONOM 4360

To appear in: Journal of Econometrics

Received date: 7 January 2016
Revised date: 15 November 2016
Accepted date: 2 May 2017

Please cite this article as: McCloskey, A., Bonferroni-based size-correction for nonstandard testing problems. Journal of Econometrics (2017), http://dx.doi.org/10.1016/j.jeconom.2017.05.001

Bonferroni-Based Size-Correction for Nonstandard Testing Problems∗

Adam McCloskey†
Brown University

October 2011; This Version: May 2017

Abstract

We develop a set of powerful and flexible size-correction procedures for general nonstandard testing environments in which the asymptotic distribution of a test statistic is discontinuous in a nuisance parameter under the null hypothesis. Examples of this form of testing problem are pervasive in econometrics and complicate inference by making size difficult to control. The test constructions introduced here simultaneously control the asymptotic size of the test uniformly over the nuisance parameter space while leading to tests with desirable power properties. They have the flexibility to allow the user to direct the power of the resultant test toward alternatives of particular interest. We introduce three types of size-corrected critical values that make use of reasoning derived from Bonferroni bounds. The new methods provide complementary alternatives to existing size-correction methods, entailing substantially higher power for many testing problems. The critical value constructions are developed for an expanded class of testing problems, allowing application in problems to which previously available size-corrections only yield tests with negligible power. We detail the construction and performance of the new tests in examples of testing after conservative and consistent model selection in the linear regression model.

Keywords: Hypothesis testing, uniform inference, asymptotic size, Bonferroni bounds, power, size-correction, model selection, pretesting, local asymptotics

∗ A previous version of this paper was circulated under the title “Powerful Procedures with Correct Size for Tests Statistics with Limit Distributions that are Discontinuous in Some Parameters”. The author thanks the associate editor, two anonymous referees, Donald Andrews, Federico Bugni, Matias Cattaneo, Xu Cheng, Kirill Evdokimov, Iván Fernández-Val, Patrik Guggenberger, Søren Johansen, Hiroaki Kaido, Frank Kleibergen, Bruce Hansen, Hannes Leeb, Blaise Melly, Stelios Michalopoulos, José Luis Montiel Olea, Ulrich Müller, Serena Ng, Pierre Perron, Benedikt Pötscher, Zhongjun Qu, Eric Renault and Joseph Romano for helpful comments. I am especially grateful to Donald Andrews for suggesting a solution to a mistake in a previous draft. Support from the NSF under grant SES-1357607 is gratefully acknowledged.
† Department of Economics, Brown University, Box B, 64 Waterman St., Providence, RI, 02912 (adam [email protected])

1 Introduction

Nonstandard econometric testing problems have gained substantial attention in recent years. In this paper, we focus on a broad class of these problems: those for which the null limit distribution of a test statistic is discontinuous in a nuisance parameter.¹ The problems falling into this class range from tests in the potential presence of identification failure (e.g., Staiger and Stock, 1997 and Andrews and Cheng, 2012) to tests in partially identified moment inequality models (e.g., Andrews and Soares, 2010) to tests after pretesting or model selection (e.g., Leeb and Pötscher, 2005, 2008 and Guggenberger, 2010). Though test statistics that do not exhibit this type of discontinuity exist for some problems (e.g., Kleibergen, 2002), they do not for others. Moreover, such test statistics may not necessarily be preferred to parameter-discontinuous ones when good size-correction procedures are available, as they can have low power. However, we sidestep the important issue of choosing a test statistic in this paper, taking it as given.

The nonstandard nature of the testing problems studied in this paper causes standard asymptotic approximations to the size of a test to break down. The inadequacy of standard asymptotic approximations and the resulting pitfalls for inference have been studied extensively in the literature. See, for example, Dufour (1997) in the context of inference on the two-stage least squares estimator, Imbens and Manski (2004) and Stoye (2009) in the context of inference on partially identified parameters and Leeb and Pötscher (2005, 2008) in the context of inference after model selection. In order to determine the asymptotic size of a test in these contexts, one must examine the maximal probability of rejecting a true null hypothesis uniformly over the parameter space admissible under the null hypothesis, leading to the analysis of tests in the presence of nuisance parameters.
To control size in these problems, one may take a least favorable (LF) approach to critical value (CV) construction by using the largest CV across all nuisance parameters (see Andrews and Guggenberger, 2009a). This often results in tests with poor power. Within a general framework, we develop a set of flexible size-corrected CV construction methods that lead to tests with correct asymptotic size and desirable power properties. The CVs have the flexibility to be tuned according to the user’s preferences on where he would like the power of the resultant test to be directed. We discuss ways to choose the tuning parameter when constructing the CVs to trade off directing power toward alternatives of interest and achieving high power at most alternatives. For example, at parameter values distant from the point of discontinuity, the resultant tests can be tuned to achieve nearly the same power as tests based upon standard asymptotic approximations (see e.g., Section 5), even though the latter do not control size. Moreover, the power differences between our tests and those based upon LF CVs can be quite pronounced, potentially reaching up to 100% over large portions of the alternative parameter space (see e.g., Section 5). The CV construction methods put forth here also lead to tests with controlled asymptotic size and non-negligible power when using test statistics that exhibit complicated asymptotic behavior, such as dependence upon an expanded vector of nuisance parameters and possible divergence under parameter sequences admissible under the null hypothesis. This allows for tests with non-negligible power in nonstandard testing problems for which previously developed LF CV construction methods (and their variants) do not, such as testing after consistent model selection (e.g., Leeb and Pötscher, 2005) or t-testing using possibly irrelevant instrumental variables (e.g., Staiger and Stock, 1997; Andrews and Guggenberger, 2010a). For such problems, LF CVs are infinite-valued.

¹ One could also describe such problems from the viewpoint of test statistic regularity in a similar fashion to estimator regularity (cf. van der Vaart, 1998, p. 115). That is, one could say that the test statistic sequence is not regular at some point under the null hypothesis.

The basic idea behind the size-corrections we introduce is to construct adaptive CVs that use the data to estimate how far the true nuisance parameters are from the point that causes the discontinuity in the asymptotic behavior of the test statistic. In order to uniformly control asymptotic size, we examine the behavior of the test statistic under all possible parameter sequences admissible under the null hypothesis. This analysis leads to asymptotic versions of the test statistic that depend upon a nuisance parameter that is not consistently estimable. In the asymptotic experiment, this nuisance parameter quantifies the distance of the true parameter from the point of discontinuity.
Though this parameter cannot be consistently estimated under drifting sequences, it is typically possible to construct asymptotically valid confidence sets for it. This provides the user with a set of asymptotic distributions that can be used to approximate the finite sample distribution of the test statistic under the null hypothesis. Based upon a valid confidence set for the nuisance parameter, we examine three different size-correction methods in increasing order of computational complexity. For the first, we search for the maximal CV over the nuisance parameters in the confidence set, rather than the (plug-in) LF approach of using the maximal CV over the entire space of nuisance parameters that cannot be consistently estimated. Two levels of uncertainty are inherent to this construction: one for the nuisance parameter and one for the test statistic itself. We use Bonferroni bounds to account for both. For the second, we also search for a maximal CV over a confidence set but, instead of using Bonferroni bounds, we account for the two levels of uncertainty by adjusting CV levels according to the null limiting distributions that arise under drifting parameter sequences. This method compensates for the asymptotic dependence structure between the test statistic and the CV, leading to more powerful tests than the first approach. For the third, we find the smallest CVs over sets of those justified by the first and second, leading to tests with high power over most of the parameter space. We discuss the computational tradeoffs involved in constructing each type of CV, providing specific recommendations to reduce computational burden.

The recently developed methods for uniform asymptotic inference in a general setting of Andrews and Guggenberger (2009a) are closely related to the methods developed here. These authors also study a given test statistic and adjust CVs according to a drifting parameter sequence framework but their CVs are based upon variants of the LF approach. Loh (1985), Berger and Boos (1994) and Silvapulle (1996) also introduce inference methods based upon Bonferroni bounds in a similar spirit to the first of our three CV construction methods, though Loh’s (1985) approach fails to properly account for the uncertainty involved in estimating the nuisance parameter. Rather than considering uniform asymptotic inference as in this paper, these authors consider finite sample parametric testing problems for hypotheses in the presence of nuisance parameters. There are also a handful of papers making use of the first of our CV construction methods in specific examples that fall under our general framework. These include Cavanagh et al. (1995), Staiger and Stock (1997), Moon and Schorfheide (2009) and Romano et al.
(2014).² Finally, our second CV construction method is similar in spirit to a refinement of a Bonferroni confidence interval suggested for the nonstandard problem of testing with a potentially highly persistent regressor by Cavanagh et al. (1995) and Campbell and Yogo (2006).

² Romano et al. (2014) was written concurrently with and independently of this paper.

The scope of problems to which our size-correction methods may be applied is quite broad. Apart from those already mentioned, our general setting encompasses problems of tests on model averaged and shrinkage estimators (Hansen, 2007, 2016, 2014), Vuong tests for non-nested model selection (Vuong, 1989; Shi, 2015), subvector tests allowing for weak identification (Guggenberger et al., 2012) and tests on break dates and coefficients in structural change models (Elliott et al., 2015; Elliott and Müller, 2014). A recent paper by Cattaneo and Crump (2014) uses one of our CV constructions in the nonstandard context of conducting valid inference using a heteroskedasticity and autocorrelation robust variance estimator in the presence of potentially strong autocorrelation. Another (Franchi and Johansen, 2017) applies one of our methods to the problem of testing cointegrating relationships when allowing for local departures from unit roots in the underlying time series data. As our testing framework encompasses theirs, Andrews and Guggenberger (2009a) list even more nonstandard problems to which the size-correction methods of this paper may be applied.

For illustration, we motivate the general testing framework and CV construction methods by examining a simple testing after regressor selection problem in the linear regression model. This simple example was also studied by Leeb and Pötscher (2005) and Andrews and Guggenberger (2009a). We extensively study the properties of tests based upon our Bonferroni-based CVs in this example under two different model selection regimes: conservative and consistent. The qualitative features that emerge from this study provide intuition for applying our size-correction method to other nonstandard testing problems. In a companion paper, McCloskey (2017), we develop the size-correction methods in a more general testing after consistent regressor selection setting, providing algorithms for the various CV constructions in this context. We focus on testing after model selection in the examples because, at present, available uniformly valid inference methods tend to be very conservative. Inference after model selection is an important issue that is all too frequently ignored. See Hansen (2005) for an informative discussion of this issue. A recent paper by Belloni et al. (2014) is related to our testing after model selection application but it is different for the following reasons: (i) it is designed for high-dimensional models, collapsing to standard full regression inference in the low-dimensional setting of this paper, (ii) our methods do not impose any (approximate) sparsity conditions and (iii) our CV constructions can be applied to many post-selection estimators while Belloni et al.
(2014) applies to a particular type of post-selection estimator (“post-double-selection”).

The remainder of this paper is organized as follows. Section 2 describes the general class of nonstandard hypothesis testing problems we study, motivating the discussion with a simple testing after model selection example. Section 3 goes on to specify the three size-correction methods of this paper, providing theoretical results that establish their asymptotic size properties. The section also contains discussions of computational and power considerations in the construction of the proposed CVs. Section 4 contains an asymptotic local power analysis of the tests based upon the proposed CVs in the context of our simple testing after model selection example, while Section 5 performs a similar analysis on the finite sample power and null rejection probability of the tests. Appendix I contains algorithms for constructing the Bonferroni-based CVs in the general nonstandard testing context introduced in this paper. Mathematical proofs of the main results of the paper are contained in Appendix II. Appendix III contains a high-level assumption that is implied by a handful of lower-level assumptions in a special case of the paper. All tables and figures can be found at the end of the document.

To simplify notation, we will occasionally abuse it by letting $(a_1, a_2)$ denote the vector $(a_1', a_2')'$.

The sample size is denoted by $n$ and all limits are taken to mean as $n \to \infty$. The symbol “$\Rightarrow$” denotes a correspondence (set-valued mapping). Let $R_\infty = R \cup \{-\infty, \infty\}$, $1(\cdot)$ denote the indicator function, $\Phi(\cdot)$ be the usual notation for the distribution function of the standard normal distribution and $\{\pm\infty\} = \{-\infty, \infty\}$. For a generic set $C$ and positive integer $k$, $C^k = C \times \ldots \times C$, $k$-times. “$\xrightarrow{d}$” and “$\xrightarrow{p}$” denote weak convergence and convergence in probability while $O(\cdot)$, $o(\cdot)$, $O_p(\cdot)$ and $o_p(\cdot)$ denote the usual (stochastic) orders of magnitude.

2 Testing Framework

Before describing the testing framework of this paper in full generality, we begin by providing a simple example to capture the intuition behind the subsequent mathematical formalism: testing on a parameter in the linear regression model after using the data to determine whether a potential control variable should enter the regression. We are interested in a hypothesis test on the causal effect of a random variable $x$ on a random variable $y$ but we are uncertain whether the “control” variable $z$ should enter the regression to mitigate omitted variable bias. Formally, we have the linear regression model

$$y_i = \theta x_i + \xi z_i + e_i, \qquad (1)$$

with $E_F[e_i|x_i, z_i] = 0$ and $E_F[e_i^2|x_i, z_i] = \sigma^2$ for $i = 1, \ldots, n$, where $y_i$ is a scalar dependent variable, $x_i$ is a scalar regressor of interest, $z_i$ is a potential control variable, $e_i$ is an unobservable error term, $\theta$ is the scalar parameter under test, $\xi$ is an unknown regression coefficient and $F$ is the joint distribution of the i.i.d. random vector $(x_i, z_i, e_i)'$. We treat the i.i.d. homoskedastic case with one potential control here for simplicity only. Analogous results extend to selection on any finite number of controls, allowing for serial correlation and heteroskedasticity. See McCloskey (2017).

2.1 Parameter-Discontinuous Asymptotic Distributions

We are interested in hypotheses of the form $H_0: \theta = \theta_0$ for some $\theta_0 \in R$ and, as is common practice, we will base inference upon an estimate of $\theta$ after using the data to determine whether $z_i$ should enter the regression model. Apart from being “standard practice”, there may be other reasons to proceed in this fashion. For example, we may expect efficiency gains from imposing correct restrictions or we may be simultaneously interested in what the “correct” most parsimonious underlying regression model is. A standard empirical practice would then be to base the decision on whether or not to include $z_i$ in the regression when estimating $\theta$ on the absolute value of the t-statistic $t_n = \sqrt{n}\hat\xi_n/\hat\sigma_{\xi,n}$, where $\hat\xi_n$ is the OLS estimator of $\xi$ in the full regression given by (1) and $\hat\sigma_{\xi,n}$ is a consistent estimator of the asymptotic standard deviation of $\hat\xi_n$, $\sigma_\xi$. Comparing $|t_n|$ to a pretest selection threshold $c$, we then base post-selection inference upon the estimator of $\theta$ corresponding to the model selected by the data so that we can formally write a (non-studentized) two-sided post-selection test statistic as follows:³

$$T_n(\theta_0) = |\sqrt{n}(\hat\theta_n - \theta_0)|\,1(|t_n| > c) + |\sqrt{n}(\tilde\theta_n - \theta_0)|\,1(|t_n| \le c), \qquad (2)$$

where $\hat\theta_n$ is the unrestricted OLS estimator of $\theta$ in the full regression (1) and $\tilde\theta_n$ is the restricted OLS estimator of $\theta$ that imposes the restriction $\xi = 0$.
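To fix ideas, the following minimal simulation sketch (our own illustration; the data-generating design, the threshold $c = 1.96$ and all variable names are assumptions for this example, not the paper’s) computes the post-selection statistic in (2) from simulated data:

```python
import numpy as np

def post_selection_stat(y, x, z, theta0, c=1.96):
    """Non-studentized two-sided post-selection statistic T_n(theta0) from (2):
    pretest the control's coefficient at threshold c, then center on the
    unrestricted or restricted OLS estimate of theta accordingly."""
    n = len(y)
    X = np.column_stack([x, z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)       # (theta_hat, xi_hat)
    resid = y - X @ coef
    s2 = resid @ resid / (n - 2)
    se_xi = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    t_n = coef[1] / se_xi              # equals sqrt(n)*xi_hat/sigma_hat_xi
    theta_tilde = (x @ y) / (x @ x)    # restricted OLS, imposing xi = 0
    if abs(t_n) > c:                   # control retained: full model
        return np.sqrt(n) * abs(coef[0] - theta0)
    return np.sqrt(n) * abs(theta_tilde - theta0)      # control dropped

rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
z = 0.5 * x + rng.standard_normal(n)              # correlated potential control
y = 1.0 * x + 0.0 * z + rng.standard_normal(n)    # true theta = 1, xi = 0
T = post_selection_stat(y, x, z, theta0=1.0)
print(T)
```

The pretest t-statistic equals $\sqrt{n}\hat\xi_n/\hat\sigma_{\xi,n}$ because the finite-sample standard error of $\hat\xi_n$ already contains the $1/\sqrt{n}$ factor.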

The nonstandard nature of tests based upon the statistic (2) arises from its complicated limiting behavior. This limiting behavior depends upon whether we are engaging in conservative or consistent model selection when pretesting on $\xi$. More specifically, if the pretest selection threshold $c$ does not depend on the sample size, we are engaging in conservative selection and, under standard assumptions and fixed parameter values,

$$T_n(\theta_0) \xrightarrow{d} \begin{cases} |\sigma_\theta \hat Z_1|, & \text{if } \xi \neq 0 \\ |\sigma_\theta \hat Z_1|\,1(|Z_2| > c) + |\sigma_\theta \sqrt{1-\rho^2}\,\tilde Z_1|\,1(|Z_2| \le c), & \text{if } \xi = 0 \end{cases} \qquad (3)$$

under $H_0$, where $\hat Z_1, \tilde Z_1, Z_2 \sim N(0,1)$, $\sigma_\theta^2$ is the asymptotic variance of $\hat\theta_n$ and $\rho$ is the asymptotic correlation between $\hat\theta_n$ and $\hat\xi_n$.⁴ Examples of conservative selection include standard t-statistic pretesting, for which $c$ is a quantile of the standard normal distribution, or selection based upon the AIC, for which $c = \sqrt{2}$ here. The asymptotic behavior of (a studentized version of) this test statistic has been studied in detail by Leeb and Pötscher (2005) and Andrews and Guggenberger (2009a).

³ The results of this paper also apply to studentized post-selection statistics but examining the non-studentized case is better for highlighting the intuition for some of the subsequent formalizations.
⁴ For brevity, we do not fully characterize the asymptotic distribution of $T_n(\theta_0)$ here (e.g., the joint distribution of $(\hat Z_1, \tilde Z_1, Z_2)$). See (16) for full details.

On the other hand, if we allow $c$ to grow with the sample size, viz. $c = c_n \to \infty$ as $n \to \infty$, we are engaging in consistent selection and, under standard
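The $\xi = 0$ branch of (3) is straightforward to simulate. The sketch below (our illustration; it draws $\hat Z_1$, $\tilde Z_1$ and $Z_2$ independently purely for simplicity, whereas footnote 4 notes their joint distribution is only characterized later, in (16), and the values $\rho = 0.7$, $\sigma_\theta = 1$, $c = 1.96$ are hypothetical) compares the 0.95 quantile of this limit with that of the no-selection benchmark $|\sigma_\theta \hat Z_1|$:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, c, rho, sigma_theta = 200_000, 1.96, 0.7, 1.0
# Independence across the three normals is an assumption for illustration only.
Z1_hat = rng.standard_normal(reps)
Z1_til = rng.standard_normal(reps)
Z2 = rng.standard_normal(reps)
keep = np.abs(Z2) > c                 # pretest retains the control
W = np.where(keep,
             np.abs(sigma_theta * Z1_hat),
             np.abs(sigma_theta * np.sqrt(1 - rho**2) * Z1_til))
q_mix = np.quantile(W, 0.95)          # post-selection null quantile at xi = 0
q_std = np.quantile(np.abs(sigma_theta * Z1_hat), 0.95)   # approx. 1.96
print(round(q_mix, 2), round(q_std, 2))
```

The two quantiles differ, which is exactly the parameter-discontinuous behavior that makes standard critical values inappropriate for (2).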

assumptions and fixed parameter values,

$$T_n(\theta_0) \xrightarrow{d} \begin{cases} |\sigma_\theta \hat Z_1|, & \text{if } \xi \neq 0 \\ |\sigma_\theta \sqrt{1-\rho^2}\,\tilde Z_1|, & \text{if } \xi = 0 \end{cases} \qquad (4)$$

under $H_0$. Examples of consistent selection include selection based upon the BIC, for which $c = \sqrt{\log n}$, or the Hannan-Quinn information criterion. The asymptotic behavior of this test

statistic has been studied in detail by Leeb and Pötscher (2005).

We can see from the above discussion that in both of these post-selection testing problems, the asymptotic distribution of $T_n(\theta_0)$ is discontinuous in $\xi/\sigma_\xi$ at $\xi/\sigma_\xi = 0$ while it still depends continuously upon the vector of parameters $(\sigma_\theta, \rho)'$.⁵ This type of asymptotic behavior is indeed characteristic of a wide class of test statistics in a broad class of testing problems and can be formalized as follows. We wish to test $H_0: \theta = \theta_0$ for some $\theta_0 \in R^d$ in the presence of a nuisance parameter $\gamma \in \Gamma$ using test statistic $T_n(\theta_0)$. Under $H_0$ and fixed parameters (that do not change with the sample size), the asymptotic distribution of $T_n(\theta_0)$ depends upon $\gamma$ and is discontinuous at some value(s) of $\gamma$. We decompose $\gamma$ into three components: $\gamma \equiv (\gamma_1, \gamma_2, \gamma_3)$, where $\gamma_1 \in R^p$,

$\gamma_2 \in R^q$ and $\gamma_3 \in \Gamma_3$, an arbitrary, possibly infinite-dimensional space. The asymptotic distribution of $T_n(\theta_0)$ is discontinuous at points for which $\gamma_{1,i} = 0$ for some $i \in \{1, \ldots, p\}$. The elements of $\gamma_2$ are parameters that affect the limiting null distribution of $T_n(\theta_0)$ but do not affect the distance of the parameter $\gamma$ to the point of discontinuity. Under fixed parameters, the null asymptotic distribution of $T_n(\theta_0)$ may not depend upon some elements of $\gamma_2$. However, under asymptotics that have (user-chosen) parameters changing with the sample size, as described in the next section, the limiting values of these elements still affect the null asymptotic distribution of $T_n(\theta_0)$. On the other hand, the (limiting value of) $\gamma_3$ does not affect the asymptotic distribution of $T_n(\theta_0)$ under any parameter sequences.

The following assumption is identical to Assumption A0 of Andrews and Guggenberger (2009b). It is very general and does not require any subvectors of $\gamma$ to lie in a product space, a condition that is violated in some testing problems, such as testing in some moment inequality models.⁶

Assumption PS (Parameter Space). $\Gamma \subset \{(\gamma_1, \gamma_2, \gamma_3) : \gamma_1 \in R^p, \gamma_2 \in R^q, \gamma_3 \in \Gamma_3\}$.

⁵ The asymptotic distribution of $T_n(\theta_0)$ is discontinuous in $c\xi$ at $c\xi = 0$ for any arbitrary nonzero constant $c$. The particular normalization by $\sigma_\xi$ is used only to facilitate notational simplicity in the subsequent analyses.
⁶ A previous version of this paper did enforce such a product space assumption, precluding inference in some moment inequality models. I am grateful to Patrik Guggenberger for pointing out how to relax this assumption.

2.2 Drifting Sequence Asymptotics

For a test statistic $T_n(\theta_0)$, we wish to introduce CVs that control the asymptotic size of the resulting test. Let $CV_n$ denote an arbitrary (possibly random and/or sample size-dependent) CV being used for the test. The finite sample null rejection probability (NRP) of the test evaluated at $\gamma \in \Gamma$ is given by $P_{\theta_0,\gamma}(T_n(\theta_0) > CV_n)$, where $P_{\theta_0,\gamma}(E)$ denotes the probability of event $E$ given that $\theta_0$ and $\gamma$ are the true parameters describing the data-generating process (DGP). The asymptotic NRP of such a test evaluated at $\gamma \in \Gamma$ is given by $\limsup_{n\to\infty} P_{\theta_0,\gamma}(T_n(\theta_0) > CV_n)$ and the asymptotic size of the test is given by

$$AsySz(\theta_0, CV_n) = \limsup_{n\to\infty} \sup_{\gamma\in\Gamma} P_{\theta_0,\gamma}(T_n(\theta_0) > CV_n).$$
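To illustrate why the supremum over $\gamma$ matters, the following Monte Carlo (our own stylized design, not the paper’s simulation study) traces the finite sample NRP of the naive test, which treats the pretest-selected model as given, over a grid of $\xi$ values near zero; the maximum over the grid, a crude finite-sample proxy for the inner supremum, noticeably exceeds the nominal 5% level:

```python
import numpy as np

rng = np.random.default_rng(2)

def naive_nrp(xi, n=200, reps=2000, c=1.96):
    """Finite-sample NRP of the naive 5% test of H0: theta = 0 that pretests
    the control at threshold c and then ignores the selection step."""
    rej = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        z = x + 0.3 * rng.standard_normal(n)          # highly correlated control
        y = 0.0 * x + xi * z + rng.standard_normal(n) # H0 true: theta = 0
        X = np.column_stack([x, z])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        s2 = resid @ resid / (n - 2)
        V = s2 * np.linalg.inv(X.T @ X)
        if abs(coef[1] / np.sqrt(V[1, 1])) > c:       # keep control: full-model t
            t_theta = coef[0] / np.sqrt(V[0, 0])
        else:                                          # drop control: restricted t
            th = (x @ y) / (x @ x)
            r = y - th * x
            se = np.sqrt((r @ r / (n - 1)) / (x @ x))
            t_theta = th / se
        rej += abs(t_theta) > 1.96
    return rej / reps

nrps = [naive_nrp(xi) for xi in (0.0, 0.05, 0.1, 0.2)]
print([round(p, 3) for p in nrps])
```

The overrejection is concentrated at small but nonzero $\xi$, consistent with the pointwise limits (3)-(4) being poor approximations there.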

In general, the asymptotic NRP at any given $\gamma \in \Gamma$ is not equal to the asymptotic size of the test. To control asymptotic size, we examine the null limiting behavior of the test statistic under parameter sequences $\{\gamma_n : n \ge 1\}$ indexed by the sample size (see e.g., Andrews and Guggenberger, 2009a,b; Andrews et al., 2011). More specifically, when a sequence of true nuisance parameters $\gamma_n$ satisfies $n^{1/2}\gamma_{n,1} \to h_1$ and $\gamma_{n,2} \to h_2$, then $T_n(\theta_0)$ has an asymptotic distribution that depends upon the localization parameter $h \equiv (h_1, h_2) \in R_\infty^p \times R_\infty^q$.⁷

In the context of our testing problem after conservative selection, indexing parameters by the sample size, let $\sigma_{\xi,n}^2 = E_{F_n}[e_i^2]E_{F_n}[x_i^2]/(E_{F_n}[x_i^2]E_{F_n}[z_i^2] - E_{F_n}[x_i z_i]^2)$. Under $H_0$, $n^{1/2}\gamma_{n,1} = n^{1/2}\xi_n/\sigma_{\xi,n} \to h_1$, $\gamma_{n,2} = (\sigma_{\theta,n}, \rho_n)' \to h_2 = (h_{2,1}, h_{2,2})'$ and standard assumptions on the DGPs,⁸

$$T_n(\theta_0) \xrightarrow{d} |h_{2,1}\hat Z_1|\,1(|h_1 + Z_2| > c) + |-h_1 h_{2,1} h_{2,2} + h_{2,1}(1-h_{2,2}^2)^{1/2}\tilde Z_1|\,1(|h_1 + Z_2| \le c). \qquad (5)$$

The asymptotic distribution is completely characterized by $h_1$ and $h_2$. For this problem, $\gamma_3 = F$ with $\Gamma_3$ restricting the data to satisfy standard conditions for laws of large numbers and central limit theorems to hold (see McCloskey, 2017 for more detail).

⁷ In the terminology of Andrews and Guggenberger (2009a,b, 2010b), this framework allows for any finite discontinuity point $\bar\gamma_1$ and rate of convergence index $r > 0$ via proper centering and renormalization of $\gamma_{n,1}$, i.e., $n^r(\gamma_{n,1} - \bar\gamma_1) \to h_1$.
⁸ The expression is technically invalid in cases for which $|h_1| = \infty$ and $h_{2,2} = 0$. This is easily solved by expanding $h_1$ to a two-dimensional vector or by examining the studentized statistic, as in Andrews and Guggenberger (2009a). We abstract from this issue here for pedagogical reasons.

In the context of testing after consistent selection, it is necessary to append $\gamma_{n,2}$ with an additional sequence of parameters to fully characterize the asymptotic behavior of the test statistic and establish a test with correct asymptotic size. Under $H_0$, $n^{1/2}\gamma_{n,1} = n^{1/2}\xi_n/\sigma_{\xi,n} \to h_1$,

$\gamma_{n,2} = (\sigma_{\theta,n}, \rho_n, \sqrt{n}\xi_n/\sigma_{\xi,n} + c_n, \sqrt{n}\xi_n/\sigma_{\xi,n} - c_n)' \to h_2 = (h_{2,1}, h_{2,2}, h_{2,3}, h_{2,4})'$ and standard assumptions on the DGPs,⁹

$$T_n(\theta_0) \xrightarrow{d} |h_{2,1}\hat Z_1|\,1(\max\{-h_{2,3} - Z_2, h_{2,4} + Z_2\} > 0) + |-h_1 h_{2,1} h_{2,2} + h_{2,1}(1-h_{2,2}^2)^{1/2}\tilde Z_1|\,1(\max\{-h_{2,3} - Z_2, h_{2,4} + Z_2\} \le 0). \qquad (6)$$
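To see how the localized limit moves with the localization parameter, the sketch below tabulates Monte Carlo 0.95 quantiles of the limit in (5) over a grid of $h_1$ (our illustration: $(\sigma_\theta, \rho) = (1, 0.7)$ is hypothetical, and the representation $\hat Z_1 = \rho Z_2 + \sqrt{1-\rho^2}\,\tilde Z_1$ is an assumption chosen to be consistent with the restricted branch of (5); the paper’s (16) gives the exact joint law):

```python
import numpy as np

rng = np.random.default_rng(3)
reps, c = 400_000, 1.96
h21, h22 = 1.0, 0.7                       # hypothetical (sigma_theta, rho)
Z2 = rng.standard_normal(reps)
Z1_til = rng.standard_normal(reps)
# Assumed joint law: corr(Z1_hat, Z2) = rho, Z1_til independent of Z2.
Z1_hat = h22 * Z2 + np.sqrt(1 - h22**2) * Z1_til

def localized_quantile(h1, alpha=0.05):
    """0.95 quantile of the drifting-sequence limit in (5) at localization h1."""
    keep = np.abs(h1 + Z2) > c
    W = np.where(keep,
                 np.abs(h21 * Z1_hat),
                 np.abs(-h1 * h21 * h22 + h21 * np.sqrt(1 - h22**2) * Z1_til))
    return np.quantile(W, 1 - alpha)

qs = {h1: round(localized_quantile(h1), 2) for h1 in (0.0, 1.0, 2.0, 4.0, 8.0)}
print(qs)
```

The quantile varies substantially in $h_1$ and settles near the standard 1.96 once $|h_1|$ is far from the discontinuity point, which is what makes CVs that adapt to an estimate of $h_1$ attractive relative to the LF approach.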

The intuition for examining such drifting sequences of parameters in asymptotic size analysis comes from the inadequacy of the fixed parameter asymptotic distribution of a test statistic to approximate its finite sample counterpart in nonstandard contexts. For example, under $H_0$, the point-wise asymptotic distributions of (3) and (4) provide very poor approximations to the finite sample distribution of $T_n(\theta_0)$ when $\xi$ is close, but not equal, to zero. This feature was documented extensively for the post-selection testing problem by Leeb and Pötscher (2005) and has been similarly noted in other contexts, such as testing in the presence of weak instrumental variables or highly persistent regressors, by e.g., Nelson and Startz (1990) and Elliott and Stock (1994). Though fixed parameter asymptotics routinely fail in nonstandard testing contexts, asymptotic distributions derived under appropriate drifting sequences of parameters often provide very good approximations to finite sample null distributions. For example, for some value of $h = (h_1, h_2)$, the distributions of (5) and (6) can provide very good approximations to the finite sample distribution of $T_n(\theta_0)$. This feature has indeed also been noted in the literature. See, e.g., Staiger and Stock (1997) in the context of the instrumental variable regression model and Chan (1988) in the context of highly persistent regressors.

A leading technique for establishing the asymptotic size properties of tests is to examine subsequences of parameters indexed by the sample size (e.g., Andrews and Guggenberger, 2009b; Andrews et al., 2011). Before moving forward with this type of analysis, we note here that in some cases one may instead prefer to work from a high-level assumption that characterizes asymptotic size in terms of a maximal null rejection probability over a localization parameter space. Such an approach can streamline some of the subsequent analysis and may ease reading for the reader who is already well-acquainted with uniform asymptotic size analysis.¹⁰ We refer such a reader to Appendix III for further details and discussion.

Let $\{\omega_n : n \ge 1\}$ denote some subsequence of $\{n : n \ge 1\}$. Given $\{\omega_n : n \ge 1\}$, we

consider sequences of parameters with the following properties.

⁹ Again, the expression is technically invalid in cases for which $|h_1| = \infty$ and $h_{2,2} = 0$. In this case, a reparameterization of $h_1$ and $h_2$ solves the problem. See McCloskey (2017) for parameterizations that do not run into this difficulty.
¹⁰ We thank an anonymous referee for pointing this out.

Definition DS (Drifting Sequences). Given $h = (h_1, h_2) \in R_\infty^p \times R_\infty^q$, let $\{\gamma_{\omega_n,h} = (\gamma_{\omega_n,h,1}, \gamma_{\omega_n,h,2}, \gamma_{\omega_n,h,3}) : n \ge 1\}$ denote a sequence of parameters in $\Gamma$ for which $\omega_n^{1/2}\gamma_{\omega_n,h,1} \to h_1$ and $\gamma_{\omega_n,h,2} \to h_2$, if such a sequence exists. Define the localization parameter space

$$H \equiv \{h \in R_\infty^p \times R_\infty^q : \exists \text{ a subsequence } \{\omega_n : n \ge 1\} \text{ and a sequence } \{\gamma_{\omega_n,h} : n \ge 1\}\}.$$

The sequence $\{\gamma_{\omega_n,h} : n \ge 1\}$ is defined so that under $\{\gamma_{\omega_n,h} : n \ge 1\}$ and $H_0$, the asymptotic distribution of $T_{\omega_n}(\theta_0)$ is completely characterized by $h$.

Remark 1. The objects in Definition DS, the definition of $H$ and the definitions and assumptions to follow are all written in terms of parameters that are indexed by subsequences $\{\omega_n : n \ge 1\}$ of a sequence of sample sizes $\{n : n \ge 1\}$. However, the reader may be more familiar or comfortable with analyzing sequences of parameters indexed by the sample size $n$, rather than a subsequence of the sample size $\omega_n$. To verify some of the assumptions below written in terms of subsequences, one can first consider the corresponding objects in the assumptions that are instead indexed by $n$ and then verify that every subsequence of objects indexed by $\omega_n$ has a corresponding full sequence. For more details, see e.g., Lemma 2.1 of Andrews et al. (2011).

Though asymptotic distributions under drifting sequences $\{\gamma_{n,h} : n \ge 1\}$ typically produce good approximations to the finite sample distribution of $T_n(\theta_0)$ for some localization parameter $h$, it is typically unclear which value of $h$ should be used in the approximation for a given problem since $h$ cannot be consistently estimated under $H_0$ and $\{\gamma_{n,h} : n \ge 1\}$. This has led to the use of LF CVs for inference in nonstandard contexts: construct the largest $(1-\alpha)$ quantile over all $h$ of the limiting distributions obtained under $\{\gamma_{n,h} : n \ge 1\}$ (e.g., Andrews and Guggenberger, 2009a). However, in typical applications, one can hope to improve upon the power properties of a test using LF CVs by using an estimator of components of $h$ that converges to a random variable centered around its true value under $\{\gamma_{n,h} : n \ge 1\}$. For example, in the post-selection testing problem, $h_2^c = (h_{2,1}, h_{2,2})$ can be consistently estimated under $\{\gamma_{n,h} : n \ge 1\}$ by estimators of the asymptotic standard deviation of $\hat\theta_n$ and the asymptotic

correlation between $\hat\theta_n$ and $\hat\xi_n$, $\hat\sigma_{\theta,n}$ and $\hat\rho_n$. For $h_1$,

$$\hat h_{n,1} = n^{1/2}\hat\xi_n/\hat\sigma_{\xi,n} \xrightarrow{d} h_1 + Z_2 \qquad (7)$$

under $H_0$ and $\{\gamma_{n,h} : n \ge 1\}$.
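A stylized sketch of the simplest Bonferroni construction in this example follows (our own minimal implementation, reusing the illustrative joint law assumed above for (5); $\alpha$, $\beta$, the grid resolution and all names are ours): build a level-$(1-\beta)$ confidence interval for $h_1$ from (7), then take the largest $1-(\alpha-\beta)$ localized quantile over that interval, so that a Bonferroni bound caps the asymptotic size at $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(4)
reps, c = 200_000, 1.96
h21, h22 = 1.0, 0.7                       # hypothetical (sigma_theta, rho)
Z2 = rng.standard_normal(reps)
Z1_til = rng.standard_normal(reps)
# Illustrative joint law consistent with the restricted branch of (5);
# the paper's (16) gives the exact one.
Z1_hat = h22 * Z2 + np.sqrt(1 - h22**2) * Z1_til

def localized_quantile(h1, level):
    keep = np.abs(h1 + Z2) > c
    W = np.where(keep,
                 np.abs(h21 * Z1_hat),
                 np.abs(-h1 * h21 * h22 + h21 * np.sqrt(1 - h22**2) * Z1_til))
    return np.quantile(W, level)

def simple_bonferroni_cv(h1_hat, alpha=0.05, beta=0.01):
    """Max (1-(alpha-beta))-quantile of the localized limit over a
    level-(1-beta) CI for h1.  By (7), h1_hat is approximately h1 + Z2,
    so the CI is h1_hat +/- z_{1-beta/2}."""
    z = 2.5758  # Phi^{-1}(0.995), matching beta = 0.01
    grid = np.linspace(h1_hat - z, h1_hat + z, 41)
    return max(localized_quantile(h1, 1 - (alpha - beta)) for h1 in grid)

cv = simple_bonferroni_cv(h1_hat=1.5)
print(round(cv, 2))
```

Because the maximization runs only over the confidence set rather than all of $R_\infty$, the resulting CV is typically far below the (here infinite- or LF-type) worst-case alternative while the Bonferroni bound still delivers size control.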

In the typical application, the (inconsistent) estimator $\hat h_{n,1}$ of $h_1$ converges jointly with the test statistic under $H_0$ and $\{\gamma_{\omega_n,h} : n \ge 1\}$ since they are both typically functions of the same underlying data. Moreover, the subvector $h_2^c$ of $h_2$ is consistently estimable. We take the existence of such a $\hat h_n \equiv (\hat h_{n,1}, \hat h_{n,2}^c)$ as given and formalize its properties and the drifting sequence limit of $T_n(\theta_0)$ as follows.

Assumption DS. For all $h \in H$, all subsequences $\{\omega_n : n \ge 1\}$ of $\{n : n \ge 1\}$, all sequences $\{\gamma_{\omega_n,h} : n \ge 1\}$ and some random variables $\{(W_h, \tilde h) : h \in H\}$, $(T_{\omega_n}(\theta_0), \hat h_{\omega_n}) \xrightarrow{d} (W_h, \tilde h)$ under $H_0$ and $\{\gamma_{\omega_n,h} : n \ge 1\}$, where $\hat h_{\omega_n} = (\hat h_{\omega_n,1}, \hat h_{\omega_n,2}^c) \in R_\infty^p \times R_\infty^s$ with $0 \le s \le q$ and $\tilde h = (\tilde h_1, h_2^c) \in R_\infty^p \times R_\infty^s$.¹¹

In our example of testing after consistent model selection, where W_h is equal to the limiting random variable in (6), the auxiliary parameter ζ_{∞,h} = lim_{n→∞} √n ξ_n/(σ_{ξ,n} c_n) is consistently estimable and provides information on the asymptotic distribution of T_n(θ_0) under H_0 and {γ_{n,h} : n ≥ 1}. More specifically, ζ_{∞,h} ∈ (−1, 1) implies

    T_n(θ_0) →_d |−h_1 h_{2,1} h_{2,2} + h_{2,1}(1 − h_{2,2}^2)^{1/2} Z̃_1| ≡ W̃_h^{(1)}    (8)

and ζ_{∞,h} ∈ [−∞, −1) ∪ (1, ∞] implies

    T_n(θ_0) →_d |h_{2,1} Ẑ_1| ≡ W̃_h^{(2)}.    (9)

This type of asymptotic behavior can be seen to motivate the following definition.

Definition MLLD (Multiple Localized Limit Distributions). Given h ∈ H and any subsequence {ω_n : n ≥ 1} of {n : n ≥ 1}, under H_0 and any sequence {γ_{ω_n,h} : n ≥ 1}, let {ζ_{ω_n} ∈ R^r : n ≥ 1} with ζ_{ω_n} → ζ_{∞,h} ∈ R^r_∞ denote a convergent sequence of parameters, P = {A_1, . . . , A_u} denote a finite partition of R^r_∞ and {W̃_h^{(j)} : 1 ≤ j ≤ u} denote a collection of random variables such that ζ_{∞,h} ∈ A_j implies W_h ≤ W̃_h^{(j)} a.s. and the latter random variable is completely characterized by h^{(j)} ≡ (h_1^{(j)}, h_2^c), where h_1^{(j)} is a p^{(j)}-dimensional subvector of h_1 and h_2^c is an s-dimensional subvector of h_2.^12

^11 Technically, h̃ should be indexed by h. This notation is suppressed here and throughout the paper for notational convenience.
^12 The condition "W_h ≤ W̃_{(h_1,h_2)}^{(j)} a.s." in Definition MLLD may be replaced by "W_h is stochastically dominated by some W̃_{(h_1,h_2)}^{(j)}" in Theorem Bonf, below.

Note that this is a definition rather than an assumption since one may always set {ζ_{ω_n} ∈ R^r : n ≥ 1} equal to a constant sequence, P = R^r_∞ and W̃_h^{(1)} = ∞.^13 However, for the CV construction methods discussed in the next section to yield tests with non-negligible power, one should always prefer to use the ζ_n, P and {W̃_h^{(j)} : 1 ≤ j ≤ u} for which the distribution functions of W_h and W̃_h^{(j)} are as "close" as possible when ζ_{∞,h} ∈ A_j (when computationally feasible). See Remark 4.

Sample size-dependent tuning parameters can induce the partition P. In our testing after consistent selection example, u = 3, A_1 = (−1, 1), A_2 = [−∞, −1) ∪ (1, ∞], A_3 = {−1, 1} and

    W̃_h^{(3)} = max{|−h_1 h_{2,1} h_{2,2} + h_{2,1}(1 − h_{2,2}^2)^{1/2} Z̃_1|, |h_{2,1} Ẑ_1|}.    (10)

Note that even though W̃_h^{(1)} is the limiting random variable of T_n(θ_0) under parameter sequences that obtain under H_0, it can take on an infinite value with non-negligible probability (e.g., when |h_1| = ∞ and h_{2,2} ≠ 0). That is, the limiting random variable is not tight and the test statistic may actually diverge under H_0 and certain {γ_{ω_n,h} : n ≥ 1} sequences. A similar statement may be made about the bounding random variable W̃_h^{(3)}. This complicates the construction of appropriate CVs.
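Since the localized quantiles used throughout this section are typically unavailable in closed form, they are computed by Monte Carlo simulation. As a minimal illustration (not the paper's implementation), the (1 − δ)th quantile of the limit in (8) can be approximated as follows; the parameter values below are hypothetical:

```python
import numpy as np

def localized_quantile(h1, h21, h22, delta, reps=200_000, seed=0):
    """Monte Carlo approximation of the (1 - delta)th quantile of the
    localized limit in display (8):
        W = |-h1*h21*h22 + h21*sqrt(1 - h22**2) * Z|,  Z ~ N(0, 1).
    Illustrative sketch only; not the paper's exact implementation."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(reps)
    w = np.abs(-h1 * h21 * h22 + h21 * np.sqrt(1.0 - h22 ** 2) * z)
    return np.quantile(w, 1.0 - delta)

# The quantile grows without bound in |h1| when h2,2 != 0, mirroring
# the non-tightness of the limiting random variable discussed above:
q_near = localized_quantile(h1=0.0, h21=1.0, h22=0.5, delta=0.05)
q_far = localized_quantile(h1=20.0, h21=1.0, h22=0.5, delta=0.05)
```

Here q_far greatly exceeds q_near, which is the numerical face of the divergence problem that the size-correction methods below must accommodate.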

In order to accommodate this type of limiting behavior into the analysis, we define the following sets for j = 1, . . . , u with u defined above:

    H̄^{(j)} = {(h_1^{(j)}, h_2^c) ∈ R^{p^{(j)}}_∞ × R^s_∞ : ∃ a subsequence {ω_n : n ≥ 1} and a sequence {γ_{ω_n,h} : n ≥ 1} and P(|W̃_h^{(j)}| < ∞) = 1} and

    H̄^{(j),∞} = {(h_1^{(j)}, h_2^c) ∈ R^{p^{(j)}}_∞ × R^s_∞ : ∃ a subsequence {ω_n : n ≥ 1} and a sequence {γ_{ω_n,h} : n ≥ 1} and P(|W̃_h^{(j)}| = ∞) > 0}.

For our post-consistent selection example, restricting σ_θ to be bounded above and away from zero and |ρ| to be bounded away from 1, i.e., 0 < κ_1 ≤ γ_{2,1} ≤ κ_2 < ∞ and |γ_{2,2}| ≤ 1 − κ_3 < 1, we can readily see that H̄^{(1)} = H̄^{(3)} = R × [κ_1, κ_2] × [−1 + κ_3, 1 − κ_3], H̄^{(1),∞} = H̄^{(3),∞} = {±∞} × [κ_1, κ_2] × [−1 + κ_3, 1 − κ_3], H̄^{(2)} = [κ_1, κ_2] × [−1 + κ_3, 1 − κ_3] and H̄^{(2),∞} = ∅.^14

Matters are much simpler for the post-conservative selection example. In this case, s = q (the entire h_2 vector is consistently estimable), u = 1, P = R_∞, W_h = W̃_h^{(1)} and H̄^{(1),∞} = ∅ since (5) is finite with probability 1 for all h ∈ H (W_h is equal to the limiting random variable in (5) for this example).

^13 I thank an anonymous referee for pointing this out.
^14 This is again ignoring the problematic cases under which |h_1| = ∞ and h_{2,2} = 0. See footnote 8.

The previous size-correction literature (e.g., Andrews and Guggenberger, 2009a, 2010a) has in fact focused upon the special case for which u = 1, P = R^r_∞, W_h = W̃_h^{(1)} and H̄^{(1),∞} = ∅.^15 Henceforth, we refer to this as the "usual case". Departing from the usual case by allowing for u > 1 and a divergent test statistic under H_0 and some {γ_{ω_n,h} : n ≥ 1} sequences is useful for various problems. For example, since it is sometimes cumbersome or impossible to consistently estimate the entire h_2 vector, using plug-in LF CVs (and their variants) can lead to tests with poor power. This is the case for our simple testing after consistent model selection example and the more complicated examples of McCloskey (2017). This complication will also arise when conducting inference after a pretest with a diverging CV. For example, consider conducting valid inference after a Hausman test, as in Guggenberger (2010), but with a diverging pretest CV. Other times, even for problems that fall under the usual case, it becomes too computationally demanding to use the data to determine reliable ranges for the entire h_1 vector. In this case, one could simplify the computation of CVs by taking a more conservative approach that redefines some of the elements of h_1 as elements of h_2 that are not in the consistently estimable subvector h_2^c. Indeed, a recent paper by Cheng (2015) employs a similar strategy by using quantities analogous to ζ_n to consistently determine categories of identification strength for conducting uniform inference in nonlinear models. In general, u is determined by the number of categories into which the researcher uses the auxiliary selection parameter ζ_n to divide the localization parameter space. See McCloskey (2017) for a detailed example. Finally, constructing CVs that properly account for test statistic divergence under H_0 and some {γ_{ω_n,h} : n ≥ 1} enables one to conduct valid tests with non-negligible power in problems such as testing after consistent model selection, t-testing using possibly irrelevant instrumental variables (e.g., Dufour, 1997; Staiger and Stock, 1997; Andrews and Guggenberger, 2010a) and tests on cointegrating relationships when some of the time series under study may be stationary (e.g., Elliott, 1998; Franchi and Johansen, 2017), to name a few examples. The LF CVs and their variants of Andrews and Guggenberger (2009a) are (asymptotically) infinite in these contexts.

Returning to our testing after model selection examples, note that in both the conservative and consistent cases, the quantiles of the W̃_h^{(j)}'s are continuous in the localization parameter h. This continuity property holds quite generally and will in fact be exploited in the CV construction methods introduced in the next section. As a matter of notation, let c_{h^{(j)}}(1 − α) be the (1 − α)th quantile of W̃_h^{(j)}.

Assumption Cont (Continuity). Consider some fixed δ ∈ (0, 1). For all j = 1, . . . , u, the following statements hold:
(i) As a function from H̄^{(j)} into R, c_{h^{(j)}}(1 − δ) is continuous.
(ii) The distribution function of W̃_h^{(j)} is continuous at c_{h^{(j)}}(1 − δ) for all h^{(j)} ∈ H̄^{(j)}.

In the usual case, the first part of this assumption captures the intuitive fact that appropriately defined drifting sequence asymptotics "smooth" between the discontinuity points in the point-wise asymptotic distribution of the test statistic. The second part follows when the W̃_h^{(j)} random variables are absolutely continuous for h^{(j)} ∈ H̄^{(j)}.

^15 In this case {ζ_n : n ≥ 1} can be arbitrarily set to equal any constant sequence in R^r_∞.

3 Critical Value Construction

In our simple example of testing after conservative selection, one may use the LF CV sup_{h∈H} c_h(1 − α), where c_h(1 − α) is the (1 − α) quantile of the limiting random variable in (5), to control the asymptotic size of the test at level α. However, this strategy will lead to a very large CV and a subsequent test with low power. As alluded to in the previous section, we may instead construct CVs that depend upon the data through the (inconsistent) localization parameter estimate ĥ_n = (ĥ_{n,1}, ĥ_{n,2,1}, ĥ_{n,2,2}), where ĥ_{n,1} = n^{1/2} ξ̂_n/σ̂_{ξ,n}, ĥ_{n,2,1} is a consistent estimator of the asymptotic standard deviation of θ̂_n and ĥ_{n,2,2} is a consistent estimator of the asymptotic correlation between θ̂_n and ξ̂_n. More specifically, consider the following (1 − β) level confidence set for h_1:

    I_β(ĥ_n) = [ĥ_{n,1} − z_{1−β/2}, ĥ_{n,1} + z_{1−β/2}].    (11)

Since ĥ_{n,1} →_d h̃_1 = h_1 + Z_2 under H_0 and {γ_{n,h} : n ≥ 1}, I_β(ĥ_n) has correct asymptotic coverage: P(h_1 ∈ I_β(ĥ_n)) → 1 − β under H_0 and {γ_{n,h} : n ≥ 1}. In contrast to the LF CV, we may instead endeavor to adapt to the data by (i) plugging in the consistent estimator ĥ_{n,2}^c for h_2^c in c_h and (ii) maximizing the localized quantiles c_h(1 − α) in h_1, not over the entire parameter space for h_1, but rather over the data-dependent subset I_β(ĥ_n). In this sense, we may hope to obtain a smaller CV and a test with better power properties while maintaining asymptotic size control.

In the example of testing after consistent selection, we may use the property captured by Definition MLLD to simplify CV construction in this more complicated setting. In this case, ζ̂_n = √n ξ̂_n/(σ̂_{ξ,n} c_n) →_p ζ_{∞,h} under H_0 and {γ_{n,h} : n ≥ 1}, and we can use this consistent estimator to determine which c_{h^{(j)}}, j ∈ {1, 2, 3}, to construct the CV from. This logic can be applied quite generally and is captured by the following assumption.
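The coverage property underlying (11) is easy to verify by simulation, since ĥ_{n,1} − h_1 is asymptotically pivotal. A minimal sketch (the h_1 values below are hypothetical):

```python
import numpy as np
from statistics import NormalDist

def ci_coverage(h1, beta, reps=200_000, seed=0):
    """Coverage of the confidence interval in display (11), checked by
    drawing the localization estimate from its limit h1 + Z_2 with
    Z_2 ~ N(0, 1).  Illustrative sketch; h1 values are hypothetical."""
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)
    rng = np.random.default_rng(seed)
    h1_hat = h1 + rng.standard_normal(reps)
    covered = (h1_hat - z <= h1) & (h1 <= h1_hat + z)
    return covered.mean()

# Coverage is close to 1 - beta regardless of the localization
# parameter, since h1_hat - h1 is pivotal in the limit:
for h1 in (0.0, 3.0, -8.0):
    print(round(ci_coverage(h1, beta=0.10), 3))
```

This invariance to h_1 is what allows the confidence set to be maximized over without knowledge of the true localization parameter.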

Assumption Sel (Selection). For all h ∈ H, all subsequences {ω_n : n ≥ 1} of {n : n ≥ 1} and all sequences {γ_{ω_n,h} : n ≥ 1}, under H_0 and {γ_{ω_n,h} : n ≥ 1}, ζ̂_{ω_n} →_p ζ_{∞,h}.

In the general framework, we need asymptotically valid confidence sets for h_1^{(j)} for j = 1, . . . , u, which are correspondences taking ĥ_n as input. For minimal coverage level 1 − β, we denote these as I_β^{(j)}(ĥ_n) for j = 1, . . . , u. For the following assumption, define

    H̄_1^{(j)}(h_2^c) = {h_1^{(j)} : (h_1^{(j)}, h_2^c) ∈ H̄^{(j)}}

and equip any product space R^b_∞ with the product metric for which d(x, y) = |Φ(x) − Φ(y)| for any x, y ∈ R_∞.

Assumption CS (Confidence Set). For j = 1, . . . , u, β ∈ [0, 1] and all h^{(j)} ∈ H̄^{(j)}, I_β^{(j)} : R^p_∞ × R^s_∞ ⇒ R^{p^{(j)}}_∞ is a continuous and compact-valued correspondence with P(h_1^{(j)} ∈ I_β^{(j)}(h̃)) ≥ 1 − β, P(I_β^{(j)}(ĥ_n) ⊂ H̄_1^{(j)}(ĥ_{n,2}^c)) = 1 for all n ≥ 1 and P(I_β^{(j)}(h̃) ⊂ H̄_1^{(j)}(h_2^c)) = 1.

Aside from offering correct asymptotic coverage and being well-behaved as correspondences, the confidence sets cannot indicate test statistic divergence under sequences for which h^{(j)} ∈ H̄^{(j)}.

In our post-consistent selection testing example, we may use, e.g., I_β^{(1)}(ĥ_n) and I_β^{(3)}(ĥ_n) equal to (11) and I_β^{(2)}(ĥ_n) = ∅.

3.1 Simple Bonferroni Critical Values

We begin by describing the simple Bonferroni CV construction in the context of our post-consistent selection testing example for a test of level α ∈ (0, 1). Let δ ∈ [0, α], A_1^ε = (−1 − ε, 1 + ε), A_2^ε = [−∞, −1 + ε) ∪ (1 − ε, ∞] and A_3^ε = (−1 − ε, −1 + ε) ∪ (1 − ε, 1 + ε) for some small ε > 0. Compute ζ̂_n and find all indices j ∈ {1, 2, 3} such that ζ̂_n ∈ A_j^ε. For these indices j, compute the quantities

    c_B^{(j)}(α, α − δ, ĥ_n) = sup_{h_1^{(j)} ∈ I_{α−δ}^{(j)}(ĥ_n)} c_{(h_1^{(j)}, ĥ_{n,2}^c)}(1 − δ)    (12)

and find the maximum of these quantities. This is the simple Bonferroni CV. For the post-conservative selection testing example, there is only one c_{h^{(j)}} to choose from (j = 1) and a simple Bonferroni CV can be obtained as

    sup_{h_1^{(1)} ∈ I_{α−δ}^{(1)}(ĥ_n)} c_{(h_1^{(1)}, ĥ_{n,2}^c)}(1 − δ),

where I_β^{(1)}(ĥ_n) is equal to the confidence set in (11).

In the general testing framework of this paper, we may describe the simple Bonferroni CVs as follows. For each i = 1, . . . , u, let A_i^ε be some small expansion of A_i such that A_i ⊂ int(A_i^ε) ∪ {±∞}^r. Find all i ∈ {1, . . . , u} for which ζ̂_n ∈ A_i^ε and denote this set of indices as {i_1, . . . , i_J}. Then we define the simple Bonferroni CV for some 0 ≤ δ ≤ α as

    c_B(α, α − δ, ĥ_n, ζ̂_n) ≡ max_{j∈{i_1,...,i_J}} c_B^{(j)}(α, α − δ, ĥ_n), where

    c_B^{(j)}(α, α − δ, ĥ_n) = sup_{h_1^{(j)} ∈ I_{α−δ}^{(j)}(ĥ_n)} c_{(h_1^{(j)}, ĥ_{n,2}^c)}(1 − δ).
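For the post-consistent selection example with ζ̂_n ∈ A_1^ε, the computation of (12) reduces to a one-dimensional grid search over the confidence interval (11). A minimal Python sketch, with hypothetical parameter estimates and the limit distribution from (8):

```python
import numpy as np
from statistics import NormalDist

def localized_quantile(h1, h21, h22, delta, z):
    """(1 - delta)th quantile of the localized limit in display (8),
    approximated with common random numbers z ~ N(0, 1)."""
    w = np.abs(-h1 * h21 * h22 + h21 * np.sqrt(1.0 - h22 ** 2) * z)
    return np.quantile(w, 1.0 - delta)

def simple_bonferroni_cv(h1_hat, h21_hat, h22_hat, alpha, delta,
                         reps=100_000, grid_points=201, seed=0):
    """Sketch of display (12) for j = 1: maximize the (1 - delta)th
    localized quantile over the level-(alpha - delta) Bonferroni
    confidence interval (11) for h1.  The estimates passed in are
    hypothetical; this is not the paper's exact implementation."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(reps)
    zq = NormalDist().inv_cdf(1.0 - (alpha - delta) / 2.0)
    grid = np.linspace(h1_hat - zq, h1_hat + zq, grid_points)
    return max(localized_quantile(h1, h21_hat, h22_hat, delta, z)
               for h1 in grid)

cv = simple_bonferroni_cv(h1_hat=1.2, h21_hat=1.0, h22_hat=0.5,
                          alpha=0.05, delta=0.025)
```

In this example the localized quantile is increasing in |h_1|, so the supremum is attained at an endpoint of the interval; Remark 6 below discusses the higher-dimensional case.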

Remark 2. For a given data set, since only a single c_B^{(j)}(α, α − δ, ĥ_n) must be computed to construct c_B(α, α − δ, ĥ_n, ζ̂_n), a larger u does not typically lead to any extra challenges in CV construction. On the contrary, a larger u can actually help reduce the computational burden of CV construction by allowing one to maximize localized quantile functions over a smaller subvector h_1^{(j)} of h_1. This is also true of the CVs introduced subsequently.

Remark 3. As exemplified by the post-conservative selection example, in the usual case there is no need to form the A_j^ε sets, and the maximization over the index set {i_1, . . . , i_J} may be omitted since u = 1.

Remark 4. If the a.s. dominance of W_h by W̃_h^{(j)} in Definition MLLD is anywhere strict, the use of CVs based upon c_{h^{(j)}}(·) will necessarily be conservative. Sometimes this is unavoidable due to the structure of the problem and/or computational concerns. However, for power purposes, whenever feasible, one should use a W̃_h^{(j)} whose distribution is as "close" as possible to that of W_h for the CV construction methods described in this paper. Indeed, it is often the case that one may find a random variable W̃_h^{(j)} that coincides with W_h when ζ_{∞,h} ∈ A_j. This occurs in the usual case (by simply setting W̃_h^{(1)} equal to W_h), as well as in the post-consistent model selection testing example.

Remark 5. In the context of post-consistent selection testing, using ζ̂_n to consistently "select" which localized quantile function to form the CV from is similar in spirit to the "generalized moment selection" technique of Andrews and Soares (2010) and the "identification category selection" technique of Andrews and Cheng (2012) and Cheng (2015). One primary difference with these latter techniques is that in the post-consistent selection problem, both the test statistic and the CV depend upon the quantity ζ̂_n. In these latter papers, only CV formation depends upon (the analog of) ζ̂_n. Nevertheless, for problems in which ζ̂_n does not affect the formation of the test statistic, it may be possible to use ζ̂_n to simplify the construction of the simple Bonferroni CV (and the CVs described subsequently). As remarked above, this could be done by redefining some of the elements of h_1 as elements of h_2 that are not part of the consistently estimable subvector h_2^c, thereby reducing the dimension of h_1.

Remark 6. To help understand the computational demands required to construct a simple Bonferroni CV (and the CVs introduced subsequently), note that the underlying CV c_B^{(j)}(α, α − δ, ĥ_n) itself is the value function of a nonlinear optimization problem for which the choice set often takes the form of a Wald ellipsoid:

    I_{α−δ}^{(j)}(ĥ_n) = {h_1^{(j)} : (h_1^{(j)} − ĥ_{n,1}^{(j)})′ M(ĥ_n)(h_1^{(j)} − ĥ_{n,1}^{(j)}) ≤ r_{α−δ}}

for some (data-dependent) weight matrix M(ĥ_n) and critical value r_{α−δ}. The objective function involved in the computation of c_B^{(j)}(α, α − δ, ĥ_n), c_{(h_1^{(j)}, ĥ_{n,2}^c)}(1 − δ), is often only known through numerical approximation by finding the (1 − δ)th quantile of W̃_{(h_1^{(j)}, ĥ_{n,2}^c)}^{(j)} via Monte Carlo simulation. For low-dimensional h_1^{(j)}, the optimization of the nonlinear function can be done rather quickly by grid search, as in the applications of this paper. For higher-dimensional h_1^{(j)}, methods advanced in the global optimization literature, such as response surface analysis (e.g., Jones, 2001) and genetic algorithms (e.g., Golberg, 1989), have been applied successfully in different contexts with a similar amount of computational complexity and should be equally useful here. It should be noted, however, that it is not always the case that the CVs introduced in this paper require optimization of an expensive nonlinear function. For some testing problems, one may exploit certain features of the localized quantile function and confidence set to simplify computation. See Section 3.4 below for a discussion.

It should also be noted that for some testing problems, a valid confidence set for a low-dimensional h_1^{(j)} can be computationally involved. This typically occurs when the distribution of h̃_1^{(j)} does not equal a random vector that is a shifted version of an h_1^{(j)}-invariant random vector. Examples of this include settings for which h_1 is equal to the localization parameter in a local-to-unity autoregression or in the weak identification settings of Andrews and Cheng (2012) and Han and McCloskey (2016). Han and McCloskey (2016) note that the adjusted Bonferroni CV introduced below can be modified to avoid this type of computational complication. See that paper for more details.

This type of simple Bonferroni CV construction can be found for specific examples in the literature that fall under the general setting of this paper, such as testing in the linear regression model with potentially highly persistent regressors (Cavanagh et al., 1995), testing in a linear instrumental variables model in the potential presence of weak instruments (Staiger and Stock, 1997), testing in the presence of overidentifying moment inequalities (Moon and Schorfheide, 2009) and testing in moment inequality models (Romano et al., 2014).^16 Other procedures making use of Bonferroni bounds to conduct inference in the presence of nuisance parameters have appeared in various contexts throughout the econometrics and statistics literature. Examples include Loh (1985), Berger and Boos (1994), Silvapulle (1996), Romano and Wolf (2000), Campbell and Yogo (2006) and Chaudhuri and Zivot (2011).^17 However, to the author's knowledge, this is the first time these simple Bonferroni CVs have been laid out in such a general nonstandard asymptotic testing framework.

Noting that Assumptions Cont and CS apply specifically when h^{(j)} ∈ H̄^{(j)}, the following assumption explicitly deals with the cases under which the limiting values of the test statistic and CV may be infinite.

Assumption Inf (Infinite). For all h ∈ H with h^{(j)} ∈ H̄^{(j),∞} and ζ_{∞,h} ∈ A_j, all subsequences {ω_n : n ≥ 1} of {n : n ≥ 1} and all sequences {γ_{ω_n,h} : n ≥ 1},

    lim_{n→∞} P_{θ_0,γ_{ω_n,h}}(T_{ω_n}(θ_0) > c_B^{(j)}(α, α − δ, ĥ_{ω_n})) ≤ α

for j = 1, . . . , u.

This assumption must be verified directly by examining the joint limiting behavior of the test statistic and CV under H_0 and {γ_{ω_n,h} : n ≥ 1} sequences that may render both divergent. For the usual case, this assumption need not be verified since H̄^{(j),∞} = ∅. When it must be done, verification of this technical assumption can typically rely upon the continuity properties of P(W̃_h^{(j)} > c_B^{(j)}(α, α − δ, h̃)) in h^{(j)} ∈ H̄^{(j)} (e.g., McCloskey, 2017).

In order to establish a lower bound on the asymptotic size of the test, we impose the following assumption.

Assumption LB (Lower Bound). There is some h* ∈ H such that for some j ∈ {1, . . . , u}, h*^{(j)} ∈ H̄^{(j)}, ζ_{∞,h*} ∈ A_j and ζ_{∞,h*} ∉ Cl(A_i^ε) for i ≠ j, W_{h*} =_d W̃_{h*}^{(j)} and sup_{h_1^{(j)} ∈ H̄_1^{(j)}(h_2^{c*})} c_{(h_1^{(j)}, h_2^{c*})}(1 − δ) = c_{h*^{(j)}}(1 − δ), where sup_{h_1^{(j)} ∈ H̄_1^{(j)}(·)} c_{(h_1^{(j)}, ·)}(1 − δ) is continuous at h_2^{c*}.

^16 The original version of this paper made an assumption ruling out applications to certain moment inequality models. This assumption has since been modified and this version of the paper does apply to moment inequality models. See footnote 6 for details.
^17 I thank Hannes Leeb and Benedikt Pötscher for alerting me to some of the early references in the statistics literature through the note Leeb and Pötscher (2017).

To see that this assumption holds in our testing after consistent selection example, simply take j = 2 and consider any h* ∈ H such that ζ_{∞,h*} = ∞ ∈ A_2. We are now prepared to state the first main result of this paper: the simple Bonferroni CVs control the asymptotic size of the resultant test.

Theorem Bonf (Bonferroni). Under Assumptions PS, DS, Cont, Sel, CS evaluated at β = (α − δ) ∈ [0, α) and Inf,

    AsySz(T_n(θ_0), c_B(α, α − δ, ĥ_n, ζ̂_n)) ≤ α.

In addition, if Assumption LB holds,

    AsySz(T_n(θ_0), c_B(α, α − δ, ĥ_n, ζ̂_n)) ≥ δ.

3.2 Adjusted Bonferroni Critical Values

As shown in Theorem Bonf, the simple Bonferroni CVs may yield a conservative test whose asymptotic size does not reach its nominal level. Instead of relying on Bonferroni bounds, for a given confidence set level, we can directly adjust the level of the localized quantile functions c_{h^{(j)}}(·) via simulation according to the joint limiting behavior of T_n(θ_0) and ĥ_n. Adjusting the quantile level in the construction of the CV in a proper data-dependent manner can make the resultant test non-conservative and substantially more powerful while attaining correct asymptotic size. At an intuitive level, the upper bound on asymptotic size determined in Theorem Bonf is valid under all data configurations, but we may adapt the CV to the dependence structure present in a particular data set to construct the test and make it (potentially, much) less conservative.

In the context of our testing after consistent model selection problem, consider again finding all j ∈ {1, 2, 3} such that ζ̂_n ∈ A_j^ε but now, for each of these indices, compute a value a^{(j)} such that for a given confidence set level β ∈ [0, 1], the following inequality approximately holds at h_2^c = ĥ_{n,2}^c:

    sup_{h_1^{(j)} ∈ H̄_1^{(j)}(h_2^c)} P(W̃_h^{(j)} ≥ sup_{h_1^{(j)} ∈ I_β^{(j)}(h̃)} c_{(h_1^{(j)}, h_2^c)}(1 − a^{(j)})) ≤ α.    (13)
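The calibration implicit in (13) can be sketched numerically. The following illustrative Python code simulates the worst-case null rejection rate of the CV formed from a level-(1 − β) interval and quantile level 1 − a, in a simplified one-dimensional version of the consistent-selection limit (8). The parameter values, the independence of the normal draws entering W and h̃_1, and the grid choices are all assumptions made purely for illustration:

```python
import numpy as np
from statistics import NormalDist

def max_null_rejection(a, beta, h21=1.0, h22=0.5,
                       h1_grid=np.linspace(-8.0, 8.0, 17),
                       reps=40_000, seed=0):
    """Simulated left-hand side of a one-dimensional analog of (13):
    the worst-case (over h1) probability that the limit W exceeds the
    CV obtained by maximizing the (1 - a)th localized quantile over the
    level-(1 - beta) interval for h1.  Purely illustrative: (W, h1~)
    are drawn with independent standard normals, which need not match
    any particular application."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(reps)            # enters the limit W
    z2 = rng.standard_normal(reps)            # enters h1~ = h1 + Z2
    zs = rng.standard_normal(20_000)          # for quantile evaluation
    zq = NormalDist().inv_cdf(1.0 - beta / 2.0)
    sd = h21 * np.sqrt(1.0 - h22 ** 2)

    # Tabulate the localized quantile c_g(1 - a) on a |g| grid; it is
    # increasing in |g|, so the sup over an interval sits at an endpoint.
    g_abs = np.linspace(0.0, 16.0, 161)
    c_tab = np.quantile(np.abs(-g_abs[:, None] * h21 * h22
                               + sd * zs[None, :]), 1.0 - a, axis=1)

    worst = 0.0
    for h1 in h1_grid:
        w = np.abs(-h1 * h21 * h22 + sd * z1)
        lo, hi = h1 + z2 - zq, h1 + z2 + zq
        g_star = np.where(np.abs(hi) > np.abs(lo), hi, lo)
        cv = np.interp(np.abs(g_star), g_abs, c_tab)
        worst = max(worst, float(np.mean(w >= cv)))
    return worst

# With a + beta = alpha, the Bonferroni bound guarantees a worst-case
# rejection rate of at most alpha = 0.05; larger feasible values of a
# would yield a smaller CV and hence higher power:
rej = max_null_rejection(a=0.025, beta=0.025)
```

Searching over a grid of a values for the largest one keeping this worst-case rate at or below α is the one-dimensional analog of the level adjustment described next.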

This computation can be achieved by simulating from the joint distributions of (W̃_h^{(j)}, h̃_1^{(j)}), which can be deduced from (7)-(10), over a fine grid of h_1^{(j)}-values.^18

^18 See Dufour (2006) for numerical approximation results on Monte Carlo tests in nonstandard settings.

The adjusted Bonferroni

1

β

n,2

the inequality (13) can hold for multiple values of a(j) . Choosing a larger a(j) satisfying (13) will necessarily lead to a test with (weakly) higher power.19 In full generality, we define adjusted Bonferroni CVs as follows using the same index set {i1 , . . . , iJ } as before: for some β ∈ [0, 1], ˆ n , ζˆn ) ≡ cB−A (α, β, h

max

j∈{i1 ,...,iJ }

(j) ˆ n ), cB−A (α, β, h

where for j = 1, . . . , u, (j) ˆ n) = cB−A (α, β, h

sup (j)

(j)

ˆn) h1 ∈Iβ (h

ˆ c )) with some c(h(j) ,hˆ c ) (1 − a(j) (h n,2 1

n,2

a(j) : Rs∞ → [δ (j) , α − δ¯(j) ] ⊂ [0, α]. Similar remarks to those applying to the simple Bonferroni CVs in the usual case (Remarks 2–5) apply here as well. It is also worth mentioning that the adjusted Bonferroni CVs are similar in spirit to a refinement of a Bonferroni confidence interval suggested in the particular context of testing with a potentially highly persistent regressor by Cavanagh et al. (1995) and Campbell and Yogo (2006).20 Since the adjusted Bonferroni CVs evaluate the localized quantile function at quantiles, ˆ c ) determined by the data through h ˆ c , we need to strengthen Assumption Cont(i) 1−a(j) (h n,2 n,2 to impose continuity over quantile levels as well localization parameters. This additional condition can be seen to hold in our testing after model selection examples since the quantiles f (j) ’s are not only continuous in h, but also in the quantile level at which they are of the W h

19

As described, the adjusted Bonferroni CV fixes a value for the level of the confidence set β and searches for the largest quantile level a(j) such that (13) holds. An alternative way to approach this problem would be to instead fix the quantile level δ and search for the largest level of the confidence set β ∗ such that sup (j) ¯ (j) (hc ) h1 ∈H 1 2

(j)

f ≥ P (W h

sup (j) (j) ˆ h1 ∈Iβ ∗ (h n)

c(h(j) ,hˆ c 1

n,2 )

(1 − δ)) ≤ α.

In a typical application, β ∗ constructed in this way will be a monotonically decreasing function of δ or, conversely, a(j) constructed as in display (13) will be a monotonically decreasing function of β. Hence, these two methods are essentially two alternative ways to compute a point in a curve that expresses confidence level in terms of quantile level. 20 In a recent paper, Phillips (2014) showed that the specific refinement of the Bonferroni confidence interval suggested by Campbell and Yogo (2006) is actually not uniformly valid. Instead of using a confidence set to cover the localization parameter in this context (as done in Cavanagh et al., 1995), Campbell and Yogo (2006) suggested the use of a confidence set to cover the parameter γ1 in their context. This latter confidence set fails to cover γ1 uniformly over Γ1 and Phillips (2014) suggested the use of other confidence sets that do, based upon the work of Hansen (1999) and Mikusheva (2007).

20

¯ (j) ). Given evaluated: these random variables are absolutely continuous (when h(j) ∈ H

Assumption Cont(i), the following additional assumption will hold via proper construction of (the range of) the level adjustment functions a(j) . Assumption Cont-Adj (Continuity-Adjusted). For some pairs (δ (j) , δ¯(j) ) ∈ [0, α] with ¯ (j) and 0 ≤ δ (j) ≤ α − δ¯(j) ≤ α, as a function of h(j) and δ, ch(j) (1 − δ) is continuous over H [δ (j) , α − δ¯(j) ] for j = 1, . . . , u. See Algorithm Bonf-Adj in Appendix 1 for an algorithm on how to construct adjusted Bonferroni CVs in the general case. Construction of adjusted Bonferroni CVs from Algorithm Bonf-Adj will allow the resultant test to (approximately) satisfy the following remaining assumptions. First, the level adjustment functions must be continuous and constructed appropriately for the test to satisfy a size constraint. Assumption a. (i) For j = 1, . . . , u, a(j) : Rs∞ → [δ (j) , α − δ¯(j) ] is continuous. f (j) ≥ sup (j) (j) e c (j) c (1 − a(j) (hc2 )) ≤ α for all h(j) ∈ H ¯ (j) . (ii) For j = 1, . . . , u, P (W h h ∈I (h) (h ,h ) 1

1

β

2

To maximize power while retaining asymptotic size control, a(j) (·) should be constructed

to be as large as possible at any given hc2 , while satisfying the constraints imposed by Assumption a. In practice, for a single given testing problem, one does not need to determine ˆ c . Supposing the entire function a(j) (·) since it will only be evaluated at a single point h n,2 (j) ˆ c that a (hn,2 ) comes from evaluating a function satisfying Assumption a(i) simply provides one with the asymptotic theoretical justification needed for controlling asymptotic size and is consistent with the implementation in Algorithm Bonf-Adj. For similar reasons, the user ˆ c ) for j ∈ {i1 , . . . , iJ }, as done in the algorithm. The latter set is need only compute a(j) (h n,2

determined by ζˆn and is typically a singleton. Finally, Algorithm Bonf-Adj is consistent with f (j) ≥ sup (j) (j) e c (j) c (1 − a(j) (hc ))) ≤ α Assumption a(ii) since Step 2(e) enforces P (W h

for all h(j) ∈

ˆc ¯ (j) (h H grid n,2 )

ˆ c is consistent for hc . and h n,2 2

h1 ∈Iβ (h) (h1 ,h2 )

2

The next assumption is the analog to Assumption Inf for the adjusted Bonferroni CVs,

dealing with potentially infinite limiting values of the test statistic and CV under H0 . ¯ (j),∞ and ζ∞,h ∈ Assumption Inf-Adj (Infinite-Adjusted). For all h ∈ H with h(j) ∈ H Aj , all subsequences {ωn : n ≥ 1} of {n : n ≥ 1} and all sequences {γωn ,h : n ≥ 1}, (j) ˆ ωn )) ≤ α lim Pθ0 ,γωn ,h (Tωn (θ0 ) > cB−A (α, β, h

n→∞

for j = 1, . . . , u. 21

As with Assumption Inf, this assumption must be verified directly unless the problem falls under the usual case, and similar comments apply. Finally, in order to establish a lower bound on the asymptotic size of tests using adjusted Bonferroni CVs, Assumption LB must be adjusted accordingly.

Assumption LB-Adj (Lower Bound-Adjusted). There is some h* ∈ H such that for some j ∈ {1, . . . , u}, h*^{(j)} ∈ H̄^{(j)}, ζ_{∞,h*} ∈ A_j and ζ_{∞,h*} ∉ Cl(A_i^ε) for i ≠ j, W_{h*} = W̃_{h*}^{(j)} a.s. and
(i) P(W̃_{h*}^{(j)} ≥ sup_{h_1^{(j)} ∈ I_β^{(j)}(h̃*)} c_{(h_1^{(j)}, h_2^{c*})}(1 − a^{(j)}(h_2^{c*}))) = α.
(ii) P(W̃_{h*}^{(j)} = c_{B−A}^{(j)}(α, β, h̃*)) = 0.

Using Algorithm Bonf-Adj, Assumption LB-Adj(i) will hold approximately, with the numerical approximation error decreasing as the grid of α_j's in Step 2(b) becomes finer. Assumption LB-Adj(ii) will typically hold via the absolute continuity of the limiting random variable W̃_{h*}^{(j)} − c_{B−A}^{(j)}(α, β, h̃*).

We may now present the asymptotic size result for tests using adjusted Bonferroni CVs.

Theorem Bonf-Adj (Bonferroni-Adjusted). Under Assumptions PS, DS, Cont-Adj, Sel, CS, a and Inf-Adj,

    AsySz(T_n(θ_0), c_{B−A}(α, β, ĥ_n, ζ̂_n)) ≤ α.

In addition, if Assumption LB-Adj holds,

    AsySz(T_n(θ_0), c_{B−A}(α, β, ĥ_n, ζ̂_n)) = α.

3.3 Minimum Bonferroni Critical Values

In the construction of adjusted Bonferroni CVs, different choices of β trade off power over different regions of the parameter space under the alternative hypothesis. (This is also true of δ in the simple Bonferroni CVs.) Depending upon this choice, the adjusted Bonferroni CVs can lack power over large portions of the alternative hypothesis. In order to overcome this obstacle and produce tests that can flexibly and simultaneously direct power toward alternatives of interest while maintaining high power over most of the alternative space, we introduce CVs that minimize over a set of simple and adjusted Bonferroni CVs.

Again coming back to our example of testing after consistent model selection, find all j ∈ {1, 2, 3} with ζ̂_n ∈ A_j^ε. For each of these j's, consider now constructing a set of S simple Bonferroni CVs sup_{h_1^{(j)} ∈ I_{α−δ_s}^{(j)}(ĥ_n)} c_{(h_1^{(j)}, ĥ_{n,2}^c)}(1 − δ_s) with s ∈ {1, . . . , S} à la (12) and A adjusted Bonferroni CVs sup_{h_1^{(j)} ∈ I_{β_a}^{(j)}(ĥ_n)} c_{(h_1^{(j)}, ĥ_{n,2}^c)}(1 − a^{(j)}) with a ∈ {1, . . . , A} as described following (13). For each j, call the minimum over both sets of CVs c_{B−M}^{(j)}(α, ĥ_n). Finally, compute values η^{(j)} for which the following inequality approximately holds at h_2^c = ĥ_{n,2}^c:

    sup_{h_1^{(j)} ∈ H̄_1^{(j)}(h_2^c)} P(W̃_h^{(j)} ≥ c_{B−M}^{(j)}(α, h̃) + η^{(j)}) ≤ α.    (14)
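The size-correction term in (14) can also be sketched by brute-force simulation. The following illustrative code, again using the simplified one-dimensional limit from (8) with independent normal draws and hypothetical parameter values (none of which come from the paper), searches a grid for the smallest η keeping the worst-case rejection rate of a candidate CV function at or below α:

```python
import numpy as np

def calibrate_eta(cv_fn, alpha, h1_grid, h21=1.0, h22=0.5,
                  reps=40_000, seed=0):
    """Sketch of display (14): find the smallest eta on a grid such
    that the worst-case (over h1) probability that the limit W exceeds
    cv_fn(h1_tilde) + eta is at most alpha.  Illustrative only: W and
    h1_tilde are drawn with independent standard normals, and cv_fn is
    any user-supplied candidate CV function (e.g., a minimum of simple
    and adjusted Bonferroni CVs)."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(reps)       # enters the limit W
    z2 = rng.standard_normal(reps)       # enters h1_tilde = h1 + Z2
    sd = h21 * np.sqrt(1.0 - h22 ** 2)
    for eta in np.linspace(0.0, 2.0, 81):
        worst = max(
            np.mean(np.abs(-h1 * h21 * h22 + sd * z1)
                    >= cv_fn(h1 + z2) + eta)
            for h1 in h1_grid)
        if worst <= alpha:
            return float(eta)       # smallest feasible eta on the grid
    return float("inf")

# A deliberately too-small candidate CV requires a positive correction:
eta = calibrate_eta(lambda h1_t: 1.0 + 0.4 * np.abs(h1_t), alpha=0.05,
                    h1_grid=np.linspace(-6.0, 6.0, 13))
```

Returning the smallest feasible η mirrors the text's observation that a smaller η satisfying (14) yields a more powerful test.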

Similarly to the computation in (13), this computation can be deduced from (7)-(10), over a (j) ˆ n )+ fine grid of h-values. The minimum Bonferroni CV is then taken as the largest c (α, h B−M

η (j) in the index set. In analogy to a(j) in the computation of the adjusted Bonferroni CVs, choosing a smaller η (j) satisfying (14) will necessarily lead to a test with higher power. → − In the fully general context, using the same index set {i1 , . . . , iJ }, let δ = (δ1 , . . . , δS )0 T → − for some S ≥ 0 with δs ∈ uj=1 [δ (j) , α − δ¯(j) ] for s = 1, . . . , S and β = (β1 , . . . , βA )0 for some A ≥ 0 with βa ∈ [0, 1] for a = 1, . . . , A. The minimum Bonferroni CVs are defined as follows: → − → − ˆ ˆ cB−M (α, δ , β , h n , ζn ) ≡

max

j∈{i1 ,...,iJ }

→ − → − ˆ (j) cB−M (α, δ , β , h n ),

where for j = 1, . . . , u, → − → − ˆ (j) (j) (j) (j) ˆ c ˆ ˆ cB−M (α, δ , β , h n ) = min{ min cB (α, α − δs , hn ), min cB−A (α, βa , hn )} + η (hn,2 ) 1≤s≤S

1≤a≤A

(j) ˆ n ) embedded in the definition of the with some η (j) : Rs∞ → R.21 For each cB−A (α, βa , h

minimum Bonferroni CV, there is a level adjustment function that implicitly depends upon

βa , just as a(j) (·) implicitly depends upon β in the adjusted Bonferroni CV. Making this (j) dependence explicit, we have aa : Rs∞ → [δ (j) , α − δ¯(j) ] for a = 1, . . . , A. Again, similar

remarks to Remarks 2–5 apply here.
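To make the combination rule concrete, the following minimal sketch (with hypothetical inputs: `candidate_cvs[j]` collects the simple and adjusted Bonferroni CVs already computed for index $j$, and `eta[j]` stands in for the size-correction $\eta^{(j)}(\hat h_{n,2}^c)$) minimizes within each selected index and then maximizes across indices:

```python
def min_bonferroni_cv(candidate_cvs, eta):
    """Minimum Bonferroni CV (illustrative sketch).

    candidate_cvs[j]: list of candidate CVs (simple and adjusted) for
    selected index j; eta[j]: the size-correction eta^(j). Returns
    max over j of ( min over j's candidate CVs + eta^(j) )."""
    return max(min(cvs) + eta[j] for j, cvs in candidate_cvs.items())
```

With two selected indices, the rule picks each index's cheapest candidate CV, applies that index's size-correction, and reports the largest corrected value.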

Broadly speaking, the minimum Bonferroni CVs introduced here share some similarities with CVs found in specific contexts elsewhere. In the contexts of inference in moment inequality models and inference robust to identification strength, Andrews and Soares (2010), Andrews and Barwick (2012) and Andrews and Cheng (2012) also use CVs that are functions of an inconsistent estimator of the localization parameter. Perhaps the most similar of the methods found in these papers are those that use a continuous "transition function" to smooth between the LF CV and the one that obtains under infinite values of the localization parameter. These methods apply only to the usual case and also necessitate the use of a size-correction function similar to the $\eta^{(j)}$'s found in the definition of the minimum Bonferroni CVs. Rather than relying upon a user-chosen transition function as in these papers, the minimum Bonferroni CVs in some sense let the data determine how to transition between localized quantiles via the confidence sets used for $\hat h_n$. Moreover, when we are no longer in the usual case and the test statistics may attain infinite values under $H_0$ and $\{\gamma_{n,h} : n \ge 1\}$ sequences, it is not possible to adapt the extant transition function approach to the setting of this paper without yielding inconsistent tests with trivial global power over some portions of the parameter space. This is because the LF CV can be infinite.

We now introduce three assumptions that are the analogs of Assumptions a, Inf-Adj and LB-Adj for the minimum Bonferroni CVs. In analogy to using Algorithm Bonf-Adj to construct adjusted Bonferroni CVs, construction of minimum Bonferroni CVs along the lines of Algorithm Bonf-Min in Appendix I will yield a test that (approximately) satisfies the following remaining assumptions. We omit discussion of these remaining assumptions for brevity since it would be very similar to that of Assumptions a, Inf-Adj and LB-Adj.

Assumption η. (i) For $j = 1,\dots,u$, $\eta^{(j)} : \mathbb{R}_\infty^s \to \mathbb{R}$ is continuous.
(ii) For $j = 1,\dots,u$, $P(\widetilde W_h^{(j)} \ge c_{B-M}^{(j)}(\alpha, \vec\delta, \vec\beta, \tilde h)) \le \alpha$ for all $h^{(j)} \in \bar H^{(j)}$.

$^{21}$ Strictly speaking, we could allow for a different vector $(\delta_1,\dots,\delta_S)$ for each $j = 1,\dots,u$, i.e., $(\delta_1^{(j)},\dots,\delta_S^{(j)})$, which would allow us to weaken the condition that $\delta_s \in \bigcap_{j=1}^{u}[\underline\delta^{(j)}, \alpha - \bar\delta^{(j)}]$ for $s = 1,\dots,S$ to $\delta_s^{(j)} \in [\underline\delta^{(j)}, \alpha - \bar\delta^{(j)}]$ for $s = 1,\dots,S$ and $j = 1,\dots,u$. Similarly, a different vector of $\beta$'s could be used for each $j$ as well. Also, in the construction of the CV, under continuity assumptions stronger than those imposed subsequently, one could minimize over continuums of $\delta$ and $\beta$ (rather than a finite number of points). In practical implementation, one would typically minimize over a grid anyway.

Assumption Inf-Min (Infinite-Minimum). For all $h \in H$ with $h^{(j)} \in \bar H^{(j),\infty}$ and $\zeta_{\infty,h} \in A_j$, all subsequences $\{\omega_n : n \ge 1\}$ of $\{n : n \ge 1\}$ and all sequences $\{\gamma_{\omega_n,h} : n \ge 1\}$,

\[
\lim_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}\big(T_{\omega_n}(\theta_0) > c_{B-M}^{(j)}(\alpha, \vec\delta, \vec\beta, \hat h_{\omega_n})\big) \le \alpha
\]

for $j = 1,\dots,u$.

Assumption LB-Min (Lower Bound-Minimum). There is some $h^* \in H$ such that for some $j \in \{1,\dots,u\}$, $h^{*(j)} \in \bar H^{(j)}$, $\zeta_{\infty,h^*} \in A_j$ and $\zeta_{\infty,h^*} \notin \mathrm{Cl}(A_i^\varepsilon)$ for $i \ne j$, $W_{h^*} = \widetilde W_{h^*}^{(j)}$ a.s. and

(i) $P(\widetilde W_{h^*}^{(j)} \ge c_{B-M}^{(j)}(\alpha, \vec\delta, \vec\beta, \tilde h^*)) = \alpha$.
(ii) $P(\widetilde W_{h^*}^{(j)} = c_{B-M}^{(j)}(\alpha, \vec\delta, \vec\beta, \tilde h^*)) = 0$.

Finally, we present the asymptotic size result for tests using minimum Bonferroni CVs.

Theorem Bonf-Min (Bonferroni-Minimum). Suppose Assumption a(i) holds for each $a_a^{(j)}$, $a = 1,\dots,A$ and Assumption CS holds for each $\beta = \alpha - \delta_s$, $s = 1,\dots,S$, and $\beta = \beta_a$, $a = 1,\dots,A$. Under Assumptions PS, DS, Cont-Adj, Sel, η and Inf-Min,

\[
AsySz\big(T_n(\theta_0), c_{B-M}(\alpha, \vec\delta, \vec\beta, \hat h_n, \hat\zeta_n)\big) \le \alpha.
\]

In addition, if Assumption LB-Min holds,

\[
AsySz\big(T_n(\theta_0), c_{B-M}(\alpha, \vec\delta, \vec\beta, \hat h_n, \hat\zeta_n)\big) = \alpha.
\]

3.4 Computational Considerations

Each of the three types of CVs introduced above requires maximization of the localized quantile functions $c_{h^{(j)}}(\cdot)$ in $h_1^{(j)}$ over the confidence set $I_\beta^{(j)}(\hat h_n)$. Though this will often not be challenging for a modern computer, when computing the adjusted and minimum Bonferroni CVs, a similar operation must be repeated many times in order to compute $a^{(j)}(\hat h_{n,2}^c)$ and $\eta^{(j)}(\hat h_{n,2}^c)$. When $h_1^{(j)}$ is low-dimensional (e.g., consisting of one or two entries), even these latter computations are not terribly demanding. (In our simple example of testing after conservative selection with CVs constructed according to Section 4, $a^{(j)}(\hat h_{n,2}^c)$ and $\eta^{(j)}(\hat h_{n,2}^c)$ take 140 and 208 seconds to compute in Matlab on a 2013 iMac with a 3.5 GHz Intel Core i7 processor.) In turn, any of the three types of Bonferroni-based CVs can be computed rather easily in low-dimensional $h_1^{(j)}$ problems, of which there is an abundance.

However, many interesting nonstandard testing problems involve higher-dimensional localization parameters. Examples include (i) testing in moment inequality models with a moderate to large number of moments (see e.g., Andrews and Soares, 2010), (ii) testing on frequentist model-averaged estimators amongst many models (see e.g., Hansen, 2007, 2014) and (iii) testing after selection on a large number of regressors in the linear regression model (see e.g., McCloskey, 2017). Though the computation of Bonferroni-based CVs for some of these problems may be infeasible (e.g., (ii)), particular constructions of the confidence sets $I_\beta^{(j)}(\hat h_n)$ can make implementation feasible for others (e.g., (i) and (iii)). More specifically, when the localized quantile function $c_{h^{(j)}}(\cdot)$ is (weakly) monotone in $h_1^{(j)}$ (or $|h_1^{(j)}|$), a rectangular confidence set $I_\beta^{(j)}(\hat h_n)$ yields a closed-form solution to optimization problems such as $\sup_{h_1^{(j)} \in I_\beta^{(j)}(\hat h_n)} c_{(h_1^{(j)}, \hat h_{n,2}^c)}(1-\delta)$ (or a solution involving a maximum over only a finite set): the supremum occurs at the "corners" of the confidence set. Romano et al. (2014) appear to have been the first to make this observation, in the context of testing problems (i), which fall under the usual case. But the insight applies more generally. Indeed, notice that in the testing after consistent selection example, (8)–(10) imply that the quantiles of $\widetilde W_h^{(1)}$, $\widetilde W_h^{(2)}$ and $\widetilde W_h^{(3)}$ possess this monotonicity property. In McCloskey (2017), we discuss how to exploit this property and form the rectangular confidence sets that allow for quick computation of Bonferroni-based CVs in a more general testing after consistent selection example.

In connection with the previous discussion, note that although the quantiles of $\widetilde W_h^{(1)}$, $\widetilde W_h^{(2)}$ and $\widetilde W_h^{(3)}$ in (8)–(10) are monotone in $|h_1|$, the quantiles of $\widetilde W_h^{(1)} = W_h$ in (5) are not. That is, under the consistent model selection regime, we obtain localized quantile functions with this convenient property whereas under the conservative regime, we do not. Since $h_1$ is low-dimensional in both of our simple testing after model selection examples, all CVs introduced above are easy to form for these cases. The distinction yields practical consequences in more complex post-selection problems for which $h_1$ is of a higher dimension. In these cases, Bonferroni-based CVs may become intractable under the conservative selection regime while remaining entirely feasible under consistent selection. Insofar as one finds the distinction between conservative and consistent selection immaterial, since after all, we only have a single sample size in a given application, utilizing the asymptotic framework given by consistent selection can allow one to conduct valid post-selection inference, even in higher-dimensional settings. McCloskey (2017) details how this very concept may be put to use.
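The corner property can be sketched as follows. Here `quantile_fn`, `lower` and `upper` are hypothetical stand-ins for the localized quantile function and the coordinatewise bounds of a rectangular confidence set; the sketch assumes, as in the text, that the quantile function is (weakly) monotone in each coordinate or its absolute value, so the supremum over the rectangle is attained at one of its $2^d$ corners:

```python
from itertools import product

import numpy as np


def sup_over_rectangle(quantile_fn, lower, upper):
    """Maximize a coordinatewise-monotone quantile function over a
    rectangular confidence set [lower, upper] by evaluating it only at
    the 2^d corners (illustrative sketch; quantile_fn is hypothetical)."""
    corners = product(*zip(lower, upper))
    return max(quantile_fn(np.asarray(c)) for c in corners)
```

For a $d$-dimensional localization parameter this replaces a $d$-dimensional grid search with $2^d$ function evaluations, which is what makes the rectangular construction attractive in the higher-dimensional problems discussed above.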

3.5 Power Considerations and Tuning Parameter Choices

The CVs introduced in this paper depend upon tuning parameters. Apart from the construction of a confidence set of a certain level, these tuning parameters include the choices of $\delta$ for simple Bonferroni CVs, $\beta$ for adjusted Bonferroni CVs and $(\vec\delta, \vec\beta)$ for minimum Bonferroni CVs, as well as the sets $A_i^\varepsilon$ for $i = 1,\dots,u$. It may be noted that each of our Bonferroni-based CVs nests the plug-in LF CV of Andrews and Guggenberger (2009a) for particular choices of tuning parameters (e.g., for $\delta = \alpha$ in the simple Bonferroni construction).

The choice of tuning parameters gives the practitioner the flexibility to direct power toward various regions of the localization parameter space $H$. For example, the plug-in LF CV (or simple Bonferroni CV with $\delta = \alpha$) will tend to yield the highest power under alternatives for which $\sqrt{n}\,\gamma_1 \approx h_1^* = \arg\sup_{h_1 \in \bar H^{(1)}(\hat h_{n,2}^c)} c_{(h_1, \hat h_{n,2}^c)}(1-\alpha)$ in the usual case. However, high power at $h_1^*$ is often attained at the cost of very low power over much of the remaining localization parameter space, whereas other tuning parameter choices may not make such a harsh tradeoff. Though none of the tests using CVs introduced in this paper uniformly dominates tests based upon the (plug-in) LF CV, appropriate tuning parameter choices can simultaneously allow the user to direct power toward alternatives of interest while maintaining high power over large portions of the alternative parameter space. These power gains can often be attained by decreasing the degree of nonsimilarity of the test. The test using the (plug-in) LF approach will be less similar the larger the difference between $\sup_{h \in H} c_h(1-\alpha)$ and $\inf_{h \in H} c_h(1-\alpha)$, or related (plug-in) quantities. On the other hand, by minimizing over a set of data-dependent critical values, the minimum Bonferroni approach has the potential to bring the resultant test much closer to similarity.

By deriving the limiting distributions of $T_n(\theta_0)$ and the Bonferroni-based CVs under local alternatives to $H_0$, one may in principle choose the tuning parameters to maximize a local asymptotic weighted average power (WAP) criterion. For examples of this approach to tuning parameter selection, see e.g., Andrews and Barwick (2012) and McCloskey (2013). Barring certain computational simplifications, though such an approach may be appealing in principle, it is often infeasible in practice. When such an approach is computationally feasible, we recommend its use. For cases in which it is too cumbersome to carry out, this section discusses heuristic approaches that tend to generate tests with desirable power properties.

With regard to the choice of $\delta$ in the simple Bonferroni CV, a large value such as $\delta = \alpha - \alpha/10$ tends to perform well in the more "standard" regions of the parameter space, in which $\gamma_1$ is far from zero, while not being overly conservative in the regions of the parameter space in which $\gamma_1$ is closer to zero. The reason for this is that $c_{h^{(j)}}(1-\alpha+\alpha/10)$ is relatively "close" to $c_{h^{(j)}}(1-\alpha)$ for $h_1^{(j)}$ far from zero and the latter quantile approaches a standard critical value. Choosing $\delta$ to be significantly smaller than $\alpha$ can cause the test to perform poorly in the large-$\gamma_1$ regions of the parameter space. Indeed, Romano et al. (2014) recommend precisely this value when constructing their simple Bonferroni CVs for testing in moment inequality models. Similarly, choosing $\beta$ in the construction of the adjusted Bonferroni CV so that, when implementing Bonf-Adj, $a^{(j)}(\hat h_{n,2}^c) \approx \alpha - \alpha/10$ tends to yield tests with good power properties. In fact, a test using this latter approach will (weakly) dominate one using a simple Bonferroni CV with $\delta = \alpha/10$ since they will both use (approximately) the same localized quantile level while the adjusted Bonferroni CV will maximize over a smaller set of $h_1^{(j)}$. If the user has a particular localization parameter $h^{(j)}$ toward which he would like to direct power, a heuristic alternative to maximizing a WAP criterion would be to find the $\beta$ that minimizes the distance between $c_{h^{(j)}}(1-\alpha)$ and $c_{B-A}^{(j)}(\alpha, \beta, h)$ (assuming $E[\hat h_n^{(j)}] \approx h^{(j)}$).$^{22}$ For tests that deliver high power at most alternatives without a particular direction, one may take a simple agnostic approach of forming a minimum Bonferroni CV with $\vec\beta = 0$ (using the plug-in LF CV) and $\vec\delta = (\alpha/10, \dots, \alpha - \alpha/10)$.

$^{22}$ Recalling Jensen's inequality, one may prefer to choose $\beta$ to minimize $|c_{h^{(j)}}(1-\alpha) - E[c_{B-A}^{(j)}(\alpha, \beta, \tilde h)]|$, though this typically leads to similar results.
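As an illustration of the agnostic recommendation, the helper below (a hypothetical utility, not part of the paper's code) builds an evenly spaced $\vec\delta$ grid running from $\alpha/10$ to $\alpha - \alpha/10$; with $\alpha = 0.05$ it reproduces the vector $(0.005, 0.01, \dots, 0.045)$ used in the numerical work later in the paper:

```python
def agnostic_delta_grid(alpha, step=None):
    """Evenly spaced delta-grid from alpha/10 to alpha - alpha/10
    (inclusive), the agnostic tuning choice for the minimum Bonferroni
    CV; the step defaults to alpha/10 (illustrative helper)."""
    step = alpha / 10.0 if step is None else step
    grid, d = [], alpha / 10.0
    while d <= alpha - alpha / 10.0 + 1e-12:  # cushion for float drift
        grid.append(round(d, 12))
        d += step
    return grid
```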

If the user is interested in directing power toward a particular localization parameter but not at the expense of large power loss at other localization parameters, the minimum Bonferroni CV should be used. Minimizing over a rich set of $c_B^{(j)}$'s via, e.g., $\vec\delta = (\alpha/10, \alpha/5, \dots, \alpha - \alpha/10)$ ensures decent power over most of the parameter space, while minimizing over $c_{B-A}^{(j)}$'s with the elements of $\vec\beta$ constructed along the lines of the previous paragraph allows one to direct power toward particular regions of interest. We provide one word of caution when proceeding along these lines: using "too many" $c_{B-A}^{(j)}$'s in the construction of the minimum Bonferroni CV has the potential to lead to poor power properties by inflating the size-correction factors $\eta^{(j)}(\hat h_{n,2}^c)$ necessary to control asymptotic size via Assumption η(ii). All else equal, the larger the $\vec\beta$ vector, the larger the $\eta^{(j)}(\hat h_{n,2}^c)$'s necessary to satisfy this assumption. This issue is not as much of a concern for the size of the $\vec\delta$ vector due to the conservativeness already built into the simple Bonferroni approach. However, using a $\vec\beta$ vector with only a few elements usually leads to very small $\eta^{(j)}(\hat h_{n,2}^c)$'s that are often quite difficult to distinguish from zero.

Finally, as the sets $A_i^\varepsilon$ for $i = 1,\dots,u$ become smaller, $c_B(\alpha, \alpha-\delta, \hat h_n, \hat\zeta_n)$ can become no greater and will become smaller under some data configurations. Thus, a smaller $A_i^\varepsilon$ leads to (weakly) higher power for the resultant test. Though these sets do not affect asymptotic size, very small $A_i^\varepsilon$'s can induce small amounts of finite-sample size distortion that disappear as the sample size grows.

4 Asymptotic Power Comparisons: Inference After Model Selection

The power properties of tests using Bonferroni-based CVs will depend upon the context of the particular testing problem, including the choice of test statistic. In order to focus and simplify the discussion, we analyze the asymptotic power of tests associated with Bonferroni-based CVs in the context of the simple testing after model selection examples introduced in Sections 2 and 3. Most of the qualitative features on power discussed in this section carry over, with proper modification, to more complicated testing after model selection settings, such as that of McCloskey (2017). Most reasonable choices of test statistics, coupled with the Bonferroni-based CVs described in this paper, yield tests that are consistent under fixed alternatives. Indeed, this is the case for our testing after model selection examples. Hence, we analyze the asymptotic power properties of the tests under local alternatives.

In the context of our testing after model selection example, we consider (contiguous) local alternatives of the form

\[
H_{a,n}: \theta = \theta_n = \theta_0 + \mu n^{-1/2}(1+o(1)),
\]

where $\mu \in \mathbb{R}$. Under $H_{a,n}$ and any drifting sequence of nuisance parameters $\{\gamma_{n,h}\}$, the asymptotic distributions of both $T_n(\theta_0)$ and the Bonferroni-based CVs will depend upon both $\mu$ and $h$. Beginning with the post-conservative selection testing problem, let $CV(\hat h_n)$ denote $c_B(\alpha, \alpha-\delta, \hat h_n, \hat\zeta_n)$, $c_{B-A}(\alpha, \beta, \hat h_n, \hat\zeta_n)$ or $c_{B-M}(\alpha, \vec\delta, \vec\beta, \hat h_n, \hat\zeta_n)$ (none of which depend upon $\hat\zeta_n$ in this usual case). Under $H_{a,n}$ and $\{\gamma_{n,h}\}$ (described in Section 2.2), we have the following result:

\[
P_{\theta_n,\gamma_{n,h}}\big(T_n(\theta_0) > CV(\hat h_n)\big) \to P\Big(|\mu + h_{2,1}\hat Z_1|\,1(|h_1 + Z_2| > c) + |\mu - h_1 h_{2,1} h_{2,2} + h_{2,1}(1-h_{2,2}^2)^{1/2}\tilde Z_1|\,1(|h_1 + Z_2| \le c) > CV(h_1 + Z_2, h_{2,1}, h_{2,2})\Big), \tag{15}
\]

where

\[
\begin{pmatrix} \hat Z_1 \\ \tilde Z_1 \\ Z_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} 1 & \sqrt{1-h_{2,2}^2} & h_{2,2} \\ \sqrt{1-h_{2,2}^2} & 1 & 0 \\ h_{2,2} & 0 & 1 \end{pmatrix} \right). \tag{16}
\]

The limiting probability (15) is equal to the local asymptotic power of the test based upon $T_n(\theta_0)$ and $CV(\hat h_n)$ under $H_{a,n}$ and $\{\gamma_{n,h}\}$. See Appendix II for the derivation of this result and other results found in this section.
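The limit in (15)–(16) is straightforward to approximate by simulation. In the sketch below (an illustration, not the paper's code), $\hat Z_1$ is generated as $\sqrt{1-h_{2,2}^2}\,\tilde Z_1 + h_{2,2} Z_2$, which reproduces the singular covariance matrix in (16); `cv_fn` is a hypothetical stand-in for the critical-value function $CV(h_1 + Z_2, h_{2,1}, h_{2,2})$:

```python
import numpy as np


def local_power_15(mu, h1, h21, h22, cv_fn, c=1.96, reps=100_000, seed=0):
    """Monte Carlo approximation of the limiting probability in (15).

    Z1_tilde and Z2 are independent standard normals, and
    Z1_hat = sqrt(1 - h22**2)*Z1_tilde + h22*Z2 matches the covariance
    matrix in (16). cv_fn(h1 + Z2, h21, h22) supplies the CV draws."""
    rng = np.random.default_rng(seed)
    z1t = rng.standard_normal(reps)
    z2 = rng.standard_normal(reps)
    z1h = np.sqrt(1.0 - h22**2) * z1t + h22 * z2
    stat = np.where(
        np.abs(h1 + z2) > c,                       # pretest keeps the regressor
        np.abs(mu + h21 * z1h),
        np.abs(mu - h1 * h21 * h22 + h21 * np.sqrt(1.0 - h22**2) * z1t),
    )
    return float(np.mean(stat > cv_fn(h1 + z2, h21, h22)))
```

For a large $|h_1|$ and a fixed critical value of 1.96, the pretest almost always retains the regressor, so the null rejection rate should be close to 0.05.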

For the post-consistent selection testing problem, local asymptotic power analysis becomes slightly more complicated. Now, under $H_{a,n}$ and $\{\gamma_{n,h}\}$, the local asymptotic power of tests using Bonferroni-based CVs depends upon $(h_{2,3}, h_{2,4})$ as well. However, the expression simplifies in (likely the most practically relevant) scenarios under which $\zeta_{\infty,h} \in (-1+\varepsilon, 1-\varepsilon)$ or $\zeta_{\infty,h} \in [-\infty, -1-\varepsilon) \cup (1+\varepsilon, \infty]$.$^{23}$ For reasonable constructions of Bonferroni-based CVs in the latter case,

\[
P_{\theta_n,\gamma_{n,h}}\big(T_n(\theta_0) > CV(\hat h_n, \hat\zeta_n)\big) \to P\big(|\mu + h_{2,1}\hat Z_1| > h_{2,1} z_{1-\alpha/2}\big), \tag{17}
\]

where $CV(\hat h_n, \hat\zeta_n)$ denotes $c_B(\alpha, \alpha-\delta, \hat h_n, \hat\zeta_n)$, $c_{B-A}(\alpha, \beta, \hat h_n, \hat\zeta_n)$ or $c_{B-M}(\alpha, \vec\delta, \vec\beta, \hat h_n, \hat\zeta_n)$ and $\hat Z_1$ is defined in (16). This asymptotic power function not only coincides for adjusted and minimum Bonferroni CVs but also with the local asymptotic power function of the standard "full regression" test, which always uses the full regression (1) without engaging in variable selection beforehand. Hence, it is not terribly interesting to study local asymptotic power under these parameter sequences. On the other hand, under $H_{a,n}$ and $\{\gamma_{n,h}\}$ with $\zeta_{\infty,h} \in (-1+\varepsilon, 1-\varepsilon)$,

\[
P_{\theta_n,\gamma_{n,h}}\big(T_n(\theta_0) > CV(\hat h_n, \hat\zeta_n)\big) \to P\big(|\mu - h_1 h_{2,1} h_{2,2} + h_{2,1}(1-h_{2,2}^2)^{1/2}\tilde Z_1| > CV((h_1 + Z_2, h_{2,1}, h_{2,2}), \zeta_{\infty,h})\big), \tag{18}
\]

where $\tilde Z_1$ and $Z_2$ are defined in (16).

$^{23}$ The intermediate cases for which $\zeta_{\infty,h} \in (-1-\varepsilon, -1+\varepsilon) \cup (1-\varepsilon, 1+\varepsilon)$ are not as practically relevant when $\varepsilon$ is chosen to be very small in the construction of the CVs since this set approaches Lebesgue measure zero as $\varepsilon \to 0$. It should, however, be noted that inference that makes use of Bonferroni-based CVs may be somewhat conservative in these cases since the sets $\{i_1,\dots,i_J\}$ used in the construction of the CVs will no longer be singletons.
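Since $\hat Z_1$ is standard normal, the limit in (17) has a closed form: $P(|\mu + h_{2,1}\hat Z_1| > h_{2,1} z_{1-\alpha/2}) = \Phi(\mu/h_{2,1} - z_{1-\alpha/2}) + \Phi(-\mu/h_{2,1} - z_{1-\alpha/2})$. A small stdlib-only evaluation (the helper name is ours, not the paper's):

```python
from statistics import NormalDist


def power_17(mu, h21, alpha=0.05):
    """Closed-form evaluation of the limiting power in (17):
    Phi(mu/h21 - z) + Phi(-mu/h21 - z), with z = z_{1-alpha/2}."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)
    return nd.cdf(mu / h21 - z) + nd.cdf(-mu / h21 - z)
```

At $\mu = 0$ the expression reduces to $2\Phi(-z_{1-\alpha/2}) = \alpha$, confirming that the full-regression power function has correct size.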

We begin by comparing the local asymptotic power of tests under moderate correlation between the regressors of 0.5, corresponding to $h_{2,2} = -0.5$. The asymptotic standard deviation $h_{2,1}$ is normalized to unity throughout this section and we examine parameter sequences under $H_{a,n}$ and $\{\gamma_{n,h}\}$ with $\zeta_{\infty,h} \in (-1+\varepsilon, 1-\varepsilon)$. Figures 1–3 present six local asymptotic power curves corresponding to the power of an $\alpha = 0.05$ (i) full regression test, (ii) post-conservative selection test using an adjusted Bonferroni CV with $\beta = 0.35$, (iii) post-conservative selection test using a simple Bonferroni CV with $\delta = 0.045$, (iv) post-consistent selection test using an adjusted Bonferroni CV with $\beta = 0.7$, (v) post-consistent selection test using a simple Bonferroni CV with $\delta = 0.045$ and (vi) post-conservative selection test using the plug-in LF CV. For the post-conservative selection tests, we set $c = z_{1-\alpha/2} = 1.96$, in accordance with standard pretesting. The choices of tuning parameters for the adjusted and simple Bonferroni CVs correspond to the general recommendations of Section 3.5: in the simple Bonferroni CVs, $\delta = \alpha - \alpha/10$ and in the adjusted Bonferroni CVs, $\beta$ is chosen so that $a^{(1)}(\hat h_{n,2}^c) = \alpha - \alpha/10$.

Starting with $h_1 = 0$ in Figure 1, we can see that the adjusted Bonferroni CVs dominate their simple counterparts, often by a wide margin. We can also see that the test using the post-conservative plug-in LF CV and that using the post-conservative simple Bonferroni CV perform somewhat similarly under this parameter configuration, while they are both somewhat outperformed by the post-conservative adjusted Bonferroni CV and significantly outperformed by the post-consistent adjusted Bonferroni and full regression tests. Finally, it is interesting to note that the full regression test delivers higher power than all of the others except for the post-consistent adjusted Bonferroni test, which has slightly higher power at more distant alternatives. This is consistent with optimality results presented in Elliott et al. (2015): the full regression test is difficult to beat in terms of power when $\theta$ is one-dimensional and the parameter space for $\xi$ is unrestricted.

Moving on to $h_1 = 2$ in Figure 2, the power curves of the post-conservative simple Bonferroni, adjusted Bonferroni and plug-in LF tests nearly coincide and actually beat the full regression test over a range of positive $\mu$ values. Apart from the heavy dominance of the post-consistent simple Bonferroni test by its adjusted Bonferroni counterpart, an interesting feature emerges from the post-consistent adjusted Bonferroni test: for $\mu > 0$ it has significantly higher power than all other tests but exhibits low power for $\mu < 0$. This feature is further exaggerated as $h_1$ grows in value. As we can see from Figure 3, with corresponding $h_1 = 10$, the post-consistent adjusted Bonferroni test displays much higher power than the others for $\mu > 0$ but now exhibits trivial power at negative $\mu$ values. Though the post-consistent adjusted Bonferroni test has asymptotic power equal to one at all two-sided global (fixed) alternatives, it essentially becomes a one-sided test for local alternatives as $h_1$ grows. This behavior can be explained by analyzing the quantity $|\mu - h_1 h_{2,1} h_{2,2} + h_{2,1}(1-h_{2,2}^2)^{1/2}\tilde Z_1|$ inside of (18). The post-consistent adjusted Bonferroni test will have high power under local alternatives for which the sign of $\mu$ is opposite the sign of $h_1 h_{2,2}$ ($h_{2,1}$ is always positive). In practice, if the applied researcher would like high power at alternatives for which $\theta > \theta_0$ ($\theta < \theta_0$) while retaining global power against two-sided alternatives, the post-consistent adjusted Bonferroni test can be used to achieve this goal if he has reason to believe that $\xi$ and the correlation between the regressors have the same (opposite) sign.

Thus far, our discussion has focused upon the power properties of the adjusted and simple Bonferroni CVs. To illustrate the differences in power between tests using adjusted and minimum Bonferroni CVs, we now focus the analysis on post-conservative selection testing under a relatively high degree of correlation between the regressors, setting $h_{2,2} = -0.9$.

For this analysis, Figures 4 and 5 present four local asymptotic power curves corresponding to the power of an $\alpha = 0.05$ test using (i) an adjusted Bonferroni CV with $\beta = 0.25$, (ii) a minimum Bonferroni CV with $\vec\delta = (0.005, 0.01, \dots, 0.045)$ and $\vec\beta = 0.25$, (iii) a plug-in LF CV and (iv) a minimum Bonferroni CV with $\vec\delta = (0.005, 0.01, \dots, 0.045)$ and $\vec\beta = 0$. Figures 4 and 5, with corresponding $h_1 = 2$ and 10, illustrate the general recommendation from Section 3.5 that minimum Bonferroni CVs with $\vec\beta = 0$ yield tests with high power at most alternatives. We can also see that the adjusted Bonferroni and LF CVs tend to direct power more sharply toward specific alternatives (e.g., $\mu \in (0.5, 2)$ and $h_1 = 2$) at the cost of others (e.g., $\mu \in \mathbb{R}$ and $h_1 = 10$). The power of the test using a minimum Bonferroni CV with $\vec\beta = 0.25$ is somewhat of an intermediate case, directing power toward specific regions while still maintaining relatively high power over most of the parameter space. Indeed, for large $h_1$, Figure 5 has the power curves of both minimum Bonferroni tests coinciding at a higher level than the adjusted Bonferroni tests.

5 Finite Sample Monte Carlo Results: Inference After Model Selection

In this section, we analyze the finite sample NRPs and power of tests using Bonferroni-based CVs in the same testing after model selection context of the previous section. For this analysis, we examine the null hypothesis $H_0: \theta = 0$ and a sample size of $n = 120$. To maintain the same scale across figures, we normalize the variances $\sigma_{\xi,n}^2 = \sigma_{\theta,n}^2 = 1$ and use normally distributed data so that $x_i$ and $z_i$ are normally distributed with mean zero, variance $1/(1-\gamma_{2,2,n}^2)$ and covariance $-\gamma_{2,2,n}/(1-\gamma_{2,2,n}^2)$, where $\gamma_{2,2,n}$ is the finite sample correlation between $\hat\xi$ and $\hat\theta$.

To shed light upon the exact (finite sample) size of the tests, for the fixed values of $\gamma_{2,1,n} = 1$ and $\gamma_{2,2,n} = -0.5$ and $-0.9$, we examine maximum NRPs over a fine grid of $\xi$ values for tests of nominal size $\alpha = 0.05$.$^{24}$ For $\gamma_{2,2,n} = -0.5$ ($\gamma_{2,2,n} = -0.9$), we do this for (i) the full regression test, (ii) the post-conservative selection test using the plug-in LF CV, (iii) the post-conservative selection test using an adjusted Bonferroni CV with $\beta = 0.35$ ($\beta = 0.05$), (iv) the post-conservative selection test using a minimum Bonferroni CV with $\vec\delta = (0.005, 0.01, \dots, 0.045)$ and $\vec\beta = 0$ and (v) the post-consistent selection test using an adjusted Bonferroni CV with $\beta = 0.7$ ($\beta = 0.3$). For the post-conservative selection tests, we set $c = 1.96$ and for the post-consistent selection tests, we set $c = \sqrt{\log n}$ (in accordance with the BIC). For the adjusted Bonferroni tests, the $\beta$ values were chosen because they yield tests with high average local asymptotic power (in accordance with the recommendation of the previous section) and for the post-consistent test, $\varepsilon$ was chosen to be very small, equal to 0.0001. The maximum NRP results are all computed from 1,000,000 Monte Carlo replications.

$^{24}$ Specifically, we search over a grid of $|\xi|$ values over the interval $[0, 10]$ using step sizes 0.01, 0.1 and 1 on the intervals $[0,1]$, $[1,5]$ and $[5,10]$, respectively. We also examine the value $|\xi| = 999{,}999$. All maximum NRPs occur at values for which $|\xi| \le 0.31$.

For $\gamma_{2,2,n} = -0.5$ ($\gamma_{2,2,n} = -0.9$), the maximum NRPs are as follows: (i) 0.055 (0.055), (ii) 0.056 (0.055), (iii) 0.056 (0.052), (iv) 0.056 (0.057) and (v) 0.068 (0.075). Except for those based upon the post-consistent adjusted Bonferroni test, all maximum NRPs are nearly identical and indicative of minimal finite sample size distortion. The maximum NRPs for the former test exhibit some minor finite sample size distortion that disappears as the sample size grows. On the other hand, Leeb and Pötscher (2005) show that when using standard CVs in this context, the NRPs of the resultant test can far exceed the nominal level, yielding potentially extreme size distortion.

In addition to the previous section, we further illustrate the power properties of the Bonferroni-based CV tests, now in finite samples, by examining tests (i)–(v) of the previous

paragraph with γ2,2,n = −0.5 but moving into the alternative parameter space. All finite

sample power results are computed using 100,000 Monte Carlo replications. Figures 6–11 graph the finite sample size-corrected power of the tests now as a function of ξ (the finite

sample counterpart of h1 in the previous section), for each configuration of θ = 0.1 or 0.2 (the finite sample counterpart of µ) and γ2,2 = −0.25, −0.5 or −0.75. By varying ξ rather

than $\theta$, we obtain a slightly different view of the power properties of the tests than in the previous section. From the figures, we can see that the Bonferroni-based tests attain higher power than the plug-in LF test over most of the parameter space, that the test using the adjusted Bonferroni CV in the post-consistent selection problem dominates the tests based upon conservative selection and that the full regression test is difficult to beat. Nevertheless, in agreement with the analysis of the previous section, we can see that the post-consistent selection test using the adjusted Bonferroni CV has the potential to beat the full regression test over a portion of the parameter space for which $\xi$ and $\gamma_{2,2,n}$ have the opposite sign. This power gain comes at the expense of some power loss over a portion of the parameter space for which the signs of $\xi$ and $\gamma_{2,2,n}$ are the same. This power tradeoff appears to be less severe when the correlation between the regressors, quantified here by $\gamma_{2,2}$, is smaller.

In comparison to the full regression test, this type of power tradeoff may be desirable in some applications. For example, suppose the applied researcher has some a priori belief about the sign of the coefficient $\xi$ (e.g., based on intuition or economic reasoning) but is not certain about this belief. In this case, the post-consistent selection test could be used to deliver higher power over the more a priori likely (to the user) region of the parameter space for $\xi$ while still being consistent over the entire parameter space.$^{25}$ Also in agreement with the previous section, we can see that the power of these two tests coincides for large $|\xi|$ values. The tests using the post-conservative selection Bonferroni-based CVs also nearly attain this power level for large $|\xi|$ values. This shows that at parameter values distant from the point causing the discontinuity in the asymptotic distribution of the test statistic ($\xi = 0$ here), the uniformly size-correct tests attain (nearly) the same power as the size-distorted test that ignores the effects of model selection.$^{26}$ Finally, it is interesting to note that, at least for the parameter configurations, application and tuning parameter choices considered here, the test using the adjusted Bonferroni CV appears to dominate the test using the minimum Bonferroni CV in the post-conservative selection problem.

$^{25}$ Recall that $h_2^c$ is consistently estimable so that the user would know the region of the parameter space in which power gains were possible.
$^{26}$ For large $|\xi|$ values, the model selection procedure would "choose" the full regression and the user would use a standard normal CV scaled by the standard error from the full regression.

Continuing the power analysis of the post-consistent selection adjusted Bonferroni test vis-à-vis the full regression test, it is interesting to examine the finite sample power of the former while varying the selection threshold $c$. Finite sample power analysis sheds new light on the features of this test since asymptotically we have $c = c_n \to \infty$ under this regime, so that local asymptotic power analysis abstracts away from the finite sample choice of $c$. Figure 12 graphs the power of this test and the full regression test under the same DGP as Figure 8 across five different values of $c \in \{2, 4, 6, 8, 10\}$. The results are remarkable: as $c$ increases, we obtain larger power gains over wider regions of positive values of $\xi$ at the expense of larger power losses over wider regions of negative values of $\xi$, while the test still retains its global two-sided nature and coincides with the full regression test at large values of $|\xi|$ for any choice of $c$.$^{27}$ In relation to the discussion above, we can see that the applied researcher can choose the tuning parameter $c$ according to the amount of power tradeoff desired to deliver higher power over the more a priori likely region of the parameter space.

We include one final figure, Figure 13, to emphasize just how large the power gains from using Bonferroni-based, rather than plug-in LF, CVs can be in practice. This particularly dramatic example uses the same DGP as that in the previous two figures but with a higher correlation between the regressors, with $\gamma_{2,2,n} = 0.9$. Instead of examining the non-studentized post-conservative test statistic of (2), we now studentize the statistic in accordance with the example of Andrews and Guggenberger (2009a). That is, we use the exact same conservative model selection decision rule but use the post-selection t-statistic as $T_n(\theta_0)$ instead. Here the alternative is set to $\theta = 0.3$ and we focus the analysis on CVs (ii) and (iv). It is clear from this figure that tests using Bonferroni-based CVs can enjoy very large power gains relative to the plug-in LF CV, with a power difference of nearly 100% over most of the parameter space for $\xi$.
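A simplified finite-sample sketch of the conservative-pretest design described in this section can be written as follows. This is an illustration only: it uses the regressor design stated in the text, but it replaces the paper's Bonferroni-based CVs with a fixed standard-normal CV (`cv`) and works with t-statistics throughout, as in the studentized example of Figure 13; all function and variable names are ours:

```python
import numpy as np


def simulate_pretest_reject(theta, xi, gamma22, n=120, c=1.96, cv=1.96,
                            reps=2000, seed=0):
    """Rejection frequency of a naive post-conservative-selection t-test of
    theta = 0 (hedged sketch). Regressors (x_i, z_i) have mean zero,
    variance 1/(1-gamma22^2) and covariance -gamma22/(1-gamma22^2)."""
    rng = np.random.default_rng(seed)
    v = 1.0 / (1.0 - gamma22**2)
    cov = np.array([[v, -gamma22 * v], [-gamma22 * v, v]])
    L = np.linalg.cholesky(cov)
    rejections = 0
    for _ in range(reps):
        xz = rng.standard_normal((n, 2)) @ L.T
        y = theta * xz[:, 0] + xi * xz[:, 1] + rng.standard_normal(n)
        b, *_ = np.linalg.lstsq(xz, y, rcond=None)
        resid = y - xz @ b
        s2 = resid @ resid / (n - 2)
        xtx_inv = np.linalg.inv(xz.T @ xz)
        t_xi = b[1] / np.sqrt(s2 * xtx_inv[1, 1])
        if abs(t_xi) > c:    # pretest keeps z_i: full-regression t-stat
            t_theta = b[0] / np.sqrt(s2 * xtx_inv[0, 0])
        else:                # pretest drops z_i: restricted regression
            x = xz[:, 0]
            b0 = (x @ y) / (x @ x)
            r = y - b0 * x
            t_theta = b0 / np.sqrt((r @ r / (n - 1)) / (x @ x))
        rejections += abs(t_theta) > cv
    return rejections / reps
```

When $\xi$ is far from zero the pretest almost always retains $z_i$, so this naive procedure has approximately correct size; the severe size distortions documented by Leeb and Pötscher (2005) arise for $\xi$ near zero, which is precisely what the Bonferroni-based CVs are designed to correct.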

Appendix I: Algorithms for Constructing Bonferroni-Based Critical Values

^{27} The test retains its "global two-sided nature" in the sense that it is consistent against all fixed two-sided alternatives.

The following is an algorithm for constructing the adjusted Bonferroni CVs in practice for the fully general case.

Algorithm Bonf-Adj. Compute $\hat{h}^c_{n,2}$ and $\hat{\zeta}_n$ and find all $i \in \{1, \ldots, u\}$ such that $\hat{\zeta}_n \in A^\varepsilon_i$. Denote this set as $\{i_1, \ldots, i_J\}$. For each $j \in \{i_1, \ldots, i_J\}$, do the following:

1. Create a fine grid of the space $\bar{H}^{(j)}(\hat{h}^c_{n,2}) \equiv \{h^{(j)} \in \bar{H}^{(j)} : h^c_2 = \hat{h}^c_{n,2}\}$. Call it $\bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})$.

2. For each $h^{(j)} \in \bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})$:

   (a) Simulate $R$ draws from the joint distribution of $(\widetilde{W}_{h^{(j)}}, \tilde{h}^{(j)}_1)$.

   (b) Find the smallest $(\delta^{(j)}, \bar{\delta}^{(j)}) \in [0, \alpha]$ for which Assumption Cont-Adj holds and divide $[\delta^{(j)}, \alpha - \bar{\delta}^{(j)}]$ into a grid: $\delta^{(j)} = \alpha_g < \alpha_{g-1} < \ldots < \alpha_1 = \alpha - \bar{\delta}^{(j)}$.^{28}

   (c) Set $k = 1$.

   (d) Approximate $P(\widetilde{W}_{h^{(j)}} \geq \sup_{h^{(j)}_1 \in I^{(j)}_\beta(\tilde{h})} c_{(h^{(j)}_1, h^c_2)}(1 - \alpha_k))$ from the simulation draws of Step (a).

   (e) If the quantity in Step (d) is less than or equal to $\alpha$, stop and set $\tilde{a}^{(j)}(h^{(j)})$ equal to $\alpha_k$. Otherwise, let $k = k + 1$ and return to Step (d).

3. Approximate $a^{(j)}(\hat{h}^c_{n,2})$ by $\min_{h^{(j)} \in \bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})} \tilde{a}^{(j)}(h^{(j)})$.

Finally, construct the approximate $c^{(j)}_{B-A}(\alpha, \beta, \hat{h}_n)$ for $j \in \{i_1, \ldots, i_J\}$ and the resulting $c_{B-A}(\alpha, \beta, \hat{h}_n, \hat{\zeta}_n)$ using the approximate $a^{(j)}(\hat{h}^c_{n,2})$ from Step 3.

Remark 7. The random vector $(\widetilde{W}_{h^{(j)}}, \tilde{h}^{(j)}_1)$ is typically a function of some underlying random vector and $h^{(j)}$. In this case, computation can be sped up by only simulating these underlying random vectors once, instead of repeatedly simulating $R$ draws of $(\widetilde{W}_{h^{(j)}}, \tilde{h}^{(j)}_1)$ for every $h^{(j)} \in \bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})$.
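To make the search in Steps (b)-(e) concrete, the following sketch runs it for a single grid point $h^{(j)}$ in a toy Gaussian limit experiment. The toy model itself (a statistic $W = |h_1 + Z_1|$ whose quantile function $c_h(1-a)$ is the folded-normal quantile, a nuisance estimator $\tilde{h}_1 = h_1 + Z_2$ with correlation 0.5, and all grid sizes) is an illustrative assumption, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.05, 0.25
R = 100_000

# Illustrative toy limit experiment (an assumption, not the paper's model):
# statistic W = |h1 + Z1|, quantile function c_h(1 - a) given by the
# (1 - a)-quantile of |h1 + Z|, nuisance estimator h1_tilde = h1 + Z2,
# with corr(Z1, Z2) = 0.5.
h1_true = 1.0
z1 = rng.standard_normal(R)
z2 = 0.5 * z1 + np.sqrt(0.75) * rng.standard_normal(R)
W = np.abs(h1_true + z1)          # Step (a): joint draws of (W, h1_tilde)
h1_tilde = h1_true + z2

Zq = rng.standard_normal(100_000)
q_beta = np.quantile(np.abs(Zq), 1 - beta)  # half-width of the level-(1-beta) CI for h1

# Step (b): grid alpha_1 > ... > alpha_g (here with delta = delta_bar = 0)
a_grid = np.linspace(alpha, 0.0, 11)
hgrid = np.linspace(0.0, 10.0, 101)
# c_h(1 - a_k) tabulated for every grid h and every candidate a_k
cgrid = np.array([np.quantile(np.abs(h + Zq), 1 - a_grid) for h in hgrid])

# sup over the CI for h1 is attained at the endpoint of largest |h|,
# since c_h(1 - a) is increasing in |h| in this toy model
hmax = np.maximum(np.abs(h1_tilde - q_beta), np.abs(h1_tilde + q_beta))

# Steps (c)-(e): stop at the first (largest) a_k whose CV controls size
a_sol, rej_sol = 0.0, 0.0
for k, a_k in enumerate(a_grid):
    cv = np.interp(hmax, hgrid, cgrid[:, k])
    rej = (W >= cv).mean()
    if rej <= alpha:
        a_sol, rej_sol = a_k, rej
        break
print("a~(h) =", a_sol, "rejection rate =", rej_sol)
```

In this deliberately conservative toy the loop stops at the first candidate, so $\tilde{a}^{(j)}(h^{(j)}) = \alpha$ (paralleling the case $a^{(2)}(\cdot) = \alpha$ that arises in the derivations of Appendix II); Step 3 would then repeat this search over all of $\bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})$ and take the minimum.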

Remark 8. The finer the grid in Step (b), the larger $a^{(j)}(\hat{h}^c_{n,2})$ will be and the higher will be the power of the resultant test. Of course, the finer the grid in Step (b) is, the longer $a^{(j)}(\hat{h}^c_{n,2})$ will take to compute.

^{28} Technically, a "smallest" $(\delta^{(j)}, \bar{\delta}^{(j)}) \in [0, \alpha]$ may not exist. For practical purposes, this is irrelevant. Instead, one may find small values of $(\delta^{(j)}, \bar{\delta}^{(j)})$ such that Assumption Cont-Adj is satisfied.

Remark 9. If $\alpha_g$ in Step (b) is not small enough, Step (e) may not converge to a solution. If this occurs, there are two possible solutions: (i) make $\delta^{(j)} = \alpha_g$ smaller while still satisfying Assumption Cont-Adj or, if this is not possible, (ii) reduce $\beta$ to increase the coverage of $I^{(j)}_\beta(\tilde{h})$. From Theorem Bonf, we know that if $\beta \leq \alpha$, a solution to the algorithm exists (and is bounded above by $\alpha - \beta$) when a small enough $\alpha_g$ is used. On the other hand, a large $\beta$ is often both feasible and desirable from a power perspective. The largest feasible $\beta$ for

which the algorithm converges to a solution will depend upon the dependence structure of the particular testing problem at hand. From the author's personal experience, for tests with $\alpha = 0.05$ and $\beta \leq 0.5$, the algorithm will typically converge to a solution when $\alpha_g = \alpha/10$.

Next, we provide the analogous algorithm for constructing the minimum Bonferroni CVs.

Algorithm Bonf-Min. Compute $\hat{h}^c_{n,2}$ and $\hat{\zeta}_n$ and find all $i \in \{1, \ldots, u\}$ such that $\hat{\zeta}_n \in A^\varepsilon_i$. Denote this set as $\{i_1, \ldots, i_J\}$. For each $j \in \{i_1, \ldots, i_J\}$, do the following:

1. Create a fine grid of the space $\bar{H}^{(j)}(\hat{h}^c_{n,2}) \equiv \{h^{(j)} \in \bar{H}^{(j)} : h^c_2 = \hat{h}^c_{n,2}\}$. Call it $\bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})$.

2. For each $h^{(j)} \in \bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})$:

   (a) Simulate $R$ draws from the joint distribution of $(\widetilde{W}_{h^{(j)}}, \tilde{h}^{(j)}_1)$.

   (b) Choose a small grid-step value $\eta_g > 0$. Set $\eta_1 = 0$.

   (c) Set $k = 1$.

   (d) Approximate $P(\widetilde{W}_{h^{(j)}} \geq \min\{\min_{1 \leq s \leq S} c^{(j)}_B(\alpha, \alpha - \delta_s, \tilde{h}), \min_{1 \leq a \leq A} c^{(j)}_{B-A}(\alpha, \beta_a, \tilde{h})\} + \eta_k)$ from the simulation draws of Step (a).

   (e) If the quantity in Step (d) is less than or equal to $\alpha$, stop and set $\tilde{\eta}^{(j)}(h^{(j)})$ equal to $\eta_k$. Otherwise, increase $\eta_k$ by $\eta_g$ and return to Step (d).

3. Approximate $\eta^{(j)}(\hat{h}^c_{n,2})$ by $\max_{h^{(j)} \in \bar{H}^{(j)}_{grid}(\hat{h}^c_{n,2})} \tilde{\eta}^{(j)}(h^{(j)})$.

Finally, construct the approximate $c^{(j)}_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \hat{h}_n)$ for $j \in \{i_1, \ldots, i_J\}$ and the resulting $c_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \hat{h}_n, \hat{\zeta}_n)$ using the approximate $\eta^{(j)}(\hat{h}^c_{n,2})$ from Step 3.

Remark 7 applies here as well, as does the following analog of Remark 8.

Remark 10. The smaller the grid-step in Step (b), the smaller $\eta^{(j)}(\hat{h}^c_{n,2})$ will be and the higher will be the power of the resultant test. Of course, the smaller the grid-step in Step (b) is, the longer $\eta^{(j)}(\hat{h}^c_{n,2})$ will take to compute.
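The search in Steps (b)-(e) of Algorithm Bonf-Min can be sketched the same way. The toy Gaussian limit experiment below and the choice of candidate CVs (two simple Bonferroni CVs with $\delta_s \in \{\alpha/2, \alpha/4\}$) are illustrative assumptions only, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.05
R = 100_000

# Illustrative toy limit experiment (an assumption, not the paper's model):
# statistic W = |h1 + Z1| and nuisance estimator h1_tilde = h1 + Z2
h1_true = 1.0
z1 = rng.standard_normal(R)
z2 = 0.5 * z1 + np.sqrt(0.75) * rng.standard_normal(R)
W = np.abs(h1_true + z1)
h1_tilde = h1_true + z2

Zq = rng.standard_normal(100_000)
hgrid = np.linspace(0.0, 10.0, 101)

def simple_bonf_cv(delta):
    """Draw-by-draw simple Bonferroni CV: sup of c_h(1 - delta) over a
    level-(1 - (alpha - delta)) confidence interval for h1."""
    q_ci = np.quantile(np.abs(Zq), 1 - (alpha - delta))
    cgrid = np.array([np.quantile(np.abs(h + Zq), 1 - delta) for h in hgrid])
    hmax = np.maximum(np.abs(h1_tilde - q_ci), np.abs(h1_tilde + q_ci))
    return np.interp(hmax, hgrid, cgrid)

# minimum over a set of candidate CVs (here two simple Bonferroni CVs)
cv_min = np.minimum(simple_bonf_cv(alpha / 2), simple_bonf_cv(alpha / 4))

# Steps (b)-(e): increase eta from 0 in steps of eta_g until size <= alpha
eta_g, eta = 0.005, 0.0
while (W >= cv_min + eta).mean() > alpha:
    eta += eta_g
print("eta~(h) =", eta, "rejection rate =", (W >= cv_min + eta).mean())
```

In this toy both candidate CVs are already conservative, so the loop exits with $\tilde{\eta}^{(j)}(h^{(j)}) = 0$; Step 3 would then take the maximum of $\tilde{\eta}^{(j)}$ over the grid of $h^{(j)}$ values.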

7 Appendix II: Proofs of Main Results

Proof of Theorem Bonf: We begin by obtaining a preliminary expression for the asymptotic size of the test:
$$\begin{aligned}
AsySz(T_n(\theta_0), c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)) &\equiv \limsup_{n\to\infty} \sup_{\gamma\in\Gamma} P_{\theta_0,\gamma}(T_n(\theta_0) > c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)) \\
&= \limsup_{n\to\infty} P_{\theta_0,\tilde{\gamma}_n}(T_n(\theta_0) > c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)) \\
&= \lim_{n\to\infty} P_{\theta_0,\tilde{\gamma}_{k_n}}(T_{k_n}(\theta_0) > c_B(\alpha, \alpha-\delta, \hat{h}_{k_n}, \hat{\zeta}_{k_n})),
\end{aligned}$$
where $\{\tilde{\gamma}_n : n \geq 1\}$ is a sequence in $\Gamma$ and $\{k_n : n \geq 1\}$ is a subsequence of $\{n : n \geq 1\}$. Such a sequence and subsequence always exist. There exists a subsequence $\{w_n : n \geq 1\}$ of $\{k_n : n \geq 1\}$ such that $w_n^{1/2}\tilde{\gamma}_{w_n,1} \to h_1 \in R^p_\infty$, $\tilde{\gamma}_{w_n,2} \to h_2 \in R^q_\infty$ and $\tilde{\gamma}_{w_n,3} \in \Gamma_3$ for all $n \geq 1$ (see, e.g., the proof of Theorem 1 of Andrews and Guggenberger, 2010b). Hence,
$$AsySz(T_n(\theta_0), c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)) = \lim_{n\to\infty} P_{\theta_0,\tilde{\gamma}_{w_n,h}}(T_{w_n}(\theta_0) > c_B(\alpha, \alpha-\delta, \hat{h}_{w_n}, \hat{\zeta}_{w_n})) \quad (19)$$
for some $h \in H$.

Given (19), it suffices to show $\limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > c_B(\alpha, \alpha-\delta, \hat{h}_{\omega_n}, \hat{\zeta}_{\omega_n})) \leq \alpha$ for all $h \in H$, subsequences $\{\omega_n : n \geq 1\}$ of $\{n : n \geq 1\}$ and sequences $\{\gamma_{\omega_n,h} : n \geq 1\}$ to establish $AsySz(T_n(\theta_0), c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)) \leq \alpha$. In this vein, take any $h \in H$ and suppose via Definition MLLD that $\zeta_{\infty,h} \in A_j$ for some $j \in \{1, \ldots, u\}$. Assuming WLOG that $c^{(i)}_B(\alpha, \alpha-\delta, \hat{h}_{\omega_n}) \geq 0$ with probability 1 for all $i \in \{1, \ldots, u\}$,
$$\begin{aligned}
&\limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > c_B(\alpha, \alpha-\delta, \hat{h}_{\omega_n}, \hat{\zeta}_{\omega_n})) \\
&= \limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > \max_{i\in\{1,\ldots,u\}}\{1(\hat{\zeta}_{\omega_n} \in A^\varepsilon_i) \odot c^{(i)}_B(\alpha, \alpha-\delta, \hat{h}_{\omega_n})\}) \\
&\leq \limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > 1(\hat{\zeta}_{\omega_n} \in A^\varepsilon_j) \odot c^{(j)}_B(\alpha, \alpha-\delta, \hat{h}_{\omega_n})) \\
&= \limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > c^{(j)}_B(\alpha, \alpha-\delta, \hat{h}_{\omega_n})), \quad (20)
\end{aligned}$$
where $\odot$ is an operator such that $a \odot b = ab$ for any $a, b \in R$ but $0 \odot b = 0$ for any $b \in R_\infty$, and the final equality follows from Assumption Sel and the fact that $1(x \in A^\varepsilon_j)$ is continuous at any $x \in A_j$.

If $h^{(j)} \in \bar{H}^{(j),\infty}$, Assumption Inf immediately implies that (20) is bounded above by $\alpha$. Instead, suppose $h^{(j)} \in \bar{H}^{(j)}$. By Assumption DS and Definition MLLD, under $H_0$ and any $\{\gamma_{\omega_n,h} : n \geq 1\}$, $(T_{\omega_n}(\theta_0), \hat{h}_{\omega_n}) \stackrel{d}{\longrightarrow} (W_h, \tilde{h})$, where $W_h \leq \widetilde{W}_{h^{(j)}}$ a.s. Given Assumptions Cont(i) and CS, the Maximum Theorem implies that $c^{(j)}_B(\alpha, \alpha-\delta, \hat{h}_n)$ is continuous in $\hat{h}_n$ when $I^{(j)}_\beta(\hat{h}_n) \subset \bar{H}^{(j)}_1(\hat{h}^c_{n,2})$. Hence, Assumptions DS and CS imply that (20) is bounded above by
$$\begin{aligned}
P(W_h \geq c^{(j)}_B(\alpha, \alpha-\delta, \tilde{h})) &= P(W_h \geq c^{(j)}_B(\alpha, \alpha-\delta, \tilde{h}) \geq c_{h^{(j)}}(1-\delta)) \\
&\quad + P(W_h \geq c_{h^{(j)}}(1-\delta) > c^{(j)}_B(\alpha, \alpha-\delta, \tilde{h})) \\
&\quad + P(c_{h^{(j)}}(1-\delta) > W_h \geq c^{(j)}_B(\alpha, \alpha-\delta, \tilde{h})) \\
&\leq P(W_h \geq c_{h^{(j)}}(1-\delta)) + P(c_{h^{(j)}}(1-\delta) > c^{(j)}_B(\alpha, \alpha-\delta, \tilde{h})) \\
&\leq P(\widetilde{W}_{h^{(j)}} \geq c_{h^{(j)}}(1-\delta)) + P(h^{(j)}_1 \notin I^{(j)}_{\alpha-\delta}(\tilde{h})) \leq \alpha,
\end{aligned}$$
where the second inequality follows from Definition MLLD and the form of $c^{(j)}_B(\alpha, \alpha-\delta, \tilde{h})$, while the third inequality results from Assumptions CS and Cont(ii). We have now established the first statement of the theorem.

For the second statement, under Assumption LB, we have
$$\begin{aligned}
AsySz(T_n(\theta_0), c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)) &\geq \limsup_{n\to\infty} P_{\theta_0,\gamma_{n,h^*}}(T_n(\theta_0) > c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)) \\
&= \limsup_{n\to\infty} P_{\theta_0,\gamma_{n,h^*}}(T_n(\theta_0) > \max_{i\in\{1,\ldots,u\}}\{1(\hat{\zeta}_n \in A^\varepsilon_i) \odot c^{(i)}_B(\alpha, \alpha-\delta, \hat{h}_n)\}) \\
&= \limsup_{n\to\infty} P_{\theta_0,\gamma_{n,h^*}}(T_n(\theta_0) > c^{(j)}_B(\alpha, \alpha-\delta, \hat{h}_n)) \\
&\geq \limsup_{n\to\infty} P_{\theta_0,\gamma_{n,h^*}}(T_n(\theta_0) > \sup_{h^{(j)}_1 \in \bar{H}^{(j)}_1(\hat{h}^c_{n,2})} c_{(h^{(j)}_1,\hat{h}^c_{n,2})}(1-\delta)) \\
&= P(W_{h^*} > c_{h^{*(j)}}(1-\delta)) = P(\widetilde{W}_{h^{*(j)}} > c_{h^{*(j)}}(1-\delta)) = \delta,
\end{aligned}$$
where the second equality results from Assumption Sel and the fact that for $i = 1, \ldots, u$, $1(x \in A^\varepsilon_i)$ is continuous at any $x \in A_j$ such that $x \notin Cl(A_i)$ for $i \neq j$, the second inequality follows from Assumption CS, the third equality uses Assumption DS and the final equality results from Assumption Cont(ii). $\square$

Proof of Theorem Bonf-Adj: Similar arguments to those used in the first paragraph of the proof of Theorem Bonf provide that it is sufficient to show $\limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > c_{B-A}(\alpha, \beta, \hat{h}_{\omega_n}, \hat{\zeta}_{\omega_n})) \leq \alpha$ for all $h \in H$, subsequences $\{\omega_n : n \geq 1\}$ of $\{n : n \geq 1\}$ and

sequences $\{\gamma_{\omega_n,h} : n \geq 1\}$ to establish $AsySz(T_n(\theta_0), c_{B-A}(\alpha, \beta, \hat{h}_n, \hat{\zeta}_n)) \leq \alpha$. Take any $h \in H$ and suppose via Definition MLLD that $\zeta_{\infty,h} \in A_j$ for some $j \in \{1, \ldots, u\}$. Nearly identical arguments to those leading up to (20) in the proof of Theorem Bonf provide that
$$\limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > c_{B-A}(\alpha, \beta, \hat{h}_{\omega_n}, \hat{\zeta}_{\omega_n})) \leq \limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > c^{(j)}_{B-A}(\alpha, \beta, \hat{h}_{\omega_n})). \quad (21)$$
If $h^{(j)} \in \bar{H}^{(j),\infty}$, Assumption Inf-Adj immediately provides that (21) is bounded above by $\alpha$. Instead, suppose $h^{(j)} \in \bar{H}^{(j)}$. By Assumptions Cont-Adj, a(i) and CS, the Maximum Theorem implies that $c^{(j)}_{B-A}(\alpha, \beta, \hat{h}_n)$ is continuous in $\hat{h}_n$ when $I^{(j)}_\beta(\hat{h}_n) \subset \bar{H}^{(j)}_1(\hat{h}^c_{n,2})$. Hence, Assumptions DS and CS provide that (21) is bounded above by
$$P(W_h \geq c^{(j)}_{B-A}(\alpha, \beta, \tilde{h})) = P(W_h \geq \sup_{h^{(j)}_1 \in I^{(j)}_\beta(\tilde{h})} c_{(h^{(j)}_1,h^c_2)}(1 - a^{(j)}(h^c_2))) \leq P(\widetilde{W}_{h^{(j)}} \geq \sup_{h^{(j)}_1 \in I^{(j)}_\beta(\tilde{h})} c_{(h^{(j)}_1,h^c_2)}(1 - a^{(j)}(h^c_2))) \leq \alpha,$$
where the first inequality follows from Definition MLLD and the second inequality results from Assumption a(ii). Under Assumption LB-Adj, nearly identical arguments to those used to establish the lower bound in the proof of Theorem Bonf provide that
$$AsySz(T_n(\theta_0), c_{B-A}(\alpha, \beta, \hat{h}_n, \hat{\zeta}_n)) \geq P(\widetilde{W}_{h^{*(j)}} \geq \sup_{h^{(j)}_1 \in I^{(j)}_\beta(\tilde{h}^*)} c_{(h^{(j)}_1,h^{c*}_2)}(1 - a^{(j)}(h^{c*}_2))) = \alpha. \; \square$$
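The two-term Bonferroni decomposition used in the size bounds above can be checked by direct simulation. The toy Gaussian limit experiment below (a statistic $|h_1 + Z_1|$, a correlated estimator of $h_1$, $\alpha = 0.05$, $\delta = 0.025$) is an illustrative assumption only; it verifies numerically that the null rejection probability of a simple Bonferroni CV is bounded by $P(W \geq c_{h_1}(1-\delta)) + P(h_1 \notin I_{\alpha-\delta})$, and hence by $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, delta = 0.05, 0.025
R = 200_000

# Toy Gaussian limit experiment (an illustrative assumption):
# W = |h1 + Z1|, nuisance estimator h1_tilde = h1 + Z2, corr(Z1, Z2) = 0.5
h1 = 2.0
z1 = rng.standard_normal(R)
z2 = 0.5 * z1 + np.sqrt(0.75) * rng.standard_normal(R)
W = np.abs(h1 + z1)
h1_tilde = h1 + z2

Zq = rng.standard_normal(100_000)
# half-width of the level-(1 - (alpha - delta)) CI for h1
q_ci = np.quantile(np.abs(Zq), 1 - (alpha - delta))
hgrid = np.linspace(0.0, 12.0, 121)
cgrid = np.array([np.quantile(np.abs(h + Zq), 1 - delta) for h in hgrid])

# simple Bonferroni CV: sup of c_h(1 - delta) over the CI for h1
hmax = np.maximum(np.abs(h1_tilde - q_ci), np.abs(h1_tilde + q_ci))
cv = np.interp(hmax, hgrid, cgrid)

reject = (W >= cv).mean()
# the two terms of the proof's Bonferroni decomposition:
term1 = (W >= np.quantile(np.abs(h1 + Zq), 1 - delta)).mean()  # approx. delta
term2 = (np.abs(h1_tilde - h1) > q_ci).mean()                  # approx. alpha - delta
print(reject, "<=", term1 + term2)
```

The printed rejection rate falls well below the Bonferroni bound here, illustrating the conservativeness that motivates the adjusted and minimum constructions.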

Proof of Theorem Bonf-Min: Nearly identical arguments to those used in the proof of Theorem Bonf-Adj imply that it is sufficient to show
$$\limsup_{n\to\infty} P_{\theta_0,\gamma_{\omega_n,h}}(T_{\omega_n}(\theta_0) > c^{(j)}_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \hat{h}_{\omega_n})) \leq \alpha \quad (22)$$
for all $h \in H$ such that $\zeta_{\infty,h} \in A_j$ for some $j \in \{1, \ldots, u\}$ and $h^{(j)} \in \bar{H}^{(j)}$, subsequences $\{\omega_n : n \geq 1\}$ of $\{n\}$ and sequences $\{\gamma_{\omega_n,h} : n \geq 1\}$ to establish the upper bound on asymptotic size. By Assumptions Cont-Adj, a(i) applied to each $a_a$ and CS applied to each $\beta = \beta_a$, the Maximum Theorem provides that $\min_{1 \leq a \leq A} c^{(j)}_{B-A}(\alpha, \beta_a, \hat{h}_n)$ is continuous in $\hat{h}_n$ when $I^{(j)}_{\beta_a}(\hat{h}_n) \subset \bar{H}^{(j)}_1(\hat{h}^c_{n,2})$ for each $a = 1, \ldots, A$. Similarly, by Assumptions Cont-Adj and CS applied to each $\beta = \alpha - \delta_s$, the Maximum Theorem provides that $\min_{1 \leq s \leq S} c^{(j)}_B(\alpha, \alpha - \delta_s, \hat{h}_n)$ is continuous in $\hat{h}_n$ when $I^{(j)}_{\alpha-\delta_s}(\hat{h}_n) \subset \bar{H}^{(j)}_1(\hat{h}^c_{n,2})$ for each $s = 1, \ldots, S$. Hence, Assumptions DS and CS applied to each $\beta = \alpha - \delta_s$ and $\beta = \beta_a$ and Assumption η(i) imply that (22) is bounded above by
$$P(W_h \geq c^{(j)}_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \tilde{h})) \leq P(\widetilde{W}_{h^{(j)}} \geq c^{(j)}_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \tilde{h})) \leq \alpha,$$

where the first inequality follows from Definition MLLD and the second results from Assumption η(ii). The proof for the lower bound is nearly identical to that used in Theorems Bonf and Bonf-Adj. $\square$

Derivation of (15), (17) and (18): Under $H_{a,n}$, $\{\gamma_{n,h}\}$ and standard assumptions on the DGPs,
$$\begin{aligned}
\sqrt{n}(\hat{\theta}_n - \theta_0) &= \sqrt{n}(\theta_n + [X'M_Z X]^{-1}X'M_Z e - \theta_0) \stackrel{d}{\longrightarrow} \mu + h_{2,1}\hat{Z}_1, \\
\sqrt{n}(\tilde{\theta}_n - \theta_0) &= \sqrt{n}(\theta_n + [X'X]^{-1}X'Z\xi_n + [X'X]^{-1}X'e - \theta_0) \stackrel{d}{\longrightarrow} \mu - h_1 h_{2,1}h_{2,2} + h_{2,1}\sqrt{1 - h^2_{2,2}}\,\tilde{Z}_1, \\
t_n = \hat{h}_{n,1} &= \sqrt{n}\hat{\xi}_n/\hat{\sigma}_{\xi,n} = \sqrt{n}(\xi_n + [Z'M_X Z]^{-1}Z'M_X e)/\hat{\sigma}_{\xi,n} \stackrel{d}{\longrightarrow} h_1 + Z_2, \quad (23)
\end{aligned}$$
where $\hat{Z}_1, \tilde{Z}_1, Z_2 \sim N(0,1)$. Note that each weak convergence in (23) occurs jointly since the central limit theorem is applied to $n^{-1/2}X'M_Z e$, $n^{-1/2}X'e$ and $n^{-1/2}Z'M_X e$. To obtain the covariance matrix in (16), note that
$$\begin{aligned}
Cov(\sqrt{n}\hat{\theta}_n, \sqrt{n}\tilde{\theta}_n) &= nE[(X'M_Z X)^{-1}X'M_Z ee'X(X'X)^{-1}] = \sigma^2 E[(n^{-1}X'X)^{-1}] \to h^2_{2,1}(1 - h^2_{2,2}), \\
Cov(\sqrt{n}\hat{\theta}_n, \sqrt{n}\hat{\xi}_n) &= nE[(X'M_Z X)^{-1}X'M_Z ee'M_X Z(Z'M_X Z)^{-1}] \\
&= \sigma^2 E[(n^{-1}X'M_Z X)^{-1}n^{-1}X'M_Z M_X Z(n^{-1}Z'M_X Z)^{-1}] \to h_{2,1}h_{2,2}\sigma_\xi, \\
Cov(\sqrt{n}\tilde{\theta}_n, \sqrt{n}\hat{\xi}_n) &= nE[(X'X)^{-1}X'ee'M_X Z(Z'M_X Z)^{-1}] = 0.
\end{aligned}$$
Finally, expression (2), the joint convergence results of (23) and the fact that each $CV(\cdot)$ considered is continuous (see the proofs of Theorems Bonf, Bonf-Adj and Bonf-Min for details) yield expression (15).

Turning to expression (17), under $H_{a,n}$, $\{\gamma_{n,h}\}$ for which $\zeta_{\infty,h} \in [-\infty, -1-\varepsilon) \cup (1+\varepsilon, \infty]$ and standard assumptions on the DGPs, $\hat{\zeta}_n \stackrel{p}{\longrightarrow} \zeta_{\infty,h}$ and $\hat{h}_n \stackrel{d}{\longrightarrow} \tilde{h} = (h_1 + Z_2, h_{2,1}, h_{2,2})$. By the continuity of $CV(\cdot, \hat{\zeta}_n)$ and the continuity of $CV(\hat{h}_n, \cdot)$ at $\zeta_{\infty,h}$, $CV(\hat{h}_n, \hat{\zeta}_n) \stackrel{d}{\longrightarrow} CV(\tilde{h}, \zeta_{\infty,h})$. From (9), we have $c_{h^{(2)}}(1-\alpha) = h_{2,1}z_{1-\alpha/2}$. Thus, the asymptotic simple Bonferroni CV with $\delta = \alpha$ is
$$c_B(\alpha, \alpha-\delta, \tilde{h}, \zeta_{\infty,h}) = c^{(2)}_B(\alpha, \alpha-\delta, \tilde{h}) = h_{2,1}z_{1-\alpha/2}.$$
Similarly, in the construction of the adjusted Bonferroni CV, $a^{(2)}(\cdot)$ may be set to simply equal $\alpha$ for all $\beta \in [0,1]$ since
$$P(\widetilde{W}_{h^{(2)}} \geq \sup_{h^{(2)}_1 \in I^{(2)}_\beta(\tilde{h})} c_{(h^{(2)}_1,h^c_2)}(1-\alpha)) = P(|h_{2,1}\hat{Z}_1| \geq h_{2,1}z_{1-\alpha/2}) = \alpha,$$
so that
$$c_{B-A}(\alpha, \beta, \tilde{h}, \zeta_{\infty,h}) = c^{(2)}_{B-A}(\alpha, \beta, \tilde{h}) = h_{2,1}z_{1-\alpha/2}.$$

Analogously, in the construction of the minimum Bonferroni CV, $\eta^{(2)}(\cdot)$ may be set to simply equal $0$ for all $\vec{\delta}$ and $\vec{\beta}$ and $c_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \tilde{h}, \zeta_{\infty,h}) = h_{2,1}z_{1-\alpha/2}$. Using (23), these results, along with the fact that
$$T_n(\theta_0) = |\sqrt{n}(\hat{\theta}_n - \theta_0)| + o_p(1) \stackrel{d}{\longrightarrow} |\mu + h_{2,1}\hat{Z}_1|$$
since $|t_n| - c_n = |\sqrt{n}\xi_n/\sigma_\xi| - c_n + O_p(1) \stackrel{p}{\longrightarrow} \infty$ when $\zeta_{\infty,h} \in [-\infty, -1-\varepsilon) \cup (1+\varepsilon, \infty]$, yield expression (17).

Lastly, under $H_{a,n}$ and $\{\gamma_{n,h}\}$ with $\zeta_{\infty,h} \in (-1+\varepsilon, 1-\varepsilon)$, similarly to the previous case, we have $CV(\hat{h}_n, \hat{\zeta}_n) \stackrel{d}{\longrightarrow} CV(\tilde{h}, \zeta_{\infty,h})$ but now
$$T_n(\theta_0) = |\sqrt{n}(\tilde{\theta}_n - \theta_0)| + o_p(1) \stackrel{d}{\longrightarrow} |\mu - h_1 h_{2,1}h_{2,2} + h_{2,1}(1 - h^2_{2,2})^{1/2}\tilde{Z}_1|$$
since $|t_n| - c_n = |\sqrt{n}\xi_n/\sigma_\xi| - c_n + O_p(1) \stackrel{p}{\longrightarrow} -\infty$ when $\zeta_{\infty,h} \in (-1+\varepsilon, 1-\varepsilon)$. Convergence of $T_n(\theta_0)$ and $CV(\hat{h}_n, \hat{\zeta}_n)$ occurs jointly by the joint convergence results of (23), leading to expression (18). $\square$
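The limiting covariance structure just derived, and in particular the zero covariance between $\tilde{\theta}_n$ and $\hat{\xi}_n$ implied by $X'M_X = 0$, can be checked in a small Monte Carlo. The DGP below (sample size, a regressor correlation of roughly 0.5, the parameter values) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 200, 2000
# fixed regressors with sample correlation roughly 0.5 (an illustrative choice)
x = rng.standard_normal(n)
z = 0.5 * x + np.sqrt(0.75) * rng.standard_normal(n)
theta, xi, sigma = 1.0, 0.2, 1.0

Mz = np.eye(n) - np.outer(z, z) / (z @ z)   # annihilator of z
Mx = np.eye(n) - np.outer(x, x) / (x @ x)   # annihilator of x
xm = Mz @ x
zm = Mx @ z

theta_hat = np.empty(R)   # full-regression (partialled-out) estimator of theta
theta_til = np.empty(R)   # restricted estimator omitting z
xi_hat = np.empty(R)      # estimator of xi
for r in range(R):
    e = sigma * rng.standard_normal(n)
    y = theta * x + xi * z + e
    theta_hat[r] = (xm @ y) / (xm @ xm)
    theta_til[r] = (x @ y) / (x @ x)
    xi_hat[r] = (zm @ y) / (zm @ zm)

# Cov(sqrt(n) theta_til, sqrt(n) xi_hat) = 0 since x'Mx = 0, so the first
# sample correlation should be near zero; theta_hat and xi_hat are correlated
# (negatively here, since x and z are positively correlated)
print(np.corrcoef(theta_til, xi_hat)[0, 1], np.corrcoef(theta_hat, xi_hat)[0, 1])
```

The first printed correlation should be near zero across replications, while the second is nonzero, matching the nonzero $Cov(\sqrt{n}\hat{\theta}_n, \sqrt{n}\hat{\xi}_n)$ limit above.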

8 Appendix III: High-Level Assumption

Under what is referred to as the "usual case" in the main text of the paper, instead of imposing the various lower-level assumptions on limiting random variables under drifting sequences of parameters and the continuity of underlying objects in the main text, one may instead prefer to work with the following high-level assumption.

Assumption HLUC (High-Level, Usual Case). There exists a localization parameter space $H \subset R^p_\infty \times R^q_\infty$ and random variables $(W_h, \tilde{h}_h)$ whose distribution is indexed by $h$ such that
$$\limsup_{n\to\infty} \sup_{\gamma\in\Gamma} P_{\theta_0,\gamma}(T_n(\theta_0) > CV_n) \leq \sup_{h\in H} P(W_h \geq CV(\tilde{h}_h)),$$
where $CV_n$ denotes (a) the simple Bonferroni CV $c_B(\alpha, \alpha-\delta, \hat{h}_n, \hat{\zeta}_n)$, (b) the adjusted Bonferroni CV $c_{B-A}(\alpha, \beta, \hat{h}_n, \hat{\zeta}_n)$ or (c) the minimum Bonferroni CV $c_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \hat{h}_n, \hat{\zeta}_n)$, and $CV(\tilde{h}_h)$ denotes the corresponding asymptotic counterpart (a) $c^{(1)}_B(\alpha, \alpha-\delta, \tilde{h}_h)$, (b) $c^{(1)}_{B-A}(\alpha, \beta, \tilde{h}_h)$ or (c) $c^{(1)}_{B-M}(\alpha, \vec{\delta}, \vec{\beta}, \tilde{h}_h)$.

One can easily see from the proofs of Theorems Bonf, Bonf-Adj and Bonf-Min that by imposing this assumption, one may drop Assumptions (a) DS, Cont(i), Sel, CS and Inf; (b) DS, Cont-Adj, Sel, CS, Inf-Adj and a(i); or (c) DS, Cont-Adj, Sel, CS, a(i) and η(i) to establish the upper bound on asymptotic size for the usual case in (a) Theorem Bonf; (b) Theorem Bonf-Adj; or (c) Theorem Bonf-Min. Such a high-level assumption allows one to work directly with the limiting null rejection probabilities that arise under the limits of relevant drifting sequences of parameters.

References

Andrews, D. W. K., Barwick, P. J., 2012. Inference for parameters defined by moment inequalities: A recommended moment selection procedure. Econometrica 80, 2805–2826.
Andrews, D. W. K., Cheng, X., 2012. Estimation and inference with weak, semi-strong and strong identification. Econometrica 80, 2153–2211.
Andrews, D. W. K., Cheng, X., Guggenberger, P., 2011. Generic results for establishing the asymptotic size of confidence sets and tests. Cowles Foundation Discussion Paper No. 1813.
Andrews, D. W. K., Guggenberger, P., 2009a. Hybrid and size-corrected subsampling methods. Econometrica 77, 721–762.
Andrews, D. W. K., Guggenberger, P., 2009b. Validity of subsampling and "plug-in asymptotic" inference for parameters defined by moment inequalities. Econometric Theory 25, 669–709.
Andrews, D. W. K., Guggenberger, P., 2010a. Applications of subsampling, hybrid, and size-correction methods. Journal of Econometrics 158, 285–305.
Andrews, D. W. K., Guggenberger, P., 2010b. Asymptotic size and a problem with subsampling and with the m out of n bootstrap. Econometric Theory 26, 426–468.
Andrews, D. W. K., Soares, G., 2010. Inference for parameters defined by moment inequalities using generalized moment selection. Econometrica 78, 119–157.
Belloni, A., Chernozhukov, V., Hansen, C., 2014. Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies 81, 608–650.
Berger, R. L., Boos, D. D., 1994. P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association 89, 1012–1016.
Campbell, J. Y., Yogo, M., 2006. Efficient tests of stock return predictability. Journal of Financial Economics 81, 27–60.
Cattaneo, M. D., Crump, R. K., 2014. Comment: HAC corrections for strongly autocorrelated time series. Journal of Business and Economic Statistics 32, 324–329.
Cavanagh, C. L., Elliott, G., Stock, J. H., 1995. Inference in models with nearly integrated regressors. Econometric Theory 11, 1131–1147.
Chan, N. H., 1988. On the parameter inference for nearly nonstationary time series. Journal of the American Statistical Association 83, 857–862.
Chaudhuri, S., Zivot, E., 2011. A new method of projection-based inference in GMM with weakly identified nuisance parameters. Journal of Econometrics 164, 239–251.
Cheng, X., 2015. Robust inference in nonlinear models with mixed identification strength. Journal of Econometrics 189, 207–228.
Dufour, J.-M., 1997. Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65, 1365–1387.
Dufour, J.-M., 2006. Monte Carlo tests with nuisance parameters: a general approach to finite-sample inference and nonstandard asymptotics. Journal of Econometrics 133, 443–477.
Elliott, G., 1998. On the robustness of cointegration methods when regressors almost have unit roots. Econometrica 66, 149–158.
Elliott, G., Müller, U. K., 2014. Pre and post break parameter inference. Journal of Econometrics 180, 141–157.
Elliott, G., Müller, U. K., Watson, M. W., 2015. Nearly optimal tests when a nuisance parameter is present under the null hypothesis. Econometrica 83, 771–811.
Elliott, G., Stock, J. H., 1994. Inference in time series regression when the order of integration of a regressor is unknown. Econometric Theory 10, 672–700.
Franchi, M., Johansen, S., 2017. Improved inference on cointegrating vectors in the presence of a near unit root using adjusted quantiles. Unpublished Manuscript, University of Copenhagen, Department of Economics.
Goldberg, D. E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
Guggenberger, P., 2010. The impact of a Hausman pretest on the asymptotic size of a hypothesis test. Econometric Theory 26, 369–382.
Guggenberger, P., Kleibergen, F., Mavroeidis, S., Chen, L., 2012. On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression. Econometrica 80, 2649–2666.
Han, S., McCloskey, A., 2016. Estimation and inference with a (nearly) singular Jacobian. Unpublished Manuscript, Brown University, Department of Economics.
Hansen, B. E., 1999. The grid bootstrap and the autoregressive model. Review of Economics and Statistics 81, 594–607.
Hansen, B. E., 2005. Challenges for econometric model selection. Econometric Theory 21, 60–68.
Hansen, B. E., 2007. Least squares model averaging. Econometrica 75, 1175–1189.
Hansen, B. E., 2014. Model averaging, asymptotic risk, and regressor groups. Quantitative Economics 5, 495–530.
Hansen, B. E., 2016. Efficient shrinkage in parametric models. Journal of Econometrics 190, 115–132.
Imbens, G. W., Manski, C. F., 2004. Confidence intervals for partially identified parameters. Econometrica 72, 1845–1857.
Jones, D. R., 2001. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21, 345–383.
Kleibergen, F., 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70, 1781–1803.
Leeb, H., Pötscher, B. M., 2005. Model selection and inference: facts and fiction. Econometric Theory 21, 21–59.
Leeb, H., Pötscher, B. M., 2008. Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24, 338–376.
Leeb, H., Pötscher, B. M., 2017. Testing in the presence of nuisance parameters: some comments on tests post-model-selection and random critical values. In: Ahmed, S. E. (Ed.), Big and Complex Data Analysis: Methodology and Applications. Springer, pp. 69–82.
Loh, W.-Y., 1985. A new method for testing separate families of hypotheses. Journal of the American Statistical Association 80, 362–368.
McCloskey, A., 2013. On the computation of size-correct power-directed tests with null hypotheses characterized by inequalities. Unpublished Manuscript, Brown University, Department of Economics.
McCloskey, A., 2017. Asymptotically uniform tests after consistent model selection in the linear regression model. Unpublished Manuscript, Brown University, Department of Economics.
Mikusheva, A., 2007. Uniform inference in autoregressive models. Econometrica 75, 1411–1452.
Moon, H. R., Schorfheide, F., 2009. Estimation with overidentifying inequality moment conditions. Journal of Econometrics 153, 136–154.
Nelson, C. R., Startz, R., 1990. Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58, 967–976.
Phillips, P. C. B., 2014. On confidence intervals for autoregressive roots and predictive regression. Econometrica 82, 1177–1195.
Romano, J. P., Shaikh, A. M., Wolf, M., 2014. A practical two-step method for testing moment inequalities. Econometrica 82, 1979–2002.
Romano, J. P., Wolf, M., 2000. Finite sample nonparametric inference and large sample efficiency. Annals of Statistics 28, 756–778.
Shi, X., 2015. A nondegenerate Vuong test. Quantitative Economics 6, 85–121.
Silvapulle, M. J., 1996. A test in the presence of nuisance parameters. Journal of the American Statistical Association 91, 1690–1693.
Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557–586.
Stoye, J., 2009. More on confidence intervals for partially identified parameters. Econometrica 77, 1299–1315.
van der Vaart, A. W., 1998. Asymptotic Statistics. Cambridge University Press.
Vuong, Q. H., 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57, 307–333.

[Figure 1: Local Asymptotic Power, h1 = 0, h2,2 = −0.5. Horizontal axis: μ ∈ [−4, 4]; vertical axis: power. Series: Full Reg, Conservative Bonf-Adj, Conservative Bonf, Consistent Bonf-Adj, Consistent Bonf, Conservative LF.]

[Figure 2: Local Asymptotic Power, h1 = 2, h2,2 = −0.5. Same axes and series as Figure 1.]

[Figure 3: Local Asymptotic Power, h1 = 10, h2,2 = −0.5. Same axes and series as Figure 1.]

[Figure 4: Local Asymptotic Power, h1 = 2, h2,2 = −0.9. Horizontal axis: μ ∈ [−4, 4]. Series: Bonf-Adj (beta = 0.25), Bonf-Min (beta = 0.25), LF, Bonf-Min LF.]

[Figure 5: Local Asymptotic Power, h1 = 10, h2,2 = −0.9. Same axes and series as Figure 4.]

[Figure 6: Finite Sample Size-Corrected Power, θ = 0.1, γ2,2 = −0.25. Horizontal axis: ξ ∈ [−1, 1]. Series: Full Reg, Conservative LF, Conservative Bonf-Adj, Conservative Bonf-Min, Consistent Bonf-Adj.]

[Figure 7: Finite Sample Size-Corrected Power, θ = 0.2, γ2,2 = −0.25. Same axes and series as Figure 6.]

[Figure 8: Finite Sample Size-Corrected Power, θ = 0.1, γ2,2 = −0.5. Same axes and series as Figure 6.]

[Figure 9: Finite Sample Size-Corrected Power, θ = 0.2, γ2,2 = −0.5. Same axes and series as Figure 6.]

[Figure 10: Finite Sample Size-Corrected Power, θ = 0.1, γ2,2 = −0.75. Same axes and series as Figure 6.]

[Figure 11: Finite Sample Size-Corrected Power, θ = 0.2, γ2,2 = −0.75. Same axes and series as Figure 6.]

[Figure 12: Finite Sample Power, θ = 0.1, γ2,2 = −0.5. Horizontal axis: ξ ∈ [−2, 2]. Series: c = 2, c = 4, c = 6, c = 8, c = 10, Full Reg.]

[Figure 13: Finite Sample Power, θ = 0.3, γ2,2 = 0.9. Horizontal axis: ξ ∈ [−5, 5]. Series: LF, Bonf-Min LF.]