Finite sample properties of confidence intervals centered on a model averaged estimator


Journal of Statistical Planning and Inference xxx (xxxx) xxx
journal homepage: www.elsevier.com/locate/jspi

Paul Kabaila a,∗, A.H. Welsh b, Christeen Wijethunga a

a Department of Mathematics and Statistics, La Trobe University, Victoria 3086, Australia
b Research School of Finance, Actuarial Studies and Statistics, The Australian National University, ACT 2601, Australia

Article history: Received 18 June 2019; Received in revised form 29 October 2019; Accepted 29 October 2019.

Keywords: Coverage; Model averaged confidence interval; Scaled expected length

Abstract

We examine confidence intervals centered on the frequentist model averaged estimator proposed by Buckland et al. (1997). We consider two formulas for the standard error of this estimator: the estimate put forward by Buckland et al. (1997) of their formula (9) and the square root of formula (6.12) of Burnham and Anderson (2002). We also consider four procedures that have been suggested in the literature for obtaining the half-width of the confidence interval from the chosen standard error. We assess the exact finite sample performances of the eight resulting confidence intervals using a simple testbed situation consisting of two nested linear regression models. This is done by deriving exact expressions for the confidence intervals and then for the coverages and scaled expected lengths of these confidence intervals. We also explore the performances of these confidence intervals in the limit as the residual degrees of freedom diverges to infinity.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Buckland et al. (1997) proposed a frequentist model averaged estimator of a general scalar parameter that is a weighted average of estimators obtained under different models. The data-based model weights were constructed by exponentiating an information criterion, such as the Akaike Information Criterion (AIC); see Buckland et al. (1997, pp. 605–606). This kind of model weighting has been adopted in much of the later literature (Fletcher and Dillingham, 2011; Fletcher and Turek, 2011). We examine confidence intervals centered on this model averaged estimator. We consider two formulas for the standard error of this estimator: the estimate put forward by Buckland et al. (1997) of their formula (9) and the square root of formula (6.12) of Burnham and Anderson (2002). We also consider four procedures that have been suggested in the literature for obtaining the half-width of the confidence interval from the chosen standard error. These procedures include the use of ‘‘model averaged degrees of freedom’’ as suggested by Lukacs et al. (2010) and the procedure proposed by Burnham and Anderson (2002, p. 164). As evidenced by the four R packages (briefly reviewed in Section 7) that implement these confidence intervals, the resulting eight confidence intervals seem to be widely used in practice in ecological statistics (see also Fletcher, 2018).

Hjort and Claeskens (2003, Section 4.3) and Claeskens and Hjort (2008, Section 7.5.1) criticized the confidence interval centered on a model averaged estimator and with half-width proportional to the standard error given by the estimate put forward by Buckland et al. (1997) of their formula (9). In the context of a general regression model, which includes linear regression and logistic regression as particular cases, they showed that in large samples the actual coverage probability of this confidence interval can fall substantially below the desired coverage. The analyses of Hjort and Claeskens (2003) and Claeskens and Hjort (2008) do not seem to have had much impact in applied fields. Although the conclusions are clear and hold for very general regression models, the results themselves are complicated, difficult to follow and, as large sample results, deemed not very relevant to practice.

Kabaila et al. (2016) set up a very simple testbed situation for evaluating the exact finite sample frequentist properties of model averaged confidence intervals. This testbed involves computing a confidence interval by model averaging over two nested linear regression models with unknown error variance, and then computing the coverage probability and scaled expected length properties of this confidence interval. The scaled expected length is the expected length of the model averaged confidence interval divided by the expected length of the standard confidence interval (with the same minimum coverage probability and for the same parameter) computed under the full model. Its computation gives far more insight than the coverage alone, allowing us, for example, to see when good coverage is obtained at the expense of excessive length. The testbed was used by Kabaila et al. (2016) to evaluate both the model averaged profile likelihood confidence interval of Fletcher and Turek (2011) and the model averaged tail area confidence interval of Turek and Fletcher (2012). This testbed was also used by Kabaila et al. (2017) to further evaluate the tail area confidence interval of Turek and Fletcher (2012). These papers showed that the tail area interval performs quite well provided that we do not put too much weight on the simpler of the two models. On the other hand, as Kabaila et al. (2016) illustrate numerically, just as with other profile likelihood based methods, the model averaged profile likelihood interval will perform poorly when the number of nuisance parameters is not small compared to the sample size.

Our main aim is to analyze the exact finite sample properties of the eight confidence intervals centered on the model averaged estimator in the testbed situation of two nested linear regression models with unknown error variance. We also explore the performances of these confidence intervals in the limit as the residual degrees of freedom diverges to infinity. We derive computationally convenient exact formulas for the finite sample coverage probabilities and scaled expected lengths of these confidence intervals. These formulas are valid for a wide range of model selection criteria, which includes the Bayesian Information Criterion (BIC) in addition to AIC. However, the results of Kabaila et al. (2016, 2017) and Kabaila (2018) suggest that BIC weights put too much weight on the simpler model, producing confidence intervals with poorer performance than the AIC weights. For this reason, the numerically computed results that we present for the coverage and scaled expected length are restricted to AIC weights.

We define the testbed situation and the parametrization we use in Section 2. The first confidence interval centered on a model averaged estimator that we consider, with nominal coverage 1 − α, is denoted by J and is obtained as follows. The half-width of this interval is equal to the 1 − α/2 quantile of the t distribution (with degrees of freedom equal to the residual degrees of freedom of the full model) multiplied by the standard error given by the estimate put forward by Buckland et al. (1997) of their formula (9). The eight confidence intervals that we consider are similar and are assessed in the same way. Consequently, we first treat the confidence interval J in great detail.

∗ Corresponding author. E-mail address: [email protected] (P. Kabaila).
https://doi.org/10.1016/j.jspi.2019.10.004

Please cite this article as: P. Kabaila, A.H. Welsh and C. Wijethunga, Finite sample properties of confidence intervals centered on a model averaged estimator. Journal of Statistical Planning and Inference (2019), https://doi.org/10.1016/j.jspi.2019.10.004.
In Section 3, we obtain explicit expressions for the model averaged estimator and the standard error given by the estimate put forward by Buckland et al. (1997) of their formula (9), in the testbed situation. These expressions together enable us to obtain an explicit expression for the confidence interval J in the testbed situation. In Section 4, we then derive exact expressions for the coverage probability and scaled expected length of the confidence interval J in the testbed situation. These expressions have allowed us to, in effect, make findings that are valid for all design matrices, all parameters of interest and all parameters that, when set to zero, specify the simpler model. We present numerical results for small and medium-sized residual degrees of freedom m under various parameter settings in Section 5. We then consider the limiting case as m → ∞, with the dimension of the regression parameter vector fixed, in Section 6, and compare our results with the large sample results of Hjort and Claeskens (2003) in Section 6.5. In Section 7 we describe the differences that are needed to assess, in the same way, the performances of the other seven confidence intervals centered on a model averaged estimator. We conclude with a discussion in Section 8.

2. Testbed situation and parametrization

Our testbed situation involves two nested linear regression models with unknown error variance, which we call the full model M2 and a simpler model M1. The full model M2 is given by

y = Xβ + ε,

where y is an n-vector of random responses, X is an n × p matrix with known, linearly independent columns, β is a p-vector of unknown parameters and ε is an n-vector of random errors with an N(0, σ²I) distribution, in which σ² is an unknown positive parameter. We assume throughout that p is given, where p ≥ 3. In Sections 3–5 we also assume that n is given. In Section 6, on the other hand, we consider the case that n → ∞.
Suppose that the parameter of interest is θ = a⊤β, where a is a specified p-vector (a ≠ 0). To define the simpler model, we define another parameter τ = c⊤β − t, where the vector c and the number t are specified and a and c are linearly independent. The model M1 is M2 with τ = 0. Let β̂ = (X⊤X)⁻¹X⊤y denote the least squares estimator of β and σ̂² = (y − Xβ̂)⊤(y − Xβ̂)/m, where m = n − p, the usual unbiased estimator of σ². We set θ̂ = a⊤β̂ and τ̂ = c⊤β̂ − t. Define the known quantities

vθ = var(θ̂)/σ² = a⊤(X⊤X)⁻¹a,  vτ = var(τ̂)/σ² = c⊤(X⊤X)⁻¹c

and

ρ = corr(θ̂, τ̂) = a⊤(X⊤X)⁻¹c / {a⊤(X⊤X)⁻¹a c⊤(X⊤X)⁻¹c}^{1/2}.

Finally, let γ = τ/(σ vτ^{1/2}) and γ̂ = τ̂/(σ̂ vτ^{1/2}). Note that γ is a measure of the closeness of the models M1 and M2 and γ̂ is an estimator of that measure.


3. The model averaged estimator and its standard error

3.1. The model averaged estimator θ̃

Following Buckland et al. (1997, p. 604), the model averaged estimator over the class of models {M1, M2} is θ̃ = w1 θ̂1 + w2 θ̂2, where θ̂1 and θ̂2 are estimators of θ under the models M1 and M2, respectively, and w1 and w2, satisfying w1 + w2 = 1, are the data-based weights for the models M1 and M2, respectively. We can take θ̂2 = θ̂ = a⊤β̂. From Kabaila and Giri (2009a, p. 3421),

θ̂1 = θ̂ − (cov(θ̂, τ̂)/var(τ̂)) τ̂ = θ̂ − (cov(θ̂, τ̂)/(σ² vτ)) τ̂ = θ̂ − (cov(θ̂, τ̂)/(σ vθ^{1/2} σ vτ^{1/2})) (vθ^{1/2}/vτ^{1/2}) τ̂ = θ̂ − ρ (vθ^{1/2}/vτ^{1/2}) τ̂,   (1)

so we can write

θ̃ = (1 − w1) θ̂ + w1 (θ̂ − ρ vθ^{1/2} τ̂/vτ^{1/2}) = θ̂ − ρ (vθ^{1/2} τ̂/vτ^{1/2}) w1.   (2)

Buckland et al. (1997, p. 606) defined the model weights to be

w1 = exp(−AIC(1)/2) / {exp(−AIC(1)/2) + exp(−AIC(2)/2)} = 1 / [1 + exp({AIC(1) − AIC(2)}/2)],

with w2 = 1 − w1, where AIC(k) is the Akaike Information Criterion for model Mk. In a slight generalization, we replace the Akaike Information Criterion by the Generalized Information Criterion GIC(k) = −2Lk + d(p + k − 1), where Lk is the maximized log-likelihood for model Mk (k = 1, 2), d = 2 for AIC, d = ln(n) for BIC and d = 2n/(n − p − k) for AICc. That is, our weights contain the Akaike Information Criterion weights as a special case. The maximized log-likelihood for model Mk is

Lk = −(n/2) ln(2π RSSk/n) − n/2,

where RSSk denotes the residual sum of squares for model Mk (k = 1, 2). Thus

GIC(k) = constant + n ln(RSSk/n) + d(p + k − 1).

Now RSS2 = m σ̂² and, using the results stated by Graybill (1976, p. 222), it may be shown that

RSS1 = τ̂²/vτ + m σ̂².

Therefore

GIC(1) = constant + n ln({m σ̂² + τ̂²/vτ}/n) + dp

and

GIC(2) = constant + n ln(m σ̂²/n) + d(p + 1).

Hence

w1 = 1 / [1 + exp({GIC(1) − GIC(2)}/2)] = 1 / [1 + (1 + γ̂²/m)^{n/2} exp(−d/2)].

It is convenient to define

w1(x) = 1 / [1 + (1 + x²/m)^{n/2} exp(−d/2)] = 1 / [1 + (1 + x²/m)^{(m+p)/2} exp(−d/2)],   (3)

where we have written w1(x) as a function of x; it also depends on m, p and d. Let k(x) = x w1(x), so we can write (2) as

θ̃ = θ̂ − ρ (vθ^{1/2}/vτ^{1/2}) τ̂ w1(γ̂) = θ̂ − ρ vθ^{1/2} σ̂ γ̂ w1(γ̂) = θ̂ − ρ vθ^{1/2} σ̂ k(γ̂).
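The weight function (3) is straightforward to compute. The following minimal Python sketch (our illustration, not the authors' R code; the default d = 2 gives AIC weights) implements w1(x) and k(x):

```python
import math

def w1(x, m, p, d=2.0):
    """Weight on the simpler model M1, Eq. (3), with n = m + p."""
    return 1.0 / (1.0 + (1.0 + x * x / m) ** ((m + p) / 2.0) * math.exp(-d / 2.0))

def k(x, m, p, d=2.0):
    """k(x) = x * w1(x): the (scaled) shift of the center of J away from theta-hat."""
    return x * w1(x, m, p, d)

# When x = 0 the data are fully consistent with M1 and w1(0) = 1/(1 + exp(-d/2)),
# about 0.731 for AIC; as |x| grows, w1(x) -> 0 and hence k(x) -> 0.
```

For example, with m = 10 and p = 4, w1 falls from about 0.731 at x = 0 to essentially zero by x = 5, so the shift k(γ̂) vanishes when the data are highly discordant with M1.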


3.2. The standard error of the model averaged estimator θ̃

We use the estimate put forward by Buckland et al. (1997) of their formula (9) to find the standard error of the model averaged estimator θ̃:

Σ_{k=1}^{2} wk [(estimate of var(θ̂k | k), assuming Mk is true) + (θ̂k − θ̃)²]^{1/2}.   (4)

Buckland et al. (1997) do not make it entirely clear whether var(θ̂k | k) denotes (a) the variance of θ̂k, assuming that the model Mk is true, or (b) the variance of θ̂k assuming the largest model. Fortunately, in the testbed situation that we consider, we obtain exactly the same formula for (4), irrespective of which of the interpretations, (a) or (b), is used. We have

variance of θ̂2 = variance of θ̂2 assuming M2 is true = σ² vθ.   (5)

It may be shown that

variance of θ̂1 = variance of θ̂1 assuming M1 is true = σ² vθ (1 − ρ²).   (6)

The usual estimator of (5), assuming that M2 is the true model, is σ̂² vθ. The usual estimator of (6), assuming that M1 is the true model, is

({m σ̂² + τ̂²/vτ}/(m + 1)) vθ (1 − ρ²).

Now for the remaining terms,

θ̂1 − θ̃ = θ̂1 − (w1 θ̂1 + w2 θ̂2) = (1 − w1) θ̂1 − w2 θ̂2 = w2 (θ̂1 − θ̂2)

and, similarly, θ̂2 − θ̃ = w1 (θ̂2 − θ̂1). Hence

(θ̂1 − θ̃)² = w2² (θ̂1 − θ̂2)²  and  (θ̂2 − θ̃)² = w1² (θ̂1 − θ̂2)².

By (1), θ̂1 − θ̂2 = θ̂1 − θ̂ = −ρ vθ^{1/2} τ̂/vτ^{1/2} = −ρ σ̂ vθ^{1/2} γ̂. Thus

(θ̂1 − θ̃)² = w2² ρ² σ̂² vθ γ̂² = (1 − w1)² ρ² σ̂² vθ γ̂²  and  (θ̂2 − θ̃)² = w1² ρ² σ̂² vθ γ̂².

Hence, in the testbed situation, the estimate put forward by Buckland et al. (1997) of their formula (9) leads to the standard error

w1(γ̂) [({m σ̂² + τ̂²/vτ}/(m + 1)) vθ (1 − ρ²) + (1 − w1(γ̂))² ρ² σ̂² vθ γ̂²]^{1/2} + (1 − w1(γ̂)) [σ̂² vθ + w1²(γ̂) ρ² σ̂² vθ γ̂²]^{1/2}.

We write this as σ̂ vθ^{1/2} r(γ̂, ρ), where

r(x, ρ) = w1(x) [((m + x²)/(m + 1))(1 − ρ²) + (1 − w1(x))² ρ² x²]^{1/2} + (1 − w1(x)) [1 + w1²(x) ρ² x²]^{1/2}.   (7)

3.3. The confidence interval J for θ

We consider the confidence interval J for θ, centered on θ̃ and with nominal coverage 1 − α, given by the following formula:

J = [θ̃ ± t_{m,1−α/2} σ̂ vθ^{1/2} r(γ̂, ρ)] = [θ̂ − ρ σ̂ vθ^{1/2} k(γ̂) ± t_{m,1−α/2} σ̂ vθ^{1/2} r(γ̂, ρ)].   (8)

Now k(x) → 0 and r(x, ρ) → 1, as x → ∞. Therefore J has the attractive property that when the data and the simpler model are highly discordant, as evidenced by a large value of |γ̂|, it effectively reduces to the usual confidence interval, with coverage 1 − α, based on the full model M2. We can write

γ̂ = τ̂/(σ̂ vτ^{1/2}) = {τ̂/(σ vτ^{1/2})}(σ/σ̂) = γ̃/W,

where γ̃ = τ̂/(σ vτ^{1/2}) and W = σ̂/σ. To find convenient formulas for the coverage probability and the expected length of this confidence interval, we express all quantities of interest in terms of θ̂, γ̃ and W. Therefore

J = [θ̂ − ρ σ W vθ^{1/2} k(γ̃/W) ± t_{m,1−α/2} σ W vθ^{1/2} r(γ̃/W, ρ)].

Note that (θ̂, γ̃) and W are independent and

(θ̂, γ̃)⊤ ∼ N((θ, γ)⊤, Σ), where Σ is the 2 × 2 matrix with diagonal elements σ² vθ and 1 and off-diagonal elements ρ σ vθ^{1/2}.   (9)
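To make the construction of J concrete, here is a minimal self-contained Python sketch (our illustration, not the authors' R code) of r(x, ρ) from Eq. (7) and the endpoints of J from Eq. (8); the caller supplies the quantile t_{m,1−α/2} (in the example below t_{10,0.975} ≈ 2.228139 is hardcoded):

```python
import math

def w1(x, m, p, d=2.0):
    """Weight on the simpler model M1, Eq. (3), with n = m + p."""
    return 1.0 / (1.0 + (1.0 + x * x / m) ** ((m + p) / 2.0) * math.exp(-d / 2.0))

def r(x, rho, m, p, d=2.0):
    """r(x, rho) from Eq. (7)."""
    w = w1(x, m, p, d)
    t1 = w * math.sqrt((m + x * x) / (m + 1.0) * (1.0 - rho * rho)
                       + (1.0 - w) ** 2 * rho * rho * x * x)
    t2 = (1.0 - w) * math.sqrt(1.0 + w * w * rho * rho * x * x)
    return t1 + t2

def interval_J(theta_hat, sigma_hat, v_theta, gamma_hat, rho, m, p, t_quantile, d=2.0):
    """Endpoints of J = [theta_tilde +/- t_{m,1-alpha/2} sigma_hat v_theta^{1/2} r(gamma_hat, rho)]."""
    se = sigma_hat * math.sqrt(v_theta)
    # center = theta_hat - rho * se * k(gamma_hat), with k(x) = x * w1(x)
    center = theta_hat - rho * se * gamma_hat * w1(gamma_hat, m, p, d)
    half = t_quantile * se * r(gamma_hat, rho, m, p, d)
    return center - half, center + half
```

Consistent with the discussion of Eq. (8), k(x) → 0 and r(x, ρ) → 1 as x → ∞, so for large |γ̂| the returned endpoints collapse to the usual full-model interval θ̂ ± t_{m,1−α/2} σ̂ vθ^{1/2}.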

We denote the pdf of W by fW.

4. The coverage probability and scaled expected length of J

4.1. Coverage probability

A computationally convenient exact formula for the coverage probability of the confidence interval J for θ is given in the following theorem.

Theorem 1. The coverage probability of the confidence interval J, with nominal coverage 1 − α, is a function of (γ, ρ), so we denote this coverage probability by CP(γ, ρ). Let

ℓ(γ, w, ρ) = ρ w k(γ/w) − t_{m,1−α/2} w r(γ/w, ρ)  and  u(γ, w, ρ) = ρ w k(γ/w) + t_{m,1−α/2} w r(γ/w, ρ).

Then

CP(γ, ρ) = ∫₀^∞ ∫_{−∞}^{∞} Ψ(ℓ(y + γ, w, ρ), u(y + γ, w, ρ); ρy, 1 − ρ²) φ(y) dy fW(w) dw,   (10)

where Ψ(a, b; µ, v) = P(a ≤ Z ≤ b) for Z ∼ N(µ, v). For every given ρ, CP(γ, ρ) is an even function of γ and, for every given γ, CP(γ, ρ) is an even function of ρ.

The proof of this result is given in Appendix A.1. It follows that, for given m and p, we are able to describe the coverage probability of J using only the parameters |ρ| and |γ|. An application of the Lebesgue dominated convergence theorem shows that CP(γ, ρ) → 1 − α as γ → ∞, for every given ρ.

4.2. Scaled expected length

A computationally convenient exact formula for the scaled expected length of the confidence interval J for θ is given in the following theorem.

Theorem 2. The scaled expected length of the confidence interval J, with nominal coverage 1 − α, is a function of (γ, ρ), so we denote this scaled expected length by SEL(γ, ρ). Then

SEL(γ, ρ) = (t_{m,1−α/2}/t_{m,(1+c_min)/2}) (m/2)^{1/2} (Γ(m/2)/Γ((m + 1)/2)) ∫₀^∞ ∫_{−∞}^{∞} w r((y + γ)/w, ρ) φ(y) dy fW(w) dw.   (11)

For every given ρ, SEL(γ, ρ) is an even function of γ and, for every given γ, SEL(γ, ρ) is an even function of ρ.

The proof of this result is given in Appendix A.2. It follows that, for given m and p, we are able to describe the scaled expected length using only the parameters |ρ| and |γ|.

5. Numerical results for small and medium-sized m

We focus on the properties of the confidence interval J, with nominal coverage 0.95, computed using AIC weights (d = 2). We constructed a number of plots of the coverage probability and scaled expected length of J against |γ| for different values of m ∈ {1, 2, 3, 10}, p ∈ {3, 6, 12, 24} and |ρ| ∈ {0.2, 0.5, 0.7, 0.9}. Some explanation of how we carried out the calculations for these plots is included in the Supplementary Material. We present here a selection of these plots; additional plots are included in the Supplementary Material. All of the computations presented in this paper were carried out using R (R Core Team, 2018). Recall that in (3) we expressed the weights used in the model averaged estimator in terms of m and p; since m = n − p, we can use any pair of m, p and n = m + p, and we choose to use m and p.

Generally, the plots show that the coverage of J approaches the nominal coverage as |γ| increases, whereas the scaled expected length of J does not necessarily approach 1 (as we might hope) as |γ| increases. The minimum coverage probability of J is a decreasing continuous function of |ρ|. Also, the minimum coverage probability of J is a decreasing function of m. When m = 1, the coverage probability is extremely close to the nominal coverage, for any given |ρ|.

Consider Fig. 1 for m = 1 and for |ρ| = 0.5 (left half) and 0.9 (right half). The minimum coverage probability of J is very close to the nominal coverage 0.95. The scaled expected length of J is substantially less than 1 when γ = 0 and, although the scaled expected length of J does not converge to 1 as |γ| → ∞, the maximum value of the scaled expected length of J is not too much larger than 1. The results for different p ∈ {3, 6, 12, 24} are similar. In these cases, when m = 1, the model averaged confidence interval J has good properties.
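The exact coverage probability formula (10) underlying these plots can be checked numerically. The following rough, self-contained sketch is our illustration, not the authors' R code. Assumptions: AIC weights (d = 2), m = 10, p = 4, and t_{10,0.975} ≈ 2.228139 hardcoded. Since mW² = mσ̂²/σ² ∼ χ²_m, fW is the density of √(χ²_m/m); the double integral in (10) is approximated by trapezoidal rules on finite grids, which suffices because φ and fW have light tails.

```python
import math

M, P, D = 10, 4, 2.0
T_QUANT = 2.228139  # t_{10, 0.975}

def w1(x):
    return 1.0 / (1.0 + (1.0 + x * x / M) ** ((M + P) / 2.0) * math.exp(-D / 2.0))

def k(x):
    return x * w1(x)

def r(x, rho):
    w = w1(x)
    return (w * math.sqrt((M + x * x) / (M + 1.0) * (1.0 - rho * rho)
                          + (1.0 - w) ** 2 * rho * rho * x * x)
            + (1.0 - w) * math.sqrt(1.0 + w * w * rho * rho * x * x))

def phi(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def f_W(w):
    # density of W = sqrt(chi^2_M / M), evaluated via logs for stability
    return math.exp(math.log(2.0) + (M / 2.0) * math.log(M / 2.0)
                    - math.lgamma(M / 2.0) + (M - 1.0) * math.log(w)
                    - M * w * w / 2.0)

def Psi(a, b, mu, v):
    s = math.sqrt(v)
    return Phi((b - mu) / s) - Phi((a - mu) / s)

def CP(gamma, rho, ny=321, nw=161):
    """Trapezoidal approximation to Eq. (10); y on [-8, 8], w on [0.05, 3]."""
    dy, dw = 16.0 / (ny - 1), 2.95 / (nw - 1)
    total = 0.0
    for i in range(nw):
        w = 0.05 + i * dw
        wt_w = 0.5 if i in (0, nw - 1) else 1.0
        inner = 0.0
        for j in range(ny):
            y = -8.0 + j * dy
            x = (y + gamma) / w
            shift = rho * w * k(x)
            half = T_QUANT * w * r(x, rho)
            wt_y = 0.5 if j in (0, ny - 1) else 1.0
            inner += wt_y * Psi(shift - half, shift + half, rho * y, 1.0 - rho * rho) * phi(y)
        total += wt_w * inner * dy * f_W(w) * dw
    return total
```

Consistent with Theorem 1, CP(γ, ρ) computed this way is very close to 0.95 for large γ, and for m = 10 the coverage can drop below 0.95 at intermediate γ, in line with Fig. 2.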


Fig. 1. Coverage probability and scaled expected length of the confidence interval J, with nominal coverage 0.95, computed with AIC weights (d = 2) and m = 1 for |ρ| = 0.5 (left half) and |ρ| = 0.9 (right half).

Fig. 2 for m = 10 and for |ρ| = 0.5 (left half) and 0.9 (right half) shows that the minimum coverage probability of J is much lower than the nominal coverage 0.95 and the scaled expected length of J can be much larger than 1. The scaled expected length of J has a maximum value that is an increasing function of |ρ| and that can be much larger than 1 for |ρ| large and m not small. That is, the performance of the confidence interval J deteriorates as m increases.

We finish this section with a real data example. Consider the real data described in Section 3 of Kabaila and Leeb (2006). For these data, n = 30 and p = 10. Suppose that θ = β8 and τ = β9, so that |ρ| = 0.88. In this case, the minimum coverage probability of the confidence interval J, with nominal coverage 0.95, is 0.8632. In addition, the scaled expected length of this confidence interval (a) takes the value 1.0429 when the simpler model is correct (i.e. when γ = 0) and (b) has maximum value 1.3778. Also, the scaled expected length takes its minimum value when the simpler model is correct. In other words, the scaled expected length is never less than 1. This shows that J is outperformed by the standard confidence interval, with coverage 0.95, based on the full model.

6. The case that p is fixed and m = n − p → ∞

We assume throughout the paper that p is given, where p ≥ 3. We consider the case that n → ∞, so that m = n − p → ∞. We describe the limiting behavior of the confidence interval J and the limits of the coverage probability and scaled expected length, as m → ∞. Fully rigorous proofs of these results are laborious and are not included in this paper; figures in the Supplementary Material confirm numerically that the stated limits hold. Assume that D = lim_{n→∞} X⊤X/n exists and is nonsingular. Recall that vθ = a⊤(X⊤X)⁻¹a and

ρ = a⊤(X⊤X)⁻¹c / {a⊤(X⊤X)⁻¹a c⊤(X⊤X)⁻¹c}^{1/2}.

Although not made explicit in the notation, vθ and ρ are functions of n. Let

ρ̄ = a⊤D⁻¹c / {a⊤D⁻¹a c⊤D⁻¹c}^{1/2}.



Fig. 2. Coverage probability and scaled expected length of the confidence interval J, with nominal coverage 0.95, computed with AIC weights (d = 2) and m = 10 for |ρ| = 0.5 (left half) and |ρ| = 0.9 (right half).

Note that vθ ↓ 0 and ρ → ρ̄ as n → ∞.

6.1. Limiting behavior of the confidence interval J

Let

w1*(x) = 1 / [1 + exp({x² − d}/2)]

and k*(x) = x w1*(x). Also let

r*(x, ρ̄) = w1*(x) [1 − ρ̄² + (1 − w1*(x))² ρ̄² x²]^{1/2} + (1 − w1*(x)) [1 + w1*²(x) ρ̄² x²]^{1/2}.

Finally, let

J* = [θ̂ − ρ̄ σ vθ^{1/2} k*(γ̃) ± z_{1−α/2} σ vθ^{1/2} r*(γ̃, ρ̄)].   (12)

This interval describes the limiting behavior of the confidence interval J in the sense described below. For each fixed x ∈ R,

w1(x) = 1 / [1 + (1 + x²/m)^{m/2} (1 + x²/m)^{p/2} exp(−d/2)] → w1*(x)

as m → ∞, since (1 + x²/m)^{m/2} → exp(x²/2) and (1 + x²/m)^{p/2} → 1. Hence, for each fixed x ∈ R, k(x) = x w1(x) → k*(x) as m → ∞. Finally, for each fixed x ∈ R, r(x, ρ) → r*(x, ρ̄) as m → ∞.
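The convergence of w1 to w1* can be checked numerically. The sketch below is our illustration (AIC weights, d = 2); the overflow guard simply returns the limiting value 0 when the exponential would overflow:

```python
import math

def w1_star(x, d=2.0):
    """Large-m limit of the model weight, Section 6.1."""
    z = (x * x - d) / 2.0
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))  # guard against overflow

def k_star(x, d=2.0):
    return x * w1_star(x, d)

def r_star(x, rho_bar, d=2.0):
    w = w1_star(x, d)
    return (w * math.sqrt(1.0 - rho_bar ** 2 + (1.0 - w) ** 2 * rho_bar ** 2 * x * x)
            + (1.0 - w) * math.sqrt(1.0 + w * w * rho_bar ** 2 * x * x))

def w1_finite(x, m, p, d=2.0):
    """Finite-m weight of Eq. (3), for comparison."""
    return 1.0 / (1.0 + (1.0 + x * x / m) ** ((m + p) / 2.0) * math.exp(-d / 2.0))
```

Since (1 + x²/m)^{m/2} → e^{x²/2} and (1 + x²/m)^{p/2} → 1 as m → ∞ with p fixed, w1_finite(x, m, p) approaches w1_star(x); for example, at x = 1.5 the two already agree to within about 10⁻⁴ at m = 10⁶.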


We compare the differences between the centers and half-widths of J and J* with σ vθ^{1/2}, the standard deviation of θ̂. Since t_{m,1−α/2} → z_{1−α/2} and W →p 1 as m → ∞, the differences between the centers and half-widths of J and J*, divided by σ vθ^{1/2}, converge in probability to 0 as m → ∞.

6.2. Limiting behavior of the coverage probability

It follows from the proof of Theorem 1 that the coverage probability

P(θ ∈ J) = P(ℓ(γ̃, W, ρ) ≤ G ≤ u(γ̃, W, ρ)),

where G = (θ̂ − θ)/(σ vθ^{1/2}) and

ℓ(γ̃, W, ρ) = ρ W k(γ̃/W) − t_{m,1−α/2} W r(γ̃/W, ρ),
u(γ̃, W, ρ) = ρ W k(γ̃/W) + t_{m,1−α/2} W r(γ̃/W, ρ).

Let

ℓ*(γ̃, ρ) = ρ k*(γ̃) − z_{1−α/2} r*(γ̃, ρ),   (13)
u*(γ̃, ρ) = ρ k*(γ̃) + z_{1−α/2} r*(γ̃, ρ).   (14)

Thus

ℓ(γ̃, W, ρ) − ℓ*(γ̃, ρ) →p 0  and  u(γ̃, W, ρ) − u*(γ̃, ρ) →p 0

as m → ∞, so that

P(θ ∈ J) − P(ℓ*(γ̃, ρ) ≤ G ≤ u*(γ̃, ρ)) → 0 as m → ∞.

Let

CP*(γ, ρ) = ∫_{−∞}^{∞} Ψ(ℓ*(y + γ, ρ), u*(y + γ, ρ); ρy, 1 − ρ²) φ(y) dy.   (15)

Since the distribution of G conditional on γ̃ = h is N(ρ(h − γ), 1 − ρ²),

P(ℓ*(γ̃, ρ) ≤ G ≤ u*(γ̃, ρ))
= ∫_{−∞}^{∞} P(ℓ*(h, ρ) ≤ G ≤ u*(h, ρ) | γ̃ = h) φ(h − γ) dh
= ∫_{−∞}^{∞} P(ℓ*(h, ρ) ≤ G̃ ≤ u*(h, ρ)) φ(h − γ) dh, where G̃ ∼ N(ρ(h − γ), 1 − ρ²),
= ∫_{−∞}^{∞} Ψ(ℓ*(h, ρ), u*(h, ρ); ρ(h − γ), 1 − ρ²) φ(h − γ) dh
= CP*(γ, ρ).

Consequently, P(θ ∈ J) → CP*(γ, ρ̄) as m → ∞.

6.3. Limiting behavior of the scaled expected length

We have shown that the scaled expected length of the confidence interval J is

(t_{m,1−α/2}/t_{m,(1+c_min)/2}) (m/2)^{1/2} (Γ(m/2)/Γ((m + 1)/2)) ∫₀^∞ ∫_{−∞}^{∞} w r((y + γ)/w, ρ) φ(y) dy fW(w) dw.   (16)
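The limiting coverage CP*(γ, ρ̄) of Eq. (15) involves only a single integral and is cheap to evaluate. The sketch below is our illustration (AIC weights d = 2, nominal coverage 0.95, z_{0.975} ≈ 1.959964 hardcoded, integral truncated to y ∈ [−8, 8]):

```python
import math

Z = 1.959964  # z_{0.975}

def w1_star(x, d=2.0):
    z = (x * x - d) / 2.0
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))

def k_star(x):
    return x * w1_star(x)

def r_star(x, rho):
    w = w1_star(x)
    return (w * math.sqrt(1.0 - rho * rho + (1.0 - w) ** 2 * rho * rho * x * x)
            + (1.0 - w) * math.sqrt(1.0 + w * w * rho * rho * x * x))

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cp_star(gamma, rho, n=2001):
    """Trapezoidal approximation to CP*(gamma, rho), Eq. (15)."""
    dy = 16.0 / (n - 1)
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for j in range(n):
        y = -8.0 + j * dy
        x = y + gamma
        lo = rho * k_star(x) - Z * r_star(x, rho)
        hi = rho * k_star(x) + Z * r_star(x, rho)
        val = Phi((hi - rho * y) / s) - Phi((lo - rho * y) / s)
        phi_y = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        total += (0.5 if j in (0, n - 1) else 1.0) * val * phi_y * dy
    return total
```

For ρ̄ = 0 this returns 1 − α = 0.95 for every γ (since r*(x, 0) = 1 and the k* term drops out), while for |ρ̄| = 0.9 the minimum over γ is far below 0.95, in line with the value 0.83 quoted in Section 6.4.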

It follows from 6.1.47 of Abramowitz and Stegun (1964, p. 257) that

(m/2)^{1/2} Γ(m/2)/Γ((m + 1)/2) → 1 as m → ∞.

Also, t_{m,1−α/2} → z_{1−α/2} and t_{m,(1+c_min)/2} − z_{(1+c*_min)/2} → 0, as m → ∞, where c*_min is CP*(γ, ρ̄) minimized with respect to γ ≥ 0. In addition,

∫₀^∞ ∫_{−∞}^{∞} w r((y + γ)/w, ρ) φ(y) dy fW(w) dw − ∫_{−∞}^{∞} r*(y + γ, ρ) φ(y) dy → 0,


as m → ∞. Therefore the difference between the scaled expected length of the confidence interval J and SEL*(γ, ρ) approaches 0 as m → ∞, where

SEL*(γ, ρ) = (z_{1−α/2}/z_{(1+c*_min)/2}) ∫_{−∞}^{∞} r*(y + γ, ρ) φ(y) dy.   (17)

Consequently, the scaled expected length of J converges to SEL*(γ, ρ̄) as m → ∞.

6.4. Some numerical results for large m

For ρ̄ = 0, the interval J* reduces to the standard 1 − α confidence interval for θ based on the full model, assuming that σ² is known. Consistently with this fact, for ρ̄ = 0, CP*(γ, ρ̄) = 1 − α and SEL*(γ, ρ̄) = 1 for all γ. As |ρ̄| increases, CP*(γ, ρ̄) and SEL*(γ, ρ̄) increasingly differ from these values.

We first did some calculations to explore empirically the reasonableness of the limiting results stated above. In particular, for the confidence interval J, with nominal coverage 0.95 and constructed using AIC weights (d = 2), we constructed figures showing the coverage probability and the scaled expected length in the case p = 4 and |ρ| = 0.2, 0.5, 0.7, 0.9 at different values of m = 10, 50, 200, ∞. These figures (included in the Supplementary Material) show the convergence of the coverage probability and scaled expected length to the limits stated above as m increases. For all the values of |ρ| we considered, the exact results for m = 200 are very close to the limiting results, indicating that the asymptotic results are useful at this value. These figures also show that, other than for small |ρ|, the performance of the confidence interval J deteriorates in terms of both coverage probability and scaled expected length as m → ∞.

For the confidence interval J, with nominal coverage 0.95 and constructed using AIC weights (d = 2), we present the coverage probability and the scaled expected length (left half of Fig. 3) in the limiting case m → ∞ with p fixed (p ≥ 3) at different values of |ρ̄| = 0.2, 0.5, 0.7, 0.9. These figures quantify how the performance of J deteriorates with increasing |ρ̄|. As expected, for small |ρ̄|, the asymptotic coverage is the same as the nominal coverage and the scaled expected length is 1.
However, with large |ρ̄|, the minimum asymptotic coverage of the confidence interval, with nominal coverage 0.95, can be as low as 0.83, even though the scaled expected length is well above 1 for all |γ|.

6.5. Comparison with asymptotic results of Hjort and Claeskens (2003)

Hjort and Claeskens (2003, p. 886) consider two nested general regression models: the full model (which they call the extended model) and the simpler model (which they call the narrow model), where the simpler model is obtained from the full model by setting a scalar parameter to a given value. In the solid line curves in their Fig. 2, Hjort and Claeskens (2003, p. 886) present the limiting coverage of the Buckland et al. (1997) confidence interval using the standard error based on formula (9) of Buckland et al. (1997) in the following context. They consider two situations corresponding to two values of a parameter that they denote by ρ and which, to avoid confusion, we will denote by ρHC. In the caption of Fig. 2, Hjort and Claeskens (2003) define

ρHC = ω K^{1/2}/τ0,

with K defined at the start of their Section 3.1, ω defined in their equation (3.2) and τ0² just below their equation (4.3). In the Supplementary Material it is shown that ρHC, expressed in our notation, is −ρ̄/(1 − ρ̄²)^{1/2}. It may also be shown that our γ corresponds to a of Hjort and Claeskens (2003). Note that when ρHC = 2/3 and ρHC = 1, our |ρ̄| is equal to 2/√13 and 1/√2, respectively. The right half of Fig. 3 shows the coverage probability of J using our computations in the same situations as those considered by Hjort and Claeskens (2003); we observe that the right half of our Fig. 3 is identical to the solid line curves of Figure 2 of Hjort and Claeskens (2003).
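The stated correspondence between ρHC and ρ̄ can be verified with a two-line check (our illustration):

```python
import math

def rho_hc(rho_bar):
    """rho_HC = -rho_bar / (1 - rho_bar^2)^{1/2}, as derived in the Supplementary Material."""
    return -rho_bar / math.sqrt(1.0 - rho_bar ** 2)

# rho_bar = -1/sqrt(2) gives rho_HC = 1; rho_bar = -2/sqrt(13) gives rho_HC = 2/3,
# matching the two situations considered by Hjort and Claeskens (2003, Fig. 2).
```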

7. Other confidence intervals centered on a model averaged estimator

There are a number of other confidence intervals centered on a model averaged estimator that have been proposed in the literature. Burnham and Anderson (2002) propose a formula for the standard error, given by the square root of their formula (6.12), that is different from the standard error given by the estimate put forward by Buckland et al. (1997) of their formula (9). The R package MAMI (Schomaker and Heumann, 2017) implements only the latter formula for the standard error. However, the R packages AICcmodavg (Mazerolle, 2019), glmulti (Calcagno, 2019) and MuMIn (Bartoń, 2019) implement both of these formulas for the standard error.

The half-width of the confidence interval J is equal to the ''multiplier'' t_{m,1−α/2} multiplied by the standard error given by the estimate put forward by Buckland et al. (1997) of their formula (9). This choice of multiplier is a particular instance of the following general rule. When all the models under consideration are particular cases of the full model (including the full model itself), the multiplier is t_{m,1−α/2}, where m = n − p for the full model. In the next subsection, we define the confidence interval J_BA, with nominal coverage 1 − α, that has half-width equal to this multiplier multiplied by the standard error given by the square root of (6.12) of Burnham and Anderson (2002).

Please cite this article as: P. Kabaila, A.H. Welsh and C. Wijethunga, Finite sample properties of confidence intervals centered on a model averaged estimator. Journal of Statistical Planning and Inference (2019), https://doi.org/10.1016/j.jspi.2019.10.004.


Fig. 3. Properties of the confidence interval J computed with AIC weights (d = 2) and m → ∞, with p ≥ 3 fixed. The left half shows the graphs of the coverage probability and scaled expected length, for nominal coverage 0.95. The right half shows graphs of the coverage probability, for nominal coverage 0.9, with |ρ̄| = 1/√2 and |ρ̄| = 2/√13 (equivalent to ρ_HC = 1 and ρ_HC = 2/3, respectively, in Figure 2 of Hjort and Claeskens, 2003).

In subsection 7.2, we consider the confidence intervals obtained when the multipliers used to construct J and J_BA are replaced by either z_{1−α/2} (resulting in the confidence intervals denoted by J_(N) and J_BA(N), respectively) or the 1 − α/2 quantile of the t-distribution with the ''model averaged residual degrees of freedom'' put forward by Lukacs et al. (2010) (resulting in the confidence intervals denoted by J_(L) and J_BA(L), respectively). The modification put forward by Lukacs et al. (2010) is implemented in the package glmulti. In subsection 7.3, we consider the confidence intervals obtained by applying the modification to the half-width proposed on page 164 of Burnham and Anderson (2002). This modification is implemented in the packages glmulti and MuMIn. We denote the confidence intervals obtained by applying this modification to J and J_BA by J_(BA) and J_BA(BA), respectively. In subsection 7.4, we compare the properties of all of the confidence intervals described in the paper. Fortunately, for our testbed scenario, Theorems 1 and 2 continue to hold for all of these confidence intervals, provided that the appropriate changes in notation are made. It is for this reason that we introduce some additional notation in subsections 7.1, 7.2 and 7.3.

7.1. Standard error given by the square root of (6.12) of Burnham and Anderson (2002)

If we replace the standard error found using the estimate put forward by Buckland et al. (1997) of their formula (9) by the square root of the estimate (6.12) of Burnham and Anderson (2002) then we obtain σ̂ v_θ^{1/2} r_BA(γ̂, ρ), where r_BA(x, ρ) denotes

( w_1(x) ( ((m + x²)/(m + 1)) (1 − ρ²) + (1 − w_1(x))² ρ² x² ) + (1 − w_1(x)) ( 1 + w_1²(x) ρ² x² ) )^{1/2}.

Replacing r(γ̂, ρ) by r_BA(γ̂, ρ) in (8), we obtain the confidence interval, with nominal coverage 1 − α,

J_BA = [ θ̂ − ρ σ̂ v_θ^{1/2} k(γ̂) ± t_{m,1−α/2} σ̂ v_θ^{1/2} r_BA(γ̂, ρ) ].    (18)


Now consider the behavior of the confidence interval J_BA, for p fixed and m = n − p → ∞, under the same assumption and using the same notation as in Section 6. Let

r*_BA(x, ρ̄) = ( w*_1(x) ( (1 − ρ̄²) + (1 − w*_1(x))² ρ̄² x² ) + (1 − w*_1(x)) ( 1 + w*_1(x)² ρ̄² x² ) )^{1/2}.

Let J*_BA denote the confidence interval that is obtained when we replace the function r* by the function r*_BA in (12). The differences between the centers and half-widths of J_BA and J*_BA, divided by σ v_θ^{1/2}, converge in probability to 0 as m → ∞. Let ℓ*_BA and u*_BA denote the functions that are obtained when we replace the function r* by the function r*_BA in (13) and (14), respectively. Now let CP*_BA(γ, ρ̄) denote the quantity that is obtained when we replace the functions ℓ* and u* by the functions ℓ*_BA and u*_BA, respectively, in (15). Note that P(θ ∈ J_BA) → CP*_BA(γ, ρ̄) as m → ∞. Next let SEL*_BA(γ, ρ̄) denote the quantity that is obtained when we replace the function r* by the function r*_BA in (17), where c*_min denotes CP*_BA(γ, ρ̄) minimized with respect to γ ≥ 0. Note that the scaled expected length of J_BA converges to SEL*_BA(γ, ρ̄) as m → ∞.

7.2. Change of multiplier

For large sample sizes it is natural to propose a confidence interval, with nominal coverage 1 − α, that is centered on the model averaged estimator and has half-width equal to z_{1−α/2} multiplied by the chosen formula for the standard error. This confidence interval is implemented in all of the packages MAMI, AICcmodavg, glmulti, and MuMIn. Let J_(N) and J_BA(N) denote the confidence intervals J and J_BA, respectively, but with t_{m,1−α/2} replaced by z_{1−α/2} in the formulas (8) and (18), respectively. Obviously, J_(N) and J_BA(N) have smaller coverage and expected length than J and J_BA, respectively.

It seems reasonable to use a different multiplier when the sample size is not large. Lukacs et al. (2010) suggest that the multiplier be the 1 − α/2 quantile of a t-distribution with ''model averaged residual degrees of freedom''. Let J_(L) and J_BA(L) denote the confidence intervals J and J_BA, respectively, but with t_{m,1−α/2} replaced by the 1 − α/2 quantile of the t-distribution with ''model averaged residual degrees of freedom'' w_1(γ̂)(m + 1) + (1 − w_1(γ̂)) m = m + w_1(γ̂). The confidence interval J_(L) is obtained by replacing the function r in the formula for J by r_(L), where

r_(L)(x, ρ) = ( t_{m+w_1(x),1−α/2} / t_{m,1−α/2} ) r(x, ρ).

The confidence interval J_BA(L) is obtained by replacing the function r_BA in the formula (18) for J_BA by r_BA(L), where

r_BA(L)(x, ρ) = ( t_{m+w_1(x),1−α/2} / t_{m,1−α/2} ) r_BA(x, ρ).
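The effect of the Lukacs et al. (2010) multiplier can be sketched numerically. The weight function below is illustrative (the paper's w_1 is given by its formula (3), not reproduced in this excerpt); it is assumed only to have the properties noted in subsection 7.4, namely w_1(0) = 1/(1 + exp(−d/2)) and w_1(x) → 0 as x → ∞:

```python
import math
from scipy.stats import t

alpha, d = 0.05, 2.0

def w1_illustrative(x):
    # Illustrative AIC-type weight for the simpler model (assumed form):
    # even, decreasing in |x|, w1(0) = 1/(1 + exp(-d/2)), w1(x) -> 0.
    z = (x * x - d) / 2.0
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))

def lukacs_ratio(x, m):
    # Ratio t_{m + w1(x), 1 - alpha/2} / t_{m, 1 - alpha/2} that converts r
    # into r_(L); the "model averaged residual degrees of freedom" m + w1(x)
    # is generally non-integer, which scipy's t quantile handles directly.
    return t.ppf(1.0 - alpha / 2.0, m + w1_illustrative(x)) / t.ppf(1.0 - alpha / 2.0, m)

# The extra (at most one) degree of freedom shrinks the t quantile slightly,
# and the effect vanishes as m grows, so J_(L) and J coincide in the limit.
assert lukacs_ratio(0.0, 3) < 1.0
assert lukacs_ratio(0.0, 200) > lukacs_ratio(0.0, 3)
assert abs(lukacs_ratio(0.0, 200) - 1.0) < 1e-3
```

This is consistent with the observation in subsection 7.4 that the differences between J and J_(L) converge to 0 as m → ∞.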

7.3. The modification to the half-width proposed on page 164 of Burnham and Anderson (2002)

We now consider the modification to the half-width that was proposed by Burnham and Anderson (2002, p. 164). Let J_(BA) denote the confidence interval that is obtained by replacing the function r by the function r_(BA) in the expression (8) for J, where r_(BA)(x, ρ) is defined to be

w_1(x) ( (t_{m+1,1−α/2}/t_{m,1−α/2})² ((m + x²)/(m + 1)) (1 − ρ²) + (z_{1−α/2}/t_{m,1−α/2})² (1 − w_1(x))² ρ² x² )^{1/2}
+ (1 − w_1(x)) ( 1 + (z_{1−α/2}/t_{m,1−α/2})² w_1²(x) ρ² x² )^{1/2}.

The confidence interval J_(BA) incorporates the modification, proposed by Burnham and Anderson (2002, p. 164), of the standard error obtained by using the estimate put forward by Buckland et al. (1997) of their formula (9). Let J_BA(BA) denote the confidence interval that is obtained by replacing the function r_BA by the function r_BA(BA) in the expression (18) for J_BA, where r_BA(BA)(x, ρ) is defined to be

( w_1(x) ( (t_{m+1,1−α/2}/t_{m,1−α/2})² ((m + x²)/(m + 1)) (1 − ρ²) + (z_{1−α/2}/t_{m,1−α/2})² (1 − w_1(x))² ρ² x² )
+ (1 − w_1(x)) ( 1 + (z_{1−α/2}/t_{m,1−α/2})² w_1²(x) ρ² x² ) )^{1/2}.

The confidence interval J_BA(BA) incorporates the modification, proposed by Burnham and Anderson (2002, p. 164), of the standard error given by the square root of the estimate (6.12) of Burnham and Anderson (2002).


7.4. Comparison of the confidence intervals described in the paper

Note that k(x) → 0 and r_BA(x, ρ) → 1, r_(L)(x, ρ) → 1, r_BA(L)(x, ρ) → 1, r_(BA)(x, ρ) → 1 and r_BA(BA)(x, ρ) → 1, as x → ∞. Therefore, just like J, all of the confidence intervals J_BA, J_(L), J_BA(L), J_(BA) and J_BA(BA) have the attractive property that when the data and the simpler model are highly discordant, as evidenced by a large value of |γ̂|, they effectively reduce to the usual confidence interval, with coverage 1 − α, based on the full model M2. An application of the Lebesgue dominated convergence theorem shows that all of these confidence intervals also have the related attractive property that their coverage probabilities approach 1 − α as γ → ∞, for every given ρ. Neither J_(N) nor J_BA(N) has either of these attractive properties. For this reason, these two confidence intervals are of little interest.

As noted by Burnham and Anderson (2002), the Cauchy–Schwarz inequality implies that r_BA(x, ρ) ≥ r(x, ρ). Hence the coverage probability and expected length of J_BA cannot be less than the corresponding quantities for J. Similarly, the Cauchy–Schwarz inequality implies that r_BA(BA)(x, ρ) ≥ r_(BA)(x, ρ). Hence the coverage probability and expected length of J_BA(BA) cannot be less than the corresponding quantities for J_(BA).

Theorems 1 and 2 continue to hold when the confidence interval J and the function r are replaced by any of the following: the confidence interval J_BA and the function r_BA, respectively; the confidence interval J_(L) and the function r_(L), respectively; the confidence interval J_BA(L) and the function r_BA(L), respectively; the confidence interval J_(BA) and the function r_(BA), respectively; the confidence interval J_BA(BA) and the function r_BA(BA), respectively. The proofs of these new theorems follow mutatis mutandis. Not surprisingly, J, J_(N), J_(L) and J_(BA) all share the behavior, for p fixed and m → ∞, that is described in Section 6.
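The Cauchy–Schwarz comparison can be illustrated numerically: r is a weighted sum of per-model square roots while r_BA is the square root of the corresponding weighted sum. The sketch below assumes an illustrative AIC-type weight and uses the per-model variance terms appearing in r_BA in subsection 7.1; the sum-of-roots form stands in for r, whose exact formula (7) is not reproduced in this excerpt:

```python
import math

m, d = 10, 2.0

def w1(x):
    # Illustrative AIC-type weight (assumed form; the paper's w1 is its (3)):
    # even in x, w1(0) = 1/(1 + exp(-d/2)), w1(x) -> 0 as |x| -> infinity.
    z = (x * x - d) / 2.0
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))

def variance_terms(x, rho):
    # The two per-model terms appearing under the square root in r_BA.
    w = w1(x)
    a = ((m + x * x) / (m + 1.0)) * (1.0 - rho * rho) + (1.0 - w) ** 2 * rho * rho * x * x
    b = 1.0 + w * w * rho * rho * x * x
    return w, a, b

def r(x, rho):
    # Buckland et al. (1997) form: weighted sum of per-model roots.
    w, a, b = variance_terms(x, rho)
    return w * math.sqrt(a) + (1.0 - w) * math.sqrt(b)

def r_BA(x, rho):
    # Burnham and Anderson (2002, (6.12)) form: root of the weighted sum.
    w, a, b = variance_terms(x, rho)
    return math.sqrt(w * a + (1.0 - w) * b)

# Cauchy-Schwarz: r_BA >= r everywhere; both tend to 1 as x -> infinity.
for i in range(41):
    x = 0.25 * i
    for rho in (0.0, 0.2, 0.5, 0.7, 0.9):
        assert r_BA(x, rho) >= r(x, rho) - 1e-12
assert abs(r(50.0, 0.7) - 1.0) < 1e-9 and abs(r_BA(50.0, 0.7) - 1.0) < 1e-9
```

The inequality here is just concavity of the square root applied to the weights (w_1(x), 1 − w_1(x)), which is why it holds for any weight function.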
Similarly, J_BA, J_BA(N), J_BA(L) and J_BA(BA) all share the behavior, for p fixed and m → ∞, that is described in subsection 7.1.

We have prepared graphs of the coverage probability and the scaled expected length of J_BA for all m ∈ {1, 2, 3, 10}, |ρ| ∈ {0.2, 0.5, 0.7, 0.9} and p ∈ {3, 6, 12, 24}. These graphs are presented in the Supplementary Material. We compared these with the corresponding graphs for J. As expected, the coverage probability of J_BA is never less than the coverage probability of J and is often greater. The graphs of the scaled expected lengths of J and J_BA are quite similar and do not display any practically important differences. We also prepared graphs of CP*_BA(γ, ρ̄) and SEL*_BA(γ, ρ̄), which are the limits as m → ∞ of the coverage probability and the scaled expected length, respectively, of J_BA. These graphs are presented in the Supplementary Material. We compared these with the corresponding graphs for J. In this comparison, J_BA performs somewhat better than J in terms of (a) having a somewhat larger minimum limiting coverage and (b) having a somewhat smaller maximum limiting scaled expected length. However, just as for J, the limiting scaled expected length of J_BA is never less than 1. This shows that, in the limit as m → ∞, both J and J_BA are outperformed by the standard confidence interval, with coverage 0.95, based on the full model.

Note that w_1(x) is a decreasing function of x ≥ 0, such that w_1(x) → 0 as x → ∞ and w_1(0) = 1/(1 + exp(−d/2)). Therefore the ''model averaged residual degrees of freedom'' of Lukacs et al. (2010) must lie in the interval [m, m + 1/(1 + exp(−d/2))]. This implies that the differences between the coverage probabilities and the scaled expected lengths of J and J_(L) converge to 0 as m → ∞. Similarly, the differences between the coverage probabilities and the scaled expected lengths of J_BA and J_BA(L) converge to 0 as m → ∞. These differences are already very small for m = 4.
A comparison of the formulas for the functions r and r_(BA) shows that the differences between the coverage probabilities and the scaled expected lengths of J and J_(BA) converge to 0 as m → ∞. Similarly, the differences between the coverage probabilities and the scaled expected lengths of J_BA and J_BA(BA) converge to 0 as m → ∞. These differences are already small for m = 6.

8. Discussion

In the context of a simple testbed situation involving two linear regression models, we have derived exact expressions for the coverage probability and scaled expected length of the confidence interval J centered on a frequentist model averaged estimator and with half-width proportional to the standard error given by the estimate put forward by Buckland et al. (1997) of their formula (9). Using these expressions to explore the exact finite sample performance of this confidence interval, we showed that for residual degrees of freedom m = 1 this interval has good coverage and scaled expected length properties and that these deteriorate as m increases, being already quite poor for m = 10. We also explored the limiting asymptotic case (as m → ∞) and showed that the minimum limiting coverage can be much lower than the nominal value, even though the scaled expected length is never less than one and has maximum value much larger than one.

Differences in generality and notation mean that it is not obvious how our limiting coverage results relate to those of Hjort and Claeskens (2003) (who did not include any results on expected length). We were able to compare our results to those obtained for the asymptotic coverage of the confidence interval by Hjort and Claeskens (2003) and show that they are the same. Our results enhance the coverage result obtained by Hjort and Claeskens (2003) by providing exact results in the more limited testbed situation for any sample size for both coverage and scaled expected length. All the results taken together show that the confidence interval J (based on AIC weights) cannot be generally recommended.


We have extended this assessment of performance to seven other confidence intervals, centered on a model averaged estimator, that have been proposed in the literature. The assessment of these other confidence intervals shows that these confidence intervals (based on AIC weights) also cannot be generally recommended.

There seems to be no reason to expect that the confidence intervals considered in this paper will have performances that improve as we expand the set of models under consideration. The following result, which is an analogue of the main results of Kabaila and Leeb (2006), Kabaila and Giri (2009b) and Kabaila (2018), is highly plausible. Consider the same linear regression model as the full model considered in the present paper, with the same scalar parameter of interest. Suppose that, for each i ∈ {q + 1, . . . , p}, we either set β_i to zero or let it vary freely, where q is a given positive integer satisfying 1 ≤ q < p. This implies that there are 2^{p−q} models under consideration. It is highly plausible that the wider the class of models over which one averages, using given data-based model weights, the smaller the minimum coverage probability of the confidence interval centered on a model averaged estimator under consideration, with given nominal coverage. If true, this result can be combined with Theorem 1 (and its analogues for the other confidence intervals described in Section 7) to yield an easily-computed upper bound, in the style of the upper bounds of Kabaila and Leeb (2006), Kabaila and Giri (2009b) and Kabaila (2018), on the minimum coverage probability of this confidence interval.

Acknowledgments

This work was supported by an Australian Government Research Training Program Scholarship. The work of Alan H. Welsh was partly supported by an Australian Government ARC Discovery Project DP180100836 grant. The authors are grateful to the reviewers and the associate editor for their comments and suggestions, as these have led to a greatly improved paper.

Appendix A

A.1. Proof of Theorem 1

(a) The coverage probability of the confidence interval J is

P(θ ∈ J) = P( θ̃ − t_{m,1−α/2} σ W v_θ^{1/2} r(γ̃/W, ρ) ≤ θ ≤ θ̃ + t_{m,1−α/2} σ W v_θ^{1/2} r(γ̃/W, ρ) )
= P( −t_{m,1−α/2} σ W v_θ^{1/2} r(γ̃/W, ρ) ≤ θ̂ − θ − ρ v_θ^{1/2} σ W k(γ̃/W) ≤ t_{m,1−α/2} σ W v_θ^{1/2} r(γ̃/W, ρ) )
= P( −t_{m,1−α/2} W r(γ̃/W, ρ) ≤ (θ̂ − θ)/(σ v_θ^{1/2}) − ρ W k(γ̃/W) ≤ t_{m,1−α/2} W r(γ̃/W, ρ) )
= P( ρ W k(γ̃/W) − t_{m,1−α/2} W r(γ̃/W, ρ) ≤ G ≤ ρ W k(γ̃/W) + t_{m,1−α/2} W r(γ̃/W, ρ) ),

where G = (θ̂ − θ)/(σ v_θ^{1/2}). Note that

[G, γ̃]ᵀ ~ N( [0, γ]ᵀ, [[1, ρ], [ρ, 1]] ),

so the distribution of G conditional on γ̃ = h is N( ρ(h − γ), 1 − ρ² ). Recall that

ℓ(γ̃, W, ρ) = ρ W k(γ̃/W) − t_{m,1−α/2} W r(γ̃/W, ρ)
u(γ̃, W, ρ) = ρ W k(γ̃/W) + t_{m,1−α/2} W r(γ̃/W, ρ).

Therefore the coverage probability is

CP(γ, ρ) = P( ℓ(γ̃, W, ρ) ≤ G ≤ u(γ̃, W, ρ) )
= ∫₀^∞ ∫_{−∞}^∞ P( ℓ(γ̃, W, ρ) ≤ G ≤ u(γ̃, W, ρ) | γ̃ = h, W = w ) φ(h − γ) dh f_W(w) dw
= ∫₀^∞ ∫_{−∞}^∞ P( ℓ(h, w, ρ) ≤ G ≤ u(h, w, ρ) | γ̃ = h ) φ(h − γ) dh f_W(w) dw
= ∫₀^∞ ∫_{−∞}^∞ P( ℓ(h, w, ρ) ≤ G̃ ≤ u(h, w, ρ) ) φ(h − γ) dh f_W(w) dw, where G̃ ~ N( ρ(h − γ), 1 − ρ² ),
= ∫₀^∞ ∫_{−∞}^∞ Ψ( ℓ(h, w, ρ), u(h, w, ρ); ρ(h − γ), 1 − ρ² ) φ(h − γ) dh f_W(w) dw,    (19)


where Ψ(a, b; μ, v) = P(a ≤ Z ≤ b) for Z ~ N(μ, v). Now, by changing the variable of integration in the inner integral to y = h − γ, we obtain (10). □

(b) We use the following lemmas.

Lemma 1. Ψ(a, b; μ, v) = Ψ(−b, −a; −μ, v), where Ψ(a, b; μ, v) = P(a ≤ Z ≤ b) for Z ~ N(μ, v).

Lemma 1 is the same as Lemma 2 of Kabaila and Wijethunga (2019). The following lemma has some similarities to Lemma 3 of Kabaila and Wijethunga (2019).

Lemma 2.
(i) −u(−h, w, ρ) = ℓ(h, w, ρ)
(ii) −ℓ(−h, w, ρ) = u(h, w, ρ)
(iii) ℓ(h, w, −ρ) = −u(h, w, ρ)
(iv) u(h, w, −ρ) = −ℓ(h, w, ρ)

Proof of Lemma 1. For Z ~ N(μ, v), −Z ~ N(−μ, v), so

Ψ(a, b; μ, v) = P(a ≤ Z ≤ b) = P(−b ≤ −Z ≤ −a) = Ψ(−b, −a; −μ, v). □

Proof of Lemma 2. Recall that

ℓ(h, w, ρ) = −t_{m,1−α/2} w r(h/w, ρ) + ρ w k(h/w)
u(h, w, ρ) = t_{m,1−α/2} w r(h/w, ρ) + ρ w k(h/w),

where r(x, ρ) is given by (7), w_1(x) is given by (3) and k(x) = x w_1(x). Obviously, w_1(x) is an even function and k(x) is an odd function. Note that r(x, ρ) is an even function of x, for given ρ, and an even function of ρ, for given x.

(i) Since k is an odd function and r(x, ρ) is an even function of x,

−u(−h, w, ρ) = −t_{m,1−α/2} w r(−h/w, ρ) − ρ w k(−h/w)
= −t_{m,1−α/2} w r(h/w, ρ) + ρ w k(h/w)
= ℓ(h, w, ρ).

(ii) Since k is an odd function and r(x, ρ) is an even function of x,

−ℓ(−h, w, ρ) = t_{m,1−α/2} w r(−h/w, ρ) − ρ w k(−h/w)
= t_{m,1−α/2} w r(h/w, ρ) + ρ w k(h/w)
= u(h, w, ρ).

(iii) Since r(x, ρ) is an even function of ρ,

ℓ(h, w, −ρ) = −t_{m,1−α/2} w r(h/w, −ρ) − ρ w k(h/w)
= −t_{m,1−α/2} w r(h/w, ρ) − ρ w k(h/w)
= −u(h, w, ρ).

(iv) Since r(x, ρ) is an even function of ρ,

u(h, w, −ρ) = t_{m,1−α/2} w r(h/w, −ρ) − ρ w k(h/w)
= t_{m,1−α/2} w r(h/w, ρ) − ρ w k(h/w)
= −ℓ(h, w, ρ). □
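The identities of Lemma 2 rely only on k being odd and r(x, ρ) being even in x and in ρ; a quick numerical sketch with illustrative choices of k and r having these properties (not the paper's actual (3) and (7)):

```python
import math

T = 2.2  # stands in for the fixed multiplier t_{m, 1 - alpha/2}

def k(x):
    # Illustrative odd function (k(x) = x w1(x) is odd since w1 is even).
    return x * math.exp(-x * x / 2.0)

def r(x, rho):
    # Illustrative function, even in x and even in rho.
    return math.sqrt(1.0 + x * x * rho * rho / (1.0 + x * x))

def ell(h, w, rho):
    return rho * w * k(h / w) - T * w * r(h / w, rho)

def u(h, w, rho):
    return rho * w * k(h / w) + T * w * r(h / w, rho)

# Lemma 2 (i)-(iv), checked on a grid:
for h in (-2.0, -0.5, 0.0, 1.3):
    for w in (0.4, 1.0, 2.5):
        for rho in (-0.7, 0.2, 0.9):
            assert abs(-u(-h, w, rho) - ell(h, w, rho)) < 1e-12  # (i)
            assert abs(-ell(-h, w, rho) - u(h, w, rho)) < 1e-12  # (ii)
            assert abs(ell(h, w, -rho) + u(h, w, rho)) < 1e-12   # (iii)
            assert abs(u(h, w, -rho) + ell(h, w, rho)) < 1e-12   # (iv)
```

Since the lemma uses only these parity properties, the evenness of CP(γ, ρ) in γ and in ρ holds for any weight function with w_1 even.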

From (19),

CP(γ, ρ) = ∫₀^∞ ∫_{−∞}^∞ Ψ( ℓ(h, w, ρ), u(h, w, ρ); ρ(h − γ), 1 − ρ² ) φ(h − γ) dh f_W(w) dw.

Consider

CP(−γ, ρ) = ∫₀^∞ ∫_{−∞}^∞ Ψ( ℓ(h, w, ρ), u(h, w, ρ); ρ(h + γ), 1 − ρ² ) φ(h + γ) dh f_W(w) dw
= ∫₀^∞ ∫_{−∞}^∞ Ψ( −u(h, w, ρ), −ℓ(h, w, ρ); −ρ(h + γ), 1 − ρ² ) φ(−h − γ) dh f_W(w) dw,

from Lemma 1 and since φ is an even function. Now, by changing the variable of integration in the inner integral to y = −h,

CP(−γ, ρ) = ∫₀^∞ ∫_{−∞}^∞ Ψ( −u(−y, w, ρ), −ℓ(−y, w, ρ); ρ(y − γ), 1 − ρ² ) φ(y − γ) dy f_W(w) dw
= ∫₀^∞ ∫_{−∞}^∞ Ψ( ℓ(y, w, ρ), u(y, w, ρ); ρ(y − γ), 1 − ρ² ) φ(y − γ) dy f_W(w) dw,

from Lemmas 2(i) and (ii),

= CP(γ, ρ).

Therefore CP(γ, ρ) is an even function of γ, for given ρ. Now consider

CP(γ, −ρ) = ∫₀^∞ ∫_{−∞}^∞ Ψ( ℓ(h, w, −ρ), u(h, w, −ρ); −ρ(h − γ), 1 − ρ² ) φ(h − γ) dh f_W(w) dw
= ∫₀^∞ ∫_{−∞}^∞ Ψ( −u(h, w, ρ), −ℓ(h, w, ρ); −ρ(h − γ), 1 − ρ² ) φ(h − γ) dh f_W(w) dw,

from Lemmas 2(iii) and (iv),

= ∫₀^∞ ∫_{−∞}^∞ Ψ( ℓ(h, w, ρ), u(h, w, ρ); ρ(h − γ), 1 − ρ² ) φ(h − γ) dh f_W(w) dw,

from Lemma 1,

= CP(γ, ρ).

Therefore CP(γ, ρ) is an even function of ρ, for given γ. □

A.2. Proof of Theorem 2

(a) The scaled expected length is

E(length of the confidence interval J) / E(length of the standard CI with the same coverage as the minimum coverage of J).

Observe that

E(length of the confidence interval J) = 2 t_{m,1−α/2} σ v_θ^{1/2} E( W r(γ̃/W, ρ) ).

Let c_min be the minimum coverage probability of the confidence interval J. Then the standard confidence interval with coverage c_min is [ θ̂ ± t_{m,(1+c_min)/2} σ̂ v_θ^{1/2} ]. Thus

E(length of the standard CI with the same coverage as the minimum coverage of J) = 2 t_{m,(1+c_min)/2} σ v_θ^{1/2} E(W).

Therefore the scaled expected length is

SEL(γ, ρ) = ( t_{m,1−α/2} / t_{m,(1+c_min)/2} ) E( W r(γ̃/W, ρ) ) / E(W).

Note that W is distributed as (Q/m)^{1/2}, where Q ~ χ²_m. Therefore

E(W) = E( Q^{1/2} ) / m^{1/2} = (1/m^{1/2}) 2^{1/2} Γ((m + 1)/2) / Γ(m/2) = (m/2)^{−1/2} Γ((m + 1)/2) / Γ(m/2).

Thus

SEL(γ, ρ) = ( t_{m,1−α/2} / t_{m,(1+c_min)/2} ) (m/2)^{1/2} ( Γ(m/2) / Γ((m + 1)/2) ) E( W r(γ̃/W, ρ) )
= ( t_{m,1−α/2} / t_{m,(1+c_min)/2} ) (m/2)^{1/2} ( Γ(m/2) / Γ((m + 1)/2) ) ∫₀^∞ ∫_{−∞}^∞ w r(h/w, ρ) φ(h − γ) dh f_W(w) dw.    (20)
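The closed form for E(W) can be cross-checked against scipy's chi distribution, since W is distributed as χ_m/√m; a minimal sketch:

```python
import math
from scipy.stats import chi

for m in (1, 2, 5, 10, 50):
    # W = (Q/m)^{1/2} with Q ~ chi-squared_m, i.e. W ~ chi_m / sqrt(m).
    exact_mean = chi.mean(m) / math.sqrt(m)
    closed_form = (m / 2.0) ** (-0.5) * math.gamma((m + 1) / 2.0) / math.gamma(m / 2.0)
    assert abs(exact_mean - closed_form) < 1e-10
```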

Now, by changing the variable of integration in the inner integral to y = h − γ, we obtain (11). □

(b) From (20),

SEL(γ, ρ) = ( t_{m,1−α/2} / t_{m,(1+c_min)/2} ) (m/2)^{1/2} ( Γ(m/2) / Γ((m + 1)/2) ) ∫₀^∞ ∫_{−∞}^∞ w r(h/w, ρ) φ(h − γ) dh f_W(w) dw.

We know that r(x, ρ) is an even function of x, for given ρ, and an even function of ρ, for given x. Consider the inner integral

SEL₁(γ, ρ) = ∫_{−∞}^∞ r(h/w, ρ) φ(h − γ) dh,

which depends on γ and ρ. If we can show that SEL₁(γ, ρ) is an even function of γ, for given ρ, and an even function of ρ, for given γ, then SEL(γ, ρ) is also an even function of γ, for given ρ, and an even function of ρ, for given γ. Now consider

SEL₁(−γ, ρ) = ∫_{−∞}^∞ r(h/w, ρ) φ(h + γ) dh
= ∫_{−∞}^∞ r(h/w, ρ) φ(−h − γ) dh, since φ is an even function,
= ∫_{−∞}^∞ r(−y/w, ρ) φ(y − γ) dy, by changing the variable of integration to y = −h,
= ∫_{−∞}^∞ r(y/w, ρ) φ(y − γ) dy, since r(x, ρ) is an even function of x,
= SEL₁(γ, ρ).

Therefore SEL₁(γ, ρ) is an even function of γ, for given ρ. Thus SEL(γ, ρ) is an even function of γ, for given ρ. Consider

SEL₁(γ, −ρ) = ∫_{−∞}^∞ r(h/w, −ρ) φ(h − γ) dh
= ∫_{−∞}^∞ r(h/w, ρ) φ(h − γ) dh, since r(x, ρ) is an even function of ρ,
= SEL₁(γ, ρ).

Therefore SEL₁(γ, ρ) is an even function of ρ, for given γ. Thus SEL(γ, ρ) is an even function of ρ, for given γ. □

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jspi.2019.10.004.

References

Abramowitz, M., Stegun, I.A., 1964. Handbook of Mathematical Functions. Dover, Washington, D.C.
Bartoń, K., 2019. MuMIn: Multi-Model Inference. R package version 1.43.6. URL https://CRAN.R-project.org/package=MuMIn.
Buckland, S.T., Burnham, K.P., Augustin, N.H., 1997. Model selection: an integral part of inference. Biometrics 53, 603–618.
Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference, second ed. Springer-Verlag, New York.
Calcagno, V., 2019. glmulti: Model Selection and Multimodel Inference Made Easy. R package version 1.0.7.1. URL https://CRAN.R-project.org/package=glmulti.
Claeskens, G., Hjort, N.L., 2008. Model Selection and Model Averaging. Cambridge University Press, Cambridge, UK.
Fletcher, D., 2018. Model Averaging. Springer, Berlin.
Fletcher, D., Dillingham, P.W., 2011. Model-averaged confidence intervals for factorial experiments. Comput. Statist. Data Anal. 55, 3041–3048.
Fletcher, D., Turek, D., 2011. Model-averaged profile likelihood intervals. J. Agric. Biol. Environ. Stat. 17, 38–51.
Graybill, F.A., 1976. Theory and Application of the Linear Model. Wadsworth, Belmont, CA.
Hjort, N.L., Claeskens, G., 2003. Frequentist model average estimators. J. Amer. Statist. Assoc. 98, 879–899.
Kabaila, P., 2018. On the minimum coverage probability of model averaged tail area confidence intervals. Canad. J. Statist. 46, 279–297.
Kabaila, P., Giri, K., 2009a. Confidence intervals in regression utilizing prior information. J. Statist. Plann. Inference 139, 3419–3429.
Kabaila, P., Giri, K., 2009b. Upper bounds on the minimum coverage probability of confidence intervals in regression after model selection. Aust. N.Z. J. Stat., 271–287.


Kabaila, P., Leeb, H., 2006. On the large-sample minimal coverage probability of confidence intervals after model selection. J. Amer. Statist. Assoc. 101, 619–629.
Kabaila, P., Welsh, A.H., Abeysekera, W., 2016. Model-averaged confidence intervals. Scand. J. Stat. 43, 35–48.
Kabaila, P., Welsh, A.H., Mainzer, R., 2017. The performance of model averaged tail area confidence intervals. Comm. Statist. Theory Methods 46, 10718–10732.
Kabaila, P., Wijethunga, C., 2019. Confidence intervals centred on bootstrap smoothed estimators. Aust. N.Z. J. Stat. 61, 19–38.
Lukacs, P.M., Burnham, K.P., Anderson, D.R., 2010. Model selection bias and Freedman's paradox. Ann. Inst. Statist. Math. 62, 117–125.
Mazerolle, M.J., 2019. AICcmodavg: Model Selection and Multimodel Inference Based on (Q)AIC(c). R package version 2.2-2. URL https://CRAN.R-project.org/package=AICcmodavg.
R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org.
Schomaker, M., Heumann, C., 2017. Model averaging and model selection after multiple imputation using the R-package MAMI. URL http://mami.r-forge.r-project.org/.
Turek, D., Fletcher, D., 2012. Model-averaged Wald confidence intervals. Comput. Statist. Data Anal. 56, 2809–2815.
