Choice is suffering: A Focused Information Criterion for model selection


Economic Modelling 29 (2012) 817–822


Economic Modelling journal homepage: www.elsevier.com/locate/ecmod

Peter Behl (a), Holger Dette (a), Manuel Frondel (a, b, c, ⁎), Harald Tauchmann (b)

(a) Ruhr University Bochum, Germany
(b) Rheinisch-Westfälisches Institut für Wirtschaftsforschung (RWI), Germany
(c) Ruhr Graduate School in Economics, Germany

Article info

Article history: Accepted 9 September 2011
JEL classification: C3, D2
Keywords: Akaike Information Criterion; Schwarz Information Criterion; Translog cost function

Abstract

In contrast to conventional measures, the Focused Information Criterion (FIC) allows the purpose-specific selection of models, thereby reflecting the idea that one kind of model might be appropriate for inferences on a parameter of interest, but not for another. Ever since its invention, the FIC has been increasingly applied in the realm of statistics, but this concept appears to be virtually unknown in the economic literature. Using a straightforward analytical example, this paper provides a didactic illustration of the FIC and demonstrates its usefulness in economic applications. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

Selecting an adequate model is a key issue for any empirical analysis. Numerous methods for model choice and validation have been suggested in the literature. Well-known approaches to model selection include the usage of information criteria, such as the Akaike (1973) and Schwarz (1978) information criteria AIC and SIC. Alternatively, Dette (1999), Dette et al. (2006), and Podolskij and Dette (2008) propose, among many others, goodness-of-fit tests. Common to all these tests, measures, and criteria is the idea that they provide a single ‘best’ model, regardless of the purpose of inference.

Deviating from this conventional avenue, Claeskens and Hjort (2003) have conceived the Focused Information Criterion (FIC) to allow various models to be selected for different purposes. This approach reflects the view that one kind of model might be appropriate for inferences on, say, the cross-price elasticity of capital and labor, whereas a different sort of model may be preferable for the estimation of another parameter, such as the own-price elasticity of labor. The FIC aims at selecting the model that estimates a function of model parameters, which Claeskens and Hjort (2003) call focus parameter μ, as precisely as possible. That is, the FIC minimizes the mean squared error of the focus parameter, whereas classical model selection strategies, such as AIC or SIC, attempt to find a parsimonious “optimal” model.

Ever since its invention, the FIC has been increasingly applied in the realm of statistics (see Claeskens and Hjort, 2008; Claeskens et al., 2007; Hjort and Claeskens, 2006), but the concept appears to be virtually unknown in the economic literature, with Brownlees and Gallo (2008) being a rare exception. Using the classical example of the choice among Cobb–Douglas and translog models for didactic purposes, this paper illustrates the concept and usefulness of the FIC, focusing on the substitutability of capital and labor.

The following Section 2 describes the choice among Cobb–Douglas and translog models and our preferred focus parameter. Section 3 explains the concept of the information matrix, which is the core of the FIC, and calculates it for the one-dimensional case, in which we augment the Cobb–Douglas model by a single parameter. In Section 4, we apply the one-dimensional FIC to the model selection problem presented in Section 2. For this exceptional case, the model choice based on the FIC actually does not depend on the choice of the focus parameter. Section 5 therefore presents the FIC for the more general two-dimensional case in which the Cobb–Douglas model is extended by two parameters, rather than only one. In this case, it turns out that the choice of the focus parameter plays a crucial role. The last section summarizes.

⁎ Corresponding author at: Rheinisch-Westfälisches Institut für Wirtschaftsforschung (RWI), Hohenzollernstr. 1–3, D-45128 Essen, Germany. E-mail address: [email protected] (M. Frondel).
0264-9993/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.econmod.2011.09.002

2. A classical example

We use the frequently employed translog cost function approach (see e.g. Frondel and Schmidt (2002, 2003) for surveys), including here merely two inputs, capital (K) and labor (L), where $\tilde p_K$ and $\tilde p_L$ denote the respective prices:

$$\log C(p_K, p_L) = \beta_0 + \beta_K \log\tilde p_K + \beta_L \log\tilde p_L + \tfrac{1}{2}\beta_{KK}(\log\tilde p_K)^2 + \beta_{KL}\log\tilde p_K \log\tilde p_L + \tfrac{1}{2}\beta_{LL}(\log\tilde p_L)^2. \qquad (1)$$

This approach reduces to the Cobb–Douglas function if the second-order coefficients $\beta_{KK}$, $\beta_{LL}$, and $\beta_{KL}$ vanish:

$$H_0: \beta_{KK} = \beta_{LL} = \beta_{KL} = 0. \qquad (2)$$

Given empirical data on input prices, as well as on cost shares of capital ($s_K$) and labor ($s_L$), an efficient procedure to obtain coefficient estimates is via estimating a cost share system (Berndt, 1996:470):

$$s_K = \beta_K + \beta_{KK}\log\tilde p_K + \beta_{KL}\log\tilde p_L, \qquad s_L = \beta_L + \beta_{KL}\log\tilde p_K + \beta_{LL}\log\tilde p_L, \qquad (3)$$

which results from the logarithmic differentiation of translog function (1) with respect to $\tilde p_K$ and $\tilde p_L$, respectively, as e.g.

$$\frac{\partial\log C}{\partial\log\tilde p_K} = \frac{\tilde p_K}{C}\frac{\partial C}{\partial\tilde p_K} = \frac{\tilde p_K x_K}{C} = s_K,$$

where according to Shephard's Lemma $\partial C/\partial\tilde p_K = x_K$. In this two-factor case, cost share system (3) degenerates to a single cost share equation:

$$s_K = \beta_K + \beta_{KK}\log(\tilde p_K/\tilde p_L), \qquad (4)$$

as both cost shares add to unity, $s_K + s_L = 1$, thereby implying the following restrictions that are partly incorporated in (4):

$$1 = \beta_K + \beta_L, \qquad (5)$$

$$0 = \beta_{KK} + \beta_{KL}, \qquad (6)$$

$$0 = \beta_{KL} + \beta_{LL}. \qquad (7)$$

On the basis of (4), the classical procedure of selecting either of the two specifications involves testing whether $\beta_{KK}$ equals zero:

$$H_0: \beta_{KK} = 0. \qquad (8)$$

Alternatively, using the FIC for model selection requires determining a measure of interest μ, which is typically a function of the model coefficients. As in many empirical labor market studies, we focus here on the capital elasticity with respect to wages, $\eta_{Kp_L}$, which for translog cost function (1) is given by (see e.g. Frondel and Schmidt (2006):188):

$$\mu = \eta_{Kp_L} := \frac{\partial\log K}{\partial\log\tilde p_L} = \frac{\beta_{KL}}{s_K} + s_L. \qquad (9)$$

Using $s_K + s_L = 1$ and restriction (6), i.e. $\beta_{KL} = -\beta_{KK}$, we obtain

$$\mu = \mu(\beta_K, \beta_{KK}) = -\frac{\beta_{KK}}{s_K} + 1 - s_K,$$

with capital cost share $s_K$ from (4) and the focus parameter $\mu = \eta_{Kp_L}$ depending on two parameters: $\beta_K$ and $\beta_{KK}$. Expression (9) degenerates to $\eta_{Kp_L} = s_L$ for the Cobb–Douglas function, as can be seen from hypothesis (8) and restriction (6).

3. Information measures and matrices

Using the abbreviation $p_K := \log(\tilde p_K/\tilde p_L)$ for the logged relative factor prices, the stochastic version of the more general specification (4) reads:

$$s_K = \beta_K + \beta_{KK}\,p_K + \varepsilon, \qquad (10)$$

where ε denotes the error term, whose variance structure is assumed to be homoscedastic: $\mathrm{Var}(\varepsilon) = \sigma^2$. In line with Claeskens and Hjort (2003:901), specification (10) is called here the full model. Relative to the so-called narrow model, also referred to as the null model, the single parameter $\gamma := \beta_{KK}$ completes the full model. For clarity, the parameters estimated from the full model are designated by $\theta^{\text{full}} := (\beta_K^{\text{full}}, \sigma^{\text{full}}, \gamma^{\text{full}})^T$, where $\gamma^{\text{full}} = \beta_{KK}^{\text{full}}$, whereas those of the null model are denoted by $\theta^0 := (\beta_K^0, \sigma^0, \gamma^0)^T$. Corresponding to (8), $\gamma^0$ equals zero: $\gamma^0 = 0$.

As the term Focused Information Criterion suggests, employing the FIC is intimately related to calculating an information matrix, which is related to Fisher's well-known information measure and represents the variance matrix of the score vector $\partial\log L/\partial\theta$, where L denotes the likelihood function that is specified below for our example. In fact, for comparing competing parametric models on the basis of an n-dimensional sample that provides observations $(p_{K,1}, \ldots, p_{K,n})$ and $(s_{K,1}, \ldots, s_{K,n})$ on $p_K$ and $s_K$, respectively, applying the FIC requires the calculation of a $(p+q)\times(p+q)$ information matrix, where p refers to the number of parameters estimated in the null model and q designates the number of parameters that exclusively belong to the full model. In our example, p = 2 and q = 1, that is, $I^{\text{full}}$ is a 3 × 3 matrix and $I_{00}$ is a 2 × 2 matrix, whereas $I_{11}$ is a scalar:

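$$I^{\text{full}} := \begin{pmatrix} I_{00} & I_{01} \\ I_{10} & I_{11} \end{pmatrix} = E\begin{pmatrix}
\left(\frac{\partial\log L}{\partial\beta_K}\right)^2 & \frac{\partial\log L}{\partial\beta_K}\frac{\partial\log L}{\partial\sigma} & \frac{\partial\log L}{\partial\beta_K}\frac{\partial\log L}{\partial\gamma} \\
\frac{\partial\log L}{\partial\sigma}\frac{\partial\log L}{\partial\beta_K} & \left(\frac{\partial\log L}{\partial\sigma}\right)^2 & \frac{\partial\log L}{\partial\sigma}\frac{\partial\log L}{\partial\gamma} \\
\frac{\partial\log L}{\partial\gamma}\frac{\partial\log L}{\partial\beta_K} & \frac{\partial\log L}{\partial\gamma}\frac{\partial\log L}{\partial\sigma} & \left(\frac{\partial\log L}{\partial\gamma}\right)^2
\end{pmatrix}. \qquad (11)$$

Note that the entries of $I^{\text{full}}$ are based on Fisher's well-known information measure, which for $I_{11}$, for instance, is given in more detail by

$$I_{11} = E\left[\left(\frac{\partial\log L(\beta_K, \sigma, \gamma)}{\partial\gamma}\right)^2\right] = E\left[\left(\frac{\partial L(\beta_K, \sigma, \gamma)}{\partial\gamma}\Big/ L(\beta_K, \sigma, \gamma)\right)^2\right].$$

Fisher's information measure helps to discriminate between two parameter values $\gamma_1$ and $\gamma_2$ on the basis of the likelihood $L(\beta_K, \sigma, \gamma)$. Intuitively, the larger the difference $L(\beta_K, \sigma, \gamma_1) - L(\beta_K, \sigma, \gamma_2)$, the easier it is to discriminate between $\gamma_1$ and $\gamma_2$. Fisher's measure captures this difference by the partial derivative of the likelihood, $\partial L/\partial\gamma$, relative to the likelihood L, which equals $\partial\log L/\partial\gamma$. This ratio is squared in order to account for positive and negative relative differences alike.

While the information matrix $I^{\text{full}}$ is generally unknown, it can be estimated on the basis of empirical data on $p_K$ and $s_K$. To this end, we assume normality of the error term, $\varepsilon \sim N(0, \sigma^2)$, as ML estimation is Claeskens and Hjort's (2003:901) method of choice when employing the FIC as a model discrimination tool. The log-likelihood $\log L(s_K|p_K)$ of $s_K$ then reads

$$\log L(s_K|p_K) = -\log\sqrt{2\pi} - \log\sigma - \frac{1}{2}\left(\frac{s_K - \beta_K - \beta_{KK}\,p_K}{\sigma}\right)^2. \qquad (12)$$

Given this log-likelihood, we get the following score vector:

$$\left.\frac{\partial\log L(s_K|p_K)}{\partial\theta}\right|_{\theta^0} = \left.\begin{pmatrix} \frac{\partial\log L}{\partial\beta_K} \\ \frac{\partial\log L}{\partial\sigma} \\ \frac{\partial\log L}{\partial\beta_{KK}} \end{pmatrix}\right|_{\theta^0} = \left.\begin{pmatrix} \frac{1}{\sigma}\cdot\frac{s_K - \beta_K - \beta_{KK}\,p_K}{\sigma} \\ -\frac{1}{\sigma} + \frac{(s_K - \beta_K - \beta_{KK}\,p_K)^2}{\sigma^3} \\ \frac{p_K}{\sigma}\cdot\frac{s_K - \beta_K - \beta_{KK}\,p_K}{\sigma} \end{pmatrix}\right|_{\theta^0} = \begin{pmatrix} \frac{\varepsilon^0}{\sigma^0} \\ \frac{(\varepsilon^0)^2 - 1}{\sigma^0} \\ \frac{p_K\,\varepsilon^0}{\sigma^0} \end{pmatrix}, \qquad (13)$$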

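where $\theta := (\beta_K, \sigma, \gamma = \beta_{KK})^T$, $\theta^0 := (\beta_K^0, \sigma^0, \gamma^0 = 0)^T$, and $\varepsilon^0 := (s_K - \beta_K^0)/\sigma^0 \sim N(0, 1)$.

Using score vector $\partial\log L/\partial\theta\,|_{\theta^0}$ as given by (13) and evaluating the information matrix at $\theta^0$, the common anchor of both models, yields:

$$I^{\text{full}}(p_K)\big|_{\theta^0} = E\left[\left.\frac{\partial\log L(s_K|p_K)}{\partial\theta}\right|_{\theta^0}\cdot\left(\left.\frac{\partial\log L(s_K|p_K)}{\partial\theta}\right|_{\theta^0}\right)^T\right]$$

$$= \begin{pmatrix}
E\left[\left(\frac{\varepsilon^0}{\sigma^0}\right)^2\right] & E\left[\frac{\varepsilon^0}{\sigma^0}\cdot\frac{(\varepsilon^0)^2 - 1}{\sigma^0}\right] & E\left[\frac{\varepsilon^0}{\sigma^0}\cdot\frac{p_K\varepsilon^0}{\sigma^0}\right] \\
E\left[\frac{(\varepsilon^0)^2 - 1}{\sigma^0}\cdot\frac{\varepsilon^0}{\sigma^0}\right] & E\left[\left(\frac{(\varepsilon^0)^2 - 1}{\sigma^0}\right)^2\right] & E\left[\frac{(\varepsilon^0)^2 - 1}{\sigma^0}\cdot\frac{p_K\varepsilon^0}{\sigma^0}\right] \\
E\left[\frac{p_K\varepsilon^0}{\sigma^0}\cdot\frac{\varepsilon^0}{\sigma^0}\right] & E\left[\frac{p_K\varepsilon^0}{\sigma^0}\cdot\frac{(\varepsilon^0)^2 - 1}{\sigma^0}\right] & E\left[\left(\frac{p_K\varepsilon^0}{\sigma^0}\right)^2\right]
\end{pmatrix}
= \frac{1}{(\sigma^0)^2}\begin{pmatrix} 1 & 0 & p_K \\ 0 & 2 & 0 \\ p_K & 0 & (p_K)^2 \end{pmatrix}, \qquad (14)$$

as $E[(\varepsilon^0)^2] = \mathrm{Var}(\varepsilon^0) = 1$, $E[\varepsilon^0] = 0 = E[(\varepsilon^0)^3]$, and $E[(\varepsilon^0)^4] = 3$. Note that the information matrix $I^{\text{full}}(p_K)$, which is the variance matrix of the score vector, explicitly depends on a given value of $p_K$. Employing the method of moments provides an estimate of $I^{\text{full}}|_{\theta^0}$ that one needs for inference purposes:

$$\hat I^{\text{full}}\big|_{\theta^0} = \frac{1}{n}\sum_{i=1}^{n} I^{\text{full}}(p_{K,i})\big|_{\theta^0} = \begin{pmatrix} \hat I_{00} & \hat I_{01} \\ \hat I_{10} & \hat I_{11} \end{pmatrix} = \frac{1}{(\hat\sigma^0)^2}\begin{pmatrix} 1 & 0 & \bar p_K \\ 0 & 2 & 0 \\ \bar p_K & 0 & \overline{p_K^2} \end{pmatrix}, \qquad (15)$$

with $\bar p_K := (p_{K,1} + \ldots + p_{K,n})/n$, $\overline{p_K^2} := (p_{K,1}^2 + \ldots + p_{K,n}^2)/n$, and $(\hat\sigma^0)^2$ being the maximum likelihood (ML) estimate of $(\sigma^0)^2$.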
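As a minimal sketch of how estimate (15) could be computed in practice, the following Python snippet fits the null model by maximum likelihood and assembles the 3 × 3 matrix by averaging over the observed values of $p_K$. The function names `null_model_ml` and `info_matrix_hat`, as well as the simulated data, are illustrative assumptions and not part of the original paper.

```python
import numpy as np

def null_model_ml(s_K):
    """ML estimates of the null model s_K = beta_K + eps (gamma = 0)."""
    beta_K0 = s_K.mean()
    sigma0_sq = ((s_K - beta_K0) ** 2).mean()  # ML variance: divide by n
    return beta_K0, sigma0_sq

def info_matrix_hat(p_K, sigma0_sq):
    """Method-of-moments estimate of I^full at theta^0, as in Eq. (15)."""
    p_bar = p_K.mean()            # sample mean of p_K
    p_sq_bar = (p_K ** 2).mean()  # sample mean of p_K squared
    return np.array([[1.0,   0.0, p_bar],
                     [0.0,   2.0, 0.0],
                     [p_bar, 0.0, p_sq_bar]]) / sigma0_sq

# Hypothetical simulated data (not taken from the paper)
rng = np.random.default_rng(0)
p_K = rng.normal(0.0, 0.5, size=200)
s_K = 0.3 + 0.05 * p_K + rng.normal(0.0, 0.02, size=200)

beta_K0, sigma0_sq = null_model_ml(s_K)
I_hat = info_matrix_hat(p_K, sigma0_sq)

# Blocks used later: I00 is 2x2, I11 is a scalar (here a 1x1 array)
I00, I01 = I_hat[:2, :2], I_hat[:2, 2:]
I10, I11 = I_hat[2:, :2], I_hat[2:, 2:]
```

The block partition at the end mirrors the split into $\hat I_{00}$, $\hat I_{01}$, $\hat I_{10}$, and $\hat I_{11}$ with p = 2 and q = 1.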

4. One-dimensional FIC

While the FIC balances modeling bias versus estimation variability (Claeskens and Hjort, 2003:907), in our one-dimensional example, in which both models merely differ in the single coefficient $\gamma = \beta_{KK}$, the FIC reduces for the null model to (Claeskens and Hjort, 2003:907):

$$FIC^0 = \omega^2 B^2, \qquad (16)$$

where

$$\omega := I_{10} I_{00}^{-1}\left.\frac{\partial\mu}{\partial\xi}\right|_{\theta^0} - \left.\frac{\partial\mu}{\partial\gamma}\right|_{\theta^0}, \qquad (17)$$

with $\xi := (\beta_K, \sigma)^T$, and, as $\gamma^0 = 0$,

$$B := \sqrt{n}\,(\gamma^{\text{full}} - \gamma^0) = \sqrt{n}\,\gamma^{\text{full}} \qquad (18)$$

capturing the bias, whereas estimation variability V vanishes for the null model by definition. In contrast, for the full model, for which there is no bias by definition, i.e. for which B = 0, the FIC is given by (Claeskens and Hjort, 2003:907)

$$FIC^{\text{full}} = 2\omega^2 V, \qquad (19)$$

with

$$V := \left(I_{11} - I_{10} I_{00}^{-1} I_{01}\right)^{-1} \qquad (20)$$

capturing estimation variability, as is illustrated right now. Using $\hat I^{\text{full}}|_{\theta^0}$ from (15), we get an estimate of V:

$$\hat V = \left(\hat I_{11} - \hat I_{10}\hat I_{00}^{-1}\hat I_{01}\right)^{-1} = \left[\frac{\overline{p_K^2}}{(\hat\sigma^0)^2} - \left(\frac{\bar p_K}{(\hat\sigma^0)^2},\, 0\right)\begin{pmatrix} (\hat\sigma^0)^2 & 0 \\ 0 & (\hat\sigma^0)^2/2 \end{pmatrix}\begin{pmatrix} \frac{\bar p_K}{(\hat\sigma^0)^2} \\ 0 \end{pmatrix}\right]^{-1} = \frac{(\hat\sigma^0)^2}{\overline{p_K^2} - (\bar p_K)^2}, \qquad (21)$$

which is proportional to the variance of the ML estimate $\hat\gamma^{\text{full}} = \hat\beta_{KK}^{\text{full}}$. Note that $\hat\beta_{KK}^{\text{full}}$ is the essential ingredient of the estimate $\hat B = \sqrt{n}\,\hat\beta_{KK}^{\text{full}}$ of bias $B = \sqrt{n}\,(\gamma^{\text{full}} - \gamma^0)$. In short, irrespective of the concrete value of the common term ω, comparing $FIC^0$ and $FIC^{\text{full}}$ in fact reflects the trade-off between bias B versus estimation variability given by V.

While the model with the smallest estimate of FIC is chosen, for the nontrivial case in which ω ≠ 0, the narrow model is preferred by the FIC over the full model if $FIC^0 = \omega^2 B^2 < 2\omega^2 V = FIC^{\text{full}}$ or, equivalently, if $B^2/V < 2$ (Claeskens and Hjort, 2003:907). In our example, this decision is based on the following estimate of a $\chi^2(1)$-distributed test statistic:

$$\frac{\hat B^2}{\hat V} = \frac{\left(\hat\beta_{KK}^{\text{full}}\right)^2}{\dfrac{(\hat\sigma^0)^2}{n\left(\overline{p_K^2} - (\bar p_K)^2\right)}},$$

where $(\hat\sigma^0)^2 / \left[n\left(\overline{p_K^2} - (\bar p_K)^2\right)\right]$ is the estimated variance of $\hat\beta_{KK}^{\text{full}}$ and the significance level results from $\Pr(\chi^2(1) \geq 2) = 0.157$.

Note that it becomes obvious from the ratio $FIC^0/FIC^{\text{full}} = \omega^2 B^2/(2\omega^2 V) = B^2/(2V)$ that for the one-dimensional FIC, the model choice is always independent of the selection of the focus parameter μ, even if ω explicitly depends on μ, as μ enters into the FIC only via ω, but ω cancels out in this ratio. For illustrative purposes, we nonetheless calculate the FIC both for our preferred focus parameter

$$\mu = \eta_{Kp_L} = \frac{\beta_{KL}}{s_K} + s_L = \frac{-\beta_{KK}}{\beta_K + \beta_{KK}\,p_K} + 1 - (\beta_K + \beta_{KK}\,p_K), \qquad (22)$$

and, alternatively, for $\mu = \beta_{KK} = \gamma$, for which $\partial\mu/\partial\gamma = 1$, $\partial\mu/\partial\xi = (0, 0)^T$, and hence ω = −1, so that $FIC^0 = B^2$ and $FIC^{\text{full}} = 2V$. In contrast, for $\mu = \eta_{Kp_L}$, we obtain from expression (22):

$$\left.\frac{\partial\mu}{\partial\gamma}\right|_{\theta^0} = \left.\frac{\partial\mu}{\partial\beta_{KK}}\right|_{\theta^0} = \left.\left(\frac{-\beta_K}{(\beta_K + \beta_{KK}\,p_K)^2} - p_K\right)\right|_{\theta^0} = -\frac{1}{\beta_K^0} - p_K,$$

$$\left.\frac{\partial\mu}{\partial\xi}\right|_{\theta^0} = \left.\begin{pmatrix} \frac{\beta_{KK}}{(\beta_K + \beta_{KK}\,p_K)^2} - 1 \\ 0 \end{pmatrix}\right|_{\theta^0} = \begin{pmatrix} -1 \\ 0 \end{pmatrix}.$$
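The decision rule $B^2/V < 2$ derived earlier in this section can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `fic_one_dim_choice` and the simulated data are assumptions, and the full-model slope is computed by least squares, which coincides with the ML estimate under the normality assumption made in Section 3.

```python
import numpy as np

def fic_one_dim_choice(p_K, s_K):
    """Return ('null' or 'full', B_hat^2 / V_hat) for the one-dimensional case."""
    n = len(s_K)
    # Null model (Cobb-Douglas): s_K = beta_K + eps, fitted by ML
    beta_K0 = s_K.mean()
    sigma0_sq = ((s_K - beta_K0) ** 2).mean()
    # Sample variance of p_K: mean(p_K^2) - mean(p_K)^2
    p_bar = p_K.mean()
    s_pp = (p_K ** 2).mean() - p_bar ** 2
    # Full model slope (least squares = ML under normal errors)
    beta_KK_full = ((p_K - p_bar) * (s_K - s_K.mean())).mean() / s_pp
    B_hat = np.sqrt(n) * beta_KK_full   # bias estimate, Eq. (18)
    V_hat = sigma0_sq / s_pp            # variability estimate, Eq. (21)
    ratio = B_hat ** 2 / V_hat
    # The null model is preferred if B^2 / V < 2; omega cancels out
    return ("null" if ratio < 2.0 else "full"), ratio

# Hypothetical simulated data (not taken from the paper)
rng = np.random.default_rng(0)
p_K = rng.normal(0.0, 0.5, size=200)
s_K = 0.3 + 0.05 * p_K + rng.normal(0.0, 0.02, size=200)
choice, ratio = fic_one_dim_choice(p_K, s_K)
print(choice, ratio)
```

Because ω cancels in the comparison of $FIC^0$ and $FIC^{\text{full}}$, the sketch never needs the focus parameter, which is exactly the peculiarity of the one-dimensional case.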

Using these derivatives and definition (17), for $p_K = \bar p_K$ the estimate of ω reads:

$$\hat\omega = \hat I_{10}\hat I_{00}^{-1}\left.\frac{\partial\hat\mu}{\partial\xi}\right|_{\theta^0} - \left.\frac{\partial\hat\mu}{\partial\gamma}\right|_{\theta^0} = \left(\frac{\bar p_K}{(\hat\sigma^0)^2},\, 0\right)\begin{pmatrix} (\hat\sigma^0)^2 & 0 \\ 0 & (\hat\sigma^0)^2/2 \end{pmatrix}\begin{pmatrix} -1 \\ 0 \end{pmatrix} + \frac{1}{\hat\beta_K^0} + p_K = \frac{1}{\hat\beta_K^0}. \qquad (23)$$

In sum,

$$\widehat{FIC}^{\text{full}} = 2\hat\omega^2\hat V = \frac{2\,(\hat\sigma^0)^2}{(\hat\beta_K^0)^2\left(\overline{p_K^2} - (\bar p_K)^2\right)},$$

which, in accord with $\hat\sigma^0$, should be close to zero if translog function (1) is the true model, so that estimation variability is small. Similarly intuitive is that

$$\widehat{FIC}^0 = \hat\omega^2\hat B^2 = n\left(\frac{\hat\beta_{KK}^{\text{full}}}{\hat\beta_K^0}\right)^2$$

should be small or even vanish if Cobb–Douglas is the true model and, hence, $\hat\beta_{KK}^{\text{full}}$ is close to, or even equals, zero, so that modeling bias is small because the null model appears to be appropriate. It bears noting that $\hat\omega$ generally depends upon the concrete value $p_K$:

$$\hat\omega = -\bar p_K + \frac{1}{\hat\beta_K^0} + p_K, \qquad (24)$$

so that the FIC also critically hinges on the individual value $p_K$. As a consequence, it may well be the case that with this criterion the full model might be preferred for some, but not for all $p_K$. In contrast to other measures, such as AIC, the FIC therefore does not provide a unanimous model recommendation across the whole range of values of the conditional variables.¹

More generally, in the q-dimensional case in which models may differ in q parameters $\gamma_1, \ldots, \gamma_q$, the FIC is given by (Claeskens and Hjort, 2003:907)

$$FIC := \left(\sum_{j=1}^{q}\omega_j B_j\,\mathbf{1}(\gamma_j = \gamma_j^0)\right)^2 + 2\sum_{j=1}^{q}\omega_j^2 V_j\,\mathbf{1}(\gamma_j \neq \gamma_j^0), \qquad (25)$$

if estimation variability matrix V is diagonal with entries $V_j$ and where $\mathbf{1}(\cdot)$ denotes the indicator function. Note that for q = 1 definition (25) reduces either to the FIC for the null model, $FIC^0 = \omega^2 B^2$, if $\gamma = \gamma^0$, or to the FIC for the full model, $FIC^{\text{full}} = 2\omega^2 V$, if $\gamma \neq \gamma^0$, with $\omega_1 = \omega$ being a common ingredient.

From expression (25), it becomes obvious that with a parsimonious model, the reward is a small variance contribution $2\sum_{j=1}^{q}\omega_j^2 V_j\,\mathbf{1}(\gamma_j \neq \gamma_j^0)$, but the penalty is a larger modeling bias originating from $\left(\sum_{j=1}^{q}\omega_j B_j\,\mathbf{1}(\gamma_j = \gamma_j^0)\right)^2$. The situation is reversed for richer models. In short, including more model components means more variance and lower bias, and vice versa.

Finally, it also bears noting that for the multi-dimensional case q > 1, the factors $\omega_1, \ldots, \omega_q$, which may vary with the focus parameter μ, generally differ from each other. Thus, as opposed to the one-dimensional case illustrated here, different models may be preferred by the FIC in the multi-dimensional case, depending upon the concrete choice of focus parameter μ. This is illustrated by the two-dimensional example presented in the subsequent section.

¹ Alternatively, one might use a weighted version of the FIC (Claeskens and Hjort, 2008).

5. Two-dimensional example

In this section, we augment cost share Eq. (4) by an additional variable y := log Y, where Y denotes production output, thereby allowing for a non-homothetic translog cost function implying the following cost share equation:

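$$s_K = \beta_K + \beta_{KK}\,p_K + \beta_{KY}\,y. \qquad (26)$$

In this case, the null and the full model differ in two parameters gathered in vector $\gamma := (\beta_{KK}, \beta_{KY})^T$. Eq. (26) results from completing translog cost function (1) by components that are related to the production output Y so that in contrast to specification (1), the production technology may exhibit variable, rather than constant returns to scale:

$$\log C(p_K, p_L, Y) = \beta_0 + \beta_K\log\tilde p_K + \beta_L\log\tilde p_L + \beta_Y\log Y + \tfrac{1}{2}\beta_{KK}(\log\tilde p_K)^2 + \beta_{KL}\log\tilde p_K\log\tilde p_L + \tfrac{1}{2}\beta_{LL}(\log\tilde p_L)^2 + \tfrac{1}{2}\beta_{YY}(\log Y)^2 + \beta_{KY}\log\tilde p_K\log Y + \beta_{LY}\log\tilde p_L\log Y. \qquad (27)$$

While in this case, condition $\beta_{KY} + \beta_{LY} = 0$ adds to the restrictions (5) to (7), the functional form of our focus parameter (9) of choice remains unchanged:

$$\mu = \eta_{Kp_L} = \frac{\beta_{KL}}{s_K} + s_L = \frac{-\beta_{KK}}{s_K} + 1 - s_K = \mu(\beta_K, \beta_{KK}, \beta_{KY}), \qquad (28)$$

with a modified cost share $s_K$ of capital given by (26). With the log-likelihood reading now

$$\log L(s_K|p_K, y) = -\log\sqrt{2\pi} - \log\sigma - \frac{1}{2}\left(\frac{s_K - \beta_K - \beta_{KK}\,p_K - \beta_{KY}\,y}{\sigma}\right)^2, \qquad (29)$$

the first three components of the score vector $\partial\log L/\partial\theta\,|_{\theta^0}$ are given by (13), while the fourth component is given by $\partial\log L/\partial\beta_{KY}\,|_{\theta^0} = y\,\varepsilon^0/\sigma^0$, where $\theta^0 := ((\xi^0)^T, (\gamma^0)^T)^T$ with $\xi^0 = (\beta_K^0, \sigma^0)^T$ and p = 2 as before, whereas now $\gamma^0 = (0, 0)^T$, $\gamma = (\beta_{KK}, \beta_{KY})^T$, and q = 2, rather than q = 1. In this two-dimensional example, $I^{\text{full}}$ is thus a $(p+q)\times(p+q) = 4\times 4$ matrix, whose (i, j) entry is the expected product of the i-th and j-th components of the score vector with respect to $\theta = (\beta_K, \sigma, \beta_{KK}, \beta_{KY})^T$:

$$I^{\text{full}}(p_K, y) := E\left[\frac{\partial\log L}{\partial\theta}\left(\frac{\partial\log L}{\partial\theta}\right)^T\right] = \frac{1}{(\sigma^0)^2}\begin{pmatrix} 1 & 0 & p_K & y \\ 0 & 2 & 0 & 0 \\ p_K & 0 & p_K^2 & p_K\,y \\ y & 0 & p_K\,y & y^2 \end{pmatrix}. \qquad (30)$$

The last column and row follow from score vector $\partial\log L/\partial\theta\,|_{\theta^0}$ given by (13), the added fourth component $\partial\log L/\partial\beta_{KY}\,|_{\theta^0} = y\,\varepsilon^0/\sigma^0$, as well as $E[(\varepsilon^0)^2] = 1$, and $E[\varepsilon^0] = E[(\varepsilon^0)^3] = 0$:

$$E\left[\frac{\partial\log L}{\partial\beta_K}\frac{\partial\log L}{\partial\beta_{KY}}\right] = E\left[\frac{\varepsilon^0}{\sigma^0}\cdot\frac{y\,\varepsilon^0}{\sigma^0}\right] = \frac{y}{(\sigma^0)^2},$$

$$E\left[\frac{\partial\log L}{\partial\sigma}\frac{\partial\log L}{\partial\beta_{KY}}\right] = E\left[\frac{(\varepsilon^0)^2 - 1}{\sigma^0}\cdot\frac{y\,\varepsilon^0}{\sigma^0}\right] = 0,$$

$$E\left[\frac{\partial\log L}{\partial\beta_{KK}}\frac{\partial\log L}{\partial\beta_{KY}}\right] = E\left[\frac{p_K\,\varepsilon^0}{\sigma^0}\cdot\frac{y\,\varepsilon^0}{\sigma^0}\right] = \frac{p_K\,y}{(\sigma^0)^2},$$

$$E\left[\left(\frac{\partial\log L}{\partial\beta_{KY}}\right)^2\right] = E\left[\left(\frac{y\,\varepsilon^0}{\sigma^0}\right)^2\right] = \frac{y^2}{(\sigma^0)^2}.$$

On the basis of matrix (30), an estimate of $I^{\text{full}}$ can again be obtained by the method of moments: $\hat I^{\text{full}} = \frac{1}{n}\sum_{i=1}^{n} I^{\text{full}}(p_{K,i}, y_i)$. An estimate for the variability matrix V then reads as follows:

$$\hat V = \left(\hat I_{11} - \hat I_{10}\hat I_{00}^{-1}\hat I_{01}\right)^{-1} = (\hat\sigma^0)^2\begin{pmatrix} \overline{p_K^2} - (\bar p_K)^2 & \overline{p_K\,y} - \bar p_K\,\bar y \\ \overline{p_K\,y} - \bar p_K\,\bar y & \overline{y^2} - (\bar y)^2 \end{pmatrix}^{-1}. \qquad (31)$$

For our preferred focus parameter

$$\mu = \eta_{Kp_L} = \frac{\beta_{KL}}{s_K} + s_L = \frac{-\beta_{KK}}{\beta_K + \beta_{KK}\,p_K + \beta_{KY}\,y} + 1 - (\beta_K + \beta_{KK}\,p_K + \beta_{KY}\,y),$$

the only unknown ingredient to calculate vector ω is given by:

$$\left.\frac{\partial\mu}{\partial\gamma}\right|_{\theta^0} = \left.\begin{pmatrix} \frac{\partial\mu}{\partial\beta_{KK}} \\ \frac{\partial\mu}{\partial\beta_{KY}} \end{pmatrix}\right|_{\theta^0} = \begin{pmatrix} -\frac{1}{\beta_K^0} - p_K \\ -y \end{pmatrix},$$

while $\partial\mu/\partial\xi\,|_{\theta^0} = (-1, 0)^T$ was already calculated in the previous section. If evaluated at $p_K = \bar p_K$ and $y = \bar y$, ω reads:

$$\hat\omega = \hat I_{10}\hat I_{00}^{-1}\left.\frac{\partial\hat\mu}{\partial\xi}\right|_{\theta^0} - \left.\frac{\partial\hat\mu}{\partial\gamma}\right|_{\theta^0} = \begin{pmatrix} \bar p_K & 0 \\ \bar y & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} -1 \\ 0 \end{pmatrix} + \begin{pmatrix} \frac{1}{\hat\beta_K^0} + \bar p_K \\ \bar y \end{pmatrix} = \begin{pmatrix} 1/\hat\beta_K^0 \\ 0 \end{pmatrix}. \qquad (32)$$

Finally, with ω from (32) and the bias vector

$$B = \sqrt{n}\,(\gamma^{\text{full}} - \gamma^0) = \sqrt{n}\begin{pmatrix} \beta_{KK}^{\text{full}} \\ \beta_{KY}^{\text{full}} \end{pmatrix}, \qquad (33)$$

where $\gamma^0 = 0$, we obtain the following estimate of the FIC for the null model:

$$\widehat{FIC}^0 = \left(\hat\omega^T\hat B\right)^2 = \left(\hat\omega_1\hat B_1 + \hat\omega_2\hat B_2\right)^2 = \frac{n}{(\hat\beta_K^0)^2}\left(\hat\beta_{KK}^{\text{full}}\right)^2, \qquad (34)$$

where $(\hat\omega^T\hat B)^2$ is the generalization of the FIC for the null model (see Claeskens and Hjort, 2003:903) given by (16) in the one-dimensional case. In contrast, the estimate of the FIC for the full model reads:

$$\widehat{FIC}^{\text{full}} = 2\,\hat\omega^T\hat V\hat\omega = \frac{2}{(\hat\beta_K^0)^2}\,\hat V_{11}, \qquad (35)$$

where $2\,\omega^T V\omega$ is the generalization of the FIC for the full model (see Claeskens and Hjort, 2003:903) given by (19) in the one-dimensional case and $\hat V_{11}$ denotes the upper left element of the 2 × 2 matrix $\hat V$ defined in (31).

If we focus on a parameter other than the cross-price elasticity of capital and labor, the FIC for the null and the full model generally differ from the expressions (34) and (35), respectively. For example, if our parameter of interest is $\mu = \beta_K$, for which $\hat\omega = (\bar p_K, \bar y)^T$, as $\partial\mu/\partial\xi\,|_{\theta^0} = (1, 0)^T$ and $\partial\mu/\partial\gamma\,|_{\theta^0} = (0, 0)^T$, an estimate of the FIC for the null model is given by

$$\widehat{FIC}^0 = \left(\hat\omega_1\hat B_1 + \hat\omega_2\hat B_2\right)^2 = n\left[\hat\beta_{KK}^{\text{full}}\,\bar p_K + \hat\beta_{KY}^{\text{full}}\,\bar y\right]^2. \qquad (36)$$

An estimate of the FIC for the full model would then read as follows:

$$\widehat{FIC}^{\text{full}} = 2\,\hat\omega^T\hat V\hat\omega = 2\left[(\bar p_K)^2\,\hat V_{11} + \bar p_K\,\bar y\,\hat V_{21} + \bar p_K\,\bar y\,\hat V_{12} + (\bar y)^2\,\hat V_{22}\right], \qquad (37)$$

where $\hat V_{11}$, $\hat V_{12}$, $\hat V_{21}$, and $\hat V_{22}$ designate the elements of the 2 × 2 matrix $\hat V$ defined in (31). Given the distinct expressions and estimates for $FIC^0$ and $FIC^{\text{full}}$ for the two different focus parameters considered here, this two-dimensional example demonstrates that the FIC-based decision on whether to prefer the null or full model critically depends on the focus measure employed.

6. Summary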

Econometric studies on factor substitution frequently stress the importance of choosing the true model for describing the underlying production technology (e.g. Considine, 1989). Typically, this choice focuses on a few well-established functional forms, such as Generalized Leontief or Cobb–Douglas, and, often, Translog. In seeking the right functional form, however, one might ignore that any parametric model represents a highly stylized description of the real production process. As a consequence, none of these functional forms can claim to be the true model, although these functional forms may capture certain features of reality reasonably well. Rather than seeking the true model, an alternative avenue is to search for the model specification that is most appropriate for answering a specific research question, such as the substitution relationship between energy and capital. This is precisely the core of the concept of the Focused Information Criterion (FIC), developed by Claeskens and Hjort (2003) to allow for purpose-specific model selection. Using an analytical example, this paper has illustrated this concept, whose underlying idea is to study perturbations of a parametric model, which rests on the known parameters $\gamma^0 := (\gamma_1^0, \ldots, \gamma_q^0)^T$ as a point of departure. A variety of models may then be considered that depart from $\gamma^0$ in some or all of q directions: $\gamma \neq \gamma^0$. On the basis of the maximum-likelihood estimates for the parameters of the altogether $2^q$ (sub-)models, the model for which the FIC is minimal for a given focus parameter of choice μ = μ(γ) will be selected, a selection procedure that, except for the one-dimensional case q = 1, critically hinges on the choice of the focus parameter μ. In contrast, classical selection criteria are not related to the purpose of inference.
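The dependence of the selection on the focus parameter can be sketched numerically for the two-dimensional example of Section 5. The following Python snippet is a hedged illustration, not the authors' implementation: the names `fic_two_dim`, `focus_elasticity`, and `focus_beta_K` and the simulated data are assumptions. It uses the generalized expressions $(\omega^T B)^2$ and $2\omega^T V\omega$ for the null and the full model and evaluates ω at the sample means of $p_K$ and y.

```python
import numpy as np

def focus_elasticity(beta_K0, p_eval, y_eval):
    """Derivatives of mu = eta_KpL at theta^0: (dmu/dxi, dmu/dgamma)."""
    return np.array([-1.0, 0.0]), np.array([-1.0 / beta_K0 - p_eval, -y_eval])

def focus_beta_K(beta_K0, p_eval, y_eval):
    """Derivatives of mu = beta_K at theta^0: (dmu/dxi, dmu/dgamma)."""
    return np.array([1.0, 0.0]), np.array([0.0, 0.0])

def fic_two_dim(p_K, y, s_K, focus):
    """Return (FIC_null, FIC_full, omega) for a given focus parameter."""
    n = len(s_K)
    beta_K0 = s_K.mean()                       # null-model ML estimate
    sigma0_sq = ((s_K - beta_K0) ** 2).mean()  # null-model ML variance
    # Full model: least squares equals ML under normal errors
    X = np.column_stack([np.ones(n), p_K, y])
    coef, *_ = np.linalg.lstsq(X, s_K, rcond=None)
    B_hat = np.sqrt(n) * coef[1:]              # bias vector, Eq. (33)
    # Blocks of the averaged information matrix, Eq. (30)
    Z = np.column_stack([p_K, y])
    I00 = np.diag([1.0, 2.0]) / sigma0_sq
    I10 = np.column_stack([Z.mean(axis=0), np.zeros(2)]) / sigma0_sq
    I11 = (Z.T @ Z / n) / sigma0_sq
    I00_inv = np.linalg.inv(I00)
    V_hat = np.linalg.inv(I11 - I10 @ I00_inv @ I10.T)  # Eq. (31)
    dmu_dxi, dmu_dgamma = focus(beta_K0, p_K.mean(), y.mean())
    omega = I10 @ I00_inv @ dmu_dxi - dmu_dgamma
    fic_null = float(omega @ B_hat) ** 2
    fic_full = float(2.0 * omega @ V_hat @ omega)
    return fic_null, fic_full, omega

# Hypothetical simulated data (not taken from the paper)
rng = np.random.default_rng(1)
p_K = rng.normal(0.2, 0.5, size=300)
y = rng.normal(1.0, 0.3, size=300)
s_K = 0.3 + 0.01 * p_K + 0.02 * y + rng.normal(0.0, 0.02, size=300)

for name, focus in [("eta_KpL", focus_elasticity), ("beta_K", focus_beta_K)]:
    fic0, fic_f, _ = fic_two_dim(p_K, y, s_K, focus)
    print(name, "null" if fic0 < fic_f else "full", fic0, fic_f)
```

Running the loop for both focus parameters on the same data makes the point of the two-dimensional example concrete: the two FIC values, and possibly the preferred model, change with the focus.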
In addition to this feature, the FIC contrasts with other model selection measures, such as the Akaike criterion, in that it is not a global criterion that recommends a single, most preferred model irrespective of the values of the covariates. Rather, it is a local criterion that may indicate the appropriateness of various models, depending upon the vicinity of the values of the conditioning variables.

Acknowledgments

We are greatly indebted to Christoph M. Schmidt and Colin Vance, as well as an anonymous reviewer, for their invaluable comments and suggestions. This work has been supported by the Collaborative Research Center “Statistical Modelling of Nonlinear Dynamic Processes” (SFB 823) of the German Research Foundation (DFG), within the framework of project A3, “Dynamic Technology Modelling”, and project C1, “Model Choice and Dynamic Dependence Structures”. The work of Peter Behl has been supported by a doctoral scholarship of the Hanns-Seidel-Foundation.

References

Akaike, H., 1973. Information theory and an extension of the likelihood ratio principle. In: Petrov, B.N., Csaki, F. (Eds.), Second International Symposium of Information Theory. Minnesota Studies in the Philosophy of Science. Akademinai Kiado, Budapest.

Berndt, E.R., 1996. The Practice of Econometrics: Classic and Contemporary. Addison-Wesley Publishing Company.

Brownlees, C.T., Gallo, G.M., 2008. On variable selection for volatility forecasting: the role of focused selection criteria. Journal of Financial Econometrics 6 (4), 513–539.

Claeskens, G., Hjort, N.L., 2003. The Focused Information Criterion. Journal of the American Statistical Association 98, 900–916.

Claeskens, G., Hjort, N.L., 2008. Minimising average risk in regression models. Econometric Theory 24, 493–527.

Claeskens, G., Croux, C., Van Kerckhoven, J., 2007. Prediction-focused model selection for autoregressive models. Australian & New Zealand Journal of Statistics 49 (4), 359–379.

Considine, T.J., 1989. Separability, functional form and regulatory policy in models of inter-fuel substitution. Energy Economics 11, 82–94.

Dette, H., 1999. A consistent test for the functional form of a regression based on a difference of variance estimators. The Annals of Statistics 27, 1012–1040.

Dette, H., Podolskij, M., Vetter, M., 2006. Estimation of integrated volatility in continuous time financial models with applications to goodness-of-fit testing. Scandinavian Journal of Statistics 33, 259–278.

Frondel, M., Schmidt, C.M., 2002. The capital-energy controversy: an artifact of cost shares. The Energy Journal 23 (3), 53–79.

Frondel, M., Schmidt, C.M., 2003. Rejecting capital-skill complementarity at all costs? Economics Letters 80 (1), 15–21.

Frondel, M., Schmidt, C.M., 2006. The empirical assessment of technology differences: comparing the comparable. The Review of Economics and Statistics 88 (1), 186–192.

Hjort, N.L., Claeskens, G., 2006. Focused information criteria and model averaging for Cox's hazard regression model. Journal of the American Statistical Association 101, 1449–1464.

Podolskij, M., Dette, H., 2008. Testing the parametric form of the volatility in continuous time diffusion models: a stochastic process approach. Journal of Econometrics 143, 56–73.

Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461–464.